7th Workshop on Biomedical and Bioinformatics Challenges for Computer Science (BBC) Session 1

Time and Date: 11:20 - 13:00 on 10th June 2014

Room: Bluewater I

Chair: Giuseppe A. Trunfio

294 Mining Association Rules from Gene Ontology and Protein Networks: Promises and Challenges. [abstract]
Abstract: The accumulation of raw experimental data about genes and proteins has been accompanied by the accumulation of functional information organized and stored in knowledge bases and ontologies. The assembly, organization and analysis of these data has given a considerable impulse to research. Usually biological knowledge is encoded by using annotation terms, i.e. terms describing, for instance, the function or localization of genes and proteins. Such annotations are often organized into ontologies, which offer a formal framework for organizing biological knowledge in a systematic way. For instance, Gene Ontology (GO) provides a set of annotations (namely GO Terms) of biological aspects. Consequently, for each biological concept, i.e. gene or protein, a list of annotating terms is available. Each annotation may be derived using different methods, and an Evidence Code (EC) records this derivation process; for instance, electronically inferred annotations are distinguished from manual ones. Mining annotation data may thus extract biologically meaningful knowledge. For instance, the analysis of these annotated data using association rules may reveal the co-occurrence of annotations, helping, for example, the classification of proteins starting from their annotations. Nevertheless, the use of frequent itemset mining is less popular than other techniques, such as statistics-based methods or semantic similarities. Here we give a short survey of these methods, discussing possible future directions of research. We consider in particular the impact of the nature of annotations on the performance of association rules by discussing two case studies on protein complexes and protein families. As evidenced by this preliminary study, the presence of electronic annotations does not have a positive impact on the quality of association rules, suggesting the possibility of introducing novel algorithms that are aware of evidence codes.
Pietro Hiram Guzzi, Marianna Milano, Mario Cannataro
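As background to the survey above, the following minimal sketch (not the authors' implementation) shows the kind of co-occurrence analysis that association rule mining performs on annotation data: it mines frequent GO-term pairs and simple rules from a toy protein-to-term table. The term identifiers, support and confidence thresholds are invented for illustration; a real study would also filter by Evidence Code, as the abstract suggests.

```python
# Minimal sketch (not the authors' code): mining association rules from
# protein -> GO-term annotation lists. Term IDs and thresholds are invented.
from itertools import combinations

annotations = {                      # toy annotation database
    "P1": {"GO:0003677", "GO:0005634", "GO:0006355"},
    "P2": {"GO:0003677", "GO:0005634"},
    "P3": {"GO:0003677", "GO:0006355"},
    "P4": {"GO:0005634", "GO:0006355"},
    "P5": {"GO:0003677", "GO:0005634"},
}
MIN_SUPPORT, MIN_CONFIDENCE = 0.5, 0.7

def support(itemset):
    """Fraction of proteins annotated with every term in `itemset`."""
    hits = sum(1 for terms in annotations.values() if itemset <= terms)
    return hits / len(annotations)

# enumerate frequent term pairs (a single 2-itemset Apriori pass)
all_terms = sorted(set.union(*annotations.values()))
frequent_pairs = [frozenset(p) for p in combinations(all_terms, 2)
                  if support(frozenset(p)) >= MIN_SUPPORT]

# derive rules A -> B between the terms of each frequent pair
for pair in frequent_pairs:
    a, b = sorted(pair)
    for lhs, rhs in ((a, b), (b, a)):
        confidence = support(pair) / support(frozenset({lhs}))
        if confidence >= MIN_CONFIDENCE:
            print(f"{lhs} -> {rhs}  support={support(pair):.2f}  confidence={confidence:.2f}")
```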
53 Automated Microalgae Image Classification [abstract]
Abstract: In this paper we present a new method for automated recognition of 12 microalgae that are most commonly found in water resources of Thailand. In order to handle difficulties encountered in our problem, such as unclear algae boundaries and noisy backgrounds, we propose a new method for segmenting algae bodies from the image background and a new method for computing texture descriptors from a blurry textured object. A feature combination approach is applied to handle the variation of algae shapes within the same genus. Sequential Minimal Optimization (SMO) is used as the classifier. An experimental result of 97.22% classification accuracy demonstrates the effectiveness of our proposed method.
Sansoen Promdaen, Pakaket Wattuya, Nuttha Sanevas
192 A Clustering Based Method Accelerating Gene Regulatory Network Reconstruction [abstract]
Abstract: One important direction of Systems Biology is to infer Gene Regulatory Networks, and although many methods have been developed recently, they cannot be applied effectively to full-scale data. In this work we propose a framework based on clustering to handle the large dimensionality of the data, aiming to improve the accuracy of the inferred network while reducing time complexity. We explored the efficiency of this framework employing the recently proposed metric Maximal Information Coefficient (MIC), which showed superior performance in comparison to other well established methods. Utilizing both benchmark and real-life datasets, we showed that our method is able to deliver accurate results in a fraction of the time required by other state-of-the-art methods. Our method provides as output interactions among groups of highly correlated genes, which, in an application to an aging experiment, were able to reveal aging-related pathways.
Georgios Dimitrakopoulos, Ioannis Maraziotis, Kyriakos Sgarbas, Anastasios Bezerianos
208 Large Scale Read Classification for Next Generation Sequencing [abstract]
Abstract: Next Generation Sequencing (NGS) has revolutionised molecular biology, resulting in an explosion of data sets and a pressing need for rapid identification as a prelude to annotation and further analysis. NGS data consists of a substantial number of short sequence reads, given context through downstream assembly and annotation, a process requiring reads consistent with the assumed species or species group. Highly accurate results have been obtained for restricted sets using SVM classifiers, but such methods are difficult to parallelise and success depends on significant attention to feature selection. This work examines the problem at very large scale, using a mix of synthetic and real data with a view to determining the overall structure of the problem and the effectiveness of parallel ensembles of simpler classifiers (principally random forests) in addressing the challenges of large scale genomics.
James Hogan, Timothy Peut
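Since the abstract describes the general pipeline rather than code, the sketch below is a hypothetical illustration of read classification with k-mer count features and a random forest (scikit-learn is assumed to be available); it is not the authors' parallel ensemble, and the synthetic read generator, k-mer size and class labels are invented.

```python
# Minimal sketch (not the authors' pipeline): classify synthetic reads by
# class using k-mer count features and a random forest.
import random
from itertools import product
from sklearn.ensemble import RandomForestClassifier

K = 3
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]

def kmer_counts(read):
    """Feature vector of overlapping k-mer counts for one read."""
    return [sum(read[i:i + K] == km for i in range(len(read) - K + 1))
            for km in KMERS]

def synthetic_read(gc_bias, length=100):
    """Toy read generator; `gc_bias` loosely mimics species composition."""
    weights = [1 - gc_bias, gc_bias, gc_bias, 1 - gc_bias]  # A, C, G, T
    return "".join(random.choices("ACGT", weights=weights, k=length))

random.seed(0)
reads = [synthetic_read(0.3) for _ in range(200)] + \
        [synthetic_read(0.7) for _ in range(200)]
labels = [0] * 200 + [1] * 200

X = [kmer_counts(r) for r in reads]
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X[::2], labels[::2])
print("held-out accuracy:", clf.score(X[1::2], labels[1::2]))
```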

Workshop on Cell Based and Individual Based modelling (CBIBM) Session 1

Time and Date: 11:00 - 12:40 on 11th June 2014

Room: Bluewater II

Chair: James Osborne

395 The future of cell based modelling: connecting and coupling individual based models [abstract]
Abstract: When investigating the development and function of multicellular biological systems it is not enough to only consider the behaviour of individual cells in isolation. For example, when studying tissue development, how individual cells interact, both mechanically and biochemically, influences the resulting tissue's form and function. Cell based modelling allows us to represent and track the interaction of individual cells in a developing tissue. Existing models, including lattice based models (cellular automata and cellular Potts) and off-lattice based models (cell centre and vertex based representations), have given us insight into how tissues maintain homeostasis and how mutations spread. However, when tissues develop they interact biochemically and biomechanically with the environment and, in order to capture these interactions and the effect they have on development, the environment must be considered. We present a framework which allows multiple individual based models to be coupled together, in order to model both the tissue and the surrounding environment. The framework can use different modelling paradigms for each component, and subcellular behaviour (for example the cell cycle) can be considered. In this talk we present two examples of such a coupling, from the fields of developmental biology and vascular remodelling.
James Osborne
206 Discrete-to-continuum modelling of nutrient-dependent cell growth [abstract]
Abstract: Continuum partial differential equation models of the movement and growth of large numbers of cells generally involve constitutive assumptions about macro-scale cell population behaviour. It is difficult to know whether these assumptions accurately represent the mechanical and chemical processes that occur at the level of discrete cells. By deriving continuum models from individual-based models (IBMs) we can obtain PDE approximations to IBMs and conditions for their validity. We have developed a hybrid discrete-continuum model of nutrient-dependent growth of a line of discrete cells on a substrate in a nutrient bath. The cells are represented by linear springs connected in series, with resting lengths that evolve according to the local nutrient concentration. In turn, the continuous nutrient field changes as the cells grow due to the change in nutrient uptake with changes in cell density and the length of the cell line. Following Fozard et al. [Math. Med. and Biol., 27(1):39--74, 2010], we have derived a PDE continuum model from the discrete model ODEs for the motion of the cell vertices and cell growth by taking the large cell number limit. We have identified the conditions under which the continuum model accurately approximates the IBM by comparing numerical simulations of the two models. In addition to making the discrete and continuum frameworks more suitable for modelling cell growth by incorporating nutrient transport, our work provides conditions on the cell density to determine whether the IBM or continuum model should be used. This is an important step towards developing a hybrid model of tissue growth that uses both the IBM and its continuum limit in different regions.
Lloyd Chapman, Rebecca Shipley, Jonathan Whiteley, Helen Byrne and Sarah Waters
434 Distinguishing mechanisms of cell aggregate formation using pair-correlation functions [abstract]
Abstract:
Edward Green
432 Cell lineage tracing in invading cell populations: superstars revealed! [abstract]
Abstract: Cell lineage tracing is a powerful tool for understanding how proliferation and differentiation of individual cells contribute to population behaviour. In the developing enteric nervous system (ENS), enteric neural crest (ENC) cells move and undergo massive population expansion by cell division within mesenchymal tissue that is itself growing. We use an agent-based model to simulate ENC colonisation and obtain agent lineage tracing data, which we analyse using econometric data analysis tools. Biological trials with clonally labelled ENS cells were also performed. In all realisations a small proportion of identical initial agents accounts for a substantial proportion of the total agent population. We term these individuals superstars. Their existence is consistent across individual realisations and is robust to changes in model parameters. However which individual agents will become a superstar is unpredictable. This inequality of outcome is amplified at elevated proliferation rate. Biological trials revealed identical and heretofore unexpected clonal behaviour. The experiments and model suggest that stochastic competition for resources is an important concept when understanding biological processes that feature high levels of cell proliferation. The results have implications for cell fate processes in the ENS and in other situations with invasive proliferative cells, such as invasive cancer.
Kerry Landman, Bevan Cheeseman and Donald Newgreen
435 Agent-based modelling of the mechanism of immune control at the cellular level in HIV infection [abstract]
Abstract: There are over 40 million people currently infected with HIV worldwide, and efforts to develop a vaccine would be improved greatly by a better understanding of how HIV survives and evolves. Recent studies discovered the ability of HIV target cells to present viral particles on their surface and trigger immune recognition and suppression by "killer" cells of the immune system. The effect of these "killer" cells remains poorly understood, yet it plays a key role in the control of HIV infection. While traditional vaccine approaches have been unsuccessful, vaccines against early-expressed conserved viral parts are promising and would make it possible to manage the ability of the virus to mutate and avoid immune recognition. To discover the mechanism of "killer" cells I developed an agent-based stochastic model of HIV dynamics at the cellular level. While the classic ODE approach is unable to reproduce the dynamics that I observed in the experimental data, the agent-based stochastic model is easily comprehensible and exhibits similar kinetics. The complexity of the method increases greatly with the number of agents in the model and may be effectively addressed by using parallel computations on Graphics Processing Units (GPUs). I found that the simulated dynamics almost completely resemble the experimental data and provide an answer to the question addressed. The model may also be applied in further work on the design of experiments to distinguish mechanisms more precisely.
Alexey Martyushev

Workshop on Cell Based and Individual Based modelling (CBIBM) Session 2

Time and Date: 14:10 - 15:50 on 11th June 2014

Room: Bluewater II

Chair: James Osborne

82 How are individual cells distributed in a spreading cell front? [abstract]
Abstract: Spreading cell fronts are essential for embryonic development, tissue repair and cancer. Mathematical models used to describe the motion of cell fronts, such as Fisher’s equation and other partial differential equations, always invoke a mean-field assumption which implies that there is no spatial structure, such as cell clustering, present in the system. We test this ubiquitous assumption using a combination of in vitro cell migration assays, spatial statistics tools and discrete random walk simulations. In particular, we examine the conditions under which spatial structure can form in a spreading cell population. Our results highlight the importance of carefully examining these kinds of modelling assumptions that can be easily overlooked when applying partial differential equation models to describe the collective migration of a population of cells.
Katrina Treloar, Matthew Simpson and D. L. Sean McElwain
170 An approximate Bayesian computation approach for estimating parameters of cell spreading experiments [abstract]
Abstract: The cell spreading process involves cell motility and cell proliferation, and is essential to developmental biology, wound healing and immune responses. Such a process is inherently stochastic and should be modelled as such. Unfortunately, there is a lack of a general and principled technique to infer the parameters of these models and quantify the uncertainty associated with the estimates based on experimental data. In this talk we present a novel application of approximate Bayesian computation (ABC) that is able to achieve this goal in a coherent framework. We compare the parameter estimates based on two different implementations of the stochastic models. The first implementation uses the exact continuous time Gillespie (CTG) algorithm while the second is the discrete time approximate (DTA) algorithm. Our results indicate that the DTA algorithm provides very similar results to, but is more computationally efficient than, the CTG algorithm. The key finding is that the posterior distribution of the time duration between motility events is highly correlated with the experimental time and the initial number of cells. That is, the more crowded the cells or the longer the experiment, the faster the cell motility rate. This trend also appears in the models with cell spreading driven by combined motility and proliferation. In similar studies, parameter estimates are typically based upon the size of the leading edge, since other sources of data from the experiments can be costly to collect. Our ABC analysis suggests that it is possible to infer the time duration precisely from the leading edge, but this unfortunately brings very little information about the cell proliferation rate. This highlights the need to obtain more detailed information from the experimental observations of cell spreading, such as the cell density profile along a diameter, in order to quantify model parameters accurately.
Nho Vo, Christopher Drovandi, Anthony Pettitt and Matthew Simpson
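To make the inference idea concrete, here is a minimal ABC-rejection sketch on a toy one-dimensional spreading model, using the leading-edge position as the summary statistic. It is illustrative only, does not reproduce the CTG or DTA implementations described above, and all parameters are invented.

```python
# Minimal ABC-rejection sketch (illustrative only; not the CTG/DTA
# implementations): infer a motility probability in a toy 1D spreading model,
# using the final leading-edge position as the summary statistic.
import random

def simulate_front(p_move, steps=100, n_agents=20):
    """Toy discrete-time model: each agent steps right with probability p_move."""
    positions = [0] * n_agents
    for _ in range(steps):
        positions = [x + 1 if random.random() < p_move else x for x in positions]
    return max(positions)                      # leading-edge summary statistic

random.seed(1)
true_p = 0.3
observed = simulate_front(true_p)

accepted, tolerance = [], 5
for _ in range(3000):                          # ABC rejection loop
    candidate = random.uniform(0.0, 1.0)       # draw from a uniform prior
    if abs(simulate_front(candidate) - observed) <= tolerance:
        accepted.append(candidate)

if accepted:
    print(f"posterior mean ~ {sum(accepted) / len(accepted):.3f} "
          f"from {len(accepted)} accepted draws (true value {true_p})")
```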
431 Computer simulations of the mouse spermatogenic cycle [abstract]
Abstract: The mouse spermatogenic cycle describes the periodic development of male germ cells in the testicular tissue. Understanding the spermatogenic cycle has important clinical relevance, because disruption of the process leads to infertility or subfertility, and being able to regulate the process would provide new avenues for male contraceptives. However, the lengthy process prevents visualizing the cycle through dynamic imaging. Moreover, the precise action of germ cells that leads to the emergence of testicular tissue patterns remains uncharacterized. We develop an agent-based model to simulate the mouse spermatogenic cycle on a cross-section of the seminiferous tubule over a time scale of hours to years, taking into consideration multiple cellular behaviors including feedback regulation, mitotic and meiotic division, differentiation, apoptosis, and movement. The computer model is able to elaborate the temporal-spatial dynamics of germ cells in a time-lapse movie format, allowing us to trace individual cells as they change state and location. More importantly, the model provides a mechanistic understanding of the fundamentals of male fertility, namely, how testicular morphology and sperm production are achieved. By manipulating cellular behaviors either individually or collectively in silico, the model predicts the causal events leading to the altered arrangement of germ cells upon genetic and environmental perturbations. This in silico platform can serve as an interactive tool to perform long-term simulations and identify optimal approaches for infertility treatment and contraceptive development. Such an approach may also be applicable to human spermatogenesis and, hence, may lay the foundation for increasing the effectiveness of male fertility regulation.
Ping Ye

Solving Problems with Uncertainties (SPU) Session 1

Time and Date: 16:20 - 18:00 on 11th June 2014

Room: Tully III

Chair: Vassil Alexandrov

37 Wind field uncertainty in forest fire propagation prediction [abstract]
Abstract: Forest fires are a significant problem, especially in Mediterranean countries. To fight these hazards, it is necessary to have an accurate prediction of fire evolution beforehand, and propagation models have therefore been developed to determine the expected evolution of a forest fire. Such propagation models require input parameters to produce their predictions, and these parameters must be as accurate as possible in order to provide a prediction adjusted to the actual fire behavior. However, in many cases the information concerning the values of the input parameters is obtained by indirect measurements, and such indirect estimations imply a degree of uncertainty in the parameter values. This problem is very significant for parameters that have a spatial distribution or variation, such as wind. The wind provided by a global weather forecast model, or measured at a meteorological station at some particular point, is modified by the topography of the terrain and has a different value at every point of the terrain. To estimate the wind speed and direction at each point of the terrain it is necessary to apply a wind field model that determines those values at each point depending on the terrain topography. WindNinja is a wind field simulator that provides an estimated wind direction and wind speed at each point of the terrain given a meteorological wind. However, the calculation of the wind field takes considerable time when the map is large (30x30 km) and the resolution is high (30x30 m). This time penalizes the prediction of forest fire spread and may eventually make effective prediction of fire spread with a wind field impractical. Moreover, the data structures needed to calculate the wind field of a large map require a large amount of memory that may not be available on a single node of a current system. To reduce the computation time of the wind field, a data partitioning method has been applied: the wind field is calculated in parallel on each part of the map and then the wind fields of the different parts are joined to form the global wind field. Furthermore, by partitioning the terrain map, the data structures necessary to resolve the wind field in each part are reduced significantly and can be stored in the memory of a single node of a current parallel system. Therefore, the existing nodes can perform computation in parallel with data that fits the memory capacity of each node. However, the calculation of the wind field is a complex problem with certain border effects, so the wind direction and speed at points close to the border of each part may vary and differ from the values that would have been obtained far from the border, for example if the wind field were calculated over a single complete map. To solve this problem, it is necessary to include a degree of overlap among the map parts, so that there is a margin between the beginning of each part and the part cells themselves. The overall wind field is obtained by discarding the overlapping margins calculated for each part. Including an overlap in each part increases the execution time, but the variation in the wind field is reduced. The methodology has been tested with several terrain maps, and it was found that parts of 400x400 cells with an overlap of 50 cells per side provide a reasonable execution time (150 s) with virtually no variation with respect to the wind field obtained with a global map. With this type of partitioning, each process solves an effective part of the map of 300x300 cells.
Gemma Sanjuan, Carlos Brun, Tomas Margalef, Ana Cortes
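The overlap idea can be illustrated with a toy one-dimensional stand-in for the wind-field solver: each part is solved with extra halo cells on both sides, and the halo results are discarded when the parts are joined. This is a hypothetical sketch, not WindNinja or the authors' code; the solver, map size and overlap width are invented.

```python
# Minimal sketch of map partitioning with overlap (illustrative only; the toy
# "field solver" below stands in for a real wind-field model such as WindNinja).
def solve_field(cells, sweeps=50):
    """Toy stand-in solver: repeated neighbour averaging over a 1D strip."""
    values = list(cells)
    for _ in range(sweeps):
        values = [(values[max(i - 1, 0)] + values[i] + values[min(i + 1, len(values) - 1)]) / 3
                  for i in range(len(values))]
    return values

def solve_partitioned(terrain, part_size, overlap):
    """Solve each part (plus overlap margins) independently, then discard the margins."""
    result = []
    for start in range(0, len(terrain), part_size):
        lo = max(start - overlap, 0)
        hi = min(start + part_size + overlap, len(terrain))
        local = solve_field(terrain[lo:hi])
        kept = min(part_size, len(terrain) - start)
        result.extend(local[start - lo: start - lo + kept])
    return result

terrain = [float(i % 17) for i in range(1200)]           # toy elevation profile
global_field = solve_field(terrain)
partitioned = solve_partitioned(terrain, part_size=400, overlap=50)
max_err = max(abs(a - b) for a, b in zip(global_field, partitioned))
print(f"max deviation from the global solution: {max_err:.4f}")
```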
307 A Framework for Evaluating Skyline Query over Uncertain Autonomous Databases [abstract]
Abstract: The aim of a skyline query is to find a set of objects that are most preferred in all dimensions. While this notion is easily applicable to certain and complete databases, when it comes to the integration of databases that each represent data differently in the same dimension, it is difficult to determine the dominance relation between the underlying data. In this paper, we propose a framework, SkyQUD, to efficiently compute the skyline probability of datasets with uncertain dimensions. We explore the effects of datasets with uncertain dimensions on the dominance relation theory and propose a framework that is able to support skyline queries on this type of dataset.
Nurul Husna Mohd Saad, Hamidah Ibrahim, Ali Amer Alwan, Fatimah Sidi, Razali Yaakob
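For readers unfamiliar with the dominance relation that SkyQUD generalises, the sketch below computes a classic skyline over exact (certain) points with a naive nested-loop test; the data and the assumption that lower values are preferred are invented for illustration.

```python
# Minimal sketch of the classic (certain-data) skyline computation that the
# SkyQUD framework generalises to uncertain dimensions. Data are invented and
# lower values are assumed to be preferred in every dimension.
def dominates(p, q):
    """p dominates q if p is no worse in all dimensions and better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    result = []
    for p in points:
        if any(dominates(q, p) for q in points if q is not p):
            continue                      # p is dominated, discard it
        result.append(p)
    return result

hotels = [(100, 1.0), (90, 2.0), (120, 1.5), (80, 3.0), (95, 2.5)]  # (price, km to beach)
print(skyline(hotels))                    # -> [(100, 1.0), (90, 2.0), (80, 3.0)]
```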
253 Efficient Data Structures for Risk Modelling in Portfolios of Catastrophic Risk Using MapReduce [abstract]
Abstract: The QuPARA Risk Analysis Framework [IEEEbigdata] is an analytical framework implemented using MapReduce and designed to answer a wide variety of complex risk analysis queries on massive portfolios of catastrophic risk contracts. In this paper, we present data structure improvements that greatly accelerate QuPARA's computation of Exceedance Probability (EP) curves with secondary uncertainty.
Andrew Rau-Chaplin, Zhimin Yao, Norbert Zeh
40 Argumentation Approach and Learning Methods in Intelligent Decision Support Systems in the Presence of Inconsistent Data [abstract]
Abstract: This paper describes methods and algorithms for working with inconsistent data in intelligent decision support systems. An argumentation approach and the application of rough sets to generalization problems are considered. Methods for finding conflicts and a generalization algorithm based on rough sets are proposed, and noise models in the generalization algorithm are examined. Experimental results are presented, and a solution is given for some problems that are not solvable in classical logics.
Vadim N. Vagin, Marina Fomina, Oleg Morosin
365 Enhancing Monte Carlo Preconditioning Methods for Matrix Computations [abstract]
Abstract: An enhanced version of a stochastic SParse Approximate Inverse (SPAI) preconditioner for general matrices is presented. This method is used in contrast to the standard deterministic preconditioners computed by the deterministic SPAI and its further optimized parallel variant, the Modified SParse Approximate Inverse Preconditioner (MSPAI). We thus present a Monte Carlo preconditioner that relies on the use of Markov Chain Monte Carlo (MCMC) methods to compute a rough matrix inverse first, which is further optimized by an iterative filter process and a parallel refinement to enhance the accuracy of the preconditioner. Monte Carlo methods quantify the uncertainties by enabling us to estimate the non-zero elements of the inverse matrix with a given precision and certain probability. The advantage of this approach is that we use sparse Monte Carlo matrix inversion, whose complexity is linear in the size of the matrix. The behaviour of the proposed algorithm is studied and its performance measured and compared with MSPAI.
Janko Strassburg, Vassil Alexandrov
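As a rough illustration of the Monte Carlo matrix inversion idea (not the SPAI/MSPAI code, and without the filtering and refinement steps described above), the sketch below estimates the entries of (I - C)^{-1} through the Neumann series using weighted random walks; the matrix size, walk counts and contraction assumption are invented.

```python
# Minimal sketch (not the MSPAI/SPAI code) of Monte Carlo matrix inversion via
# the Neumann series: for A = I - C with spectral radius(C) < 1,
# A^{-1} = sum_k C^k, and each entry is estimated by weighted random walks.
import numpy as np

rng = np.random.default_rng(0)
n = 6
C = rng.uniform(-0.1, 0.1, size=(n, n))      # contraction part, ||C|| < 1
A = np.eye(n) - C

def mc_inverse_row(i, walks=10000, length=10):
    """Estimate row i of (I - C)^{-1} with uniformly chosen transitions."""
    row = np.zeros(n)
    for _ in range(walks):
        state, weight = i, 1.0
        row[state] += weight                   # k = 0 term of the series
        for _ in range(length):
            nxt = rng.integers(n)              # uniform transition, prob 1/n
            weight *= C[state, nxt] * n        # importance weight C[s,t] / (1/n)
            state = nxt
            row[state] += weight
    return row / walks

approx = np.array([mc_inverse_row(i) for i in range(n)])
exact = np.linalg.inv(A)
print("max entrywise error:", np.abs(approx - exact).max())
```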

Main Track (MT) Session 1

Time and Date: 11:20 - 13:00 on 10th June 2014

Room: Kuranda

Chair: J. Betts

12 SparseHC: a memory-efficient online hierarchical clustering algorithm [abstract]
Abstract: Computing a hierarchical clustering of objects from a pairwise distance matrix is an important algorithmic kernel in computational science. Since the storage of this matrix requires quadratic space in the number of objects, the design of memory-efficient approaches is of high importance to research. In this paper, we address this problem by presenting a memory-efficient online hierarchical clustering algorithm called SparseHC. SparseHC scans a sorted (possibly sparse) distance matrix chunk-by-chunk, and a dendrogram is built by merging cluster pairs as and when the distance between them is determined to be the smallest among all remaining cluster pairs. The key insight is that, for finding the cluster pair with the smallest distance, it is not necessary to wait for the computation of all cluster pair distances to complete: partial information can be used to determine lower bounds on cluster pair distances, which are then used for cluster distance comparison. Our experimental results show that SparseHC achieves a linear empirical memory complexity, which is a significant improvement over existing algorithms.
Thuy Diem Nguyen, Bertil Schmidt, Chee Keong Kwoh
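The easiest case to illustrate is single linkage, where scanning a distance-sorted stream with a union-find structure merges cluster pairs as soon as their distance is read. The sketch below shows only this special case and does not reproduce SparseHC's lower-bound machinery for complete and average linkage; the toy distances are invented.

```python
# Minimal sketch of online single-linkage merging from a distance-sorted stream
# (the easy special case; SparseHC's lower-bound machinery for complete/average
# linkage is not reproduced here).
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[rb] = ra
        return True

def single_linkage(sorted_edges, n_objects):
    """Yield (i, j, distance) merges while scanning edges in ascending order."""
    uf = UnionFind(n_objects)
    for i, j, d in sorted_edges:          # could arrive chunk-by-chunk from disk
        if uf.union(i, j):
            yield i, j, d                 # merge happens as soon as d is read

edges = sorted([(0, 1, 0.2), (1, 2, 0.9), (2, 3, 0.3), (0, 3, 0.8), (1, 3, 0.5)],
               key=lambda e: e[2])
for merge in single_linkage(edges, 4):
    print("merge clusters of", merge[0], "and", merge[1], "at distance", merge[2])
```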
131 Tuning Basic Linear Algebra Routines for Hybrid CPU+GPU Platforms [abstract]
Abstract: The introduction of auto-tuning techniques in linear algebra routines using hybrid combinations of multiple CPU and GPU computing resources is analyzed. Basic models of the execution time and information obtained during the installation of the routines are used to optimize the execution time with a balanced assignment of the work to the computing components in the system. The study is carried out with a basic kernel (matrix-matrix multiplication) and a higher level routine (LU factorization) using GPUs and the host multicore processor. Satisfactory results are obtained, with experimental execution times close to the lowest experimentally achievable.
Gregorio Bernabé, Javier Cuenca, Domingo Gimenez, Luis-Pedro García
143 A portable OpenCL Lattice Boltzmann code for multi- and many-core processor architectures [abstract]
Abstract: The architecture of high performance computing systems is becoming more and more heterogeneous, as accelerators play an increasingly important role alongside traditional CPUs. Programming heterogeneous systems efficiently is a complex task, that often requires the use of specific programming environments. Programming frameworks supporting codes portable across different high performance architectures have recently appeared, but one must carefully assess the relative costs of portability versus computing efficiency, and find a reasonable tradeoff point. In this paper we address precisely this issue, using as test-bench a Lattice Boltzmann code implemented in OpenCL. We analyze its performance on several different state-of-the-art processors: NVIDIA GPUs and Intel Xeon-Phi many-core accelerators, as well as more traditional Ivy Bridge and Opteron multi-core commodity CPUs. We also compare with results obtained with codes specifically optimized for each of these systems. Our work shows that a properly structured OpenCL code runs on many different systems reaching performance levels close to those obtained by architecture-tuned CUDA or C codes.
Enrico Calore, Sebastiano F. Schifano, Raffaele Tripiccione
146 Accelerating Solid-Fluid Interaction using Lattice-Boltzmann and Immersed Boundary Coupled Simulations on Heterogeneous Platforms [abstract]
Abstract: We propose a numerical approach based on the Lattice-Boltzmann (LBM) and Immersed Boundary (IB) methods to tackle the problem of the interaction of solids with an incompressible fluid flow. The proposed method uses a Cartesian uniform grid that incorporates both the fluid and the solid domain. This is a highly effective and novel method for solving this problem, which is a growing research topic in Computational Fluid Dynamics. We explain in detail the parallelization of the whole method on both GPUs and a heterogeneous GPU-multicore platform and describe different optimizations, focusing on memory management and CPU-GPU communication. Our performance evaluation consists of a series of numerical experiments that simulate situations of industrial and research interest. Based on these tests, we show that the baseline LBM implementation achieves satisfactory results on GPUs. Unfortunately, when coupling the LBM and IB methods on GPUs, the overheads of IB degrade the overall performance. As an alternative we have explored a heterogeneous implementation that is able to hide such overheads and allows us to exploit both multicore and GPU resources in a cooperative way.
Pedro Valero-Lara, Alfredo Pinelli, Manuel Prieto-Matías
110 Spatio-temporal Sequential Pattern Mining for Tourism Sciences [abstract]
Abstract: Flickr presents an abundance of geotagged photos for data mining. We propose the concept of extracting spatio-temporal metadata from Flickr photos; combining a collection of such photos results in a spatio-temporal entity movement trail, a trajectory describing an individual's movements. Using these spatio-temporal Flickr photographer trajectories we aim to extract valuable tourist information about where people are going, what time they are going there, and where they are likely to go next. In order to achieve this goal we present our novel spatio-temporal trajectory RoI mining and SPM framework. It differs from previous work because it forms RoIs by taking into consideration both space and time simultaneously, and we reason that this produces higher-quality RoIs and, by extension, higher-quality sequential patterns too. We test our framework's ability to uncover interesting patterns for the tourism sciences industry by performing experiments using a large dataset of Queensland photo taker movements for the year 2012. Experimental results validate the usefulness of our approach at finding new, information-rich spatio-temporal tourist patterns from this dataset, especially in comparison to the 2D approaches in the literature.
Luke Bermingham, Ickjai Lee

Main Track (MT) Session 2

Time and Date: 16:30 - 18:10 on 10th June 2014

Room: Kuranda

Chair: Young Choon Lee

152 An Empirical Study of Hadoop's Energy Efficiency on a HPC Cluster [abstract]
Abstract: The Map-Reduce programming model is commonly used for efficient scientific computations, as it executes tasks in a parallel and distributed manner on large data volumes. HPC infrastructure can effectively increase the parallelism of map-reduce tasks, but such an execution incurs high energy and data transmission costs. Here we empirically study how the energy efficiency of a map-reduce job varies with increasing parallelism and network bandwidth on an HPC cluster. We also investigate the effectiveness of power-aware systems in managing the energy consumption of different types of map-reduce jobs. We find that for some jobs the energy efficiency degrades at a high degree of parallelism, while for others it improves at low CPU frequency. Consequently we suggest strategies for configuring the degree of parallelism, network bandwidth and power management features in an HPC cluster for energy-efficient execution of map-reduce jobs.
Nidhi Tiwari, Santonu Sarkar, Umesh Bellur, Maria Indrawan-Santiago
167 Optimal Run Length for Discrete-Event Distributed Cluster-Based Simulations [abstract]
Abstract: In scientific simulations the results generated usually come from a stochastic process. New solutions aiming to improve these simulations have been proposed, but the problem is how to compare these solutions, since the results are not deterministic, and consequently how to guarantee that the output results are statistically trustworthy. In this work we apply a statistical approach to define the transient and steady state in discrete-event distributed simulation. We used linear regression and the batch method to find the optimal simulation length. The contributions of our work are: we have applied and adapted a simple statistical approach to define the optimal simulation length; we propose approximating the output to a normal distribution instead of generating a sufficiently large number of replications; and the method can be used in other kinds of non-terminating scientific simulations where the data either follow a normal distribution or can be approximated by one.
Francisco Borges, Albert Gutierrez-Milla, Remo Suppi, Emilio Luque
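A minimal illustration of the batching approach on a synthetic autocorrelated output stream is given below. It is not the authors' implementation: the transient cut-off and batch size are invented, and a real analysis would determine the transient via the regression step described in the abstract.

```python
# Minimal batch-means-style sketch (illustrative only): discard a transient,
# split the remaining output into batches, and report a confidence interval.
import random
import statistics

random.seed(2)
def simulator_output(n, warmup_drift=5.0):
    """Synthetic AR(1)-like stream whose early samples carry a transient bias."""
    x, out = 0.0, []
    for t in range(n):
        x = 0.9 * x + random.gauss(0, 1)
        out.append(x + warmup_drift * max(0.0, 1.0 - t / 200))  # decaying transient
    return out

data = simulator_output(10000)
steady = data[1000:]                       # crude transient cut-off
batch_size = 500
batches = [statistics.mean(steady[i:i + batch_size])
           for i in range(0, len(steady) - batch_size + 1, batch_size)]
mean = statistics.mean(batches)
half_width = 1.96 * statistics.stdev(batches) / len(batches) ** 0.5
print(f"steady-state mean ~ {mean:.3f} +/- {half_width:.3f} ({len(batches)} batches)")
```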
173 A CUDA Based Solution to the Multidimensional Knapsack Problem Using the Ant Colony Optimization [abstract]
Abstract: The Multidimensional Knapsack Problem (MKP) is a generalization of the basic Knapsack Problem, with two or more constraints. It is an important optimization problem with many real-life applications. It is an NP-hard problem and finding optimal solutions for MKP may be intractable. In this paper we use a metaheuristic algorithm based on ant colony optimization (ACO). Since several steps of the algorithm can be carried out concurrently, we propose a parallel implementation under the GPGPU paradigm (General Purpose Graphics Processing Units) using CUDA. To use the algorithm presented in this paper, it is necessary to balance the number of ants, number of rounds used, and whether local search is used or not, depending on the quality of the solution desired. In other words, there is a compromise between time and quality of solution. We obtained very promising experimental results and we compared our implementation with those in the literature. The results obtained show that ant colony optimization is a viable approach to solve MKP efficiently, even for large instances, with the parallel approach.
Henrique Fingler, Edson Cáceres, Henrique Mongelli, Siang Song
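The following compact, serial ACO sketch for a tiny MKP instance illustrates the construction, evaporation and reinforcement steps; it is not the authors' CUDA implementation, omits the optional local search, and all instance data and parameters are invented.

```python
# Minimal serial ACO sketch for a tiny MKP instance (illustrative only; the
# paper's CUDA parallelisation and optional local search are not shown).
import random

values = [10, 13, 7, 8, 9, 4]
weights = [[3, 4, 2, 3, 4, 1],      # constraint 1
           [2, 2, 3, 1, 3, 2]]      # constraint 2
capacity = [9, 7]
n_items, n_ants, n_rounds, rho = len(values), 20, 50, 0.1
pheromone = [1.0] * n_items
random.seed(3)

def feasible(solution, item):
    load = [sum(w[i] for i in solution + [item]) for w in weights]
    return all(l <= c for l, c in zip(load, capacity))

def build_solution():
    solution, candidates = [], list(range(n_items))
    while candidates:
        feas = [i for i in candidates if feasible(solution, i)]
        if not feas:
            break
        scores = [pheromone[i] * values[i] for i in feas]      # tau * eta
        item = random.choices(feas, weights=scores, k=1)[0]
        solution.append(item)
        candidates.remove(item)
    return solution

best, best_value = [], 0
for _ in range(n_rounds):
    for _ in range(n_ants):
        sol = build_solution()
        val = sum(values[i] for i in sol)
        if val > best_value:
            best, best_value = sol, val
    pheromone = [(1 - rho) * p for p in pheromone]          # evaporation
    for i in best:
        pheromone[i] += best_value / 100.0                  # reinforce best so far
print("best value:", best_value, "items:", sorted(best))
```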
174 Comparison of High Level FPGA Hardware Design for Solving Tri-Diagonal Linear Systems [abstract]
Abstract: Reconfigurable computing devices can increase the performance of compute-intensive algorithms by implementing application-specific co-processor architectures. The power cost for this performance gain is often an order of magnitude less than that of modern CPUs and GPUs. Exploiting the potential of reconfigurable devices such as Field-Programmable Gate Arrays (FPGAs) is typically a complex and tedious hardware engineering task. Recently the major FPGA vendors (Altera and Xilinx) have released their own high-level design tools, which have great potential for rapid development of FPGA based custom accelerators. In this paper, we evaluate Altera's OpenCL Software Development Kit and Xilinx's Vivado High Level Synthesis tool. These tools are compared for their performance, logic utilisation, and ease of development for the test case of a tri-diagonal linear system solver.
David Warne, Neil Kelson, Ross Hayward
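For context, a tri-diagonal system of this kind is usually solved sequentially with the Thomas algorithm; a plain reference version is sketched below (the FPGA OpenCL and Vivado HLS designs themselves are not reproduced, and the test system is invented).

```python
# Reference Thomas algorithm for a tri-diagonal system (the usual sequential
# baseline; the paper's FPGA OpenCL/Vivado HLS designs are not reproduced here).
def thomas(a, b, c, d):
    """Solve Ax = d where A has sub-diagonal a, diagonal b, super-diagonal c.
    a[0] and c[-1] are ignored. All inputs are lists of length n."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                       # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):              # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# 1D Poisson-like test system: -x_{i-1} + 2 x_i - x_{i+1} = h^2
n, h = 8, 1.0
a, b, c = [-1.0] * n, [2.0] * n, [-1.0] * n
d = [h * h] * n
print(thomas(a, b, c, d))
```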
181 Blood Flow Arterial Network Simulation with the Implicit Parallelism Library SkelGIS [abstract]
Abstract: Implicit parallelism computing is an active research domain of computer science. Most implicit parallelism solutions for solving partial differential equations and scientific simulations are based on the specificity of numerical methods, where the user has to call specific functions which embed parallelism. This paper presents the implicit parallel library SkelGIS, which allows the user to freely write their numerical method in a sequential programming style in C++. This library relies on four concepts which are applied, in this paper, to the specific case of network simulations. SkelGIS is evaluated on a blood flow simulation in arterial networks. Benchmarks are first performed to compare the performance and the coding difficulty of two implementations of the simulation, one using SkelGIS and one using OpenMP. Finally, the scalability of the SkelGIS implementation on a cluster is studied up to 1024 cores.
Hélène Coullon, Jose-Maria Fullana, Pierre-Yves Lagrée, Sébastien Limet, Xiaofei Wang

Main Track (MT) Session 3

Time and Date: 11:00 - 12:40 on 11th June 2014

Room: Kuranda

Chair: E. Luque

186 Triplet Finder: On the Way to Triggerless Online Reconstruction with GPUs for the PANDA Experiment [abstract]
Abstract: PANDA is a state-of-the-art hadron physics experiment currently under construction at FAIR, Darmstadt. In order to select events for offline analysis, PANDA will use a software-based triggerless online reconstruction, performed with a data rate of 200 GB/s. To process the raw data rate of the detector in realtime, we design and implement a GPU version of the Triplet Finder, a fast and robust first-stage tracking algorithm able to reconstruct tracks with good quality, specially designed for the Straw Tube Tracker subdetector of PANDA. We reduce the algorithmic complexity of processing many hits together by splitting them into bunches, which can be processed independently. We evaluate different ways of processing bunches, GPU dynamic parallelism being one of them. We also propose an optimized technique for associating hits with reconstructed track candidates. The evaluation of our GPU implementation demonstrates that the Triplet Finder can process almost 6 Mhits/s on a single K20X GPU, making it a promising algorithm for the online event filtering scheme of PANDA.
Andrew Adinetz, Andreas Herten, Jiri Kraus, Marius Mertens, Dirk Pleiter, Tobias Stockmanns, Peter Wintz
189 A Technique for Parallel Share-Frequent Sensor Pattern Mining from Wireless Sensor Networks [abstract]
Abstract: WSNs generate huge amounts of data in the form of streams, and mining useful knowledge from these streams is a challenging task. Existing works generate sensor association rules using the occurrence frequency of patterns with binary frequency (either absent or present) or the support of a pattern as a criterion. However, considering the binary frequency or support of a pattern may not be a sufficient indicator for finding meaningful patterns in WSN data, because it only reflects the number of epochs in the sensor data which contain that pattern. The share measure of sensorsets can reveal useful knowledge about the numerical values associated with sensors in a sensor database. Therefore, in this paper, we propose a new type of behavioral pattern, called share-frequent sensor patterns, by considering the non-binary frequency values of sensors in epochs. To discover share-frequent sensor patterns from sensor datasets, we propose a novel parallel and distributed framework. In this framework, we develop a novel tree structure, called the parallel share-frequent sensor pattern tree (PShrFSP-tree), that is constructed at each local node independently, capturing the database contents to generate candidate patterns using a pattern growth technique with a single scan, and then merges the locally generated candidate patterns at the final stage to generate global share-frequent sensor patterns. Comprehensive experimental results show that our proposed model is very efficient for mining share-frequent patterns from WSN data in terms of time and scalability.
Md Mamunur Rashid, Dr. Iqbal Gondal, Joarder Kamruzzaman
205 Performance-Aware Energy Saving Mechanism in Interconnection Networks for Parallel Systems [abstract]
Abstract: The growing processing power of parallel computing systems requires interconnection networks of greater complexity and higher performance, which consume more energy. Link components contribute a substantial proportion of the total energy consumption of the networks. Many researchers have proposed approaches that judiciously change the link speed as a function of traffic to save energy when the traffic is light. However, reducing the link speed incurs an increase in average packet latency and thus degrades network performance. This paper addresses that issue with several proposals. The simulation results show that the extended energy saving mechanism in our proposals outperforms the energy saving mechanisms in the open literature.
Hai Nguyen, Daniel Franco, Emilio Luque
214 Handling Data-skew Effects in Join Operations using MapReduce [abstract]
Abstract: For over a decade, MapReduce has been a prominent programming model for handling vast amounts of raw data in large scale systems. This model ensures scalability, reliability and availability with reasonable query processing time. However these large scale systems still face some challenges: data skew, task imbalance, high disk I/O and redistribution costs can have disastrous effects on performance. In this paper, we introduce the MRFA-Join algorithm: a new frequency-adaptive algorithm based on the MapReduce programming model and a randomised key redistribution approach for join processing of large-scale datasets. A cost analysis of this algorithm shows that our approach is insensitive to data skew and ensures perfect balancing properties during all stages of join computation. These properties have been confirmed by a series of experiments.
Mostafa Bamha, Frédéric Loulergue, Mohamad Al Hajj Hassan
216 Speeding-Up a Video Summarization Approach using GPUs and Multicore-CPUs [abstract]
Abstract: The recent progress of digital media has stimulated the creation, storage and distribution of data, such as digital videos, generating a large volume of data and requiring efficient technologies to increase the usability of these data. Video summarization methods generate concise summaries of video contents and enable faster browsing, indexing and accessing of large video collections; however, these methods often perform slowly on long, high-quality video data. One way to reduce this long execution time is to develop a parallel algorithm that exploits the high parallelism offered by recent computer architectures. This paper introduces parallelizations of a summarization method called VSUMM, targeting either Graphics Processing Units (GPUs) or multicore Central Processing Units (CPUs), and ultimately a sensible distribution of computation steps onto both types of hardware to maximise performance, called "hybrid". We performed experiments using 180 videos, varying frame resolution (320x240, 640x360, and 1920x1080) and video length (1, 3, 5, 10, 20, and 30 minutes). From the results, we observed that the hybrid version achieved the best results in terms of execution time, reaching a speedup of 7 on average.
Suellen Almeida, Antonio Carlos Nazaré Jr, Arnaldo De Albuquerque Araújo, Guillermo Cámara-Chávez, David Menotti

Main Track (MT) Session 4

Time and Date: 14:10 - 15:50 on 11th June 2014

Room: Kuranda

Chair: Y. Cui

222 GPU Optimization of Pseudo Random Number Generators for Random Ordinary Differential Equations [abstract]
Abstract: Solving differential equations with stochastic terms involves a massive use of pseudo random numbers. We present an application for the simulation of wireframe buildings under stochastic earthquake excitation. The inherent potential for vectorization of the application is used to its full extent on GPU accelerator hardware. A representative set of pseudo random number generators for uniformly and normally distributed pseudo random numbers has been implemented, optimized, and benchmarked. The resulting optimized variants outperform standard library implementations on GPUs. The techniques and improvements shown in this contribution using the Kanai-Tajimi model can be generalized to other random differential equations or stochastic models as well as other accelerators.
Christoph Riesinger, Tobias Neckel, Florian Rupp, Alfredo Parra Hinojosa, Hans-Joachim Bungartz
229 Design and Implementation of Hybrid and Native Communication Devices for Java HPC [abstract]
Abstract: MPJ Express is a messaging system that allows computational scientists to write and execute parallel Java applications on High Performance Computing (HPC) hardware. The software is capable of executing in two modes, namely the cluster and multicore modes. In the cluster mode, parallel applications execute in a typical cluster environment where multiple processing elements communicate with one another using a fast interconnect like Gigabit Ethernet or other proprietary networks like Myrinet and Infiniband. In this context, the MPJ Express library provides communication devices for Ethernet and Myrinet. In the multicore mode, the parallel Java application executes on a single system comprising shared memory or multicore processors. In this paper, we extend the MPJ Express software with two new communication devices, namely the native and the hybrid device. The goal of the native communication device is to interface the MPJ Express software with native—typically written in C—MPI libraries. In this setting the bulk of the messaging logic is offloaded to the underlying MPI library. This is attractive because MPJ Express can exploit the latest features, like support for new interconnects and efficient collective communication algorithms, of the native MPI library. The second device, called the hybrid device, is developed to allow efficient execution of parallel Java applications on clusters of shared memory or multicore processors. In this setting the MPJ Express runtime system runs a single multithreaded process on each node of the cluster—the number of threads in each process is equivalent to the processing elements within a node. Our performance evaluation reveals that the native device allows MPJ Express to achieve comparable performance to native MPI libraries—for latency and bandwidth of point-to-point and collective communications—which is a significant gain in performance compared to existing communication devices. The hybrid communication device—without any modifications at application level—also helps parallel applications achieve better speedups and scalability. We witnessed comparable performance for various benchmarks—including the NAS Parallel Benchmarks—with the hybrid device as compared to the existing Ethernet communication device on a cluster of shared memory/multicore processors.
Bibrak Qamar, Ansar Javed, Mohsan Jameel, Aamir Shafi, Bryan Carpenter
231 Deploying a Large Petascale System: the Blue Waters Experience [abstract]
Abstract: Deployment of a large parallel system is typically a very complex process, involving several steps of preparation, delivery, installation, testing and acceptance. Despite the availability of various petascale machines currently, the steps and lessons from their deployment are rarely described in the literature. This paper presents the experiences observed during the deployment of Blue Waters, the largest supercomputer ever built by Cray and one of the most powerful machines currently available for open science. The presentation is focused on the final deployment steps, where the system was intensively tested and accepted by NCSA. After a brief introduction of the Blue Waters architecture, a detailed description of the set of acceptance tests employed is provided, including many of the obtained results. This is followed by the major lessons learned during the process. Those experiences and lessons should be useful to guide similarly complex deployments in the future.
Celso Mendes, Brett Bode, Gregory Bauer, Jeremy Enos, Cristina Beldica, William Kramer
248 FPGA-based acceleration of detecting statistical epistasis in GWAS [abstract]
Abstract: Genotype-by-genotype interactions (epistasis) are believed to be a significant source of unexplained genetic variation causing complex chronic diseases but have been ignored in genome-wide association studies (GWAS) due to the computational burden of analysis. In this work we show how to benefit from FPGA technology for highly parallel creation of contingency tables in a systolic chain with a subsequent statistical test. We present the implementation for the FPGA-based hardware platform RIVYERA S6-LX150 containing 128 Xilinx Spartan6-LX150 FPGAs. For performance evaluation we compare against the method iLOCi. iLOCi claims to outperform other available tools in terms of accuracy. However, analysis of a dataset from the Wellcome Trust Case Control Consortium (WTCCC) with about 500,000 SNPs and 5,000 samples still takes about 19 hours on a MacPro workstation with two Intel Xeon quad-core CPUs, while our FPGA-based implementation requires only 4 minutes.
Lars Wienbrandt, Jan Christian Kässens, Jorge González-Domínguez, Bertil Schmidt, David Ellinghaus, Manfred Schimmler
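To show what the hardware computes in bulk, the sketch below builds a 9x2 contingency table for one SNP pair and applies a chi-square statistic; the genotype data are synthetic, and neither iLOCi's specific statistic nor the systolic-chain FPGA implementation is reproduced.

```python
# Minimal sketch of a pairwise-SNP contingency table with a chi-square test
# (what the FPGA systolic chain computes in bulk; iLOCi's own statistic and the
# hardware implementation are not reproduced here).
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n_snps, n_samples = 5, 400
genotypes = rng.integers(0, 3, size=(n_snps, n_samples))   # 0/1/2 minor-allele counts
phenotype = rng.integers(0, 2, size=n_samples)             # 0 = control, 1 = case

def contingency(snp_a, snp_b):
    """9 joint genotype combinations x 2 phenotype classes."""
    table = np.zeros((9, 2))
    cell = 3 * genotypes[snp_a] + genotypes[snp_b]          # joint genotype 0..8
    for k in range(9):
        for ph in (0, 1):
            table[k, ph] = np.sum((cell == k) & (phenotype == ph))
    return table

def chi_square(table):
    expected = table.sum(1, keepdims=True) * table.sum(0, keepdims=True) / table.sum()
    mask = expected > 0
    return np.sum((table[mask] - expected[mask]) ** 2 / expected[mask])

for i, j in combinations(range(n_snps), 2):
    print(f"SNP pair ({i},{j}): chi2 = {chi_square(contingency(i, j)):.2f}")
```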

Main Track (MT) Session 5

Time and Date: 16:20 - 18:00 on 11th June 2014

Room: Kuranda

Chair: M. Wagner

288 OS Support for Load Scheduling in Accelerator-based Heterogeneous Systems [abstract]
Abstract: The involvement of accelerators is becoming widespread in the field of heterogeneous processing, performing computation tasks through a wide range of applications. With the variety of computing architectures available today, the need for a system-wide multitasking environment is increasing. We therefore present an OpenCL-based scheduler that is designed as a multi-user computing environment to make use of the full potential of available resources while running as a daemon. Multiple tasks can be issued by means of a C++ API that relies on the OpenCL C++ wrapper. At this point, the daemon takes over control immediately and performs load scheduling. Due to its implementation, our approach can be easily applied to a common OS. We validate our method through extensive experiments deploying a set of applications, which show that the low scheduling costs remain constant in total over a wide range of input sizes. Besides different CPUs, a variety of modern GPU and other accelerator architectures are used in the experiments.
Ayman Tarakji, Niels Ole Salscheider, David Hebbeker
369 Efficient Global Element Indexing for Parallel Adaptive Flow Solvers [abstract]
Abstract: Many grid-based solvers for partial differential equations (PDE) assemble matrices explicitly for discretizing the underlying PDE operators and/or for the underlying (non-)linear systems of equations. Often, the data structures or solver packages require a consecutive global numbering of the degrees of freedom across the boundaries of different parallel subdomains. Straightforward approaches to realize this global indexing in parallel frequently result in serial parts of the assembling algorithms, which cause a considerable bottleneck, in particular in large-scale applications. We present an efficient way to set up such a global numbering scheme for large configurations via a position-based numeration on all parallel processes locally. The global number of shared nodes is determined via a tree-based communication pattern. We verified our implementation via state-of-the-art benchmark scenarios for incompressible flow simulations. A small performance study shows the parallel capability of our approach. The corresponding results can be generalized to other grid-based solvers that demand global indexing in the context of large-scale parallelization.
Michael Lieb, Tobias Neckel, Hans-Joachim Bungartz, Thomas Schöps
382 Performance Improvements for a Large-Scale Geological Simulation [abstract]
Abstract: Geological models have been successfully used to identify and study geothermal energy resources. Many computer simulations based on these models are data-intensive applications. Large-scale geological simulations require high performance computing (HPC) techniques to run within reasonable time constraints and performance levels. One research area that can benefit greatly from HPC techniques is the modeling of heat flow beneath the Earth's surface. This paper describes the application of HPC techniques to increase the scale of research with a well-established geological model. Recently, a serial C++ application based on this geological model was ported to a parallel HPC application using MPI. An area of focus was to increase the performance of the MPI version to enable state or regional scale simulations using large numbers of processors. First, synchronous communication among MPI processes was replaced by overlapping communication and computation (asynchronous communication). Asynchronous communication improved performance over synchronous communication by averages of 28% using 56 cores in one environment and 46% using 56 cores in another. Second, an approach for load balancing involving repartitioning the data at the start of the program resulted in runtime performance improvements of 32% using 48 cores in the first environment and 14% using 24 cores in the second, when compared to the asynchronous version. An additional feature, modeling of erosion, was also added to the MPI code base. The performance improvement techniques were less effective when erosion was modeled.
David Apostal, Kyle Foerster, Travis Desell, Will Gosnold
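The overlap technique can be illustrated with a minimal mpi4py sketch (hypothetical, not the paper's C++ code): non-blocking halo exchange is started, interior work proceeds meanwhile, and the boundary contribution is added once the requests complete.

```python
# Minimal mpi4py sketch (illustrative, not the paper's C++ code) of overlapping
# halo exchange with interior computation via non-blocking communication.
# Run with e.g.: mpiexec -n 4 python overlap_demo.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size

local = np.full(1000, float(rank))          # this rank's strip of the domain
halo_l, halo_r = np.empty(1), np.empty(1)

# start the non-blocking halo exchange
reqs = [comm.Irecv(halo_l, source=left,  tag=0),
        comm.Irecv(halo_r, source=right, tag=1),
        comm.Isend(local[:1],  dest=left,  tag=1),
        comm.Isend(local[-1:], dest=right, tag=0)]

interior_sum = local[1:-1].sum()            # compute on interior cells meanwhile

MPI.Request.Waitall(reqs)                   # halos are now valid
total = interior_sum + (local[0] + halo_l[0]) / 2 + (local[-1] + halo_r[0]) / 2
print(f"rank {rank}: interior+boundary contribution = {total:.1f}")
```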
168 Lattice Gas Model for Budding Yeast: A New Approach for Density Effects [abstract]
Abstract: Yeasts in culture media grow exponentially in the early period but eventually stop growing. This saturation of population growth is due to the "density effect". The budding yeast, Saccharomyces cerevisiae, is known to exhibit age-dependent cell division: a daughter cell, which has not yet given birth, has a longer generation time than a mother cell, because the daughter needs a maturing period. So far, investigations of the exponential growth period have accumulated intensively, but very little is known about the stage dependence of the density effect. Here we present an "in vivo" study of the density effect, applying a lattice gas model to explore the age-structure dynamics. It is, however, hard to solve the basic equations, because they have an infinite number of variables and parameters. The basic equations are therefore studied through several simplified models which have few variables and parameters. These simplified models are compared with experimental data, and we report two findings for the stage-dependent density effect: 1) a paradox of declining birthrate (PDB), and 2) mass suicide. These events occur suddenly and temporarily at an early stage of the density effect. The mother-daughter model leads to the PDB: when the birthrate of the population decreases, the fraction of daughters abruptly increases. Moreover, we find that the average age of the yeast population suddenly decreases at the inflection point, which indicates a mass apoptosis of aged mothers. Our results imply the existence of several types of "pheromones" that specifically inhibit population growth.
Kei-Ichi Tainaka, Takashi Ushimaru, Toshiyuki Hagiwara, Jin Yoshimura
185 Characteristics of displacement data due to time scale for the combination of Brownian motion with intermittent adsorption [abstract]
Abstract: Single-molecule tracking data near solid surfaces contain information on diffusion that is potentially affected by adsorption. However, molecular adsorption can occur in an intermittent manner, and the overall phenomenon is regarded as slower yet normal diffusion if the time scale of each adsorption event is sufficiently shorter than the interval of data acquisition. We compare simple numerical model systems that vary in the time scale of adsorption events while sharing the same diffusion coefficient, and show that the shape of the displacement distribution depends on the time resolution. We also characterize the behaviour using statistical quantities related to the large deviation principle.
Itsuo Hanasaki, Satoshi Uehara, Satoyuki Kawano
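A toy version of such a model system is sketched below: one-dimensional Brownian motion interrupted by adsorption events of exponentially distributed duration, with displacements sampled at a fixed acquisition interval. All parameters are invented, and the printed statistics are only crude indicators of how the displacement distribution's shape changes with the adsorption time scale; the paper's large-deviation analysis is not reproduced.

```python
# Toy 1D model (parameters invented) of Brownian motion with intermittent
# adsorption: the particle alternates between free diffusion and immobile
# adsorbed periods. Displacements are sampled at a fixed acquisition interval.
import random
import statistics

def trajectory_displacements(mean_ads_time, n_frames=2000, frame_dt=1.0,
                             dt=0.01, D=1.0, p_adsorb=0.02):
    random.seed(4)
    x, adsorbed_until, t = 0.0, 0.0, 0.0
    displacements, x_prev = [], 0.0
    for _ in range(n_frames):
        for _ in range(int(frame_dt / dt)):
            t += dt
            if t < adsorbed_until:
                continue                                   # stuck on the surface
            if random.random() < p_adsorb:                 # adsorption event starts
                adsorbed_until = t + random.expovariate(1.0 / mean_ads_time)
                continue
            x += random.gauss(0.0, (2 * D * dt) ** 0.5)    # free Brownian step
        displacements.append(x - x_prev)
        x_prev = x
    return displacements

for tau in (0.05, 5.0):                                    # short vs long adsorption
    d = trajectory_displacements(tau)
    var = statistics.variance(d)
    kurt = statistics.mean([v ** 4 for v in d]) / var ** 2  # ~3 for a Gaussian shape
    print(f"mean adsorption time {tau:>4}: variance = {var:.3f}, kurtosis = {kurt:.2f}")
```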

Main Track (MT) Session 6

Time and Date: 11:00 - 12:40 on 12th June 2014

Room: Kuranda

Chair: Andrew Lewis

199 Mechanism of Traffic Jams at Speed Bottlenecks [abstract]
Abstract: In the past 20 years of complexity science, traffic has been studied as a complex system with a large number of interacting agents. Since traffic has become an important aspect of our lives, understanding traffic systems and how they interact with various factors is essential. In this paper, the interactions between traffic flow and road topology are studied, particularly the relationship between a sharp bend in a road segment and traffic jams. As suggested by Sugiyama [1], when car density exceeds a critical density, the fluctuation in speed of each car triggers a greater fluctuation in the speed of the car behind. This enhancement of fluctuations leads to the congestion of vehicles. Using a cellular automata model modified from the Nagel-Schreckenberg CA model [2], the simulation results suggest that the mechanism of traffic jams at bottlenecks is similar: instead of directly causing the congestion of cars, a bottleneck on the road only causes the local density of traffic to increase, and the resulting congestion is still due to the enhancement of fluctuations. The results of this study open up a large number of possible analytical studies which could serve as grounds for future work.
Wei Liang Quek, Lock Yue Chew
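For readers unfamiliar with the baseline model mentioned above, the following is a minimal Python sketch of the standard, unmodified Nagel-Schreckenberg update rules; the road length, car count, maximum speed and randomization probability are illustrative assumptions and do not reproduce the authors' modified cellular automaton.

import random

def nasch_step(positions, velocities, road_length, v_max=5, p_slow=0.3):
    """One parallel update of the standard Nagel-Schreckenberg CA on a single-lane ring road."""
    n = len(positions)
    order = sorted(range(n), key=lambda i: positions[i])
    new_pos, new_vel = positions[:], velocities[:]
    for k, i in enumerate(order):
        ahead = order[(k + 1) % n]
        gap = (positions[ahead] - positions[i] - 1) % road_length  # empty cells ahead
        v = min(velocities[i] + 1, v_max)         # 1. acceleration
        v = min(v, gap)                           # 2. braking to avoid collision
        if v > 0 and random.random() < p_slow:    # 3. random slowdown
            v -= 1
        new_vel[i] = v
        new_pos[i] = (positions[i] + v) % road_length  # 4. movement
    return new_pos, new_vel

# illustrative usage: 30 cars on a 100-cell ring road
road, cars = 100, 30
pos = random.sample(range(road), cars)
vel = [0] * cars
for _ in range(200):
    pos, vel = nasch_step(pos, vel, road)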
234 Computing, a powerful tool in flood prediction [abstract]
Abstract: Floods have caused widespread damage throughout the world. Modelling and simulation provide solutions and tools enabling us to face this reality, to forecast floods and to take the necessary preventive measures. One problem that must be handled by simulators of physical systems is the uncertainty of input parameters and its impact on output results, which causes prediction errors. In this paper, we address input parameter uncertainty with a methodology to tune a flood simulator and achieve a lower error between simulated and observed results. The tuning methodology, through a parametric simulation technique, implements a first stage to find an adjusted set of critical parameters, which is then used in a second stage to validate the predictive capability of the simulator and to reduce the disagreement between observed data and simulated results. We concentrate our experiments on three significant monitoring stations, and the percentage of improvement over the original simulator values ranges from 33% to 60%.
Adriana Gaudiani, Emilo Luque, Pablo Garcia, Mariano Re, Marcelo Naiouf, Armando De Giusti
117 Benchmarking and Data Envelopment Analysis. An Approach Based on Metaheuristics [abstract]
Abstract: Data Envelopment Analysis (DEA) is a non-parametric technique to estimate the current level of efficiency of a set of entities. DEA also provides information on how to remove inefficiency through the determination of benchmarking information. This paper is devoted to studying DEA models based on closest efficient targets, which are related to the shortest projection to the production frontier and allow inefficient firms to find the easiest way to improve their performance. Usually, these models have been solved by means of unsatisfactory methods, since all of them are related in some sense to a combinatorial NP-hard problem. In this paper, the problem is approached by metaheuristic techniques. Due to the high number of restrictions of the problem, finding solutions to be used in the metaheuristic algorithm is itself difficult, so this paper analyzes and compares several heuristic algorithms for obtaining such solutions. Each restriction determines the design of these heuristics, and the problem is thus considered by adding constraints one by one. In this paper, the problem is presented and studied taking into account 9 of the 14 constraints, and the solution to this new problem is an upper bound of the optimal value of the original problem.
Jose J. Lopez-Espin, Juan Aparicio, Domingo Gimenez, Jesús T. Pastor
249 Consensus reaching in swarms ruled by a hybrid metric-topological distance [abstract]
Abstract: Recent empirical observations of three-dimensional bird flocks and human crowds have challenged the long-prevailing assumption that a metric interaction distance rules swarming behaviors. In some cases, individual agents are found to be engaged in local information exchanges with a fixed number of neighbors, i.e. a topological interaction. However, complex system dynamics based on pure metric or pure topological distances both face physical inconsistencies in low and high density situations. Here, we propose a hybrid metric-topological interaction distance overcoming these issues and enabling a real-life implementation in artificial robotic swarms. We use network- and graph-theoretic approaches combined with a dynamical model of locally interacting self-propelled particles to study the consensus reaching process for a swarm ruled by this hybrid interaction distance. Specifically, we establish exactly the probability of reaching consensus in the absence of noise. In addition, simulations of swarms of self-propelled particles are carried out to assess the influence of the hybrid distance and noise.
Yilun Shang and Roland Bouffanais
258 Simulating Element Creation in Supernovae with the Computational Infrastructure for Nuclear Astrophysics at nucastrodata.org [abstract]
Abstract: The elements that make up our bodies and the world around us are produced in violent stellar explosions. Computational simulations of the element creation processes occurring in these cataclysmic phenomena are complex calculations that track the abundances of thousands of species of atomic nuclei throughout the star. These species are created and destroyed by ~60,000 thermonuclear reactions whose rates are stored in continually updated databases. Previously, delays of up to a decade were experienced before the latest experimental reaction rates were used in astrophysical simulations. The Computational Infrastructure for Nuclear Astrophysics (CINA), freely available at the website nucastrodata.org, reduces this delay from years to minutes! With over 100 unique software tools developed over the last decade, CINA comprises a “lab-to-star” connection. It is the only cloud computing software system in this field and it is accessible via an easy-to-use, web-deliverable, cross-platform Java application. The system gives users the capability to robustly simulate, share, store, analyze and visualize explosive nucleosynthesis events such as novae, X-ray bursts and (new in 2013) core-collapse supernovae. In addition, users can upload, modify, merge, store and share the complex input data required by these simulations. Presently, we are expanding the capabilities of CINA to meet the needs of our users who currently come from 141 institutions and 32 countries. We will describe CINA’s current suite of software tools and the comprehensive list of online nuclear astrophysics datasets available at the nucastrodata.org website. This work is funded by the DOE’s Office of Nuclear Physics under the US Nuclear Data Program.
E. J. Lingerfelt, M. S. Smith, W. R. Hix and C. R. Smith

Main Track (MT) Session 7

Time and Date: 14:10 - 15:50 on 12th June 2014

Room: Kuranda

Chair: Maria Indrawan-Santiago

321 Evolving Agent-based Models using Complexification Approach [abstract]
Abstract: This paper focuses on parameter search for multi-agent based models using evolutionary algorithms. Large numbers and variable dimensions of parameters require a search method which can efficiently handle a high-dimensional search space. We propose the use of complexification, as it emulates the natural way of evolution by starting with a small, constrained search space and expanding it as the evolution progresses. We examine the effects of this method on an EA by evolving parameters for two multi-agent based models.
Michael Wagner, Wentong Cai, Michael Harold Lees, Heiko Aydt
356 Discrete modeling and simulation of business processes using event logs. [abstract]
Abstract: An approach to business process modelling for short-term KPI prediction, based on event logs and values of environment variables, is proposed. A ready-for-simulation process model is built semi-automatically; the expert only inputs the desired environment variables, which are used as features during the learning process. The process workflow is extracted as a Petri net model using a combination of process mining algorithms. Dependencies between features and process variables are formalized using decision and regression tree techniques. Experiments were conducted to predict KPIs of real companies.
Ivan Khodyrev, Svetlana Popova
376 Modeling and Simulation Framework for Development of Interactive Virtual Environments [abstract]
Abstract: The article presents a framework for the development of interactive virtual environments for simulation and modeling of complex systems. The framework uses the system's structural model as a core concept for the composition and control of simulation-based scientific experiments, not in terms of technological processes or workflows, but in terms of domain-specific objects and their interconnection within the investigated system. The proposed framework enables integration and management of resources available within a cloud computing environment in order to support automatic simulation management and to provide the user with an interactive visual domain-specific interface to the system.
Konstantin Knyazkov, Sergey Kovalchuk
34 Using interactive 3D game play to make complex medical knowledge more accessible [abstract]
Abstract: This research outlines a new approach that takes complex medical, nutritional and activity data and presents it to the diabetic patient in the form of a mobile app/game that uses interactive 3D computer graphics and game play to make this complex information more accessible. The results of a pilot randomized control study indicate that the Diabetes Visualizer's use of interactive 3D game play increased the participants' understanding of the condition and its day-to-day management. More importantly, the Diabetes Visualizer app stimulated participants' interest in, and desire to engage in, the task of diabetes management.
Dale Patterson

Main Track (MT) Session 8

Time and Date: 11:20 - 13:00 on 10th June 2014

Room: Tully I

Chair: Michela Taufer

115 The influence of network topology on reverse-engineering of gene-regulatory networks [abstract]
Abstract: Modeling and simulation of gene-regulatory networks (GRNs) has become an important aspect of modern computational biology investigations into gene regulation. A key challenge in this area is the automated inference (reverse-engineering) of dynamic, mechanistic GRN models from gene expression time-course data. Common mathematical formalisms used to represent such models capture both the relative weight or strength of a regulator gene and the type of the regulator (activator, repressor) with a single model parameter. The goal of this study is to quantify the role this parameter plays in terms of the computational performance of the reverse-engineering process and the predictive power of the inferred GRN models. We carried out three sets of computational experiments on a GRN system consisting of 22 genes. While more comprehensive studies of this kind are ultimately required, this computational study demonstrates that models with similar training (reverse-engineering) error that have been inferred under varying degrees of a priori known topology information, exhibit considerably different predictive performance. This study was performed with a newly developed multiscale modeling and simulation tool called MultiGrain/MAPPER.
Alexandru Mizeranschi, Noel Kennedy, Paul Thompson, Huiru Zheng, Werner Dubitzky
188 Maximizing the Cumulative Influence through a Social Network when Repeat Activation Exists [abstract]
Abstract: We study the problem of employing social networks to propagate influence when repeat activation is involved. While influence maximization has been extensively studied as the fundamental solution, it neglects the reality that a user may purchase a product/service repeatedly, incurring cumulative sales of the product/service. In this paper, we explore a new problem of cumulative influence maximization that brings influence maximization a step closer to real-world viral marketing applications. In our problem setting, repeat activation exists and we aim to find a set of initial users through which the maximal cumulative influence can be stimulated in a given time period. To describe the repeat activation behavior, we adopt the voter model to reflect the variation of activations over time. Under the voter model, we formulate the maximization problem and present an effective algorithm. We test and compare our method with heuristic algorithms on real-world data sets. Experimental results demonstrate the utility of the proposed method.
Chuan Zhou, Peng Zhang, Wenyu Zang, Li Guo
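The authors' exact formulation and algorithm are not reproduced here; the fragment below is only a minimal Monte-Carlo sketch of how the cumulative influence of a candidate seed set could be estimated under a synchronous voter model, with the toy graph, horizon and number of runs being illustrative assumptions.

import random

def estimate_cumulative_influence(neighbors, seeds, horizon, runs=1000):
    """Estimate the expected cumulative number of active nodes over `horizon` steps
    of the (synchronous) voter model: at each step every node copies the state of a
    uniformly chosen neighbor, so repeat activation accumulates over time."""
    nodes = list(neighbors)
    total = 0.0
    for _ in range(runs):
        active = {v: (v in seeds) for v in nodes}
        cumulative = sum(active.values())              # activations at t = 0
        for _ in range(horizon):
            active = {v: active[random.choice(neighbors[v])] if neighbors[v] else active[v]
                      for v in nodes}
            cumulative += sum(active.values())          # add this step's activations
        total += cumulative
    return total / runs

# illustrative usage on a toy undirected graph given as an adjacency list
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(estimate_cumulative_influence(graph, seeds={0}, horizon=10, runs=500))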
320 Mining Large-scale Knowledge about Events from Web Text [abstract]
Abstract: This paper addresses the problem of automatic acquisition of semantic relations between events. Since most previous research relies on annotated corpora, the main challenge is the need for more generic methods to identify related event pairs and to extract event arguments (particularly the predicate, subject and object). Motivated by this background, we develop a three-phased approach that acquires causality from the Web. Firstly, we use explicit connective markers (such as "because") as linguistic cues to discover causally related events. Then, we extract the event arguments based on local dependency parse trees of event expressions. In the final phase, we propose a statistical model to measure the potential causal relations. The results of our empirical evaluation on a large-scale Chinese Web corpus show that (a) the use of local dependency trees extensively improves both the accuracy and recall of the event-argument extraction task; (b) our measure, which is an improvement on PMI, has a better performance.
Yanan Cao, Peng Zhang, Jing Guo, Li Guo
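For reference, the baseline measure that the final phase improves upon is the standard pointwise mutual information between two events; the authors' modified statistic itself is not reproduced here:

\mathrm{PMI}(e_1, e_2) = \log \frac{P(e_1, e_2)}{P(e_1)\, P(e_2)},

where P(e_1, e_2) is the co-occurrence probability of the two events in the corpus and P(e_1), P(e_2) are their marginal probabilities; larger values indicate a stronger (potentially causal) association.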
200 Discovering Multiple Diffusion Source Nodes in Social Networks [abstract]
Abstract: Social networks have greatly amplified the spread of information across different communities. However, we observe that various kinds of malicious information, such as computer viruses and rumors, are broadly spread via social networks. To restrict such malicious information, it is critical to develop effective methods to discover the diffusion source nodes in social networks. Many pioneering works have explored the source node identification problem, but they are all based on the ideal assumption that there is only a single source node, neglecting the fact that malicious information is often diffused from multiple sources to intentionally avoid network audit. In this paper, we present a multi-source locating method based on a given snapshot of partially and sparsely observed infected nodes in the network. Specifically, we first present a reverse propagation method to detect recovered and unobserved infected nodes in the network, and then we use community clustering algorithms to turn the multi-source locating problem into a set of single-source locating problems. In the last step, we identify the node with the largest likelihood estimate on each infected cluster as a source node. Experiments on three different types of complex networks show the performance of the proposed method.
Wenyu Zang, Peng Zhang, Chuan Zhou, Li Guo
293 The Origin of Control in Complex Networks [abstract]
Abstract: Recent work at the border of network science and control theory has begun to investigate the control of complex systems by studying their underlying network representations. A majority of the work in this nascent field has looked at the number of controls required to fully control a network. In this talk I will present research that provides a ready breakdown of this number into categories that are both easy to observe in real-world networks and instructive in understanding the underlying functional reasons why the controls must exist. This breakdown sheds light on several observations made in the previous literature regarding the controllability of networks. The decomposition produces a mechanism to cluster networks into classes that are consistent with their large-scale architecture and purpose. Finally, we observe that synthetic models of network formation generate networks with control breakdowns substantially different from those observed in real-world networks.
Justin Ruths

Main Track (MT) Session 9

Time and Date: 16:30 - 18:10 on 10th June 2014

Room: Tully I

Chair: S. Chuprina

301 Study of the Network Impact on Earthquake Early Warning in the Quake-Catcher Network Project [abstract]
Abstract: The Quake-Catcher Network (QCN) project uses low-cost sensors, i.e., accelerometers attached to volunteers' computers, to detect earthquakes. The master-worker topology currently used in QCN and other similar projects suffers from major weaknesses. The centralized master can fail to collect data if the volunteers' computers are not connected to the network, or it can introduce significant delays in the warning if the network is congested. We propose to solve these problems by using multiple servers in a more advanced network topology than the simple master-worker configuration. We first consider several critical scenarios in which the current master-worker configuration of QCN can hinder the early warning of an earthquake, and then integrate the advanced network topology around multiple servers and emulate these critical scenarios in a simulation environment to quantify the benefits and costs of our proposed solution. We show how our solution can reduce the time to detect an earthquake from 1.8 s to 173 ms in the case of network congestion, and the number of lost trickle messages from 2,013 to 391 in the case of network failure.
Marcos Portnoi, Samuel Schlachter, Michela Taufer
315 The p-index: Ranking Scientists using Network Dynamics [abstract]
Abstract: The indices currently used by scholarly databases, such as Google Scholar, to rank scientists do not attach weights to citations. Neither is the underlying network structure of citations considered in computing these metrics. This results in scientists cited by well-recognized journals not being rewarded, and may lead to potential misuse if documents are created purely to cite others. In this paper we introduce a new ranking metric, the p-index (pagerank-index), which is computed from the underlying citation network of papers and uses the pagerank algorithm in its computation. The index is a percentile score, can potentially be implemented in public databases such as Google Scholar, and can be applied at many levels of abstraction. We demonstrate that the metric aids in fairer ranking of scientists compared to the h-index and its variants. We do this by simulating a realistic model of the evolution of citation and collaboration networks in a particular field, and comparing the h-index and p-index of scientists under a number of scenarios. Our results show that the p-index is immune to author behaviors that can result in artificially bloated h-index values.
Upul Senanayake, Mahendrarajah Piraveenan, Albert Zomaya
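The precise paper-to-author aggregation behind the p-index is defined in the paper; the sketch below only illustrates the underlying idea stated in the abstract, namely PageRank over the citation network followed by conversion to a percentile, and the damping factor, iteration count and sum-over-papers aggregation are assumptions made for illustration.

def pagerank(out_links, damping=0.85, iters=100):
    """Plain power-iteration PageRank on a citation graph {paper: [cited papers]}.
    Every cited paper is assumed to appear as a key of out_links."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = out_links[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
            else:  # paper citing nothing: spread its rank uniformly
                for t in nodes:
                    new[t] += damping * rank[v] / n
        rank = new
    return rank

def p_index(author_papers, paper_rank):
    """Assumed aggregation: an author's raw score is the sum of the PageRank of
    her papers; raw scores are then converted to percentiles in [0, 100]."""
    raw = {a: sum(paper_rank.get(p, 0.0) for p in papers)
           for a, papers in author_papers.items()}
    ordered = sorted(raw, key=raw.get)
    return {a: 100.0 * (i + 1) / len(ordered) for i, a in enumerate(ordered)}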
191 A Clustering-based Link Prediction Method in Social Networks [abstract]
Abstract: Link prediction is an important task in social network analysis, with applications in other domains such as recommender systems, molecular biology and criminal investigations. Classical methods of link prediction are based on graph topology and path features, but few consider clustering information. A cluster in a graph is a group of vertices that is densely connected internally and sparsely connected to other groups. The clustering results contain essential information for link prediction, and the common neighbors of two vertices may play different roles depending on whether they belong to the same cluster. Based on this assumption and the characteristics of common social networks, in this paper we propose a link prediction method based on clustering and global information. Our experiments on both synthetic and real-world networks show that this method can improve link prediction accuracy as the number of clusters grows.
Fenhua Li, Jing He, Guangyan Huang, Yanchun Zhang, Yong Shi
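The scoring function proposed in the paper combines clustering with global information and is not reproduced here; the fragment below only sketches the simpler intuition stated in the abstract, namely that a common neighbour counts for more when it shares a cluster with both endpoints, with the two weights being arbitrary illustrative choices.

def cluster_aware_score(u, v, neighbors, cluster_of,
                        same_cluster_weight=2.0, other_weight=1.0):
    """Common-neighbour link-prediction score in which a shared neighbour counts
    more when it lies in the same cluster as both endpoints (illustrative weights)."""
    score = 0.0
    for w in set(neighbors[u]) & set(neighbors[v]):
        if cluster_of[w] == cluster_of[u] == cluster_of[v]:
            score += same_cluster_weight
        else:
            score += other_weight
    return score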
345 A Technology for BigData Analysis Task Description using Domain-Specific Languages [abstract]
Abstract: The article presents a technology for dynamic knowledge-based building of Domain-Specific Languages (DSLs) for the description of data-intensive scientific discovery tasks using BigData technology. The proposed technology supports high-level abstract definition of the analytic and simulation parts of the task as well as integration into composite scientific solutions. Automatic translation of the abstract task definition enables seamless integration of various data sources within a single solution.
Sergey Kovalchuk, Artem Zakharchuk, Jiaqi Liao, Sergey Ivanov, Alexander Boukhanovsky
66 Characteristics of Dynamical Phase Transitions for Noise Intensities [abstract]
Abstract: We simulate and analyze dynamical phase transitions in a Boolean neural network with initially random connections. Since we treat a stochastic evolution governed by a noise intensity, we show from our condition that there exists a critical value of the noise intensity. The nature of the phase transition is found numerically and analytically for two connection types (of the probability density function) and for one random network.
Muyoung Heo, Jong-Kil Park, Kyungsik Kim

Main Track (MT) Session 10

Time and Date: 11:00 - 12:40 on 11th June 2014

Room: Tully I

Chair: S. Smanchat

18 A Workflow Application for Parallel Processing of Big Data from an Internet Portal [abstract]
Abstract: The paper presents a workflow application for efficient parallel processing of data downloaded from an Internet portal. The workflow partitions input files into subdirectories which are further split for parallel processing by services installed on distinct computer nodes. This way, analysis of the first ready subdirectories can start fast and is handled by services implemented as parallel multithreaded applications using multiple cores of modern CPUs. The goal is to assess achievable speed-ups and determine which factors influence scalability and to what degree. Data processing services were implemented for assessment of context (positive or negative) in which the given keyword appears in a document. The testbed application used these services to determine how a particular brand was recognized by either authors of articles or readers in comments in a specific Internet portal focused on new technologies. Obtained execution times as well as speed-ups are presented for data sets of various sizes along with discussion on how factors such as load imbalance and memory/disk bottlenecks limit performance.
Pawel Czarnul
273 A comparative study of scheduling algorithms for the multiple deadline-constrained workflows in heterogeneous computing systems with time windows [abstract]
Abstract: Scheduling tasks with precedence constraints on a set of resources with different performances is a well-known NP-complete problem, and a number of effective heuristics have been proposed to solve it. If the start time and the deadline of each specific workflow are known (for example, if a workflow starts execution according to periodic data coming from sensors, and its execution should be completed before the next data acquisition), the problem of scheduling multiple deadline-constrained workflows arises. Taking into account that resource providers can give only restricted access to their computational capabilities, we consider the case when resources are only partially available for workflow execution. To address the problem described above, we study the scheduling of deadline-constrained scientific workflows in a non-dedicated heterogeneous environment. In this paper, we introduce three scheduling algorithms for mapping the tasks of multiple workflows with different deadlines onto a static set of resources with previously known free time windows. Simulation experiments show that scheduling strategies based on the proposed staged scheme give better results than a merge-based approach considering all workflows at once.
Klavdiya Bochenina
292 Fault-Tolerant Workflow Scheduling Using Spot Instances on Clouds [abstract]
Abstract: Scientific workflows are used to model applications of high-throughput computation and complex large-scale data analysis. In recent years, Cloud computing has been fast evolving as the target platform for such applications among researchers. Furthermore, new pricing models have been pioneered by Cloud providers that allow users to provision resources and to use them in an efficient manner with significant cost reductions. In this paper, we propose a scheduling algorithm that schedules tasks on Cloud resources using two different pricing models (spot and on-demand instances) to reduce the cost of execution whilst meeting the workflow deadline. The proposed algorithm is fault-tolerant against the premature termination of spot instances and is also robust against performance variations of Cloud resources. Experimental results demonstrate that our heuristic reduces execution cost by up to 70% compared with using only on-demand instances.
Deepak Poola, Kotagiri Ramamohanarao, Rajkumar Buyya
308 On Resource Efficiency of Workflow Schedules [abstract]
Abstract: This paper presents the Maximum Effective Reduction (MER) algorithm, which optimizes the resource efficiency of a workflow schedule generated by any particular scheduling algorithm. MER trades the minimal makespan increase for the maximal resource usage reduction by consolidating tasks with the exploitation of resource inefficiency in the original workflow schedule. Our evaluation shows that the rate of resource usage reduction far outweighs that of the increase in makespan, i.e., the number of resources used is halved on average while incurring an increase in makespan of less than 10%.
Young Choon Lee, Albert Y. Zomaya, Hyuck Han
346 GridMD: a Lightweight Portable C++ Library for Workflow Management [abstract]
Abstract: In this contribution we present the current state of the open source GridMD workflow library (http://gridmd.sourceforge.net). The library was originally designed for programmers of distributed Molecular Dynamics (MD) simulations; however, nowadays it serves as a universal tool for creating and managing general workflows from a compact client application. GridMD is a programming tool aimed at developers of distributed software that utilizes local or remote compute capabilities to perform loosely coupled computational tasks. Unlike other workflow systems and platforms, GridMD is not integrated with heavy infrastructure such as Grid systems, web portals, user and resource management systems or databases. It is a very lightweight tool accessing and operating on a remote site with delegated user credentials. For starting compute jobs the library supports the Globus Grid environment, a set of cluster queuing managers such as PBS (Torque) or SLURM, and Unix/Windows command shells. All job starting mechanisms may be used either locally or remotely via the integrated SSH protocol. Working with different queues, starting parallel (MPI) jobs and changing job parameters are generically supported by the API. The jobs are started and monitored in a "passive" way, not requiring any special task management agents to be running or even installed on the remote system. The workflow execution is monitored by an application (a task manager performing GridMD API calls) running on a client machine. Data transfer between different compute resources, and between the client machine and a compute resource, is performed by the exchange of files (gridftp or ssh channels). The task manager is able to checkpoint and restart the workflow and to recover from different types of errors without recalculating the whole workflow. The task manager itself can easily be terminated/restarted on the client machine or transferred to another client without breaking the workflow execution. Apart from separated tasks such as command series or application launches, a GridMD workflow may also manage integrated tasks that are described by code compiled as part of the task manager. Moreover, the integrated tasks may change the workflow dynamically by adding jobs or dependencies to the existing workflow graph. The dynamic management of the workflow graph is an essential feature of GridMD, which adds large flexibility for the programmer of distributed scenarios. GridMD also provides a set of useful workflow skeletons for standard distributed scenarios such as Pipe, Fork, Parameter Sweep and Loop (implemented as a dynamic workflow). In the talk we will discuss the architecture and special features of GridMD. We will also briefly describe recent applications of GridMD as a base for a distributed job manager, for example in the multiscale OLED simulation platform (EU-Russia IM3OLED project).
Ilya Valuev and Igor Morozov

Main Track (MT) Session 11

Time and Date: 14:10 - 15:50 on 11th June 2014

Room: Tully I

Chair: Dieter Kranzlmuller

360 Workflow as a Service in the Cloud: Architecture and Scheduling Algorithms [abstract]
Abstract: With more and more workflow systems adopting the cloud as their execution environment, it becomes increasingly challenging to efficiently manage various workflows, virtual machines (VMs) and workflow execution on VM instances. To make the system scalable and easy to extend, we design a Workflow as a Service (WFaaS) architecture with independent services. A core part of the architecture is how to efficiently respond to continuous workflow requests from users and schedule their executions in the cloud. Based on different targets, we propose four heuristic workflow scheduling algorithms for the WFaaS architecture, and analyze the differences and best usages of the algorithms in terms of performance, cost and the price/performance ratio via experimental studies.
Jianwu Wang, Prakashan Korambath, Ilkay Altintas, Jim Davis, Daniel Crawl
36 Large Eddy Simulation of Flow in Realistic Human Upper Airways with Obstructive Sleep Apnea [abstract]
Abstract: Obstructive sleep apnea (OSA) is a common type of sleep disorder characterized by abnormal repetitive cessation of breathing during sleep, caused by partial or complete narrowing of the pharynx in the upper airway. Upper airway surgery is commonly performed for this disorder; however, the success rate is limited because of the lack of a thorough understanding of the primary mechanism associated with OSA. A computational fluid dynamics (CFD) simulation with the Large Eddy Simulation approach is conducted to investigate a patient-specific upper airway flow with severe OSA. Both pre- and post-surgical upper airway models are simulated to reveal the effect of the surgical treatment. Only the inhaled breathing is simulated, over six periods (about 15 seconds) of unsteady flow. Comparison of the results before and after treatment illustrates that there exists a significant pressure and shear stress dropping region near the soft palate before treatment; after the treatment, the flow resistance in the upper airway is decreased and the wall shear stress value is significantly reduced.
Mingzhen Lu, Yang Liu, Jingying Ye, Haiyan Luo
86 Experiments on a Parallel Nonlinear Jacobi-Davidson Algorithm [abstract]
Abstract: The Jacobi-Davidson (JD) algorithm is very well suited for the computation of a few eigenpairs of large sparse complex symmetric nonlinear eigenvalue problems. The performance of JD crucially depends on the treatment of the so-called correction equation, in particular the preconditioner, and the initial vector. Depending on the choice of the spectral shift and the accuracy of the solution, the convergence of JD can vary from linear to cubic. We investigate parallel preconditioners for the Krylov space method used to solve the correction equation. We apply our nonlinear Jacobi-Davidson (NLJD) method to quadratic eigenvalue problems that originate from the time-harmonic Maxwell equation for the modeling and simulation of resonating electromagnetic structures.
Yoichi Matsuo, Hua Guo, Peter Arbenz
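For context, the object whose preconditioning is investigated above is the Jacobi-Davidson correction equation; in the standard linear case, for an approximate eigenpair (\theta, u) with residual r = A u - \theta u, it reads (the nonlinear variant solved by the authors is built analogously from the nonlinear operator T(\theta)):

(I - u u^{*}) (A - \theta I) (I - u u^{*})\, t = -r, \qquad t \perp u,

and it is to this projected system that the Krylov space method with a parallel preconditioner is applied.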
184 Improving Collaborative Recommendation via Location-based User-Item Subgroup [abstract]
Abstract: Collaborative filtering has been widely and successfully applied in recommender systems. It typically associates a user with a group of like-minded users based on their preferences over all the items, and recommends to the user those items enjoyed by others in the group. Previous studies have shown that there exist many user-item subgroups, each consisting of a subset of items and a group of like-minded users on these items, and that subgroup analysis can achieve better accuracy. We find, moreover, that the geographical information of users has an impact on group preferences for items. Hence, in this paper, we propose a Bayesian generative model to describe the generative process of user-item subgroup preferences while taking users' geographical information into account. Experimental results show the superiority of the proposed model.
Zhi Qiao, Peng Zhang, Yanan Cao, Chuan Zhou, Li Guo
90 Optimizing Shared-Memory Hyperheuristics on top of Parameterized Metaheuristics [abstract]
Abstract: This paper studies the auto-tuning of shared-memory hyperheuristics developed on top of a unified shared-memory metaheuristic scheme. A theoretical model of the execution time of the unified scheme is empirically adapted to particular metaheuristics and hyperheuristics through experimentation. The model is used to decide at run time the number of threads with which to obtain a reduced execution time. The number of threads is different for the different basic functions in the scheme, and depends on the problem to be solved, the metaheuristic scheme, the implementation of the basic functions and the computational system where the problem is solved. The applicability of the proposal is shown with a problem of minimization of electricity consumption in the exploitation of wells. Experimental results show that satisfactory execution times can be achieved with auto-tuning techniques based on theoretical-empirical models of the execution time.
José Matías Cutillas Lozano, Domingo Gimenez

Main Track (MT) Session 12

Time and Date: 16:20 - 18:00 on 11th June 2014

Room: Tully I

Chair: Luiz DeRose

187 The K computer Operations: Experiences and Statistics [abstract]
Abstract: The K computer, released on September 29, 2012, is a large-scale parallel supercomputer system consisting of 82,944 compute nodes. We have been able to resolve a significant number of operation issues since its release. Some system software components have been fixed and improved to obtain higher stability and utilization. We achieved 94% service availability because of a low hardware failure rate and approximately 80% node utilization by careful adjustment of operation parameters. We found that the K computer is an extremely stable and high utilization system.
Keiji Yamamoto, Atsuya Uno, Hitoshi Murai, Toshiyuki Tsukamoto, Fumiyoshi Shoji, Shuji Matsui, Ryuichi Sekizawa, Fumichika Sueyasu, Hiroshi Uchiyama, Mitsuo Okamoto, Nobuo Ohgushi, Katsutoshi Takashina, Daisuke Wakabayashi, Yuki Taguchi, Mitsuo Yokokawa
195 Quantum mechanics study of hexane isomers through gamma-ray and graph theory combined with C1s binding energy and nuclear magnetic spectra (NMR) [abstract]
Abstract: Quantum mechanically calculated positron-electron annihilation gamma-ray spectra, C1s binding energy spectra and NMR spectra are employed to study the electronic structures of hexane and its isomers, assisted by graph theory. Our recent positron-electron annihilation gamma-ray spectral study of n-hexane in the gas phase and core ionization (IP) spectral studies of small alkanes and their isomers have paved the way for the present correlation study, in which quantum mechanics is combined with graph theory, C1s ionization spectroscopy and nuclear magnetic resonance (NMR) to further understand the electronic structure and topology of the hexane isomers. The low-energy plane wave positron (LEPWP) model indicated that the positrophilic electrons of a molecule are dominated by the electrons in the lowest occupied valence orbital (LOVO). The most recent results using NOMO indicated that the electronic wave functions dominate the electron-positron wave functions for molecular systems. In addition to quantum mechanics, chemical graphs are also studied and presented in the present work.
Subhojyoti Chatterjee and Feng Wang
257 Dendrogram Based Algorithm for Dominated Graph Flooding [abstract]
Abstract: In this paper, we are concerned with the problem of flooding undirected weighted graphs under ceiling constraints. We provide a new algorithm based on a hierarchical structure called a dendrogram, which offers the significant advantage that it can be used for multiple floodings with various scenarios of ceiling values. In addition, when exploring the graph through its dendrogram structure in order to calculate the flooding levels, independent sub-dendrograms are generated, thus offering a natural way for parallel processing. We provide an efficient implementation of our algorithm through suitable data structures and an optimal organisation of the computations. Experimental results show that our algorithm outperforms well-established classical algorithms, and reveal that the cost of building the dendrogram highly predominates over the total running time, thus validating both the efficiency and the hallmark of our method. Moreover, we exploit the potential parallelism exposed by the flooding procedure to design a multi-threaded implementation. As the underlying parallelism is created on the fly, we use a queue to store the list of sub-dendrograms to be explored, and then use dynamic round-robin scheduling to assign them to the participating threads. This yields a load-balanced and scalable process, as shown by additional benchmark results. Our program runs in a few seconds on an ordinary computer to flood graphs with more than 20 million nodes.
Claude Tadonki
278 HP-DAEMON: High Performance Distributed Adaptive Energy-efficient Matrix-multiplicatiON [abstract]
Abstract: The demand for improving the energy efficiency of high performance scientific applications has become crucial nowadays. Software-controlled hardware solutions directed by Dynamic Voltage and Frequency Scaling (DVFS) have extensively shown their effectiveness. Although DVFS is beneficial to green computing, introducing DVFS itself can incur non-negligible overhead if there exists a large number of frequency switches issued by DVFS. In this paper, we propose a strategy to achieve optimal energy savings for distributed matrix multiplication by algorithmically trading more computation and communication at a time, adaptively and within user-specified memory costs, for fewer DVFS switches, which saves 7.5% more energy on average than a classic strategy. Moreover, we leverage a high performance communication scheme that fully exploits network bandwidth via pipeline broadcast. Overall, the integrated approach achieves substantial energy savings (up to 51.4%) and performance gains (28.6% on average) compared to ScaLAPACK pdgemm() on a cluster with an Ethernet switch, and outperforms ScaLAPACK and DPLASMA pdgemm() by 33.3% and 32.7% on average, respectively, on a cluster with an Infiniband switch.
Li Tan, Longxiang Chen, Zizhong Chen, Ziliang Zong, Rong Ge, Dong Li
279 Evaluating the Performance of Multi-tenant Elastic Extension Tables [abstract]
Abstract: An important challenge in the design of databases that support multi-tenant applications is to provide a platform to manage large volumes of data collected from different businesses, social media networks, emails, news, online texts, documents, and other data sources. To overcome this challenge we proposed in our previous work a multi-tenant database schema called Elastic Extension Tables (EET) that combines multi-tenant relational tables and virtual relational tables in a single database schema. Using this approach, the tenants’ tables can be extended to support the requirements of individual tenants. In this paper, we discuss the potentials of using EET multi-tenant database schema, and show how it can be used for managing physical and virtual relational data. We perform several experiments to measure the feasibility and effectiveness of EET by comparing it with a commercially available multi-tenant schema mapping technique used by SalesForce.com. We report significant performance improvements obtained using EET when compared to Universal Table Schema Mapping (UTSM), making the EET schema a good candidate for the management of multi-tenant data in Software as a Service (SaaS) and Big Data applications.
Haitham Yaish, Madhu Goyal, George Feuerlicht

Main Track (MT) Session 13

Time and Date: 11:00 - 12:40 on 12th June 2014

Room: Tully I

Chair: I. Moser

344 Finite difference method for solving acoustic wave equation using locally adjustable time-steps [abstract]
Abstract: The explicit finite difference method has been widely used for seismic modeling in heterogeneous media with strong discontinuities in physical properties. In such cases, due to stability considerations, the time step size is primarily determined by the medium with the highest wave propagation speed, so that the higher the speed, the smaller the time step needs to be to ensure stability throughout the whole domain. Therefore, the use of different temporal discretizations can greatly reduce the computational cost involved in solving this kind of problem. In this paper we propose an algorithm for setting the local temporal discretization, named Region Triangular Transition (RTT), which allows the local temporal discretizations to be related by any integer value, enabling these discretizations to operate at the stability limit of the finite difference approximations used.
Alexandre Antunes, Regina Leal-Toledo, Otton Filho, Elson Toledo
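The stability constraint referred to above is the usual CFL condition for explicit finite difference schemes; in generic form (not specific to the RTT algorithm) it reads

\Delta t \le C_{\mathrm{CFL}} \, \frac{\Delta x}{c_{\max}},

where c_{\max} is the largest wave propagation speed in the region, \Delta x the spatial grid spacing and C_{\mathrm{CFL}} a scheme-dependent constant. Regions with a smaller local speed c can therefore advance with a proportionally larger local time step, roughly \Delta t_{\mathrm{local}} \approx C_{\mathrm{CFL}} \Delta x / c, which is the saving that locally adjustable time-stepping exploits.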
347 Identifying Self-Excited Vibrations with Evolutionary Computing [abstract]
Abstract: This study uses Differential Evolution to identify the coefficients of second-order differential equations of self-excited vibrations from a time signal. The motivation is found in the ample occurrence of this vibration type in engineering and physics, in particular in the real-life problem of vibrations of hydraulic structure gates. In the proposed method, an equation structure is assumed at the level of the ordinary differential equation, and a population of candidate coefficient vectors undergoes evolutionary training. In this way, the numerical constants of the non-linear terms of various self-excited vibration types were recovered from the time signal and the velocity value at the initial time only. Comparisons are given regarding accuracy and computing time. The presented evolutionary method shows good promise for future application in engineering systems, in particular operational early-warning systems that recognise oscillations with negative damping before they can cause damage.
Christiaan Erdbrink, Valeria Krzhizhanovskaya
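The equation structure and objective used in the study are not reproduced here; the fragment below is only a sketch of the classic DE/rand/1/bin loop on which such coefficient identification can be based, with the population size, control parameters and loss function left as placeholder assumptions (the loss would measure the mismatch between the simulated response of the assumed ODE and the recorded time signal).

import random

def differential_evolution(loss, bounds, pop_size=30, F=0.8, CR=0.9, generations=200):
    """Classic DE/rand/1/bin minimizing loss(coefficients); bounds is a list of
    (low, high) pairs, one per unknown coefficient of the assumed equation."""
    dim = len(bounds)
    pop = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [loss(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = random.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = random.randrange(dim)
            # mutation (rand/1) combined with binomial crossover
            trial = [pop[a][j] + F * (pop[b][j] - pop[c][j])
                     if (random.random() < CR or j == j_rand) else pop[i][j]
                     for j in range(dim)]
            trial = [min(max(x, lo), hi) for x, (lo, hi) in zip(trial, bounds)]
            f_trial = loss(trial)
            if f_trial <= fit[i]:        # greedy selection
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=lambda k: fit[k])
    return pop[best], fit[best]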
85 Rendering of Feature-Rich Dynamically Changing Volumetric Datasets on GPU [abstract]
Abstract: Interactive photo-realistic representation of dynamic liquid volumes is a challenging task for today's GPUs and state-of-the-art visualization algorithms. Methods of the last two decades consider either static volumetric datasets, applying several optimizations for volume casting, or dynamic volumetric datasets with rough approximations to realistic rendering. Nevertheless, accurate real-time visualization of dynamic datasets is crucial in areas of scientific visualization as well as areas demanding accurate rendering of feature-rich datasets. An accurate and thus realistic visualization of such datasets leads to new challenges: due to restrictions given by computational performance, the datasets may be relatively small compared to the screen resolution, and thus each voxel has to be rendered highly oversampled. With our volumetric datasets, based on a real-time lattice Boltzmann fluid simulation creating dynamic cavities and small droplets, existing real-time implementations are not applicable for realistic surface extraction. This work presents a volume tracing algorithm capable of producing multiple refractions which is also robust to small droplets and cavities. Furthermore, we show the advantages of our volume tracing algorithm compared to other implementations.
Martin Schreiber, Atanas Atanasov, Philipp Neumann, Hans-Joachim Bungartz
136 Motor learning in physical interfaces for computational problem solving [abstract]
Abstract: Continuous Interactive Simulation (CIS) maps computational problems concerning the control of dynamical systems to physical tasks in a 3D virtual environment for users to perform. However, deciding on the best mapping for a particular problem is not straightforward. This paper considers how a motor learning perspective can assist when designing such mappings. To examine this issue an experiment was performed to compare an arbitrary mapping with one designed by considering a range of motor learning factors. The particular problem studied was a nonlinear policy setting problem from economics. The results show that choices about how a problem is presented can indeed have a large effect on the ability of users to solve the problem. As a result we recommend the development of guidelines for the application of CIS based on motor learning considerations.
Rohan McAdam
151 Change Detection and Visualization of Functional Brain Networks using EEG Data [abstract]
Abstract: Mining dynamic and non-trivial patterns of interactions in functional brain networks has gained significance due to recent advances in the field of computational neuroscience. Sophisticated data search capabilities, advanced signal processing techniques, statistical methods, and complex network and graph mining algorithms to unfold and discover hidden patterns in the functional brain network, supported by efficient visualization techniques, are essential for drawing meaningful inferences from the results obtained. Visualization of the change in activity during cognitive function is useful to discover and gain insight into the hidden, novel and complex neuronal patterns and trends, during normal and cognitive load conditions, from the graph/temporal representation of the functional brain network. This paper explores novel methods to explore and model the dynamics and complexity of the brain. It also uses a new tool called Functional Brain Network Analysis and Visualization (FBNAV) to visualize the outcomes of various computational analyses, enabling us to identify and study changing neuronal patterns during various states of brain activity using augmented/customised Topoplots and Headplots. These techniques may be helpful to locate and identify patterns in certain abnormal mental states resulting from mental disorders such as stress.
R Vijayalakshmi, Naga Dasari, Nanda Nandagopal, R Subhiksha, Bernadine Cocks, Nabaraj Dahal, M Thilaga

Main Track (MT) Session 14

Time and Date: 14:10 - 15:50 on 12th June 2014

Room: Tully I

Chair: Jin Chao Jin

207 Visual Analytics of Topological Higher Order Information for Emergency Management based on Tourism Trajectory Datasets [abstract]
Abstract: Trajectory datasets have presented new opportunities for spatial computing applications and geo-informatics technologies with regard to emergency management. Existing research on trajectory analysis and data mining mainly employs algorithmic approaches and analyzes the geometric information of trajectories. This study presents an efficient analytics tool based on visualization approaches for analyzing large volumes of trajectory data. This approach is particularly useful for emergency management when critical decisions based on semantic information are needed. Tourism trajectory datasets are used to demonstrate the proposed approach.
Ye Wang, Kyungmi Lee, Ickjai Lee
238 Modulight : A Framework for Efficient Dynamic Interactive Scientific Visualization [abstract]
Abstract: Interactive scientific visualization applications are based on heterogeneous codes to implement the simulation or data processing, visualization and interaction parts. These different parts need to be precisely assembled to construct an efficient application running in interactive time. The component-based approach is a good paradigm to express this kind of application. The interactive scientific visualization domain is now classically extended with visual analysis applications. In this case, some parts of the application need to be added or removed dynamically during its execution. In this paper, we describe a component-based approach dedicated to dynamic interactive scientific visualization applications. We propose a framework called Modulight which implements our approach using the MPI2 library and the optimized socket library ØMQ. The performance of this framework is also analyzed using a real-life application of molecular dynamics.
Sébastien Limet, Millian Poquet, Sophie Robert
289 Visualization of long-duration acoustic recordings of the environment [abstract]
Abstract: Acoustic recordings of the environment are an important aid to ecologists monitoring biodiversity and environmental health. However, rapid advances in recording technology, storage and computing make it possible to accumulate thousands of hours of recordings, of which, ecologists can only listen to a small fraction. The big-data challenge addressed in this paper is to visualize the content of long-duration audio recordings on multiple scales, from hours, days, months to years. The visualization should facilitate navigation and yield ecologically meaningful information. Our approach is to extract (at one minute resolution) acoustic indices which reflect content of ecological interest. An acoustic index is a statistic that summarizes some aspect of the distribution of acoustic energy in a recording. We combine indices to produce false-color images that reveal acoustic content and facilitate navigation through recordings that are months or even years in duration.
Michael Towsey, Liang Zhang, Mark Cottman-Fields, Jason Wimmer, Jinglan Zhang, Paul Roe
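The specific acoustic indices and colour mapping are described in the paper and are not reproduced here; the fragment below only illustrates the general recipe stated in the abstract, namely one index value per minute, per-index normalization, and mapping three indices onto the red, green and blue channels, using made-up placeholder indices and assuming non-negative magnitude spectrograms.

import numpy as np

def spectral_entropy(spec):
    """Shannon entropy of the normalized energy distribution of one minute (placeholder index)."""
    p = spec.ravel() / (spec.sum() + 1e-12) + 1e-12
    return float(-(p * np.log(p)).sum())

def false_colour_rows(minute_spectra, index_fns):
    """minute_spectra: list of 2-D arrays (frequency x time frames), one per minute.
    index_fns: three functions, each mapping a minute's spectrogram to one scalar index.
    Returns an (n_minutes, 3) array in [0, 1]; each row is the RGB colour of one minute."""
    values = np.array([[fn(s) for fn in index_fns] for s in minute_spectra])
    lo, hi = values.min(axis=0), values.max(axis=0)
    return (values - lo) / np.maximum(hi - lo, 1e-12)   # per-index normalization

# placeholder companion indices (illustrative only, not those used in the paper)
background = lambda s: float(np.median(s))                    # background energy level
activity = lambda s: float(np.mean(s > 3.0 * np.median(s)))   # fraction of energetic cells
# usage: rgb = false_colour_rows(spectra, [background, activity, spectral_entropy])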
362 A computational science agenda for programming language research [abstract]
Abstract: Scientific models are often expressed as large and complicated programs. These programs embody numerous assumptions made by the developer (e.g. for differential equations, the discretization strategy and resolution). The complexity and pervasiveness of these assumptions means that often the only true description of the model is the software itself. This has led various researchers to call for scientists to publish their source code along with their papers. We argue that this is unlikely to be beneficial since it is almost impossible to separate implementation assumptions from the original scientific intent. Instead we advocate higher-level abstractions in programming languages, coupled with lightweight verification techniques such as specification and type systems. In this position paper, we suggest several novel techniques and outline an evolutionary approach to applying these to existing and future models. One-dimensional heat flow is used as an example throughout.
Dominic Orchard, Andrew Rice

Agent Based Simulations, Adaptive Algorithms and Solvers (ABS-AA-S) Session 1

Time and Date: 11:20 - 13:00 on 10th June 2014

Room: Tully III

Chair: Maciej Paszynski

130 PETIGA: HIGH-PERFORMANCE ISOGEOMETRIC ANALYSIS OF PHASE-FIELD MODELS [abstract]
Abstract: We have developed fast implementations of B-spline/NURBS based finite element solvers, written using PETSc. PETSc is frequently used in software packages to leverage its optimized and parallel implementation of solvers; however, we also use PETSc data structures to assemble the linear systems. These structures in PETSc (called DAs) were originally intended for the parallel assembly of linear systems resulting from finite differences. We have reworked this structure for linear systems resulting from isogeometric analysis based on tensor product spline spaces. The result is the PetIGA framework for solving problems using isogeometric analysis, which is scalable and greatly simplified over previous solvers. Our infrastructure has also allowed us to develop scalable solvers for a variety of problems. We have chosen to pursue nonlinear time dependent problems [1,2], such as: Cahn-Hilliard; Navier-Stokes-Korteweg; Variational Multiscale for Navier-Stokes; the Diffusive Wave Approximation to the Shallow Water Equations; the Phase-Field Crystal (PFC) equation and its time integration; and divergence-conforming B-spline modeling of nanoparticle suspensions. We also have solvers for an assortment of linear problems: Poisson, elasticity, Helmholtz, thin shells, advection-diffusion, and diffusion-reaction. All solvers are written to be inherently parallel and run on anything from a laptop to a supercomputer such as Shaheen, KAUST's IBM BlueGene/P supercomputer. In this presentation we will focus on new time integration techniques for phase-field modeling which are energy stable and allow for stable linearizations of the underlying non-linear model [3]. References: [1] N. Collier, L. Dalcin, and V.M. Calo, "PetIGA: High-Performance Isogeometric Analysis," submitted, 2013. [2] L. Dalcin and N. Collier, "PetIGA: A framework for high performance Isogeometric Analysis," https://bitbucket.org/dalcinl/petiga/, 2013. [3] P. Vignal, L. Dalcin, D.L. Brown, N. Collier, and V.M. Calo, "Energy-stable time-discretizations for the phase-field crystal equation," in preparation, 2014.
Victor Calo, Nathan Collier, Lisandro Dalcin and Philippe Vignal
44 Graph grammar based multi-thread multi-frontal direct solver with Galois scheduler [abstract]
Abstract: In this paper, we present a multi-frontal solver algorithm for the adaptive finite element method expressed by graph grammar productions. The graph grammar productions first construct the binary elimination tree, and then process the frontal matrices stored in a distributed manner in the nodes of the elimination tree. The solver is specialized for a class of one, two and three dimensional h-refined meshes whose elimination trees have a regular structure. In particular, this class contains all one dimensional grids, two and three dimensional grids refined towards point singularities, two dimensional grids refined in an anisotropic way towards an edge singularity, as well as three dimensional grids refined in an anisotropic way towards edge or face singularities. In all these cases, the structure of the elimination tree and the structure of the frontal matrices are similar. The solver is implemented within the Galois environment, which allows parallel execution of graph grammar productions. We also compare the performance of the Galois implementation of our graph grammar based solver with the MUMPS solver.
Damian Goik, Konrad Jopek, Maciej Paszynski, Andrew Lenharth, Donald Nguyen, Keshav Pingali
154 Automatically Adapted Perfectly Matched Layers for Problems with High Contrast Materials Properties [abstract]
Abstract: For the simulation of wave propagation problems, it is necessary to truncate the computational domain. Perfectly Matched Layers are often employed for that purpose, especially in high-contrast layered materials where absorbing boundary conditions are difficult to design. Here, we define a Perfectly Matched Layer that automatically adjusts its parameters without any user interaction. The user only has to indicate the desired decay in the surrounding layer. With this Perfectly Matched Layer, we show that even in the most complex scenarios, where the material contrast properties are as high as sixteen orders of magnitude, we do not introduce numerical reflections when truncating the domain, and thus obtain accurate solutions.
Julen Alvarez-Aramberri, David Pardo, Helene Barucq, Elisabete Alberdi Celaya
127 A Linear Complexity Direct Solver for H-adaptive Grids With Point Singularities [abstract]
Abstract: In this paper we present a theoretical proof of linear computational cost and complexity for a recently developed direct solver driven by hypergraph grammar productions. The solver is specialized for computational meshes with point singularities in two and three dimensions. Linear complexity is achieved due to utilizing the special structure of such grids. We describe the algorithm and estimate the exact computational cost on an example of a two-dimensional mesh containing a point singularity. We extend this reasoning to the three dimensional meshes. Numerical results fully support our theoretical estimates.
Piotr Gurgul
436 Towards a new software tool for conservation planning [abstract]
Abstract: In a dynamic world, the process of prioritizing where to invest limited conservation resources is extremely complex. It needs to incorporate information on features (species or landforms), planning units, ongoing or predicted future threats, and the costs and effectiveness of potential conservation actions. Extensive research has been conducted on spatial and temporal conservation prioritization using software tools such as Marxan, C-Plan, and Zonation to aid managers in their decision-making process. However, these tools are limited in various ways in addressing the full complexity of day-to-day management decisions. Some tools fail to consider variation in: land values in space and time; multiple threats and their spatio-temporal variations; multiple conservation actions applied to individual areas; the feasibility, effectiveness, and varying costs of actions; and the dynamic nature of biodiversity responses in space and time. Optimizing such a multi-dimensional system is a large challenge in complexity mathematics. What is needed is a new software tool that builds on current approaches but allows for more realistic scenarios as described above, developed and parameterised in close collaboration with managers. This includes the modification of existing tools and the creation of new algorithms. The new software will be trialled in conservation planning exercises for islands in north-western Western Australia and the Great Barrier Reef. Current approaches mostly exploit simulated annealing, as it has proven the fastest and sufficiently efficient for problems which do not need the best solution. The new software, however, is intended to include sub-models on threats, costs, and the contribution of actions on individual islands. We are examining the option of using constraint programming to incorporate these sub-models into the decision process, with the desired time resolution.
Jana Brotankova, Bob Pressey, Ian Craigie, Steve Hall, Amelia Wenger

Agent Based Simulations, Adaptive Algorithms and Solvers (ABS-AA-S) Session 2

Time and Date: 16:30 - 18:10 on 10th June 2014

Room: Tully III

Chair: Piotr Gurgul

180 Modeling phase-transitions using a high-performance, Isogeometric Analysis framework [abstract]
Abstract: In this paper, we present a high-performance framework for solving partial differential equations using Isogeometric Analysis. It is called PetIGA, and in this work we show how it can be used to solve phase-field problems. We specifically chose the Cahn-Hilliard equation and the phase-field crystal equation as study problems. These two models allow us to highlight some of the main advantages of using PetIGA for scientific computing.
Philippe Vignal, Lisandro Dalcin, Nathan Collier, Victor Calo
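PetIGA itself is not reproduced here; as a hedged illustration of the kind of phase-field problem the abstract mentions, the sketch below advances the Cahn-Hilliard equation in one periodic dimension with a standard semi-implicit Fourier scheme (stiff biharmonic term implicit, nonlinearity explicit). The grid size, interface width eps and time step are arbitrary demonstration values.

    import numpy as np

    def cahn_hilliard_1d(n=128, length=2*np.pi, eps=0.1, dt=1e-4, steps=5000, seed=0):
        """Semi-implicit Fourier pseudo-spectral solver for
        u_t = (u^3 - u)_xx - eps^2 * u_xxxx on a periodic domain."""
        rng = np.random.default_rng(seed)
        u = 0.05 * rng.standard_normal(n)          # small random perturbation around u = 0
        k = 2 * np.pi * np.fft.fftfreq(n, d=length / n)
        k2, k4 = k**2, k**4
        for _ in range(steps):
            fhat = np.fft.fft(u**3 - u)
            uhat = np.fft.fft(u)
            # treat the stiff fourth-order term implicitly, the nonlinearity explicitly
            uhat = (uhat - dt * k2 * fhat) / (1.0 + dt * eps**2 * k4)
            u = np.real(np.fft.ifft(uhat))
        return u

    u_final = cahn_hilliard_1d()
    print(u_final.min(), u_final.max())   # phases separate towards u ~ -1 and u ~ +1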
233 Micropolar Fluids using B-spline Divergence-Conforming Spaces [abstract]
Abstract: We discretized the two-dimensional linear momentum, microrotation, energy and mass conservation equations from the microrotational theory with the finite element method, using B-spline bases to create divergence-conforming spaces and obtain pointwise divergence-free solutions [8]. Weak imposition of boundary conditions was handled using Nitsche's method for tangential conditions, while normal conditions were imposed strongly. We solved the heat-driven cavity problem as a test case, including a variation of the parameters that differentiate micropolar fluids from conventional fluids under different Rayleigh numbers, for a better understanding of the system.
Adel Sarmiento, Daniel Garcia, Lisandro Dalcin, Nathan Collier, Victor Calo
24 Hypergraph grammar based adaptive linear computational cost projection solvers for two and three dimensional modeling of brain [abstract]
Abstract: In this paper we present a hypergraph grammar model for transformations of two- and three-dimensional grids. The hypergraph grammar describes the process of generating uniform grids with two- or three-dimensional rectangular or hexahedral elements, followed by the process of h-refinement, which involves breaking selected elements into four or eight son elements, in two or three dimensions, respectively. We also provide graph grammar productions for two projection algorithms we use to pre-process material data. The first one is the projection-based interpolation solver algorithm used for computing H1 or L2 projections of an MRI scan of a human head, in two and three dimensions. The second one is utilized for solving the non-stationary problem modeling three-dimensional heat transport in the human head generated by cellphone usage.
Damian Goik, Marcin Sieniek, Maciej Woźniak, Anna Paszyńska, Maciej Paszynski
160 Implementation of an adaptive BDF2 formula and comparison with the MATLAB ode15s [abstract]
Abstract: After applying the Finite Element Method (FEM) to diffusion-type and wave-type Partial Differential Equations (PDEs), a first-order and a second-order Ordinary Differential Equation (ODE) system are obtained, respectively. These ODE systems usually present high stiffness, so numerical methods with good stability properties are required for their resolution. MATLAB offers a set of open source adaptive step functions for solving ODEs. One of these functions is ode15s, recommended for stiff problems and based on the Backward Differentiation Formulae (BDF). We describe the error estimation and the step size control implemented in this function. The ode15s is a variable-order algorithm, and even though it has an adaptive step size implementation, the advancing formula and the local error estimation that it uses correspond to the constant step size formula. We have focused on the second-order accurate and unconditionally stable BDF (BDF2), and we have implemented a truly adaptive step size BDF2 algorithm using the same strategy as the BDF2 implemented in ode15s; the new algorithm turns out to be more efficient than the one implemented in MATLAB.
Elisabete Alberdi Celaya, Juan José Anza Aguirrezabala, Panagiotis Chatzipantelidis
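For readers unfamiliar with the advancing formula discussed in the abstract, the sketch below implements a genuinely variable-step BDF2 on the linear test equation y' = lambda*y, where the implicit stage can be solved in closed form. It is a toy illustration, not the authors' MATLAB code, and the bootstrap with one backward Euler step is an arbitrary choice.

    import numpy as np

    def bdf2_variable_step(lam, y0, t_grid):
        """Variable-step BDF2 for the linear test problem y' = lam*y.

        With omega = h_n / h_{n-1}, the advancing formula is
            (1+2*omega)/(1+omega) * y_{n+1} - (1+omega)*y_n
                + omega**2/(1+omega) * y_{n-1} = h_n * f(t_{n+1}, y_{n+1}),
        which reduces to the usual constant-step BDF2 when omega = 1.
        """
        t = np.asarray(t_grid, dtype=float)
        y = np.empty_like(t)
        y[0] = y0
        # bootstrap the second value with one backward Euler step
        h0 = t[1] - t[0]
        y[1] = y[0] / (1.0 - h0 * lam)
        for n in range(1, len(t) - 1):
            h_new, h_old = t[n + 1] - t[n], t[n] - t[n - 1]
            w = h_new / h_old
            a = (1 + 2 * w) / (1 + w)                      # coefficient of y_{n+1}
            rhs = (1 + w) * y[n] - w**2 / (1 + w) * y[n - 1]
            # f = lam*y_{n+1} is linear, so the implicit equation is solved exactly
            y[n + 1] = rhs / (a - h_new * lam)
        return y

    # mildly stiff test: y' = -50*y, y(0) = 1, on a stretched (non-uniform) grid
    t = np.linspace(0.0, 1.0, 200) ** 1.5
    approx = bdf2_variable_step(-50.0, 1.0, t)
    print(abs(approx[-1] - np.exp(-50.0)))   # small global error at t = 1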
63 Fast graph transformation based direct solver algorithm for regular three dimensional grids [abstract]
Abstract: This paper presents a graph-transformation-based multi-frontal direct solver with an optimization technique that allows for a significant decrease of time complexity in some multi-scale simulations of Step and Flash Imprint Lithography (SFIL). The multi-scale simulation consists of a macro-scale linear elasticity model with a thermal expansion coefficient and a nano-scale molecular statics model. The algorithm is exemplified with a photopolymerization simulation that involves densification of a polymer inside a feature followed by shrinkage of the feature after removal of the template. The solver is optimized thanks to a mechanism of reusing sub-domains with similar geometries and similar material properties. The graph transformation formalism is used to describe the algorithm; such an approach helps to automatically localize sub-domains that can be reused.
Marcin Sieniek

Agent Based Simulations, Adaptive Algorithms and Solvers (ABS-AA-S) Session 3

Time and Date: 11:00 - 12:40 on 11th June 2014

Room: Tully III

Chair: Aleksander Byrski

325 Agent-based Evolutionary Computing for Difficult Discrete Problems [abstract]
Abstract: Hybridizing the agent-based paradigm with evolutionary computation can enhance the field of meta-heuristics in a significant way, giving usually passive individuals autonomy and the capabilities of perception and interaction with one another, treating them as agents. In this paper, as a follow-up to previous research, an evolutionary multi-agent system (EMAS) is examined on difficult discrete benchmark problems. As a means of comparison, a classical evolutionary algorithm (constructed according to the Michalewicz model) implemented in an island model is used. The results encourage further research on the application of EMAS in discrete problem domains.
Michal Kowol, Aleksander Byrski, Marek Kisiel-Dorohinicki
225 Translation of graph-based knowledge representation in multi-agent system [abstract]
Abstract: Agents provide a feasible means for maintaining and manipulating large-scale data. This paper deals with the problem of information exchange between different agents. It uses a graph-based formalism for the representation of knowledge maintained by an agent and graph transformations as a means of knowledge exchange. Such a rigorous formalism ensures the cohesion of the graph-based knowledge held by agents after each modification and exchange action. The approach presented in this paper is illustrated by a case study dealing with the problem of personal data held in different places (maintained by different agents) and the process of transmitting such information.
Leszek Kotulski, Adam Sedziwy, Barbara Strug
239 Agent-based Adaptation System for Service-Oriented Architectures Using Supervised Learning [abstract]
Abstract: In this paper we propose an agent-based system for Service-Oriented Architecture self-adaptation. Services are supervised by autonomous agents which are responsible for deciding which service should be chosen for interoperation. Agents learn the choice strategy autonomously using supervised learning. In experiments we show that supervised learning (Naive Bayes, C4.5 and Ripper) achieves much better efficiency than simple strategies such as random choice or round robin. What is also important, supervised learning generates knowledge in a readable form, which may be analyzed by experts.
Bartlomiej Sniezynski
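As a hedged illustration of the learning setup (not the authors' system), the sketch below trains a Naive Bayes classifier with scikit-learn to choose between two hypothetical services; the context features, the toy labelling rule and the 10% label noise are all invented for demonstration.

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    rng = np.random.default_rng(42)

    # Hypothetical training data: each row describes the observed context of a call
    # (e.g. hour of day, current load of service A, current load of service B);
    # the label records which of the two services actually met its deadline.
    X = rng.uniform(0.0, 1.0, size=(500, 3))
    y = (X[:, 1] > X[:, 2]).astype(int)              # toy rule: the less loaded service wins
    flip = rng.uniform(size=500) < 0.1               # 10% label noise
    y = np.where(flip, 1 - y, y)

    model = GaussianNB().fit(X, y)

    # The agent consults the learned strategy before delegating the next request.
    context = np.array([[0.5, 0.8, 0.3]])            # service A heavily loaded
    print("choose service", "B" if model.predict(context)[0] == 1 else "A")
    print("class probabilities:", model.predict_proba(context)[0])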
324 Generation-free Agent-based Evolutionary Computing [abstract]
Abstract: Metaheuristics resulting from the hybridization of multi-agent systems with evolutionary computing are efficient in many optimization problems. Evolutionary multi-agent systems (EMAS) are more similar to biological evolution than classical evolutionary algorithms. However, technological limitations prevented the use of fully asynchronous agents in previous EMAS implementations. In this paper we present a new algorithm for agent-based evolutionary computations. The individuals are represented as fully autonomous and asynchronous agents. Evolutionary operations are performed continuously and no artificial generations need to be distinguished. Our results show that such asynchronous evolutionary operators and the resulting absence of explicit generations lead to significantly better results. An efficient implementation of this algorithm was possible through the use of Erlang technology, which natively supports lightweight processes and asynchronous communication.
Daniel Krzywicki, Jan Stypka, Piotr Anielski, Lukasz Faber, Wojciech Turek, Aleksander Byrski, Marek Kisiel-Dorohinicki
27 Hypergraph grammar based linear computational cost solver for three dimensional grids with point singularities [abstract]
Abstract: In this paper we present a hypergraph grammar based multi-frontal solver for three dimensional grids with point singularities. We show experimentally that the computational cost of the resulting solver algorithm is linear with respect to the number of degrees of freedom. We also propose a reutilization algorithm that enables the reuse of LU factorizations over unrefined parts of the mesh when new local refinements are executed by the hypergraph grammar productions.
Piotr Gurgul, Anna Paszynska, Maciej Paszynski

Mathematical Methods and Algorithms for Extreme Scale (MMAES) Session 1

Time and Date: 11:20 - 13:00 on 10th June 2014

Room: Mossman

Chair: Vassil Alexandrov

227 Fast Iterative Method in solving Eikonal equations : a multi-level parallel approach [abstract]
Abstract: The fast marching method is widely used to solve the eikonal equation. By introducing a new way of managing propagation interfaces which avoids the use of expensive data structures, the fast iterative method proves to be a faster variant with higher parallel potential compared to the fast marching method. In this paper we investigate a multi-level parallel approach for the Fast Iterative Method which is well suited to today's heterogeneous and hierarchical architectures. We show experimental results which focus on the fine-grained parallel level of the algorithm and we give a performance analysis.
Florian Dang, Nahid Emad
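A compact serial rendition of the fast iterative method may help fix ideas: an active list of nodes is relaxed with an upwind local solver until each node converges, waking its neighbours as it does so. The sketch below is a plain Python/NumPy version for a uniform 2D grid with constant slowness; the multi-level parallelism that is the paper's actual contribution is not represented.

    import numpy as np

    def local_solver(T, i, j, h, s):
        """Upwind (Godunov) update for |grad T| = s at grid node (i, j)."""
        inf = np.inf
        a = min(T[i - 1, j] if i > 0 else inf, T[i + 1, j] if i < T.shape[0] - 1 else inf)
        b = min(T[i, j - 1] if j > 0 else inf, T[i, j + 1] if j < T.shape[1] - 1 else inf)
        if a > b:
            a, b = b, a                              # ensure a <= b
        if b == np.inf or b - a >= s * h:
            return a + s * h                         # one-sided update
        return 0.5 * (a + b + np.sqrt(2.0 * (s * h) ** 2 - (a - b) ** 2))

    def fast_iterative_method(shape, sources, h=1.0, slowness=1.0, tol=1e-9):
        T = np.full(shape, np.inf)
        for src in sources:
            T[src] = 0.0

        def neighbors(i, j):
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < shape[0] and 0 <= nj < shape[1]:
                    yield ni, nj

        active = {n for src in sources for n in neighbors(*src)}
        while active:
            next_active = set()
            for (i, j) in active:
                old = T[i, j]
                T[i, j] = min(old, local_solver(T, i, j, h, slowness))
                if abs(old - T[i, j]) < tol:
                    # converged: try to wake up non-active neighbours
                    for (ni, nj) in neighbors(i, j):
                        if (ni, nj) not in active:
                            cand = local_solver(T, ni, nj, h, slowness)
                            if cand < T[ni, nj] - tol:
                                T[ni, nj] = cand
                                next_active.add((ni, nj))
                else:
                    next_active.add((i, j))          # keep iterating on this node
            active = next_active
        return T

    T = fast_iterative_method((101, 101), sources=[(50, 50)])
    print(T[50, 0])   # roughly 50, up to the discretization error of the upwind stencil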
304 A Parallel Implementation of Singular Value Decomposition for Video-on-Demand Services Design Using Principal Component Analysis [abstract]
Abstract: We have developed a mathematical model for video on demand server design based on principal component analysis. Singular value decomposition on the video correlation matrix is used to perform the PCA. The challenge is to counter the computational complexity, which grows proportionally to n^3, where n is the number of video streams. We present a solution from high performance computing, which splits the problem up and computes it in parallel on a distributed memory system.
Raul Ramirez-Velarde, Martin Roderus, Carlos Barba-Jimenez, Raul Perez-Cazares
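The PCA step itself is straightforward to sketch serially; the paper's contribution is distributing the O(n^3) decomposition across a cluster. The demonstration below builds a hypothetical demand matrix (time slots x video streams), forms the video-video correlation matrix and takes its SVD with NumPy; all sizes and the synthetic low-rank structure are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical demand matrix: rows = time slots, columns = video streams.
    n_slots, n_videos = 1000, 50
    latent = rng.standard_normal((n_slots, 3))                 # a few shared popularity trends
    demand = (latent @ rng.standard_normal((3, n_videos))
              + 0.1 * rng.standard_normal((n_slots, n_videos)))

    # Correlation matrix between video streams (n_videos x n_videos).
    corr = np.corrcoef(demand, rowvar=False)

    # PCA via SVD of the (symmetric) correlation matrix; this cubic-cost step is
    # the part the paper distributes over a distributed-memory system.
    U, s, Vt = np.linalg.svd(corr)
    explained = s / s.sum()
    k = int(np.searchsorted(np.cumsum(explained), 0.95)) + 1
    print("components needed for 95% of the spectrum:", k)
    components = U[:, :k]            # principal directions in "video space"
    scores = demand @ components     # projection of the demand history onto them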
213 Boosting FDTD Performance by Compiler-Generated Recursive Space-Time Update Schemes [abstract]
Abstract: Traditional implementations of the explicit Finite-Difference Time-Domain (FDTD) solvers use layer by layer time update which requires reload of the whole mesh data into memory (and synchronization between processes for parallel solvers) at every time step. This type of update is cache inefficient and renders the task to be memory bound by the slowest bandwidth in the computer memory hierarchy. Depending on the task size and computer architecture, this can be cache, shared memory, interprocessor communication or disk I/O bandwidth. For large scale calculations, especially when mesh data size exceeds the total available processor memory, the traditional FDTD simulation becomes prohibitively inefficient. There exist alternative approaches to implement FDTD solvers that explore the whole time-space problem and utilize additional data locality due to the time dimension. It was shown that it is possible to reach the theoretical peak rate of algorithm efficiency for any given task size and compute architecture, even for problems with very large computational grids (10^12 Yee cells) [1]. Efficient usage of the whole computer memory bandwidth hierarchy when implementing numerical algorithms is crucial for extreme scale computing since one may expect this hierarchy be rather diverse in the (future) exascale computers. In this work we present a systematic way of implementing a space-time update scheme which is based on a recursive decomposition of N+1 dimensional space-time data dependency graph of the whole calculation into smaller subgraphs. We compose a Locally Recursive Nonlocally Asynchronous (LRnLA) update algorithm [1]: each subgraph is split locally into similar subgraphs while processing of independent subgraphs may be performed concurrently. For explicit stencils the dependency graph is locally governed by the stencil slope in coordinate-time plane. We use primitive triangular up- and down-pointed geometric shapes to represent the projections of the dependency graph on any coordinate-time plane and develop universal mnemonic rules to process the shapes for arbitrary space dimension. In our implementation these shapes and rules are encoded into C++ template class hierarchy by using boost fusion functionality [2], thus the update algorithm is generated mainly at compile time by a standard C++ compiler. This keeps the implementation highly portable and minimizes the overhead for run-time analysis of the data dependency graph. Termination of the recurrence (smallest subgraph) performs the actual FDTD stencil operation corresponding to a time update of a single mesh cell. The resulting FDTD algorithm is cache oblivious [3] and also allows for concurrent execution of subtasks of different size. Concurrent task scheduling is governed by the dependencies between subgraphs which become known in course of the shape decomposition. Depending on the computer architecture the scheduling may simultaneously take into account different parallel execution levels such as MPI and multithreading. Concurrent execution mechanisms are switched on (programmed) for subgraphs reaching some suitable size (rank) in course of recursion. In this presentation we discuss the implementation and analyze the performance of the implemented 3D FDTD algorithm for various computer architectures, including multicore systems and large clusters (up to 9000 cores). We demonstrate the FDTD update performance reaching up to 75% of the estimated CPU peak which is 10-30 times higher than that of the traditional FDTD solvers. 
We also demonstrate an almost perfect parallel scaling of the implemented solver. We discuss the effect of mesh memory layouts such as Z-curve (Morton order) increasing locality of data or interleaved layouts for vectorized updates. The implementation of the algorithm for GPU is discussed with some preliminary results. [1] Zakirov A V and Levchenko V D 2009 PIERS proceedings, Moscow, Russia 580--584 [2] www.boost.org/libs/fusion/ [3] H. Prokop. Cache-Oblivious Algorithms. Master’s thesis, MIT. 1999.
Ilya Valuev and Andrey Zakirov
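For contrast with the recursive space-time decomposition described above, the sketch below shows the traditional layer-by-layer update the abstract starts from: a minimal 1D Yee scheme in normalized units where every time step sweeps the whole field arrays. It is a baseline illustration only, not the LRnLA algorithm.

    import numpy as np

    def fdtd_1d(n_cells=2000, n_steps=3000, courant=0.5):
        """Traditional layer-by-layer 1D FDTD (Yee) update in normalized units.

        Each time step sweeps the whole Ez and Hy arrays, so for large meshes the
        run is limited by memory bandwidth -- the behaviour that space-time
        (LRnLA-style) decompositions are designed to avoid.
        """
        ez = np.zeros(n_cells)
        hy = np.zeros(n_cells - 1)
        src = n_cells // 2
        for step in range(n_steps):
            hy += courant * (ez[1:] - ez[:-1])              # update H from curl E
            ez[1:-1] += courant * (hy[1:] - hy[:-1])        # update E from curl H
            ez[src] += np.exp(-((step - 60) / 20.0) ** 2)   # soft Gaussian source
        return ez

    field = fdtd_1d()
    print(float(np.max(np.abs(field))))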
409 Challenges of Big Data Mining [abstract]
Abstract: At present, Big Data has become a reality that no one can ignore. Big Data is our environment whenever we need to make a decision. Big Data is a buzz word that makes everyone understand how important it is. Big Data presents a big opportunity for academia, industry and government. Big Data is then a big challenge for all parties. This talk will discuss some fundamental issues of Big Data problems, such as data heterogeneity vs. decision heterogeneity, data stream research and data-driven decision management. Furthermore, this talk will present a number of real-life Big Data applications. In conclusion, the talk suggests a number of open research problems in Data Science, which is a growing field beyond Big Data.
Yong Shi
410 Scalable Stochastic and Hybrid Methods and Algorithms for Extreme Scale Computing [abstract]
Abstract: Novel mathematics and mathematical modelling approaches, together with scalable algorithms, are needed to enable key applications at extreme scale. This is especially true as HPC systems continue to scale up in compute node and processor core count. Computational scientists are currently at a critical threshold of novel mathematics development as well as large-scale algorithm development, re-design and implementation that will affect most application areas. The paper will therefore focus on the mathematical and algorithmic challenges and approaches towards exascale and beyond, and in particular on stochastic and hybrid methods that in turn lead to scalable scientific algorithms with minimal or no global communication, that hide network and memory latency, have very high computation/communication overlap and have no synchronization points.
Vassil Alexandrov

Urgent Computing: Computations for Decision Support in Critical Situations (UC) Session 1

Time and Date: 14:10 - 15:50 on 11th June 2014

Room: Mossman

Chair: Alexander Boukhanovsky

429 High Performance Computations for Decision Support in Critical Situations: Introduction to the Third Workshop on Urgent Computing [abstract]
Abstract: This paper is the preface to the Third Workshop on Urgent Computing. The Urgent Computing workshops have traditionally been embedded within the International Conference on Computational Science (ICCS) since 2012. They aim to develop a dialogue on the present and future of research and applications associated with large-scale computations for decision support in critical situations. The key workshop topics in 2014 are: methods and principles of urgent computing; middleware, platforms and infrastructures; simulation-based decision support for complex systems control; interactive visualization and virtual reality for decision support in emergency situations; and domain-area applications to emergency situations, including natural and man-made disasters, e.g. transportation problems, epidemics and criminal acts.
Alexander Boukhanovsky, Marian Bubak
342 Personal decision support mobile service for extreme situations [abstract]
Abstract: This article discusses aspects of the implementation of a massive personal decision support mobile service for the evacuation process in extreme situations, based on the second-generation cloud computing platform CLAVIRE and a virtual society model. The virtual society model was constructed using an agent-based approach. To increase credibility, individual motivation methods (personal decision support and user training) were used.
Vladislav A. Karbovskii, Daniil V. Voloshin, Kseniia A. Puzyreva, Aleksandr S. Zagarskikh
357 Evaluation of in-vehicle decision support system for emergency evacuation [abstract]
Abstract: One of the most important issues in Decision Support Systems (DSS) technology is ensuring their effectiveness and efficiency for future implementations and use. A DSS is a prominent tool in a disaster information system, which allows the authorities to provide life safety information directly to the mobile devices of anyone physically located in the evacuation area. After that, a personal DSS guides users to a safe point. Due to the large uncertainty in initial conditions and assumptions about the underlying process, such a DSS is extremely hard to implement and evaluate, particularly in a real environment. We propose a simulation methodology for the evaluation of in-vehicle DSS for emergency evacuation based on transport system and human decision-making modeling.
Sergei Ivanov, Konstantin Knyazkov
358 Problem solving environment for development and maintenance of St. Petersburg’s Flood Warning System [abstract]
Abstract: The Saint-Petersburg Flood Warning System (FWS) is a life-critical system that requires permanent maintenance and development. Tasks that arise during these processes can be much more resource-intensive than the operational loop of the system and may involve complex research problems. It is therefore essential to have a special software tool to handle a collection of different models, data sources and auxiliary software, so that they can be combined in different ways according to the particular research problem to be solved. This paper aims to share the idea of Saint-Petersburg FWS evolution with the help of a problem-solving environment based on the cloud platform CLAVIRE.
Sergey Kosukhin, Anna Kalyuzhnaya, Denis Nasonov

Urgent Computing: Computations for Decision Support in Critical Situations (UC) Session 2

Time and Date: 16:20 - 18:00 on 11th June 2014

Room: Mossman

Chair: Alexander Boukhanovsky

366 Hybrid scheduling algorithm in early warning [abstract]
Abstract: Investigations into the development of efficient early warning systems (EWS) are essential for the prediction of and warning about upcoming natural hazards. Besides the provision of communication and computationally intensive infrastructure, high resource reliability and hard-deadline support are required for processing EWS scenarios in order to obtain guaranteed information under time-limited conditions. In this paper, the planning of EWS scenario execution is investigated and an efficient hybrid algorithm for urgent workflow scheduling is developed, based on traditional heuristic and meta-heuristic approaches within state-of-the-art cloud computing principles.
Denis Nasonov, Nikolay Butakov
400 On-board Decision Support System for Ship Flooding Emergency Response [abstract]
Abstract: The paper describes a real-time software system to support emergency planning decisions when ship flooding occurs. The events of grounding and collision are considered, where the risk of subsequent flooding of hull compartments is very high and must be avoided or at least minimized. The system is based on a highly optimized algorithm that estimates, ahead in time, the progressive flooding of the compartments according to the current ship status and existing damages. Flooding times and stability parameters are measured, allowing the crew to take adequate measures, such as isolating or counter-flooding compartments, before the flooding takes on uncontrollable proportions. The simulation is visualized in a Virtual Environment in real time, which provides all the functionalities to evaluate the seriousness and consequences of the situation, as well as to test, monitor and carry out emergency actions. Since flooding is a complex physical phenomenon that occurs in an equally complex structure such as a ship, the real-time flooding simulation combined with the Virtual Environment requires large computational power to ensure the reliability of the simulation results. Moreover, the distress normally experienced by the crew in such situations, and the urgent (and hopefully appropriate) counter-measures required, leave no room for inaccuracies or misinterpretations caused by a lack of computational power. For the events considered, the system is primarily used as a decision support tool for taking urgent actions in order to avoid or at least minimize disastrous consequences such as oil spilling, sinking, or even loss of human lives.
Jose Varela, Jose Rodrigues, Carlos Guedes Soares

Bridging the HPC Talent Gap with Computational Science Research Methods (BRIDGE) Session 1

Time and Date: 11:00 - 12:40 on 11th June 2014

Room: Bluewater I

Chair: Vassil Alexandrov

153 In Need of Partnerships – An Essay about the Collaboration between Computational Sciences and IT Services [abstract]
Abstract: The Computational Sciences (CS) are challenging in many aspects, not only because of the scientific domains they address, but especially because of their need for the most sophisticated IT infrastructures to perform their research. Often, the latest and most powerful supercomputers, high-performance networks and high-capacity data storage are utilized for CS, while being offered, developed and operated by experts outside CS. This standard service approach has certainly been useful for many domains, but more and more often it represents a limitation, given the needs of CS and the restrictions of the IT services. The partnership initiative πCS established at the Leibniz Supercomputing Centre (LRZ) moves the collaboration between computational scientists and IT service providers to a new level, from a service-centered approach to an integrated partnership. The interface between them is a gateway to an improved collaboration between equal partners, such that future IT services address the requirements of CS in a better, optimized and more efficient way. In addition, it sheds some light on future professional development.
Anton Frank, Ferdinand Jamitzky, Helmut Satzger, Dieter Kranzlmüller
281 Development of Multiplatform Adaptive Rendering Tools to Visualize Scientific Experiments [abstract]
Abstract: In this paper, we propose methods and tools for developing multiplatform adaptive visualization systems adequate to the specific visualization goals of experiments in different fields of science. The proposed approach was implemented, and we present a client-server rendering system, SciVi (Scientific Visualizer), which provides multiplatform portability and automated integration with different solvers based on ontology engineering methods. SciVi is developed at Perm State University to help scientists and researchers acquire multidisciplinary skills and solve real scientific problems.
Konstantin Ryabinin, Svetlana Chuprina
296 Education 2.0: Student Generated Learning Materials through Collaborative Work [abstract]
Abstract: In order to comply with the Integrated Learning Processes model, a course on operating systems was redesigned in such a way that students would generate most of their learning materials as well as a significant part of their evaluation exams. This new approach resulted in a statistically significant improvement in students' grades, as measured by a standardized exam, compared with a previous student intake.
Raul Ramirez-Velarde, Raul Perez-Cazares, Nia Alexandrov, Jose Jesus Garcia-Rueda
413 Challenges of Big Data and the Skills Gap [abstract]
Abstract: At present, Big Data has become a reality that no one can ignore. Big Data is our environment whenever we need to make a decision. Big Data is a buzz word that makes everyone understand how important it is. Big Data presents a big opportunity for academia, industry and government. Big Data is then a big challenge for all parties. This talk will discuss some fundamental issues of Big Data problems, such as data heterogeneity vs. decision heterogeneity, data stream research and data-driven decision management. Furthermore, this talk will present a number of real-life Big Data applications and will outline the challenges in bridging the skills gap while focusing on Big Data.
Yong Shi and Yingjie Tian

Bridging the HPC Talent Gap with Computational Science Research Methods (BRIDGE) Session 2

Time and Date: 14:10 - 15:50 on 11th June 2014

Room: Bluewater I

Chair: Vassil Alexandrov

412 The HPC Talent Gap: an Australian Perspective [abstract]
Abstract: The recent Super Science initiative by the Australian government has provided funding for two petascale supercomputers to support research nationally, along with cloud, storage and network infrastructure. While some research areas are well-established in the use of HPC, much of the potential user base is still working with desktop computing. To be able to make use of the new infrastructure, these users will need training, support and associated funding. It is important to not only increase uptake in computational science, but also to nurture the workforce based on identified roles and ongoing support for careers and career pathways. This paper will present a survey of a range of efforts made in Australia to increase uptake and skills in HPC, and reflect on successes and the challenges ahead.
Valerie Maxville
418 Measuring Business Value of Learning Technology Implementation in Higher Education Setting [abstract]
Abstract: This paper introduces the concept of the Business Value of Learning Technology and presents an approach to measuring the Business Value of Learning Technology in a Higher Education setting, based on a case study in Computational Science and cognate areas. The Computational Science subject area is used as a pilot for the studies described in this paper since it is a multidisciplinary area attracting students from diverse backgrounds; it is both a natural environment in which to promote collaborative teaching methods and collaborative provision of courses, and as such requires more streamlined management processes. The paper, based on the above case study, presents the motivators and hygiene factors for Learning Technology implementation in a Higher Education setting. Finally, the Intersecting Influences Model presents the influences of pedagogy, technology and management on the motivation and hygiene factors, together with the corresponding generalization for postgraduate-level Higher Education settings.
Nia Alexandrov

Modeling and Simulation of Large-scale Complex Urban Systems (MASCUS) Session 1

Time and Date: 16:20 - 18:00 on 11th June 2014

Room: Bluewater II

Chair: Heiko Aydt

111 Analysing the Effectiveness of Wearable Wireless Sensors in Controlling Crowd Disasters [abstract]
Abstract: The Love Parade disaster in Duisburg, Germany led to several deaths and injuries. Disasters like this occur due to high crowd densities in a limited area. We propose a wearable electronic device that helps reduce such disasters by directing people and thus controlling the density of the crowd. We investigate the design and effectiveness of such a device through an agent-based simulation using the social force model. We also investigate the effect of device failure and of participants not paying attention in order to determine the critical number of devices and attentive participants required for the device to be effective.
Teo Yu Hui Angela, Vaisagh Viswanathan, Michael Lees, Wentong Cai
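A hedged sketch of the underlying agent model may be useful: one explicit step of a Helbing-style social force update (goal-directed driving force plus pairwise exponential repulsion). All constants below are generic textbook-scale values and the exit geometry is invented; the paper's calibrated parameters and the wearable-device guidance logic are not reproduced.

    import numpy as np

    def social_force_step(pos, vel, goal, dt=0.05, v_desired=1.3, tau=0.5,
                          A=2.0, B=0.3, radius=0.3):
        """One explicit step of a simple Helbing-style social force model.

        pos, vel: (N, 2) arrays; goal: (2,) common target (e.g. an exit).
        """
        # driving force: relax towards the desired velocity pointing at the goal
        to_goal = goal - pos
        dist_goal = np.linalg.norm(to_goal, axis=1, keepdims=True) + 1e-9
        f = (v_desired * to_goal / dist_goal - vel) / tau

        # pairwise exponential repulsion between pedestrians
        diff = pos[:, None, :] - pos[None, :, :]                 # (N, N, 2)
        dist = np.linalg.norm(diff, axis=2) + 1e-9
        np.fill_diagonal(dist, np.inf)                           # no self-interaction
        magnitude = A * np.exp((2 * radius - dist) / B)          # (N, N)
        f += np.sum(magnitude[:, :, None] * diff / dist[:, :, None], axis=1)

        vel = vel + dt * f
        pos = pos + dt * vel
        return pos, vel

    rng = np.random.default_rng(1)
    pos = rng.uniform(0, 10, size=(200, 2))
    vel = np.zeros_like(pos)
    exit_point = np.array([12.0, 5.0])
    for _ in range(400):
        pos, vel = social_force_step(pos, vel, exit_point)
    print("mean distance to exit:", float(np.linalg.norm(pos - exit_point, axis=1).mean()))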
204 Individual-Oriented Model Crowd Evacuations Distributed Simulation [abstract]
Abstract: Emergency plan design is an important problem in building design, aimed at evacuating people as fast as possible. Evacuation exercises such as fire drills do not provide a realistic setting in which to understand the behaviour of people. In the case of crowd evacuations, the complexity and uncertainty of the system increase. Computer simulation allows us to run crowd dynamics models and extract information about emergency situations. Several models address the emergency evacuation problem. Individual-oriented modelling allows rules to be described for individuals and the interactions between them to be simulated. Because of the variation in emergency situations, results have to be statistically reliable, and this reliability increases the computing demand. Distributed and parallel paradigms solve the performance problem. In the present work we developed a model to simulate crowd evacuations. We implemented two versions of the model: one using NetLogo and another using C with MPI. We chose a real environment to test the simulator: building 2 of the Fira de Barcelona, able to hold thousands of people. The distributed simulator was tested with 62,820 runs in a distributed environment with 15,000 individuals. In this work we show that the simulator has a linear speedup and scales efficiently.
Albert Gutierrez-Milla, Francisco Borges, Remo Suppi, Emilio Luque
133 Simulating Congestion Dynamics of Train Rapid Transit using Smart Card Data [abstract]
Abstract: Investigating congestion in train rapid transit systems (RTS) in today's urban cities is a challenge compounded by limited data availability and difficulties in model validation. Here, we integrate information from travel smart card data, a mathematical model of route choice, and a full-scale agent-based model of the Singapore RTS to provide a more comprehensive understanding of the congestion dynamics than can be obtained through analytical modelling alone. Our model is empirically validated, and allows for close inspection of the dynamics including station crowdedness, average travel duration, and frequency of missed trains---all highly pertinent factors in service quality. Using current data, the crowdedness in all 121 stations appears to be distributed log-normally. In our preliminary scenarios, we investigate the effect of population growth on service quality. We find that the current population (2 million) lies below a critical point; and increasing it beyond a factor of approximately 10% leads to an exponential deterioration in service quality. We also predict that incentivizing commuters to avoid the most congested hours can bring modest improvements to the service quality provided the population remains under the critical point. Finally, our model can be used to generate simulated data for statistical analysis when such data are not empirically available, as is often the case.
Nasri Othman, Erika Fille Legara, Vicknesh Selvam, Christopher Monterola
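Only the distributional check mentioned in the abstract is easy to sketch independently: fitting a log-normal to per-station crowdedness values and testing the fit. The snippet below does this with SciPy on synthetic numbers standing in for the 121 stations; the generated values are not the paper's data.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)

    # Synthetic stand-in for the per-station crowdedness values (121 stations);
    # real values would come from the smart card data or the agent-based model.
    crowdedness = rng.lognormal(mean=7.0, sigma=0.8, size=121)

    # Fit a log-normal with the location parameter pinned at zero.
    shape, loc, scale = stats.lognorm.fit(crowdedness, floc=0)
    ks_stat, p_value = stats.kstest(crowdedness, "lognorm", args=(shape, loc, scale))

    print(f"fitted sigma = {shape:.3f}, median = {scale:.1f}")
    print(f"KS statistic = {ks_stat:.3f}, p-value = {p_value:.3f}")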
177 A method to ascertain rapid transit systems' throughput distribution using network analysis [abstract]
Abstract: We present a method for predicting the distribution of passenger throughput across the stations and lines of a city rapid transit system by calculating the normalized betweenness centrality of the nodes (stations) and edges of the rail network. The method is evaluated by correlating the distribution of betweenness centrality against the throughput distribution, which is calculated using actual passenger ridership data. Our ticketing data are from the rail transport system of Singapore and comprise more than 14 million journeys over a span of one week. We demonstrate that removal of outliers representing about 10% of the stations produces a statistically significant correlation above 0.7. Interestingly, these outliers coincide with stations that opened six months before the time the ridership data were collected, hinting that travel routines along these stations had not yet settled to their equilibrium. The correlation is improved significantly when the data points are split according to their separate lines, illustrating differences in the intrinsic characteristics of each line. The simple procedure established here shows that static network analysis of the structure of a transport network can allow transport planners to predict the passenger ridership with sufficient accuracy, without requiring dynamic and complex simulation methods.
Muhamad Azfar Ramli, Christopher Monterola, Gary Kee Khoon Lee, Terence Gih Guang Hung
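A minimal version of the centrality-versus-ridership comparison can be written with networkx, as sketched below on a toy eight-station graph with invented weekly throughput numbers; the paper of course uses the full Singapore rail network and over 14 million smart-card journeys.

    import networkx as nx
    from scipy.stats import pearsonr

    # Toy stand-in for a rail network: a small graph whose nodes are stations.
    G = nx.Graph()
    edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"),
             ("C", "F"), ("F", "G"), ("G", "H"), ("H", "D")]
    G.add_edges_from(edges)

    # Normalized betweenness centrality of stations (and of track segments).
    node_bc = nx.betweenness_centrality(G, normalized=True)
    edge_bc = nx.edge_betweenness_centrality(G, normalized=True)

    # Hypothetical observed weekly throughput per station (would come from
    # the smart card ridership data in the paper).
    throughput = {"A": 12_000, "B": 30_000, "C": 55_000, "D": 52_000,
                  "E": 11_000, "F": 28_000, "G": 20_000, "H": 26_000}

    stations = sorted(G.nodes())
    r, p = pearsonr([node_bc[s] for s in stations], [throughput[s] for s in stations])
    print(f"Pearson correlation between centrality and throughput: r = {r:.2f} (p = {p:.3f})")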
236 Fast and Accurate Optimization of a GPU-accelerated CA Urban Model through Cooperative Coevolutionary Particle Swarms [abstract]
Abstract: The calibration of Cellular Automata (CA) models for simulating land-use dynamics requires the use of formal, well-structured and automated optimization procedures. A typical approach used in the literature to tackle the calibration problem consists of using general optimization metaheuristics. However, the latter often require thousands of runs of the model to provide reliable results, thus involving remarkable computational costs. Moreover, all optimization metaheuristics are plagued by the so-called curse of dimensionality, that is, a rapid deterioration of efficiency as the dimensionality of the search space increases. Therefore, in the case of models depending on a large number of parameters, the calibration problem requires the use of advanced computational techniques. In this paper, we investigate the effectiveness of combining two computational strategies. On the one hand, we greatly speed up CA simulations by using general-purpose computing on graphics processing units. On the other hand, we use a specifically designed cooperative coevolutionary Particle Swarm Optimization algorithm, which is known for its ability to operate effectively in search spaces with a high number of dimensions.
Ivan Blecic, Arnaldo Cecchini, Giuseppe A. Trunfio
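A skeleton of the cooperative coevolutionary PSO component (not the GPU-accelerated CA coupling) is sketched below: the parameter vector is split into blocks, each block gets its own sub-swarm, and particles are evaluated by substituting their block into a shared context vector. The separable sphere function stands in for the expensive CA calibration objective, and all PSO constants are conventional defaults.

    import numpy as np

    def ccpso(objective, dim=20, n_groups=4, swarm_size=15, iters=200,
              bounds=(-5.0, 5.0), w=0.72, c1=1.49, c2=1.49, seed=0):
        """Cooperative coevolutionary PSO: each sub-swarm optimizes one block of
        coordinates inside a shared context vector (the best full solution so far)."""
        rng = np.random.default_rng(seed)
        lo, hi = bounds
        groups = np.array_split(np.arange(dim), n_groups)
        context = rng.uniform(lo, hi, dim)
        best_val = objective(context)

        swarms = []
        for g in groups:
            pos = rng.uniform(lo, hi, (swarm_size, len(g)))
            swarms.append({"g": g, "pos": pos, "vel": np.zeros_like(pos),
                           "pbest": pos.copy(), "pval": np.full(swarm_size, np.inf)})

        for _ in range(iters):
            for s in swarms:
                g = s["g"]
                for i in range(swarm_size):
                    trial = context.copy()
                    trial[g] = s["pos"][i]
                    val = objective(trial)        # the expensive CA run in the real setup
                    if val < s["pval"][i]:
                        s["pval"][i], s["pbest"][i] = val, s["pos"][i].copy()
                    if val < best_val:
                        best_val, context = val, trial
                gbest = s["pbest"][np.argmin(s["pval"])]
                r1 = rng.uniform(size=s["pos"].shape)
                r2 = rng.uniform(size=s["pos"].shape)
                s["vel"] = (w * s["vel"] + c1 * r1 * (s["pbest"] - s["pos"])
                            + c2 * r2 * (gbest - s["pos"]))
                s["pos"] = np.clip(s["pos"] + s["vel"], lo, hi)
        return context, best_val

    sphere = lambda x: float(np.sum(x**2))        # cheap separable test objective
    best_x, best_f = ccpso(sphere)
    print("best objective value:", best_f)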

Workshop on Advances in the Kepler Scientific Workflow System and Its Applications (KEPLER) Session 1

Time and Date: 16:20 - 18:00 on 11th June 2014

Room: Bluewater I

Chair: Ilkay Altintas

260 Design and Implementation of Kepler Workflows for BioEarth [abstract]
Abstract: BioEarth is an ongoing research initiative for the development of a regional-scale Earth System Model (EaSM) for the U.S. Pacific Northwest. Our project seeks to couple and integrate multiple stand-alone EaSMs developed through independent efforts for capturing natural and human processes in various realms of the biosphere: atmosphere (weather and air quality), terrestrial biota (crop, rangeland, and forest agro-ecosystems) and aquatic (river flows, water quality, and reservoirs); hydrology links all these realms. Due to the need to manage numerous complex simulations, an application of automated workflows was essential. In this paper, we present a case study of workflow design for the BioEarth project using the Kepler system to manage applications of the Regional Hydro-Ecologic Simulation System (RHESSys) model. In particular, we report on the design of Kepler workflows to support: 1) standalone executions of the RHESSys model under serial and parallel applications, and 2) a more complex case of performing calibration runs involving multiple preprocessing modules, iterative exploration of parameters and parallel RHESSys executions. We exploited various Kepler features including a user-friendly design interface and support for parallel execution on a cluster. Our experiments show a performance speedup between 7–12x, using 16 cores of a Linux cluster, and demonstrate the general effectiveness of our Kepler workflows in managing RHESSys runs. This study shows the potential of Kepler to serve as the primary integration platform for the BioEarth project, with implications for other data- and compute-intensive Earth systems modeling projects.
Tristan Mullis, Mingliang Liu, Ananth Kalyanaraman, Joseph Vaughan, Christina Tague, Jennifer Adam
327 Tools, methods and services enhancing the usage of the Kepler-based scientific workflow framework [abstract]
Abstract: Scientific workflow systems are designed to compose and execute either a series of computational or data manipulation steps, or workflows, in a scientific application. They are usually part of a larger eScience environment. The usage of workflow systems, while very beneficial, is mostly not trivial for scientists. There are many requirements for additional functionalities around scientific workflow systems that need to be taken into account, such as the ability to share workflows, the provision of user-friendly GUI tools for automating some tasks or for submission to distributed computing infrastructures, etc. In this paper we present tools developed in response to the requirements of three different scientific communities. These tools simplify and empower their work with the Kepler scientific workflow system. The usage of these tools and services is illustrated with Nanotechnology, Astronomy and Fusion scenario examples.
Marcin Plociennik, Szymon Winczewski, Paweł Ciecieląg, Frederic Imbeaux, Bernard Guillerminet, Philippe Huynh, Michał Owsiak, Piotr Spyra, Thierry Aniel, Bartek Palak, Tomasz Żok, Wojciech Pych, Jarosław Rybicki
371 Progress towards automated Kepler scientific workflows for computer-aided drug discovery and molecular simulations [abstract]
Abstract: We describe the development of automated workflows that support computer-aided drug discovery (CADD) and molecular dynamics (MD) simulations and are included as part of the National Biomedical Computational Resource (NBCR). The main workflow components include: file-management tasks, ligand force field parameterization, receptor-ligand molecular dynamics (MD) simulations, job submission and monitoring on relevant high-performance computing (HPC) resources, receptor structural clustering, virtual screening (VS), and statistical analyses of the VS results. The workflows aim to standardize simulation and analysis and promote best practices within the molecular simulation and CADD communities. Each component is developed as a stand-alone workflow, which allows easy integration into larger frameworks built to suit user needs, while remaining intuitive and easy to extend.
Pek U. Ieong, Jesper Sørensen, Prasantha L. Vemu, Celia W. Wong, Özlem Demir, Nadya P. Williams, Jianwu Wang, Daniel Crawl, Robert V. Swift, Robert D. Malmstrom, Ilkay Altintas, Rommie E. Amaro
341 Flexible approach to astronomical data reduction workflows in Kepler [abstract]
Abstract: The growing scale and complexity of cataloguing and analyzing astronomical data force scientists to look for new technologies and tools. Workflow environments appear best suited to their needs, but in practice they prove to be too complicated for most users. Before such environments are commonly used, they have to be properly adapted for domain-specific needs. We have created a universal solution based on the Kepler workflow environment to that end. It consists of a library of domain modules, ready-to-use workflows and additional services for sharing and running workflows. There are three access levels depending on the needs and skills of the user: 1) a desktop application, 2) a web application, 3) an on-demand Virtual Research Environment. Everything is set up in the context of the Polish grid infrastructure, enabling access to its resources. For flexibility, our solution includes interoperability mechanisms with domain-specific applications and services (including the astronomical Virtual Observatory) as well as with other domain grid services.
Paweł Ciecieląg, Marcin Płóciennik, Piotr Spyra, Michał Urbaniak, Tomasz Żok, Wojciech Pych
282 Identifying Information Requirement for Scheduling Kepler Workflow in the Cloud [abstract]
Abstract: Kepler scientific workflow system has been used to support scientists to automatically perform experiments of various domains in distributed computing systems. An execution of a workflow in Kepler is controlled by a director assigned in the workflow. However, users still need to specify compute resources on which the tasks in the workflow are executed. To further ease the technical effort required by scientists, a workflow scheduler that is able to assign workflow tasks to resources for execution is necessary. To this end, we identify from a review of several cloud workflow scheduling techniques the information that should be made available in order for a scheduler to schedule Kepler workflow in the cloud computing context. To justify the usefulness, we discuss each type of information regarding workflow tasks, cloud resources, and cloud providers based on their benefit on workflow scheduling.
Sucha Smanchat, Kanchana Viriyapant

Workshop on Data Mining in Earth System Science (DMESS) Session 1

Time and Date: 14:10 - 15:50 on 11th June 2014

Room: Tully III

Chair: Jay Larson

375 Stochastic Parameterization to Represent Variability and Extremes in Climate Modeling [abstract]
Abstract: Unresolved sub-grid processes, those which are too small or dissipate too quickly to be captured within a model's spatial resolution, are not adequately parameterized by conventional numerical climate models. Sub-grid heterogeneity is lost in parameterizations that quantify only the `bulk effect' of sub-grid dynamics on the resolved scales. A unique solution, one unreliant on increased grid resolution, is the employment of stochastic parameterization of the sub-grid to reintroduce variability. We administer this approach in a coupled land-atmosphere model, one that combines the single-column Community Atmosphere Model and the single-point Community Land Model, by incorporating a stochastic representation of sub-grid latent heat flux to force the distribution of precipitation. Sub-grid differences in surface latent heat flux arise from the mosaic of Plant Functional Types (PFT's) that describe terrestrial land cover. With the introduction of a stochastic parameterization framework to affect the distribution of sub-grid PFT's, we alter the distribution of convective precipitation over regions with high PFT variability. The stochastically forced precipitation probability density functions show lengthened tails demonstrating the retrieval of rare events. Through model data analysis we show that the stochastic model increases both the frequency and intensity of rare events in comparison to conventional deterministic parameterization.
Roisin Langan, Richard Archibald, Matthew Plumlee, Salil Mahajan, Daniel Ricciuto, Cheng-En Yang, Rui Mei, Jiafu Mao, Xiaoying Shi, Joshua Fu
426 Understanding Global Climate Variability, Change and Stability through Densities, Distributions, and Informatics [abstract]
Abstract: Climate modelling as it is generally practised is the act of generating large volumes of simulated weather through integration of primitive-equation/general circulation model-based Earth system models (ESMs) and subsequent statistical analysis of these large volumes of model-generated history files. This approach, though highly successful, entails explosively growing data volumes, and may not be practicable on exascale computers. This situation begs the question: Can we model climate's governing dynamics directly? If we pursue this tactic, there are two clear avenues: i) analysis of the combined primitive equations and subgridscale parameterisations to formulate an "envelope theory" applicable to the system's larger spatiotemporal scales; and ii) a search for governing dynamics through analysis of the existing corpus of climate observation, assimilated and simulated data. Our work focuses on strategy ii). Climate data analysis concentrates primarily on statistical moments, quantiles, and extremes, but rarely on the most complete statistical descriptor: the probability density function (PDF). Long-term climate variability motivates a moving-window-sampled PDF, which we call a time-dependent PDF (TDPDF). The TDPDF resides within a PDF/information-theoretic framework that provides answers to several key questions of climate variability, stability, and change, including: How does the climate evolve in time? How representative is any given sampling interval of the whole record? How rapidly is the climate changing? In this study, we pursue probability density estimation of globally sampled climate data using two techniques that are readily applicable to spatially weighted data and yield closed-form PDFs: the Edgeworth expansion and kernel smoothing. We explore our concerns regarding serial correlation in the data and effective sample size due to spatiotemporal correlations. We introduce these concepts for a simple dataset: the Central England Temperature Record. We then apply these techniques to larger, spatially weighted climate data sets, including the USA National Center for Environmental Predictions NCEP-1 Reanalysis, the Australian Water Availability Project (AWAP) dataset, and the Australian Water and Carbon Observatory dataset.
Jay Larson and Padarn Wilson
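Of the machinery described above, the moving-window (time-dependent) PDF is the easiest part to sketch; the snippet below applies a Gaussian kernel density estimate to 30-year sliding windows of a synthetic temperature-like series. Area weighting, effective-sample-size corrections and the Edgeworth expansion are not included, and the drift measure at the end is an ad-hoc illustration.

    import numpy as np
    from scipy.stats import gaussian_kde

    rng = np.random.default_rng(3)

    # Synthetic annual-mean temperature anomalies: slow warming trend plus noise,
    # standing in for a record such as the Central England Temperature series.
    years = np.arange(1900, 2014)
    temps = 0.01 * (years - years[0]) + rng.normal(0.0, 0.5, size=years.size)

    window = 30                       # years per sliding window
    grid = np.linspace(temps.min() - 1, temps.max() + 1, 200)

    tdpdf = []                        # one estimated PDF per window position
    for start in range(0, years.size - window + 1):
        sample = temps[start:start + window]
        tdpdf.append(gaussian_kde(sample)(grid))
    tdpdf = np.array(tdpdf)           # shape: (n_windows, len(grid))

    # A crude change signal: L1 distance between the first and last window PDFs.
    dx = grid[1] - grid[0]
    drift = np.abs(tdpdf[-1] - tdpdf[0]).sum() * dx
    print(f"{tdpdf.shape[0]} windows, PDF drift between first and last: {drift:.2f}")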
52 Integration of artificial neural networks into operational ocean wave prediction models for fast and accurate emulation of exact nonlinear interactions [abstract]
Abstract: In this paper, an implementation study was undertaken to employ Artificial Neural Networks (ANN) in third-generation ocean wave models for direct mapping of wind-wave spectra into exact nonlinear interactions. While the investigation expands on previously reported feasibility studies of Neural Network Interaction Approximations (NNIA), it focuses on a new robust neural network that is implemented in the Wavewatch III (WW3) model. Several idealized and real test scenarios were carried out. The obtained results confirm the feasibility of NNIA in terms of speeding up model calculations, and the approach is fully capable of providing operationally acceptable model integrations. The ANN is able to emulate the exact nonlinear interaction for single- and multi-modal wave spectra with a much higher accuracy than the Discrete Interaction Approximation (DIA). NNIA performs at least twice as fast as DIA and at least two hundred times faster than the exact method (Webb-Resio-Tracy, WRT) for a well-trained dataset. The accuracy of NNIA is network-configuration dependent. For the most optimal network configurations, the NNIA results and scatter statistics show good agreement with exact results by means of growth curves and integral parameters. Practical possibilities for further improvements in achieving fast and highly accurate emulations using ANN for time-consuming exact nonlinear interactions are also suggested and discussed.
Ruslan Puscasu
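The regression setup (spectrum in, nonlinear transfer term out) can be sketched with a generic multilayer perceptron; the toy example below uses scikit-learn's MLPRegressor on synthetic spectra with a placeholder quadratic target, whereas the actual NNIA is trained on WRT-computed source terms inside WAVEWATCH III.

    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(11)

    # Toy surrogate problem: inputs are discretized wave spectra (here random
    # smooth curves), targets stand in for the exact nonlinear transfer term that
    # WRT would compute; a quadratic map is used purely as a placeholder.
    n_samples, n_bins = 2000, 36
    spectra = np.abs(rng.normal(size=(n_samples, 5)) @ rng.normal(size=(5, n_bins)))
    target = spectra**2 - spectra.mean(axis=1, keepdims=True) * spectra

    X_train, X_test, y_train, y_test = train_test_split(spectra, target, random_state=0)

    emulator = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                            random_state=0).fit(X_train, y_train)
    print("R^2 on held-out spectra:", round(emulator.score(X_test, y_test), 3))
    # At run time the trained network replaces the expensive exact evaluation,
    # trading a large constant factor in cost for a small emulation error.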

Large Scale Computational Physics (LSCP) Session 1

Time and Date: 11:00 - 12:40 on 11th June 2014

Room: Mossman

Chair: Fukuko YUASA

404 Development of lattice QCD simulation code set ``Bridge++'' on accelerators [abstract]
Abstract: We are developing a new code set ``Bridge++'' for lattice QCD (Quantum Chromodynamics) simulations. It aims at an extensible, readable, and portable workbench, while achieving high performance. Bridge++ covers popular lattice actions and numerical algorithms. The code set is constructed in C++ with object-oriented programming. In this paper, we describe our code design focusing on the use of accelerators such as GPGPUs. For portability, our implementation employs OpenCL to control the devices while encapsulating the details of manipulation by providing generalized interfaces. The code has been successfully applied to several recent accelerators.
Shinji Motoki, Shinya Aoki, Tatsumi Aoyama, Kazuyuki Kanaya, Hideo Matsufuru, Yusuke Namekawa, Hidekatsu Nemura, Yusuke Taniguchi, Satoru Ueda, Naoya Ukita
406 GPGPU Application to the Computation of Hamiltonian Matrix Elements between Non-orthogonal Slater Determinants in the Monte Carlo Shell Model [abstract]
Abstract: We apply the computation with a GPU accelerator to calculate Hamiltonian matrix elements between non-orthogonal Slater determinants utilized in the Monte Carlo shell model. The bottleneck of this calculation is the two-body part in the computation of Hamiltonian matrix elements. We explain an efficient computational method to overcome this bottleneck. For General-Purpose computing on the GPU (GPGPU) of this method, we propose a computational procedure to avoid the unnecessary costs of data transfer into a GPU device and aim for efficient computation with the cuBLAS interface and the OpenACC directive. As a result, we achieve about 40 times better performance in FLOPS as compared with a single-threaded process of CPU for the two-body part in the computation of Hamiltonian matrix elements.
Tomoaki Togashi, Noritaka Shimizu, Yutaka Utsuno, Takashi Abe, Takaharu Otsuka
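The GPU batching itself is not shown here, but the standard building blocks for matrix elements between non-orthogonal Slater determinants are easy to state: the overlap det(U^dagger V) and the transition density matrix from which one- and two-body terms are assembled. The NumPy sketch below uses random orthonormal orbital sets and a random Hermitian one-body operator purely for illustration.

    import numpy as np

    def overlap_and_density(U, V):
        """Building blocks for matrix elements between non-orthogonal Slater
        determinants |Phi_U>, |Phi_V>, whose occupied orbitals are the columns of
        U and V (M single-particle states x N particles) in a common orthonormal basis.

        overlap       <Phi_U|Phi_V>        = det(U^dagger V)
        rho (M x M)   transition density   = V (U^dagger V)^{-1} U^dagger
        A one-body operator t then gives  <Phi_U|T|Phi_V> = overlap * trace(t @ rho);
        the two-body part, the bottleneck discussed in the abstract, is built from
        products of the same rho, which is why it maps onto dense matrix-matrix
        multiplications (the cuBLAS-friendly work in the paper).
        """
        S = U.conj().T @ V                          # N x N overlap of occupied orbitals
        overlap = np.linalg.det(S)
        rho = V @ np.linalg.solve(S, U.conj().T)
        return overlap, rho

    rng = np.random.default_rng(5)
    M, N = 40, 8                                    # basis size, particle number
    U = np.linalg.qr(rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N)))[0]
    V = np.linalg.qr(rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N)))[0]
    t = rng.normal(size=(M, M))
    t = t + t.T                                     # some Hermitian one-body operator

    ov, rho = overlap_and_density(U, V)
    print("overlap:", ov)
    print("one-body matrix element:", ov * np.trace(t @ rho))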

Dynamic Data Driven Application Systems (DDDAS) Session 1

Time and Date: 11:20 - 13:00 on 10th June 2014

Room: Tully II

Chair: Craig Douglas

427 DDDAS – Bridging from the Exa-Scale to the Sensor-Scale [abstract]
Abstract: This talk will provide an overview of new opportunities created by DDDAS (Dynamic Data Driven Applications Systems) and engendering a new vision for Exa-Scale computing and Big Data. Exa-Scale is considered the next frontier of high-end computational power and Big-Data seen as the the next generation of data-intensive. The presentation will discuss new opportunities that exist through DDDAS in synergism with a vision of additional dimensions to the Exa-Scale and Big Data, namely considering that the next wave of Big Data and Big Computing will result not only from the Exa-Scale frontiers but also from the emerging trend of “ubiquitous sensing” - ubiquitous instrumentation of systems by multitudes of distributed and heterogeneous collections of sets of sensors and controllers. Undeniably, achieving and exploiting Exa-Scale will enable larger scale simulations and complex “systems of systems” modeling, which will produce large sets of computed data contributing to the Big Data deluge, and adding to the data avalanche created by large scientific and engineering instruments. The emerging trend of large-scale, ubiquitous instrumentation through multitudes of sensors and controllers creates another dimension to computing and to data, whereby data and computations for processing and analyzing such data will be performed in combinations of collections of sensor and higher performance platforms – including the Exa-Scale. DDDAS provides a driver for such environments and an opportunity for new and advanced capabilities. The DDDAS paradigm, by its definition of dynamically integrating in a feed-back control loop the computational model with the instrumentation aspects of an application system, premises a unified computational-instrumentation platform supporting DDDAS environments. In general this unified computational-instrumentation platform will consist of a range of systems such as high-end (petascale, exascale), mid-range and personal computers and mobile devices, and instrumentation platforms such as large instruments or collections of sensors and controllers, such networks of large numbers of heterogeneous sensors and controllers. Otherwise stated, in DDDAS the computational and data environments of a given application span a range of platforms from the high-end computing to the data collection instruments - from the exa-scale to sensor-scale. Consequently, DDDAS environments present these kinds of unprecedented levels of computational resource heterogeneity and dynamicity which require new systems software to support the dynamic and adaptive runtime requirements of such environments. In addition to the role of DDDAS in unifying these two extremes of computing and data, there are also technological drivers that lead us to consider the extremes and the range of scales together. Thus far, conquering the exascale has been considered as having “unique” challenges in terms power efficiency requirements at the multicore unit level, dynamic management of the multitudes of such resources for optimized performance, fault tolerance and resilience, to new application algorithms. However, ubiquitous instrumentation environments comprising of sensors (and controllers) have corresponding requirements in terms of power efficiencies, fault tolerance, application algorithms dealing with sparse and incomplete data, etc. Moreover, it is quite possible that the same kinds of multicores that will populate exascale platforms will also be the building blocks of sensors and controllers. 
In fact, it is likely that these sensors and controllers, the new "killer micros", will drive the technologies at the device and chip levels. Leveraging common technologies for the range of platforms from the Exa-Scale to the Sensor-Scale is driven not only by the underlying technologies but also by trends in application requirements. Commonality in the building blocks (e.g. at the chip and multicore levels) across the range and the extremes of the computational and instrumentation platforms will simplify the challenges of supporting DDDAS environments. Such considerations create new opportunities for synergistically advancing and expediting advances in the two extreme scales of computing. The talk will address such challenges and opportunities in the context of projects pursuing capability advances through DDDAS, such as those presented in the 2014 ICCS/DDDAS Workshop and elsewhere.
Frederica Darema
287 Control of Artificial Swarms with DDDAS [abstract]
Abstract: A framework for incorporating a swarm intelligent system with the Dynamic Data Driven Application System (DDDAS) is presented. Swarm intelligent systems, or artificial swarms, self-organize into useful emergent structures that are capable of solving complex problems, but are difficult to control and predict. The DDDAS concept utilizes repeated simulations of an executing application to improve analytic and predictive capability by creating a synergistic feedback loop. Incorporating DDDAS with swarm applications can significantly improve control of the swarm. An overview of the DDDAS framework for swarm control is presented, and then demonstrated with an example swarm application.
Robert Mccune, Greg Madey
114 Multifidelity DDDAS Methods with Application to a Self-Aware Aerospace Vehicle [abstract]
Abstract: A self-aware aerospace vehicle can dynamically adapt the way it performs missions by gathering information about itself and its surroundings and responding intelligently. We consider the specific challenge of an unmanned aerial vehicle that can dynamically and autonomously sense its structural state and re-plan its mission according to its estimated current structural health. The challenge is to achieve each of these tasks in real time---executing online models and exploiting dynamic data streams---while also accounting for uncertainty. Our approach combines information from physics-based models, simulated offline to build a scenario library, together with dynamic sensor data in order to estimate current flight capability. Our physics-based models analyze the system at both the local panel level and the global vehicle level.
Doug Allaire, David Kordonowy, Marc Lecerf, Laura Mainini, Karen Willcox
198 Model Based Design Environment for Data-Driven Embedded Signal Processing Systems [abstract]
Abstract: In this paper, we investigate new design methods for data-driven digital signal processing (DSP) systems that are targeted to resource- and energy-constrained embedded environments, such as UAVs, mobile communication platforms and wireless sensor networks. Signal processing applications, such as keyword matching, speaker identification, and face recognition, are of great importance in such environments. Due to critical application constraints on energy consumption, real-time performance, computational resources, and core application accuracy, the design spaces for such applications are highly complex. Thus, conventional static methods for configuring and executing such embedded DSP systems are severely limited in the degree to which processing tasks can adapt to current operating conditions and mission requirements. We address this limitation by developing a novel design framework for multi-mode, data driven signal processing systems, where different application modes with complementary trade-offs are selected, configured, executed, and switched dynamically, in a data-driven manner. We demonstrate the utility of our proposed new design methods on an energy-constrained, multi-mode face detection application.
Kishan Sudusinghe, Inkeun Cho, Mihaela van der Schaar, Shuvra Bhattacharyya
46 A Dynamic Data Driven Application System for Vehicle Tracking [abstract]
Abstract: Tracking the movement of vehicles in urban environments using fixed position sensors, mobile sensors, and crowd-sourced data is a challenging but important problem in applications such as law enforcement and defense. A dynamic data driven application system (DDDAS) is described to track a vehicle’s movements by repeatedly identifying the vehicle under investigation from live image and video data, predicting probable future locations of the vehicle, and repositioning sensors or retargeting requests for information, in order to reacquire the vehicle under surveillance. An overview of the system is given that includes image processing algorithms to detect and recapture the vehicle from live image data, a computational framework to predict probable vehicle locations at future points in time, and an information and power aware data distribution system to disseminate data and requests for information. A prototype of the envisioned system, under development in the midtown area of Atlanta, Georgia in the United States, is described.
Richard Fujimoto, Angshuman Guin, Michael Hunter, Haesun Park, Ramakrishnan Kannan, Gaurav Kanitkar, Michael Milholen, Sabra Neal, Philip Pecher

Dynamic Data Driven Application Systems (DDDAS) Session 2

Time and Date: 16:30 - 18:10 on 10th June 2014

Room: Tully II

Chair: Frederica Darema

43 Towards a Dynamic Data Driven Wildfire Behavior Prediction System at European Level [abstract]
Abstract: Southern European countries are severely affected by forest fires every year, leading to very large environmental damage and great economic investments to recover the affected areas. All affected countries invest considerable resources to minimize fire damage. Emerging technologies are used to help wildfire analysts determine fire behavior and spread, aiming at a more efficient use of resources in fire fighting. In the case of trans-boundary fires, the European Forest Fire Information System (EFFIS) works as a complementary system to national and regional systems in the countries, providing the information required for international collaboration on forest fire prevention and fighting. In this work, we describe a way of exploiting all the information available in the system to feed a dynamic data driven wildfire behavior prediction model that can deliver results to support operational decisions. The model is able to calibrate the unknown parameters based on the real observed data, such as wind conditions and fuel moistures, using a steering loop. Since this process is computationally intensive, we exploit multi-core platforms using a hybrid MPI-OpenMP programming paradigm.
Tomàs Artés, Andrés Cencerrado, Ana Cortes, Tomas Margalef, Darío Rodríguez, Thomas Petroliagkis, Jesus San Miguel
91 Fast Construction of Surrogates for UQ Central to DDDAS -- Application to Volcanic Ash Transport [abstract]
Abstract: In this paper we present new ideas to greatly enhance the quality of uncertainty quantification in the DDDAS framework. We build on ongoing work in large scale transport of geophysical mass of volcanic origin -- a danger to both land based installations and airborne vehicles.
A. K. Patra, E. R. Stefanescu, R. M. Madankan, M. I Bursik, E. B. Pitman, P. Singla, T. Singh, P. Webley
306 A Dynamic Data-driven Decision Support for Aquaculture Farm Closure [abstract]
Abstract: We present a dynamic data-driven decision support system for aquaculture farm closure. In the decision support, we use machine learning techniques to predict closures of a shellfish farm. As environmental time series are used in closure decisions, we propose two approaches combining time series analysis and machine learning for closure prediction. In one approach, we apply time series prediction followed by expert rules to predict closure. In the other approach, we use time series classification for closure prediction. Both approaches exploit a dynamic data-driven technique in which the prediction models are updated as new data arrive in order to predict closure decisions. Experimental results on a case study shellfish farm validate the applicability of the proposed method in aquaculture decision support.
Md. Sumon Shahriar, John McCulloch
76 An Open Framework for Dynamic Big-Data-Driven Application Systems (DBDDAS) Development [abstract]
Abstract: In this paper, we outline key features that dynamic data-driven application systems (DDDAS) have. The term Big Data (BD) has come into use in recent years and is highly applicable to most DDDAS, since most applications use networks of sensors that generate an overwhelming amount of data over the lifespan of the application runs. We describe what a dynamic big-data-driven application system (DBDDAS) toolkit must have in order to provide all of the essential building blocks that are necessary to easily create new DDDAS without re-inventing the building blocks.
Craig C. Douglas

Dynamic Data Driven Application Systems (DDDAS) Session 3

Time and Date: 11:00 - 12:40 on 11th June 2014

Room: Tully II

Chair: Abani Patra

80 A posteriori error estimates for DDDAS inference problems [abstract]
Abstract: Inference problems in dynamically data-driven application systems use physical measurements along with a physical model to estimate the parameters or state of a physical system. Errors in measurements and uncertainties in the model lead to inaccurate inference results. This work develops a methodology to estimate the impact of various errors on the variational solution of a DDDAS inference problem. The methodology is based on models described by ordinary differential equations and uses first-order and second-order adjoint methods. Numerical experiments with the heat equation illustrate the use of the proposed error estimation machinery.
Vishwas Hebbur Venkata Subba Rao, Adrian Sandu
162 Mixture Ensembles for Data Assimilation in Dynamic Data-Driven Environmental Systems [abstract]
Abstract: Many inference problems in environmental DDDAS must contend with high dimensional models and non-Gaussian uncertainties, including but not limited to Data Assimilation, Targeting and Planning. In this paper, we present the Mixture Ensemble Filter (MEnF), which extends ensemble filtering to non-Gaussian inference using Gaussian mixtures. In contrast to the state of the art, MEnF embodies an exact update equation that requires neither explicit calculation of mixture element moments nor ad-hoc association rules between ensemble members and mixture elements. MEnF is applied to the chaotic Lorenz-63 model and to a chaotic soliton model that allows idealized and systematic studies of localized phenomena. In both cases, MEnF outperforms contemporary approaches and replaces ad-hoc Gaussian mixture approaches for non-Gaussian inference.
Piyush Tagade, Hansjorg Seybold, Sai Ravela
169 Optimizing Dynamic Resource Allocation [abstract]
Abstract: We present a formulation, solution method, and program acceleration techniques for two dynamic control scenarios, both with the common goal of optimizing resource allocations. These approaches allocate resources in a non-myopic way, accounting for long-term impacts of current control decisions via nominal belief-state optimization (NBO). In both scenarios, the solution techniques are parallelized for reduced execution time. A novel aspect is included in the second scenario: dynamically allocating the computational resources in an online fashion which is made possible through constant aspect ratio tiling (CART).
Lucas Krakow, Louis Rabiet, Yun Zou, Guillaume Iooss, Edwin Chong, Sanjay Rajopadhye
165 A Dataflow Programming Language and Its Compiler for Streaming Systems [abstract]
Abstract: The dataflow programming paradigm offers an important way to improve programming productivity for domain experts. In this position paper we propose COStream, a programming language based on a synchronous dataflow execution model for applications. We also propose a compiler framework for COStream on multi-core architectures. In the compiler, we use an inter-thread software pipelining schedule to exploit the parallelism among the cores. We implement the COStream compiler framework on the x86 multi-core architecture and perform experiments to evaluate the system.
Haitao Wei, Stephane Zuckerman, Xiaoming Li, Guang Gao
280 Static versus Dynamic Data Information Fusion analysis using DDDAS for Cyber Security Trust [abstract]
Abstract: Information fusion includes signal-, feature-, and decision-level analysis over various types of data including imagery, text, and cyber security detection. With the maturity of data processing, the explosion of big data, and the need for user acceptance, the Dynamic Data-Driven Application System (DDDAS) philosophy fosters insights into the usability of information systems solutions. In this paper, we explore the notion of an adaptive adjustment of secure communication trust analysis that seeks a balance between standard static solutions and dynamic data-driven updates. A use case is provided in determining trust for a cyber security scenario, exploring comparisons of Bayesian versus evidential reasoning for dynamic security detection updates. Using the evidential reasoning proportional conflict redistribution (PCR) method, we demonstrate improved trust for dynamically changing detections of denial of service attacks.
Erik Blasch, Youssif Al-Nashif, Salim Hariri

Dynamic Data Driven Application Systems (DDDAS) Session 4

Time and Date: 14:10 - 15:50 on 11th June 2014

Room: Tully II

Chair: Ana Cortes

74 Dynamic Data Driven Crowd Sensing Task Assignment [abstract]
Abstract: To realize the full potential of mobile crowd sensing, techniques are needed to deal with uncertainty in participant locations and trajectories. We propose a novel model for spatial task assignment in mobile crowd sensing that uses a dynamic and adaptive data driven scheme to assign moving participants with uncertain trajectories to sensing tasks, in a near-optimal manner. Our scheme is based on building a mobility model from publicly available trajectory history and estimating posterior location values using noisy/uncertain measurements upon which initial tasking assignments are made. These assignments may be refined locally (using exact information) and used by participants to steer their future data collection, which completes the feedback loop. We present the design of our proposed approach with rationale to suggest its value in effective mobile crowd sensing task assignment in the presence of uncertain trajectories.
Layla Pournajaf, Li Xiong, Vaidy Sunderam
79 Context-aware Dynamic Data-driven Pattern Classification [abstract]
Abstract: This work aims to mathematically formalize the notion of context, with the purpose of allowing contextual decision-making in order to improve performance in dynamic data driven classification systems. We present definitions for both intrinsic context, i.e. factors which directly affect sensor measurements for a given event, as well as extrinsic context, i.e. factors which do not affect the sensor measurements directly, but do affect the interpretation of collected data. Supervised and unsupervised modeling techniques to derive context and context labels from sensor data are formulated. Here, supervised modeling incorporates the a priori known factors affecting the sensing modalities, while unsupervised modeling autonomously discovers the structure of those factors in sensor data. Context-aware event classification algorithms are developed by adapting the classification boundaries, dependent on the current operational context. Improvements in context-aware classification have been quantified and validated in an unattended sensor-fence application for US Border Monitoring. Field data, collected with seismic sensors on different ground types, are analyzed in order to classify two types of walking across the border, namely, normal and stealthy. The classification is shown to be strongly dependent on the context (specifically, soil type: gravel or moist soil).
Shashi Phoha, Nurali Virani, Pritthi Chattopadhyay, Soumalya Sarkar, Brian Smith, Asok Ray

Tools for Program Development and Analysis in Computational Science (TOOLS) Session 1

Time and Date: 11:00 - 12:40 on 12th June 2014

Room: Bluewater II

Chair: Jie Tao

335 High Performance Message-Passing InfiniBand Communication Device for Java HPC [abstract]
Abstract: MPJ Express is a Java messaging system that implements an MPI-like interface. It is used for writing parallel Java applications on High Performance Computing (HPC) hardware including commodity clusters. The software is capable of executing in multicore and cluster modes. In the cluster mode, it currently supports Ethernet and Myrinet based interconnects and provides specialized communication devices for these networks. One recent trend in distributed memory parallel hardware is the emergence of the InfiniBand interconnect, a high-performance proprietary network that provides low latency and high bandwidth for parallel MPI applications. Currently there is no direct support available in Java (and hence MPJ Express) to exploit the performance benefits of InfiniBand networks. The only option for running distributed Java programs over InfiniBand networks is to rely on TCP/IP emulation layers like IP over InfiniBand (IPoIB) and Sockets Direct Protocol (SDP), which provide poor communication performance. To tackle this issue in the context of MPJ Express, this paper presents a low-level communication device called ibdev that can be used to execute parallel Java applications on InfiniBand clusters. MPJ Express is based on a layered architecture and hence users can opt to use ibdev at runtime on an InfiniBand equipped commodity cluster. ibdev improves Java application performance with access to InfiniBand hardware using the native verbs API. Our performance evaluation reveals that MPJ Express achieves much better latency and bandwidth using this new device, compared to IPoIB and SDP. The improvement in communication performance is also evident in NAS parallel benchmark results, where ibdev helps MPJ Express achieve better scalability and speedups compared to IPoIB and SDP. The results show that it is possible to reduce the performance gap between Java and native languages with efficient support for low-level communication libraries.
Omar Khan, Mohsan Jameel, Aamir Shafi
300 A High Level Programming Environment for Accelerator-based Systems [abstract]
Abstract: Some of the critical hurdles for the widespread adoption of accelerators in high performance computing are portability and programming difficulty. To be an effective HPC platform, these systems need a high level software development environment to facilitate the porting and development of applications, so they can be portable and run efficiently on either accelerators or CPUs. In this paper we present a high level parallel programming environment for accelerator-based systems, which consists of tightly coupled compilers, tools, and libraries that can interoperate and hide the complexity of the system. Ease of use is possible with compilers making it feasible for users to write applications in Fortran, C, or C++ with OpenACC directives, tools to help users port, debug, and optimize for both accelerators and conventional multi-core CPUs, and with auto-tuned scientific libraries.
Luiz Derose, Heidi Poxon, James Beyer, Alistair Hart
277 Supporting relative debugging for large-scale UPC programs [abstract]
Abstract: Relative debugging is a useful technique for locating errors that emerge from porting existing code to a new programming language or a new computing platform. Recent attention on the UPC programming language has resulted in a number of conventional parallel programs, for example MPI programs, being ported to UPC. This paper gives an overview of the data distribution concepts used in UPC and establishes the challenges in supporting the relative debugging technique for UPC programs that run on large supercomputers. The proposed solution is implemented on an existing parallel relative debugger, ccdb, and its performance is evaluated on a Cray XE6 system with 16,348 cores.
Minh Ngoc Dinh, David Abramson, Jin Chao, Bob Moench, Andrew Gontarek, Luiz Derose

Tools for Program Development and Analysis in Computational Science (TOOLS) Session 2

Time and Date: 14:10 - 15:50 on 12th June 2014

Room: Bluewater II

Chair: Jie Tao

97 Near Real-time Data Analysis of Core-Collapse Supernova Simulations With Bellerophon [abstract]
Abstract: We present an overview of a software system, Bellerophon, built to support a production-level HPC application called CHIMERA, which simulates core-collapse supernova events at the petascale. Developed over the last four years, Bellerophon enables CHIMERA’s geographically dispersed team of collaborators to perform data analysis in near real-time. Its n-tier architecture provides an encapsulated, end-to-end software solution that enables the CHIMERA team to quickly and easily access highly customizable animated and static views of results from anywhere in the world via a web-deliverable, cross-platform desktop application. In addition, Bellerophon addresses software engineering tasks for the CHIMERA team by providing an automated mechanism for performing regression testing on a variety of supercomputing platforms. Elements of the team’s workflow management needs are met with software tools that dynamically generate code repository statistics, access important online resources, and monitor the current status of several supercomputing resources.
E. J. Lingerfelt, O. E. B. Messer, S. S. Desai, C. A. Holt, E. J. Lentz
148 Toward Better Understanding of the Community Land Model within the Earth System Modeling Framework [abstract]
Abstract: One key factor in the improved understanding of earth system science is the development and improvement of high fidelity earth system models. Along with the deeper understanding of system processes, the complexity of the software underlying those modelling systems becomes a barrier to further rapid model improvements and validation. In this paper, we present our experience in better understanding the Community Land Model (CLM) within an earth system modelling framework. First, we give an overview of the software system of the global offline CLM system. Second, we present our approach to better understanding the CLM software structure and data structures using advanced software tools. After that, we focus on practical issues related to CLM computational performance and individual ecosystem functions. Since better software engineering practices are much needed for general scientific software systems, we hope these considerations can be beneficial to many other modeling research programs involving multiscale system dynamics.
Dali Wang, Joseph Schuchart, Tomislav Janjusic, Frank Winkler, Yang Xu, Christos Kartsaklis
155 Detecting and visualising process relationships in Erlang [abstract]
Abstract: Static software analyser tools can help in program comprehension by detecting relations among program parts. Detecting relations among the concurrent parts of a program, e.g. relations between processes, is not straightforward. In the case of dynamic languages, only a (good) approximation of the real dependencies can be calculated. In this paper we present algorithms to build a process relation graph for Erlang programs. The graph contains direct relations established through message passing as well as hidden relations represented by ETS tables.
Melinda Tóth, István Bozó

Workshop on Computational Finance and Business Intelligence (CFBI) Session 1

Time and Date: 16:20 - 18:00 on 11th June 2014

Room: Tully II

Chair: ?

100 Twin Support Vector Machine in Linear Programs [abstract]
Abstract: This paper proposes a new algorithm, termed LPTWSVM, for the binary classification problem; it seeks two nonparallel hyperplanes and is an improved method over TWSVM. We improve the recently proposed ITSVM and develop a Generalized ITSVM. A linear function is chosen in the objective function of the Generalized ITSVM, which leads to the primal problems of LPTWSVM. Compared with TWSVM, a 1-norm regularization term is introduced into the objective function to implement structural risk minimization, and the quadratic programming problems are changed to linear programming problems, which can be solved quickly and easily. We therefore do not need to compute large inverse matrices or use any optimization trick in solving our linear programs, and the dual problems are unnecessary in this paper. Kernel functions can be introduced directly in the nonlinear case, which overcomes a serious drawback of TWSVM. The numerical experiments verify that LPTWSVM is very effective.
Dewei Li, Yingjie Tian
240 Determining the time window threshold to identify user sessions of stakeholders of a commercial bank portal [abstract]
Abstract: In this paper, we focus on finding a suitable value of the time threshold, which is then used in the time-based method of user session identification. To determine its value, we used the Length variable, representing the time a user spent on a particular site. We compared two values of the time threshold with experimental methods of user session identification based on the structure of the web: Reference Length and H-ref. When comparing the usefulness of the rules extracted using all four methods, we showed that the time threshold calculated from the quartile range is the most appropriate method for identifying sessions for web usage mining.
Jozef Kapusta, Michal Munk, Peter Svec, Anna Pilkova
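The time-threshold idea above can be illustrated with a short, hypothetical Python sketch (not the authors' implementation): a threshold is derived from the quartile range of the Length values and a new session starts whenever the gap between consecutive requests exceeds it. The Q3 + 1.5*IQR rule, the data and all variable names are assumptions made for illustration only.

    import numpy as np

    def quartile_threshold(lengths_sec):
        # Threshold from the quartile range of page-view Lengths: Q3 + 1.5 * IQR (an assumed convention).
        q1, q3 = np.percentile(lengths_sec, [25, 75])
        return q3 + 1.5 * (q3 - q1)

    def split_sessions(request_times_sec, threshold_sec):
        # Start a new session whenever the gap between consecutive requests exceeds the threshold.
        sessions, current = [], [request_times_sec[0]]
        for prev, curr in zip(request_times_sec, request_times_sec[1:]):
            if curr - prev > threshold_sec:
                sessions.append(current)
                current = []
            current.append(curr)
        sessions.append(current)
        return sessions

    lengths = [12, 35, 48, 20, 300, 15, 60, 90, 10, 25]        # seconds spent on individual pages
    requests = [0, 30, 95, 130, 1500, 1530, 1600, 4000, 4030]  # one user's request timestamps
    threshold = quartile_threshold(lengths)
    print(threshold, [len(s) for s in split_sessions(requests, threshold)])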
183 Historical Claims Data Based Hybrid Predictive Models for Hospitalization [abstract]
Abstract: Over $30 billion is wasted on unnecessary hospitalization each year, so a better quantitative way is needed to identify the patients who are most likely to be hospitalized and then provide them with the utmost care. As a starting point, the objective of this paper was to develop a predictive model of how many days patients may spend in the hospital next year, based on the patients' historical claims dataset provided by the Heritage Health Prize Competition. The proposed predictive model applies an ensemble of binary classification and regression techniques. The model is evaluated on a testing dataset in terms of the Root-Mean-Square-Error (RMSE). The best RMSE score was 0.474, and the corresponding prediction accuracy of 81.9% was reasonably high. It is therefore convincing to conclude that predictive models have the potential to predict hospitalization and improve patients' quality of life.
Chengcheng Liu, Yong Shi
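Purely as a hedged illustration of such a two-stage ensemble (a classifier deciding whether a patient is hospitalized at all, a regressor predicting the number of days), the sketch below uses scikit-learn estimators on synthetic data; it is not the authors' model, and the features, estimators and data are assumptions.

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 10))                                          # synthetic claims features
    days = np.maximum(0.0, np.round(3.0 * X[:, 0] + rng.normal(size=2000)))  # synthetic days in hospital
    hosp = (days > 0).astype(int)

    X_tr, X_te = X[:1500], X[1500:]
    d_tr, d_te = days[:1500], days[1500:]
    h_tr = hosp[:1500]

    clf = LogisticRegression(max_iter=1000).fit(X_tr, h_tr)                  # stage 1: hospitalized at all?
    reg = GradientBoostingRegressor().fit(X_tr[h_tr == 1], d_tr[h_tr == 1])  # stage 2: how many days

    pred = clf.predict_proba(X_te)[:, 1] * reg.predict(X_te)                 # ensemble: expected days
    rmse = np.sqrt(np.mean((pred - d_te) ** 2))
    print("RMSE:", rmse)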

Workshop on Teaching Computational Science (WTCS) Session 1

Time and Date: 11:00 - 12:40 on 11th June 2014

Room: Rosser

Chair: Angela Shiflet

56 An Introduction to Agent-Based Modeling for Undergraduates [abstract]
Abstract: Agent-based modeling (ABM) has become an increasingly important tool in computational science. Thus, in the final week of the 2013 fall semester, Wofford College's undergraduate Modeling and Simulation for the Sciences course (COSC/MATH 201) considered ABM using the NetLogo tool. The students explored existing ABMs and completed two tutorials that developed models of unconstrained growth and of the average distance covered by a random walker. The models demonstrated some of the utility of ABM and helped illustrate the similarities and differences between agent-based modeling and previously discussed techniques—system dynamics modeling, empirical modeling, and cellular automaton simulations. Improved test scores and questionnaire results support the success of the goals for the week.
Angela Shiflet, George Shiflet
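The two tutorial models mentioned above are easy to reproduce outside NetLogo. The following Python sketch (an illustration only, not the course material) simulates unconstrained growth and estimates the average distance covered by a 2-D lattice random walker, which grows roughly as the square root of the number of steps.

    import numpy as np

    # Unconstrained growth: dP/dt = r * P, simulated in discrete time steps.
    r, dt, steps, P = 0.1, 0.01, 1000, 100.0
    for _ in range(steps):
        P += r * P * dt
    print("population after", steps * dt, "time units:", P)

    # Average distance of a 2-D lattice random walker from the origin after n steps,
    # estimated over many independent agents; theory predicts growth proportional to sqrt(n).
    rng = np.random.default_rng(1)
    agents, n = 10000, 400
    directions = np.array([(1, 0), (-1, 0), (0, 1), (0, -1)])
    walks = directions[rng.integers(0, 4, size=(agents, n))]
    final_positions = walks.sum(axis=1)
    print("mean distance:", np.linalg.norm(final_positions, axis=1).mean(), "sqrt(n):", np.sqrt(n))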
220 Computational Science for Undergraduate Biologists via QUT.Bio.Excel [abstract]
Abstract: Molecular biology is a scientific discipline which has changed fundamentally in character over the past decade to rely on large scale datasets – public and locally generated – and their computational analysis and annotation. Undergraduate education of biologists must increasingly couple this domain context with a data-driven computational scientific method. Yet modern programming and scripting languages and rich computational environments such as R and MATLAB present significant barriers to those with limited exposure to computer science, and may require substantial tutorial assistance over an extended period if progress is to be made. In this paper we report our experience of undergraduate bioinformatics education using the familiar, ubiquitous spreadsheet environment of Microsoft Excel. We describe a configurable extension called QUT.Bio.Excel, a custom ribbon supporting a rich set of data sources, external tools and interactive processing within the spreadsheet, together with a range of problems that demonstrate its utility and success in addressing the needs of students over their studies.
Lawrence Buckingham, James Hogan
54 A multiple intelligences theory-based 3D virtual lab environment for digital systems teaching [abstract]
Abstract: This paper describes a 3D virtual lab environment that was developed using OpenSim software integrated into Moodle. The Virtuald software tool was used to provide pedagogical support to the lab by enabling the creation of online texts and their delivery to the students. The courses taught in this virtual lab conform methodologically to the theory of multiple intelligences. Some results are presented.
Toni Amorim, Norian Marranghello, Alexandre C.R. Silva, Aledir S. Pereira, Lenadro Tapparo
349 Exploring Rounding Errors in Matlab using Extended Precision [abstract]
Abstract: We describe a simple package of Matlab programs which implements an extended-precision class in Matlab. We give some examples of how this class can be used to demonstrate the effects of rounding errors and truncation errors in scientific computing. The package is based on a representation called Double-Double, which represents each floating-point real as an unevaluated sum of IEEE double-precision floating point numbers. This allows Matlab computations that are accurate to 30 decimal digits. The data structure, basic arithmetic and elementary functions are implemented as a Matlab class, entirely using the Matlab programming language.
Dina Tsarapkina, David Jeffrey
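The Double-Double representation can be sketched in a few lines. The snippet below (written in Python rather than Matlab purely for illustration, and simplified relative to the actual package) uses the classical error-free two_sum transformation and a basic double-double addition.

    def two_sum(a, b):
        # Error-free transformation (Knuth): s + e equals a + b exactly, with s = fl(a + b).
        s = a + b
        bb = s - a
        e = (a - (s - bb)) + (b - bb)
        return s, e

    def quick_two_sum(a, b):
        # Renormalization step; assumes |a| >= |b|.
        s = a + b
        return s, b - (s - a)

    def dd_add(x, y):
        # x and y are (hi, lo) pairs whose unevaluated sum represents one extended-precision number.
        s, e = two_sum(x[0], y[0])
        e += x[1] + y[1]
        return quick_two_sum(s, e)

    # 0.1 + 0.2 is not exact in double precision; the lo component exposes the rounding error.
    print(dd_add((0.1, 0.0), (0.2, 0.0)))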

Workshop on Teaching Computational Science (WTCS) Session 2

Time and Date: 14:10 - 15:50 on 11th June 2014

Room: Rosser

Chair: Angela Shiflet

339 Double-Degree Master's Program in Computational Science: Experiences of ITMO University and University of Amsterdam [abstract]
Abstract: We present a new double-degree graduate (Master's) programme developed jointly by ITMO University, Russia, and the University of Amsterdam, The Netherlands. First, we look into the global aspects of integrating different educational systems and list some funding opportunities from European foundations. Then we describe our double-degree program curriculum, suggest a timeline for enrollment and studies, and give some examples of student research topics. Finally, we discuss the peculiarities of joint programs with Russia, reflect on the first lessons learnt, and share our thoughts and experiences that could be of interest to the international community expanding educational markets to vast countries like Russia, China or India. The paper is written for education professionals and contains useful information for potential students.
Alexey Dukhanov, Valeria Krzhizhanovskaya, Anna Bilyatdinova, Alexander Boukhanovsky, Peter Sloot
254 Critical Issues in the Teaching of High Performance Computing to Postgraduate Scientists [abstract]
Abstract: High performance computing is in increasing demand, especially with the need to conduct parallel processing on very large datasets, whether measured by volume, velocity or variety. Unfortunately the necessary skills - from familiarity with the command line interface, job submission and scripting, through to parallel programming - are not commonly taught at the level required by most researchers. As a result the uptake of HPC usage remains disproportionately low, with emphasis on system metrics taking priority, leading to a situation described as 'high performance computing considered harmful'. Changing this is not a problem of computational science but rather a problem for computational science, which can only be resolved through a multi-disciplinary approach. The following example addresses the main issues in such teaching and thus appeals to some universality in application which may be useful for other institutions. For the past several years the Victorian Partnership for Advanced Computing (VPAC) has conducted a range of training courses designed to bring the capabilities of postgraduate researchers to a level of competence useful for their research. These courses have developed over this time, in part by providing a significantly wider range of content for varying skillsets, but more importantly by introducing some of the key insights from the discipline of adult and tertiary education in the context of the increasing trend towards lifelong learning. This includes an andragogical orientation, providing integrated structural knowledge, encouraging learner autonomy, self-efficacy, and self-determination, utilising appropriate learning styles for the discipline, utilising modelling and scaffolding for example problems (as a contemporary version of proximal learning), and following up with a connectivist mentoring and outreach program in the context of a culturally diverse audience.
Lev Lafayette
89 A High Performance Computing Course Guided by the LU Factorization [abstract]
Abstract: This paper presents an experience of problem-based learning in a High Performance Computing course. The course is part of the specialization in High Performance Architectures and Supercomputing in a Master's degree on New Technologies in Computer Science. The students are assumed to have a basic knowledge of parallel programming, but differences in their previous studies and in where those studies were taken mean the group is heterogeneous. The problem-based learning approach therefore has to facilitate the individual development and supervision of the students. The course focuses on HPC, matrix computation, parallel libraries, heterogeneous computing and scientific applications of parallelism. The students work on the different aspects of the course using the LU factorization, developing their own implementations, using different libraries, combining different levels of parallelism and conducting experiments on a small heterogeneous cluster composed of multicores with different characteristics and GPUs of different types.
Gregorio Bernabé, Javier Cuenca, Luis P. Garcia, Domingo Gimenez, Sergio Rivas-Gomez
50 Teaching High Performance Computing using BeesyCluster and Relevant Usage Statistics [abstract]
Abstract: The paper presents motivations and experiences from using the BeesyCluster middleware for teaching high performance computing at the Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology. Features of BeesyCluster well suited for conducting courses are discussed, including an easy-to-use WWW interface for application development and execution that hides the queuing systems, publishing applications as services, running in a sandbox by novice users, and team work and workflow management environments. Additionally, practical experiences are discussed from two courses: High Performance Computing Systems and Architectures of Internet Services. For the former, statistics such as the number of team work activities, the number of applications run on clusters and the number of WWW user sessions are shown over the period of one semester. Results of a survey from a general course on BeesyCluster for HPC conducted for university staff and students are also presented.
Pawel Czarnul

Multiscale Modelling and Simulation (MSCALE) Session 1

Time and Date: 16:20 - 18:00 on 11th June 2014

Room: Rosser

Chair: Valeria Krzhizhanovskaya

126 Restrictions in model reduction for polymer chain models in dissipative particle dynamics [abstract]
Abstract: We model high molecular weight homopolymers in semidilute concentration via Dissipative Particle Dynamics (DPD). We show that in model reduction methodologies for polymers it is not enough to preserve system properties (i.e., density $\rho$, pressure $p$, temperature $T$, radial distribution function $g(r)$); preserving the characteristic shape and length scale of the polymer chain model is also necessary. In this work we apply a recently proposed DPD model-reduction methodology and demonstrate why its applicability is limited up to a certain maximum polymer length and why it is not suitable for solvent coarse graining.
Nicolas Moreno, Suzana Nunes, Victor M. Calo
353 Simulation platform for multiscale and multiphysics modeling of OLEDs [abstract]
Abstract: We present a simulation platform which serves as an integrated framework for multiscale and multiphysics modeling of Organic Light Emitting Diodes (OLEDs) and their components. The platform is aimed at the designers of OLEDs with various areas of expertise ranging from the fundamental theory to the manufacturing technology. The platform integrates an extendable set of in-house and third-party computational programs that are used for predictive modeling of the OLED parameters important for device performance. These computational tools describe properties of atomistic, mesoscale and macroscopic levels. The platform automates data exchange between these description levels and allows one to build simulation workflows and manage remote task execution. The integrated database provides data exchange and storage for the calculated and experimental results.
Maria Bogdanova, Sergey Belousov, Ilya Valuev, Andrey Zakirov, Mikhail Okun, Denis Shirabaykin, Vasily Chorkov, Petr Tokar, Andrey Knizhnik, Boris Potapkin, Alexander Bagaturyants, Ksenia Komarova, Mikhail Strikhanov, Alexey Tishchenko, Vladimir Nikitenko, Vasili Sukharev, Natalia Sannikova, Igor Morozov
336 Scaling acoustical calculations on multicore, multiprocessor and distributed computer environment [abstract]
Abstract: Applying computer systems to calculate acoustic fields is a common practice due to the generally high complexity of such tasks. However, implementing algorithmic and software solutions to calculate acoustic fields faces a wide variety of problems, caused by the impossibility of representing algorithmically all of the physical laws involved in the calculation of the field distribution, in all varieties of media and for wide sets of field parameters and sources. There are therefore many limitations on the tasks that can be solved by a single simulation system. At the same time, a large number of calculations are required to perform general simulation tasks over all sets of input parameters. It is therefore important to develop new algorithmic solutions that calculate acoustic fields for a wider range of input parameters and that scale to many parallel and distributed computers, in order to increase the maximum allowed computational load with adequate time and cost of simulation. Tasks of calculating acoustic fields may belong to various domains from the point of view of the physical laws involved in the calculation. In this article a general architecture of the simulation system is presented, providing the structure and functionality of the system at the top level and of its domain-independent subsystems. The complete architecture can be defined only for a specific class of calculation tasks; two such classes are described: simulating acoustic fields in enclosed rooms and in natural stochastic deep-water waveguides.
Andrey Chusov, Lubov Statsenko, Yulia Mirgorodskaya, Boris Salnikov and Evgeniya Salnikova
384 PyGrAFT: Tools for Representing and Managing Regular and Sparse Grids [abstract]
Abstract: Many computational science applications perform compute-intensive operations on scalar and vector fields residing on multidimensional grids. Typically these codes run on supercomputers--large multiprocessor commodity clusters or hybrid platforms that combine CPUs with accelerators such as GPUs. The Python Grids and Fields Toolkit (PyGrAFT) is a set of classes, methods, and library functions for representing scalar and vector fields residing on multidimensional, logically Cartesian--including curvilinear--grids. The aim of PyGrAFT is to accelerate development of numerical analysis applications by combining the high programmer productivity of Python with the high performance of lower-level programming languages. The PyGrAFT data model--which leverages the NumPy ndarray class--enables representation of tensor product grids of arbitrary dimension, and collections of scalar and/or vector fields residing on these grids. Furthermore, the PyGrAFT data model allows the user to choose field storage ordering for optimal performance for the target application. Class support methods and library functions are implemented to employ where possible reliable, well-tested, high-performance packages from the Python software ecosystem (e.g., NumPy, SciPy, mpi4py). The PyGrAFT data model and library is applicable to global address spaces and distributed-memory platforms that utilise MPI. Library operations include intergrid interpolation and support for multigrid solver techniques such as the sparse grid combination technique. We describe the PyGrAFT data model, its parallelisation, and strategies currently underway to explore opportunities for providing multilevel parallelism with relatively little user effort. We illustrate the PyGrAFT data model, library functions, and resultant programming model in action for a variety of applications, including function evaluation, PDE solvers, and sparse grid combination technique solvers. We demonstrate the language interoperability of PyGrAFT with a C++ example, and outline strategies for using PyGrAFT with legacy codes written in other programming languages. We explore the implications of this programming model for an emerging problem in computational science and engineering---modelling multiphysics and multiscale systems. We conclude with an outline of the PyGrAFT development roadmap, including full support for vector fields and calculations in curvilinear coordinates, support for GPUs and other parallelisation schemes, and extensions to the PyGrAFT model to accommodate general multiresolution numerical methods.
Jay Larson

Computational Optimization, Modelling and Simulation (COMS) Session 1

Time and Date: 11:00 - 12:40 on 12th June 2014

Room: Tully II

Chair: Leifur Leifsson

94 Fast Low-fidelity Wing Aerodynamics Model for Surrogate-Based Shape Optimization [abstract]
Abstract: Variable-fidelity optimization (VFO) can be efficient in terms of computational cost when compared with traditional approaches, such as gradient-based methods with adjoint sensitivity information. In variable-fidelity methods, the direct optimization of the expensive high-fidelity model is replaced by iterative re-optimization of a physics-based surrogate model, which is constructed from a corrected low-fidelity model. The success of VFO depends on the reliability and accuracy of the low-fidelity model. In this paper, we present a way to develop a fast and reliable low-fidelity model suitable for aerodynamic shape optimization of transonic wings. The low-fidelity model is component based and accounts for the zero-lift drag, induced drag, and wave drag. The induced drag can be calculated by a suitable method, such as lifting-line theory or a panel method. The zero-lift drag and the wave drag can be calculated by a two-dimensional flow model and strip theory. Sweep effects are accounted for by simple sweep theory. The approach is illustrated by a numerical example where the induced drag is calculated by a vortex lattice method, and the zero-lift drag and wave drag are calculated by MSES (a viscous-inviscid method). The low-fidelity model is roughly 320 times faster than a high-fidelity computational fluid dynamics model which solves the Reynolds-averaged Navier-Stokes equations with the Spalart-Allmaras turbulence model. The responses of the high- and low-fidelity models compare favorably and, most importantly, show the same trends with respect to changes in the operational conditions (Mach number, angle of attack) and the geometry (the airfoil shapes).
Leifur Leifsson, Slawomir Koziel, Adrian Bekasiewicz
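As a rough sketch of what such a component-based drag build-up can look like, the Python snippet below combines a zero-lift drag coefficient, lifting-line induced drag, and a Korn-equation/Lock fourth-power wave-drag estimate with simple sweep theory. The constants and the airfoil technology factor are textbook-style assumptions, not the model of the paper, which uses a vortex lattice code and MSES.

    import numpy as np

    def drag_buildup(CL, mach, sweep_rad, AR, e=0.85, CD0=0.02, t_c=0.12, kappa_a=0.95):
        # Induced drag from classical lifting-line theory.
        CD_induced = CL**2 / (np.pi * AR * e)

        # Wave drag: Korn equation for the drag-divergence Mach number with simple sweep theory,
        # then Lock's fourth-power law above the critical Mach number.
        cos_s = np.cos(sweep_rad)
        M_dd = kappa_a / cos_s - t_c / cos_s**2 - CL / (10.0 * cos_s**3)
        M_crit = M_dd - (0.1 / 80.0) ** (1.0 / 3.0)
        CD_wave = 20.0 * (mach - M_crit) ** 4 if mach > M_crit else 0.0

        return CD0 + CD_induced + CD_wave

    print(drag_buildup(CL=0.5, mach=0.78, sweep_rad=np.radians(25.0), AR=8.0))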
128 Minimizing Inventory Costs for Capacity-Constrained Production using a Hybrid Simulation Model [abstract]
Abstract: A hybrid simulation model is developed to determine the cost-minimizing target level for a single-item, single-stage production-inventory system. The model is based on a single discrete-event simulation of the unconstrained production system, from which an analytical approximation of the inventory shortfall is derived. Using this analytical expression it is then possible to evaluate inventory performance, and associated costs, at any target level. From these calculations, the cost-minimizing target level can be found efficiently using a local search. Computational experiments show the model remains highly accurate at high levels of demand variation, where existing analytical methods are known to be inaccurate. By deriving an expression for the shortfall distribution via simulation, no user modelling of the demand distribution or estimation of demand parameters is required. Thus this model can be applied to situations when the demand distribution does not have an identifiable analytical form.
John Betts
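A stripped-down Python version of the underlying idea is sketched below: the shortfall process of the unconstrained capacitated system is simulated once, independently of the target level, and holding and backorder costs are then evaluated for any candidate target level. The demand distribution, capacity and cost rates are invented for illustration, and the paper's analytical shortfall approximation is replaced here by the raw simulated sample.

    import numpy as np

    rng = np.random.default_rng(0)
    T, capacity, h, b = 50000, 100.0, 1.0, 9.0         # periods, per-period capacity, holding/backorder cost rates

    # One simulation of the shortfall process, independent of the target level S:
    # shortfall[t] = max(0, shortfall[t-1] + demand[t] - capacity)
    demand = rng.gamma(shape=4.0, scale=24.0, size=T)  # high-variation demand with mean 96
    shortfall = np.empty(T)
    s = 0.0
    for t in range(T):
        s = max(0.0, s + demand[t] - capacity)
        shortfall[t] = s

    def expected_cost(S):
        inventory = S - shortfall                      # net inventory at target level S
        return h * np.maximum(inventory, 0.0).mean() + b * np.maximum(-inventory, 0.0).mean()

    # Coarse grid search over candidate target levels (the paper uses a local search).
    best = min(range(0, 2001, 5), key=expected_cost)
    print("target level:", best, "cost:", expected_cost(best))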
23 Computation on GPU of Eigenvalues and Eigenvectors of a Large Number of Small Hermitian Matrices [abstract]
Abstract: This paper presents an implementation on Graphics Processing Units of the QR-Householder algorithm used to find all the eigenvalues and eigenvectors of many small Hermitian matrices (double precision) in a very short time, in order to address the time constraints of radar applications.
Alain Cosnuau
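For readers who want a CPU-side reference point for the same task (eigendecomposition of many small Hermitian matrices), NumPy's batched eigh can be used as sketched below; the matrix size and batch count are arbitrary, and this is not the paper's GPU QR-Householder implementation.

    import numpy as np

    rng = np.random.default_rng(0)
    batch, n = 10000, 16                     # many small matrices, e.g. 16 x 16

    # Build random complex matrices and symmetrize them into Hermitian form.
    A = rng.normal(size=(batch, n, n)) + 1j * rng.normal(size=(batch, n, n))
    H = 0.5 * (A + np.conjugate(np.transpose(A, (0, 2, 1))))

    # eigh operates on the trailing two dimensions, so the whole batch is handled in one call.
    w, v = np.linalg.eigh(H)
    print(w.shape, v.shape)                  # (10000, 16) eigenvalues, (10000, 16, 16) eigenvectors

    # Residual check on the first matrix: H v = v diag(w).
    print(np.allclose(H[0] @ v[0], v[0] * w[0]))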
299 COFADMM: A Computational features selection with Alternating Direction Method of Multipliers [abstract]
Abstract: Due to the explosion in size and complexity of Big Data, it is increasingly important to be able to solve problems with very large numbers of features. Classical feature selection procedures involve combinatorial optimization, with computational time increasing exponentially with the number of features. During the last decade, penalized regression has emerged as an attractive alternative for regularization and high dimensional feature selection problems. Alternating Direction Method of Multipliers (ADMM) optimization is suited for distributed convex optimization and distributed computing for big data. The purpose of this paper is to propose a broader algorithm, COFADMM, which combines the strength of convex penalized techniques in feature selection for big data with the power of ADMM for optimization. We show that COFADMM can provide a path of solutions efficiently and quickly. COFADMM is easy to use and is available in C and Matlab upon request from the corresponding author.
Mohammed Elanbari, Sidra Alam, Halima Bensmail
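The ADMM building block used for penalized feature selection can be illustrated with the standard lasso splitting of Boyd et al.; the Python sketch below is a generic textbook version, not the COFADMM code, and the penalty, step size and synthetic data are assumptions.

    import numpy as np

    def lasso_admm(A, y, lam, rho=1.0, n_iter=300):
        # Minimize 0.5*||Ax - y||^2 + lam*||x||_1 with the standard ADMM x-z splitting.
        n = A.shape[1]
        Aty = A.T @ y
        L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))   # factor once, reuse every iteration
        x = z = u = np.zeros(n)
        for _ in range(n_iter):
            x = np.linalg.solve(L.T, np.linalg.solve(L, Aty + rho * (z - u)))  # ridge-type x-update
            v = x + u
            z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)            # soft-thresholding (sparsity)
            u = u + x - z                                                      # dual update
        return z

    rng = np.random.default_rng(0)
    A = rng.normal(size=(200, 50))
    x_true = np.zeros(50)
    x_true[:5] = 3.0
    y = A @ x_true + 0.1 * rng.normal(size=200)
    print(np.nonzero(lasso_admm(A, y, lam=5.0))[0])   # should recover (roughly) the first five features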
101 Computational Optimization, Modelling and Simulation: Past, Present and Future [abstract]
Abstract: An integrated part of modern design practice in both engineering and industry is simulation and optimization. Significant challenges still exist, though huge progress has been made in the last few decades. This 5th workshop on Computational Optimization, Modelling and Simulation (COMS 2014) at ICCS 2014 will summarize the latest developments in optimization and modelling and their applications in science, engineering and industry. This paper reviews past developments, the state of the art, and future trends, while highlighting some challenging issues in these areas. It can be expected that future research will focus on data-intensive applications, approximations for computationally expensive methods, combinatorial optimization, and large-scale applications.
Xin-She Yang, Slawomir Koziel, Leifur Leifsson

Computational Optimization, Modelling and Simulation (COMS) Session 2

Time and Date: 14:10 - 15:50 on 12th June 2014

Room: Tully II

Chair: Leifur Leifsson

75 Low-Cost EM-Simulation-Driven Multi-Objective Optimization of Antennas [abstract]
Abstract: A surrogate-based method for efficient multi-objective antenna optimization is presented. Our technique exploits a response surface approximation (RSA) model constructed from a sampled low-fidelity antenna model (here, obtained through coarse-discretization EM simulation). The RSA model enables fast determination of the best available trade-offs between conflicting design goals. Low-cost RSA model construction is made possible by an initial reduction of the design space. Optimization of the RSA model has been carried out using a multi-objective evolutionary algorithm (MOEA). Additional response correction techniques have subsequently been applied to improve selected designs at the level of the high-fidelity EM antenna model. The refined designs constitute the final representation of the Pareto set. The proposed approach has been validated using an ultra-wideband (UWB) monocone and a planar Yagi-Uda antenna.
Adrian Bekasiewicz, Slawomir Koziel, Leifur Leifsson
47 Solution of the wave-type PDE by numerical damping control multistep methods [abstract]
Abstract: The second-order Ordinary Differential Equation (ODE) system obtained after semidiscretizing the wave-type partial differential equation (PDE) with the finite element method (FEM) shows strong numerical stiffness. Its resolution requires the use of numerical methods with good stability properties and controlled numerical dissipation in the high-frequency range. The HHT-$\alpha$ and BDF-$\alpha$ methods are second-order accurate, unconditionally stable and able to dissipate high modes for some values of the parameters. The finite element method has been applied to the one-dimensional linear wave-type PDE and to a non-linear model of a guitar string. The ODE systems obtained after applying FEM are solved by these two methods, showing that both are able to dissipate the high modes.
Elisabete Alberdi Celaya, Juan José Anza Aguirrezabala
274 Preference-Based Fair Resource Sharing and Scheduling Optimization in Grid VOs [abstract]
Abstract: In this paper, we deal with problems of efficient resource management and scheduling in utility Grids, where global job flows from external users coexist with resource owners' local tasks and resources are non-dedicated. Competition for resource reservation between independent users and between local and global job flows substantially complicates scheduling and the requirement to provide the necessary quality of service. The meta-scheduling model justified in this work assumes a complex combination of job flow dispatching and application-level scheduling methods for jobs, as well as resource sharing and consumption policies established in virtual organizations (VOs) and based on economic principles. A solution to the problem of fair resource sharing among VO stakeholders is proposed and examined with simulation studies.
Victor Toporkov, Anna Toporkova, Alexey Tselishchev, Dmitry Yemelyanov, Petr Potekhin
370 Variable Neighborhood Search Based Set covering ILP model for the Vehicle Routing Problem with time windows [abstract]
Abstract: In this paper we propose a hybrid metaheuristic based on General Variable Neighborhood Search and Integer Linear Programming for solving the vehicle routing problem with time windows (VRPTW). The problem consists in determining the minimum cost routes for a homogeneous fleet of vehicles to meet the demand of a set of customers within specified time windows. The proposed heuristic, called VNS-SCP, is a matheuristic in which the heuristic (VNS) and exact (Set Covering Problem, SCP) methods cooperate in an intertwined, collaborative manner. In this approach an initial solution is first created using a Solomon route-construction heuristic, the nearest neighbor algorithm. In the second phase the solutions are improved in terms of the total distance traveled using VNS-SCP. The algorithm is tested using the Solomon benchmark. Our findings indicate that the proposed procedure outperforms other local searches and metaheuristics.
Amine Dhahri, Kamel Zidi, Khaled Ghedira
70 Nested Space Mapping Technology for Expedite EM-driven Design of Compact RF/microwave Components [abstract]
Abstract: A robust simulation-driven methodology for rapid and reliable design of RF/microwave circuits comprising compact microstrip resonant cells (CMRCs) is presented. We introduce a nested space mapping (NSM) technology, in which the inner space mapping layer is utilized to improve the generalization capabilities of the equivalent circuit model corresponding to a constitutive element of the circuit under consideration. The outer layer enhances the surrogate model of the entire structure under design. We demonstrate that NSM significantly improves performance of surrogate-based optimization of composite RF/microwave structures. It is validated using two examples of UWB microstrip matching transformers (MTs) and compared to other competitive surrogate-assisted methods attempting to solve the problem of compact RF/microwave component design.
Adrian Bekasiewicz, Slawomir Koziel, Piotr Kurgan, Leifur Leifsson

Computational Optimisation in the Real World (CORW) Session 1

Time and Date: 11:00 - 12:40 on 12th June 2014

Room: Tully III

Chair: Timoleon Kipouros

276 Extending the Front: Designing RFID Antennas using Multiobjective Differential Evolution with Biased Population Selection [abstract]
Abstract: RFID antennas are ubiquitous, so exploring the space of high efficiency and low resonant frequency antennas is an important multiobjective problem. Previous work has shown that the continuous solver differential evolution (DE) can be successfully applied to this discrete problem, but has difficulty exploring the region of solutions with lowest resonant frequency. This paper introduces a modified DE algorithm that uses biased selection from an archive of solutions to direct the search toward this region. Results indicate that the proposed approach produces superior attainment surfaces to the earlier work. The biased selection procedure is applicable to other population-based approaches for this problem.
James Montgomery, Marcus Randall, Andrew Lewis
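One way to read "biased population selection" is sketched below in Python: a plain DE/rand/1/bin scheme on a toy biobjective function in which, with some probability, the base vector is drawn from the archived non-dominated solution that is best on the harder objective. This is an illustrative interpretation only, not the published algorithm.

    import numpy as np

    rng = np.random.default_rng(0)

    def objectives(x):
        # Toy biobjective stand-in for (resonant frequency, -efficiency).
        return np.array([np.sum(x**2), np.sum((x - 2.0) ** 2)])

    def dominates(a, b):
        return np.all(a <= b) and np.any(a < b)

    dim, NP, F, CR, gens, bias = 10, 30, 0.7, 0.9, 100, 0.3
    pop = rng.uniform(-5.0, 5.0, size=(NP, dim))
    fit = np.array([objectives(x) for x in pop])
    archive = []                                       # (solution, objectives) pairs, non-dominated so far

    for _ in range(gens):
        for i in range(NP):
            # Biased selection: sometimes take the base vector from the archive member that is
            # best on objective 0, i.e. the region the plain DE struggles to reach.
            if archive and rng.random() < bias:
                base = min(archive, key=lambda sf: sf[1][0])[0]
            else:
                base = pop[rng.integers(NP)]
            r1, r2 = pop[rng.integers(NP)], pop[rng.integers(NP)]
            trial = np.where(rng.random(dim) < CR, base + F * (r1 - r2), pop[i])
            f_trial = objectives(trial)
            if dominates(f_trial, fit[i]):
                pop[i], fit[i] = trial, f_trial
            if not any(dominates(f, f_trial) for _, f in archive):
                archive = [(s, f) for s, f in archive if not dominates(f_trial, f)]
                archive.append((trial.copy(), f_trial))

    print(len(archive), "non-dominated solutions in the archive")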
396 Local Search Enabled Extremal Optimisation for Continuous Inseparable Multi-objective Benchmark and Real-World Problems [abstract]
Abstract: Local search is an integral part of many meta-heuristic strategies that solve single objective optimisation problems. Essentially, the meta-heuristic is responsible for generating a good starting point from which a greedy local search will find the local optimum. Indeed, the best known solutions to many hard problems (such as the travelling salesman problem) have been generated in this hybrid way. However, for multiple objective problems, explicit local search strategies are relatively rarely mentioned or applied. In this paper, a generic local search strategy is developed, particularly for problems where it is difficult or impossible to determine the contribution of individual solution components (often referred to as inseparable problems). The meta-heuristic adopted to test this is extremal optimisation, though the local search technique may be used by any meta-heuristic. The local search is supplemented by a diversification strategy that draws from the external archive. Using benchmark problems, and a real-world airfoil design problem, it is shown that this combination leads to improved solutions.
Marcus Randall, Andrew Lewis, Jan Hettenhausen, Timoleon Kipouros
411 A Web-Based System for Visualisation-Driven Interactive Multi-Objective Optimisation [abstract]
Abstract: Interactive Multi-Objective Optimisation is a rapidly growing field of evolutionary and swarm intelligence-based algorithms. By involving a human decision maker, a set of relevant non-dominated points can often be acquired at significantly lower computational cost than with a posteriori algorithms. An often neglected issue in interactive optimisation is user interface design and the application of interactive optimisation as a design tool in engineering applications. This paper discusses recent advances in, and modules for, an interactive multi-objective particle swarm optimisation algorithm. The focus of the current implementation is on aeronautics engineering applications; however, its use for a wide range of other optimisation problems is conceivable.
Jan Hettenhausen, Andrew Lewis, Timoleon Kipouros

Computational Optimisation in the Real World (CORW) Session 2

Time and Date: 14:10 - 15:50 on 12th June 2014

Room: Tully III

Chair: Andrew Lewis

92 A Hybrid Harmony Search Algorithm for Solving Dynamic Optimisation Problems [abstract]
Abstract: Many optimisation problems are dynamic in the sense that changes occur during the optimisation process, and they are therefore more challenging than stationary problems. The occurrence of such problems has attracted researchers in artificial intelligence and operational research. Approaches to dynamic optimisation problems should not only seek the global optima but also be able to keep track of changes in the solution landscape. Population-based approaches have been intensively investigated for these problems, as their solutions are scattered over the entire search space, which helps in recognizing changes as they occur. However, optimisation algorithms that have been used to solve stationary problems cannot be applied directly to dynamic problems without modifications, such as maintaining population diversity. In this work, a recent population-based meta-heuristic, the harmony search algorithm, is investigated for dynamic optimisation problems. This technique mimics the musical process in which a musician attempts to find a state of harmony. In order to cope with dynamic behaviour, the proposed harmony search algorithm is hybridised with (i) a random immigrant scheme, (ii) a memory mechanism and (iii) a memory-based immigrant scheme. These hybridisations help to keep track of changes and to maintain population diversity. The performance of the proposed harmony search is verified using the well-known dynamic test problem, the Moving Peak Benchmark (MPB), with a variety of peaks. The empirical results demonstrate that the proposed algorithm obtains competitive results, though not the best in most cases, when compared to the best known results in the scientific literature published so far.
Ayad Turky, Salwani Abdullah, Nasser Sabar
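For readers unfamiliar with the base technique, a minimal static harmony search (without the immigrant and memory hybrids studied in the paper) is sketched below in Python; HMCR, PAR and the bandwidth are the usual control parameters, and the sphere function stands in for the real objective.

    import numpy as np

    rng = np.random.default_rng(0)

    def sphere(x):
        return float(np.sum(x**2))

    dim, hms, hmcr, par, bw, iterations = 10, 30, 0.9, 0.3, 0.05, 20000
    low, high = -5.0, 5.0

    # Harmony memory: the pool of candidate solutions (the "musicians' memory").
    memory = rng.uniform(low, high, size=(hms, dim))
    costs = np.array([sphere(h) for h in memory])

    for _ in range(iterations):
        new = np.empty(dim)
        for j in range(dim):
            if rng.random() < hmcr:                      # memory consideration
                new[j] = memory[rng.integers(hms), j]
                if rng.random() < par:                   # pitch adjustment
                    new[j] += bw * rng.uniform(-1.0, 1.0)
            else:                                        # random improvisation
                new[j] = rng.uniform(low, high)
        new = np.clip(new, low, high)
        cost = sphere(new)
        worst = int(np.argmax(costs))
        if cost < costs[worst]:                          # replace the worst harmony in memory
            memory[worst], costs[worst] = new, cost

    print("best cost found:", costs.min())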
313 Constraint Programming and Ant Colony System for the Component Deployment Problem [abstract]
Abstract: Contemporary motor vehicles have increasing numbers of automated functions to augment the safety and comfort of a car. The automotive industry has to incorporate increasing numbers of processing units in the structure of cars to run the software that provides these functionalities. The software components often need access to sensors or mechanical devices which they are designed to operate. The result is a network of hardware units which can accommodate a limited number of software programs, each of which has to be assigned to a hardware unit. A prime goal of this deployment problem is to find software-to-hardware assignments that maximise the reliability of the system. In doing so, the assignments have to observe a number of constraints to be viable. This includes limited memory of a hardware unit, collocation of software components on the same hardware units, and communication between software components. Since the problem consists of many constraints with a significantly large search space, we investigate an ACO and constraint programming (CP) hybrid for this problem. We find that despite the large number of constraints, ACO on its own is the most effective method providing good solutions by also exploring infeasible regions.
Dhananjay Thiruvady, I. Moser, Aldeida Aleti, Asef Nazari
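Purely as an illustration of how constraint checking can be embedded in an ACO construction step (this is not the paper's CP propagation), the following sketch lets an ant assign components to hosts with a pheromone-biased choice filtered by a memory-capacity constraint; the data structures and a strictly positive pheromone table are assumptions.

```python
# Hypothetical data model: components, hosts, per-component memory needs,
# per-host capacities, and a pheromone table indexed by (component, host).
import random

def construct_assignment(components, hosts, mem_need, mem_cap, pheromone):
    """One ant builds a software-to-hardware assignment, or fails and restarts."""
    load = {h: 0 for h in hosts}
    assignment = {}
    for c in components:
        feasible = [h for h in hosts if load[h] + mem_need[c] <= mem_cap[h]]
        if not feasible:
            return None                      # dead end: the ant abandons this tour
        weights = [pheromone[(c, h)] for h in feasible]
        h = random.choices(feasible, weights=weights)[0]
        assignment[c] = h
        load[h] += mem_need[c]
    return assignment
```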
416 Electrical Power Grid Network Optimisation by Evolutionary Computing [abstract]
Abstract: A major factor in the consideration of an electrical power network of the scale of a national grid is the calculation of power flow and in particular, optimal power flow. This paper considers such a network, in which distributed generation is used, and examines how the network can be optimized, in terms of transmission line capacity, in order to obtain optimal or at least high-performing configurations, using multi-objective optimisation by evolutionary computing methods.
John Oliver, Timoleon Kipouros, Mark Savill

International Workshop on Advances in High-Performance Computational Earth Sciences (IHPCES) Session 1

Time and Date: 11:00 - 12:40 on 12th June 2014

Room: Bluewater I

Chair: Kengo Nakajima

408 Application-specific I/O Optimizations on Petascale Supercomputers [abstract]
Abstract: Data-intensive science frontiers and challenges have emerged as computer technology has evolved substantially. Large-scale simulations impose significant I/O workloads, and as a result I/O performance often becomes a bottleneck that prevents high performance in scientific applications. In this paper we introduce a variety of I/O optimization techniques developed and implemented when scaling a seismic application to petascale. These techniques include file system striping, data aggregation, reader/writer limiting and less interleaving of data, collective MPI-IO, and data staging. The optimizations result in nearly perfect scalability of the target application on some of the most advanced petascale systems. The techniques introduced in this paper are applicable to other scientific applications facing similar petascale I/O challenges.
Efecan Poyraz, Heming Xu, Yifeng Cui
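A minimal mpi4py sketch of one of the listed techniques, collective MPI-IO with a contiguous (non-interleaved) file layout; the file name and buffer size are placeholders, and the paper's aggregation and striping settings are not reproduced here.

```python
# Each rank writes its own contiguous slice of the output file in one
# collective call, avoiding many interleaved small writes.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
local = np.full(1024, rank, dtype=np.float64)      # this rank's part of the field

fh = MPI.File.Open(comm, "snapshot.bin", MPI.MODE_WRONLY | MPI.MODE_CREATE)
offset = rank * local.nbytes                       # contiguous layout by rank
fh.Write_at_all(offset, local)                     # collective write
fh.Close()
```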
264 A physics-based Monte Carlo earthquake disaster simulation accounting for uncertainty in building structure parameters [abstract]
Abstract: Physics-based earthquake disaster simulations are expected to contribute to high-precision earthquake disaster prediction; however, such models are computationally expensive and the results typically contain significant uncertainties. Here we describe Monte Carlo simulations where 10,000 calculations were carried out with stochastically varied building structure parameters to model 3,038 buildings. We obtain the spatial distribution of the damage caused for each set of parameters, and analyze these data statistically to predict the extent of damage to buildings.
Shunsuke Homma, Kohei Fujita, Tsuyoshi Ichimura, Muneo Hori, Seckin Citak, Takane Hori
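A toy sketch of the sampling loop only (not the physics-based solver): each realization perturbs the nominal building parameters and records the damage metric returned by a placeholder simulate_damage callable; the 10% coefficient of variation is an assumption for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def monte_carlo_damage(nominal_params, simulate_damage, n_samples=10_000, cv=0.1):
    """Return one damage value per stochastic realization of the parameters."""
    damages = np.empty(n_samples)
    for s in range(n_samples):
        params = {k: rng.normal(v, cv * abs(v)) for k, v in nominal_params.items()}
        damages[s] = simulate_damage(params)       # expensive physics-based run
    return damages      # analysed statistically, e.g. np.percentile(damages, 95)
```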
391 A quick earthquake disaster estimation system with fast urban earthquake simulation and interactive visualization [abstract]
Abstract: In the immediate aftermath of an earthquake, quick estimation of damage to city structures can facilitate prompt, effective post-disaster measures. Physics-based urban earthquake simulations, using measured ground motions as input, are a possible means of obtaining reasonable estimates. The difficulty of such estimation lies in carrying out the simulation and arriving at a thorough understanding of large-scale time series results in a limited amount of time. We developed an estimation system based on fast urban earthquake disaster simulation, together with an interactive visualization method suitable for GPU workstations. Using this system, an urban area with more than 100,000 structures can be analyzed within an hour and visualized interactively.
Kohei Fujita, Tsuyoshi Ichimura, Muneo Hori, M. L. L. Wijerathne, Seizo Tanaka
397 Several hundred finite element analyses of an inversion of earthquake fault slip distribution using a high-fidelity model of the crustal structure [abstract]
Abstract: To improve the accuracy of inversion analysis of earthquake fault slip distribution, we performed several hundred analyses using a 10^8-degree-of-freedom finite element (FE) model of the crustal structure. We developed a meshing method and an efficient computational method for these large FE models. We applied the model to the inversion analysis of coseismic fault slip distribution for the 2011 Tohoku-oki Earthquake. The high resolution of our model provided a significant improvement of the fidelity of the simulation results compared to existing computational approaches.
Ryoichiro Agata, Tsuyoshi Ichimura, Kazuro Hirahara, Mamoru Hyodo, Takane Hori, Muneo Hori

International Workshop on Advances in High-Performance Computational Earth Sciences (IHPCES) Session 2

Time and Date: 14:10 - 15:50 on 12th June 2014

Room: Bluewater I

Chair: Huilin Xing

334 An out-of-core GPU approach for Accelerating Geostatistical Interpolation [abstract]
Abstract: Geostatistical methods provide a powerful tool to understand the complexity of data arising from Earth sciences. Since the mid-70s, this numerical approach has been widely used to understand the spatial variation of natural phenomena in various domains such as the oil and gas, mining and environmental industries. Considering the huge amount of data available, standard implementations of these numerical methods are not efficient enough to tackle current challenges in geosciences. Moreover, most of the software packages available for geostatisticians are designed for usage on a desktop computer due to the trial-and-error procedure used during the interpolation. The Geological Data Management (GDM) software package developed by the French geological survey (BRGM) is widely used to build reliable three-dimensional geological models that require a large amount of memory and computing resources. Considering the most time-consuming phase of the kriging methodology, we introduce an efficient out-of-core algorithm that fully benefits from graphics card acceleration on a desktop computer. In this way we are able to accelerate kriging on the GPU with data sets four times larger than a classical in-core GPU algorithm can handle, with a limited loss of performance.
Victor Allombert, David Michea, Fabrice Dupros, Christian Bellier, Bernard Bourgine, Hideo Aochi, Sylvain Jubertie
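A compact sketch of the out-of-core pattern described above, with NumPy standing in for the GPU kernels; simple kriging with an exponential covariance is used purely for illustration and is not the GDM implementation.

```python
import numpy as np

def cov(a, b, sill=1.0, rng=100.0):
    """Exponential covariance between two (n, d) point sets."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return sill * np.exp(-d / rng)

def kriging_out_of_core(samples, values, targets, chunk=10_000):
    """Estimate values at `targets`, processing them chunk by chunk."""
    K_inv = np.linalg.inv(cov(samples, samples))    # factorise once, reuse per chunk
    out = np.empty(len(targets))
    for start in range(0, len(targets), chunk):
        block = targets[start:start + chunk]        # only this block is resident
        k = cov(samples, block)                     # sample-to-target covariances
        out[start:start + chunk] = values @ (K_inv @ k)
    return out
```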
401 Mesh generation for 3D geological reservoirs with arbitrary stratigraphic surface constraints [abstract]
Abstract: With advances in imaging, drilling and field observation technology, the geological structure of reservoirs can be described in greater detail. A novel 3D mesh generation method for reservoir models is proposed and implemented with arbitrary stratigraphic surface constraints, which ensures that the detailed structural geometries and material properties of reservoirs are better described and analysed. The stratigraphic interfaces are first extracted and meshed, and then a tetrahedral mesh is generated in 3D with the constraints of the meshed surfaces. The proposed approach includes the following five steps: (1) extracting stratum interfaces; (2) creating a background mesh with a size field on the interfaces; (3) constructing geodesic isolines from the interface boundaries to the interior; (4) employing a geodesic-based approach to create surface triangles in the area between adjacent isolines and then merging them together; (5) generating a tetrahedral mesh for the 3D reservoir with the constraints of the generated surface triangular mesh. The proposed approach has been implemented and applied to the Lawn Hill reservoir as a practical example to demonstrate its effectiveness and usefulness.
Huilin Xing, Yan Liu
403 Performance evaluation and case study of a coupling software ppOpen-MATH/MP [abstract]
Abstract: We are developing the coupling software ppOpen-MATH/MP, which is characterized by its wide applicability. This feature comes from the design decision that grid-point correspondence and interpolation coefficients are calculated in advance. However, calculating these values for unstructured-grid models generally requires a lot of computation time, so we developed a new, efficient algorithm and program for calculating the grid-point correspondence as a pre-processor to ppOpen-MATH/MP. In this article, the algorithm and a performance evaluation of the program are presented first, followed by an application example of ppOpen-MATH/MP coupling the atmospheric model NICAM with the ocean model COCO.
Takashi Arakawa, Takahiro Inoue, Masaki Sato
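A small sketch of the general idea of precomputing grid-point correspondence so that only a cheap sparse interpolation remains at each coupling step; this is not ppOpen-MATH/MP's algorithm, and the k-nearest-neighbour, inverse-distance weighting is an assumption for the example.

```python
import numpy as np
from scipy.spatial import cKDTree

def precompute_mapping(src_points, dst_points, k=3):
    """Pre-processor: nearest source points and normalised weights per target."""
    dist, idx = cKDTree(src_points).query(dst_points, k=k)
    w = 1.0 / np.maximum(dist, 1e-12)
    return idx, w / w.sum(axis=1, keepdims=True)

def interpolate(src_field, idx, weights):
    """Applied at every coupling step; no search is needed at run time."""
    return (src_field[idx] * weights).sum(axis=1)
```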
402 Implementation and Evaluation of an AMR Framework for FDM Applications [abstract]
Abstract: In order to execute various finite-difference method applications on large-scale parallel computers at a reasonable cost in computer resources, a framework using an adaptive mesh refinement (AMR) technique has been developed. AMR can realize high-resolution simulations while saving computer resources by generating and removing hierarchical grids dynamically. In the AMR framework, a dynamic domain decomposition (DDD) technique is also implemented as a dynamic load balancing method to correct the computational load imbalance among the processes introduced by parallelization. A 3D AMR test simulation confirms that dynamic load balancing can be achieved and execution time can be reduced by introducing the DDD technique.
Masaharu Matsumoto, Futoshi Mori, Satoshi Ohshima, Hideyuki Jitsumoto, Takahiro Katagiri, Kengo Nakajima
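A toy sketch, not the framework itself: gradient-based refinement flagging and a greedy block redistribution illustrating the idea behind dynamic domain decomposition (DDD); the cost model and threshold are assumptions.

```python
import numpy as np

def flag_for_refinement(u, threshold):
    """Mark cells of a 2D field whose gradient magnitude exceeds the threshold."""
    gy, gx = np.gradient(u)
    return np.hypot(gx, gy) > threshold

def redistribute(block_costs, nprocs):
    """Greedily reassign AMR blocks to processes to even out the load."""
    owners = [[] for _ in range(nprocs)]
    loads = [0.0] * nprocs
    for b in sorted(range(len(block_costs)), key=lambda i: -block_costs[i]):
        p = loads.index(min(loads))     # give the block to the least-loaded process
        owners[p].append(b)
        loads[p] += block_costs[b]
    return owners
```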

Workshop on Computational Chemistry and Its Applications (CCA) Session 1

Time and Date: 11:20 - 13:00 on 10th June 2014

Room: Bluewater II

Chair: Ponnadurai Ramasami

112 Computer-aided design of stereocontrol agents for radical polymerization [abstract]
Abstract: Controlling the stereochemistry of a polymer is highly desirable as this can affect its physical properties, such as its crystallinity, melting point, solubility and mechanical strength. Stereoregular polymers are normally prepared using expensive transition metal catalysts, which typically require demanding reaction conditions, and extending stereochemical control to free radical polymerization has been a long sought goal. For monomers containing carbonyl groups, certain Lewis acids have been shown to be capable of manipulating the stereochemistry, presumably via coordination to the polymer and/or monomer side chains so as to constrain their relative orientations during the propagation step. However, specific mechanistic details have yet to be clarified, and the degree of stereocontrol remains poor. To address these problems, we have been using computational chemistry, supported by polymerization experiments, to study the mechanism of stereocontrol in a variety of free-radical polymerization processes, and to predict the effect of solvents and novel control agents on polymer tacticity. Interestingly we have discovered that many Lewis acids do selectively coordinate to the terminal and penultimate radical side chains in a manner that should, in principle, facilitate control. However, this coordination is driven by the resulting stabilization of the propagating radical, which, ironically, deactivates it toward propagation. At the same time a less energetically favourable, non-controlling coordination mode involving the monomer side chain catalyzes the propagation reaction, and this provides the dominant reaction path. On this basis we suggest that simultaneous coordination to the monomer and propagating radical using bridging ligands or mixed Lewis acids may provide the way forward.
Michelle Coote, Benjamin Noble and Leesa Smith
38 Correlation between Franck-Condon Factors and Average Internuclear Separations for Diatomics Using the Fourier Grid Hamiltonian Method [abstract]
Abstract: The Fourier Grid Hamiltonian (FGH) method is used to compute the vibrational eigenvalues and eigenfunctions of bound states of diatomic molecules. For these computations, the Hulburt and Hirschfelder (HH) potential model for diatomics is used; these potential energy functions are used for constructing and diagonalizing the molecular Hamiltonians. The vibrational wave functions for the ground and excited states are used to calculate the Franck-Condon factors (FCFs), r-centroids and average internuclear separations, which play a significant role in determining the intensity of the bands in electronic transitions. The results of FCFs and r-centroids for diatomic molecules such as H2, N2, CO, I2 and HF using the FGH method are compared with other methods. The FGH method provides an efficient and accurate alternative for calculating FCFs and other parameters that depend on the vibrational wavefunctions of the ground and excited electronic states. The Franck-Condon profiles indicate a strong correlation between the values of the Franck-Condon factors and the mean internuclear separations for the corresponding transitions.
Mayank Kumar Dixit, Abhishek Jain, Bhalachandra Laxmanrao Tembe
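A compact sketch of the FGH construction, in which the kinetic-energy operator is diagonal in Fourier (momentum) space and the potential is diagonal on the coordinate grid; atomic units, the grid and the harmonic test potential are illustrative choices, not those of the paper.

```python
import numpy as np

def fgh_levels(x, V, mass, hbar=1.0, nlevels=5):
    """Lowest bound-state eigenvalues on a uniform grid x for potential V(x)."""
    N = len(x)
    dx = x[1] - x[0]
    k = 2.0 * np.pi * np.fft.fftfreq(N, d=dx)        # momentum-space grid
    F = np.fft.fft(np.eye(N), axis=0) / np.sqrt(N)   # unitary DFT matrix
    T = F.conj().T @ np.diag(hbar**2 * k**2 / (2.0 * mass)) @ F
    H = T.real + np.diag(V(x))                       # kinetic + diagonal potential
    return np.linalg.eigvalsh(H)[:nlevels]

# Harmonic-oscillator check: the levels should be close to 0.5, 1.5, 2.5, ...
x = np.linspace(-10.0, 10.0, 256)
print(fgh_levels(x, lambda q: 0.5 * q**2, mass=1.0))
```

Franck-Condon factors then follow as squared overlaps between eigenvectors obtained on the same grid for the ground- and excited-state potentials.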
122 Using hyperheuristics to improve the determination of the kinetic constants of a chemical reaction in heterogeneous phase [abstract]
Abstract: The reaction in the human stomach when neutralizing acid with an antacid tablet is simulated, and the evolution over time of the concentration of all chemical species present in the reaction medium is obtained. The values of the kinetic parameters of the chemical reaction can be determined by integrating the equation of the reaction rate. This is a classical optimization problem that can be approached with metaheuristic methods. The use of a parallel, parameterized scheme for metaheuristics facilitates the development of metaheuristics and their application. The unified scheme can also be used to implement hyperheuristics on top of parameterized metaheuristics, thereby selecting appropriate values for the metaheuristic parameters and, consequently, the metaheuristic itself. The hyperheuristic approach provides satisfactory values for the metaheuristic parameters and, consequently, satisfactory metaheuristics for the problem of determining the kinetic constants.
José Matías Cutillas Lozano, Domingo Gimenez
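A hedged sketch of the inner objective only: integrating a placeholder first-order rate law and scoring a candidate rate constant against measured concentrations; the paper's parallel parameterized metaheuristics and the hyperheuristic layer on top of them are only indicated in a comment.

```python
import numpy as np
from scipy.integrate import solve_ivp

def simulate(k, c0, t):
    """Integrate dC/dt = -k*C (illustrative rate law) and sample it at times t."""
    sol = solve_ivp(lambda _, c: -k * c, (t[0], t[-1]), [c0], t_eval=t)
    return sol.y[0]

def sum_squared_error(k, t_obs, c_obs):
    """Objective minimised by a metaheuristic when fitting the kinetic constant."""
    return float(np.sum((simulate(k, c_obs[0], t_obs) - c_obs) ** 2))

# A hyperheuristic would tune the metaheuristic that minimises this objective
# (population sizes, operators, ...) rather than tuning k directly.
```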
267 Speeding up Monte Carlo Molecular Simulation by a Non-Conservative Early Rejection Scheme [abstract]
Abstract: Molecular simulations describe fluid systems in a detailed fashion. In general, they are more accurate and representative than equations of state, but they require much more computational effort. Several techniques have been developed to speed up Monte Carlo (MC) molecular simulations while preserving their precision. In particular, early rejection schemes are capable of reducing computational cost by reaching the rejection decision for undesired MC trials at early stages. The scheme introduced in this work is based on the fact that the energy of the interaction between any pair of Lennard-Jones (LJ) sites cannot be lower than a certain minimum energy that can be easily computed. It is called “non-conservative” because it generates slightly different Markov chains than those generated by conventional algorithms. Nonetheless, the numerical experiments conducted show that these modifications are not significant, and both the proposed and the conventional methods converge to the same ensemble averages. In this study, the non-conservative scheme is first introduced and then compared to the conservative and bond-formation early rejection schemes. The method was tested for LJ particles in the canonical ensemble at several thermodynamic conditions. Results showed a relation between the thermodynamic conditions and the percentage of CPU time saved. In principle, more CPU time was saved at conditions with high rejection rates for the MC trials. The non-conservative early rejection scheme was successful in saving more than 45% of the CPU time needed by conventional algorithms in the canonical ensemble. Finally, this work presents an efficient early rejection method to accelerate MC molecular simulations that is easily extendable to other ensembles and complex molecules.
Ahmad Kadoura, Amgad Salama and Shuyu Sun
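A sketch of the bound that enables early rejection: since no Lennard-Jones pair can contribute less than -epsilon, a partial energy sum can already certify that a trial will fail the Metropolis test. For simplicity this version pre-draws the random number, which makes it a conservative variant rather than the paper's non-conservative scheme; reduced units, a single-particle move and the absence of periodic boundaries are assumptions.

```python
import math, random

def lj(r2, eps=1.0, sigma=1.0):
    """Lennard-Jones pair energy from a squared distance; minimum value is -eps."""
    s6 = (sigma * sigma / r2) ** 3
    return 4.0 * eps * (s6 * s6 - s6)

def try_move(positions, i, new_pos, u_old_i, beta, eps=1.0):
    """Return (accepted, new energy of particle i) for a single-particle move."""
    du_max = -math.log(random.random()) / beta     # rejection is certain beyond this
    others = [p for j, p in enumerate(positions) if j != i]
    u_new, remaining = 0.0, len(others)
    for p in others:
        r2 = sum((a - b) ** 2 for a, b in zip(new_pos, p))
        u_new += lj(r2, eps)
        remaining -= 1
        # Remaining pairs contribute at least -eps each: lower-bound the energy change.
        if (u_new - remaining * eps) - u_old_i > du_max:
            return False, None                     # early rejection
    return (u_new - u_old_i) <= du_max, u_new
```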

Workshop on Computational Chemistry and Its Applications (CCA) Session 2

Time and Date: 16:30 - 18:10 on 10th June 2014

Room: Bluewater II

Chair: Ponnadurai Ramasami

21 A Computational Study of 2-Selenobarbituric Acid: Conformational Analysis, Enthalpy of Formation, Acidity and Basicity [abstract]
Abstract: A computational study of the compound containing selenium, 2-selenobarbituric acid, has been carried out. Tautomerism has been studied not only in the neutral forms but also in the protonated and deprotonated species. The most stable tautomers for neutral and deprotonated species are equivalent to those obtained by different authors for the analogous barbituric and 2-thiobarbituric acids. However, the most stable tautomer for the protonated 2-selenobarbituric acid differs from that proposed for the analogous compounds. The enthalpy of formation in the gas phase, and the gas-phase acidity and basicity of 2-selenobarbituric acid have been calculated at the G3 and G4 levels, together with the corresponding values for barbituric and 2-thiobarbituric acids. The calculated acidity shows that 2-selenobarbituric acid is a very strong Brønsted acid in the gas phase.
Rafael Notario
139 Origin of the Extra Stability of Alloxan: A Computational Study [abstract]
Abstract: Detailed DFT computations and quantum dynamics simulations have been carried out to establish the origin of the extra stability of alloxan. The effects of solvent, basis set and DFT method have been examined. Two non-covalent intermolecular dimers of alloxan, namely the H-bonded and the dipolar dimers, have been investigated to establish their relative stability. Quantum chemical topology features and NBO analyses have been performed.
Saadullah Aziz, Rifaat Hilal, Basmah Allehyani, Shabaan Elroby
303 The Impact of p-orbital on Optimization of ReH7(PMe3)2 Compound [abstract]
Abstract: This study investigates the importance of the p-function used in computational modeling. The ReH7(PMe3)2 system is used as the model compound and its geometric changes are examined. The 6-31G, 6-311G and 6-311++G basis sets were used for all elements except Re, for which the basis set of Christiansen et al. was used. Upon removing the p-function on the metal, we noticed that the geometric changes are minimal as long as triple-zeta basis sets are used for the rest of the elements. While the relative energy profiles of a reaction would still reasonably resemble each other, a direct comparison of energies between basis sets with and without the p-function is not recommended.
Nnenna Elechi, Daniel Tran, Mykala Taylor, Odaro Adu, Huajun Fan
60 Exploring the Conical Intersection Seam in Cytosine: A DFT and CASSCF Study [abstract]
Abstract: The geometry, energetics and dipole moment of the most stable conformers of cytosine in the ground state were calculated with different density functional methods, namely B3LYP, M06-2X, ωB97-D and PEBPEB, and the 6-311++G(3df,3pd) basis set. The most stable conformer, the keto-amino form, is only 1 kcal/mol more stable than the imino-enol form. The ultrafast radiationless decay mechanism has been investigated theoretically using complete active space multiconfiguration SCF (CASSCF) calculations. The conical intersection seam was searched in the full dimensional space of the vibrational degrees of freedom. A new conical intersection has been identified, a semi-planar conical intersection (SPCI) with the main deformations inside the cytosine ring and the C=O bond. The g-vector and h-vector for the semi-planar conical intersection were calculated and discussed along with their geometrical parameters. A classical trajectory dynamics simulation has been performed to characterize and identify the evolution of the geometry and energy changes of the SPCI with time.
Rifaat Hilal, Saadullah Aziz, Shabaan Elrouby, Walid Hassan

Architecture, Languages, Compilation and Hardware support for Emerging ManYcore systems (ALCHEMY) Session 1

Time and Date: 16:30 - 18:10 on 10th June 2014

Room: Bluewater I

Chair: Stéphane Louise

348 τC: C with Process Network Extensions for Embedded Manycores [abstract]
Abstract: Current and future embedded manycore targets bring complex and heterogeneous architectures with a large number of processing cores, making both parallel programming at this scale and understanding the architecture itself daunting tasks. Process Networks and other dataflow-based Models of Computation (MoC) are a good basis for presenting a universal model of the underlying manycore architecture to the programmer. If a language displays a simple-to-grasp MoC in a consistent way across architectures, the programmer can concentrate on optimizing the expression of parallelism in the application instead of porting a given code to a given system. Such a goal would provide the C-language equivalent for manycores. In this paper, we present a process network extension to C called τC and its mapping to both a POSIX target and the P2012/STHORM platform, and show how the language offers an architecture-independent solution to this problem.
Thierry Goubier, Damien Couroussé, Selma Azaiez
96 Application-Level Performance Optimization: A Computer Vision Case Study on STHORM [abstract]
Abstract: Computer vision applications constitute one of the key drivers for embedded many-core architectures. In order to exploit the full potential of such systems, a balance between computation and communication is critical, but many computer vision algorithms exhibit a highly data-dependent behavior that complicates this task. To enable application performance optimization, the development environment must provide the developer with tools for fast and precise application-level performance analysis. We describe the process of porting and optimizing a face detection application on the STHORM many-core accelerator using the STHORM OpenCL SDK. We identify the main factors that limit performance and discern the contributions arising from the application itself, the OpenCL programming model, and the STHORM OpenCL SDK. Finally, we show how these issues can be addressed in the future to enable developers to further improve application performance.
Vítor Schwambach, Sébastien Cleyet-Merle, Alain Issard, Stéphane Mancini
387 Generating Code and Memory Buffers to Reorganize Data on Many-core Architectures [abstract]
Abstract: The dataflow programming model has been shown to be a relevant approach to efficiently run massively parallel applications on many-core architectures. In this model, particular built-in agents are in charge of data reorganizations between user agents. Such agents can Split, Join and Duplicate data onto their communication ports; they are widely used in signal processing, for example. These system agents, and their associated implementations, are of major importance when it comes to performance, because they can stand on the critical path (think of Amdahl's law). Furthermore, a particular data reorganization can be expressed by the developer in several ways, some of which may lead to inefficient solutions (mostly unneeded data copies and transfers). In this paper, we propose several strategies to manage data reorganization at compile time, with a focus on indexed accesses to shared buffers to avoid data copies. These strategies are complementary: they ensure correctness for each system agent configuration, as well as performance when possible. They have been implemented within the Sigma-C industry-grade compilation toolchain and evaluated on the Kalray MPPA 256-core processor.
Loïc Cudennec, Paul Dubrulle, François Galea, Thierry Goubier, Renaud Sirdey
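A small illustration of the indexed-access idea (not the Sigma-C toolchain): a round-robin Split agent exposed as strided views into a shared buffer, so consumers read in place and no data is copied.

```python
import numpy as np

def split_views(buffer, n_consumers):
    """Consumer k sees every n-th element starting at offset k; each is a view."""
    return [buffer[k::n_consumers] for k in range(n_consumers)]

shared = np.arange(12)
for view in split_views(shared, 3):
    assert view.base is shared        # no copy: each output just indexes the buffer
```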
359 Self-Timed Periodic Scheduling For Cyclo-Static DataFlow Model [abstract]
Abstract: Real-time and time-constrained applications programmed on many-core systems can suffer from unmet timing constraints even with correct-by-construction schedules. Such unexpected results are usually caused by unaccounted-for delays due to resource sharing (e.g. the communication medium). In this paper we address the three main sources of unpredictable behaviors. First, we propose to use a deterministic Model of Computation (MoC), more specifically the well-formed CSDF subset of process networks. Second, we propose a run-time management strategy for shared resources to avoid unpredictable timings. Third, we promote the use of a new scheduling policy, the so-called Self-Timed Periodic (STP) scheduling, to improve performance and decrease synchronization costs by taking resource sharing and resource constraints into account. This is a quantitative improvement over state-of-the-art scheduling policies, which assumed fixed inter-processor communication delays and did not correctly account for the subtle effects of synchronization.
Dkhil Ep.Jemal Amira, Xuankhanh Do, Stephane Louise, Dubrulle Paul, Christine Rochange
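As a minimal illustration of the starting point for such schedules (not the paper's STP scheduler), the sketch below solves the balance equations of a single-phase dataflow graph to obtain its repetition vector; CSDF phase-dependent rates and the periodic timing itself are omitted, and the example graph is made up.

```python
from fractions import Fraction
from math import lcm

def repetition_vector(actors, edges):
    """edges: (src, dst, prod, cons). Assumes a connected, consistent graph."""
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:                                  # propagate rate ratios along edges
        changed = False
        for src, dst, prod, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
    scale = lcm(*(r.denominator for r in rate.values()))
    return {a: int(r * scale) for a, r in rate.items()}

print(repetition_vector(["A", "B", "C"],
                        [("A", "B", 2, 3), ("B", "C", 1, 2)]))  # {'A': 3, 'B': 2, 'C': 1}
```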