Main Track (MT) Session 1
Time and Date: 11:20 - 13:00 on 10th June 2014
Room: Kuranda
Chair: J. Betts
12 | SparseHC: a memory-efficient online hierarchical clustering algorithm [abstract] Abstract: Computing a hierarchical clustering of objects from a pairwise distance matrix is an important algorithmic kernel in computational science. Since the storage of this matrix requires quadratic space in terms of the number objects, the design of memory-efficient approaches is of high importance to research. In this paper, we address this problem by presenting a memory-efficient online hierarchical clustering algorithm called SparseHC. SparseHC scans a sorted (possibly sparse) distance matrix chunk-by-chunk and a dendrogram is built by merging cluster pairs as and when the distance between them is determined to be the smallest among all remaining cluster pairs. The key insight used is that for finding the cluster pair with the smallest distance, it is not necessary to wait for completing the computation of all cluster pair distances. Partial information can be used to determine a lower bound on cluster pair distances that are used for cluster distance comparison. Our experimental results show that SparseHC achieves a linear empirical memory complexity, which is a significant improvement compared to existing algorithms. |
Thuy Diem Nguyen, Bertil Schmidt, Chee Keong Kwoh |
131 | Tuning Basic Linear Algebra Routines for Hybrid CPU+GPU Platforms [abstract] Abstract: The introduction of auto-tuning techniques in linear algebra routines using hybrid combinations of multiple CPU and GPU computing resources is analyzed. Basic models of the execution time and} information obtained during the installation of the routines are used to optimize the execution time with a balanced assignation of the work to the computing components in the system. The study is carried out with a basic kernel (matrix-matrix multiplication) and a higher level routine (LU factorization) using GPUs and the host multicore processor. Satisfactory results are obtained, with experimental execution times close to the lowest experimentally achievable. |
Gregorio Bernabé, Javier Cuenca, Domingo Gimenez, Luis-Pedro García |
143 | A portable OpenCL Lattice Boltzmann code for multi- and many-core processor architectures [abstract] Abstract: The architecture of high performance computing systems is becoming more and more heterogeneous, as accelerators play an increasingly important role alongside traditional CPUs. Programming heterogeneous systems efficiently is a complex task, that often requires the use of specific programming environments. Programming frameworks supporting codes portable across different high performance architectures have recently appeared, but one must carefully assess the relative costs of portability versus computing efficiency, and find a reasonable tradeoff point. In this paper we address precisely this issue, using as test-bench a Lattice Boltzmann code implemented in OpenCL. We analyze its performance on several different state-of-the-art processors: NVIDIA GPUs and Intel Xeon-Phi many-core accelerators, as well as more traditional Ivy Bridge and Opteron multi-core commodity CPUs. We also compare with results obtained with codes specifically optimized for each of these systems. Our work shows that a properly structured OpenCL code runs on many different systems reaching performance levels close to those obtained by architecture-tuned CUDA or C codes. |
Enrico Calore, Sebastiano F. Schifano, Raffaele Tripiccione |
146 | Accelerating Solid-Fluid Interaction using Lattice-Boltzmann and Immersed Boundary Coupled
Simulations on Heterogeneous Platforms
[abstract] Abstract: We propose a numerical approach based on the Lattice-Boltzmann (LBM) and Immersed Boundary (IB) methods to tackle the problem of the interaction of solids with an incompressible fluid flow. The proposed method uses a Cartesian uniform grid that incorporates both the fluid and the solid domain. This is a very optimum and novel method to solve this problem and is a growing research topic in Computational Fluid Dynamics. We explain in detail the parallelization of the whole method on both GPUs and an heterogeneous GPU-Multicore platform and describe different optimizations, focusing on memory management and CPU-GPU communication. Our performance evaluation consists of a series of numerical experiments that simulate situations of industrial and research interest. Based on these tests, we have shown that the baseline LBM implementation achieves satisfactory results on GPUs. Unfortunately, when coupling LBM and IB methods on GPUs, the overheads of IB degrade the overall performance. As an alternative we have explored an heterogeneous implementation that is able to hide such overheads and allows us to exploit both Multicore and GPU resources in a cooperative way. |
Pedro Valero-Lara, Alfredo Pinelli, Manuel Prieto-Matías |
110 | Spatio-temporal Sequential Pattern Mining for Tourism Sciences [abstract] Abstract: Flickr presents an abundance of geotagged photos for data mining. Particularly, we propose the concept of extracting spatio-temporal meta data from Flickr photos, combining a collection of such photos together results in a spatio-temporal entity movement trail, a \textit{trajectory} describing an individual's movements. Using these spatio-temporal Flickr photographer trajectories we aim to extract valuable tourist information about where people are going, what time they are going there, and where they are likely to go next. In order to achieve this goal we present our novel spatio-temporal trajectory RoI mining and SPM framework. It is different from previous work because it forms RoIs taking into consideration both space and time simultaneously, and thus we reason producing higher-quality RoIs and thus by extension higher-quality sequential patterns too. We test our framework's ability to uncover interesting patterns for the tourism sciences industry by performing experiments using a large dataset of Queensland photo taker movements for the year 2012. Experimental results validate the usefulness of our approach at finding new, information rich spatio-temporal tourist patterns from this dataset, especially in comparison the 2D approaches shown in the literature. |
Bermingham Luke, Ickjai Lee |