ICCS 2018 Posters (POSTER) Session 1
Time and Date: 12:05 - 12:35 on 11th June 2018, 11:45 - 12:15 on 12th June 2018
Room: Coffee Break Area 1
Chair: None
7 | Efficient Characterization of Hidden Processor Memory Hierarchies [abstract] Abstract: A processor's memory hierarchy has a major impact on the performance of running code. However, computing platforms that hide the actual hardware characteristics from both the end user and the tools that mediate execution, such as compilers, JITs and runtime systems, are used more and more, for example when performing large-scale computation in clouds and clusters. Even worse, in such environments a single computation may use a collection of processors with dissimilar characteristics. Ignorance of the performance-critical parameters of the underlying system makes it difficult to improve performance by optimizing the code or adjusting runtime-system behavior; it also makes application performance harder to understand.
To address this problem, we have developed a suite of portable tools that can efficiently derive many of the parameters of processor memory hierarchies, such as the levels, effective capacities, and latencies of caches and TLBs, in a matter of seconds. The suite uses a series of carefully considered experiments to produce cache response curves, together with automatic tools that analyze those curves. The tools are inexpensive enough to be used in a variety of contexts, including install time, compile time, runtime adaptation, and performance-understanding tools. |
Keith Cooper and Xiaoran Xu |
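The abstract's core idea, analyzing a cache response curve to find level boundaries, can be sketched as follows. This is a simplified illustration, not the authors' actual tooling: the function names and the jump threshold are assumptions, and a real analysis must first denoise measured latencies.

```python
def detect_levels(sizes_kb, latencies, jump=1.5):
    """Given a cache response curve (working-set size in KB -> access
    latency), report the sizes just before latency jumps by more than
    `jump`x, i.e. the effective capacities of successive cache levels."""
    levels = []
    for i in range(1, len(latencies)):
        if latencies[i] > jump * latencies[i - 1]:
            levels.append(sizes_kb[i - 1])
    return levels

# Synthetic curve: plateaus at ~4, ~12 and ~40 cycles suggest two caches
# of roughly 64 KB and 256 KB effective capacity.
sizes = [16, 32, 64, 128, 256, 512]
lats = [4, 4, 4, 12, 12, 40]
print(detect_levels(sizes, lats))
```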
17 | Adaptive Time-Splitting Scheme for Nanoparticles Transport with Two-phase Flow in Heterogeneous Porous Media [abstract] Abstract: In this work, we introduce an efficient scheme using an adaptive time-splitting method to simulate the problem of nanoparticles transport with two-phase flow in heterogeneous porous media. The pressure and saturation equations are coupled through the capillary pressure, which is linearized in terms of saturation. An IMplicit Pressure Explicit Saturation-IMplicit Concentration (IMPES-IMC) scheme is used to solve the problem under consideration. The cell-centered finite difference (CCFD) method is used for the spatial discretization. The external time interval is divided into three levels: the first level is for the pressure, the second for the saturation, and the third for the concentrations. This method can reduce the computational cost arising from the implicit solution of the pressure and concentration equations as well as from the rapid changes in saturation and concentration. The time step-sizes for the saturation and concentration equations are adapted iteratively so as to satisfy the Courant-Friedrichs-Lewy (CFL < 1) condition. The results show the good performance of the scheme. Moreover, a numerical example of a highly heterogeneous porous medium is introduced, and the adaptive time step-sizes are shown in graphs for different cases, in addition to saturation and nanoparticle concentration contours. |
Mohamed El-Amin, Jisheng Kou and Shuyu Sun |
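The CFL-constrained adaptive step selection the abstract describes can be sketched generically. The target factor and names below are illustrative assumptions; the paper's scheme applies separate constraints to the saturation and concentration levels.

```python
def cfl_timestep(velocities, dx, cfl_target=0.9):
    """Largest explicit time step with max|v| * dt / dx <= cfl_target < 1.
    `velocities` are cell velocities on a uniform grid of spacing `dx`."""
    vmax = max(abs(v) for v in velocities)
    return cfl_target * dx / vmax
```

A splitting scheme would recompute this each outer (pressure) step, taking as many inner saturation/concentration sub-steps as fit in the outer interval.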
27 | Identifying Central Individuals in Organised Criminal Groups and Underground Marketplaces [abstract] Abstract: Traditional organised criminal groups are becoming more active in the cyber domain. They form online communities and use these as marketplaces for illegal materials, products and services, which drives the Crime as a Service business model. The challenge for law enforcement in investigating and disrupting the underground marketplaces is to know which individuals to focus effort on, because taking down a few high-impact individuals can have more effect on disrupting the criminal services provided. This paper presents our study on the performance of social network centrality measures for identifying important individuals in two networks. We focus our analysis on two distinctly different network structures: Enron and Nulled.IO. The first resembles an organised criminal group, while the latter is a more loosely structured hacker forum. Our results show that centrality measures favour individuals with more communication rather than individuals usually considered more important: organised crime leaders and cyber criminals who sell illegal materials, products and services. |
Jan William Johnsen and Katrin Franke |
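Two of the standard centrality measures such a study compares can be sketched on a plain adjacency-dict graph; this is a generic illustration with stdlib code, not the authors' experimental setup, and real work would use a library such as NetworkX.

```python
from collections import deque

def degree_centrality(adj):
    """Fraction of other nodes each node is connected to."""
    n = len(adj) - 1
    return {v: len(nbrs) / n for v, nbrs in adj.items()}

def closeness_centrality(adj):
    """(reachable nodes) / (sum of BFS distances) for each node."""
    result = {}
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        total = sum(dist.values())
        result[s] = (len(dist) - 1) / total if total else 0.0
    return result

# A star graph: the hub dominates both measures, illustrating how
# centrality favours high-communication individuals.
star = {'a': ['b', 'c', 'd'], 'b': ['a'], 'c': ['a'], 'd': ['a']}
print(degree_centrality(star)['a'], closeness_centrality(star)['b'])
```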
28 | Guiding the optimization of parallel codes on multicores using an analytical cache model [abstract] Abstract: Optimizers have to choose among different versions of a code and among different values for the optimization parameters. These decisions are not easy to take, as it is very difficult to predict their impact on performance given the complexity of current systems. A key factor in the performance of computing systems, which is particularly hard to predict and can be largely influenced by optimization decisions, is the behavior of the memory hierarchy. This is even more true in modern multicore processors, where the interactions of multiple threads in the shared caches further complicate the estimation of the performance of the memory subsystem. It is thus difficult to decide at compile time which optimization choice is the most cache-friendly. This paper presents an analytical model able to evaluate the cache performance of the whole cache hierarchy for parallel applications, taking as input their source code and the cache configuration. The model provides its predictions in less than one second, which makes it very suitable to drive compiler decisions. While the model does not tackle some advanced hardware features, it can help optimizers take reasonably good decisions in a very short time. This is supported by an evaluation based on two modern architectures and four different case studies, in which the model predictions differ on average by just 5.05% from the results of a detailed hardware simulator and correctly guide different optimization decisions. |
Diego Andrade, Basilio B. Fraguela and Ramón Doallo |
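The paper's model is analytical, but the baseline it is validated against, a trace-driven cache simulator, is easy to sketch. The following LRU set-associative miss counter is an illustrative stand-in with assumed parameter names, not the detailed simulator used in the evaluation.

```python
def count_misses(addresses, line_size=64, num_sets=64, ways=4):
    """Count misses of an LRU set-associative cache over a byte-address
    trace. Each set is kept as a list ordered from LRU to MRU."""
    sets = [[] for _ in range(num_sets)]
    misses = 0
    for a in addresses:
        line = a // line_size
        idx = line % num_sets
        s = sets[idx]
        if line in s:
            s.remove(line)
            s.append(line)          # refresh to MRU position
        else:
            misses += 1
            s.append(line)
            if len(s) > ways:
                s.pop(0)            # evict LRU line
    return misses
```

Alternating between two lines that map to different sets hits after the first touch, while forcing them into one direct-mapped set makes every access miss, the kind of conflict effect a cache-aware optimizer tries to avoid.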
36 | Topological street-network characterization through feature-vector and cluster analysis [abstract] Abstract: Complex networks provide a means to describe cities through their street mesh, expressing characteristics that refer to the structure and organization of an urban zone. Although other studies have used complex networks to model street meshes, we observed a lack of methods to characterize the relationship between cities by using their topological features. Accordingly, this paper aims to describe interactions between cities by using vectors of topological features extracted from their street meshes represented as complex networks. The methodology of this study is based on the use of digital maps. Over the computational representation of such maps, we extract global complex-network features that embody the characteristics of the cities. These vectors allow for the use of multidimensional projection and clustering techniques, enabling a similarity-based comparison of the street meshes. We experiment with 645 cities from the Brazilian state of Sao Paulo. Our results show how the combination of global features describes urban indicators that are deep-rooted in the network's topology and how they reveal characteristics and similarities among sets of cities that are geographically separated from each other. |
Gabriel Spadon, Gabriel Gimenes and Jose Fernando Rodrigues Jr. |
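The feature-vector comparison the abstract describes can be sketched minimally: extract a few global features from each street graph and compare the vectors by Euclidean distance. The two features below are illustrative assumptions; the paper extracts a much richer set of complex-network measures.

```python
import math

def feature_vector(adj):
    """Tiny global feature vector for an undirected street graph given
    as an adjacency dict: [mean degree, edge density]."""
    n = len(adj)
    degs = [len(nbrs) for nbrs in adj.values()]
    mean_deg = sum(degs) / n
    density = sum(degs) / (n * (n - 1))   # = 2m / (n * (n - 1))
    return [mean_deg, density]

def city_distance(adj_a, adj_b):
    """Euclidean distance between two cities' feature vectors."""
    fa, fb = feature_vector(adj_a), feature_vector(adj_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))
```

With vectors like these, the clustering and multidimensional-projection steps reduce to standard techniques over a distance matrix.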
46 | LDA-Based Scoring of Sequences Generated by RNN for Automatic Tanka Composition [abstract] Abstract: This paper proposes a method for scoring sequences generated by a recurrent neural network (RNN) for automatic Tanka composition. Our method scores sequences based on topic assignments provided by latent Dirichlet allocation (LDA): when many word tokens in a sequence are assigned to the same topic, we give the sequence a high score. While sequences can also be scored using RNN output probabilities, the sequences with the largest probabilities tend to share many of the same subsequences and thus lack diversity. The experimental results, in which we scored Japanese Tanka poems generated by an RNN, show that the top-ranked sequences selected by our method contained a wider variety of subsequences than those selected by RNN output probabilities. |
Tomonari Masada and Atsuhiro Takasu |
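The scoring criterion, "many tokens assigned to the same topic scores high", admits a one-function sketch. Taking the majority-topic fraction is an assumed concrete scoring rule for illustration; the paper defines its own score over LDA topic assignments.

```python
from collections import Counter

def topic_concentration(topic_assignments):
    """Score a generated sequence by how concentrated its tokens' LDA
    topic assignments are: fraction of tokens in the majority topic."""
    counts = Counter(topic_assignments)
    return max(counts.values()) / len(topic_assignments)

# A sequence whose tokens mostly share one topic outranks a scattered one.
print(topic_concentration([1, 1, 1, 2]), topic_concentration([0, 1, 2, 3]))
```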
52 | Computing Simulation of Interactions between α+β Protein and Janus Nanoparticle [abstract] Abstract: Janus nanoparticles have surfaces with two or more distinct physical properties, allowing different types of chemical properties to occur on the same particle and thus making possible many unique applications. It is necessary to investigate the interaction between proteins and Janus nanoparticles (NPs), which are two typical building blocks for making bio-nano-objects. Here we computed the phase diagrams for an α+β protein (GB1) and a Janus NP using a coarse-grained model and molecular dynamics simulations, and studied how the secondary structures of the protein, the binding interface and the kinetics are affected by the nearby NP. Two phases were identified for the system. In the folded phase, the formation of β-sheets is always enhanced by the presence of NPs, while the formation of α-helices is not sensitive to NPs. The underlying mechanism of this phenomenon was attributed to the geometry and flexibility of the β-sheets. The knowledge gained in this study is useful for understanding the interactions between proteins and Janus NPs, which may facilitate the design of new bio-nanomaterials or devices. |
Xinlu Guo, Xiaofeng Zhao, Shuguang Fang, Yunqiang Bian and Wenbin Kang |
58 | A modified bandwidth reduction heuristic based on the WBRA and George-Liu algorithm [abstract] Abstract: This paper presents a modified heuristic based on the Wonder Bandwidth Reduction Algorithm with starting vertex given by the George-Liu algorithm. The results are obtained on a dataset of instances taken from the SuiteSparse matrix collection when solving linear systems using the zero-fill incomplete Cholesky-preconditioned conjugate gradient method. The numerical results show that the improved vertex labeling heuristic compares very favorably in terms of efficiency and performance with the well-known GPS algorithm for bandwidth and profile reductions. |
Sanderson L. Gonzaga de Oliveira, Guilherme O. Chagas, Diogo T. Robaina, Diego N. Brandão and Mauricio Kischinhevsky |
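The quantities such heuristics minimize, matrix bandwidth and profile under a vertex labeling, can be computed directly; the sketch below is a generic illustration of why the labeling matters, not the WBRA or George-Liu algorithm itself.

```python
def bandwidth_and_profile(adj, order):
    """Bandwidth and profile of the symmetric matrix whose nonzero
    pattern is the graph `adj`, under the vertex labeling `order`."""
    pos = {v: i for i, v in enumerate(order)}
    bandwidth = 0
    profile = 0
    for v in adj:
        # leftmost column with a nonzero in row pos[v]
        left = min([pos[u] for u in adj[v]] + [pos[v]])
        bandwidth = max(bandwidth, pos[v] - left)
        profile += pos[v] - left
    return bandwidth, profile

# For the path a-b-c, the natural order gives bandwidth 1; putting the
# middle vertex first widens the band -- the choice a reordering
# heuristic's starting vertex influences.
path = {'a': ['b'], 'b': ['a', 'c'], 'c': ['b']}
print(bandwidth_and_profile(path, ['a', 'b', 'c']))
print(bandwidth_and_profile(path, ['b', 'a', 'c']))
```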
63 | Fast PIES In Solving 2D Boundary Value Problems Described By Laplace’s Equation [abstract] Abstract: In the process of solving large-scale boundary value problems (BVPs), we want to obtain high accuracy of the results simultaneously with high speed of solution and low occupancy of computer memory (RAM). It is very difficult to satisfy all of these requirements. Numerical solution of BVPs requires computations on large matrices, and only the application of appropriate models and algorithms gives accurate results. In our previous papers we applied the parametric integral equations system (PIES) in modelling and solving BVPs to satisfy the first requirement: the results were obtained with very high accuracy. This paper presents a novel approach to accelerating PIES in order to fulfil the other requirements. For this purpose we include the fast multipole method (FMM) in the conventional PIES, thereby obtaining the fast PIES. |
Andrzej Kużelewski, Eugeniusz Zieniuk and Marta Kapturczak |
72 | Improving Large-scale Fingerprint-based Queries in Distributed Memory-Computing Infrastructure [abstract] Abstract: Fingerprints are usually used in sketching mechanisms, which map elements into a concise and representative synopsis using small space. Large-scale fingerprint-based queries can be used as an important tool in big data analytics applications, such as set membership queries, rank-based queries and correlation queries. In this paper, we propose an efficient distributed memory-computing framework to improve the performance of large-scale fingerprint-based queries. The framework maintains the fingerprint structures distributively at local sites to improve the scalability of applications processing large-scale datasets. At an early stage of a query, we first transform the fingerprint sketch into a space-constrained, global rank-based sketch at the query site by collecting minimal information from the local sites. The time-consuming operations, such as local fingerprint construction and searching, are pushed down into the local sites. The proposed framework can construct large-scale and scalable fingerprints efficiently, and it can also supervise continuous queries by utilizing the global sketch and run an appropriate number of jobs over distributed computing environments. We implement our approach in Spark and evaluate its performance over real-world datasets. Compared with native SparkSQL, our approach outperforms the native routines on query response time by four orders of magnitude. |
Shupeng Wang, Guangjun Wu, Binbin Li, Ge Fu and Chao Li |
73 | A Truth Discovery Algorithm with Multi-Source Sparse Data [abstract] Abstract: The problem of finding the truth among inconsistent information is known as Truth Discovery. The essence of truth discovery is to estimate source quality, so the mechanism for measuring a data source immensely affects the result and process of truth discovery. However, state-of-the-art algorithms do not consider how source quality is affected when a source provides null. In this paper, we propose using the Silent Rate, True Rate and False Rate to measure source quality. In addition, we utilize a probabilistic graphical model to model truth and source quality, which is measured through both null and real data. Our model makes full use of all claims, including nulls, to improve the accuracy of truth discovery. Compared with prevalent approaches, the effectiveness of our approach is verified on three real datasets, and the recall improves significantly. |
Jiyuan Zhang, Shupeng Wang and Guangjun Wu |
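The three rates the abstract proposes can be sketched directly once ground truth is fixed. The dictionary-based representation and treating a missing claim as silence are assumptions made for illustration, not the paper's exact formulation.

```python
def source_rates(claims, truths):
    """Silent/True/False Rates of one source over a set of objects.
    claims: {object: claimed value, or None for silence}
    truths: {object: ground-truth value}"""
    n = len(truths)
    silent = sum(1 for o in truths if claims.get(o) is None)
    correct = sum(1 for o in truths if claims.get(o) == truths[o])
    wrong = n - silent - correct
    return silent / n, correct / n, wrong / n
```

A truth-discovery iteration would alternate between estimating these rates per source and re-estimating the truths from the rate-weighted claims.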
77 | Blackboard Meets Dijkstra for Resource Allocation Optimization [abstract] Abstract: This paper presents the integration of Dijkstra's algorithm into a Blackboard framework to optimize the selection of web resources from service providers. In addition, methods are presented for handling dynamic changes during workflow execution; specifically, how changes of the service parameters affect the system. The architectural framework of the implementation of the proposed Blackboard approach and its components in a real-life scenario is laid out. To justify the approach and show its practical feasibility, a sample implementation is presented. |
Christian Vorhemus and Erich Schikuta |
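Dijkstra's algorithm, the optimization core named in the title, can be sketched compactly with a priority queue. This is the textbook algorithm over an assumed edge-list graph representation, not the paper's Blackboard integration.

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from `source` in a non-negatively weighted
    digraph given as {node: [(neighbor, weight), ...]}."""
    dist = {source: 0}
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue                      # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist
```

In the resource-allocation setting, nodes would be candidate service providers and edge weights their cost parameters; a Blackboard re-runs the search when those parameters change mid-workflow.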
79 | Augmented Self-paced Learning with Generative Adversarial Networks [abstract] Abstract: Learning with very limited training data is a challenging but typical scenario in machine learning applications. To achieve a robust learning model, on one hand the instructive labeled instances should be fully leveraged; on the other hand, extra data sources need to be further explored. This paper aims to develop an effective learning framework for robust modeling by naturally combining two promising advanced techniques: generative adversarial networks and self-paced learning. To be specific, we present a novel augmented self-paced learning with generative adversarial networks (ASPL-GANs), which consists of three component modules: a generator G, a discriminator D, and a self-paced learner S. Via competition between G and D, realistic synthetic instances with specific class labels are generated. Receiving both real and synthetic instances as training data, classifier S simulates the learning process of humans in a self-paced fashion and gradually proceeds from easy to complex instances in training. The three components are maintained in a unified framework and optimized jointly via alternating iteration. Experimental results validate the effectiveness of the proposed algorithm in classification tasks. |
Xiao-Yu Zhang, Shupeng Wang, Yanfei Lv, Peng Li and Haiping Wang |
90 | Benchmarking Parallel Chess Search in Stockfish on Intel Xeon and Intel Xeon Phi Processors [abstract] Abstract: The paper presents results from benchmarking the parallel multithreaded Stockfish chess engine on selected multi- and many-core processors. It is shown how the strength of play for an n-thread version compares to the 1-thread version on both Intel Xeon and the latest Intel Xeon Phi x200 processors. Results such as the numbers of wins, losses and draws are presented, along with how these change for growing numbers of threads. The impact of using particular cores on the Intel Xeon Phi is shown. Finally, the strengths of play for the tested computing devices are compared. |
Pawel Czarnul |
93 | Leveraging Uncertainty Analysis of Data to Evaluate User Influence Algorithms of Social Networks [abstract] Abstract: Identifying highly influential users in social networks is critical in various practices, such as advertisement, information recommendation, and surveillance of public opinion. According to recent studies, different existing user influence algorithms generally produce different results, and there are no effective metrics to evaluate the representation abilities and performance of these algorithms on the same dataset. Therefore, the results of these algorithms cannot be accurately evaluated and their limits cannot be effectively observed. In this paper, we propose an uncertainty-based Kalman filter method for predicting optimal user influence results. Simultaneously, we develop novel evaluation metrics, improving the maximum correntropy and normalized discounted cumulative gain (NDCG) criteria, to measure the effectiveness of user influence and the level of uncertainty fluctuation intervals of these algorithms. Experimental results validate the effectiveness of the proposed algorithm and evaluation metrics on different datasets. |
Jianjun Wu, Ying Sha, Rui Li, Jianlong Tan and Bin Wang |
102 | E-Zone: A faster Neighbor Point Query Algorithm For Matching Spacial Objects [abstract] Abstract: The latest astronomy projects observe spatial objects with astronomical cameras that generate images continuously. To identify transient objects, the positions of these objects on the images need to be compared against a reference table for the same portion of the sky, a complex search task called cross match. We designed Euclidean-Zone (E-Zone), a method for faster neighbor point queries which allows efficient cross match between spatial catalogs. The restriction of prevalent methods is the equatorial coordinate system they are based on, which takes angles as the distance unit and cannot avoid large numbers of mathematical function evaluations that consume substantial computational resources. Consequently, in this paper we implemented the E-Zone algorithm based on the traditional Zone algorithm, which utilizes the Euclidean distance between celestial objects in pixel coordinates to avoid these complex mathematical functions. Meanwhile, we surveyed the parameters of our model and other system factors to find optimal configurations of the algorithm. In addition to the sequential algorithm, we also modified the serial program and implemented an OpenMP-parallelized version. The serial version of our algorithm achieved a speedup of 2.07 times over using the equatorial coordinate system, reaching 19 ms for serial queries and 5 ms for parallel queries over 200,000 objects on a single CPU processor against a 230,520-object synthetic reference template database. |
Xiaobin Ma, Zhihui Du, Yankui Sun, Yuan Bai, Suping Wu, Yang Xu, Chao Wu and Jianyan Wei |
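The zone idea, bucketing points into grid cells so a radius query only touches nearby cells and plain Euclidean arithmetic, can be sketched in a few lines. The cell-hashing scheme and names below are illustrative assumptions, not the paper's exact E-Zone data layout.

```python
import math
from collections import defaultdict

def build_zones(points, cell):
    """Hash 2D pixel-coordinate points into square zones of side `cell`."""
    zones = defaultdict(list)
    for x, y in points:
        zones[(int(x // cell), int(y // cell))].append((x, y))
    return zones

def neighbor_query(zones, cell, q, r):
    """All points within Euclidean distance r of q, scanning only the
    zones that can intersect the query circle."""
    cx, cy = int(q[0] // cell), int(q[1] // cell)
    span = int(math.ceil(r / cell))
    hits = []
    for dx in range(-span, span + 1):
        for dy in range(-span, span + 1):
            for x, y in zones.get((cx + dx, cy + dy), []):
                if (x - q[0]) ** 2 + (y - q[1]) ** 2 <= r * r:
                    hits.append((x, y))
    return hits
```

Because membership is decided by a squared-distance comparison, no trigonometric functions are needed, which is the source of the speedup over angle-based equatorial matching.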
117 | Application of Algorithmic Differentiation for Exact Jacobians to the Universal Laminar Flame Solver [abstract] Abstract: We introduce algorithmic differentiation (AD) to the C++ Universal Laminar Flame (ULF) solver code. ULF is used for solving generic laminar flame configurations in the field of combustion engineering. We describe in detail the required code changes based on the operator-overloading AD tool CoDiPack. In particular, we introduce a global alias for the scalar type in ULF and generic data structures using templates. To interface with external solvers, template-based functions that handle data conversion and type casts through specialization for the AD type are introduced. The differentiated ULF code is numerically verified and its performance is measured by solving two canonical models in the field of chemically reacting flows: a homogeneous reactor and a freely propagating flame. The models' stiff sets of equations are solved with Newton's method. The required Jacobians, calculated with AD, are compared with the existing finite differences (FD) implementation. We observe improvements of AD over FD. The resulting code is more modular, can easily be adapted to new chemistry and transport models, and enables future sensitivity studies for arbitrary model parameters. |
Alexander Hück, Sebastian Kreutzer, Danny Messig, Arne Scholtissek, Christian Bischof and Christian Hasse |
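The principle behind operator-overloading AD tools like CoDiPack, replacing the scalar type so arithmetic propagates derivatives exactly, can be sketched with a minimal forward-mode dual number. This toy class illustrates the mechanism only; CoDiPack itself is a C++ template library with both forward and reverse modes.

```python
class Dual:
    """Forward-mode AD scalar carrying a value and a derivative."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u v' + u' v
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)
    __rmul__ = __mul__

def derivative(f, x):
    """Exact df/dx at x, by seeding the derivative slot with 1."""
    return f(Dual(x, 1.0)).dot
```

Unlike finite differences, the result carries no truncation error, which is why AD Jacobians are attractive for the stiff Newton solves mentioned above.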
124 | Morph Resolution Based on Autoencoders Combined with Effective Context Information [abstract] Abstract: In social networks, people often create morphs, a special type of fake alternative name, for avoiding internet censorship or other purposes. Resolving these morphs to the entities they really refer to is very important for natural language processing tasks. Although some methods have been proposed, they do not use the context information of morphs or target entities effectively; they use only the information of words neighboring the morphs or target entities. In this paper, we propose a new approach to resolving morphs based on autoencoders combined with effective context information. First, in order to represent the semantic meanings of morphs or target candidates more precisely, we propose a method to extract effective context information. Next, by integrating morphs or target candidates and their effective context information into autoencoders, we obtain embedding representations of morphs and target candidates. Finally, we rank target candidates based on a similarity measurement between the semantic meanings of morphs and target candidates. Our method needs little annotated data, and experimental results demonstrate that it can significantly outperform state-of-the-art methods. |
Jirong You, Ying Sha, Qi Liang and Bin Wang |
128 | Old Habits Die Hard: Fingerprinting Websites On The Cloud [abstract] Abstract: To detect malicious websites on the cloud, where a variety of network traffic is mixed together, a precise detection method is needed. Such a method ought to classify websites over composite network traffic and fit practical constraints such as the unidirectional flows seen at ISP gateways. In this work, we investigate website fingerprinting methods and propose a novel model to classify websites on the cloud. The proposed model can recognize websites from traffic collected in a multi-tab setting and performs better than the state-of-the-art method. Furthermore, the method maintains excellent performance on unidirectional flows and real-world traffic by utilizing features extracted only from the request side. |
Xudong Zeng, Cuicui Kang, Junzheng Shi, Zhen Li and Gang Xiong |
132 | Deep Streaming Graph Representations [abstract] Abstract: Learning graph representations generally means mapping the vertices of a graph into a low-dimensional space in which the proximity of the original data is preserved. However, traditional methods based on the adjacency matrix suffer from high computational cost when encountering large graphs. In this paper, we propose a deep autoencoder-driven streaming method to learn low-dimensional representations for graphs. The proposed method processes the graph as a data stream, using a sampling strategy to avoid direct computation over the large adjacency matrix. Moreover, a graph-regularized deep autoencoder is employed in the model to preserve different aspects of proximity information. The regularized framework improves the representation power of the learned features during the learning process. We evaluate our method on a clustering task using the features learned by our model. Experiments show that the proposed method achieves competitive results compared with methods that directly apply deep models over the complete graphs. |
Minglong Lei, Yong Shi, Peijia Li and Lingfeng Niu |
141 | Adversarial Reinforcement Learning for Chinese Text Summarization [abstract] Abstract: This paper proposes a novel adversarial reinforcement learning architecture for Chinese text summarization. Previous abstractive methods commonly use Maximum Likelihood Estimation (MLE) to optimize the generative models, which makes auto-generated summaries incoherent and inaccurate. To address this problem, we innovatively apply an adversarial reinforcement learning strategy to narrow the gap between the generated summary and the human summary. In our model, we use a generator to generate summaries, a discriminator to distinguish between generated summaries and real ones, and a reinforcement learning (RL) strategy to iteratively evolve the generator. Besides, in order to better tackle Chinese text summarization, we use a character-level model rather than a word-level one, and append Text-Attention in the generator. Experiments were run on two Chinese corpora, consisting of long documents and short texts, respectively. Experimental results showed that our model significantly outperforms previous deep learning models on ROUGE score. |
Hao Xu, Yanan Cao, Yanmin Shang, Yanbing Liu, Jianlong Tan and Li Guo |
143 | Column Concept Determination for Chinese Web Tables via Convolutional Neural Network [abstract] Abstract: Hundreds of millions of tables on the Internet contain a considerable wealth of high-quality relational data. However, web tables tend to lack explicit key semantic information. Therefore, information extraction from tables is usually supplemented by recovering the semantics of the tables, in which column concept determination is an important issue. In this paper, we focus on column concept determination in Chinese web tables. Different from previous research works, a convolutional neural network (CNN) is applied to this task. The main contributions of our work lie in three aspects: firstly, datasets were constructed automatically based on the infoboxes in Baidu Encyclopedia; secondly, to determine the column concepts, a CNN classifier was trained to annotate cells in tables and majority voting was used on the columns to exclude incorrect annotations; thirdly, to verify the effectiveness, we applied the method to a real tabular dataset. Experimental results show that the proposed method outperforms the baseline methods and achieves an average accuracy of 97% for column concept determination. |
Jie Xie, Cong Cao, Yanbing Liu, Yanan Cao, Baoke Li and Jianlong Tan |
154 | Service-oriented approach for Internet of Things [abstract] Abstract: The new era of industrial automation has developed and been implemented quickly, and it is impacting different areas of society. Especially in recent years much progress has been made in this area, leading some people to talk about the fourth industrial revolution. Every day factories are more connected, able to communicate and interact in real time between industrial systems. There is a need for flexibility on the shop floor to promote higher customization of products in a short life cycle, and service-oriented architecture is a good option to materialize this. This chapter discusses the challenges of this new revolution, also known as Industry 4.0, addressing the introduction of modern communication and computing technologies to maximize interoperability across all the different existing systems. Moreover, it covers technologies that support this new industrial revolution and discusses impacts, possibilities, needs and adaptation. |
Eduardo Moraes |
160 | Adversarial Framework for General Image Inpainting [abstract] Abstract: We present a novel adversarial framework to solve the arbitrarily sized image random inpainting problem, where a pair of convolutional generator and discriminator networks is trained jointly to fill relatively large but randomly placed “holes”. The generator is a symmetric encoder-decoder, shaped like an hourglass but with added skip connections. The skip connections act as information shortcuts that transfer necessary details otherwise discarded by the “bottleneck” layer. Our discriminator is trained to distinguish whether an image is natural or not and to find the hidden holes in a reconstructed image. A combination of a standard pixel-wise L2 loss and an adversarial loss is used to guide the generator to preserve the known part of the original image and fill the missing part with plausible results. Our experiment is conducted on over 1.24M images with a uniformly random 25% missing part. We found the generator is good at capturing structural context and performs well on arbitrarily sized images without complex texture. |
Wei Huang and Hongliang Yu |
169 | A Stochastic Model to Simulate the Spread of Leprosy in Juiz de Fora [abstract] Abstract: Leprosy, also known as Hansen's disease, is an infectious disease whose main etiological agent is Mycobacterium leprae. The disease mainly affects the skin and peripheral nerves and can cause physical disabilities. For this reason, it represents a global public health concern, especially in Brazil, where more than twenty-five thousand new cases were reported in 2016. This work aims to simulate the spread of leprosy in a Brazilian city, Juiz de Fora, using the SIR model and considering some of its pathological aspects. SIR models divide the studied population into compartments in relation to the disease, in which the S, I and R compartments refer to the groups of susceptible, infected and recovered individuals, respectively. The model was solved computationally by a stochastic approach using the Gillespie algorithm. The results obtained by the model were then validated using the public health records database of Juiz de Fora. |
Vinícius Clemente Varella, Aline Mota Freitas Matos, Henrique Couto Teixeira, Angélica Da Conceição Oliveira Coelho, Rodrigo Santos and Marcelo Lobosco |
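The stochastic SIR solution via the Gillespie algorithm can be sketched compactly: draw exponential waiting times from the total event rate, then choose infection or recovery in proportion to their rates. The rate parameters below are illustrative placeholders, not the values fitted to the Juiz de Fora data.

```python
import random

def gillespie_sir(S, I, R, beta, gamma, t_max, seed=0):
    """Exact stochastic simulation of the SIR model.
    Infection rate: beta*S*I/N; recovery rate: gamma*I."""
    rng = random.Random(seed)
    N = S + I + R
    t = 0.0
    history = [(t, S, I, R)]
    while I > 0 and t < t_max:
        rate_inf = beta * S * I / N
        rate_rec = gamma * I
        total = rate_inf + rate_rec
        t += rng.expovariate(total)          # time to next event
        if rng.random() < rate_inf / total:  # infection event
            S -= 1; I += 1
        else:                                # recovery event
            I -= 1; R += 1
        history.append((t, S, I, R))
    return history
```

Each run is one stochastic trajectory; validation against case records would average many runs.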
173 | Data Fault Identification and Repair Method of Traffic Detector [abstract] Abstract: In the analysis of traffic big data, data quality control and judgment are prerequisites for subsequent applications. In order to improve the identification of traffic detector fault data, this paper combines wavelet packet energy analysis and PCA (principal component analysis) to achieve traffic detector data fault identification. On the basis of traditional multi-scale principal component analysis, wavelet packet multi-scale decomposition is used to obtain detailed information, principal component analysis models are established on the different scale matrices, and fault data are separated by the wavelet packet energy difference. The correlation coefficient is calculated according to the temporal characteristics and spatial correlation of the detector data, and the true value of the fault data is estimated. Case analyses verify the feasibility of the traffic flow data identification and correction methods. The results show that the method proposed in this paper effectively identifies and repairs traffic fault data. |
Xiaolu Li, Jiaxu Chen, Xinming Yu, Xi Zhang, Fangshu Lei, Peng Zhang and Guangyu Zhu |
176 | The valuation of CCIRS with a new design [abstract] Abstract: This work presents a study of pricing a credit derivative -- a credit contingent interest rate swap (CCIRS) -- with a new design, which allows some premium to be paid later if the default event does not happen. This feature makes the contract more flexible and supplies cash liquidity to the buyer, so that the contract is more attractive. Within the framework of the reduced-form model, we provide a pricing model in which the default intensity is related to the interest rate, which follows a Cox-Ingersoll-Ross (CIR) process. With a semi-closed-form solution, numerical results and parameter analyses are carried out. In particular, we discuss the trigger point for the proportion of the later possible payment that yields a zero initial premium. |
Huaying Guo and Jin Liang |
187 | Method of Node Importance Measurement in Urban Road Network [abstract] Abstract: Node importance measurement plays an important role in analyzing the reliability of an urban road network. In this paper, the topological structure, geographic information and traffic flow characteristics of the urban road network are all considered, and methods of node importance measurement are proposed from different perspectives, based on a spatially weighted degree model and on the h-index. Experiments show the efficiency and practicability of the proposed methods. |
Danqi Liu, Jialin Wang, Xiaolu Li, Xinming Yu, Kang Song, Xi Zhang, Fangshu Lei, Peng Zhang and Guangyu Zhu |
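The two ingredients named in the abstract - a spatially weighted degree and an h-index over a node's neighbourhood (the "lobby index") - can be sketched on a toy road graph. The combination rule, edge weights, and the adjacency format below are illustrative assumptions, not the authors' model.

```python
def h_index(values):
    """Largest h such that at least h of the values are >= h."""
    h = 0
    for i, v in enumerate(sorted(values, reverse=True), start=1):
        if v >= i:
            h = i
        else:
            break
    return h

def node_scores(adj, weight):
    """For every node, pair a spatially weighted degree (sum of edge
    weights, e.g. road capacity) with the h-index of neighbour degrees."""
    degree = {u: len(vs) for u, vs in adj.items()}
    scores = {}
    for u, vs in adj.items():
        w_deg = sum(weight[(u, v)] for v in vs)    # spatially weighted degree
        lobby = h_index([degree[v] for v in vs])   # h-index over neighbour degrees
        scores[u] = (w_deg, lobby)
    return scores

# toy network: intersections a-d, weights standing in for road importance
adj = {"a": ["b", "c", "d"], "b": ["a"], "c": ["a", "d"], "d": ["a", "c"]}
weight = {("a", "b"): 2.0, ("a", "c"): 1.0, ("a", "d"): 1.5,
          ("b", "a"): 2.0, ("c", "a"): 1.0, ("c", "d"): 0.5,
          ("d", "a"): 1.5, ("d", "c"): 0.5}
scores = node_scores(adj, weight)
print(scores["a"])   # junction "a" dominates on both measures
```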
190 | AdaBoost-LSTM Ensemble Learning for Financial Time Series Forecasting [abstract] Abstract: A hybrid ensemble learning approach combining the AdaBoost algorithm and the Long Short-Term Memory (LSTM) network is proposed to forecast financial time series. Firstly, the AdaBoost algorithm is used to train the database and obtain the training samples. Secondly, an LSTM is used to forecast each training sample separately. Thirdly, the AdaBoost algorithm integrates the forecasting results of all the LSTM predictors to generate the ensemble result. Two major daily exchange rate datasets and two stock market index datasets are selected for model evaluation and comparison. The empirical results demonstrate that the proposed AdaBoost-LSTM ensemble learning approach outperforms other single forecasting models and ensemble learning approaches. This suggests that the AdaBoost-LSTM ensemble learning approach is highly promising for financial time series forecasting, especially for time series with nonlinearity and irregularity, such as exchange rates and stock indexes. |
Shaolong Sun, Yunjie Wei and Shouyang Wang |
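The error-weighted combination step of such an ensemble can be shown with a toy stand-in. The sketch below replaces the trained LSTM predictors with simple lag-k persistence forecasters and weights them by inverse in-sample error, an AdaBoost.R2-flavoured rule; the base learners, the weighting formula, and all names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def adaboost_style_forecast(series, lags=(1, 2, 3), eps=1e-9):
    """One-step forecast from lag-k 'persistence' base predictors
    (predict x[t] = x[t-k]), combined with weights inversely related
    to each predictor's in-sample squared error."""
    series = np.asarray(series, dtype=float)
    weights, preds = [], []
    for k in lags:
        err = np.mean((series[k:] - series[:-k]) ** 2)  # in-sample error of lag-k
        weights.append(1.0 / (err + eps))
        preds.append(series[-k])                        # its forecast for the next step
    w = np.array(weights) / sum(weights)
    return float(np.dot(w, preds))

# smooth upward trend: the lag-1 predictor is best and dominates the ensemble
forecast = adaboost_style_forecast([1.0, 1.1, 1.2, 1.3, 1.4, 1.5])
print(round(forecast, 3))   # close to the last observation, biased upward forecasters
```

The point of the weighting is visible here: the forecast stays near the best base predictor's output instead of the plain average of all three.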
199 | Study on an N-parallel FENE-P constitutive model based on multiple relaxation times for viscoelastic fluid [abstract] Abstract: An N-parallel FENE-P constitutive model based on multiple relaxation times is put forward in this paper, aiming to describe the apparent viscosity of viscoelastic fluids accurately. The construction of the N-parallel FENE-P constitutive model and the numerical approach for calculating the apparent viscosity by solving the proposed model are presented in detail. To validate its performance, the proposed model is compared with the conventional FENE-P constitutive model (which has a single relaxation time) in estimating the apparent viscosity of two common viscoelastic fluids: polymer and surfactant solutions. The comparative results indicate that the N-parallel FENE-P model represents the apparent viscosity of polymer solutions more accurately than the traditional model over the whole range of shear rates (0.1 s^-1 ~ 1000 s^-1), and the advantage is more noteworthy at higher shear rates (10 s^-1 ~ 1000 s^-1). Although neither the proposed model nor the traditional one can describe the interesting shear-thickening behavior of surfactant solutions, the proposed constitutive model still has an advantage in depicting the apparent viscosity and first normal stress difference. In addition, the N-parallel FENE-P constitutive model demonstrates better applicability as well as favorable adjustability of its parameters. |
Jingfa Li, Bo Yu, Shuyu Sun and Dongliang Sun |
206 | RADIC based Fault Tolerance System with Dynamic Resource Controller [abstract] Abstract: The continuously growing requirements of High-Performance Computing increase the number of components and, at the same time, failure probabilities. Long-running parallel applications are directly affected by this phenomenon, as their executions are disrupted when failures occur. MPI, a well-known standard for parallel applications, follows fail-stop semantics, requiring application owners to restart the whole execution when hard failures appear, losing time and computed data.
Fault Tolerance (FT) techniques approach this issue by providing high availability to users' application executions, though at significant resource and time costs.
In this paper, we present a Fault Tolerance Manager (FTM) framework based on the RADIC architecture, which provides FT protection to parallel applications implemented with MPI, allowing executions to complete successfully despite failures.
The solution is implemented in the application layer following uncoordinated and semi-coordinated rollback recovery protocols. It uses a sender-based message logger to store messages exchanged between the application processes, and checkpoints only the process data required to restart the processes in case of failure. The solution uses the concepts of ULFM for failure detection and recovery. Furthermore, a dynamic resource controller is added to the proposal, which monitors the message logger buffers and performs actions to maintain an acceptable level of protection.
Experimental validation verifies the FTM functionality using two private cluster infrastructures. |
Jorge Villamayor, Dolores Rexachs and Emilo Luque |
209 | Effective Learning with Joint Discriminative and Representative Feature Selection [abstract] Abstract: Feature selection plays an important role in various machine learning tasks such as classification. In this paper, we focus on both the discriminative and representative abilities of features, and propose a novel feature selection method that jointly explores both labeled and unlabeled data. In particular, we implement discriminative feature selection to extract the features that best reveal the underlying classification labels, and develop representative feature selection to obtain the features with optimal self-expressive performance. Both methods are formulated as joint l_2,1-norm minimization problems. An effective alternating minimization algorithm is also introduced, with analytic solutions obtained in a column-by-column manner. Extensive experiments on various classification tasks demonstrate the advantage of the proposed method over several state-of-the-art methods. |
Shupeng Wang, Xiao-Yu Zhang, Xianglei Dang, Binbin Li and Haiping Wang |
234 | Agile tuning method in successive steps for a river flow simulator [abstract] Abstract: Scientists and engineers continuously build models to interpret axiomatic theories or explain the reality of the universe of interest, reducing the gap between formal theory and observation in practice. Our work focuses on dealing with the uncertainty in the model's input data to improve the quality of the simulation. To reduce this error, scientists and engineers implement model-tuning techniques and look for ways to reduce their high computational cost. This article proposes a methodology for adjusting a simulator of a complex dynamic system that models wave translation along river channels, with emphasis on reducing computational resources. We propose calibrating the simulator with a methodology based on successive adjustment steps of the model, built on parametric simulation. The input scenarios used to run the simulator at every step were obtained in an agile way, achieving a model improvement of up to 50% in the reduction of the simulated data error. These results encouraged us to extend the adjustment process over a larger domain region. |
Mariano Trigila, Adriana Gaudiani and Emilio Luque |
235 | A Parallel Quicksort Algorithm on Manycore Processors in Sunway TaihuLight [abstract] Abstract: In this paper we present a highly efficient parallel quicksort algorithm for SW26010, the heterogeneous manycore processor that makes Sunway TaihuLight the top-ranked supercomputer in the world. Motivated by the software cache and on-chip communication design of SW26010, we propose a two-phase quicksort algorithm: the first phase counts elements and the second moves them. To make the best of this manycore architecture, we design a decentralized workflow, optimize memory access and balance the workload. Experiments show that our algorithm scales efficiently to 64 cores of SW26010, achieving more than 32x speedup for int32 elements on all kinds of data distributions. This outperforms the strong-scaling results of the Intel TBB (Threading Building Blocks) version of quicksort on the x86-64 architecture. |
Siyuan Ren, Shizhen Xu and Guangwen Yang |
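The count-then-move split at the heart of the algorithm can be shown serially: counting first tells every worker exactly where its outputs belong, so the move phase needs no coordination. This is a single-core sketch of the general idea only; the SW26010 version distributes the counts across cores and combines them with a prefix sum, which is not shown here.

```python
def two_phase_partition(data, pivot):
    """Partition one quicksort step in two phases: phase 1 counts how
    many elements fall left of the pivot, which fixes the write offsets;
    phase 2 moves each element directly to its final side."""
    # phase 1: count (in the parallel version, per-core counts are
    # combined by a prefix sum to give each core disjoint write ranges)
    n_less = sum(1 for x in data if x < pivot)
    out = [None] * len(data)
    lo, hi = 0, n_less
    # phase 2: move, with no element ever written twice
    for x in data:
        if x < pivot:
            out[lo] = x; lo += 1
        else:
            out[hi] = x; hi += 1
    return out, n_less

out, split = two_phase_partition([5, 2, 8, 1, 9, 3], pivot=5)
print(out, split)   # [2, 1, 3, 5, 8, 9] 3
```

Because all write positions are known after phase 1, the move phase is embarrassingly parallel, which is what makes the scheme attractive on a manycore chip with software-managed caches.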
239 | How is the Forged Certificates in the Wild: Practice on Large-scale SSL Usage Measurement and Analysis [abstract] Abstract: Forged certificates are a prominent issue in real-world deployments of SSL/TLS - the most widely used encryption protocols for Internet security - and are typically used in man-in-the-middle (MITM) attacks, proxies, anonymous or malicious services, and personal or temporary services. They break SSL encryption, leading to privacy leakage and severe security risks. In this paper, we study forged certificates in the wild based on long-term, large-scale passive measurement. Combining certificate transparency (CT) logs with our measurement results, nearly 3 million forged certificates against the Alexa Top 10K sites are identified and studied. Our analysis reveals the causes and preferences behind forged certificates, as well as several significant differences from benign ones. Finally, we discover several IP addresses used for MITM attacks through forged certificate tracing and deep behavior analysis. We believe our study contributes to research on SSL/TLS security as well as real-world protocol usage. |
Mingxin Cui, Zigang Cao and Gang Xiong |
253 | Managing Cloud Data Centers with Three-state Server Model under Job Abandonment Phenomenon [abstract] Abstract: To improve the quality of system service, cloud vendors often migrate user job requests that have already waited a long time in the queues of a busy cluster to other available clusters. This strategy gives rise to the job abandonment phenomenon in data centers, which disturbs server management mechanisms by decreasing control effectiveness, increasing energy consumption, and so on. In this paper, based on the three-state model proposed in previous works, we develop a novel model and a management strategy for cloud data centers using a finite queue. Our proposed model is tested in a simulated cloud environment using CloudSim. The results show that our three-state server model for data centers operates well under the job abandonment phenomenon. |
Binh Minh Nguyen, Bao Hoang, Huy Tran and Viet Tran |
264 | The Analysis of the Effectiveness of the Perspective-based Observational Tunnels Method by the Example of the Evaluation of Possibilities to Divide the Multidimensional Space of Coal Samples [abstract] Abstract: Methods of qualitative analysis of multidimensional data through visualization rely on transforming a multidimensional space into a two-dimensional one. In this way, complicated multidimensional data can be presented on a two-dimensional computer screen, allowing qualitative analysis to be conducted in the way most natural for people: through the sense of sight. Such qualitative analysis can replace complex algorithms designed to search multidimensional data for specific properties, because some qualitative characteristics are simply visible in the two-dimensional image representing the data. The new perspective-based observational tunnels method is one such multidimensional data visualization method. In this paper it is used to present and analyze a real set of seven-dimensional data describing coal samples obtained from two hard coal mines. This paper presents, for the first time, the application of the perspective-based observational tunnels method to evaluating whether the multidimensional space of coal samples can be divided by their susceptibility to fluidal gasification. As a result, views of the analyzed data were obtained that make it possible to separate areas of the multidimensional space occupied by samples with different applicability to the gasification process. |
Dariusz Jamroz |
275 | Urban data and class theory of place: analysis of food services in St. Petersburg historical center [abstract] Abstract: This paper presents an approach to studying the segregation of urban public services - in particular, food services - in city space, based on analysis of urban data from different open sources. We show how the 'class theory of place' can benefit from computation over digital urban data representing several aspects of the class of venues - economic, social and symbolic. We also consider 'class' through a combination of objective parameters of service segregation, such as the average check, and subjective parameters, such as user-generated venue ratings. We interpret the results obtained for the classes of food services located in our study area - the St. Petersburg historical centre - in relation to its environmental parameters, such as mobility, accessibility, and attractors of the territory, which we examine as geographic predictors of the spatial segregation of venues. We observe that all classes of food services in the area cluster along the main mobility routes, while attractors such as public spaces do not draw food venues - a fact we interpret as a contextual feature of St. Petersburg, which has a poor culture of public spaces in general. |
Aleksandra Nenko, Artem Konyukhov and Sergey Mityagin |
278 | Control driven lighting design for large-scale installations [abstract] Abstract: Large-scale photometric computations carried out in the course of lighting design preparation have already been the subject of numerous works.
They focused either on improving the quality of the design, for example with respect to energy efficiency, or dealt with issues concerning computational complexity and the computations as such.
However, mutual influence of the design process and dynamic dimming of luminaires has not yet been addressed.
If road segments are considered separately, suboptimal results can occur in places such as junctions. Considering the entire road network at once complicates the computation procedures and requires additional processing time.
This paper presents a method that makes the more effective whole-network approach viable by applying a reversed scheme of design and control. The crucial component of both the design and control modules is the data inventory, whose role is also discussed in the paper. |
Adam Sędziwy, Leszek Kotulski, Sebastian Ernst and Igor Wojnicki |
281 | An OpenMP implementation of the TVD-Hopmoc method based on a synchronization mechanism using locks between adjacent threads on Xeon Phi accelerators [abstract] Abstract: This work studies the 1-D TVD-Hopmoc method executed in shared-memory manycore environments. In particular, this paper studies barrier costs on Intel(R) Xeon Phi(TM) (KNC and KNL) accelerators when using the OpenMP standard, and employs an explicit synchronization mechanism to reduce spin and thread-scheduling times in an OpenMP implementation of the 1-D TVD-Hopmoc method. Basically, we define an array that represents threads, and the new scheme synchronizes only adjacent threads. Moreover, the new approach reduces OpenMP scheduling time by employing an explicit work-sharing strategy: at the beginning of the process, the array representing the computational mesh of the numerical method is partitioned among threads, instead of letting the OpenMP API perform this task. The new scheme also diminishes OpenMP spin time by avoiding OpenMP barriers, using an explicit synchronization mechanism in which a thread waits only for its two adjacent threads. The results of the new approach are compared with a basic parallel implementation of the 1-D TVD-Hopmoc method; numerical simulations show that the new approach achieves promising performance gains in shared-memory manycore environments. |
Frederico Cabral, Carla Osthoff Barros, Gabriel Costa, Sanderson L. Gonzaga de Oliveira, Diego N. Brandão and Mauricio Kischinhevsky |
291 | Data-Aware Scheduling of Scientific Workflows in Hybrid Clouds [abstract] Abstract: In this paper, we address the scheduling of scientific workflows in hybrid clouds considering data placement and present the Hybrid Scheduling for Hybrid Clouds (HSHC) algorithm. HSHC is a two-phase scheduling algorithm with a genetic-algorithm-based static phase and a dynamic-programming-based dynamic phase. We evaluate HSHC with both a real-world scientific workflow application and random workflows in terms of makespan and cost. |
Amirmohammad Pasdar, Khaled Almi'Ani and Young Choon Lee |
292 | Large margin proximal non-parallel support vector classifiers [abstract] Abstract: In this paper, we propose a novel large margin proximal non-parallel twin support vector machine for binary classification. Its significant advantages over the twin support vector machine are that the structural risk minimization principle is implemented and that, by adopting an uncommon constraint formulation for the primal problem, the proposed method avoids computing the large inverse matrices before training that is inevitable in the formulation of the twin support vector machine. In addition, a dual coordinate descent algorithm is used to solve the optimization problems to accelerate training. Experimental results exhibit the effectiveness and classification accuracy of the proposed method. |
Ming-Zeng Liu and Yuan-Hai Shao |
298 | The multi-core optimization of the unbalanced calculation in the clean numerical simulation of Rayleigh-Benard turbulence [abstract] Abstract: The so-called clean numerical simulation (CNS) is used to simulate the Rayleigh-B\'{e}nard (RB) convection system. Compared with direct numerical simulation (DNS), it largely improves the accuracy and reliability of investigations of turbulent flows. Although CNS controls numerical noise well, its computational cost is much higher, so simulating the system within a reasonable period requires redesigning the calculation schemes of CNS. In this paper, aiming at the CNS of the two-dimensional RB system, we first propose the notions of the equal difference matrix and the balance point set, which are crucial for modeling the unbalanced calculation of the system on a multi-core platform. Then, based on these notions, we present algorithms to optimize the unbalanced calculation. We prove that our algorithm is optimal when the number of cores is a power of $2$, and that it approaches the optimum otherwise. Finally, we compare the results of our optimized algorithms with others to demonstrate the effectiveness of our optimization. |
Lu Li, Zhiliang Lin and Yan Hao |
301 | ES-GP: An Effective Evolutionary Regression Framework with Gaussian Process and Adaptive Segmentation Strategy [abstract] Abstract: This paper proposes a novel evolutionary regression framework with a Gaussian process (GP) and an adaptive segmentation strategy (named ES-GP) for regression problems. The proposed framework consists of two components: the outer DE, which focuses on finding the best segmentation scheme, and the inner DE, which optimizes the hyper-parameters of the GP model associated with each segment. These two components work cooperatively to find a piecewise Gaussian process solution that is flexible and effective for complicated regression problems. ES-GP has been tested on four artificial regression problems and two real-world time series regression problems. The experimental results demonstrate that ES-GP provides very promising performance in terms of prediction accuracy. |
Shijia Huang and Jinghui Zhong |
303 | Evaluating Dynamic Scheduling of Tasks in Mobile Architectures using ParallelME Framework [abstract] Abstract: Mobile phones have recently evolved from devices for basic communication into providers of many applications that require increasing performance for a good user experience. Today's mobile phones contain different processing units (PUs) with high computational capacity, such as multicore architectures and co-processors like GPUs. Libraries and run-time environments have been proposed to improve applications' performance while taking advantage of different PUs in a transparent way. Among these environments we highlight ParallelME which, despite the importance of task scheduling strategies in such environments, implemented only the First Come First Serve (FCFS) strategy. In this paper we extend the ParallelME framework by implementing and evaluating two dynamic scheduling strategies, HEFT and PAMS. We evaluate these strategies on synthetic applications with different characteristics. Compared with the FCFS strategy implemented in ParallelME, our results show that the new scheduling strategies, especially PAMS, achieve the best results in different scenarios, further improving ParallelME's performance. In some scenarios, PAMS was up to 39% more efficient than FCFS. These gains usually imply lower energy consumption, which is very desirable when working with mobile architectures. |
Rodrigo Carvalho, Guilherme Andrade, Diogo Santana, Thiago Silveira, Daniel Madeira, Rafael Sachetto, Renato Ferreira and Leonardo Rocha |
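The difference between FCFS and list schedulers such as HEFT comes down to where each task is placed. The sketch below shows the earliest-finish-time placement rule shared by HEFT-style schedulers, simplified to independent tasks on heterogeneous PUs; the cost matrix, the absence of task dependencies, and all names are illustrative assumptions, not ParallelME's implementation.

```python
def eft_schedule(task_costs):
    """Greedy earliest-finish-time assignment: each task goes to the
    processing unit (PU) that would finish it soonest, given per-PU
    runtimes and each PU's current backlog."""
    # task_costs[t][p] = runtime of task t on processing unit p
    ready = [0.0] * len(task_costs[0])   # time at which each PU becomes free
    plan = []
    for t, costs in enumerate(task_costs):
        finish = [ready[p] + c for p, c in enumerate(costs)]
        best = min(range(len(costs)), key=lambda p: finish[p])
        ready[best] = finish[best]
        plan.append((t, best))           # (task, chosen PU)
    return plan, max(ready)              # assignment and overall makespan

# 2 PUs: PU0 (a big CPU core) is twice as fast as PU1 (a small core)
costs = [[2, 4], [2, 4], [1, 2]]
plan, makespan = eft_schedule(costs)
print(plan, makespan)   # the third task overflows to PU1 to cut the makespan
```

A pure FCFS queue to the fastest PU would serialize all three tasks on PU0 (makespan 5); the finish-time-aware rule spreads the load and finishes at time 4.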
308 | An OAuth2.0-Based Unified Authentication System for Secure Services in the Smart Campus Environment [abstract] Abstract: Based on the construction of Shandong Normal University's smart authentication system, this paper investigates the key technologies of the Open Authorization (OAuth) protocol, which allows secure authorization for third-party applications accessing online services in a simple and standardized way. Through analysis of the OAuth2.0 standard, the open API details between different applications, and the concrete implementation of the smart campus authentication platform, this paper summarizes research methods for building a smart campus application system from existing educational resources in a cloud computing environment. Security experiments and theoretical analysis show that the system runs stably and credibly, is flexible and easy to integrate with existing smart campus services, and efficiently improves the security and reliability of campus data acquisition. Our work also provides a universal reference for the construction of smart campus authentication systems. |
Baozhong Gao, Fangai Liu, Shouyan Du and Fansheng Meng |
310 | Time Series Cluster analysis on electricity consumption of North Hebei Province in China [abstract] Abstract: In recent years, China has vigorously promoted the building of an ecological civilization and regarded green low-carbon development as one of the important directions and tasks for industrial transformation and upgrading. This entails accelerating industrial energy saving and consumption reduction, implementing cleaner production, recycling resources, promoting the shift of industry toward economical, clean, low-carbon and efficient production, and promoting industrial restructuring and upgrading. This series of measures has reduced the scale of industrial production in the region, thereby affecting its electricity consumption. Based on electricity consumption data from 31 counties in northern Hebei, this paper uses a time series clustering method to cluster the counties' electricity consumption. The results show that electricity consumption differs across counties, and that macro-control policies have different impacts on different types of counties. |
Luhua Zhang, Miner Liu, Jingwen Zhang and Kun Guo |
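A minimal version of such time series clustering is k-means on per-county consumption profiles, z-normalised per row so counties group by the shape of their consumption curve rather than by its absolute scale. The data, the initialisation rule, and the choice of plain k-means are illustrative assumptions; the paper's exact clustering method and distance measure are not specified here.

```python
import numpy as np

def kmeans_profiles(X, k=2, iters=20):
    """Plain k-means on row-normalised time series profiles."""
    X = np.asarray(X, float)
    # z-normalise each row: cluster by trend shape, not consumption level
    X = (X - X.mean(1, keepdims=True)) / (X.std(1, keepdims=True) + 1e-9)
    centers = X[:: max(1, len(X) // k)][:k]        # spread-out initial centers
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(1)                       # assign to nearest center
        centers = np.array([X[labels == j].mean(0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    return labels

# toy yearly consumption: two rising-trend counties vs two flat/declining ones
X = [[1, 2, 3, 4], [2, 4, 6, 8], [5, 5, 5, 4], [7, 7, 6, 6]]
labels = kmeans_profiles(X, k=2)
print(labels)   # rising counties share one cluster, declining ones the other
```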
322 | Effective Semi-supervised Learning Based on Local Correlation [abstract] Abstract: As the machine learning mechanism that simultaneously explores both labeled and unlabeled instances, semi-supervised learning has seen great success in various applications. Traditionally, the manipulation of unlabeled instances is based solely on the prediction of the existing model, which is vulnerable to an ill-posed training set, especially when the labeled instances are limited or imbalanced. To address this issue, this paper investigates local correlation based on the entire data distribution, which is leveraged as informative guidance to ameliorate the negative influence of a biased model. To formulate the self-expressive property between instances within a limited vicinity, we develop a sparse self-expressive representation learning method based on column-wise sparse matrix optimization, with an optimization algorithm presented via alternating iteration. We then propose a novel framework, named semi-supervised learning based on local correlation, to effectively integrate explicit prior knowledge and the implicit data distribution. In this way, the individual prediction from the learning model is refined by the collective representation, and pseudo-labeled instances are selected more effectively to augment the semi-supervised learning performance. Experimental results on multiple classification tasks indicate the effectiveness of the proposed algorithm. |
Xiao-Yu Zhang, Shupeng Wang, Xin Jin, Xiaobin Zhu and Binbin Li |
328 | Detection and Prediction of House Price Bubbles: Evidence from a New City [abstract] Abstract: In the early stages of a city's growth, housing market fundamentals are uncertain, which can attract speculative investors as well as actual housing demand. Sejong is a recently built administrative city in South Korea; most government departments and public agencies have moved into it, while others are in the process of moving or plan to do so. In Sejong, a drastic escalation in house prices has been noted over the last few years, while at the same time the number of vacant housing units has increased. Using the present value model, the lease-price ratio, and the log-periodic power law, this study examines bubbles in the Sejong housing market. The analysis results indicate that (i) there are significant house price bubbles, (ii) the bubbles are driven by speculative investment, and (iii) the bubbles are likely to burst earlier here than in other cities. The approach in this study can be applied to identifying pricing bubbles in other cities. |
Hanwool Jang, Kwangwon Ahn, Dongshin Kim and Yena Song |
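The log-periodic power law mentioned above is commonly written in the following standard (Johansen-Ledoit-Sornette) form; this is the general statement of the model, not necessarily the exact parameterization the authors fit:

```latex
\ln p(t) = A + B\,(t_c - t)^{m} + C\,(t_c - t)^{m}\cos\bigl(\omega \ln(t_c - t) + \phi\bigr)
```

Here $t_c$ is the critical time at which the bubble is expected to end, $0 < m < 1$ produces the faster-than-exponential price growth characteristic of a bubble, and the cosine term captures the accelerating log-periodic oscillations observed before the crash.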
329 | A Novel Parsing-based Automatic Domain Terminology Extraction Method [abstract] Abstract: As domain terminology plays a crucial role in the study of every domain, automatic domain terminology extraction methods are in real demand. In this paper, we propose a novel parsing-based method that generates domain compound terms by utilizing the dependency relations between words, identified through dependency parsing. In addition, a multi-factor evaluator is proposed to assess the significance of each candidate term, considering not only frequency but also the influence of other factors affecting domain terminology. Experimental results demonstrate that the proposed domain terminology extraction method outperforms the traditional POS-based method in both precision and recall. |
Ying Liu and Tianlin Zhang |
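The scoring idea - frequency combined with at least one other factor - can be sketched with a simple two-factor evaluator: in-domain frequency multiplied by a domain-specificity ratio against a background corpus. The dependency-parsing step that generates the candidates is assumed done; the scoring formula and all data below are illustrative stand-ins, not the paper's multi-factor evaluator.

```python
from collections import Counter
from math import log

def rank_terms(candidates, domain_corpus, background_corpus):
    """Rank candidate compound terms by in-domain frequency times a
    domain-specificity factor (log ratio of domain to background counts,
    add-one smoothed); terms common everywhere are pushed down."""
    dom = Counter(domain_corpus)
    bg = Counter(background_corpus)

    def score(term):
        f = dom[term]
        specificity = log((f + 1) / (bg[term] + 1))  # rare outside domain -> higher
        return f * specificity

    return sorted(candidates, key=score, reverse=True)

# toy corpora of extracted compound-term occurrences
domain = ["neural network"] * 5 + ["machine learning"] * 4 + ["last year"] * 3
background = ["last year"] * 6 + ["machine learning"]
ranked = rank_terms(["neural network", "machine learning", "last year"],
                    domain, background)
print(ranked[0])   # frequent in-domain and absent from the background corpus
```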
337 | Remote Procedure Calls for Improved Data Locality with the Epiphany Architecture [abstract] Abstract: This paper describes the software implementation of an emerging parallel programming model for partitioned global address space (PGAS) architectures. Applications with irregular memory access to distributed memory do not perform well on conventional symmetric multiprocessing (SMP) architectures with hierarchical caches. Such applications tend to scale with the number of memory interfaces and corresponding memory access latency. Using a remote procedure call (RPC) technique, these applications may see reduced latency and higher throughput compared to remote memory access or explicit message passing. The software implementation of a remote procedure call method detailed in the paper is designed for the low-power Adapteva Epiphany architecture. |
James Ross and David Richie |
342 | Identifying the propagation sources of stealth worms [abstract] Abstract: Worms can spread in various ways with great destructive power, posing a great threat to network security - one example is the WannaCry worm of May 2017. By identifying the sources of worms, we can better understand the causes of risks and then implement better security measures. However, current detection systems may not fully detect existing threats when stealthy worms show no abnormal behavior. This paper makes two key contributions toward the challenging problem of identifying propagation sources: 1) We propose an algorithm that modifies the observed results based on Bayes' rule, correcting for possibly missed nodes and thus improving the accuracy of source identification. 2) We apply a branch-and-bound method that effectively reduces the traversal space and improves the efficiency of the algorithm by calculating upper and lower bounds on the infection probability of nodes. Through simulation experiments on a real network, we verified the accuracy and high efficiency of the algorithm for tracing the sources of worms. |
Yanwei Sun, Lihua Yin, Zhen Wang, Yunchuan Guo and Binxing Fang |
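The Bayes-rule correction for missed nodes can be shown in one step: given a detector's true-positive and false-positive rates, a "no alarm" observation on a node only partially clears it when the worm is stealthy. The rates and the single-node formulation below are illustrative assumptions; the paper applies the correction across all observed nodes before source inference.

```python
def posterior_infected(prior, alarm, p_detect=0.6, p_false=0.05):
    """Bayes-rule update of one node's infection probability given a
    noisy detector observation (alarm=True/False). With a stealthy worm
    p_detect is low, so a negative observation leaves residual suspicion."""
    if alarm:
        like_inf, like_clean = p_detect, p_false
    else:                    # miss probability vs. true-negative probability
        like_inf, like_clean = 1 - p_detect, 1 - p_false
    num = like_inf * prior
    return num / (num + like_clean * (1 - prior))

# a silent node is far from proven clean when the detector misses 40% of infections
p = posterior_infected(prior=0.5, alarm=False)
print(round(p, 3))   # -> 0.296
```

This residual probability is exactly what lets the source-identification step keep "quiet" nodes on candidate propagation paths instead of pruning them outright.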
361 | Machine Learning Based Text Mining in Electronic Health Records: Cardiovascular Patient Cases [abstract] Abstract: This article presents the approach and experimental results of machine-learning-based text mining methods applied to EHR analysis. It shows how applying ML-based text mining to identify classes and feature correlations increases the capability of prediction models. Analyzing EHR data is of significant importance because it contains valuable information that is crucial for decision-making during patient treatment. The preprocessing of EHRs using regular expressions and the means of vectorizing and clustering medical text data are shown. The correlation analysis confirms the dependence between the identified diagnosis classes and individual characteristics of patients and episodes. The medical interpretation of the findings is also presented, with the support of physicians from a specialized medical center, confirming the effectiveness of the approach. |
Sergey Sikorskiy, Oleg Metsker, Alexey Yakokovlev and Sergey Kovalchuk |
367 | Evolutionary ensemble approach for behavioral credit scoring [abstract] Abstract: This paper is concerned with the potential quality of scoring models that can be achieved using not only application form data but also behavioral data extracted from transactional datasets. Several model types and different configurations of ensembles were analyzed in a set of experiments. Another aim of the research is to prove the effectiveness of evolutionary optimization of the ensemble structure and use it to increase the quality of default prediction. The results are illustrated using models for borrower default prediction trained on features (purchase amount, location, merchant category) extracted from a transactional dataset of bank customers. |
Nikolay Nikitin, Anna Kalyuzhnaya, Alexander Kudryashov, Amir Uteuov, Ivan Derevitskii, Alexander Boukhanovsky and Klavdiya Bochenina |
374 | Detecting influential users in customer-oriented online communities [abstract] Abstract: Every year the activity of users in various social networks increases. Different business entities have the opportunity to analyze the behavior of the audience in more detail and to adapt their products and services to its needs. Social network data allow one not only to find influential individuals according to their local topological properties, but also to investigate their preferences, and thus to personalize strategies of interaction with opinion leaders. However, information channels of organizations (e.g., the community of a bank in a social network) include not only the target audience but also employees and fake accounts. This lowers the applicability of network-based methods of identifying influential nodes. In this study, we propose an algorithm for discovering influential nodes which combines topological metrics with the individual characteristics of users' profiles and measures of their activities. The algorithm is used along with a preliminary clustering procedure, which is aimed at identifying groups of users with different roles, and with an algorithm for profiling the interests of users according to their subscriptions. The applicability of the approach is tested using data from a community of a large Russian bank in the vk.com social network. Our results show: (i) the necessity of considering a user's role in the leader detection algorithm; (ii) the roles of poorly described users may be effectively identified from the roles of their neighbors; (iii) the proposed approach allows for finding users with high actual informational influence and for distinguishing their key interests. |
Ivan Nuzhdenko, Amir Uteuov and Klavdiya Bochenina |
393 | Precedent-based approach for the identification of deviant behavior in social media [abstract] Abstract: The current paper is devoted to the problem of identifying deviant users in social media. For this purpose, each user of a social media source is described through a profile that aggregates open information about him/her within a special structure. Aggregated user profiles are formally described in terms of a multivariate random process. Special emphasis is placed on methods for identifying users with certain behavior on the basis of a few precedents and for controlling the quality of search results. An experimental study shows the application of the described methods to the case of commercial usage of a personal account in social media. |
Anna Kalyuzhnaya, Nikolay Nikitin, Nikolay Butakov and Denis Nasonov |
400 | Performance Analysis of 2D-compatible 2.5D-PDGEMM on Knights Landing Cluster [abstract] Abstract: This paper discusses the performance of a parallel matrix multiplication routine (PDGEMM) that uses the 2.5D algorithm, a communication-reducing algorithm, on Oakforest-PACS, a cluster based on the Xeon Phi 7200 series (codenamed Knights Landing). Although the algorithm requires a 2.5D matrix distribution instead of the conventional 2D distribution, our implementation performs computations on 2D-distributed matrices on a 2D process grid by redistributing the matrices (2D-compatible 2.5D-PDGEMM). Our evaluation on up to 8192 nodes (8192 Xeon Phi processors) demonstrates that, in terms of strong scaling, our implementation outperforms conventional 2D implementations. |
Daichi Mukunoki and Toshiyuki Imamura |