Technical Program


NPC 2014 Program Overview

Day 1 – September 18, 2014 (Thursday)
Time / Place 九芎廳 (6F)
08:00-17:00 Registration
08:50-09:10 Opening
09:10-10:20 Keynote-1 (K1)
10:20-10:40 Break
10:40-11:50 Keynote-2 (K2)
11:50-13:20 Lunch (蘭城百匯 6F)
13:20-15:20 IJPP Session (J1)
15:20-16:20 Coffee Break / Poster (P1)
16:20-18:20 IJPP Session (J2)
18:30-20:30 Welcome Reception (蘭城廳 6F)
Day 2 – September 19, 2014 (Friday)
Time / Place 蘭城廳 (6F)
08:00-17:00 Registration
09:00-10:20 NPC Parallel Sessions (L1, L2)
10:20-10:40 Break
10:40-12:00 NPC Parallel Sessions (L3, L4)
12:00-13:30 Lunch (蘭城百匯 6F)
13:40-15:30 NPC Parallel Sessions (L5, L6)
15:30-16:30 Coffee Break / Poster (P2)
16:30-18:30 NPC Parallel Sessions (L7, L8)
18:30-20:30 Conference Banquet (九芎廳 6F)
Day 3 – September 20, 2014 (Saturday)
08:00-09:30 Registration
09:30-11:30 NPC Executive Committee Meeting / Social Events
12:00-13:30 Lunch
13:30-17:30 NPC Executive Committee Meeting / Social Events
Keynote Talk I – Dr. Michael Gschwind, IBM, NY, USA
Open Power: Liberating Scale Out Data Centers with Power8

Session Chair: Prof. Kemal Ebcioglu

Abstract: With the creation of the Open Power Foundation, the Open Power partner companies are looking to extend the reach of Power into massive scale-out data center computing environments. This will give data center operators more choice and free them from lock-in to the desktop processors currently used in these environments. To drive this transformation in scale-out data center computing, the Open Power Foundation is being created as a framework in which to innovate around Power in architecture, I/O, system design and software. The first Open Power systems, based on the newly introduced Power8 processor, were recently announced. In addition to providing significant performance improvements over previous microprocessors, Power8 was designed to deliver unprecedented performance for emerging workloads, such as Business Analytics and Big Data applications, Cloud computing and scale-out data center workloads. Finally, the CAPI accelerator interface provides a high-performance interconnect for accelerators.
To extend Power into the scale-out data center space, IBM is reengineering the Power environment: IBM is developing a new system software stack using an open firmware available to third-party system developers, and an open-source hypervisor based on KVM. In addition, the new Open Power environment will operate in little-endian mode, because many applications in this space were developed and tested exclusively in a little-endian environment, because databases in storage may be stored in little-endian format, and in order to leverage a wide range of commodity I/O solutions. The new software stack is built with a new Open Power ABI, which is introduced in conjunction with Open Power to take advantage of new architecture, microarchitecture and compiler opportunities offered by Power and future Open Power processors. In this talk I will discuss the Open Power Foundation and the new opportunities it is creating for Power, and then focus on the new Open Power software stack and in particular the Open Power ABI. To simplify application development for SIMD vector code, the Open Power environment also specifies a new little-endian vector SIMD programming API, as well as a "big-on-little" API to facilitate porting of big-endian code to a little-endian environment and the maintenance of vector libraries targeting both little- and big-endian systems.
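As a rough illustration of why byte order matters when porting code and stored data between big- and little-endian environments, the following Python sketch interprets the same bytes under both conventions. It uses only the standard `struct` module; the variable names are illustrative, and it does not represent the Open Power ABI or its vector SIMD API.

```python
import struct

# One 32-bit value serialized under each byte order.
value = 0x0A0B0C0D
little = struct.pack("<I", value)  # least significant byte first
big = struct.pack(">I", value)     # most significant byte first

# The same 16 bytes in memory, viewed as four 32-bit lanes.
# A 16-byte buffer is used here only as a stand-in for a vector register.
memory = bytes(range(16))
lanes_le = struct.unpack("<4I", memory)
lanes_be = struct.unpack(">4I", memory)

print(little.hex(), big.hex())
print([hex(lane) for lane in lanes_le])
print([hex(lane) for lane in lanes_be])
```

Data written by a big-endian program and read back by little-endian code (or the reverse) yields different lane values unless the bytes are swapped, which is the kind of mismatch a porting aid has to account for.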

About the speaker

Dr. Michael Gschwind is a Senior Technical Staff Member and Senior Manager of the Systems Architecture team responsible for IBM's Power and mainframe architecture evolution. During his career at IBM, Mike has served as leader for many of IBM's microprocessor products, including architecture lead for Power8, Power7 and the PERCS project, Floating Point Chief Architect for BlueGene, Core Reliability lead for the BlueGene/Q microprocessor, lead architect for the XBox360 VMX128 multimedia architecture and the Cell Broadband Engine and its accelerators, and several generations of binary translation architecture (BOA, DAISY, DAISY/390). Dr. Gschwind served as design lead for the BlueGene/Q floating point unit, as core reliability lead for the BlueGene/Q core and as IFU and IDU lead and chief microarchitect for the Komal core which served as the foundation for IBM's successful Power7 and Power8 products. In addition to his leadership role in hardware design, Dr. Gschwind also developed the first Cell compiler and served as lead for the definition of the Cell and OpenPower LE software development environments. Dr. Gschwind has published numerous articles and received about 100 patents in the area of computer architecture. In 2006, Dr. Gschwind was recognized as an IT Innovator and Influencer by InformationWeek. Dr. Gschwind is a member of the ACM SIGMICRO Executive Board, an ACM Distinguished Speaker, a Member of the IBM Academy of Technology, an IBM Master Inventor and an IEEE Fellow.

Keynote Talk II – Prof. Yunquan Zhang, Chinese Academy of Sciences, China
yaSpMV: Yet Another SpMV Framework on GPUs

Session Chair: Prof. Barbara Chapman

Abstract: Sparse matrix-vector multiplication (SpMV) is a key linear algebra algorithm and has been widely used in many important application domains. As a result, numerous attempts have been made to optimize SpMV on GPUs to leverage their massive computational throughput. Although the previous work has shown impressive progress, load imbalance and high memory bandwidth requirements remain the critical performance bottlenecks for SpMV. In this talk, we present our novel solutions to these problems. First, we propose a new SpMV format, called blocked compressed common coordinate (BCCOO), which uses bit flags to store the row indices in a blocked common coordinate (COO) format so as to alleviate the bandwidth problem. We further improve this format by partitioning the matrix into vertical slices to enhance the cache hit rates when accessing the vector to be multiplied. Second, we revisit the segmented scan approach for SpMV to address the load imbalance problem. We propose a highly efficient matrix-based segmented sum/scan for SpMV and further improve it by eliminating global synchronization. Then, we introduce an auto-tuning framework to choose optimization parameters based on the characteristics of input sparse matrices and target hardware platforms. Our experimental results on GTX680 GPUs and GTX480 GPUs show that our proposed framework achieves significant performance improvement over the vendor-tuned CUSPARSE V5.0 (up to 229% and 65% on average on GTX680 GPUs, up to 150% and 42% on average on GTX480 GPUs) and some of the most recently proposed schemes (e.g., up to 195% and 70% on average over clSpMV on GTX680 GPUs, up to 162% and 40% on average over clSpMV on GTX480 GPUs).
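The idea of replacing explicit row indices with bit flags and then summing per row can be sketched in a few lines of Python. This is a simplified, serial illustration and not the BCCOO format or the GPU segmented scan from the talk: it omits blocking and vertical slicing, assumes the nonzeros are sorted by row, and assumes every row has at least one nonzero. The function names `row_end_flags` and `spmv_flags` are hypothetical.

```python
def row_end_flags(rows):
    """One flag per nonzero: True marks the last nonzero of its row.

    Assumes `rows` is sorted in ascending order (COO sorted by row).
    """
    n = len(rows)
    return [i == n - 1 or rows[i + 1] != rows[i] for i in range(n)]


def spmv_flags(flags, cols, vals, x):
    """Compute y = A @ x using flags instead of row indices.

    This is a serial segmented sum: products are accumulated until a flag
    closes the current row. Assumes no empty rows, so the k-th True flag
    ends row k.
    """
    y = []
    acc = 0.0
    for flag, col, val in zip(flags, cols, vals):
        acc += val * x[col]
        if flag:
            y.append(acc)
            acc = 0.0
    return y


# Example matrix A = [[1, 0, 2],
#                     [0, 3, 0],
#                     [4, 5, 6]] in row-sorted COO form.
rows = [0, 0, 1, 2, 2, 2]
cols = [0, 2, 1, 0, 1, 2]
vals = [1, 2, 3, 4, 5, 6]
x = [1, 2, 3]

flags = row_end_flags(rows)
y = spmv_flags(flags, cols, vals, x)
print(flags)
print(y)
```

Storing one bit per nonzero instead of a full integer row index is what reduces the memory traffic; on a GPU the accumulation loop would be replaced by a parallel segmented sum or scan over the same flags.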

About the speaker

I am a full professor of Computer Science at the Chinese Academy of Sciences in Beijing, China. I also serve as Professor of the State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences. My other current appointments include the Expert Review Group member of Information Sciences Division of the National Science Foundation (China), Secretary-General of China's High Performance Computing Expert Committee, Technical Committee of National 863 “High-Performance Computer Evaluation Center”, and General Secretary of the Specialty Association of Mathematical & Scientific Software of China. I received a PhD degree in Computer Software and Theory from the Chinese Academy of Sciences in 2000. My research interests are in the areas of high performance parallel computing, with particular emphasis on large scale parallel computation and programming models, high-performance parallel numerical algorithms, and performance modeling and evaluation for parallel programs. I have published over 100 papers in international journals and conference proceedings. Recently I served as Co-Chair of Program Committee of IEEE CSE 2010 and IEEE HPCC 2013, Vice-Chair of Program Committee of High-Performance Computing China (2008 ~ 2012), member of Steering Committee of International Supercomputing Conference 2012, member of Program Committee of IEEE ICPADS 2008, ACM ICS 2010, IEEE IPDPS 2012, IEEE CCGrid 2012 and CGO 2013. I also organize and distribute China’s TOP100 List of High Performance Computers, which traces and reports the development of the HPC system technology and usage in China. This list includes the Tianhe-1A supercomputer, which ranked as the world’s fastest supercomputer in 2011.

NPC 2014 Technical Program

NPC(J1)–Systems & Architectures
Session Chair: Prof. Xuanhua Shi
1. Efficient Buffer Management for Tree Indexes On Solid State Drives
    Chengcheng Yang, Peiquan Jin, Lihua Yue and Puyuan Yang
2. SOC: Satisfaction-oriented Virtual Machine Consolidation in Enterprise Data Centers
    Xi Li, Anthony Ventresque, John Murphy and James Thorburn
3. A Phase Behavior Aware Dynamic Cache Partitioning Scheme for CMPs
    Xiaofei Liao, Rentong Guo, Danping Yu, Hai Jin and Li Lin
4. A credit-based load-balance-aware CTA scheduling optimization scheme in GPGPU
    Yulong Yu, Xubin He, He Guo, Yuxin Wang and Xin Chen
5. GPU Accelerated Finding of Channels and Tunnels for a Protein Molecule
    Byungjoo Kim, Jung Eun Lee, Young J. Kim and Ku-Jin Kim

NPC(J2)–Algorithms & Applications
Session Chair: Prof. Deqing Zou
1. Using Packet Processing Object Modules interchangeably as Stand-Alone Programs or "Multi-App" Components
    Ralph Duncan, Peder Jungck, Kenneth Ross, Dwight Mulcahy and Minh Nguyen
2. A Text Clustering Approach of Chinese News Based on Neural Network Language Model
    Zhaoxin Fan, Shuoying Chen, Li Zha and Jiadong Yang
3. Operating System Enhancement for Supporting Massively Multiplayer Online Games in a Server Cluster
    Mei-Ling Chiang, Bo-Wen Yu, Chi-Shian Shia, Jiun-Jiun Huang and Shi-Gi Hwang
4. Detection of Forwarding-based Malicious URLs in Online Social Networks
    Jian Cao, Qiang Li, Yuede Ji, Yukun He and Dong Guo
5. A Linear Time Self-Stabilizing Algorithm for Minimal Weakly Connected Dominating Sets
    Yihua Ding, James Wang and Pradip Srimani

NPC(L1)–Systems, Networks and Architectures
Session Chair: Prof. Chung-Ta King
1. Routing and Wavelength Assignment for Exchanged Hypercubes in Linear Array Optical Networks
    Yu-Liang Liu
2. Page Classifier and Placer: A Scheme of Managing Hybrid Caches
    Xin Yu, Xuanhua Shi, Hai Jin, Xiaofei Liao, Song Wu and Xiaoming Li
3. Temporal-based Ranking in Heterogeneous Networks
    Ruidan Li, Chen Yu, Dezhong Yao, Feng Lu and Hai Jin
4. Designing Buffer Capacity of Crosspoint-Queued Switch
    Guo Chen, Dan Pei, Youjian Zhao and Yongqian Sun

NPC(L2)–Multi-Core Technologies
Session Chair: Prof. Song Wu
1. Benchmarking the Memory Hierarchy of Modern GPUs
    Xinxin Mei, Kaiyong Zhao, Chengjian Liu and Xiaowen Chu
2. Parallel CYK Membership Test on GPUs
    Kyoung-Hwan Kim, Sang-Min Choi, Hyein Lee, Ka Lok Man and Yo-Sub Han
3. Designing Coalescing Network-on-Chip for Efficient Memory Accesses of GPGPUs
    Chien-Ting Chen, Yoshi Shih-Chieh Huang, Yuan-Ying Chang, Chiao-Yun Tu, Chung-Ta King, Tai-Yuan Wang, Janche Sang and Ming-Hua Li

NPC(L3)–Systems, Networks and Architectures
Session Chair: Prof. Xiaofei Liao
1. Loss-rate Driven Network Coding for Transmission Control
    Yihjia Tsai and Chaoyuan Chiang
2. Multilayer Perceptron and Stacked Autoencoder for Internet Traffic Prediction
    Tiago Prado Oliveira, Jamil Salem Barbar, and Alexsandro Santos Soares
3. APP-LRU: A New Page Replacement Method for PCM/DRAM-Based Hybrid Memory Systems
    Zhangling Wu, Peiquan Jin, Chengcheng Yang, Lihua Yue
4. PaxStore: A Distributed Key Value Storage System
    Zhipeng Tan, Yongxing Dang, Jianliang Sun, Wei Zhou, Dan Feng

NPC(L4)–Multi-Core Technologies
Session Chair: Prof. Hsi-Ya Chang
1. A Real-time Scheduling Framework Based on Multi-core Dynamic Partitioning in Virtualized Environment
    Song Wu, Like Zhou, Danqing Fu, Hai Jin, Xuanhua Shi
2. Automatic Data Layout Transformation for Heterogeneous Many-core Systems
    Ying-Yu Tseng, Yu-Hao Huang, Bo-Cheng Lai and Jiun-Liang Lin
3. mpCache: Accelerating MapReduce with Hybrid Storage System on Many-Core Clusters
    Bo Wang, Jinlei Jiang and Guangwen Yang
4. HiNetSim: A parallel simulator for large-scale hierarchical direct networks
    Zhiguo Fan, Zheng Cao, Yong Su, Xiaoli Liu, Zhan Wang, Xiaobing Liu, Dawei Zang and Xuejun An

NPC(L5)–Virtualization & Cloud Computing
Session Chair: Prof. James Wang
1. Online Mechanism Design for VMs allocation in Private Cloud
    Xiaohong Wu, Yonggen Gu, Guoqiang Li, Jie Tao, Jingyu Chen, and Xiaolong Ma
2. A Novel Resource Provisioning Model for DHT-based Cloud Storage Systems
    Jingya Zhou and Wen He
3. BIDS: Bridgehead-employed Image Distribution System for Cloud Data Centers
    Zhongzhao Wang, Yuebin Bai, Kun Cheng, Jihong Ma, Duo Lv, Yuanfeng Peng, and Yao Ma
4. A Broker-Based Self-Organizing Mechanism for Cloud-Market
    Jie Xu and Jian Cao
5. Group Participation Game Strategy for Resource Allocation in Cloud Computing
    Weifeng Sun, Danchuang Zhang, Ning Zhang, Qingqing Zhang and Tie Qiu

NPC(L6)–Applications of Parallel & Distributed Computing
Session Chair: Prof. M. M. Hafizur Rahman
1. An adaptive channel sensing approach based on sequential order in distributed Cognitive Radio Networks
    Guangsheng Feng, Huiqiang Wang, Qian Zhao and Hongwu Lv
2. A Location Privacy Preserving based on Sensitive Diversity for LBS
    Changli Zhou, Chunguang Ma, Songtao Yang, Peng Wu, Linlin Liu
3. Message Passing Algorithm for the Generalized Assignment Problem
    Mindi Yuan, Chong Jiang, Shen Li, Wei Shen, Yannis Pavlidis and Jun Li
4. PPMS: a Peer to Peer Metadata Management Strategy for Distributed File Systems
    Di Yang, Weigang Wu and Jiongyu Yu
5. Improving Log-based Fault Diagnosis by Log Classification
    Deqing Zou, Hao Qin, Hai Jin, Weizhong Qiang, Zongfen Han, Xueguang Chen
6. Energy-Efficient and Adaptive Algorithms for Constructing Multipath Routing in Wireless Sensor Networks
    Shaohua Wan

NPC(L7)–Virtualization & Cloud Computing
Session Chair: Prof. Rui Wang
1. Towards Optimal Collaboration of Policies in the Two-phase Scheduling of Cloud Tasks
    Cong Xu, Jiahai Yang, Di Fu and Hui Zhang
2. Gossip Membership Management with Social Graphs For Byzantine Fault Tolerance in Clouds
    Jongbeom Lim, Joon Min Gil, Kwang Sik Chung, Jihun Kang, Daewon Lee and Heonchang Yu
3. Prediction-based Optimization of Live Virtual Machine Migration
    Changyuan Chen, Jian Cao
4. Control Protocol and Self-adaptive Mechanism for Live Virtual Machine Migration over XIA
    Dalu Zhang, Xiang Jin, Dejiang Zhou, Jianpeng Wang and Jiaqi Zhu
5. Efficient Live Migration of Virtual Machines with A Novel Data Filter
    Yonghui Ruan, Zhongsheng Cao, Yuanzhen Wang
6. A compilation and run-time framework for maximizing performance of self-scheduling algorithm
    Yizhuo Wang, Laleh Beni, Alex Nicolau, Alex Veidenbaum and Rosario Cammarota

NPC(L8)–Architectures and File Systems
Session Chair: Prof. Anna Kobusinska
1. Semi-Automatic Composition of Data Layout Transformations for Loop Vectorization
    Shixiong Xu and David Gregg
2. Accelerating the Reconstruction Process in Network Coding Storage System by Leveraging Data Temperature
    Kai Li and Yuhui Deng
3. Towards relaxed rollback-recovery consistency in SOA
    Jerzy Brzezinski, Mateusz Holenko, Anna Kobusinska, Dariusz Wawrzyniak, and Piotr Zierhoffer
4. A Novel Page Replacement Algorithm for the Hybrid Memory Architecture Involving PCM and DRAM
    Kaimeng Chen, Peiquan Jin and Lihua Yue
5. Wire Length of Midimew-connected Mesh Network
    Md Rabiul Awal, M. M. Hafizur Rahman, Rizal Mohd Nor, Tengku Mohd Bin Tengku Sembok, Yasuyuki Miura, and Yasushi Inoguchi
6. Dynamic Stripe Management Mechanism in Distributed/Parallel File Systems
    Jianwei Liao, Guoqiang Xiao, Xiaoyan Liu, and Lingyu Zhu

Speedup Critical Stage of Machine Learning with Batch Scheduling in GPU
  Yuan Gao, Rui Wang, Ning An, Yanjiang Wei and Depei Qian
The New Territory of Lightweight Security in a Cloud Computing Environment
  Shu-Ching Wang, Shih-Chi Tseng, Hsin-Met Chuan, Kuo-Qin Yan, Szu-Hao Tsai
Tacked Link List - An Improved Linked List for Advance Resource Reservation
  Li-bing Wu, Jing Fan, Lei Nie and Bing-yi Liu
CFIO2: overlapping communications and I/O with computations using RDMA technology
  Cheng Zhang, Xiaomeng Huang, Yong Hu, Shizhen Xu, Haohuan Fu and Guangwen Yang
Performance Analysis of End-to-end Services in Virtualized Computing Environments
  Guofeng Yan and Yuxing Peng
Increasing Multi-controller Parallelism for Hybrid-mapped Flash Translation Layers
  Hung-Yi Sung and Chin-Hsien Wu
An Estimation-based Task Load Balancing Scheduling in Spot Clouds
  Daeyong Jung, Heeseok Choi, Daewon Lee, Heonchang Yu and Eunyoung Lee
Distributed Ontology Integration Model for Cooperative Inference in Context Aware Computing
  Soomi Yang
Cross-Platform Parallel Programming in PARRAY: A Case Study
  Xiang Cui, Xiaowen Li, and Yifeng Chen
Different Solvers Evaluation for a Bucking Problem
  Chau-Yi Chou, Jiunn-Horng Lee, Yu-Fen Cheng, Chih-Wei Hsieh and Weichung Wang
Quality of Service Enhancement by Using an Integer Bloom Filter based Data De-duplication Mechanism in the Cloud Storage Environment
  Kuo-Qin Yan, Yung-Hsiang Su, Hsin-Met Chuan, Shu-Ching Wang, Bo-Wei Chen
Fault-Tolerant Storage Servers for the Databases of Redundant Web Servers in a Computing Grid
  Minhwan Ok
Scheduling Cloud Platform Managed Live-Migration Operations to Minimize the Makespan
  Xiaoyong Yuan, Ying Li, Yanqi Wang and Kewei Sun
Sequential Sensing and Transmission for Real-Time Traffic in Cognitive Networks
  Show-Shiow Tzeng and Ying-Jen Lin
An Adaptive Heterogeneous Runtime for Irregular Applications in the Case of Ray-Tracing
  Chih-Chen Kao and Wei-Chung Hsu
DLBer: A Dynamic Load Balancing Algorithm for the Event-driven Clusters
  Mingming Sun, Changlong Li and Xuehai Zhou
Performance Prediction Model and Analysis for Compute-intensive Tasks on GPUs
  Khondker S. Hasan, Amlan Chatterjee, Sridhar Radhakrishnan, and John K. Antonio
Interdomain Traffic Engineering Techniques to Overcome Undesirable Connectivity Incidents
  Amer AlGhadhban, Ashraf Mahmoud, Marwan Abu-Amara, Farag Azzedin, Mohammed H. Sqalli