Introduction
I am a thesis-based master's student in the Department of Electrical & Computer Engineering at Concordia University, supervised by Dr. Yan Liu. My primary research interest is distributed system frameworks for big data processing. This includes distributed stream processing systems (e.g., Apache Storm, Apache S4) and batch-oriented data processing systems (e.g., Apache Hadoop), especially those suited to recommendation and other artificial intelligence applications.
Before studying at Concordia University, I received my master's degree in Computer Software and Theory from Wuhan University. I then worked at Tencent (QQ.COM, one of the top 10 websites in the world as ranked by Alexa) as a software engineer for three years.
Education
Concordia University, Montreal, Canada, Sept. 2013 - Now
Master of Applied Science in ECE
Wuhan University, Wuhan, China, Sept. 2008 - Jun. 2010
Master of Computer Software and Theory
Wuhan University, Wuhan, China, Sept. 2004 - Jun. 2008
Bachelor of Computer Science and Technology
Work Experience
Research Intern, Ericsson Inc, Ottawa, Canada, Aug. 2014 - Now
Mobile Big Data Analysis
- Build a Hadoop-based data analysis platform that supports ad-hoc queries on tens of terabytes of data. Develop ETL tools that load data from LTE base stations into HDFS in real time.
- Design and develop a real-time monitoring and performance prediction system for LTE base stations.
Software Engineer, Tencent Inc, Shenzhen, China, Jul. 2010 - Jun. 2013
Qzone & Open Platform
- Designer and developer of an application recommendation system for Qzone (the largest social network in China), which increased the click-through rate by 170%.
- Server-side programmer developing SNS games, web services, and web pages in C++ for services with over 600 million users, handling 50,000 requests per second.
Research Projects
Load Adaptive Optimization for Incremental Data Processing Platforms
Stream processing software frameworks enable high-speed, real-time processing of continuous, unbounded data streams. By leveraging the elasticity of cloud computing infrastructure, stream processing frameworks can serve as a Platform as a Service (PaaS) for many domain applications, providing simplified development and run-time management. A key issue in making such a PaaS scalable is allocating data processing operators to cluster nodes and balancing the workload dynamically. Because data volume and rate can be unpredictable, a static mapping between operators and cluster resources often results in an unbalanced operator load distribution. This project proposes an optimization method that combines the correlation of node resource utilization with cluster capacity. The associated software components form a layer between a stream processing software framework and the underlying cloud clusters and nodes. This layer allows an operator to be transferred to a different cluster node at runtime while remaining transparent to developers.
Performance and Cost Evaluation of Running Data-Intensive Applications on Hadoop and Stream Processing Middleware
Processing large-scale data is an increasingly common and important problem in many domains. MapReduce, the de facto standard programming model, and its associated run-time systems were originally developed at Google. Hadoop, an open-source platform supporting the same programming model, subsequently gained tremendous popularity. However, MapReduce was not designed to process small, independent updates efficiently: to incorporate an update, a MapReduce job must be rerun over both the newly updated data and the old data. Given enough computing resources, MapReduce's scalability makes this approach feasible. However, reprocessing the entire dataset discards the work done in earlier runs and makes latency proportional to the size of the entire dataset rather than to the size of the update.
S4 is a distributed computing platform for processing continuous, unbounded data streams. It aims to provide a highly scalable software solution (akin to Hadoop for batch data processing) that operates at high data rates and processes massive amounts of data.
This research presents an empirical performance and cost evaluation of both Hadoop and S4 when processing continuous, incrementally updated data streams.
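To make the contrast between rerunning a batch job and applying an incremental update concrete, here is a minimal Python sketch. It is not taken from this project: word counting is an assumed stand-in workload, and the function names batch_word_count and incremental_word_count are hypothetical. The batch path recounts old data plus the update; the incremental path reuses the earlier result and processes only the update.

```python
from collections import Counter

def batch_word_count(records):
    """Batch style: recount every record, so work grows with the whole dataset."""
    counts = Counter()
    for record in records:
        counts.update(record.split())
    return counts

def incremental_word_count(counts, new_records):
    """Incremental style: fold only the new records into the previous result.

    `counts` is updated in place, so work grows with the size of the update.
    """
    for record in new_records:
        counts.update(record.split())
    return counts

old_data = ["storm s4 hadoop", "hadoop mapreduce"]
update = ["s4 stream"]

# Batch: rerun over the old data plus the update.
recomputed = batch_word_count(old_data + update)

# Incremental: reuse the earlier result and process only the update.
previous = batch_word_count(old_data)
updated = incremental_word_count(previous, update)

print(recomputed == updated)  # True
print(updated["s4"])          # 2
```

Both paths produce the same counts, but only the batch path touches old_data a second time. Word counting is deliberately simple to update incrementally. Other computations may need to keep more intermediate state to avoid full reprocessing.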
Publication
Xing Wu, Yan Liu, "Optimization of Load Adaptive Distributed Stream Processing Services", Proceedings of the IEEE International Conference on Services Computing (SCC), 2014.
Xing Wu, Yan Liu, "Enabling A Load Adaptive Distributed Stream Processing Platform on Synchronized Clusters", Proceedings of the IEEE International Conference on Cloud Engineering (IC2E), 2014.
Xing Wu, Yan Liu, "Scalability Evaluation of Incremental Data Processing using Hadoop and Distributed Stream Processing Middleware", in "Big Data: Algorithms, Analytics, and Applications".
Yong Ai, Hongbin Dong, Xing Wu, Yiwen Liang, "Access Control Algorithm on File View in Intranets", Proceedings of the Second International Conference on Networks Security, Wireless Communications and Trusted Computing (NSWCTC), vol. 2, pp. 166-170, IEEE, 2010.
Contact Me
Office
EV13-173