AAAI 2014 Tutorial

Large-scale Nonlinear Classification: Algorithms and Evaluations


Abstract: Much of the research in AI has been directed at the problem of data classification, in which an algorithm learns linear or nonlinear models from data. Nonlinear classification is particularly important because complex nonlinear concepts often occur in nature. An accurate and scalable algorithm that can learn nonlinear models plays a key role in various data mining, NLP, computer vision, and information retrieval problems. In an environment where new large-scale problems are emerging in various disciplines and pervasive computing applications are becoming common, there is a real need for classification algorithms that can process increasing amounts of data efficiently. Recent advances in large-scale learning have resulted in many popular algorithms for linear classification on large data. However, technologies for large-scale nonlinear classification are still underdeveloped, and the best practices are less well known. To fill this gap, we present a survey of state-of-the-art algorithms and software packages, together with their evaluation on benchmark data sets. We discuss in detail algorithms that address different aspects of this area. We also present a comprehensive experimental evaluation of these algorithms and off-the-shelf software on a collection of large real-life data sets across various domains. Basic background in supervised learning is assumed.

Speaker Biography: Dr. Zhuang (John) Wang is a Member of Technical Staff at Skytree, a California-based machine learning startup. He was previously an Application Architect in Big Data and Analytics at IBM Global Business Services, where he was dedicated to bridging science and business by developing big data analytics solutions for business innovation. Prior to IBM, he was a Research Scientist with Siemens Corporate Research, where he led and worked on a wide variety of projects building predictive maintenance, anomaly detection, and decision support systems for servicing fleets of industrial and medical equipment that generate huge amounts of sensor/log data. He earned his Ph.D. in Computer and Information Science at Temple Univ., PA in 2010 and his B.A. in Electronic Commerce at Wuhan Univ., China in 2006. Dr. Wang's research interests are in large-scale supervised learning algorithms, in particular Support Vector Machines and Neural Networks, as well as online and multi-instance learning and their applications. He is the author or coauthor of 20+ papers published at JMLR, ICML, KDD, AISTATS, and other venues. He is the project lead of BudgetedSVM, a highly optimized toolbox for SVM approximations when data cannot fit into memory, and the solution architect of the world's first log-based predictive maintenance system.

Slides [pdf] (final version)

Tutorial Outline:

-       Large-scale linear classification (overview of classical algorithms)

-       Large-scale nonlinear classification (deep dive into the state of the art)

-       Parallelism


Time: Sunday, July 27, 9am-1pm