Course syllabus: Big data analysis

From Research management course

This module presents recent advances in machine learning at the master's level. It collects the new challenges of big data analysis and their solutions from the machine learning point of view. These challenges include not only the drastic growth in dataset size and feature-space dimensionality but, above all, the increasing complexity of data structures, a phenomenon that calls for new machine learning mechanisms. The purpose of this module is to present new material in the field of big data analysis systematically and clearly. Its main goal is to foster students' ability to understand and apply in practice methods from new scientific papers, which convey their results in the language of machine learning.

  1. Large-scale clustering. Fast k-means. DBSCAN. Mixture scales. Metric selection. 
    • Practice: mushroom dataset clustering. 
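As a warm-up for the clustering module, the assignment step and update step of plain Lloyd's k-means can be sketched in a few lines; this is the baseline that the "fast k-means" variants in the module accelerate, shown here on hypothetical one-dimensional toy data.

```python
# Minimal sketch of Lloyd's k-means on 1-D toy data (illustrative only;
# not the accelerated variants covered in the module).
import random

def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: (p - centers[c]) ** 2)
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster
        # (keep the old center if a cluster went empty).
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return sorted(centers)

data = [1.0, 1.2, 0.8, 9.9, 10.1, 10.0]
print(kmeans(data, 2))  # two centers, near 1.0 and 10.0
```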
  2. Metric learning. Transformation matrix for clustering and classification problems. Metric tensor. Fast optimization. 
    • Practice: accelerometer time series clustering.
  3. Similarity learning. Hash learning. Vantage Point Tree. KD-Tree. Geometrical near-neighbor access tree. Spatial approximation tree. Hierarchical k-means tree. Randomized KD-trees. 
    • Practice: plagiarism detection.
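One building block behind hash-based similarity search and the plagiarism-detection practice is the MinHash signature: the minimum of a randomized hash over a document's shingle set estimates Jaccard similarity. A small sketch with hypothetical texts and illustrative parameters:

```python
# MinHash sketch for set similarity (parameters and texts are illustrative).
import random

def shingles(text, k=3):
    # k-word shingles of a document, as a set.
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def minhash_signature(shingle_set, num_hashes=64, seed=0):
    rng = random.Random(seed)
    # Each "hash function" is Python's string hash XORed with a random mask.
    masks = [rng.getrandbits(64) for _ in range(num_hashes)]
    return [min((hash(s) & 0xFFFFFFFFFFFFFFFF) ^ m for s in shingle_set)
            for m in masks]

def estimated_jaccard(sig_a, sig_b):
    # Fraction of matching signature positions estimates Jaccard similarity.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

a = shingles("the quick brown fox jumps over the lazy dog")
b = shingles("the quick brown fox leaps over the lazy dog")
print(estimated_jaccard(minhash_signature(a), minhash_signature(b)))
```

Identical documents yield identical signatures (estimate 1.0); near-duplicates share most signature positions, which is what makes the scheme usable for large-scale near-duplicate detection.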
  4. Deep learning. Representation learning. Autoencoders. Ladder networks. Convolutional networks. Tensors. 
    • Practice: prediction of biological activity for nuclear receptors.
  5. Dynamic time warping. Alignment and dynamic programming. Fast DTW. Kernel DTW. 
    • Practice: protein sequence alignment.
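The module's alignment-by-dynamic-programming idea can be shown with the classic O(nm) DTW recursion, which FastDTW and kernel variants then refine; the toy sequences below are hypothetical.

```python
# Plain dynamic-programming DTW distance between two sequences
# (the O(nm) baseline; FastDTW approximates this recursion).
def dtw(a, b):
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Alignment step: match, insertion, or deletion.
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]

print(dtw([1, 2, 3], [1, 2, 2, 3]))  # 0.0: the warp absorbs the repeated 2
```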
  6. Spatiotemporal data analysis. Panel data. SVD. Partial least squares. Multi-modelling. 
    • Practice: Electrocorticogram classification.
  7. Random decision trees. Bootstrap aggregating. Random subspace method. Bayesian model averaging. 
    • Practice: forecasting time series from Internet-of-Things monitoring.
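The bootstrap-aggregating idea from this module — fit models on resampled copies of the data and average their predictions — can be sketched with a deliberately trivial "weak model" (the sample mean); this is a didactic illustration, not the random-forest machinery itself.

```python
# Bootstrap aggregating (bagging) sketch: resample the data with
# replacement, fit one trivial "mean predictor" per resample, average.
import random

def bagged_mean(values, num_models=50, seed=0):
    rng = random.Random(seed)
    preds = []
    for _ in range(num_models):
        sample = [rng.choice(values) for _ in values]  # bootstrap resample
        preds.append(sum(sample) / len(sample))        # one weak model
    return sum(preds) / len(preds)                     # aggregate by averaging

print(bagged_mean([1.0, 2.0, 3.0, 4.0]))  # close to the plain mean 2.5
```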
  8. Semi-supervised learning. Self-training, co-training, co-learning. Multi-armed bandit learning. Reinforcement learning. 
    • Practice: constructing a personalized recommender system.
  9. Sparse decomposition. Probabilistic latent semantic analysis. Linear discriminant analysis. 
    • Practice: creating a topic model. 
  10. Sequential data analysis. Hidden Markov models. Linear dynamical systems. Online learning. 
    • Practice: recovering a hidden financial strategy.
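Decoding the most likely hidden state sequence of an HMM — the core of "recovering a hidden strategy" — is done with the Viterbi dynamic program. A minimal sketch with hypothetical toy parameters (two weather states, three observations):

```python
# Viterbi decoding for a discrete HMM (toy parameters, illustrative only).
def viterbi(obs, states, start, trans, emit):
    # V[t][s]: probability of the best path ending in state s at time t.
    V = [{s: start[s] * emit[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t - 1][p] * trans[p][s])
            V[t][s] = V[t - 1][prev] * trans[prev][s] * emit[s][obs[t]]
            back[t][s] = prev
    # Trace back the most likely state sequence.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

states = ("Rainy", "Sunny")
start = {"Rainy": 0.6, "Sunny": 0.4}
trans = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
         "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
        "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
print(viterbi(["walk", "shop", "clean"], states, start, trans, emit))
# ['Sunny', 'Rainy', 'Rainy']
```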
  11. Bayesian programming. Bayesian inference. Belief propagation. Bayesian and Kalman filter. Hidden Markov models. 
    • Practice: Bayesian spam filtering. 
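The spam-filtering practice rests on Bayesian inference in its simplest form, a naive Bayes classifier with Laplace smoothing; the vocabulary and training messages below are hypothetical toy data.

```python
# Toy naive-Bayes spam filter with Laplace smoothing (illustrative data).
from collections import Counter
import math

def train(messages):  # messages: list of (text, label) pairs
    counts = {"spam": Counter(), "ham": Counter()}
    labels = Counter()
    for text, label in messages:
        labels[label] += 1
        counts[label].update(text.lower().split())
    return counts, labels

def classify(text, counts, labels):
    vocab = set(counts["spam"]) | set(counts["ham"])
    scores = {}
    for label in ("spam", "ham"):
        total = sum(counts[label].values())
        # Log prior plus smoothed log likelihood of each word.
        score = math.log(labels[label] / sum(labels.values()))
        for w in text.lower().split():
            score += math.log((counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

msgs = [("win money now", "spam"), ("cheap money offer", "spam"),
        ("meeting at noon", "ham"), ("lunch at noon today", "ham")]
counts, labels = train(msgs)
print(classify("win cheap money", counts, labels))  # spam
```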
  12. Transductive learning. Multitask learning. Learning to learn. Hierarchical models. 
    • Practice: Electrocardiogram classification. 
  13. Multi-target learning. Collaborative filtering. Individual and joint targets. Mean-regularized multi-target learning. Regret analysis. Conditional random fields. 
    • Practice: continuous auction data. 
  14. Structured learning. Error function construction. Conditional random fields. Structured support vector machines. 
    • Practice: discovering image overlap.
  15. Review: large-scale classification. Data generation procedure. Model construction and optimization procedure. Computational experiment workflow. 
    • Practice: creating a report on the computational experiment. 
    • Exam.

Reading List

  1. Core texts
    1. Kevin P. Murphy, 2012. Machine Learning: A Probabilistic Perspective 
    2. Sergios Theodoridis,  2015.  Machine Learning: A Bayesian and Optimization Perspective 
    3. David Barber, 2012. Bayesian Reasoning and Machine Learning
  2. Supplementary texts
    1. Peter Flach, 2013. Machine Learning: The Art and Science of Algorithms that Make Sense of Data
    2. Richard S. Sutton and Andrew G. Barto, 1998. Reinforcement Learning: An Introduction
    3. Sebastian Nowozin and Christoph H. Lampert, 2011. Structured Learning and Prediction in Computer Vision