Research Statement

From Research management course
Revision as of 18:37, 23 October 2022

Vadim, 2023

My research focuses on the problem of model selection in Machine Learning, drawing on methods from Applied Mathematics and Computer Science. The central problem is to select the most accurate and robust forecasting model with the simplest structure. To define the algebraic structure of the model set according to the application and the origin of the data, I use tools ranging from tensor algebras to differential geometry. To induce the quality criteria for selection, I use multivariate statistics, Bayesian inference, and probabilistic graphical models. My work joins theory and practical applications. I believe multi-model decoding problems for heterogeneous data are the most promising direction. Forecasting a variable with a complex structure requires several models: to recover dependencies in the source and target spaces, and to produce the forecast itself. The examples to investigate are various spatio-temporal measurements. The practical applications are brain-computer interfaces, health monitoring with wearable devices, and other signals in biology and physics.

Dimensionality reduction

Model selection is a fundamental problem of Machine Learning. Lasso and LARS significantly impacted the statistical methods of forecasting. Time series data have multiple dependencies between components, which leads to redundancy of the design space. Our work shows that quadratic programming feature selection algorithms deliver less redundant and more stable models. This approach considers the mutual information of the features and selects a model according to relevance and similarity criteria. The main idea is to minimize mutual dependence and maximize approximation quality by varying the indicator function that specifies the model structure. For linear models, we use a quadratic programming problem statement. For neural networks, we use Newton-type methods.
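
As a minimal sketch of the quadratic programming statement, one can minimize a weighted combination of pairwise feature similarity and (negative) relevance over the simplex. The similarity matrix `Q`, relevance vector `b`, and the projected-gradient solver below are illustrative assumptions, not the exact formulation from our papers:

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection onto the probability simplex (Duchi et al. style).
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1.0) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

def qpfs(Q, b, alpha=0.5, lr=0.1, steps=500):
    """Quadratic programming feature selection sketch:
    minimize alpha * x^T Q x - (1 - alpha) * b^T x over the simplex,
    where Q holds pairwise feature similarity and b holds feature relevance."""
    x = np.full(len(b), 1.0 / len(b))
    for _ in range(steps):
        grad = 2.0 * alpha * Q @ x - (1.0 - alpha) * b
        x = project_simplex(x - lr * grad)
    return x
```

With two strongly similar features and one independent feature of equal relevance, the weight concentrates on the independent feature, which is exactly the redundancy-reduction effect described above.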

We extend this approach to tensor decompositions to reduce the dimensionality of brain signals. We use the multi-way structure to predict hand trajectories from the spatial time series of cortical activity, which resides in the spatial, spectral, and temporal domains. Since these data are highly correlated across the domains, the redundancy of the feature space and its dimensionality become a major obstacle to a robust solution of the regression problem, both in the multi-way and the flat case. The proposed method extends the quadratic programming feature selection approach: we do not have to optimize the model's parameters to obtain an optimal model structure.
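
For intuition, here is a toy illustration of the multi-way structure; the array shapes and domain names are hypothetical, chosen only to show mode-n unfolding, the basic operation relating the flat and the multi-way treatments of such data:

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: rearrange a tensor into a matrix whose rows index
    one domain, so that flat feature selection can be applied per domain."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

# Hypothetical signal tensor: 5 trials x 3 spatial channels x 4 spectral bands
X = np.arange(60, dtype=float).reshape(5, 3, 4)

trials_view = unfold(X, 0)    # shape (5, 12): trials in rows
channels_view = unfold(X, 1)  # shape (3, 20): channels in rows
```

Flat methods work on a single unfolding and ignore the other domains; multi-way methods treat the domains jointly, which is where the cross-domain correlations become both a problem and an opportunity.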

Bayesian model selection

Bayesian model selection relies on the analysis of the model parameters. To select models, we use several types of parameters, including structure parameters and hyperparameters. The former define the structure of the model, its computational graph. The latter represent the distribution of the model parameters and induce the criterion for selecting the optimal model. Proper hyperparameter values prevent the model from overfitting. Optimizing neural networks with large numbers of hyperparameters is computationally expensive, so we proposed modifications of various gradient-based methods based on the evidence lower bound.
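
To make the evidence-lower-bound idea concrete, here is a minimal sketch on a toy scalar Gaussian model (the model, parameterization, and step sizes are assumptions for illustration only). The ELBO is available in closed form here, and gradient ascent on the variational parameters recovers the exact posterior:

```python
import numpy as np

def elbo(y, mu, log_sigma):
    """Closed-form ELBO for a toy model: prior w ~ N(0,1), likelihood
    y_i ~ N(w,1), variational posterior q(w) = N(mu, sigma^2)."""
    sigma2 = np.exp(2.0 * log_sigma)
    n = len(y)
    exp_loglik = -0.5 * n * np.log(2 * np.pi) - 0.5 * (np.sum((y - mu) ** 2) + n * sigma2)
    exp_logprior = -0.5 * np.log(2 * np.pi) - 0.5 * (mu ** 2 + sigma2)
    entropy = 0.5 * np.log(2 * np.pi * np.e * sigma2)
    return exp_loglik + exp_logprior + entropy

def fit_variational(y, lr=0.05, steps=2000):
    """Gradient ascent on the ELBO over (mu, log_sigma)."""
    mu, log_sigma = 0.0, 0.0
    n = len(y)
    for _ in range(steps):
        sigma2 = np.exp(2.0 * log_sigma)
        grad_mu = np.sum(y - mu) - mu            # d ELBO / d mu
        grad_ls = 1.0 - (n + 1) * sigma2         # d ELBO / d log_sigma
        mu += lr * grad_mu / (n + 1)             # scaled ascent step
        log_sigma += lr * grad_ls
    return mu, np.exp(log_sigma)
```

In this conjugate case the optimum matches the analytic posterior, mean sum(y)/(n+1) and variance 1/(n+1), which makes the sketch easy to verify; for neural networks the same objective is optimized stochastically.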

A large part of my research is devoted to methods of estimating the covariance matrix of the model parameters. These estimates form a criterion for selecting a particular parameter of the model, or a subset of parameters such as a neuron or a layer of a neural network. The hyperparameters here are the expectation and the covariance matrix, as for any parameters of the distribution of the model parameters. For some types of models they are estimated directly, but the following three approaches give decent results. The first is likelihood optimization, maximizing, for example, the model evidence or the evidence lower bound. The second is bootstrapping, which provides empirical estimates for a given model. The third, which I consider the most productive way to obtain the hyperparameters, is stochastic sampling, starting from Metropolis-Hastings Bayesian sampling.
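
A sketch of the sampling route, again on a hypothetical scalar Gaussian model chosen only for illustration: Metropolis-Hastings draws from the parameter posterior, and the empirical moments of the draws estimate the expectation and (co)variance hyperparameters.

```python
import numpy as np

def log_posterior(w, y):
    # Unnormalized log posterior: prior w ~ N(0,1), likelihood y_i ~ N(w,1).
    return -0.5 * w ** 2 - 0.5 * np.sum((y - w) ** 2)

def metropolis_hastings(y, steps=20000, prop_scale=0.5, seed=1):
    """Random-walk Metropolis-Hastings; first half discarded as burn-in."""
    rng = np.random.default_rng(seed)
    w, samples = 0.0, []
    for _ in range(steps):
        cand = w + prop_scale * rng.normal()
        delta = log_posterior(cand, y) - log_posterior(w, y)
        if delta >= 0 or rng.uniform() < np.exp(delta):
            w = cand
        samples.append(w)
    return np.array(samples[steps // 2:])
```

The sample mean and sample variance of the chain then serve as the hyperparameter estimates; in this conjugate toy case they can be checked against the analytic posterior moments.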

Multimodeling and knowledge transfer

To reduce the complexity of deep learning models, we transfer information about the structure, parameters, and distribution from the teacher model to the student model. This method is called distillation, or privileged learning. We assume the student has fewer parameters than the teacher. Our method of Bayesian model selection treats the posterior distribution of the teacher parameters as the prior distribution of the student parameters. We align the model structures by omitting parts of the teacher model that are non-informative for the given data.
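
A minimal sketch of the prior-transfer step, on a hypothetical one-parameter Gaussian model (the conjugate update below stands in for the full Bayesian treatment of a network): the teacher's posterior becomes the student's prior, and the student's posterior follows by a standard Gaussian update.

```python
import numpy as np

def student_posterior(y, teacher_mean, teacher_var, noise_var=1.0):
    """Bayesian distillation sketch: the teacher posterior N(teacher_mean,
    teacher_var) serves as the student prior; conjugate update with the
    Gaussian likelihood y_i ~ N(w, noise_var)."""
    n = len(y)
    precision = 1.0 / teacher_var + n / noise_var
    mean = (teacher_mean / teacher_var + y.sum() / noise_var) / precision
    return mean, 1.0 / precision
```

The update makes the trade-off explicit: a confident teacher (small `teacher_var`) dominates the student posterior, while a diffuse teacher lets the student's own data take over.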

Spatio-temporal series and manifold learning

Physics-informed machine learning