Difference between revisions of "Highlights of selected papers"

From Research management course

Revision as of 02:31, 16 May 2023

Vadim

2022

Quadratic programming feature selection for multi-correlated signal decoding with partial least squares. Expert Systems with Applications

This paper investigates the dimensionality reduction problem for signal decoding. Its main application is brain-computer interface modeling. The model combines brain cortex signals and limb motion signals.

Numerical methods of sufficient sample size estimation for generalized linear models. Lobachevskii Journal of Mathematics

To select an adequate regression or classification model, a sample set of minimum sufficient size must be collected. This paper investigates the problem of cost reduction of data collection procedures.

Probabilistic interpretation of the distillation problem. Automation and Remote Control

A probabilistic interpretation of distillation and privileged learning methods is proposed. The theory is illustrated with linear and logistic regression.

Probabilistic models of expert learning. Automation and Remote Control

The student's model has fewer parameters than the teacher's model. The Bayesian approach to selecting a student model assigns an a priori distribution of the student parameters according to the posterior distribution of the teacher.

Neural architecture search with structure complexity control. EasyChair

The paper performs neural architecture search under a desired model complexity, building on the differentiable architecture search algorithm. Instead of optimizing the structural parameters directly, we treat them as a vector function of a complexity parameter.

2021

Continuous physical activity recognition for intelligent labor monitoring. Multimedia Tools and Applications

Human activity recognition depends on the context of actions. The solution is the hierarchical representation of activities as sets of low-level actions.

Bayesian distillation of deep learning models. Automation and Remote Control

The paper proposes a mechanism for parameter space reduction of a student model using a teacher model. A theoretical analysis of the proposed reduction mechanism is provided. The computational experiment uses FashionMNIST.

Prior distribution selection for a mixture of experts. Computational Mathematics and Mathematical Physics

The paper investigates a mixture of expert models. The gate function is a neural network with softmax on the last layer. The paper analyzes various prior distributions for each expert. The authors propose a method that considers the relationship between prior distributions of different experts.
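
The gating idea can be illustrated with a minimal sketch. It assumes linear experts and a single linear gating layer in place of the neural network, and it does not model the prior distributions the paper analyzes. The names `softmax`, `mixture_predict`, `W_gate`, and `W_experts` are hypothetical and not taken from the paper.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mixture_predict(X, W_gate, W_experts):
    """Combine K linear experts using softmax gate weights.

    X: (n, d) inputs, W_gate: (d, K) gate weights (assumed linear gate),
    W_experts: (K, d) weights of the K linear experts.
    """
    gate = softmax(X @ W_gate)      # (n, K), each row sums to 1
    expert_out = X @ W_experts.T    # (n, K), one prediction per expert
    return (gate * expert_out).sum(axis=1), gate
```

With a zero gate matrix every expert receives equal weight, so the mixture output is the plain average of the expert predictions.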

Position-based content attention for time series forecasting with sequence-to-sequence RNN. Lecture Notes in Computer Science

We propose an extended attention model for sequence-to-sequence recurrent neural networks (RNNs) designed to capture (pseudo-)periods in time series. This extended attention model can be deployed on top of any RNN and is shown to yield state-of-the-art performance for time series forecasting on several univariate and multivariate time series.

2020

DRACON: disconnected graph neural network for atom mapping in chemical reactions. Physical Chemistry Chemical Physics

Machine learning has solved many challenging problems in computer-assisted synthesis prediction (CASP). We formulate the reaction prediction problem as node classification in a disconnected graph of source molecules and generalize a graph convolutional neural network to disconnected graphs. We demonstrate that our approach can successfully predict reaction centers and the atoms of the main product.


Comprehensive analysis of gradient-based hyperparameter optimization algorithms. Annals of Operations Research

The paper investigates the hyperparameter optimization problem. Hyperparameters are the parameters of the distribution of model parameters. Optimal hyperparameter values prevent overfitting and allow the model to achieve higher predictive performance. Neural network models with a large number of hyperparameters are analyzed.

2018

Object selection in credit scoring using covariance matrix of parameters estimations. Annals of Operations Research

We address the problem of outlier detection for more reliable credit scoring. Scoring models estimate the probability of loan default based on the customer’s application. To get an unbiased estimation of the model parameters, one must select a set of informative objects (customers).

Dimensionality reduction for time series decoding and forecasting problems. IEEE DEStech Transactions on Computer Science and Engineering

Decoding and forecasting of multiscale time series produce predicted values not for the next timestamp alone but for the whole time segment within the forecast horizon. We conducted computational experiments on real datasets of energy consumption and electrocorticogram (ECoG) signals.

Multi-way feature selection for ECoG-based Brain-Computer Interface. Expert Systems with Applications

We address the problem of designing Brain-Computer Interfaces. The feature description resides in a spatial-spectral-temporal domain. It includes the electrocorticogram time series and their spectral representation. We propose a filtering feature selection method for tensor data.

Deep learning model selection of suboptimal complexity. Automation and Remote Control

Suboptimal complexity is an approximate estimate of the minimum description length obtained with Bayesian inference and variational methods. We apply variational methods with gradient optimization algorithms to estimate the likelihood.

Quadratic Programming Optimization with Feature Selection for Nonlinear Models. Lobachevskii Journal of Mathematics

The high-dimensional feature space is redundant. There is multicollinearity in the features. To build a stable model, the authors solve the dimensionality reduction problem for the feature space. The proposed algorithm maximizes the relevance of model parameters to the residuals and makes them pairwise independent. The experiment investigates the nonlinear and logistic regression models.

Analysis of dissimilarity set between time series. Computational Mathematics and Modeling

A distance function between time series aligns two time series and builds a dissimilarity set. Using this distance function, we propose a classification method for human physical activity time series from the mobile phone accelerometer.
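
As an illustration of alignment-based distances, here is a minimal sketch of dynamic time warping (DTW). DTW is a standard technique used here as a stand-in. The paper's own distance function and its construction of the dissimilarity set may differ, and `dtw_distance` is a hypothetical name.

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the minimal alignment cost of a[:i] and b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible alignment moves
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Two series that differ only by a locally stretched segment, such as `[1, 2, 3]` and `[1, 2, 2, 3]`, align at zero cost, which is the property that makes such distances useful for accelerometer signals recorded at varying pace.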

2017

Comprehensive study of feature selection methods to solve multicollinearity problem according to evaluation criteria. Expert Systems with Applications

Data fitting is a single-objective optimization problem, where the objective function indicates the error of approximating the target vector as some function of given features. Linear dependence between features induces the multicollinearity problem and leads to instability of the model and redundancy of the feature set. This paper introduces a feature selection method based on quadratic programming.
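
A minimal sketch of quadratic programming feature selection follows, under stated assumptions: feature similarity `Q` and relevance `b` are measured by absolute Pearson correlation, the weights are constrained to the probability simplex, and the program is solved by projected gradient descent rather than a dedicated QP solver. The exact objective and constraints in the paper may differ, and all function names are hypothetical.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = k[u - (css - 1.0) / k > 0][-1]
    theta = (css[rho - 1] - 1.0) / rho
    return np.maximum(v - theta, 0.0)

def qp_feature_weights(X, y, n_iter=500):
    """Weights a that approximately minimize a^T Q a - b^T a on the simplex."""
    C = np.corrcoef(np.column_stack([X, y]), rowvar=False)
    Q = np.abs(C[:-1, :-1])   # assumed redundancy measure between features
    b = np.abs(C[:-1, -1])    # assumed relevance of each feature to the target
    d = Q.shape[0]
    a = np.full(d, 1.0 / d)
    step = 1.0 / (2.0 * np.linalg.norm(Q, 2))  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        a = project_simplex(a - step * (2.0 * Q @ a - b))
    return a
```

Features with large weights are kept. The quadratic term penalizes selecting mutually correlated features together, while the linear term rewards features correlated with the target.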

Generation of simple structured information retrieval functions by genetic algorithm without stagnation. Expert Systems with Applications

To construct new ranking models for Information Retrieval on the document description, we propose a modified genetic algorithm. It generates models as superpositions of primitive functions and selects the best according to the quality criterion. The main impact of the research is the new solution to avoid stagnation and control the structural complexity of consequently generated models.

Selecting an optimal model for forecasting the volumes of railway goods transportation. Automation and Remote Control

To select an optimal model for short-term forecasting of railway transportation volumes by type of goods and by departure and destination points, we compare autoregression and nonparametric models.

2016

Extracting fundamental periods to segment biomedical signals. IEEE Journal of Biomedical and Health Informatics

We introduce quasi-periodic time series via triplets (basic shape, shape transformation, time scaling). To split the time series into periods, we select a pair of principal components of the phase trajectory matrix. Next, we cut the trajectory in the plane of these principal components along its symmetry axis. Finally, we obtain half-periods and merge them. Accelerometer time series are used to segment human steps.
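
The first step, projecting the phase trajectory onto a pair of principal components, can be sketched as follows. The sketch assumes a sliding-window (Hankel) trajectory matrix and an uncentered singular value decomposition. The window width and the function names are hypothetical, and the cut along the symmetry axis is not implemented.

```python
import numpy as np

def phase_trajectory(x, width):
    """Stack sliding windows of x into a (len(x) - width + 1, width) matrix."""
    x = np.asarray(x, dtype=float)
    n = len(x) - width + 1
    return np.stack([x[i:i + width] for i in range(n)])

def principal_pair(x, width):
    """Project the phase trajectory onto its two leading principal directions."""
    H = phase_trajectory(x, width)
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    return H @ Vt[:2].T, s   # 2-D trajectory and all singular values
```

For a pure sinusoid the trajectory matrix has rank two, so the two-dimensional projection loses nothing and traces a closed curve. Quasi-periodic signals trace approximately closed loops instead.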

Analytic and Stochastic Methods of Structure Parameter Estimation. Informatica

The paper presents analytic and stochastic methods to estimate covariance matrices of parameters of linear and nonlinear models. The analytic ones are based on gradient descent. The stochastic ones are based on cross-validation.

Combining endogenous and exogenous variables in a special case of non-parametric time series forecasting model. Computational Mathematics and Cybernetics

We aim to improve a non-parametric forecasting algorithm that minimizes the convolution of a histogram of time series with the loss function. We propose to adjust the histogram, using mixtures of conditional histograms as a less sparse alternative to the multidimensional histogram.

Methods for intrinsic plagiarism detection and author diarization. CLEF Working Notes

We developed a plagiarism detection method based on constructing an author-style function from features of text sentences and detecting outliers. Both methods were tested on the PAN-2011 collection for intrinsic plagiarism detection and implemented for the PAN-2016 competition (winner).

2015

Stress test procedure for feature selection algorithms. Chemometrics and Intelligent Laboratory Systems

We propose a stress test procedure for a set of feature selection methods. This procedure generates test data sets with various configurations of the target vector and features. The computational experiment compares Lasso, ElasticNet, LARS, Ridge, Stepwise and Genetic algorithms.

Editorial of the special issue data analysis and intelligent optimization with applications. Machine Learning

In this special issue on “Data Analysis and Intelligent Optimization with Applications” we focus on applications of data analysis and optimization techniques. This special issue collected solutions adapted for real-world problems, leading to massive and large-scale data sets, online and imbalanced data. Our goal for this special issue was to bring together researchers in different areas related to analytics and optimization.

Supervised topic classification for modeling a hierarchical conference structure. Neural Information Processing

This paper investigates the problem of supervised latent modeling for extracting topic hierarchies. The proposed method is used to construct a topic hierarchy over the proceedings of the European Conference on Operational Research and helps to automatize the abstract submission system.

Metric concentration search procedure using reduced matrix of pairwise distances. Intelligent Data Analysis

This paper presents a new fast clustering algorithm RhoNet, based on the metric concentration location procedure. The algorithm uses a reduced matrix of pairwise rank distances to locate the metric concentration. The key feature of the proposed algorithm is that it does not need the exhaustive matrix of pairwise distances. This feature reduces computational complexity. It solves the protein secondary structure recognition problem.

Human activity recognition using quasiperiodic time series collected from a single tri-axial accelerometer. Multimedia Tools and Applications

This paper proposes a method for human physical activity recognition using time series from the tri-axial accelerometer of a smartphone. We use the k-nearest neighbor algorithm and a neural network as alternative approaches to recognizing these activities.

Ordinal classification using Pareto fronts. Expert Systems with Applications

The paper presents an ordinal classification method using Pareto fronts. We describe the class boundaries by the set of Pareto fronts. We propose to predict the object class using the nearest Pareto front boundary. The IUCN Red List species categorization illustrates the method.
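
To make the notion of a set of Pareto fronts concrete, the sketch below peels successive non-dominated layers from a set of points, assuming larger coordinate values are better. How the paper builds class boundaries from the fronts and measures the distance to the nearest boundary is not shown, and the function names are hypothetical.

```python
def dominates(p, q):
    """True if p is at least as large as q everywhere and strictly larger somewhere."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def pareto_front(points):
    """Points that no other point dominates (maximization in every coordinate)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def pareto_layers(points):
    """Repeatedly remove the current Pareto front to obtain nested layers."""
    remaining = list(points)
    layers = []
    while remaining:
        front = pareto_front(remaining)
        layers.append(front)
        remaining = [p for p in remaining if p not in front]
    return layers
```

Each layer can play the role of one ordinal level, with the first layer holding the objects that no other object outranks.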

2014

Methods of expert estimations concordance for integral quality estimation. Expert Systems with Applications

To rank expert estimations according to measured data, the authors proposed a method that resolved the contradiction between measurements and expert estimations.

Sample size determination for logistic regression. Journal of Computational and Applied Mathematics

The problem of sample size estimation is vital in medical applications, especially when measurements are expensive. This paper poses the problem for logistic regression analysis. The authors propose to estimate the sample size using the distance between parameter distribution functions on cross-validated data sets.

2013

[https://doi.org/10.1016/j.mcm.2011.02.017 Evidence optimization for consequently generated models]. Mathematical and Computer Modelling

We augment the set of measured features with their generated derivatives to construct an adequate regression model. Then we select features from this highly correlated set. A problem of European option volatility modeling illustrates the algorithm.

Integral indicator of ecological impact of the Croatian thermal power plants. Energy

This paper presents the Integral Indicator for the Croatian Thermal Power Plants and the Combined Heat and Power Plants. The features are: generated electricity and heat, consumed coal and liquid fuel, sulfur content in fuel, emitted CO2, SO2, NOx, and particles. The constructed Integral Indicator is compared with several others, such as the Pareto-optimal slicing indicator and the Metric indicator.

Nonlinear regression model generation using hyperparameter optimization. Computers and Mathematics with Applications

An algorithm of inductive model generation and model selection is proposed to solve the problem of the automatic construction of regression models. A regression model is an admissible superposition of smooth functions given by experts. Coherent Bayesian inference is used to estimate model parameters. It introduces hyperparameters that describe the distribution function of the model parameters. The hyperparameters control the model generation process.