# Week 3


The goal of this week is to understand what type of problem you are stating and solving.

## P: Problem statement

1. In the paradigm Idea $$\to$$ Formula $$\to$$ Code, state the problem of finding an optimal solution.

2. See the examples below and in the past projects.
3. Discuss the terminology and notation; see the [pdf] and [tex] files with notations and a useful style file.
4. At the beginning of the Problem statement, write a general description of the problem.
5. Describe the elements of your problem statement:
    1. the sample set,
    2. its origin, or its algebraic structure,
    3. statistical hypotheses of data generation,
    4. [conditions of measurements],
    5. [restrictions on the sample set and its values],
    6. your model in the class of models,
    7. restrictions on the class of models,
    8. the error function (and its inference), or a loss function, or a quality criterion,
    9. the cross-validation procedure,
    10. restrictions on the solutions,
    11. external (industrial) quality criteria,
    12. the optimization statement as $$\arg\min$$.
6. Define the main terms: what you call the model, the solution, and the algorithm.
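
As an illustration of how the elements in item 5 fit together, here is a minimal sketch of a problem statement for linear regression. The symbols $$\mathfrak{D}$$, $$\mathbf{w}$$, $$S$$ and the Gaussian noise hypothesis are assumptions chosen for this example, not notation prescribed by the course files.

$$
\begin{aligned}
&\text{Sample set:} && \mathfrak{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{m}, \quad \mathbf{x}_i \in \mathbb{R}^{n}, \; y_i \in \mathbb{R}, \\
&\text{Data generation hypothesis:} && y_i = f(\mathbf{w}, \mathbf{x}_i) + \varepsilon_i, \quad \varepsilon_i \sim \mathcal{N}(0, \sigma^2) \text{ independent}, \\
&\text{Model:} && f(\mathbf{w}, \mathbf{x}) = \mathbf{w}^{\mathsf{T}} \mathbf{x}, \quad \mathbf{w} \in \mathbb{R}^{n}, \\
&\text{Error function:} && S(\mathbf{w} \mid \mathfrak{D}) = \sum_{i=1}^{m} \bigl(y_i - f(\mathbf{w}, \mathbf{x}_i)\bigr)^2, \\
&\text{Optimization statement:} && \hat{\mathbf{w}} = \arg\min_{\mathbf{w} \in \mathbb{R}^{n}} S(\mathbf{w} \mid \mathfrak{D}).
\end{aligned}
$$

Under the Gaussian hypothesis, minimizing $$S$$ is equivalent to maximizing the likelihood of the sample, which is one way to give the inference of the error function mentioned in the list.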

Note that:

• The model is a parametric family of functions that map the design space to the target space.
• The criterion (error function) is the function that is optimized to obtain an optimal solution (the model parameters or a function).
• The algorithm transforms the solution space, usually iteratively.
• The method combines a model, a criterion, and an algorithm to produce a solution.

Check these definitions on an example:

• the regression model,
• the sum of squared errors,
• the Newton-Raphson algorithm,
• the method of least squares.
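
The four items above can be sketched in code to show how they stay separate. This is a minimal illustration that assumes a one-feature linear regression with an intercept; the function names (`model`, `criterion`, `newton_raphson`, `least_squares`) and the toy data are hypothetical and chosen only for this example.

```python
# A sketch under stated assumptions: one feature plus an intercept, toy data.

def model(w, x):
    """Model: the parametric family f(w, x) = w[0] + w[1] * x."""
    return w[0] + w[1] * x


def criterion(w, xs, ys):
    """Criterion: the sum of squared errors S(w) over the sample."""
    return sum((y - model(w, x)) ** 2 for x, y in zip(xs, ys))


def newton_raphson(xs, ys, w, n_iter=3):
    """Algorithm: Newton-Raphson steps w <- w - H^{-1} g applied to S(w)."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    # The Hessian of S is constant here: 2 * [[n, sx], [sx, sxx]].
    h00, h01, h11 = 2.0 * n, 2.0 * sx, 2.0 * sxx
    det = h00 * h11 - h01 * h01
    for _ in range(n_iter):
        residuals = [y - model(w, x) for x, y in zip(xs, ys)]
        # Gradient of S with respect to w[0] and w[1].
        g0 = -2.0 * sum(residuals)
        g1 = -2.0 * sum(r * x for r, x in zip(residuals, xs))
        # Solve the 2x2 system H d = g by Cramer's rule.
        d0 = (g0 * h11 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        w = [w[0] - d0, w[1] - d1]
    return w


def least_squares(xs, ys):
    """Method: combines the model, the criterion, and the algorithm."""
    return newton_raphson(xs, ys, w=[0.0, 0.0])


# Toy sample (made up for illustration only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

w_hat = least_squares(xs, ys)
print("solution:", w_hat, "criterion value:", criterion(w_hat, xs, ys))
```

Because the criterion is quadratic in the parameters, the first Newton-Raphson step already reaches the minimum, and the remaining iterations leave the solution unchanged up to rounding. Replacing `newton_raphson` with another optimizer changes the algorithm but keeps the model and the criterion the same.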

## Resources

• Slides for week 3.
• Video for week 3.
• Recommended notations: pdf and .tex with .sty
• Slides with a plan of Problem statement
• Examples of problem statements
1. Katrutsa A.M., Strijov V.V. Stress test procedure for feature selection algorithms // Chemometrics and Intelligent Laboratory Systems, 2015, 142 : 172-183. article
2. Katrutsa A.M., Strijov V.V. Comprehensive study of feature selection methods to solve multicollinearity problem according to evaluation criteria // Expert Systems with Applications, 2017 article
3. Motrenko A., Strijov V., Weber G.-W. Bayesian sample size estimation for logistic regression // Journal of Computational and Applied Mathematics, 2014, 255 : 743-752. article
4. Kulunchakov A.S., Strijov V.V. Generation of simple structured Information Retrieval functions by genetic algorithm without stagnation // Expert Systems with Applications, 2017, 85 : 221-230. article
5. Ivkin N.P. Feature generation for classification and forecasting problems, MIPT, 2013 draft
• Notations for wiki Ru
• Basic notations, pdf
• Simple and useful notations
• Notations for Bayesian model selection, pdf