# Todo list

The todo lists here correspond to the course schedule. Each list must be completed before the day of its review; in the 2020 Spring semester, the deadline is Wednesday, 06:00 am.

## Contents

- 1 Todo T: Theoretical part
- 2 Todo C: Code of the computational experiment
- 3 Todo V: Visualize project
- 4 Todo Update: Put project straight
- 5 Todo X: Experiment planning
- 6 Todo B: Run basic code
- 7 Todo R: Preliminary report
- 8 Todo P: Problem statement
- 9 Todo A: Abstract
- 10 Todo B: Beginner's-talk
- 11 Todo I: Introduction
- 12 Todo L: Literature
- 13 Todo 1: Select your project
- 14 Todo 0: Prepare necessary tools
- 15 Todo -1: Subscribe to the course

## Todo T: Theoretical part

The theoretical part describes the proposed solution and declares its properties.
The goal is to join the theoretical elements into a **method**, which includes the hypotheses, the model, the criterion, and the optimization algorithm.

- Write the solution of your problem:
  - start with a simple outline,
  - expand the necessary details,
  - use the LaTeX algorithm template.

- Compare the notation in the problem statement, the solution, and the code. Make sure the code does not contradict the text.

**Resources**

- Collection of plots, assorted [1]; version to download: slides, PDF
- Neuro-ZOO
- Commercial Project Planning, supplementary to the group game

## Todo C: Code of the computational experiment

Organize your code so that every run of the computational experiment stores its results.

- Set up a single main file that runs the experiment.
- Decompose the project code into functions and modules.
- Gather the experiment parameters in a dedicated section.
- A text description of the experiment flow helps.

- Keep version checkpoints so that you can return to a previous experiment.
- A regular commit schedule helps.

- Save named plots to a designated folder.
- Write your results to a .tex file and compile it.

**If your experiment takes a long time to run, just cut the data set.** *Do not use big or sophisticated data. Put your effort into illustrating your main message.*
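
A minimal sketch of such a single main file, in Python. Everything in it is an assumption for illustration: the file name main.py, the PARAMS dictionary, the results folder, and the toy experiment (fitting a line to small synthetic data). Each run writes its parameters and a LaTeX table of results into its own time-stamped folder; saving plots is omitted to keep the sketch free of third-party packages.

```python
"""main.py: the single entry point of a (hypothetical) computational experiment."""
import json
import random
import statistics
from datetime import datetime
from pathlib import Path

# ---- Experiment parameters: the one place to change settings ----
PARAMS = {
    "seed": 0,
    "n_objects": 100,      # keep the synthetic sample small so every run is fast
    "noise_std": 0.1,
    "results_dir": "results",
}


def make_data(n, noise_std, rng):
    """Toy synthetic data: y = 2x + 1 + Gaussian noise, x uniform on [0, 1]."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, noise_std) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Least-squares fit of y = a * x + b; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx
    return a, my - a * mx


def to_tex(results):
    """Render a dict of results as a LaTeX tabular to include in the paper."""
    rows = "\n".join(f"{name} & {value:.4f} \\\\" for name, value in results.items())
    return ("\\begin{tabular}{lr}\n\\hline\nQuantity & Value \\\\\n\\hline\n"
            + rows + "\n\\hline\n\\end{tabular}\n")


def main(params):
    rng = random.Random(params["seed"])
    xs, ys = make_data(params["n_objects"], params["noise_std"], rng)
    a, b = fit_line(xs, ys)
    mse = statistics.fmean((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    results = {"slope": a, "intercept": b, "mse": mse}

    # One time-stamped folder per run (microsecond resolution), so earlier results are kept.
    run_dir = Path(params["results_dir"]) / datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "params.json").write_text(json.dumps(params, indent=2))
    (run_dir / "results.tex").write_text(to_tex(results))
    return results, run_dir


if __name__ == "__main__":
    results, run_dir = main(PARAMS)
    print(f"Results stored in {run_dir}: {results}")
```

Running python main.py twice leaves two folders under results, each with params.json and results.tex, so a previous experiment can be compared or reproduced with the same parameters.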

## Todo V: Visualize project

Set the list of plots that will be included in your paper and presentation.

- Make a plot of the source data. **Goal:** put your notation on the plot.

- List plots to illustrate the error analysis.
- Make a plot to show the main message.

## Todo Update: Put project straight

- Check the folder structure (for example, make sure that your paper is not in the code folder):
  - docs,
  - code,
  - data,
  - [figs].

- Put the direct link to your paper in the table, so that everyone can access it.
- Rename article.tex to Surname2020Title.tex.
- Check that both the .tex and the .pdf files are uploaded.
- Update your personal page on MachineLearning.ru.
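
The folder check can be automated. A minimal Python sketch, assuming the layout listed above (docs, code, data, optional figs) with the paper sources in docs and one compiled .pdf next to each .tex; the function name check_structure and the specific checks are illustrative, not a course requirement.

```python
import sys
from pathlib import Path

REQUIRED_DIRS = ("docs", "code", "data")  # figs is optional, so it is not checked


def check_structure(root):
    """Return a list of problems found in the project folder layout."""
    root = Path(root)
    problems = [f"missing folder: {d}" for d in REQUIRED_DIRS if not (root / d).is_dir()]

    code_dir = root / "code"
    if code_dir.is_dir():
        # The paper belongs in docs, not next to the code.
        problems += [f"paper source in code folder: {p.name}" for p in code_dir.rglob("*.tex")]

    docs_dir = root / "docs"
    if docs_dir.is_dir():
        if (docs_dir / "article.tex").exists():
            problems.append("rename article.tex to Surname2020Title.tex")
        # Assumption: every .tex file in docs has its compiled .pdf next to it.
        for tex in sorted(docs_dir.glob("*.tex")):
            if not tex.with_suffix(".pdf").exists():
                problems.append(f"no compiled pdf for {tex.name}")
    return problems


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for problem in check_structure(root):
        print(problem)
```

Run it from the repository root; an empty output means none of these checks failed.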

## Todo X: Experiment planning

Plan your computational experiment.

- Discuss the goal of the experiment with your adviser and team.
- Put this goal in the Computational experiment section.

- Describe your basic data set, either a synthetic one or a simple real one:
  - in the text, give its title, source, and the setup of measurements (this is the technical description; the theoretical one belongs in the Problem statement section),
  - write down the number of objects and features, and describe the general statistics,
  - for a synthetic data set, describe the generation model and its parameters (for example, independent uniform random sampling from a given interval).

- Describe the configuration of the algorithm run.
- Plan the whole experimental part.
- List the expected tables and figures:
  - make a short list and a long list; for each item:
    - describe the axes,
    - make a pencil draft.
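
For a synthetic data set, the generation model and the general statistics can come from the same script. A minimal Python sketch, assuming the example model above (independent uniform sampling from a given interval); the sizes, the interval [0, 1], the seed, and the function names are placeholders.

```python
import random
import statistics


def generate_uniform(n_objects, n_features, low, high, seed=0):
    """Sample every feature independently and uniformly from [low, high]."""
    rng = random.Random(seed)
    return [[rng.uniform(low, high) for _ in range(n_features)] for _ in range(n_objects)]


def describe(X):
    """Number of objects, number of features, and per-feature mean, std, min, max."""
    columns = list(zip(*X))
    features = [
        {
            "mean": statistics.fmean(col),
            "std": statistics.stdev(col),  # sample standard deviation
            "min": min(col),
            "max": max(col),
        }
        for col in columns
    ]
    return {"n_objects": len(X), "n_features": len(columns), "features": features}


X = generate_uniform(n_objects=200, n_features=3, low=0.0, high=1.0)
summary = describe(X)
print(f"objects: {summary['n_objects']}, features: {summary['n_features']}")
for j, s in enumerate(summary["features"]):
    print(f"x{j}: mean={s['mean']:.3f}, std={s['std']:.3f}, range=[{s['min']:.3f}, {s['max']:.3f}]")
```

For a uniform sample on [0, 1], the means should be close to 0.5 and the standard deviations close to 1/√12 ≈ 0.289, which is a quick sanity check of the generator.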

**Resources**

- The goals of computational experiments А. Грабовой, В. Алексеев, А. Рогозина, И. Игашов, Н. Уваров
- Example of the measurement description: Bishop C.M. Pattern recognition and machine learning, 2006. Pp. 677-683.

## Todo B: Run basic code

Select the basic algorithm and run it using a simple data set.

- Run your basic algorithm:
  - select the simplest algorithm (together with your adviser) that (partially) solves the problem you stated.

- Collect a synthetic data set or download a small, simple real-world data set.
- Upload your data to the repository (if the data exceed 5 MB or the data set consists of numerous files, discuss it with your adviser and team).
- Run the basic algorithm on the synthetic data set and estimate the error.
- Describe the basic algorithm, analyze its features, and list competing models.
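
A minimal sketch of this step in Python, assuming a two-class problem. The synthetic data (two Gaussian clouds), the nearest-centroid classifier used as the basic algorithm, and the 150/50 hold-out split are illustrative choices, not the course's prescribed baseline.

```python
import random


def make_two_classes(n_per_class, shift, seed=0):
    """Two Gaussian classes in 2D; class 1 is shifted by `shift` in both coordinates."""
    rng = random.Random(seed)
    data = []
    for label, center in ((0, 0.0), (1, shift)):
        for _ in range(n_per_class):
            data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    rng.shuffle(data)
    return data


def fit_centroids(train):
    """Nearest-centroid classifier: store the mean feature vector of each class."""
    sums, counts = {}, {}
    for (x1, x2), y in train:
        s = sums.setdefault(y, [0.0, 0.0])
        s[0] += x1
        s[1] += x2
        counts[y] = counts.get(y, 0) + 1
    return {y: (s[0] / counts[y], s[1] / counts[y]) for y, s in sums.items()}


def predict(centroids, x):
    """Return the label of the closest centroid (squared Euclidean distance)."""
    return min(centroids,
               key=lambda y: (x[0] - centroids[y][0]) ** 2 + (x[1] - centroids[y][1]) ** 2)


def error_rate(centroids, data):
    """Share of misclassified objects."""
    return sum(predict(centroids, x) != y for x, y in data) / len(data)


data = make_two_classes(n_per_class=100, shift=3.0)
train, test = data[:150], data[150:]  # simple hold-out split
model = fit_centroids(train)
print(f"train error: {error_rate(model, train):.3f}, test error: {error_rate(model, test):.3f}")
```

Keeping make_two_classes and the split fixed while replacing fit_centroids with a competing model gives a like-for-like comparison of the errors.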

**Resources**

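- Бахтеев О.Ю. Системы и средства глубокого обучения [Deep learning systems and tools], article
- Мотренко А.П. Повышение качества классификации [Improving classification quality], article
- Исаченко Р.В. Снижение размерности в задаче декодирования [Dimensionality reduction in the decoding problem], article
- Построение выборки в задачах прогнозирования [Constructing the sample set in forecasting problems], slides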
- The IDEF standard for project planning

## Todo R: Preliminary report

- Make sure that the obtained results do not contradict the goals of the computational experiment.
- Illustrate the obtained results with a preliminary plot (see the format).
- Write a mini-report on the results with:
  - a short description of the figure: what the reader can see and what follows from it,
  - the results in numbers, with comments on them.
- Put the report in the Computational experiment section.

## Todo P: Problem statement

Following the paradigm Idea\(\to\)Formula\(\to\)Code, state the problem of finding an optimal solution.

- Discuss the problem statement with your adviser.
- See the examples below and in past projects.
- Discuss the terminology and notation (see [pdf] and [tex] with the notation and a useful style file).
- At the beginning of the Problem statement, write a general description of the problem.
- Describe the elements of your problem statement:
  - the sample set,
  - its origin, or its algebraic structure,
  - statistical hypotheses of data generation,
  - [conditions of measurements],
  - [restrictions on the sample set and its values],
  - your model in the class of models,
  - restrictions on the class of models,
  - the error function (and its derivation), a loss function, or a quality criterion,
  - the cross-validation procedure,
  - restrictions on the solutions,
  - external (industrial) quality criteria,
  - the optimization statement as \(\arg\min\).

- Define the main terms: what you call the model, the solution, and the algorithm.
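
As an illustration only, the optimization statement item in the list above may take the following generic form. All symbols are assumptions to be replaced with your project's notation: the sample set \(\mathfrak{D} = \{(\mathbf{x}_i, y_i)\}_{i \in \mathcal{I}}\), its learning part indexed by \(\mathcal{L} \subset \mathcal{I}\), the model \(f(\mathbf{x}, \mathbf{w})\) with parameters \(\mathbf{w}\) from the set \(\mathbb{W}\), and the sum of squared errors as the criterion \(S\).

\[
\hat{\mathbf{w}} = \arg\min_{\mathbf{w} \in \mathbb{W}} S(\mathbf{w} \mid \mathfrak{D}_{\mathcal{L}}),
\qquad
S(\mathbf{w} \mid \mathfrak{D}_{\mathcal{L}}) = \sum_{i \in \mathcal{L}} \bigl(y_i - f(\mathbf{x}_i, \mathbf{w})\bigr)^2.
\]

Restrictions on the class of models or on the solutions enter as constraints on \(\mathbb{W}\); a cross-validation procedure evaluates the criterion on the part of the sample outside \(\mathcal{L}\).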

Note that:

- The **model** is a parametric family of functions that maps the design space to the target space.
- The **criterion** (error function) is a function to optimize in order to obtain an optimal solution (model parameters, a function).
- The **algorithm** transforms the solution space, usually iteratively.
- The **method** combines a model, a criterion, and an algorithm to produce a solution.

Check it:

- the regression *model*,
- the sum of squared *errors*,
- the Newton-Raphson *algorithm*,
- the *method* of least squares.
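
The four elements of this check can be seen together in a small Python sketch, assuming a one-feature linear regression model \(f(x, \mathbf{w}) = w_0 + w_1 x\) and the criterion \(S(\mathbf{w}) = \sum_i (y_i - w_0 - w_1 x_i)^2\). A Newton-Raphson step is \(\mathbf{w} \leftarrow \mathbf{w} - \mathbf{H}^{-1} \nabla S\); since \(S\) is quadratic, its Hessian \(\mathbf{H}\) is constant and a single step from any starting point reaches the least-squares solution. The function names are hypothetical.

```python
def newton_step(xs, ys, w):
    """One Newton-Raphson step for S(w) = sum_i (y_i - w0 - w1 * x_i)^2."""
    w0, w1 = w
    n = len(xs)
    residuals = [y - w0 - w1 * x for x, y in zip(xs, ys)]
    # Gradient of S: -2 * X^T r, where X has the columns (1, x).
    g0 = -2.0 * sum(residuals)
    g1 = -2.0 * sum(r * x for r, x in zip(residuals, xs))
    # Hessian of S: 2 * X^T X; it does not depend on w because S is quadratic.
    h00 = 2.0 * n
    h01 = 2.0 * sum(xs)
    h11 = 2.0 * sum(x * x for x in xs)
    det = h00 * h11 - h01 * h01  # zero if all x are equal (model not identifiable)
    # w_new = w - H^{-1} g, with the 2x2 inverse written out explicitly.
    step0 = (h11 * g0 - h01 * g1) / det
    step1 = (-h01 * g0 + h00 * g1) / det
    return w0 - step0, w1 - step1


def sse(xs, ys, w):
    """The criterion: sum of squared errors of the model w0 + w1 * x."""
    return sum((y - w[0] - w[1] * x) ** 2 for x, y in zip(xs, ys))


xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]  # points on y = 1 + 2x
w = newton_step(xs, ys, (0.0, 0.0))
print(w, sse(xs, ys, w))  # (1.0, 2.0) 0.0
```

Replacing newton_step with, say, gradient descent would change only the algorithm; the model and the criterion, and hence the least-squares solution, stay the same.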

**Resources**

- Slides with a plan of Problem statement
- Examples of problem statements
- Katrutsa A.M., Strijov V.V. Stresstest procedure for feature selection algorithms // Chemometrics and Intelligent Laboratory Systems, 2015, 142 : 172-183 article
- Katrutsa A.M., Strijov V.V. Comprehensive study of feature selection methods to solve multicollinearity problem according to evaluation criteria // Expert Systems with Applications, 2017 article
- Motrenko A., Strijov V., Weber G.-W. Bayesian sample size estimation for logistic regression // Journal of Computational and Applied Mathematics, 2014, 255 : 743-752. article
- Kulunchakov A.S., Strijov V.V. Generation of simple structured Information Retrieval functions by genetic algorithm without stagnation // Expert Systems with Applications, 2017, 85 : 221-230. article
- Ivkin N.P. Feature generation for classification and forecasting problems, MIPT, 2013 draft

- Notations for wiki Ru
- Basic notations, pdf
- Recommended notations, 2019: pdf and .tex with .sty
- Simple and useful notations
- Notations for Bayesian model selection, pdf

## Todo A: Abstract

- Write a **draft** of your abstract.

- The abstract must not exceed 600 characters. It may contain:
  - the broad field of the investigated problem,
  - the narrow problem to focus on,
  - features and conditions of the problem,
  - [the novelty],
  - an application to illustrate it.

- For joint projects, it is important that each team member writes their own text.

**Resources**

- How to Read a Paper, 2016, S. Keshav
- Examples of review-and-planning drafts: LinkReview one, two.

## Todo B: Beginner's-talk

A short 45-second introductory talk. Plan of the talk:

- The project goal. What is the motivation, the goal to reach?
- The main idea. What is the message?
- The expected result. What do you deliver; what are your impact and novelty?

There is no time to show a slide or draw a plot on the blackboard. It is recommended to rehearse the talk.

## Todo I: Introduction

The introductory part includes the research goals and motivations. It grounds the research in fundamental and state-of-the-art references. It delivers the main message of the work to the reader. This message shows the novelty of this work compared to recent results.

- Create a file *ProjectN.bib* for the group project, or *Surname2018Title.bib* for your personal project.
- Move the useful bibliographic records from the *LinkReview* file into it, in BibTeX format.
- Check the correctness of the BibTeX database (style of author names, journal volumes, page numbers).
- Use bibliographic databases to facilitate your work.
- Use the default style *\bibliographystyle{plain}* before the bibliography command *\bibliography{ProjectN}*.
- Important! Wikipedia is not a source of information, but it cites many useful sources.
- Important! ArXiv is not a peer-reviewed source of information. Look for copies of the papers published in peer-reviewed scientific journals. If a paper has not appeared in a peer-reviewed journal one or two years after its ArXiv version, be careful using it: the paper may be unverified, possibly because journals rejected it.
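
Part of the correctness check can be automated. A minimal Python sketch, assuming a simplified .bib layout (one field per line, each entry closed by a } at the start of a line) and an assumed checklist of fields for @article entries; it is not a full BibTeX parser. The sample records are the two Katrutsa and Strijov papers from the Problem statement resources, the second one entered without volume and pages, as it is listed there.

```python
import re

# Fields we expect in every @article entry (an assumed checklist, not BibTeX's own rules).
REQUIRED_ARTICLE_FIELDS = {"author", "title", "journal", "year", "volume", "pages"}

# Simplification: an entry ends with a closing brace at the start of a line.
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*?)\n\}", re.DOTALL)
# Simplification: each field starts on its own line as "name = ...".
FIELD_RE = re.compile(r"^\s*(\w+)\s*=", re.MULTILINE)


def check_bib(text):
    """Return (key, sorted missing fields) for each @article entry that lacks fields."""
    problems = []
    for kind, key, body in ENTRY_RE.findall(text):
        if kind.lower() != "article":
            continue
        fields = {name.lower() for name in FIELD_RE.findall(body)}
        missing = REQUIRED_ARTICLE_FIELDS - fields
        if missing:
            problems.append((key, sorted(missing)))
    return problems


sample = """@article{Katrutsa2015Stresstest,
  author  = {Katrutsa, A. M. and Strijov, V. V.},
  title   = {Stresstest procedure for feature selection algorithms},
  journal = {Chemometrics and Intelligent Laboratory Systems},
  year    = {2015},
  volume  = {142},
  pages   = {172--183}
}

@article{Katrutsa2017Comprehensive,
  author  = {Katrutsa, A. M. and Strijov, V. V.},
  title   = {Comprehensive study of feature selection methods to solve
             multicollinearity problem according to evaluation criteria},
  journal = {Expert Systems with Applications},
  year    = {2017}
}
"""

print(check_bib(sample))  # [('Katrutsa2017Comprehensive', ['pages', 'volume'])]
```

A reference manager such as JabRef performs similar checks; the script is only a quick pre-commit sanity test.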

- Write the Introduction. The expected size is one page. The expected plan is:
  - the research goal (and its motivation),
  - the object of research (introduce the main terms),
  - the problem (what the challenge is),
  - methodology: literature review and the state of the art,
  - the project tasks,
  - the proposed solution, its novelty and advantages,
  - the pros and cons of recent works,
  - the goal of the experiment, its setup, data sets, and workflow.

**The goal of this week** is to comprehend the research goal as a whole and write about it.

**Resources**

- Bibliographic databases
- The Collection of Computer Science Bibliographies
- List of academic databases and search engines in Wikipedia
- Refer to BibTeX in Wikipedia
- An introduction updated after peer review.

## Todo L: Literature

We use the LinkReview draft format to share the fleeting ideas and impressions we have while reading the literature.

- Collect the list of references, including:
  - state-of-the-art reviews and tutorials,
  - fundamental solutions to the problem,
  - the basic algorithm to solve your problem,
  - alternative algorithms,
  - [changes in the research directions],
  - data sets and experiments,
  - the papers that use these data sets,
  - applications of the results,
  - names of the researchers who work on this problem,
  - their students and teams,
  - those who cite their works.

- Balance the list between new and well-known works.
- Keep the list of search keywords up to date.
- Continuously fill in your LinkReview.
- Plan the Introduction (see the next todo list); namely, collect:
  - keywords as the basic terms (those that bring good search results are useful),
  - what the paper is devoted to,
  - the investigated problem,
  - the central idea,
  - the literature review,
  - the authors' contribution.

## Todo 1: Select your project

To select your project:

- Look through the list of projects.
- Find information about the experts and consultants.
- Select your projects in the questionnaire **before Wednesday, 22:00**.
- Wait for confirmation.
- Put the confirmed topics into the group table on MachineLearning.ru.

## Todo 0: Prepare necessary tools

**Editing**. Install LaTeX: MiKTeX for Windows, TeX Live for Linux, and MacTeX for Mac OS. Sign up for Overleaf v2 (ShareLaTeX).

- Install the editor TeXnicCenter or its alternative WinEdt for Windows, TeXworks for Linux, and TeXmaker for Mac OS.
- Read LaTeX on MachineLearning (Ru).
- Useful: Wikibooks LaTeX; К.В. Воронцов. LaTeX2e в примерах [LaTeX2e by examples].
- Read *Львовский С. М.* Набор и верстка в системе LaTeX [Typesetting and layout in the LaTeX system].

- Download the paper template (ZIP) and compile it.
- Read BibTeX.
- Install the bibliographic collection software JabRef (can be postponed).

**Communications**. Sign up for GitHub.

- Important: an address and login of the form Name.Surname or Name-Surname (depending on the system's conventions) are welcome.
- Introductory slides on version control systems.
- Introduction to GitHub.
- The first steps in GitHub.

- Download a client (GitHub Desktop) or use the command line to synchronize your project.
- Sign up for MachineLearning.ru. Send your login to your coordinator or to mlalgorithms [at] gmail [dot] com.
- To state a problem (write an essay) in a notebook, see the example in MathJax.
- Create your page (see the example).

- Install Hangouts and Skype (read the instructions).

**Programming**. Install Python (Anaconda) and PyCharm (alternative: Visual Studio), or use the online notebook Google Colab.

- Development for ML: PyTorch.
- Code style: PEP 8.

**Add.** As an alternative, install and try Matlab (MIPT provides a free version; alternative: Octave), R-project, and Wolfram Mathematica.

**Add.** Read with pleasure Кутателадзе С. С. Советы эпизодическому переводчику [Advice to an occasional translator] and Сосинский А. Б. Как написать математическую статью по-английски [How to write a mathematical paper in English].
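
To confirm that the installed tools are reachable from the command line, a short Python check can help. This is a sketch under assumptions: the command names below are the usual executables of a TeX distribution, Git, and Python, but your installation may name them differently (for example, python3 on Linux and Mac OS).

```python
import shutil

# Assumed command names; adjust them to your system (e.g. "python3" instead of "python").
TOOLS = ["pdflatex", "bibtex", "git", "python"]


def check_tools(tools=TOOLS):
    """Map each command name to its full path, or to None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in check_tools().items():
        print(f"{tool:10s} {path if path else 'MISSING'}")
```

A MISSING line usually means the tool is not installed or its folder is not on the PATH environment variable.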

**Resources**

- Announcements: Telegram m1p_news
- Questions: email mlalgorithms [at] gmail [dot] com
- Slides.
- Short course description.

**References to catch up**

- A Brief Introduction to Machine Learning for Engineers by Osvaldo Simeone, 2017-2018
- Understanding Machine Learning: From Theory to Algorithms by Shai Shalev-Shwartz, Shai Ben-David, 2014
- Mathematics for Machine learning by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong
- Mathematics for Physicists: Introductory Concepts and Methods by Alexander Altland & Jan von Delft
- Python notes for professionals by GoalKicker.com Free Programming Books.
- Лагутин М.Б. Наглядная математическая статистика [Visual mathematical statistics], Moscow: Binom, 2009. See also the excerpt.
- Bishop C.M. Pattern recognition and machine learning, Berlin: Springer, 2008.
- MacKay D. Information Theory, Pattern Recognition and Neural Networks, Inference.org.uk, 2009.

## Todo -1: Subscribe to the course

**Todo before 06:00 Wednesday, February 12th:**

- pick a problem from the page Try-on programming problems (take the oldest problems; they are simpler),
- plot one figure to illustrate the problem (plot the data or the analysis),
- write explanatory comments to the figure (what the reader sees in the figure and what conclusions follow),
- an example of the figure formatting is here,
- upload your notebook to your github repository,
- send the link to this notebook to mlalgorithms [at] gmail [dot] com, with the subject "Application m1p"