Todo list

The todo lists here correspond to the Course schedule. Each list must be completed before the day of review. For the 2020 Spring semester, the deadline is Wednesday 06:00 am.

Todo T: Theoretical part

The theoretical part describes the proposed solution and states its properties. The goal is to join the theoretical elements into a method. This method includes hypotheses, a model, a criterion, and an optimization algorithm.

1. Write the solution of your problem
• first as a simple outline,
• then expand the necessary details,
• use the LaTeX algorithm template.
2. Compare the notation in the problem statement, the solution, and the code. Make sure the code does not contradict the text.

Todo C: Code of the computational experiment

Organize your code so that the computational experiment can be rerun at any time and its results are stored.

1. Designate a single main file that runs the experiment.
2. Decompose the project code: write functions and modules.
3. Gather the experiment parameters in a dedicated section.
• A text description of the experiment flow helps.
4. Set up a procedure for saving version points so that you can return to a previous experiment.
• A commit schedule helps.
5. Write named plots to a designated folder.
• Write your results to a .tex file and compile it.
• If your experiment takes a long time to run, just cut the data set.
• Do not use big or sophisticated data. Put your effort into illustrating your main message.
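The points above can be sketched as a single main file. This is a minimal illustration under stated assumptions, not a required layout: the toy experiment (estimating a mean), the names `Params`, `run_experiment`, `save_results`, and the folder name `results` are all hypothetical.

```python
"""main.py: the single entry point of a toy computational experiment."""
import json
import random
from dataclasses import asdict, dataclass
from pathlib import Path


# --- Experiment parameters: the only place where settings are changed ---
@dataclass(frozen=True)
class Params:
    n_objects: int = 200      # cut the data set here if a run takes too long
    noise_std: float = 0.1    # hypothetical noise level
    seed: int = 0             # fixed seed, so a rerun gives the same numbers
    out_dir: str = "results"  # designated folder for named outputs


def run_experiment(params: Params) -> dict:
    """Toy experiment: estimate the mean of noisy observations of 1.0."""
    rng = random.Random(params.seed)
    sample = [1.0 + rng.gauss(0.0, params.noise_std)
              for _ in range(params.n_objects)]
    mean = sum(sample) / len(sample)
    return {"mean": mean, "absolute error": abs(mean - 1.0)}


def save_results(params: Params, results: dict) -> Path:
    """Store parameters and results as JSON and as a LaTeX table."""
    out = Path(params.out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "results.json").write_text(
        json.dumps({"params": asdict(params), "results": results}, indent=2)
    )
    rows = "\n".join(f"{name} & {value:.4f} \\\\"
                     for name, value in results.items())
    table = ("\\begin{tabular}{lr}\nQuantity & Value \\\\\n\\hline\n"
             + rows + "\n\\end{tabular}\n")
    tex_path = out / "results_table.tex"
    tex_path.write_text(table)
    return tex_path


if __name__ == "__main__":
    params = Params()
    print(save_results(params, run_experiment(params)))
```

Storing the parameters next to the results makes it possible to tell later which settings produced which numbers; the generated table can be pulled into the paper with `\input`.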

Todo V: Visualize project

Decide on the list of plots that will be included in your paper and presentation.

1. Make a plot of the source data.
• Goal: put the notation on the plot.
2. List plots to illustrate the error analysis.
3. Make a plot to show the main message.

Todo Update: Put the project in order

1. Check that the folder structure is correct (for example, make sure that your paper is not in the code folder):
• docs,
• code,
• data,
• [figs].
2. Put a direct link to the paper in the table, so that everyone can access it.
3. Rename article.tex to Surname2020Title.tex
5. Update your personal page on Machinelearning.ru.
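The folder check from step 1 can be automated. The sketch below is one possible way to do it, assuming the project root contains the folders named in the list; the function name `check_structure` and the rule "no .tex files in the code folder" are illustrative assumptions.

```python
from pathlib import Path

REQUIRED = ("docs", "code", "data")  # folders named in the list; figs is optional


def check_structure(root: str) -> list[str]:
    """Return a list of human-readable problems found under the project root."""
    root_path = Path(root)
    problems = [f"missing folder: {name}" for name in REQUIRED
                if not (root_path / name).is_dir()]
    # The paper source should not live in the code folder.
    code_dir = root_path / "code"
    if code_dir.is_dir():
        problems += [f"paper file in code folder: {p.name}"
                     for p in sorted(code_dir.glob("*.tex"))]
    return problems


if __name__ == "__main__":
    for problem in check_structure("."):
        print(problem)
```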

Todo X: Experiment planning

• Put this goal in the section Computational experiment
2. Describe your basic data set, either a synthetic one or a simple real one:
• state in the text the title, source, and measurement setup (this is the technical description; the theoretical one belongs in the problem statement section),
• write down the number of objects and features, and describe the general statistics,
• for a synthetic data set, describe the generation model and its parameters (for example, independent uniform random sampling from a given interval).
3. Describe the configuration of the algorithm run.
4. Plan the whole experimental part.
5. List the expected tables and figures:
• make a short list and a long list; for each item,
• describe the axes,
• make a draft with a pencil.
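The synthetic data set description from step 2 can be backed by a few lines of code. The following is a minimal sketch, assuming independent uniform sampling of every feature from one interval; the function names and the parameter values are hypothetical.

```python
import random
import statistics


def generate_dataset(n_objects, n_features, low, high, seed):
    """Independent uniform sampling of every feature from [low, high]."""
    rng = random.Random(seed)
    return [[rng.uniform(low, high) for _ in range(n_features)]
            for _ in range(n_objects)]


def describe(dataset):
    """Number of objects and features plus per-feature general statistics."""
    columns = list(zip(*dataset))
    return {
        "n_objects": len(dataset),
        "n_features": len(columns),
        "features": [
            {"min": min(c), "max": max(c),
             "mean": statistics.fmean(c), "std": statistics.stdev(c)}
            for c in columns
        ],
    }


if __name__ == "__main__":
    X = generate_dataset(n_objects=100, n_features=3, low=-1.0, high=1.0, seed=0)
    print(describe(X))
```

The arguments of `generate_dataset` are exactly the generation parameters that the text of the experiment section should report.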

Todo B: Run basic code

Select the basic algorithm and run it using a simple data set.

• Select the simplest algorithm (with your adviser) that (partially) solves the problem you set.
2. Collect a synthetic data set or download a simple real-world data set of small size.
3. Upload your data to the repository (if the data size exceeds 5 MB or the data set consists of numerous files, please discuss it with your adviser and team).
4. Run the basic algorithm on the synthetic data set and estimate the error.
5. Describe the basic algorithm, analyze its features, and list competitive models.
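As an illustration of step 4, here is a minimal sketch of a basic run. It assumes a one-feature linear regression as the basic algorithm and a synthetic sample with Gaussian noise; the true slope, intercept, noise level, and the 150/50 split are hypothetical choices, and your project will have its own basic algorithm.

```python
import random


def make_data(n, slope, intercept, noise_std, seed):
    """Synthetic sample: y = slope * x + intercept + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [slope * x + intercept + rng.gauss(0.0, noise_std) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Closed-form least squares for a one-feature linear model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def mse(xs, ys, slope, intercept):
    """Mean squared error of the fitted line on a sample."""
    return sum((y - (slope * x + intercept)) ** 2
               for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    xs, ys = make_data(n=200, slope=2.0, intercept=-1.0, noise_std=0.1, seed=0)
    split = 150  # the first 150 objects train the model, the rest test it
    w, b = fit_line(xs[:split], ys[:split])
    print("train MSE:", mse(xs[:split], ys[:split], w, b))
    print("test MSE: ", mse(xs[split:], ys[split:], w, b))
```

Reporting the error on held-out objects, not only on the training part, gives a first estimate to compare competitive models against.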

Resources

• Bakhteev O.Yu. Systems and means of deep learning, article
• Motrenko A.P. Improving classification quality, article
• Isachenko R.V. Dimensionality reduction in the decoding problem, article
• Sample construction in forecasting problems, slides
• The IDEF standard for project planning

Todo R: Preliminary report

1. Make sure that the obtained results do not contradict the goals of the computational experiment.
2. Illustrate the obtained results with a preliminary plot (see the format).
3. Write a mini-report on the results:
1. give a short description of the figure: what the reader can see and what conclusions follow,
2. state the results in numbers and comment on them,
3. put the report in the Computational experiment section.

Todo P: Problem statement

Within the paradigm Idea $$\to$$ Formula $$\to$$ Code, state the problem of finding an optimal solution.

2. See the examples below and in the past projects.
3. Discuss the terminology and notation; see [pdf] and [tex] with notations and a useful style file.
4. At the beginning of the Problem statement, write a general problem description.
5. Describe the elements of your problem statement:
1. the sample set,
2. its origin, or its algebraic structure,
3. statistical hypotheses of data generation,
4. [conditions of measurements],
5. [restrictions of the sample set and its values],
6. your model in the class of models,
7. restrictions on the class of models,
8. the error function (and its derivation) or a loss function, or a quality criterion,
9. cross-validation procedure,
10. restrictions to the solutions,
11. external (industrial) quality criteria,
12. the optimization statement as $$\arg\min$$.
6. Define the main terms: what is called the model, the solution, the algorithm.
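As an illustration of the last element in step 5, an optimization statement for a linear regression model could be written as follows. This is only a sketch: the symbols $$\mathfrak{D}$$, $$S$$, $$f$$, $$m$$ and $$n$$ are assumptions made for this example and should be replaced by the notation of your own project.

```latex
\[
  \hat{\mathbf{w}}
  = \arg\min_{\mathbf{w} \in \mathbb{R}^{n}} S(\mathbf{w} \mid \mathfrak{D}),
  \qquad
  S(\mathbf{w} \mid \mathfrak{D})
  = \sum_{i=1}^{m} \bigl( y_i - f(\mathbf{w}, \mathbf{x}_i) \bigr)^2,
  \qquad
  f(\mathbf{w}, \mathbf{x}) = \mathbf{w}^{\mathsf{T}} \mathbf{x},
\]
```

Here $$\mathfrak{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{m}$$ is the sample set of $$m$$ objects with $$n$$ features each, $$f$$ is the model, $$S$$ is the error function, and $$\hat{\mathbf{w}}$$ is the solution.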

Note that:

• The model is a parametric family of functions that map the design space to the target space.
• The criterion (error function) is a function to optimize in order to obtain an optimal solution (model parameters, a function).
• The algorithm transforms the solution space, usually iteratively.
• The method combines a model, a criterion, and an algorithm to produce a solution.

Check it:

• the regression model,
• the sum of squared errors,
• the Newton-Raphson algorithm,
• the method of least squares.
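The four items of this check can be sketched in code, one function per notion. The sketch assumes a one-feature linear model with an intercept and the starting point w = (0, 0); the function names are hypothetical.

```python
def model(w, x):
    """The regression model: a parametric family f(w, x) = w[0] + w[1] * x."""
    return w[0] + w[1] * x


def criterion(w, xs, ys):
    """The criterion: the sum of squared errors S(w)."""
    return sum((y - model(w, x)) ** 2 for x, y in zip(xs, ys))


def newton_raphson_step(w, xs, ys):
    """The algorithm: one Newton-Raphson update w - H^{-1} g for S(w)."""
    r = [model(w, x) - y for x, y in zip(xs, ys)]  # residuals
    g0 = 2 * sum(r)                                # dS/dw0
    g1 = 2 * sum(ri * x for ri, x in zip(r, xs))   # dS/dw1
    h00 = 2 * len(xs)                              # Hessian entries
    h01 = 2 * sum(xs)
    h11 = 2 * sum(x * x for x in xs)
    det = h00 * h11 - h01 * h01
    # Solve the 2x2 system H d = g by Cramer's rule, then step to w - d.
    d0 = (h11 * g0 - h01 * g1) / det
    d1 = (h00 * g1 - h01 * g0) / det
    return [w[0] - d0, w[1] - d1]


def least_squares(xs, ys, n_iter=1):
    """The method: model + criterion + algorithm produce a solution."""
    w = [0.0, 0.0]
    for _ in range(n_iter):
        w = newton_raphson_step(w, xs, ys)
    return w


if __name__ == "__main__":
    xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
    w = least_squares(xs, ys)
    print(w, criterion(w, xs, ys))
```

Because the sum of squared errors of a linear model is quadratic in the parameters, a single Newton-Raphson step already reaches the minimum; for a nonlinear model the same step would be iterated.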

Resources

• Slides with a plan of Problem statement
• Examples of problem statements
1. Katrutsa A.M., Strijov V.V. Stresstest procedure for feature selection algorithms // Chemometrics and Intelligent Laboratory Systems, 2015, 142 : 172-183 article
2. Katrutsa A.M., Strijov V.V. Comprehensive study of feature selection methods to solve multicollinearity problem according to evaluation criteria // Expert Systems with Applications, 2017 article
3. Motrenko A., Strijov V., Weber G.-W. Bayesian sample size estimation for logistic regression // Journal of Computational and Applied Mathematics, 2014, 255 : 743-752. article
4. Kulunchakov A.S., Strijov V.V. Generation of simple structured Information Retrieval functions by genetic algorithm without stagnation // Expert Systems with Applications, 2017, 85 : 221-230. article
5. Ivkin N.P. Feature generation for classification and forecasting problems, MIPT, 2013 draft
• Notations for wiki Ru
• Basic notations, pdf
• Recommended notations, 2019: pdf and .tex with .sty
• Simple and useful notations
• Notations for Bayesian model selection, pdf

Todo A: Abstract

1. Write a draft of your abstract.
• The abstract shall not exceed 600 characters. It may contain:
• the broad field of the investigated problem,
• the narrow problem to focus on,
• features and conditions of the problem,
• [the novelty],
• an application for illustration.
• For joint projects it is important that each team member writes their own text.

Todo B: Beginner's talk

Short 45-second introductory talk. Plan of the talk:

1. The project goal. What is the motivation, the goal to reach?
2. The main idea. What is the message?

There is no time to show a slide or draw a plot on the blackboard. It is recommended to rehearse the talk.

Todo I: Introduction

The introductory part includes the research goals and motivation. It grounds the research in fundamental and state-of-the-art references. It delivers the main message of the work to the reader. This message shows the novelty of this work in comparison to recent results.

1. Create a file ProjectN.bib for the group project, or Surname2018Title.bib for your personal project.
2. Move useful bibliographic records from the LinkReview file into it, in the BibTeX format.
• Check the correctness of the BibTeX database (style of author names, volumes of journals, page numbers).
• Use bibliographic databases to facilitate your work.
• Use the default style \bibliographystyle{plain} before the bibliography section \bibliography{ProjectN}.
• Important! Wikipedia is not a source of information, but it contains many useful sources.
• Important! ArXiv is not a peer-reviewed source of information. Look for copies of papers that are published in peer-reviewed scientific journals. If a paper has not appeared in a peer-reviewed journal one or two years after its ArXiv version, be careful about using it: the paper might be unverified, since it may have been rejected by journals.
3. Write Introduction. The expected size is one page. The expected plan is:
1. the research goal (and its motivations),
2. the object of research (introduce the main terms),
3. the problem (what is the challenge),
4. methodology: literature review and state-of-the-art,
6. the proposed solution, its novelty and advantages,
7. the pros and cons of recent works,
8. the goal of the experiment, setup, data sets, workflow.

The goal of this week is to comprehend the project goal as a whole and write about it.
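The BibTeX check from step 2 can be partly automated. The sketch below is a rough illustration, not a full BibTeX parser: it assumes one field per line, looks only at @article entries, and uses a hypothetical list of required fields. The two sample entries are taken from the reference list on this page, where the second one is given without volume and pages; the citation keys are made up for the example.

```python
import re

REQUIRED_ARTICLE_FIELDS = ("author", "title", "journal", "year", "volume", "pages")


def missing_fields(bibtex: str) -> dict[str, list[str]]:
    """Map each @article key to the required fields that the entry lacks."""
    report = {}
    # Split the database into entries at every "@type{key," header.
    headers = list(re.finditer(r"@(\w+)\s*\{\s*([^,\s]+)\s*,", bibtex))
    for i, header in enumerate(headers):
        end = headers[i + 1].start() if i + 1 < len(headers) else len(bibtex)
        body = bibtex[header.end():end]
        if header.group(1).lower() != "article":
            continue
        # A field is a name at the start of a line followed by "=".
        present = {name.lower()
                   for name in re.findall(r"^\s*(\w+)\s*=", body, re.MULTILINE)}
        missing = [f for f in REQUIRED_ARTICLE_FIELDS if f not in present]
        if missing:
            report[header.group(2)] = missing
    return report


SAMPLE = r"""
@article{Katrutsa2015Stresstest,
  author  = {Katrutsa, A. M. and Strijov, V. V.},
  title   = {Stresstest procedure for feature selection algorithms},
  journal = {Chemometrics and Intelligent Laboratory Systems},
  year    = {2015},
  volume  = {142},
  pages   = {172--183},
}
@article{Katrutsa2017Comprehensive,
  author  = {Katrutsa, A. M. and Strijov, V. V.},
  title   = {Comprehensive study of feature selection methods to solve
             multicollinearity problem according to evaluation criteria},
  journal = {Expert Systems with Applications},
  year    = {2017},
}
"""

if __name__ == "__main__":
    print(missing_fields(SAMPLE))
```

For the sample above the report contains only the second entry, with `volume` and `pages` listed as missing. A reference manager such as JabRef performs more thorough checks; this script only catches absent fields.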

Todo L: Literature

We use the LinkReview draft format to share the fleeting ideas and impressions we have while reading the literature.

1. Collect the list of references including:
1. state-of-the-art reviews, tutorials,
2. fundamental solutions to the problem,
3. the basic algorithm to solve your problem,
4. alternative algorithms,
5. [changes in the research directions],
6. data sets and experiments,
7. the papers that use these data sets,
8. applications of the results,
9. names of researchers who work on this problem,
10. their students and teams,
11. those who cite their work.
2. Balance the list between new and well-known works.
3. Keep the list of search keywords up to date.
5. Plan Introduction (see the next todo list), namely collect:
• keywords as the basic terms; those that bring good search results are useful,
• what the paper is devoted to,
• the investigated problem,
• the central idea,
• literature review,
• the authors' contribution.

1. Look through the list of projects.
2. Find information about the experts and consultants.
3. Select your projects in the questionnaire before Wednesday 22:00.
4. Wait for confirmation.
5. Put the confirmed topics in the Group table on Machine learning.

Todo 0: Prepare necessary tools

1. Editing. Install LaTeX: MiKTeX for Windows, TeX Live for Linux, and for Mac OS. Sign up for Overleaf V2 (ShareLaTeX).
2. Install an editor: TeXnicCenter or its alternative WinEdt for Windows, TeXworks for Linux, and Texmaker for Mac OS.
5. Install bibliographic collection software JabRef (can be postponed).
• Important: an address and login of the form Name.Surname or Name-Surname (depending on system conventions) are welcome.
• Introductory slides on version control systems.
• Introduction to GitHub.
• The first steps in GitHub.
8. Sign up for MachineLearning.ru. Send your login to your coordinator or to mlalgorithms [at] gmail [dot] com.
9. To state a problem (write an essay) in a notebook, see the example in MathJax.
10. Install Hangouts and Skype (read the instructions).
11. Programming. Install Python Anaconda, PyCharm (alternative: Visual Studio), and the online notebook Google Colab.
• Development for ML: PyTorch
• Code style: PEP 8
12. Additionally. As an alternative, install and try Matlab (MIPT provides a free version; Octave is an alternative), R-project, or Wolfram Mathematica.
13. Additionally. Read with pleasure Kutateladze S. S., Advice to the Occasional Translator, and Sosinsky A. B., How to Write a Mathematical Paper in English.

Todo -1: Subscribe to the course

To do before 06:00 Wednesday, February 12th:

1. pick a problem from the page Try-on programming problems (take the oldest problems; they are simpler),
2. plot one figure to illustrate the problem (plot data or analysis),
3. write explanatory comments to the figure (what the reader sees in the figure, what conclusions follow),
4. an example of the figure formatting is here