Todo list
The to-do lists here correspond to the Course schedule. Each list must be completed before the day of review. For the Spring 2020 semester, the deadline is Wednesday, 06:00.
Contents
- 1 Todo T: Theoretical part
- 2 Todo C: Code of the computational experiment
- 3 Todo V: Visualize project
- 4 Todo Update: Put project straight
- 5 Todo X: Experiment planning
- 6 Todo B: Run basic code
- 7 Todo R: Preliminary report
- 8 Todo P: Problem statement
- 9 Todo A: Abstract
- 10 Todo B: Beginner's-talk
- 11 Todo I: Introduction
- 12 Todo L: Literature
- 13 Todo 1: Select your project
- 14 Todo 0: Prepare necessary tools
- 15 Todo -1: Subscribe to the course
Todo T: Theoretical part
The theoretical part describes the proposed solution and declares its properties. The goal is to join the theoretical elements into a method. This method includes hypotheses, models, criteria, and the optimization algorithm.
- Write the solution to your problem:
- first as a simple outline,
- then expand the necessary details,
- use the LaTeX algorithm template.
- Compare notations in the problem statement, solution, and code. Make sure the code does not contradict the text.
Resources
- Collection of plots, assorted [1], version to download slides, PDF
- Neuro-ZOO
- Commercial Project Planning, supplementary to the group game
Todo C: Code of the computational experiment
Organize your code so that the computational experiment can be rerun at any time and its results are stored.
- Use a single main file to run the experiment.
- Decompose the project code, and write functions and modules.
- Gather the experiment parameters in a special-purpose section.
- A text description of the experiment flow helps.
- Set up version checkpoints so that you can return to a previous experiment.
- A commit schedule helps.
- Write named plots to a designated folder.
- Write your results to a .tex-file and compile.
- If an experiment run takes a long time, reduce the data set.
- Do not use big or sophisticated data. Put your effort into illustrating your main message.
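The points above can be sketched as one entry-point script. This is a minimal illustration under stated assumptions, not a required layout: the parameter names, the output folder `experiment_output`, the file names `params.json` and `results.tex`, and the toy experiment itself (summarizing Gaussian noise) are all hypothetical.

```python
import json
import random
import statistics
from pathlib import Path

# Experiment parameters, gathered in one special-purpose section (hypothetical names).
PARAMS = {
    "seed": 0,
    "n_objects": 200,   # a small data set keeps each run short
    "noise_std": 1.0,
    "out_dir": "experiment_output",
}


def generate_data(params):
    """Toy data: independent Gaussian noise around zero."""
    rng = random.Random(params["seed"])
    return [rng.gauss(0.0, params["noise_std"]) for _ in range(params["n_objects"])]


def run_experiment(data):
    """Toy experiment: estimate the mean and the spread of the sample."""
    return {"mean": statistics.fmean(data), "stdev": statistics.stdev(data)}


def write_results_tex(results, path):
    """Store the results as a LaTeX table that the paper can include."""
    rows = [f"{name} & {value:.4f} \\\\" for name, value in results.items()]
    lines = ["\\begin{tabular}{lr}", "Quantity & Value \\\\", "\\hline", *rows, "\\end{tabular}"]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def main(params=PARAMS):
    """Single entry point: generate data, run, store parameters and results."""
    out_dir = Path(params["out_dir"])
    out_dir.mkdir(parents=True, exist_ok=True)
    results = run_experiment(generate_data(params))
    # Keep the parameters next to the results so the run can be repeated.
    (out_dir / "params.json").write_text(json.dumps(params, indent=2), encoding="utf-8")
    write_results_tex(results, out_dir / "results.tex")
    return results


if __name__ == "__main__":
    print(main())
```

Because the seed is part of the parameters, two runs with the same `PARAMS` give the same stored numbers, which makes it easy to compare a run against an earlier commit.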
Todo V: Visualize project
Set the list of plots that will be included in your paper and presentation.
- Make a plot of the source data.
- Goal: put the notation on the plot.
- List plots to illustrate the error analysis.
- Make a plot to show the main message.
Todo Update: Put project straight
- Check the proper folder structure (for example, make sure that your paper is not in the Code folder):
- docs,
- code,
- data,
- [figs].
- Put the direct link to the paper in the table, so that everyone can access it.
- Rename article.tex to Surname2020Title.tex
- Check that both the .tex and .pdf files are downloaded.
- Update your personal page on Machinelearning.ru.
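The folder and naming checks above can be automated with a short script. The sketch below rests on assumptions that the list does not state: that the paper lives in `docs`, that the compiled PDF sits next to the .tex file, and that a name like Surname2020Title.tex can be recognized by the hypothetical pattern `PAPER_NAME`.

```python
import re
from pathlib import Path

REQUIRED_DIRS = ("docs", "code", "data")  # figs is listed as optional above
# Hypothetical pattern for names like Surname2020Title.tex.
PAPER_NAME = re.compile(r"^[A-Z][A-Za-z]+\d{4}[A-Z]\w*\.tex$")


def check_project(root):
    """Return a list of human-readable problems found in the project folder."""
    root = Path(root)
    problems = []
    for name in REQUIRED_DIRS:
        if not (root / name).is_dir():
            problems.append(f"missing folder: {name}")
    # The paper must not live in the code folder.
    code_dir = root / "code"
    if code_dir.is_dir():
        for tex in sorted(code_dir.rglob("*.tex")):
            if PAPER_NAME.match(tex.name):
                problems.append(
                    f"paper file inside code folder: {tex.relative_to(root).as_posix()}"
                )
    # Assumption: the properly named paper and its PDF are stored in docs.
    docs_dir = root / "docs"
    papers = []
    if docs_dir.is_dir():
        papers = sorted(p for p in docs_dir.glob("*.tex") if PAPER_NAME.match(p.name))
    if not papers:
        problems.append("no Surname2020Title.tex-style file in docs")
    for paper in papers:
        if not paper.with_suffix(".pdf").exists():
            problems.append(f"no PDF next to {paper.name}")
    return problems


if __name__ == "__main__":
    for problem in check_project("."):
        print(problem)
```

An empty list means that the checks passed. A file still called article.tex is reported as a missing properly named paper.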
Todo X: Experiment planning
Plan your computational experiment.
- Discuss the experiment goal with your adviser and team.
- Put this goal in the Computational experiment section.
- Describe your basic data set, either a synthetic or a simple real one:
- state the title, the source, and the measurement setup in the text (this is the technical description; the theoretical one belongs in the Problem statement section),
- write down the number of objects and features, and describe general statistics,
- for a synthetic data set, describe the generation model and its parameters (for example, uniform random independent sampling on some given interval).
- Describe the configuration of the algorithm run.
- Plan the whole experimental part.
- List expected tables and figures:
- make a short list and a long list; for each item
- describe axes,
- make a draft with a pencil.
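As an illustration of the data set description, the sketch below generates a synthetic sample by uniform independent sampling on a given interval and collects the numbers the description asks for. The function names and the choice of statistics (minimum, maximum, mean, standard deviation per feature) are assumptions made for this example.

```python
import random
import statistics


def make_synthetic(n_objects, n_features, low, high, seed=0):
    """Sample each feature independently and uniformly on [low, high]."""
    rng = random.Random(seed)
    return [[rng.uniform(low, high) for _ in range(n_features)]
            for _ in range(n_objects)]


def describe(data):
    """Collect the number of objects, the number of features, and per-feature statistics."""
    columns = list(zip(*data))  # transpose rows into feature columns
    return {
        "n_objects": len(data),
        "n_features": len(columns),
        "features": [
            {"min": min(col), "max": max(col),
             "mean": statistics.fmean(col), "stdev": statistics.stdev(col)}
            for col in columns
        ],
    }


if __name__ == "__main__":
    summary = describe(make_synthetic(n_objects=100, n_features=3, low=0.0, high=1.0))
    print("objects:", summary["n_objects"], "features:", summary["n_features"])
    for j, stats in enumerate(summary["features"]):
        print(j, {k: round(v, 3) for k, v in stats.items()})
```

The arguments of `make_synthetic` are exactly the parameters of the generation model, so they can be copied into the text of the paper.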
Resources
- The goals of computational experiments, by A. Grabovoy, V. Alekseev, A. Rogozina, I. Igashov, N. Uvarov
- Example of the measurement description: Bishop C.M. Pattern recognition and machine learning, 2006. Pp. 677-683.
Todo B: Run basic code
Select the basic algorithm and run it using a simple data set.
- Run your basic algorithm:
- select the simplest algorithm (with your adviser) to (partially) solve the problem you set.
- Collect a synthetic data set or download a simple real-world data set of small size.
- Upload your data to the repository (if the data size exceeds 5MB or the data set consists of numerous files, please discuss it with your adviser and team).
- Run the basic algorithm on the synthetic data set and estimate the error.
- Describe the basic algorithm, analyze its features, and list competitive models.
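A basic run with an error estimate might look like the sketch below. The choices here are assumptions made for illustration only: the synthetic data follow a noisy line, the basic algorithm is a constant predictor, the competitive model is a one-nearest-neighbor predictor, and the error is the mean squared error on a random hold-out part. Your adviser may suggest a different basic algorithm and error measure.

```python
import random
import statistics


def make_line_data(n_objects, seed=0):
    """Synthetic data: y = 2x + 1 plus small Gaussian noise, x uniform on [0, 1]."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 1.0) for _ in range(n_objects)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def fit_constant(xs, ys):
    """Basic algorithm: always predict the mean of the training targets."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y


def fit_nearest(xs, ys):
    """Competitive model: predict the target of the nearest training object."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def holdout_mse(fit, xs, ys, test_share=0.3, seed=0):
    """Fit on a random training part and return the mean squared error on the rest."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_share))
    test, train = idx[:n_test], idx[n_test:]
    predict = fit([xs[i] for i in train], [ys[i] for i in train])
    return statistics.fmean([(predict(xs[i]) - ys[i]) ** 2 for i in test])


if __name__ == "__main__":
    xs, ys = make_line_data(100)
    print("constant baseline MSE:", holdout_mse(fit_constant, xs, ys))
    print("nearest neighbor MSE:", holdout_mse(fit_nearest, xs, ys))
```

Both models are evaluated on the same split because the seed is fixed, so the two errors are directly comparable.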
Resources
- Bakhteev O.Yu. Systems and tools of deep learning, article (in Russian)
- Motrenko A.P. Improving classification quality, article (in Russian)
- Isachenko R.V. Dimensionality reduction in the decoding problem, article (in Russian)
- Sample construction in forecasting problems, slides (in Russian)
- The IDEF standard for project planning
Todo R: Preliminary report
- Make sure that the obtained results are in no contradiction with the goals of the computational experiment.
- Illustrate the obtained results with a preliminary plot (see the format).
- Write a mini-report on the results with
- a short description of the figure: what the reader can see and what conclusions follow,
- the results in numbers and comments on them,
- put the report in the Computational experiment section.
Todo P: Problem statement
Following the paradigm Idea \(\to\) Formula \(\to\) Code, state the problem of finding an optimal solution.
- Discuss the problem statement with your adviser.
- See the examples below and in the past projects.
- Discuss terminology and notation (see [pdf] and [tex] with notations and a useful style file).
- At the beginning of the Problem statement write a general problem description.
- Describe the elements of your problem statement:
- the sample set,
- its origin, or its algebraic structure,
- statistical hypotheses of data generation,
- [conditions of measurements],
- [restrictions of the sample set and its values],
- your model in the class of models,
- restrictions on the class of models,
- the error function (and its derivation) or a loss function, or a quality criterion,
- cross-validation procedure,
- restrictions to the solutions,
- external (industrial) quality criteria,
- the optimization statement as \(\arg\min\).
- Define the main terms: what is called the model, the solution, and the algorithm.
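The elements listed above can be assembled into a formal statement. The fragment below is only a sketch with assumed notation (a regression sample, a generic parametric model \(f\), and the sum of squared errors as the criterion); replace each element with the one from your project. It needs the amsmath and amssymb packages.

```latex
\begin{align*}
% The sample set: m objects with n features and scalar targets.
\mathfrak{D} &= \{(\mathbf{x}_i, y_i)\}_{i=1}^{m}, \quad \mathbf{x}_i \in \mathbb{R}^{n}, \; y_i \in \mathbb{R}, \\
% The model: a parametric family of functions with parameters w.
f &\colon \mathbb{R}^{n} \times \mathbb{W} \to \mathbb{R}, \quad \mathbf{w} \in \mathbb{W} \subseteq \mathbb{R}^{k}, \\
% The criterion: here the sum of squared errors.
S(\mathbf{w} \mid \mathfrak{D}, f) &= \sum_{i=1}^{m} \bigl(y_i - f(\mathbf{x}_i, \mathbf{w})\bigr)^{2}, \\
% The optimization statement.
\hat{\mathbf{w}} &= \arg\min_{\mathbf{w} \in \mathbb{W}} S(\mathbf{w} \mid \mathfrak{D}, f).
\end{align*}
```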
Note that:
- The model is a parametric family of functions that map the design space to the target space.
- The criterion (error function) is a function that is optimized to obtain an optimal solution (model parameters, a function).
- The algorithm transforms the solution space, usually iteratively.
- The method combines a model, a criterion, and an algorithm to produce a solution.
Check it:
- the regression model,
- the sum of squared errors,
- the Newton-Raphson algorithm,
- the method of least squares.
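The four items of this check can be kept apart in code as well. The sketch below assumes a one-dimensional linear regression model with two parameters; the function names are hypothetical. Because the sum of squared errors is quadratic in the parameters of a linear model, a single Newton-Raphson step already reaches the minimum.

```python
def model(x, w):
    """The regression model: a parametric family f(x, w) = w[0] + w[1] * x."""
    return w[0] + w[1] * x


def criterion(w, xs, ys):
    """The criterion: the sum of squared errors S(w)."""
    return sum((y - model(x, w)) ** 2 for x, y in zip(xs, ys))


def newton_raphson_step(w, xs, ys):
    """The algorithm: one Newton-Raphson update w - H^{-1} g for the criterion above."""
    residuals = [y - model(x, w) for x, y in zip(xs, ys)]
    # Gradient of S with respect to (w[0], w[1]).
    g0 = -2.0 * sum(residuals)
    g1 = -2.0 * sum(r * x for r, x in zip(residuals, xs))
    # Hessian of S; it is constant because S is quadratic in w.
    h00 = 2.0 * len(xs)
    h01 = 2.0 * sum(xs)
    h11 = 2.0 * sum(x * x for x in xs)
    det = h00 * h11 - h01 * h01
    # Solve the 2x2 system H d = g with the explicit inverse of H.
    d0 = (h11 * g0 - h01 * g1) / det
    d1 = (h00 * g1 - h01 * g0) / det
    return [w[0] - d0, w[1] - d1]


def least_squares(xs, ys, w_init=(0.0, 0.0), tol=1e-12, max_iter=20):
    """The method: model, criterion, and algorithm combined to produce a solution."""
    w = list(w_init)
    for _ in range(max_iter):
        w_new = newton_raphson_step(w, xs, ys)
        if max(abs(a - b) for a, b in zip(w, w_new)) < tol:
            return w_new
        w = w_new
    return w


if __name__ == "__main__":
    xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
    w_hat = least_squares(xs, ys)
    print(w_hat, criterion(w_hat, xs, ys))
```

Swapping the algorithm (for example, for gradient descent) changes only `newton_raphson_step`; the model and the criterion stay the same, which is the point of keeping the notions separate.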
Resources
- Slides with a plan of Problem statement
- Examples of problem statements
- Katrutsa A.M., Strijov V.V. Stress test procedure for feature selection algorithms // Chemometrics and Intelligent Laboratory Systems, 2015, 142 : 172-183 article
- Katrutsa A.M., Strijov V.V. Comprehensive study of feature selection methods to solve multicollinearity problem according to evaluation criteria // Expert Systems with Applications, 2017 article
- Motrenko A., Strijov V., Weber G.-W. Bayesian sample size estimation for logistic regression // Journal of Computational and Applied Mathematics, 2014, 255 : 743-752. article
- Kulunchakov A.S., Strijov V.V. Generation of simple structured Information Retrieval functions by genetic algorithm without stagnation // Expert Systems with Applications, 2017, 85 : 221-230. article
- Ivkin N.P. Feature generation for classification and forecasting problems, MIPT, 2013 draft
- Notations for wiki Ru
- Basic notations, pdf
- Recommended notations, 2019: pdf and .tex with .sty
- Simple and useful notations
- Notations for Bayesian model selection, pdf
Todo A: Abstract
- Write a draft of your abstract.
- The abstract shall not exceed 600 characters. It may contain:
- wide-range field of the investigated problem,
- narrow problem to focus on,
- features and conditions of the problem,
- [the novelty],
- application to illustrate with.
- For joint projects it is important that each team member writes their own text.
Resources
- How to Read a Paper, 2016, S. Keshav
- Examples of review-and-planning drafts LinkReview: one, two.
Todo B: Beginner's-talk
Short 45-second introductory talk. Plan of the talk:
- The project goal. What is the motivation, the goal to reach?
- The main idea. What is the message?
- The expected result. What is your delivery, your impact, novelty?
There is no time to show a slide or draw a plot on the blackboard. It is recommended to rehearse the talk.
Todo I: Introduction
The introductory part includes research goals and motivations. It grounds the research in fundamental and state-of-the-art references. It delivers the main message of the work to the reader. This message shows the novelty of this work in comparison to recent results.
- Create a file ProjectN.bib for the group project, or Surname2018Title.bib for your personal project.
- Move useful bibliographic records from the LinkReview file into the BibTeX format.
- Check the correctness of the BibTeX database (styles of authors' names, volumes of journals, page numbers).
- Use bibliographic databases to facilitate your work.
- Use the default style \bibliographystyle{plain} before the bibliography section \bibliography{ProjectN}.
- Important! Wikipedia is not a source of information, but it contains many useful sources.
- Important! ArXiv is not a peer-reviewed source of information. Look for copies of papers that are published in peer-reviewed scientific journals. If a paper has not appeared in a peer-reviewed journal one or two years after its ArXiv version, use it with care: it might be unverified, possibly because it was rejected by journals.
- Write the Introduction. The expected size is one page. The expected plan is:
- the research goal (and its motivations),
- the object of research (introduce the main terms),
- the problem (what is the challenge),
- methodology: literature review and state of the art,
- the project tasks,
- the proposed solution, its novelty and advantages,
- the pros and cons of recent works,
- the goal of the experiment, setup, data sets, workflow.
The goal of this week is to comprehend the research goal as a whole and write about it.
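The check of the BibTeX database mentioned in the list above can be partly automated. The sketch below is not a full BibTeX parser: it assumes one field per line, entries closed by a brace on its own line, and a hypothetical set of required fields for articles. The sample record is built from the Motrenko, Strijov, and Weber paper cited on this page; its key is made up.

```python
import re

# Hypothetical minimal set of fields to require for each entry type.
REQUIRED = {"article": ("author", "title", "journal", "year", "volume", "pages")}

ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*?)\n\}", re.DOTALL)
FIELD_RE = re.compile(r"(\w+)\s*=\s*[{\"](.*?)[}\"]\s*,?\s*$", re.MULTILINE)

SAMPLE = r"""
@article{Motrenko2014Bayesian,
  author = {Motrenko, A. and Strijov, V. and Weber, G.-W.},
  title = {Bayesian sample size estimation for logistic regression},
  journal = {Journal of Computational and Applied Mathematics},
  year = {2014},
  volume = {255},
  pages = {743-752},
}
"""


def check_bibtex(text):
    """Return a list of problems found in simple, one-field-per-line BibTeX text."""
    problems = []
    for kind, key, body in ENTRY_RE.findall(text):
        fields = {name.lower(): value.strip() for name, value in FIELD_RE.findall(body)}
        for name in REQUIRED.get(kind.lower(), ()):
            if not fields.get(name):
                problems.append(f"{key}: missing {name}")
        # Page ranges are conventionally written with a double hyphen.
        pages = fields.get("pages")
        if pages and not re.fullmatch(r"\d+(--\d+)?", pages):
            problems.append(f"{key}: write page ranges with a double hyphen")
        # BibTeX separates authors with the word "and", not with commas.
        author = fields.get("author", "")
        if " and " not in author and author.count(",") > 1:
            problems.append(f"{key}: separate authors with 'and'")
    return problems


if __name__ == "__main__":
    for problem in check_bibtex(SAMPLE):
        print(problem)
```

For real databases, a bibliography manager such as JabRef performs more thorough integrity checks; the script only catches the most common slips.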
Resources
- Bibliographic databases
- The Collection of Computer Science Bibliographies
- List of academic databases and search engines in Wikipedia
- Refer to BibTeX in Wikipedia
- An introduction updated after a peer-review.
Todo L: Literature
We use the LinkReview draft format to share the fleeting ideas and impressions that arise while reading the literature.
- Collect the list of references including:
- state-of-the-art reviews, tutorials,
- fundamental solutions to the problem,
- the basic algorithm to solve your problem,
- alternative algorithms,
- [changes in the research directions],
- data sets and experiments,
- the papers that use these data sets,
- applications of the results,
- names of researchers, who solve this problem,
- their students and teams,
- those, who refer to their works.
- Balance the list between new and well-known works.
- Keep the list of search keywords up to date.
- Continuously fill your LinkReview.
- Plan Introduction (see the next todo list), namely collect
- keywords as the basic terms; those that bring good search results are useful,
- what the paper is devoted to,
- the investigated problem,
- the central idea,
- literature review,
- the authors' contribution.
Todo 1: Select your project
To select your project:
- Look through the list of projects.
- Find information about the experts and consultants.
- Select your projects in the questionnaire before Wednesday 22:00.
- Wait for confirmation.
- Put the confirmed topics in the Group table on MachineLearning.ru.
Todo 0: Prepare necessary tools
- Editing. Install LaTeX: MiKTeX for Windows, TeX Live for Linux, and for Mac OS. Sign up for Overleaf V2 (ShareLaTeX).
- Install the editor TeXnicCenter or its alternative WinEdt for Windows, TeXworks for Linux, and TeXmaker for Mac OS.
- Read LaTeX on MachineLearning (Ru).
- Useful: Wikibooks LaTeX; K.V. Vorontsov, LaTeX2e in Examples (in Russian).
- Read S.M. Lvovsky, Typesetting and Layout in LaTeX (in Russian).
- Download the paper template, ZIP and compile it.
- Read BibTeX.
- Install bibliographic collection software JabRef (can be postponed).
- Communications. Sign up for GitHub.
- Important: an address and login like Name.Surname or Name-Surname (it depends on system conventions) is welcome.
- Introductory slides on Version Control System.
- Introduction to GitHub.
- The first steps in GitHub.
- Download a shell: Desktop.GitHub, or use a command line to synchronize your project.
- Sign up for MachineLearning.ru. Send your login to the coordinator at mlalgorithms [at] gmail [dot] com.
- To state a problem (write an essay) in a notebook, see the example in MathJax.
- Create your page (see the example).
- Install Hangouts and Skype (read the instructions).
- Programming. Install Python Anaconda, PyCharm (alternative: Visual Studio), and the online notebook Google.Colab.
- Development for ML: PyTorch
- Style formatting: Codestyle pep8
- Additionally, as alternatives, install and try Matlab (MIPT provides a free version; Octave is an alternative), R-project, Wolfram Mathematica.
- Additionally, read with pleasure S.S. Kutateladze, Advice to an Occasional Translator, and A.B. Sossinsky, How to Write a Mathematical Paper in English (both in Russian).
Resources
- Announcements: Telegram m1p_news
- Ask questions by email: mlalgorithms [at] gmail [dot] com
- Slides.
- Short course description.
References to catch up
- A Brief Introduction to Machine Learning for Engineers by Osvaldo Simeone, 2017-2018
- Understanding Machine Learning: From Theory to Algorithms by Shai Shalev-Shwartz, Shai Ben-David, 2014
- Mathematics for Machine Learning by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong
- Mathematics for Physicists: Introductory Concepts and Methods by Alexander Altland & Jan von Delft
- Python notes for professionals by GoalKicker.com Free Programming Books.
- Lagutin M.B. Visual Mathematical Statistics (in Russian), Moscow: Binom, 2009. See also the excerpt.
- Bishop C.M. Pattern recognition and machine learning, Berlin: Springer, 2008.
- MacKay D. Information Theory, Pattern Recognition and Neural Networks, Inference.org.uk, 2009.
Todo -1: Subscribe to the course
To do before 06:00 on Wednesday, February 12th:
- pick a problem from the page Try-on programming problems (take the oldest problems; they are simpler),
- plot one figure to illustrate the problem (plot data or analysis),
- write explanatory comments on the figure (what the reader sees in the figure, what conclusions follow),
- an example of the figure formatting is here,
- upload your notebook to your GitHub repository,
- send the link to this notebook to mlalgorithms [at] gmail [dot] com, with the subject "Application m1p".