Todo list
The todo lists here correspond to the Course schedule. Each list must be completed before the day of review. The deadline is Wednesday 06:00 am for the 2020 Spring semester.
Contents
- 1 Todo Update: Put project straight
- 2 Todo X: Experiment planning
- 3 Todo B, R: Run basic code and report it
- 4 Todo P: Problem statement
- 5 Todo A: Abstract
- 6 Todo B: Beginner's-talk
- 7 Todo I: Introduction
- 8 Todo L: Literature
- 9 Todo 1: Select your project
- 10 Todo 0: Prepare necessary tools
- 11 Todo -1: Subscribe to the course
- 12 Todo A: Write an abstract
Todo Update: Put project straight
- Check that the folder structure is correct (for example, make sure that your paper is not in the Code folder).
- Put the direct link to the paper in the table (bit.ly/m1p_2020), so that everyone can access it.
- Rename article.tex to Surname2020Title.tex
- Check that both the .tex and .pdf files are downloaded.
- Update your personal page on Machinelearning.ru (bit.ly/m1p_2020).
Todo X: Experiment planning
Todo B, R: Run basic code and report it
Todo P: Problem statement
Following the paradigm Idea \(\to\) Formula \(\to\) Code, state the problem of finding an optimal solution.
- Discuss the problem statement with your adviser.
- See the examples below and in the past projects.
- Discuss terminology and notation; see the [pdf] and [tex] files with notations and a useful style file.
- At the beginning of the Problem statement, write a general problem description.
- Describe the elements of your problem statement:
- the sample set,
- its origin, or its algebraic structure,
- statistical hypotheses of data generation,
- [conditions of measurements] ,
- [restrictions of the sample set and its values],
- your model in the class of models,
- restrictions on the class of models,
- the error function (and its inference) or a loss function, or a quality criterion,
- cross-validation procedure,
- restrictions to the solutions,
- external (industrial) quality criteria,
- the optimization statement as \(\arg\min\).
- Define the main terms: what is called the model, the solution, the algorithm.
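The elements listed above can be assembled into a compact statement. The following is a minimal sketch for a linear regression problem; the symbols \(\mathfrak{D}\), \(f\), \(S\), \(\mathbf{w}\), \(m\), \(n\) are illustrative assumptions and not necessarily the notation recommended in the course style file.

```latex
\begin{align*}
% The sample set: m objects, each described by n features (assumed notation).
\mathfrak{D} &= \{(\mathbf{x}_i, y_i)\}_{i=1}^{m},
  \quad \mathbf{x}_i \in \mathbb{R}^n, \quad y_i \in \mathbb{R}, \\
% The model: one element of a parametric family of functions.
f(\mathbf{w}, \mathbf{x}) &= \mathbf{w}^{\mathsf{T}} \mathbf{x},
  \quad \mathbf{w} \in \mathbb{R}^n, \\
% The error function: the sum of squared errors on the sample set.
S(\mathbf{w} \mid \mathfrak{D}, f) &= \sum_{i=1}^{m}
  \bigl(y_i - f(\mathbf{w}, \mathbf{x}_i)\bigr)^2, \\
% The optimization statement.
\hat{\mathbf{w}} &= \arg\min_{\mathbf{w} \in \mathbb{R}^n}
  S(\mathbf{w} \mid \mathfrak{D}, f).
\end{align*}
```

A full problem statement would also fix the data generation hypotheses, the restrictions on the model class, and the cross-validation procedure, which this sketch omits.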
Note that:
- The model is a parametric family of functions that map the design space to the target space.
- The criterion (error function) is a function to optimize in order to obtain an optimal solution (model parameters, a function).
- The algorithm transforms solution space, usually iteratively.
- The method combines a model, a criterion, and an algorithm to produce a solution.
Check it:
- the regression model,
- the sum of squared errors,
- the Newton-Raphson algorithm,
- the method of least squares.
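The four notions in the checklist above can be told apart in code. The following is a minimal sketch in plain Python, assuming a one-feature linear regression; the function names `model`, `sse`, `newton_step`, `least_squares` and the toy data are hypothetical and chosen only for illustration.

```python
# Model: a parametric family f(w, x) = w0 + w1 * x (one-feature regression).
def model(w, x):
    return w[0] + w[1] * x


# Criterion: the sum of squared errors S(w) on the sample set.
def sse(w, xs, ys):
    return sum((y - model(w, x)) ** 2 for x, y in zip(xs, ys))


# Algorithm: one Newton-Raphson step w <- w - H^{-1} g,
# where g is the gradient and H is the Hessian of S(w).
def newton_step(w, xs, ys):
    residuals = [model(w, x) - y for x, y in zip(xs, ys)]
    g0 = 2 * sum(residuals)
    g1 = 2 * sum(r * x for r, x in zip(residuals, xs))
    h00 = 2 * len(xs)
    h01 = 2 * sum(xs)
    h11 = 2 * sum(x * x for x in xs)
    det = h00 * h11 - h01 * h01
    # Solve the 2x2 system H d = g by Cramer's rule.
    d0 = (h11 * g0 - h01 * g1) / det
    d1 = (h00 * g1 - h01 * g0) / det
    return [w[0] - d0, w[1] - d1]


# Method: least squares = model + criterion + algorithm.
def least_squares(xs, ys, n_iter=5, tol=1e-12):
    w = [0.0, 0.0]
    for _ in range(n_iter):
        w_new = newton_step(w, xs, ys)
        converged = abs(sse(w, xs, ys) - sse(w_new, xs, ys)) < tol
        w = w_new
        if converged:
            break
    return w


# Toy sample set generated by y = 1 + 2x without noise.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w_hat = least_squares(xs, ys)
```

Because the criterion is quadratic in the parameters, a single Newton-Raphson step already reaches the minimum here; the loop is kept to show that the algorithm is iterative in general.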
Resources
- Slides with a plan of Problem statement
- Examples of problem statements
- Katrutsa A.M., Strijov V.V. Stresstest procedure for feature selection algorithms // Chemometrics and Intelligent Laboratory Systems, 2015, 142 : 172-183 article
- Katrutsa A.M., Strijov V.V. Comprehensive study of feature selection methods to solve multicollinearity problem according to evaluation criteria // Expert Systems with Applications, 2017 article
- Motrenko A., Strijov V., Weber G.-W. Bayesian sample size estimation for logistic regression // Journal of Computational and Applied Mathematics, 2014, 255 : 743-752. article
- Kulunchakov A.S., Strijov V.V. Generation of simple structured Information Retrieval functions by genetic algorithm without stagnation // Expert Systems with Applications, 2017, 85 : 221-230. article
- Ivkin N.P. Feature generation for classification and forecasting problems, MIPT, 2013 draft
- Notations for wiki Ru
- Basic notations, pdf
- Recommended notations, 2019: pdf and .tex with .sty
- Simple and useful notations
- Notations for Bayesian model selection, pdf
Todo A: Abstract
- Write a draft of your abstract.
- The abstract shall not exceed 600 characters. It may contain:
- the broad field of the investigated problem,
- narrow problem to focus on,
- features and conditions of the problem,
- [the novelty],
- application to illustrate with.
- For joint projects it is important that each team member writes their own text.
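The 600-character limit can be checked before submitting the draft. The sketch below is one possible check; it assumes that runs of whitespace and line breaks count as a single character, which the todo list does not specify, and the names `abstract_length` and `check_abstract` are hypothetical.

```python
MAX_ABSTRACT_CHARS = 600  # the limit stated in the todo list


def abstract_length(text):
    """Count characters after collapsing whitespace runs (an assumption)."""
    return len(" ".join(text.split()))


def check_abstract(text, limit=MAX_ABSTRACT_CHARS):
    """Return (fits, length) for a draft abstract."""
    n = abstract_length(text)
    return n <= limit, n


# A toy draft with irregular spacing and a line break.
draft = "We study feature selection   for\nlinear regression."
ok, n = check_abstract(draft)
```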
Resources
- How to Read a Paper, 2016, S. Keshav
- Examples of review-and-planning drafts LinkReview: one, two.
Todo B: Beginner's-talk
Short 45-second introductory talk. Plan of the talk:
- The project goal. What is the motivation, the goal to reach?
- The main idea. What is the message?
- The expected result. What is your delivery, your impact, novelty?
There is no time to show a slide or draw a plot on the blackboard. It is recommended to rehearse the talk.
Todo I: Introduction
The introductory part includes research goals and motivations. It justifies the research with fundamental and state-of-the-art references. It delivers the main message of the work to the reader. This message shows the novelty of this work in comparison to recent results.
- Create a file ProjectN.bib for the group project, or Surname2018Title.bib for your personal project.
- Move useful bibliographic records from the LinkReview file to the .bib file in the BibTeX format.
- Check the correctness of the BibTeX database (style of author names, journal volumes, page numbers).
- Use bibliographic databases to facilitate your work.
- Use the default style \bibliographystyle{plain} before the bibliography section \bibliography{ProjectN}.
- Important! Wikipedia is not a source of information, but it contains many useful sources.
- Important! ArXiv is not a peer-reviewed source of information. Look for copies of the papers that are published in peer-reviewed scientific journals. If, one or two years after its ArXiv version, the paper has not appeared in a peer-reviewed journal, be careful when using it: the paper might be unverified, since it may have been rejected by journals.
- Write Introduction. The expected size is one page. The expected plan is:
- the research goal (and its motivations),
- the object of research (introduce the main terms),
- the problem (what is the challenge),
- methodology: literature review and state of the art,
- the project tasks,
- the proposed solution, its novelty and advantages,
- the pros and cons of recent works,
- the goal of the experiment, setup, data sets, workflow.
The goal of this week is to comprehend the project goal as a whole and write about it.
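The bibliography steps above can be tried on a single record. The following is a minimal sketch of a compilable file: the record fields are copied from the Katrutsa and Strijov entry in the Resources of the Problem statement todo and should be verified against the publisher's page; the citation key `Katrutsa2015Stresstest` and the sample sentence are hypothetical. The `filecontents` environment is used here only to keep the example in one file; in a project, ProjectN.bib is a separate file.

```latex
% Write the BibTeX database next to the .tex file (skipped if the file exists).
\begin{filecontents}{ProjectN.bib}
@article{Katrutsa2015Stresstest,
  author  = {Katrutsa, A. M. and Strijov, V. V.},
  title   = {Stresstest procedure for feature selection algorithms},
  journal = {Chemometrics and Intelligent Laboratory Systems},
  year    = {2015},
  volume  = {142},
  pages   = {172--183}
}
\end{filecontents}
\documentclass{article}
\begin{document}
% A hypothetical sentence that cites the record by its key.
Feature selection algorithms are compared in~\cite{Katrutsa2015Stresstest}.

% The default style goes before the bibliography section.
\bibliographystyle{plain}
\bibliography{ProjectN}
\end{document}
```

Compiling requires the usual sequence: LaTeX, then BibTeX, then LaTeX twice, so that the citation and the reference list are resolved.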
Resources
- Bibliographic databases
- The Collection of Computer Science Bibliographies
- List of academic databases and search engines in Wikipedia
- Refer to BibTeX in Wikipedia
- An introduction updated after a peer-review.
Todo L: Literature
We use the LinkReview draft format to share the fleeting ideas and impressions that arise while reading the literature.
- Collect the list of references including:
- state-of-the-art reviews, tutorials,
- fundamental solutions to the problem,
- the basic algorithm to solve your problem,
- alternative algorithms,
- [changes in the research directions],
- data sets and experiments,
- the papers that use these data sets,
- applications of the results,
- names of researchers, who solve this problem,
- their students and teams,
- those, who refer to their works.
- Balance the list between new and well-known works.
- Keep the list of search keywords up to date.
- Continuously fill your LinkReview.
- Plan Introduction (see the next todo list), namely collect:
- keywords as the basic terms; those that bring good search results are useful,
- what the paper is devoted to,
- the investigated problem,
- the central idea,
- literature review,
- the authors' contribution.
Todo 1: Select your project
To select your project:
- Look through the list of projects.
- Find information about the experts and consultants.
- Select your projects in the questionnaire before Wednesday 22:00.
- Wait for confirmation.
- Put the confirmed topics into the Group table on Machine learning.
Todo 0: Prepare necessary tools
- Editing. Install LaTeX: MiKTeX for Windows, TeX Live for Linux, and for Mac OS. Sign up for Overleaf V2 (ShareLaTeX).
- Install the editor TeXnicCenter or its alternative WinEdt for Windows, TeXworks for Linux, and Texmaker for Mac OS.
- Read LaTeX on MachineLearning (Ru).
- Useful: Wikibooks LaTeX; Vorontsov K. V. LaTeX2e in Examples (in Russian).
- Read Lvovsky S. M. Typesetting and Layout in LaTeX (in Russian).
- Download the paper template, ZIP and compile it.
- Read BibTeX.
- Install bibliographic collection software JabRef (can be postponed).
- Communications. Sign up for GitHub.
- Important: an address and a login of the form Name.Surname or Name-Surname (depending on system conventions) are welcome.
- Introductory slides on version control systems.
- Introduction to GitHub.
- The first steps in GitHub.
- Download a shell: Desktop.GitHub, or use the command line to synchronise your project.
- Sign up for MachineLearning.ru. Send your login to your coordinator or to mlalgorithms [at] gmail [dot] com.
- To state a problem (write an essay) in a notebook, see the example in MathJax.
- Create your page (see the example).
- Install Hangouts and Skype; read the instructions.
- Programming. Install Python Anaconda, PyCharm (or Visual Studio as an alternative), and use the online notebook Google.Colab.
- Development for ML: PyTorch
- Style formatting: Codestyle pep8
- Additionally. As an alternative, install and try Matlab (MIPT provides a free version) or Octave, R-project, Wolfram Mathematica.
- Additionally. Read with pleasure Kutateladze S. S. Advice to the Occasional Translator and Sosinsky A. B. How to Write a Mathematical Paper in English (both in Russian).
Resources
- Announcements: Telegram m1p_news
- Ask questions by email: mlalgorithms [at] gmail [dot] com
- Slides.
- Short course description.
References to catch up
- A Brief Introduction to Machine Learning for Engineers by Osvaldo Simeone, 2017-2018
- Understanding Machine Learning: From Theory to Algorithms by Shai Shalev-Shwartz, Shai Ben-David, 2014
- Mathematics for Machine learning by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong
- Mathematics for Physicists: Introductory Concepts and Methods by Alexander Altland & Jan von Delft
- Python notes for professionals by GoalKicker.com Free Programming Books.
- Lagutin M.B. Visual Mathematical Statistics (in Russian), Moscow: Binom, 2009. See also the excerpt.
- Bishop C.M. Pattern recognition and machine learning, Berlin: Springer, 2008.
- MacKay D. Information Theory, Pattern Recognition and Neural Networks, Inference.org.uk, 2009.
Todo -1: Subscribe to the course
Todo before 06:00 Wednesday, February 12th:
- pick up a problem from the page Try-on programming problems (get the oldest problems, they are simpler),
- plot one figure to illustrate the problem (plot data or analysis),
- write explanatory comments to the figure (what the reader sees in the figure, what conclusions follow),
- an example of the figure formatting is here
- upload your notebook to your GitHub repository,
- send the link to this notebook to mlalgorithms [at] gmail [dot] com, with the subject "Application m1p"
- Example of a nice simple problem: bread regression.
- Examples of plots: one many solutions from this project.
- Examples of old problems: Problem 7, Problem 1, Problem 15.