Difference between revisions of "Todo list"
(21 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
− | The | + | {{#seo: |
+ | |title=Course My first scientific article: To-do list | ||
+ | |titlemode=replace | ||
+ | |keywords=My first scientific article | ||
+ | |description=Course My first scientific article: The to-do lists here correspond to the Course Schedule. Each list must be completed before the day of review. | ||
+ | }} | ||
+ | |||
+ | The to-do lists here correspond to the [[Course schedule]]. Each list must be completed before the day of review. It is Wednesday 06:00 am for the 2020 Spring semester. | ||
+ | |||
+ | <!-- == Todo E: Error analysis == | ||
+ | ([http://www.machinelearning.ru/wiki/index.php?title=M1#.D0.94.D0.BE.D0.BC.D0.B0.D1.88.D0.BD.D0.B5.D0.B5_.D0.B7.D0.B0.D0.B4.D0.B0.D0.BD.D0.B8.D0.B5-E:_.D0.B0.D0.BD.D0.B0.D0.BB.D0.B8.D0.B7_.D0.BE.D1.88.D0.B8.D0.B1.D0.BA.D0.B8 Rus]) | ||
+ | --> | ||
+ | == Todo T: Theoretical part == | ||
+ | The theoretical part describes the proposed solution and declares its properties. | ||
+ | The goal is to join the theoretical elements into a '''method'''. This method includes hypotheses, models, criteria, and the optimization algorithm. | ||
+ | # Write the solution of your problem | ||
+ | #* in a simple outline variant, | ||
+ | #* expand necessary details, | ||
+ | #* use algorithm LaTeX template. | ||
+ | # Compare notations in the problem statement, solution, and code. Make sure the code does not contradict the text. | ||
+ | |||
+ | '''Resources''' | ||
+ | * Collection of plots, assorted [https://sourceforge.net/p/mvr/code/HEAD/tree/lectures/MachineLearningResearch/ComputationalExperiment/fig_compilation_slides.pdf?format=raw], version to download [http://www.machinelearning.ru/wiki/images/2/25/Fig_compilation_slides_stable.pdf slides, PDF] | ||
+ | * [http://www.machinelearning.ru/wiki/images/2/24/Zharikov2017Presentation.pdf Neuro-ZOO] | ||
+ | * [http://www.machinelearning.ru/wiki/images/d/d0/Strijov2020CommercialProjectPlanning.pdf Commercial Project Planning, supplementary to the group game] | ||
+ | |||
+ | == Todo C: Code of the computational experiment == | ||
+ | Organize your code so that the computational experiment runs every time with results stored. | ||
+ | # Set the only main file to run the experiment. | ||
+ | # Decompose the project code, and write functions and modules. | ||
+ | # Gather the experiment parameters in a special-purpose section. | ||
+ | #* A text description of the experiment flow helps. | ||
+ | # Set a procedure of historical version points to return to the previous experiment. | ||
+ | #* Commit schedule helps. | ||
+ | # Write named plots to a designated folder. | ||
+ | #* Write your results to a .tex-file and compile. | ||
+ | * '''If your experiment run takes a long time, just cut the data set.''' | ||
+ | ** ''Do not use big or sophisticated data. Put your efforts to illustrate your main message.'' | ||
+ | |||
+ | == Todo V: Visualize project == | ||
+ | Set the list of plots that will be included in your paper and presentation. | ||
+ | # Make a plot of the source data. | ||
+ | #* '''Goal:''' put notations to the plot. | ||
+ | # List plots to illustrate the error analysis. | ||
+ | # Make a plot to show the main message. | ||
+ | |||
+ | == Todo Update: Put project straight == | ||
+ | # Check the proper folder structure (for example, make sure that your paper is not in the code folder): | ||
+ | #* docs, | ||
+ | #* code, | ||
+ | #* data, | ||
+ | #* [figs]. | ||
+ | # Put the direct link to the paper [http://bit.ly/m1p_2020 in the table], so that everyone could access it. | ||
+ | # Rename article.tex to Surname2020Title.tex | ||
+ | # Check that both the .tex and .pdf files can be downloaded. | ||
+ | <!-- # Fill in the readme.md file in the github project (together with the necessary links)--> | ||
+ | # Update [http://www.machinelearning.ru/wiki/index.php?title=%D0%A3%D1%87%D0%B0%D1%81%D1%82%D0%BD%D0%B8%D0%BA:Vmarkin your personal page] on [http://bit.ly/m1p_2020 Machinelearning.ru]. | ||
+ | |||
+ | == Todo X: Experiment planning == | ||
+ | Plan your computational experiment. | ||
+ | # Discuss the experiment goal with your adviser and team. | ||
+ | #* Put this goal in the section Computational experiment | ||
+ | # Describe your basic data set, a synthetic, or a simple real one: | ||
+ | #* put in the text the title, source, and set up of measurements (it is the technical description, the theoretical one is in the problem statement section), | ||
+ | #* write down the number of objects, and features, describe general statistics, | ||
+ | #* for a synthetic data set describe the generation model, and its parameters (for example, uniform random independent sampling at some given interval). | ||
+ | # Describe the configuration of the algorithm run. | ||
+ | # Plan the whole experimental part. | ||
+ | # List expected tables and figures: | ||
+ | #* make short and long list, for each | ||
+ | #* describe axes, | ||
+ | #* make a draft with a pencil. | ||
+ | |||
+ | '''Resources''' | ||
+ | * The goals of computational experiments [http://svn.code.sf.net/p/mlalgorithms/code/Group574/Grabovoy2018OptimalBrainDamage/doc/slides/Grabovoy2018OptimalBrainDamage.pdf А. Грабовой], [http://svn.code.sf.net/p/mlalgorithms/code/Group474/Alekseev2017IntraTextCoherence/doc/Alekseev2017Presentation.pdf В. Алексеев], [http://svn.code.sf.net/p/mlalgorithms/code/Group574/Rogozina2018StructurePredictionRNA/doc/slides/Rogozina2018RNAPredictionsSlides.pdf А. Рогозина], [https://github.com/Intelligent-Systems-Phystech/Group594/raw/master/Igashov2018ProteinLigandComplexes/presentation/presentation.pdf И. Игашов], [http://svn.code.sf.net/p/mlalgorithms/code/Group474/Uvarov2017DynamicGraphicalModels/slides/Uvarov2017DynamicGraphicalModels.pdf Н. Уваров] | ||
+ | * Example of the measurement description, [http://www.machinelearning.ru/wiki/images/3/35/Old_Faithful_dataset_description.pdf Bishop C.M. Pattern recognition and machine learning, 2006. Pp. 677-683.] | ||
+ | |||
+ | == Todo B: Run basic code == | ||
+ | Select the basic algorithm and run it using a simple data set. | ||
+ | |||
+ | # Run your basic algorithm: | ||
+ | #* select the simplest algorithm (with your adviser) to (partially) solve the problem you set. | ||
+ | # Collect a synthetic data set or download a simple real-world data set of small size. | ||
+ | # Upload your data to the repository (in case the data size exceeds 5 MB or the data set consists of numerous files, please discuss with your adviser and team). | ||
+ | # Run the basic algorithm on the synthetic data set and estimate the error. | ||
+ | # Describe the basic algorithm, analyze its features, and list competing models.<!--: | ||
+ | ## Description: a reference to the name of the black box. It is desirable to point to a source where the contents of the black box are described in detail. Specify the structural parameters of the black box. | ||
+ | ## Description of the model as a mapping from the object description space to the space of target variables. Here the algorithm for optimizing the model parameters may be referred to as a black box. | ||
+ | ## Description of the model and of the algorithm for optimizing its parameters as pseudocode. | ||
+ | --> | ||
+ | |||
+ | '''Resources''' | ||
+ | * Bakhteev O.Yu. Systems and tools for deep learning, [http://strijov.com/papers/Bakhteev2016AWS.pdf article] | ||
+ | * Motrenko A.P. Improving classification quality, [http://strijov.com/papers/MolybogMotrenko2017DimRed.pdf article] | ||
+ | * Isachenko R.V. Dimensionality reduction in the decoding problem, [https://github.com/Intelligent-Systems-Phystech/2017-Isachenko-PLS/raw/master/doc/Isachenko2017PLS.pdf article] | ||
+ | * Sample construction in forecasting problems, [http://svn.code.sf.net/p/mvr/code/lectures/DataFest/Strijov2016Tutorial.pdf slides] | ||
+ | <!-- * Problem statement for forecasting card defaults one year ahead, [[Media:Strijov2018ProbStCardScoring.pdf|slides]]--> | ||
+ | * [http://www.machinelearning.ru/wiki/images/4/49/Strijov2019IDEF0.pdf The IDEF standard for project planning] | ||
+ | |||
+ | == Todo R: Preliminary report == | ||
+ | # Make sure that the obtained results are in no contradiction with the goals of the computational experiment. | ||
+ | # Illustrate the obtained results with the preliminary plot [http://www.machinelearning.ru/wiki/index.php?title=JMLDA/Fig see the format]. | ||
+ | # Write a mini-report on the results with | ||
+ | ## a short description of the figure: what the reader could see, what are the consequences, | ||
+ | ## the results in numbers and comments on it, | ||
+ | ## put the report to the section computational experiment. | ||
+ | |||
+ | == Todo P: Problem statement == | ||
+ | In the paradigm Idea<math>\to</math>Formula<math>\to</math>Code state the problem to find an optimal solution. | ||
+ | # Discuss the problem statement with your adviser. | ||
+ | # See the examples below and in the past projects. | ||
+ | # Discuss terminology and notation see [pdf] and [tex] with notations and a useful style file. | ||
+ | # At the beginning of the Problem statement write a general problem description. | ||
+ | # Describe the elements of your problem statement: | ||
+ | ## the sample set, | ||
+ | ## its origin, or its algebraic structure, | ||
+ | ## statistical hypotheses of data generation, | ||
+ | ## [conditions of measurements] , | ||
+ | ## [restrictions of the sample set and its values], | ||
+ | ## your model in the class of models, | ||
+ | ## restrictions on the class of models, | ||
+ | ## the error function (and its inference) or a loss function, or a quality criterion, | ||
+ | ## cross-validation procedure, | ||
+ | ## restrictions to the solutions, | ||
+ | ## external (industrial) quality criteria, | ||
+ | ## the optimization statement as <math>\arg\min</math>. | ||
+ | # Define the main termini: what is called the model, the solution, and the algorithm. | ||
+ | |||
+ | Note that: | ||
+ | * The '''model''' is a parametric family of functions to map design space to target space. | ||
+ | * The '''criterion''' (error function) is a function to optimize in order to obtain an optimal solution (model parameters, a function). | ||
+ | * The '''algorithm''' transforms solution space, usually iteratively. | ||
+ | * The '''method''' combines a model, a criterion, and an algorithm to produce a solution. | ||
+ | |||
+ | Check it: | ||
+ | * the regression ''model'', | ||
+ | * the sum of squared ''errors'', | ||
+ | * the Newton-Raphson ''algorithm'', | ||
+ | * the ''method'' of least squares. | ||
+ | |||
+ | '''Resources''' | ||
+ | * Slides [http://www.machinelearning.ru/wiki/images/b/b9/Strijov2020ProblStatement.pdf with a plan of Problem statement] | ||
+ | * Examples of problem statements | ||
+ | *# Katrutsa A.M., Strijov V.V. Stresstest procedure for feature selection algorithms // Chemometrics and Intelligent Laboratory Systems, 2015, 142 : 172-183 [http://strijov.com/papers/Katrutsa2014TestGenerationEn.pdf article] | ||
+ | *# Katrutsa A.M., Strijov V.V. Comprehensive study of feature selection methods to solve multicollinearity problem according to evaluation criteria // Expert Systems with Applications, 2017 [http://strijov.com/papers/Katrutsa2016QPFeatureSelection.pdf article] | ||
+ | *# Motrenko A., Strijov V., Weber G.-W. Bayesian sample size estimation for logistic regression // Journal of Computational and Applied Mathematics, 2014, 255 : 743-752. [http://strijov.com/papers/MotrenkoStrijovWeber2012SampleSize.pdf article] | ||
+ | *# Kulunchakov A.S., Strijov V.V. Generation of simple structured Information Retrieval functions by genetic algorithm without stagnation // Expert Systems with Applications, 2017, 85 : 221-230. [http://strijov.com/papers/Kulunchakov2014RankingBySimpleFun.pdf article] | ||
+ | *# Ivkin N.P. Feature generation for classification and forecasting problems, MIPT, 2013 [https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group074/Kuznetsov2013SSAForecasting/doc/Ivkin2013ProblemStatement.pdf?format=raw draft] | ||
+ | * Notations for wiki [http://www.machinelearning.ru/wiki/index.php?title=%D0%A7%D0%B8%D1%81%D0%BB%D0%B5%D0%BD%D0%BD%D1%8B%D0%B5_%D0%BC%D0%B5%D1%82%D0%BE%D0%B4%D1%8B_%D0%BE%D0%B1%D1%83%D1%87%D0%B5%D0%BD%D0%B8%D1%8F_%D0%BF%D0%BE_%D0%BF%D1%80%D0%B5%D1%86%D0%B5%D0%B4%D0%B5%D0%BD%D1%82%D0%B0%D0%BC_%28%D0%BF%D1%80%D0%B0%D0%BA%D1%82%D0%B8%D0%BA%D0%B0%2C_%D0%92.%D0%92._%D0%A1%D1%82%D1%80%D0%B8%D0%B6%D0%BE%D0%B2%29/%D0%A0%D0%B5%D0%BA%D0%BE%D0%BC%D0%B5%D0%BD%D0%B4%D1%83%D0%B5%D0%BC%D1%8B%D0%B5_%D0%BE%D0%B1%D0%BE%D0%B7%D0%BD%D0%B0%D1%87%D0%B5%D0%BD%D0%B8%D1%8F Ru] | ||
+ | * Basic notations, [http://www.machinelearning.ru/wiki/images/c/c2/Strijov2013Notation.pdf pdf] | ||
+ | * Recommended notations, 2019: [http://www.machinelearning.ru/wiki/images/0/0f/M1_Notation.pdf pdf] and [http://www.machinelearning.ru/wiki/images/6/6d/M1_Notation_source.zip .tex with .sty] | ||
+ | * Simple and useful [http://www.machinelearning.ru/wiki/images/4/41/NiceNotations.pdf notations] | ||
+ | * Notations for Bayesian model selection, [http://www.machinelearning.ru/wiki/images/0/03/ABS_notations.pdf pdf] | ||
+ | |||
+ | |||
== Todo A: Abstract == | == Todo A: Abstract == | ||
Line 10: | Line 164: | ||
** [the novelty], | ** [the novelty], | ||
** application to illustrate with. | ** application to illustrate with. | ||
− | * For joint projects it is important that each team | + | * For joint projects it is important that each team member writes its own text. |
'''Resources''' | '''Resources''' | ||
Line 27: | Line 181: | ||
== Todo I: Introduction == | == Todo I: Introduction == | ||
− | The introductory part includes research goals and motivations. It reasons the research with fundamental and state-of-the- | + | The introductory part includes research goals and motivations. It reasons the research with fundamental and state-of-the-art references. It delivers the main message of the work to the reader. This message shows the novelty of this work in comparison to recent results. |
# Create a file ''ProjectN.bib'' for the group project, or ''Surname2018Title.bib'' for your personal project. | # Create a file ''ProjectN.bib'' for the group project, or ''Surname2018Title.bib'' for your personal project. | ||
# Move from the file ''LinkReview'' useful bibliographic records in the BibTeX format. | # Move from the file ''LinkReview'' useful bibliographic records in the BibTeX format. | ||
− | #* Check the correctness of the BibTeX database (styles of authors names, volumes of journals, page numbers). | + | #* Check the correctness of the BibTeX database (styles of authors' names, volumes of journals, page numbers). |
#* Use [http://liinwww.ira.uka.de/bibliography/ bibliographic databases] to facilitate your work. | #* Use [http://liinwww.ira.uka.de/bibliography/ bibliographic databases] to facilitate your work. | ||
#* Use the default style ''\bibliographystyle{plain}'' before the bibliography section ''\bibliography{ProjectN}''. | #* Use the default style ''\bibliographystyle{plain}'' before the bibliography section ''\bibliography{ProjectN}''. | ||
− | #* Important! Wikipedia is not | + | #* Important! Wikipedia is not a source of information, but it contains many useful sources. |
− | #* Important! ArXiv is not a peer- | + | #* Important! ArXiv is not a peer-reviewed source of information. Look for copies of papers that are published in peer-reviewed scientific journals. If after one or two years after its ArXiv version, the paper did not appear in a peer-reviewed journal, be careful to use it: this paper might be non-verified since it was rejected by the other journals. |
# Write Introduction. The expected size is one page. The expected plan is: | # Write Introduction. The expected size is one page. The expected plan is: | ||
## the research goal (and its motivations), | ## the research goal (and its motivations), | ||
Line 46: | Line 200: | ||
## goal of the experiment, set up, data sets, workflow. | ## goal of the experiment, set up, data sets, workflow. | ||
− | '''The goal of this week''' is comprehend the goal at its whole and write about it. | + | '''The goal of this week''' is to comprehend the goal at its whole and write about it. |
'''Resources''' | '''Resources''' | ||
Line 70: | Line 224: | ||
## those, who refer to their works. | ## those, who refer to their works. | ||
# Balance the list of the new and well-known works. | # Balance the list of the new and well-known works. | ||
− | # Keep up-to date the list of keywords to search with. | + | # Keep up-to-date the list of keywords to search with. |
− | # Continuously | + | # Continuously fill your LinkReview. |
− | # Plan Introduction (see the next todo list), namely collect | + | # Plan Introduction (see the next todo list), namely collect |
− | #* keywords as the basic termini; those who | + | #* keywords as the basic termini; those who bring good search results are useful, |
#* what the paper devoted to, | #* what the paper devoted to, | ||
#* the investigated problem, | #* the investigated problem, | ||
Line 101: | Line 255: | ||
== Todo 0: Prepare necessary tools == | == Todo 0: Prepare necessary tools == | ||
− | # '''Editing'''. Install LaTeX: [http://miktex.org MikTeX] for | + | # '''Editing'''. Install LaTeX: [http://miktex.org MikTeX] for Windows, [http://www.tug.org/texlive/ TeX Live] for Linux, and for Mac OS. Sign up [https://v2.overleaf.com/ V2 OverLeaf ShareLaTeX]. |
# Install the editor [http://www.texniccenter.org/ TeXnic Center] or its alternative [http://www.winedt.com/ WinEdt] for Windows, [http://www.tug.org/texworks/ TeXworks] for Linux, and [https://www.xm1math.net/texmaker/ TeXmaker]for Mac OS. | # Install the editor [http://www.texniccenter.org/ TeXnic Center] or its alternative [http://www.winedt.com/ WinEdt] for Windows, [http://www.tug.org/texworks/ TeXworks] for Linux, and [https://www.xm1math.net/texmaker/ TeXmaker]for Mac OS. | ||
#* Read [http://www.machinelearning.ru/wiki/index.php?title=LaTeX LaTeX on MachineLearning] (Ru). | #* Read [http://www.machinelearning.ru/wiki/index.php?title=LaTeX LaTeX on MachineLearning] (Ru). | ||
Line 115: | Line 269: | ||
# Install bibliographic collection software [http://jabref.sourceforge.net/ JabRef] (can be postponed). | # Install bibliographic collection software [http://jabref.sourceforge.net/ JabRef] (can be postponed). | ||
# '''Communications'''. Sign up [https://github.com/ GitHub]. | # '''Communications'''. Sign up [https://github.com/ GitHub]. | ||
− | #* Important: address and login like Name.Surname or Name-Surname (it depends on system conventions) is welcome. | + | #* Important: address and login like Name. Surname or Name-Surname (it depends on system conventions) is welcome. |
#* Introductory sliders [http://www.machinelearning.ru/wiki/images/2/29/MMP_Praktikum317_2013s_VCS.pdf on Version Control System]. | #* Introductory sliders [http://www.machinelearning.ru/wiki/images/2/29/MMP_Praktikum317_2013s_VCS.pdf on Version Control System]. | ||
#* Introduction to [https://guides.github.com/ GitHub]. | #* Introduction to [https://guides.github.com/ GitHub]. | ||
#* The first steps in [https://guides.github.com/activities/hello-world/ GitHub]. | #* The first steps in [https://guides.github.com/activities/hello-world/ GitHub]. | ||
− | # Download a shell: [https://desktop.github.com/ Desktop.GitHub], or use a command line to | + | # Download a shell: [https://desktop.github.com/ Desktop.GitHub], or use a command line to synchronize your project. |
− | # Sign up [http://www.machinelearning.ru/ MachineLearning.ru]. Send a logon to your coordinator of | + | # Sign up [http://www.machinelearning.ru/ MachineLearning.ru]. Send a logon to your coordinator of mlalgorithms [at] gmail [dot] com. |
− | # To state a problem (write essay) using notebook [https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Typesetting%20Equations.html see example] in MathJax. | + | # To state a problem (write an essay) using notebook [https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Typesetting%20Equations.html see example] in MathJax. |
#* Create your page [http://www.machinelearning.ru/wiki/index.php?title=%D0%A3%D1%87%D0%B0%D1%81%D1%82%D0%BD%D0%B8%D0%BA:Anastasiya example]. | #* Create your page [http://www.machinelearning.ru/wiki/index.php?title=%D0%A3%D1%87%D0%B0%D1%81%D1%82%D0%BD%D0%B8%D0%BA:Anastasiya example]. | ||
<!-- # Link your surname in the table on the group page to your personal page.--> | <!-- # Link your surname in the table on the group page to your personal page.--> | ||
Line 161: | Line 315: | ||
* Examples of plots: one many [https://github.com/Intelligent-Systems-Phystech/StartCode/blob/master/Hohlov2018Problem3/HW4.ipynb solutions] from this [https://github.com/Intelligent-Systems-Phystech/StartCode project]. | * Examples of plots: one many [https://github.com/Intelligent-Systems-Phystech/StartCode/blob/master/Hohlov2018Problem3/HW4.ipynb solutions] from this [https://github.com/Intelligent-Systems-Phystech/StartCode project]. | ||
* Examples of old problems [https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group174/Example2014Code/Popova2014Problem7/fig1.fig Problem 7], [https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group174/Example2014Code/Plavin2014Problem1/html/ Problem 1], [https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group174/Example2014Code/ShvetsProblem15/ Problem15]. | * Examples of old problems [https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group174/Example2014Code/Popova2014Problem7/fig1.fig Problem 7], [https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group174/Example2014Code/Plavin2014Problem1/html/ Problem 1], [https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group174/Example2014Code/ShvetsProblem15/ Problem15]. | ||
− | |||
− |
Latest revision as of 13:05, 17 February 2024
The to-do lists here correspond to the Course schedule. Each list must be completed before the day of review. The deadline is Wednesday, 06:00 am, for the 2020 Spring semester.
Contents
- 1 Todo T: Theoretical part
- 2 Todo C: Code of the computational experiment
- 3 Todo V: Visualize project
- 4 Todo Update: Put project straight
- 5 Todo X: Experiment planning
- 6 Todo B: Run basic code
- 7 Todo R: Preliminary report
- 8 Todo P: Problem statement
- 9 Todo A: Abstract
- 10 Todo B: Beginner's-talk
- 11 Todo I: Introduction
- 12 Todo L: Literature
- 13 Todo 1: Select your project
- 14 Todo 0: Prepare necessary tools
- 15 Todo -1: Subscribe to the course
Todo T: Theoretical part
The theoretical part describes the proposed solution and declares its properties. The goal is to join the theoretical elements into a method. This method includes hypotheses, models, criteria, and the optimization algorithm.
- Write the solution of your problem
- in a simple outline variant,
- expand necessary details,
- use algorithm LaTeX template.
- Compare notations in the problem statement, solution, and code. Make sure the code does not contradict the text.
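The course's own algorithm template is not reproduced on this page. As a rough sketch only, assuming the widely used algorithm and algpseudocode packages, an algorithm written in LaTeX could look as follows. Gradient descent and the symbols $S$, $\mathbf{w}$, $\gamma$, and $\varepsilon$ are illustrative choices, not part of the course material.

```latex
\documentclass{article}
\usepackage{algorithm}      % floating environment with a caption
\usepackage{algpseudocode}  % \State, \While, \Require, ...

\begin{document}

\begin{algorithm}
\caption{Gradient descent for a criterion $S(\mathbf{w})$ (illustrative)}
\begin{algorithmic}[1]
\Require initial parameters $\mathbf{w}^0$, step size $\gamma > 0$, tolerance $\varepsilon > 0$
\Ensure parameters $\mathbf{w}$
\State $\mathbf{w} \gets \mathbf{w}^0$
\While{$\|\nabla S(\mathbf{w})\| > \varepsilon$}
    \State $\mathbf{w} \gets \mathbf{w} - \gamma \nabla S(\mathbf{w})$
\EndWhile
\State \Return $\mathbf{w}$
\end{algorithmic}
\end{algorithm}

\end{document}
```

Whatever template is used, the symbols in the algorithm should match those in the problem statement and in the code.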
Resources
- Collection of plots, assorted [1], version to download slides, PDF
- Neuro-ZOO
- Commercial Project Planning, supplementary to the group game
Todo C: Code of the computational experiment
Organize your code so that the computational experiment can be rerun at any time and stores its results.
- Set a single main file that runs the experiment.
- Decompose the project code, and write functions and modules.
- Gather the experiment parameters in a special-purpose section.
- A text description of the experiment flow helps.
- Set up version checkpoints so that you can return to a previous experiment.
- A commit schedule helps.
- Write named plots to a designated folder.
- Write your results to a .tex-file and compile.
- If your experiment run takes a long time, just cut the data set.
- Do not use big or sophisticated data. Put your effort into illustrating your main message.
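The points above can be sketched in a single main file. This is only a minimal illustration using the Python standard library. The parameter names, the toy model y = 2x, and the output file names are assumptions made for the example, not course requirements.

```python
import json
import random
from pathlib import Path

# Experiment parameters gathered in one place (all names are illustrative).
PARAMS = {
    "seed": 0,
    "n_objects": 200,        # a deliberately small data set
    "noise_std": 0.1,
    "results_dir": "results",
}


def make_data(n_objects, noise_std, rng):
    """Generate a toy sample y = 2x + noise with x uniform on [0, 1]."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n_objects)]
    ys = [2.0 * x + rng.gauss(0.0, noise_std) for x in xs]
    return xs, ys


def mean_squared_error(xs, ys, slope):
    """Error of the fixed model y = slope * x on the sample."""
    return sum((y - slope * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


def run_experiment(params):
    """Run the whole experiment and store the results in the designated folder."""
    rng = random.Random(params["seed"])  # a fixed seed makes the run repeatable
    xs, ys = make_data(params["n_objects"], params["noise_std"], rng)
    results = {"mse": mean_squared_error(xs, ys, slope=2.0)}

    out_dir = Path(params["results_dir"])
    out_dir.mkdir(parents=True, exist_ok=True)
    # Store the parameters next to the results so the run can be reproduced.
    record = {"params": params, "results": results}
    (out_dir / "results.json").write_text(json.dumps(record, indent=2))
    # Write a LaTeX fragment that the paper can include with \input.
    (out_dir / "results.tex").write_text("MSE = %.4f\n" % results["mse"])
    return results


if __name__ == "__main__":
    print(run_experiment(PARAMS))
```

Committing this file together with the stored parameters gives a version point to return to. Plots can be saved to the same folder under descriptive names.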
Todo V: Visualize project
Set the list of plots that will be included in your paper and presentation.
- Make a plot of the source data.
- Goal: put the notation on the plot.
- List plots to illustrate the error analysis.
- Make a plot to show the main message.
Todo Update: Put project straight
- Check the proper folder structure (for example, make sure that your paper is not in the code folder):
- docs,
- code,
- data,
- [figs].
- Put the direct link to the paper in the table, so that everyone can access it.
- Rename article.tex to Surname2020Title.tex
- Check that both the .tex and .pdf files can be downloaded.
- Update your personal page on Machinelearning.ru.
Todo X: Experiment planning
Plan your computational experiment.
- Discuss the experiment goal with your adviser and team.
- Put this goal in the section Computational experiment
- Describe your basic data set, either a synthetic or a simple real one:
- put in the text the title, source, and setup of measurements (this is the technical description; the theoretical one is in the problem statement section),
- write down the number of objects and features, and describe general statistics,
- for a synthetic data set, describe the generation model and its parameters (for example, uniform random independent sampling on some given interval).
- Describe the configuration of the algorithm run.
- Plan the whole experimental part.
- List expected tables and figures:
- make a short and a long list; for each item,
- describe axes,
- make a draft with a pencil.
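For a synthetic data set, the generation model and the general statistics can be produced by the same script that creates the data. The sketch below assumes uniform independent sampling on a given interval, as in the example above. The function names and the chosen statistics are illustrative.

```python
import random
import statistics


def generate_uniform_sample(n_objects, n_features, low, high, seed=0):
    """Draw each feature independently and uniformly from [low, high]."""
    rng = random.Random(seed)
    return [
        [rng.uniform(low, high) for _ in range(n_features)]
        for _ in range(n_objects)
    ]


def describe(sample):
    """Collect the numbers that go into the data set description."""
    columns = list(zip(*sample))  # one tuple of values per feature
    return {
        "n_objects": len(sample),
        "n_features": len(columns),
        "mean": [statistics.mean(c) for c in columns],
        "std": [statistics.stdev(c) for c in columns],
        "min": [min(c) for c in columns],
        "max": [max(c) for c in columns],
    }


if __name__ == "__main__":
    sample = generate_uniform_sample(n_objects=100, n_features=3, low=-1.0, high=1.0)
    for key, value in describe(sample).items():
        print(key, value)
```

Reporting the seed, the interval, and the sample size in the text lets a reader regenerate the same data.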
Resources
- The goals of computational experiments А. Грабовой, В. Алексеев, А. Рогозина, И. Игашов, Н. Уваров
- Example of the measurement description: Bishop C.M. Pattern recognition and machine learning, 2006. Pp. 677-683.
Todo B: Run basic code
Select the basic algorithm and run it using a simple data set.
- Run your basic algorithm:
- select the simplest algorithm (with your adviser) to (partially) solve the problem you set.
- Collect a synthetic data set or download a simple real-world data set of small size.
- Upload your data to the repository (in case the data size exceeds 5 MB or the data set consists of numerous files, please discuss with your adviser and team).
- Run the basic algorithm on the synthetic data set and estimate the error.
- Describe the basic algorithm, analyze its features, and list competing models.
Resources
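As an illustration of these steps, the sketch below takes a straight-line fit as the basic algorithm, runs it on a synthetic sample, and estimates the error on a hold-out part. The choice of algorithm, the generating line y = 3x + 1, and the 70/30 split are assumptions made for the example. The actual basic algorithm is chosen with the adviser.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept in closed form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(xs, ys, slope, intercept):
    """Average squared residual of the fitted line on a sample."""
    residuals = [(y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)]
    return sum(residuals) / len(residuals)


def run_baseline(n_objects=100, noise_std=0.1, seed=0):
    """Generate synthetic data, fit on the first 70%, test on the rest."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 1.0) for _ in range(n_objects)]
    ys = [3.0 * x + 1.0 + rng.gauss(0.0, noise_std) for x in xs]
    n_train = int(0.7 * n_objects)
    slope, intercept = fit_line(xs[:n_train], ys[:n_train])
    test_error = mean_squared_error(xs[n_train:], ys[n_train:], slope, intercept)
    return slope, intercept, test_error


if __name__ == "__main__":
    print(run_baseline())
```

Because the generating parameters are known, the estimated slope and intercept can be compared with them directly, which is the main reason to start with synthetic data.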
- Bakhteev O.Yu. Systems and tools for deep learning, article
- Motrenko A.P. Improving classification quality, article
- Isachenko R.V. Dimensionality reduction in the decoding problem, article
- Sample construction in forecasting problems, slides
- The IDEF standard for project planning
Todo R: Preliminary report
- Make sure that the obtained results are in no contradiction with the goals of the computational experiment.
- Illustrate the obtained results with a preliminary plot (see the format).
- Write a mini-report on the results with
- a short description of the figure: what the reader can see and what follows from it,
- the results in numbers and comments on them.
- Put the report in the Computational experiment section.
Todo P: Problem statement
Following the paradigm Idea \(\to\) Formula \(\to\) Code, state the problem of finding an optimal solution.
- Discuss the problem statement with your adviser.
- See the examples below and in the past projects.
- Discuss terminology and notation; see the [pdf] and [tex] files with notations and a useful style file.
- At the beginning of the Problem statement write a general problem description.
- Describe the elements of your problem statement:
- the sample set,
- its origin, or its algebraic structure,
- statistical hypotheses of data generation,
- [conditions of measurements],
- [restrictions of the sample set and its values],
- your model in the class of models,
- restrictions on the class of models,
- the error function (and its derivation), a loss function, or a quality criterion,
- cross-validation procedure,
- restrictions on the solutions,
- external (industrial) quality criteria,
- the optimization statement as \(\arg\min\).
- Define the main terms: what is called the model, the solution, and the algorithm.
Note that:
- The model is a parametric family of functions that maps the design space to the target space.
- The criterion (error function) is a function that is optimized to obtain an optimal solution (model parameters, a function).
- The algorithm transforms the solution space, usually iteratively.
- The method combines a model, a criterion, and an algorithm to produce a solution.
Check it:
- the regression model,
- the sum of squared errors,
- the Newton-Raphson algorithm,
- the method of least squares.
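The four items above can be written out for linear regression as a short worked statement. The notation below (sample size $m$, design matrix $\mathbf{X}$, parameters $\mathbf{w}$) is an assumption made for this illustration. The course's recommended notation is given in the resources.

```latex
% Sample set: m objects with n features and a real-valued target.
\mathfrak{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{m}, \quad
\mathbf{x}_i \in \mathbb{R}^n, \quad y_i \in \mathbb{R}.

% Model: a parametric family of linear functions.
f(\mathbf{w}, \mathbf{x}) = \mathbf{w}^{\mathsf{T}} \mathbf{x}, \quad
\mathbf{w} \in \mathbb{R}^n.

% Criterion: the sum of squared errors.
S(\mathbf{w}) = \sum_{i=1}^{m} \bigl(y_i - f(\mathbf{w}, \mathbf{x}_i)\bigr)^2
             = \|\mathbf{y} - \mathbf{X}\mathbf{w}\|_2^2.

% Optimization statement.
\mathbf{w}^{*} = \arg\min_{\mathbf{w} \in \mathbb{R}^n} S(\mathbf{w}).

% Algorithm: the Newton-Raphson iteration with
% \nabla S(\mathbf{w}) = -2\mathbf{X}^{\mathsf{T}}(\mathbf{y} - \mathbf{X}\mathbf{w})
% and Hessian \mathbf{H} = 2\mathbf{X}^{\mathsf{T}}\mathbf{X}.
\mathbf{w}^{k+1} = \mathbf{w}^{k} - \mathbf{H}^{-1} \nabla S(\mathbf{w}^{k})
                 = (\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}}\mathbf{y}.
```

Since $S$ is quadratic for a linear model, a single Newton-Raphson step from any starting point reaches the least-squares solution, provided $\mathbf{X}^{\mathsf{T}}\mathbf{X}$ is invertible.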
Resources
- Slides with a plan of Problem statement
- Examples of problem statements
- Katrutsa A.M., Strijov V.V. Stress test procedure for feature selection algorithms // Chemometrics and Intelligent Laboratory Systems, 2015, 142 : 172-183 article
- Katrutsa A.M., Strijov V.V. Comprehensive study of feature selection methods to solve multicollinearity problem according to evaluation criteria // Expert Systems with Applications, 2017 article
- Motrenko A., Strijov V., Weber G.-W. Bayesian sample size estimation for logistic regression // Journal of Computational and Applied Mathematics, 2014, 255 : 743-752. article
- Kulunchakov A.S., Strijov V.V. Generation of simple structured Information Retrieval functions by genetic algorithm without stagnation // Expert Systems with Applications, 2017, 85 : 221-230. article
- Ivkin N.P. Feature generation for classification and forecasting problems, MIPT, 2013 draft
- Notations for wiki Ru
- Basic notations, pdf
- Recommended notations, 2019: pdf and .tex with .sty
- Simple and useful notations
- Notations for Bayesian model selection, pdf
Todo A: Abstract
- Write a draft of your abstract.
- The abstract shall not exceed 600 characters. It may contain:
- wide-range field of the investigated problem,
- narrow problem to focus on,
- features and conditions of the problem,
- [the novelty],
- application to illustrate with.
- For joint projects it is important that each team member writes their own text.
Resources
- How to Read a Paper, 2016, S. Keshav
- Examples of review-and-planning drafts LinkReview: one, two.
Todo B: Beginner's-talk
Short 45-second introductory talk. Plan of the talk:
- The project goal. What is the motivation, the goal to reach?
- The main idea. What is the message?
- The expected result. What is your delivery, your impact, novelty?
There is no time to show a slide or draw a plot on the blackboard. It is recommended to rehearse the report.
Todo I: Introduction
The introductory part includes research goals and motivations. It grounds the research in fundamental and state-of-the-art references. It delivers the main message of the work to the reader. This message shows the novelty of this work in comparison to recent results.
- Create a file ProjectN.bib for the group project, or Surname2018Title.bib for your personal project.
- Move useful bibliographic records from the file LinkReview into the BibTeX format.
- Check the correctness of the BibTeX database (styles of authors' names, volumes of journals, page numbers).
- Use bibliographic databases to facilitate your work.
- Use the default style \bibliographystyle{plain} before the bibliography section \bibliography{ProjectN}.
- Important! Wikipedia is not a citable source of information, but it points to many useful sources.
- Important! arXiv is not a peer-reviewed source of information. Look for copies of papers that are published in peer-reviewed scientific journals. If a paper has not appeared in a peer-reviewed journal one or two years after its arXiv version, use it with care: it may be unverified, for example because it was rejected by journals.
- Write Introduction. The expected size is one page. The expected plan is:
- the research goal (and its motivations),
- the object of research (introduce the main terms),
- the problem (what is the challenge),
- methodology: literature review and state-of-the-art,
- the project tasks,
- the proposed solution, its novelty and advantages,
- the pros and cons of recent works,
- the goal of the experiment, setup, data sets, workflow.
The goal of this week is to comprehend the goal as a whole and write about it.
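The BibTeX correctness check from the list above can be partly automated. The following is a minimal sketch, not a full BibTeX parser: the function name `check_bibtex`, the table of required fields, and the sample entries are all assumptions made for illustration, and the regular expressions only handle entries with one field per line.

```python
import re

# Fields treated as required per entry type. This table is an assumption
# for illustration; adapt it to the requirements of your target journal.
REQUIRED = {
    "article": ("author", "title", "journal", "year", "volume", "pages"),
    "book": ("author", "title", "publisher", "year"),
    "inproceedings": ("author", "title", "booktitle", "year", "pages"),
}

ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
FIELD_RE = re.compile(r"^\s*(\w+)\s*=", re.MULTILINE)


def check_bibtex(text):
    """Return a list of (citation_key, missing_fields) for incomplete entries."""
    problems = []
    starts = list(ENTRY_RE.finditer(text))
    for i, match in enumerate(starts):
        entry_type = match.group(1).lower()
        key = match.group(2)
        # The body of an entry runs until the next entry header (or end of text).
        end = starts[i + 1].start() if i + 1 < len(starts) else len(text)
        body = text[match.end():end]
        present = {name.lower() for name in FIELD_RE.findall(body)}
        missing = [f for f in REQUIRED.get(entry_type, ()) if f not in present]
        if missing:
            problems.append((key, missing))
    return problems


# Hypothetical records, made up for this example only.
sample = """
@article{Surname2018Title,
  author = {Surname, Name},
  title  = {A Placeholder Title},
  year   = {2018}
}
@book{Author2019Example,
  author    = {Author, Example},
  title     = {A Placeholder Book},
  publisher = {Placeholder Press},
  year      = {2019}
}
"""

problems = check_bibtex(sample)
for key, missing in problems:
    print(key, "is missing:", ", ".join(missing))
```

Such a script only detects absent fields. The style of authors' names and the plausibility of volumes and page numbers still have to be checked by eye.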
Resources
- Bibliographic databases
- The Collection of Computer Science Bibliographies
- List of academic databases and search engines in Wikipedia
- Refer to BibTeX in Wikipedia
- An introduction updated after a peer-review.
Todo L: Literature
We use the LinkReview draft format to share the fleeting ideas and impressions that arise while reading the literature.
- Collect the list of references including:
- state-of-the-art reviews, tutorials,
- fundamental solutions to the problem,
- the basic algorithm to solve your problem,
- alternative algorithms,
- [changes in the research directions],
- data sets and experiments,
- the papers that use these data sets,
- applications of the results,
- names of researchers who work on this problem,
- their students and teams,
- those who cite their works.
- Balance the list between new and well-known works.
- Keep the list of search keywords up to date.
- Continuously fill your LinkReview.
- Plan the Introduction (see the next todo list), namely collect
- keywords as the basic terms; those that bring good search results are useful,
- what the paper is devoted to,
- the investigated problem,
- the central idea,
- literature review,
- the authors' contribution.
Todo 1: Select your project
To select your project:
- Look through the list of projects.
- Find information about the experts and consultants.
- Select your projects in the questionnaire before Wednesday 22:00.
- Wait for confirmation.
- Put the confirmed topics into the Group table on Machine learning.
Todo 0: Prepare necessary tools
- Editing. Install LaTeX: MiKTeX for Windows, TeX Live for Linux, and for Mac OS. Sign up for Overleaf V2 (ShareLaTeX).
- Install the editor TeXnicCenter or its alternative WinEdt for Windows, TeXworks for Linux, and Texmaker for Mac OS.
- Read LaTeX on MachineLearning (Ru).
- Useful: Wikibooks LaTeX; K. V. Vorontsov, LaTeX2e in Examples (in Russian).
- Read S. M. Lvovsky, Typesetting and Layout in LaTeX (in Russian).
- Download the paper template (ZIP) and compile it.
- Read about BibTeX.
- Install bibliographic collection software JabRef (can be postponed).
- Communications. Sign up for GitHub.
- Important: an address and login of the form Name.Surname or Name-Surname (depending on system conventions) are welcome.
- Introductory slides on version control systems.
- Introduction to GitHub.
- The first steps in GitHub.
- Download a shell: Desktop.GitHub, or use a command line to synchronize your project.
- Sign up for MachineLearning.ru. Send your login to your coordinator at mlalgorithms [at] gmail [dot] com.
- To state a problem (write an essay) in a notebook, see the example in MathJax.
- Create your page (see the example).
- Install Hangouts and Skype (read the instructions).
- Programming. Install Python Anaconda and PyCharm (alternative: Visual Studio); an online notebook is available in Google.Colab.
- Development for ML: PyTorch
- Style formatting: Codestyle pep8
- Additional. As an alternative, install and try Matlab (MIPT provides a free version; alternative: Octave), R-project, and Wolfram Mathematica.
- Additional. Read with pleasure S. S. Kutateladze, Tips for the Occasional Translator (in Russian), and A. B. Sossinsky, How to Write a Mathematical Paper in English (in Russian).
Resources
- Announcements: Telegram m1p_news
- Ask questions by email: mlalgorithms [at] gmail [dot] com
- Slides.
- Short course description.
References to catch up
- A Brief Introduction to Machine Learning for Engineers by Osvaldo Simeone, 2017-2018
- Understanding Machine Learning: From Theory to Algorithms by Shai Shalev-Shwartz, Shai Ben-David, 2014
- Mathematics for Machine Learning by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong
- Mathematics for Physicists: Introductory Concepts and Methods by Alexander Altland & Jan von Delft
- Python notes for professionals by GoalKicker.com Free Programming Books.
- Lagutin M.B. Visual Mathematical Statistics (in Russian), Moscow: Binom, 2009. See also the excerpt.
- Bishop C.M. Pattern recognition and machine learning, Berlin: Springer, 2008.
- MacKay D. Information Theory, Pattern Recognition and Neural Networks, Inference.org.uk, 2009.
Todo -1: Subscribe to the course
To do before 06:00 on Wednesday, February 12th:
- pick a problem from the page Try-on programming problems (take the oldest problems; they are simpler),
- plot one figure to illustrate the problem (plot data or analysis),
- write explanatory comments to the figure (what the reader sees in the figure, what conclusions follow),
- an example of the figure formatting is here
- upload your notebook to your GitHub repository,
- send the link to this notebook to mlalgorithms [at] gmail [dot] com, with the subject "Application m1p"
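The figure requested in the list above can be sketched as follows. This is a minimal example under stated assumptions: the data are synthetic and made up for illustration, the axis labels and the file name `figure_example.png` are placeholders, and the formatting does not reproduce the course's own example. It assumes matplotlib is installed, as it is in Anaconda and Google.Colab.

```python
import math

import matplotlib
matplotlib.use("Agg")  # render to a file; not needed inside a notebook
import matplotlib.pyplot as plt

# Synthetic data for illustration only; replace with the data of your problem.
xs = [0.1 * i for i in range(101)]
signal = [math.sin(x) for x in xs]
trend = [0.1 * x for x in xs]

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(xs, signal, label="synthetic signal")
ax.plot(xs, trend, linestyle="--", label="synthetic trend")

# Every figure should carry axis labels with units, a legend, and a title.
ax.set_xlabel("Time, s")
ax.set_ylabel("Amplitude")
ax.set_title("Placeholder data")
ax.legend()
ax.grid(True)

fig.tight_layout()
fig.savefig("figure_example.png", dpi=150)
```

In the notebook, follow the figure with a short text cell that says what the reader sees in it and what conclusions follow.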