Difference between revisions of "Todo list"

From Research management course
{{#seo:
|title=Course My first scientific article: To-do list
|titlemode=replace
|keywords=My first scientific article
|description=Course My first scientific article: The to-do lists here correspond to the Course Schedule. Each list must be completed before the day of review.
}}

The to-do lists here correspond to the [[Course schedule]]. Each list must be completed before the day of review, which is Wednesday, 06:00 am, in the 2020 Spring semester.

<!-- == Todo E: Error analysis ==
([http://www.machinelearning.ru/wiki/index.php?title=M1#.D0.94.D0.BE.D0.BC.D0.B0.D1.88.D0.BD.D0.B5.D0.B5_.D0.B7.D0.B0.D0.B4.D0.B0.D0.BD.D0.B8.D0.B5-E:_.D0.B0.D0.BD.D0.B0.D0.BB.D0.B8.D0.B7_.D0.BE.D1.88.D0.B8.D0.B1.D0.BA.D0.B8 Rus])
-->

== Todo T: Theoretical part ==
The theoretical part describes the proposed solution and declares its properties.
The goal is to join the theoretical elements into a '''method'''. This method includes hypotheses, models, criteria, and the optimization algorithm.
# Write down the solution to your problem:
#* first as a simple outline,
#* then expand the necessary details,
#* use the LaTeX algorithm template.
# Compare the notation in the problem statement, the solution, and the code. Make sure the code does not contradict the text.

'''Resources'''
* A collection of plots: [https://sourceforge.net/p/mvr/code/HEAD/tree/lectures/MachineLearningResearch/ComputationalExperiment/fig_compilation_slides.pdf?format=raw assorted slides], and a [http://www.machinelearning.ru/wiki/images/2/25/Fig_compilation_slides_stable.pdf stable version to download (PDF)]
* [http://www.machinelearning.ru/wiki/images/2/24/Zharikov2017Presentation.pdf Neuro-ZOO]
* [http://www.machinelearning.ru/wiki/images/d/d0/Strijov2020CommercialProjectPlanning.pdf Commercial Project Planning, supplementary to the group game]

== Todo C: Code of the computational experiment ==
Organize your code so that the computational experiment can be rerun at any time and its results are stored.
# Set a single main file that runs the experiment.
# Decompose the project code into functions and modules.
# Gather the experiment parameters in a dedicated section.
#* A text description of the experiment flow helps.
# Set up version checkpoints so that you can return to a previous experiment.
#* A commit schedule helps.
# Write named plots to a designated folder.
#* Write your results to a .tex file and compile it.
* '''If your experiment takes a long time to run, just cut the data set.'''
** ''Do not use big or sophisticated data. Put your efforts into illustrating your main message.''

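A minimal sketch of this organization, assuming a Python project. The file name ''main.py'', the ''Config'' fields, the placeholder ''load_data'' and ''analyse'' steps, and the output file names are all hypothetical, not part of any course template. The parameters sit in one dataclass, the run is seeded so it can be repeated, and the results go to a designated folder as a named .tex fragment that the paper can include with ''\input''.

```python
"""main.py: the single entry point of the computational experiment (hypothetical layout)."""
import json
import random
import statistics
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Config:
    """All experiment parameters are gathered in this one section."""
    seed: int = 0            # fixes the random state, so a rerun gives the same numbers
    n_objects: int = 200     # sample size; cut it if a run takes too long
    low: float = 0.0         # sampling interval of the placeholder data
    high: float = 1.0
    out_dir: str = "figs"    # designated folder for named outputs


def load_data(cfg: Config) -> list[float]:
    """Placeholder data source: an i.i.d. uniform sample on [low, high]."""
    rng = random.Random(cfg.seed)
    return [rng.uniform(cfg.low, cfg.high) for _ in range(cfg.n_objects)]


def analyse(sample: list[float]) -> dict[str, float]:
    """Placeholder analysis step; replace it with the project's own functions."""
    return {"mean": statistics.fmean(sample), "std": statistics.stdev(sample)}


def write_results(cfg: Config, results: dict[str, float]) -> Path:
    """Store the configuration and a named .tex table in the output folder."""
    out = Path(cfg.out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Keeping the parameters next to the results makes every run reproducible.
    (out / "config.json").write_text(json.dumps(asdict(cfg), indent=2))
    rows = "".join(f"{name} & {value:.4f} \\\\\n" for name, value in results.items())
    tex_path = out / "results_table.tex"
    tex_path.write_text("\\begin{tabular}{lr}\n" + rows + "\\end{tabular}\n")
    return tex_path


def main(cfg: Config) -> dict[str, float]:
    sample = load_data(cfg)
    results = analyse(sample)
    write_results(cfg, results)
    return results


if __name__ == "__main__":
    print(main(Config()))
```

Running ''python main.py'' twice with the same ''Config'' gives identical numbers, and the commit that changed ''Config'' marks the version point to return to.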
== Todo V: Visualize project ==
Set the list of plots to be included in your paper and presentation.
# Make a plot of the source data.
#* '''Goal:''' put the notation on the plot.
# List the plots that illustrate the error analysis.
# Make a plot that shows the main message.

== Todo Update: Put project straight ==
# Check that the folder structure is correct (for example, make sure that your paper is not in the code folder):
#* docs,
#* code,
#* data,
#* [figs].
# Put the direct link to the paper [http://bit.ly/m1p_2020 in the table], so that everyone can access it.
# Rename article.tex to Surname2020Title.tex.
# Check that both the .tex and the .pdf files are uploaded.
<!-- # Fill in the readme.md file in the GitHub project (together with the necessary links)-->
# Update [http://www.machinelearning.ru/wiki/index.php?title=%D0%A3%D1%87%D0%B0%D1%81%D1%82%D0%BD%D0%B8%D0%BA:Vmarkin your personal page] on [http://bit.ly/m1p_2020 Machinelearning.ru].

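Some of these checks can be scripted and run before each review. A sketch, assuming the layout listed above; the function ''check_project'' and the regular expression for ''Surname2020Title.tex'' (a capitalized surname, a year, a CamelCase title) are illustrative assumptions, not a course requirement.

```python
import re
from pathlib import Path

REQUIRED_DIRS = ("docs", "code", "data")  # "figs" is optional, as in the list above
# Assumed naming pattern: capitalized surname, year, CamelCase title, e.g. Ivanov2020FeatureSelection.tex
PAPER_NAME = re.compile(r"^[A-Z][a-z]+20\d\d[A-Z][A-Za-z0-9]*\.tex$")


def check_project(root: str) -> list[str]:
    """Return the problems found in the project layout; an empty list means it looks fine."""
    base = Path(root)
    code_dir, docs_dir = base / "code", base / "docs"
    problems = [f"missing folder: {name}" for name in REQUIRED_DIRS if not (base / name).is_dir()]
    # The paper source must not live in the code folder.
    if code_dir.is_dir():
        problems += [f"paper source in the code folder: {p.name}" for p in sorted(code_dir.glob("*.tex"))]
    if (docs_dir / "article.tex").exists():
        problems.append("rename docs/article.tex to Surname2020Title.tex")
    papers = sorted(p for p in docs_dir.glob("*.tex") if PAPER_NAME.match(p.name)) if docs_dir.is_dir() else []
    if not papers:
        problems.append("no docs/Surname2020Title.tex found")
    # Both the source and the compiled paper should be in the repository.
    for tex in papers:
        if not tex.with_suffix(".pdf").exists():
            problems.append(f"compiled PDF missing for {tex.name}")
    return problems


if __name__ == "__main__":
    for problem in check_project(".") or ["project layout looks fine"]:
        print(problem)
```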
== Todo X: Experiment planning ==
Plan your computational experiment.
# Discuss the experiment goal with your adviser and team.
#* Put this goal in the Computational experiment section.
# Describe your basic data set, either a synthetic one or a simple real one:
#* give its title, source, and the setup of the measurements (this is the technical description; the theoretical one belongs to the Problem statement section),
#* write down the number of objects and features, and describe the general statistics,
#* for a synthetic data set, describe the generation model and its parameters (for example, independent uniform random sampling on a given interval).
# Describe the configuration of the algorithm run.
# Plan the whole experimental part.
# List the expected tables and figures:
#* make a short and a long list; for each item,
#* describe the axes,
#* make a pencil draft.

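For a synthetic data set, the generation model, its parameters, and the general statistics can come from the same script, so that the text and the data never disagree. A sketch, assuming independent features sampled uniformly on a given interval; the function names and parameter values are illustrative.

```python
import random
import statistics


def generate_uniform(n_objects: int, n_features: int, low: float, high: float, seed: int = 0):
    """Independent uniform sampling: every feature of every object ~ U[low, high]."""
    rng = random.Random(seed)
    return [[rng.uniform(low, high) for _ in range(n_features)] for _ in range(n_objects)]


def describe(X):
    """General statistics for the text: sizes and per-feature mean, std, min, max."""
    columns = list(zip(*X))
    return {
        "n_objects": len(X),
        "n_features": len(columns),
        "features": [
            {"mean": statistics.fmean(c), "std": statistics.stdev(c), "min": min(c), "max": max(c)}
            for c in columns
        ],
    }


X = generate_uniform(n_objects=500, n_features=3, low=-1.0, high=1.0, seed=42)
summary = describe(X)
print(summary["n_objects"], summary["n_features"])  # 500 3
for j, f in enumerate(summary["features"]):
    print(f"x{j}: mean={f['mean']:.3f} std={f['std']:.3f} range=[{f['min']:.3f}, {f['max']:.3f}]")
```

For U[-1, 1] the expected mean is 0 and the expected standard deviation is <math>1/\sqrt{3}\approx 0.577</math>, which gives a quick sanity check of the printed statistics.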
'''Resources'''
* Examples of computational experiment goals: [http://svn.code.sf.net/p/mlalgorithms/code/Group574/Grabovoy2018OptimalBrainDamage/doc/slides/Grabovoy2018OptimalBrainDamage.pdf A. Grabovoy], [http://svn.code.sf.net/p/mlalgorithms/code/Group474/Alekseev2017IntraTextCoherence/doc/Alekseev2017Presentation.pdf V. Alekseev], [http://svn.code.sf.net/p/mlalgorithms/code/Group574/Rogozina2018StructurePredictionRNA/doc/slides/Rogozina2018RNAPredictionsSlides.pdf A. Rogozina], [https://github.com/Intelligent-Systems-Phystech/Group594/raw/master/Igashov2018ProteinLigandComplexes/presentation/presentation.pdf I. Igashov], [http://svn.code.sf.net/p/mlalgorithms/code/Group474/Uvarov2017DynamicGraphicalModels/slides/Uvarov2017DynamicGraphicalModels.pdf N. Uvarov]
* An example of a measurement description: [http://www.machinelearning.ru/wiki/images/3/35/Old_Faithful_dataset_description.pdf Bishop C.M. Pattern Recognition and Machine Learning, 2006. Pp. 677-683.]

== Todo B: Run basic code ==
Select the basic algorithm and run it on a simple data set.

# Run your basic algorithm:
#* select the simplest algorithm (together with your adviser) that (partially) solves the problem you have stated.
# Collect a synthetic data set or download a small, simple real-world data set.
# Upload your data to the repository (if the data size exceeds 5 MB or the data set consists of numerous files, please discuss this with your adviser and team).
# Run the basic algorithm on the synthetic data set and estimate the error.
# Describe the basic algorithm, analyze its features, and list competing models.<!--:
## Description: name the black box. Preferably, cite a source where the contents of the black box are described in detail. Specify the structural parameters of the black box.
## Describe the model as a mapping from the object description space to the target variable space. The algorithm that optimizes the model parameters may be referred to as a black box.
## Describe the model and the algorithm that optimizes its parameters in pseudocode.
-->

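As an illustration of steps 2 and 4, the sketch below takes ordinary least squares for a one-feature linear model as the "simplest algorithm" (an assumption; agree on the actual baseline with your adviser), fits it on part of a synthetic sample, and estimates the error on the held-out part. The generating parameters and the 150/50 split are made up for the example.

```python
import random
import statistics


def make_synthetic(n: int, w0: float, w1: float, noise: float, seed: int = 0):
    """y = w0 + w1 * x + eps, with x ~ U[0, 1] and eps ~ N(0, noise^2)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [w0 + w1 * x + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def fit_least_squares(xs, ys):
    """Closed-form least-squares estimates of the intercept and the slope."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope


def mse(xs, ys, w0, w1):
    """Mean squared error of the line w0 + w1 * x on the given objects."""
    return statistics.fmean((y - (w0 + w1 * x)) ** 2 for x, y in zip(xs, ys))


xs, ys = make_synthetic(n=200, w0=1.0, w1=2.0, noise=0.1, seed=1)
split = 150  # the first 150 objects for fitting, the remaining 50 for the error estimate
w0_hat, w1_hat = fit_least_squares(xs[:split], ys[:split])
print(f"w0={w0_hat:.3f}, w1={w1_hat:.3f}")
print(f"train MSE={mse(xs[:split], ys[:split], w0_hat, w1_hat):.4f}")
print(f"test  MSE={mse(xs[split:], ys[split:], w0_hat, w1_hat):.4f}")
```

With a noise level of 0.1, both errors should be close to <math>0.1^2 = 0.01</math>; a test error far above the train error would be the first thing to discuss in the error analysis.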
'''Resources'''
* Bakhteev O.Yu. Systems and tools for deep learning, [http://strijov.com/papers/Bakhteev2016AWS.pdf paper]
* Motrenko A.P. Improving classification quality, [http://strijov.com/papers/MolybogMotrenko2017DimRed.pdf paper]
* Isachenko R.V. Dimensionality reduction in the decoding problem, [https://github.com/Intelligent-Systems-Phystech/2017-Isachenko-PLS/raw/master/doc/Isachenko2017PLS.pdf paper]
* Constructing the sample set in forecasting problems, [http://svn.code.sf.net/p/mvr/code/lectures/DataFest/Strijov2016Tutorial.pdf slides]
<!-- * Problem statement for forecasting card defaults one year ahead, [[Media:Strijov2018ProbStCardScoring.pdf|slides]]-->
* [http://www.machinelearning.ru/wiki/images/4/49/Strijov2019IDEF0.pdf The IDEF standard for project planning]

== Todo R: Preliminary report ==
# Make sure that the obtained results do not contradict the goals of the computational experiment.
# Illustrate the obtained results with a preliminary plot ([http://www.machinelearning.ru/wiki/index.php?title=JMLDA/Fig see the format]).
# Write a mini-report on the results with:
## a short description of the figure: what the reader can see and what follows from it,
## the results in numbers, with comments on them.
# Put the report into the Computational experiment section.

== Todo P: Problem statement ==
Following the paradigm Idea<math>\to</math>Formula<math>\to</math>Code, state the problem of finding an optimal solution.
# Discuss the problem statement with your adviser.
# See the examples below and in past projects.
# Discuss the terminology and notation: see the [pdf] and [tex] files with the notation and a useful style file.
# At the beginning of the Problem statement, write a general description of the problem.
# Describe the elements of your problem statement:
## the sample set,
## its origin or its algebraic structure,
## statistical hypotheses of data generation,
## [conditions of measurements],
## [restrictions on the sample set and its values],
## your model within the class of models,
## restrictions on the class of models,
## the error function (and its derivation), a loss function, or a quality criterion,
## the cross-validation procedure,
## restrictions on the solutions,
## external (industrial) quality criteria,
## the optimization statement as <math>\arg\min</math>.
# Define the main terms: what is called the model, the solution, and the algorithm.

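As an illustration only, here is how the sample set, the model, the error function, and the optimization statement from the list above might look for linear regression with the sum of squared errors. The symbols (<math>\mathfrak{D}</math>, <math>m</math>, <math>n</math>, <math>\mathbf{X}</math> with rows <math>\mathbf{x}_i^{\mathsf{T}}</math>, <math>S</math>) are assumptions; follow the recommended notation from the resources.

```latex
\begin{aligned}
&\mathfrak{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{m}, \quad \mathbf{x}_i \in \mathbb{R}^{n},\; y_i \in \mathbb{R}, \\
&f(\mathbf{x}, \mathbf{w}) = \mathbf{w}^{\mathsf{T}}\mathbf{x}, \quad \mathbf{w} \in \mathbb{R}^{n}, \\
&S(\mathbf{w} \mid \mathfrak{D}) = \sum_{i=1}^{m} \bigl(y_i - f(\mathbf{x}_i, \mathbf{w})\bigr)^2 = \|\mathbf{y} - \mathbf{X}\mathbf{w}\|_2^2, \\
&\hat{\mathbf{w}} = \arg\min_{\mathbf{w} \in \mathbb{R}^{n}} S(\mathbf{w} \mid \mathfrak{D}).
\end{aligned}
```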
Note that:
* The '''model''' is a parametric family of functions that map the design space to the target space.
* The '''criterion''' (error function) is the function optimized to obtain an optimal solution (the model parameters or a function).
* The '''algorithm''' transforms the solution space, usually iteratively.
* The '''method''' combines a model, a criterion, and an algorithm to produce a solution.

Check these terms on an example:
* the regression ''model'',
* the sum of squared ''errors'',
* the Newton-Raphson ''algorithm'',
* the ''method'' of least squares.

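The four terms can be kept apart in code as well. A minimal sketch, assuming a one-feature linear regression with an intercept: the model is a function of the object and the parameters, the criterion is the sum of squared errors, the algorithm is a Newton-Raphson update of the parameters, and the method is their combination. Because the sum of squared errors is quadratic in the parameters, a single Newton-Raphson step from any starting point already lands on the least-squares solution. The function names and the data are illustrative.

```python
def model(x: float, w: tuple[float, float]) -> float:
    """Regression model: the parametric family f(x, w) = w0 + w1 * x."""
    return w[0] + w[1] * x


def sse(w, xs, ys) -> float:
    """Criterion: the sum of squared errors S(w)."""
    return sum((y - model(x, w)) ** 2 for x, y in zip(xs, ys))


def newton_raphson_step(w, xs, ys):
    """Algorithm: one Newton-Raphson update w <- w - H^{-1} g for the SSE criterion."""
    residuals = [y - model(x, w) for x, y in zip(xs, ys)]
    # Gradient g and Hessian H of S(w) with respect to (w0, w1).
    g0 = -2.0 * sum(residuals)
    g1 = -2.0 * sum(r * x for r, x in zip(residuals, xs))
    h00, h01, h11 = 2.0 * len(xs), 2.0 * sum(xs), 2.0 * sum(x * x for x in xs)
    det = h00 * h11 - h01 * h01
    # Solve the 2x2 system H d = g explicitly, then step against d.
    d0 = (h11 * g0 - h01 * g1) / det
    d1 = (h00 * g1 - h01 * g0) / det
    return (w[0] - d0, w[1] - d1)


def least_squares_method(xs, ys, w_init=(0.0, 0.0), n_iter=5):
    """Method: the model, the criterion, and the algorithm together produce a solution."""
    w = w_init
    for _ in range(n_iter):
        w = newton_raphson_step(w, xs, ys)
    return w, sse(w, xs, ys)


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # exactly y = 1 + 2x
w_hat, s = least_squares_method(xs, ys)
print(w_hat, s)  # (1.0, 2.0) 0.0
```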
'''Resources'''
* Slides [http://www.machinelearning.ru/wiki/images/b/b9/Strijov2020ProblStatement.pdf with a plan of the Problem statement]
* Examples of problem statements
*# Katrutsa A.M., Strijov V.V. Stresstest procedure for feature selection algorithms // Chemometrics and Intelligent Laboratory Systems, 2015, 142 : 172-183. [http://strijov.com/papers/Katrutsa2014TestGenerationEn.pdf article]
*# Katrutsa A.M., Strijov V.V. Comprehensive study of feature selection methods to solve multicollinearity problem according to evaluation criteria // Expert Systems with Applications, 2017. [http://strijov.com/papers/Katrutsa2016QPFeatureSelection.pdf article]
*# Motrenko A., Strijov V., Weber G.-W. Bayesian sample size estimation for logistic regression // Journal of Computational and Applied Mathematics, 2014, 255 : 743-752. [http://strijov.com/papers/MotrenkoStrijovWeber2012SampleSize.pdf article]
*# Kulunchakov A.S., Strijov V.V. Generation of simple structured Information Retrieval functions by genetic algorithm without stagnation // Expert Systems with Applications, 2017, 85 : 221-230. [http://strijov.com/papers/Kulunchakov2014RankingBySimpleFun.pdf article]
*# Ivkin N.P. Feature generation for classification and forecasting problems, MIPT, 2013. [https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group074/Kuznetsov2013SSAForecasting/doc/Ivkin2013ProblemStatement.pdf?format=raw draft]
* Notation for the wiki (in Russian): [http://www.machinelearning.ru/wiki/index.php?title=%D0%A7%D0%B8%D1%81%D0%BB%D0%B5%D0%BD%D0%BD%D1%8B%D0%B5_%D0%BC%D0%B5%D1%82%D0%BE%D0%B4%D1%8B_%D0%BE%D0%B1%D1%83%D1%87%D0%B5%D0%BD%D0%B8%D1%8F_%D0%BF%D0%BE_%D0%BF%D1%80%D0%B5%D1%86%D0%B5%D0%B4%D0%B5%D0%BD%D1%82%D0%B0%D0%BC_%28%D0%BF%D1%80%D0%B0%D0%BA%D1%82%D0%B8%D0%BA%D0%B0%2C_%D0%92.%D0%92._%D0%A1%D1%82%D1%80%D0%B8%D0%B6%D0%BE%D0%B2%29/%D0%A0%D0%B5%D0%BA%D0%BE%D0%BC%D0%B5%D0%BD%D0%B4%D1%83%D0%B5%D0%BC%D1%8B%D0%B5_%D0%BE%D0%B1%D0%BE%D0%B7%D0%BD%D0%B0%D1%87%D0%B5%D0%BD%D0%B8%D1%8F Ru]
* Basic notation: [http://www.machinelearning.ru/wiki/images/c/c2/Strijov2013Notation.pdf pdf]
* Recommended notation, 2019: [http://www.machinelearning.ru/wiki/images/0/0f/M1_Notation.pdf pdf] and [http://www.machinelearning.ru/wiki/images/6/6d/M1_Notation_source.zip .tex with .sty]
* Simple and useful [http://www.machinelearning.ru/wiki/images/4/41/NiceNotations.pdf notation]
* Notation for Bayesian model selection: [http://www.machinelearning.ru/wiki/images/0/03/ABS_notations.pdf pdf]

== Todo A: Abstract ==

# Write a '''draft''' of your abstract.
* The abstract must not exceed 600 characters. It may contain:
** the broad field of the investigated problem,
** the narrow problem to focus on,
** features and conditions of the problem,
** [the novelty],
** an application to illustrate it.
* For joint projects, it is important that each team member writes their own text.

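The 600-character limit is easy to check mechanically. A minimal sketch, assuming the limit counts characters after collapsing runs of whitespace (line breaks, indents) into single spaces; whether LaTeX markup counts is a rule to agree on with your adviser. The draft sentence is made up.

```python
import re

ABSTRACT_LIMIT = 600  # characters, as required above


def abstract_length(text: str) -> int:
    """Length of the abstract after collapsing whitespace runs into one space."""
    return len(re.sub(r"\s+", " ", text).strip())


def check_abstract(text: str, limit: int = ABSTRACT_LIMIT) -> str:
    """Report whether the abstract fits the limit."""
    n = abstract_length(text)
    return f"OK: {n}/{limit} characters" if n <= limit else f"Too long: {n} > {limit} characters"


draft = """This paper investigates feature selection
    for linear regression under multicollinearity."""
print(check_abstract(draft))
```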
'''Resources'''
* [https://web.stanford.edu/class/ee384m/Handouts/HowtoReadPaper.pdf How to Read a Paper, 2016, S. Keshav]
* Examples of review-and-planning drafts (LinkReview): [https://docs.google.com/document/d/1fx7fVlmnwdTesElt-lbaHvoGEjJC5t_9e-X0ZpUzEcQ/edit?usp=sharing one], [https://docs.google.com/document/d/1XNhnwvooJwjj5UL6lkTio0bpvRKIj2NPVNCdHDRLOLc/edit?usp=sharing two].
<!-- * [https://github.com/Strijov/Strijov2018-1AutomationOfResearch/raw/master/MotivationExamples.pdf Examples of project goals and motivations].
* [https://github.com/Strijov/Strijov2018-1AutomationOfResearch/raw/master/AbstractExamples.pdf Examples of abstracts].
-->

== Todo B: Beginner's-talk ==
A short, 45-second introductory talk. Plan of the talk:
# The project goal: what is the motivation, what goal do you want to reach?
# The main idea: what is the message?
# The expected result: what is your deliverable, your impact, the novelty?
There is no time to show a slide or draw a plot on the blackboard. It is recommended to rehearse the talk.

== Todo I: Introduction ==
The introductory part states the research goals and motivations. It justifies the research with fundamental and state-of-the-art references and delivers the main message of the work to the reader. This message shows the novelty of the work in comparison with recent results.

# Create a file ''ProjectN.bib'' for the group project, or ''Surname2018Title.bib'' for your personal project.
# Move the useful bibliographic records from your ''LinkReview'' file into it in BibTeX format.
#* Check the correctness of the BibTeX database (styles of authors' names, journal volumes, page numbers).
#* Use [http://liinwww.ira.uka.de/bibliography/ bibliographic databases] to facilitate your work.
#* Use the default style ''\bibliographystyle{plain}'' before the bibliography section ''\bibliography{ProjectN}''.
#* Important! Wikipedia is not a source of information, but it contains many references to useful sources.
#* Important! arXiv is not a peer-reviewed source of information. Look for the versions of papers published in peer-reviewed scientific journals. If a paper has not appeared in a peer-reviewed journal within one or two years of its arXiv version, be careful when using it: it might be unverified, since it may have been rejected by journals.
# Write the Introduction. The expected size is one page. The expected plan is:
## the research goal (and its motivations),
## the object of research (introduce the main terms),
## the problem (what the challenge is),
## methodology: literature review and state of the art,
## the project tasks,
## the proposed solution, its novelty and advantages,
## the pros and cons of recent works,
## the goal of the experiment, setup, data sets, workflow.

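Part of the BibTeX check can be automated. The sketch below is a simplified, regex-based check, not a full BibTeX parser: it assumes one field per line, values in braces or quotes, and the closing brace of each entry on its own line. The list of required ''@article'' fields is an assumption modelled on the usual journal-article fields. It reports missing fields and page ranges written with a single hyphen instead of ''--''.

```python
import re

REQUIRED_ARTICLE_FIELDS = ("author", "title", "journal", "year", "volume", "pages")
ENTRY = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*?)\n\}", re.DOTALL)
FIELD = re.compile(r'(\w+)\s*=\s*[{"](.*?)[}"]\s*,?\s*$', re.MULTILINE)


def check_bib(text: str) -> list[str]:
    """Return warnings for the @article entries of a .bib string."""
    warnings = []
    for kind, key, body in ENTRY.findall(text):
        if kind.lower() != "article":
            continue
        fields = {name.lower(): value for name, value in FIELD.findall(body)}
        for name in REQUIRED_ARTICLE_FIELDS:
            if name not in fields:
                warnings.append(f"{key}: missing field '{name}'")
        # BibTeX styles expect an en dash, i.e. "--", in page ranges.
        pages = fields.get("pages", "")
        if re.fullmatch(r"\d+-\d+", pages):
            warnings.append(f"{key}: write page ranges as '{pages.replace('-', '--')}'")
    return warnings


example = """@article{Katrutsa2015Stress,
  author = {Katrutsa, A. M. and Strijov, V. V.},
  title = {Stresstest procedure for feature selection algorithms},
  journal = {Chemometrics and Intelligent Laboratory Systems},
  year = {2015},
  volume = {142},
  pages = {172-183}
}
"""
print(check_bib(example))  # ["Katrutsa2015Stress: write page ranges as '172--183'"]
```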
'''The goal of this week''' is to comprehend the goal as a whole and write about it.

'''Resources'''
* Bibliographic databases
** [http://liinwww.ira.uka.de/bibliography/ The Collection of Computer Science Bibliographies]
** [http://en.wikipedia.org/wiki/List_of_academic_databases_and_search_engines List of academic databases and search engines in Wikipedia]
** Refer to [http://ru.wikipedia.org/wiki/BibTeX BibTeX in Wikipedia]
** [http://svn.code.sf.net/p/mlalgorithms/code/Examples/ArticleReviews/%20r0_ESWA_review_report.pdf An introduction] updated after a peer review.

== Todo L: Literature ==
We use the [https://docs.google.com/document/d/1K7bIzU33MSfeUvg3WITRZX0pe3sibbtH62aw42wxsEI/edit?usp=sharing LinkReview] draft format to share the fleeting ideas and impressions we have while reading the literature.
# Collect the list of references, including:
## state-of-the-art reviews and tutorials,
## fundamental solutions to the problem,
## the basic algorithm to solve your problem,
## alternative algorithms,
## [changes in the research directions],
## data sets and experiments,
## the papers that use these data sets,
## applications of the results,
## names of the researchers who solve this problem,
## their students and teams,
## those who refer to their works.
# Balance the list between new and well-known works.
# Keep the list of search keywords up to date.
# Continuously fill in your LinkReview.
# Plan the Introduction (see the next todo list), namely, collect:
#* keywords as the basic terms (those that bring good search results are useful),
#* what the paper is devoted to,
#* the investigated problem,
#* the central idea,
#* literature review,
#* the authors' contribution.

<!--
#* Introduction (about one page); below is an approximate plan, paragraph by paragraph

'''Resources'''
* [https://github.com/Strijov/Strijov2018-1AutomationOfResearch/raw/master/MotivationExamples.pdf Examples of goal setting].
* [https://github.com/Strijov/Strijov2018-1AutomationOfResearch/raw/master/AbstractExamples.pdf Examples of abstracts].
* [[Media:CommResrarchProtocol.pdf|Guidelines for carrying out research projects in a commercial company]].
* [https://github.com/Strijov/Strijov2018-1AutomationOfResearch/raw/master/Strijov2018_2AutomationOfResearch.pdf A simple introduction to neural networks]
* [[Media:Zhuikov2015MSPresentation.pdf|A start-up project: an example report by Vladimir Zhuikov]].
* [https://cs10.pikabu.ru/post_img/big/2018/07/10/4/1531199339121474094.jpg Demotivator]
-->

== Todo 1: Select your project ==
To select your project:
# [http://bit.ly/m1p_2020 Look through the list of projects].
# Find information about the experts and consultants.
# Select your projects in [http://bit.ly/m1p_select the questionnaire] <strong>before Wednesday, 22:00</strong>.
# Wait for confirmation.
# Put the confirmed topics [http://bit.ly/m1p_2020 into the group table on MachineLearning.ru].

== Todo 0: Prepare necessary tools ==
# '''Editing'''. Install LaTeX: [http://miktex.org MikTeX] for Windows, [http://www.tug.org/texlive/ TeX Live] for Linux and Mac OS. Sign up for [https://v2.overleaf.com/ Overleaf (ShareLaTeX)].
# Install the editor [http://www.texniccenter.org/ TeXnicCenter] or its alternative [http://www.winedt.com/ WinEdt] for Windows, [http://www.tug.org/texworks/ TeXworks] for Linux, and [https://www.xm1math.net/texmaker/ Texmaker] for Mac OS.
#* Read [http://www.machinelearning.ru/wiki/index.php?title=LaTeX LaTeX on MachineLearning.ru] (in Russian).
# Read about [http://en.wikipedia.org/wiki/Bibtex BibTeX].
#* [http://liinwww.ira.uka.de/csbib?strijov%20nonlinear An example of a bibliographic database].
#* [http://liinwww.ira.uka.de/cgi-bin/bibshow?e=Njtd0ECMQ03121/fyqboefe%7d81352582&r=bibtex&mode=intra An example of a bibliographic record].
#* [https://docs.google.com/document/d/10JgJMieX13R5vlrCfJPrMqU9Rsd4c4JzaGytU8rWFE4/edit?usp=sharing An example of a LinkReview draft review].
#* [http://en.wikipedia.org/wiki/List_of_academic_databases_and_search_engines List of academic databases and search engines].
#* [https://en.wikipedia.org/wiki/List_of_datasets_for_machine_learning_research List of datasets for machine learning research].
# Install the bibliography manager [http://jabref.sourceforge.net/ JabRef] (can be postponed).
# '''Communications'''. Sign up for [https://github.com/ GitHub].
#* Important: a login like Name.Surname or Name-Surname (depending on the system conventions) is welcome.
#* Introductory slides [http://www.machinelearning.ru/wiki/images/2/29/MMP_Praktikum317_2013s_VCS.pdf on version control systems].
#* Introduction to [https://guides.github.com/ GitHub].
#* The first steps in [https://guides.github.com/activities/hello-world/ GitHub].
# Download a client, [https://desktop.github.com/ GitHub Desktop], or use the command line to synchronize your project.
# Sign up for [http://www.machinelearning.ru/ MachineLearning.ru]. Send your login to your coordinator or to mlalgorithms [at] gmail [dot] com.
# State a problem (write an essay) in a notebook with MathJax formulas ([https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Typesetting%20Equations.html see the example]).
#* Create your personal page ([http://www.machinelearning.ru/wiki/index.php?title=%D0%A3%D1%87%D0%B0%D1%81%D1%82%D0%BD%D0%B8%D0%BA:Anastasiya example]).
<!-- # Add a link to your personal page from your surname in the table on the group page.-->
# Install [https://hangouts.google.com/ Hangouts] and [http://www.machinelearning.ru/wiki/index.php?title=%D0%A1%D0%BA%D0%B0%D0%B9%D0%BF_%28Skype%29 Skype (read the instructions)].
# '''Programming'''. Install Python via [https://anaconda.org/anaconda/python Anaconda] and [https://www.jetbrains.com/pycharm/ PyCharm] (alternative: [https://code.visualstudio.com/ Visual Studio Code]), or use the online notebook [https://colab.research.google.com/notebooks/welcome.ipynb#recent=true Google Colab].
#* Development for ML: PyTorch.
#* Code style: PEP 8.
# '''Additional.''' As an alternative, install and try [http://www.machinelearning.ru/wiki/index.php?title=Matlab Matlab (MIPT provides a free version)] (alternative: [http://www.gnu.org/software/octave/ Octave]), [https://www.r-project.org/ R project], [https://www.wolframcloud.com/ Wolfram Mathematica].
#* Read [http://www.machinelearning.ru/wiki/images/archive/f/fc/20150209132356%21Voron-ML-Intro-slides.pdf Introduction to Matlab].
#* Read [http://www.machinelearning.ru/wiki/index.php?title=%D0%94%D0%BE%D0%BA%D1%83%D0%BC%D0%B5%D0%BD%D1%82%D0%B8%D1%80%D0%BE%D0%B2%D0%B0%D0%BD%D0%B8%D0%B5_%D1%84%D1%83%D0%BD%D0%BA%D1%86%D0%B8%D0%B9_Matlab Matlab code style, reporting and documenting].
#* Read [http://www.machinelearning.ru/wiki/images/1/18/MatlabStyle1p5.pdf Matlab Programming Style Guidelines].
  
 
'''Resources'''
* Announcements: Telegram [http://t.me/m1p_news m1p_news]
* Ask questions by email: mlalgorithms [at] gmail [dot] com
* [https://github.com/Strijov/Strijov2018-1AutomationOfResearch/raw/master/Strijov2018_1AutomationOfResearch.pdf Slides].
* [http://svn.code.sf.net/p/mvr/code/lectures/MLEducation/Strijov2014MLCourseShort.pdf?format=raw Short course description].

'''References to catch up'''
* [http://www.inference.org.uk/itprnn/book.pdf MacKay D.J.C. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.]
  
== Todo -1: Subscribe to the course ==
<strong>[[Todo list|To do before]] 06:00, Wednesday, February 12:</strong>
# Pick a problem from the page [http://bit.ly/1B4NKjZ Try-on programming problems] (take the oldest problems, they are simpler).
# Plot one figure to illustrate the problem (plot the data or the analysis).
# Write explanatory comments on the figure (what the reader sees in the figure, what conclusions follow).
#* An example of [http://www.machinelearning.ru/wiki/index.php?title=JMLDA/Fig figure formatting is here].
# Upload your notebook to your GitHub repository.
# Send the link to this notebook to mlalgorithms [at] gmail [dot] com with the subject "Application m1p".
* An example of a nice simple problem: [http://www.machinelearning.ru/wiki/index.php?title=%D0%9B%D0%B8%D0%BD%D0%B5%D0%B9%D0%BD%D0%B0%D1%8F_%D1%80%D0%B5%D0%B3%D1%80%D0%B5%D1%81%D1%81%D0%B8%D1%8F_%28%D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80%29 bread regression].
* Examples of plots: one of many [https://github.com/Intelligent-Systems-Phystech/StartCode/blob/master/Hohlov2018Problem3/HW4.ipynb solutions] from this [https://github.com/Intelligent-Systems-Phystech/StartCode project].
* Examples of old problems: [https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group174/Example2014Code/Popova2014Problem7/fig1.fig Problem 7], [https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group174/Example2014Code/Plavin2014Problem1/html/ Problem 1], [https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group174/Example2014Code/ShvetsProblem15/ Problem 15].

Latest revision as of 13:05, 17 February 2024

The to-do lists here correspond to the Course schedule. Each list must be completed before the day of review. It is Wednesday 06:00 am for the 2020 Spring semester.

Todo T: Theoretical part

The theoretical part describes the proposed solution and declares its properties. The goal is to join the theoretical elements into a method. This method includes hypotheses, models, criteria, and the optimization algorithm.

  1. Write the solution of your problem
    • in a simple outline variant,
    • expand necessary details,
    • use algorithm LaTeX template.
  2. Compare notations in the problem statement, solution, and code. Make sure the code does not contradict the text.

Resources

Todo C: Code of the computational experiment

Organize your code so that the computational experiment runs every time with results stored.

  1. Set the only main file to run the experiment.
  2. Decompose the project code, and write functions and modules.
  3. Gather the experiment parameters in a special-purpose section.
    • A text description of the experiment flow helps.
  4. Set a procedure of historical version points to return to the previous experiment.
    • Commit schedule helps.
  5. Write named plots to a designated folder.
    • Write your results to a .tex-file and compile.
  • If your experiment run takes a long time, just cut the data set.
    • Do not use big or sophisticated data. Put your efforts to illustrate your main message.

Todo V: Visualize project

Set the list of plots that will be included in your paper and presentation.

  1. Make a plot of the source data.
    • Goal: put notations to the plot.
  2. List plots to illustrate the error analysis.
  3. Make a plot to show the main message.

Todo Update: Put project straight

  1. Check the proper folder structure (example make sure that your paper is not in the Code folder):
    • docs,
    • code,
    • data,
    • [figs].
  2. Put the direct link to the paper in the table, so that everyone could access it.
  3. Rename article.tex to Surname2020Title.tex
  4. Check the both .tex and .pdf files are downloaded.
  5. Update your personal page on Machinelearning.ru.

Todo X: Experiment planning

Plan your computational experiment.

  1. Discuss the experiment goal with your adviser and team.
    • Put this goal in the section Computational experiment.
  2. Describe your basic data set, either a synthetic one or a simple real one:
    • give its title, source, and measurement setup (this is the technical description; the theoretical one belongs in the Problem statement section),
    • state the number of objects and features, and describe general statistics,
    • for a synthetic data set, describe the generation model and its parameters (for example, independent uniform random sampling on a given interval).
  3. Describe the configuration of the algorithm run.
  4. Plan the whole experimental part.
  5. List the expected tables and figures:
    • make a short list and a long list; for each item,
    • describe the axes,
    • make a draft with a pencil.
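For a synthetic data set, the generation model and the general statistics from item 2 can be produced by the same short script. Its output can then be copied into the text. The sketch below assumes a hypothetical model: independent uniform features on [low, high] and a target equal to the sum of the features plus Gaussian noise.

```python
"""Synthetic data set with independent uniform features (a hypothetical generation model)."""
import random
import statistics


def generate(m, n, low=0.0, high=1.0, noise=0.1, seed=0):
    """Return (X, y): m objects, n features, x_ij ~ U[low, high] i.i.d.,
    y_i = sum_j x_ij + eps_i with eps_i ~ N(0, noise^2)."""
    rng = random.Random(seed)
    X = [[rng.uniform(low, high) for _ in range(n)] for _ in range(m)]
    y = [sum(x) + rng.gauss(0.0, noise) for x in X]
    return X, y


def describe(X, y):
    """General statistics to report in the Computational experiment section."""
    columns = list(zip(*X))
    return {
        "objects": len(X),
        "features": len(columns),
        "feature_mean": [statistics.mean(c) for c in columns],
        "feature_std": [statistics.stdev(c) for c in columns],
        "target_range": (min(y), max(y)),
    }


if __name__ == "__main__":
    X, y = generate(m=200, n=3)
    for key, value in describe(X, y).items():
        print(key, value)
```

The parameters m, n, low, high, noise, and seed are exactly what the text should state so that a reader can regenerate the sample.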

Resources

Todo B: Run basic code

Select the basic algorithm and run it using a simple data set.

  1. Run your basic algorithm:
    • together with your adviser, select the simplest algorithm that (partially) solves the problem you stated.
  2. Collect a synthetic data set or download a small, simple real-world data set.
  3. Upload your data to the repository (if the data size exceeds 5 MB or the data set consists of numerous files, discuss it with your adviser and team).
  4. Run the basic algorithm on the synthetic data set and estimate the error.
  5. Describe the basic algorithm, analyze its features, and list competing models.
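A minimal sketch of steps 2 and 4 on a two-class synthetic sample. It uses the nearest-centroid classifier as a stand-in for "the simplest algorithm", and your adviser may well choose a different one. The Gaussian class model, the shift between the classes, and the 70/30 hold-out split are illustrative assumptions.

```python
"""Basic algorithm on synthetic data: nearest-centroid classifier with a hold-out error estimate."""
import random


def make_two_classes(m_per_class, shift=2.0, seed=0):
    """Two Gaussian classes in 2D: class 0 centred at (0, 0), class 1 at (shift, shift)."""
    rng = random.Random(seed)
    data = []
    for label, centre in ((0, 0.0), (1, shift)):
        for _ in range(m_per_class):
            x = (rng.gauss(centre, 1.0), rng.gauss(centre, 1.0))
            data.append((x, label))
    rng.shuffle(data)
    return data


def fit_centroids(train):
    """Mean feature vector of each class."""
    sums, counts = {}, {}
    for x, label in train:
        s = sums.setdefault(label, [0.0] * len(x))
        for j, x_j in enumerate(x):
            s[j] += x_j
        counts[label] = counts.get(label, 0) + 1
    return {label: [s_j / counts[label] for s_j in s] for label, s in sums.items()}


def predict(centroids, x):
    """Label of the closest centroid (squared Euclidean distance)."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def holdout_error(data, train_fraction=0.7):
    """Fit on the first part of the shuffled sample and return the error rate on the rest."""
    k = int(train_fraction * len(data))
    train, test = data[:k], data[k:]
    centroids = fit_centroids(train)
    wrong = sum(predict(centroids, x) != label for x, label in test)
    return wrong / len(test)


if __name__ == "__main__":
    data = make_two_classes(m_per_class=100)
    print("hold-out error rate:", holdout_error(data))
```

With a single split, the error estimate is noisy. Repeating it over several seeds and reporting the spread makes the number more useful for the preliminary report.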

Resources

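  • Bakhteev O.Yu. Deep learning systems and tools (Системы и средства глубокого обучения), article
  • Motrenko A.P. Improving classification quality (Повышение качества классификации), article
  • Isachenko R.V. Dimensionality reduction in the decoding problem (Снижение размерности в задаче декодирования), article
  • Sample set construction in forecasting problems (Построение выборки в задачах прогнозирования), slides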
  • The IDEF standard for project planning

Todo R: Preliminary report

  1. Make sure that the obtained results do not contradict the goals of the computational experiment.
  2. Illustrate the obtained results with a preliminary plot (see the format).
  3. Write a mini-report on the results containing
    1. a short description of the figure: what the reader sees and what follows from it,
    2. the results in numbers and comments on them.
  4. Put the report in the section Computational experiment.

Todo P: Problem statement

Following the paradigm Idea\(\to\)Formula\(\to\)Code, state the problem of finding an optimal solution.

  1. Discuss the problem statement with your adviser.
  2. See the examples below and in past projects.
  3. Discuss terminology and notation; see the [pdf] and [tex] with notations and a useful style file.
  4. At the beginning of the Problem statement section, write a general problem description.
  5. Describe the elements of your problem statement:
    1. the sample set,
    2. its origin, or its algebraic structure,
    3. statistical hypotheses of data generation,
    4. [conditions of measurements],
    5. [restrictions on the sample set and its values],
    6. your model in the class of models,
    7. restrictions on the class of models,
    8. the error function (and its derivation), a loss function, or a quality criterion,
    9. the cross-validation procedure,
    10. restrictions on the solutions,
    11. external (industrial) quality criteria,
    12. the optimization statement as \(\arg\min\).
  6. Define the main terms: what is called the model, the solution, and the algorithm.
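As an illustration only, here is how several elements of the list might look for linear regression: the sample set, a hypothesis of data generation, the model and its class, a restriction, the error function, a split for cross-validation, and the \(\arg\min\) statement. The symbols and the norm restriction \(\|\mathbf{w}\|_2 \leqslant C\) are assumptions; align them with the recommended notation files. The block assumes the amsmath and amssymb packages.

```latex
% Illustrative notation for linear regression; requires amsmath and amssymb.
\begin{gather*}
  \mathfrak{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{m}, \quad
  \mathbf{x}_i \in \mathbb{R}^{n}, \quad y_i \in \mathbb{R}, \quad
  \mathfrak{D} = \mathfrak{D}_{\mathcal{L}} \sqcup \mathfrak{D}_{\mathcal{T}},\\
  y_i = f(\mathbf{x}_i, \mathbf{w}) + \varepsilon_i, \quad
  \varepsilon_i \sim \mathcal{N}(0, \sigma^2) \ \text{i.i.d.},\\
  f(\mathbf{x}, \mathbf{w}) = \mathbf{w}^{\mathsf{T}} \mathbf{x}, \quad
  f \in \mathfrak{F} = \{ f(\cdot, \mathbf{w}) \mid \mathbf{w} \in \mathbb{W} \}, \quad
  \mathbb{W} = \{ \mathbf{w} \in \mathbb{R}^{n} : \|\mathbf{w}\|_2 \leqslant C \},\\
  S(\mathbf{w} \mid \mathfrak{D}_{\mathcal{L}})
    = \sum_{(\mathbf{x}_i, y_i) \in \mathfrak{D}_{\mathcal{L}}} \bigl(y_i - f(\mathbf{x}_i, \mathbf{w})\bigr)^2,\\
  \hat{\mathbf{w}} = \arg\min_{\mathbf{w} \in \mathbb{W}} S(\mathbf{w} \mid \mathfrak{D}_{\mathcal{L}}),
  \qquad \text{quality on the test part: } S(\hat{\mathbf{w}} \mid \mathfrak{D}_{\mathcal{T}}).
\end{gather*}
```

Under the Gaussian-noise hypothesis, maximizing the likelihood of \(\mathfrak{D}_{\mathcal{L}}\) is equivalent to minimizing \(S\). This is one way to give the derivation of the error function asked for in item 8.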

Note that:

  • The model is a parametric family of functions that map the design space to the target space.
  • The criterion (error function) is a function to optimize in order to obtain an optimal solution (model parameters or a function).
  • The algorithm transforms solutions within the solution space, usually iteratively.
  • The method combines a model, a criterion, and an algorithm to produce a solution.

Check it on an example:

  • the regression model,
  • the sum of squared errors,
  • the Newton-Raphson algorithm,
  • the method of least squares.
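The four items can be checked on a small example: the regression model \(f(x, \mathbf{w}) = w_0 + w_1 x\), the sum of squared errors \(S(\mathbf{w})\), and the Newton-Raphson update \(\mathbf{w} \leftarrow \mathbf{w} - \mathbf{H}^{-1}\nabla S\). Together they give the method of least squares. Because \(S\) is quadratic in \(\mathbf{w}\), its Hessian is constant. A single Newton-Raphson step from any starting point therefore lands on the least-squares solution. The one-feature model and the toy numbers are illustrative assumptions.

```python
"""Method of least squares = regression model + sum of squared errors + Newton-Raphson."""


def model(x, w):
    """Regression model f(x, w) = w0 + w1 * x."""
    return w[0] + w[1] * x


def sse(w, xs, ys):
    """Criterion: sum of squared errors S(w)."""
    return sum((y - model(x, w)) ** 2 for x, y in zip(xs, ys))


def newton_step(w, xs, ys):
    """One Newton-Raphson update w <- w - H^{-1} g for the criterion S(w)."""
    m, sx, sxx = len(xs), sum(xs), sum(x * x for x in xs)
    r = [y - model(x, w) for x, y in zip(xs, ys)]        # residuals
    g0 = -2.0 * sum(r)                                   # dS/dw0
    g1 = -2.0 * sum(r_i * x for r_i, x in zip(r, xs))    # dS/dw1
    # The Hessian of S does not depend on w: H = 2 * [[m, sx], [sx, sxx]].
    h00, h01, h11 = 2.0 * m, 2.0 * sx, 2.0 * sxx
    det = h00 * h11 - h01 * h01
    d0 = (h11 * g0 - h01 * g1) / det                     # first component of H^{-1} g
    d1 = (-h01 * g0 + h00 * g1) / det                    # second component of H^{-1} g
    return [w[0] - d0, w[1] - d1]


if __name__ == "__main__":
    xs, ys = [0.0, 1.0, 2.0, 3.0], [1.1, 2.9, 5.2, 6.8]
    w1 = newton_step([0.0, 0.0], xs, ys)   # already the least-squares solution
    w2 = newton_step(w1, xs, ys)           # a second step changes nothing
    print(w1, w2, sse(w1, xs, ys))
```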

Resources

  • Slides with a plan of Problem statement
  • Examples of problem statements
    1. Katrutsa A.M., Strijov V.V. Stresstest procedure for feature selection algorithms // Chemometrics and Intelligent Laboratory Systems, 2015, 142 : 172-183 article
    2. Katrutsa A.M., Strijov V.V. Comprehensive study of feature selection methods to solve multicollinearity problem according to evaluation criteria // Expert Systems with Applications, 2017 article
    3. Motrenko A., Strijov V., Weber G.-W. Bayesian sample size estimation for logistic regression // Journal of Computational and Applied Mathematics, 2014, 255 : 743-752. article
    4. Kulunchakov A.S., Strijov V.V. Generation of simple structured Information Retrieval functions by genetic algorithm without stagnation // Expert Systems with Applications, 2017, 85 : 221-230. article
    5. Ivkin N.P. Feature generation for classification and forecasting problems, MIPT, 2013 draft
  • Notations for wiki Ru
  • Basic notations, pdf
  • Recommended notations, 2019: pdf and .tex with .sty
  • Simple and useful notations
  • Notations for Bayesian model selection, pdf


Todo A: Abstract

  1. Write a draft of your abstract.
  • The abstract shall not exceed 600 characters. It may contain:
    • the broad field of the investigated problem,
    • the narrow problem to focus on,
    • features and conditions of the problem,
    • [the novelty],
    • an application to illustrate the results.
  • For joint projects, it is important that each team member writes their own text.

Resources

Todo B: Beginner's-talk

A short 45-second introductory talk. Plan of the talk:

  1. The project goal. What is the motivation, the goal to reach?
  2. The main idea. What is the message?
  3. The expected result. What is your delivery, your impact, novelty?

There is no time to show a slide or draw a plot on the blackboard. Rehearse the talk in advance.

Todo I: Introduction

The introductory part states the research goals and motivations. It justifies the research with fundamental and state-of-the-art references and delivers the main message of the work to the reader. This message shows the novelty of the work in comparison with recent results.

  1. Create a file ProjectN.bib for the group project, or Surname2018Title.bib for your personal project.
  2. Move useful bibliographic records from the LinkReview file to your .bib file in BibTeX format.
    • Check the correctness of the BibTeX database (styles of authors' names, journal volumes, page numbers).
    • Use bibliographic databases to facilitate your work.
    • Use the default style \bibliographystyle{plain} before the bibliography section \bibliography{ProjectN}.
    • Important! Wikipedia is not a citable source of information, but it contains many useful references.
    • Important! ArXiv is not a peer-reviewed source of information. Look for versions of the papers that were published in peer-reviewed scientific journals. If a paper has not appeared in a peer-reviewed journal one or two years after its ArXiv version, use it with care: it may be unverified, possibly because journals rejected it.
  3. Write the Introduction. The expected length is one page. The expected plan is:
    1. the research goal (and its motivations),
    2. the object of research (introduce main termini),
    3. the problem (what is the challenge),
    4. methodology: literature review and state of the art,
    5. the project tasks,
    6. the proposed solution, its novelty and advantages,
    7. the pros and cons of recent works,
    8. the goal of the experiment, its setup, data sets, and workflow.

The goal of this week is to comprehend the research goal as a whole and write about it.

Resources

Todo L: Literature

We use the LinkReview draft format to share the fleeting ideas and impressions we have while reading the literature.

  1. Collect the list of references including:
    1. state-of-the-art reviews, tutorials,
    2. fundamental solutions to the problem,
    3. the basic algorithm to solve your problem,
    4. alternative algorithms,
    5. [changes in the research directions],
    6. data sets and experiments,
    7. the papers that use these data sets,
    8. applications of the results,
    9. names of researchers who solve this problem,
    10. their students and teams,
    11. those who refer to their works.
  2. Balance the list between new and well-known works.
  3. Keep the list of search keywords up to date.
  4. Continuously fill in your LinkReview.
  5. Plan the Introduction (see the next to-do list); namely, collect
    • keywords as the basic terms; those that yield good search results are useful,
    • what the paper is devoted to,
    • the investigated problem,
    • the central idea,
    • literature review,
    • the authors' contribution.


Todo 1: Select your project

To select your project:

  1. Look through the list of projects.
  2. Find information about the experts and consultants.
  3. Select your projects in the questionnaire before Wednesday 22:00.
  4. Wait for confirmation.
  5. Put the confirmed topics in the Group table on Machine learning.

Todo 0: Prepare necessary tools

  1. Editing. Install LaTeX: MiKTeX for Windows, TeX Live for Linux and Mac OS. Sign up for Overleaf V2 (ShareLaTeX).
  2. Install the editor TeXnicCenter or its alternative WinEdt for Windows, TeXworks for Linux, and TeXmaker for Mac OS.
  3. Download the paper template (ZIP) and compile it.
  4. Read about BibTeX.
  5. Install bibliographic collection software JabRef (can be postponed).
  6. Communications. Sign up for GitHub.
    • Important: a login and address of the form Name.Surname or Name-Surname (depending on the system's conventions) are welcome.
    • Introductory slides on version control systems.
    • Introduction to GitHub.
    • The first steps in GitHub.
  7. Download a client (GitHub Desktop) or use the command line to synchronize your project.
  8. Sign up for MachineLearning.ru. Send your login to your coordinator at mlalgorithms [at] gmail [dot] com.
  9. To state a problem (write an essay) in a notebook, see the example in MathJax.
  10. Install Hangouts and Skype (read the instructions).
  11. Programming. Install Python (Anaconda) and PyCharm (alternatively, Visual Studio); use Google Colab for online notebooks.
    • Development for ML: PyTorch.
    • Code style: PEP 8.
  12. Optional. As alternatives, install and try MATLAB (MIPT provides a free version) or Octave, R, and Wolfram Mathematica.
  13. Optional. Read with pleasure: Kutateladze S.S. "Advice to an occasional translator" (Советы эпизодическому переводчику) and Sosinsky A.B. "How to write a mathematical paper in English" (Как написать математическую статью по-английски).

Resources

References to catch up

Todo -1: Subscribe to the course

Todo before 06:00 Wednesday, February 12th:

  1. pick up a problem from the page Try-on programming problems (take the oldest problems; they are simpler),
  2. plot one figure to illustrate the problem (plot data or analysis),
  3. write explanatory comments to the figure (what the reader sees in the figure and what conclusions follow),
  4. see an example of figure formatting here,
  5. upload your notebook to your GitHub repository,
  6. send the link to this notebook to mlalgorithms [at] gmail [dot] com with the subject "Application m1p".