Difference between revisions of "Week 4"

Latest revision as of 13:50, 13 March 2025

The goal of this week is to get the simplest possible solution to your problem, that is, a model and its parameters. Make the model fit the data with minimum effort.

X: Experiment planning

Plan your computational experiment.

Discuss the experiment goal with your adviser
- and put this goal in the section Computational experiment
Describe your basic data set, a synthetic, or a simple real one:
- put in the text the title, source, and measurement setup (this is the technical description; the theoretical one is in the problem statement section),
- write down the number of objects and features, and describe general statistics,
- for a synthetic data set, describe the generation model and its parameters (for example, independent uniform random sampling on a given interval).
Describe the configuration of the algorithm run.
Plan the whole experimental part.
List expected tables and figures:
- make a short list and a long list, and for each item
- describe the axes,
- make a draft with a pencil.
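The data set description above can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the linear generation model, its parameters (`slope`, `intercept`, `noise_std`), the interval, and the helper names `generate_synthetic` and `describe` are all assumptions made for the example. It uses only the Python standard library.

```python
import random
import statistics


def generate_synthetic(n_objects=100, low=0.0, high=1.0,
                       slope=2.0, intercept=1.0, noise_std=0.1, seed=0):
    """Hypothetical generation model: x is sampled independently and
    uniformly on [low, high], y = slope * x + intercept + Gaussian noise."""
    rng = random.Random(seed)  # fixed seed makes the data set reproducible
    xs = [rng.uniform(low, high) for _ in range(n_objects)]
    ys = [slope * x + intercept + rng.gauss(0.0, noise_std) for x in xs]
    return xs, ys


def describe(values):
    """General statistics to quote in the data set description."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),
    }


xs, ys = generate_synthetic()
print("objects:", len(xs), "features: 1")
print("feature x:", describe(xs))
print("target y:", describe(ys))
```

Reporting the seed together with the generation parameters lets a reader regenerate exactly the same sample.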

R: Preliminary report

Make sure that the obtained results do not logically contradict the goals of the computational experiment.
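Illustrate the obtained results with a preliminary plot. Ideally, this plot is hand-made: just draw it with a pencil on a piece of paper. See the hand-drawn example (http://www.machinelearning.ru/wiki/images/3/30/Likelihood_handdrawn.pdf). For the final version, use the JMLDA figure format (http://www.machinelearning.ru/wiki/index.php?title=JMLDA/Fig).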
Write a mini-report on the results with
1. a short description of the figure: what the reader can see and what follows from it,
2. the results in numbers, with comments on them,
3. put the report in the section Computational experiment.

B: Run basic code

Select the basic algorithm and run it using a simple data set.

Run your basic algorithm: select the simplest algorithm to get the fastest draft solution of the problem you set.
Collect a synthetic data set or download a simple real-world data set of small size.
Upload your data to the repository. If the data size exceeds 5MB or the data set consists of numerous files, please discuss with your adviser and team how to keep and share these data.
Do not use custom or client's data. Use only open-access data that are easy to download and use.
Run the basic algorithm on the synthetic data set, and estimate the error.
Describe the basic algorithm, analyze its features, and list competitive models. Here are examples of the description style.
1. The description refers to the name of a black-box model. It is advisable to cite the source where the contents of the black box are described in detail. The description specifies the structural parameters of the black box.
2. The description defines a model as a map from the design space of features to the space of target variables. Since the model has parameters, the description may refer to the algorithm that optimizes them as a black box.
3. The description gives the model and the algorithm for optimizing its parameters as pseudocode.
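The steps above (run the simplest algorithm on a synthetic data set and estimate the error) can be sketched as follows. This is an illustration under stated assumptions, not the course's reference solution: the basic algorithm is assumed to be one-feature least squares, the error is assumed to be the mean squared error on a hold-out part, and the data generation parameters and the function names `fit_line` and `mse` are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Closed-form least squares for the model y = w * x + b (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


def mse(xs, ys, w, b):
    """Mean squared error of the fitted line on the given sample."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Assumed synthetic data: uniform x on [0, 1], y = 2x + 1 + Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0.0, 1.0) for _ in range(200)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]

# The points are i.i.d., so a simple prefix/suffix split gives a hold-out set.
x_train, y_train = xs[:150], ys[:150]
x_test, y_test = xs[150:], ys[150:]

w, b = fit_line(x_train, y_train)
train_error = mse(x_train, y_train, w, b)
test_error = mse(x_test, y_test, w, b)
print(f"w = {w:.3f}, b = {b:.3f}")
print(f"train MSE = {train_error:.4f}, test MSE = {test_error:.4f}")
```

Because the generation parameters are known, the fitted `w` and `b` can be compared with the true values, which is a quick sanity check before moving to real data.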

Resources to read

See examples inside the reports.

Deep learning systems and tools (in Russian), Bakhteev O.Yu.
Improving classification quality (in Russian), Motrenko A.P.
Dimensionality reduction in the decoding problem (in Russian), Isachenko R.V.

Mimic the goals of computational experiments stated in the slides by A. Grabovoy, V. Alekseev, A. Rogozina, I. Igashov, and N. Uvarov.

An example of the measurement description is Old Faithful, from C.M. Bishop, Pattern Recognition and Machine Learning, 2006, pp. 677-683.

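Dataset sources

Top 8 Sources For Machine Learning Datasets
List of data-sets for Machine Learning projects
Google dataset search
Datasets released by Google Research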

Homework

Write the goal of your computational experiment. A couple of sentences will help you focus your efforts.
Find the code that works to run the preliminary experiment.
Generate or find the simplest dataset. Avoid struggling with data.
Run the code on the simplest dataset.
Write a draft of your desired report and draw a plot for the error analysis.
Collect the letters:
1. X: put the goal of your experiment into the section Computational experiment.
2. R: put the plan of hypothesis testing and the illustration after it.
3. B: upload the notebook or code with a basic experiment into your repository.

Prepare your project for Check 1, which starts this week and runs for the next 2-3 weeks.

References

Research Methodology: Methods and Techniques by C.R. Kothari, 2004 pdf or pdf
Experiment planning: Questionable practices in machine learning by Gavin Leech et al., 2024

Site and docs generators

MkDocs is an easy one
Sphinx is more complex
An example is RelaxIt

@@ Line 1: / Line 1: @@
-Цель этой недели — в самый краткий срок получить, самое простое решение (модель и ее параметры), поставленной задачи.
+{{#seo:
+|title=Course My first scientific paper: Week 4
+|titlemode=replace
+|keywords=My first scientific paper
+|description=Course My first scientific paper: The goal is to get the simplest possible solution to your problem: it is models and its parameters.  So make the model fit data with the minimum of your efforts.
+ }}
+The goal is to get the simplest possible solution to your problem: it is models and its parameters.  So make the model fit data with the minimum of your efforts.
 ==X: Experiment planning ==
 Plan your computational experiment.
-# Discuss the experiment goal with your adviser and team.
+# Discuss the experiment goal with your adviser
-#* Put this goal in the section Computational experiment
+#* and put this goal in the section Computational experiment
 # Describe your basic data set, a synthetic, or a simple real one:
-#* put in the text the title, source and set up of measurements (it is the technical description, the theoretical one is in the problem statement section),
+#* put in the text the title, source, and set up of measurements (it is the technical description, the theoretical one is in the problem statement section),
-#* write down the number of objects, features, describe general statistics,
+#* write down the number of objects, and features, describe general statistics,
-#* for a synthetic data set describe the generation model, its parameters (for example, uniform random independent sampling some given interval).
+#* for a synthetic data set describe the generation model, and its parameters (for example, uniform random independent sampling at some given interval).
-# Describe the configuration of algorithm run.
+# Describe the configuration of the algorithm run.
 # Plan the whole experimental part.
 # List expected tables and figures:
@@ Line 17: / Line 24: @@
 ==R: Preliminary report ==
-# Make sure that the obtained results are in not logical (sic!) contradiction with the goals of the computational experiment.
+# Make sure that the obtained results ''do not logically contradict'' the goals of the computational experiment.
-# Illustrate the obtained results with the preliminary plot [http://www.machinelearning.ru/wiki/index.php?title=JMLDA/Fig see the format].
+# Illustrate the obtained results with the preliminary plot. Optimally this plot is hand-made. '''Just draw it with a pencil on a piece of paper.''' See [http://www.machinelearning.ru/wiki/images/3/30/Likelihood_handdrawn.pdf for an example]. For the final version [http://www.machinelearning.ru/wiki/index.php?title=JMLDA/Fig use this format].
 # Write a mini-report on the results with
 ## a short description of the figure: what the reader could see, what are the consequences,
@@ Line 27: / Line 34: @@
 Select the basic algorithm and run it using a simple data set.
-# Run your basic algorithm:
+# Run your basic algorithm: select the simplest algorithm to get the fastest draft solution of the problem you set.
-#* select a simplest algorithm (with your adviser) to (partially) solve the problem you set.
+# Collect a synthetic data set or download a simple real-world data set of small size.
-# Collect a synthetic data set or download a simple real-word data set of small size.
+# Upload your data to the repository. If the data size exceeds 5MB or the data set consists of numerous files, please discuss with your adviser and team how to keep and share these data.
-# Upload your data to the repository (in case the data size exceed 5MB or the data set consists of numerous files, please discuss with your adviser and team).
+# Do not use custom or client's data. Use only open-access data that are easy to download and use.
-# Run the basic algorithm on the synthetic data set, estimate the error.
+# Run the basic algorithm on the synthetic data set, and estimate the error.
-# Describe the basic algorithm, analyst its features, list competitive models.
+# Describe the basic algorithm, analyze its features, and list competitive models. Here the examples of the description style.
-## Описание - указание на название черного ящика. Желательно указывать на источник, где содержимое черного ящика описывается подробно. Указывать структурные параметры черного ящика.
+## Description refers to the name of some black box model. It is advisable to indicate the source, where the contents of the black box model are described in detail. The description specifies the structural parameters of the black box.
-## Описание модели как отображения из пространства описания объектов в пространство целевых переменных. При этом можно указать на алгоритм оптимизации параметров модели в виде черного ящика.
+## Description defines a model as a map from the design space of features to the space of target variables. Since the model has its parameters the description may refer to the algorithm for optimizing the model parameters in the form of a black box.
-## Описание модели и алгоритма оптимизации его параметров в виде псевдокода.
+## Description of the model and algorithm for optimizing its parameters in the form of pseudocode.
+==Resources to read==
+See examples inside the reports.
+# [https://m1p.org/papers/Bakhteev2016AWS.pdf Системы и средства глубокого обучения], Бахтеев О.Ю.
+# [https://m1p.org/papers/MolybogMotrenko2017DimRed.pdf Повышение качества классификации], Мотренко А.П.
+# [https://github.com/Intelligent-Systems-Phystech/2017-Isachenko-PLS/raw/master/doc/Isachenko2017PLS.pdf Снижение размерности в задаче декодирования],  Исаченко Р.В.
+Mimic the goals of computational experiments.
+#[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Grabovoy2018OptimalBrainDamage/doc/slides/Grabovoy2018OptimalBrainDamage.pdf А. Грабовой],
+#[http://svn.code.sf.net/p/mlalgorithms/code/Group474/Alekseev2017IntraTextCoherence/doc/Alekseev2017Presentation.pdf В. Алексеев],
+#[http://svn.code.sf.net/p/mlalgorithms/code/Group574/Rogozina2018StructurePredictionRNA/doc/slides/Rogozina2018RNAPredictionsSlides.pdf А. Рогозина],
+# [https://github.com/Intelligent-Systems-Phystech/Group594/raw/master/Igashov2018ProteinLigandComplexes/presentation/presentation.pdf И. Игашов],
+# [http://svn.code.sf.net/p/mlalgorithms/code/Group474/Uvarov2017DynamicGraphicalModels/slides/Uvarov2017DynamicGraphicalModels.pdf Н. Уваров]
-==Resources==
+An example of the measurement description is [http://www.machinelearning.ru/wiki/images/3/35/Old_Faithful_dataset_description.pdf Old Faithful] by Bishop C.P. Pattern recognition and machine learning, 2006. Pp. 677-683.
-* [http://www.machinelearning.ru/wiki/images/f/fc/M1p_lect4.pdf Slides for week 4].
-* [http://www.machinelearning.ru/wiki/images/4/45/M1p_lect4.pdf Video for week 4].
-* Бахтеев О.Ю. Системы и средства глубокого обучения, [http://strijov.com/papers/Bakhteev2016AWS.pdf статья]
-* Мотренко А.П. Повышение качества классификации, [http://strijov.com/papers/MolybogMotrenko2017DimRed.pdf статья]
-* Исаченко Р.В. Снижение размерности в задаче декодирования, [https://github.com/Intelligent-Systems-Phystech/2017-Isachenko-PLS/raw/master/doc/Isachenko2017PLS.pdf статья]
-* The goals of computational experiments [http://svn.code.sf.net/p/mlalgorithms/code/Group574/Grabovoy2018OptimalBrainDamage/doc/slides/Grabovoy2018OptimalBrainDamage.pdf А. Грабовой], [http://svn.code.sf.net/p/mlalgorithms/code/Group474/Alekseev2017IntraTextCoherence/doc/Alekseev2017Presentation.pdf В. Алексеев], [http://svn.code.sf.net/p/mlalgorithms/code/Group574/Rogozina2018StructurePredictionRNA/doc/slides/Rogozina2018RNAPredictionsSlides.pdf А. Рогозина], [https://github.com/Intelligent-Systems-Phystech/Group594/raw/master/Igashov2018ProteinLigandComplexes/presentation/presentation.pdf И. Игашов], [http://svn.code.sf.net/p/mlalgorithms/code/Group474/Uvarov2017DynamicGraphicalModels/slides/Uvarov2017DynamicGraphicalModels.pdf Н. Уваров]
-* Example of the measurement description, [http://www.machinelearning.ru/wiki/images/3/35/Old_Faithful_dataset_description.pdf Bishop C.P. Pattern recognition and machine learning, 2006. Pp. 677-683.]]
 <!-- * Построение выборки в задачах прогнозирования, [http://svn.code.sf.net/p/mvr/code/lectures/DataFest/Strijov2016Tutorial.pdf слайды]. EXTRACT The feature generation part-->
 <!-- * Постановка задачи прогнозирования дефолтов по картам на год вперед, [[Media:Strijov2018ProbStCardScoring.pdf|слайды]] -->
 <!-- * [http://www.machinelearning.ru/wiki/images/4/49/Strijov2019IDEF0.pdf The IDEF standard for project planning] OLD version -->
+Dataset sources
+# [https://medium.datadriveninvestor.com/top-8-sources-for-machine-learning-and-analytics-datasets-5d2d94ada8ab Top 8 Sources For Machine Learning Datasets]
+# [https://en.wikipedia.org/wiki/List_of_datasets_for_machine_learning_research List of data-sets for Machine Learning projects]
+# [https://datasetsearch.research.google.com/ Google dataset search]
+# [https://github.com/google-research-datasets Datasets released by Google Research]
+==Homework==
+# Write the goal of your computational experiment. A couple of sentences help you focus your efforts.
+# Find the code that works to run the preliminary experiment.
+# Generate or find the simplest dataset. Avoid struggling with data.
+# Run the code on the simplest dataset.
+# Write a draft of your desired report and draw a plot for the error analysis.
+# Collect the letters:
+## '''X''' put the goal of your experiment into the section Computational Experiment.
+## '''R''' put the plan of the hypothesis testing and illustration after.
+## '''B''' upload the notebook or code with a basic experiment into your repository.
+Prepare your project to Check 1, which starts this week and goes the next 2-3 weeks.
+==References==
+# Research Methodology: Methods and Techniques by C.R. Kothari, 2004 [https://www.academia.edu/22328603/Kothari_Research_Methodology_Methods_and_Techniques pdf] or [http://ndl.ethernet.edu.et/bitstream/123456789/79439/5/Research%20Methodology%20-%20Methods%20and%20Techniques%202004.pdf pdf]
+# Experiment planning: [https://arxiv.org/pdf/2407.12220 Questionable practices in machine learning] by Gavin Leech et al., 2024
+===Site and docs generators===
+# [https://www.mkdocs.org/ MkDocs] is an easy one
+# [https://www.sphinx-doc.org/en/master/ Sphinx] is more complex
+# An example is [https://intsystems.github.io/relaxit/index.html RelaxIt]
