Week 4
From m1p.org
The goal is to get the simplest possible solution to your problem, that is, a model and its parameters. Make the model fit the data with minimal effort.
X: Experiment planning
Plan your computational experiment.
- Discuss the goal of the experiment with your adviser and state it in the section Computational experiment.
- Describe your basic data set, either a synthetic one or a simple real one:
- give its title, source, and measurement setup (this is the technical description; the theoretical one belongs in the Problem statement section),
- give the number of objects and features, and describe the general statistics,
- for a synthetic data set, describe the generation model and its parameters (for example, independent uniform random sampling on a given interval).
- Describe the configuration of the algorithm run.
- Plan the whole experimental part.
- List the expected tables and figures:
- make a short list and a long list; for each item
- describe the axes,
- draw a draft with a pencil.
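A synthetic data set and its general statistics can be produced in a few lines. The following is a minimal sketch assuming features sampled independently and uniformly on [-1, 1) and, as a hypothetical target model not prescribed here, a linear target with Gaussian noise. The function names `make_synthetic_dataset` and `describe_dataset` and all parameter values are illustrative only.

```python
import numpy as np


def make_synthetic_dataset(n_objects=100, n_features=3, low=-1.0, high=1.0,
                           noise_std=0.1, seed=0):
    """Hypothetical generation model.

    Features: i.i.d. uniform samples on [low, high).
    Target (assumption): linear in the features plus Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    X = rng.uniform(low, high, size=(n_objects, n_features))
    w_true = rng.normal(size=n_features)
    y = X @ w_true + rng.normal(scale=noise_std, size=n_objects)
    return X, y, w_true


def describe_dataset(X, y):
    """General statistics to report in the data set description."""
    return {
        "n_objects": X.shape[0],
        "n_features": X.shape[1],
        "feature_mean": X.mean(axis=0),
        "feature_std": X.std(axis=0),
        "feature_min": X.min(axis=0),
        "feature_max": X.max(axis=0),
        "target_mean": float(y.mean()),
        "target_std": float(y.std()),
    }


X, y, w_true = make_synthetic_dataset()
stats = describe_dataset(X, y)
print(f"objects: {stats['n_objects']}, features: {stats['n_features']}")
print("feature means:", np.round(stats["feature_mean"], 3))
print("feature stds: ", np.round(stats["feature_std"], 3))
print(f"target mean: {stats['target_mean']:.3f}, target std: {stats['target_std']:.3f}")
```

Fixing the seed makes the data set reproducible, so the generation model and its parameters (interval, sample size, noise level, seed) fully describe it in the text.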
R: Preliminary report
- Make sure that the obtained results do not logically contradict the goals of the computational experiment.
- Illustrate the obtained results with a preliminary plot. Ideally, this plot is hand-made: just draw it with a pencil on a piece of paper. See for an example. For the final version, use this format.
- Write a mini-report on the results, including
- a short description of the figure: what the reader can see and what follows from it,
- the numerical results and comments on them,
- put the report into the section Computational experiment.
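One common source of numbers for the mini-report and the error-analysis plot is a learning curve: test error against training-set size. The sketch assumes a linear model fitted by least squares on synthetic data with uniform features and Gaussian noise. The function name `learning_curve` and all parameter values are hypothetical. The printed columns are training-set size, mean test RMSE, and its standard deviation over repeated draws. These correspond to the x axis, the y axis, and the error bars of the plot.

```python
import numpy as np


def learning_curve(sizes, n_test=500, n_features=3, noise_std=0.1,
                   n_repeats=20, seed=0):
    """Test RMSE of a least-squares fit versus training-set size.

    Hypothetical setup: uniform features on [-1, 1), linear target,
    Gaussian noise with standard deviation noise_std.
    Returns a list of (size, mean RMSE, std of RMSE) tuples.
    """
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=n_features)

    def sample(n):
        X = rng.uniform(-1.0, 1.0, size=(n, n_features))
        return X, X @ w_true + rng.normal(scale=noise_std, size=n)

    X_test, y_test = sample(n_test)  # one fixed test set for all sizes
    rows = []
    for m in sizes:
        errors = []
        for _ in range(n_repeats):  # repeat to get the spread for error bars
            X_train, y_train = sample(m)
            w_hat, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
            errors.append(np.sqrt(np.mean((X_test @ w_hat - y_test) ** 2)))
        rows.append((m, float(np.mean(errors)), float(np.std(errors))))
    return rows


for m, mean_rmse, std_rmse in learning_curve([5, 10, 20, 50, 100, 200]):
    print(f"{m:4d}  {mean_rmse:.4f}  {std_rmse:.4f}")
```

With enough training objects, the mean test RMSE should approach the noise level. This value gives a natural reference line for the hand-drawn plot.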
B: Run basic code
Select the basic algorithm and run it using a simple data set.
- Choose your basic algorithm: select the simplest algorithm that gives the fastest draft solution to the problem you have stated.
- Generate a synthetic data set or download a small, simple real-world data set.
- Upload your data to the repository. If the data exceed 5 MB or the data set consists of numerous files, discuss with your adviser and team how to store and share the data.
- Do not use custom or client data. Use only open-access data that are easy to download and use.
- Run the basic algorithm on the synthetic data set and estimate the error.
- Describe the basic algorithm, analyze its features, and list competing models. Here are examples of the description style:
- The description refers to a black-box model by name. It is advisable to cite the source where the contents of the black box are described in detail. The description specifies the structural parameters of the black box.
- The description defines the model as a map from the design space of features to the space of target variables. Since the model has parameters, the description may refer to the algorithm that optimizes them as a black box.
- The description gives the model and the algorithm for optimizing its parameters as pseudocode.
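The following minimal sketch illustrates these description levels. It assumes the basic algorithm is linear regression fitted by least squares, which is a hypothetical choice and is not prescribed by this page. The sketch covers the levels as follows:
- the black box is named (least squares),
- the model is the map f(x, w) = x^T w from the feature space to the target space,
- `numpy.linalg.lstsq` serves as the black-box optimizer.

The data are synthetic, with uniform independent features. The linear target with Gaussian noise is an assumption. The error is estimated on a 25% hold-out split. The helper names `model`, `fit`, and `rmse` are illustrative.

```python
import numpy as np


def model(X, w):
    """The model as a map from features to targets: f(x, w) = x^T w."""
    return X @ w


def fit(X, y):
    """Black-box optimizer of the parameters: least squares via numpy."""
    w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w_hat


def rmse(y_true, y_pred):
    """Root-mean-square error used to estimate the quality of the fit."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic data: i.i.d. uniform features on [-1, 1);
# linear target with Gaussian noise (assumption)
rng = np.random.default_rng(42)
n_objects, n_features, noise_std = 200, 3, 0.1
X = rng.uniform(-1.0, 1.0, size=(n_objects, n_features))
w_true = rng.normal(size=n_features)
y = model(X, w_true) + rng.normal(scale=noise_std, size=n_objects)

# Hold out 25% of the objects to estimate the error
perm = rng.permutation(n_objects)
n_train = int(0.75 * n_objects)
train, test = perm[:n_train], perm[n_train:]

w_hat = fit(X[train], y[train])
train_error = rmse(y[train], model(X[train], w_hat))
test_error = rmse(y[test], model(X[test], w_hat))
print(f"train RMSE: {train_error:.4f}, test RMSE: {test_error:.4f}")
print(f"parameter error: {np.linalg.norm(w_hat - w_true):.4f}")
```

On synthetic data, the true parameters are known. You can therefore report the parameter error alongside the prediction error, which a real data set does not allow.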
Resources to read
See examples inside the reports.
- Deep Learning Systems and Tools (Системы и средства глубокого обучения), Бахтеев О.Ю.
- Improving Classification Quality (Повышение качества классификации), Мотренко А.П.
- Dimensionality Reduction in the Decoding Problem (Снижение размерности в задаче декодирования), Исаченко Р.В.
Follow the way these reports state the goals of their computational experiments.
An example of a measurement description is the Old Faithful data set in Bishop C.M. Pattern Recognition and Machine Learning, 2006, pp. 677-683.
Dataset sources
- Top 8 Sources For Machine Learning Datasets
- List of data-sets for Machine Learning projects
- Google dataset search
- Datasets released by Google Research
Homework
- Write the goal of your computational experiment. A couple of sentences will help you focus your efforts.
- Find working code to run the preliminary experiment.
- Generate or find the simplest dataset. Avoid struggling with data.
- Run the code on the simplest dataset.
- Write a draft of your desired report and draw a plot for the error analysis.
- Collect the letters:
- X: put the goal of your experiment into the section Computational experiment.
- R: put the plan of the hypothesis testing there, followed by the illustration.
- B: upload the notebook or code of the basic experiment to your repository.
Prepare your project for Check 1, which starts this week and continues over the next 2-3 weeks.
References
- Kothari C.R. Research Methodology: Methods and Techniques, 2004.
- On experiment planning: Leech G. et al. Questionable Practices in Machine Learning, 2024.