Week 3

Latest revision as of 16:49, 6 March 2025

The goal is to understand the type of problem to state.

I: Introduction

The introductory part states the research goals and motivations. It grounds the research in fundamental and state-of-the-art references. It delivers the main message of the work to the reader. This message shows the novelty of the work in comparison with recent results.

Write the Introduction. The expected length is one page. The expected plan is:

  1. the research goal (and its motivations),
  2. the object of research (introduce the main terms),
  3. the problem (what is the challenge),
  4. methodology: literature review and state-of-the-art,
  5. the project tasks,
  6. the proposed solution, its novelty, and advantages,
  7. the pros and cons of recent works,
  8. the goal of the experiment, its setup, data sets, and workflow.

Include citations in your Introduction.

  1. Fill in your .bib file by converting your LinkReview records to the BibTeX format.
    • The best way is to enter the DOI when you add a new record in JabRef; the remaining fields are then filled in automatically.
    • Otherwise, check the correctness of BibTeX records: DOI, styles of authors' names, volumes of journals, page numbers, etc.
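
The manual check above can be partly automated. The following is a minimal sketch, not a replacement for JabRef: it parses one BibTeX record naively with regular expressions and reports which fields are absent. The citation key `Motrenko2014SampleSize`, the helper name `missing_fields`, and the list of expected fields are assumptions made for illustration; the bibliographic data come from the examples on this page.

```python
import re

# Assumption: the fields we expect in an @article record, following the
# checklist above (authors, journal volume, page numbers, and so on).
EXPECTED = {"article": ("author", "title", "journal", "year", "volume", "pages")}


def missing_fields(entry):
    """Return the expected fields that are absent from one BibTeX record."""
    kind = re.match(r"\s*@(\w+)\s*\{", entry).group(1).lower()
    # A field line looks like "  name = {...}"; collect the names.
    present = {
        name.lower()
        for name in re.findall(r"^\s*(\w+)\s*=", entry, flags=re.MULTILINE)
    }
    return [field for field in EXPECTED.get(kind, ()) if field not in present]


# A record with the page numbers deliberately left out.
record = """@article{Motrenko2014SampleSize,
  author  = {Motrenko, A. and Strijov, V. and Weber, G.-W.},
  title   = {Bayesian sample size estimation for logistic regression},
  journal = {Journal of Computational and Applied Mathematics},
  year    = {2014},
  volume  = {255}
}"""

print(missing_fields(record))  # ['pages']
```

A check like this only detects absent fields; the spelling of authors' names and the correctness of the DOI still need a human eye.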

Introduction from the Chief Editor's point of view

Three questions to answer:

  1. What is the nearest alternative result?
  2. What is the advantage?
  3. What are the distinguishing characteristics?

The answer follows the formula:

The paper proposes a method for X that provides Y and is distinguished by Z.

Sometimes authors summarize this in a comparative table with three columns: 1) alternative methods with references, 2) strengths, 3) weaknesses.

P: Problem statement

Following the paradigm Idea \(\to\) Formula \(\to\) Code, state the problem of finding an optimal solution.

  1. Discuss the problem statement with your adviser.
  2. See the examples below and in past projects.
  3. Discuss terminology and notation. See the Notations section for a pdf and a .tex file with notations and a useful style file.
  4. At the beginning of the Problem statement, write a general problem description.
  5. Describe the elements of your problem statement:
    1. the sample set,
    2. its origin, or its algebraic structure,
    3. statistical hypotheses of data generation,
    4. [conditions of measurements],
    5. [restrictions of the sample set and its values],
    6. your model in the class of models,
    7. restrictions on the class of models,
    8. the error function (and its inference), a loss function, or a quality criterion,
    9. cross-validation procedure,
    10. restrictions on the solutions,
    11. external (industrial) quality criteria,
    12. the optimization statement as \(\arg\min\).
  6. Define the main terms: what you call the model, the solution, and the algorithm.
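
To show how the elements listed above fit together, here is a minimal sketch of a problem statement, assuming a linear regression problem with Gaussian noise. The symbols are illustrative and are not taken from the course notation files; adapt them to the notation you agree on with your adviser.

```latex
Given the sample set
$\mathfrak{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{m}$,
$\mathbf{x}_i \in \mathbb{R}^{n}$, $y_i \in \mathbb{R}$,
assume the data generation hypothesis
$y_i = f(\mathbf{w}, \mathbf{x}_i) + \varepsilon_i$
with i.i.d. noise $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$.

The model is selected from the class of linear models
\[
  f(\mathbf{w}, \mathbf{x}) = \mathbf{w}^{\mathsf{T}} \mathbf{x},
  \qquad \mathbf{w} \in \mathbb{R}^{n}.
\]

The error function is the sum of squared errors
\[
  S(\mathbf{w} \mid \mathfrak{D})
  = \sum_{i=1}^{m} \bigl( y_i - f(\mathbf{w}, \mathbf{x}_i) \bigr)^2 .
\]

The optimization statement is
\[
  \hat{\mathbf{w}}
  = \arg\min_{\mathbf{w} \in \mathbb{R}^{n}} S(\mathbf{w} \mid \mathfrak{D}).
\]
```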

Examples of problem statements

  1. Katrutsa A.M., Strijov V.V. Stresstest procedure for feature selection algorithms // Chemometrics and Intelligent Laboratory Systems, 2015, 142: 172-183. Article: http://m1p.org/papers/Katrutsa2014TestGenerationEn.pdf
  2. Katrutsa A.M., Strijov V.V. Comprehensive study of feature selection methods to solve multicollinearity problem according to evaluation criteria // Expert Systems with Applications, 2017. Article: http://m1p.org/papers/Katrutsa2016QPFeatureSelection.pdf
  3. Motrenko A., Strijov V., Weber G.-W. Bayesian sample size estimation for logistic regression // Journal of Computational and Applied Mathematics, 2014, 255: 743-752. Article: http://m1p.org/papers/MotrenkoStrijovWeber2012SampleSize.pdf
  4. Kulunchakov A.S., Strijov V.V. Generation of simple structured Information Retrieval functions by genetic algorithm without stagnation // Expert Systems with Applications, 2017, 85: 221-230. Article: http://m1p.org/papers/Kulunchakov2014RankingBySimpleFun.pdf
  5. Ivkin N.P. Feature generation for classification and forecasting problems, MIPT, 2013. Draft: https://sourceforge.net/p/mlalgorithms/code/HEAD/tree/Group074/Kuznetsov2013SSAForecasting/doc/Ivkin2013ProblemStatement.pdf?format=raw

Tips for problem statement

Introduce the proper terminology. Note that:

  • The model is a parametric family of functions that map the design space to the target space.
  • The criterion (error function, metric) is a function that is optimized to obtain an optimal solution (model parameters, a function).
  • The algorithm transforms the solution space, usually iteratively.
  • The method combines a model, a criterion, and an algorithm to produce a solution. Check your usage against these examples:
    • the regression model,
    • the sum of squared errors,
    • the Newton-Raphson algorithm,
    • the method of least squares.
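
The four terms can be told apart in code as well as in prose. The sketch below assumes the simplest possible case, a one-parameter regression model without intercept; all function names and the toy data are made up for illustration. Because the criterion is quadratic in the parameter, the Newton-Raphson iteration reaches the minimum after its first step.

```python
def model(w, x):
    """The model: a parametric family of functions x -> w * x."""
    return w * x


def criterion(w, xs, ys):
    """The criterion: the sum of squared errors on the sample set."""
    return sum((y - model(w, x)) ** 2 for x, y in zip(xs, ys))


def newton_raphson(xs, ys, w0=0.0, n_iter=3):
    """The algorithm: Newton-Raphson steps on the parameter w."""
    w = w0
    # Second derivative of the criterion; constant for a quadratic criterion.
    hessian = 2.0 * sum(x * x for x in xs)
    for _ in range(n_iter):
        # First derivative of the criterion with respect to w.
        gradient = -2.0 * sum(x * (y - model(w, x)) for x, y in zip(xs, ys))
        w -= gradient / hessian
    return w


def least_squares(xs, ys):
    """The method: model + criterion + algorithm, producing a solution."""
    return newton_raphson(xs, ys)


# Toy sample set (hypothetical values).
xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]

w_hat = least_squares(xs, ys)
print(w_hat, criterion(w_hat, xs, ys))
```

Swapping the algorithm (for example, gradient descent instead of Newton-Raphson) changes how the solution is found but leaves the model and the criterion untouched, which is why the terms deserve separate definitions.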

Notations

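  1. Notations for wiki (Ru): http://www.machinelearning.ru/wiki/index.php?title=%D0%A7%D0%B8%D1%81%D0%BB%D0%B5%D0%BD%D0%BD%D1%8B%D0%B5_%D0%BC%D0%B5%D1%82%D0%BE%D0%B4%D1%8B_%D0%BE%D0%B1%D1%83%D1%87%D0%B5%D0%BD%D0%B8%D1%8F_%D0%BF%D0%BE_%D0%BF%D1%80%D0%B5%D1%86%D0%B5%D0%B4%D0%B5%D0%BD%D1%82%D0%B0%D0%BC_%28%D0%BF%D1%80%D0%B0%D0%BA%D1%82%D0%B8%D0%BA%D0%B0%2C_%D0%92.%D0%92._%D0%A1%D1%82%D1%80%D0%B8%D0%B6%D0%BE%D0%B2%29/%D0%A0%D0%B5%D0%BA%D0%BE%D0%BC%D0%B5%D0%BD%D0%B4%D1%83%D0%B5%D0%BC%D1%8B%D0%B5_%D0%BE%D0%B1%D0%BE%D0%B7%D0%BD%D0%B0%D1%87%D0%B5%D0%BD%D0%B8%D1%8F
  2. Basic notations, pdf: http://www.machinelearning.ru/wiki/images/c/c2/Strijov2013Notation.pdf
  3. Practical notations: http://www.machinelearning.ru/wiki/images/4/41/NiceNotations.pdf
  4. Notations for Bayesian model selection, pdf: http://www.machinelearning.ru/wiki/images/0/03/ABS_notations.pdf
  5. A LaTeX style file with notations: pdf (http://www.machinelearning.ru/wiki/images/0/0f/M1_Notation.pdf) and .tex with .sty (http://www.machinelearning.ru/wiki/images/6/6d/M1_Notation_source.zip)
  6. Machine learning notation by Shan-Hung Wu: https://nthu-datalab.github.io/ml/slides/Notation.pdf
  7. How to pronounce mathematical notations: https://github.com/vadim-vic/pub/raw/main/m1p/m1p_lect3_pronounce.pdf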

Homework

  1. Use the notes from your LinkReview to write a draft of the Introduction according to the plan above. Prepare the letter I and discuss it with your consultant.
  2. Review the notations listed above. Select the essential notations and terms.
  3. State your problem formally, ending with the \(\arg\min\) statement. Prepare the letter P together with your consultant.
  4. Remember to keep your GitHub repo up to date.

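Resources 2024

  • Slides, part a (Media:m1p_2024_lect3_a.pdf)
  • Slides, part b (Media:m1p_2024_lect3_b.pdf)
  • Video: https://www.youtube.com/watch?v=GSEBi3ttvbk
  • Recommended notations: pdf (http://www.machinelearning.ru/wiki/images/0/0f/M1_Notation.pdf) and .tex with .sty (http://www.machinelearning.ru/wiki/images/6/6d/M1_Notation_source.zip)

Old

  • Slides for week 3: http://www.machinelearning.ru/wiki/images/f/fc/M1p_lect3.pdf; slides 2022: http://www.machinelearning.ru/wiki/images/5/55/M1p2022lect3.pdf
  • Video for week 3: https://www.youtube.com/watch?v=9ATqp5tyTWI
  • Slides with a plan of the Problem statement: http://www.machinelearning.ru/wiki/images/b/b9/Strijov2020ProblStatement.pdf

  1. Watch the slides in Resources.
  2. Request feedback on your project in its current state from consultants and instructors!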