Professor: Filipe A. N. Verri
Email: filipe.verri@gp.ita.br
Updates
July 26, 2025: Classes are confirmed to start on August 4, 2025, in room 209 at ICT Unifesp.
Course Program
Brief history of data science. Fundamental data concepts. Methodologies for data science projects. Structured data, database normalization, and tidy data. Data handling operators and their properties. Learning from data and principles of statistical learning theory. Data preprocessing tasks. Evaluation and validation of data science products.
Course Information
Important: Only graduate students are permitted to enroll in this course.
- Number of students: Approximately 20
- Course load: 3–0–0–4
- Schedule: Mondays, 8:00–11:00, starting August 4, 2025
- Classroom: Unifesp ICT, room 209 (Avenida Cesare Mansueto Giulio Lattes, n° 1201, Eugênio de Mello, São José dos Campos, SP, Brazil)
- Language: All classes will be given in English. Students are encouraged to ask questions in English, but Portuguese is also permitted. All written and oral assignments must be in English.
Prerequisites
- Advanced programming skills
- Strong statistical background
- Machine learning skills
Goals
To provide the theoretical foundation and practical concepts needed to develop an end-to-end data science project for an inductive task.
Teaching Methodology
Expository classes in a regular classroom, using a whiteboard, slide presentations, coding examples, books, and scientific papers. Supplementary didactic materials will be available on this page. The case study, including programming and scientific paper writing, will be developed during home study hours.
Assessment
Grading Components
- T₁, T₂: Individual written tests in the 1st quarter
- T₃: Individual written test in the 2nd quarter
- L: Group activity including:
  - Writing a scientific paper (optional)
  - Developing a data science product
  - 30-minute presentation
Final Grade Calculation
Final grades will be calculated as:
√((T₁ + T₂ + T₃)/3 × L)
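The formula is the geometric mean of the average test grade and the group activity grade L, so a low score in either component pulls the final grade down more than an arithmetic mean would. A minimal sketch of the calculation follows; the function name `final_grade` and the sample scores are illustrative assumptions, and the grading scale is not stated on this page.

```python
import math


def final_grade(t1: float, t2: float, t3: float, group: float) -> float:
    """Return sqrt(((t1 + t2 + t3) / 3) * group).

    t1, t2, t3 are the written test grades and group is the grade L
    of the group activity. The function name and the sample values
    below are hypothetical, chosen only for illustration.
    """
    test_mean = (t1 + t2 + t3) / 3
    return math.sqrt(test_mean * group)


# Test average 9 and group grade 4: sqrt(9 * 4)
print(final_grade(9, 9, 9, 4))  # 6.0
```

Because the two components are multiplied, a zero in either the test average or the group activity yields a final grade of zero.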
Case Study Project
Ideally, 3 groups will be formed. Each group will be responsible for a case study. Students must choose a real-world problem and develop a data science project, including:
- Data collection
- Data handling
- Inductive learning
- Validation
- Documentation
- Deployment
The results must be presented in a 30-minute presentation. Extra points will be awarded to groups that write a scientific paper about the case study. The trained models must be incorporated into a data science product, such as a web application, a mobile application, or a web service.
Bibliography
- Filipe A. N. Verri (2025). Data Science Project: An Inductive Learning Approach. Victoria, British Columbia, Canada: Leanpub. Available at https://leanpub.com/dsp.
- Nina Zumel & John Mount (2019). Practical Data Science with R.
- Hadley Wickham, Mine Çetinkaya-Rundel & Garrett Grolemund (2023). R for Data Science. 2nd edition.
Any required extra material will be made available on this page.
Schedule
1st Quarter
Week | Topics |
---|---|
1 | Chapter 1: A brief history of data science. Review: Mathematical foundations |
2 | Written test (60 min) and Chapter 2: Fundamental concepts |
3 | Chapter 3: Data science project |
4-5 | Chapter 4: Structured data |
6-7 | Chapter 5: Data handling |
8 | Written test (60 min) and Project discussions |
2nd Quarter
Week | Topics |
---|---|
1 | Chapter 6: Learning from data |
2 | Chapter 7: Data preprocessing |
3 | Chapter 8: Solution validation |
4 | Project discussions |
5 | Written test (60 min) and Project discussions |
6-7 | Project discussions |
8 | Presentations |
Presentation Details
At most, 3 case studies will be presented per day, with 30 minutes for each presentation and 20 minutes for questions.
A break of 1 week will be observed between the 1st and 2nd quarters.