Program Overview
Our Data Science program offers comprehensive training in data science, statistics, artificial intelligence/machine learning, and mathematical modeling, including best coding practices. It teaches students both traditional methods, proven valuable over decades, and recent cutting-edge techniques. The program prepares graduates for academic research and practical work as AI engineers and data science practitioners.
Master's in Data Science
Course Schedule 2025
Course | Schedule | Duration | Sessions | Final Exam | ECTS |
---|---|---|---|---|---|
Machine Learning I | Monday | Jan 13 – Mar 17 | 10 Lectures • 10 TA | Mar 24 | 6 |
Methods in Software and Data Engineering | Tuesday | Jan 14 – Mar 18 | 10 Lectures • 10 TA | Mar 25 | 6 |
Fundamentals of Mathematical and Statistical Methods | Wednesday | Jan 15 – Mar 19 | 10 Lectures • 10 TA | Mar 26 | 6 |
Course | Schedule | Duration | Sessions | Final Exam | ECTS |
---|---|---|---|---|---|
Machine Learning II | Monday | Mar 31 – Jun 2 | 10 Lectures • 10 TA | Jun 9 | 6 |
Natural Language Processing & Sequence Data Processing | Tuesday | Apr 1 – Jun 3 | 10 Lectures • 10 TA | Jun 10 | 6 |
Regression Analysis | Wednesday | Apr 2 – Jun 4 | 10 Lectures • 10 TA | Jun 11 | 6 |
Course | Schedule | Duration | Sessions | Final Exam | ECTS |
---|---|---|---|---|---|
Applied Data Science Practicum | Monday | Jun 9 – Aug 11 | 10 Lectures • 10 TA | – | 30 |
Markets, Incentives & Game Theory | Tuesday | Jun 10 – Aug 12 | 10 Lectures • 10 TA | Aug 13 | 6 |
Course | Schedule | Duration | Sessions | Final Exam | ECTS |
---|---|---|---|---|---|
Generative Models | Monday | Sep 8 – Nov 10 | 10 Lectures • 10 TA | Nov 17 | 6 |
Maximum Likelihood Estimation | Tuesday | Sep 9 – Nov 11 | 10 Lectures • 10 TA | Nov 18 | 6 |
Time Series Analysis: Applications to Economics and Finance | Wednesday | Sep 10 – Nov 12 | 10 Lectures • 10 TA | Nov 19 | 6 |
Required: Tier 1 Courses
Methods in Software and Data Engineering
This course focuses on integrating methods in software and data engineering to provide a comprehensive understanding of data analysis. Students will learn to employ both Python and R to conduct thorough exploratory data analyses, enhancing their practical skills in data handling and visualization.
Technical Summary
This course focuses on integrating methods in software and data engineering to provide a comprehensive understanding of data analysis. Students will learn to employ both Python and R to conduct thorough exploratory data analyses, enhancing their practical skills in data handling and visualization. The course also emphasizes good software practices, ensuring participants can manage data effectively and maintain robust analysis processes. It is designed to equip learners with the essential tools and techniques for insightful data exploration and effective decision-making in real-world applications.
Fundamentals of Mathematical and Statistical Methods
This course is tailored to cover core concepts in probability theory and foundational statistics. It introduces students to the principles and mathematical tools necessary for understanding variability, uncertainty, and decision-making processes under conditions of uncertainty.
Technical Summary
This course is tailored to cover core concepts in probability theory and foundational statistics. It introduces students to the principles and mathematical tools necessary for understanding variability, uncertainty, and decision-making processes under conditions of uncertainty. Emphasizing theoretical understanding, the course prepares learners to grasp more complex topics in future studies.
Regression Analysis
This advanced course covers regression analysis topics in depth. Students will engage with a range of important regression methods, starting with basic linear regression and progressing through multiple regression techniques, including instrumental variable estimation.
Technical Summary
This advanced course covers regression analysis topics in depth. Regression analysis refers to a class of methods for predicting continuous variables based on observed data. The prediction itself may not be the ultimate goal. Instead, it may serve as a tool for understanding deeper insights regarding the nature of the phenomena responsible for generating the data.
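As a rough illustration of the basic idea (a minimal sketch, not course material; the function name `fit_line` and the data are hypothetical), ordinary least squares with a single predictor can be written as:

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for the model y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope = co-variation of x and y divided by the variation of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: points lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_line(xs, ys)
```

The fitted slope estimates how much the outcome changes per unit change in the predictor, which is often the quantity of interest rather than the prediction itself.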
Required: Tier 2 Courses
Maximum Likelihood Estimation
This course focuses on the application of maximum likelihood techniques in statistical modeling. Maximum likelihood estimation fits statistical models for both discrete and continuous outcomes by choosing the parameter values under which the observed data are most probable. The ultimate goal of the investigation may be either the prediction itself, or gaining deeper insights into mechanisms behind the data.
Technical Summary
This course focuses on the application of maximum likelihood techniques in statistical modeling. The course introduces maximum likelihood estimation and its general properties. Then it explores discrete-outcome models such as logit, probit, and Poisson regression. It equips students with the skills needed to apply these methods effectively in advanced statistical analysis.
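To make the idea concrete, here is a minimal sketch of fitting a logit model by maximum likelihood. It assumes a single predictor, made-up data, and plain gradient ascent with hand-picked settings; the names `fit_logit` and `log_likelihood` are hypothetical, and statistical software typically uses faster optimizers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0, b1, xs, ys):
    """Bernoulli log-likelihood of a logit model with P(y=1 | x) = sigmoid(b0 + b1*x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def fit_logit(xs, ys, lr=0.1, steps=20000):
    """Maximise the log-likelihood by plain gradient ascent (illustrative settings)."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the average log-likelihood: residual (y - p) times each input.
        residuals = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        g0 = sum(residuals) / n
        g1 = sum(r * x for r, x in zip(residuals, xs)) / n
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1


# Hypothetical, deliberately non-separable data.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logit(xs, ys)
```

The fitted slope `b1` is positive here because larger values of `x` go with more frequent outcomes of 1 in the toy data.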
Time Series Analysis: Applications to Economics and Finance
This course thoroughly explores advanced time series models tailored for economic and financial data analysis. The course carefully introduces the details and nuances needed for a proper understanding of single-variable and multivariate time series data. It provides insights into different kinds of time dependence, including seasonality. The models covered by the course can be used for forecasting or for a deeper understanding of the processes responsible for generating the data.
Technical Summary
This course thoroughly explores advanced time series models tailored for economic and financial data analysis. The course covers a wide range of topics starting with basic components such as AR (autoregressive) and MA (moving average) models, progressing to more complex structures like ARMA, ARIMA, and their seasonal counterparts SARIMA and SARIMAX. The course then proceeds to multivariate models for time series forecasting and analysis.
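The simplest of these building blocks, an AR(1) model, can be sketched in a few lines. This is only an illustration under stated assumptions: Gaussian noise, no intercept, a least-squares estimator, and hypothetical function names `simulate_ar1` and `estimate_ar1`.

```python
import random


def simulate_ar1(phi, n, sigma=1.0, seed=0):
    """Simulate x_t = phi * x_{t-1} + e_t with Gaussian noise e_t."""
    rng = random.Random(seed)
    xs = [0.0]
    for _ in range(n - 1):
        xs.append(phi * xs[-1] + rng.gauss(0.0, sigma))
    return xs


def estimate_ar1(xs):
    """Least-squares estimate of phi: regress x_t on x_{t-1} without an intercept."""
    num = sum(prev * cur for prev, cur in zip(xs[:-1], xs[1:]))
    den = sum(prev * prev for prev in xs[:-1])
    return num / den


series = simulate_ar1(phi=0.7, n=5000)
phi_hat = estimate_ar1(series)  # should land close to the true value 0.7
```

With a long simulated series the estimate recovers the coefficient used to generate the data, which is the basic logic behind fitting the richer ARMA-type models as well.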
Machine Learning I
This course provides a detailed introduction to machine learning and its intersection with statistical methods, focusing on foundational concepts, advanced techniques, and practical applications across various domains. Students will explore a range of machine learning paradigms that work with both labelled and unlabelled data. They will also gain a detailed understanding of model evaluation techniques and performance metrics. The course addresses complexities in model building and shows how to make a model capture meaningful patterns in the data without fitting features of the data that are present only by random chance. The methods discussed include generalized linear models and decision tree methods, including gradient boosted trees. The course also introduces students to the fundamentals of deep learning, including various neural network architectures and their optimization.
Technical Summary
This course provides a detailed introduction to machine learning and its intersection with statistical methods, focusing on foundational concepts, advanced techniques, and practical applications across various domains. Students will explore a range of machine learning paradigms such as supervised, unsupervised, and self-supervised learning, and gain a detailed understanding of model evaluation techniques and performance metrics. Further, the course addresses complexities in model building, discussing regularization, underfitting, and overfitting. A thorough examination of generalized linear models and decision tree methods, including gradient boosted trees, provides a robust statistical framework. The course also introduces students to the fundamentals of deep learning, including various neural network architectures and their optimization.
Machine Learning II
This course explores important applied areas of machine learning. For computer vision, the course provides a detailed understanding of two main types of models: convolutional neural networks and transformers. Convolutional neural networks are the traditional neural networks used for image processing. Transformer neural networks are more recent and utilize the so-called attention mechanism, which retrieves information from other parts of the data as needed. The best-performing models often combine both of these neural network types. Through various visualization techniques, students will see firsthand the operation of both convolutions and attention mechanisms. Further, the course covers techniques that leverage unlabelled data, such as contrastive learning and masked autoencoders. The course goes beyond computer vision, exploring, for example, graph neural networks, which are useful in many domains.
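The way attention "retrieves information from other parts of the data" can be sketched for a single query. This is a minimal illustration of scaled dot-product attention with hypothetical toy vectors; real transformers add learned projections, multiple heads, and batching.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    # Score each position by how well its key matches the query.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # The output is the weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return output, weights


# Hypothetical toy data: three positions, each with a key and a value.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
output, weights = attention([4.0, 0.0], keys, values)
```

Because the query points in the same direction as the first key, most of the weight goes to the first position, and the output is dominated by its value vector.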
Technical Summary
This course explores important applied areas of machine learning. For computer vision, the course provides a detailed understanding of convolutional neural networks and relevant transformer architectures, which utilize the attention mechanism. Through various visualization techniques, students will see firsthand the operation of both convolutions and attention mechanisms. Further, the course covers techniques that leverage unlabelled data, such as contrastive learning and masked autoencoders. The course goes beyond computer vision, exploring, for example, graph neural networks, which are useful in many domains.
Natural Language Processing & Sequence Data Processing
This course offers an introduction to natural language processing (NLP) and a detailed examination of large language models (LLMs), focusing on their design, training strategies, and practical applications, as well as the technological environments they inhabit. It begins with foundational NLP techniques, emphasizing the role of word vectors and embeddings. (A word vector is simply a list of numbers that capture characteristics of the word related to its meaning.) The course then delves into the core aspects of LLMs, covering both models that predict one word (token) at a time and models that produce all of their output at once. Most of the models discussed in the course rely on the transformer architecture.
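A small sketch shows how word vectors let a program compare meanings. The three-dimensional vectors below are invented for illustration; real embeddings are learned from text and typically have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1 means same direction, 0 means unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical word vectors, made up for this example.
vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.8, 0.9, 0.1],
    "apple": [0.1, 0.0, 0.9],
}

royal = cosine_similarity(vectors["king"], vectors["queen"])
fruit = cosine_similarity(vectors["king"], vectors["apple"])
```

In this toy setup `royal` is much larger than `fruit`, mirroring the intuition that words with related meanings have vectors pointing in similar directions.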
The curriculum further explores practical aspects of training and customizing large language models. Various useful customization techniques for open-source LLMs are examined, such as ordinary fine-tuning (retraining the whole model) and low-rank adaptation (attaching a small trainable module to a fixed original model). Additionally, the course highlights the practical uses of LLMs in advanced areas like retrieval-augmented generation (letting the model access information sources at generation time) and the application of similarity search in large-scale settings.
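The second customization technique above, a small trainable module attached to a frozen model, can be sketched for a single weight matrix. This is a simplified illustration: the class name `LowRankAdapter` is hypothetical, the deterministic initial values of `a` stand in for the random initialization used in practice, and no training loop is shown.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class LowRankAdapter:
    """Frozen weight matrix W plus a trainable low-rank update B @ A."""

    def __init__(self, weight, rank):
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen: never updated during customization
        # Trainable factors. B starts at zero, so the adapter initially changes nothing.
        # The values in A are placeholders for a random initialization.
        self.a = [[0.01 * (i + j + 1) for j in range(d_in)] for i in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.b, matvec(self.a, x))
        return [p + q for p, q in zip(base, update)]

    def trainable_parameters(self):
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])


d = 16
identity = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
layer = LowRankAdapter(identity, rank=2)
x = [float(i + 1) for i in range(d)]
initial_output = layer.forward(x)
full_parameters = d * d
adapter_parameters = layer.trainable_parameters()
```

In this example the adapter trains 64 numbers instead of the 256 in the full matrix, and the gap widens quickly as the matrix grows.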
Technical Summary
This course offers an introduction to natural language processing (NLP) and a detailed examination of large language models (LLMs), focusing on their design, training strategies, and practical applications, as well as the technological environments they inhabit. It begins with foundational NLP techniques, emphasizing the role of word vectors and embeddings. The course then delves into the core aspects of LLMs, covering both autoregressive models like GPT and non-autoregressive models such as denoising autoencoders. Students will understand in depth the original transformer architecture, learning about essential elements like multihead attention and positional encoding.
The curriculum further explores the practical aspects of training and customizing large language models, including generative pre-training, fine-tuning, reinforcement learning from human feedback, and prompt engineering. Various customization techniques for open-source LLMs are examined, such as ordinary fine-tuning and low-rank adaptation. Additionally, the course highlights the practical uses of LLMs in advanced areas like retrieval-augmented generation, context filtering for retrieval tasks, and the application of similarity search libraries in large-scale settings.
Generative Models
This course provides an in-depth exploration of generative models for image and video data. These are artificial intelligence systems capable of creating images or videos with specified content. The models are composed of statistical variables. Their training builds on traditional statistical methods, such as Bayesian inference. Their functioning is also often related to physical phenomena. The course covers generative models such as variational autoencoders, diffusion models, and generative adversarial networks. Generative models of this kind are useful not just for generation itself, but also for capturing deeper meaning behind the data. They are capable of extracting the most important aspects of the data, which is useful for many purposes, such as visualization, data understanding, and data compression.
Technical Summary
This course provides an in-depth exploration of generative models for image and video data, emphasizing variational autoencoders (VAEs) and diffusion models, and also explaining other techniques, such as generative adversarial networks. The course starts with the original, basic variational autoencoder, carefully explaining the intuition behind its components and the evidence lower bound (ELBO), which is used to train it. The course also explores various hierarchical and/or vector-quantized variational autoencoders. For the diffusion model topic, the course explains the physics-related intuition behind the models and proceeds to denoising diffusion probabilistic models. It then discusses more advanced techniques, such as classifier-free diffusion guidance, progressive distillation, and denoising diffusion implicit models. The methods are appropriately extended to image-sequence or video settings.
Markets, Incentives & Game Theory
This course covers the essentials of economics and game theory. It introduces the game-theoretic perspective on the behavior of humans and other agents. After exploring interactions among a small number of agents, the course provides a comprehensive introduction to markets with different market structures. The course then covers applications to designing socially optimal policies, various microeconomic and macroeconomic issues, and financial markets.
Technical Summary
This course covers the essentials of economics and game theory. It introduces the game-theoretic perspective on the behavior of humans and other agents. After exploring interactions among a small number of agents, the course provides a comprehensive introduction to markets with different market structures. The course then covers applications to designing socially optimal policies, various microeconomic and macroeconomic issues, and financial markets.
Required: Tier 3 (Capstone)
Applied Data Science Practicum
The Applied Data Science Practicum serves as the capstone experience for the MS in Data Science. Students entering the practicum have completed all other coursework. This course provides an opportunity for students to apply their cumulative knowledge and skills in a real-world setting.
Apply
We are accepting applications on a rolling basis for our Master’s Program in Data Science.