Masters Programs
Biostatistics and Data Science

    Curriculum

    Students must complete at least 36 credits to graduate; this can be accomplished within 16 months of full-time study, starting and ending with a Fall term.

    To complete the program within 16 months, we recommend that students follow the schedule below. The Education Team will help you monitor your progression, but it is ultimately your responsibility to ensure you meet graduation requirements. 

    Course offerings and course availability are subject to change, but we will ensure that changes do not extend the program timeline. For each term, the required courses are listed before the electives.

    Fall 1 Term

    The five required courses make up 12 credits, with a choice of four elective courses worth three credits each.

    This course provides an introduction to important topics in biostatistical concepts and reasoning. Specific topics include tools for describing central tendency and variability in data, probability distributions, sampling distributions, estimation, and hypothesis testing. Assignments will involve computation using the R programming language.

    The course will describe and apply measures of disease incidence and prevalence, and measures of effect; explain the basic principles underlying different study designs (including descriptive, ecological, cross-sectional, cohort, case-control and intervention studies); assess strengths and limitations of different study designs; identify problems interpreting epidemiological data (chance, bias, confounding and effect modification); address validity, intra-rater reliability and inter-rater reliability.

    The course will describe methods for categorical data analysis and basic concepts for censored data, including the Kaplan-Meier estimator; students will learn how to select appropriate methods and how to interpret the results of categorical data analysis and Kaplan-Meier analysis.

    This course provides an introduction to data science using both the R and Python programming languages. In this course students will gain experience working directly with data to pose and answer questions. The course will be divided into two parts: the first part will be taught in R and the second in Python. Topics covered include reproducible research, exploratory data analysis, data manipulation, data visualization techniques, simulation design, and unsupervised learning methods.

    This is the capstone course of all master's-level graduate education programs. Its two aims are to: (1) help students discover and develop effective ways of managing and working together with all the stakeholders within the healthcare field, and (2) accelerate a student's development of the context awareness, integrative management, and industry skills that are needed to lead in a rapidly changing healthcare sector. This capstone course puts students in a new organization, one they don’t already know well, and gives them the chance to practice hitting the ground running. This culminating course provides a deeper preparation for the next stages of a student's career. The capstone project will last the entire duration of the MS program: the first term involves matching students with the right project, the second term has students working with their client, and the third term consists of a detailed report and final presentation in front of the client as well as faculty and fellow classmates.

    This course provides an introduction to the statistical software SAS. Students will receive hands-on training in data management and report generation with one of the most popular statistical software packages.

    This class teaches statistical methods that are particular to pharmaceutical studies and not routinely taught in conventional biostatistics courses. The first module, “Statistical Aspects of Phase I Clinical Trials,” will include the 3+3 design, accelerated titration, up-and-down designs, the continual reassessment method (CRM), modified CRM, TITE-CRM, the Bayesian logistic regression model (BLRM), escalation with overdose control (EWOC), the toxicity probability interval (TPI), and the modified TPI (mTPI). The second module, “Statistical Aspects of Phase II Clinical Trials,” will include design and analysis for one-stage designs, Simon’s two-stage designs, and multi-arm Phase II designs. The third and last module, “Statistical Aspects of Phase III Clinical Trials,” will include randomization and the design and analysis of parallel, crossover, factorial, seamless Phase II/III, adaptive, and SMART designs.

    This course provides an introduction to the fundamentals of Python programming and the basics of data analysis and scientific computing techniques with Python. The course will teach basic programming components, including data structures, control flow, functions, and classes; data processing with the numpy and pandas libraries; data visualization with the matplotlib and seaborn libraries; basic statistical analysis with the scipy library; and the Monte Carlo method (including random number generation, simulation, and numerical integration) and numerical optimization, along with how they are applied in biostatistical and data science practice. This course will emphasize hands-on programming as well as the theory and methodology of computing techniques. Every week there will be a 1.5-hour lecture followed by a 1.5-hour in-class programming exercise (with some question-answering problems). The exercise is intensive and is not expected to be finished in class; it serves as the weekly assignment, which must be submitted before the following lecture. There will be a mid-term exam and a final group project (evaluated by report and presentation).

    The goal of this course is to introduce a core set of modern statistical concepts and techniques to the students, and to demonstrate how to use them to answer complex research questions in healthcare. The students will acquire knowledge on causal inference methods using machine learning, including directed acyclic graphs, non-parametric structural equation models, inverse probability weighting, g-computation, survival analysis, marginal structural models, longitudinal data, mediation analyses, effect modification, and precision medicine. This course will use the free software R to perform all statistical analysis.

    Spring 1 Term

    The typical course load is 12 or 15 credits (four or five courses, of which two are required). 

    The focus of this course is the theory and application of different types of regression analysis. Topics will include linear regression, logistic regression, and Cox proportional hazards regression. Additional topics will include coding of explanatory variables, residual diagnostics, model selection techniques, random effects and mixed models, and maximum likelihood estimation. Homework assignments will involve computations using the R statistical package.

    This course will introduce a combination of necessary skills for statistical consulting through lectures and small workshops. Guest lectures will be given by faculty from WMC, MSKCC, and other institutions on a variety of topics related to statistical consulting and collaboration. Relevant workshops will provide in-class, hands-on experience.

    The goal of this course is to introduce a core set of modern statistical concepts and techniques to the students, and to demonstrate how to use them to answer complex research questions in healthcare. The students will acquire knowledge on causal inference methods using machine learning, including directed acyclic graphs, non-parametric structural equation models, inverse probability weighting, g-computation, survival analysis, marginal structural models, longitudinal data, mediation analyses, effect modification, and precision medicine. This course will use the free software R to perform all statistical analysis. 

    There has been an explosion of big data in medicine and healthcare. There are four main sources of such big data:

    1. administrative databases in healthcare, such as electronic health records and health insurance claims;
    2. biomedical imaging (e.g., MRI, CT scan, X-ray);
    3. sensors in smartphones and in wearable and implantable devices; and
    4. genetics and genomics.

    It is difficult to navigate and critically assess the statistical methods and analytic tools that are needed to conduct analytics and research with such big biomedical data. This course will introduce the four above-mentioned sources of big data in medical studies, discuss the nuances and intricacies of how such data are generated, and introduce tools to navigate, visualize, and analyze such data.

    This course teaches tools that students will need to create, manage and maximize value from big databases. The emphasis is on design and implementation of relational databases and the use of Structured Query Language (SQL). At the end of this course, students will be able to explain the requirements for handling large and complex datasets; be able to design, build, and query a relational database; and understand how relational databases and big-data targeted tools complement one another.

    Summer 1 Term

    The typical course load is 3 credits.  

    This is the culminating capstone course of all master's-level graduate education programs. It has two aims:

    1. helping students discover and develop new and effective ways of managing and working together with all the stakeholders within the healthcare field, and
    2. helping accelerate a student's development of the context awareness, integrative management, and industry skills that are needed to lead in a rapidly changing healthcare sector.

    This capstone course puts students in a new organization, one they don’t already know well, and gives them the chance to practice hitting the ground running. This culminating course provides a deeper preparation for the next stages of a student's career. The capstone project will last the entire year: the first term involves matching students with the right project, the second term has students working with their client, and the third term consists of a detailed report and final presentation in front of the client as well as faculty and fellow classmates.

    Fall 2 Term

    The typical course load is 6 or 9 credits. 

    An independent biostatistician often encounters data collected on patients over a length of time, or data that are otherwise clustered. This course will give students the necessary tools to analyze such data, while building on the core biostatistics material they have learned in other courses. Specifically, students will learn to use mixed-effects models, mixed-effects ANOVA, generalized linear mixed models (GLMM), mixed-effects Cox regression, Bayesian hierarchical models, and repeated-measures and longitudinal data analysis with appropriate covariance structures.

    The course starts with logistic regression and discriminant analysis, with an emphasis on classification and prediction. It then moves on to more advanced topics such as regularized regression, resampling methods, tree-based methods, and support vector machines.

    This course introduces students to the fundamentals of health services research, which evaluates interventions designed to improve healthcare. These interventions can include changes to the organization, delivery, and financing of healthcare, as well as various healthcare policies. Common outcome measures in health services research include (but are not limited to) patient safety, healthcare quality, healthcare utilization, and cost. Specific topics to be covered in this course include: refining your research question, identifying common research designs and their strengths and weaknesses, minimizing bias and confounding, selecting data sources, optimizing measurement, and more. There will also be a component of the course that explores how to present your ideas and iteratively refine your work, based on feedback from peers and reviewers. This course includes both lectures and interactive group discussions. Students will be able to apply the methods learned in this course to their master’s research projects.

    This course provides an introduction to the statistical software SAS. Students will receive hands-on training in data management and report generation with one of the most popular statistical software packages.

    This class teaches statistical methods that are particular to pharmaceutical studies and not routinely taught in conventional biostatistics courses. The first module, “Statistical Aspects of Phase I Clinical Trials,” will include the 3+3 design, accelerated titration, up-and-down designs, the continual reassessment method (CRM), modified CRM, TITE-CRM, the Bayesian logistic regression model (BLRM), escalation with overdose control (EWOC), the toxicity probability interval (TPI), and the modified TPI (mTPI). The second module, “Statistical Aspects of Phase II Clinical Trials,” will include design and analysis for one-stage designs, Simon’s two-stage designs, and multi-arm Phase II designs. The third and last module, “Statistical Aspects of Phase III Clinical Trials,” will include randomization and the design and analysis of parallel, crossover, factorial, seamless Phase II/III, adaptive, and SMART designs.

    This course provides an introduction to the fundamentals of Python programming and the basics of data analysis and scientific computing techniques with Python. The course will teach basic programming components, including data structures, control flow, functions, and classes; data processing with the numpy and pandas libraries; data visualization with the matplotlib and seaborn libraries; basic statistical analysis with the scipy library; and the Monte Carlo method (including random number generation, simulation, and numerical integration) and numerical optimization, along with how they are applied in biostatistical and data science practice. This course will emphasize hands-on programming as well as the theory and methodology of computing techniques. Every week there will be a 1.5-hour lecture followed by a 1.5-hour in-class programming exercise (with some question-answering problems). The exercise is intensive and is not expected to be finished in class; it serves as the weekly assignment, which must be submitted before the following lecture. There will be a mid-term exam and a final group project (evaluated by report and presentation).

    The goal of this course is to introduce a core set of modern statistical concepts and techniques to the students, and to demonstrate how to use them to answer complex research questions in healthcare. The students will acquire knowledge on causal inference methods using machine learning, including directed acyclic graphs, non-parametric structural equation models, inverse probability weighting, g-computation, survival analysis, marginal structural models, longitudinal data, mediation analyses, effect modification, and precision medicine. This course will use the free software R to perform all statistical analysis.

    Master's Project

    The Master's Project is required for master's-level graduate students in the Department of Population Health Sciences. The purpose of the Master's Project is to:

    1. help students discover and develop new and effective ways of working with stakeholders within the healthcare field, and
    2. accelerate students’ development of the context awareness, integrative management, and industry skills that are needed to be successful in a rapidly changing healthcare sector.

    The project provides a deeper foundation for the next stages of a student's career.

    The Master's Project consists of the following courses, which are taken in sequence: 

    • Master's Project I (2 credits) involves a professional development course and matching students with a capstone project.
    • Master's Project II (3 credits) introduces a combination of necessary skills for statistical consulting through lectures and small workshops. Guest lectures will be given by faculty from WMC, MSKCC, and other institutions on a variety of topics related to statistical consulting and collaboration. Relevant workshops will provide in-class, hands-on experience.
    • Master's Project III (3 credits) consists of completing final deliverables, including an abstract, final poster, final paper, and final poster presentation in front of faculty, staff, and students.