Carnegie Mellon University

Statistics/Public Policy Joint Ph.D. Degree

Elevate your career with our innovative Ph.D. in Statistics and Public Policy, offered in collaboration with Heinz College.

Over five years, you’ll gain expertise in both Statistics and Public Policy through a curriculum that merges coursework from each field. You will work closely with faculty and have advisors from both disciplines dedicated to your success. You can also split your teaching assistantship between the two departments, rounding out your training.

The Path to the Ph.D.

Below is one possible three-year plan of study for students to complete the coursework requirements. Adjustments can be made for cases where students need to build additional background in a particular area.

Students in this program are subject to all of the core Ph.D. requirements.

The actual curriculum for any given student will be tailored to her or his interests and needs, but the general strategy is similar: to meld the two sets of Ph.D. requirements into a coherent and useful set of courses, with similar core items. The first four semesters cover the main courses for the Ph.D. in Statistics while simultaneously introducing the student to the core disciplines at Heinz College. In the fourth semester, students begin work on the second Heinz research paper, which also satisfies the Advanced Data Analysis (ADA) requirement in Statistics.

Year 1

Fall

  • 36-699: Immigration to Statistics
  • 36-707: Regression Analysis
  • 36-705: Intermediate Statistics
  • 90-908: Microeconomics
  • 90-901: Heinz Ph.D. Seminar I

Spring

  • 36-757: Advanced Data Analysis I
  • 36-709: Advanced Statistical Theory I
  • 36-708: Statistical Methods in Machine Learning
  • 90-902: Heinz Ph.D. Seminar II

Year 2

Fall

  • 36-758: Advanced Data Analysis II
  • 36-750: Statistical Computing
  • 90-907: Econometric Theory and Methods
  • 90-918: Heinz Ph.D. Seminar III

Spring

  • Complete 1st Heinz/ADA paper
  • Heinz curriculum requirements elective
  • Start 2nd Heinz paper

Year 3

  • Complete 2nd Heinz/ADA paper
  • Begin work toward thesis proposal

Course Descriptions

Year 1 - Fall

Students are introduced to the faculty and their interests, the field of statistics, and the facilities at Carnegie Mellon. Each faculty member gives at least one elementary lecture on some topic of his or her choice. In the past, topics have included: the field of statistics and its history, large-scale sample surveys, survival analysis, subjective probability, time series, robustness, multivariate analysis, psychiatric statistics, experimental design, consulting, decision-making, probability models, statistics and the law, and comparative inference. Students are also given information about the libraries at Carnegie Mellon and current bibliographic tools. In addition, students are instructed in the use of the Departmental and University computational facilities and available statistical program packages.

This course covers the fundamentals of theoretical statistics. Topics include: probability inequalities, point and interval estimation, minimax theory, hypothesis testing, data reduction, convergence concepts, Bayesian inference, nonparametric statistics, bootstrap resampling, VC dimension, prediction and model selection.

This course covers the basic principles of causality. Foundations of linear regression, including theory, computation, diagnostics, and generalized linear models. Extensions to nonparametric regression, including splines, kernel regression, and generalized additive models. Discussion of tools to compare statistical models, including hypothesis tests, cross-validation, and bootstrapping. Topics in nonparametric regression and machine learning as time permits, such as regression trees, boosting, and random forests. Emphasis on writing data analysis reports that answer substantive scientific questions with appropriate statistical tools. Students will be equipped with the tools needed to explore a substantive scientific question with data, translate scientific questions into statistical questions, compare different modeling approaches rigorously, and write their results in a clear manner.

This course provides a semester-long introduction to microeconomic analysis and its application. The primary objective of the course is to familiarize students with the microeconomic paradigm and develop an appreciation of the usefulness (and limitations) of microeconomic analysis. A further goal of the course is to develop and exercise students' ability to use economic analysis in examining applied issues and more generally to help students acquire formal modeling skills -- the ability to reduce real-world problems to useful and mathematically tractable representations. There are no formal prerequisites for this course. However, students are assumed to have a solid working knowledge of multivariate calculus.

This seminar is intended for all first-year Ph.D. students in the Heinz College to introduce the research process and bring first-year students into the intellectual community at Heinz. The seminar includes presentations by Heinz College faculty, student presentations of existing research, and presentations from second-year Heinz College students. Through these experiences, students will:

  • Become familiar with the kinds of research being done at Heinz
  • Find faculty and students with similar research interests
  • Understand the unseen steps (and missteps) along the way to a published paper
  • Experience presenting technical research to a diverse audience
  • Learn to provide helpful feedback on research presentations

Year 1 - Spring

This course focuses on statistical methods for machine learning, a decades-old topic in statistics that now has a life of its own, intersecting with many other fields. While the core focus of this course is methodology (algorithms), the course will have some amount of formalization and rigor (theory/derivation/proof), and some amount of interacting with data (simulated and real). However, the primary way in which this course complements related courses in other departments is the joint ABCDE focus on (A) Algorithm design principles, (B) Bias-variance thinking, (C) Computational considerations, (D) Data analysis, and (E) Explainability and interpretability.

This is a core Ph.D. course in theoretical statistics. The class will cover a selection of modern topics in mathematical statistics, focussing on high-dimensional parametric models and non-parametric models. The main goal of the course is to provide the students with adequate theoretical background and mathematical tools to read and understand the current statistical literature on high-dimensional models. Topics will include: concentration inequalities, covariance estimation, principal component analysis, penalized linear regression, maximal inequalities for empirical processes, Rademacher and Gaussian complexities, non-parametric regression and minimax theory. This will be the first part of a two-semester sequence.

Advanced Data Analysis (ADA) is a Ph.D.-level seminar on advanced methods in statistics, including computationally intensive smoothing, classification, variable selection and simulation techniques. During 36-757, you work with the seminar instructor to identify an ADA project for yourself. The ADA project is an extended project in applied statistics, done in collaboration with an investigator from outside the Department, under the guidance of a faculty committee, culminating in a publishable paper that is presented orally and in writing in 36-758.

By now, students should have picked a broader field in which they plan to write their first paper. The seminar requires students to first discuss some key findings of that field, setting the stage for a discussion of their chosen research question and how they plan to pursue it.

Year 2 - Fall

A detailed introduction to elements of computing relating to statistical modeling, targeted to Ph.D. students and master's students in Statistics & Data Science. Topics include important data structures and algorithms; numerical methods; databases; parallelism and concurrency; and coding practices, program design, and testing. Multiple programming languages will be supported (e.g., C, R, Python). Those with no previous programming experience are welcome but will be required to learn the basics of at least one language via self-study.

Advanced Data Analysis (ADA) is a Ph.D.-level seminar on advanced methods in statistics, including computationally intensive smoothing, classification, variable selection and simulation techniques. During 36-757, you work with the seminar instructor to identify an ADA project for yourself. The ADA project is an extended project in applied statistics, done in collaboration with an investigator from outside the Department, under the guidance of a faculty committee, culminating in a publishable paper that is presented orally and in writing in 36-758.

This course covers a number of econometric models and techniques that are commonly used in applied microeconomics. The core topics include a general framework for estimators (which includes maximum likelihood and generalized method of moments), discrete outcome models, sample selection (and related limited dependent variable or switching models), duration and count models, time series models, panel data models, variance estimation (including clustering and the bootstrap), and non-parametric techniques. The course is designed for Ph.D. students who have completed 90-906 (Ph.D. Econometrics I) or an equivalent course.

Students will work with their faculty advisor(s) to develop a draft of their First Research Paper and present that paper to the Ph.D. Seminar I for feedback on its quality.