Mattias Villani

Natural Born Bayesian

Theory of Statistics I

News

  • Added a link to interactive spreadsheets for 22 distributions.
  • Added a book on statistical distributions that I compiled from Wikipedia articles; see Extra material. Be careful: Wikipedia articles are not always correct.
  • The first exam and a suggested solution are now posted under Exams.
  • Solutions to Computer Lab 2 are now posted. The solutions are in R code (CompLab2.R) and can be found in the schedule below.
  • Three past exams are now posted under the section Exams. Two of them come with solutions, the third one without. Note that I did not write these exams. My exams will resemble them, but will place somewhat different emphasis on the various topics. For one thing, I will probably place less emphasis on hypothesis testing and confidence intervals than these exams do. Also, my course includes Bayesian methods, which were not part of the course for these past exams.
  • I have added another extra exercise session, on Wednesday, October 19 at 1 pm.
  • I have added a link to information about the exam (time and registration dates); see Exams.
  • The final section of CodeSnippets.R (at the time of writing; I add to this file continuously) finds the MLEs of the parameters of a Beta distribution using numerical optimization.
  • Computer lab 2 is out, and so are my solutions (in R code) for Computer lab 1. See the Schedule below.
  • Change of lecture time: the lecture on October 10 has been moved to 10-12 on the same day, in a new location; see the schedule.
  • The student’s solutions manual for the course book can now be bought from online bookstores, for example Bokus.se.
  • The links to the Excel files for updating the Bernoulli and Normal models were broken. I have fixed that below, but I would suggest that you use the prettier Google Docs versions, if possible.
  • At Exercise 1 on Tuesday, September 27, I will solve the problems listed in the schedule below, as well as some of the other problems listed earlier in the course.
  • I have added a Google Docs spreadsheet for learning about Bayesian inference in the Poisson model.
  • The problems for the first Computer Lab are posted below in the Schedule.
  • I have to attend a meeting on Thursday afternoon (Sept 22), so Lecture 9 has been moved to 10-12 the same day, in room R6.
  • I have added two time slots to the schedule for exercises: E1 on September 27 and E2 on October 4.
  • The file CodeSnippets.R, below in the Code section, contains simple R code for some of the illustrations that I produce during the course. I will update and add code snippets to this file as the course progresses.
  • Theory of Statistics I: the extra lecture will be on Tuesday, September 20, in room S37. The schedule on the web page and in TimeEdit has been updated.
  • R is a great free, open-source, easy-to-use programming language for statistical computations. There are thousands of packages with statistical routines for almost any imaginable field of statistics. To get started:
    1. Download and install R from http://ftp.sunet.se/pub/lang/CRAN/
    2. Download and install RStudio from www.rstudio.org. RStudio is a complete environment for R.
    3. Read the intro to R: http://cran.r-project.org/doc/manuals/R-intro.pdf
    4. Start writing code!
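The Poisson spreadsheet mentioned in the news is built on Bayesian updating. The minimal Python sketch below is not the spreadsheet's own formulas. It assumes a conjugate Gamma(α, β) prior on the Poisson rate θ in the shape-rate parameterization (prior mean α/β). Under that prior, observing counts x1, ..., xn gives the posterior Gamma(α + Σxi, β + n). The function names and the data are hypothetical.

```python
def poisson_gamma_update(alpha, beta, counts):
    """Posterior Gamma(shape, rate) for a Poisson rate under a Gamma(alpha, beta) prior.

    Assumes the shape-rate parameterization, so the prior mean is alpha / beta.
    """
    shape = alpha + sum(counts)  # add the total number of events
    rate = beta + len(counts)    # add the number of observation periods
    return shape, rate


def gamma_mean_var(shape, rate):
    """Mean and variance of a Gamma(shape, rate) distribution."""
    return shape / rate, shape / rate**2


# Hypothetical data: number of events in five periods
counts = [3, 5, 2, 4, 6]
shape, rate = poisson_gamma_update(alpha=2.0, beta=1.0, counts=counts)
post_mean, post_var = gamma_mean_var(shape, rate)
print(shape, rate)          # 22.0 6.0
print(round(post_mean, 4))  # 3.6667
```

The posterior mean (α + Σxi)/(β + n) is a weighted average of the prior mean α/β and the sample mean. The weight on the sample mean is n/(β + n), so the data dominate as n grows.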
Aim

The aim of the course is to introduce statistical concepts and principles in enough detail to make it possible to perform statistical analyses in situations where standard textbook formulas do not apply. This requires a deeper and more mathematical understanding of probability and statistical inference. The focus will be on the parts of the theory that are most useful for practical statistical work.

Intended audience

This course is given mainly for students in the third year of the Bachelor’s programme Statistik och Dataanalys. It is also offered to students on the Master’s programme Statistics and Data Mining who have had little previous exposure to probability theory and statistical inference beyond the basic level.

Outline

The first half of the course is about probability theory. This part focuses on univariate and multivariate random variables and their distributions. Conditional distributions and distributions of functions of random variables are also treated in detail, as are the law of large numbers, central limit theorems and simulation methods.
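The central limit theorem and simulation methods can be illustrated together. This Python sketch assumes Exponential(1) data, though any distribution with finite variance would do. It draws many samples of size n = 50, standardizes each sample mean, and checks how often the result falls inside ±1.96. Under the normal approximation, that share should be close to 0.95. The function name `standardized_means` is our own.

```python
import math
import random
import statistics


def standardized_means(n, reps, rate=1.0, seed=1):
    """Simulate reps standardized sample means of n Exponential(rate) draws.

    For Exponential(rate) both the mean and the standard deviation equal 1/rate,
    so sqrt(n) * (xbar - mu) / sigma is approximately N(0, 1) for large n (CLT).
    """
    rng = random.Random(seed)
    mu = sigma = 1.0 / rate
    z = []
    for _ in range(reps):
        xbar = statistics.fmean(rng.expovariate(rate) for _ in range(n))
        z.append(math.sqrt(n) * (xbar - mu) / sigma)
    return z


z = standardized_means(n=50, reps=10_000)
# Share of standardized means inside +-1.96; close to 0.95 if the normal approximation is good
coverage = sum(abs(v) < 1.96 for v in z) / len(z)
print(round(coverage, 3))
print(round(statistics.fmean(z), 3), round(statistics.stdev(z), 3))
```

The mean and standard deviation of z are 0 and 1 for every n, so the central limit theorem is about the shape of the distribution, not its first two moments. For small n a histogram of z is visibly right-skewed, as the exponential distribution is, and it looks more and more normal as n increases.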

The second half of the course is concerned with statistical inference. Maximum likelihood and its properties are presented in detail, and Bayesian inference is given an extensive treatment. Point and interval estimation, sampling distributions and hypothesis testing are also covered.
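The Bernoulli model with a Beta prior is the simplest example of Bayesian inference, and it is the model behind the Bernoulli updating spreadsheet mentioned in the news. This minimal Python sketch assumes a Beta(a, b) prior on the success probability θ. After s successes and f failures, the posterior is Beta(a + s, b + f). The data and function names are hypothetical.

```python
def beta_bernoulli_update(a, b, data):
    """Posterior Beta(a + s, b + f) for a Bernoulli success probability.

    a, b: parameters of the Beta(a, b) prior.
    data: sequence of 0/1 outcomes; s = number of ones, f = number of zeros.
    """
    s = sum(data)
    f = len(data) - s
    return a + s, b + f


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]                # hypothetical: 7 successes in 10 trials
a_post, b_post = beta_bernoulli_update(1, 1, data)  # uniform Beta(1, 1) prior
print(a_post, b_post)                                # 8 4
print(round(beta_mean(a_post, b_post), 4))           # 0.6667
print(sum(data) / len(data))                         # 0.7 (the MLE)
```

With the uniform Beta(1, 1) prior, the posterior mean (s + 1)/(n + 2) is pulled slightly from the MLE s/n towards 1/2. Updating one observation at a time gives exactly the same posterior as updating with all the data at once.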

Organization

The course is organized into 14 lectures and 2 computer labs. The lectures present the theory and its application in practical work, and the theory is illustrated through problem-solving exercises. The computer labs give students an opportunity to deepen their understanding of the theory and its applications in a practical, computer-aided setting. A detailed plan of the lectures and computer labs is given below.

Literature

  • Probability and Statistics by DeGroot and Schervish, Pearson, fourth edition, 2011. The book’s web site can be found here.
  • My Slides.
Lectures

What? | When? | Where? | Read? | Slides | Contents | Exercises
Lecture 1 | Mon Aug 29, 13-15 | A36 | 1.1-1.11 and 2.1-2.3 | Slides 1 | Review of basic probability calculus | 1.7.5, 1.7.7, 1.8.7, 2.3.4, 2.3.13
Lecture 2 | Fri Sept 2, 13-15 | I102 | 3.1-3.3 | Slides 2 | Univariate random variables, density and distribution functions | 3.1.6, 3.2.2, 3.2.8, 3.3.4, 3.3.5
Lecture 3 | Mon Sept 5, 13-15 | D33 | 3.8 (only pages 167-169 and 172-173) | Slides 2 | Quantiles. Functions of random variables | 3.8.1, 3.8.2, 3.8.3, 3.8.6, 3.8.8
Lecture 4 | Thu Sept 8, 13-15 | R6 | 3.4-3.7 | Slides 3 | Bivariate, marginal, conditional and multivariate distributions | 3.4.4, 3.5.3, 3.6.2, 3.6.4, 3.7.8
Lecture 5 | Mon Sept 12, 13-15 | D33 | 4.1-4.7 | Slides 4 | Mean, variance, moment generating function. Gauss approximation formulas. Conditional expectation and variance | 4.1.1, 4.2.2, 4.2.4, 4.2.9, 4.3.1, 4.3.5, 4.4.6, 4.4.8, 4.5.3, 4.6.10
Lecture 6 | Thu Sept 15, 13-15 | R6 | 5.1-5.2, 5.4, 5.6-5.10 | Slides 5 | Common discrete and continuous distributions | 5.2.6, 5.2.7, 5.4.8, 5.6.2, 5.6.6, 5.6.17, 5.7.1, 5.7.6, 5.8.3
Lecture 7 | Mon Sept 19, 13-15 | KY22 | 6.1-6.3 (skip the section ‘The Delta Method’) | Slides 6 | Law of large numbers and central limit theorem | 6.2.2, 6.2.3, 6.2.5, 6.3.9
Lecture 8 | Tue Sept 20, 13-15 | S37 | 12.1-12.2, pages 170-171, 12.3 | Slides 7 | Simulation | 3.3.8 (solve by simulation), 3.8.11, 12.1.3, 12.3.4
Lecture 9 | Thu Sept 22, 13-15 | R6 | 7.1-7.3 | Slides 8 | Statistical inference. Bayesian inference | -
Computer 1 | Mon Sept 26, 13-15 | PC1 | Lab 1; Solutions to Lab 1 | - | Simulating from common distributions. Functions of random variables. Central limit theorem | -
Exercise 1 | Tue Sept 27, 13-15 | D37 | - | - | Various exercises from the book | 7.2.2, 7.2.10, 7.2.11, 7.3.10, 7.3.11
Lecture 10 | Thu Sept 29, 13-15 | KY22 | 7.1-7.3 | Slides 9 | Statistical inference. Bayesian inference | -
Lecture 11 | Mon Oct 3, 13-15 | A36 | 7.4-7.6 | Slides 10 | Point and interval estimation. Maximum likelihood. Method of moments | -
Exercise 2 | Tue Oct 4, 13-15 | KY22 | - | - | Various exercises from the book | 7.5.6, 7.5.7, 7.5.11, 7.6.2, 7.6.18, 7.6.20, 7.6.23
Computer 2 | Thu Oct 6, 13-15 | PC1 | Lab 2; eBay dataset; Solutions to Lab 2 | - | Maximum likelihood estimates and standard deviations from numerical optimization methods | -
Lecture 12 | Mon Oct 10, 10-12 | I204 | 8.1-8.2 and 8.4 | Slides 11 | Sampling distributions. Chi-squared and Student’s t | -
Lecture 13 | Thu Oct 13, 13-15 | D33 | 8.5, 8.7-8.8 | Slides 12 | Confidence intervals. Unbiased estimators. Fisher information | -
Lecture 14 | Mon Oct 17, 13-15 | D37 | 9.1, 9.5 and 9.7 | Slides 13 | Hypothesis testing | -
Exercise 3 | Wed Oct 19, 13-15 | Corner room, Stat division | - | - | Various exercises from the book | 7.3.19, 7.4.12, 7.5.9, 7.6.9, 8.1.9, 8.2.10, 8.5.6, 8.5.7, 8.7.1, 8.8.3, 8.9.15, 9.1.3, 9.5.4, 9.7.7
Lecture 15 | Thu Oct 20, 13-15 | A36 | - | My slides | Statistical inference in practical work | -
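Computer lab 2 is about maximum likelihood estimates and their standard deviations (standard errors) from numerical optimization. The news also mentions a Beta MLE example in CodeSnippets.R. The Python sketch below is not the course's R code. It assumes i.i.d. Beta(a, b) data and runs Newton-Raphson on the log-likelihood, starting from method-of-moments estimates. Standard errors come from the inverse of the observed information (minus the Hessian) at the optimum. The helpers `digamma`, `trigamma`, `beta_hessian` and `beta_mle` are our own. The digamma and trigamma functions are computed from the standard recurrence plus asymptotic series.

```python
import math
import random


def digamma(x):
    """Digamma psi(x) for x > 0: recurrence psi(x) = psi(x + 1) - 1/x, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def trigamma(x):
    """Trigamma psi'(x) for x > 0: recurrence psi'(x) = psi'(x + 1) + 1/x^2, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = 1 / 6 - inv2 * (1 / 30 - inv2 * (1 / 42 - inv2 / 30))
    return result + 1.0 / x + 0.5 * inv2 + inv2 / x * series


def beta_loglik(a, b, xs):
    """Log-likelihood of i.i.d. Beta(a, b) data xs, all strictly inside (0, 1)."""
    n = len(xs)
    s1 = sum(math.log(x) for x in xs)
    s2 = sum(math.log1p(-x) for x in xs)
    return n * (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)) + (a - 1) * s1 + (b - 1) * s2


def beta_hessian(a, b, n):
    """Hessian entries (haa, hbb, hab) of the Beta log-likelihood; they do not depend on the data."""
    t = n * trigamma(a + b)
    return t - n * trigamma(a), t - n * trigamma(b), t


def beta_mle(xs, tol=1e-10, max_iter=100):
    """Newton-Raphson MLE of (a, b), with standard errors from the observed information."""
    n = len(xs)
    s1 = sum(math.log(x) for x in xs)
    s2 = sum(math.log1p(-x) for x in xs)
    # Method-of-moments starting values: a + b = m(1 - m)/v - 1
    m = sum(xs) / n
    v = sum((x - m) ** 2 for x in xs) / (n - 1)
    total = m * (1 - m) / v - 1
    a, b = m * total, (1 - m) * total
    for _ in range(max_iter):
        # Score (gradient) of the log-likelihood
        ga = n * (digamma(a + b) - digamma(a)) + s1
        gb = n * (digamma(a + b) - digamma(b)) + s2
        haa, hbb, hab = beta_hessian(a, b, n)
        det = haa * hbb - hab * hab
        # Newton step -H^{-1} g, with the 2x2 inverse written out
        da = -(hbb * ga - hab * gb) / det
        db = -(haa * gb - hab * ga) / det
        # Halve the step if it would leave the parameter space a, b > 0
        step = 1.0
        while a + step * da <= 0 or b + step * db <= 0:
            step /= 2
        a, b = a + step * da, b + step * db
        if abs(da) + abs(db) < tol:
            break
    # Approximate covariance = inverse of the observed information -H at the MLE
    haa, hbb, hab = beta_hessian(a, b, n)
    det = haa * hbb - hab * hab
    return a, b, math.sqrt(-hbb / det), math.sqrt(-haa / det)


rng = random.Random(2024)
xs = [rng.betavariate(2.0, 5.0) for _ in range(2000)]  # simulated data, true a = 2, b = 5
a_hat, b_hat, se_a, se_b = beta_mle(xs)
print(f"a = {a_hat:.3f} (SE {se_a:.3f}), b = {b_hat:.3f} (SE {se_b:.3f})")
```

Newton-Raphson fits this problem well. The Beta log-likelihood is concave in (a, b), and the Hessian needed for the step is the same matrix that gives the standard errors. A general-purpose optimizer with a numerically estimated Hessian, as is common in R, would work too.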
Exams

  • Exam 1 from November 3, 2011. Exam | Solutions
  • Exam 2 from January 21, 2012. Exam | Solutions
  • Exam 3 from June 14, 2012. Exam | Solutions
  • Written exam: here are the dates and registration information.
  • Past exam 1: Exam and Solutions. (Ignore question 4; we have not covered this chi-square test in the course.)
  • Past exam 2: Exam and Solutions.
  • Past exam 3: Exam. (Ignore question 4; we have not covered this chi-square test in the course.)
Extra material

Code

  • The file CodeSnippets.R contains simple R code for some of the illustrations that I produce during the course. I will update and add bits of code to this file as the course progresses.
  • The file CodeSnippets.nb contains simple Mathematica commands for the illustrations that I have used in the course. Download Mathematica Player if you don’t have Mathematica.
  • OptimizeSpam.R. Finds the posterior mode and an approximate covariance matrix by numerical optimization. This code fits a logistic or probit regression model to the spam data from the book Elements of Statistical Learning. It is a good example, since the optimization for the logistic model is very stable, whereas the probit model is more problematic.
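The idea behind OptimizeSpam.R can be sketched in a stripped-down form. The sketch below is in Python, not the course's R code, and covers the logistic model only. It assumes a single covariate plus an intercept and an independent N(0, τ²) prior on each coefficient, with τ = 10 as an arbitrary illustrative choice. It uses Newton-Raphson as the optimizer, though the R file may use a different routine. The approximate posterior covariance is the inverse of minus the Hessian of the log posterior at the mode, which is a normal approximation. The function names and simulated data are hypothetical.

```python
import math
import random


def log_posterior_derivs(beta, xs, ys, tau):
    """Gradient and Hessian of the log posterior for a logistic regression.

    Model: P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x))); prior b0, b1 ~ N(0, tau^2) independently.
    """
    b0, b1 = beta
    g0, g1 = -b0 / tau**2, -b1 / tau**2  # prior contribution to the gradient
    h00 = h11 = -1.0 / tau**2            # prior contribution to the Hessian
    h01 = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        r = y - p          # score contribution of one observation
        w = p * (1.0 - p)  # its weight in the Hessian
        g0 += r
        g1 += r * x
        h00 -= w
        h01 -= w * x
        h11 -= w * x * x
    return (g0, g1), ((h00, h01), (h01, h11))


def posterior_mode(xs, ys, tau=10.0, tol=1e-10, max_iter=50):
    """Newton-Raphson for the posterior mode; returns the mode and the approximate covariance."""
    beta = [0.0, 0.0]
    for _ in range(max_iter):
        (g0, g1), ((h00, h01), (_, h11)) = log_posterior_derivs(beta, xs, ys, tau)
        det = h00 * h11 - h01 * h01
        d0 = -(h11 * g0 - h01 * g1) / det  # Newton step -H^{-1} g
        d1 = -(h00 * g1 - h01 * g0) / det
        beta = [beta[0] + d0, beta[1] + d1]
        if abs(d0) + abs(d1) < tol:
            break
    _, ((h00, h01), (_, h11)) = log_posterior_derivs(beta, xs, ys, tau)
    det = h00 * h11 - h01 * h01
    # Inverse of minus the Hessian at the mode
    cov = [[-h11 / det, h01 / det], [h01 / det, -h00 / det]]
    return beta, cov


rng = random.Random(7)
xs = [rng.gauss(0.0, 1.0) for _ in range(500)]
ys = [1 if rng.random() < 1 / (1 + math.exp(-(-0.5 + 1.5 * x))) else 0 for x in xs]
mode, cov = posterior_mode(xs, ys)
print("mode:", [round(b, 3) for b in mode])
print("approx. posterior SDs:", [round(math.sqrt(cov[i][i]), 3) for i in range(2)])
```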