Theory of Statistics I
News
 Added a link to interactive spreadsheets for 22 distributions.
 Added a book I made on Statistical Distributions out of Wikipedia articles. See under Extra Material. Be careful: Wikipedia articles are not always correct.
 The first exam and its suggested solution are now posted under Exams.
 Solutions to Computer Lab 2 are now posted. The solutions are in R code (CompLab2.R) and are posted under Schedule below.
 Three past exams are now posted under the section Exams. Two of them come with solutions, the third one without. Note that I did not write these exams. My exams will resemble these, but they will place a somewhat different emphasis on the different topics. For one thing, I will probably place less emphasis on hypothesis tests and confidence intervals than these exams do. And my course includes Bayesian methods, which were not part of the course for these past exams.
 I have added another of those extra exercise sessions, on Wednesday, October 19, at 1 pm.
 I have added a link to info about the exams (time and registration dates). See under Exam.
 The final section of CodeSnippets.R (as of today; remember that I add to this file continuously) finds the MLE of the parameters of a Beta distribution using numerical optimization.
 Computer lab 2 is out, and so are my solutions (in R code) for Computer lab 1. See the Schedule below.
 Change of lecture time. The lecture on October 10 has been moved to 10-12 on the same day. New location; see the schedule.
 The student’s solutions manual to the course book can now be bought at internet bookstores, for example at Bokus.se.
 The links to the Excel files for updating the Bernoulli and Normal models were broken. I have fixed that below, but I would suggest that you use the prettier Google Docs versions, if possible.
 At Exercise 1 on Tuesday, September 27, I will solve the problems given in the schedule below, but also some of the other problems that have been listed previously in the course.
 I have added a Google Docs spreadsheet for learning about Bayesian inference in the Poisson model.
 The problems for the first Computer Lab are posted below in the Schedule.
 I have to attend a meeting on Thursday (Sept 22) afternoon. Lecture 9 has therefore been moved to 10-12 the same day, in room R6.
 I have added two time slots on the schedule for exercises: E1 on September 27, and E2 on October 4.
 The file CodeSnippets.R, below in the Code section, contains simple R code for some of the illustrations that I produce during the course. I will update and add code snippets to this file as the course progresses.
 Theory of Statistics I: The extra lecture will be on Tuesday, Sept 20, in room S37. The schedule on the web page and in TimeEdit has been updated.
 R is a great, free, open-source, easy-to-use programming language for statistical computations. There are thousands of packages with statistical routines for almost any imaginable field of statistics. Do this:
1. Download and install R from http://ftp.sunet.se/pub/lang/CRAN/
2. Download and install RStudio from www.rstudio.org. RStudio is a complete environment for R.
3. Read the intro to R: http://cran.r-project.org/doc/manuals/R-intro.pdf
4. Start writing code!
Aim
The aim of the course is to introduce statistical concepts and principles in enough detail to make it possible to perform statistical analyses in situations where standard textbook formulas do not apply. This requires a deeper and more mathematical understanding of probability and statistical inference. The focus will be on those parts of the theory that will be most useful for practical statistical work.
Intended audience
This course is given mainly for students in the third year of the Bachelor’s programme Statistik och Dataanalys. It is also offered to students in the Master’s programme Statistics and Data Mining who have had little previous exposure to probability theory and statistical inference above the basic level.
Outline
The first half of the course is about probability theory. This part focuses on univariate and multivariate random variables and their distributions. Conditional distributions and distributions of functions of random variables are also treated in detail. It also covers the law of large numbers, central limit theorems, and simulation methods.
The second half of the course is concerned with statistical inference. Maximum likelihood and its properties are presented in detail. Bayesian inference is given an extensive treatment. Point and interval estimation, sampling distributions and hypothesis testing are also covered.
Organization
The course is organized into 14 lectures and 2 computer labs. The lectures include a presentation of the theory and its application in practical work. The theory is illustrated with problem-solving exercises. The computer labs give the students an opportunity to deepen their understanding of the theory and its applications in a practical computer-aided setting. A detailed plan of the lectures and computer labs is given below.
Literature
 Probability and Statistics by DeGroot and Schervish, Pearson, Fourth edition, 2011. The book’s web site can be found here.
 My Slides.
Lectures
What? | When? | Where? | Read? | Contents | Exercise
Lecture 1 | Mon Aug 29 13-15 | A36 | 1.1-1.11 and 2.1-2.3; Slides 1 | Review of basic probability calculus | 1.7.5, 1.7.7, 1.8.7, 2.3.4, 2.3.13
Lecture 2 | Fri Sept 2 13-15 | I102 | 3.1-3.3; Slides 2 | Univariate random variables, density and distribution functions. | 3.1.6, 3.2.2, 3.2.8, 3.3.4, 3.3.5
Lecture 3 | Mon Sept 5 13-15 | D33 | 3.8 (only pages 167-169 and 172-173); Slides 2 | Quantiles. Functions of random variables. | 3.8.1, 3.8.2, 3.8.3, 3.8.6, 3.8.8
Lecture 4 | Thu Sept 8 13-15 | R6 | 3.4-3.7; Slides 3 | Bivariate, marginal, conditional and multivariate distributions. | 3.4.4, 3.5.3, 3.6.2, 3.6.4, 3.7.8
Lecture 5 | Mon Sept 12 13-15 | D33 | 4.1-4.7; Slides 4 | Mean, variance, moment generating function. Gauss approximation formulas. Conditional expectation and variance. | 4.1.1, 4.2.2, 4.2.4, 4.2.9, 4.3.1, 4.3.5, 4.4.6, 4.4.8, 4.5.3, 4.6.10
Lecture 6 | Thu Sept 15 13-15 | R6 | 5.1-5.2, 5.4, 5.6-5.10; Slides 5 | Common discrete and continuous distributions | 5.2.6, 5.2.7, 5.4.8, 5.6.2, 5.6.6, 5.6.17, 5.7.1, 5.7.6, 5.8.3
Lecture 7 | Mon Sept 19 13-15 | KY22 | 6.1-6.3 (skip the section ‘The Delta Method’); Slides 6 | Law of large numbers and central limit theorem. | 6.2.2, 6.2.3, 6.2.5, 6.3.9
Lecture 8 | Tue Sept 20 13-15 | S37 | 12.1-12.2, pages 170-171, 12.3; Slides 7 | Simulation | 3.3.8 (solve by simulation), 3.8.11, 12.1.3, 12.3.4
Lecture 9 | Thu Sept 22 13-15 | R6 | 7.1-7.3; Slides 8 | Statistical inference. Bayesian inference. |
Computer 1 | Mon Sept 26 13-15 | PC1 | Lab 1; Solutions to Lab 1 | Simulating from common distributions. Functions of variables. Central limit theorem. |
Exercise 1 | Tue Sept 27 13-15 | D37 | Various exercises from the book | 7.2.2, 7.2.10, 7.2.11, 7.3.10, 7.3.11 |
Lecture 10 | Thu Sept 29 13-15 | KY22 | 7.1-7.3; Slides 9 | Statistical inference. Bayesian inference. |
Lecture 11 | Mon Oct 3 13-15 | A36 | 7.4-7.6; Slides 10 | Point and interval estimation. Maximum likelihood. Method of moments. |
Exercise 2 | Tue Oct 4 13-15 | KY22 | Various exercises from the book | 7.5.6, 7.5.7, 7.5.11, 7.6.2, 7.6.18, 7.6.20, 7.6.23 |
Computer 2 | Thu Oct 6 13-15 | PC1 | Lab 2; eBay dataset; Solutions to Lab 2 | Maximum likelihood estimates and standard deviations from numerical optimization methods. |
Lecture 12 | Mon Oct 10 10-12 | I204 | 8.1-8.2 and 8.4; Slides 11 | Sampling distributions. Chi-squared and Student-t. |
Lecture 13 | Thu Oct 13 13-15 | D33 | 8.5, 8.7-8.8; Slides 12 | Confidence intervals. Unbiased estimators. Fisher information. |
Lecture 14 | Mon Oct 17 13-15 | D37 | 9.1, 9.5 and 9.7; Slides 13 | Hypothesis testing |
Exercise 3 | Wed Oct 19 13-15 | Corner room, Stat division | Various exercises from the book | 7.3.19, 7.4.12, 7.5.9, 7.6.9, 8.1.9, 8.2.10, 8.5.6, 8.5.7, 8.7.1, 8.8.3, 8.9.15, 9.1.3, 9.5.4, 9.7.7 |
Lecture 15 | Thu Oct 20 13-15 | A36 | My slides | Statistical inference in practical work |
Exams
 Exam 1 from November 3, 2011. Exam  Solutions
 Exam 2 from January 21, 2012 Exam  Solutions
 Exam 3 from June 14, 2012 Exam  Solutions
 Written exam. Here are the dates and registration info.
 Past exam 1: Exam and Solutions. (Ignore question 4; we have not covered this chi-square test in the course.)
 Past exam 2: Exam and Solutions.
 Past exam 3: Exam. (Ignore question 4; we have not covered this chi-square test in the course.)
Extra material
 Interactive spreadsheets for 22 common statistical distributions.
 Collection of Wikipedia articles on Statistical Distributions.
 Informative clickable chart with relations between distributions: http://www.johndcook.com/distribution_chart.html.
 Spreadsheet file for learning about the prior-to-posterior updating of the parameters in the Bernoulli model. Google Docs  Excel file
 Spreadsheet file for learning about the prior-to-posterior updating of the parameters in the Normal model. Google Docs  Excel file
 Spreadsheet file for learning about the prior-to-posterior updating of the parameters in the Poisson model. Google Docs
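The prior-to-posterior updating that the spreadsheets illustrate can also be written out in a few lines of code. The following is a minimal sketch in Python, not taken from the spreadsheets themselves. It assumes the standard conjugate priors (a Beta prior for the Bernoulli model and a Gamma prior, with rate parametrization, for the Poisson model); the function names and the data are hypothetical and chosen only for illustration. The Normal model is left out to keep the sketch short.

```python
def update_bernoulli(alpha, beta, data):
    """Beta(alpha, beta) prior + Bernoulli data -> parameters of the Beta posterior."""
    successes = sum(data)
    failures = len(data) - successes
    return alpha + successes, beta + failures


def update_poisson(alpha, beta, data):
    """Gamma(alpha, beta) prior (beta is the rate) + Poisson counts -> Gamma posterior."""
    return alpha + sum(data), beta + len(data)


# Hypothetical data, for illustration only.
coin_flips = [1, 0, 1, 1, 0, 1, 1, 1]                 # 6 successes, 2 failures
post_a, post_b = update_bernoulli(1, 1, coin_flips)   # uniform Beta(1, 1) prior
bernoulli_post_mean = post_a / (post_a + post_b)

counts = [2, 4, 3, 1, 5, 3, 2, 4]                     # sum 24, n = 8
post_shape, post_rate = update_poisson(2, 1, counts)  # Gamma(2, 1) prior
poisson_post_mean = post_shape / post_rate

print(post_a, post_b, bernoulli_post_mean)
print(post_shape, post_rate, poisson_post_mean)
```

In both models the posterior stays in the same family as the prior, so updating amounts to adding the observed counts to the prior parameters.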
Code
 The file CodeSnippets.R contains simple R code for some of the illustrations that I produce during the course. I will update and add bits of code to this file as the course progresses.
 The file CodeSnippets.nb contains simple Mathematica commands for the illustrations that I have used in the course. Download Mathematica Player if you don’t have Mathematica.
 OptimizeSpam.R. Finding the posterior mode and approximate covariance matrix by numerical optimization methods. This code fits a logistic or probit regression model to the spam data from the book Elements of Statistical Learning. It’s a good example since the optimization for the logistic model is very stable, but the probit is more problematic.
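The general idea behind finding an estimate and its approximate standard deviation by numerical optimization can be sketched as follows. This is a minimal illustration in Python, not the contents of CodeSnippets.R or OptimizeSpam.R. It assumes a one-parameter Poisson model with hypothetical data, uses Newton's method with finite-difference derivatives, and takes the standard error from the second derivative of the negative log-likelihood at the optimum. The Poisson model is chosen because the answer is known in closed form (the MLE is the sample mean and the standard error is sqrt(mean/n)), so the numerical result can be checked.

```python
import math


def poisson_neg_loglik(lam, data):
    """Negative Poisson log-likelihood, dropping the term that does not depend on lam."""
    return -(sum(data) * math.log(lam) - len(data) * lam)


def numeric_derivatives(f, theta, h=1e-4):
    """Central-difference estimates of the first and second derivative of f at theta."""
    f_plus, f_mid, f_minus = f(theta + h), f(theta), f(theta - h)
    return (f_plus - f_minus) / (2 * h), (f_plus - 2 * f_mid + f_minus) / h ** 2


def newton_mle(neg_loglik, theta0, tol=1e-8, max_iter=100):
    """Minimise neg_loglik by Newton's method; return the estimate and its standard error."""
    theta = theta0
    for _ in range(max_iter):
        grad, hess = numeric_derivatives(neg_loglik, theta)
        step = grad / hess
        theta -= step
        if abs(step) < tol:
            break
    # Observed information = second derivative of the negative log-likelihood at the optimum.
    _, hess = numeric_derivatives(neg_loglik, theta)
    return theta, 1 / math.sqrt(hess)


# Hypothetical counts, for illustration only (sample mean 3.0, n = 8).
counts = [2, 4, 3, 1, 5, 3, 2, 4]
lam_hat, lam_se = newton_mle(lambda lam: poisson_neg_loglik(lam, counts), theta0=1.0)
print(lam_hat, lam_se)
```

With several parameters the second derivative becomes a Hessian matrix, and the approximate covariance matrix is its inverse at the optimum.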