Mattias Villani

Natural Born Bayesian

Text Mining


See the official web page for updated info:

This is my web page for our brand new master's and PhD course in Text Mining, which is given in spring 2013.

The course is a collaboration between the following research labs at the Department of Computer and Information Science at Linköping University:

  • Statistics
  • Natural Language Processing
  • Database and Web Information Systems

The course aims to show how textual data can be retrieved, linguistically pre-processed and subsequently analyzed quantitatively using formal statistical methods and models. The course brings together expertise from the areas of database methodology, computational linguistics and statistics.
The course proceeds in four stages:

  1. Introductory modules
    • Introduction to Python programming
    • Introduction to statistical modeling
    • Introduction to computational linguistics
  2. Data models and information retrieval for textual data
  3. Statistical models for textual data
  4. Text mining project

The course consists of lectures, lab exercises and a text mining project. The lectures are devoted to presentations of concepts and methods. The lab exercises are devoted to practical application of text mining tools. In the project work, the student will get hands-on experience in solving a text mining problem.
Language of instruction: English.

Examination: Text mining project report and written reports on lab assignments.

Students entering the course should have been admitted to a master’s programme in Computer Science, Cognitive Science or Statistics, or a similar master’s programme. Advanced students in bachelor’s programmes in engineering may also be admitted to the course. In addition, the equivalent of 18 ECTS credits in Statistics and Computer Science is required, with at least 6 ECTS credits in each of the two subjects.