EITM 2016 Document Classification

One important problem with big data is finding meaningful associations in texts while automating as much of the process as possible. Text classification consists of assigning textual documents to one or more categories based on their content. In this module, we will examine three phases of text (or document) classification:

Text annotation - how to annotate important features of text and how to use this information for document classification;
Training - supervised, semi-supervised, and unsupervised approaches to text and document classification;
Prediction (or classification) - classifying your texts and validating performance and accuracy.

We will take a very hands-on approach, starting by assessing theories in terms of their implications and matching those implications with textual features. We will learn how to annotate texts, use annotated texts in supervised and semi-supervised approaches, and classify (and cluster) political texts. Finally, we will also spend some time learning how to communicate and visualise our results.
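To make the three phases concrete, here is a minimal sketch in plain Python of one possible supervised workflow: a handful of hand-labelled texts (annotation), a small multinomial Naive Bayes model with add-one smoothing (training), and predictions on held-out texts with an accuracy check (prediction). The example texts, labels, and the function names tokenize, train, and predict are all made up for illustration; this is not the course's own code, and Naive Bayes is only one of many classifiers that could fill the training step.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace: a deliberately crude feature extractor."""
    return text.lower().split()


def train(labeled_docs):
    """Training phase: count words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Prediction phase: return the label with the highest log-probability."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior: share of training documents carrying this label.
        score = math.log(label_counts[label] / total_docs)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Annotation phase: hypothetical hand-labelled documents (toy data).
train_docs = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
# Held-out documents with gold labels, used only for validation.
test_docs = [
    ("free prize money", "spam"),
    ("team meeting monday", "ham"),
]

model = train(train_docs)
predictions = [predict(model, text) for text, _ in test_docs]
accuracy = sum(p == gold for p, (_, gold) in zip(predictions, test_docs)) / len(test_docs)
print(predictions, accuracy)
```

With real corpora the same three steps apply, but the tokenizer, the feature set, and the size of the held-out sample all matter far more than they do in this toy setting.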

Course Syllabus 

Academic Term: 
Weekly Materials: 
Week    Name                          Attached files
1       Spam Email
1       Movie Reviews
1       Supreme Court Documents
1       Day 1 Readings
2       Twitter Sentiment Analysis
2       Day 2 Readings
3       State of the Union
3       Day 3 Readings