In the digital age, techniques for automatically processing textual content have become ubiquitous. Given the breakneck speed at which people produce and consume text online (e.g., on micro-blogging platforms and on collaborative Web platforms such as wikis and forums), there is an ever-increasing need for systems that automatically understand human language, answer natural language questions, translate text, and so on. This class will provide a comprehensive introduction to state-of-the-art principles and methods of Natural Language Processing (NLP). The main focus will be on statistical techniques and their application to a wide variety of problems. Statistics and NLP are nowadays closely intertwined: many NLP problems can be formulated as problems of statistical inference, and statistical methods are, in turn, the de facto standard way to solve many, if not most, NLP problems. Covered topics will include:
- Words
- Language Modeling
- Part-of-Speech Tagging
- Syntax
- Semantics and Pragmatics
- Computational Lexical Semantics
- Computational Discourse
- Applications
  - Topic Modeling
  - Information Extraction
  - Question Answering and Summarization
  - Statistical Alignment and Machine Translation
Coursework will include homework assignments and a final exam. The homework assignments are meant to introduce students to the kinds of problems that will be covered in the final exam at the end of the course.