Computer Science (COMP) 659

Statistical Language Processing for Text Analytics (Revision 1)

COMP 659 Course website

Delivery Mode: Group Study Online

Credits: 3

Area of Study: IS Elective

Prerequisite: Comp 501 (or an equivalent high-level programming language course) and the essentials of undergraduate-level probability and/or statistics, or course coordinator approval.

Background in related areas such as artificial intelligence (e.g., COMP 657) and linguitics will be an asset for this course.
Students who are concerned that they may not meet the prerequisities for this course are encouraged to contact the course coordinator before registering.

Faculty: Faculty of Science and Technology

Centre: School of Computing and Information Systems

Instructor: Dr. Dunwei (Grant) Wen

**Note: This is a graduate level course and students need to apply and be approved to one of the graduate programs or as a non-program School of Computing and Information Student graduate student in order to take this course. Minimum Admission Requirements must be met. Undergraduate students who do not meet admission requirement will not normally be permitted to take this course. http://scis.athabascau.ca/

Overview

There has been an increasing demand for better retrieval, processing, and analysis of textual information in modern society in recent years due to the availability of a huge and ever growing amount of textual data from both inside organizations and the Internet. Well known examples include web search engines (e.g., Google), document and content management systems, email filtering, social media sentiment analysis, automated question answering (e.g., IBM's Watson on Jeopardy!), natural language interfaces in games and mobile devices, and big data text analytics for business/competitive intelligence. Natural language processing (NLP), also known as computational linguistics, which aims to process and understand natural languages and text, is the driving force that makes these tasks and systems possible.
To better meet the needs of the big data era, this course focuses on the principles and technologies of statistical machine-learning-based NLP and their application in text analytics, including retrieval, extraction, recognition, and analysis of information from large textual collections.

Course Objectives

After completing this course, students should be able to

  • explain fundamental concepts, principles, and models of natural language processing.
  • explain the use of state-of-the-art statistical learning machines and tools for NLP and text analytics.
  • analyze large text collections by using suitable statistical NLP approaches.
  • design system structures and integrate open source components for statistical NLP and text analytics applications.
  • evaluate and improve the performance of a selected statistical learning machine for a specific NLP task.

Outline

This course covers the core topics in statistical NLP and several applications of text analytics as follows:

  • Unit 1: Linguistics and Statistics Essentials
  • Unit 2: Python for Text Processing
  • Unit 3: Language Models for Information Retrieval
  • Unit 4: Hidden Markov Models for POS Tagging
  • Unit 5: Probabilistic Grammar and Parsing
  • Unit 6: Statistical Machine Learning
  • Unit 7: Text Classification and Clustering
  • Unit 8: Semantic Structures and Parsing
  • Unit 9: Named Entity and Relation Extraction
  • Unit 10: Web Search and Question Answering
  • Unit 11: Topic Modeling, Opinion Mining, and Sentiment Analysis

Evaluation

To pass this course, students must achieve an average grade of at least 60% on each assignment.
To receive credit towards the Master of Science in IS for Electives/Career Track, students must achieve a course composite grade of at least C+ (66%).

The weighting of the composite grade is as follows:

Assessment Weight
Assignment 1: Concepts, Design, Demonstration, Reading 15%
Assignment 2: Demonstration, Reading, Analysis 15%
Assignment 3: Research Project 40%
Assignment 4: Research Paper 20%
Assignment 5: Discussion Forum Participation 10%

Course Materials

Required Materials

  • Articles from journals such as Computational Linguistics, Natural Language Engineering, and Machine Learning Journal and conference proceedings (e.g., ACL, ICML, NAACL, EMNLP, HLT, AAAI, IJCAI), all available through AU Library Services databases.
  • Selected materials from online resources including Wikipedia and NLP Tutorials.
  • Selected contents from a list of relevant books (all available online):

S. Bird, E. Klein, and E. Loper. 2009. Natural Language Processing with Python—Analyzing Text with the Natural Language Toolkit. OʼReilly Media. Available at http://www.nltk.org/book/

T., Hastie, R. Tibshirani, and J. Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2ed. Springer. Available at https://web.stanford.edu/~hastie/ElemStatLearn/

C.D. Manning, P. Raghavan, and H. Shutze. 2008. Introduction to Information Retrieval. Cambridge University Press. Available at http://nlp.stanford.edu/IR-book/

Reference Materials

D.Barber. 2012. Bayesian Reasoning and Machine learning. Cambridge University Press.

D. Jurafsky and J. H. Martin. 2008. Speech and Language Processing—An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2ed. Pearson Prentice Hall, Upper Saddle River, NJ.

C. Manning and H. Schutze. 1999. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA.

Other Materials

The remaining learning materials are distributed in electronic format on the course site.

Programming Languages and Software Tools

The Python programming language and Python-based open source machine learning and NLP software tools are mainly used in this course. However, students may select either Java or C++ as an alternative language and use its relevant open source machine learning and NLP tools for their assignments and research projects.

Special Course Features

COMP 659 is offered online and can be completed at the studentʼs workplace or home with a good Internet connection and a computer meeting certain requirements.

Special Note

Students registered in this course will NOT be allowed to apply for a course extension due to the nature of the course activities.
Athabasca University reserves the right to amend course outlines occasionally and without notice. Courses offered by other delivery methods may vary from their individualized-study counterparts.

Athabasca University reserves the right to amend course outlines occasionally and without notice. Courses offered by other delivery methods may vary from their individualized-study counterparts.

Opened in Revision 1

Updated August 22 2017 by Student & Academic Services