Computer Science (COMP) 682

Data Mining (Revision 2)

Delivery Mode: Individualized study online

Credits: 3

Area of Study: IS Core

Prerequisite: COMP 602 or equivalent. Students registering in this course need some background in database systems and statistics. Students who are concerned about not meeting the prerequisite for this course are encouraged to contact the course coordinator before registering.

This course is not available for challenge credit.

Faculty: Faculty of Science and Technology

Centre: School of Computing and Information Systems

Instructor: Dr. Larbi Esmahi

Note: This is a graduate-level course. To take it, students must apply and be admitted to one of the graduate programs, or be approved as a non-program School of Computing and Information Systems graduate student. Minimum admission requirements must be met. Undergraduate students who do not meet the admission requirements will not normally be permitted to take this course. http://scis.athabascau.ca/

Overview

Our ability to generate and collect data has been increasing rapidly. The widespread use of information technology in our lives has flooded us with a tremendous amount of data. This explosive growth of stored and transient data has created an urgent need for new techniques and automated tools that can help us transform this data into useful information and knowledge. Data mining has emerged as a multidisciplinary field that addresses this need.

This course discusses techniques for preprocessing data before mining and defines concepts related to data warehousing, online analytical processing (OLAP), data cube computation, and data generalization. It presents methods for mining frequent patterns, associations, and correlations. It also describes methods for classification and prediction, approaches to data clustering, and techniques for outlier detection.

Course Objectives

  • Interpret the contribution of data warehousing and data mining to the decision support level of organizations;
  • Evaluate different models used for OLAP and data pre-processing;
  • Categorize and carefully differentiate between situations for applying different data mining techniques: frequent pattern, association, and correlation mining, classification, prediction, and cluster analysis;
  • Design and implement systems for data mining;
  • Evaluate the performance of different data mining algorithms;
  • Propose data mining solutions for different applications.

Outline

Unit 1: Overview of Data Mining

  • This unit provides some background on data objects and statistical concepts. It also discusses the types of data to be mined and presents a general classification of data-mining tasks.

Unit 2: Data Preprocessing

  • This unit introduces techniques for preprocessing data before mining. Concepts such as data cleaning, data integration, data reduction, data transformation, and data discretization are discussed.

Unit 3: Overview of Data Warehousing and OLAP

  • This unit provides a solid introduction to data warehousing, OLAP, and data generalization.

Unit 4: Data Cube Computation and Multidimensional Data Analysis

  • This unit presents a detailed study of methods for data cube computation, advanced query processing, and multidimensional data analysis.

Unit 5: Mining Frequent Patterns, Associations, and Correlations

  • This unit presents methods for mining frequent patterns, associations, and correlations.

Unit 6: Classification

  • This unit discusses methods such as decision tree induction, Bayesian classification, rule-based classification, neural networks, support vector machines, associative classification, k-nearest neighbor classifier, case-based reasoning, genetic algorithms, rough sets, and fuzzy set approaches.

Unit 7: Cluster Analysis

  • This unit describes methods for data clustering. We discuss several major data-clustering approaches, such as partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods.

Unit 8: Outlier Detection

  • This unit describes methods for anomaly detection. We discuss several major approaches, such as statistical approaches, proximity-based approaches, clustering-based approaches, and classification-based approaches.

Evaluation

To receive credit for COMP 682, you must achieve a cumulative course grade of "B-" (70 percent) or better. You must also achieve an average grade of at least 60 percent on the assignments and project, and a grade of at least 60 percent on the final examination. Your cumulative course grade will be based on the following assessments.

The weighting of the composite grade is as follows:

Assessment                       Weight   Due
Assignment 1                     10%      After Unit 3
Assignment 2                     15%      After Unit 5
Assignment 3                     15%      After Unit 7
Project                          30%      After Unit 8
Final Invigilated Examination    30%      After Unit 8
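The credit rules above combine three separate thresholds, which are easy to misread. The following Python sketch is a hypothetical illustration, not an official Athabasca University calculator. It applies the weights from the table and checks each condition. One assumption is flagged in the code: the outline does not say whether the "average grade ... on the assignments and project" is a simple mean or is weighted by each component's course weight, so this sketch assumes the weighted form. The component keys and sample marks are invented for illustration.

```python
# Course weights from the assessment table (percent of the composite grade).
WEIGHTS = {
    "assignment1": 10,
    "assignment2": 15,
    "assignment3": 15,
    "project": 30,
    "final_exam": 30,
}

# Components that count toward the "assignments and project" average.
COURSEWORK = ("assignment1", "assignment2", "assignment3", "project")


def composite_grade(marks: dict[str, float]) -> float:
    """Weighted composite grade; each mark is a percentage from 0 to 100."""
    return sum(WEIGHTS[name] * marks[name] for name in WEIGHTS) / 100


def coursework_average(marks: dict[str, float]) -> float:
    """Average over the three assignments and the project.

    ASSUMPTION: weighted by each component's course weight. The course
    outline does not state whether this is a weighted or simple mean.
    """
    total_weight = sum(WEIGHTS[name] for name in COURSEWORK)
    return sum(WEIGHTS[name] * marks[name] for name in COURSEWORK) / total_weight


def earns_credit(marks: dict[str, float]) -> bool:
    """True only if all three conditions in the Evaluation section hold."""
    return (
        composite_grade(marks) >= 70          # "B-" or better overall
        and coursework_average(marks) >= 60   # assignments and project
        and marks["final_exam"] >= 60         # final examination
    )


if __name__ == "__main__":
    # Hypothetical marks for illustration only.
    sample = {
        "assignment1": 80,
        "assignment2": 75,
        "assignment3": 70,
        "project": 85,
        "final_exam": 55,
    }
    print(composite_grade(sample))  # 71.75
    print(earns_credit(sample))     # False: final exam is below 60 percent
```

With these hypothetical marks the composite grade (71.75) clears the 70 percent threshold, yet the credit check still fails because the final examination mark is below 60 percent. Each condition is checked separately for this reason: a strong composite grade does not compensate for missing either minimum.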

Course Materials

Textbooks

Jiawei Han, Micheline Kamber, and Jian Pei. Data Mining: Concepts and Techniques (3rd ed.). Morgan Kaufmann, 2012. ISBN 978-0-12-381479-1.

Other References

Ian H. Witten, Eibe Frank, and Mark A. Hall. Data Mining: Practical Machine Learning Tools and Techniques (3rd ed.). Morgan Kaufmann, 2011. ISBN 978-0-12-374856-0. (Available as an e-book through the Athabasca University Library.)

Other Materials

The remainder of the course materials are distributed through the online course site.

Special Course Features

COMP 682 is offered entirely online and can be completed at the student's workplace or home. Students will need to order the final examination four weeks prior to the course end date.

Athabasca University reserves the right to amend course outlines occasionally and without notice. Courses offered by other delivery methods may vary from their individualized-study counterparts.

Opened in Revision 2, November 26, 2012.