Data mining


Aim of the course

The course presents advanced technologies of modern databases for collecting and analysing very large data sets. It introduces the basic concepts, methods and algorithms used in data warehousing and in data mining over various data repositories. Students also learn about the practical problems that arise when these systems are implemented.

Lecture programme

Basic issues in data mining (data pre-processing, methods, methodologies). The basic architecture of data integration based on data warehouses; ETL software; detecting changes in data sources. The characteristics of online analytical processing (OLAP); the multidimensional data model and its implementation in relational ROLAP servers (star schema, snowflake schema, fact constellation) and in multidimensional MOLAP servers (implementation and MOLAP operators). Implementation issues and efficiency of OLAP processing. Discovering association rules (rule types; basic algorithms: Apriori and FP-Growth; multilevel and multidimensional rules). Classification (methods, evaluation criteria, splitting criteria such as the Gini index and information gain, tree pruning, classification accuracy). Clustering (taxonomy of clustering methods, hierarchical clustering, iterative optimisation; basic clustering algorithms: k-means, k-medoids). Text, Web and social network mining.
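The two decision-tree splitting criteria named above can be made concrete with a small sketch. It assumes class labels are given as Python lists. The code computes the Gini index of a node, 1 - Σ p_i², and its entropy, -Σ p_i log₂ p_i, where p_i is the proportion of class i. It then scores a candidate split by the size-weighted impurity of the child nodes. The function names and the 14-record toy data set are hypothetical and serve only as illustration. They do not come from any of the course tools.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini index of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Entropy of a node in bits: -sum of p * log2(p) over classes."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def evaluate_split(parent, partitions):
    """Score a candidate split of `parent` into `partitions` (hypothetical helper).

    Returns (weighted Gini of the children, Gini reduction, information gain).
    """
    n = len(parent)
    weighted_gini = sum(len(p) / n * gini(p) for p in partitions)
    weighted_entropy = sum(len(p) / n * entropy(p) for p in partitions)
    return (weighted_gini,
            gini(parent) - weighted_gini,
            entropy(parent) - weighted_entropy)


# Hypothetical toy data: 14 records (9 "yes", 5 "no") split by a
# three-valued attribute into three child nodes.
parent = ["yes"] * 9 + ["no"] * 5
children = [
    ["yes"] * 2 + ["no"] * 3,
    ["yes"] * 4,
    ["yes"] * 3 + ["no"] * 2,
]
w_gini, gini_drop, info_gain = evaluate_split(parent, children)
print(f"weighted Gini = {w_gini:.3f}, Gini reduction = {gini_drop:.3f}, "
      f"information gain = {info_gain:.3f} bits")
```

For this toy split the script reports an information gain of about 0.247 bits and a Gini reduction of about 0.116. A tree-induction algorithm would compute such scores for every candidate attribute and split on the one with the largest reduction in impurity.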

Overview of the course elements

The course includes laboratory classes that illustrate in practice the issues covered in the lectures. In the labs, students work with selected data mining systems (e.g. Oracle Data Mining, Weka, Statistica, SPSS, Clementine, RapidMiner, the R project) and implement selected data mining algorithms (attribute importance assessment, association discovery, classification, clustering, regression).
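The following sketch illustrates the kind of algorithm students implement for association discovery. It is a minimal, unoptimised version of the Apriori frequent-itemset search in plain Python and is independent of the lab systems listed above. Its input format, with transactions given as sets of items and minimum support given as an absolute count, is an assumption. The function name `apriori` and the basket data are likewise hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Find all frequent itemsets with the level-wise Apriori strategy.

    transactions: iterable of collections of items (assumed input format).
    min_support:  absolute minimum number of supporting transactions.
    Returns a dict mapping frozenset(itemset) -> support count.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count every single item and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = list(current)
        candidates = set()
        # Join step: combine two frequent (k-1)-itemsets into a k-itemset.
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune step (Apriori property): every (k-1)-subset
                # of a frequent k-itemset must itself be frequent.
                if len(union) == k and all(
                    frozenset(sub) in current for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)

        # Count step: one pass over the transactions per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]
result = apriori(baskets, min_support=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With a minimum support of 3, the three single items and the three pairs are frequent. The triple {bread, butter, milk} is generated as a candidate but occurs in only two baskets, so it is discarded. Association rules are then derived from the frequent itemsets. For example, the confidence of bread → milk is support({bread, milk}) / support({bread}) = 3/4.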

Reading list

1. Han J., Kamber M., Pei J., Data mining: concepts and techniques, Morgan Kaufmann, 2011.
2. Fronczak A., Fronczak P., Świat sieci złożonych: od fizyki do Internetu [The world of complex networks: from physics to the Internet], PWN, 2009.
3. Hand D., Mannila H., Smyth P., Eksploracja danych [Data mining], WNT, 2005.
4. Nisbet R., Elder J., Miner G., Handbook of Statistical Analysis and Data Mining Applications, Elsevier, 2009.
5. Williams G., Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery, Springer, 2011.

Copyright © 2010 Department of Computer Science | AGH University of Science and Technology