Data Mining, the automatic extraction of implicit and potentially useful information from data, is increasingly used in commercial, scientific and other application areas. Principles of Data Mining explains and explores the principal techniques of Data Mining: classification, association rule mining and clustering. Each topic is clearly explained and illustrated by detailed worked examples, with a focus on algorithms rather than mathematical formalism. It is written for readers without a strong background in mathematics or statistics, and any formulae used are explained in detail.

The second edition expanded on the first to include additional chapters on using frequent pattern trees for Association Rule Mining, comparing classifiers, ensemble classification and dealing with very large volumes of data. The third edition includes detailed descriptions of algorithms for classifying streaming data, both stationary data, where the underlying model is fixed, and time-dependent data, where the underlying model changes from time to time, a phenomenon known as concept drift. This expanded fourth edition gives a detailed description of a feed-forward neural network with backpropagation and shows how it can be used for classification.

Principles of Data Mining aims to help general readers develop the necessary understanding of what is inside the 'black box' so they can use commercial data mining packages discriminatingly, as well as enabling advanced readers or academic researchers to understand or contribute to future technical advances in the field.

Suitable as a textbook to support courses at undergraduate or postgraduate levels in a wide range of subjects including Computer Science, Business Studies, Marketing, Artificial Intelligence, Bioinformatics and Forensic Science.

- Presents the principal techniques of data mining with particular emphasis on explaining and motivating the techniques used
- Focuses on understanding of the basic algorithms and awareness of their strengths and weaknesses
- Useful as a textbook and also for self-study
- Substantially expanded fourth edition
- Each chapter contains practical exercises to enable readers to check their progress, and there is a full glossary of technical terms

None known as at June 1st 2020.

**Software**

These web-based programs are provided to support some of the material in *Principles of Data Mining*:

Decision Tree Generation and Testing Using TDIDT (Chapter 5 etc)

Calculation of Performance Measures (Chapter 12)

Comparing Classifiers: Calculation of Paired t Statistic (Chapter 15)

Calculation of Interestingness Measures (Section 17.9)

FP-growth Frequent Pattern Trees Algorithm (Chapter 18)

Classifying Streaming Data: H-Tree Algorithm (Chapter 21)

Classifying Time-Dependent Streaming Data: CDH-Tree Algorithm (Chapter 22)

Neural Network Demonstration Program (Chapter 23)

**Datasets**

Downloadable copies of datasets referred to in the book (all in Inducer format)