Modern computer systems are accumulating data at an almost unimaginable rate and from a very wide variety of sources: from point-of-sale machines in the high street to machines logging every cheque clearance, bank cash withdrawal and credit card transaction, to Earth observation satellites in space. Three examples will serve to give an indication of the volumes of data involved:
Alongside advances in storage technology which increasingly make it possible to store such vast amounts of data at relatively low cost, whether in commercial data warehouses, scientific research laboratories or elsewhere, has come a growing realisation that such data contains buried within it knowledge that can be critical to a company's growth or decline, knowledge that could lead to important discoveries in science, knowledge that could enable us accurately to predict the weather and natural disasters, knowledge that could enable us to identify the causes of and possible cures for lethal illnesses, knowledge that could literally mean the difference between life and death. Yet the huge volumes involved mean that most of this data is merely stored - never to be examined in more than the most superficial way, if at all. Machine learning technology, some of it very long established, has the potential to solve the problem of the tidal wave of data that is flooding around organisations, governments and individuals.
Knowledge Discovery has been defined as the 'non-trivial extraction of implicit, previously unknown and potentially useful information from data'. The underlying technologies of knowledge discovery include induction of decision rules and decision trees, neural networks, genetic algorithms, instance-based learning and statistics. There is a rapidly growing body of successful applications in a wide range of areas as diverse as:
The book comprises six papers on technical issues in the field of Knowledge Discovery and Data Mining followed by six chapters on applications. It grew out of a colloquium on Knowledge Discovery and Data Mining which I organised for Professional Group A4 (Artificial Intelligence) of the Institution of Electrical Engineers (IEE) in London on May 7th and 8th 1998. This was the third in a series of colloquia on this topic which began in 1995. The colloquium was co-sponsored by BCS-SGES (the British Computer Society Specialist Group on Knowledge Based Systems and Applied Artificial Intelligence), AISB (the Society for Artificial Intelligence and Simulation of Behaviour) and AIED (the International Society for AI and Education).
The papers included here have been significantly expanded from those presented at the colloquium and were selected for inclusion following a rigorous refereeing process. The book should be of particular interest to researchers and active practitioners in this increasingly important field. I should like to thank the referees for their valuable contribution and Jonathan Simpson (formerly of the IEE) for his encouragement to publish the proceedings in book form.
Part I: Knowledge Discovery and Data Mining in Theory looks at a variety of technical issues, all of considerable practical importance for the future development of the field.