Parallel Data Mining

Data mining is the automated analysis of large volumes of data in search of the 'interesting' relationships and knowledge implicit in them. Research and development work in parallel data mining concerns the study and definition of parallel algorithms, methods, and tools for extracting novel, useful, and implicit patterns from data using high-performance architectures. When data mining tools are implemented on high-performance parallel computers, they can analyze massive databases in a reasonable time. Faster processing also means that users can experiment with more models to understand complex data. High performance makes it practical for users to analyze greater quantities of data, which in turn can yield improved predictions.

We are evaluating different strategies and techniques to exploit parallelism in data mining algorithms. In particular, we designed and implemented a parallel version of AutoClass, called P-AutoClass, on distributed-memory MIMD parallel computers. This implementation is portable across a large number of parallel architectures and has proved to be scalable in terms of both speedup and scaleup. A paper that describes the system and its implementation will appear in IEEE Transactions on Knowledge and Data Engineering.
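
For readers unfamiliar with the two scalability measures mentioned above, the following is a minimal sketch of how they are commonly computed. It uses the standard textbook definitions, not code from P-AutoClass; the function names and the timings in the usage lines are hypothetical and chosen only for illustration.

```python
def speedup(t_serial, t_parallel):
    """Speedup: time on one processor divided by time on p processors,
    for the SAME problem size. The ideal value is p."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, processors):
    """Efficiency: speedup divided by the number of processors.
    The ideal value is 1.0."""
    return speedup(t_serial, t_parallel) / processors


def scaleup(t_base, t_scaled):
    """Scaleup: time for a base problem on one processor divided by the
    time for a problem p times larger on p processors.
    The ideal value is 1.0 (run time stays constant as both grow)."""
    return t_base / t_scaled


# Hypothetical timings in seconds, for illustration only.
print(speedup(100.0, 25.0))        # same data set, 1 vs. 8 processors
print(efficiency(100.0, 25.0, 8))  # fraction of the ideal speedup achieved
print(scaleup(10.0, 12.5))         # data set and processor count both grown 8x
```

Speedup answers how much faster a fixed data set is processed as processors are added, while scaleup answers whether larger data sets can be handled in about the same time by adding processors proportionally.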

I was a co-chair of the Track on High-Performance Data Mining and KDD at Euro-Par'99. You may read the editorial note for the Track.

I am the vice-chair of the Track on Parallel and Distributed Databases, Data Mining and Knowledge Discovery at Euro-Par 2002.

 


© Domenico Talia, DEIS, UNICAL, Rende, Italy.