Data Mining application for analysis and identification of patterns in relational databases
Abstract
Modern enterprises and organizations increasingly prefer to work in a shared data environment that lets them process very large amounts of heterogeneous data. Large enterprises, however, often already maintain a set of large databases (DBs) that are conceptually linked and contain common information but are not structurally (technically) related to each other. These DBs hold huge amounts of data, the loss of which can have undesirable effects (inefficient decisions, accidents, failures, etc.). Hence there is often a need to aggregate the existing databases into a single database as a step toward a unified enterprise information environment, which makes this research highly relevant. The article addresses one of the problems of retrieving information about database structure, namely the identification of attributes common to several relational databases. Such information about DB structures makes it possible to eliminate data redundancy and raise storage efficiency, and it is required for the subsequent synthesis of a new DB schema. The article presents algorithms, developed by the authors, that search for attributes common to several relational databases using Data Mining methods. The task of finding common attributes is treated as a classification problem, where classification means determining the category (class) of an object from the set of its features. In our case, the objects are the active domains of the relational databases. One of the developed algorithms builds a learning (training) sample; the other classifies objects (attributes) using the k-nearest neighbors (kNN) algorithm.
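To make the classification idea concrete, the following minimal Python sketch treats the active domain of an attribute (the set of values it actually takes) as the object to classify and assigns it a class by k-nearest-neighbor voting. It is an illustration only, not the authors' algorithm: the feature function `domain_features`, the class labels, the sample data, and the choice of Euclidean distance are all assumptions made for this example.

```python
from collections import Counter
import math


def domain_features(active_domain):
    """Map an active domain (set of distinct values) to a numeric vector.

    The three features are hypothetical choices for this sketch:
    share of digit characters, share of alphabetic characters,
    and mean value length scaled to [0, 1].
    """
    values = [str(v) for v in active_domain]
    total_chars = sum(len(v) for v in values)
    if total_chars == 0:
        return (0.0, 0.0, 0.0)
    digits = sum(ch.isdigit() for v in values for ch in v)
    letters = sum(ch.isalpha() for v in values for ch in v)
    mean_len = total_chars / len(values)
    return (digits / total_chars, letters / total_chars, min(mean_len, 32) / 32)


def knn_classify(training_sample, active_domain, k=3):
    """Return the majority class among the k nearest training domains."""
    query = domain_features(active_domain)
    # Euclidean distance from the query to every labeled training domain.
    scored = sorted(
        (math.dist(query, domain_features(domain)), label)
        for domain, label in training_sample
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical learning sample: (active domain, class of the attribute).
training_sample = [
    ({"2019-01-05", "2020-11-30", "2021-07-14"}, "date"),
    ({"2018-03-02", "2022-12-25"}, "date"),
    ({"Alice", "Bob", "Carol"}, "name"),
    ({"Dmitry", "Elena", "Frank", "Grace"}, "name"),
]

# Attributes from another database, classified by their active domains.
print(knn_classify(training_sample, {"2023-02-17", "2017-09-09"}))
print(knn_classify(training_sample, {"Henry", "Irina"}))
```

Under this sketch, two attributes from different databases that receive the same class become candidates for being a common attribute. In practice the features and the distance measure would have to be chosen to suit the data in the databases being merged.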