Data Preparation Process for Construction Knowledge Generation through Knowledge Discovery in Databases

Abstract
As the construction industry adopts new computer hardware and software, computerized construction data are becoming increasingly available. The explosive growth of business, government, and scientific databases has begun to far outpace our ability to interpret and digest the data. Such volumes of data overwhelm traditional methods of data analysis such as spreadsheets and ad hoc queries, which can produce informative reports from data but cannot analyze the contents of those reports. There is therefore a significant need for a new generation of techniques and tools that can automatically assist humans in analyzing the mountains of data for useful knowledge. Knowledge discovery in databases (KDD) and data mining (DM) are such tools: they identify valid, useful, and previously unknown patterns, allowing construction managers to analyze large amounts of construction project data. These technologies combine techniques from machine learning, artificial intelligence, pattern recognition, statistics, databases, and visualization to automatically extract concepts, interrelationships, and patterns of interest from large databases. This paper presents the steps required to implement KDD: (1) problem identification, (2) data preparation, (3) data mining, (4) data analysis, and (5) refinement. To test the feasibility of the proposed approach, a prototype KDD system was developed and tested with a construction management database, the Resident Management System (RMS), provided by the U.S. Army Corps of Engineers. In this paper, the KDD process was applied to identify the causes of construction activity delays; its applications could, however, be extended to other construction problems, such as identifying the causes of cost overruns and quality control/assurance problems.
The process may thus reveal predictable patterns in construction data previously thought to be chaotic.
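The five KDD steps listed in the abstract can be illustrated with a minimal Python sketch applied to activity delays. This is not the authors' prototype and does not use the RMS schema: the activity records, the field names (`planned_days`, `actual_days`, `weather`, `crew`), the delay definition (actual duration longer than planned), and the support and delay-rate thresholds are all hypothetical. A simple per-attribute delay-rate count stands in for whatever mining algorithm the paper actually used.

```python
from collections import Counter

# Hypothetical activity records; field names are illustrative, NOT the RMS schema.
RAW_RECORDS = [
    {"activity": "A1", "planned_days": "10", "actual_days": "14", "weather": "rain", "crew": "small"},
    {"activity": "A2", "planned_days": "5", "actual_days": "5", "weather": "clear", "crew": "large"},
    {"activity": "A3", "planned_days": "8", "actual_days": "12", "weather": " Rain ", "crew": "small"},
    {"activity": "A4", "planned_days": "6", "actual_days": "6", "weather": "clear", "crew": "small"},
    {"activity": "A5", "planned_days": "", "actual_days": "9", "weather": "rain", "crew": "large"},
    {"activity": "A6", "planned_days": "7", "actual_days": "9", "weather": "rain", "crew": "large"},
    {"activity": "A7", "planned_days": "4", "actual_days": "4", "weather": None, "crew": "large"},
]


def prepare(records):
    """Step 2, data preparation: drop incomplete rows, normalize text, derive the target.

    Step 1 (problem identification) is encoded in the target definition:
    an activity is treated as delayed if its actual duration exceeds the plan.
    """
    prepared = []
    for r in records:
        try:
            planned = int(r["planned_days"])
            actual = int(r["actual_days"])
        except (KeyError, TypeError, ValueError):
            continue  # discard records with missing or malformed durations
        if planned <= 0:
            continue  # discard records with an invalid planned duration
        prepared.append({
            "weather": (r.get("weather") or "unknown").strip().lower(),
            "crew": (r.get("crew") or "unknown").strip().lower(),
            "delayed": actual > planned,
        })
    return prepared


def mine(rows, attributes):
    """Step 3, data mining: support and delay rate for every attribute value."""
    stats = {}
    for attr in attributes:
        totals, delays = Counter(), Counter()
        for row in rows:
            totals[row[attr]] += 1
            if row["delayed"]:
                delays[row[attr]] += 1
        for value, n in totals.items():
            stats[(attr, value)] = (n, delays[value] / n)
    return stats


def analyze(stats, min_support=2, min_rate=0.6):
    """Step 4, analysis: keep candidate delay causes above both thresholds."""
    candidates = [
        (attr, value, rate)
        for (attr, value), (n, rate) in stats.items()
        if n >= min_support and rate >= min_rate
    ]
    return sorted(candidates, key=lambda c: -c[2])  # highest delay rate first


if __name__ == "__main__":
    rows = prepare(RAW_RECORDS)
    stats = mine(rows, ["weather", "crew"])
    for attr, value, rate in analyze(stats):
        print(f"{attr}={value}: delay rate {rate:.0%}")
    # Step 5, refinement: tighten the threshold and re-run the analysis.
    print(analyze(stats, min_rate=0.8))
```

In this sketch, refinement is simply re-running the analysis with adjusted thresholds; a real KDD cycle might instead return to data preparation (for example, adding attributes or changing the delay definition) when the discovered patterns are not useful.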
