Wednesday, May 6, 2009

WEKA: The Waikato Environment for Knowledge Analysis

The WEKA machine learning workbench grew out of the need to apply machine learning to real-world data sets in a way that promotes an exploratory, “what if?…” approach. Each machine learning algorithm implementation requires the data to be presented in its own format, and has its own way of specifying parameters and output. The WEKA system was designed to bring a range of machine learning techniques, or schemes, under a common interface so that they can be applied to the data easily and consistently. This interface should be flexible enough to encourage the addition of new schemes, and simple enough that users need only concern themselves with selecting features in the data for analysis and interpreting the output, rather than with how to operate a particular machine learning scheme.
Applications: The WEKA system has been applied successfully in a variety of areas, including agriculture, machine learning research, and education.
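
To make the idea of a common interface concrete, here is a minimal sketch in Python. It is not WEKA's actual API: the names `Scheme`, `MajorityClass`, `OneAttributeRule`, and `accuracy`, and the toy weather rows, are all made up for illustration. The point is only that two different schemes can be trained and evaluated through the same two calls.

```python
from abc import ABC, abstractmethod
from collections import Counter


class Scheme(ABC):
    """Hypothetical common interface for learning schemes (not WEKA's API)."""

    @abstractmethod
    def fit(self, rows, target):
        """Learn from a list of dict records; `target` names the class attribute."""

    @abstractmethod
    def predict(self, row):
        """Return the predicted class for one record."""


class MajorityClass(Scheme):
    """Always predicts the most frequent class seen in training."""

    def fit(self, rows, target):
        self.label = Counter(r[target] for r in rows).most_common(1)[0][0]
        return self

    def predict(self, row):
        return self.label


class OneAttributeRule(Scheme):
    """Predicts the most frequent class for each value of a single attribute."""

    def __init__(self, attribute):
        self.attribute = attribute

    def fit(self, rows, target):
        counts = {}
        for r in rows:
            counts.setdefault(r[self.attribute], Counter())[r[target]] += 1
        self.rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        # Fall back to the overall majority class for unseen attribute values.
        self.default = Counter(r[target] for r in rows).most_common(1)[0][0]
        return self

    def predict(self, row):
        return self.rule.get(row[self.attribute], self.default)


def accuracy(scheme, train, test, target):
    """Train any Scheme and report the fraction of test records it gets right."""
    scheme.fit(train, target)
    hits = sum(scheme.predict(r) == r[target] for r in test)
    return hits / len(test)


# Made-up toy data, used only to exercise the interface.
train = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": "overcast", "play": "yes"},
    {"outlook": "overcast", "play": "yes"},
]
test = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "rainy", "play": "yes"},
]

for scheme in (MajorityClass(), OneAttributeRule("outlook")):
    print(type(scheme).__name__, accuracy(scheme, train, test, "play"))
```

Because `accuracy` depends only on `fit` and `predict`, a new scheme can be added without changing the data handling or the evaluation code, which is the flexibility described above.
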

Agricultural:
The most significant project carried out so far using the WEKA workbench has been the analysis of dairy herd data to isolate rules describing the factors that farmers might use in culling decisions [10]. This involved working with a large data set of 19 103 records containing 705 attributes, spread across 10 herds and 6 years. About 40 new attributes were derived, including attributes such as age and production index relative to herd, and these were added to the original data set, which was then processed by various machine learning schemes.
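
As an illustration of what a derived, herd-relative attribute might look like, the sketch below adds one to a handful of records. The field names (`herd`, `year`, `production_index`), the sample values, and the choice of "difference from the herd's mean in that year" as the definition are all assumptions made for this example; the post does not say how the attributes were actually computed.

```python
from collections import defaultdict
from statistics import mean


def add_herd_relative(records, attribute, herd_key="herd", year_key="year"):
    """Return copies of `records` with `<attribute>_rel_herd` added.

    The new value is the record's attribute minus the mean of that attribute
    over all records from the same herd and year (an assumed definition).
    """
    groups = defaultdict(list)
    for rec in records:
        groups[(rec[herd_key], rec[year_key])].append(rec[attribute])
    averages = {key: mean(values) for key, values in groups.items()}

    derived = []
    for rec in records:
        new_rec = dict(rec)  # leave the original record untouched
        herd_mean = averages[(rec[herd_key], rec[year_key])]
        new_rec[f"{attribute}_rel_herd"] = rec[attribute] - herd_mean
        derived.append(new_rec)
    return derived


# Made-up records for illustration only.
cows = [
    {"herd": "A", "year": 1990, "production_index": 100},
    {"herd": "A", "year": 1990, "production_index": 120},
    {"herd": "B", "year": 1990, "production_index": 90},
]

for row in add_herd_relative(cows, "production_index"):
    print(row)
```

Expressing a value relative to its own herd lets a learning scheme compare animals across herds with different baseline production levels.
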

Research:
The WEKA system has also proved to be a valuable tool in machine learning research. Firstly, it supports the development of new machine learning algorithms from the standpoint of both implementation and evaluation. Defined data set file formats, together with tools to access and manipulate the contents of data sets, reduce the effort required to get data into a new scheme. A common output format that can be evaluated using the PREval tool likewise takes the effort of evaluation out of the development process.

Education:
WEKA has also been used in a limited role to introduce students in an advanced undergraduate course on machine learning to the subject and to the capabilities of the different kinds of schemes.
