Interbull Quality Service
The aim of this Interbull project is to explore the possibility of building an alarm system, based on data mining techniques applied to genetic evaluations of dairy cattle, in order to assess and assure data quality. The approach combines data mining with classification and decision-tree algorithms, binned Gaussian fitting functions, and hypothesis tests.
The data are quarterly national genetic evaluations computed in nine countries between February 1999 and February 2003. Each evaluation run includes 73,000 to 90,000 bull records, complete with their genetic values and evaluation information. Milk production traits are considered.
Data mining algorithms are applied separately to each country and evaluation run to search for associations across several dimensions, including bull origin, type of proof, age of bull, and number of daughters. The data in each node of the resulting decision tree are then fitted to a Gaussian function, and the goodness of the fit serves as a measure of data quality.
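The node-level fitting step can be illustrated with a minimal sketch. The text does not specify how the quality of the fit is measured, so the example assumes a reduced chi-square between the binned data and a Gaussian whose mean and standard deviation are estimated from the sample (rather than obtained by a least-squares fit). The function names, the `n_bins` parameter, and the simulated values are hypothetical and stand in for the genetic values of the bulls in one tree node.

```python
import math
import random
import statistics


def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def binned_gaussian_fit(values, n_bins=10):
    """Reduced chi-square of a binned Gaussian fit (assumed quality measure).

    Values near 1 suggest the node is well described by a Gaussian;
    large values suggest it is not.
    """
    if n_bins < 4:
        raise ValueError("need at least 4 bins for a positive degree of freedom")
    mu = statistics.fmean(values)
    sigma = statistics.stdev(values)
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        raise ValueError("all values are identical")

    # Histogram of the observed values in equal-width bins.
    observed = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)
        observed[idx] += 1

    # Compare observed counts with the counts the Gaussian predicts per bin.
    n = len(values)
    chi2 = 0.0
    for i in range(n_bins):
        left = lo + i * width
        right = left + width
        expected = n * (gaussian_cdf(right, mu, sigma) - gaussian_cdf(left, mu, sigma))
        if expected > 0:
            chi2 += (observed[i] - expected) ** 2 / expected

    # Degrees of freedom: bins minus 1, minus the two estimated parameters.
    return chi2 / (n_bins - 3)


# Simulated nodes: one Gaussian, one bimodal (e.g. two mixed subpopulations).
random.seed(0)
clean_node = [random.gauss(0.0, 1.0) for _ in range(5000)]
mixed_node = ([random.gauss(-3.0, 1.0) for _ in range(2500)]
              + [random.gauss(3.0, 1.0) for _ in range(2500)])
print("clean node:", binned_gaussian_fit(clean_node))
print("mixed node:", binned_gaussian_fit(mixed_node))
```

In this sketch the bimodal node yields a much larger statistic than the Gaussian node, which is the kind of contrast a quality measure per node would need to expose.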
To evaluate and ultimately predict decision-tree models, the implemented architecture compares the node probabilities of two models and decides whether they are similar, using hypothesis tests on the standard deviation of their distributions. The key utility of this system lies in its capacity to identify the exact node where an anomaly occurs and to raise a focused alarm pointing to the erroneous data.
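The comparison of two models node by node can be sketched as follows. The text does not name the specific hypothesis test, so this example swaps in a large-sample z approximation to the variance-ratio (F) test: under normality, the log of a sample variance has variance of roughly 2/(n-1). The function names, the dictionary layout (node identifier mapped to the values in that node), and the threshold `alpha` are all assumptions made for illustration.

```python
import math
import statistics


def variance_ratio_test(sample_a, sample_b):
    """Two-sided large-sample test for equal standard deviations.

    Assumes roughly Gaussian data and large samples. Returns (z, p_value).
    """
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    # Approximate standard error of ln(var_a / var_b) under normality.
    se = math.sqrt(2.0 / (len(sample_a) - 1) + 2.0 / (len(sample_b) - 1))
    z = math.log(var_a / var_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


def flag_nodes(run_a, run_b, alpha=0.01):
    """Return the node ids whose spread differs significantly between runs.

    run_a, run_b: dict mapping a node id to the list of values in that node
    (hypothetical layout). Only nodes present in both runs are compared.
    """
    alarms = []
    for node_id in sorted(run_a.keys() & run_b.keys()):
        _, p_value = variance_ratio_test(run_a[node_id], run_b[node_id])
        if p_value < alpha:
            alarms.append(node_id)
    return alarms


# Toy example: node_2 has its spread doubled in the second run.
base = [float(i % 7 - 3) for i in range(700)]
previous_run = {"node_1": base, "node_2": base}
current_run = {"node_1": list(base), "node_2": [2.0 * v for v in base]}
print(flag_nodes(previous_run, current_run))
```

Returning node identifiers rather than a single pass/fail verdict mirrors the idea of a focused alarm: the caller learns which node changed, not merely that something did.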