Anomalous data lead to scientific discoveries. Although machine learning systems can be forced to resolve anomalous data, these systems use general learning algorithms to do so. To determine whether anomaly-driven approaches to discovery produce more accurate models than the standard approaches, we built a program called Kalpana. We also used Kalpana to explore means for identifying those anomaly resolutions that are acceptable to domain experts. Our experiments indicated that anomaly-driven approaches can lead to a richer set of model revisions than standard methods. Additionally, we identified semantic and syntactic measures that are significantly correlated with the acceptability of model revisions. These results suggest that by interpreting data within the context of a model and by interpreting model revisions within the context of domain knowledge, discovery systems can more readily suggest accurate and acceptable anomaly resolutions.
Bridewell, W. (2004). Science as an Anomaly-Driven Enterprise: A Computational Approach to Generating Acceptable Theory Revisions in the Face of Anomalous Data. PhD Thesis, Computer Science Department, University of Pittsburgh.