M2N

View my home page
http://www.biocom.icmc.usp.br/~lpfgarcia/

View the Project on GitHub
lpfgarcia/m2n

Noise detection in the meta-learning level

Authors: Luís P.F. Garcia, André C.P.L.F. de Carvalho and Ana C. Lorena

Abstract: The presence of noise in real data sets can harm the predictive performance of machine learning algorithms. Several noise filtering techniques aim to improve the quality of the data in classification tasks. These techniques usually scan the data for noise in a preprocessing step. Nonetheless, this is a non-trivial task: some noisy data can remain unidentified, while safe data can be removed. The bias of each filtering technique influences its performance on a particular data set. Therefore, no single technique can be considered the best for all domains or data distributions, and choosing a particular filter is not straightforward. Meta-learning has been widely used in recent years to support the recommendation of the most suitable machine learning algorithm(s) for a new data set. This paper presents a meta-learning recommendation system able to predict the expected performance of noise filters in noisy data identification tasks. To this end, a meta-base is created, containing meta-features extracted from several corrupted data sets along with the performance of some noise filters when applied to these data sets. Next, regression models are induced from this meta-base to predict the expected performance of the investigated filters in the identification of noisy data. The experimental results show that meta-learning can provide a good recommendation of the most promising filters to be applied to new classification data sets.
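The recommendation idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the meta-feature values and filter performances are synthetic, the filter names `filter_a` and `filter_b` are hypothetical, and a simple k-nearest-neighbour regressor stands in for the regression models used in the paper, which this page does not specify.

```python
import math

# Toy meta-base with synthetic values (illustration only, not results from the paper).
# Each meta-example pairs the meta-features of one corrupted data set with the
# observed performance of each (hypothetical) noise filter on that data set.
META_BASE = [
    ((0.10, 0.80), {"filter_a": 0.90, "filter_b": 0.60}),
    ((0.20, 0.70), {"filter_a": 0.80, "filter_b": 0.70}),
    ((0.80, 0.20), {"filter_a": 0.40, "filter_b": 0.90}),
    ((0.90, 0.10), {"filter_a": 0.30, "filter_b": 0.80}),
]


def predict_performance(meta_features, meta_base, k=2):
    """Predict each filter's performance as the mean over the k nearest meta-examples."""
    nearest = sorted(
        meta_base, key=lambda entry: math.dist(entry[0], meta_features)
    )[:k]
    filter_names = nearest[0][1].keys()
    return {
        name: sum(perf[name] for _, perf in nearest) / len(nearest)
        for name in filter_names
    }


def recommend(meta_features, meta_base, k=2):
    """Rank filters from highest to lowest predicted performance."""
    predicted = predict_performance(meta_features, meta_base, k)
    return sorted(predicted, key=predicted.get, reverse=True)


# Meta-features of a new, unseen data set (synthetic values).
print(recommend((0.15, 0.75), META_BASE))
```

In this sketch one regressor per filter is folded into a single function for brevity; a fuller version would induce a separate regression model for each filter from the meta-base and compare their predictions.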

Additional Results

The results for the data sets can be found here:

Evaluation of the Filters

Performance of the Meta-regressors

Performance of the Meta-regressors after Feature Selection

Additional Information

The data sets, the measures and all results can be found HERE.

Contact

Luís Paulo Faina Garcia - lpgarcia [at] icmc [dot] usp [dot] br
Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, São Carlos, São Paulo 13560-970, Brazil

References

[1] K. Bache, M. Lichman, UCI machine learning repository, http://archive.ics.uci.edu/ml (2013).

[2] A. Orriols-Puig, N. Maciá, T. K. Ho, Documentation for the data complexity library in C++, Tech. rep., La Salle - Universitat Ramon Llull (2010).

[3] M. Reif, A comprehensive dataset for evaluating approaches of various meta-learning tasks, in: First International Conference on Pattern Recognition Applications and Methods (ICPRAM), 2012.

[4] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria (2014). URL http://www.r-project.org/