Abstract:
The quality of a dataset plays a central role in the results and conclusions that can be drawn from analysing it. As is often said: garbage in, garbage out. In recent years, neural networks have displayed good performance in solving a diverse range of problems. Unfortunately, artificial neural networks are not immune to the problems posed by missing values. Furthermore, in most real-world settings, the only data available for training artificial neural networks often contains a significant proportion of missing values. In such cases, we are left with little choice but to use this data for training, although doing so may result in a poorly performing neural network. In this paper, we describe the use of neural network dropout as a technique for training neural networks in the presence of missing values. We test the performance of different neural network architectures under different levels of artificially generated missing values introduced into the MNIST handwritten digit recognition dataset, CIFAR-10 and the Pima Indians Diabetes dataset, and find that in most cases dropout results in significantly better performance than other missing-data handling techniques.
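To make the idea concrete, the following is a minimal sketch (not the authors' code) of training with dropout on data containing missing values. The zero-imputation of missing entries, the NaN encoding of missingness, and the layer sizes are all assumptions made for illustration; the paper's actual architectures and preprocessing may differ.

```python
import torch
import torch.nn as nn

class DropoutMLP(nn.Module):
    """Hypothetical classifier using dropout to cope with missing inputs."""
    def __init__(self, in_dim=784, hidden=256, classes=10, p=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Dropout(p),  # randomly zeroes activations during training
            nn.Linear(hidden, classes),
        )

    def forward(self, x):
        # Assumption for this sketch: missing values are encoded as NaN and
        # zero-imputed before the forward pass.
        x = torch.nan_to_num(x, nan=0.0)
        return self.net(x)

model = DropoutMLP()
x = torch.randn(32, 784)
x[torch.rand_like(x) < 0.2] = float("nan")  # simulate 20% missing values
logits = model(x)                           # shape: (32, 10)
```

The intuition is that dropout already forces the network to tolerate randomly absent activations at training time, so a network trained this way may degrade more gracefully when input features are genuinely missing.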