6. Conclusion
In the present paper, we proposed a method for uniforming the dimensionality of data using a neural network. The method targets materials informatics, in which compounds with different numbers of elements must often be handled together in order to predict material properties or to classify compounds. We considered the case of material properties for which the underlying mechanisms or theoretical models are poorly understood, and therefore introduced a naive representation for the compound. Since the length of the vector in the naive representation varies with the number of elements, the dimensionality of the input data must be made uniform. Unlike conventional methods, such as the multilayer autoencoder, the denoising autoencoder, and kernel PCA, the proposed method makes the dimensionality of data uniform while handling both expansion and reduction of the dimensionality at the same time. In the proposed method, which is a variant of the denoising autoencoder [1,15,16], uniforming of dimensionality is realized by injecting noise into the extended part of the input vector while the autoencoder is trained. The latent representation in the neural network then serves as a uniformed representation of the input data.
Experiments on synthetic data, ion conductivity data, and hydrogen storage materials data revealed that the proposed method performs well on the linear regression task compared with the conventional methods, and that it exhibits distance preservation and robustness. These results suggest that the proposed method may be applicable to a broad range of problems. Further study is necessary in order to investigate the statistical properties of the uniformed representation of data. Furthermore, in order to improve performance on the linear regression task, we proposed the linear guided autoencoder, which has a linear guided term in the objective function.
In the future, we hope to confirm the effectiveness of the linear guided autoencoder for use in materials informatics.
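As an illustration of the noise-injection step summarized above, the following minimal Python sketch extends input vectors of different lengths to a common dimensionality and fills the extended part with freshly drawn noise. It covers only the preparation of the autoencoder input, not the training itself. The function name extend_with_noise, the choice of Gaussian noise, the noise_std parameter, and the example vectors are assumptions made for illustration and do not reproduce the implementation used in the experiments.

```python
import random


def extend_with_noise(x, target_dim, rng, noise_std=1.0):
    """Extend x to target_dim entries, filling the added entries with noise.

    The original entries are kept unchanged; only the extended part is
    random. Gaussian noise is an assumption of this sketch.
    """
    if len(x) > target_dim:
        raise ValueError("input is longer than target_dim")
    n_extra = target_dim - len(x)
    noise = [rng.gauss(0.0, noise_std) for _ in range(n_extra)]
    return list(x) + noise


rng = random.Random(0)   # fixed seed so the sketch is reproducible
short = [0.2, 0.8]       # hypothetical two-entry representation
longer = [0.1, 0.3, 0.6] # hypothetical three-entry representation

# Each call draws new noise, so the same compound is presented to the
# autoencoder with a different extended part at every training step.
a = extend_with_noise(short, 4, rng)
b = extend_with_noise(short, 4, rng)
c = extend_with_noise(longer, 4, rng)
```

Because the extended part changes on every presentation while the original entries stay fixed, an autoencoder trained on such inputs has no consistent signal to extract from the extension, which is the intuition behind treating the latent layer as the uniformed representation.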