Abstract
Computer-aided diagnosis (CAD) is an active topic in Alzheimer's disease (AD) research. Many algorithms rely on a relatively large training dataset. However, small hospitals are usually unable to collect enough training samples for robust classification. Although data sharing is expanding in scientific research, it is unclear whether a model trained on one dataset is well suited to other data sources. Using a small dataset from a local hospital and a large shared dataset from the Alzheimer's Disease Neuroimaging Initiative (ADNI), we conducted a heterogeneity analysis and found that functional magnetic resonance imaging (fMRI) data from different sources show different sample distributions in feature space. In addition, we propose an effective knowledge transfer method that reduces the disparity among datasets and improves classification accuracy on datasets with insufficient training samples. The accuracy increased by approximately 20% compared with that of a model trained only on the original small dataset. The results demonstrate that the proposed approach is a novel and effective method for CAD in hospitals with only small training datasets. It addresses the challenge of limited sample size in AD detection, a common issue that has received inadequate attention. Furthermore, this paper sheds new light on the effective use of multi-source data for neurological disease diagnosis.
I. INTRODUCTION
THE problems associated with the aging population are becoming increasingly serious as people live longer and fertility rates decline in most countries. Furthermore, because a greater proportion of individuals are elderly, more people are at high risk of developing dementia. Currently, approximately 47 million people worldwide live with dementia, and this number is predicted to increase to more than 131 million by 2050 [1].
VI. CONCLUSION
In this paper, we demonstrated that AD classification with a small dataset can be improved using the modified subspace alignment method. The method effectively improves classification accuracy on small sample sets. Researchers can use it to mitigate the challenge of extremely limited sample size, particularly when collecting neuroimaging data is difficult and computer-aided diagnosis with limited samples is required. Our work may also help researchers make better use of shared data and promote the exchange of collected data.
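As a rough illustration of the idea behind subspace alignment, the sketch below implements the standard, unmodified formulation: each dataset is reduced to a PCA subspace, and the source basis is rotated toward the target basis by the matrix M = Ps^T Pt. It is not the modified method evaluated in this paper. The function names, the subspace dimension d, and the use of NumPy are assumptions made for illustration only.

```python
import numpy as np

def pca_basis(X, d):
    """Return the top-d principal directions of X as an (n_features, d) matrix."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d].T

def subspace_align(source, target, d):
    """Standard subspace alignment (illustrative, not the paper's modified variant).

    source: (n_source, n_features) feature matrix, e.g. from a large shared dataset
    target: (n_target, n_features) feature matrix, e.g. from a small local dataset
    d:      subspace dimension (hypothetical choice; must not exceed
            min(n_samples, n_features) of either dataset)
    """
    Ps = pca_basis(source, d)   # source subspace basis
    Pt = pca_basis(target, d)   # target subspace basis
    M = Ps.T @ Pt               # alignment matrix mapping source basis toward target basis
    source_aligned = (source - source.mean(axis=0)) @ Ps @ M
    target_projected = (target - target.mean(axis=0)) @ Pt
    return source_aligned, target_projected
```

Under these assumptions, a classifier would be trained on source_aligned with the source labels and then applied to target_projected. When the two datasets are identical, M reduces to the identity matrix and the two outputs coincide.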