Dementia interferes with an individual’s motor, behavioural, and intellectual functions, leaving the person unable to perform instrumental activities of daily living. This study aims to identify, through data mining, the best-performing algorithm and the most relevant characteristics for categorising individuals with HIV/AIDS at high risk of dementia. Principal component analysis (PCA) was applied, and the following machine learning algorithms were compared: logistic regression, decision tree, neural network, k-nearest neighbours (KNN), and random forest. The database was built from data collected on 270 individuals infected with HIV/AIDS who were followed up at the outpatient clinic of a reference hospital for infectious and parasitic diseases in the State of Ceará, Brazil, from January to April 2019. The performance of the algorithms was first analysed using all 104 characteristics available in the database; after dimensionality reduction, the quality of the machine learning algorithms improved in the tests, even with a loss of about 30% of the variation. When only 23 characteristics were considered, the precision of the algorithms was 86% for random forest, 56% for logistic regression, 68% for decision tree, 60% for KNN, and 59% for neural network. The random forest algorithm proved more effective than the others, obtaining 84% precision and 86% accuracy.
1. Introduction
Data mining (DM) is a data exploration process capable of predicting and extracting consistent patterns using strategies such as learning algorithms from artificial intelligence (AI) or statistical classification, which can reveal hidden relationships and yield accurate information [1, 2].
DM is applied in health information systems, in both the public and private spheres, where a process of selection, preprocessing, and data transformation makes it possible to discover patterns and, through their interpretation, generate knowledge. With this method, the health professional can identify, characterise, and guide the patient based on patterns of health problems and care therapies for different diseases [2].
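The stages named above (selection, preprocessing, transformation, pattern discovery, and interpretation) can be illustrated with a minimal sketch. It is not the pipeline used in this study: the records are invented toy values, the feature names "x1" and "x2" are placeholders, and a small k-nearest-neighbours classifier written with the Python standard library stands in for the learning step. All function names are hypothetical.

```python
from collections import Counter
from math import dist
from statistics import mean, pstdev

# Invented toy records; "x1" and "x2" are placeholder features, not study variables.
# label 1 = high risk, 0 = low risk (illustrative only).
TRAIN = [
    {"id": 1, "x1": 1.0, "x2": 10.0, "label": 0},
    {"id": 2, "x1": 1.2, "x2": 12.0, "label": 0},
    {"id": 3, "x1": 0.8, "x2": 9.0, "label": 0},
    {"id": 4, "x1": None, "x2": 11.0, "label": 0},  # incomplete record
    {"id": 5, "x1": 5.0, "x2": 50.0, "label": 1},
    {"id": 6, "x1": 5.5, "x2": 48.0, "label": 1},
    {"id": 7, "x1": 4.8, "x2": 52.0, "label": 1},
]
TEST = [
    {"id": 8, "x1": 1.1, "x2": 11.0, "label": 0},
    {"id": 9, "x1": 5.2, "x2": 49.0, "label": 1},
    {"id": 10, "x1": 0.9, "x2": 10.5, "label": 0},
    {"id": 11, "x1": 5.1, "x2": 51.0, "label": 1},
]

def select(records, features):
    """Selection: keep only the chosen features and the label."""
    return [([r[f] for f in features], r["label"]) for r in records]

def preprocess(rows):
    """Preprocessing: drop rows that contain missing feature values."""
    return [(x, y) for x, y in rows if all(v is not None for v in x)]

def fit_scaler(rows):
    """Compute per-feature mean and standard deviation from the training rows."""
    columns = zip(*(x for x, _ in rows))
    return [(mean(c), pstdev(c) or 1.0) for c in columns]

def transform(rows, scaler):
    """Transformation: z-score each feature with the training statistics."""
    return [([(v - m) / s for v, (m, s) in zip(x, scaler)], y) for x, y in rows]

def knn_predict(train, x, k=3):
    """Pattern discovery: majority vote among the k nearest training rows."""
    nearest = sorted(train, key=lambda row: dist(row[0], x))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]

def evaluate(y_true, y_pred):
    """Interpretation: return (accuracy, precision) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    return accuracy, precision

features = ["x1", "x2"]
train_rows = preprocess(select(TRAIN, features))
test_rows = preprocess(select(TEST, features))

scaler = fit_scaler(train_rows)
train_scaled = transform(train_rows, scaler)
test_scaled = transform(test_rows, scaler)

y_true = [y for _, y in test_scaled]
y_pred = [knn_predict(train_scaled, x) for x, _ in test_scaled]
accuracy, precision = evaluate(y_true, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f}")
```

The scaling statistics are computed on the training rows only and then reused for the test rows, so that no information from the held-out records leaks into the transformation step. Accuracy and precision are reported separately because they answer different questions: accuracy counts all correct predictions, whereas precision counts how many of the records flagged as high risk truly are.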