Comparing Decision Trees and Logistic Regression in Predicting HIV among Women in South Africa

Pelumi Oladokun, University of the Witwatersrand

In low-resource settings, where patients greatly outnumber the available doctors, staff and facilities, it is important to categorize and prioritize patients by risk in order to save time and support efficient, effective service delivery. This study compared the performance of logistic regression (LG) and decision trees (DT) in predicting HIV among women in South Africa. Data came from the Demographic and Health Surveys (DHS) Program (DHS, 2016). Study participants were 7808 women aged 15 to 49 years living in South Africa. The decision tree model had the higher accuracy on both the training (70.33%) and testing (68.19%) datasets. Accuracy for the LG model was 45.62%. The areas under the ROC curve (AUC) were 0.697 and 0.667 for DT and LG, respectively. Although logistic regression and decision trees serve similar purposes, the results indicate that the decision tree algorithm achieved better prediction accuracy.
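As an illustration of the two metrics reported above, the following minimal sketch computes accuracy and AUC from predicted probabilities in plain Python. It is not the study's analysis: the labels and scores are invented toy values (not DHS data), the names `accuracy`, `roc_auc`, `scores_a` and `scores_b` are hypothetical, and the 0.5 classification threshold is an assumption, since the abstract does not state one.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Share of cases classified correctly at the given cut-off (assumed 0.5)."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    correct = sum(1 for pred, y in zip(preds, y_true) if pred == y)
    return correct / len(y_true)


def roc_auc(y_true, y_prob):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case
    (ties count as one half)."""
    pos = [p for p, y in zip(y_prob, y_true) if y == 1]
    neg = [p for p, y in zip(y_prob, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = HIV positive, 0 = HIV negative (invented values).
y_true = [1, 0, 1, 0, 1, 0, 0, 1]
scores_a = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1, 0.6, 0.3]   # hypothetical model A
scores_b = [0.6, 0.5, 0.4, 0.7, 0.8, 0.2, 0.3, 0.55]  # hypothetical model B

for name, scores in [("A", scores_a), ("B", scores_b)]:
    print(name, accuracy(y_true, scores), roc_auc(y_true, scores))
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason two models can differ more on one metric than on the other.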

See extended abstract.

  Presented in Session 143. Computational Approach (Social Media, Big Data…) To Population Studies In Sub-Saharan Africa