Predicting Wealth Index Using an Ensemble Model of Random Forest and Multilayer Perceptron

Pinkie Akinyi Odipo; Anthony Waititu; Herbert Imboga; Susan Mwelu

doi:10.11648/j.ijdsa.20251104.12

Research Article | Peer-Reviewed

Pinkie Akinyi Odipo*, Anthony Waititu, Herbert Imboga, Susan Mwelu

Published in International Journal of Data Science and Analysis (Volume 11, Issue 4)

Received: 30 July 2025 | Accepted: 15 August 2025 | Published: 12 September 2025

Abstract

Wealth inequality remains a significant challenge in Kenya, exacerbated by the limitations of traditional wealth measurement methods. This study develops and evaluates an ensemble wealth index model combining Random Forest (RF) and Multilayer Perceptron (MLP) algorithms to improve prediction accuracy using socio-economic data from the 2019 Kenya Population and Housing Census (KPHC). The models were assessed using accuracy, precision, sensitivity (recall), specificity and ROC-AUC. The Random Forest model, configured with 500 trees and two candidate variables per split, achieved 34.3% accuracy, 58.54% balanced accuracy and a 65.98% out-of-bag error, and performed best on the "Poorest" class, with a 13.7% class error. It further recorded 41.13% precision, 34.27% recall, 83.17% specificity and a 67.31% AUC. The MLP model, using sigmoid activation in the hidden layers and softmax in the output layer, achieved 33.4% accuracy, 57.7% balanced accuracy, 30.6% precision, 32.54% recall, 82.86% specificity and a 69.12% AUC. The RF-MLP ensemble model achieved 34.4% accuracy, 37.42% precision, 34.05% recall, 83.2% specificity and a 68.55% AUC, marginally outperforming the individual models on accuracy and specificity. Despite modest overall accuracies, the ensemble model showed enhanced balanced accuracy and specificity, particularly in extreme wealth categories. However, classification of the middle and poor wealth levels remains challenging due to feature overlap and class imbalance.
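The metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the abstract does not say how the RF and MLP outputs were combined or how multi-class metrics were averaged, so the equal-weight soft vote (`soft_vote`), the macro one-vs-rest averaging (`macro_metrics`), and the toy confusion matrix below are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def macro_metrics(confusion: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Macro-averaged one-vs-rest metrics from a square confusion matrix.

    confusion[i][j] counts samples with true class i and predicted class j.
    Balanced accuracy is taken as the mean of sensitivity and specificity
    (an assumed definition, not confirmed by the source).
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    precisions: List[float] = []
    recalls: List[float] = []
    specificities: List[float] = []
    for c in range(k):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # true c, predicted other
        fp = sum(confusion[r][c] for r in range(k)) - tp  # predicted c, true other
        tn = total - tp - fn - fp
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
        specificities.append(tn / (tn + fp) if tn + fp else 0.0)
    recall = sum(recalls) / k
    specificity = sum(specificities) / k
    return {
        "accuracy": sum(confusion[c][c] for c in range(k)) / total,
        "precision": sum(precisions) / k,
        "recall": recall,
        "specificity": specificity,
        "balanced_accuracy": (recall + specificity) / 2,
    }


def soft_vote(prob_a: Sequence[float], prob_b: Sequence[float],
              weight_a: float = 0.5) -> int:
    """Blend two class-probability vectors and return the winning class index.

    Hypothetical combiner: equal weighting is an assumption.
    """
    mixed = [weight_a * a + (1 - weight_a) * b for a, b in zip(prob_a, prob_b)]
    return max(range(len(mixed)), key=mixed.__getitem__)


# Toy 3-class confusion matrix (made-up counts, not census results).
toy = [[5, 1, 0],
       [2, 3, 1],
       [0, 2, 4]]
metrics = macro_metrics(toy)
winner = soft_vote([0.6, 0.3, 0.1], [0.1, 0.7, 0.2])
```

Under this definition, the MLP's reported 32.54% recall and 82.86% specificity average to 57.7%, which matches its reported balanced accuracy. That is consistent with, though it does not confirm, balanced accuracy having been computed as the mean of sensitivity and specificity.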

Published in	International Journal of Data Science and Analysis (Volume 11, Issue 4)
DOI	10.11648/j.ijdsa.20251104.12
Page(s)	114-124
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2025. Published by Science Publishing Group

Keywords

Wealth Index, Ensemble Model, Random Forest, Multilayer Perceptron, Machine Learning

Cite This Article

APA Style

Odipo, P. A., Waititu, A., Imboga, H., & Mwelu, S. (2025). Predicting Wealth Index Using an Ensemble Model of Random Forest and Multilayer Perceptron. International Journal of Data Science and Analysis, 11(4), 114-124. https://doi.org/10.11648/j.ijdsa.20251104.12

ACS Style

Odipo, P. A.; Waititu, A.; Imboga, H.; Mwelu, S. Predicting Wealth Index Using an Ensemble Model of Random Forest and Multilayer Perceptron. Int. J. Data Sci. Anal. 2025, 11(4), 114-124. doi: 10.11648/j.ijdsa.20251104.12

AMA Style

Odipo PA, Waititu A, Imboga H, Mwelu S. Predicting Wealth Index Using an Ensemble Model of Random Forest and Multilayer Perceptron. Int J Data Sci Anal. 2025;11(4):114-124. doi: 10.11648/j.ijdsa.20251104.12

BibTeX

@article{10.11648/j.ijdsa.20251104.12,
  author = {Pinkie Akinyi Odipo and Anthony Waititu and Herbert Imboga and Susan Mwelu},
  title = {Predicting Wealth Index Using an Ensemble Model of Random Forest and Multilayer Perceptron},
  journal = {International Journal of Data Science and Analysis},
  volume = {11},
  number = {4},
  pages = {114-124},
  doi = {10.11648/j.ijdsa.20251104.12},
  url = {https://doi.org/10.11648/j.ijdsa.20251104.12},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijdsa.20251104.12},
  year = {2025}
}

RIS

TY  - JOUR
T1  - Predicting Wealth Index Using an Ensemble Model of Random Forest and Multilayer Perceptron
AU  - Pinkie Akinyi Odipo
AU  - Anthony Waititu
AU  - Herbert Imboga
AU  - Susan Mwelu
Y1  - 2025/09/12
PY  - 2025
N1  - https://doi.org/10.11648/j.ijdsa.20251104.12
DO  - 10.11648/j.ijdsa.20251104.12
T2  - International Journal of Data Science and Analysis
JF  - International Journal of Data Science and Analysis
JO  - International Journal of Data Science and Analysis
SP  - 114
EP  - 124
PB  - Science Publishing Group
SN  - 2575-1891
UR  - https://doi.org/10.11648/j.ijdsa.20251104.12
VL  - 11
IS  - 4
ER  -

Author Information

Pinkie Akinyi Odipo
Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
http://orcid.org/0009-0005-7143-5364

Anthony Waititu
Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
http://orcid.org/0000-0003-0268-2968

Herbert Imboga
Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
http://orcid.org/0009-0003-9963-4977

Susan Mwelu
Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
http://orcid.org/0009-0005-9570-9112
