Prediction of Hepatitis Disease Using Ensemble Learning Methods


Keywords: Hepatitis B Virus; Hepatitis C Virus; Ensemble Learning; Data Analysis.


Objective: Hepatitis is a chronic disease that can lead to liver cirrhosis and hepatocellular carcinoma, both of which cause deaths worldwide. Early diagnosis is therefore needed to control and treat the disease and to reduce its effects. The main goal of this study was to compare the performance of traditional and ensemble learning methods for predicting hepatitis B virus (HBV) and hepatitis C virus (HCV). Important variables related to HBV and HCV were also identified.

Methods: This case-control study was conducted in Hamadan Province, western Iran, between 2018 and 2019. It included 534 subjects (267 cases and 267 controls). Bagging, random forest, AdaBoost, and logistic regression were used to predict HBV and HCV. The performance of these methods was evaluated using accuracy.
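
As an illustration of the evaluation described above, the following minimal Python sketch shows how accuracy can be computed and summarized as mean ± standard deviation, along with the two core steps of bagging (bootstrap resampling and majority voting). It is not the authors' implementation. The function names are hypothetical, the labels are made-up toy values rather than study data, and it assumes that the reported ± values are standard deviations over repeated evaluations, which the abstract does not state.

```python
import random
import statistics


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (the resampling step of bagging)."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(votes):
    """Combine 0/1 predictions from several base learners; ties go to class 1."""
    return 1 if 2 * sum(votes) >= len(votes) else 0


def summarize(accuracies):
    """Mean and sample standard deviation, as in a 'mean ± SD' report."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# Toy labels only (1 = case, 0 = control); three hypothetical evaluation rounds.
round_accuracies = [
    accuracy([1, 0, 1, 1], [1, 0, 0, 1]),
    accuracy([0, 0, 1, 1], [0, 0, 1, 1]),
    accuracy([1, 1, 0, 0], [1, 0, 0, 1]),
]
mean_acc, sd_acc = summarize(round_accuracies)
print(f"{mean_acc:.2f}±{sd_acc:.2f}")
```

In a bagging classifier, each base learner would be trained on its own `bootstrap_sample` of the training rows, and `majority_vote` would combine their predictions for each subject before `accuracy` is computed.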

Results: For predicting HBV, the accuracies of bagging, random forest, AdaBoost, and logistic regression were 0.65±0.03, 0.66±0.03, 0.62±0.04, and 0.64±0.03, respectively, so random forest performed best. Random forest identified ALT as the most important variable for predicting HBV. For predicting HCV, the accuracy of random forest was 0.77±0.03, and its most important variables were, in order, AST, ALT, and age.

Conclusion: Random forest performed better than the other methods for predicting HBV and HCV.

