Machine learning techniques to identify risk factors of breast cancer among women in Mashhad, Iran

Keywords

Random Forest
Logistic Regression
Decision Tree
Principal Component Analysis
Breast cancer

Abstract

Background: Low breast cancer survival rates in developing countries are mainly due to the lack of early detection programs and of adequate diagnostic and treatment facilities.

Objectives: This study aimed to apply machine learning techniques to identify the most important breast cancer risk factors.

Methods: This case-control study included women aged 17-75 years who were referred to medical centers affiliated with Mashhad University of Medical Sciences between March 21, 2015, and March 19, 2016. Two datasets were analyzed: one with 516 samples (258 cases and 258 controls) and another with 606 samples (303 cases and 303 controls). Written informed consent was obtained. Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), and Principal Component Analysis (PCA) were applied using RStudio.

Results: For DT and RF, the most important features associated with breast cancer were family history of cancer, personal history of breast cancer, biopsy sampling, and rare consumption of dairy, fruit, and vegetable meals. For PCA and LR, the most important features were family history of cancer, number of pregnancies, pregnancy tendency, abortion, age at first menstruation, age at first childbirth, and number of childbirths.

Conclusions: Machine learning algorithms can be used to extract the most important factors in the diagnosis of breast cancer in developing countries such as Iran.
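
As an illustration of how a tree-based method can rank candidate risk factors, the following minimal sketch scores binary features by the decrease in Gini impurity they produce when used as a split, which is the criterion commonly used by CART-style decision trees. It is not the study's code (the authors worked in R): the data are synthetic, and the feature names `family_history` and `low_dairy_intake` are hypothetical placeholders chosen only for this example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (1 = case, 0 = control)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_gain(feature, labels):
    """Impurity decrease obtained by splitting on a binary (0/1) feature."""
    left = [y for x, y in zip(feature, labels) if x == 0]
    right = [y for x, y in zip(feature, labels) if x == 1]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


# Synthetic toy data, NOT taken from the study: 4 cases followed by 4 controls.
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical binary features (1 = factor present, 0 = absent).
features = {
    "family_history": [1, 1, 1, 0, 0, 0, 0, 1],
    "low_dairy_intake": [1, 0, 1, 0, 1, 0, 1, 0],
}

# Rank features from most to least informative under this criterion.
ranking = sorted(
    features,
    key=lambda name: gini_gain(features[name], labels),
    reverse=True,
)

for name in ranking:
    print(name, round(gini_gain(features[name], labels), 3))
```

A random forest typically averages this kind of impurity decrease over many trees grown on resampled data, so its importance scores tend to be more stable than those of a single tree. Such rankings describe association in the sample and do not by themselves establish causal risk factors.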

https://doi.org/10.15167/2421-4248/jpmh2024.65.2.3045