Machine learning techniques applied to electronic healthcare records to predict cancer patient survivability

dc.contributor.author: Bardhi, Ornela
dc.contributor.author: García-Zapirain, Begoña
dc.date.accessioned: 2025-08-11T12:13:58Z
dc.date.available: 2025-08-11T12:13:58Z
dc.date.issued: 2021-04-13
dc.date.updated: 2025-08-11T12:13:58Z
dc.description.abstract: Breast cancer (BCa) and prostate cancer (PCa) are the two most common types of cancer. Various factors play a role in these cancers, and discovering the most important ones might help patients live longer, better lives. This study aims to determine the variables that most affect patient survivability, and how the use of different machine learning algorithms can assist in such predictions. The AURIA database was used, which contains electronic healthcare records (EHRs) of 20,006 individual patients diagnosed with either breast or prostate cancer in a particular region in Finland. In total, there were 178 features for BCa and 143 for PCa. Six feature selection algorithms were used to obtain the 21 most important variables for BCa, and 19 for PCa. These features were then used to predict patient survivability by employing nine different machine learning algorithms. Seventy-five percent of the dataset was used to train the models and 25% for testing. Cross-validation was carried out using the Stratified Kfold technique to test the effectiveness of the machine learning models. The support vector machine classifier yielded the best ROC with an area under the curve (AUC) = 0.83, followed by the KNeighbors Classifier with AUC = 0.82 for the BCa dataset. The two algorithms that yielded the best results for PCa are the random forest classifier and KNeighbors Classifier, both with AUC = 0.82. This study shows that not all variables are decisive when predicting breast or prostate cancer patient survivability. By narrowing down the input variables, healthcare professionals were able to focus on the issues that most impact patients, and hence devise better, more individualized care plans. [en]
dc.description.sponsorship: O. B. received funding from the European Union’s Horizon 2020 CATCH ITN project under the Marie Sklodowska-Curie grant agreement no. 722012, website https://www.catchitn.eu/. [en]
dc.identifier.citation: Bardhi, O., & Zapirain, B. G. (2021). Machine learning techniques applied to electronic healthcare records to predict cancer patient survivability. Computers, Materials and Continua, 68(2), 1595-1613. https://doi.org/10.32604/CMC.2021.015326
dc.identifier.doi: 10.32604/CMC.2021.015326
dc.identifier.eissn: 1546-2226
dc.identifier.issn: 1546-2218
dc.identifier.uri: https://hdl.handle.net/20.500.14454/3349
dc.language.iso: eng
dc.publisher: Tech Science Press
dc.rights: © 2021 The Author(s)
dc.subject.other: Breast cancer
dc.subject.other: EHRs
dc.subject.other: Feature selection
dc.subject.other: Finland
dc.subject.other: Machine learning
dc.subject.other: Prostate cancer
dc.subject.other: Survivability
dc.title: Machine learning techniques applied to electronic healthcare records to predict cancer patient survivability [en]
dc.type: journal article
dcterms.accessRights: open access
oaire.citation.endPage: 1613
oaire.citation.issue: 2
oaire.citation.startPage: 1595
oaire.citation.title: Computers, Materials and Continua
oaire.citation.volume: 68
oaire.licenseCondition: https://creativecommons.org/licenses/by/4.0/
oaire.version: VoR
Files
Original bundle
Name: bardhi_machine_2021.pdf
Size: 628.05 KB
Format: Adobe Portable Document Format
Collections