Corporate relation extraction for the construction of knowledge-bases against tax fraud

dc.contributor.author: López Gazpio, Íñigo
dc.contributor.author: Baselga Pascual, Laura
dc.contributor.author: Garmendia-Lazcano, Aitor
dc.date.accessioned: 2025-11-28T12:22:42Z
dc.date.available: 2025-11-28T12:22:42Z
dc.date.issued: 2025-01-19
dc.date.updated: 2025-11-28T12:22:42Z
dc.description.abstract: Tax fraud is a criminal activity that entails significant losses for governments. Due to its clandestine nature, it is difficult to reliably estimate the amount of taxes evaded. To fight tax fraud, this investigation details the construction and evaluation of a corporate relation extraction system designed to access an unstructured knowledge-base and extract corporate relations for further validation. The system was developed in response to a need raised by the Treasury and Finance Department of the Provincial Council of Gipuzkoa (Spain). It follows a waterfall architecture that integrates Natural Language Processing (NLP) and Computer Vision (CV) components, including web scraping, optical character recognition, syntactic parsing, and information extraction. The proposed system produces a relational knowledge-base with structured data representing 23 types of corporate operations published in the Official Gazette of the Commercial Registry (e.g., incorporation of companies, terminations, capital increases and reductions, mergers and takeovers), allowing for comparison with the fiscal information available in the tax agency. Facilitating such comparison across distinct sources is key to identifying discrepancies that might be indicators of tax fraud. [en]
dc.description.sponsorship: This research was conducted as part of the Projects PID2022-136818NB-100 and PID2021-122133NB-I00 financed by MCIN/AEI/10.13039/501100011033/FEDER, EU. We also gratefully acknowledge financial support from the Basque Government Department of Education (IT1497-22 and IT1570-22). [en]
dc.identifier.citation: Lopez-Gazpio, I., Baselga-Pascual, L., & Garmendia-Lazcano, A. (2025). Corporate relation extraction for the construction of knowledge-bases against tax fraud. Knowledge-Based Systems, 311. https://doi.org/10.1016/J.KNOSYS.2025.113026
dc.identifier.doi: 10.1016/J.KNOSYS.2025.113026
dc.identifier.issn: 0950-7051
dc.identifier.uri: https://hdl.handle.net/20.500.14454/4490
dc.language.iso: eng
dc.publisher: Elsevier B.V.
dc.rights: © 2025 The Authors
dc.subject.other: Computer Vision
dc.subject.other: Information extraction
dc.subject.other: Knowledge-base generation
dc.subject.other: Natural Language Processing
dc.subject.other: Tax fraud investigation
dc.title: Corporate relation extraction for the construction of knowledge-bases against tax fraud [en]
dc.type: journal article
dcterms.accessRights: open access
oaire.citation.title: Knowledge-Based Systems
oaire.citation.volume: 311
oaire.licenseCondition: https://creativecommons.org/licenses/by/4.0/
oaire.version: VoR
Files
Name: lopez_corporate_2025.pdf
Size: 1.5 MB
Format: Adobe Portable Document Format