Corporate relation extraction for the construction of knowledge-bases against tax fraud
| dc.contributor.author | López Gazpio, Íñigo | |
| dc.contributor.author | Baselga Pascual, Laura | |
| dc.contributor.author | Garmendia-Lazcano, Aitor | |
| dc.date.accessioned | 2025-11-28T12:22:42Z | |
| dc.date.available | 2025-11-28T12:22:42Z | |
| dc.date.issued | 2025-01-19 | |
| dc.date.updated | 2025-11-28T12:22:42Z | |
| dc.description.abstract | Tax fraud is a criminal activity that entails significant losses for governments. Due to its clandestine nature, it is difficult to reliably estimate the amount of taxes evaded. To fight tax fraud, this investigation details the construction and evaluation of a corporate relation extraction system designed to access an unstructured knowledge-base and extract corporate relations for further validation. The system was developed in response to a need raised by the Treasury and Finance Department of the Provincial Council of Gipuzkoa (Spain). It follows a waterfall architecture that integrates Natural Language Processing (NLP) and Computer Vision (CV) components, including web scraping, optical character recognition, syntactic parsing, and information extraction. The proposed system produces a relational knowledge-base with structured data representing 23 types of corporate operations published in the Official Gazette of the Commercial Registry (e.g., incorporation of companies, terminations, capital increases and reductions, mergers and takeovers, etc.), allowing for comparison with the fiscal information available in the tax agency. Facilitating such comparison across distinct sources is key to identifying discrepancies that might be indicators of tax fraud. | en |
| dc.description.sponsorship | This research was conducted as part of the Projects PID2022-136818NB-100 and PID2021-122133NB-I00 financed by MCIN/AEI/10.13039/501100011033/FEDER, EU. We also gratefully acknowledge financial support from the Basque Government Department of Education (IT1497-22 and IT1570-22). | en |
| dc.identifier.citation | Lopez-Gazpio, I., Baselga-Pascual, L., & Garmendia-Lazcano, A. (2025). Corporate relation extraction for the construction of knowledge-bases against tax fraud. Knowledge-Based Systems, 311. https://doi.org/10.1016/J.KNOSYS.2025.113026 | |
| dc.identifier.doi | 10.1016/J.KNOSYS.2025.113026 | |
| dc.identifier.issn | 0950-7051 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.14454/4490 | |
| dc.language.iso | eng | |
| dc.publisher | Elsevier B.V. | |
| dc.rights | © 2025 The Authors | |
| dc.subject.other | Computer Vision | |
| dc.subject.other | Information extraction | |
| dc.subject.other | Knowledge-base generation | |
| dc.subject.other | Natural Language Processing | |
| dc.subject.other | Tax fraud investigation | |
| dc.title | Corporate relation extraction for the construction of knowledge-bases against tax fraud | en |
| dc.type | journal article | |
| dcterms.accessRights | open access | |
| oaire.citation.title | Knowledge-Based Systems | |
| oaire.citation.volume | 311 | |
| oaire.licenseCondition | https://creativecommons.org/licenses/by/4.0/ | |
| oaire.version | VoR |
Files
Original bundle
- Name: lopez_corporate_2025.pdf
- Size: 1.5 MB
- Format: Adobe Portable Document Format
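
The abstract describes a staged pipeline that turns gazette text into structured corporate relations. The following is a minimal sketch of that idea only, not the authors' implementation. The entry layout (`<COMPANY>. <Operation>: <detail>.`), the three operation labels, and every function and class name are hypothetical, chosen for illustration. The web scraping and optical character recognition stages are replaced by a simple whitespace clean-up.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class CorporateRelation:
    """One structured row for the relational knowledge-base (hypothetical schema)."""
    company: str
    operation: str
    detail: str


# Hypothetical labels; the abstract mentions 23 operation types, not listed here.
OPERATIONS = ("Incorporation", "Capital increase", "Merger")

# Assumed entry layout: "<COMPANY NAME>. <Operation>: <detail>."
# Limitation of this sketch: company names containing periods will not match.
ENTRY_RE = re.compile(
    r"^(?P<company>[^.]+)\.\s+"
    r"(?P<operation>" + "|".join(map(re.escape, OPERATIONS)) + r"):\s+"
    r"(?P<detail>.+?)\.?$"
)


def normalize(lines: Iterable[str]) -> Iterator[str]:
    """Stand-in for OCR clean-up: collapse whitespace and drop empty lines."""
    for line in lines:
        cleaned = " ".join(line.split())
        if cleaned:
            yield cleaned


def extract(lines: Iterable[str]) -> Iterator[CorporateRelation]:
    """Emit a relation for each line that matches the assumed layout."""
    for line in lines:
        match = ENTRY_RE.match(line)
        if match:
            yield CorporateRelation(
                match["company"], match["operation"], match["detail"]
            )


def run_pipeline(raw_lines: Iterable[str]) -> List[CorporateRelation]:
    """Chain the stages in order, each consuming the previous stage's output."""
    return list(extract(normalize(raw_lines)))


if __name__ == "__main__":
    sample = [
        "ACME SL. Incorporation: share capital 3000 EUR.",
        "  BETA   SA.  Merger:  absorbed by ACME SL ",
        "unrelated notice",
    ]
    for relation in run_pipeline(sample):
        print(relation)
```

Each stage is a generator, so further steps (for example, validation against fiscal records) could be appended without changing the earlier ones.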