Corporate relation extraction for the construction of knowledge-bases against tax fraud
Date
2025-01-19
Publisher
Elsevier B.V.
Abstract
Tax fraud is a criminal activity that entails significant losses for governments. Due to its clandestine nature, it is difficult to reliably estimate the amount of taxes evaded. To fight tax fraud, this investigation details the construction and evaluation of a corporate relation extraction system designed to access an unstructured knowledge-base and extract corporate relations for further validation. The system was developed in response to a need raised by the Treasury and Finance Department of the Provincial Council of Gipuzkoa (Spain). It follows a waterfall architecture that integrates Natural Language Processing (NLP) and Computer Vision (CV) components, including web scraping, optical character recognition, syntactic parsing, and information extraction. The proposed system produces a relational knowledge-base with structured data representing 23 types of corporate operations published in the Official Gazette of the Commercial Registry (e.g., incorporation of companies, terminations, capital increases and reductions, mergers and takeovers, etc.), allowing for comparison with the fiscal information available in the tax agency. Facilitating such comparison across distinct sources is key to identifying discrepancies that might be indicators of tax fraud.
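As a rough illustration of the final information extraction stage only, the sketch below turns gazette-style text into structured records. It assumes that scraping and optical character recognition have already produced plain text. The entry layout, the three operation labels, the sample text, and all names (`CorporateRelation`, `extract_relations`, `OPERATION_PATTERNS`) are hypothetical and are not taken from the paper, which covers 23 operation types and also relies on syntactic parsing rather than keyword matching alone.

```python
import re
from dataclasses import dataclass

# Hypothetical operation labels and keyword patterns (assumption, not the paper's rules).
OPERATION_PATTERNS = {
    "incorporation": re.compile(r"\bincorporation\b", re.IGNORECASE),
    "capital_increase": re.compile(r"\bcapital increase\b", re.IGNORECASE),
    "merger": re.compile(r"\bmerger\b", re.IGNORECASE),
}

# Assumed entry layout: "<id> - <COMPANY NAME>. <free text describing operations>"
ENTRY_HEADER = re.compile(
    r"^(?P<entry_id>\d+)\s*-\s*(?P<company>[^.]+)\.\s*(?P<body>.*)$"
)


@dataclass(frozen=True)
class CorporateRelation:
    """One structured row: which company was involved in which operation."""
    entry_id: str
    company: str
    operation: str


def extract_relations(text: str) -> list[CorporateRelation]:
    """Scan each line and emit one record per operation keyword found in it."""
    relations = []
    for line in text.splitlines():
        match = ENTRY_HEADER.match(line.strip())
        if match is None:
            continue  # skip lines that do not look like an entry (e.g. page footers)
        for operation, pattern in OPERATION_PATTERNS.items():
            if pattern.search(match["body"]):
                relations.append(
                    CorporateRelation(
                        entry_id=match["entry_id"],
                        company=match["company"].strip(),
                        operation=operation,
                    )
                )
    return relations


# Invented sample text, used only to exercise the sketch.
SAMPLE = """\
1001 - ACME WIDGETS SL. Incorporation. Share capital: 3000 EUR.
1002 - NORTH HARBOUR SA. Capital increase. Merger with SOUTH HARBOUR SA.
page footer without an entry
"""

if __name__ == "__main__":
    for relation in extract_relations(SAMPLE):
        print(relation)
```

Records of this shape could then be loaded into relational tables and joined against fiscal data on a company identifier, which is the kind of cross-source comparison the abstract describes.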
Keywords
Computer Vision
Information extraction
Knowledge-base generation
Natural Language Processing
Tax fraud investigation
Citation
Lopez-Gazpio, I., Baselga-Pascual, L., & Garmendia-Lazcano, A. (2025). Corporate relation extraction for the construction of knowledge-bases against tax fraud. Knowledge-Based Systems, 311. https://doi.org/10.1016/J.KNOSYS.2025.113026
