A Hybrid Methodology Based on CRISP-DM and TDSP for the Execution of Preprocessing Tasks in Mexican Environmental Laws

Yessenia Díaz Álvarez, Miguel Ángel Hidalgo Reyes, Virginia Lagunes Barradas, Obdulia Pichardo Lagunas, Bella Martínez Seis

Research output: Chapter in book/report/conference proceeding › Conference contribution › Peer-reviewed

Abstract

This article first presents techniques applied during the preprocessing of texts drawn from the environmental laws of Mexico. This type of analysis is needed for several reasons: the large number of existing legislative documents (laws, programs, regulations, etc.), the modifications made to the legal system through reforms and decrees, and, especially, the contradictions that may arise among laws. Second, selected tasks of the CRISP-DM methodology were applied, specifically the generic tasks of selection, cleaning, transformation, and formatting in the data preparation phase. These tasks were carried out with the NLTK library using the text preprocessing techniques of tokenization, segmentation, denoising, and normalization. Among the most notable results is a combination of CRISP-DM and Microsoft's Team Data Science Process (TDSP) oriented to the preprocessing of Mexican federal environmental laws. In addition, the article details an application of the hybrid methodology to a specialized task: extracting text from a PDF file using the PyPDF2 and pdfplumber libraries.
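
The four preprocessing techniques named in the abstract (denoising, segmentation, tokenization, normalization) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library as a stand-in for the NLTK calls the paper reports, it assumes the raw text has already been extracted from a PDF, and the function names, the page-marker pattern ("2 de 120"), and the sample sentence are all hypothetical.

```python
import re
import unicodedata


def denoise(text: str) -> str:
    """Remove noise that PDF extraction typically leaves behind."""
    # Re-join words that were hyphenated across a line break.
    text = re.sub(r"-\n(?=\w)", "", text)
    # Drop lines that look like page markers, e.g. "2 de 120" (assumed format).
    text = re.sub(r"(?m)^[ \t]*\d+ de \d+[ \t]*$", "", text)
    # Collapse every run of whitespace into a single space.
    return re.sub(r"\s+", " ", text).strip()


def segment(text: str) -> list[str]:
    """Split text into sentences at ., ! or ? followed by an uppercase start."""
    return re.split(r"(?<=[.!?])\s+(?=[A-ZÁÉÍÓÚÑ¿¡])", text)


def tokenize(sentence: str) -> list[str]:
    """Split a sentence into word tokens (Unicode-aware, punctuation dropped)."""
    return re.findall(r"\w+", sentence)


def normalize(tokens: list[str]) -> list[str]:
    """Lowercase tokens and bring them to one Unicode form (NFC keeps accents)."""
    return [unicodedata.normalize("NFC", tok.lower()) for tok in tokens]


def preprocess(raw: str) -> list[list[str]]:
    """Run denoise -> segment -> tokenize -> normalize; one token list per sentence."""
    return [normalize(tokenize(s)) for s in segment(denoise(raw))]


if __name__ == "__main__":
    # Hypothetical excerpt imitating text extracted from a PDF page.
    sample = (
        "La presente Ley es reglamen-\ntaria de la Constitución.\n"
        "2 de 120\n"
        "Sus disposiciones son de orden público."
    )
    for sentence_tokens in preprocess(sample):
        print(sentence_tokens)
```

In an NLTK-based pipeline such as the one the abstract describes, `nltk.tokenize.sent_tokenize` and `nltk.tokenize.word_tokenize` would be the natural replacements for the regex-based `segment` and `tokenize` above. Whether accents are kept or stripped during normalization is a design choice that the abstract does not specify.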

Original language: English
Title of host publication: Advances in Computational Intelligence - 21st Mexican International Conference on Artificial Intelligence, MICAI 2022, Proceedings
Editors: Obdulia Pichardo Lagunas, Bella Martínez Seis, Juan Martínez-Miranda
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 68-82
Number of pages: 15
ISBN (print): 9783031194955
Status: Published - 2022
Event: 21st Mexican International Conference on Artificial Intelligence, MICAI 2022 - Monterrey, Mexico
Duration: 24 Oct 2022 to 29 Oct 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13613 LNAI
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Conference: 21st Mexican International Conference on Artificial Intelligence, MICAI 2022
Country/Territory: Mexico
City: Monterrey
Period: 24/10/22 to 29/10/22
