A Hybrid Methodology Based on CRISP-DM and TDSP for the Execution of Preprocessing Tasks in Mexican Environmental Laws

Yessenia Díaz Álvarez; Miguel Ángel Hidalgo Reyes; Virginia Lagunes Barradas; Obdulia Pichardo Lagunas; Bella Martínez Seis

doi:10.1007/978-3-031-19496-2_6

Unidad Profesional Interdisciplinaria de Ingeniería y Tecnologías Avanzadas (UPIITA)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This article first presents techniques applied during the preprocessing of texts drawn from Mexican environmental laws. Such analysis is needed for several reasons: the large number of existing legislative documents (laws, programs, regulations, etc.), the modifications made to the legal system through reforms and decrees, and, especially, the contradictions that may arise within or among laws. Second, certain tasks of the CRISP-DM methodology were selected, specifically the generic tasks of selection, cleaning, transformation, and formatting in the data preparation phase. These tasks were carried out with the NLTK library using the text preprocessing techniques of tokenization, segmentation, denoising, and normalization. The main result is a hybrid methodology that combines CRISP-DM with Microsoft's Team Data Science Process (TDSP) and is oriented to the preprocessing of Mexican federal environmental laws. In addition, the article shows a detailed application of the hybrid methodology to a specialized task: extracting text from a PDF file using the PyPDF2 and pdfplumber libraries.
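The abstract names the preprocessing steps but this record does not include the implementation. The following is a minimal sketch of how such a pipeline might look, not the authors' code. All function names (`extract_text`, `denoise`, `segment`, `normalize`, `preprocess`) and the sample string are made up for illustration. Segmentation is done with a naive regular expression rather than an NLTK sentence tokenizer, and tokenization uses NLTK's `wordpunct_tokenize` when NLTK is installed, with an equivalent regex fallback otherwise. Only pdfplumber is shown for the PDF step.

```python
import re
import unicodedata

try:
    # NLTK's regex-based tokenizer; it needs no downloaded corpora.
    from nltk.tokenize import wordpunct_tokenize
except ImportError:
    # Fallback using the same pattern that NLTK's WordPunctTokenizer is built on.
    wordpunct_tokenize = re.compile(r"\w+|[^\w\s]+").findall


def extract_text(pdf_path):
    """Extract raw text from a PDF with pdfplumber (imported lazily)."""
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() can return None for pages without a text layer.
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def denoise(text):
    """Rejoin words hyphenated across line breaks and collapse whitespace."""
    text = re.sub(r"-\s*\n\s*", "", text)
    return re.sub(r"\s+", " ", text).strip()


def segment(text):
    """Naive segmentation: split after '.', ';' or ':' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.;:])\s+", text) if s]


def normalize(token):
    """Lowercase and strip diacritics (lossy: 'ñ' also becomes 'n')."""
    decomposed = unicodedata.normalize("NFD", token.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def preprocess(raw_text):
    """Return one list of normalized word tokens per sentence."""
    sentences = segment(denoise(raw_text))
    return [
        [normalize(tok) for tok in wordpunct_tokenize(sentence) if tok.isalnum()]
        for sentence in sentences
    ]


# Made-up sample text standing in for text extracted from a law.
sample = "La Ley protege el am-\nbiente. La Federación aplica   la Ley."
result = preprocess(sample)
print(result)
```

With a real document, `preprocess(extract_text("ley.pdf"))` would chain the two stages, where `ley.pdf` is a placeholder path. Stripping diacritics is a design choice that simplifies matching but discards distinctions that matter in Spanish, so it may not suit every downstream task.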

Original language	English
Title of host publication	Advances in Computational Intelligence - 21st Mexican International Conference on Artificial Intelligence, MICAI 2022, Proceedings
Editors	Obdulia Pichardo Lagunas, Bella Martínez Seis, Juan Martínez-Miranda
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	68-82
Number of pages	15
ISBN (Print)	9783031194955
DOI	https://doi.org/10.1007/978-3-031-19496-2_6
State	Published - 2022
Event	21st Mexican International Conference on Artificial Intelligence, MICAI 2022 - Monterrey, Mexico. Duration: 24 Oct 2022 → 29 Oct 2022

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	13613 LNAI
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	21st Mexican International Conference on Artificial Intelligence, MICAI 2022
Country/Territory	Mexico
City	Monterrey
Period	24/10/22 → 29/10/22

Access to document

10.1007/978-3-031-19496-2_6

Cite this

Díaz Álvarez, Y., Hidalgo Reyes, M. Á., Lagunes Barradas, V., Pichardo Lagunas, O., & Martínez Seis, B. (2022). A Hybrid Methodology Based on CRISP-DM and TDSP for the Execution of Preprocessing Tasks in Mexican Environmental Laws. In O. Pichardo Lagunas, B. Martínez Seis, & J. Martínez-Miranda (Eds.), Advances in Computational Intelligence - 21st Mexican International Conference on Artificial Intelligence, MICAI 2022, Proceedings (pp. 68-82). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13613 LNAI). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-19496-2_6

Díaz Álvarez, Yessenia ; Hidalgo Reyes, Miguel Ángel ; Lagunes Barradas, Virginia et al. / A Hybrid Methodology Based on CRISP-DM and TDSP for the Execution of Preprocessing Tasks in Mexican Environmental Laws. Advances in Computational Intelligence - 21st Mexican International Conference on Artificial Intelligence, MICAI 2022, Proceedings. editor / Obdulia Pichardo Lagunas ; Bella Martínez Seis ; Juan Martínez-Miranda. Springer Science and Business Media Deutschland GmbH, 2022. pp. 68-82 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{7762701eb3094c9bbeb2beae081bc58f,

title = "A Hybrid Methodology Based on CRISP-DM and TDSP for the Execution of Preprocessing Tasks in Mexican Environmental Laws",

abstract = "This article first presents techniques applied during the preprocessing of texts drawn from Mexican environmental laws. Such analysis is needed for several reasons: the large number of existing legislative documents (laws, programs, regulations, etc.), the modifications made to the legal system through reforms and decrees, and, especially, the contradictions that may arise within or among laws. Second, certain tasks of the CRISP-DM methodology were selected, specifically the generic tasks of selection, cleaning, transformation, and formatting in the data preparation phase. These tasks were carried out with the NLTK library using the text preprocessing techniques of tokenization, segmentation, denoising, and normalization. The main result is a hybrid methodology that combines CRISP-DM with Microsoft's Team Data Science Process (TDSP) and is oriented to the preprocessing of Mexican federal environmental laws. In addition, the article shows a detailed application of the hybrid methodology to a specialized task: extracting text from a PDF file using the PyPDF2 and pdfplumber libraries.",

keywords = "Environmental laws, Methodologies, NLTK, Preprocessing, Text mining",

author = "{D{\'i}az {\'A}lvarez}, Yessenia and {Hidalgo Reyes}, {Miguel {\'A}ngel} and {Lagunes Barradas}, Virginia and {Pichardo Lagunas}, Obdulia and {Mart{\'i}nez Seis}, Bella",

note = "Publisher Copyright: {\textcopyright} 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.; 21st Mexican International Conference on Artificial Intelligence, MICAI 2022 ; Conference date: 24-10-2022 Through 29-10-2022",

year = "2022",

doi = "10.1007/978-3-031-19496-2_6",

language = "English",

isbn = "9783031194955",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "68--82",

editor = "{Pichardo Lagunas}, Obdulia and {Mart{\'i}nez Seis}, Bella and {Mart{\'i}nez-Miranda}, Juan",

booktitle = "Advances in Computational Intelligence - 21st Mexican International Conference on Artificial Intelligence, MICAI 2022, Proceedings",

address = "Germany",

}

Díaz Álvarez, Y, Hidalgo Reyes, MÁ, Lagunes Barradas, V, Pichardo Lagunas, O & Martínez Seis, B 2022, A Hybrid Methodology Based on CRISP-DM and TDSP for the Execution of Preprocessing Tasks in Mexican Environmental Laws. in O Pichardo Lagunas, B Martínez Seis & J Martínez-Miranda (eds.), Advances in Computational Intelligence - 21st Mexican International Conference on Artificial Intelligence, MICAI 2022, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13613 LNAI, Springer Science and Business Media Deutschland GmbH, pp. 68-82, 21st Mexican International Conference on Artificial Intelligence, MICAI 2022, Monterrey, Mexico, 24/10/22. https://doi.org/10.1007/978-3-031-19496-2_6

A Hybrid Methodology Based on CRISP-DM and TDSP for the Execution of Preprocessing Tasks in Mexican Environmental Laws. / Díaz Álvarez, Yessenia; Hidalgo Reyes, Miguel Ángel; Lagunes Barradas, Virginia et al.
Advances in Computational Intelligence - 21st Mexican International Conference on Artificial Intelligence, MICAI 2022, Proceedings. ed. / Obdulia Pichardo Lagunas; Bella Martínez Seis; Juan Martínez-Miranda. Springer Science and Business Media Deutschland GmbH, 2022. p. 68-82 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13613 LNAI).

TY - GEN

T1 - A Hybrid Methodology Based on CRISP-DM and TDSP for the Execution of Preprocessing Tasks in Mexican Environmental Laws

AU - Díaz Álvarez, Yessenia

AU - Hidalgo Reyes, Miguel Ángel

AU - Lagunes Barradas, Virginia

AU - Pichardo Lagunas, Obdulia

AU - Martínez Seis, Bella

PY - 2022

Y1 - 2022

N2 - This article first presents techniques applied during the preprocessing of texts drawn from Mexican environmental laws. Such analysis is needed for several reasons: the large number of existing legislative documents (laws, programs, regulations, etc.), the modifications made to the legal system through reforms and decrees, and, especially, the contradictions that may arise within or among laws. Second, certain tasks of the CRISP-DM methodology were selected, specifically the generic tasks of selection, cleaning, transformation, and formatting in the data preparation phase. These tasks were carried out with the NLTK library using the text preprocessing techniques of tokenization, segmentation, denoising, and normalization. The main result is a hybrid methodology that combines CRISP-DM with Microsoft's Team Data Science Process (TDSP) and is oriented to the preprocessing of Mexican federal environmental laws. In addition, the article shows a detailed application of the hybrid methodology to a specialized task: extracting text from a PDF file using the PyPDF2 and pdfplumber libraries.

KW - Environmental laws

KW - Methodologies

KW - NLTK

KW - Preprocessing

KW - Text mining

UR - http://www.scopus.com/inward/record.url?scp=85142801722&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-19496-2_6

DO - 10.1007/978-3-031-19496-2_6

M3 - Conference contribution

AN - SCOPUS:85142801722

SN - 9783031194955

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 68

EP - 82

BT - Advances in Computational Intelligence - 21st Mexican International Conference on Artificial Intelligence, MICAI 2022, Proceedings

A2 - Pichardo Lagunas, Obdulia

A2 - Martínez Seis, Bella

A2 - Martínez-Miranda, Juan

PB - Springer Science and Business Media Deutschland GmbH

T2 - 21st Mexican International Conference on Artificial Intelligence, MICAI 2022

Y2 - 24 October 2022 through 29 October 2022

ER -

Díaz Álvarez Y, Hidalgo Reyes MÁ, Lagunes Barradas V, Pichardo Lagunas O, Martínez Seis B. A Hybrid Methodology Based on CRISP-DM and TDSP for the Execution of Preprocessing Tasks in Mexican Environmental Laws. In Pichardo Lagunas O, Martínez Seis B, Martínez-Miranda J, editors, Advances in Computational Intelligence - 21st Mexican International Conference on Artificial Intelligence, MICAI 2022, Proceedings. Springer Science and Business Media Deutschland GmbH. 2022. p. 68-82. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-031-19496-2_6
