TY - JOUR
T1 - Evaluating the irregularity of natural languages
AU - Hernández-Gómez, Candelario
AU - Basurto-Flores, Rogelio
AU - Obregón-Quintana, Bibiana
AU - Guzmán-Vargas, Lev
N1 - Publisher Copyright: © 2017 by the authors.
PY - 2017/10/1
Y1 - 2017/10/1
N2 - In the present work, we quantify the irregularity of different European languages belonging to four linguistic families (Romance, Germanic, Uralic and Slavic) and an artificial language (Esperanto). We modified a well-known method to calculate the approximate and sample entropy of written texts. We find differences in the degree of irregularity between the families, and our method, which is based on the search for regularities in a sequence of symbols, consistently distinguishes between natural and synthetic randomized texts. Moreover, we extended our study to the case where multiple scales are accounted for, as in multiscale entropy analysis. Our results revealed that real texts have non-trivial structure compared to the ones obtained from randomization procedures.
AB - In the present work, we quantify the irregularity of different European languages belonging to four linguistic families (Romance, Germanic, Uralic and Slavic) and an artificial language (Esperanto). We modified a well-known method to calculate the approximate and sample entropy of written texts. We find differences in the degree of irregularity between the families, and our method, which is based on the search for regularities in a sequence of symbols, consistently distinguishes between natural and synthetic randomized texts. Moreover, we extended our study to the case where multiple scales are accounted for, as in multiscale entropy analysis. Our results revealed that real texts have non-trivial structure compared to the ones obtained from randomization procedures.
KW - Approximate entropy of texts
KW - Sample entropy
KW - Symbol sequences
KW - Text irregularity
UR - http://www.scopus.com/inward/record.url?scp=85031906559&partnerID=8YFLogxK
U2 - 10.3390/e19100521
DO - 10.3390/e19100521
M3 - Article
SN - 1099-4300
VL - 19
JO - Entropy
JF - Entropy
IS - 10
M1 - 521
ER -