TY - JOUR
T1 - Word-length correlations and memory in large texts
T2 - A visibility network analysis
AU - Guzmán-Vargas, Lev
AU - Obregón-Quintana, Bibiana
AU - Aguilar-Velázquez, Daniel
AU - Hernández-Pérez, Ricardo
AU - Liebovitch, Larry S.
N1 - Publisher Copyright:
© 2015 by the authors.
PY - 2015
Y1 - 2015
N2 - We study the correlation properties of word lengths in large texts from 30 ebooks in the English language from the Gutenberg Project (www.gutenberg.org) using the natural visibility graph method (NVG). NVG converts a time series into a graph and then analyzes its graph properties. First, the original sequence of words is transformed into a sequence of values containing the length of each word, and then, it is integrated. Next, we apply the NVG to the integrated word-length series and construct the network. We show that the degree distribution of that network follows a power law, P(k) ~ k^{-γ}, with two regimes, which are characterized by the exponents γ_s ≈ 1.7 (at short degree scales) and γ_l ≈ 1.3 (at large degree scales). This suggests that word lengths are much more strongly correlated at large distances between words than at short distances between words. That finding is also supported by the detrended fluctuation analysis (DFA) and recurrence time distribution. These results provide new information about the universal characteristics of the structure of written texts beyond that given by word frequencies.
AB - We study the correlation properties of word lengths in large texts from 30 ebooks in the English language from the Gutenberg Project (www.gutenberg.org) using the natural visibility graph method (NVG). NVG converts a time series into a graph and then analyzes its graph properties. First, the original sequence of words is transformed into a sequence of values containing the length of each word, and then, it is integrated. Next, we apply the NVG to the integrated word-length series and construct the network. We show that the degree distribution of that network follows a power law, P(k) ~ k^{-γ}, with two regimes, which are characterized by the exponents γ_s ≈ 1.7 (at short degree scales) and γ_l ≈ 1.3 (at large degree scales). This suggests that word lengths are much more strongly correlated at large distances between words than at short distances between words. That finding is also supported by the detrended fluctuation analysis (DFA) and recurrence time distribution. These results provide new information about the universal characteristics of the structure of written texts beyond that given by word frequencies.
KW - Syllables
KW - Texts
KW - Words frequency
KW - Words recurrence
UR - http://www.scopus.com/inward/record.url?scp=84951990591&partnerID=8YFLogxK
U2 - 10.3390/e17117798
DO - 10.3390/e17117798
M3 - Article
SN - 1099-4300
VL - 17
SP - 7798
EP - 7810
JO - Entropy
JF - Entropy
IS - 11
ER -