Word-length correlations and memory in large texts: A visibility network analysis

Lev Guzmán-Vargas; Bibiana Obregón-Quintana; Daniel Aguilar-Velázquez; Ricardo Hernández-Pérez; Larry S. Liebovitch

doi:10.3390/e17117798

Research output: Contribution to journal › Article › peer-review

11 Citations (Scopus)

Abstract

We study the correlation properties of word lengths in large texts from 30 ebooks in the English language from the Gutenberg Project (www.gutenberg.org) using the natural visibility graph method (NVG). NVG converts a time series into a graph and then analyzes its graph properties. First, the original sequence of words is transformed into a sequence of values containing the length of each word, and then, it is integrated. Next, we apply the NVG to the integrated word-length series and construct the network. We show that the degree distribution of that network follows a power law, P(k) ~ k^(-γ), with two regimes, which are characterized by the exponents γ_s ≈ 1.7 (at short degree scales) and γ_l ≈ 1.3 (at large degree scales). This suggests that word lengths are much more strongly correlated at large distances between words than at short distances between words. That finding is also supported by the detrended fluctuation analysis (DFA) and recurrence time distribution. These results provide new information about the universal characteristics of the structure of written texts beyond that given by word frequencies.
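The pipeline described in the abstract (word lengths, integration, natural visibility graph, degree distribution) can be sketched as follows. This is a minimal illustration, not the authors' code: the tokenization rule, the definition of integration as a cumulative sum of deviations from the mean (the profile used in DFA), and all function names are assumptions made for the example.

```python
import re
from collections import Counter


def word_lengths(text):
    """Length of each word in order of appearance (assumed tokenization: runs of ASCII letters)."""
    return [len(w) for w in re.findall(r"[A-Za-z]+", text)]


def integrate(series):
    """Cumulative sum of deviations from the mean (assumed definition of 'integrated')."""
    mean = sum(series) / len(series)
    profile, total = [], 0.0
    for x in series:
        total += x - mean
        profile.append(total)
    return profile


def natural_visibility_degrees(y):
    """Degree of each node in the natural visibility graph of y, using a simple O(n^2) scan."""
    n = len(y)
    degree = [0] * n
    for a in range(n - 1):
        max_slope = float("-inf")
        for b in range(a + 1, n):
            slope = (y[b] - y[a]) / (b - a)
            # b is visible from a when every point in between lies strictly
            # below the line a-b, i.e. the slope to b exceeds all earlier slopes.
            if slope > max_slope:
                degree[a] += 1
                degree[b] += 1
                max_slope = slope
    return degree


def degree_distribution(degrees):
    """Empirical P(k): fraction of nodes having degree k."""
    counts = Counter(degrees)
    return {k: c / len(degrees) for k, c in counts.items()}
```

Chaining the steps as `degree_distribution(natural_visibility_degrees(integrate(word_lengths(text))))` gives an empirical P(k) whose slope on log-log axes could then be compared with the exponents reported above. The quadratic scan is only practical for short series; book-length texts would need a faster construction.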

Original language	English
Pages (from-to)	7798-7810
Number of pages	13
Journal	Entropy
Volume	17
Issue number	11
DOI	https://doi.org/10.3390/e17117798
Status	Published - 2015

Cite this

@article{fc03d2ea04a8487e88a02f5411172111,

title = "Word-length correlations and memory in large texts: A visibility network analysis",

abstract = "We study the correlation properties of word lengths in large texts from 30 ebooks in the English language from the Gutenberg Project (www.gutenberg.org) using the natural visibility graph method (NVG). NVG converts a time series into a graph and then analyzes its graph properties. First, the original sequence of words is transformed into a sequence of values containing the length of each word, and then, it is integrated. Next, we apply the NVG to the integrated word-length series and construct the network. We show that the degree distribution of that network follows a power law, $P(k) \sim k^{-\gamma}$, with two regimes, which are characterized by the exponents $\gamma_s \approx 1.7$ (at short degree scales) and $\gamma_l \approx 1.3$ (at large degree scales). This suggests that word lengths are much more strongly correlated at large distances between words than at short distances between words. That finding is also supported by the detrended fluctuation analysis (DFA) and recurrence time distribution. These results provide new information about the universal characteristics of the structure of written texts beyond that given by word frequencies.",

keywords = "Syllables, Texts, Words frequency, Words recurrence",

author = "Lev Guzm{\'a}n-Vargas and Bibiana Obreg{\'o}n-Quintana and Daniel Aguilar-Vel{\'a}zquez and Ricardo Hern{\'a}ndez-P{\'e}rez and Liebovitch, {Larry S.}",

note = "Publisher Copyright: {\textcopyright} 2015 by the authors.",

year = "2015",

doi = "10.3390/e17117798",

language = "English",

volume = "17",

pages = "7798--7810",

journal = "Entropy",

issn = "1099-4300",

number = "11",

}

TY - JOUR

T1 - Word-length correlations and memory in large texts

T2 - A visibility network analysis

AU - Guzmán-Vargas, Lev

AU - Obregón-Quintana, Bibiana

AU - Aguilar-Velázquez, Daniel

AU - Hernández-Pérez, Ricardo

AU - Liebovitch, Larry S.

PY - 2015

Y1 - 2015

N2 - We study the correlation properties of word lengths in large texts from 30 ebooks in the English language from the Gutenberg Project (www.gutenberg.org) using the natural visibility graph method (NVG). NVG converts a time series into a graph and then analyzes its graph properties. First, the original sequence of words is transformed into a sequence of values containing the length of each word, and then, it is integrated. Next, we apply the NVG to the integrated word-length series and construct the network. We show that the degree distribution of that network follows a power law, P(k) ~ k^(-γ), with two regimes, which are characterized by the exponents γ_s ≈ 1.7 (at short degree scales) and γ_l ≈ 1.3 (at large degree scales). This suggests that word lengths are much more strongly correlated at large distances between words than at short distances between words. That finding is also supported by the detrended fluctuation analysis (DFA) and recurrence time distribution. These results provide new information about the universal characteristics of the structure of written texts beyond that given by word frequencies.

AB - We study the correlation properties of word lengths in large texts from 30 ebooks in the English language from the Gutenberg Project (www.gutenberg.org) using the natural visibility graph method (NVG). NVG converts a time series into a graph and then analyzes its graph properties. First, the original sequence of words is transformed into a sequence of values containing the length of each word, and then, it is integrated. Next, we apply the NVG to the integrated word-length series and construct the network. We show that the degree distribution of that network follows a power law, P(k) ~ k^(-γ), with two regimes, which are characterized by the exponents γ_s ≈ 1.7 (at short degree scales) and γ_l ≈ 1.3 (at large degree scales). This suggests that word lengths are much more strongly correlated at large distances between words than at short distances between words. That finding is also supported by the detrended fluctuation analysis (DFA) and recurrence time distribution. These results provide new information about the universal characteristics of the structure of written texts beyond that given by word frequencies.

KW - Syllables

KW - Texts

KW - Words frequency

KW - Words recurrence

UR - http://www.scopus.com/inward/record.url?scp=84951990591&partnerID=8YFLogxK

U2 - 10.3390/e17117798

DO - 10.3390/e17117798

M3 - Article

SN - 1099-4300

VL - 17

SP - 7798

EP - 7810

JO - Entropy

JF - Entropy

IS - 11

ER -
