Compilation of a Spanish representative corpus

Alexander Gelbukh, Grigori Sidorov, Liliana Chanona-Hernández

Research output: Chapter in book/report/conference proceedings; Conference contribution; peer-reviewed

10 Citations (Scopus)

Abstract

Because of Zipf's law, even a very large corpus contains very few occurrences (tokens) of most of its distinct words (types). Only a corpus that contains enough occurrences of even rare words can provide the statistical information needed to study the contextual usage of words. We call such a corpus representative and suggest using the Internet to compile it. We describe the corresponding algorithm and its application to Spanish, and discuss different concepts of a representative corpus.
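The type/token imbalance described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the tokenizer, the function names `type_counts` and `underrepresented`, the threshold `min_occurrences`, and the sample sentence are all assumptions made for illustration. The sketch counts how many tokens each type has and lists the types that fall below a chosen minimum, which are the words a representative corpus would still need more contexts for.

```python
from collections import Counter
import re


def type_counts(text):
    """Count the occurrences (tokens) of each distinct word (type)."""
    # Naive tokenizer (an assumption): lowercased runs of Unicode letters,
    # so accented Spanish characters such as "á" stay inside the word.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(tokens)


def underrepresented(counts, min_occurrences):
    """Return, in sorted order, the types with fewer than min_occurrences tokens."""
    return sorted(word for word, n in counts.items() if n < min_occurrences)


# Hypothetical toy text; a real corpus would be many orders of magnitude larger.
sample = "el gato come y el perro come y el pájaro canta"
counts = type_counts(sample)
rare = underrepresented(counts, min_occurrences=2)
print(len(rare), "of", len(counts), "types occur fewer than 2 times:", rare)
```

Even in this toy text, more than half of the types occur only once, while the function word "el" accounts for the most tokens. On a large corpus the same check would show which rare words still lack enough occurrences.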

Original language: English
Title of host publication: Computational Linguistics and Intelligent Text Processing - 3rd International Conference, CICLing 2002, Proceedings
Editors: Alexander Gelbukh
Publisher: Springer Verlag
Pages: 285-288
Number of pages: 4
ISBN (print): 3540432191, 9783540457152
DOI
Status: Published - 2002
Event: 3rd Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2002 - Mexico City, Mexico
Duration: 17 Feb 2002 to 23 Feb 2002

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2276
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Conference: 3rd Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2002
Country/Territory: Mexico
City: Mexico City
Period: 17/02/02 to 23/02/02
