Syntactic dependency-based n-grams as classification features

Grigori Sidorov, Francisco Velasquez, Efstathios Stamatatos, Alexander Gelbukh, Liliana Chanona-Hernández

Research output: Chapter in book/report/conference proceeding; Conference contribution; Peer-reviewed

69 Citations (Scopus)

Abstract

In this paper we introduce the concept of syntactic n-grams (sn-grams). Sn-grams differ from traditional n-grams in which elements are considered neighbors. In sn-grams, neighbors are determined by following syntactic relations in syntactic trees, not by taking the words in the order in which they appear in the text. Dependency trees fit this idea directly, while constituency trees require some simple additional steps. Sn-grams can be applied in any NLP task where traditional n-grams are used. We describe how sn-grams were applied to authorship attribution, using an SVM classifier with several profile sizes. As a baseline, we used traditional n-grams of words, POS tags, and characters. The results obtained with sn-grams are better than those of the baseline.
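
The contrast between the two notions of neighborhood can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `traditional_ngrams` and `syntactic_ngrams`, the toy sentence, and its hand-written dependency heads are all hypothetical, and the sketch assumes that an sn-gram is a path of n words read from head to dependent, a direction the abstract itself does not specify.

```python
def traditional_ngrams(tokens, n):
    """N-grams of words taken in surface (text) order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def syntactic_ngrams(tokens, heads, n):
    """Sn-grams as head-to-dependent paths of n words in a dependency tree.

    heads[i] is the index of the head of token i, or None for the root.
    Assumption: paths follow arcs from head to dependent.
    """
    children = {i: [] for i in range(len(tokens))}
    for dep, head in enumerate(heads):
        if head is not None:
            children[head].append(dep)

    def paths_from(node, length):
        # All downward paths containing exactly `length` nodes, starting at `node`.
        if length == 1:
            return [[node]]
        return [[node] + rest
                for child in children[node]
                for rest in paths_from(child, length - 1)]

    return [tuple(tokens[i] for i in path)
            for start in range(len(tokens))
            for path in paths_from(start, n)]


# Hypothetical toy sentence with a hand-written dependency analysis:
# "barked" is the root; "dog" and "loudly" depend on it;
# "the" and "old" depend on "dog".
tokens = ["the", "old", "dog", "barked", "loudly"]
heads = [2, 2, 3, None, 3]

word_bigrams = traditional_ngrams(tokens, 2)
sn_bigrams = syntactic_ngrams(tokens, heads, 2)
sn_trigrams = syntactic_ngrams(tokens, heads, 3)
```

Here the pair ("dog", "the") is an sn-gram although the two words are not adjacent in the text, while the surface bigram ("the", "old") is not an sn-gram because no syntactic relation links those two words directly. Counts of such tuples could then serve as classification features in the same way as counts of traditional n-grams.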

Original language: English
Title of host publication: Advances in Artificial Intelligence - 11th Mexican International Conference on Artificial Intelligence, MICAI 2012, Revised Selected Papers
Pages: 1-11
Number of pages: 11
Edition: PART 2
DOI
Status: Published - 2013
Event: 11th Mexican International Conference on Artificial Intelligence, MICAI 2012 - San Luis Potosi, Mexico
Duration: 27 Oct 2012 to 4 Nov 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 2
Volume: 7630 LNAI
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Conference: 11th Mexican International Conference on Artificial Intelligence, MICAI 2012
Country/Territory: Mexico
City: San Luis Potosi
Period: 27/10/12 to 4/11/12
