Syntactic dependency-based n-grams as classification features

Grigori Sidorov, Francisco Velasquez, Efstathios Stamatatos, Alexander Gelbukh, Liliana Chanona-Hernández

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

69 Scopus citations

Abstract

In this paper we introduce the concept of syntactic n-grams (sn-grams). Sn-grams differ from traditional n-grams in which elements are considered neighbors. In sn-grams, neighbors are determined by following syntactic relations in syntactic trees rather than by taking words in the order they appear in the text. Dependency trees fit this idea directly, while constituency trees require some simple additional steps. Sn-grams can be applied in any NLP task where traditional n-grams are used. We describe how sn-grams were applied to authorship attribution, using an SVM classifier with several profile sizes. As baselines we used traditional n-grams of words, POS tags, and characters. The results obtained with sn-grams are better than those of the baselines.
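
The difference between surface neighbors and syntactic neighbors can be illustrated with a minimal sketch. The abstract does not spell out the extraction procedure, so the code below makes several assumptions: the dependency tree is given as a list of head indices, an sn-gram is taken to be a downward head-to-dependent path of exactly n tokens, and the function names `word_ngrams` and `syntactic_ngrams` as well as the hand-made parse are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def word_ngrams(tokens, n):
    """Traditional n-grams: neighbors are adjacent in the surface text."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def syntactic_ngrams(tokens, heads, n):
    """sn-grams (assumed form): neighbors follow head -> dependent arcs.

    heads[i] is the index of the head of token i, or -1 for the root.
    Returns every downward path in the tree that contains exactly n tokens.
    """
    # Build a head -> dependents map from the head-index list.
    children = defaultdict(list)
    for dep, head in enumerate(heads):
        if head >= 0:
            children[head].append(dep)

    def paths_from(node, length):
        # A path of length 1 is just the node itself.
        if length == 1:
            return [(tokens[node],)]
        # Otherwise extend the path through each dependent of the node.
        return [(tokens[node],) + rest
                for child in children[node]
                for rest in paths_from(child, length - 1)]

    return [p for start in range(len(tokens)) for p in paths_from(start, n)]


# Hand-made dependency parse, assumed for illustration only:
# "barked" is the root; "dog" and "loudly" depend on it;
# "the" and "old" depend on "dog".
tokens = ["the", "old", "dog", "barked", "loudly"]
heads = [2, 2, 3, -1, 3]

print(word_ngrams(tokens, 2))
print(syntactic_ngrams(tokens, heads, 2))
print(syntactic_ngrams(tokens, heads, 3))
```

In this toy sentence the pair ("dog", "the") appears among the syntactic bigrams even though the two words are not adjacent in the text, which is the kind of neighbor relation the abstract describes. Tuples like these could then be counted and used as features in the same way as traditional n-grams.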

Original language: English
Title of host publication: Advances in Artificial Intelligence - 11th Mexican International Conference on Artificial Intelligence, MICAI 2012, Revised Selected Papers
Pages: 1-11
Number of pages: 11
Edition: PART 2
State: Published - 2013
Event: 11th Mexican International Conference on Artificial Intelligence, MICAI 2012 - San Luis Potosi, Mexico
Duration: 27 Oct 2012 - 4 Nov 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 2
Volume: 7630 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 11th Mexican International Conference on Artificial Intelligence, MICAI 2012
Country/Territory: Mexico
City: San Luis Potosi
Period: 27/10/12 - 4/11/12

Keywords

  • authorship attribution
  • classification features
  • parsing
  • sn-grams
  • syntactic n-grams
  • syntactic paths
