Syntactic dependency-based n-grams as classification features

Grigori Sidorov; Francisco Velasquez; Efstathios Stamatatos; Alexander Gelbukh; Liliana Chanona-Hernández

doi:10.1007/978-3-642-37798-3_1

Centro de Investigación en Computación (CIC)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

69 Scopus citations

Abstract

In this paper we introduce a concept of syntactic n-grams (sn-grams). Sn-grams differ from traditional n-grams in the manner of what elements are considered neighbors. In case of sn-grams, the neighbors are taken by following syntactic relations in syntactic trees, and not by taking the words as they appear in the text. Dependency trees fit directly into this idea, while in case of constituency trees some simple additional steps should be made. Sn-grams can be applied in any NLP task where traditional n-grams are used. We describe how sn-grams were applied to authorship attribution. SVM classifier for several profile sizes was used. We used as baseline traditional n-grams of words, POS tags and characters. Obtained results are better when applying sn-grams.
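The neighbor definition in the abstract can be illustrated with a minimal sketch. It assumes a dependency tree supplied as a mapping from each head word to its direct dependents. The function name `sn_grams`, the example sentence, and its parse are hypothetical illustrations, not taken from the paper, and the code is not the authors' implementation.

```python
def sn_grams(children, n):
    """Collect every head-to-dependent path of n words in a dependency tree.

    `children` maps a head word to the list of its direct dependents.
    Assumption: every word is unique; a real system would key nodes by
    token index, because words can repeat within a sentence.
    """
    def paths_from(node, length):
        # A path of length 1 is just the node itself.
        if length == 1:
            return [(node,)]
        result = []
        for child in children.get(node, []):
            for tail in paths_from(child, length - 1):
                result.append((node,) + tail)
        return result

    # Every word in the tree, whether it appears as a head or a dependent.
    nodes = set(children)
    for deps in children.values():
        nodes.update(deps)

    grams = []
    for node in sorted(nodes):
        grams.extend(paths_from(node, n))
    return grams


# Hypothetical parse of "She ate the red apple":
# "ate" is the root, "apple" depends on "ate", "the" and "red" on "apple".
tree = {"ate": ["She", "apple"], "apple": ["the", "red"]}

words = ["She", "ate", "the", "red", "apple"]
surface_bigrams = list(zip(words, words[1:]))  # traditional word bigrams
syntactic_bigrams = sn_grams(tree, 2)
```

In this toy parse the pair `("ate", "apple")` is a syntactic bigram although the two words are not adjacent in the text, while the surface bigram `("ate", "the")` has no counterpart among the syntactic bigrams.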

Original language	English
Title of host publication	Advances in Artificial Intelligence - 11th Mexican International Conference on Artificial Intelligence, MICAI 2012, Revised Selected Papers
Pages	1-11
Number of pages	11
Edition	PART 2
DOIs	https://doi.org/10.1007/978-3-642-37798-3_1
State	Published - 2013
Event	11th Mexican International Conference on Artificial Intelligence, MICAI 2012 - San Luis Potosi, Mexico
Duration	27 Oct 2012 → 4 Nov 2012

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number	PART 2
Volume	7630 LNAI
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	11th Mexican International Conference on Artificial Intelligence, MICAI 2012
Country/Territory	Mexico
City	San Luis Potosi
Period	27/10/12 → 4/11/12

Keywords

authorship attribution
classification features
parsing
sn-grams
syntactic n-grams
syntactic paths

Access to Document

10.1007/978-3-642-37798-3_1

Cite this

Sidorov, G., Velasquez, F., Stamatatos, E., Gelbukh, A., & Chanona-Hernández, L. (2013). Syntactic dependency-based n-grams as classification features. In Advances in Artificial Intelligence - 11th Mexican International Conference on Artificial Intelligence, MICAI 2012, Revised Selected Papers (PART 2 ed., pp. 1-11). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7630 LNAI, No. PART 2). https://doi.org/10.1007/978-3-642-37798-3_1

Sidorov, Grigori ; Velasquez, Francisco ; Stamatatos, Efstathios et al. / Syntactic dependency-based n-grams as classification features. Advances in Artificial Intelligence - 11th Mexican International Conference on Artificial Intelligence, MICAI 2012, Revised Selected Papers. PART 2. ed. 2013. pp. 1-11 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); PART 2).

@inproceedings{ac733cf64ab64638a6258acf42482cd9,

title = "Syntactic dependency-based n-grams as classification features",

abstract = "In this paper we introduce a concept of syntactic n-grams (sn-grams). Sn-grams differ from traditional n-grams in the manner of what elements are considered neighbors. In case of sn-grams, the neighbors are taken by following syntactic relations in syntactic trees, and not by taking the words as they appear in the text. Dependency trees fit directly into this idea, while in case of constituency trees some simple additional steps should be made. Sn-grams can be applied in any NLP task where traditional n-grams are used. We describe how sn-grams were applied to authorship attribution. SVM classifier for several profile sizes was used. We used as baseline traditional n-grams of words, POS tags and characters. Obtained results are better when applying sn-grams.",

keywords = "authorship attribution, classification features, parsing, sn-grams, syntactic n-grams, syntactic paths",

author = "Grigori Sidorov and Francisco Velasquez and Efstathios Stamatatos and Alexander Gelbukh and Liliana Chanona-Hern{\'a}ndez",

year = "2013",

doi = "10.1007/978-3-642-37798-3_1",

language = "English",

isbn = "9783642377976",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

number = "PART 2",

pages = "1--11",

booktitle = "Advances in Artificial Intelligence - 11th Mexican International Conference on Artificial Intelligence, MICAI 2012, Revised Selected Papers",

edition = "PART 2",

note = "11th Mexican International Conference on Artificial Intelligence, MICAI 2012 ; Conference date: 27-10-2012 Through 04-11-2012",

}

Sidorov, G, Velasquez, F, Stamatatos, E, Gelbukh, A & Chanona-Hernández, L 2013, Syntactic dependency-based n-grams as classification features. in Advances in Artificial Intelligence - 11th Mexican International Conference on Artificial Intelligence, MICAI 2012, Revised Selected Papers. PART 2 edn, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), no. PART 2, vol. 7630 LNAI, pp. 1-11, 11th Mexican International Conference on Artificial Intelligence, MICAI 2012, San Luis Potosi, Mexico, 27/10/12. https://doi.org/10.1007/978-3-642-37798-3_1

Syntactic dependency-based n-grams as classification features. / Sidorov, Grigori; Velasquez, Francisco; Stamatatos, Efstathios et al.
Advances in Artificial Intelligence - 11th Mexican International Conference on Artificial Intelligence, MICAI 2012, Revised Selected Papers. PART 2. ed. 2013. p. 1-11 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7630 LNAI, No. PART 2).

TY - GEN

T1 - Syntactic dependency-based n-grams as classification features

AU - Sidorov, Grigori

AU - Velasquez, Francisco

AU - Stamatatos, Efstathios

AU - Gelbukh, Alexander

AU - Chanona-Hernández, Liliana

PY - 2013

Y1 - 2013

AB - In this paper we introduce a concept of syntactic n-grams (sn-grams). Sn-grams differ from traditional n-grams in the manner of what elements are considered neighbors. In case of sn-grams, the neighbors are taken by following syntactic relations in syntactic trees, and not by taking the words as they appear in the text. Dependency trees fit directly into this idea, while in case of constituency trees some simple additional steps should be made. Sn-grams can be applied in any NLP task where traditional n-grams are used. We describe how sn-grams were applied to authorship attribution. SVM classifier for several profile sizes was used. We used as baseline traditional n-grams of words, POS tags and characters. Obtained results are better when applying sn-grams.

KW - authorship attribution

KW - classification features

KW - parsing

KW - sn-grams

KW - syntactic n-grams

KW - syntactic paths

UR - http://www.scopus.com/inward/record.url?scp=84875865567&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-37798-3_1

DO - 10.1007/978-3-642-37798-3_1

M3 - Conference contribution

SN - 9783642377976

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 1

EP - 11

BT - Advances in Artificial Intelligence - 11th Mexican International Conference on Artificial Intelligence, MICAI 2012, Revised Selected Papers

T2 - 11th Mexican International Conference on Artificial Intelligence, MICAI 2012

Y2 - 27 October 2012 through 4 November 2012

ER -

Sidorov G, Velasquez F, Stamatatos E, Gelbukh A, Chanona-Hernández L. Syntactic dependency-based n-grams as classification features. In Advances in Artificial Intelligence - 11th Mexican International Conference on Artificial Intelligence, MICAI 2012, Revised Selected Papers. PART 2 ed. 2013. p. 1-11. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); PART 2). doi: 10.1007/978-3-642-37798-3_1
