TY - GEN
T1 - Relevance of named entities in authorship attribution
AU - Ríos-Toledo, Germán
AU - Sidorov, Grigori
AU - Castro-Sánchez, Noé Alejandro
AU - Nava-Zea, Alondra
AU - Chanona-Hernández, Liliana
N1 - Publisher Copyright: © Springer International Publishing AG 2017.
PY - 2017
Y1 - 2017
N2 - Named entities (NE) are words that refer to names of people, locations, organizations, etc. NE are present in every kind of document: e-mails, letters, essays, novels, poems. Automatic detection of these words is a very important task in natural language processing. Sometimes, NE are used in authorship attribution studies as a stylometric feature. The goal of this paper is to evaluate the effect of the presence of NE in texts on the authorship attribution task: are we really detecting the style of an author, or are we just discovering the appearance of the same NE? We used a corpus that consists of 91 novels by 7 authors of the XVIII century. These authors spoke and wrote English, their native language. All novels belong to the fiction genre. The stylometric features used were character n-grams, word n-grams, and n-grams of POS tags of various sizes (2-grams, 3-grams, etc.). Five novels were selected for each author; these novels contain between 4% and 7% NE. All novels were divided into blocks of 10,000 terms each. Two kinds of experiments were conducted: automatic classification of blocks containing NE and of the same blocks without NE. In some cases, we used only the most frequent n-grams (500, 2,000 and 4,000 n-grams). Three machine learning algorithms were used for the classification task: NB, SVM (SMO) and J48. The results show that, as a tendency, the presence of NE helps classification (improvements from 5% to 20%), but there are specific authors for whom NE do not help and even make the classification worse (about 10% of the experimental data).
KW - Authorship attribution
KW - Machine learning
KW - N-grams
KW - Named entities
UR - http://www.scopus.com/inward/record.url?scp=85028472313&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-62434-1_1
DO - 10.1007/978-3-319-62434-1_1
M3 - Conference contribution
SN - 9783319624334
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 3
EP - 15
BT - Advances in Soft Computing - 15th Mexican International Conference on Artificial Intelligence, MICAI 2016, Proceedings
A2 - Herrera-Alcantara, Oscar
A2 - Sidorov, Grigori
PB - Springer Verlag
T2 - 15th Mexican International Conference on Artificial Intelligence, MICAI 2016
Y2 - 23 October 2016 through 28 October 2016
ER -