Cic-IPN@INLi2018: Indian native language identification

Ilia Markov; Grigori Sidorov

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Conference article › peer-review

3 Scopus citations

Abstract

In this paper, we describe the CIC-IPN submissions to the shared task on Indian Native Language Identification (INLI 2018). We use the Support Vector Machines algorithm trained on numerous feature types: word, character, part-of-speech tag, and punctuation mark n-grams, as well as character n-grams from misspelled words and emotion-based features. The features are weighted using the log-entropy scheme. Our team achieved 41.8% accuracy on test set 1 and 34.5% accuracy on test set 2, ranking 3rd in the official INLI shared task scoring.

Original language	English
Pages (from-to)	82-88
Number of pages	7
Journal	CEUR Workshop Proceedings
Volume	2266
State	Published - 2018
Event	10th Working Notes of FIRE - Forum for Information Retrieval Evaluation, FIRE-WN 2018 - Gandhinagar, India
Duration	6 Dec 2018 → 9 Dec 2018

Keywords

Feature engineering
Indian languages
Machine learning
Native Language Identification
Social media

Cite this

@article{1963b60ee6fd4884ac6f16fb514e6b30,

title = "Cic-IPN@INLi2018: Indian native language identification",

abstract = "In this paper, we describe the CIC-IPN submissions to the shared task on Indian Native Language Identification (INLI 2018). We use the Support Vector Machines algorithm trained on numerous feature types: word, character, part-of-speech tag, and punctuation mark n-grams, as well as character n-grams from misspelled words and emotion-based features. The features are weighted using the log-entropy scheme. Our team achieved 41.8\% accuracy on test set 1 and 34.5\% accuracy on test set 2, ranking 3rd in the official INLI shared task scoring.",

keywords = "Feature engineering, Indian languages, Machine learning, Native Language Identification, Social media",

author = "Ilia Markov and Grigori Sidorov",

note = "Publisher Copyright: {\textcopyright} 2018 CEUR-WS. All Rights Reserved.; 10th Working Notes of FIRE - Forum for Information Retrieval Evaluation, FIRE-WN 2018 ; Conference date: 06-12-2018 Through 09-12-2018",

year = "2018",

language = "English",

volume = "2266",

pages = "82--88",

journal = "CEUR Workshop Proceedings",

issn = "1613-0073",

publisher = "CEUR-WS",

}

TY - JOUR

T1 - Cic-IPN@INLi2018: Indian native language identification

T2 - 10th Working Notes of FIRE - Forum for Information Retrieval Evaluation, FIRE-WN 2018

AU - Markov, Ilia

AU - Sidorov, Grigori

PY - 2018

Y1 - 2018

N2 - In this paper, we describe the CIC-IPN submissions to the shared task on Indian Native Language Identification (INLI 2018). We use the Support Vector Machines algorithm trained on numerous feature types: word, character, part-of-speech tag, and punctuation mark n-grams, as well as character n-grams from misspelled words and emotion-based features. The features are weighted using the log-entropy scheme. Our team achieved 41.8% accuracy on test set 1 and 34.5% accuracy on test set 2, ranking 3rd in the official INLI shared task scoring.

KW - Feature engineering

KW - Indian languages

KW - Machine learning

KW - Native Language Identification

KW - Social media

UR - http://www.scopus.com/inward/record.url?scp=85058662261&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85058662261

SN - 1613-0073

VL - 2266

SP - 82

EP - 88

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

Y2 - 6 December 2018 through 9 December 2018

ER -
