Cic-IPN@INLi2018: Indian native language identification

Ilia Markov; Grigori Sidorov

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Conference article › peer-review

3 Citations (Scopus)

Abstract

In this paper, we describe the CIC-IPN submissions to the shared task on Indian Native Language Identification (INLI 2018). We use the Support Vector Machines algorithm trained on numerous feature types: word, character, part-of-speech tag, and punctuation mark n-grams, as well as character n-grams from misspelled words and emotion-based features. The features are weighted using the log-entropy scheme. Our team achieved 41.8% accuracy on test set 1 and 34.5% accuracy on test set 2, ranking 3rd in the official INLI shared task scoring.
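
The log-entropy weighting mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly used formulation (local weight log(1 + tf), global weight 1 + Σ p·log p / log N, with p the share of a feature's total count that falls in one document and N the number of documents); the record does not give the authors' exact variant, and the function names `char_ngrams` and `log_entropy_weights` are hypothetical. Only the weighting step is shown, not the SVM or the other feature types.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def log_entropy_weights(docs_counts):
    """Apply log-entropy weighting to per-document feature counts.

    docs_counts: list of Counter objects (feature -> raw count).
    Returns a list of dicts (feature -> weight), one per document.
    Assumed formulation; the natural logarithm is an illustrative choice.
    """
    n_docs = len(docs_counts)
    if n_docs < 2:
        raise ValueError("log-entropy weighting needs at least two documents")

    # Total count of each feature over the whole collection.
    global_freq = Counter()
    for counts in docs_counts:
        global_freq.update(counts)

    # Accumulate sum_j p_ij * log(p_ij) for every feature i.
    plogp = defaultdict(float)
    for counts in docs_counts:
        for feat, tf in counts.items():
            p = tf / global_freq[feat]
            plogp[feat] += p * math.log(p)

    # Global weight: 1 for a feature seen in a single document,
    # 0 for a feature spread evenly over all documents.
    log_n = math.log(n_docs)
    global_w = {feat: 1.0 + s / log_n for feat, s in plogp.items()}

    return [
        {feat: global_w[feat] * math.log(1 + tf) for feat, tf in counts.items()}
        for counts in docs_counts
    ]


# Toy usage on character 3-grams of two short texts.
texts = ["i am going to the market", "he is going to school"]
weighted = log_entropy_weights([Counter(char_ngrams(t)) for t in texts])
```

Under this formulation, n-grams shared evenly by all texts are down-weighted toward zero, while n-grams concentrated in a few texts keep a weight close to log(1 + tf). The resulting sparse vectors could then be passed to an SVM implementation of choice.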

Original language	English
Pages (from-to)	82-88
Number of pages	7
Publication	CEUR Workshop Proceedings
Volume	2266
Status	Published - 2018
Event	10th Working Notes of FIRE - Forum for Information Retrieval Evaluation, FIRE-WN 2018 - Gandhinagar, India. Duration: 6 Dec 2018 → 9 Dec 2018

Cite this

@article{1963b60ee6fd4884ac6f16fb514e6b30,
title = "Cic-IPN@INLi2018: Indian native language identification",
abstract = "In this paper, we describe the CIC-IPN submissions to the shared task on Indian Native Language Identification (INLI 2018). We use the Support Vector Machines algorithm trained on numerous feature types: word, character, part-of-speech tag, and punctuation mark n-grams, as well as character n-grams from misspelled words and emotion-based features. The features are weighted using the log-entropy scheme. Our team achieved 41.8% accuracy on test set 1 and 34.5% accuracy on test set 2, ranking 3rd in the official INLI shared task scoring.",
keywords = "Feature engineering, Indian languages, Machine learning, Native Language Identification, Social media",
author = "Ilia Markov and Grigori Sidorov",
note = "Publisher Copyright: {\textcopyright} 2018 CEUR-WS. All Rights Reserved.; 10th Working Notes of FIRE - Forum for Information Retrieval Evaluation, FIRE-WN 2018 ; Conference date: 06-12-2018 Through 09-12-2018",
year = "2018",
language = "English",
volume = "2266",
pages = "82--88",
journal = "CEUR Workshop Proceedings",
issn = "1613-0073",
publisher = "CEUR-WS",
}

TY - JOUR
T1 - Cic-IPN@INLi2018
T2 - 10th Working Notes of FIRE - Forum for Information Retrieval Evaluation, FIRE-WN 2018
AU - Markov, Ilia
AU - Sidorov, Grigori
PY - 2018
Y1 - 2018
N2 - In this paper, we describe the CIC-IPN submissions to the shared task on Indian Native Language Identification (INLI 2018). We use the Support Vector Machines algorithm trained on numerous feature types: word, character, part-of-speech tag, and punctuation mark n-grams, as well as character n-grams from misspelled words and emotion-based features. The features are weighted using the log-entropy scheme. Our team achieved 41.8% accuracy on test set 1 and 34.5% accuracy on test set 2, ranking 3rd in the official INLI shared task scoring.
AB - In this paper, we describe the CIC-IPN submissions to the shared task on Indian Native Language Identification (INLI 2018). We use the Support Vector Machines algorithm trained on numerous feature types: word, character, part-of-speech tag, and punctuation mark n-grams, as well as character n-grams from misspelled words and emotion-based features. The features are weighted using the log-entropy scheme. Our team achieved 41.8% accuracy on test set 1 and 34.5% accuracy on test set 2, ranking 3rd in the official INLI shared task scoring.
KW - Feature engineering
KW - Indian languages
KW - Machine learning
KW - Native Language Identification
KW - Social media
UR - http://www.scopus.com/inward/record.url?scp=85058662261&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85058662261
SN - 1613-0073
VL - 2266
SP - 82
EP - 88
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
Y2 - 6 December 2018 through 9 December 2018
ER -
