Cic-IPN@INLi2018: Indian native language identification

Ilia Markov, Grigori Sidorov

Research output: Contribution to journal, Conference article, peer-reviewed

3 Citations (Scopus)

Abstract

In this paper, we describe the CIC-IPN submissions to the shared task on Indian Native Language Identification (INLI 2018). We use the Support Vector Machines algorithm trained on numerous feature types: word, character, part-of-speech tag, and punctuation mark n-grams, as well as character n-grams from misspelled words and emotion-based features. The features are weighted using the log-entropy scheme. Our team achieved 41.8% accuracy on test set 1 and 34.5% accuracy on test set 2, ranking 3rd in the official INLI shared task scoring.
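The abstract does not spell out the weighting formula, so the following is only a minimal sketch of log-entropy weighting as it is commonly defined (local weight log(1 + tf), global weight 1 + Σ p·log(p) / log(N)), applied to character n-grams. The function names `char_ngrams` and `log_entropy_weights` are hypothetical, and the authors' actual feature extraction and SVM setup may differ.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def log_entropy_weights(docs, n=3):
    """Weight character n-gram counts with a log-entropy scheme.

    Assumed (common) formulation, not taken from the paper:
      local weight  = log(1 + tf_ij)
      global weight = 1 + sum_j p_ij * log(p_ij) / log(N),  p_ij = tf_ij / gf_i
    Returns one dict per document mapping n-gram -> weight.
    """
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    num_docs = len(counts)

    # Total frequency of each n-gram across the whole collection.
    global_freq = Counter()
    for c in counts:
        global_freq.update(c)

    global_weight = {}
    for term, gf in global_freq.items():
        if num_docs < 2:
            # Entropy is undefined for a single document; fall back to 1.
            global_weight[term] = 1.0
            continue
        entropy_sum = 0.0
        for c in counts:
            tf = c.get(term, 0)
            if tf:
                p = tf / gf
                entropy_sum += p * math.log(p)
        global_weight[term] = 1.0 + entropy_sum / math.log(num_docs)

    return [
        {term: global_weight[term] * math.log(1 + tf) for term, tf in c.items()}
        for c in counts
    ]


weights = log_entropy_weights(["aaa", "aab"], n=2)
```

Under this formulation, an n-gram that occurs in only one document keeps its full local weight, while one spread evenly over all documents is weighted down to zero. The resulting per-document dictionaries could then be vectorized and passed to any SVM implementation.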

Original language: English
Pages (from-to): 82-88
Number of pages: 7
Publication: CEUR Workshop Proceedings
Volume: 2266
Status: Published - 2018
Event: 10th Working Notes of FIRE - Forum for Information Retrieval Evaluation, FIRE-WN 2018 - Gandhinagar, India
Duration: 6 Dec 2018 to 9 Dec 2018
