NLP-NITMZ@DPIL-FIRE2016: Language independent paraphrases detection

Sandip Sarkar, Saurav Saha, Jereemi Bentham, Partha Pakray, Dipankar Das, Alexander Gelbukh

Research output: Contribution to journal › Conference article › peer-review

4 Scopus citations

Abstract

In this paper, we describe the NLP-NITMZ system submitted to the DPIL shared task at the Forum for Information Retrieval Evaluation (FIRE 2016). The main aim of the DPIL shared task is to detect paraphrases in Indian languages. Paraphrase detection is an important component of Information Retrieval, Document Summarization, Question Answering, Plagiarism Detection, and related fields. Our approach uses a language-independent feature set to detect paraphrases in Indian languages. The features are mainly lexical similarity measures, and our system uses three of them: Jaccard similarity, length-normalized edit distance, and cosine similarity. These features are then used to train a Probabilistic Neural Network (PNN) that detects the paraphrases. With this feature set, we achieved an average accuracy of 88.13% in Sub-Task 1 and 71.98% in Sub-Task 2.
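The abstract names the three lexical features and the PNN classifier but does not define them precisely. The Python sketch below shows one plausible reading under assumptions that are not taken from the paper: sentences are tokenized on whitespace (which keeps the features language-independent), Jaccard and cosine similarity are computed over word tokens, the edit distance is character-level Levenshtein divided by the longer sentence's length, and the PNN is a Parzen-window classifier with a Gaussian kernel and a hypothetical smoothing parameter `sigma`. All function names, the `sigma` value, and the toy English training pairs are illustrative only; they are not the NLP-NITMZ code or DPIL data.

```python
import math
from collections import Counter


def jaccard_similarity(s1: str, s2: str) -> float:
    """|A ∩ B| / |A ∪ B| over the sets of whitespace-separated tokens."""
    a, b = set(s1.split()), set(s2.split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def normalized_edit_distance(s1: str, s2: str) -> float:
    """Character-level Levenshtein distance divided by the longer string's length.

    0.0 means identical strings; 1.0 means maximally different.
    """
    if not s1 and not s2:
        return 0.0
    prev = list(range(len(s2) + 1))  # distances from the empty prefix of s1
    for i, c1 in enumerate(s1, start=1):
        curr = [i]
        for j, c2 in enumerate(s2, start=1):
            curr.append(min(prev[j] + 1,                # delete c1
                            curr[j - 1] + 1,            # insert c2
                            prev[j - 1] + (c1 != c2)))  # substitute (free if equal)
        prev = curr
    return prev[-1] / max(len(s1), len(s2))


def cosine_similarity(s1: str, s2: str) -> float:
    """Cosine of the angle between the two sentences' token-count vectors."""
    c1, c2 = Counter(s1.split()), Counter(s2.split())
    dot = sum(count * c2[tok] for tok, count in c1.items())
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0


def features(s1: str, s2: str) -> list[float]:
    """Hypothetical 3-dimensional feature vector for one sentence pair."""
    return [jaccard_similarity(s1, s2),
            normalized_edit_distance(s1, s2),
            cosine_similarity(s1, s2)]


class PNN:
    """Minimal probabilistic neural network (Gaussian Parzen-window classifier)."""

    def __init__(self, sigma: float = 0.1):
        self.sigma = sigma  # hypothetical kernel width (smoothing parameter)

    def fit(self, X, y):
        # Pattern layer: store every training vector under its class label.
        self.patterns = {}
        for x, label in zip(X, y):
            self.patterns.setdefault(label, []).append(x)
        return self

    def _class_score(self, x, examples):
        # Summation layer: mean Gaussian kernel activation over one class's patterns.
        total = 0.0
        for p in examples:
            d2 = sum((xi - pi) ** 2 for xi, pi in zip(x, p))
            total += math.exp(-d2 / (2 * self.sigma ** 2))
        return total / len(examples)

    def predict(self, x):
        # Output layer: pick the class with the highest estimated density.
        return max(self.patterns, key=lambda lbl: self._class_score(x, self.patterns[lbl]))


# Toy, hypothetical training pairs ("P" = paraphrase, "NP" = not a paraphrase).
train = [
    ("the cat sat on the mat", "the cat sat on the mat", "P"),
    ("the cat sat on the mat", "a cat was sitting on the mat", "P"),
    ("the cat sat on the mat", "stock prices fell sharply today", "NP"),
    ("he went to the market", "she is reading a book", "NP"),
]
model = PNN(sigma=0.1).fit([features(a, b) for a, b, _ in train],
                           [label for _, _, label in train])
print(model.predict(features("the dog sat on the mat", "the dog sat on a mat")))
```

A PNN of this kind stores every training vector as a pattern unit, so fitting is a single pass of memorization and the only hyperparameter is `sigma`, which controls how far each stored pattern's influence reaches. The trade-off is that prediction cost grows with the size of the training set.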

Original language: English
Pages (from-to): 256-259
Number of pages: 4
Journal: CEUR Workshop Proceedings
Volume: 1737
State: Published - 2016
Event: 2016 Forum for Information Retrieval Evaluation, FIRE 2016 - Kolkata, India
Duration: 7 Dec 2016 to 10 Dec 2016

Keywords

  • DPIL
  • Jaccard similarity
  • Plagiarism detection
  • Probabilistic neural network (PNN)
