NLP-NITMZ@DPIL-FIRE2016: Language independent paraphrases detection

Sandip Sarkar; Saurav Saha; Jereemi Bentham; Partha Pakray; Dipankar Das; Alexander Gelbukh

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Conference article › peer-review

4 Scopus citations

Abstract

This paper describes the NLP-NITMZ system submitted to the DPIL shared task at the Forum for Information Retrieval Evaluation (FIRE 2016). The aim of the DPIL shared task is to detect paraphrases in Indian languages. Paraphrase detection is an important component of Information Retrieval, Document Summarization, Question Answering, Plagiarism Detection, and related fields. Our approach uses a language-independent feature set based mainly on lexical similarity. The system's three features are Jaccard similarity, length-normalized edit distance, and cosine similarity. These features are used to train a Probabilistic Neural Network (PNN) to detect paraphrases. With this feature set, we achieved 88.13% average accuracy in Sub-Task 1 and 71.98% average accuracy in Sub-Task 2.
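The abstract names three lexical features and a PNN classifier but gives no formulas. The following is a minimal Python sketch of how such a pipeline could look, not the authors' implementation. Several choices are assumptions made for illustration only: whitespace tokenization, token-level (rather than character-level) edit distance, normalization by the longer sentence length, a Gaussian kernel with a hypothetical smoothing parameter `sigma`, and the helper names `pair_features` and `pnn_predict`.

```python
import math
from collections import Counter


def jaccard_similarity(a_tokens, b_tokens):
    """Size of the token-set intersection divided by the size of the union."""
    a, b = set(a_tokens), set(b_tokens)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def normalized_edit_distance(a, b):
    """Edit distance divided by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def cosine_similarity(a_tokens, b_tokens):
    """Cosine of the angle between the two term-frequency vectors."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def pair_features(s1, s2):
    """Three-feature vector for a sentence pair (whitespace tokens assumed)."""
    t1, t2 = s1.split(), s2.split()
    return [jaccard_similarity(t1, t2),
            normalized_edit_distance(t1, t2),
            cosine_similarity(t1, t2)]


def pnn_predict(train_x, train_y, x, sigma=0.1):
    """Basic PNN: average a Gaussian kernel per class, return the best class."""
    sums, counts = {}, Counter(train_y)
    for xi, yi in zip(train_x, train_y):
        d2 = sum((p - q) ** 2 for p, q in zip(xi, x))
        sums[yi] = sums.get(yi, 0.0) + math.exp(-d2 / (2 * sigma ** 2))
    return max(sums, key=lambda c: sums[c] / counts[c])
```

Because none of the functions depends on a stemmer, parser, or lexicon, the same code applies to any language whose sentences can be split into tokens, which is the sense in which such a feature set is language independent. In use, `pair_features` would be applied to each labeled training pair to build `train_x`, and `pnn_predict` would then classify the feature vector of an unseen pair.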

Original language	English
Pages (from-to)	256-259
Number of pages	4
Journal	CEUR Workshop Proceedings
Volume	1737
State	Published - 2016
Event	2016 Forum for Information Retrieval Evaluation, FIRE 2016 - Kolkata, India; Duration: 7 Dec 2016 → 10 Dec 2016

Keywords

DPIL
Jaccard similarity
Plagiarism detection
Probabilistic neural network (PNN)

Cite this

@article{f96380adcba642248d84eea325f9c8d4,
title = "NLP-NITMZ@DPIL-FIRE2016: Language independent paraphrases detection",
abstract = "This paper describes the NLP-NITMZ system submitted to the DPIL shared task at the Forum for Information Retrieval Evaluation (FIRE 2016). The aim of the DPIL shared task is to detect paraphrases in Indian languages. Paraphrase detection is an important component of Information Retrieval, Document Summarization, Question Answering, Plagiarism Detection, and related fields. Our approach uses a language-independent feature set based mainly on lexical similarity. The system's three features are Jaccard similarity, length-normalized edit distance, and cosine similarity. These features are used to train a Probabilistic Neural Network (PNN) to detect paraphrases. With this feature set, we achieved 88.13% average accuracy in Sub-Task 1 and 71.98% average accuracy in Sub-Task 2.",
keywords = "DPIL, Jaccard similarity, Plagiarism detection, Probabilistic neural network (PNN)",
author = "Sandip Sarkar and Saurav Saha and Jereemi Bentham and Partha Pakray and Dipankar Das and Alexander Gelbukh",
year = "2016",
language = "English",
volume = "1737",
pages = "256--259",
journal = "CEUR Workshop Proceedings",
issn = "1613-0073",
publisher = "CEUR-WS",
note = "2016 Forum for Information Retrieval Evaluation, FIRE 2016 ; Conference date: 07-12-2016 Through 10-12-2016",
}

TY - JOUR
T1 - NLP-NITMZ@DPIL-FIRE2016: Language independent paraphrases detection
T2 - 2016 Forum for Information Retrieval Evaluation, FIRE 2016
AU - Sarkar, Sandip
AU - Saha, Saurav
AU - Bentham, Jereemi
AU - Pakray, Partha
AU - Das, Dipankar
AU - Gelbukh, Alexander
PY - 2016
Y1 - 2016
AB - This paper describes the NLP-NITMZ system submitted to the DPIL shared task at the Forum for Information Retrieval Evaluation (FIRE 2016). The aim of the DPIL shared task is to detect paraphrases in Indian languages. Paraphrase detection is an important component of Information Retrieval, Document Summarization, Question Answering, Plagiarism Detection, and related fields. Our approach uses a language-independent feature set based mainly on lexical similarity. The system's three features are Jaccard similarity, length-normalized edit distance, and cosine similarity. These features are used to train a Probabilistic Neural Network (PNN) to detect paraphrases. With this feature set, we achieved 88.13% average accuracy in Sub-Task 1 and 71.98% average accuracy in Sub-Task 2.
KW - DPIL
KW - Jaccard similarity
KW - Plagiarism detection
KW - Probabilistic neural network (PNN)
UR - http://www.scopus.com/inward/record.url?scp=85006154542&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85006154542
SN - 1613-0073
VL - 1737
SP - 256
EP - 259
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
Y2 - 7 December 2016 through 10 December 2016
ER -
