JUNITMZ at SemEval-2016 Task 1: Identifying semantic similarity using levenshtein ratio

Sandip Sarkar; Partha Pakray; Dipankar Das; Alexander Gelbukh

doi:10.18653/v1/s16-1108

Centro de Investigación en Computación (CIC)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

11 Scopus citations

Abstract

In this paper we describe the JUNITMZ system that was developed for participation in SemEval 2016 Task 1: Semantic Textual Similarity (STS). Methods for measuring textual similarity are useful to a broad range of applications, including text mining, information retrieval, dialogue systems, machine translation and text summarization. However, many systems developed specifically for STS are complex, making them hard to incorporate as a module within a larger applied system. In this paper, we present an STS system based on three simple and robust similarity features that can be easily incorporated into more complex applied systems. The shared task results show that on most of the shared task's evaluation sets, these signals achieve a strong (>0.70) level of correlation with human judgements. Our system's three features are: unigram overlap count, length-normalized edit distance and the score computed by the METEOR machine translation metric. The features are combined to produce a similarity prediction using both a feedforward and a recurrent neural network.
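As a rough illustration of two of the three features named in the abstract, the sketch below computes a unigram overlap count and a length-normalized edit distance for a sentence pair. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the whitespace tokenization, the lowercasing, and the choice to normalize by the length of the longer string are all assumptions made here. The METEOR score and the neural network that combines the features are not shown.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance with unit cost for insert, delete, substitute."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_edit_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


def unigram_overlap(a: str, b: str) -> int:
    """Number of shared word tokens, counted with multiplicity (assumed tokenization)."""
    tokens_a = Counter(a.lower().split())
    tokens_b = Counter(b.lower().split())
    return sum((tokens_a & tokens_b).values())


if __name__ == "__main__":
    s1 = "A man is playing a guitar"
    s2 = "A man plays a guitar"
    print(unigram_overlap(s1, s2))
    print(round(normalized_edit_similarity(s1, s2), 3))
```

Library functions called "Levenshtein ratio" may use a different normalization (for example, dividing by the summed length of both strings), so scores from this sketch are not necessarily comparable with the ones reported for the system.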

Original language	English
Title of host publication	SemEval 2016 - 10th International Workshop on Semantic Evaluation, Proceedings
Publisher	Association for Computational Linguistics (ACL)
Pages	702-705
Number of pages	4
ISBN (Electronic)	9781941643952
DOIs	https://doi.org/10.18653/v1/s16-1108
State	Published - 2016
Event	10th International Workshop on Semantic Evaluation, SemEval 2016 - San Diego, United States Duration: 16 Jun 2016 → 17 Jun 2016

Publication series

Name	SemEval 2016 - 10th International Workshop on Semantic Evaluation, Proceedings

Conference

Conference	10th International Workshop on Semantic Evaluation, SemEval 2016
Country/Territory	United States
City	San Diego
Period	16/06/16 → 17/06/16

Access to Document

10.18653/v1/s16-1108

Cite this

Sarkar, S., Pakray, P., Das, D., & Gelbukh, A. (2016). JUNITMZ at SemEval-2016 Task 1: Identifying semantic similarity using levenshtein ratio. In SemEval 2016 - 10th International Workshop on Semantic Evaluation, Proceedings (pp. 702-705). (SemEval 2016 - 10th International Workshop on Semantic Evaluation, Proceedings). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/s16-1108

Sarkar, Sandip ; Pakray, Partha ; Das, Dipankar et al. / JUNITMZ at SemEval-2016 Task 1 : Identifying semantic similarity using levenshtein ratio. SemEval 2016 - 10th International Workshop on Semantic Evaluation, Proceedings. Association for Computational Linguistics (ACL), 2016. pp. 702-705 (SemEval 2016 - 10th International Workshop on Semantic Evaluation, Proceedings).

@inproceedings{9a74bb638a3f428796892d911c4b04fa,

title = "JUNITMZ at SemEval-2016 Task 1: Identifying semantic similarity using levenshtein ratio",

abstract = "In this paper we describe the JUNITMZ system that was developed for participation in SemEval 2016 Task 1: Semantic Textual Similarity (STS). Methods for measuring textual similarity are useful to a broad range of applications, including text mining, information retrieval, dialogue systems, machine translation and text summarization. However, many systems developed specifically for STS are complex, making them hard to incorporate as a module within a larger applied system. In this paper, we present an STS system based on three simple and robust similarity features that can be easily incorporated into more complex applied systems. The shared task results show that on most of the shared task's evaluation sets, these signals achieve a strong (>0.70) level of correlation with human judgements. Our system's three features are: unigram overlap count, length-normalized edit distance and the score computed by the METEOR machine translation metric. The features are combined to produce a similarity prediction using both a feedforward and a recurrent neural network.",

author = "Sandip Sarkar and Partha Pakray and Dipankar Das and Alexander Gelbukh",

note = "Publisher Copyright: {\textcopyright} 2016 Association for Computational Linguistics.; 10th International Workshop on Semantic Evaluation, SemEval 2016 ; Conference date: 16-06-2016 Through 17-06-2016",

year = "2016",

doi = "10.18653/v1/s16-1108",

language = "English",

series = "SemEval 2016 - 10th International Workshop on Semantic Evaluation, Proceedings",

publisher = "Association for Computational Linguistics (ACL)",

pages = "702--705",

booktitle = "SemEval 2016 - 10th International Workshop on Semantic Evaluation, Proceedings",

}

Sarkar, S, Pakray, P, Das, D & Gelbukh, A 2016, JUNITMZ at SemEval-2016 Task 1: Identifying semantic similarity using levenshtein ratio. in SemEval 2016 - 10th International Workshop on Semantic Evaluation, Proceedings. SemEval 2016 - 10th International Workshop on Semantic Evaluation, Proceedings, Association for Computational Linguistics (ACL), pp. 702-705, 10th International Workshop on Semantic Evaluation, SemEval 2016, San Diego, United States, 16/06/16. https://doi.org/10.18653/v1/s16-1108

JUNITMZ at SemEval-2016 Task 1: Identifying semantic similarity using levenshtein ratio. / Sarkar, Sandip; Pakray, Partha; Das, Dipankar et al.
SemEval 2016 - 10th International Workshop on Semantic Evaluation, Proceedings. Association for Computational Linguistics (ACL), 2016. p. 702-705 (SemEval 2016 - 10th International Workshop on Semantic Evaluation, Proceedings).

TY - GEN

T1 - JUNITMZ at SemEval-2016 Task 1

T2 - 10th International Workshop on Semantic Evaluation, SemEval 2016

AU - Sarkar, Sandip

AU - Pakray, Partha

AU - Das, Dipankar

AU - Gelbukh, Alexander

PY - 2016

Y1 - 2016

N2 - In this paper we describe the JUNITMZ system that was developed for participation in SemEval 2016 Task 1: Semantic Textual Similarity (STS). Methods for measuring textual similarity are useful to a broad range of applications, including text mining, information retrieval, dialogue systems, machine translation and text summarization. However, many systems developed specifically for STS are complex, making them hard to incorporate as a module within a larger applied system. In this paper, we present an STS system based on three simple and robust similarity features that can be easily incorporated into more complex applied systems. The shared task results show that on most of the shared task's evaluation sets, these signals achieve a strong (>0.70) level of correlation with human judgements. Our system's three features are: unigram overlap count, length-normalized edit distance and the score computed by the METEOR machine translation metric. The features are combined to produce a similarity prediction using both a feedforward and a recurrent neural network.

AB - In this paper we describe the JUNITMZ system that was developed for participation in SemEval 2016 Task 1: Semantic Textual Similarity (STS). Methods for measuring textual similarity are useful to a broad range of applications, including text mining, information retrieval, dialogue systems, machine translation and text summarization. However, many systems developed specifically for STS are complex, making them hard to incorporate as a module within a larger applied system. In this paper, we present an STS system based on three simple and robust similarity features that can be easily incorporated into more complex applied systems. The shared task results show that on most of the shared task's evaluation sets, these signals achieve a strong (>0.70) level of correlation with human judgements. Our system's three features are: unigram overlap count, length-normalized edit distance and the score computed by the METEOR machine translation metric. The features are combined to produce a similarity prediction using both a feedforward and a recurrent neural network.

UR - http://www.scopus.com/inward/record.url?scp=85006154481&partnerID=8YFLogxK

U2 - 10.18653/v1/s16-1108

DO - 10.18653/v1/s16-1108

M3 - Conference contribution

AN - SCOPUS:85006154481

T3 - SemEval 2016 - 10th International Workshop on Semantic Evaluation, Proceedings

SP - 702

EP - 705

BT - SemEval 2016 - 10th International Workshop on Semantic Evaluation, Proceedings

PB - Association for Computational Linguistics (ACL)

Y2 - 16 June 2016 through 17 June 2016

ER -

Sarkar S, Pakray P, Das D, Gelbukh A. JUNITMZ at SemEval-2016 Task 1: Identifying semantic similarity using levenshtein ratio. In SemEval 2016 - 10th International Workshop on Semantic Evaluation, Proceedings. Association for Computational Linguistics (ACL). 2016. p. 702-705. (SemEval 2016 - 10th International Workshop on Semantic Evaluation, Proceedings). doi: 10.18653/v1/s16-1108
