Semantic textual similarity methods, tools, and applications: A survey

Goutam Majumder, Partha Pakray, Alexander Gelbukh, David Pinto

Research output: Contribution to journal, Article

12 Citations (Scopus)

Abstract

Measuring semantic textual similarity (STS) between words or terms, sentences, paragraphs, and documents plays an important role in computer science and computational linguistics. It also has many applications in fields such as biomedical informatics and geoinformation. In this paper, we present a survey of different methods of textual similarity and report on the availability of software and tools that are useful for STS. In natural language processing (NLP), STS is an important component of many tasks, such as document summarization, word sense disambiguation, short answer grading, and information retrieval and extraction. We divide the measures of semantic similarity into three broad categories: (i) topological/knowledge-based, (ii) statistical/corpus-based, and (iii) string-based. More emphasis is given to the methods related to the WordNet taxonomy, because topological methods play an important role in understanding the intended meaning of an ambiguous word, which is very difficult to process computationally. We also propose a new method for measuring semantic similarity between sentences. The proposed method uses the advantages of taxonomy methods and merges this information into a language model. It considers the WordNet synsets for lexical relationships between nodes/words, and a unigram language model is implemented over a large corpus to assign the information content value between two nodes of different classes.
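
The idea of pairing a taxonomy with unigram-based information content can be illustrated with a minimal sketch. It is not the authors' method, whose details the abstract does not give. It assumes a Resnik-style measure, in which the information content of a node is -log p(node) and the similarity of two words is the information content of their most informative shared ancestor. The `PARENT` taxonomy, the sample corpus, and all function names are hypothetical stand-ins for WordNet and a large corpus.

```python
import math
from collections import Counter

# Hypothetical toy taxonomy (child -> parent), a stand-in for WordNet hypernym links.
PARENT = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "mammal",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
    "animal": None,
}


def ancestors(node):
    """Return the node itself followed by its ancestors up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def concept_probabilities(tokens):
    """Unigram estimate: every in-taxonomy token counts for its node and all ancestors."""
    counts = Counter()
    total = 0
    for tok in tokens:
        if tok in PARENT:
            total += 1
            for node in ancestors(tok):
                counts[node] += 1
    if total == 0:
        raise ValueError("corpus contains no word from the taxonomy")
    return {node: count / total for node, count in counts.items()}


def information_content(node, probs):
    """Information content of a node: -log of its estimated probability."""
    return -math.log(probs[node])


def resnik_similarity(a, b, probs):
    """Information content of the most informative ancestor shared by a and b."""
    common = set(ancestors(a)) & set(ancestors(b))
    return max(information_content(n, probs) for n in common if n in probs)


# Assumed sample corpus; a real setting would use a large corpus.
corpus = "the dog chased the cat while a wolf watched the dog".split()
probs = concept_probabilities(corpus)
print(resnik_similarity("dog", "wolf", probs))  # shared ancestor "canine"
print(resnik_similarity("dog", "cat", probs))   # shared ancestor is only "mammal"/"animal"
```

In this sketch, "dog" and "wolf" score higher than "dog" and "cat" because their closest shared ancestor, "canine", is rarer in the corpus than "mammal" and therefore carries more information.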
Original language: American English
Pages (from-to): 647-665
Number of pages: 19
Journal: Computacion y Sistemas
DOI: 10.13053/CyS-20-4-2506
State: Published - 1 Jan 2016

Fingerprint

Semantics
Taxonomies
Computational linguistics
informatics
Information retrieval
Computer science
method
Availability
software
Processing
document
measuring

Cite this

Majumder, Goutam; Pakray, Partha; Gelbukh, Alexander; Pinto, David. / Semantic textual similarity methods, tools, and applications: A survey. In: Computacion y Sistemas. 2016; pp. 647-665.
@article{82573f87373145e1a5b303d045ee8643,
title = "Semantic textual similarity methods, tools, and applications: A survey",
author = "Goutam Majumder and Partha Pakray and Alexander Gelbukh and David Pinto",
year = "2016",
month = "1",
day = "1",
doi = "10.13053/CyS-20-4-2506",
language = "American English",
pages = "647--665",
journal = "Computacion y Sistemas",
issn = "1405-5546",
publisher = "Centro de Investigacion en Computacion (CIC) del Instituto Politecnico Nacional (IPN)",

}

Semantic textual similarity methods, tools, and applications: A survey. / Majumder, Goutam; Pakray, Partha; Gelbukh, Alexander; Pinto, David.

In: Computacion y Sistemas, 01.01.2016, p. 647-665.

TY - JOUR

T1 - Semantic textual similarity methods, tools, and applications: A survey

AU - Majumder, Goutam

AU - Pakray, Partha

AU - Gelbukh, Alexander

AU - Pinto, David

PY - 2016/1/1

Y1 - 2016/1/1

UR - https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85007309341&origin=inward

UR - https://www.scopus.com/inward/citedby.uri?partnerID=HzOxMe3b&scp=85007309341&origin=inward

U2 - 10.13053/CyS-20-4-2506

DO - 10.13053/CyS-20-4-2506

M3 - Article

SP - 647

EP - 665

JO - Computacion y Sistemas

JF - Computacion y Sistemas

SN - 1405-5546

ER -