Recognizing textual entailment in non-English text via automatic translation into English

Partha Pakray; Snehasis Neogi; Sivaji Bandyopadhyay; Alexander Gelbukh

doi:10.1007/978-3-642-37798-3_3

Centro de Investigación en Computación (CIC)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

We show that a task that typically involves rather deep semantic processing of text (recognizing textual entailment being our case study) can be successfully solved without any tools specific to the language of the texts on which the task is performed. Instead, we automatically translate the text into English using a standard machine translation system, and then perform all linguistic processing, including the syntactic and semantic levels, using only English-language linguistic tools. In this case study we use Italian annotated data. Textual entailment is a relation between two texts. To detect it, we use various measures, which allow us to make an entailment decision in the two-way classification task (yes / no). We set up various heuristics and measures for evaluating the entailment between two texts based on lexical relations. To make entailment judgments, the system applies named entity recognition, chunking, part-of-speech tagging, n-gram, and text similarity modules to both texts, all of these modules being for English and not for Italian. Rules have been developed to perform the two-way entailment classification. Our system makes entailment judgments based on the entailment scores for the text pairs. The system was evaluated on Italian textual entailment data sets: we trained our system on Italian development data sets using the WEKA machine learning toolset and tested it on Italian test data sets. The accuracy of our system on the development corpus is 0.525 and on the test corpus is 0.66, which is a good result given that no Italian-specific linguistic information was used.
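The abstract names n-gram overlap and score-based two-way classification among the measures applied after translation. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes both texts have already been machine-translated into English, and the function names (`ngram_overlap`, `entailment_score`, `decide`), the averaging of unigram and bigram overlap, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from typing import List, Set, Tuple


def tokens(text: str) -> List[str]:
    """Lowercase the text and split it on non-alphanumeric characters."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def ngrams(words: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams of the given order."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_overlap(text: str, hypothesis: str, n: int) -> float:
    """Fraction of the hypothesis n-grams that also occur in the text."""
    hyp = ngrams(tokens(hypothesis), n)
    if not hyp:
        return 0.0
    return len(hyp & ngrams(tokens(text), n)) / len(hyp)


def entailment_score(text: str, hypothesis: str) -> float:
    """Hypothetical score: mean of unigram and bigram overlap."""
    unigram = ngram_overlap(text, hypothesis, 1)
    bigram = ngram_overlap(text, hypothesis, 2)
    return (unigram + bigram) / 2


def decide(text: str, hypothesis: str, threshold: float = 0.5) -> str:
    """Two-way decision; the threshold is an assumption, not a tuned value."""
    return "YES" if entailment_score(text, hypothesis) >= threshold else "NO"


# Inputs are assumed to be English output of a machine translation system.
print(decide("The cat sat on the mat in the kitchen", "The cat sat on the mat"))
print(decide("A dog barked loudly", "The cat sat on the mat"))
```

In a setup like the one the abstract describes, scores of this kind would serve as features for a trained classifier (the paper uses WEKA) rather than being compared against a fixed threshold.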

Original language	English
Title of host publication	Advances in Artificial Intelligence - 11th Mexican International Conference on Artificial Intelligence, MICAI 2012, Revised Selected Papers
Pages	26-35
Number of pages	10
Edition	PART 2
DOIs	https://doi.org/10.1007/978-3-642-37798-3_3
State	Published - 2013
Event	11th Mexican International Conference on Artificial Intelligence, MICAI 2012 - San Luis Potosi, Mexico
Duration	27 Oct 2012 → 4 Nov 2012

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number	PART 2
Volume	7630 LNAI
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	11th Mexican International Conference on Artificial Intelligence, MICAI 2012
Country/Territory	Mexico
City	San Luis Potosi
Period	27/10/12 → 4/11/12

Keywords

Recognizing textual entailment
cross-lingual textual entailment
machine translation
n-grams
text similarity

Access to Document

10.1007/978-3-642-37798-3_3

Cite this

Pakray, P., Neogi, S., Bandyopadhyay, S., & Gelbukh, A. (2013). Recognizing textual entailment in non-English text via automatic translation into English. In Advances in Artificial Intelligence - 11th Mexican International Conference on Artificial Intelligence, MICAI 2012, Revised Selected Papers (PART 2 ed., pp. 26-35). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7630 LNAI, No. PART 2). https://doi.org/10.1007/978-3-642-37798-3_3

Pakray, Partha ; Neogi, Snehasis ; Bandyopadhyay, Sivaji et al. / Recognizing textual entailment in non-English text via automatic translation into English. Advances in Artificial Intelligence - 11th Mexican International Conference on Artificial Intelligence, MICAI 2012, Revised Selected Papers. PART 2. ed. 2013. pp. 26-35 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); PART 2).

@inproceedings{4a7db6ed9ca749e9a0b0a7be5e8823e4,

title = "Recognizing textual entailment in non-English text via automatic translation into English",

abstract = "We show that a task that typically involves rather deep semantic processing of text (recognizing textual entailment being our case study) can be successfully solved without any tools specific to the language of the texts on which the task is performed. Instead, we automatically translate the text into English using a standard machine translation system, and then perform all linguistic processing, including the syntactic and semantic levels, using only English-language linguistic tools. In this case study we use Italian annotated data. Textual entailment is a relation between two texts. To detect it, we use various measures, which allow us to make an entailment decision in the two-way classification task (yes / no). We set up various heuristics and measures for evaluating the entailment between two texts based on lexical relations. To make entailment judgments, the system applies named entity recognition, chunking, part-of-speech tagging, n-gram, and text similarity modules to both texts, all of these modules being for English and not for Italian. Rules have been developed to perform the two-way entailment classification. Our system makes entailment judgments based on the entailment scores for the text pairs. The system was evaluated on Italian textual entailment data sets: we trained our system on Italian development data sets using the WEKA machine learning toolset and tested it on Italian test data sets. The accuracy of our system on the development corpus is 0.525 and on the test corpus is 0.66, which is a good result given that no Italian-specific linguistic information was used.",

keywords = "Recognizing textual entailment, cross-lingual textual entailment, machine translation, n-grams, text similarity",

author = "Partha Pakray and Snehasis Neogi and Sivaji Bandyopadhyay and Alexander Gelbukh",

year = "2013",

doi = "10.1007/978-3-642-37798-3_3",

language = "English",

isbn = "9783642377976",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

number = "PART 2",

pages = "26--35",

booktitle = "Advances in Artificial Intelligence - 11th Mexican International Conference on Artificial Intelligence, MICAI 2012, Revised Selected Papers",

edition = "PART 2",

note = "11th Mexican International Conference on Artificial Intelligence, MICAI 2012 ; Conference date: 27-10-2012 Through 04-11-2012",

}

Pakray, P, Neogi, S, Bandyopadhyay, S & Gelbukh, A 2013, Recognizing textual entailment in non-English text via automatic translation into English. in Advances in Artificial Intelligence - 11th Mexican International Conference on Artificial Intelligence, MICAI 2012, Revised Selected Papers. PART 2 edn, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), no. PART 2, vol. 7630 LNAI, pp. 26-35, 11th Mexican International Conference on Artificial Intelligence, MICAI 2012, San Luis Potosi, Mexico, 27/10/12. https://doi.org/10.1007/978-3-642-37798-3_3

Recognizing textual entailment in non-English text via automatic translation into English. / Pakray, Partha; Neogi, Snehasis; Bandyopadhyay, Sivaji et al.
Advances in Artificial Intelligence - 11th Mexican International Conference on Artificial Intelligence, MICAI 2012, Revised Selected Papers. PART 2. ed. 2013. p. 26-35 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7630 LNAI, No. PART 2).

TY - GEN

T1 - Recognizing textual entailment in non-English text via automatic translation into English

AU - Pakray, Partha

AU - Neogi, Snehasis

AU - Bandyopadhyay, Sivaji

AU - Gelbukh, Alexander

PY - 2013

Y1 - 2013

N2 - We show that a task that typically involves rather deep semantic processing of text (recognizing textual entailment being our case study) can be successfully solved without any tools specific to the language of the texts on which the task is performed. Instead, we automatically translate the text into English using a standard machine translation system, and then perform all linguistic processing, including the syntactic and semantic levels, using only English-language linguistic tools. In this case study we use Italian annotated data. Textual entailment is a relation between two texts. To detect it, we use various measures, which allow us to make an entailment decision in the two-way classification task (yes / no). We set up various heuristics and measures for evaluating the entailment between two texts based on lexical relations. To make entailment judgments, the system applies named entity recognition, chunking, part-of-speech tagging, n-gram, and text similarity modules to both texts, all of these modules being for English and not for Italian. Rules have been developed to perform the two-way entailment classification. Our system makes entailment judgments based on the entailment scores for the text pairs. The system was evaluated on Italian textual entailment data sets: we trained our system on Italian development data sets using the WEKA machine learning toolset and tested it on Italian test data sets. The accuracy of our system on the development corpus is 0.525 and on the test corpus is 0.66, which is a good result given that no Italian-specific linguistic information was used.

KW - Recognizing textual entailment

KW - cross-lingual textual entailment

KW - machine translation

KW - n-grams

KW - text similarity

UR - http://www.scopus.com/inward/record.url?scp=84875871109&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-37798-3_3

DO - 10.1007/978-3-642-37798-3_3

M3 - Conference contribution

SN - 9783642377976

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 26

EP - 35

BT - Advances in Artificial Intelligence - 11th Mexican International Conference on Artificial Intelligence, MICAI 2012, Revised Selected Papers

T2 - 11th Mexican International Conference on Artificial Intelligence, MICAI 2012

Y2 - 27 October 2012 through 4 November 2012

ER -

Pakray P, Neogi S, Bandyopadhyay S, Gelbukh A. Recognizing textual entailment in non-English text via automatic translation into English. In Advances in Artificial Intelligence - 11th Mexican International Conference on Artificial Intelligence, MICAI 2012, Revised Selected Papers. PART 2 ed. 2013. p. 26-35. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); PART 2). doi: 10.1007/978-3-642-37798-3_3
