Extracting context of math formulae contained inside scientific documents

Amarnath Pathak; Ranjita Das; Partha Pakray; Alexander Gelbukh

doi:10.13053/CyS-23-3-3246


Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

A math formula in a scientific document is often preceded by its textual description, commonly referred to as the context of the formula. Annotating a formula with its context enriches its semantics and consequently affects the retrieval of mathematical content from scientific documents. With considerable certainty, the context can be assumed to be one of the noun phrases (NPs) of the sentence in which the formula occurs. However, a sentence often contains several misleading NPs, so the task is to extract the NP that describes the formula more precisely than the rest. Although a fair number of methods have been developed for precise context extraction, it is worth exploring other techniques that can improve on their performance. To this end, this paper discusses the implementation of an automated context extraction system that assigns weights to candidate NPs using certain heuristics and tunes those weights on a development set of annotated formulae. The implemented system significantly outperforms the nearest-noun and sentence-pattern-based methods in terms of F-score.
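The abstract describes scoring candidate NPs with weighted heuristics and tuning the weights on an annotated development set. The Python sketch below illustrates that general idea under stated assumptions: the three features (token distance to the formula, whether the NP precedes the formula, and whether a cue phrase such as "denoted by" follows the NP), the linear scoring function, the exhaustive grid search, and the toy data are all hypothetical and are not the heuristics or tuning procedure used in the paper. The distance-only weighting is only a rough stand-in for a nearest-noun baseline.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional, Sequence, Tuple


@dataclass
class Candidate:
    """A candidate NP from the formula's sentence (hypothetical features)."""
    text: str
    distance: int   # tokens between the NP and the formula
    precedes: bool  # True if the NP occurs before the formula
    cue: bool       # True if a cue phrase such as "denoted by" follows the NP


Weights = Tuple[float, float, float]
Example = Tuple[List[Candidate], Optional[str]]  # (candidates, gold context)


def score(c: Candidate, w: Weights) -> float:
    # Weighted sum of heuristic features; nearer NPs get a larger distance term.
    w_dist, w_pre, w_cue = w
    return w_dist / (1 + c.distance) + w_pre * float(c.precedes) + w_cue * float(c.cue)


def pick(cands: Sequence[Candidate], w: Weights) -> Optional[str]:
    # Highest-scoring NP (max() keeps the first one on ties), or None if no candidates.
    if not cands:
        return None
    return max(cands, key=lambda c: score(c, w)).text


def f_score(preds: List[Optional[str]], golds: List[Optional[str]]) -> float:
    # Exact-match F1: precision over predicted contexts, recall over annotated ones.
    tp = sum(1 for p, g in zip(preds, golds) if p is not None and p == g)
    n_pred = sum(p is not None for p in preds)
    n_gold = sum(g is not None for g in golds)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(dev: List[Example], w: Weights) -> float:
    preds = [pick(cands, w) for cands, _ in dev]
    golds = [gold for _, gold in dev]
    return f_score(preds, golds)


def tune(dev: List[Example], grid: Sequence[float]) -> Tuple[Weights, float]:
    # Exhaustive grid search; keeps the first weight triple reaching the best F-score.
    best_w: Weights = (0.0, 0.0, 0.0)
    best_f = -1.0
    for w_dist, w_pre, w_cue in product(grid, repeat=3):
        w = (w_dist, w_pre, w_cue)
        f = evaluate(dev, w)
        if f > best_f:
            best_w, best_f = w, f
    return best_w, best_f


# Toy, hand-made development set (hypothetical; not data from the paper).
DEV: List[Example] = [
    ([Candidate("the particle", 1, True, False),
      Candidate("the kinetic energy", 4, True, True)], "the kinetic energy"),
    ([Candidate("the sample", 6, True, False),
      Candidate("the estimator", 1, False, False),
      Candidate("the variance", 2, True, True)], "the variance"),
    ([Candidate("the update", 3, False, False),
      Candidate("the learning rate", 1, True, False)], "the learning rate"),
]
GRID = (0.0, 0.5, 1.0)

if __name__ == "__main__":
    baseline = evaluate(DEV, (1.0, 0.0, 0.0))  # distance only: rough nearest-NP stand-in
    weights, f = tune(DEV, GRID)
    print(f"distance-only F-score: {baseline:.2f}")
    print(f"tuned weights {weights}: F-score {f:.2f}")
```

Grid search is used here only because it is the simplest tuning procedure to show; the abstract does not say how the weights are tuned, and a real system would derive the features from a parser rather than hand-set them.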

Original language	English
Pages (from-to)	803-818
Number of pages	16
Journal	Computacion y Sistemas
Volume	23
Issue number	3
DOI	https://doi.org/10.13053/CyS-23-3-3246
Status	Published - 2019



Cite this

@article{823238f06dba4501802ea9828f23364e,

title = "Extracting context of math formulae contained inside scientific documents",

abstract = "A math formula in a scientific document is often preceded by its textual description, commonly referred to as the context of the formula. Annotating a formula with its context enriches its semantics and consequently affects the retrieval of mathematical content from scientific documents. With considerable certainty, the context can be assumed to be one of the noun phrases (NPs) of the sentence in which the formula occurs. However, a sentence often contains several misleading NPs, so the task is to extract the NP that describes the formula more precisely than the rest. Although a fair number of methods have been developed for precise context extraction, it is worth exploring other techniques that can improve on their performance. To this end, this paper discusses the implementation of an automated context extraction system that assigns weights to candidate NPs using certain heuristics and tunes those weights on a development set of annotated formulae. The implemented system significantly outperforms the nearest-noun and sentence-pattern-based methods in terms of F-score.",

keywords = "Context extraction, Math information retrieval, NTCIR, Noun phrase, Parser",

author = "Amarnath Pathak and Ranjita Das and Partha Pakray and Alexander Gelbukh",

year = "2019",

doi = "10.13053/CyS-23-3-3246",

language = "English",

volume = "23",

pages = "803--818",

journal = "Computacion y Sistemas",

issn = "1405-5546",

number = "3",

}

TY - JOUR

T1 - Extracting context of math formulae contained inside scientific documents

AU - Pathak, Amarnath

AU - Das, Ranjita

AU - Pakray, Partha

AU - Gelbukh, Alexander

PY - 2019

Y1 - 2019

N2 - A math formula in a scientific document is often preceded by its textual description, commonly referred to as the context of the formula. Annotating a formula with its context enriches its semantics and consequently affects the retrieval of mathematical content from scientific documents. With considerable certainty, the context can be assumed to be one of the noun phrases (NPs) of the sentence in which the formula occurs. However, a sentence often contains several misleading NPs, so the task is to extract the NP that describes the formula more precisely than the rest. Although a fair number of methods have been developed for precise context extraction, it is worth exploring other techniques that can improve on their performance. To this end, this paper discusses the implementation of an automated context extraction system that assigns weights to candidate NPs using certain heuristics and tunes those weights on a development set of annotated formulae. The implemented system significantly outperforms the nearest-noun and sentence-pattern-based methods in terms of F-score.

AB - A math formula in a scientific document is often preceded by its textual description, commonly referred to as the context of the formula. Annotating a formula with its context enriches its semantics and consequently affects the retrieval of mathematical content from scientific documents. With considerable certainty, the context can be assumed to be one of the noun phrases (NPs) of the sentence in which the formula occurs. However, a sentence often contains several misleading NPs, so the task is to extract the NP that describes the formula more precisely than the rest. Although a fair number of methods have been developed for precise context extraction, it is worth exploring other techniques that can improve on their performance. To this end, this paper discusses the implementation of an automated context extraction system that assigns weights to candidate NPs using certain heuristics and tunes those weights on a development set of annotated formulae. The implemented system significantly outperforms the nearest-noun and sentence-pattern-based methods in terms of F-score.

KW - Context extraction

KW - Math information retrieval

KW - NTCIR

KW - Noun phrase

KW - Parser

UR - http://www.scopus.com/inward/record.url?scp=85076688676&partnerID=8YFLogxK

U2 - 10.13053/CyS-23-3-3246

DO - 10.13053/CyS-23-3-3246

M3 - Article

AN - SCOPUS:85076688676

SN - 1405-5546

VL - 23

SP - 803

EP - 818

JO - Computacion y Sistemas

JF - Computacion y Sistemas

IS - 3

ER -
