TY - JOUR
T1 - Extracting context of math formulae contained inside scientific documents
AU - Pathak, Amarnath
AU - Das, Ranjita
AU - Pakray, Partha
AU - Gelbukh, Alexander
N1 - Publisher Copyright: © 2019 Instituto Politecnico Nacional. All rights reserved.
PY - 2019
Y1 - 2019
N2 - A math formula in a scientific document is often preceded by its textual description, commonly referred to as the context of the formula. Annotating a formula with its context enriches its semantics and consequently impacts the retrieval of mathematical content from scientific documents. With considerable confidence, the context can be assumed to be one of the noun phrases (NPs) of the sentence in which the formula occurs. However, the presence of several misleading NPs in the sentence necessitates extracting the NP that describes the formula more precisely than the rest. Although a fair number of methods have been developed for precise context extraction, it is worthwhile to explore other techniques that could improve on their performance. To this end, this paper discusses the implementation of an automated context extraction system, which follows certain heuristics in assigning weights to the candidate NPs and tunes those weights using a development set of annotated formulae. The implemented system significantly outperforms nearest-noun and sentence-pattern based methods in terms of F-score.
AB - A math formula in a scientific document is often preceded by its textual description, commonly referred to as the context of the formula. Annotating a formula with its context enriches its semantics and consequently impacts the retrieval of mathematical content from scientific documents. With considerable confidence, the context can be assumed to be one of the noun phrases (NPs) of the sentence in which the formula occurs. However, the presence of several misleading NPs in the sentence necessitates extracting the NP that describes the formula more precisely than the rest. Although a fair number of methods have been developed for precise context extraction, it is worthwhile to explore other techniques that could improve on their performance. To this end, this paper discusses the implementation of an automated context extraction system, which follows certain heuristics in assigning weights to the candidate NPs and tunes those weights using a development set of annotated formulae. The implemented system significantly outperforms nearest-noun and sentence-pattern based methods in terms of F-score.
KW - Context extraction
KW - Math information retrieval
KW - NTCIR
KW - Noun phrase
KW - Parser
UR - http://www.scopus.com/inward/record.url?scp=85076688676&partnerID=8YFLogxK
U2 - 10.13053/CyS-23-3-3246
DO - 10.13053/CyS-23-3-3246
M3 - Article
AN - SCOPUS:85076688676
SN - 1405-5546
VL - 23
SP - 803
EP - 818
JO - Computacion y Sistemas
JF - Computacion y Sistemas
IS - 3
ER -