TY - JOUR
T1 - A formula embedding approach to math information retrieval
AU - Pathak, Amarnath
AU - Pakray, Partha
AU - Gelbukh, Alexander
N1 - Publisher Copyright: © 2018 Lithuanian Institute of Philosophy and Sociology. All rights reserved.
PY - 2018
Y1 - 2018
N2 - Intricate math formulae, which make up much of the content of scientific documents, add to the complexity of scientific document retrieval. Although modifications to conventional indexing and search mechanisms have eased this complexity and shown notable performance, a formula embedding approach to scientific document retrieval is equally appealing and promising. The Formula Embedding Module of the proposed system uses a Bit Position Information Table to transform the math formulae contained in scientific documents into binary formula vectors. Each set bit of a formula vector designates the presence of a specific mathematical entity. A mathematical user query is transformed into a query vector in the same fashion, and the corresponding relevant documents are retrieved. The relevance of a search result is characterized by the extent of similarity between the indexed formula vector and the query vector. Promising performance under moderately constrained conditions substantiates the competence of the proposed approach.
AB - Intricate math formulae, which make up much of the content of scientific documents, add to the complexity of scientific document retrieval. Although modifications to conventional indexing and search mechanisms have eased this complexity and shown notable performance, a formula embedding approach to scientific document retrieval is equally appealing and promising. The Formula Embedding Module of the proposed system uses a Bit Position Information Table to transform the math formulae contained in scientific documents into binary formula vectors. Each set bit of a formula vector designates the presence of a specific mathematical entity. A mathematical user query is transformed into a query vector in the same fashion, and the corresponding relevant documents are retrieved. The relevance of a search result is characterized by the extent of similarity between the indexed formula vector and the query vector. Promising performance under moderately constrained conditions substantiates the competence of the proposed approach.
KW - Formula embedding
KW - Math formula search
KW - Math information retrieval
KW - Precision
KW - Scientific document retrieval
UR - http://www.scopus.com/inward/record.url?scp=85055505893&partnerID=8YFLogxK
U2 - 10.13053/CyS-22-3-3015
DO - 10.13053/CyS-22-3-3015
M3 - Article
SN - 1405-5546
VL - 22
SP - 819
EP - 833
JO - Computacion y Sistemas
JF - Computacion y Sistemas
IS - 3
ER -