A formula embedding approach to math information retrieval

Amarnath Pathak, Partha Pakray, Alexander Gelbukh

Research output: Contribution to journal › Article

Abstract

© 2018 Lithuanian Institute of Philosophy and Sociology. All rights reserved. Intricate math formulae, which make up much of the content of scientific documents, add to the complexity of scientific document retrieval. Although modifications to conventional indexing and search mechanisms have eased this complexity and shown notable performance, a formula embedding approach to scientific document retrieval is equally appealing and promising. The Formula Embedding Module of the proposed system uses a Bit Position Information Table to transform the math formulae contained in scientific documents into binary formula vectors. Each set bit of a formula vector designates the presence of a specific mathematical entity. A mathematical user query is transformed into a query vector in the same way, and the corresponding relevant documents are retrieved. The relevance of a search result is characterized by the extent of similarity between the indexed formula vector and the query vector. Promising performance under moderately constrained conditions substantiates the competence of the proposed approach.
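The embedding and matching steps described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the contents of the table `BPIT`, the regular-expression tokenizer, and the use of Jaccard similarity are hypothetical stand-ins, since the abstract does not specify the actual table entries, the formula parser, or the similarity measure used by the authors.

```python
import re

# Hypothetical Bit Position Information Table: maps a mathematical
# entity to the bit position it sets. The real table is not given
# in the abstract; these entries are illustrative only.
BPIT = {"+": 0, "-": 1, "=": 2, "^": 3,
        "\\frac": 4, "\\sqrt": 5, "\\int": 6, "\\sum": 7}

# Assumed tokenizer: LaTeX commands (e.g. \frac) or single symbols.
TOKEN = re.compile(r"\\[A-Za-z]+|[^\sA-Za-z0-9{}]")


def embed(formula: str) -> int:
    """Return a binary formula vector, stored as an integer bitmask."""
    vector = 0
    for token in TOKEN.findall(formula):
        bit = BPIT.get(token)
        if bit is not None:
            vector |= 1 << bit  # set the bit for this entity
    return vector


def similarity(a: int, b: int) -> float:
    """Jaccard similarity of two bit vectors (an assumed measure)."""
    union = a | b
    if union == 0:
        return 0.0
    return bin(a & b).count("1") / bin(union).count("1")


def search(query: str, indexed: list[str]) -> list[tuple[float, str]]:
    """Rank indexed formulae by similarity to the query vector."""
    q = embed(query)
    scored = [(similarity(q, embed(f)), f) for f in indexed]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Under these assumptions, a call such as `search("x + y = z", ["a + b = c", r"\int f"])` ranks the first indexed formula above the second, because it shares the `+` and `=` entities with the query.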
Original language: American English
Pages (from-to): 819-833
Number of pages: 15
Journal: Computacion y Sistemas
DOIs: 10.13053/CyS-22-3-3015
State: Published - 1 Jan 2018

Cite this

@article{f444c2a3c1574f71a3ce2dbcd39f6e9e,
title = "A formula embedding approach to math information retrieval",
abstract = "{\textcopyright} 2018 Lithuanian Institute of Philosophy and Sociology. All rights reserved. Intricate math formulae, which majorly constitute the content of scientific documents, add to the complexity of scientific document retrieval. Although modifications in conventional indexing and search mechanisms have eased the complexity and exhibited notable performance, the formula embedding approach to scientific document retrieval sounds equally appealing and promising. Formula Embedding Module of the proposed system uses a Bit Position Information Table to transform math formulae, contained inside scientific documents, into binary formulae vectors. Each set bit of a formula vector designates presence of a specific mathematical entity. Mathematical user query is transformed into query vector, in similar fashion, and the corresponding relevant documents are retrieved. Relevance of a search result is characterized by extent of similarity between the indexed formula vector and the query vector. Promising performance, under moderately constrained situation, substantiates competence of the proposed approach.",
author = "Amarnath Pathak and Partha Pakray and Alexander Gelbukh",
year = "2018",
month = "1",
day = "1",
doi = "10.13053/CyS-22-3-3015",
language = "American English",
pages = "819--833",
journal = "Computacion y Sistemas",
issn = "1405-5546",
publisher = "Centro de Investigacion en Computacion (CIC) del Instituto Politecnico Nacional (IPN)",

}

A formula embedding approach to math information retrieval. / Pathak, Amarnath; Pakray, Partha; Gelbukh, Alexander.

In: Computacion y Sistemas, 01.01.2018, p. 819-833.

TY - JOUR

T1 - A formula embedding approach to math information retrieval

AU - Pathak, Amarnath

AU - Pakray, Partha

AU - Gelbukh, Alexander

PY - 2018/1/1

Y1 - 2018/1/1

AB - © 2018 Lithuanian Institute of Philosophy and Sociology. All rights reserved. Intricate math formulae, which majorly constitute the content of scientific documents, add to the complexity of scientific document retrieval. Although modifications in conventional indexing and search mechanisms have eased the complexity and exhibited notable performance, the formula embedding approach to scientific document retrieval sounds equally appealing and promising. Formula Embedding Module of the proposed system uses a Bit Position Information Table to transform math formulae, contained inside scientific documents, into binary formulae vectors. Each set bit of a formula vector designates presence of a specific mathematical entity. Mathematical user query is transformed into query vector, in similar fashion, and the corresponding relevant documents are retrieved. Relevance of a search result is characterized by extent of similarity between the indexed formula vector and the query vector. Promising performance, under moderately constrained situation, substantiates competence of the proposed approach.

UR - https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85055505893&origin=inward

UR - https://www.scopus.com/inward/citedby.uri?partnerID=HzOxMe3b&scp=85055505893&origin=inward

U2 - 10.13053/CyS-22-3-3015

DO - 10.13053/CyS-22-3-3015

M3 - Article

SP - 819

EP - 833

JO - Computacion y Sistemas

JF - Computacion y Sistemas

SN - 1405-5546

ER -