MathIRs: Retrieval system for scientific documents

Amarnath Pathak; Partha Pakray; Sandip Sarkar; Dipankar Das; Alexander Gelbukh

doi:10.13053/CyS-21-2-2743


Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Article › peer-review

19 Scopus citations

Abstract

Effective retrieval of mathematical content from a vast corpus of scientific documents demands enhancements to conventional indexing and searching mechanisms. The indexing mechanism and the choice of semantic similarity measures largely determine the quality of the results returned by a Math Information Retrieval system (MathIRs). Tokenization and formula unification are among the distinguishing features of the indexing mechanism used in MathIRs; they facilitate sub-formula search and similarity search. Moreover, since both the scientific documents and the user queries in MathIRs contain mathematical as well as textual content, matching them requires three important modules: Text-Text Similarity (TS), Math-Math Similarity (MS) and Text-Math Similarity (TMS). In this paper we propose MathIRs, comprising these modules and a substitution-tree-based mechanism for indexing mathematical expressions. We also present experimental results for similarity search and argue that MathIRs will ease the task of scientific document retrieval.
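The abstract names formula unification and sub-formula search without giving details. The Python sketch below illustrates the general idea under stated assumptions. It is not the MathIRs implementation. Formulas are assumed to be written in Python expression syntax (`**` for powers). Variables are unified by renaming them to placeholders `v1`, `v2`, ... in order of first appearance, and every compound sub-expression is indexed in a plain dictionary. That dictionary is a deliberate simplification standing in for the substitution tree the paper uses. The function names (`unify`, `subformulas`, `build_index`, `search`, `math_similarity`) and the Jaccard-based similarity score are hypothetical and chosen only for illustration.

```python
import ast
from collections import defaultdict


class _Unifier(ast.NodeTransformer):
    """Rename variables to placeholders v1, v2, ... in order of first appearance."""

    def __init__(self) -> None:
        self.mapping: dict[str, str] = {}

    def visit_Name(self, node: ast.Name) -> ast.Name:
        if node.id not in self.mapping:
            self.mapping[node.id] = f"v{len(self.mapping) + 1}"
        node.id = self.mapping[node.id]
        return node

    def visit_Call(self, node: ast.Call) -> ast.Call:
        # Keep function names such as sin or log fixed; unify only the arguments.
        node.args = [self.visit(arg) for arg in node.args]
        return node


def unify(expr: str) -> str:
    """Return a canonical string for expr with its variables renamed."""
    tree = ast.parse(expr, mode="eval")
    return ast.unparse(_Unifier().visit(tree.body))


def subformulas(expr: str) -> set[str]:
    """Unified forms of every compound sub-expression (operators and calls)."""
    tree = ast.parse(expr, mode="eval")
    keys = set()
    for node in ast.walk(tree.body):
        if isinstance(node, (ast.BinOp, ast.UnaryOp, ast.Call)):
            # Unify each sub-expression on its own, so y**2 maps to "v1 ** 2".
            keys.add(unify(ast.unparse(node)))
    return keys


def build_index(docs: dict[str, list[str]]) -> dict[str, set[str]]:
    """Hypothetical flat index: unified sub-formula -> ids of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, formulas in docs.items():
        for formula in formulas:
            for key in subformulas(formula):
                index[key].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Exact lookup of the unified query among the indexed sub-formulas."""
    return set(index.get(unify(query), set()))


def math_similarity(f1: str, f2: str) -> float:
    """Jaccard overlap of unified sub-formula sets (a hypothetical MS score)."""
    a, b = subformulas(f1), subformulas(f2)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    docs = {"d1": ["x**2 + y**2"], "d2": ["sin(a) * b**2"], "d3": ["z + 1"]}
    index = build_index(docs)
    print(search(index, "a**2 + b**2"))  # {'d1'}
    print(sorted(search(index, "t**2")))  # ['d1', 'd2']
    print(math_similarity("x**2 + y**2", "sin(a) * b**2"))  # 0.25
```

`x**2 + y**2` and `a**2 + b**2` unify to the same key, so a query for either retrieves documents containing the other. A query such as `t**2` matches any document that has a squared variable as a sub-formula. A real substitution tree would share common structure among the indexed expressions instead of storing each unified string separately.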

Original language	English
Pages (from-to)	253-265
Number of pages	13
Journal	Computacion y Sistemas
Volume	21
Issue number	2
DOI	https://doi.org/10.13053/CyS-21-2-2743
State	Published - 2017

Keywords

Indexing
Information retrieval
MathIRs
Natural language processing


Cite this

@article{a062ec58aa814a7d801f5663e379018d,

title = "MathIRs: Retrieval system for scientific documents",

abstract = "Effective retrieval of mathematical content from a vast corpus of scientific documents demands enhancements to conventional indexing and searching mechanisms. The indexing mechanism and the choice of semantic similarity measures largely determine the quality of the results returned by a Math Information Retrieval system (MathIRs). Tokenization and formula unification are among the distinguishing features of the indexing mechanism used in MathIRs; they facilitate sub-formula search and similarity search. Moreover, since both the scientific documents and the user queries in MathIRs contain mathematical as well as textual content, matching them requires three important modules: Text-Text Similarity (TS), Math-Math Similarity (MS) and Text-Math Similarity (TMS). In this paper we propose MathIRs, comprising these modules and a substitution-tree-based mechanism for indexing mathematical expressions. We also present experimental results for similarity search and argue that MathIRs will ease the task of scientific document retrieval.",

keywords = "Indexing, Information retrieval, MathIRs, Natural language processing",

author = "Amarnath Pathak and Partha Pakray and Sandip Sarkar and Dipankar Das and Alexander Gelbukh",

note = "Funding Information: The work presented here falls under Research Project Grant No. YSS/2015/000988 and is supported by the Department of Science \& Technology (DST) and the Science and Engineering Research Board (SERB), Govt. of India. The authors would like to acknowledge the Department of Computer Science \& Engineering, National Institute of Technology Mizoram, India, for providing infrastructural facilities and support. The fifth author acknowledges the support of the Mexican Government through Instituto Polit{\'e}cnico Nacional SIP grant 20172008.",

year = "2017",

doi = "10.13053/CyS-21-2-2743",

language = "English",

volume = "21",

pages = "253--265",

journal = "Computacion y Sistemas",

issn = "1405-5546",

number = "2",

}

TY - JOUR

T1 - MathIRs: Retrieval system for scientific documents

AU - Pathak, Amarnath

AU - Pakray, Partha

AU - Sarkar, Sandip

AU - Das, Dipankar

AU - Gelbukh, Alexander

N1 - Funding Information: The work presented here falls under Research Project Grant No. YSS/2015/000988 and is supported by the Department of Science & Technology (DST) and the Science and Engineering Research Board (SERB), Govt. of India. The authors would like to acknowledge the Department of Computer Science & Engineering, National Institute of Technology Mizoram, India, for providing infrastructural facilities and support. The fifth author acknowledges the support of the Mexican Government through Instituto Politécnico Nacional SIP grant 20172008.

PY - 2017

Y1 - 2017

N2 - Effective retrieval of mathematical content from a vast corpus of scientific documents demands enhancements to conventional indexing and searching mechanisms. The indexing mechanism and the choice of semantic similarity measures largely determine the quality of the results returned by a Math Information Retrieval system (MathIRs). Tokenization and formula unification are among the distinguishing features of the indexing mechanism used in MathIRs; they facilitate sub-formula search and similarity search. Moreover, since both the scientific documents and the user queries in MathIRs contain mathematical as well as textual content, matching them requires three important modules: Text-Text Similarity (TS), Math-Math Similarity (MS) and Text-Math Similarity (TMS). In this paper we propose MathIRs, comprising these modules and a substitution-tree-based mechanism for indexing mathematical expressions. We also present experimental results for similarity search and argue that MathIRs will ease the task of scientific document retrieval.

AB - Effective retrieval of mathematical content from a vast corpus of scientific documents demands enhancements to conventional indexing and searching mechanisms. The indexing mechanism and the choice of semantic similarity measures largely determine the quality of the results returned by a Math Information Retrieval system (MathIRs). Tokenization and formula unification are among the distinguishing features of the indexing mechanism used in MathIRs; they facilitate sub-formula search and similarity search. Moreover, since both the scientific documents and the user queries in MathIRs contain mathematical as well as textual content, matching them requires three important modules: Text-Text Similarity (TS), Math-Math Similarity (MS) and Text-Math Similarity (TMS). In this paper we propose MathIRs, comprising these modules and a substitution-tree-based mechanism for indexing mathematical expressions. We also present experimental results for similarity search and argue that MathIRs will ease the task of scientific document retrieval.

KW - Indexing

KW - Information retrieval

KW - MathIRs

KW - Natural language processing

UR - http://www.scopus.com/inward/record.url?scp=85021786682&partnerID=8YFLogxK

U2 - 10.13053/CyS-21-2-2743

DO - 10.13053/CyS-21-2-2743

M3 - Article

SN - 1405-5546

VL - 21

SP - 253

EP - 265

JO - Computacion y Sistemas

JF - Computacion y Sistemas

IS - 2

ER -
