TY - JOUR
T1 - MathIRs
T2 - Retrieval system for scientific documents
AU - Pathak, Amarnath
AU - Pakray, Partha
AU - Sarkar, Sandip
AU - Das, Dipankar
AU - Gelbukh, Alexander
N1 - Funding Information:
The work presented here falls under Research Project Grant No. YSS/2015/000988 and is supported by the Department of Science & Technology (DST) and the Science and Engineering Research Board (SERB), Govt. of India. The authors would like to acknowledge the Department of Computer Science & Engineering, National Institute of Technology Mizoram, India for providing infrastructural facilities and support. The fifth author acknowledges the support of the Mexican Government through Instituto Politécnico Nacional SIP grant 20172008.
PY - 2017
Y1 - 2017
N2 - Effective retrieval of mathematical content from a vast corpus of scientific documents demands enhancements to conventional indexing and searching mechanisms. The indexing mechanism and the choice of semantic similarity measures largely determine the quality of the results of a Math Information Retrieval system (MathIRs). Tokenization and formula unification are among the distinguishing features of the indexing mechanism used in MathIRs, and they facilitate sub-formula and similarity search. Moreover, the scientific documents and user queries in MathIRs contain both math and text content, and matching these requires three important modules: Text-Text Similarity (TS), Math-Math Similarity (MS) and Text-Math Similarity (TMS). In this paper we propose a MathIRs comprising these modules and a substitution-tree-based mechanism for indexing mathematical expressions. We also present experimental results for similarity search and argue that the proposed MathIRs will ease the task of scientific document retrieval.
KW - Indexing
KW - Information retrieval
KW - MathIRs
KW - Natural language processing
UR - http://www.scopus.com/inward/record.url?scp=85021786682&partnerID=8YFLogxK
U2 - 10.13053/CyS-21-2-2743
DO - 10.13053/CyS-21-2-2743
M3 - Article
SN - 1405-5546
VL - 21
SP - 253
EP - 265
JO - Computacion y Sistemas
JF - Computacion y Sistemas
IS - 2
ER -