Compression of Boolean inverted files by document ordering

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

2 Scopus citations

Abstract

Boolean queries are used to search a document collection for the documents that contain specific terms, independently of the frequency of a term in the document. To perform such queries, a search engine maintains an inverted file, which lists for each keyword the documents containing it. The size of such a file is comparable with that of the document collection, which is a considerable storage overhead. We show how the inverted file can be compressed by ordering the documents in the collection in a specific way. Finding the near-optimal order can be recast as a Hamming-distance traveling salesman problem.

Original languageEnglish
Title of host publicationNLP-KE 2003 - 2003 International Conference on Natural Language Processing and Knowledge Engineering, Proceedings
EditorsChengqing Zong
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages244-249
Number of pages6
ISBN (Electronic)0780379020, 9780780379022
DOIs
StatePublished - 2003
EventInternational Conference on Natural Language Processing and Knowledge Engineering, NLP-KE 2003 - Beijing, China
Duration: 26 Oct 200329 Oct 2003

Publication series

NameNLP-KE 2003 - 2003 International Conference on Natural Language Processing and Knowledge Engineering, Proceedings

Conference

ConferenceInternational Conference on Natural Language Processing and Knowledge Engineering, NLP-KE 2003
Country/TerritoryChina
CityBeijing
Period26/10/0329/10/03

Keywords

  • Boolean search
  • Information retrieval
  • Inverted file size
  • Traveling salesman problem

Fingerprint

Dive into the research topics of 'Compression of Boolean inverted files by document ordering'. Together they form a unique fingerprint.

Cite this