Measuring semantic gap between user-generated content and product descriptions through compression comparison in e-commerce

Carlos A. Rodriguez-Diaz, Sergio Jimenez, Daniel Bejarano, Julio A. Bernal-Chávez, Alexander Gelbukh

Research output: Contribution to journal › Article › peer-review

Abstract

The significance of user-generated content as a source for business intelligence and analytics has been on the rise since the inception of electronic commerce platforms and has been solidified in the wake of the pandemic due to the prominence of electronic commerce as a sales channel. The prevailing approach to harnessing unstructured data involves the utilization of Artificial Intelligence; however, there exist simpler alternatives capable of yielding valuable information. This article introduces a methodology grounded in information theory to quantify the semantic disparity between the consumer community and product descriptions. This disparity can result in potential misunderstandings in the dialogue among consumers, and incidental costs in the dialogue between consumers and vendors. One plausible explanation for this disparity is that the terminology employed by consumers may possess different meanings compared to that utilized by product description writers. Our methodology employs large corpora of consumer reviews and product descriptions to quantify this semantic disparity across multiple electronic commerce domains through the implementation of random word exchanges and compression. Furthermore, we utilize neural word embeddings to identify specific words exhibiting the greatest semantic drift between reviews and descriptions, thereby providing lexical examples of these gaps. Our findings indicate that lower levels of lexical-semantic gap are associated with better consumer satisfaction.
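The abstract does not spell out the compression procedure, so the following is only a minimal sketch of the general idea of comparing two corpora by compression. It uses the normalized compression distance (NCD), a standard compression-based similarity measure, and not the article's random word exchange method. The function names, the choice of zlib, and the toy strings are all assumptions made for illustration.

```python
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two texts.

    Values near 0 suggest the texts share most of their regularities;
    values near 1 suggest they share few. This is a stand-in measure,
    not the procedure used in the article.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Hypothetical toy corpora; real use would need large review and
    # description corpora, as the abstract indicates.
    reviews = "battery died fast but the screen looks great " * 50
    descriptions = "long-lasting lithium cell with high-resolution display " * 50
    print("reviews vs descriptions:", ncd(reviews, descriptions))
    print("reviews vs reviews:", ncd(reviews, reviews))
```

On real data, a general-purpose compressor such as zlib only sees regularities within a limited window, so very large corpora would likely need chunking or a compressor with a larger context.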

Original language: English
Article number: 118953
Journal: Information Sciences
Volume: 638
State: Published - Aug 2023

Keywords

  • Customer-vendor communication
  • Lexical-semantic gap
  • Product description ambiguity
  • Product reviews ambiguity
  • Semantic drift
  • Social commerce language
