TY - JOUR
T1 - Measuring semantic gap between user-generated content and product descriptions through compression comparison in e-commerce
AU - Rodriguez-Diaz, Carlos A.
AU - Jimenez, Sergio
AU - Bejarano, Daniel
AU - Bernal-Chávez, Julio A.
AU - Gelbukh, Alexander
N1 - Publisher Copyright: © 2023 Elsevier Inc.
PY - 2023/8
Y1 - 2023/8
N2 - The significance of user-generated content as a source for business intelligence and analytics has been on the rise since the inception of electronic commerce platforms and has been solidified in the wake of the pandemic due to the prominence of electronic commerce as a sales channel. The prevailing approach to harnessing unstructured data involves the utilization of Artificial Intelligence; however, there exist simpler alternatives capable of yielding valuable information. This article introduces a methodology grounded in information theory to quantify the semantic disparity between the consumer community and product descriptions. This disparity can result in potential misunderstandings in the dialogue among consumers, and incidental costs in the dialogue between consumers and vendors. One plausible explanation for this disparity is that the terminology employed by consumers may possess different meanings compared to that utilized by product description writers. Our methodology employs large corpora of consumer reviews and product descriptions to quantify this semantic disparity across multiple electronic commerce domains through the implementation of random word exchanges and compression. Furthermore, we utilize neural word embeddings to identify specific words exhibiting the greatest semantic drift between reviews and descriptions, thereby providing lexical examples of these gaps. Our findings indicate that lower levels of lexical-semantic gap are associated with better consumer satisfaction.
KW - Customer-vendor communication
KW - Lexical-semantic gap
KW - Product description ambiguity
KW - Product reviews ambiguity
KW - Semantic drift
KW - Social commerce language
UR - http://www.scopus.com/inward/record.url?scp=85153327781&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2023.118953
DO - 10.1016/j.ins.2023.118953
M3 - Article
AN - SCOPUS:85153327781
SN - 0020-0255
VL - 638
JO - Information Sciences
JF - Information Sciences
M1 - 118953
ER -