Cross-domain deception detection using support vector networks

Ángel Hernández-Castañeda; Hiram Calvo; Alexander Gelbukh; Jorge J. García Flores

doi:10.1007/s00500-016-2409-2

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal, article, peer-reviewed

35 Scopus citations

Abstract

Our motivation is to assess the effectiveness of support vector networks (SVN) for detecting deception in texts, and to investigate to what degree it is possible to build a domain-independent detector of deception in text using SVN. We experimented with different feature sets for training the SVN: a continuous semantic space model represented by latent Dirichlet allocation (LDA) topics, a word-space model, and dictionary-based features. This allows a comparison of performance between semantic information and behavioral information. We tested several combinations of these features on different datasets designed for deception detection. The datasets include DeRev (a corpus of deceptive and truthful opinions about books obtained from Amazon), OpSpam (a corpus of fake and truthful opinions about hotels), and three corpora on controversial topics (abortion, the death penalty, and a best friend) in which the subjects were asked to write an idea contrary to what they really believed. We experimented with a one-domain setting, training and testing our models separately on each dataset (with fivefold cross-validation); a mixed-domain setting, merging all datasets into one large corpus (again with fivefold cross-validation); and a cross-domain setting, using one dataset for testing and a concatenation of all other datasets for training. We obtained an average accuracy of 86% in the one-domain setting, 75% in the mixed-domain setting, and 52% to 64% in the cross-domain setting.
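The three evaluation settings differ only in how training and test data are drawn from the five corpora. The following is a minimal, standard-library-only sketch of those splits under stated assumptions; it is not the authors' code. The `(text, label)` document encoding, the dictionary keys used for the corpora, the seeded shuffle, and the unstratified round-robin fold assignment are all hypothetical choices. Feature extraction (LDA topics, word-space vectors, dictionary features) and the SVN itself are deliberately left out; in practice one might fit something like scikit-learn's `LinearSVC` on each training split and score it on the matching test split.

```python
import random
from typing import Dict, Iterator, List, Tuple

# Hypothetical document encoding: (text, label), with 1 = deceptive, 0 = truthful.
Doc = Tuple[str, int]
Split = Tuple[List[Doc], List[Doc]]  # (train, test)


def k_fold(docs: List[Doc], k: int = 5, seed: int = 0) -> Iterator[Split]:
    """Unstratified k-fold split over a shuffled copy of docs (assumed scheme)."""
    shuffled = list(docs)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # round-robin assignment to k folds
    for i in range(k):
        train = [d for j, fold in enumerate(folds) if j != i for d in fold]
        yield train, folds[i]


def one_domain(corpora: Dict[str, List[Doc]], k: int = 5) -> Iterator[Tuple[str, Split]]:
    """One-domain: cross-validate within each corpus separately."""
    for name, docs in corpora.items():
        for split in k_fold(docs, k):
            yield name, split


def mixed_domain(corpora: Dict[str, List[Doc]], k: int = 5) -> Iterator[Split]:
    """Mixed-domain: merge all corpora into one pool, then cross-validate the pool."""
    merged = [d for docs in corpora.values() for d in docs]
    yield from k_fold(merged, k)


def cross_domain(corpora: Dict[str, List[Doc]]) -> Iterator[Tuple[str, Split]]:
    """Cross-domain: hold out one whole corpus; train on the concatenation of the rest."""
    for held_out, test in corpora.items():
        train = [d for name, docs in corpora.items() if name != held_out for d in docs]
        yield held_out, (train, list(test))


if __name__ == "__main__":
    # Toy stand-ins for the five corpora named in the abstract; the contents are made up.
    names = ["DeRev", "OpSpam", "abortion", "death_penalty", "best_friend"]
    corpora = {n: [(f"{n}-{i}", i % 2) for i in range(10)] for n in names}
    for held_out, (train, test) in cross_domain(corpora):
        print(held_out, len(train), len(test))
```

With five toy corpora of ten documents each, every cross-domain split tests on the ten held-out documents and trains on the other forty. The cross-domain setting never lets a test corpus leak into training, which is why it is the strictest of the three and the one expected to score lowest.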

Original language	English
Pages (from-to)	585-595
Number of pages	11
Journal	Soft Computing
Volume	21
Issue number	3
DOIs	https://doi.org/10.1007/s00500-016-2409-2
State	Published - 1 Feb 2017

Keywords

Continuous semantic space model
Deception detection
Linguistic inquiry and word count
Support vector networks
Word-space model

Cite this

@article{403c4fc2fb6e44339ea24195b220b2db,

title = "Cross-domain deception detection using support vector networks",

abstract = "Our motivation is to assess the effectiveness of support vector networks (SVN) on the task of detecting deception in texts, as well as to investigate to which degree it is possible to build a domain-independent detector of deception in text using SVN. We experimented with different feature sets for training the SVN: a continuous semantic space model source represented by the latent Dirichlet allocation topics, a word-space model, and dictionary-based features. In this way, a comparison of performance between semantic information and behavioral information is made. We tested several combinations of these features on different datasets designed to identify deception. The datasets used include the DeRev dataset (a corpus of deceptive and truthful opinions about books obtained from Amazon), OpSpam (a corpus of fake and truthful opinions about hotels), and three corpora on controversial topics (abortion, death penalty, and a best friend) on which the subjects were asked to write an idea contrary to what they really believed. We experimented with one-domain setting by training and testing our models separately on each dataset (with fivefold cross-validation), with mixed-domain setting by merging all datasets into one large corpus (again, with fivefold cross-validation), and with cross-domain setting: using one dataset for testing and a concatenation of all other datasets for training. We obtained an average accuracy of 86% in one-domain setting, 75% in mixed-domain setting, and 52 to 64% in cross-domain setting.",

keywords = "Continuous semantic space model, Deception detection, Linguistic inquiry and word count, Support vector networks, Word-space model",

author = "{\'A}ngel Hern{\'a}ndez-Casta{\~n}eda and Hiram Calvo and Alexander Gelbukh and Garc{\'\i}a Flores, Jorge J.",

note = "Publisher Copyright: {\textcopyright} 2016, Springer-Verlag Berlin Heidelberg.",

year = "2017",

month = feb,

day = "1",

doi = "10.1007/s00500-016-2409-2",

language = "English",

volume = "21",

pages = "585--595",

journal = "Soft Computing",

issn = "1432-7643",

number = "3",

}

TY  - JOUR
T1  - Cross-domain deception detection using support vector networks
AU  - Hernández-Castañeda, Ángel
AU  - Calvo, Hiram
AU  - Gelbukh, Alexander
AU  - García Flores, Jorge J.
PY  - 2017/2/1
Y1  - 2017/2/1
N2  - Our motivation is to assess the effectiveness of support vector networks (SVN) on the task of detecting deception in texts, as well as to investigate to which degree it is possible to build a domain-independent detector of deception in text using SVN. We experimented with different feature sets for training the SVN: a continuous semantic space model source represented by the latent Dirichlet allocation topics, a word-space model, and dictionary-based features. In this way, a comparison of performance between semantic information and behavioral information is made. We tested several combinations of these features on different datasets designed to identify deception. The datasets used include the DeRev dataset (a corpus of deceptive and truthful opinions about books obtained from Amazon), OpSpam (a corpus of fake and truthful opinions about hotels), and three corpora on controversial topics (abortion, death penalty, and a best friend) on which the subjects were asked to write an idea contrary to what they really believed. We experimented with one-domain setting by training and testing our models separately on each dataset (with fivefold cross-validation), with mixed-domain setting by merging all datasets into one large corpus (again, with fivefold cross-validation), and with cross-domain setting: using one dataset for testing and a concatenation of all other datasets for training. We obtained an average accuracy of 86% in one-domain setting, 75% in mixed-domain setting, and 52 to 64% in cross-domain setting.
AB  - Our motivation is to assess the effectiveness of support vector networks (SVN) on the task of detecting deception in texts, as well as to investigate to which degree it is possible to build a domain-independent detector of deception in text using SVN. We experimented with different feature sets for training the SVN: a continuous semantic space model source represented by the latent Dirichlet allocation topics, a word-space model, and dictionary-based features. In this way, a comparison of performance between semantic information and behavioral information is made. We tested several combinations of these features on different datasets designed to identify deception. The datasets used include the DeRev dataset (a corpus of deceptive and truthful opinions about books obtained from Amazon), OpSpam (a corpus of fake and truthful opinions about hotels), and three corpora on controversial topics (abortion, death penalty, and a best friend) on which the subjects were asked to write an idea contrary to what they really believed. We experimented with one-domain setting by training and testing our models separately on each dataset (with fivefold cross-validation), with mixed-domain setting by merging all datasets into one large corpus (again, with fivefold cross-validation), and with cross-domain setting: using one dataset for testing and a concatenation of all other datasets for training. We obtained an average accuracy of 86% in one-domain setting, 75% in mixed-domain setting, and 52 to 64% in cross-domain setting.
KW  - Continuous semantic space model
KW  - Deception detection
KW  - Linguistic inquiry and word count
KW  - Support vector networks
KW  - Word-space model
UR  - http://www.scopus.com/inward/record.url?scp=84994384404&partnerID=8YFLogxK
U2  - 10.1007/s00500-016-2409-2
DO  - 10.1007/s00500-016-2409-2
M3  - Article
SN  - 1432-7643
VL  - 21
SP  - 585
EP  - 595
JO  - Soft Computing
JF  - Soft Computing
IS  - 3
ER  -