On Explainable Features for Translatorship Attribution: Unveiling the Translator's Style with Causality

Christian Caballero; Hiram Calvo; Ildar Batyrshin

doi:10.1109/ACCESS.2021.3093370

Centro de Investigación en Computación (CIC)

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

Translatorship attribution deals with accurately attributing a translation to its translator. The task is challenging because several factors can confound the attribution, such as the original author's style, genre, and topic of the text. The attribution and the identification of the translator's style could contribute to fields including translation studies and forensic linguistics. In this paper, we pose translatorship attribution as a multiclass classification problem and employ machine learning algorithms. To address the problem of confounding, we use corpora of English translations of the same source material (parallel corpora) to identify the translators' personal style. We propose two novel feature sets for this task: i) a list of cohesive markers with and without their surrounding punctuation and ii) syntactic n-grams to capture real syntactic information. We employ chi-squared (χ²) feature selection and, using 10-fold cross-validation, assess the accuracy of several classifiers trained with our proposed features and with word, punctuation, POS, and POS-punctuation n-grams. The results show that the proposed features yield accuracy comparable to, and even higher than, that reported in the literature on the same corpora, and prove that POS-punctuation n-grams are an effective feature set for this task. We also recover the most distinctive features and provide examples of stylistic interpretations of them for each translator. Finally, using insights from causal inference, where confounding is well-defined and studied, we provide a novel explanation for the accepted need to use parallel and contemporaneous corpora in this task and for the different results among types of features.
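
The feature-selection step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a textbook Pearson chi-squared statistic computed on the presence or absence of each n-gram per document, and the function names (`ngrams`, `chi2_scores`), the toy documents, and the translator labels "A" and "B" are all hypothetical.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def chi2_scores(docs, labels, n=2):
    """Score each n-gram by a Pearson chi-squared test of presence vs. class."""
    class_sizes = Counter(labels)
    total = len(docs)
    # present[gram][label] = number of documents of that class containing gram
    present = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for gram in set(ngrams(tokens, n)):
            present[gram][label] += 1

    scores = {}
    for gram, by_class in present.items():
        n_with = sum(by_class.values())
        n_without = total - n_with
        score = 0.0
        for label, size in class_sizes.items():
            cells = (
                (by_class[label], n_with),            # documents containing gram
                (size - by_class[label], n_without),  # documents lacking gram
            )
            for observed, row_total in cells:
                expected = row_total * size / total
                if expected > 0:
                    score += (observed - expected) ** 2 / expected
        scores[gram] = score
    return scores


# Toy data (hypothetical): word and punctuation tokens from two "translators".
docs = [
    ["however", ",", "he", "left"],
    ["however", ",", "she", "stayed"],
    ["but", "he", "left"],
    ["but", "she", "stayed"],
]
labels = ["A", "A", "B", "B"]

scores = chi2_scores(docs, labels, n=2)
top_feature = max(scores, key=scores.get)
```

In this toy data the bigram `("however", ",")` occurs only in translator A's documents, so it receives the highest score, while `("he", "left")` occurs equally often for both and scores zero. The same scoring could be applied to POS-punctuation n-grams by passing tag sequences instead of word tokens.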

Original language	English
Article number	9467290
Pages (from-to)	93195-93208
Number of pages	14
Journal	IEEE Access
Volume	9
DOI	https://doi.org/10.1109/ACCESS.2021.3093370
Status	Published - 2021

Cite this

@article{c7aca29278be41fb9e18b98015333dad,

title = "On Explainable Features for Translatorship Attribution: Unveiling the Translator's Style with Causality",

abstract = "Translatorship attribution deals with accurately attributing a translation to its translator. The task is challenging because several factors can confound the attribution, such as the original author's style, genre, and topic of the text. The attribution and the identification of the translator's style could contribute to fields including translation studies and forensic linguistics. In this paper, we pose translatorship attribution as a multiclass classification problem and employ machine learning algorithms. To address the problem of confounding, we use corpora of English translations of the same source material (parallel corpora) to identify the translators' personal style. We propose two novel feature sets for this task: i) a list of cohesive markers with and without their surrounding punctuation and ii) syntactic n-grams to capture real syntactic information. We employ $\chi^2$ feature selection and, using 10-fold cross-validation, assess the accuracy of several classifiers trained with our proposed features and with word, punctuation, POS, and POS-punctuation n-grams. The results show that the proposed features yield accuracy comparable to, and even higher than, that reported in the literature on the same corpora, and prove that POS-punctuation n-grams are an effective feature set for this task. We also recover the most distinctive features and provide examples of stylistic interpretations of them for each translator. Finally, using insights from causal inference, where confounding is well-defined and studied, we provide a novel explanation for the accepted need to use parallel and contemporaneous corpora in this task and for the different results among types of features.",

keywords = "Computational linguistics, causal inference, machine learning, stylometry, translator style",

author = "Christian Caballero and Hiram Calvo and Ildar Batyrshin",

note = "Publisher Copyright: {\textcopyright} 2013 IEEE.",

year = "2021",

doi = "10.1109/ACCESS.2021.3093370",

language = "English",

volume = "9",

pages = "93195--93208",

journal = "IEEE Access",

issn = "2169-3536",

}

TY - JOUR

T1 - On Explainable Features for Translatorship Attribution

T2 - Unveiling the Translator's Style with Causality

AU - Caballero, Christian

AU - Calvo, Hiram

AU - Batyrshin, Ildar

PY - 2021

Y1 - 2021

N2 - Translatorship attribution deals with accurately attributing a translation to its translator. The task is challenging because several factors can confound the attribution, such as the original author's style, genre, and topic of the text. The attribution and the identification of the translator's style could contribute to fields including translation studies and forensic linguistics. In this paper, we pose translatorship attribution as a multiclass classification problem and employ machine learning algorithms. To address the problem of confounding, we use corpora of English translations of the same source material (parallel corpora) to identify the translators' personal style. We propose two novel feature sets for this task: i) a list of cohesive markers with and without their surrounding punctuation and ii) syntactic n-grams to capture real syntactic information. We employ chi-squared feature selection and, using 10-fold cross-validation, assess the accuracy of several classifiers trained with our proposed features and with word, punctuation, POS, and POS-punctuation n-grams. The results show that the proposed features yield accuracy comparable to, and even higher than, that reported in the literature on the same corpora, and prove that POS-punctuation n-grams are an effective feature set for this task. We also recover the most distinctive features and provide examples of stylistic interpretations of them for each translator. Finally, using insights from causal inference, where confounding is well-defined and studied, we provide a novel explanation for the accepted need to use parallel and contemporaneous corpora in this task and for the different results among types of features.

KW - Computational linguistics

KW - causal inference

KW - machine learning

KW - stylometry

KW - translator style

UR - http://www.scopus.com/inward/record.url?scp=85112147202&partnerID=8YFLogxK

U2 - 10.1109/ACCESS.2021.3093370

DO - 10.1109/ACCESS.2021.3093370

M3 - Article

AN - SCOPUS:85112147202

SN - 2169-3536

VL - 9

SP - 93195

EP - 93208

JO - IEEE Access

JF - IEEE Access

M1 - 9467290

ER -
