Identifying Human Strategies for Generating Word-Level Adversarial Examples

Mozes, Maximilian; Kleinberg, Bennett; Griffin, Lewis D.

arXiv:2210.11598 (cs)

[Submitted on 20 Oct 2022]

Abstract: Adversarial examples in NLP are receiving increasing research attention. One line of investigation is the generation of word-level adversarial examples that preserve naturalness and grammaticality while targeting fine-tuned Transformer models. Previous work found that human- and machine-generated adversarial examples are comparable in their naturalness and grammatical correctness. Most notably, humans were able to generate adversarial examples much more easily than automated attacks. In this paper, we provide a detailed analysis of exactly how humans create these adversarial examples. By exploring the behavioural patterns of human workers during the generation process, we identify statistically significant tendencies in which words humans prefer to select for adversarial replacement (e.g., based on word frequency, word saliency, and sentiment), as well as where in an input sequence and when during the process words are replaced. With our findings, we seek to inspire efforts that harness human strategies for more robust NLP models.
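
The abstract lists word saliency and the position of a replaced word as properties of human word choices. The following is a minimal sketch of how such properties could be measured, under two stated assumptions. First, saliency is defined here as leave-one-out saliency: the drop in the model's confidence for its predicted class when a single word is deleted. This may differ from the definition used in the paper. Second, the fine-tuned Transformer is replaced by a hypothetical lexicon-based toy classifier (`LEXICON`, `toy_positive_prob`). All names are illustrative and are not taken from the paper.

```python
import math
from typing import Callable, List

# Hypothetical toy stand-in for a fine-tuned Transformer sentiment model.
LEXICON = {"great": 2.0, "good": 1.0, "boring": -1.5, "awful": -2.0}


def toy_positive_prob(tokens: List[str]) -> float:
    """Return P(positive) as the logistic of summed lexicon scores (toy model)."""
    score = sum(LEXICON.get(t.lower(), 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))


def leave_one_out_saliency(
    tokens: List[str], prob_fn: Callable[[List[str]], float]
) -> List[float]:
    """Saliency of token i = confidence drop for the originally predicted
    class when token i is deleted (assumed leave-one-out definition)."""
    base = prob_fn(tokens)
    predicted_positive = base >= 0.5
    base_conf = base if predicted_positive else 1.0 - base
    saliencies = []
    for i in range(len(tokens)):
        reduced = tokens[:i] + tokens[i + 1:]  # delete token i
        p = prob_fn(reduced)
        conf = p if predicted_positive else 1.0 - p
        saliencies.append(base_conf - conf)
    return saliencies


def relative_position(index: int, length: int) -> float:
    """Position of a replaced word, normalised to [0, 1] over the sequence."""
    return index / (length - 1) if length > 1 else 0.0


tokens = "the acting was great but the plot was boring".split()
sal = leave_one_out_saliency(tokens, toy_positive_prob)
top = max(range(len(tokens)), key=lambda i: sal[i])
print(tokens[top], relative_position(top, len(tokens)))  # prints "great 0.375"
```

Deleting "great" flips the toy prediction, so it receives the highest saliency. Deleting "boring" makes the model more confident, so that word receives a negative saliency. A word that a human actually replaced could then be compared against such scores, and against word frequency or sentiment, across many examples.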

Comments:	Findings of EMNLP 2022
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2210.11598 [cs.CL]
	(or arXiv:2210.11598v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2210.11598

Submission history

From: Maximilian Mozes
[v1] Thu, 20 Oct 2022 21:16:44 UTC (59 KB)