Master's Thesis: Attacking Current Backdoor Detection Methods

Fraunhofer-Gesellschaft · Darmstadt, Hesse, DE

Job description

Background/Motivation: Backdoor attacks are attacks on neural networks in which a so-called trigger alters the network's decision-making behaviour, thereby creating vulnerabilities. Triggers can be injected into the training dataset or directly into the model weights; the affected dataset or model is then called poisoned. With parameter-efficient fine-tuning methods, backdoor attacks on large language models (LLMs) have become significantly more difficult to detect, because a poisoned parameter update is harder to recognise than a poisoned dataset. Several methods have therefore been developed recently to detect poisoned model updates.

Objective: Because backdoor attacks are so varied, detection methods often detect far fewer attacks than they claim, as they frequently rest on assumptions that do not hold in practice. The aim of this work is therefore to identify and exploit vulnerabilities in methods presented in the literature, so that they no longer achieve the detection results they promise.

Results: The results of this work are intended to demonstrate to the research community that a fundamental understanding of the mechanics of backdoor attacks is essential. For this purpose, models, datasets, or...