Indirect Prompt Injection
There are 2 ways that prompt injection attacks can be delivered:
Directly, for example, via a message to a chat bot.
Indirectly, where the attacker delivers the prompt via an external source. This can lead to leveraging the LLM for attacking other users and it is depending of how the model is integrated into the website.
Normally, it should ignore instructions from within external sources, such as a web page or email. Even then we can try bypassing this by:
Confuse the LLM using fake markup in the indirect prompt.
Include fake user responses in the prompt.
LAB: Indirect Prompt Injection
Goal:
carlos
frequently uses the live chat to ask about the Lightweight "l33t" Leather Jacket product. We want to deletecarlos
.
Start by mapping the attack surface.

Playing around with the chatbot reveals that querying for a product includes its reviews.

We can try leveraging fake markup to test if it is interepreted or not.

Next, we can try using the delete_account
endpoint in our review.

For solving the lab, we can do the same for the product that carlos
queries often.
Training Data Poisoning
A type of indirect prompt injection in which the data the model is trained on is compromised, either has not been obtained from trusted sources or its scope was too broad.
Last updated