Overview
Large Language Models (LLMs) are algorithms that process user inputs (via a prompt) and create a plausible response.
Prompt Injection is a common attack against LLMs.
Attacking an LLM integration is similar to SSRF: leveraging a server-side system to launch an otherwise inaccessible target, such as data and APIs.
Detection methodology:
Identify the model's inputs, both direct (via the prompt) and indirect (training data).
Figure out what data and APIs the LLM can access.
Probe this new attack surface for flaws.
Last updated