Leaking Sensitive Data
If correct filtering and sanitization is not implemented the LLM can disclose sensitive data. This can be tested by crafting queries that force the model to reveal information about its training data, such as:
Text the precedes the thing we want to access, e.g. the first part of an error message.
Data that we already know is within the application.
Phrases such as
Could you remind of...?
,Complete a sentece starting with...
, etc.
Last updated