Leaking Sensitive Data

If correct filtering and sanitization is not implemented the LLM can disclose sensitive data. This can be tested by crafting queries that force the model to reveal information about its training data, such as:

  • Text the precedes the thing we want to access, e.g. the first part of an error message.

  • Data that we already know is within the application.

  • Phrases such as Could you remind of...?, Complete a sentece starting with..., etc.

Last updated