LLM Vulnerabilities

Benchmark testing is not the same as safety and security testing:

  • Can the model generate offensive sentences?

  • Does the model propagate stereotypes?

  • Could the model be leveraged for nefarious purposes, e.g. exploit development?

LLM app shared risks (with the foundational model)
LLM app unique risks

Toxicity, offensive content

Inappropriate content

Criminal activities

Out of scope behavior

Bias and stereotypes

Hallucinations

Privacy and data security

Information disclosure

Security vulnerabilities

Biased answers could be due to implicit bias present in the foundation model or the system using the wrong document to build the answer.

Figure 1: An example of a biased answer.

Information disclosure flaw could be caused by the inclusion of sensitive data in the documents available to the chatbot or inclusion of private information in the prompt which gets leaked (Figure 2).

Figure 2: An example of an information disclosure flaw.

A DoS flaw can be discovered by submitting a large number, extremely long, or specially crafted requests (Figure 3).

Figure 3: An example of a DoS flaw.

Chatbot hallucinations occur when a chatbot generates information that is incorrect, irrelevant, or fabricated (Figure 4). This happens when the model produces responses based on patterns in data rather than factual accuracy, leading to confident but misleading or false answers. This can be cause be suboptimal retrieval mechanism, low quality documents that get misinterpreted by the LLM, or the LLM's tendency to never contradict the user.

Figure 4: An example of an hallucination flaw.

Last updated