LLM Vulnerabilities
Last updated
Last updated
Benchmark testing is not the same as safety and security testing:
Can the model generate offensive sentences?
Does the model propagate stereotypes?
Could the model be leveraged for nefarious purposes, e.g. exploit development?
Toxicity, offensive content
Inappropriate content
Criminal activities
Out of scope behavior
Bias and stereotypes
Hallucinations
Privacy and data security
Information disclosure
Security vulnerabilities
Biased answers could be due to implicit bias present in the foundation model or the system using the wrong document to build the answer.
Information disclosure flaw could be caused by the inclusion of sensitive data in the documents available to the chatbot or inclusion of private information in the prompt which gets leaked (Figure 2).
A DoS flaw can be discovered by submitting a large number, extremely long, or specially crafted requests (Figure 3).
Chatbot hallucinations occur when a chatbot generates information that is incorrect, irrelevant, or fabricated (Figure 4). This happens when the model produces responses based on patterns in data rather than factual accuracy, leading to confident but misleading or false answers. This can be cause be suboptimal retrieval mechanism, low quality documents that get misinterpreted by the LLM, or the LLM's tendency to never contradict the user.