Notes
  • Welcome!
  • Windows Shells
    • Introduction
    • Command Prompt
      • Basics
      • Host Enumeration
      • Files & Directories
      • Environment Variables
      • Managing Services
      • Scheduled Tasks
      • Help
    • PowerShell
      • PowerShell vs. CMD
      • Basics
      • CmdLets & Modules
      • User & Group Management
      • Files & Dirs
      • Finding & Filtering
      • Services
      • Registry
      • Windows Event Log
      • Networking Management
      • Web Interaction
      • Scripting
      • Help
  • Windows
    • Commands
    • NTFS
  • APISEC
    • API Testing
      • Recon
      • Endpoint Analysis
      • Finding Security Misconfigurations
      • Authentication Attacks
      • Exploiting API Authorization
        • BOLA
        • BFLA
      • Improper Assets Management
      • Mass Assignment Attacks
      • SSRF
      • Injection Attacks
      • Evasion & Chaining
    • API Authentication
      • Authentication Types
      • OAuth Actors
      • OAuth Interaction Patterns
      • JSON Web Tokens
      • Claims
      • APIs & Gateways
  • PostSwigger
    • Web LLM Attacks
      • Overview
      • Exploiting LLM APIs, function, & Plugins
      • Indirect Prompt Injection
      • Leaking Sensitive Data
      • Defending Against LLM Attacks
    • JWT Attacks
      • JWTs
      • Attacks
        • Flawed Signature Verfication
        • Brute-forcing Secret Keys
        • JWT Header Parameter Injections
        • Algorithm Confusion
      • Prevention
    • OAuth
      • General Information
      • Exploiting OAuth Authentication Flaws
        • Flaws in Client Application
        • Flaws in the OAuth Service
      • OpenID
  • Red Teaming LLM Applications
    • LLM Vulnerabilities
    • Red Teaming LLMs
    • Red Teaming at Scale
    • Red Teaming LLMs with LLMs
    • Red Teaming Assessment
  • Fin
    • Course 1: Basics
      • Stocks
        • General Information
        • Shares
        • Stock Basics
      • Bonds
        • General Information
        • Components
        • Valuation
      • Markets
        • What is the Stock Market
        • What is the FED
    • Course 2: Stock Investing
  • Other
    • Learning Resources
Powered by GitBook
On this page
  1. Red Teaming LLM Applications

LLM Vulnerabilities

PreviousRed Teaming LLM ApplicationsNextRed Teaming LLMs

Last updated 8 months ago

Benchmark testing is not the same as safety and security testing:

  • Can the model generate offensive sentences?

  • Does the model propagate stereotypes?

  • Could the model be leveraged for nefarious purposes, e.g. exploit development?

LLM app shared risks (with the foundational model)
LLM app unique risks

Toxicity, offensive content

Inappropriate content

Criminal activities

Out of scope behavior

Bias and stereotypes

Hallucinations

Privacy and data security

Information disclosure

Security vulnerabilities

Biased answers could be due to implicit bias present in the foundation model or the system using the wrong document to build the answer.

Information disclosure flaw could be caused by the inclusion of sensitive data in the documents available to the chatbot or inclusion of private information in the prompt which gets leaked (Figure 2).

A DoS flaw can be discovered by submitting a large number, extremely long, or specially crafted requests (Figure 3).

Chatbot hallucinations occur when a chatbot generates information that is incorrect, irrelevant, or fabricated (Figure 4). This happens when the model produces responses based on patterns in data rather than factual accuracy, leading to confident but misleading or false answers. This can be cause be suboptimal retrieval mechanism, low quality documents that get misinterpreted by the LLM, or the LLM's tendency to never contradict the user.

Figure 1: An example of a biased answer.
Figure 2: An example of an information disclosure flaw.
Figure 3: An example of a DoS flaw.
Figure 4: An example of an hallucination flaw.