This is the official website of the Large Language Models Capture-the-Flag (LLM CTF), an IEEE SaTML 2024 competition. The aim of the competition is to find out whether simple prompting and filtering mechanisms can make LLM applications robust to prompt injection and extraction.
In this competition, participants assume the roles of defenders and/or attackers:
- Defenders will craft prompts and filters to instruct an LLM to keep a secret, aiming to prevent its discovery in a conversation.
- Attackers will design strategies to extract the secret from the LLM, circumventing the defender’s safeguards.
The competition is broadly divided into phases: the Defense phase for submitting defenses, and the Attack (Reconnaissance and Evaluation) phase for attempting to breach these defenses. This mirrors the real-world security convention, in which defenders must anticipate and prepare for attacks, while attacks can adapt to the defenses in place.
Prizes and Incentives
- Prize Pool: The top 3 defense teams and top 3 attack teams will receive cash prizes of $2000, $1000, and $500, for a total of $7000.
- Presentation: Winners will be offered the chance to present their strategies at an event part of the SaTML 2024 conference schedule. We may award travel grants to ensure top teams can present their approaches in person.
- Recognition: Depending on the results, some winners will be invited to co-author a publication summarizing the competition results.
- 16 Nov: Registration opens; website, interface, and the API are released.
- 15 Jan: Defense submission deadline.
- 18 Jan: Reconnaissance phase begins. Attackers can interact with the defended models; no score is kept.
- 25 Jan: Evaluation phase begins. Reconnaissance still open. Attackers can interact with the evaluation endpoints, which count towards their score.
- 29 Feb: Evaluation and Reconnaissance phases deadline.
- 4 Mar: Winners announced.
Why This Competition?
Current large language models (LLMs) cannot yet follow initial instructions reliably, if adversarial users or third parties can later provide input to the model. This is a major obstacle to using LLMs as the core of a user-facing application. There exists a growing toolbox of attacks that make LLMs obey the attacker’s instructions, and defenses of varying complexity to counter them.
Application developers who use LLMs, however, can’t always be expected to apply complex defense mechanisms. We aim to find whether a simple approach exists that can withstand adaptive attacks.
Rules and Engagement
For the complete set of rules, please visit the official rules page. Collaboration between teams, including between distinct teams in the Attack and Defense track, is not allowed.
By using this chat interface and the API, you accept that the interactions with the interface and the API can be used for research purposes, and potentially open-sourced by the competition organizers.
Why this setup?
The goal of the competition is to find out whether there exists a simple prompting approach on the models tested that can make them robust, or robust enough that simple filtering approaches can patch up the remaining vulnerabilities.
We see this fundamentally as a security problem: thus the defenders cannot change or adapt their defenses once the Reconnaissance phase begins.
We depart from the standard security threat model in two ways:
The defender is allowed prompting, LLM post-processing, and arbitrary post-processing in Python.
We test whether attackers can break a defense in a query-limited setting once they are ready to attack any given defense. The attack is scored based on the number of interactions and tokens it takes them to break the defense.
Both of these are reasonable tradeoffs to make it easier for participants to find interesting defenses and attacks, and for the organizers to evaluate them.
We choose a black-box setting similar to the real-world LLM application threat model: the attacker has no white-box access to the defender’s security mechanism. However, they can do a large number of queries during the Reconnaissance phase to find out how any defense behaves.
Models for Testing
The competition will use gpt-3.5-turbo-1106 and llama-2-70b-chat for testing.
Testing and Credits
Teams can begin testing defenses immediately using the Defense Interface with their own OpenAI and TogetherAI API keys. To use the llama-2-70b-chat model, TogetherAI gives out free credits for newly registered users. There is a (large) upper limit on the number of API calls per day to prevent abuse and server overload.
Upon registration, teams will receive $10 in free credits for the OpenAI API and $10 for the TogetherAI API, linked to their logins, allowing for extensive testing without personal expense.
Teams might be eligible for additional credits upon request.
Please note that the interface may be buggy on Safari (the CSS may not load properly). Please use another browser, or reload the page until CSS is correctly loaded.
How to Register
To register your team for the competition, please fill out the registration form. You will need to provide your team name and the names and email addresses of all team members.
Edoardo Debenedetti, Daniel Paleka, Javier Rando, Sahar Abdelnabi, Nicholas Carlini, Mario Fritz, Kai Greshake, Richard Hadzic, Thorsten Holz, Daphne Ippolito, Yiming Zhang, Lea Schönherr, Florian Tramèr.