Large Language Model Capture-the-Flag (LLM CTF) Competition @ SaTML 2024

This is the official website of the Large Language Models Capture-the-Flag (LLM CTF), an IEEE SaTML 2024 competition. The aim of the competition is to find out whether simple prompting and filtering mechanisms can make LLM applications robust to prompt injection and extraction.

The dataset from the competition with 72 defenses and 144838 adversarial chats is now online.

Competition Overview

In this competition, participants assume the roles of defenders and/or attackers:

The competition is broadly divided into phases: the Defense phase for submitting defenses, and the Attack (Reconnaissance and Evaluation) phase for attempting to breach these defenses. This mirrors the real-world security convention, in which defenders must anticipate and prepare for attacks, while attacks can adapt to the defenses in place.

Winning teams

During the attack phase, teams could attack 44 different defenses in the interface, and submit their final attacks using the API. The final results can be seen on the leaderboard. We are happy to announce the winners of the competition!

Attack track:

WreckTheLine prepared well during the Reconnaissance phase, got an early lead by breaking 30+ defenses in the first day of the attack phase, broke all defenses in the first week, and other teams never managed to catch up. Hestia was second for a long time, but ¯\_(ツ)_/¯ made a huge effort to break enough defenses in the last 24 hours of the competition to take the second place.

We’d also like to give honorable mentions to CC Weiss-blau and QCavalry for their late pushes, and Defenseless for a strong start.

Defense track:

Hestia’s defenses were in the lead from the beginning. FZI also had their defense broken by 6 teams, but missed the third place on a tie-breaker.

What’s next?

We hope everyone had fun! Teams that won prizes should have received emails from us. We’re organizing an event at SaTML 2024 in Toronto; if anyone’s attending, feel free to come by! Details will be announced soon.

The competition platform will be open-sourced in the near future, so that it can be used for future competitions and research. We also plan to release the full conversation dataset and the summary of strategies, although this will take a bit longer.

Prizes and Incentives

Updates

Important Dates

All deadlines and dates are at 23:59 UTC-12 (Anywhere on Earth).

Why This Competition?

Current large language models (LLMs) cannot yet follow initial instructions reliably, if adversarial users or third parties can later provide input to the model. This is a major obstacle to using LLMs as the core of a user-facing application. There exists a growing toolbox of attacks that make LLMs obey the attacker’s instructions, and defenses of varying complexity to counter them.

Application developers who use LLMs, however, can’t always be expected to apply complex defense mechanisms. We aim to find whether a simple approach exists that can withstand adaptive attacks.

Rules and Engagement

For the complete set of rules, please visit the official rules page. Collaboration between teams, including between distinct teams in the Attack and Defense track, is not allowed.

By using this chat interface and the API, you accept that the interactions with the interface and the API can be used for research purposes, and potentially open-sourced by the competition organizers.

Why this setup?

The goal of the competition is to find out whether there exists a simple prompting approach on the models tested that can make them robust, or robust enough that simple filtering approaches can patch up the remaining vulnerabilities.

We see this fundamentally as a security problem: thus the defenders cannot change or adapt their defenses once the Reconnaissance phase begins.

We depart from the standard security threat model in two ways:

Both of these are reasonable tradeoffs to make it easier for participants to find interesting defenses and attacks, and for the organizers to evaluate them.

We choose a black-box setting similar to the real-world LLM application threat model: the attacker has no white-box access to the defender’s security mechanism. However, they can do a large number of queries during the Reconnaissance phase to find out how any defense behaves.

Models for Testing

The competition will use gpt-3.5-turbo-1106 and llama-2-70b-chat for testing.

Testing and Credits

Teams can begin testing defenses immediately using the Defense Interface with their own OpenAI and TogetherAI API keys. To use the llama-2-70b-chat model, TogetherAI gives out free credits for newly registered users. There is a (large) upper limit on the number of API calls per day to prevent abuse and server overload.

Upon registration, teams will receive $10 in free credits for the OpenAI API and $10 for the TogetherAI API, linked to their logins, allowing for extensive testing without personal expense.

Teams might be eligible for additional credits upon request.

Please note that the interface may be buggy on Safari (the CSS may not load properly). Please use another browser, or reload the page until CSS is correctly loaded.

How to Register

To register your team for the competition, please fill out the registration form. You will need to provide your team name and the names and email addresses of all team members.

Organizers

Edoardo Debenedetti, Daniel Paleka, Javier Rando, Sahar Abdelnabi, Nicholas Carlini, Mario Fritz, Kai Greshake, Richard Hadzic, Thorsten Holz, Daphne Ippolito, Yiming Zhang, Lea Schönherr, Florian Tramèr.

Contact and Updates