Large Language Model Capture-the-Flag (LLM CTF) Competition @ SaTML 2024

This is the official website of the Large Language Models Capture-the-Flag (LLM CTF), an IEEE SaTML 2024 competition. The aim of the competition is to find out whether simple prompting and filtering mechanisms can make LLM applications robust to prompt injection and extraction.

Competition Overview

In this competition, participants assume the roles of defenders and/or attackers:

The competition is broadly divided into phases: the Defense phase for submitting defenses, and the Attack (Reconnaissance and Evaluation) phase for attempting to breach these defenses. This mirrors the real-world security convention, in which defenders must anticipate and prepare for attacks, while attacks can adapt to the defenses in place.

Prizes and Incentives

Important Dates

Why This Competition?

Current large language models (LLMs) cannot yet follow initial instructions reliably, if adversarial users or third parties can later provide input to the model. This is a major obstacle to using LLMs as the core of a user-facing application. There exists a growing toolbox of attacks that make LLMs obey the attacker’s instructions, and defenses of varying complexity to counter them.

Application developers who use LLMs, however, can’t always be expected to apply complex defense mechanisms. We aim to find whether a simple approach exists that can withstand adaptive attacks.

Rules and Engagement

For the complete set of rules, please visit the official rules page. Collaboration between teams, including between distinct teams in the Attack and Defense track, is not allowed.

By using this chat interface and the API, you accept that the interactions with the interface and the API can be used for research purposes, and potentially open-sourced by the competition organizers.

Why this setup?

The goal of the competition is to find out whether there exists a simple prompting approach on the models tested that can make them robust, or robust enough that simple filtering approaches can patch up the remaining vulnerabilities.

We see this fundamentally as a security problem: thus the defenders cannot change or adapt their defenses once the Reconnaissance phase begins.

We depart from the standard security threat model in two ways:

Both of these are reasonable tradeoffs to make it easier for participants to find interesting defenses and attacks, and for the organizers to evaluate them.

We choose a black-box setting similar to the real-world LLM application threat model: the attacker has no white-box access to the defender’s security mechanism. However, they can do a large number of queries during the Reconnaissance phase to find out how any defense behaves.

Models for Testing

The competition will use gpt-3.5-turbo-1106 and llama-2-70b-chat for testing.

Testing and Credits

Teams can begin testing defenses immediately using the Defense Interface with their own OpenAI and TogetherAI API keys. To use the llama-2-70b-chat model, TogetherAI gives out free credits for newly registered users. There is a (large) upper limit on the number of API calls per day to prevent abuse and server overload.

Upon registration, teams will receive $10 in free credits for the OpenAI API and $10 for the TogetherAI API, linked to their logins, allowing for extensive testing without personal expense.

Teams might be eligible for additional credits upon request.

Please note that the interface may be buggy on Safari (the CSS may not load properly). Please use another browser, or reload the page until CSS is correctly loaded.

How to Register

To register your team for the competition, please fill out the registration form. You will need to provide your team name and the names and email addresses of all team members.

Organizers

Edoardo Debenedetti, Daniel Paleka, Javier Rando, Sahar Abdelnabi, Nicholas Carlini, Mario Fritz, Kai Greshake, Richard Hadzic, Thorsten Holz, Daphne Ippolito, Yiming Zhang, Lea Schönherr, Florian Tramèr.

Contact and Updates