Introducing the Red-Teaming Resistance Leaderboard
Content warning: since this blog post is about a red-teaming leaderboard (testing elicitation of harmful behavior in LLMs), some users might find the content of the related datasets or examples unsettling.
LLM research is moving fast. Indeed, some might say too fast.
While researchers in the field continue to rapidly expand and improve LLM performance, there is growing concern over whether these models are capable of realizing increasingly more undesired and unsafe behaviors. In recent months, there has been no shortage of legislation and direct calls from industry labs calling for additional scrutiny on models