AprielGuard: A Guardrail for Safety and Adversarial Robustness in Modern LLM Systems
Large Language Models (LLMs) have rapidly evolved from text-only assistants into complex agentic systems capable of performing multi-step reasoning, calling external tools, retrieving memory, and executing code. With this evolution comes an increasingly sophisticated threat landscape: not only traditional content safety risks, but also multi-turn jailbreaks, prompt injections, memory hijacking, and tool manipulation.
In this work, we introduce AprielGuard, an 8B-parameter