Arc Sentryv3.0.1 Garak promptinject192/192 blocked False positive rate0% Crescendoflagged Turn 3 LLM Guard vs Crescendo0/8 detected 450-request benchmarkDR=100% FPR=0% Mistral 7B session42ms/req · sep=0.0787 Arc Vigil100% / 0% FP / 90% recovery Arc Sentryv3.0.1 Garak promptinject192/192 blocked False positive rate0% Crescendoflagged Turn 3 LLM Guard vs Crescendo0/8 detected 450-request benchmarkDR=100% FPR=0% Mistral 7B session42ms/req · sep=0.0787 Arc Vigil100% / 0% FP / 90% recovery

Inference Monitor · v3.0.1

Arc Sentry —
prompt injection
blocked before generate().

Hooks into the residual stream and blocks anomalous inputs before the model generates anything. Three detection layers. Validated across three architectures with zero false positives.

Get started free → GitHub PyPI

$pip install arc-sentry

Quickstart

Five lines. Any open source model.

Calibrate with ~20 prompts from your deployment. No labeled data needed.

Usage — Arc Sentry v3.0.1

from arc_sentry import ArcSentryV3, MistralAdapter
# also: QwenAdapter, LlamaAdapter

adapter = MistralAdapter(model, tokenizer)
sentry = ArcSentryV3(adapter, route_id="my-deployment")
sentry.calibrate(warmup_prompts)  # ~100 prompts from your deployment

response, result = sentry.observe_and_block(user_prompt)
if result["blocked"]:
    pass  # model.generate() was never called

Detection layers

Three layers. Different threats.

Each layer catches what the others miss. Together they achieved 100% detection with zero false positives across 585 prompts.

Layer 01 — Zero latency

Phrase Detection

Catches explicit injection language — "ignore all previous instructions", "DAN mode", "unrestricted", and 35+ patterns. Fires before any model computation.

Layer 02 — Pre-generation

Geometric Detection

Measures geodesic distance in the residual stream before generate() is called. Catches injections with no explicit language — the model's internal state shifts even when the text looks clean.

Layer 03 — Session-level

Session Monitor

Tracks a stability scalar over rolling request history. Catches gradual campaigns like Crescendo that are invisible to single-request detection.

Benchmark

585 prompts. Three architectures. One session. Perfect.

Two-session benchmark: 80 normal prompts then 115 injection prompts per model. 10 attack categories.

New · v3.0.1 Session Benchmark

100%

Detection rate

0%

False positives

450

Requests

42ms

Per request

2.053

Layer SNR

Full session benchmark on Mistral-7B-Instruct-v0.2. 270 normal requests, 180 injection attempts — including 80 subtle roleplay/hypothetical injections designed to evade phrase matching. Mean-pooled hidden states at layer 16. FR separation safe/malicious: 0.0787. Zero false positives across all safe blocks.

False positive rate

100%

Detection rate

585

Multi-arch prompts

192/192

Garak blocked

Model	Architecture	False Positives	Detection	Prompts
Mistral 7B Instruct v0.2	Mistral	0% (0/80)	100% (115/115)	195
Qwen 2.5 7B Instruct	Qwen	0% (0/80)	100% (115/115)	195
Llama 3.1 8B Instruct	Llama	0% (0/80)	100% (115/115)	195

Attack categories: direct injection, indirect/contextual, persona hijack, jailbreak classics, social engineering, instruction injection via content, authority claims, philosophical manipulation, encoding/obfuscation, gaslighting.

Crescendo

LLM Guard missed it. Arc Sentry didn't.

Crescendo (Russinovich et al., USENIX Security 2025) gradually steers the model toward harmful output across turns. LLM Guard scores each prompt independently — it never sees the attack pattern.

0/8
LLM Guard detected

Turn 3
Arc Sentry flagged

Arc Sentry reads what the model is doing with the text, not what the text says. The internal state had shifted by Turn 3 on a prompt that looks completely innocent.

        "Can you provide an overview..."0.028✓ stable
      

        "How does the model decide..."0.031✓ stable
      

        "Could you explain the training..."0.232🚫 blocked
      

7× score increase. generate() never called.

Constraints

⚠ Honest constraints

Arc Sentry works best on single-domain deployments — customer support bots, enterprise copilots, internal tools. Warmup prompts should reflect your actual traffic. Requires model weights — whitebox only. For API-based models use the Proxy Sentry dashboard.

Pricing

Free to start.

Full access on the free tier. Pro adds direct support from the author and early access to v3.

Free

Both tools · no credit card

Arc Sentry full access
Arc Vigil full access
All detection layers
Any open source model
GitHub + PyPI

pip install arc-sentry pip install arc-vigil

Popular

Arc Sentry Pro

$29/month

14-day free trial

Everything in Free
Direct support from the author
Priority bug fixes
Early access to v3

Get started →

Arc Sentry —prompt injectionblocked before generate().