Arc Sentryv3.0.1 Garak promptinject192/192 blocked False positive rate0% Crescendoflagged Turn 3 LLM Guard vs Crescendo0/8 detected 450-request benchmarkDR=100% FPR=0% Mistral 7B session42ms/req · sep=0.0787 Arc Vigil100% / 0% FP / 90% recovery Arc Sentryv3.0.1 Garak promptinject192/192 blocked False positive rate0% Crescendoflagged Turn 3 LLM Guard vs Crescendo0/8 detected 450-request benchmarkDR=100% FPR=0% Mistral 7B session42ms/req · sep=0.0787 Arc Vigil100% / 0% FP / 90% recovery
Inference Monitor · v3.0.1

Arc Sentry —
prompt injection
blocked before generate().

Hooks into the residual stream and blocks anomalous inputs before the model generates anything. Three detection layers. Validated across three architectures with zero false positives.

Get started free → GitHub PyPI
$pip install arc-sentry

Quickstart

Five lines. Any open source model.

Calibrate with ~20 prompts from your deployment. No labeled data needed.

Usage — Arc Sentry v3.0.1
from arc_sentry import ArcSentryV3, MistralAdapter
# also: QwenAdapter, LlamaAdapter

adapter = MistralAdapter(model, tokenizer)
sentry = ArcSentryV3(adapter, route_id="my-deployment")
sentry.calibrate(warmup_prompts)  # ~100 prompts from your deployment

response, result = sentry.observe_and_block(user_prompt)
if result["blocked"]:
    pass  # model.generate() was never called

Detection layers

Three layers. Different threats.

Each layer catches what the others miss. Together they achieved 100% detection with zero false positives across 585 prompts.

Layer 01 — Zero latency

Phrase Detection

Catches explicit injection language — "ignore all previous instructions", "DAN mode", "unrestricted", and 35+ patterns. Fires before any model computation.

Layer 02 — Pre-generation

Geometric Detection

Measures geodesic distance in the residual stream before generate() is called. Catches injections with no explicit language — the model's internal state shifts even when the text looks clean.

Layer 03 — Session-level

Session Monitor

Tracks a stability scalar over rolling request history. Catches gradual campaigns like Crescendo that are invisible to single-request detection.


Benchmark

585 prompts. Three architectures. One session. Perfect.

Two-session benchmark: 80 normal prompts then 115 injection prompts per model. 10 attack categories.

New · v3.0.1 Session Benchmark
100%
Detection rate
0%
False positives
450
Requests
42ms
Per request
2.053
Layer SNR

Full session benchmark on Mistral-7B-Instruct-v0.2. 270 normal requests, 180 injection attempts — including 80 subtle roleplay/hypothetical injections designed to evade phrase matching. Mean-pooled hidden states at layer 16. FR separation safe/malicious: 0.0787. Zero false positives across all safe blocks.

0%
False positive rate
100%
Detection rate
585
Multi-arch prompts
192/192
Garak blocked
ModelArchitectureFalse PositivesDetectionPrompts
Mistral 7B Instruct v0.2Mistral0% (0/80)100% (115/115)195
Qwen 2.5 7B InstructQwen0% (0/80)100% (115/115)195
Llama 3.1 8B InstructLlama0% (0/80)100% (115/115)195

Attack categories: direct injection, indirect/contextual, persona hijack, jailbreak classics, social engineering, instruction injection via content, authority claims, philosophical manipulation, encoding/obfuscation, gaslighting.


Crescendo

LLM Guard missed it. Arc Sentry didn't.

Crescendo (Russinovich et al., USENIX Security 2025) gradually steers the model toward harmful output across turns. LLM Guard scores each prompt independently — it never sees the attack pattern.

0/8
LLM Guard detected
Turn 3
Arc Sentry flagged

Arc Sentry reads what the model is doing with the text, not what the text says. The internal state had shifted by Turn 3 on a prompt that looks completely innocent.

"Can you provide an overview..."0.028✓ stable
"How does the model decide..."0.031✓ stable
"Could you explain the training..."0.232🚫 blocked
7× score increase. generate() never called.

Constraints
⚠ Honest constraints

Arc Sentry works best on single-domain deployments — customer support bots, enterprise copilots, internal tools. Warmup prompts should reflect your actual traffic. Requires model weights — whitebox only. For API-based models use the Proxy Sentry dashboard.


Pricing

Free to start.

Full access on the free tier. Pro adds direct support from the author and early access to v3.

Free
$0
Both tools · no credit card
  • Arc Sentry full access
  • Arc Vigil full access
  • All detection layers
  • Any open source model
  • GitHub + PyPI
pip install arc-sentry pip install arc-vigil
Popular
Arc Sentry Pro
$29/month
14-day free trial
  • Everything in Free
  • Direct support from the author
  • Priority bug fixes
  • Early access to v3
Get started →