Dossier · 02 · Security

Cerberus Guard Bot

A hardened, multi-agent shield for AI-to-AGI systems. The premise is simple and old: never trust a single guard. Three guardians, three coding styles, one central hub. Bypass any one and three more spawn, randomized on the spot, until the work of defeating them exceeds the attacker's patience or the system shuts itself down.

03 · initial

Three Heads

Each guardian is implemented in a different language family with a different parser, sandbox, and ruleset. A jailbreak crafted for one is unlikely to land on the others.

3 → 9 → 27 · reflex

Exponential Re-arming

Each successful bypass spawns three new randomized guardians, and the pool escalates through stages of 3 → 9 → 27 guardians. At 27, the system enters protective shutdown rather than allow further compromise.
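The staging can be sketched as a small state machine. This is a minimal sketch, assuming each escalation replaces the pool with the next stage size; `GuardianPool`, `STAGES`, `SHUTDOWN_AT`, and `escalate` are hypothetical names used for illustration, not part of the Cerberus code.

```python
import random

# Stage sizes taken from the text: 3 -> 9 -> 27 (names are hypothetical).
STAGES = (3, 9, 27)
SHUTDOWN_AT = 27


class GuardianPool:
    """Tracks the guardian pool as bypasses escalate it stage by stage."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self.stage = 0
        self.shutdown = False
        self.guardians = self._spawn(STAGES[0])

    def _spawn(self, count):
        # A random id stands in for a randomized parser, sandbox, and ruleset.
        return [f"guardian-{self._rng.getrandbits(32):08x}" for _ in range(count)]

    def escalate(self):
        """Advance one stage after a bypass; shut down on reaching 27."""
        if self.shutdown:
            return
        self.stage = min(self.stage + 1, len(STAGES) - 1)
        self.guardians = self._spawn(STAGES[self.stage])
        if len(self.guardians) >= SHUTDOWN_AT:
            self.shutdown = True
```

Calling `escalate()` twice on a fresh pool takes it from 3 to 9 to 27 guardians and sets `shutdown`; further calls are no-ops.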

0 · trust

Zero Trust Hub

The central hub does not trust any guardian's verdict alone — it requires k-of-n agreement on policy-relevant decisions. Disagreement is itself a signal.
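A k-of-n check with a disagreement flag might look like the following. This is a sketch under stated assumptions: votes are plain `"allow"` / `"deny"` strings, and `k_of_n` is a hypothetical helper rather than the hub's actual interface.

```python
from collections import Counter


def k_of_n(votes, k):
    """Return (decision, disagreement) for a list of "allow"/"deny" votes.

    The decision is "allow" only if at least k guardians voted "allow";
    everything else is denied. disagreement is True when the votes are
    not unanimous, so the caller can treat a split as a signal.
    """
    if not votes or k < 1 or k > len(votes):
        raise ValueError("k must satisfy 1 <= k <= len(votes)")
    counts = Counter(votes)
    decision = "allow" if counts["allow"] >= k else "deny"
    disagreement = len(counts) > 1
    return decision, disagreement
```

With three guardians and `k=2`, a 2-to-1 split in favor of allowing still passes, but the returned flag lets the hub log or escalate the split.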

surface

Designed Against

Prompt injection · jailbreak chains · multi-turn social engineering · automated bot floods · poisoned tool outputs.

integration

Project-AI Hooks

Cerberus emits its verdicts into the OCTOREFLEX loop and the Constitutional Code Store. Every block becomes a sealed, auditable record.
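One common way to make records "sealed" is a hash chain, where each record carries the hash of the one before it. The source does not say how Cerberus seals its records, so the following is only an illustrative sketch; `seal`, `verify`, and the record fields are assumptions.

```python
import hashlib
import json


def seal(record, prev_hash):
    """Return a copy of record with prev_hash and its own SHA-256 hash attached."""
    body = dict(record, prev=prev_hash)
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return dict(body, hash=hashlib.sha256(payload).hexdigest())


def verify(chain, genesis="0" * 64):
    """Check that every entry hashes correctly and links to its predecessor."""
    prev = genesis
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        expected = hashlib.sha256(payload).hexdigest()
        if entry.get("prev") != prev or entry.get("hash") != expected:
            return False
        prev = entry["hash"]
    return True
```

Editing any field of an earlier record changes its hash, which breaks the link from every later record and makes `verify` return `False`.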

posture

Fail-Closed

Default-deny under disagreement, latency excess, or rate budget breach. Safety first. Performance second.
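The fail-closed posture can be sketched as a wrapper that returns "allow" only when nothing went wrong. This is a minimal sketch, assuming a guardian is a callable returning `"allow"` or `"deny"`; `fail_closed`, `latency_budget_s`, and `rate_ok` are hypothetical names.

```python
import time


def fail_closed(judge, request, *, latency_budget_s=0.05, rate_ok=True):
    """Return "allow" only on a clean, timely, in-budget "allow"; else "deny"."""
    if not rate_ok:
        return "deny"  # rate budget breached
    start = time.monotonic()
    try:
        result = judge(request)
    except Exception:
        return "deny"  # a crashing guardian must not open the gate
    if time.monotonic() - start > latency_budget_s:
        return "deny"  # latency excess
    # Any value other than an explicit "allow" is treated as a denial.
    return "allow" if result == "allow" else "deny"
```

Note that this sketch measures latency after the call returns and does not interrupt a slow guardian; a real deployment would likely need a hard timeout as well.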

cerberus/hub.py
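class CerberusHub:
    def verdict(self, request):
        # Collect one vote per active guardian.
        votes = [g.judge(request) for g in self.guardians]
        if any(v.code == "BYPASS_DETECTED" for v in votes):
            # Re-arm with three randomized guardians; shut down at saturation.
            self.spawn_replacements(3, randomized=True)
            if len(self.guardians) >= 27:
                return Verdict.SHUTDOWN(reason="saturation")
        # Tally only the votes actually cast; new guardians have not judged yet.
        # k is a strict majority, which equals 2 for the initial three guardians.
        return majority(votes, k=len(votes) // 2 + 1, n=len(votes))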
Companion projects

Cerberus does not work alone.

It rides on the Waterfall firewall, reads from the Open Constraint Enforcement Engine (OCEE), and reports into the Triumvirate's mutual-check loop.

Waterfall · OCEE · Triumvirate