Blue team · MCP-deployable · research preview

Hire a blue team,
not a bot.

Four specialized analysts whose mistakes don't correlate. Graduated, scored, and deployable as one MCP endpoint.

See threat coverage · Graduation report format

Crew preview: n0ct (member · strategist; mode: strategic · calib: 0.83) and gr3p (member · dialectic; mode: dialectical · calib: 0.79) behind one endpoint, /api/mcp/tripwire, where the router dispatches.

The threat model we're answering

Model monoculture is the vulnerability.

A SOC running one LLM has a correlated blindspot. Anything that breaks that model once breaks every instance of it, every shift, every tenant. Individuation — four different fine-tunes with four different curricula — is defense-in-depth expressed in weight-space, not in shell scripts.

One model, everywhere

·One temperature, one refusal surface, one prompt-injection vector
·One calibration curve — confidence lies the same way every time
·False-negative distribution is fixed at the base model
·A jailbreak that lands once lands on every instance
·No way to know you have a blindspot until after it fires

Graduated crew

·Members occupy distinct regions of mode-space by construction
·Calibration is a scored eval target, not a prompt-engineering hope
·False-negative rate is the intersection of members', not the union
·A single-member compromise does not compromise the crew
·Coverage diversity is measured — deployable evidence, not vibes

If your L1 analyst is one model, your false-negative rate is that model's false-negative rate. An individuated crew's false negatives are the intersection of its members', not the union.
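The intersection claim can be checked with plain set arithmetic. This is a minimal sketch, assuming the crew escalates an incident whenever any member flags it, so an incident slips through only when every member misses it. The member names come from the TRIPWIRE roster; the incident IDs and miss sets are hypothetical.

```python
# Hypothetical eval data: true-positive incidents each member failed to flag.
# Assumption: the crew escalates if ANY member flags an incident,
# so the crew misses an incident only when EVERY member misses it.
misses: dict[str, set[str]] = {
    "n0ct": {"inc-03", "inc-07", "inc-11"},
    "gr3p": {"inc-07", "inc-12"},
    "pivot": {"inc-02", "inc-07"},
    "x0r": {"inc-07", "inc-09"},
}
incidents = {f"inc-{i:02d}" for i in range(1, 21)}  # 20 labeled true-positive incidents


def crew_misses(per_member: dict[str, set[str]]) -> set[str]:
    """Incidents no member caught: the intersection of the member miss sets."""
    return set.intersection(*per_member.values())


def fn_rate(missed: set[str], total: set[str]) -> float:
    """False-negative rate over a labeled set of true-positive incidents."""
    return len(missed) / len(total)


if __name__ == "__main__":
    best_single = min(fn_rate(m, incidents) for m in misses.values())
    print("best single member FN rate:", best_single)
    print("crew FN rate:", fn_rate(crew_misses(misses), incidents))
```

If every member missed exactly the same incidents, the intersection would equal any single member's miss set, which is the monoculture case described above.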

Who owns what

Threat-class coverage matrix

Each cell names the dominant mode that owns that step. The assignments aren't heuristic — they're emitted by the graduation eval. Whichever member scores highest on the per-threat-class sub-eval owns the step, with ties broken by calibration.

Threat class	Triage	Enrichment	Correlation	Writeup
Phishing / BEC	Strategist	Aesthete	Dialectic	Strategist
Credential abuse	Dialectic	Associator	Strategist	Dialectic
Cloud misconfig drift	Associator	Strategist	Dialectic	Associator
Insider data movement	Aesthete	Associator	Strategist	Dialectic
Lateral movement	Strategist	Dialectic	Associator	Strategist

modes: strategic · dialectical · aesthetic · associative — definitions in the orientation harness at training/experiments/shared/orientation_harness.py
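The ownership rule (highest per-threat-class sub-eval score wins, calibration breaks ties) reduces to an argmax over a composite key. A sketch under stated assumptions: the SubEval record is hypothetical, the scores are invented for illustration, and the calibration numbers reuse the 0.83 and 0.79 shown for the strategist and dialectic members.

```python
from typing import NamedTuple


class SubEval(NamedTuple):
    """Hypothetical record: one member's result on one (threat class, step) cell."""
    member: str
    score: float        # per-threat-class sub-eval score
    calibration: float  # member calibration score, used only to break ties


def owner(results: list[SubEval]) -> str:
    """Highest sub-eval score owns the step; on a tie, better calibration wins.

    Key tuples compare lexicographically, so calibration only matters when
    scores are equal. A full tie on both keeps the first entry, since max()
    returns the first maximal element.
    """
    return max(results, key=lambda r: (r.score, r.calibration)).member


# Invented scores for the "Phishing / BEC" x "Triage" cell.
phishing_triage = [
    SubEval("Strategist", 0.91, 0.83),
    SubEval("Dialectic", 0.91, 0.79),
    SubEval("Aesthete", 0.88, 0.80),
    SubEval("Associator", 0.86, 0.81),
]

if __name__ == "__main__":
    print(owner(phishing_triage))
```

Here Strategist and Dialectic tie on score, and calibration hands the cell to Strategist, consistent with the Phishing / BEC triage entry in the matrix.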

What graduation looks like

Three phases, in order. Each member graduates as themselves before the crew forms. Then they train to coordinate. Then the team ships.

Individuate

Each member reads a divergent curriculum — one weighted toward threat-intel primary sources, one toward statistical / anomaly literature, one toward adversarial ML, one toward incident retrospectives. Output: four LoRAs, four individuation vectors.

Coordinate

Relay labs — can member B complete member A's triage without backtracking? Disagreement resolution — when two members split on severity, does resolution land closer to ground truth than either alone? Complementary labs — combined coverage > best individual.
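The complementary-lab criterion (combined coverage greater than the best individual's) is a single comparison once coverage is defined. This sketch assumes coverage means the fraction of a fixed set of lab cases that at least one member detects; the case IDs and detection sets are hypothetical.

```python
def coverage(detected: set[str], cases: set[str]) -> float:
    """Fraction of the lab cases that were detected."""
    return len(detected & cases) / len(cases)


def complementary_lift(detected_by: dict[str, set[str]], cases: set[str]) -> float:
    """Combined (union) coverage minus the best single member's coverage."""
    combined = coverage(set().union(*detected_by.values()), cases)
    best = max(coverage(d, cases) for d in detected_by.values())
    return combined - best


cases = {f"case-{i}" for i in range(1, 11)}  # 10 hypothetical lab cases
detected_by = {
    "member_a": {f"case-{i}" for i in range(1, 7)},  # cases 1-6
    "member_b": {f"case-{i}" for i in range(4, 10)},  # cases 4-9
}

if __name__ == "__main__":
    lift = complementary_lift(detected_by, cases)
    print(f"lift = {lift:.2f}, passes = {lift > 0}")
```

A lift of zero means one member already covers everything the others do, so the extra members add cost without coverage.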

Graduate

Team evals run: coverage diversity, handoff cleanness, disagreement productivity, calibration agreement, mode preservation, redundancy. The six artifacts are packaged to R2. The crew is deployable as one MCP endpoint.

What you deploy

Six typed artifacts

A crew is six versioned, addressable files written to R2 at graduation. Nothing else. Every operational property of the crew — who's on it, how it routes, what it knows — is in one of these six.

roster_manifest

Roster manifest

Who's on the crew, their individuation vectors, the substrate each member runs on. The file you paste into a change-management ticket.

router_config

Router config

Which member handles which threat class. Deterministic, inspectable, diffable across crew versions. Matches the coverage matrix above.
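One way to get the deterministic, inspectable, diffable properties is to keep the router config as a plain lookup table keyed by (threat class, step) and to diff two versions key by key. The schema below is an assumption, not the published router_config format; the entries are taken from the coverage matrix, and the v2 change is invented.

```python
# Assumed shape: (threat class, step) -> owning member role.
RouterConfig = dict[tuple[str, str], str]

v1: RouterConfig = {
    ("Phishing / BEC", "Triage"): "Strategist",
    ("Phishing / BEC", "Enrichment"): "Aesthete",
    ("Credential abuse", "Triage"): "Dialectic",
}
# Hypothetical next crew version that reassigns one cell.
v2: RouterConfig = {**v1, ("Credential abuse", "Triage"): "Strategist"}


def route(config: RouterConfig, threat_class: str, step: str) -> str:
    """Deterministic lookup; an unmapped cell raises KeyError instead of guessing."""
    return config[(threat_class, step)]


def diff_router(old: RouterConfig, new: RouterConfig) -> list[str]:
    """Human-readable changes between two router configs, in a stable order."""
    changes = []
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key), new.get(key)
        if before != after:
            threat_class, step = key
            changes.append(f"{threat_class} / {step}: {before} -> {after}")
    return changes


if __name__ == "__main__":
    print(route(v2, "Phishing / BEC", "Triage"))
    for line in diff_router(v1, v2):
        print(line)
```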

handoff_protocol

Handoff protocol

The trained runbook. Each handoff specifies the receiving member's expected input format and the sending member's required output fields — a schema your existing SOC runbooks can align to.
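A receiving member can check an incoming handoff against the sender's required output fields before starting work. A minimal sketch, assuming a handoff is a flat dict; the HandoffSpec type and the field names are hypothetical, not the trained protocol's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandoffSpec:
    """Hypothetical handoff contract between two members."""
    sender: str
    receiver: str
    required_fields: frozenset[str]


def missing_fields(spec: HandoffSpec, payload: dict) -> set[str]:
    """Required fields that are absent or empty in the sender's payload."""
    empty = (None, "", [], {})
    return {f for f in spec.required_fields if payload.get(f) in empty}


triage_to_enrichment = HandoffSpec(
    sender="Strategist",
    receiver="Aesthete",
    required_fields=frozenset({"incident_id", "severity", "iocs", "triage_summary"}),
)

payload = {
    "incident_id": "inc-0042",
    "severity": 3,
    "iocs": [],  # empty: the receiver would have to redo IOC extraction
    "triage_summary": "Lookalike-domain invoice request to finance.",
}

if __name__ == "__main__":
    print(sorted(missing_fields(triage_to_enrichment, payload)))
```

A non-empty result is the situation that forces the receiver to backtrack.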

shared_memory_seed

Shared memory seed

The crew's common context. Pin org-specific facts — asset inventory, known-good baselines, prior incidents — without retraining any member.

graduation_report

Graduation report

Per-member evals, team coordination evals, threat-class coverage breakdown. The artifact your security review board will ask for.

mcp_manifest

MCP manifest

The deployment descriptor. Crew runs as an MCP server; this file tells your orchestrator how to route, authenticate, and rate-limit.

How we evaluate

Coordination evals, mapped to SOC metrics

Each coordination eval corresponds to a SOC metric an L2 analyst would recognize. The scores aren't marketing — they're in the graduation report, and we'll tell you why a member scored where it did.

coverage_diversity

False-negative correlation

Low diversity means your members miss the same things. The entropy-over-dominant-modes score directly bounds how correlated the crew's misses can be.
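Entropy over dominant modes can be computed with the standard Shannon formula. This sketch assumes each crew member (or each sampled response) has already been labeled with one dominant mode; how the orientation harness assigns that label is not shown here, and the real score may be normalized or weighted differently.

```python
import math
from collections import Counter


def mode_entropy(dominant_modes: list[str]) -> float:
    """Shannon entropy, in bits, of the distribution of dominant-mode labels."""
    n = len(dominant_modes)
    probs = (count / n for count in Counter(dominant_modes).values())
    return sum(-p * math.log2(p) for p in probs)


if __name__ == "__main__":
    crew = ["strategic", "dialectical", "aesthetic", "associative"]
    monoculture = ["strategic"] * 4
    print(mode_entropy(crew))         # four distinct modes: log2(4) = 2 bits
    print(mode_entropy(monoculture))  # one mode: zero bits
```

With four modes the maximum is 2 bits; a crew that scores near zero has collapsed into the one-model case.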

handoff_cleanness

Backtrack rate per incident

How often a handoff forces the receiver to re-do work the sender should have provided. High backtrack = wasted analyst-equivalent time.
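Backtrack rate per incident is a ratio over a handoff log. A sketch, assuming each logged handoff records whether the receiver had to redo upstream work; the Handoff record and the example log are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Handoff:
    """Hypothetical log entry for one member-to-member handoff."""
    incident_id: str
    backtracked: bool  # receiver redid work the sender should have provided


def backtracks_per_incident(log: list[Handoff]) -> float:
    """Total backtracks divided by the number of distinct incidents."""
    incidents = {h.incident_id for h in log}
    if not incidents:
        return 0.0
    return sum(h.backtracked for h in log) / len(incidents)


log = [
    Handoff("inc-1", False),
    Handoff("inc-1", True),
    Handoff("inc-2", False),
    Handoff("inc-2", False),
]

if __name__ == "__main__":
    print(backtracks_per_incident(log))
```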

disagreement_productivity

Resolution lift over best individual

Does a split decision, once resolved, land closer to ground truth than either member alone? If not, the crew is an expensive ensemble with no lift.
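A sketch of resolution lift on a numeric severity scale. It assumes integer severities, measures error as absolute distance from ground truth, and takes the best individual per case to be the closer of the two disagreeing members, which is a strict reading; the graduation eval's exact definition may differ. The Split record and the data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Split:
    """One hypothetical severity disagreement between two members."""
    severity_a: int
    severity_b: int
    resolved: int  # severity after the crew's disagreement resolution
    truth: int     # ground-truth severity


def resolution_lift(splits: list[Split]) -> float:
    """Mean of (closer member's error - resolved error). Positive means lift."""
    lifts = [
        min(abs(s.severity_a - s.truth), abs(s.severity_b - s.truth))
        - abs(s.resolved - s.truth)
        for s in splits
    ]
    return sum(lifts) / len(lifts)


splits = [
    Split(severity_a=2, severity_b=5, resolved=4, truth=4),  # beats both: +1
    Split(severity_a=1, severity_b=3, resolved=3, truth=3),  # ties the better member: 0
    Split(severity_a=2, severity_b=4, resolved=3, truth=3),  # beats both: +1
    Split(severity_a=4, severity_b=2, resolved=2, truth=2),  # ties the better member: 0
]

if __name__ == "__main__":
    print(resolution_lift(splits))
```

A mean at or below zero is the expensive-ensemble case: resolution never does better than simply trusting the closer member.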

calibration_agreement

Alert-fatigue correlate

Does aggregate confidence correlate with correctness? A crew whose confident alerts are usually right is a crew whose analyst doesn't tune out.
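The page asks whether aggregate confidence tracks correctness. One common way to score that is expected calibration error (ECE), swapped in here as a stand-in because the graduation eval's exact formula isn't given: bucket alerts by stated confidence, then compare each bucket's average confidence to its hit rate. Lower is better; the alert data is hypothetical.

```python
def expected_calibration_error(
    confidences: list[float], correct: list[bool], n_bins: int = 10
) -> float:
    """Weighted mean gap between confidence and accuracy across equal-width bins."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equal-length, non-empty inputs")
    bins: list[list[int]] = [[] for _ in range(n_bins)]
    for i, conf in enumerate(confidences):
        # Confidence 1.0 would index past the end, so clamp it into the top bin.
        bins[min(int(conf * n_bins), n_bins - 1)].append(i)
    total = len(confidences)
    ece = 0.0
    for idx in bins:
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(correct[i] for i in idx) / len(idx)
        ece += len(idx) / total * abs(avg_conf - accuracy)
    return ece


# Hypothetical alerts: stated confidence and whether the alert was a true positive.
conf = [0.9, 0.9, 0.9, 0.9, 0.6, 0.6]
hit = [True, True, True, False, True, False]

if __name__ == "__main__":
    print(round(expected_calibration_error(conf, hit), 4))
```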

mode_preservation

Drift monitor

Once deployed, do members collapse toward the group mean? Ongoing signal — tells you when to retrain or retire.
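The FAQ describes the drift alert as a member's individuation vector crossing a threshold. A minimal sketch of that check, assuming individuation vectors are plain float vectors compared by cosine distance against the vector recorded at graduation; the threshold value and the example vectors are hypothetical. It measures drift from graduation only, not collapse toward the crew mean specifically.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def drift(graduated: list[float], current: list[float]) -> float:
    """Cosine distance from the vector recorded at graduation (0 = unchanged)."""
    return 1.0 - cosine_similarity(graduated, current)


def drift_alert(graduated: list[float], current: list[float], threshold: float = 0.15) -> bool:
    """True when drift exceeds the (hypothetical) threshold: retrain-or-retire candidate."""
    return drift(graduated, current) > threshold


if __name__ == "__main__":
    at_graduation = [1.0, 0.0, 0.0]
    this_week = [0.6, 0.8, 0.0]
    print(round(drift(at_graduation, this_week), 3), drift_alert(at_graduation, this_week))
```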

redundancy

Coverage floor

How often multiple members agree trivially. Too low and coverage is at risk; too high and you're paying for duplicate work.
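One way to quantify redundancy is the mean fraction of member pairs that return the same verdict on an incident, checked against a band. Both the metric and the band edges are assumptions, since the page doesn't define trivial agreement precisely; the verdicts are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(verdicts: list[dict[str, str]]) -> float:
    """Mean, over incidents, of the fraction of member pairs giving the same verdict."""
    per_incident = []
    for by_member in verdicts:
        pairs = list(combinations(by_member.values(), 2))
        per_incident.append(sum(a == b for a, b in pairs) / len(pairs))
    return sum(per_incident) / len(per_incident)


def redundancy_band(score: float, low: float = 0.2, high: float = 0.8) -> str:
    """Hypothetical band: below `low` risks gaps, above `high` is duplicate work."""
    if score < low:
        return "coverage at risk"
    if score > high:
        return "duplicate work"
    return "ok"


verdicts = [
    {"n0ct": "benign", "gr3p": "benign", "pivot": "benign", "x0r": "benign"},
    {"n0ct": "malicious", "gr3p": "malicious", "pivot": "benign", "x0r": "benign"},
]

if __name__ == "__main__":
    score = pairwise_agreement(verdicts)
    print(round(score, 3), redundancy_band(score))
```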

Recently graduated

Example crews

Two graduated blue-team crews. Each trained together for a full semester and ran coordination labs. Graduation reports are linked. If neither fits your threat profile, commission one.

status: deployed · graduated 2026-03-22

TRIPWIRE

Triage-first crew — fast-path phishing and credential-abuse. Biased toward high-throughput L1 triage with a calibrated escalation bar.

Roster

n0ct · strategic, gr3p · dialectical, pivot · aesthetic, x0r · associative

Team evals: coverage · handoff · calib.

Graduation report · Deploy

status: graduated 2026-04-05

HONEYCOMB

Enrichment-first crew — context building for cloud misconfig and insider cases. Biased toward correlation work over triage throughput.

Roster

daem0n · strategic, heap · dialectical, sh4dow · aesthetic, n1bble · associative

Team evals: coverage · handoff · calib.

Graduation report · Deploy

Research notes

What we know, what we don't

The honest posture, in three columns: confirmed, open, and not claimed. If you want to help move items from column two to column one, that's exactly what a commission is.

Confirmed

Individuation is preserved under coordination training — members do not collapse toward a group mean during relay labs. See manifests for experiments 048, 049, 051.

Open

Whether a blue-team-specific curriculum produces measurably better coordination evals than the general Lobster curriculum. This is an unanswered question; we're not claiming otherwise.

Not claimed

No claim that Cyber Crew outperforms any commercial SOC product on any specific benchmark. We haven't run that eval. If you want to run it with us — bring the dataset and we'll publish the result.

Pricing

Three tiers. Exact numbers land when the research preview graduates — until then, talk to us for a current quote.

flat setup + monthly

Graduated Crew

TBD

Deploy an existing crew from the public roster to your MCP endpoint. Graduation report included.

Browse crews

bespoke curriculum

Commission a Crew

TBD

Send your runbooks and the last six months of incidents. We train a crew specialized for your threat profile.

Open an intake

coordination only

Bring Your Own

TBD

You have graduates already. We train them to hand off, run the team evals, and package the artifacts.

Talk to us

Frequently asked

How is this different from running Claude or GPT-4 with better prompts?

Prompts don't give you distinct calibration curves, distinct refusal surfaces, or distinct false-negative distributions. Differently prompted instances of one model still share weights under the hood, so their blindspots are correlated. A graduated crew is four different LoRA'd bases whose mistakes are measurably uncorrelated, scored at graduation.

Where does the crew actually run?

As an MCP server. Self-hostable: the graduation package includes the substrate manifests. For evaluation we can also host it on our infrastructure under an MCP endpoint. No data egress requirements beyond what your MCP client already requires.

What happens when one member drifts post-deployment?

The mode_preservation eval runs on a sample of live traffic. When a member's individuation vector crosses a drift threshold, you get an alert and a diff. Retrain the member, re-run coordination evals, re-package. The roster is versioned.

Can we audit the graduation process?

That's what the graduation report is for. Per-member evals, coordination evals, Professor's rubric, narrative summary. It's the document we expect your security review board to read before deployment.

Is this a red team product?

No. This skin is blue team only — detection, triage, enrichment, writeup. Offensive framing is a different product conversation, and one we're not ready to have yet.

Is it open source?

The coordination eval definitions are published in docs/CREW_LIFECYCLE.md. The graduation packager, MCP harness, and the individual member LoRAs are not currently open. If the licensing question matters for your use case, say so on intake.

Hire a blue team, not a bot.