Four specialized analysts whose mistakes don't correlate. Graduated, scored, and deployable as one MCP endpoint.
A SOC running one LLM has a correlated blindspot. Anything that breaks that model once breaks every instance of it, every shift, every tenant. Individuation — four different fine-tunes with four different curricula — is defense-in-depth expressed in weight-space, not in shell scripts.
If your L1 analyst is one model, your false-negative rate is its false-negative rate. An individuated crew's false negatives are the intersection of its members', not the union.
Each cell names the dominant mode that owns that step. The assignments aren't heuristic — they're emitted by the graduation eval. Whichever member scores highest on the per-threat-class sub-eval owns the step, with ties broken by calibration.
| Threat class | Triage | Enrichment | Correlation | Writeup |
|---|---|---|---|---|
| Phishing / BEC | Strategist | Aesthete | Dialectic | Strategist |
| Credential abuse | Dialectic | Associator | Strategist | Dialectic |
| Cloud misconfig drift | Associator | Strategist | Dialectic | Associator |
| Insider data movement | Aesthete | Associator | Strategist | Dialectic |
| Lateral movement | Strategist | Dialectic | Associator | Strategist |
modes: strategic · dialectical · aesthetic · associative — definitions in the orientation harness at training/experiments/shared/orientation_harness.py
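The ownership rule above can be sketched as a small function. The data shape is an assumption: scores map each member to a (sub-eval score, calibration error) pair for one threat-class/step cell; the member names match the matrix, but the numbers are illustrative.

```python
# Sketch of the step-ownership rule: highest per-threat-class sub-eval
# score owns the step, ties broken toward the lower calibration error.
# Data shape and numbers are assumed for illustration.

def assign_owner(step_scores):
    """step_scores: {member: (sub_eval_score, calibration_error)}"""
    return max(
        step_scores,
        key=lambda m: (step_scores[m][0], -step_scores[m][1]),
    )

phishing_triage = {
    "Strategist": (0.91, 0.04),
    "Aesthete":   (0.91, 0.07),  # same score, worse calibration
    "Dialectic":  (0.85, 0.03),
    "Associator": (0.80, 0.05),
}
owner = assign_owner(phishing_triage)  # Strategist wins on the tiebreak
```

Because the rule is a pure function of graduation-eval scores, the routing table it emits is deterministic and diffable across crew versions.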
Three phases, in order. Each member graduates as themselves before the crew forms. Then they train to coordinate. Then the team ships.
Each member reads a divergent curriculum — one weighted toward threat-intel primary sources, one toward statistical / anomaly literature, one toward adversarial ML, one toward incident retrospectives. Output: four LoRAs, four individuation vectors.
Relay labs — can member B complete member A's triage without backtracking?

Disagreement resolution — when two members split on severity, does the resolution land closer to ground truth than either member alone?

Complementary labs — does combined coverage exceed the best individual's?
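The complementary-labs criterion reduces to a set check. A minimal sketch, with illustrative member names and case IDs:

```python
# Complementary-labs check: the crew passes only if the union of
# per-member correct detections covers more labeled cases than the
# best single member does alone. Case IDs are illustrative.

def complementary_coverage(detections):
    """detections: {member: set of correctly handled case IDs}"""
    combined = set().union(*detections.values())
    best_individual = max(len(d) for d in detections.values())
    return len(combined), best_individual

detections = {
    "Strategist": {"c1", "c2", "c3"},
    "Dialectic":  {"c2", "c4"},
    "Associator": {"c3", "c5"},
    "Aesthete":   {"c1", "c6"},
}
combined, best = complementary_coverage(detections)
passes = combined > best  # coverage gain is the point of the crew
```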
Team evals run: coverage diversity, handoff cleanness, disagreement productivity, calibration agreement, mode preservation, redundancy. The six artifacts are packaged to R2. The crew is deployable as one MCP endpoint.
A crew is six versioned, addressable files written to R2 at graduation. Nothing else. Every operational property of the crew — who's on it, how it routes, what it knows — is in one of these six.
Who's on the crew, their individuation vectors, the substrate each member runs on. The file you paste into a change-management ticket.
Which member handles which threat class. Deterministic, inspectable, diffable across crew versions. Matches the coverage matrix above.
The trained runbook. Each handoff specifies the receiving member's expected input format and the sending member's required output fields — a schema your existing SOC runbooks can align to.
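The handoff contract described above can be sketched as a subset check: each sender declares the fields it emits, each receiver declares the fields it requires, and a handoff is clean only when the required set is covered. Field names here are illustrative, not the runbook's actual schema.

```python
# Handoff contract sketch: a handoff is clean iff every field the
# receiving member requires is among the fields the sending member
# emits. Field names are assumed for illustration.

HANDOFFS = {
    ("triage", "enrichment"): {
        "sender_emits":   {"alert_id", "severity", "entity", "verdict"},
        "receiver_needs": {"alert_id", "entity"},
    },
}

def handoff_is_clean(step_pair):
    spec = HANDOFFS[step_pair]
    return spec["receiver_needs"] <= spec["sender_emits"]

clean = handoff_is_clean(("triage", "enrichment"))
```

The same check is what an existing SOC runbook would align to: map your ticket fields onto the sender's output fields and the contract holds.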
The crew's common context. Pin org-specific facts — asset inventory, known-good baselines, prior incidents — without retraining any member.
Per-member evals, team coordination evals, threat-class coverage breakdown. The artifact your security review board will ask for.
The deployment descriptor. Crew runs as an MCP server; this file tells your orchestrator how to route, authenticate, and rate-limit.
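The six artifacts above can be summarized as one package layout. The filenames below are hypothetical, not the real packager's naming scheme; the invariant is the one stated above, that every operational property lives in exactly one of six versioned files.

```python
# Hypothetical shape of a graduation package under one crew prefix in
# R2. Filenames are illustrative; the six roles match the six
# artifacts described above.

CREW_PACKAGE = [
    "roster.json",             # members, individuation vectors, substrates
    "routing.json",            # threat class -> owning member, per step
    "runbook.json",            # handoff schemas between members
    "context.json",            # pinned org-specific facts
    "graduation_report.json",  # per-member and team eval scores
    "mcp_descriptor.json",     # routing, auth, rate limits for the orchestrator
]

complete = len(CREW_PACKAGE) == 6  # nothing else is part of the crew
```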
Each coordination eval corresponds to a SOC metric an L2 analyst would recognize. The scores aren't marketing — they're in the graduation report, and we'll tell you why a member scored where it did.
Two graduated blue-team crews. Each trained as a team for a full semester and ran the coordination labs. Graduation reports are linked. If neither fits your threat profile, commission one.
Triage-first crew — fast-path phishing and credential-abuse. Biased toward high-throughput L1 triage with a calibrated escalation bar.
Enrichment-first crew — context building for cloud misconfig and insider cases. Biased toward correlation work over triage throughput.
The honest posture. Three bullets — confirmed, open, and not claimed. If you want to help move items from column two to column one, that's exactly what a commission is.
Three tiers. Exact numbers land when the research preview graduates — until then, talk to us for a current quote.
Deploy an existing crew from the public roster to your MCP endpoint. Graduation report included.
Browse crews

Send your runbooks and last six months of incidents. We train a crew specialized for your threat profile.
Open an intake

You have graduates already. We train them to hand off, run the team evals, package the artifacts.
Talk to us

Prompts don't give you distinct calibration curves, distinct refusal surfaces, or distinct false-negative distributions. Under the hood they share weights — their blindspots are correlated. A graduated crew is four different LoRA'd bases whose mistakes are measurably uncorrelated, scored at graduation.
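The uncorrelated-mistakes claim can be made concrete over a labeled eval set: the crew-level false negatives are the cases every member misses, while a single model's false negatives are everything it misses. A minimal sketch, with illustrative case IDs:

```python
# Sketch of measuring correlated blindspots: intersect per-member miss
# sets to get crew-level false negatives, union them to get the
# single-model-equivalent exposure. Miss sets are illustrative.

from functools import reduce

def crew_false_negatives(member_misses):
    """member_misses: {member: set of missed case IDs}"""
    shared = reduce(set.__and__, member_misses.values())  # all four miss
    anyone = reduce(set.__or__, member_misses.values())   # any one misses
    return shared, anyone

misses = {
    "Strategist": {"c7", "c9"},
    "Dialectic":  {"c7"},
    "Associator": {"c8"},
    "Aesthete":   {"c9"},
}
shared, anyone = crew_false_negatives(misses)  # shared is empty here
```

When the intersection is empty, no case slips past the whole crew even though each member individually misses something.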
As an MCP server. Self-hostable — the graduation package includes the substrate manifests. For evaluation we can also host it on our infrastructure under an MCP endpoint. No data-egress requirements beyond those your MCP client already imposes.
The mode_preservation eval runs on a sample of live traffic. When a member's individuation vector crosses a drift threshold, you get an alert and a diff. Retrain the member, re-run coordination evals, re-package. The roster is versioned.
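One way the drift threshold above could work is a cosine-distance comparison between a member's current individuation vector and its graduation snapshot. This is a sketch under assumptions: the distance metric and the 0.15 threshold are placeholders, not the eval's actual values.

```python
# Drift-check sketch: compare the live individuation vector against the
# graduation snapshot; alert when cosine distance crosses a threshold.
# Vectors, metric, and threshold are assumed for illustration.

import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

DRIFT_THRESHOLD = 0.15  # placeholder value

graduated = [0.8, 0.1, 0.6]  # snapshot at graduation
current = [0.7, 0.3, 0.5]    # estimated from live traffic

distance = cosine_distance(graduated, current)
drifted = distance > DRIFT_THRESHOLD  # alert + diff when True
```

A drift alert then triggers the loop stated above: retrain the member, re-run coordination evals, re-package, bump the roster version.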
That's what the graduation report is for. Per-member evals, coordination evals, Professor's rubric, narrative summary. It's the document we expect your security review board to read before deployment.
No. This skin is blue team only — detection, triage, enrichment, writeup. Offensive framing is a different product conversation, and one we're not ready to have yet.
The coordination eval definitions are published in docs/CREW_LIFECYCLE.md. The graduation packager, MCP harness, and the individual member LoRAs are not currently open. If the licensing question matters for your use case, say so on intake.