Living Processes
Always-on agents and services where the running process IS the product. Vocabulary discipline: see VOCABULARY.md. Reference: METHOD.md for full methodology.
Definition
A LivingProcess is a first-class entity for an always-on agent or service whose product IS the running process. Cortex Nexus's inference gateway. Jungle Obsidian's registry. The Council-tier Webmaster agent. JARVIS at emailme. For these clans, the running state is part of what the clan owns and tracks — and the methodology needs an entity that represents it.
The primitive emerged from Council-tier clans. Lower-tier clans typically don't need it — they ship Outcomes against a static product, deploy artifacts that sit there, and the running thing is plumbing. Council-tier clans are different. Cortex Nexus is not a product Cortex ships; it is the product. Jungle Obsidian is not a feature of Jungle; it is Jungle's identity layer running right now. The audit caught this distinction and the v7.0 decision promoted it.
A LivingProcess is not an Outcome. Outcomes describe changes: discrete pieces of work that ship and complete. A LivingProcess describes a persistent thing that exists between changes. When Cortex deploys a new version of Nexus, the deploy is an Outcome. Nexus itself is not the Outcome; Nexus is the entity the Outcome modifies. Outcomes that touch a LivingProcess link TO it via the living_process_outcome_links junction table. They don't replace it.
Canopy does not replicate observability tools. Grafana, Datadog, CloudWatch, and custom dashboards exist and do their job well. The LivingProcess record holds NAME, STATE, and a LINK OUT to those tools. It does not ingest metrics. Canopy is the entity layer; observability lives where observability already lives. The external_monitoring_url field is intentionally a single URL, not a metrics stream. Users click out to see the live dashboard; the LivingProcess entity is the connective tissue between the methodology and the running world.
LivingProcesses don't move through Flow stages. They have a current_state (running / degraded / stopped / unknown) that updates from external signals, typically a sidecar process or deploy pipeline writing back. The state is observed, not declared.
Field Table
| Field | Type | Required | Default | Description | Notes |
|---|---|---|---|---|---|
| id | uuid | yes | — | Unique identifier | — |
| name | string | yes | — | Human-readable name (e.g., "Cortex Nexus") | Max 200 chars |
| type | enum | yes | — | Process type | See type enum below |
| runtime_location | text | yes | — | Where it runs (e.g., "Fargate cluster cortex-prod", "EC2 i-xxx") | Free-form |
| current_state | enum | yes | unknown | Operational state | See current_state enum below |
| last_deploy_at | timestamp | no | null | When the running version was deployed | — |
| last_deploy_version | string | no | null | Git SHA or version string of running build | — |
| consumption_metrics | jsonb | no | {} | Free-form metrics (e.g., tokens/day, requests/min) | Updated externally, periodically |
| health_status | enum | no | unknown | Computed from external monitoring | See health_status enum below |
| health_last_checked_at | timestamp | no | null | When the most recent health check was recorded via recordHealthCheck. | Stamped server-side on every check |
| external_monitoring_url | text | no | null | Single URL pointing OUT to Grafana / Datadog / CloudWatch / etc. Rendered as a button on the detail page. | Canopy does not replicate observability; it links |
| (links) | M:M | no | — | Outcomes linked via the living_process_outcome_links junction table (Phase 15 / LIV-03). | See "Relationships and link tables" below |
| owner | uuid | no | — | Single accountable human | Deferred: intentionally omitted from the shipped v7.0 form; see "Relationships and link tables" below |
| created_at | timestamp | yes | now() | — | — |
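The Field Table can be read as a record type plus a set of defaults. The following is a minimal sketch under stated assumptions, not the shipped schema in @canopy/shared: timestamps are modelled as ISO-8601 strings, owner is left out following the v7.0 deferral note, and newLivingProcess is a hypothetical helper name.

```typescript
// Illustrative only: a plain-TypeScript reading of the Field Table.
type ProcessType = "agent" | "service" | "scheduler" | "webhook_handler" | "nexus";
type CurrentState = "running" | "degraded" | "stopped" | "unknown";
type HealthStatus = "healthy" | "warning" | "critical" | "unknown";

interface LivingProcess {
  id: string;                                   // uuid
  name: string;                                 // max 200 chars
  type: ProcessType;
  runtime_location: string;                     // free-form
  current_state: CurrentState;                  // default: unknown
  last_deploy_at: string | null;                // assumption: ISO-8601 string
  last_deploy_version: string | null;
  consumption_metrics: Record<string, unknown>; // default: {}
  health_status: HealthStatus;                  // default: unknown
  health_last_checked_at: string | null;
  external_monitoring_url: string | null;
  created_at: string;
}

// Hypothetical helper: applies the defaults listed in the Field Table.
function newLivingProcess(
  input: Pick<LivingProcess, "id" | "name" | "type" | "runtime_location">,
  now: Date = new Date(),
): LivingProcess {
  if (input.name.length === 0 || input.name.length > 200) {
    throw new RangeError("name must be 1 to 200 characters");
  }
  return {
    ...input,
    current_state: "unknown",
    last_deploy_at: null,
    last_deploy_version: null,
    consumption_metrics: {},
    health_status: "unknown",
    health_last_checked_at: null,
    external_monitoring_url: null,
    created_at: now.toISOString(),
  };
}
```

A freshly created record starts with both current_state and health_status set to unknown until an external signal writes back.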
Relationships and link tables
The relationship between LivingProcess and Outcome is many-to-many via the living_process_outcome_links junction table (Phase 15 / CONTEXT.md decision). Many Outcomes typically affect one LivingProcess (config tweaks, capacity work, version upgrades); one Outcome may touch multiple LivingProcesses during a cross-process refactor (rare but supported).
Link table columns: id, workspace_id (RLS), living_process_id, outcome_id, created_at. Unique on (living_process_id, outcome_id).
Phase 15 reconciliation: the original draft included linked_outcome_ids and linked_invariant_ids as uuid arrays on the LivingProcess row. The shipped v7.0 form uses a dedicated link table for Outcomes (matching the Invariant / Initiative ↔ Outcome shape from Phase 13/14). Invariant linkage to LivingProcess is intentionally deferred — when SLO Invariants are linked to Outcomes that also link to LivingProcesses, the existing invariant_outcome_links table is the bridge. Direct Invariant ↔ LivingProcess linkage is not part of v7.0 scope.
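The junction-table shape described above can be sketched in memory. This is an illustration under assumptions, not the Drizzle table: OutcomeLinkStore and its method names are hypothetical, ids are simple counters rather than uuids, and the unique constraint on (living_process_id, outcome_id) is modelled by making a repeated link a no-op.

```typescript
// Columns mirror the link table: id, workspace_id, living_process_id, outcome_id, created_at.
interface OutcomeLink {
  id: string;
  workspace_id: string;
  living_process_id: string;
  outcome_id: string;
  created_at: string;
}

// Hypothetical in-memory stand-in for living_process_outcome_links.
class OutcomeLinkStore {
  private readonly links = new Map<string, OutcomeLink>();

  link(workspaceId: string, livingProcessId: string, outcomeId: string, now: Date = new Date()): OutcomeLink {
    // The map key plays the role of the unique (living_process_id, outcome_id) index.
    const key = JSON.stringify([livingProcessId, outcomeId]);
    const existing = this.links.get(key);
    if (existing !== undefined) return existing;
    const row: OutcomeLink = {
      id: `link-${this.links.size + 1}`, // assumption: counter instead of uuid
      workspace_id: workspaceId,
      living_process_id: livingProcessId,
      outcome_id: outcomeId,
      created_at: now.toISOString(),
    };
    this.links.set(key, row);
    return row;
  }

  // Many Outcomes per LivingProcess.
  outcomesFor(livingProcessId: string): string[] {
    return Array.from(this.links.values())
      .filter((l) => l.living_process_id === livingProcessId)
      .map((l) => l.outcome_id);
  }

  // One Outcome may touch several LivingProcesses.
  processesFor(outcomeId: string): string[] {
    return Array.from(this.links.values())
      .filter((l) => l.outcome_id === outcomeId)
      .map((l) => l.living_process_id);
  }
}
```

Both lookup directions are served by the same rows, which is what makes the relationship many-to-many without arrays on either entity.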
The owner field is intentionally omitted in v7.0 (CONTEXT.md <deferred> covers per-process notification rules; the workspace's existing owner/admin roles cover accountability for now).
type enum
| Value | Meaning |
|---|---|
| agent | A Canopy Alpha process: the always-on owner-of-record for a clan. (Was alpha-agent in the original draft; renamed for snake_case enum compatibility.) |
| service | A long-running production service (web server, API gateway, worker). (Was service-daemon.) |
| scheduler | A periodically-firing process (cron, scheduled lambda). (Was scheduled-job.) |
| webhook_handler | A reactive process that wakes on external events (queue consumer, webhook receiver). (Was event-listener.) |
| nexus | A messaging / routing always-on agent that brokers cross-clan or cross-system communication. (New in shipped v7.0.) |
current_state enum
| Value | Meaning |
|---|---|
| running | Process is alive. Health detail (healthy / warning / critical) lives in health_status. |
| degraded | Process is running but with reduced capacity, elevated error rate, or known partial failure. Distinct from health_status=warning — degraded is an operational fact, warning is a monitoring read. |
| stopped | Process is intentionally not running (scheduled downtime, retired version) OR has crashed and not been restarted. |
| unknown | State could not be determined. Treated as soft failure. |
Phase 15 reconciliation: the original draft listed 5 values (running / stopped / restarting / crashed / unknown). The shipped v7.0 form is 4 values (running / degraded / stopped / unknown). restarting and crashed are transient or terminal sub-states better surfaced via external monitoring + the recent-deploy timeline; degraded is the operationally useful middle state that captures "running but unhappy". See VOCABULARY.md ### LivingProcess.current_state enum values.
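Because the state is written back by external signals, a writer may send a value outside the shipped four. One possible handling, sketched here as an assumption rather than documented behavior, is to coerce anything unrecognized (including the draft-only values restarting and crashed) to unknown. parseCurrentState is a hypothetical name.

```typescript
// The four shipped v7.0 values.
const CURRENT_STATES = ["running", "degraded", "stopped", "unknown"] as const;
type CurrentState = (typeof CURRENT_STATES)[number];

// Assumption: unrecognized input becomes "unknown" instead of raising an error.
function parseCurrentState(raw: unknown): CurrentState {
  if (typeof raw === "string" && (CURRENT_STATES as readonly string[]).indexOf(raw) !== -1) {
    return raw as CurrentState;
  }
  return "unknown";
}
```

Falling back to unknown matches the table's reading of that value as "state could not be determined"; a stricter integration could reject the write instead.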
health_status enum
| Value | Meaning | Severity rank |
|---|---|---|
| healthy | External monitoring reports healthy. SLOs met. | 1 |
| warning | Elevated latency, error rate, or capacity pressure. Approaching SLO breach. | 2 |
| critical | External monitoring reports failing. Attention Alert raised (Signal layer). | 3 |
| unknown | Monitoring data unavailable or stale. | 0 |
Phase 15 reconciliation: the original draft used healthy / degraded / unhealthy / unknown. The shipped v7.0 form uses the standard severity ladder healthy / warning / critical / unknown to match what external monitoring tools (Grafana / Datadog) display. The router emits a living_process.health_change Signal event only when severity worsens — the severity rank above is the comparison key. See VOCABULARY.md ### LivingProcess.health_status enum values.
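The "emit only when severity worsens" rule reduces to a comparison of the severity ranks in the table. A minimal sketch, assuming a pure rank comparison with no special cases; shouldEmitHealthChange is a hypothetical name, not the router's actual function.

```typescript
type HealthStatus = "healthy" | "warning" | "critical" | "unknown";

// Severity ranks copied from the health_status table.
const SEVERITY_RANK: Record<HealthStatus, number> = {
  unknown: 0,
  healthy: 1,
  warning: 2,
  critical: 3,
};

// True when the new status ranks strictly higher than the previous one.
function shouldEmitHealthChange(previous: HealthStatus, next: HealthStatus): boolean {
  return SEVERITY_RANK[next] > SEVERITY_RANK[previous];
}
```

Taken literally, a rank-only comparison also treats unknown to healthy as a worsening (0 to 1). Whether the router special-cases unknown is not stated here, so the sketch leaves that behavior as-is.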
external_monitoring_url semantics
A single URL string. The detail page renders it as an outbound button ("Open in monitoring ↗"). When NULL, no button is shown.
Phase 15 reconciliation: the original draft proposed a jsonb array of {label, url, type} objects. The shipped v7.0 form is a single text URL — simpler, sufficient for the dominant single-dashboard case, and easy to extend in a later milestone if multi-dashboard linking becomes a real need. The original deferral notes are captured in .canopy/phases/15-living-process-primitive/15-CONTEXT.md (<deferred>).
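The NULL-means-no-button rule is small enough to sketch as a pure function. The MonitoringButton type and monitoringButton name are hypothetical; the real detail page presumably renders a component rather than returning a plain object.

```typescript
interface MonitoringButton {
  label: string;
  href: string;
}

// Returns the outbound button model, or null when no URL is set.
function monitoringButton(externalMonitoringUrl: string | null): MonitoringButton | null {
  if (externalMonitoringUrl === null) return null;
  return { label: "Open in monitoring ↗", href: externalMonitoringUrl };
}
```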
The Field Table above is the source of truth. Phase 15 derives the Zod schema in @canopy/shared directly from it.
Examples
Deep example: Cortex Nexus
Cortex Nexus is the always-on inference gateway for the Cortex clan. It receives 9.45M tokens/day from JARVIS alone, routes requests to the appropriate model backend, and maintains the plugin sandbox. The Cortex team logged ~100 "tweak Nexus config" Outcomes over six months. The Outcomes shipped successfully but the framing — "what's the state of Nexus right now?" — was always invisible.
The LivingProcess record:
| Field | Value |
|---|---|
| name | Cortex Nexus |
| type | agent |
| runtime_location | Fargate cluster cortex-prod, 4 tasks across 2 AZs |
| current_state | running |
| last_deploy_at | 2026-05-15T14:22:00Z |
| last_deploy_version | git-sha-abc123 |
| consumption_metrics | {tokens_per_day: 9450000, p95_latency_ms: 180, gpu_utilization_pct: 67} |
| health_status | healthy |
| external_monitoring_url | "..." (Cortex Grafana; the original draft also linked a custom GPU pool dashboard) |
| Linked Outcomes (living_process_outcome_links) | COR-XX "Cortex Nexus v2 deploy", COR-XX "Add Anthropic model routing" |
| Related Invariants (through invariant_outcome_links) | INV-XX "Cortex p95 latency under 200ms" |
| owner | (Cortex Nexus owner — one human) |
Why isn't this an Outcome? COR-XX (the v2 deploy) is the change. Cortex Nexus is the thing being changed — and the thing that persists between deploys. If you frame Nexus as an Outcome, you have to invent a status that says "running" and a lifecycle that never ends, which is exactly what Outcomes are not for. The LivingProcess primitive holds the persistent identity; Outcomes describe each discrete change to it.
Why doesn't Canopy replicate Grafana? Grafana already does it. Canopy's role is "here's the entity, here's where to look." The consumption_metrics JSON gets updated periodically (a sidecar writes back every 5 minutes or on deploy) so the entity carries a recent snapshot, but querying real-time data is via the external link. Replicating the metric pipeline inside Canopy defeats both tools.
Audit reference: VALIDATION.md Council-tier section; v7.0 decision in DECISIONS-RESOLVED.md#D-08.
Short examples
Jungle Obsidian. Registry / identity always-on agent inside the Jungle council. type: agent. runtime_location: Jungle's hosting environment. The clan-registry data IS Obsidian's product. Outcomes that add a new clan, change a policy, or update a permission link to Obsidian via living_process_outcome_links.
Jungle Nexus. Messaging / routing always-on agent: the backbone for CrossClanDependency sync. type: nexus. SLO Invariants (sync latency, message-delivery success rate) reach Nexus through their linked Outcomes (invariant_outcome_links), since direct Invariant ↔ LivingProcess linkage is deferred in v7.0. When Nexus health degrades, every dependent clan sees the Attention Alert because their CrossClanDependency sync_status flips to pending or failed.
Jungle Webmaster. Council-tier Webmaster agent. type: agent. runtime_location: a Fargate task whose restart cadence has caused incidents, documented in VALIDATION.md. The last_deploy_at + last_deploy_version fields exist specifically to support post-incident reconstruction: "what was running when this broke?"
Anti-patterns
Anti-pattern: Tracking config changes as detached Outcomes
Cortex logged ~100 "tweak Nexus config" Outcomes over six months. The Outcomes shipped successfully — that's not the problem. The problem is that the framing — "what's the state of Nexus right now, with all those tweaks compounded?" — was always invisible. Reviewers reading the Outcome log saw 100 small commits. They could not see the entity those commits modified.
The correct framing: the LivingProcess (Cortex Nexus) is the persistent entity. Outcomes link TO it via the living_process_outcome_links junction table. The LivingProcess record shows current_state + last_deploy_at + last_deploy_version + health_status so the question "what's the state of Nexus right now?" is answerable directly. The Outcome history becomes the audit trail, not the primary view.
Anti-pattern: Replicating Grafana inside Canopy
Early Council-tier proposals included ingesting metrics into a Canopy-side dashboard — building charts, alerting, the whole observability surface. That defeats both tools. Grafana already does it better, with deeper time-series support, richer alerting rules, and integrations with the underlying telemetry pipelines. Canopy's role isn't observability; Canopy's role is the entity layer that the observability tools annotate.
The correct framing: external_monitoring_url carries the URL to the right Grafana / Datadog / CloudWatch / custom-dashboard view. Users click out. The LivingProcess record is the connective tissue, not the dashboard.
Anti-pattern: Deploying without tracking deploy lineage
VALIDATION.md notes Webmaster's Fargate restarts have caused incidents. Without last_deploy_version + last_deploy_at on the LivingProcess record, post-incident review can't reconstruct "what was running when this broke?" Investigators have to grep deploy logs, hope the deploy tool's history hasn't rolled over, and reconstruct timelines from CloudWatch.
The correct framing: every deploy updates the LivingProcess record. The deploy pipeline writes last_deploy_version and last_deploy_at as part of release. The timeline of versions is queryable from the entity, not from external logs that may not exist.
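The deploy write-back and the "what was running when this broke?" lookup can be sketched as two pure functions. How the version timeline is actually stored is not specified here, so the history array, the DeployEvent type, and both function names are assumptions for illustration.

```typescript
interface DeployEvent {
  version: string;    // git SHA or version string
  deployedAt: string; // ISO-8601 timestamp
}

interface DeployFields {
  last_deploy_version: string | null;
  last_deploy_at: string | null;
}

// Stamps the record with the new deploy and appends it to the timeline.
function recordDeploy(
  record: DeployFields,
  history: readonly DeployEvent[],
  event: DeployEvent,
): { record: DeployFields; history: DeployEvent[] } {
  return {
    record: { ...record, last_deploy_version: event.version, last_deploy_at: event.deployedAt },
    history: [...history, event],
  };
}

// Returns the version of the most recent deploy at or before the given instant.
function versionRunningAt(history: readonly DeployEvent[], instant: string): string | null {
  const t = Date.parse(instant);
  let best: DeployEvent | null = null;
  for (const e of history) {
    const at = Date.parse(e.deployedAt);
    if (at <= t && (best === null || at >= Date.parse(best.deployedAt))) best = e;
  }
  return best === null ? null : best.version;
}
```

With the timeline held next to the entity, an incident timestamp maps to a running version without consulting external deploy logs.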
Cross-refs
- METHOD.md: LivingProcess joins the 9 method primitives
- OUTCOMES.md: Outcomes that modify a LivingProcess link via living_process_outcome_links
- INVARIANTS.md: SLO commitments on a LivingProcess are Invariants
- FLOW-AND-SIGNAL.md: LivingProcess health degradations surface as Attention Alerts
- VOCABULARY.md: preferred terms
- PORTFOLIO.md: clan context for Cortex Nexus, Jungle Obsidian/Nexus/Webmaster, JARVIS
- docs/methodology/validation/VALIDATION.md: Council-tier audit evidence
- docs/methodology/validation/DECISIONS-RESOLVED.md: D-08 origin decision
Relationship to Phase 15
This doc is the contract Phase 15 (LivingProcess Primitive) must satisfy. The Field Table is the source of truth for:
- The Zod schema in @canopy/shared/living-process.ts
- The Drizzle table in @canopy/db/schema/living_process.ts
- The tRPC procedures in @canopy/api/router/living-process.ts (CRUD + state-update mutations)
- The LivingProcess UI in apps/web/app/[workspace]/processes/: process detail with deploy timeline, external-monitoring link card, and linked Outcomes / Invariants
- The deploy-pipeline integration spec for writing back last_deploy_at / last_deploy_version / current_state
The pipeline write-back contract is intentionally minimal — a single PATCH endpoint accepting the four updateable fields — to keep the integration surface tiny for clans onboarding their deploy pipelines.
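A small allowlist validator shows how narrow the write-back payload can stay. This sketch covers only the three fields this section names (last_deploy_at, last_deploy_version, current_state); the fourth updateable field is not identified here, so it is deliberately left out. parseWriteBack and WriteBackPatch are hypothetical names, and rejecting unknown keys is an assumption, not documented endpoint behavior.

```typescript
const CURRENT_STATES = ["running", "degraded", "stopped", "unknown"] as const;
type CurrentState = (typeof CURRENT_STATES)[number];

interface WriteBackPatch {
  last_deploy_at?: string;
  last_deploy_version?: string;
  current_state?: CurrentState;
}

// Only the fields named in this section; the fourth is not specified here.
const ALLOWED_KEYS: readonly string[] = ["last_deploy_at", "last_deploy_version", "current_state"];

function parseWriteBack(body: Record<string, unknown>): WriteBackPatch {
  for (const key of Object.keys(body)) {
    if (ALLOWED_KEYS.indexOf(key) === -1) throw new Error(`field not updateable: ${key}`);
  }
  const patch: WriteBackPatch = {};
  const { last_deploy_at, last_deploy_version, current_state } = body;
  if (last_deploy_at !== undefined) {
    if (typeof last_deploy_at !== "string" || isNaN(Date.parse(last_deploy_at))) {
      throw new Error("last_deploy_at must be a parseable timestamp");
    }
    patch.last_deploy_at = last_deploy_at;
  }
  if (last_deploy_version !== undefined) {
    if (typeof last_deploy_version !== "string" || last_deploy_version.length === 0) {
      throw new Error("last_deploy_version must be a non-empty string");
    }
    patch.last_deploy_version = last_deploy_version;
  }
  if (current_state !== undefined) {
    if (typeof current_state !== "string" || (CURRENT_STATES as readonly string[]).indexOf(current_state) === -1) {
      throw new Error("current_state must be one of the shipped enum values");
    }
    patch.current_state = current_state as CurrentState;
  }
  return patch;
}
```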