The Canopy Method

Issues (Bugs)

Defects, bugs, and broken things, described with the v2.0 diagnostic_state vocabulary. Vocabulary discipline: see VOCABULARY.md. Full methodology: see METHOD.md.

Issue has been a Canopy primitive since v3.0. This doc is its first canonical methodology spec, expanded with v2.0's diagnostic_state, investigation_notes, and owner_uncertain fields (D-07, D-12).


Definition

An Issue is a defect, bug, or broken behavior. Filed when something doesn't work as intended — a feature that worked yesterday doesn't work today, an API returns the wrong response, a UI element renders incorrectly, a deploy fails its smoke check. Issues are how the system tracks what is broken; Outcomes are how the system tracks what is being built.

An Issue is not an Outcome. Outcomes describe results that deliver value — "Users can log in reliably." When an Outcome's success criteria fail, the failure surfaces as an Issue. The Outcome that fixes the Issue is separate; it might be the same Outcome reopened, or a new Outcome describing the broader fix. The relationship is captured by linked_outcome_id: an Issue can point at the Outcome whose work introduced or is blocked by the defect.

The v2.0 audit caught a real failure mode in the v3.0 Issue model. The high-level status enum (open / in-progress / resolved / closed) captures lifecycle — where the Issue is in its journey to closure. But it misses the detective work. ROM Mobile filed Issues with internal status notes like "AWAITING DEVICE DETAILS," "HOST-APP CONFIG (not SDK)," "ROOT-CAUSE INVESTIGATION STILL PENDING," "SDK-VS-HOST-APP AMBIGUITY." Six clans shipping SDKs showed the same pattern. The v2.0 diagnostic_state enum captures these states as first-class.

diagnostic_state and status are orthogonal. They move independently. An Issue can be status: in-progress AND diagnostic_state: awaiting_evidence simultaneously — the lifecycle says "we're working on it"; the diagnostic state says "we can't make progress without the user's device logs." Reporting surfaces filter on both: "show me all Issues in awaiting_evidence for more than 7 days."
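The report quoted above can be sketched as a filter over both fields. This is a minimal sketch, not the Canopy implementation: the `diagnostic_state_changed_at` timestamp is a hypothetical field (the field table does not define one, but something like it is needed to measure "more than 7 days"), and the names `IssueRow` and `stuckIn` are illustrative.

```typescript
type Status = "open" | "in-progress" | "resolved" | "closed";
type DiagnosticState =
  | "triage"
  | "investigating"
  | "likely_resolved"
  | "awaiting_evidence"
  | "misdiagnosed"
  | "not_our_bug";

// Only the fields this filter needs. `diagnostic_state_changed_at` is an
// assumption: the field table does not define it.
interface IssueRow {
  id: string;
  status: Status;
  diagnostic_state: DiagnosticState;
  diagnostic_state_changed_at: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Issues that are not closed and have sat in `state` for more than `minDays`.
function stuckIn(
  issues: IssueRow[],
  state: DiagnosticState,
  minDays: number,
  now: Date
): IssueRow[] {
  return issues.filter(
    (issue) =>
      issue.status !== "closed" &&
      issue.diagnostic_state === state &&
      now.getTime() - issue.diagnostic_state_changed_at.getTime() > minDays * DAY_MS
  );
}

const now = new Date("2024-03-20T00:00:00Z");
const sample: IssueRow[] = [
  { id: "a", status: "in-progress", diagnostic_state: "awaiting_evidence", diagnostic_state_changed_at: new Date("2024-03-01T00:00:00Z") },
  { id: "b", status: "in-progress", diagnostic_state: "awaiting_evidence", diagnostic_state_changed_at: new Date("2024-03-18T00:00:00Z") },
  { id: "c", status: "open", diagnostic_state: "triage", diagnostic_state_changed_at: new Date("2024-03-01T00:00:00Z") },
];

// Only "a" has been awaiting evidence for more than 7 days.
const stuck = stuckIn(sample, "awaiting_evidence", 7, now);
console.log(stuck.map((issue) => issue.id));
```

Because the two fields are independent, the same function answers questions about any diagnostic state without touching the lifecycle logic.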

The owner_uncertain flag (v2.0, D-12) handles the common case where an Issue is filed before ownership is determined. Forcing a premature owner assignment — "assign to SDK team OR host-app team, pick one" — creates wrong-team handoffs and damages cross-team trust. With owner_uncertain: true, the Issue is filed honestly: the symptom is captured, the repro is captured, but ownership is "under investigation." Diagnostic work clarifies ownership; owner gets set then, and owner_uncertain flips to false.
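As a rough illustration of the flag's life cycle, the sketch below files an Issue without an owner and later sets the owner and clears the flag in one step. The `Ownership` type and both function names are hypothetical, introduced only for this example.

```typescript
// Ownership fields only; `owner` is a user id (uuid) or null.
interface Ownership {
  owner: string | null;
  owner_uncertain: boolean;
}

// File without guessing: no owner yet, and the flag says so honestly.
function fileWithUncertainOwner(): Ownership {
  return { owner: null, owner_uncertain: true };
}

// Once diagnostic work identifies the owner, set it and clear the flag together
// so the two fields never disagree.
function confirmOwner(current: Ownership, ownerId: string): Ownership {
  return { ...current, owner: ownerId, owner_uncertain: false };
}

const atFiling = fileWithUncertainOwner();
const afterDiagnosis = confirmOwner(atFiling, "user-123"); // placeholder id
console.log(atFiling, afterDiagnosis);
```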


Field Table

Fields marked NEW (v2.0) were added by D-07 and D-12. All other fields have existed since v3.0.

| Field | Type | Required | Default | Description | Notes |
| --- | --- | --- | --- | --- | --- |
| id | uuid | yes | | Unique identifier | |
| title | string | yes | | Short summary of the bug | Max 200 chars |
| description | text | yes | | Repro steps, expected vs actual, environment | |
| severity | enum | yes | | critical / high / medium / low | Existing; see severity sub-table |
| status | enum | yes | open | open / in-progress / resolved / closed | Existing high-level lifecycle |
| diagnostic_state | enum | no | triage | NEW (D-07): triage / investigating / likely_resolved / awaiting_evidence / misdiagnosed / not_our_bug | Finer-grained than status; orthogonal |
| investigation_notes | text | no | null | NEW (D-07): free-form structured context, repro attempts, hypotheses | Living working hypothesis, not comment history |
| owner_uncertain | bool | no | false | NEW (D-12): true when SDK-vs-host-app or similar ownership ambiguity exists | Diagnostic-in-progress flag |
| owner | uuid | no | null | Single human accountable (may be null if owner_uncertain: true) | |
| reporter | uuid | yes | | Who filed the issue | |
| linked_outcome_id | uuid | no | null | Outcome whose work introduced or is blocked by this issue | |
| created_at | timestamp | yes | now() | | |
| resolved_at | timestamp | no | null | When status moved to resolved | |
| tags | string[] | no | [] | | |
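The field table can be read as a type plus a set of defaults. The sketch below uses plain TypeScript rather than the Zod schema that Phase 12 derives, and the `newIssue` helper and `RequiredAtFiling` type are illustrative names, not part of @canopy/shared.

```typescript
type Severity = "critical" | "high" | "medium" | "low";
type Status = "open" | "in-progress" | "resolved" | "closed";
type DiagnosticState =
  | "triage"
  | "investigating"
  | "likely_resolved"
  | "awaiting_evidence"
  | "misdiagnosed"
  | "not_our_bug";

interface Issue {
  id: string; // uuid
  title: string; // max 200 chars
  description: string;
  severity: Severity;
  status: Status;
  diagnostic_state: DiagnosticState; // NEW (D-07)
  investigation_notes: string | null; // NEW (D-07)
  owner_uncertain: boolean; // NEW (D-12)
  owner: string | null; // uuid
  reporter: string; // uuid
  linked_outcome_id: string | null; // uuid
  created_at: Date;
  resolved_at: Date | null;
  tags: string[];
}

// Fields the table marks as required and gives no default.
type RequiredAtFiling = Pick<Issue, "id" | "title" | "description" | "severity" | "reporter">;

// Fills every other field with the default listed in the table.
function newIssue(input: RequiredAtFiling): Issue {
  if (input.title.length > 200) {
    throw new Error("title exceeds 200 characters");
  }
  return {
    ...input,
    status: "open",
    diagnostic_state: "triage",
    investigation_notes: null,
    owner_uncertain: false,
    owner: null,
    linked_outcome_id: null,
    created_at: new Date(),
    resolved_at: null,
    tags: [],
  };
}

const filedIssue = newIssue({
  id: "issue-1", // placeholder; a real id would be a uuid
  title: "Deploy fails its smoke check",
  description: "Expected smoke check to pass; it fails on every deploy.",
  severity: "high",
  reporter: "user-1", // placeholder
});
console.log(filedIssue.status, filedIssue.diagnostic_state);
```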

status enum (existing)

| Value | Meaning |
| --- | --- |
| open | Filed; not yet picked up. |
| in-progress | Someone is working on it. |
| resolved | Fix shipped; awaiting closure verification. |
| closed | Verified resolved. Terminal. |

diagnostic_state enum (NEW in v2.0)

| Value | Meaning |
| --- | --- |
| triage | Just filed, not yet looked at. The default. |
| investigating | Actively root-causing. Hypothesis under test. |
| likely_resolved | Fix shipped; awaiting user confirmation that the symptom is gone. |
| awaiting_evidence | Need logs, device, repro environment, or other external input to confirm or deny a hypothesis. |
| misdiagnosed | Turned out to be a different bug than originally framed. The Issue stays open under its new framing; the misdiagnosis is preserved in investigation_notes so the trail is visible. |
| not_our_bug | Confirmed external party owns this (SDK, third-party service, host app, upstream library). May be closed with link to external tracker. |

severity enum (existing)

Severity is the impact dimension for a bug — distinct from priority (Now/Next/Later). A low-severity bug can still be Now priority on the Outcome that fixes it; a critical-severity bug almost always is.

| Value | Meaning |
| --- | --- |
| critical | Production-blocking. Wakes someone. |
| high | Affects many users or major functionality; no immediate workaround. |
| medium | Workaround exists or affects a subset of users; fix within the milestone. |
| low | Cosmetic, edge-case, or deferred. Fix when convenient. |

This table is the source of truth. Phase 12 derives the Zod schema additions in @canopy/shared directly from this table — the new fields integrate with the existing v3.0 Issue schema.


Examples

Deep example: ROM Mobile — SDK-vs-host-app diagnostic case

ROM Mobile shipped an iOS SDK. A user reported intermittent network failures. The Issue's diagnostic arc — captured under v2.0 vocabulary — is the canonical example of why the new fields matter.

The Issue record at filing:

| Field | Value (Day 1) |
| --- | --- |
| title | Network calls failing on iOS only, intermittent |
| description | User reports network calls fail roughly 30% of the time on iOS 17.4, iPhone 14 Pro. iOS 16 and Android unaffected. Failure mode: request returns 0 bytes after ~5s timeout. Expected: 200 OK with payload. No correlation with network type (wifi vs LTE). Failure happens both in app foreground and after backgrounding. |
| severity | high |
| status | open |
| diagnostic_state | triage |
| owner_uncertain | false (initially; we assumed SDK) |
| owner | (ROM Mobile SDK team) |
| reporter | (user-reporting agent) |
| linked_outcome_id | null |

Day 2: triage promoted to investigating. diagnostic_state: investigating. status: in-progress.

Day 3: The SDK team realizes the failure could originate in either the SDK or the host-app config. They set owner_uncertain: true. owner stays set, but the team treats it as provisional.

Day 5: Investigators request device logs from the user. diagnostic_state: awaiting_evidence. The Issue can't progress until the logs arrive.

Day 8: Logs arrive. They show a host-app code path setting an HTTP timeout to 5s that overrides the SDK default of 30s. The SDK is fine; the host app is misconfigured. investigation_notes updated with the full trace.

Day 9: diagnostic_state: misdiagnosed (we thought SDK; the actual bug is host-app config). The Issue stays open under its new framing.

Day 10: Confirmed with host-app team that they own the config. diagnostic_state: not_our_bug. owner: null (host-app team owns; they file their own Issue). status: resolved (from ROM Mobile's perspective). owner_uncertain: false.

Why this matters: v3.0's open → in-progress → resolved → closed would have shown a flat lifecycle with no visibility into the investigation arc. Investigators looking at the Issue mid-arc would not have known whether "in-progress" meant "we're root-causing," "we're waiting for device logs," or "we already think it's misdiagnosed." The diagnostic_state timeline is the audit trail of the detective work. investigation_notes captures the working hypothesis without polluting the comment thread.
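The ROM Mobile arc can be replayed as data to show what the timeline makes answerable. This is a sketch under stated assumptions: the `Snapshot` and `TimelineEntry` types and the `stateOn` helper are hypothetical, the Day 8 entry is omitted because it changed only investigation_notes, and owner changes are left out for brevity.

```typescript
type Status = "open" | "in-progress" | "resolved" | "closed";
type DiagnosticState =
  | "triage"
  | "investigating"
  | "likely_resolved"
  | "awaiting_evidence"
  | "misdiagnosed"
  | "not_our_bug";

interface Snapshot {
  status: Status;
  diagnostic_state: DiagnosticState;
  owner_uncertain: boolean;
}

interface TimelineEntry {
  day: number;
  change: Partial<Snapshot>;
}

// Day 1 values from the filing record.
const filed: Snapshot = { status: "open", diagnostic_state: "triage", owner_uncertain: false };

// Field changes from the Day 2 to Day 10 narrative.
const romMobileArc: TimelineEntry[] = [
  { day: 2, change: { status: "in-progress", diagnostic_state: "investigating" } },
  { day: 3, change: { owner_uncertain: true } },
  { day: 5, change: { diagnostic_state: "awaiting_evidence" } },
  { day: 9, change: { diagnostic_state: "misdiagnosed" } },
  { day: 10, change: { status: "resolved", diagnostic_state: "not_our_bug", owner_uncertain: false } },
];

// State as of a given day: apply every change dated on or before that day.
function stateOn(day: number): Snapshot {
  return romMobileArc
    .filter((entry) => entry.day <= day)
    .reduce<Snapshot>((state, entry) => ({ ...state, ...entry.change }), filed);
}

// Mid-arc, status alone says "in-progress"; diagnostic_state says why it is stalled.
const day6 = stateOn(6);
console.log(day6.status, day6.diagnostic_state, day6.owner_uncertain);
```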

Audit reference: VALIDATION.md ROM Mobile section; v2.0 decisions in DECISIONS-RESOLVED.md#D-07 and DECISIONS-RESOLVED.md#D-12.

Short examples

  • likely_resolved + awaiting user confirmation. A fix shipped to production. The reporting user has been notified. status: resolved, diagnostic_state: likely_resolved. The Issue is not closed until the user confirms the symptom is gone. If the user reports the symptom persists, diagnostic_state returns to investigating and status returns to in-progress. This is a common pattern, especially for intermittent bugs the team can't reproduce.

  • misdiagnosed. A Cortex Issue filed as "Cortex inference latency spike" turned out to be a database connection-pool exhaustion in a downstream service. The Cortex Issue gets diagnostic_state: misdiagnosed; the team opens a new Issue against the downstream service with linked_outcome_id to whatever capacity work fixes it. The original Issue stays in the audit trail with investigation_notes documenting the misdiagnosis.

  • not_our_bug (clean). A third-party API returns 500s on a specific input. Logged for awareness with diagnostic_state: not_our_bug from the start; ownership is unambiguous (the third-party tracker URL goes in investigation_notes). The Issue closes quickly because the team's role is "track it for our incident timeline," not "fix it."
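The likely_resolved round trip in the first example can be sketched as a single transition function. The function name, the `IssueState` type, and the choice to throw on other diagnostic states are assumptions made for illustration. The text does not say what diagnostic_state should read after closure, so the sketch leaves it unchanged.

```typescript
type Status = "open" | "in-progress" | "resolved" | "closed";
type DiagnosticState =
  | "triage"
  | "investigating"
  | "likely_resolved"
  | "awaiting_evidence"
  | "misdiagnosed"
  | "not_our_bug";

interface IssueState {
  status: Status;
  diagnostic_state: DiagnosticState;
}

// Applies the reporting user's answer to "is the symptom gone?".
function applyUserFeedback(issue: IssueState, symptomGone: boolean): IssueState {
  if (issue.diagnostic_state !== "likely_resolved") {
    // Assumption: feedback is only meaningful while awaiting confirmation.
    throw new Error("feedback only applies to Issues in likely_resolved");
  }
  if (symptomGone) {
    // Verified resolved: the lifecycle reaches its terminal state.
    return { ...issue, status: "closed" };
  }
  // Symptom persists: both fields step back, as the example describes.
  return { status: "in-progress", diagnostic_state: "investigating" };
}

const shipped: IssueState = { status: "resolved", diagnostic_state: "likely_resolved" };
const confirmed = applyUserFeedback(shipped, true);
const reopened = applyUserFeedback(shipped, false);
console.log(confirmed.status, reopened.status, reopened.diagnostic_state);
```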


Anti-patterns

Anti-pattern: Using only open / in-progress / resolved / closed for SDK debugging

ROM Mobile tried to live with v3.0 status only. Investigators kept losing context — "is this open because we haven't looked at it, or because we're awaiting device logs from the user, or because we already think it's misdiagnosed?" Without diagnostic_state, the status field had to carry too many meanings; reviewers in the Weekly Pulse couldn't tell the difference between "we haven't started" and "we're blocked on the reporter."

The correct framing: status captures lifecycle (open → in-progress → resolved → closed). diagnostic_state captures investigation state. Both are needed. They move independently. Reports filter on both: "show me Issues stuck in awaiting_evidence more than 7 days" is the kind of question v3.0 couldn't answer cleanly.

Anti-pattern: Forcing owner assignment when ownership is uncertain

Filing an Issue and assigning it to a team that turns out not to own the code wastes that team's time and damages cross-team trust. The SDK team gets pinged about a host-app config bug; they triage, eventually find it's not their code, push back; the host-app team sees the bounce and assumes the SDK team is dodging work. Cross-team trust erodes faster than the bug gets fixed.

The correct framing: owner_uncertain: true on initial filing when SDK-vs-host-app or similar ambiguity exists. The Issue is filed with the symptom and repro; ownership is honestly "under investigation." Diagnostic work clarifies ownership; owner gets set then. No team gets pinged about something that isn't theirs.

Anti-pattern: Burying investigation history in comments

Plain debugged a single Issue across 27 comments over 3 weeks. Reconstructing "what did we already try?" required reading every comment in order, including the ones that were dead-ends, including the ones that were back-and-forth about scheduling the next investigation session. The signal-to-noise ratio for someone joining the investigation late was awful.

The correct framing: investigation_notes is the structured field for the live working hypothesis. Comments are for back-and-forth conversation. When someone joins the investigation, they read investigation_notes first — that's the current state of the detective work. Comments are the meeting transcript; investigation_notes is the case file.


Cross-refs


Relationship to Phase 12

This doc is the contract Phase 12 (Outcome / Issue / Milestone Field Additions) must satisfy for the Issue surface. The three new fields — diagnostic_state, investigation_notes, owner_uncertain — get added to the existing v3.0 Issue schema:

  • Zod schema additions in @canopy/shared/issue.ts
  • Drizzle migration adding the three columns to issues
  • tRPC procedure updates in @canopy/api/router/issue.ts for the new fields
  • UI updates in apps/web/app/[workspace]/issues/: diagnostic_state dropdown, investigation_notes textarea, owner_uncertain checkbox

Migration safety: all three fields are nullable with defaults that match existing rows (diagnostic_state: triage, investigation_notes: null, owner_uncertain: false). No backfill required.
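The effect of those column defaults on an existing row can be modelled in a few lines. This is only a model: the real work is done by the Drizzle migration, and `LegacyIssueRow`, `V2_DEFAULTS`, and `afterMigration` are hypothetical names.

```typescript
type DiagnosticState =
  | "triage"
  | "investigating"
  | "likely_resolved"
  | "awaiting_evidence"
  | "misdiagnosed"
  | "not_our_bug";

interface V2IssueFields {
  diagnostic_state: DiagnosticState;
  investigation_notes: string | null;
  owner_uncertain: boolean;
}

// Hypothetical slice of a pre-migration row; the real table has more columns.
interface LegacyIssueRow {
  id: string;
  title: string;
}

// The three defaults named in the migration-safety note.
const V2_DEFAULTS: V2IssueFields = {
  diagnostic_state: "triage",
  investigation_notes: null,
  owner_uncertain: false,
};

// What an existing row looks like once the new columns exist: the old values
// are untouched and the new columns carry their defaults, so nothing needs backfilling.
function afterMigration(row: LegacyIssueRow): LegacyIssueRow & V2IssueFields {
  return { ...row, ...V2_DEFAULTS };
}

const legacy: LegacyIssueRow = { id: "issue-1", title: "Deploy fails smoke check" };
const migrated = afterMigration(legacy);
console.log(migrated.diagnostic_state, migrated.investigation_notes, migrated.owner_uncertain);
```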