## Issue Triage — 2026-05-12

### 1) Cloud: App-scoped chat endpoint misclassifies auth failures as HTTP 500 (API-key callers)
- **Issue Title & ID:** `cloud/apps/api` — `/api/v1/apps/:appId/chat` returns 500 instead of 401/403 on invalid API key (Follow-up to **PR #7376**)
- **Current Status:** Unconfirmed fix; flagged in PR review notes as a P1; needs validation in `develop`/production
- **Impact Assessment:**
  - **User Impact:** **High** (all API-key based clients for monetized apps; breaks correct retry/UX behavior)
  - **Functional Impact:** **Partial** (endpoint works, but errors become opaque; client logic may fail)
  - **Brand Impact:** **High** (auth failures presented as “internal server error” make the platform look unreliable)
- **Technical Classification:**
  - **Issue Category:** Bug
  - **Component Affected:** Cloud API / Auth middleware / Apps Chat route
  - **Complexity:** Moderate effort (error handling + tests)
- **Resource Requirements:**
  - **Required Expertise:** Cloud API (Hono/Next route handlers), auth middleware patterns, error mapping conventions
  - **Dependencies:** None, but should align with global auth middleware behavior for API-key paths
  - **Estimated Effort (1-5):** 3
- **Recommended Priority:** **P0**
- **Specific Actionable Next Steps:**
  1. Reproduce with an invalid/expired API key against `/api/v1/apps/:appId/chat` and confirm response status/body.
  2. Refactor route to ensure `AuthenticationError`/`ForbiddenError` thrown inside `Promise.all` are caught and mapped to **401/403** (not swallowed by outer catch-all).
  3. Add unit/integration tests asserting:
     - invalid key → 401 (`code: not_authenticated` or equivalent)
     - wrong org/app ownership → 403
  4. Confirm logging does not leak key material and error codes are stable for SDK consumers.
- **Potential Assignees:** **NubsCarson** (Cloud apps/domains author), **standujar** (cloud auth fixes), **0xSolace** (stability/auth regressions)
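
The error mapping in step 2 might look roughly like the sketch below. The class names `AuthenticationError` and `ForbiddenError` come from the review notes, but the local class definitions, the `ErrorResponse` shape, the `mapErrorToResponse` helper, and the `forbidden` / `internal_error` codes are assumptions made for illustration, not the route's actual code.

```typescript
// Hypothetical stand-ins for the route's real error classes.
class AuthenticationError extends Error {}
class ForbiddenError extends Error {}

// Assumed response shape; the real error contract may differ.
interface ErrorResponse {
  status: 401 | 403 | 500;
  body: { code: string; message: string };
}

// Translate a thrown value into a stable status and error code.
function mapErrorToResponse(err: unknown): ErrorResponse {
  if (err instanceof AuthenticationError) {
    return {
      status: 401,
      body: { code: "not_authenticated", message: "Invalid or expired API key" },
    };
  }
  if (err instanceof ForbiddenError) {
    return {
      status: 403,
      body: { code: "forbidden", message: "Caller does not own this app" },
    };
  }
  // Anything unrecognized stays a generic 500 so internals are not leaked.
  return {
    status: 500,
    body: { code: "internal_error", message: "Internal server error" },
  };
}
```

Because `Promise.all` rejects with the reason of the first promise that rejects, a `catch` block around it receives the original error object, so the `instanceof` checks above still apply when the auth call runs in parallel with other lookups. The response messages are fixed strings, which keeps key material out of the body.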

---

### 2) Cloud: Credit reconciliation failure can yield “free inference” (streaming) or “charged/no response” (non-streaming)
- **Issue Title & ID:** `cloud/apps/api` — Credits reconciliation error paths inconsistent; refunds/charges incorrect (Follow-up to **PR #7376**)
- **Current Status:** Unconfirmed fix; multiple P1 findings in review notes; needs immediate audit
- **Impact Assessment:**
  - **User Impact:** **Critical** (direct financial correctness for monetized apps)
  - **Functional Impact:** **Yes** (billing integrity is core to monetization)
  - **Brand Impact:** **High** (billing bugs severely damage trust)
- **Technical Classification:**
  - **Issue Category:** Bug / Security-adjacent (abuse potential) / Reliability
  - **Component Affected:** Cloud API / Billing & credits ledger / Streaming response pipeline
  - **Complexity:** Complex solution (transactionality, idempotency, streaming lifecycle)
- **Resource Requirements:**
  - **Required Expertise:** Payments/credits ledger design, streaming HTTP handling, idempotent reconciliation patterns, DB transactions
  - **Dependencies:** May require small schema/logging additions (e.g., reconciliation state), plus consistent error contract
  - **Estimated Effort (1-5):** 5
- **Recommended Priority:** **P0**
- **Specific Actionable Next Steps:**
  1. Enumerate all billing states for chat: **reserve → deliver → finalize(actual) → refund(delta)**.
  2. Ensure reconciliation is **idempotent** (e.g., keyed by requestId/turnId) and can be retried safely.
  3. Streaming path:
     - If stream has already delivered tokens, do **not** “full refund on reconciliation failure” by default.
     - Persist a “delivery occurred” marker before closing writer; reconcile async if needed.
  4. Non-streaming path:
     - If the provider response has already been obtained, return it even if reconciliation fails; queue a reconciliation retry and/or apply a conservative refund policy.
  5. Add tests simulating transient DB failures at each step (reserve success, reconcile failure, provider success).
  6. Add observability: structured logs + metrics for reconciliation failures, refunds issued, and mismatches.
- **Potential Assignees:** **NubsCarson** (feature owner), **standujar** (cloud infra/auth), plus a billing-focused maintainer (Cloud billing package owner)
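
To make the idempotency requirement in step 2 concrete, here is a minimal in-memory sketch of a reserve/reconcile ledger keyed by `requestId`. The `CreditLedger` class and its method names are hypothetical; a real implementation would presumably run inside DB transactions and persist the finalized state rather than hold it in a `Map`.

```typescript
type ReconcileResult = { refunded: number; replayed: boolean };

// Hypothetical in-memory ledger; stands in for the real credits store.
class CreditLedger {
  private balance: number;
  private reserved = new Map<string, number>(); // requestId -> amount held
  private finalized = new Map<string, number>(); // requestId -> refund issued

  constructor(initialBalance: number) {
    this.balance = initialBalance;
  }

  getBalance(): number {
    return this.balance;
  }

  // Reserve: hold the estimated cost before calling the provider.
  reserve(requestId: string, estimate: number): void {
    // A repeated reserve for the same request is a no-op.
    if (this.reserved.has(requestId) || this.finalized.has(requestId)) return;
    if (estimate > this.balance) throw new Error("insufficient credits");
    this.balance -= estimate;
    this.reserved.set(requestId, estimate);
  }

  // Finalize + refund: settle actual usage exactly once per requestId.
  reconcile(requestId: string, actualCost: number): ReconcileResult {
    const prior = this.finalized.get(requestId);
    // A retry after a timeout or crash returns the recorded outcome.
    if (prior !== undefined) return { refunded: prior, replayed: true };

    const held = this.reserved.get(requestId);
    if (held === undefined) throw new Error(`no reservation for ${requestId}`);

    // Simplification: usage above the hold is not charged in this sketch.
    const refund = Math.max(0, held - actualCost);
    this.balance += refund;
    this.reserved.delete(requestId);
    this.finalized.set(requestId, refund);
    return { refunded: refund, replayed: false };
  }
}
```

With this shape, a retry queue can call `reconcile` as many times as needed: the first successful call moves credits and every later call only reports what was already done, so a retried request is neither refunded twice nor charged twice.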

---

### 3) Cloud: Domain sync never marks Cloudflare domains as `verified: true` after zone provisioning
- **Issue Title & ID:** `cloud/apps/api` — `/api/v1/apps/:id/domains/sync` doesn’t set `verified=true`, breaking verified origins/CORS (Follow-up to **PR #7376**)
- **Current Status:** Unconfirmed fix; flagged in review notes
- **Impact Assessment:**
  - **User Impact:** **High** (apps using managed domains may never become “verified” for CORS/origin allowlists)
  - **Functional Impact:** **Yes** (app domain feature becomes non-functional or intermittently blocked)
  - **Brand Impact:** **High** (domain setup “stuck” is a visible product failure)
- **Technical Classification:**
  - **Issue Category:** Bug
  - **Component Affected:** Cloud API / Managed domains service / CORS origin generation
  - **Complexity:** Simple fix (field propagation) to Moderate (ensure state machine correct)
- **Resource Requirements:**
  - **Required Expertise:** Cloudflare domain lifecycle, managed domain state transitions, CORS/origin logic
  - **Dependencies:** None
  - **Estimated Effort (1-5):** 2
- **Recommended Priority:** **P0**
- **Specific Actionable Next Steps:**
  1. Add a failing test: domain purchased with `verified=false`, later sync sees Cloudflare `active` → must set `verified=true`.
  2. Patch sync route to update both `status` and `verified` consistently once CF indicates ready.
  3. Validate `listVerifiedAppOrigins` includes the domain after sync.
  4. Backfill: run a one-off migration/script to re-sync existing managed domains stuck unverified.
- **Potential Assignees:** **NubsCarson**, **standujar**
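
One way to keep `status` and `verified` from drifting apart (step 2) is to derive both from the same Cloudflare observation in a single pure function. The sketch below is illustrative only: the `ManagedDomain` shape, the `applySyncResult` and `verifiedOrigins` helpers, and the choice of `"active"` as the ready status are assumptions, and `verifiedOrigins` is a simplified stand-in rather than the real `listVerifiedAppOrigins`.

```typescript
// Assumed record shape; the real schema likely has more fields.
interface ManagedDomain {
  hostname: string;
  status: string;
  verified: boolean;
}

// Assumption: "active" is the Cloudflare status that means the zone is ready.
const READY_STATUS = "active";

// Return an updated copy with both fields derived from one observation.
function applySyncResult(domain: ManagedDomain, cloudflareStatus: string): ManagedDomain {
  return {
    ...domain,
    status: cloudflareStatus,
    verified: cloudflareStatus === READY_STATUS,
  };
}

// Simplified stand-in for the origin allowlist built from verified domains.
function verifiedOrigins(domains: ManagedDomain[]): string[] {
  return domains.filter((d) => d.verified).map((d) => `https://${d.hostname}`);
}
```

This version clears `verified` again if the zone later leaves the ready status. Whether verification should instead be sticky once granted is a policy decision the patch would need to settle.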

---

### 4) Slack connector: unhandled Slack API errors can drop incoming messages silently
- **Issue Title & ID:** `@elizaos/plugin-slack` — missing try/catch around `users.info` in event handlers causes message loss (Follow-up to **PR #7375**)
- **Current Status:** Newly migrated into monorepo; review notes indicate P1 defect; verify whether follow-up patch exists
- **Impact Assessment:**
  - **User Impact:** **High** (Slack is a major connector; message drops are highly visible)
  - **Functional Impact:** **Yes** (core message ingestion path)
  - **Brand Impact:** **High** (“bot ignores messages” perception)
- **Technical Classification:**
  - **Issue Category:** Bug / Reliability
  - **Component Affected:** Plugin System → Connector Plugins → Slack
  - **Complexity:** Simple fix (guard + fallback) to Moderate (rate-limit handling/backoff)
- **Resource Requirements:**
  - **Required Expertise:** Slack Bolt event lifecycle, connector error handling conventions, rate limit strategies
  - **Dependencies:** None
  - **Estimated Effort (1-5):** 2
- **Recommended Priority:** **P1**
- **Specific Actionable Next Steps:**
  1. Wrap `getUser()` / `client.users.info` in try/catch inside `handleMessage` and `handleAppMention`.
  2. On failure, proceed with minimal identity (use user ID) rather than aborting the handler.
  3. Add tests simulating Slack API error (429 / network) ensuring:
     - memory still created
     - agent still replies (or at least message is not lost)
  4. Add structured warning logs including Slack error code but no sensitive token output.
- **Potential Assignees:** **2-A-M** (plugin migration author), **0xSolace** (stability), Slack plugin maintainers
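
Steps 1 and 2 amount to a guarded lookup with a minimal-identity fallback, sketched below. Everything here is hypothetical scaffolding: `UserLookup` stands in for a call such as `client.users.info` with a simplified return shape, and `resolveIdentity` and `SlackIdentity` are illustrative names rather than the plugin's API.

```typescript
interface SlackIdentity {
  id: string;
  displayName: string;
  resolved: boolean; // false when the fallback identity was used
}

// Assumed, simplified signature for the upstream user lookup.
type UserLookup = (userId: string) => Promise<{ name: string }>;

// Resolve a display name, but never let a lookup failure abort the handler.
async function resolveIdentity(
  lookup: UserLookup,
  userId: string,
  warn: (message: string) => void,
): Promise<SlackIdentity> {
  try {
    const user = await lookup(userId);
    return { id: userId, displayName: user.name, resolved: true };
  } catch (err) {
    // Log only the error message, never tokens or request headers.
    const reason = err instanceof Error ? err.message : String(err);
    warn(`slack user lookup failed for ${userId}: ${reason}`);
    // Fall back to the raw user ID so the message can still be processed.
    return { id: userId, displayName: userId, resolved: false };
  }
}
```

A handler such as `handleMessage` would then call `resolveIdentity` instead of the raw lookup and continue to create the memory regardless of the `resolved` flag.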

---

### 5) Security: Possible compromise signal raised in Discord with no recorded follow-up
- **Issue Title & ID:** Discord `#coders` — “was something compromised?” unanswered security concern (Discord log **2026-05-10**)
- **Current Status:** Open question; no incident record in provided logs
- **Impact Assessment:**
  - **User Impact:** **Critical** if real; otherwise low—needs triage to classify
  - **Functional Impact:** **Partial/Yes** (depends on what’s compromised—credentials, bots, CI, packages)
  - **Brand Impact:** **High** (security credibility)
- **Technical Classification:**
  - **Issue Category:** Security
  - **Component Affected:** Community Ops / CI/CD / Package publishing / Discord integrations (unknown until scoped)
  - **Complexity:** Moderate effort (incident triage) to Architectural change (if systemic)
- **Resource Requirements:**
  - **Required Expertise:** Security incident response, GitHub org audit, token rotation, Discord admin
  - **Dependencies:** Needs access to logs/audit trails (GitHub audit, npm publish history, Discord bot logs)
  - **Estimated Effort (1-5):** 3 (triage) / 5 (if confirmed)
- **Recommended Priority:** **P0** (triage immediately; can be downgraded once disproven)
- **Specific Actionable Next Steps:**
  1. Ask reporter (gokumaster64) for specifics: what indicator, which service, timeframe, screenshots/logs.
  2. Run a quick org-level checklist:
     - recent GitHub tokens/Actions secrets changes
     - abnormal npm publishes / version bumps
     - Discord bot token rotations / permission changes
  3. Post a short public status note: “Investigating / No evidence yet / Actions taken” to reduce uncertainty.
  4. If any credential exposure is suspected: rotate keys (Discord bots, cloud env, CI secrets) and invalidate sessions.
- **Potential Assignees:** **odilitime** (Community Ops/mod + platform), **0xSolace** (stability), **standujar** (cloud ops/auth), a designated security responder

---

### 6) Discord sandbox agent testing: OAuth invite/whitelist process not documented (blocks safe external agent evals)
- **Issue Title & ID:** Discord `#coders` — OAuth invite/whitelist needed for sandbox research agent testing (Request by **rma_bot**, 2026-05-11)
- **Current Status:** In progress; **odilitime** offered to handle via DM; no documented procedure
- **Impact Assessment:**
  - **User Impact:** **Medium** (affects contributors trying to test integrations)
  - **Functional Impact:** **Partial** (blocks evaluation of multi-agent orchestrator + MCP tooling in official server)
  - **Brand Impact:** **Medium** (appears ad-hoc; unclear onboarding for external devs)
- **Technical Classification:**
  - **Issue Category:** Documentation / Security / Process
  - **Component Affected:** Discord Ops / OAuth / Bot permissions policy
  - **Complexity:** Moderate effort (policy + minimal automation)
- **Resource Requirements:**
  - **Required Expertise:** Discord admin, OAuth app configuration, least-privilege bot permissions, threat modeling
  - **Dependencies:** None, but should align with any broader “plugin verification” or “agent sandbox” policy
  - **Estimated Effort (1-5):** 3
- **Recommended Priority:** **P2**
- **Specific Actionable Next Steps:**
  1. Create a written “Sandbox Agent Evaluation” checklist:
     - required scopes, required intents, message restrictions, logging, no-DM rule
     - dedicated channel + role permissions template
  2. Define an allowlist flow (who approves, expected turnaround, what artifacts are required).
  3. Add a short security rubric for external agents (data retention, outbound tools, rate limits).
  4. After rma_bot test: capture learnings and decide whether to pursue an Eliza-native port.
- **Potential Assignees:** **odilitime** (Discord/OAuth), **trace.g** (process/project mgmt), **_ky0078** (agent orchestration background)

---

### 7) Community/Brand: Unanswered token/project support question creating uncertainty
- **Issue Title & ID:** Discord `#discussion` — “Is the team still supporting the ElizaOS token…?” unanswered (User **.chomppp**, 2026-05-11)
- **Current Status:** Unaddressed in thread
- **Impact Assessment:**
  - **User Impact:** **Medium** (affects community confidence; can spill into support channels)
  - **Functional Impact:** **No** (not a framework bug)
  - **Brand Impact:** **High** (perceived abandonment/instability narrative)
- **Technical Classification:**
  - **Issue Category:** Documentation / UX (communications)
  - **Component Affected:** Community Ops / Project communications
  - **Complexity:** Simple fix
- **Resource Requirements:**
  - **Required Expertise:** Maintainer communications, governance/roadmap clarity
  - **Dependencies:** Need confirmation of official stance
  - **Estimated Effort (1-5):** 1
- **Recommended Priority:** **P2**
- **Specific Actionable Next Steps:**
  1. Publish a pinned/FAQ response clarifying token status and where updates will be posted.
  2. If token is out-of-scope for core dev, state so explicitly and point to correct channel/repo.
  3. Add moderation guidance: route finance/token questions to a single canonical statement to avoid rumor loops.
- **Potential Assignees:** **odilitime** (Community Ops), project leads/maintainers with authority to confirm status

---

### 8) Security hygiene: “Scam warning” posted without details; needs capture + response pattern
- **Issue Title & ID:** Discord `#coders` — scam warning without context (User **dieantwoord1337**, 2026-05-09)
- **Current Status:** No follow-up in logs
- **Impact Assessment:**
  - **User Impact:** **Medium** (scams commonly target OSS communities)
  - **Functional Impact:** **No**
  - **Brand Impact:** **Medium** (unaddressed scam reports reduce trust)
- **Technical Classification:**
  - **Issue Category:** Security / Community Ops
  - **Component Affected:** Discord moderation processes
  - **Complexity:** Simple fix (playbook)
- **Resource Requirements:**
  - **Required Expertise:** Discord moderation, phishing/scam pattern recognition
  - **Dependencies:** None
  - **Estimated Effort (1-5):** 2
- **Recommended Priority:** **P2**
- **Specific Actionable Next Steps:**
  1. Ask reporter for details (usernames, links, screenshots).
  2. If credible, post a short warning with concrete indicators; remove malicious links; ban offenders.
  3. Add an internal “incident note” template for mods to record what happened.
- **Potential Assignees:** **odilitime**, mini-mods (e.g., **satsbased**), helpers

---

## Top Priority Summary (fix/triage immediately)
1. **P0:** Cloud chat auth errors returning **500 instead of 401/403** (PR #7376 follow-up)
2. **P0:** Cloud **credit reconciliation correctness** (free inference / charged-no-response) (PR #7376 follow-up)
3. **P0:** Cloud domain sync never setting **`verified=true`** → CORS/origin breakage (PR #7376 follow-up)
4. **P0:** Investigate potential **security compromise** signal raised in Discord (2026-05-10)
5. **P1:** Slack connector **unhandled `users.info` failures** → silent message loss (PR #7375 follow-up)
6. **P2:** Document/standardize Discord **OAuth allowlist + sandbox agent** evaluation process
7. **P2:** Address **token support** uncertainty with a canonical public statement
8. **P2:** Capture and act on **scam warning** (create repeatable moderation playbook)

---

## Patterns / Themes Indicating Deeper Issues
- **Silent failure modes in connectors** (Slack message drops; historically Telegram had race/message loss): connectors need standardized “never drop inbound message without recording + logging” guarantees.
- **Cloud monetization shipped with multiple correctness hazards** (auth status mapping, billing reconciliation, domain verification): suggests insufficient “financial correctness” and “state machine completeness” review gates for Cloud PRs.
- **Security/process signals not consistently closed-loop** in Discord (compromise question unanswered, scam warning without follow-up): indicates a need for an explicit incident intake and response workflow.

---

## Process Improvement Recommendations
1. **Add “Connector Reliability Contract” tests** for every messaging connector:
   - simulate upstream API error (429/5xx) and assert message is not lost (memory persisted, handler doesn’t crash).
2. **Introduce Cloud “Monetization Readiness” checklist** before merge/deploy:
   - auth status mapping tests, reconciliation idempotency, refund/charge invariants, domain lifecycle state transitions.
3. **Establish a lightweight security incident workflow**:
   - single intake channel/tag, required fields (what/when/where), response SLA, and a short public status update policy.
4. **Improve PR gating for high-risk Cloud changes**:
   - require at least one reviewer with billing/auth expertise; require failure-injection tests for critical paths.
5. **Community comms canonicalization**:
   - pinned FAQ entries for recurring sensitive topics (token support, official accounts, how to verify links) to reduce rumor-driven support load.