## Issue Triage — 2026-05-06

### 1) Cloud: App-scoped chat returns **500** for auth failures (API-key callers)
- **Issue Title & ID:** `cloud/apps/api: /api/v1/apps/:id/chat masks AuthenticationError/ForbiddenError as 500` (PR elizaos/eliza **#7376**, needs follow-up issue)
- **Current Status:** **Open (regression risk)** — PR merged; behavior likely in `develop` unless patched post-merge.
- **Impact Assessment:**
  - **User Impact:** **High** (any app using API keys / programmatic access; breaks client retry/handling)
  - **Functional Impact:** **Partial** (endpoint works but fails incorrectly; callers can’t distinguish auth vs server fault)
  - **Brand Impact:** **High** (appears as instability/outages rather than auth misconfig)
- **Technical Classification:**
  - **Category:** Bug / UX (API contract)
  - **Component:** Cloud API (Hono routes, auth middleware)
  - **Complexity:** **Simple fix**
- **Resource Requirements:**
  - **Required Expertise:** Cloud API, Hono error handling, auth middleware conventions
  - **Dependencies:** None, but should align with global middleware behavior for API key bypass
  - **Estimated Effort:** **2/5**
- **Recommended Priority:** **P0**
- **Specific Actionable Next Steps:**
  1. Refactor route to call `requireAuthOrApiKeyWithOrg` **outside** `Promise.all`, and/or add explicit `catch` mapping `AuthenticationError → 401`, `ForbiddenError → 403`.
  2. Add a unit/integration test: an invalid API key must return `401`, and an authenticated caller without access must return `403`, each with a stable error code (never `500`).
  3. Audit other new Cloud routes introduced in #7376 for similar `Promise.all`/catch masking.
- **Potential Assignees:** **standujar** (Cloud auth fixes), **NubsCarson** (author area), **0xSolace** (stability hotfixes)
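
The first next step above can be sketched as follows. This is a minimal illustration, not the route's actual code: `AuthenticationError` and `ForbiddenError` are redeclared locally as stand-ins, `requireAuth` stands in for `requireAuthOrApiKeyWithOrg`, and `loadApp`, `loadHistory`, and the error code strings are hypothetical. The point is that auth is awaited on its own before any parallel work, and that known auth errors are mapped explicitly.

```typescript
// Local stand-ins for the error classes thrown by the auth layer (assumed shape).
class AuthenticationError extends Error {}
class ForbiddenError extends Error {}

interface ErrorResponse {
  status: 401 | 403 | 500;
  code: string; // hypothetical stable error codes
}

// Map known auth errors to stable status/code pairs; anything else is a 500.
function toErrorResponse(err: unknown): ErrorResponse {
  if (err instanceof AuthenticationError) return { status: 401, code: "unauthenticated" };
  if (err instanceof ForbiddenError) return { status: 403, code: "forbidden" };
  return { status: 500, code: "internal_error" };
}

type ChatResult = { status: 200; body: string } | ErrorResponse;

async function handleChat(
  requireAuth: () => Promise<{ orgId: string }>, // stand-in for requireAuthOrApiKeyWithOrg
  loadApp: (orgId: string) => Promise<string>, // hypothetical
  loadHistory: (orgId: string) => Promise<string[]>, // hypothetical
): Promise<ChatResult> {
  try {
    // Auth runs first and alone, outside Promise.all.
    const { orgId } = await requireAuth();
    const [app, history] = await Promise.all([loadApp(orgId), loadHistory(orgId)]);
    return { status: 200, body: `${app}:${history.length}` };
  } catch (err) {
    // Explicit mapping keeps auth failures from surfacing as a generic 500.
    return toErrorResponse(err);
  }
}
```

Keeping the mapping in one small function makes it straightforward to reuse when auditing the other routes from #7376.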

---

### 2) Cloud: Credit reconciliation failure modes enable “free inference” or “charged without response”
- **Issue Title & ID:** `cloud/apps/api: app chat credit reconciliation inconsistent in streaming & non-streaming` (PR elizaos/eliza **#7376**, needs follow-up issue)
- **Current Status:** **Open (financial correctness)** — reported in automated review notes; not confirmed fixed.
- **Impact Assessment:**
  - **User Impact:** **Critical** (billing/credits correctness affects all monetized apps)
  - **Functional Impact:** **Yes** (monetized chat economics unreliable; can cause loss or user disputes)
  - **Brand Impact:** **High** (trust + billing integrity)
- **Technical Classification:**
  - **Category:** Bug / Security-adjacent (abuse potential)
  - **Component:** Cloud API billing/credits, app-scoped chat route
  - **Complexity:** **Complex solution** (needs carefully designed transaction boundaries)
- **Resource Requirements:**
  - **Required Expertise:** Payments/credits ledger design, streaming response lifecycle, idempotency
  - **Dependencies:** Billing service semantics; provider gateway response handling
  - **Estimated Effort:** **5/5**
- **Recommended Priority:** **P0**
- **Specific Actionable Next Steps:**
  1. Define invariants:
     - If content is delivered, user must be charged at least actual cost (or documented promotional floor).
     - If reconciliation fails after provider success, system must return content and queue reconciliation retry (or escrow).
  2. Implement idempotent “debit → finalize/reconcile” state machine with durable outbox/job retry.
  3. For **streaming**: prevent “refund on post-stream failure” by isolating reconciliation errors from response completion; record “delivered=true” before close.
  4. For **non-streaming**: if provider response is already obtained, return it even if reconciliation fails; separately issue reconciliation retry or safe fallback.
  5. Add abuse tests (simulate DB outage at each await point) + chaos tests for streaming close timing.
- **Potential Assignees:** **NubsCarson** (domain/monetization path owner), **standujar** (Cloud reliability), plus someone with billing depth (Cloud billing maintainers)
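
One way to picture the "debit → finalize/reconcile" state machine from the steps above is the in-memory sketch below. Everything here is an assumption made for illustration: the `ChargeLedger` class, its state names, and the `requestId` idempotency key are hypothetical, the ledger is not durable, and the charge callback is synchronous to keep the example short. A real implementation would persist each transition and run settlement from a job queue.

```typescript
type ChargeState = "reserved" | "delivered" | "settled" | "released";

interface ChargeRecord {
  requestId: string; // idempotency key (hypothetical naming)
  reserved: number; // credits held up front
  actualCost: number | null;
  state: ChargeState;
}

class ChargeLedger {
  private records = new Map<string, ChargeRecord>();

  // Idempotent: a retried request with the same id reuses the existing hold.
  reserve(requestId: string, estimate: number): ChargeRecord {
    const existing = this.records.get(requestId);
    if (existing) return existing;
    const record: ChargeRecord = { requestId, reserved: estimate, actualCost: null, state: "reserved" };
    this.records.set(requestId, record);
    return record;
  }

  // Record delivery (and the real cost) before the response or stream closes.
  markDelivered(requestId: string, actualCost: number): void {
    const record = this.mustGet(requestId);
    if (record.state === "reserved") {
      record.actualCost = actualCost;
      record.state = "delivered";
    }
  }

  // Releasing the hold is only allowed while nothing has been delivered.
  release(requestId: string): boolean {
    const record = this.mustGet(requestId);
    if (record.state !== "reserved") return false;
    record.state = "released";
    return true;
  }

  // Attempt the final charge. On failure the record stays "delivered" so a
  // retry can pick it up; a settlement error never turns into a refund.
  settle(requestId: string, charge: (requestId: string, amount: number) => void): boolean {
    const record = this.mustGet(requestId);
    if (record.state === "settled") return true; // retry after success is a no-op
    if (record.state !== "delivered" || record.actualCost === null) return false;
    try {
      charge(record.requestId, record.actualCost);
      record.state = "settled";
      return true;
    } catch {
      return false;
    }
  }

  // Work list for the retry job.
  pendingSettlement(): string[] {
    return Array.from(this.records.values())
      .filter((r) => r.state === "delivered")
      .map((r) => r.requestId);
  }

  private mustGet(requestId: string): ChargeRecord {
    const record = this.records.get(requestId);
    if (!record) throw new Error(`unknown request: ${requestId}`);
    return record;
  }
}
```

The two invariants map onto the transitions: once a record is `delivered` it can no longer be released, and a failed `settle` leaves it queued rather than refunded.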

---

### 3) Cloud Domains: `/sync` never sets `verified=true` after Cloudflare provisioning → broken CORS origins
- **Issue Title & ID:** `cloud/apps/api: domains sync does not mark verified true; CORS origin list stays empty` (PR elizaos/eliza **#7376**, needs follow-up issue)
- **Current Status:** **Open**
- **Impact Assessment:**
  - **User Impact:** **High** (apps on purchased/managed domains can remain unusable due to CORS)
  - **Functional Impact:** **Yes** (blocks core “monetized app domains” feature)
  - **Brand Impact:** **High** (domains appear “active” but apps still fail)
- **Technical Classification:**
  - **Category:** Bug
  - **Component:** Cloud API managed domains + CORS origin derivation
  - **Complexity:** **Moderate effort**
- **Resource Requirements:**
  - **Required Expertise:** Cloudflare domain lifecycle, Cloud managed domain DB schema, CORS configuration pipeline
  - **Dependencies:** Domain status polling + DB syncStatus contract
  - **Estimated Effort:** **3/5**
- **Recommended Priority:** **P0**
- **Specific Actionable Next Steps:**
  1. Patch `/domains/sync` to set `verified=true` when status transitions to active and zone is ready.
  2. Add a regression test: domain purchased while the zone is still pending → later sync → `verified=true` and the origin appears in `listVerifiedAppOrigins`.
  3. Add an operational dashboard indicator: “Active but not verified” with remediation steps.
- **Potential Assignees:** **standujar**, **NubsCarson**
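
The intended sync behavior and the regression test from the steps above can be sketched with a pure transition function. The `ManagedDomain` shape and `ZoneStatus` values are assumptions, "active and zone is ready" is collapsed into a single reported status, and this `listVerifiedAppOrigins` is a simplified stand-in for the real function of that name.

```typescript
type ZoneStatus = "pending" | "active" | "failed"; // assumed status values

interface ManagedDomain {
  hostname: string;
  status: ZoneStatus;
  verified: boolean;
}

// Given the stored record and the status reported by the provider, return
// the record that the sync endpoint should persist.
function applySync(domain: ManagedDomain, reported: ZoneStatus): ManagedDomain {
  // The flag only moves forward in this sketch: an active zone becomes
  // verified, and an already verified domain is left as-is. Revocation on
  // failure would need its own explicit rule.
  const verified = domain.verified || reported === "active";
  return { ...domain, status: reported, verified };
}

// Simplified stand-in: only verified domains contribute CORS origins.
function listVerifiedAppOrigins(domains: ManagedDomain[]): string[] {
  return domains.filter((d) => d.verified).map((d) => `https://${d.hostname}`);
}
```

A regression test then amounts to: start from a pending, unverified record, apply a sync that reports `active`, and assert that the origin now appears in the list.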

---

### 4) Cloud Domains: Domain purchase flow can consume credits without refund on DB write failure
- **Issue Title & ID:** `cloud/apps/api: domain buy is not atomic vs credits; failure after registrar success can strand credits` (PR elizaos/eliza **#7376**, needs follow-up issue)
- **Current Status:** **Open**
- **Impact Assessment:**
  - **User Impact:** **High** (direct monetary/credit loss)
  - **Functional Impact:** **Partial** (feature works but with unacceptable failure behavior)
  - **Brand Impact:** **High**
- **Technical Classification:**
  - **Category:** Bug / Reliability
  - **Component:** Cloud billing + managed domains purchase orchestration
  - **Complexity:** **Complex solution** (distributed transaction problem)
- **Resource Requirements:**
  - **Required Expertise:** Billing ledger, compensating transactions, idempotency keys, Cloudflare registrar API
  - **Dependencies:** Billing service refund semantics; registrar purchase idempotency
  - **Estimated Effort:** **4/5**
- **Recommended Priority:** **P0**
- **Specific Actionable Next Steps:**
  1. Add idempotency key per purchase attempt and persist “purchase intent” before charging.
  2. If registrar succeeds and DB fails, execute compensating refund OR enqueue “reconcile purchase” job to complete DB state and avoid double-charge.
  3. Add test harness for simulated DB failure post-registrar and verify credit outcome.
- **Potential Assignees:** **NubsCarson**, **standujar**
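
The orchestration described in the steps above might look roughly like this. The `PurchaseDeps` interface and all of its method names are hypothetical, injected so that each failure point can be simulated. Where the steps allow either a refund or a reconcile job after a post-registrar DB failure, this sketch picks the reconcile job, on the assumption that the domain has really been bought at that point.

```typescript
// Hypothetical dependencies, injected so failures can be simulated in tests.
interface PurchaseDeps {
  saveIntent(key: string, domain: string): Promise<void>;
  debit(key: string, credits: number): Promise<void>;
  refund(key: string, credits: number): Promise<void>;
  registerDomain(key: string, domain: string): Promise<void>;
  saveDomainRecord(domain: string): Promise<void>;
  enqueueReconcile(key: string): Promise<void>;
}

type PurchaseOutcome = "completed" | "refunded" | "pending_reconcile";

async function purchaseDomain(
  deps: PurchaseDeps,
  key: string, // idempotency key for this purchase attempt
  domain: string,
  credits: number,
): Promise<PurchaseOutcome> {
  // Persist the intent before charging so a crash at any later step
  // leaves a record that a recovery job can act on.
  await deps.saveIntent(key, domain);
  await deps.debit(key, credits);

  try {
    await deps.registerDomain(key, domain);
  } catch {
    // Nothing was bought: compensate the debit.
    await deps.refund(key, credits);
    return "refunded";
  }

  try {
    await deps.saveDomainRecord(domain);
    return "completed";
  } catch {
    // The registrar already succeeded. Keep the charge and let a job
    // complete the DB state instead of refunding a domain the user owns.
    await deps.enqueueReconcile(key);
    return "pending_reconcile";
  }
}
```

Passing the same key to the debit, registrar, and refund calls is what would let each of them deduplicate retries, provided the underlying services honor it.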

---

### 5) Cloud Container Control Plane: Internal auth becomes a no-op when token env var missing
- **Issue Title & ID:** `cloud/services/container-control-plane: requireInternalToken no-op when CONTAINER_CONTROL_PLANE_TOKEN unset` (PR elizaos/eliza **#7376**, needs follow-up issue)
- **Current Status:** **Open (security hardening)**
- **Impact Assessment:**
  - **User Impact:** **Medium → High** (depends on deployment exposure; could become Critical if publicly reachable)
  - **Functional Impact:** **Partial** (service works but may be unsecured)
  - **Brand Impact:** **High** (security posture)
- **Technical Classification:**
  - **Category:** Security
  - **Component:** Cloud container control plane service + API forwarder
  - **Complexity:** **Moderate effort**
- **Resource Requirements:**
  - **Required Expertise:** Cloud deployment, internal service auth patterns, secret management
  - **Dependencies:** Deployment pipeline must ensure token is always set
  - **Estimated Effort:** **3/5**
- **Recommended Priority:** **P0**
- **Specific Actionable Next Steps:**
  1. Make the token **mandatory in production**: fail fast on boot if the env var is missing (or gate by `NODE_ENV`/`DEPLOY_ENV`).
  2. Ensure Cloud API forwarder also enforces internal auth expectations and logs misconfig loudly.
  3. Add CI/deploy checks: required env var present for prod/staging.
- **Potential Assignees:** **0xSolace** (hardening), **standujar** (Cloud), **NubsCarson**
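
A fail-fast boot check along the lines of step 1 could look like the sketch below. `resolveInternalToken` and `isAuthorized` are hypothetical helpers, not the service's existing `requireInternalToken`. The sketch also assumes a fail-closed policy in which only an explicit `development` or `test` value of `NODE_ENV` may run without a token, which is stricter than gating on production alone.

```typescript
type Env = Record<string, string | undefined>;

// Returns the token to enforce, or null when auth may be skipped.
// Throws at boot instead of silently disabling auth.
function resolveInternalToken(env: Env): string | null {
  const token = env["CONTAINER_CONTROL_PLANE_TOKEN"]?.trim();
  if (token) return token;

  // Assumption: fail closed. An unset or unknown NODE_ENV is treated like
  // production, so a missing variable can never open the service up.
  const nodeEnv = env["NODE_ENV"];
  const relaxed = nodeEnv === "development" || nodeEnv === "test";
  if (!relaxed) {
    throw new Error("CONTAINER_CONTROL_PLANE_TOKEN must be set outside development/test");
  }
  return null;
}

// Per-request check. A real implementation should compare in constant time.
function isAuthorized(expected: string | null, presented: string | undefined): boolean {
  if (expected === null) return true; // only reachable in development/test
  return presented === expected;
}
```

Calling `resolveInternalToken(process.env)` once at startup turns a missing secret into a crash loop that deploy checks will notice, rather than an unauthenticated service.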

---

### 6) Slack plugin: Unhandled Slack API error in `getUser()` can silently drop inbound messages
- **Issue Title & ID:** `@elizaos/plugin-slack: missing try/catch around users.info in event handlers causes message loss` (PR elizaos/eliza **#7375**, needs follow-up issue)
- **Current Status:** **Open** — plugin migrated and merged; failure is on critical inbound path.
- **Impact Assessment:**
  - **User Impact:** **High** (any Slack workspace with rate limits, deactivated users, transient API faults)
  - **Functional Impact:** **Yes** (messages dropped; agent appears unreliable)
  - **Brand Impact:** **High** (connector quality perception)
- **Technical Classification:**
  - **Category:** Bug / Reliability
  - **Component:** Plugin System → Slack connector (`plugins/plugin-slack/src/service.ts`)
  - **Complexity:** **Simple fix**
- **Resource Requirements:**
  - **Required Expertise:** Slack Bolt/Socket Mode event handling, connector resilience patterns
  - **Dependencies:** None
  - **Estimated Effort:** **2/5**
- **Recommended Priority:** **P1** (P0 if Slack connector is actively marketed this week)
- **Specific Actionable Next Steps:**
  1. Wrap `getUser()` calls in `try/catch`; on failure, fall back to minimal identity (userId) and continue processing.
  2. Add structured logging with rate-limit detection and safe retry/backoff (but don’t drop the event).
  3. Add unit/integration test: simulate Slack API error → message still stored + agent still replies.
  4. (Hygiene) Fix the `repository.url` field in `package.json` and remove unused deps once they are confirmed unused.
- **Potential Assignees:** **2-A-M** (connector patterns), **0xSolace** (stability), **lalalune** (repo hygiene)
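
The fallback behavior from step 1 can be sketched independently of the Slack SDK. Here `fetchProfile` is a stand-in for whatever wraps the `users.info` call (the plugin's `getUser()`), and `resolveIdentity`, `handleMessage`, and the surrounding types are hypothetical names for illustration.

```typescript
interface SlackIdentity {
  id: string;
  displayName: string;
  resolved: boolean; // false when the profile lookup failed
}

// Stand-in for the wrapper around Slack's users.info call.
type ProfileFetcher = (userId: string) => Promise<{ displayName: string }>;

async function resolveIdentity(
  userId: string,
  fetchProfile: ProfileFetcher,
  log: (message: string) => void = () => {},
): Promise<SlackIdentity> {
  try {
    const profile = await fetchProfile(userId);
    return { id: userId, displayName: profile.displayName, resolved: true };
  } catch (err) {
    // Metadata lookup failed: log it and continue with a minimal identity
    // so the inbound message is still processed.
    const reason = err instanceof Error ? err.message : String(err);
    log(`users.info failed for ${userId}: ${reason}; using fallback identity`);
    return { id: userId, displayName: userId, resolved: false };
  }
}

interface InboundEvent {
  user: string;
  text: string;
}

interface StoredMessage {
  text: string;
  author: SlackIdentity;
}

// The handler never rejects because of a profile lookup failure.
async function handleMessage(
  event: InboundEvent,
  fetchProfile: ProfileFetcher,
  store: (message: StoredMessage) => void,
): Promise<void> {
  const author = await resolveIdentity(event.user, fetchProfile);
  store({ text: event.text, author });
}
```

The `resolved` flag lets a later backfill retry the lookup without holding up the message itself.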

---

### 7) Discord Security/Ops: Active scam links around migration/bridging → need official guidance + automated enforcement
- **Issue Title & ID:** `Discord: recurring scam links targeting "ai16z→ElizaOS migration" and "BSC→Solana bridging"` (Discord incident thread **2026-05-05**, no GitHub issue yet)
- **Current Status:** **Ongoing**
- **Impact Assessment:**
  - **User Impact:** **Critical** (potential fund loss; affects broad community)
  - **Functional Impact:** **No** (not framework runtime), but impacts ecosystem safety/support load
  - **Brand Impact:** **High**
- **Technical Classification:**
  - **Category:** Security / UX (community safety)
  - **Component:** Community Ops (Discord moderation), Documentation site, Announcement channels
  - **Complexity:** **Moderate effort**
- **Resource Requirements:**
  - **Required Expertise:** Community moderation, phishing/scam patterns, Discord automod/bots, official comms
  - **Dependencies:** Final canonical migration/bridge stance + supported links list
  - **Estimated Effort:** **3/5**
- **Recommended Priority:** **P0**
- **Specific Actionable Next Steps:**
  1. Publish a canonical “Migration & Bridging” notice: **what is supported**, **what is not**, and **official links only**.
  2. Pin message in relevant channels + add to website docs/FAQ; include explicit warning: “No DMs, no airdrops, no ‘bridge now’ links.”
  3. Add/adjust Discord automod rules: block common scam domains/phrases; rate-limit new accounts posting links; rapid-ban workflow.
  4. Create a single “report scam” procedure for users; log incidents for trend analysis.
- **Potential Assignees:** **odilitime** (Community Ops/Moderator), **satsbased** (Mini Mod), **shawmakesmagic** (official comms sign-off)

---

### 8) Discord UX/Safety: Spam filter blocks legitimate URLs; workaround is backticks
- **Issue Title & ID:** `Discord: aggressive URL filtering blocks legitimate links; users forced to wrap URLs in backticks` (Discord **2026-05-03** report)
- **Current Status:** **Ongoing**
- **Impact Assessment:**
  - **User Impact:** **Medium** (hurts support/dev collaboration; increases friction)
  - **Functional Impact:** **Partial** (support workflows degraded)
  - **Brand Impact:** **Medium** (community feels “hostile” to normal sharing)
- **Technical Classification:**
  - **Category:** UX / Community Ops
  - **Component:** Discord moderation tooling/config
  - **Complexity:** **Moderate effort**
- **Resource Requirements:**
  - **Required Expertise:** Discord automod configuration, bot configuration, threat modeling
  - **Dependencies:** Must not reduce the scam protection from item 7 above
  - **Estimated Effort:** **2/5**
- **Recommended Priority:** **P2**
- **Specific Actionable Next Steps:**
  1. Implement an allowlist for common safe domains (github.com, the docs site, Discord CDN, elizaos.*).
  2. Channel-scoped rules: stricter in public channels, looser in dev/support channels with role gating.
  3. Improve user-facing feedback: when a link is blocked, DM user with reason + safe alternatives.
- **Potential Assignees:** **odilitime**, **satsbased**
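
If the allowlist ends up enforced by a custom bot rather than built-in automod rules, the host check could be sketched as below. The function name, the HTTPS-only rule, and the example hosts in the usage line are illustrative assumptions. The real list is a moderation decision. The main point is to match on the parsed hostname rather than on substrings, so lookalikes such as `github.com.evil.io` do not pass.

```typescript
// Returns true only for HTTPS URLs whose host is an allowlisted domain
// or a subdomain of one.
function isAllowedUrl(raw: string, allowedHosts: readonly string[]): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "https:") return false;

  const host = url.hostname.toLowerCase();
  // Exact match or dot-separated suffix match; plain substring checks would
  // accept hosts that merely contain an allowed name.
  return allowedHosts.some((allowed) => host === allowed || host.endsWith(`.${allowed}`));
}
```

For example, `isAllowedUrl("https://docs.github.com/en", ["github.com"])` passes through the suffix rule, while `https://github.com.evil.io/login` is rejected.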

---

## Highest-Priority Focus (Top 5–10)
1. **P0:** Cloud app chat auth failures return **500** instead of **401/403** (PR **#7376** follow-up).
2. **P0:** Cloud app chat credit reconciliation can enable **free inference** or **charge-without-response** (PR **#7376** follow-up).
3. **P0:** Cloud domains sync never sets `verified=true` → persistent CORS failures (PR **#7376** follow-up).
4. **P0:** Cloud domain purchase can strand credits on partial failure (PR **#7376** follow-up).
5. **P0:** Container control-plane internal auth is a no-op if token env missing (PR **#7376** follow-up).
6. **P0:** Discord scam-link wave around migration/bridging → publish canonical guidance + automod enforcement.
7. **P1:** Slack connector can silently drop messages on `users.info` error (PR **#7375** follow-up).
8. **P2:** Discord spam filter blocks legitimate URLs; refine allowlist + channel policies.

---

## Patterns / Themes Suggesting Deeper Issues
- **Merged high-risk Cloud changes without blocking on P1 findings:** Multiple monetization-path correctness and auth-contract bugs appear after merge, indicating insufficient “must-fix-before-merge” gates for billing/auth/CORS.
- **Distributed transaction problems (billing + external providers):** Domain purchase and chat reconciliation both need explicit idempotency + compensating actions; current patterns risk stranded state and disputes.
- **Connector reliability gaps on inbound message path:** Slack (and historically Telegram) show a recurring class of issues: unhandled upstream API failures causing **silent message loss**.
- **Security pressure shifting to community tooling:** Active scam attempts plus aggressive URL blocking suggest the project needs a more formalized, layered security posture (docs + automod + role gating), not ad-hoc workarounds.

---

## Process Improvement Recommendations
1. **Introduce “Billing/Auth/CORS release gates”:** Any PR touching monetization, auth, or CORS must pass a checklist and have at least one reviewer with domain expertise; P1 findings must block merge.
2. **Add chaos/failure-injection tests for monetized chat:**
   - DB outage during reconcile, provider timeout after debit, stream close timing, retry idempotency.
3. **Standardize connector resilience contract:**
   - “Never drop inbound message due to upstream metadata fetch failure.”
   - Require try/catch around external API calls inside event handlers + fallback identity behavior.
4. **Create a single canonical “Token Migration & Bridging” doc page** and require moderators/helpers to link only that page; rotate and pin it during high-scam periods.
5. **Discord automod tiering:** role-gated link posting in public channels; allowlist in dev channels; explicit block reason messaging to reduce support churn.