## Issue Triage — 2026-05-07

### 1) Cloud monetized app chat: auth errors surfaced as HTTP 500 (should be 401/403)
- **Issue Title & ID:** `cloud/apps api: /api/v1/apps/:id/chat returns 500 on auth failures for API-key callers` (Follow-up from **PR #7376**; needs issue filed)
- **Current Status:** **Untracked** (code merged; regression risk remains)
- **Impact Assessment:**
  - **User Impact:** **High** (all app-scoped API-key clients; common integration pattern)
  - **Functional Impact:** **Yes** (breaks correct client behavior/retries; misleads debugging)
  - **Brand Impact:** **High** (looks like unstable Cloud API)
- **Technical Classification:**
  - **Category:** Bug
  - **Component:** Cloud API / Auth middleware / App Chat endpoint
  - **Complexity:** Moderate effort
- **Resource Requirements:**
  - **Required Expertise:** Cloud API (Hono/Workers & Node), auth error mapping, API contract hygiene
  - **Dependencies:** None, but coordinate with the credit reconciliation fixes in the same route (item 2 below)
  - **Estimated Effort:** **3/5**
- **Recommended Priority:** **P0**
- **Specific Actionable Next Steps:**
  1. Create a tracking GitHub issue with minimal repro (invalid API key → expect 401, currently 500).
  2. Refactor the route so `requireAuthOrApiKeyWithOrg` executes **outside** `Promise.all`, and/or add an explicit `catch` that maps `AuthenticationError`/`ForbiddenError` to 401/403 (see the sketch after this entry).
  3. Add unit/integration tests asserting status codes for: no auth, invalid API key, wrong org, valid key.
  4. Validate that error bodies include stable `code` fields (e.g., `not_authenticated`, `forbidden`) for client handling.
- **Potential Assignees:** **NubsCarson** (PR owner), **standujar** (Cloud auth), **0xSolace** (cloud migration/hotfixes)
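
A minimal sketch of step 2, assuming the Hono stack noted under Required Expertise. `requireAuthOrApiKeyWithOrg` comes from the PR discussion; the error classes, loaders, and response shapes are hypothetical placeholders:

```ts
import { Hono } from 'hono';

// Assumed names: only `requireAuthOrApiKeyWithOrg` appears in the PR notes.
// The error classes and loaders below are illustrative stand-ins.
declare class AuthenticationError extends Error {}
declare class ForbiddenError extends Error {}
declare function requireAuthOrApiKeyWithOrg(c: unknown): Promise<{ orgId: string }>;
declare function loadApp(c: unknown): Promise<unknown>;
declare function loadCredits(auth: { orgId: string }): Promise<unknown>;

const app = new Hono();

app.post('/api/v1/apps/:id/chat', async (c) => {
  let auth: { orgId: string };
  try {
    // Await auth on its own, not inside Promise.all with other setup work,
    // so a rejection here is attributable and can be mapped to 401/403.
    auth = await requireAuthOrApiKeyWithOrg(c);
  } catch (err) {
    if (err instanceof AuthenticationError) {
      return c.json({ error: { code: 'not_authenticated' } }, 401);
    }
    if (err instanceof ForbiddenError) {
      return c.json({ error: { code: 'forbidden' } }, 403);
    }
    throw err; // anything else is a genuine 500
  }
  // Parallel setup work runs only after auth has resolved.
  const [appRecord, credits] = await Promise.all([loadApp(c), loadCredits(auth)]);
  // ... rest of the handler uses appRecord/credits
  return c.json({ ok: true });
});
```

The key point is that the auth promise is awaited alone: once it sits inside `Promise.all` with other setup calls, a rejection is indistinguishable from any other failure and falls through to the generic 500 handler.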

---

### 2) Cloud monetized app chat: credit reconciliation failure can refund delivered streams or overcharge users
- **Issue Title & ID:** `cloud/apps api: streaming/non-stream credit reconcile failures cause free inference or charged-without-response` (Follow-up from **PR #7376**; needs issue filed)
- **Current Status:** **Untracked** (code merged; financial correctness risk)
- **Impact Assessment:**
  - **User Impact:** **Critical** (billing/credits correctness affects all monetized usage)
  - **Functional Impact:** **Partial** (chat may work, but billing is incorrect/unreliable)
  - **Brand Impact:** **Critical** (monetization trust & disputes)
- **Technical Classification:**
  - **Category:** Bug / Reliability
  - **Component:** Cloud billing/credits + app chat endpoint
  - **Complexity:** Complex solution (must handle streaming lifecycle + transactional semantics)
- **Resource Requirements:**
  - **Required Expertise:** Payments/billing semantics, streaming response handling, idempotency, DB transaction patterns
  - **Dependencies:** Depends on how credits are reserved/debited (billing service implementation); may require schema/idempotency keys
  - **Estimated Effort:** **5/5**
- **Recommended Priority:** **P0**
- **Specific Actionable Next Steps:**
  1. File issue capturing both failure modes:
     - Streaming: response delivered then reconcile fails → full refund
     - Non-stream: provider response obtained but reconcile fails → user charged, no response, no refund
  2. Introduce an **idempotent “usage record”** (or debit ledger entry) keyed by request id; reconcile should be retryable (see the sketch after this entry).
  3. For streaming: separate “delivery complete” from “billing complete”; on reconcile failure, **do not** set cost to zero; instead, store a “pending reconcile” job.
  4. For non-stream: if provider succeeded but reconcile fails, return the provider response and mark billing as pending (or attempt immediate safe refund).
  5. Add chaos tests that inject DB failures during reconcile and assert invariants (no free inference unless explicitly policy).
- **Potential Assignees:** **NubsCarson**, **standujar**, plus Cloud billing owner(s) if distinct (loop in maintainers of `cloud/packages/billing`)
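
A hedged sketch of the ledger/idempotency-key pattern from step 2 (and the “pending reconcile” state from step 3). Every name here (statuses, columns, store interface) is illustrative, not the billing service's actual schema:

```ts
// Illustrative usage-record/ledger shape; all names are assumptions.
interface UsageRecord {
  requestId: string;            // idempotency key, fixed per chat request
  orgId: string;
  status: 'reserved' | 'settled' | 'pending_reconcile';
  reservedCredits: number;      // debited up front, before the provider call
  finalCredits: number | null;  // set once actual usage is known
}

// Assumed store interface; the real one lives with the credits/billing code.
interface UsageStore {
  // e.g. UPDATE ... SET status='settled', finalCredits=? WHERE requestId=? AND status<>'settled'
  settleIfUnsettled(requestId: string, finalCredits: number): Promise<boolean>;
  markPendingReconcile(requestId: string): Promise<void>;
}

async function reconcile(db: UsageStore, requestId: string, actualCredits: number): Promise<void> {
  try {
    // The conditional update makes retries safe: a second attempt after a
    // crash is a no-op instead of a double charge or a spurious refund.
    const updated = await db.settleIfUnsettled(requestId, actualCredits);
    if (!updated) return; // already settled by a previous attempt
  } catch (err) {
    // Step 3: never zero out the cost on failure; park it for a retry job.
    await db.markPendingReconcile(requestId);
    // A background worker retries pending_reconcile rows with backoff.
  }
}
```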

---

### 3) Cloudflare managed domain sync: domains never become “verified” → CORS origins remain empty
- **Issue Title & ID:** `cloud/apps api: /domains/sync doesn’t set verified=true after zone provisioning` (Follow-up from **PR #7376**; needs issue filed)
- **Current Status:** **Untracked**
- **Impact Assessment:**
  - **User Impact:** **High** (anyone using managed domains for apps)
  - **Functional Impact:** **Yes** (CORS/origin verification blocks app traffic)
  - **Brand Impact:** **High** (domain feature appears broken)
- **Technical Classification:**
  - **Category:** Bug
  - **Component:** Cloud managed domains / CORS origin resolution
  - **Complexity:** Simple fix
- **Resource Requirements:**
  - **Required Expertise:** Cloud domains service, Cloudflare status model, CORS/origin plumbing
  - **Dependencies:** None
  - **Estimated Effort:** **2/5**
- **Recommended Priority:** **P0**
- **Specific Actionable Next Steps:**
  1. Add a regression test for the state transition: purchased (pending) → active should set `verified=true`.
  2. Patch `/domains/sync` to set `verified=true` when Cloudflare reports the zone as active/ready (see the sketch after this entry).
  3. Confirm `listVerifiedAppOrigins` uses the same `verified` flag and doesn’t have additional hidden gating.
- **Potential Assignees:** **NubsCarson**, **0xSolace** (cloud hotfix cadence)
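
An illustrative shape for the step 2 patch. The status values mirror Cloudflare's public zone-status vocabulary, but the store interface and field names are assumptions:

```ts
// Subset of Cloudflare's zone statuses; extend to match the real API client.
type CloudflareZoneStatus = 'initializing' | 'pending' | 'active' | 'moved';

function isZoneVerified(status: CloudflareZoneStatus): boolean {
  // Cloudflare reports 'active' once the zone is provisioned and serving.
  return status === 'active';
}

// Hypothetical store; the actual update lives in the /domains/sync handler.
interface DomainStore {
  updateDomain(id: string, patch: { cloudflareStatus: string; verified: boolean }): Promise<void>;
}

async function syncDomain(db: DomainStore, domainId: string, status: CloudflareZoneStatus): Promise<void> {
  await db.updateDomain(domainId, {
    cloudflareStatus: status,
    verified: isZoneVerified(status), // the flag that previously never flipped to true
  });
}
```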

---

### 4) Slack connector: unguarded `getUser()` can throw and silently drop inbound messages
- **Issue Title & ID:** `plugin-slack: missing try/catch around users.info in message handlers causes silent message loss` (Follow-up from **PR #7375**; needs issue filed)
- **Current Status:** **Untracked** (plugin merged into monorepo; production risk for Slack users)
- **Impact Assessment:**
  - **User Impact:** **Medium → High** (Slack connector users; drops are intermittent & hard to diagnose)
  - **Functional Impact:** **Yes** (core connector behavior—messages disappear)
  - **Brand Impact:** **High** (connector perceived unreliable)
- **Technical Classification:**
  - **Category:** Bug / Reliability
  - **Component:** Plugin System → `@elizaos/plugin-slack`
  - **Complexity:** Simple fix
- **Resource Requirements:**
  - **Required Expertise:** Slack API error handling, connector robustness, event handler reliability
  - **Dependencies:** None
  - **Estimated Effort:** **2/5**
- **Recommended Priority:** **P1**
- **Specific Actionable Next Steps:**
  1. Add `try/catch` around the Slack `users.info` calls in `handleMessage` and `handleAppMention` (see the sketch after this entry).
  2. Define fallback behavior when user lookup fails: fall back to the raw `user` id, still store the memory, and still respond if permitted.
  3. Add tests simulating Slack API failures (rate limit / network) and asserting message is still processed (or at least stored with an error marker).
- **Potential Assignees:** **2-A-M** (Slack migration), contributors familiar with connector patterns (e.g., **odilitime** for ops review)
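
A minimal sketch of the guard from steps 1 and 2, using `users.info` from `@slack/web-api`; the fallback display-name logic is illustrative rather than the plugin's actual code:

```ts
import { WebClient } from '@slack/web-api';

// If the lookup fails (rate limit, network blip), degrade to the raw user id
// instead of throwing inside the message handler and dropping the message.
async function resolveUserName(client: WebClient, userId: string): Promise<string> {
  try {
    const res = await client.users.info({ user: userId });
    return res.user?.real_name ?? res.user?.name ?? userId;
  } catch (err) {
    console.warn(`users.info failed for ${userId}; falling back to id`, err);
    return userId;
  }
}
```

With this shape, `handleMessage`/`handleAppMention` always get *some* identity string back, so the memory write and the response path can proceed regardless of Slack API health.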

---

### 5) Container control plane auth: internal-token enforcement may be a no-op when env var is absent
- **Issue Title & ID:** `cloud/container-control-plane: requireInternalToken effectively disabled if token env var not set` (Follow-up from **PR #7376**; needs issue filed)
- **Current Status:** **Untracked**
- **Impact Assessment:**
  - **User Impact:** **Medium** (Cloud operators; depends on exposure)
  - **Functional Impact:** **Partial** (service works but security posture weakened)
  - **Brand Impact:** **High** (security expectations for infra plane)
- **Technical Classification:**
  - **Category:** Security
  - **Component:** Cloud services → container control plane
  - **Complexity:** Moderate effort
- **Resource Requirements:**
  - **Required Expertise:** Service-to-service auth, deployment environment validation, secure defaults
  - **Dependencies:** Deployment pipeline changes may be required to ensure token is always set
  - **Estimated Effort:** **3/5**
- **Recommended Priority:** **P1**
- **Specific Actionable Next Steps:**
  1. Change the default to **fail closed**: if `CONTAINER_CONTROL_PLANE_TOKEN` is missing, refuse to start (or refuse all requests); see the sketch after this entry.
  2. Add startup healthcheck that asserts token configured in non-dev environments.
  3. Add a minimal integration test ensuring requests without token are rejected.
- **Potential Assignees:** **standujar**, **0xSolace**, **NubsCarson**
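
A fail-closed sketch for step 1. The env var name comes from the triage note; the dev-mode escape hatch is an assumption to confirm against the deployment pipeline:

```ts
// Startup guard: refuse to boot without the token outside local development.
const token = process.env.CONTAINER_CONTROL_PLANE_TOKEN;
const isDev = process.env.NODE_ENV === 'development'; // assumed escape hatch

if (!token && !isDev) {
  console.error('CONTAINER_CONTROL_PLANE_TOKEN is not set; refusing to start.');
  process.exit(1);
}

function requireInternalToken(headerValue: string | undefined): boolean {
  // With the startup guard above, `token` is always defined outside dev,
  // so this check can no longer degrade into an allow-all no-op.
  return Boolean(token) && headerValue === token;
}
```

A production version should also compare tokens in constant time (e.g., Node's `crypto.timingSafeEqual`) rather than with `===`.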

---

### 6) x402 payment methods now default: needs validation, docs, and threat-modeling
- **Issue Title & ID:** `payments: elizaOS + DegenAI added as default payment methods for x402 routes — validate defaults & document configuration` (Ops follow-up; needs issue filed)
- **Current Status:** **Announced in Discord** (implementation assumed done; verification not documented)
- **Impact Assessment:**
  - **User Impact:** **High** (payments touch many downstream builders)
  - **Functional Impact:** **Partial** (could affect monetization flows; misconfig breaks payments)
  - **Brand Impact:** **High** (payments are trust-sensitive)
- **Technical Classification:**
  - **Category:** UX / Documentation / Security-hardening
  - **Component:** Payments / Cloud routes / Plugin integration
  - **Complexity:** Moderate effort
- **Resource Requirements:**
  - **Required Expertise:** Payments integration, API config, security review (signing/authorization), documentation
  - **Dependencies:** Depends on where x402 routes live post-repo consolidation
  - **Estimated Effort:** **3/5**
- **Recommended Priority:** **P2**
- **Specific Actionable Next Steps:**
  1. Add an end-to-end smoke test (or scripted curl flow) proving the default methods work on a clean config (see the sketch after this entry).
  2. Document: enabling/disabling default methods, environment variables, expected callbacks/webhooks, failure modes.
  3. Perform a quick threat review: replay prevention, signature validation, and safe logging (no secrets).
- **Potential Assignees:** **odilitime** (payment infra), **shawmakesmagic** (core/integration), **NubsCarson** (Cloud monetization touchpoints)
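
A hedged smoke-test sketch for step 1, assuming the x402 convention that an unpaid request returns HTTP 402 with an `accepts` list of payment requirements; the route URL and the exact response shape need verifying against the deployed middleware:

```ts
// Placeholder URL; point at a real x402-gated route in the target environment.
const ROUTE = 'https://example.invalid/api/paid-endpoint';

async function smokeTestDefaultMethods(): Promise<void> {
  const res = await fetch(ROUTE);
  if (res.status !== 402) {
    throw new Error(`expected 402 Payment Required, got ${res.status}`);
  }
  const body = await res.json();
  // Assumed shape: x402 402 responses advertise accepted payment options.
  const assets = (body.accepts ?? []).map((a: { asset?: string }) => a.asset);
  console.log('advertised payment assets:', assets);
  // TODO: assert the elizaOS and DegenAI token addresses appear here
  // (addresses intentionally omitted; take them from the deployed config).
}

smokeTestDefaultMethods().catch((err) => {
  console.error('x402 default-methods smoke test failed:', err);
  process.exit(1);
});
```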

---

### 7) Repository consolidation + migration off Vercel: missing “single source of truth” docs for ElizaCloud
- **Issue Title & ID:** `docs: ElizaCloud consolidated into github.com/elizaOS/eliza — add navigation + ownership map + deployment guide` (Needs issue filed)
- **Current Status:** **Reported in Discord**; user confusion persists (“where is ElizaCloud now?”)
- **Impact Assessment:**
  - **User Impact:** **Medium** (new contributors/builders onboarding)
  - **Functional Impact:** **No** (not runtime-blocking, but slows adoption)
  - **Brand Impact:** **Medium** (appears disorganized after migration)
- **Technical Classification:**
  - **Category:** Documentation
  - **Component:** Repo structure, Cloud deployment, contribution workflow
  - **Complexity:** Simple fix
- **Resource Requirements:**
  - **Required Expertise:** Repo knowledge, Cloud deploy knowledge, documentation writing
  - **Dependencies:** None
  - **Estimated Effort:** **2/5**
- **Recommended Priority:** **P2**
- **Specific Actionable Next Steps:**
  1. Add top-level `cloud/README.md` (or expand existing) describing: services, local dev, deploy, where configs live, and what replaced Vercel.
  2. Add a “Where did ElizaCloud go?” section in the main README + Discord pinned message link.
  3. Provide a minimal “container deployment” guide aligned with the new cheaper container approach.
- **Potential Assignees:** **shawmakesmagic**, **0xSolace**, **standujar**

---

### 8) Community deployment questions unanswered: self-hosting cost guidance + Mac mini feasibility + Hyperfy integration status
- **Issue Title & ID:** 
  - `docs: cost expectations for running Eliza as a Twitter bot + hardware guidance (Mac mini)` (Needs issue filed)
  - `integration: does elizaOS still connect to Hyperfy?` (Needs issue filed or discussion thread)
- **Current Status:** **Unanswered on Discord** (coders channel)
- **Impact Assessment:**
  - **User Impact:** **Medium** (recurring onboarding questions)
  - **Functional Impact:** **Partial** (blocks some adoption decisions)
  - **Brand Impact:** **Medium** (support responsiveness perception)
- **Technical Classification:**
  - **Category:** Documentation / UX
  - **Component:** Self-hosting, integrations
  - **Complexity:** Simple fix
- **Resource Requirements:**
  - **Required Expertise:** Ops/self-hosting experience, integration ownership knowledge
  - **Dependencies:** Need authoritative answer on Hyperfy connector status
  - **Estimated Effort:** **1–2/5**
- **Recommended Priority:** **P3**
- **Specific Actionable Next Steps:**
  1. Publish a short “Sizing & Cost” doc: CPU/RAM baseline, typical cloud instance sizes, expected token/LLM cost drivers, storage notes.
  2. Add a “Supported deployments” matrix: Linux server, Windows, macOS (incl. Mac mini), and limitations (GPU optionality, ARM notes).
  3. Confirm Hyperfy connector status (supported/deprecated/experimental) and point to setup steps or deprecation notice.
- **Potential Assignees:** **shawmakesmagic** (project direction), **odilitime** (ops/community), volunteer contributor (**tomy.0315** offered dev help)

---

## Highest-Priority Summary (Top 5–10 to address immediately)
1. **P0:** Cloud app chat auth failures incorrectly returned as **500** (PR #7376 follow-up).
2. **P0:** Cloud app chat **credit reconciliation** failure modes (free inference / charged-without-response) (PR #7376 follow-up).
3. **P0:** Managed domain sync never setting **verified=true** → broken CORS/origins (PR #7376 follow-up).
4. **P1:** Slack connector unguarded `getUser()` → **silent inbound message loss** (PR #7375 follow-up).
5. **P1:** Container control plane internal auth may be effectively **disabled by default** if env var missing (PR #7376 follow-up).
6. **P2:** x402 default payment methods rollout: add **docs + smoke tests + security review**.
7. **P2:** Post-migration documentation: “ElizaCloud is now in eliza repo” + container hosting guide.
8. **P3:** Deployment cost/hardware guidance + Hyperfy integration status clarification.

---

## Patterns / Themes Indicating Deeper Architectural Issues
- **Critical-path changes merged with known P1/P0 findings:** Multiple high-impact items surfaced via review notes on PRs that still landed, implying gaps in merge gating for monetization/auth/connectors.
- **Reliability edge cases around “async boundaries”:** The most severe problems are where requests cross boundaries (Promise.all auth, streaming write-close vs reconcile, external API lookups inside event handlers).
- **Operational clarity lagging behind rapid consolidation:** Repo migration and infrastructure shifts (Vercel → containers) are moving faster than the documentation/support surface, creating repeated user confusion and redundant Discord Q&A.

---

## Process Recommendations (to prevent recurrence)
1. **Introduce “P0/P1 review gate” labels**: if a PR review flags P0/P1, require explicit resolution or a tracked follow-up issue linked in the PR before merge.
2. **Add contract tests for Cloud endpoints**: especially status-code correctness (401/403/429/5xx), plus billing invariants under fault injection (see the sketch after this list).
3. **Connector reliability checklist**: mandate “external API call isolation” (try/catch + fallback) for inbound message handlers across Slack/Telegram/Discord.
4. **Post-migration documentation SLA**: for any infra consolidation, require a single canonical doc update + Discord pinned link in the same week.
5. **Billing/credits idempotency standard**: adopt a uniform ledger/idempotency-key pattern for all monetized endpoints, with replay-safe reconciliation jobs.
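
As a concrete starting point for recommendation 2, a status-code contract test could look like the sketch below; `bun:test` is an assumed runner, and the endpoint, keys, and expected codes are placeholders to align with the real API contract:

```ts
import { describe, expect, it } from 'bun:test';

const BASE = process.env.TEST_BASE_URL ?? 'http://localhost:3000';

// Placeholder scenarios; extend with rate-limit (429) and fault-injection cases.
const cases = [
  { name: 'no auth', headers: {}, expected: 401 },
  { name: 'invalid API key', headers: { Authorization: 'Bearer invalid' }, expected: 401 },
  { name: 'valid key, wrong org', headers: { Authorization: 'Bearer other-org-key' }, expected: 403 },
];

describe('app chat status-code contract', () => {
  for (const c of cases) {
    it(`returns ${c.expected} when ${c.name}`, async () => {
      const res = await fetch(`${BASE}/api/v1/apps/test-app/chat`, {
        method: 'POST',
        headers: { 'content-type': 'application/json', ...c.headers },
        body: JSON.stringify({ messages: [] }),
      });
      expect(res.status).toBe(c.expected);
    });
  }
});
```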