# Issue Triage — 2026-01-10

## 1) Critical / High Priority Issues

### 1. Discord plugin release blocked by publishing pipeline failure — elizaos-plugins/plugin-discord #40
- **Current Status:** Open, under active investigation (release v1.3.4 blocked)
- **Impact Assessment**
  - **User Impact:** High (all Discord plugin users waiting on fixes)
  - **Functional Impact:** Yes (blocks delivering critical compatibility fixes)
  - **Brand Impact:** High (signals “project can’t ship releases”)
- **Technical Classification**
  - **Category:** Bug / DevOps
  - **Component Affected:** Plugin System (plugin-discord), CI/CD publishing
  - **Complexity:** Moderate effort (could become Complex if registry/auth is misconfigured)
- **Resource Requirements**
  - **Required Expertise:** npm publishing, GitHub Actions, semantic-release/changesets, package provenance/tags
  - **Dependencies:** Unblocks delivery of Discord compatibility fixes to users (see issue 4, the v1.7.0 serverId migration)
  - **Estimated Effort:** 3/5
- **Recommended Priority:** **P0**
- **Actionable Next Steps**
  1. Reproduce publish in a clean environment (dry-run release + verify package metadata).
  2. Inspect CI logs for auth/token scope, tag collisions, or monorepo versioning mismatch.
  3. Validate package name, access level (public), and dist-tag strategy (latest/next).
  4. Add a “publish preflight” CI job (npm whoami, npm view, version check) to fail fast.
  5. Once fixed, cut v1.3.4 immediately and announce the upgrade path for users on core v1.7.0.
- **Potential Assignees:** **Stan** (infra/ops), **Odilitime** (Discord/plugin context), **cjft** (cloud/platform ops)
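
The preflight job in step 4 can be sketched as a pure check that runs before `npm publish`. This is a minimal sketch, not the plugin's actual CI: the `publishPreflight` helper and its inputs are hypothetical, and it assumes an earlier CI step has already fetched registry data (for example with `npm view <pkg> versions --json` and `npm view <pkg> dist-tags --json`).

```typescript
// Inputs a CI step would gather from the registry; passed in directly so the
// check can be tested offline.
interface PreflightInput {
  packageName: string;
  localVersion: string; // version in package.json about to be published
  published: string[]; // versions already on the registry
  distTags: Record<string, string>; // e.g. { latest: "1.3.3" }
  targetTag: "latest" | "next"; // dist-tag this release will receive
}

// Compares only major.minor.patch; prerelease suffixes are ignored.
// A real job should use a full semver implementation instead.
function compareCore(a: string, b: string): number {
  const pa = a.split("-")[0].split(".").map(Number);
  const pb = b.split("-")[0].split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Returns a list of problems; an empty list means it is safe to publish.
function publishPreflight(input: PreflightInput): string[] {
  const { packageName, localVersion, published, distTags, targetTag } = input;
  if (!/^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/.test(localVersion)) {
    return [`invalid version string: ${localVersion}`];
  }
  const errors: string[] = [];
  if (published.includes(localVersion)) {
    errors.push(`${packageName}@${localVersion} is already published; bump the version`);
  }
  if (targetTag === "latest" && localVersion.includes("-")) {
    errors.push(`prerelease ${localVersion} should be published under "next", not "latest"`);
  }
  const current = distTags[targetTag];
  if (current !== undefined && compareCore(localVersion, current) < 0) {
    errors.push(`${localVersion} is older than ${targetTag}=${current}; refusing to move the tag backwards`);
  }
  return errors;
}
```

If the list is empty, the job would continue to `npm publish --tag <targetTag>`; otherwise it fails before touching the registry. Token and scope problems are still best caught separately with `npm whoami`.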

---

### 2. Telegram plugin crashes on certain images — elizaos-plugins/plugin-telegram #23
- **Current Status:** Open (newly identified), not yet patched
- **Impact Assessment**
  - **User Impact:** Medium→High (Telegram is a common entry plugin; crash breaks bots)
  - **Functional Impact:** Yes (hard crash on a normal user action: sending a photo)
  - **Brand Impact:** High (bots crashing is highly visible)
- **Technical Classification**
  - **Category:** Bug
  - **Component Affected:** Plugin System (plugin-telegram), media parsing pipeline
  - **Complexity:** Moderate effort
- **Resource Requirements**
  - **Required Expertise:** Telegram Bot API update types, Node/Bun runtime, image/file handling
  - **Dependencies:** None, but should align with any shared message schema expectations
  - **Estimated Effort:** 3/5
- **Recommended Priority:** **P0**
- **Actionable Next Steps**
  1. Add a minimal reproduction test fixture: “photo” message variant that triggers `TypeError`.
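  2. Patch parsing to handle photos delivered as `photo` (an array of `PhotoSize` variants) and images sent as a `document`, guarding against missing fields such as `file_id` or the optional `file_size`.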
  3. Add runtime guards + structured logging (message subtype, sizes, mime).
  4. Release patch version and add a “known limitations” note in docs until verified broadly.
- **Potential Assignees:** **cjft** (platform integration), **Stan** (stability), plus a Telegram plugin maintainer if available
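
A defensive parser for steps 2 and 3 might look like the following. It is a sketch under assumptions: the type subset follows the public Telegram Bot API field names, but `extractImage` and `ImageRef` are hypothetical, and the sketch does not claim to reproduce the actual cause of #23; it only shows how to avoid a `TypeError` on partial photo or document payloads.

```typescript
// Subset of the Telegram Bot API types relevant to images.
interface PhotoSize {
  file_id: string;
  file_unique_id: string;
  width: number;
  height: number;
  file_size?: number; // optional in the API, so never assume it is present
}
interface TgDocument {
  file_id: string;
  file_unique_id: string;
  file_name?: string;
  mime_type?: string;
  file_size?: number;
}
interface TgMessage {
  message_id: number;
  photo?: PhotoSize[];
  document?: TgDocument;
  caption?: string;
}

type ImageRef =
  | { kind: "photo"; fileId: string; width: number; height: number; sizeBytes?: number }
  | { kind: "image-document"; fileId: string; mimeType: string; sizeBytes?: number };

// Returns a normalized image reference, or null when the message carries no usable image.
// Never throws on missing or partial fields.
function extractImage(msg: TgMessage): ImageRef | null {
  // A photo arrives as several PhotoSize variants; keep the largest valid one.
  const sizes = Array.isArray(msg.photo)
    ? msg.photo.filter((p) => p != null && typeof p.file_id === "string" && p.file_id !== "")
    : [];
  if (sizes.length > 0) {
    const best = sizes.reduce((a, b) => (b.width * b.height > a.width * a.height ? b : a));
    return { kind: "photo", fileId: best.file_id, width: best.width, height: best.height, sizeBytes: best.file_size };
  }
  // An image sent "as a file" arrives as a document; accept only image MIME types.
  const doc = msg.document;
  const mime = doc?.mime_type;
  if (doc && typeof doc.file_id === "string" && typeof mime === "string" && mime.startsWith("image/")) {
    return { kind: "image-document", fileId: doc.file_id, mimeType: mime, sizeBytes: doc.file_size };
  }
  return null;
}
```

Returning `null` rather than throwing lets the plugin log the message subtype (step 3) and skip the attachment while keeping the bot alive.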

---

### 3. elizacloud.ai/docs is down (docs outage) — (No GitHub issue logged yet)
- **Current Status:** Reported by community (docs unreachable)
- **Impact Assessment**
  - **User Impact:** High (new users blocked; support load increases)
  - **Functional Impact:** Partial (core runs, but onboarding/documentation is blocked)
  - **Brand Impact:** High (“broken docs” is a credibility hit)
- **Technical Classification**
  - **Category:** Bug / UX
  - **Component Affected:** Documentation hosting / Cloud web
  - **Complexity:** Simple to Moderate (depends on whether the cause is DNS or the deployment)
- **Resource Requirements**
  - **Required Expertise:** Web hosting, DNS/CDN, deployment pipelines
  - **Dependencies:** Blocks resolving common questions (Twitter API reqs, Discord settings, cloud usage)
  - **Estimated Effort:** 2/5
- **Recommended Priority:** **P0**
- **Actionable Next Steps**
  1. Create a tracking issue in `elizaos/docs` or the cloud repo with incident details and timestamps.
  2. Verify DNS, TLS certs, redirect rules, and deploy status; roll back if recent change caused outage.
  3. Add uptime monitoring and post a status-page link in Discord.
  4. Provide temporary fallback link to HackMD book: https://hackmd.io/@elizaos/book
- **Potential Assignees:** **Stan** (cloud ops), **jin** (docs), **cjft** (cloud/dev support)
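
For step 3, a basic uptime probe can be sketched as below, assuming a scheduler (cron, a scheduled CI job, or an external monitor) calls it every few minutes. The `probe` and `classifyProbe` names, the 2-second "slow" threshold, and the 5-second timeout are illustrative assumptions; the fetch function is injected so the logic can be tested without network access.

```typescript
type ProbeStatus = "up" | "degraded" | "down";

// Injected so tests can stub it; a thin wrapper around Node 18+'s global fetch fits this shape.
type Fetcher = (
  url: string,
  init: { signal: AbortSignal; redirect: "follow" },
) => Promise<{ status: number }>;

// Pure classification: null means the request never completed (DNS, TLS, or timeout).
function classifyProbe(httpStatus: number | null, latencyMs: number, slowMs = 2000): ProbeStatus {
  if (httpStatus === null || httpStatus >= 400) return "down";
  return latencyMs > slowMs ? "degraded" : "up";
}

async function probe(
  url: string,
  fetcher: Fetcher,
  timeoutMs = 5000,
): Promise<{ status: ProbeStatus; httpStatus: number | null; latencyMs: number }> {
  const controller = new AbortController();
  // A real fetch rejects once the signal is aborted, which is reported as "down".
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  const started = Date.now();
  let httpStatus: number | null = null;
  try {
    const res = await fetcher(url, { signal: controller.signal, redirect: "follow" });
    httpStatus = res.status;
  } catch {
    httpStatus = null;
  } finally {
    clearTimeout(timer);
  }
  const latencyMs = Date.now() - started;
  return { status: classifyProbe(httpStatus, latencyMs), httpStatus, latencyMs };
}
```

On Node 18+ the built-in `fetch` can be passed through a wrapper such as `(u, init) => fetch(u, init)`; a `"down"` result could then trigger the Discord status notice.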

---

### 4. v1.7.0 Discord “No server ID found 10” / serverId→messageServerId migration fallout — (core fixed via elizaos/eliza PR #6333; plugin release still pending)
- **Current Status:** Core patch merged; Discord plugin compatibility and release still a user pain point
- **Impact Assessment**
  - **User Impact:** High (Discord is a primary integration; multiple reports)
  - **Functional Impact:** Yes (Discord bots fail to function correctly)
  - **Brand Impact:** High (breakage after version bump)
- **Technical Classification**
  - **Category:** Bug / Compatibility
  - **Component Affected:** Core Framework + plugin-bootstrap + plugin-discord
  - **Complexity:** Moderate effort (testing across branches + release coordination)
- **Resource Requirements**
  - **Required Expertise:** Discord integration, schema migrations, release management
  - **Dependencies:** **Blocked by plugin-discord publishing issue #40** to ship fixes
  - **Estimated Effort:** 3/5
- **Recommended Priority:** **P1**
- **Actionable Next Steps**
  1. Define the supported (“blessed”) version matrix: which Discord plugin versions work with core v1.6.5 and which with v1.7.0.
  2. Add automated integration test for serverId/messageServerId paths using a mocked Discord event.
  3. Publish a short migration note: what changed, symptoms, how to fix (upgrade/downgrade guidance).
  4. After publish pipeline is fixed, cut Discord plugin release and pin in Discord announcements.
- **Potential Assignees:** **Odilitime** (primary), **Stan** (release coordination), **Shaw** (architecture review)
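
Steps 2 and 3 can be illustrated with a small compatibility shim plus a mocked-event fixture. The message shape below is hypothetical (the real ElizaOS types differ); it only assumes that `serverId` was renamed to `messageServerId`, as described above, and that older plugin builds may still send the old field during a deprecation window.

```typescript
// Hypothetical internal message shape during the migration window.
interface IncomingMessage {
  messageServerId?: string; // new field name
  serverId?: string; // legacy name, possibly still sent by older builds
  text: string;
}

let warnedLegacyServerId = false;

// Prefers the new field, falls back to the legacy one (warning once per process),
// and fails with an actionable error instead of an opaque "No server ID found".
function resolveMessageServerId(msg: IncomingMessage, warn: (m: string) => void = console.warn): string {
  if (msg.messageServerId) return msg.messageServerId;
  if (msg.serverId) {
    if (!warnedLegacyServerId) {
      warnedLegacyServerId = true;
      warn("serverId is deprecated; emit messageServerId instead");
    }
    return msg.serverId;
  }
  throw new Error(
    "message has neither messageServerId nor legacy serverId; check core/plugin-discord version compatibility",
  );
}

// Builds the message a mocked Discord guild event would produce, in the new or legacy shape.
function mockDiscordMessage(guildId: string, text: string, legacy = false): IncomingMessage {
  return legacy ? { serverId: guildId, text } : { messageServerId: guildId, text };
}
```

An integration test would then assert that both shapes resolve to the same ID and that a message with neither field fails loudly instead of silently breaking the bot.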

---

### 5. ElizaCloud app creator intermittently returns “operation failed” — (No GitHub issue referenced)
- **Current Status:** Reported by users; developers say it works but is still early-stage
- **Impact Assessment**
  - **User Impact:** Medium→High (blocks cloud onboarding and first success)
  - **Functional Impact:** Partial (core may work, but creation flow fails)
  - **Brand Impact:** High (cloud reliability perception)
- **Technical Classification**
  - **Category:** Bug / Reliability
  - **Component Affected:** Cloud API + UI workflow
  - **Complexity:** Moderate effort
- **Resource Requirements**
  - **Required Expertise:** Cloud backend, observability, frontend error handling
  - **Dependencies:** May depend on billing/credits state and new billing page rollout
  - **Estimated Effort:** 3/5
- **Recommended Priority:** **P1**
- **Actionable Next Steps**
  1. Capture failing request IDs and correlate with server logs (add client-visible error codes).
  2. Identify whether failures correlate with billing/credits, rate limits, or workspace/project creation.
  3. Add retries for known transient failures and clearer UI messaging that distinguishes “try again” errors from ones the user can act on.
  4. Add an automated smoke test: create app → run minimal agent → verify chat responds.
- **Potential Assignees:** **cjft** (cloud), **Stan** (cloud optimizations), **wtfsayo** (dev workflow improvements if tooling-related)
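
The retry policy in step 3 could take roughly this form. The error shape, the set of transient HTTP statuses, and the backoff constants are assumptions, not ElizaCloud's actual API contract.

```typescript
// Hypothetical error shape once the API exposes structured codes and request IDs (step 1).
interface ApiFailure {
  httpStatus: number;
  code?: string;
  requestId?: string;
}

// Statuses treated as transient in this sketch: timeout, rate limit, gateway errors.
const TRANSIENT_STATUS = new Set([408, 429, 502, 503, 504]);

function isTransient(f: ApiFailure): boolean {
  return TRANSIENT_STATUS.has(f.httpStatus);
}

// Capped exponential backoff: baseMs * 2^attempt, never above capMs.
function backoffMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Retries only transient failures; anything else is rethrown immediately
// so the UI can show an actionable message.
async function withRetry<T>(
  op: () => Promise<T>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((r) => setTimeout(r, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      const f = err as ApiFailure;
      const retryable = typeof f?.httpStatus === "number" && isTransient(f);
      if (!retryable || attempt + 1 >= maxAttempts) throw err;
      await sleep(backoffMs(attempt));
    }
  }
}
```

Retrying a create call is only safe if the request is idempotent; sending a client-generated idempotency key with each create (an assumption about what the API would need to support) would keep a retried request from producing duplicate apps.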

---

### 6. TOCTOU race conditions in cloud billing/credits flows — (tracked internally; mentioned by Stan)
- **Current Status:** In progress (a deduct-before, reconcile-after approach is being implemented)
- **Impact Assessment**
  - **User Impact:** Medium (depends on cloud usage volume)
  - **Functional Impact:** Partial (incorrect billing or credit balance can block usage)
  - **Brand Impact:** High (billing correctness is trust-critical)
- **Technical Classification**
  - **Category:** Bug / Reliability
  - **Component Affected:** Cloud billing/credits services
  - **Complexity:** Complex solution (concurrency + reconciliation)
- **Resource Requirements**
  - **Required Expertise:** Distributed systems, idempotency, transactional design, audits
  - **Dependencies:** Impacts cloud “billing page for top-ups” rollout confidence
  - **Estimated Effort:** 4/5
- **Recommended Priority:** **P1**
- **Actionable Next Steps**
  1. Formalize invariants (no negative credits; idempotent deduction; reconciliation windows).
  2. Add ledger-style event log + periodic reconciliation job with alerting on drift.
  3. Load-test concurrent message sends and top-ups.
  4. Add incident playbook and dashboards (error budgets, reconciliation mismatch rate).
- **Potential Assignees:** **Stan** (lead), **cjft** (cloud), **Shaw** (architecture oversight)
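
The invariants in step 1 can be made concrete with a small ledger. This in-memory `CreditLedger` is a hypothetical sketch of the deduct-before, reconcile-after pattern: reservations are keyed by an idempotency key, balances never go negative, and every change is appended to an event log that a reconciliation job could audit. In a real database the check-and-deduct must be a single atomic statement (for example a conditional `UPDATE ... WHERE balance >= amount` inside a transaction); this sketch is only race-free because JavaScript runs each method synchronously on one thread.

```typescript
type LedgerEntry = { key: string; account: string; delta: number; kind: "reserve" | "reconcile" };

class CreditLedger {
  private balances = new Map<string, number>();
  private reservations = new Map<string, { account: string; amount: number; settled: boolean }>();
  readonly entries: LedgerEntry[] = []; // append-only event log
  readonly shortfalls: { key: string; amount: number }[] = []; // drift to alert on

  topUp(account: string, amount: number): void {
    this.balances.set(account, this.balance(account) + amount);
  }

  balance(account: string): number {
    return this.balances.get(account) ?? 0;
  }

  // Deducts the estimated cost up front. Replaying the same key is a no-op,
  // so a retried request cannot double-charge.
  reserve(account: string, key: string, estimate: number): boolean {
    if (this.reservations.has(key)) return true;
    if (this.balance(account) < estimate) return false; // invariant: never go negative
    this.balances.set(account, this.balance(account) - estimate);
    this.reservations.set(key, { account, amount: estimate, settled: false });
    this.entries.push({ key, account, delta: -estimate, kind: "reserve" });
    return true;
  }

  // Settles each reservation exactly once against the actual cost.
  reconcile(key: string, actual: number): void {
    const r = this.reservations.get(key);
    if (!r || r.settled) return;
    r.settled = true;
    // Positive delta refunds over-reservation; negative delta charges the overrun,
    // clamped at zero balance with the unrecovered amount recorded as a shortfall.
    let delta = r.amount - actual;
    const floor = -this.balance(r.account);
    if (delta < floor) {
      this.shortfalls.push({ key, amount: floor - delta });
      delta = floor;
    }
    this.balances.set(r.account, this.balance(r.account) + delta);
    this.entries.push({ key, account: r.account, delta, kind: "reconcile" });
  }
}
```

The `shortfalls` list is the kind of drift signal that step 2 suggests alerting on.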

---

## 2) Medium Priority Issues (Near-term planning)

### 7. Performance/memory concerns in core — elizaos/eliza #6332 (+ parallelism opportunities #6334, #6337)
- **Current Status:** Open (identified as a “new wave” of performance challenges)
- **Impact Assessment**
  - **User Impact:** Medium (grows with agent complexity and uptime)
  - **Functional Impact:** Partial (can degrade/kill long-running agents)
  - **Brand Impact:** Medium→High (performance regressions erode trust)
- **Technical Classification**
  - **Category:** Performance
  - **Component Affected:** Core Framework (runtime/message processing)
  - **Complexity:** Complex solution (profiling + architectural changes likely)
- **Resource Requirements**
  - **Required Expertise:** Profiling (Bun/Node), memory leak detection, concurrency design
  - **Dependencies:** Related to caching work (see draft CachedDatabaseAdapter PR #6329)
  - **Estimated Effort:** 4/5
- **Recommended Priority:** **P2**
- **Actionable Next Steps**
  1. Establish baseline perf suite (long conversation + tool use + embeddings).
  2. Add heap snapshots + flamegraphs in CI for regression detection.
  3. Prioritize wins: reduce duplicate provider calls, tighten caching boundaries, and add backpressure to streaming.
- **Potential Assignees:** **Shaw** (core architecture), **standujar** (messaging), **0xbbjoker** (data layer)
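
One way to approach “reduce duplicate provider calls” is a per-turn single-flight cache, sketched below. The `SingleFlight` class and the key format are illustrative assumptions, not part of the ElizaOS runtime API.

```typescript
// Concurrent requests for the same key share one in-flight promise,
// so a provider runs at most once per key until the cache is cleared.
class SingleFlight<T> {
  private inflight = new Map<string, Promise<T>>();

  get(key: string, load: () => Promise<T>): Promise<T> {
    const existing = this.inflight.get(key);
    if (existing) return existing;
    const p = load();
    this.inflight.set(key, p);
    // Drop failed loads so a later caller can retry instead of reusing the rejection.
    p.catch(() => {
      if (this.inflight.get(key) === p) this.inflight.delete(key);
    });
    return p;
  }

  // Call at the end of each message turn so results do not leak across turns
  // and the map does not grow with uptime.
  clear(): void {
    this.inflight.clear();
  }
}
```

Clearing per turn rather than using a global cache keeps memory bounded, which matters given the long-running memory concerns in #6332.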

---

### 8. CachedDatabaseAdapter draft (serverless optimization) — elizaos/eliza PR #6329 (DRAFT)
- **Current Status:** Draft, explicitly marked “DO NOT MERGE”; has known TypeScript/syntax issues
- **Impact Assessment**
  - **User Impact:** Medium (could significantly help cloud/serverless users)
  - **Functional Impact:** No (enhancement), but can reduce costs and latency
  - **Brand Impact:** Medium (performance story)
- **Technical Classification**
  - **Category:** Performance / Feature
  - **Component Affected:** plugin-sql + runtime
  - **Complexity:** Complex solution
- **Resource Requirements**
  - **Required Expertise:** TypeScript correctness, caching invalidation, data consistency, Redis/Upstash
  - **Dependencies:** Should align with ongoing performance issues and data isolation/RLS testing
  - **Estimated Effort:** 4/5
- **Recommended Priority:** **P2**
- **Actionable Next Steps**
  1. Fix compilation blockers (optional method declaration issues) and remove unsafe casts.
  2. Define correctness rules (what is safe to cache; how to invalidate aggregates).
  3. Add a benchmark showing DB call reduction and latency improvements.
  4. Ship behind a feature flag and document recommended TTL defaults.
- **Potential Assignees:** **0xbbjoker** (author), **Shaw** (review), **Stan** (cloud fit)
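
For steps 2 and 4, a read-through cache with a TTL and invalidate-on-write can be sketched as follows. `AgentStore`, `CachedAgentStore`, and the 30-second default TTL are hypothetical and much narrower than the adapter in PR #6329; a deployed version would back `TtlCache` with Redis/Upstash rather than an in-process `Map`.

```typescript
type Agent = { id: string; name: string };

// Hypothetical narrow store interface standing in for the database adapter.
interface AgentStore {
  getAgent(id: string): Promise<Agent | null>;
  updateAgent(id: string, patch: { name?: string }): Promise<void>;
}

class TtlCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();
  constructor(private ttlMs: number, private now: () => number = Date.now) {}

  // Returns undefined on a miss or an expired entry.
  get(key: string): V | undefined {
    const hit = this.store.get(key);
    if (!hit) return undefined;
    if (this.now() >= hit.expiresAt) {
      this.store.delete(key);
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V): void {
    this.store.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  delete(key: string): void {
    this.store.delete(key);
  }
}

class CachedAgentStore implements AgentStore {
  private cache: TtlCache<Agent | null>;
  constructor(private inner: AgentStore, ttlMs = 30_000, now?: () => number) {
    this.cache = new TtlCache<Agent | null>(ttlMs, now);
  }

  async getAgent(id: string): Promise<Agent | null> {
    const cached = this.cache.get(id);
    if (cached !== undefined) return cached; // hit, including a cached "not found" (null)
    const fresh = await this.inner.getAgent(id);
    this.cache.set(id, fresh);
    return fresh;
  }

  async updateAgent(id: string, patch: { name?: string }): Promise<void> {
    await this.inner.updateAgent(id, patch);
    this.cache.delete(id); // invalidate only after the write succeeds
  }
}
```

Invalidating after the write (rather than updating the cached value in place) keeps the correctness rule simple; aggregates that depend on many rows would need their own invalidation keys.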

---

### 9. Documentation gaps causing repeated support questions (Twitter API requirements; Discord timer/interval settings; model “provider prefix” format)
- **Current Status:** Multiple unanswered Discord questions; partial guidance exists in chat only
- **Impact Assessment**
  - **User Impact:** Medium (common onboarding stumbling blocks)
  - **Functional Impact:** Partial (users assume features are broken)
  - **Brand Impact:** Medium (support burden + confusion)
- **Technical Classification**
  - **Category:** Documentation / UX
  - **Component Affected:** Docs + plugin-twitter + plugin-discord + cloud quickstart
  - **Complexity:** Simple fix
- **Resource Requirements**
  - **Required Expertise:** Product/docs writing, basic plugin knowledge
  - **Dependencies:** Also depends on docs site availability (P0 above)
  - **Estimated Effort:** 2/5
- **Recommended Priority:** **P2**
- **Actionable Next Steps**
  1. Add FAQ pages:
     - “Do I need X/Twitter API keys? Which tier? What features work without it?”
     - “Discord scheduler/timer/interval settings: where configured + examples”
     - “Cloud model names: the `provider/model` format is required (e.g., `openai/`, `anthropic/`, `google/` prefixes)”
  2. Pin the answers in Discord and link to canonical docs pages.
- **Potential Assignees:** **jin** (docs), **cjft** (cloud/devrel), **Odilitime** (Discord specifics)
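
The model-name FAQ entry could be backed by a validator that fails early with a helpful message. The `parseModelId` helper is hypothetical, and its provider list only mirrors the three prefixes named above; it is not an exhaustive list of what ElizaCloud accepts.

```typescript
const KNOWN_PROVIDERS = ["openai", "anthropic", "google"] as const;
type Provider = (typeof KNOWN_PROVIDERS)[number];

type ParsedModel =
  | { ok: true; provider: Provider; model: string }
  | { ok: false; error: string };

// Splits on the first "/" so model names that themselves contain "/" are preserved.
function parseModelId(id: string): ParsedModel {
  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) {
    return { ok: false, error: `"${id}" must use the provider/model format, e.g. "openai/<model>"` };
  }
  const provider = id.slice(0, slash);
  const model = id.slice(slash + 1);
  if (!(KNOWN_PROVIDERS as readonly string[]).includes(provider)) {
    return { ok: false, error: `unknown provider "${provider}"; expected one of ${KNOWN_PROVIDERS.join(", ")}` };
  }
  return { ok: true, provider: provider as Provider, model };
}
```

Surfacing the `error` string verbatim in logs or the UI answers the FAQ question at the point where users hit it.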

---

## 3) Lower Priority / Roadmap Items (Track but don’t block)

### 10. Eliza 2.0 redesign (remove API/server/CLI/projects; TS/Rust/Python + FFI plugin interop) — elizaos/eliza PR #6351
- **Current Status:** In progress as a large-diff PR; still in active exploration
- **Impact Assessment**
  - **User Impact:** Medium (future-facing; could simplify adoption long-term)
  - **Functional Impact:** No (not a current outage), but can fragment focus if unmanaged
  - **Brand Impact:** Medium (big promise; needs clear messaging)
- **Technical Classification**
  - **Category:** Feature / Architectural change
  - **Component Affected:** Core Framework + Plugin System + Tooling
  - **Complexity:** Architectural change
- **Resource Requirements**
  - **Required Expertise:** Systems design, multi-language runtime, FFI, packaging, migration strategy
  - **Dependencies:** Should not derail P0/P1 stability and release pipeline work
  - **Estimated Effort:** 5/5
- **Recommended Priority:** **P3** (continue in parallel with clear gates)
- **Actionable Next Steps**
  1. Define migration plan (v1→v2 compatibility, plugin porting priority list).
  2. Establish acceptance criteria and a minimal “v2 runtime MVP.”
  3. Communicate scope and timelines to avoid confusion with current stable releases.
- **Potential Assignees:** **Shaw** (lead), **lalalune** (PR owner), senior plugin maintainers

---

## Summary: Top Issues to Address Immediately

1. **P0:** plugin-discord publishing pipeline failure — **plugin-discord #40**
2. **P0:** Telegram plugin image crash — **plugin-telegram #23**
3. **P0:** Docs outage — **elizacloud.ai/docs down** (create tracking issue + restore)
4. **P1:** Discord v1.7.0 migration/compatibility still impacting users — (post-fix release coordination)
5. **P1:** ElizaCloud app creator “operation failed” onboarding blocker — (instrument + fix)
6. **P1:** Cloud TOCTOU/billing race conditions — (complete hardening + monitoring)
7. **P2:** Core memory/performance issues — **eliza #6332** (+ **#6334/#6337**)
8. **P2:** Cached adapter draft maturation — **eliza PR #6329**
9. **P2:** Fill recurring docs gaps (Twitter API, Discord timers, model naming)

---

## Patterns / Themes Indicating Deeper Issues

- **Release engineering fragility:** A single plugin’s publishing failure blocks user-facing fixes; this amplifies the impact of otherwise-resolved bugs (e.g., serverId→messageServerId migration).
- **Schema/migration coordination risk:** The serverId→messageServerId transition shows how cross-repo compatibility breaks without a strict version matrix + integration tests.
- **Cloud reliability + billing correctness:** App creator failures and TOCTOU fixes suggest the platform is scaling into concurrency edge cases; observability and idempotent design are now mandatory, not optional.
- **Docs as a stability surface:** Repeated Discord questions and a docs outage indicate documentation and hosting are part of operational reliability, not “nice to have.”

---

## Process Improvements (Prevention)

1. **Add a “Release Readiness Checklist” per repo**
   - CI publish dry-run, versioning verification, changelog generation, rollback plan.
2. **Cross-repo compatibility gates**
   - Maintain a tested matrix (core × plugin versions) and run nightly integration tests for Discord/Telegram/Twitter.
3. **Incident-grade observability for Cloud**
   - Correlation IDs surfaced to UI, structured error codes, dashboards for top failure modes, automated smoke tests for critical flows (create app → run agent).
4. **Docs reliability improvements**
   - Uptime monitoring for docs, static fallback mirror (GitHub Pages), and a single canonical FAQ page linked/pinned in Discord.
5. **Migration playbooks**
   - For any renamed field/schema change: deprecation window, adapter shims, and explicit “breaking change” notes with upgrade steps.
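
The nightly compatibility gate in item 2 could generate its job matrix from version lists, as in the sketch below. `buildCompatMatrix` is hypothetical; its output follows the `include` form of GitHub Actions' `strategy.matrix`, so it could be emitted as a job output and consumed with `fromJSON`.

```typescript
interface MatrixRow {
  core: string;
  plugin: string;
  pluginVersion: string;
  name: string; // human-readable job label
}

// Cartesian product of core versions and each plugin's versions under test.
function buildCompatMatrix(coreVersions: string[], plugins: Record<string, string[]>): { include: MatrixRow[] } {
  const include: MatrixRow[] = [];
  for (const core of coreVersions) {
    for (const [plugin, versions] of Object.entries(plugins)) {
      for (const pluginVersion of versions) {
        include.push({ core, plugin, pluginVersion, name: `core@${core} + ${plugin}@${pluginVersion}` });
      }
    }
  }
  return { include };
}
```

A planning job could write `JSON.stringify(buildCompatMatrix(...))` to `$GITHUB_OUTPUT`, and the test job would set its `strategy.matrix` from that output; the versions fed in would be the blessed matrix from issue 4.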