## Issue Triage — 2026-01-09

### 1) Discord plugin release blocked by publishing pipeline failure — elizaos-plugins/plugin-discord #40
- **Current Status:** Open (under active investigation); blocking release of v1.3.4
- **Impact Assessment:**
  - **User Impact:** High (anyone relying on npm releases for Discord integration)
  - **Functional Impact:** Yes (prevents shipping fixes; users stuck on broken/old versions)
  - **Brand Impact:** High (signals poor release hygiene; slows response to breakages)
- **Technical Classification:**
  - **Issue Category:** Bug / DevEx
  - **Component Affected:** Plugin System (Discord plugin), CI/CD release pipeline
  - **Complexity:** Moderate effort
- **Resource Requirements:**
  - **Required Expertise:** Node/npm publishing, GitHub Actions, release tooling, monorepo or semantic-release experience
  - **Dependencies:** This issue may block downstream deployment of the fix for the core v1.7.0 compatibility issue (item 2)
  - **Estimated Effort:** 3/5
- **Recommended Priority:** **P0**
- **Specific Actionable Next Steps:**
  1. Pull the exact failing workflow logs and identify failure class (auth/permissions, provenance/2FA, build artifact missing, tag/version mismatch).
  2. Reproduce locally: run package build + pack, validate `package.json` fields, `files`, `exports`, and postbuild outputs.
  3. Verify npm token scope/expiry, org permissions, and whether provenance/SLSA is enforced.
  4. Add a “dry-run publish” CI job on PRs to catch failures before release branches/tags.
  5. Once fixed, cut **v1.3.4** with known compatibility fixes and publish immediately.
- **Potential Assignees:** **Odilitime**, **Shaw**, **Stan (standujar)** (CI experience), plus a maintainer with npm org access
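
As an illustration of steps 2 and 4, a preflight check could validate the manifest against the build output before any publish attempt. This is a minimal sketch, not the plugin's actual tooling: `preflightPublishCheck`, the `PackageManifest` shape, and the `v<version>` tag convention are assumptions made for the example.

```typescript
// Hypothetical preflight check for a "dry-run publish" CI job.
// Field names follow the standard package.json format; everything else is illustrative.
interface PackageManifest {
  name?: string;
  version?: string;
  main?: string;
  exports?: unknown;
  private?: boolean;
}

function preflightPublishCheck(
  manifest: PackageManifest,
  builtFiles: string[], // relative paths produced by the build step
  releaseTag: string,   // assumed convention: "v" + manifest.version
): string[] {
  const problems: string[] = [];

  if (!manifest.name) problems.push("missing name");
  if (!manifest.version) problems.push("missing version");
  if (manifest.private) problems.push("package is marked private");

  // Tag/version mismatch is one of the failure classes named in step 1.
  if (manifest.version && releaseTag !== `v${manifest.version}`) {
    problems.push(`tag ${releaseTag} does not match version ${manifest.version}`);
  }

  if (manifest.exports === undefined && !manifest.main) {
    problems.push("neither exports nor main is set");
  }

  // The declared entry point must exist in the build output.
  if (manifest.main) {
    const entry = manifest.main.replace(/^\.\//, "");
    if (!builtFiles.includes(entry)) {
      problems.push(`entry point ${entry} is missing from build output`);
    }
  }

  return problems;
}
```

A CI job could fail the PR whenever the returned list is non-empty, which surfaces manifest and artifact problems before a release tag is cut.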

---

### 2) Discord bot fails on ElizaOS v1.7.0 with “No server ID found 10” (serverId → messageServerId migration) — *Discord report (needs canonical GH issue in elizaos/eliza + plugin-discord)*
- **Issue Title & ID:** “Discord plugin broken on core v1.7.0: room.serverId undefined / ‘No server ID found 10’” — **ID: N/A (Discord)**
- **Current Status:** Partially mitigated in core via **elizaos/eliza PR #6333 (merged)**; incompatibility with **plugin-discord v1.3.3** is still being reported, pending a plugin-side release and testing
- **Impact Assessment:**
  - **User Impact:** High (Discord is a primary connector; repeated community reports)
  - **Functional Impact:** Yes (blocks Discord integration in common setups)
  - **Brand Impact:** High (core upgrade appears to “break Discord”)
- **Technical Classification:**
  - **Issue Category:** Bug / Compatibility
  - **Component Affected:** Core Framework (bootstrap/actions/providers), Plugin System (plugin-discord), Messaging/Rooms schema
  - **Complexity:** Moderate effort (cross-repo coordination + release)
- **Resource Requirements:**
  - **Required Expertise:** eliza runtime schema/migrations, Discord plugin internals, regression testing across versions
  - **Dependencies:** **plugin-discord publishing issue #40**; also requires branch testing across Discord plugin variants
  - **Estimated Effort:** 3/5
- **Recommended Priority:** **P0**
- **Specific Actionable Next Steps:**
  1. Open/confirm tracking issues in both repos with a pinned repro matrix:
     - core: 1.6.5 vs 1.7.0
     - plugin-discord: 1.3.3 vs next
     - plugin-bootstrap: current
  2. Add an automated integration test that spins a minimal Discord adapter mock validating `messageServerId` presence through the pipeline.
  3. Ensure docs/release notes explicitly call out the migration and compatible version pairs.
  4. After #40 is resolved: cut and publish plugin-discord **v1.3.4** (or higher) and announce upgrade path.
- **Potential Assignees:** **Odilitime** (migration context), **Shaw** (Discord bridge), **Casino** (triage), plugin-discord maintainers
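
The check that step 2 asks for could look roughly like the following. It is a sketch under stated assumptions: `RoomLike` and `resolveMessageServerId` are hypothetical names, and the fallback to the legacy `serverId` field is one possible compatibility policy, not necessarily what PR #6333 or the plugin implements.

```typescript
// Hypothetical room shape covering both the legacy and the migrated field.
interface RoomLike {
  id: string;
  messageServerId?: string; // field name after the migration
  serverId?: string;        // legacy field name
}

// Prefer the migrated field, fall back to the legacy one, and fail loudly otherwise.
function resolveMessageServerId(room: RoomLike): string {
  const id = room.messageServerId ?? room.serverId;
  if (!id) {
    throw new Error(`No server ID found for room ${room.id}`);
  }
  return id;
}
```

An integration test built on a Discord adapter mock could call a helper like this at each pipeline stage, so a missing `messageServerId` fails CI instead of surfacing at runtime after an upgrade.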

---

### 3) Potential SQL injection vector in RLS context setting + Neon serverless support awaiting merge — elizaos/eliza PR #6343
- **Current Status:** Open PR (not merged); intended to replace `SET LOCAL ...` raw interpolation with parameterized `set_config()` and unify isolation context API
- **Impact Assessment:**
  - **User Impact:** Medium → High (any production deployment using Postgres RLS / multi-tenant isolation)
  - **Functional Impact:** Partial (security hardening + new Neon adapter; not strictly blocking all users)
  - **Brand Impact:** High (security posture + multi-tenant isolation correctness)
- **Technical Classification:**
  - **Issue Category:** **Security** + Feature (DB support)
  - **Component Affected:** Model/Data layer (plugin-sql), Core DB isolation patterns
  - **Complexity:** Moderate effort (merge + follow-up breaking rename handling)
- **Resource Requirements:**
  - **Required Expertise:** Postgres RLS, drizzle/sql templating, Neon serverless driver, security review
  - **Dependencies:** Any downstream callers of `withEntityContext` → renamed `withIsolationContext` (breaking change coordination)
  - **Estimated Effort:** 3/5
- **Recommended Priority:** **P0**
- **Specific Actionable Next Steps:**
  1. Fast-track security review of the RLS context changes (confirm all `set_config()` usage is parameterized).
  2. Identify all internal/external call sites of `withEntityContext` and provide a deprecation shim (or codemod) to reduce breakage.
  3. Merge PR with clear migration notes; cut a patch/minor release depending on API policy.
  4. Add a regression test specifically asserting no raw interpolation is used for RLS variables.
- **Potential Assignees:** **Stan (standujar)** (author), **0xbbjoker** (plugin-sql history), a security-minded reviewer from core maintainers
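
To make the target pattern concrete, here is a minimal sketch of a parameterized `set_config()` call. The function name, the allowlist, and the setting names `app.entity_id` and `app.server_id` are hypothetical and are not taken from PR #6343. The `{ text, values }` shape mirrors the query config object accepted by node-postgres.

```typescript
// Query with placeholders and separately bound values.
interface ParameterizedQuery {
  text: string;
  values: string[];
}

// Hypothetical allowlist of isolation settings; real names may differ.
const ALLOWED_SETTINGS = new Set<string>(["app.entity_id", "app.server_id"]);

function buildIsolationContextQuery(setting: string, value: string): ParameterizedQuery {
  if (!ALLOWED_SETTINGS.has(setting)) {
    throw new Error(`Refusing to set unknown isolation setting: ${setting}`);
  }
  // The third argument of set_config (is_local = true) scopes the value to the
  // current transaction, matching the behavior of SET LOCAL.
  return {
    text: "SELECT set_config($1, $2, true)",
    values: [setting, value],
  };
}
```

Because the value only ever travels as a bound parameter, the SQL text stays constant regardless of input. The regression test in step 4 can assert exactly that.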

---

### 4) Telegram plugin crashes when processing certain uploaded images — elizaos-plugins/plugin-telegram #23
- **Current Status:** Open
- **Impact Assessment:**
  - **User Impact:** Medium (Telegram users; reproducible crash on common media type)
  - **Functional Impact:** Yes (crashes bot/runtime on specific inputs)
  - **Brand Impact:** Medium (connector reliability)
- **Technical Classification:**
  - **Issue Category:** Bug
  - **Component Affected:** Plugin System (Telegram)
  - **Complexity:** Moderate effort
- **Resource Requirements:**
  - **Required Expertise:** Telegram Bot API media payloads, Node image/file handling, type validation
  - **Dependencies:** None, but should align with standardized `handleMessage` patterns if applicable
  - **Estimated Effort:** 2–3/5
- **Recommended Priority:** **P1**
- **Specific Actionable Next Steps:**
  1. Capture failing payload samples (photo vs document variants) and add as test fixtures (redacted).
  2. Add defensive parsing for `photo[]` sizes and missing fields; ensure graceful fallback instead of throwing.
  3. Add an integration test that simulates Telegram update objects for photo messages.
  4. Release a patch version once fixed.
- **Potential Assignees:** Telegram plugin maintainers; secondary: **Odilitime** (plugin patterns), **Stan** (testing discipline)
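
Step 2 could be approached along these lines. This is a sketch only: `pickLargestPhoto` and `PhotoPick` are hypothetical names, and the code relies solely on the `file_id`, `width`, and `height` fields that the Telegram Bot API documents for `PhotoSize` entries.

```typescript
interface PhotoPick {
  fileId: string;
  width: number;
  height: number;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Returns the largest well-formed photo size, or null instead of throwing
// when the payload has no usable photo data.
function pickLargestPhoto(message: unknown): PhotoPick | null {
  if (!isRecord(message)) return null;
  const photo = message.photo;
  if (!Array.isArray(photo)) return null;

  let best: PhotoPick | null = null;
  for (const size of photo) {
    if (!isRecord(size)) continue; // skip null or primitive entries
    const { file_id, width, height } = size;
    if (typeof file_id !== "string" || typeof width !== "number" || typeof height !== "number") {
      continue; // skip entries with missing or mistyped fields
    }
    if (best === null || width * height > best.width * best.height) {
      best = { fileId: file_id, width, height };
    }
  }
  return best;
}
```

A `null` result lets the caller log the payload and continue, which is the graceful fallback the step describes. Images sent as documents would need a separate branch on the `document` field.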

---

### 5) ElizaCloud documentation outage — elizacloud.ai/docs (*Discord report; needs tracking issue*)
- **Issue Title & ID:** “ElizaCloud docs down at elizacloud.ai/docs” — **ID: N/A (Discord)**
- **Current Status:** Reported broken/down (unconfirmed root cause)
- **Impact Assessment:**
  - **User Impact:** High (blocks onboarding and deployment success)
  - **Functional Impact:** Partial (product runs, but users cannot follow official docs)
  - **Brand Impact:** High (first-impression failure)
- **Technical Classification:**
  - **Issue Category:** Documentation / Infrastructure
  - **Component Affected:** Docs site hosting / routing / DNS / build pipeline
  - **Complexity:** Simple fix → Moderate effort (depends on the hosting failure mode)
- **Resource Requirements:**
  - **Required Expertise:** Web hosting (Vercel/Cloudflare/etc.), DNS, static build pipelines
  - **Dependencies:** None
  - **Estimated Effort:** 2/5
- **Recommended Priority:** **P1**
- **Specific Actionable Next Steps:**
  1. Add uptime check + alerting for docs endpoints.
  2. Verify DNS/redirects, TLS cert, and build output paths; confirm last successful deploy.
  3. Add a temporary fallback link in README/Linktree to HackMD book: `https://hackmd.io/@elizaos/book`.
  4. If the outage lasts longer than 2 hours, write a postmortem note to prevent recurrence.
- **Potential Assignees:** **cjft** (infra responses in Discord), website/docs maintainers, **jin** (docs ownership)

---

### 6) Cloud correctness: TOCTOU race conditions in billing/credit deduction (“deduct-before, reconcile-after”) — *Dev log (Linear tickets referenced; needs GH linkage)*
- **Issue Title & ID:** “Fix TOCTOU race conditions in cloud transactions” — **ID: N/A (Linear / dev log)**
- **Current Status:** In progress (approach defined; tickets created)
- **Impact Assessment:**
  - **User Impact:** Medium → High (inconsistent balances, double-spend-like behavior, trust issues)
  - **Functional Impact:** Partial (platform works but correctness/financial integrity at risk)
  - **Brand Impact:** High (billing correctness is trust-critical)
- **Technical Classification:**
  - **Issue Category:** Bug / Reliability
  - **Component Affected:** Cloud runtime, billing/credits, concurrency control
  - **Complexity:** Complex solution
- **Resource Requirements:**
  - **Required Expertise:** Distributed systems, transactional semantics, idempotency keys, database locking/transactions
  - **Dependencies:** Telemetry to validate fixes; possibly schema changes
  - **Estimated Effort:** 4/5
- **Recommended Priority:** **P1**
- **Specific Actionable Next Steps:**
  1. Ensure every deduction operation is idempotent (request IDs) and uses transactional boundaries.
  2. Add high-concurrency tests (burst sends) validating invariants (no negative balances; exactly-once deduction per request).
  3. Instrument reconciliation metrics and dashboards to confirm drift reduction.
  4. Publish operational guidance for incident response when reconciliation detects drift.
- **Potential Assignees:** **Stan (standujar)**, cloud maintainers
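
The invariants in steps 1 and 2 can be illustrated with an in-memory ledger. This is a sketch under strong assumptions: `CreditLedger` is hypothetical, and single-threaded in-memory state stands in for what would have to be a single database transaction (or one conditional update) in the real system. Refunding only the unused part of a reservation is also an assumed policy.

```typescript
type DeductStatus = "applied" | "duplicate" | "insufficient";

interface DeductResult {
  status: DeductStatus;
  balance: number;
}

class CreditLedger {
  private balance: number;
  // Request ID -> amount currently held for that request.
  private readonly applied = new Map<string, number>();

  constructor(initialBalance: number) {
    this.balance = initialBalance;
  }

  // Deduct-before: check and decrement happen in one step, keyed by request ID.
  deduct(requestId: string, amount: number): DeductResult {
    if (!Number.isFinite(amount) || amount <= 0) {
      throw new Error("amount must be a positive finite number");
    }
    if (this.applied.has(requestId)) {
      return { status: "duplicate", balance: this.balance }; // retry is a no-op
    }
    if (amount > this.balance) {
      return { status: "insufficient", balance: this.balance };
    }
    this.balance -= amount;
    this.applied.set(requestId, amount);
    return { status: "applied", balance: this.balance };
  }

  // Reconcile-after: refund the unused part of the reservation, at most once.
  reconcile(requestId: string, actualCost: number): number {
    const reserved = this.applied.get(requestId);
    if (reserved === undefined) {
      throw new Error(`unknown request: ${requestId}`);
    }
    if (!Number.isFinite(actualCost) || actualCost < 0) {
      throw new Error("actualCost must be a non-negative finite number");
    }
    this.balance += Math.max(0, reserved - actualCost);
    this.applied.set(requestId, Math.min(reserved, actualCost));
    return this.balance;
  }
}
```

The high-concurrency tests would assert the same properties against the real store: a replayed request ID never deducts twice, and no interleaving drives the balance below zero.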

---

### 7) Web search intermittently non-functional — *Daily report item; needs GH issue*
- **Issue Title & ID:** “Web search intermittently not working” — **ID: N/A (Daily report)**
- **Current Status:** Reported intermittent failures; no root cause published
- **Impact Assessment:**
  - **User Impact:** Medium (depends on how many agents rely on web search)
  - **Functional Impact:** Partial (feature degradation)
  - **Brand Impact:** Medium (perceived flakiness)
- **Technical Classification:**
  - **Issue Category:** Bug / Reliability
  - **Component Affected:** Tooling/Integration (web search provider), runtime tool invocation
  - **Complexity:** Moderate effort
- **Resource Requirements:**
  - **Required Expertise:** External API reliability, retries/backoff, observability/logging
  - **Dependencies:** Identify which provider(s) and quotas are involved
  - **Estimated Effort:** 3/5
- **Recommended Priority:** **P2**
- **Specific Actionable Next Steps:**
  1. Open a GH issue with timestamps, error codes, provider config, and whether failures correlate with rate limits.
  2. Add structured error reporting surfaced to UI (not silent failure).
  3. Implement retry with jitter + circuit breaker and provider fallback (if multiple search backends exist).
- **Potential Assignees:** Core integrations maintainer, **cjft** (API config help), **Stan** (observability discipline)
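
The two building blocks named in step 3 can be sketched independently of any search provider. Assumptions: `backoffDelayMs` and `CircuitBreaker` are hypothetical names, the jitter strategy is "full jitter" (a uniform delay between zero and the capped exponential ceiling), and time is passed in explicitly so the logic is easy to test.

```typescript
// Full-jitter exponential backoff: uniform delay in [0, min(cap, base * 2^attempt)).
function backoffDelayMs(
  attempt: number, // 0 for the first retry
  baseMs: number,
  capMs: number,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Minimal circuit breaker: opens after `threshold` consecutive failures and
// allows a probe request once `cooldownMs` has elapsed.
class CircuitBreaker {
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private failures = 0;
  private openedAt: number | null = null;

  constructor(threshold: number, cooldownMs: number) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  canRequest(now: number): boolean {
    if (this.openedAt === null) return true;
    return now - this.openedAt >= this.cooldownMs;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(now: number): void {
    this.failures += 1;
    if (this.failures >= this.threshold) {
      this.openedAt = now;
    }
  }
}
```

A caller would check `canRequest(Date.now())` before each provider call, wait `backoffDelayMs(attempt, ...)` between retries, and switch to a fallback backend while the breaker is open.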

---

### 8) Performance concern: memory consumption regression/concern — elizaos/eliza #6332
- **Current Status:** Open (reported as a “new wave” of performance challenges)
- **Impact Assessment:**
  - **User Impact:** Medium (worse on long-running agents / multi-agent workloads)
  - **Functional Impact:** Partial (can cause slowdowns/OOM in production)
  - **Brand Impact:** Medium
- **Technical Classification:**
  - **Issue Category:** Performance
  - **Component Affected:** Core Framework (runtime, message/memory handling)
  - **Complexity:** Complex solution (profiling + root cause)
- **Resource Requirements:**
  - **Required Expertise:** Node/Bun memory profiling, heap snapshots, load testing
  - **Dependencies:** Clear reproduction scenario and baseline metrics
  - **Estimated Effort:** 4/5
- **Recommended Priority:** **P2**
- **Specific Actionable Next Steps:**
  1. Add a minimal reproducible benchmark (multi-session chat loop) and capture heap snapshots over time.
  2. Identify top retainers (message history retention, embeddings cache, event bus listeners).
  3. Add memory budget tests in CI for key workloads (guardrail, not blocker initially).
- **Potential Assignees:** **Stan**, performance-focused contributors (e.g., **wtfsayo**)

---

### 9) Performance/opportunity: parallel processing improvements — elizaos/eliza #6334 and #6337
- **Current Status:** Open
- **Impact Assessment:**
  - **User Impact:** Medium (latency reductions; throughput improvements)
  - **Functional Impact:** No (optimization)
  - **Brand Impact:** Medium (responsiveness)
- **Technical Classification:**
  - **Issue Category:** Performance
  - **Component Affected:** Core Framework (multi-step execution, provider/tool pipeline)
  - **Complexity:** Complex solution
- **Resource Requirements:**
  - **Required Expertise:** Concurrency patterns, deterministic execution, race-free caching, benchmarking
  - **Dependencies:** Should not regress correctness (especially around message ordering and memory writes)
  - **Estimated Effort:** 4/5
- **Recommended Priority:** **P3** (promote to P2 if memory work uncovers easy wins here)
- **Specific Actionable Next Steps:**
  1. Define which steps are safe to parallelize (pure providers vs mutating memory/tools).
  2. Implement behind a feature flag; benchmark against baseline.
  3. Add tracing spans to validate parallel execution actually reduces critical path time.
- **Potential Assignees:** **Stan**, core runtime maintainers
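
The split described in step 1 could be prototyped as follows. This is a sketch, not the runtime's actual pipeline: `PipelineStep`, the `mutates` flag, and `runPipeline` are hypothetical, and running all pure steps before the mutating ones is a simplifying assumption that a real scheduler may not be able to make.

```typescript
interface PipelineStep {
  name: string;
  mutates: boolean; // true if the step writes memory or has other side effects
  run: () => Promise<string>;
}

async function runPipeline(steps: PipelineStep[]): Promise<Map<string, string>> {
  const results = new Map<string, string>();

  // Pure steps have no ordering constraints, so they can run concurrently.
  const pure = steps.filter((step) => !step.mutates);
  const pureResults = await Promise.all(
    pure.map(async (step): Promise<[string, string]> => [step.name, await step.run()]),
  );
  for (const [name, output] of pureResults) {
    results.set(name, output);
  }

  // Mutating steps keep their declared order to preserve message ordering
  // and memory-write semantics.
  for (const step of steps.filter((s) => s.mutates)) {
    results.set(step.name, await step.run());
  }

  return results;
}
```

Gating a function like this behind the feature flag from step 2 would allow a direct benchmark against the fully sequential baseline.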

---

### 10) Agent memory configuration documentation gap — elizaos/docs #82
- **Current Status:** Open
- **Impact Assessment:**
  - **User Impact:** Medium (frequent confusion, misconfiguration)
  - **Functional Impact:** Partial (agents behave “wrong” without correct memory settings)
  - **Brand Impact:** Medium
- **Technical Classification:**
  - **Issue Category:** Documentation
  - **Component Affected:** Docs (agent configuration)
  - **Complexity:** Simple fix
- **Resource Requirements:**
  - **Required Expertise:** Product + runtime knowledge; examples/testing
  - **Dependencies:** None
  - **Estimated Effort:** 1–2/5
- **Recommended Priority:** **P2**
- **Specific Actionable Next Steps:**
  1. Add “recipes” for common memory setups (chatbot, researcher, tool-using agent).
  2. Document pitfalls (embedding model config, vector dims, retention limits).
  3. Link prominently from onboarding + CLI quickstart.
- **Potential Assignees:** **jin** (docs lead), **borisudovicic** (UX/docs direction), docs contributors

---

## Summary: Highest-Priority Issues (address immediately)
1. **P0 — plugin-discord publishing pipeline failure** (elizaos-plugins/plugin-discord **#40**) blocking releases.
2. **P0 — Discord integration broken on core v1.7.0 (“No server ID found 10”)** (Discord report; track in GH) requiring coordinated core+plugin release.
3. **P0 — Security hardening for SQL RLS context + Neon support** (elizaos/eliza **PR #6343**) to eliminate raw interpolation risk and improve serverless DB support.
4. **P1 — Telegram plugin image crash** (elizaos-plugins/plugin-telegram **#23**) causing runtime failures on common inputs.
5. **P1 — ElizaCloud docs outage** (Discord report; track in GH) blocking onboarding/deployments.
6. **P1 — Cloud TOCTOU race conditions in billing/credits** (dev log/Linear; link to GH) correctness + trust issue.
7. **P2 — Intermittent web search failures** (daily report; track in GH) feature reliability.
8. **P2 — Memory consumption/perf regression** (elizaos/eliza **#6332**) risk of instability at scale.

---

## Patterns / Themes Indicating Deeper Architectural Problems
- **Cross-repo version coupling without enforced compatibility checks:** Core schema changes (serverId → messageServerId) broke connector behavior; needs automated compatibility matrix tests.
- **Release engineering fragility:** A single publishing failure (#40) blocks shipping critical fixes; indicates insufficient redundancy and preflight validation.
- **Security vs correctness tradeoffs in DB isolation:** The earlier fix (`sql.raw()` for `SET LOCAL`) solved correctness but reintroduced a potential injection surface; this suggests the need for a formal library of secure query patterns for isolation context.
- **Operational reliability gaps (docs + web search + cloud correctness):** Outages and intermittent failures point to missing SLOs, monitoring, and incident playbooks for user-facing infrastructure.

---

## Process Recommendations (to prevent recurrence)
1. **Add a “compatibility gate” CI workflow** that tests pinned combinations:
   - core release candidate + latest plugin-discord/plugin-telegram
   - validates schema fields (e.g., `messageServerId`) and basic message flow.
2. **Harden release pipelines**:
   - preflight “dry-run publish” on every PR touching publish config
   - maintain a documented release checklist and an emergency “hotfix publish” path.
3. **Codify secure DB isolation APIs**:
   - provide one blessed helper for setting RLS context (parameterized only)
   - add lint/check to reject raw interpolation for isolation variables.
4. **Introduce lightweight reliability SLOs** for docs and web-search integrations:
   - uptime checks + alerting + status page note
   - structured error surfacing to users and logs for debugging.
5. **Require a tracking issue for Discord-reported production-impacting problems** (docs down, intermittent search, migration blockers) within 24 hours to avoid losing context and to prioritize objectively.