# Issue Triage — 2026-05-04 (elizaOS)

## 1) Cloudflare Workers cache singleton causes cross-request I/O crash (cache currently disabled) — PR: elizaos/eliza #7336 (WIP)
- **Current Status:** In progress (Draft/WIP PR). Cache remains disabled by the prior hotfix (`CACHE_ENABLED=false`), and follow-up bypasses are already merged (#7324/#7327).
- **Impact Assessment:**
  - **User Impact:** **High** (Cloud users; affects auth, latency, and stability at scale)
  - **Functional Impact:** **Partial** (system can run with cache disabled/bypassed, but perf + reliability + future re-enable are blocked)
  - **Brand Impact:** **High** (crashy infra and auth instability are highly visible)
- **Technical Classification:**
  - **Issue Category:** Bug / Performance / Reliability
  - **Component Affected:** Cloud (Hono Workers), Cache layer, Auth (SIWE), DI/container architecture
  - **Complexity:** **Architectural change** (Clean Architecture migration + request-scope DI)
- **Resource Requirements:**
  - **Required Expertise:** Cloudflare Workers constraints, Hono middleware lifecycle, Redis/cache client design, DI/container patterns, SIWE auth flow
  - **Dependencies:** Completion of #7336 phases (esp. Phase D: per-request cache client + re-enable cache + rollback bypass); staging deploy + hammer tests
  - **Estimated Effort (1–5):** **5**
- **Recommended Priority:** **P1** (must land this sprint to safely re-enable cache and prevent regression)
- **Specific Actionable Next Steps:**
  1. Extract a minimal “Phase D” slice: make cache client strictly request-scoped (no module singleton), with explicit per-request instantiation via constructor.
  2. Add a regression test that reproduces the Workers error (`Cannot perform I/O on behalf of a different request`) via parallel request hammer.
  3. Re-enable cache in staging first; run tail-based verification (50–200 concurrent requests).
  4. Remove/rollback SIWE cache-availability bypass (#7324) only after cache is proven safe.
  5. Document new invariants: “no module-level sockets on Workers” and enforce via lint/test pattern if possible.
- **Potential Assignees:**
  - **standujar** (owner of #7336; Cloud + auth context)
  - **0xSolace** (cloud migration + infra stabilization)
  - Backup reviewer: **NubsCarson** (runtime/integration stability)
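
The request-scope invariant from the next steps above can be illustrated with a small, dependency-free sketch. Everything here is hypothetical: `FakeSocket`, `CacheClient`, and both handlers are stand-ins invented for illustration and are not taken from #7336. `FakeSocket` only imitates the Workers rule that an I/O object may be used solely by the request that created it.

```typescript
// Hypothetical names throughout; none of these are taken from #7336.

/** Stand-in for a Workers I/O object: only the request that opened it may use it. */
class FakeSocket {
  private readonly ownerRequestId: string;

  constructor(ownerRequestId: string) {
    this.ownerRequestId = ownerRequestId;
  }

  send(currentRequestId: string, command: string): string {
    if (currentRequestId !== this.ownerRequestId) {
      throw new Error("Cannot perform I/O on behalf of a different request");
    }
    return `ok:${command}`;
  }
}

/** Cache client that opens its socket in the constructor, so its scope is explicit. */
class CacheClient {
  private readonly socket: FakeSocket;

  constructor(requestId: string) {
    this.socket = new FakeSocket(requestId);
  }

  get(currentRequestId: string, key: string): string {
    return this.socket.send(currentRequestId, `GET ${key}`);
  }
}

// Anti-pattern: a module-level singleton is bound to whichever request came first.
let sharedClient: CacheClient | undefined;

function handleWithSingleton(requestId: string): string {
  if (sharedClient === undefined) {
    sharedClient = new CacheClient(requestId);
  }
  return sharedClient.get(requestId, "session");
}

// Request-scoped: every request constructs (and then discards) its own client.
function handleRequestScoped(requestId: string): string {
  const cache = new CacheClient(requestId);
  return cache.get(requestId, "session");
}
```

In this model the second call to `handleWithSingleton` with a different request id throws, while `handleRequestScoped` succeeds for any number of requests. In a Hono app, one option would be to construct the client in a middleware and attach it to the request context with `c.set`, so handlers never reach for module state.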

---

## 2) n8n clarification roundtrip has real-world paramPath and “free_text” loop hazards — PR: elizaos/eliza #7316 (merged) + follow-up needed
- **Current Status:** Merged, but automated review flagged P1 defects that can break production roundtrips (format mismatch + unprunable clarifications).
- **Impact Assessment:**
  - **User Impact:** **Medium–High** (users using Automations/n8n NL→workflow flow)
  - **Functional Impact:** **Partial** (feature can return 400s or trap users in a “needs_clarification” loop)
  - **Brand Impact:** **High** (automation UX regressions look like “AI doesn’t work”)
- **Technical Classification:**
  - **Issue Category:** Bug / UX
  - **Component Affected:** App-Core API (`/api/n8n/workflows/*`), clarification parser/patcher utilities
  - **Complexity:** **Moderate effort**
- **Resource Requirements:**
  - **Required Expertise:** TypeScript, n8n draft structure, robust JSON path patching, API contract design
  - **Dependencies:** Coordination with UI work (Clarification UI) so API contract remains stable
  - **Estimated Effort (1–5):** **3**
- **Recommended Priority:** **P1** (automation path is a flagship UX surface; fix before scaling usage)
- **Specific Actionable Next Steps:**
  1. Add an end-to-end test that feeds the *exact* `/generate` response `{draft, paramPath}` into `/resolve-clarification` without modification (prevents format drift).
  2. Fix `setByDotPath` to handle bracket-string keys when the target is an array (`nodes["Discord Send"]` vs `nodes[0]`): either normalize paths to numeric indices or support name→index mapping.
  3. Define pruning rules for legacy `free_text` clarifications with empty `paramPath` (e.g., generate a synthetic path or mark resolved by `id`).
  4. Add server-side validation hardening for client-supplied `draft` to prevent unsafe/invalid deploy attempts.
- **Potential Assignees:**
  - **2-A-M** (original author; app-core + n8n integration context)
  - Reviewer: **lalalune** (broad repo context), **NubsCarson** (stability/testing)
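
As a rough illustration of the name→index option in step 2, the sketch below resolves a bracket-string key against an array by matching each element's `name` field. It is not the existing `setByDotPath`: `parsePath`, `childKey`, and `setByPath` are hypothetical helpers, and matching on `name` is an assumption about how draft nodes are identified.

```typescript
// Hypothetical helpers; this is not the existing setByDotPath implementation.
type PathToken = { kind: "index"; index: number } | { kind: "key"; key: string };

/** Splits `nodes["Discord Send"].parameters.channelId` or `nodes[0].x` into tokens. */
function parsePath(path: string): PathToken[] {
  const tokens: PathToken[] = [];
  const pattern = /\[(\d+)\]|\["([^"]+)"\]|\['([^']+)'\]|([^.\[\]]+)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(path)) !== null) {
    if (match[1] !== undefined) {
      tokens.push({ kind: "index", index: Number(match[1]) });
    } else {
      tokens.push({ kind: "key", key: match[2] ?? match[3] ?? match[4] ?? "" });
    }
  }
  return tokens;
}

/** Resolves one token against a container; string keys on arrays match by `name`. */
function childKey(container: unknown, token: PathToken): string | number {
  if (Array.isArray(container)) {
    if (token.kind === "index") return token.index;
    const index = container.findIndex(
      (item) => typeof item === "object" && item !== null && (item as { name?: unknown }).name === token.key,
    );
    if (index === -1) throw new Error(`No array element named "${token.key}"`);
    return index;
  }
  if (typeof container !== "object" || container === null) {
    throw new Error("Cannot descend into a non-object value");
  }
  return token.kind === "index" ? String(token.index) : token.key;
}

/** Sets `value` at `path`, mutating `root` in place. */
function setByPath(root: unknown, path: string, value: unknown): void {
  const tokens = parsePath(path);
  const last = tokens.pop();
  if (last === undefined) throw new Error("Empty path");
  let current: unknown = root;
  for (const token of tokens) {
    current = (current as Record<string | number, unknown>)[childKey(current, token)];
  }
  (current as Record<string | number, unknown>)[childKey(current, last)] = value;
}
```

Throwing on an unknown node name, rather than silently creating a property on the array, keeps a format mismatch visible as an explicit error instead of a patch that appears to succeed.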

---

## 3) Discord server spam filter blocks legitimate URL sharing (workaround: backticks) — Discord thread: #💬-coders (2026-05-03)
- **Current Status:** Known issue; workaround suggested (wrap the link in backticks: `` `https://...` ``). No permanent mitigation implemented.
- **Impact Assessment:**
  - **User Impact:** **Medium** (community-wide; affects support/debug velocity)
  - **Functional Impact:** **No** (doesn’t break runtime, but breaks support workflows)
  - **Brand Impact:** **Medium** (feels hostile/frictional for contributors)
- **Technical Classification:**
  - **Issue Category:** UX / Community Ops
  - **Component Affected:** Discord moderation tooling / server configuration
  - **Complexity:** **Simple fix** to **Moderate effort** (depending on bot/ruleset)
- **Resource Requirements:**
  - **Required Expertise:** Discord automod configuration, anti-spam bot tuning, allowlist patterns
  - **Dependencies:** Ongoing spam-bot pressure; must avoid reopening abuse vector
  - **Estimated Effort (1–5):** **2**
- **Recommended Priority:** **P2**
- **Specific Actionable Next Steps:**
  1. Identify which layer is blocking (Discord AutoMod vs third-party moderation bot).
  2. Add an allowlist for common developer domains (github.com, docs sites, npmjs.com) and/or permit links in specific channels (e.g., #coders).
  3. Improve user feedback: when a message is blocked, DM the user with the reason + safe formatting guidance.
  4. Document “safe link sharing” in the server README / pinned message.
- **Potential Assignees:**
  - **odilitime** (Community Ops + moderator; already engaged)
  - Support: **shawmakesmagic** (moderator) for policy decisions
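
If the blocking layer turns out to be a custom or third-party bot rather than Discord AutoMod, the allowlist in step 2 could look roughly like the sketch below. `isAllowlistedUrl` and `ALLOWED_DOMAINS` are hypothetical names, and the HTTPS-only rule is an added assumption; the two domains are the ones named in the steps above.

```typescript
// Hypothetical helper; the domain list comes from the triage notes above.
const ALLOWED_DOMAINS: readonly string[] = ["github.com", "npmjs.com"];

function isAllowlistedUrl(rawUrl: string, allowed: readonly string[] = ALLOWED_DOMAINS): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  // Assumption: only HTTPS links are eligible for the allowlist.
  if (url.protocol !== "https:") return false;
  const host = url.hostname.toLowerCase();
  // Match the domain itself or a true subdomain, never a look-alike suffix.
  return allowed.some((domain) => host === domain || host.endsWith(`.${domain}`));
}
```

Comparing parsed hostnames instead of substring-matching the raw message avoids reopening the abuse vector through look-alike hosts such as `github.com.evil.example`.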

---

## 4) “Memory rot” in long-lived agents (3+ months) needs a first-class mitigation design in core memory pipeline — Discord research: sentient_dawn (2026-05-01)
- **Current Status:** Research + production-proven mitigation described; not yet integrated as an upstream pattern/module.
- **Impact Assessment:**
  - **User Impact:** **Medium** now, **High** as agents become persistent (Cloud/V3)
  - **Functional Impact:** **Partial** (agents respond but drift; correctness degrades silently)
  - **Brand Impact:** **High** (silent contradictions erode trust)
- **Technical Classification:**
  - **Issue Category:** Bug / Reliability / Architecture
  - **Component Affected:** Core Framework (memory), RAG/vector store usage, evaluators/providers
  - **Complexity:** **Architectural change**
- **Resource Requirements:**
  - **Required Expertise:** Retrieval systems, memory architectures, embedding/versioning, evaluation harnesses, agent runtime integration
  - **Dependencies:** Agreement on “freshness gates” spec; storage schema/versioning approach; scheduling (periodic reconciliation job)
  - **Estimated Effort (1–5):** **4**
- **Recommended Priority:** **P2** (start design now; implement incrementally)
- **Specific Actionable Next Steps:**
  1. Request/publish the promised “full field report” and convert it into an ADR + implementation plan.
  2. Define minimal upstream MVP: freshness metadata on facts + outbound claim gating + periodic diff/re-embed hooks.
  3. Add a “staleness detection” evaluator that can fail closed (ask clarification) instead of confidently outputting stale claims.
  4. Build a long-run test harness (simulated timeline + changing truths) to catch drift regressions.
- **Potential Assignees:**
  - **sentient_dawn** (primary SME; already implemented externally)
  - **pmairca** / **NubsCarson** (core/runtime integration)
  - Optional advisor: **trace.g** (LLM systems to production stability)
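
A minimal sketch of the "freshness metadata + outbound claim gating" idea from step 2 follows. It does not describe sentient_dawn's implementation, which is not detailed here; the `Fact` shape, the per-fact `maxAgeMs` policy, and `gateClaim` are all assumptions made for illustration.

```typescript
// Hypothetical shapes: the field names and the max-age policy are assumptions.
interface Fact {
  claim: string;
  verifiedAt: number; // epoch milliseconds of the last successful verification
  maxAgeMs: number; // how long the claim may be asserted without re-checking
}

type GateDecision =
  | { action: "assert"; claim: string }
  | { action: "ask_clarification"; claim: string; ageMs: number };

/** Decides whether a stored fact may be stated outright or must be re-confirmed. */
function gateClaim(fact: Fact, now: number): GateDecision {
  const ageMs = now - fact.verifiedAt;
  // Fail closed: missing, malformed, or future timestamps are treated as stale.
  const fresh = Number.isFinite(ageMs) && ageMs >= 0 && ageMs <= fact.maxAgeMs;
  return fresh
    ? { action: "assert", claim: fact.claim }
    : { action: "ask_clarification", claim: fact.claim, ageMs };
}
```

Passing `now` in explicitly, rather than reading the clock inside the function, is what would let a simulated-timeline harness (step 4) replay months of drift in a unit test.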

---

## 5) V3 launch readiness: roadmap + announcements documentation not yet finalized — Docs/Comms task (Discord: 2026-05-03)
- **Current Status:** Roadmap image shared; text roadmap update + announcements documentation pending.
- **Impact Assessment:**
  - **User Impact:** **High** (all users affected by unclear upgrade path/expectations)
  - **Functional Impact:** **No** (product works, but adoption/retention suffers)
  - **Brand Impact:** **High** (perception of chaos/inconsistency)
- **Technical Classification:**
  - **Issue Category:** Documentation / UX (developer experience)
  - **Component Affected:** Docs, Roadmap repo, Release communication process
  - **Complexity:** **Moderate effort**
- **Resource Requirements:**
  - **Required Expertise:** Technical writing, release management, repo knowledge of V3/Cloud/Milady deliverables
  - **Dependencies:** Final V3 scope; PR/marketing plan inputs
  - **Estimated Effort (1–5):** **3**
- **Recommended Priority:** **P2**
- **Specific Actionable Next Steps:**
  1. Update GitHub roadmap text to match the posted visual roadmap (single source of truth).
  2. Publish an “Upgrade to V3” draft: breaking changes, migration steps, known issues.
  3. Add a standing “Launch Readiness” checklist (docs, smoke tests, rollback plan, status page links).
- **Potential Assignees:**
  - **odilitime** (already coordinating roadmap/docs)
  - **zadayos** (PR/marketing plan support)
  - Reviewer: **shawmakesmagic** (final messaging approval)

---

## 6) ConnectorTargetCatalog “disabled connector still enumerates” behavior — PR: elizaos/eliza #7315 (merged) follow-up
- **Current Status:** Merged; review flagged P2 concerns (does not consult the connector’s enabled flag; caches transient 5xx responses).
- **Impact Assessment:**
  - **User Impact:** **Medium** (automation UI might surface targets for connectors the user considers “off”)
  - **Functional Impact:** **Partial** (UX confusion; potential rate-limit pressure)
  - **Brand Impact:** **Medium**
- **Technical Classification:**
  - **Issue Category:** UX / Performance
  - **Component Affected:** App-Core services (`connector-target-catalog`, `discord-target-source`)
  - **Complexity:** **Simple fix**
- **Resource Requirements:**
  - **Required Expertise:** TypeScript, connector config semantics, Discord REST rate limits
  - **Dependencies:** Clarify definition of “enabled” vs “connected”
  - **Estimated Effort (1–5):** **2**
- **Recommended Priority:** **P3**
- **Specific Actionable Next Steps:**
  1. Decide semantics: enumerate targets only when the connector is both configured and enabled, or whenever a token exists regardless of the enabled flag.
  2. Avoid caching 5xx responses for full TTL (cache only successes; short TTL for errors).
  3. Consider parallelizing per-guild channel fetches with safe concurrency limits.
- **Potential Assignees:**
  - **2-A-M** (service author)
  - Reviewer: **RemilioNubilio** (connector config correctness focus)
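
Step 2 can be sketched as a cache that picks its TTL from the outcome of the fetch. This is not the `connector-target-catalog` code: `TargetCache`, `getTargets`, `FetchResult`, and both TTL values are hypothetical, and the choice to keep errors briefly (rather than not at all) is one reading of "short TTL for errors".

```typescript
// Hypothetical names and TTL values; not taken from connector-target-catalog.
type FetchResult<T> = { ok: true; value: T } | { ok: false; status: number };

const SUCCESS_TTL_MS = 300000; // assumed: 5 minutes for successful responses
const ERROR_TTL_MS = 10000; // assumed: 10 seconds for failures such as 5xx

class TargetCache<T> {
  private readonly entries = new Map<string, { result: FetchResult<T>; expiresAt: number }>();
  private readonly now: () => number;

  // The clock is injectable so expiry can be tested without waiting.
  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  /** Returns the cached result, or undefined when absent or expired. */
  lookup(key: string): FetchResult<T> | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined || entry.expiresAt <= this.now()) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.result;
  }

  /** Stores a result; failures expire much sooner than successes. */
  store(key: string, result: FetchResult<T>): void {
    const ttl = result.ok ? SUCCESS_TTL_MS : ERROR_TTL_MS;
    this.entries.set(key, { result, expiresAt: this.now() + ttl });
  }
}

/** Read-through helper: serve from cache, otherwise fetch and remember the outcome. */
async function getTargets<T>(
  cache: TargetCache<T>,
  key: string,
  fetcher: () => Promise<FetchResult<T>>,
): Promise<FetchResult<T>> {
  const cached = cache.lookup(key);
  if (cached !== undefined) return cached;
  const result = await fetcher();
  cache.store(key, result);
  return result;
}
```

Keeping failures for a few seconds still shields the upstream API from retry storms, while a transient 5xx no longer hides valid targets for the full success TTL.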

---

## 7) Plugin delivery risk: key repos (plugin-auto-trader, plugin-social-alpha, spartan) are referenced as core narrative but lack tracked milestones — Discord (2026-05-03)
- **Current Status:** Repos shared publicly; completion status unclear; no explicit issues/milestones in provided data.
- **Impact Assessment:**
  - **User Impact:** **Medium** (builders waiting on examples/integrations)
  - **Functional Impact:** **No** (not blocking core runtime)
  - **Brand Impact:** **Medium** (expectations vs deliverables)
- **Technical Classification:**
  - **Issue Category:** Feature Request / Project Management
  - **Component Affected:** Plugin ecosystem
  - **Complexity:** **Complex solution** (depends on scope)
- **Resource Requirements:**
  - **Required Expertise:** Trading systems, data ingestion, risk controls, plugin API best practices
  - **Dependencies:** V3 plugin lifecycle stability; Cloud routing decisions; security review for trading actions
  - **Estimated Effort (1–5):** **4**
- **Recommended Priority:** **P3**
- **Specific Actionable Next Steps:**
  1. Create milestone issues per repo: MVP definition, security constraints, test harness, example configs.
  2. Add “safe-by-default” guardrails for trading plugins (rate limits, simulation mode, explicit confirmations).
- **Potential Assignees:**
  - **shawmakesmagic** (product direction)
  - **marianodim** (building trading agent; potential contributor)
  - **rainman1001** / **satsbased** (trading community validation)

---

## 8) Operational blocker: “apply for chatgpt/cyber access” (model capability gating) — Discord (2026-05-01)
- **Current Status:** Mentioned as needed due to refusals on “version 5.5”; completion of the action item is unconfirmed.
- **Impact Assessment:**
  - **User Impact:** **Low–Medium** (affects specific workflows requiring cyber policies/tools)
  - **Functional Impact:** **Partial** (limits certain agent tasks)
  - **Brand Impact:** **Low**
- **Technical Classification:**
  - **Issue Category:** Feature / Ops
  - **Component Affected:** Model Integration / Provider access
  - **Complexity:** **Simple fix** (mostly administrative)
- **Resource Requirements:**
  - **Required Expertise:** Provider onboarding, compliance/policy alignment
  - **Dependencies:** External approval timelines
  - **Estimated Effort (1–5):** **1**
- **Recommended Priority:** **P3**
- **Specific Actionable Next Steps:**
  1. File the access request; track outcome in a single internal issue/thread.
  2. Add graceful fallback routing for tasks that trigger refusals (alternate model/provider or capability flags).
- **Potential Assignees:**
  - **shawmakesmagic**
  - Support: **odilitime** (tracking + comms)
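
One way to read the "capability flags" option in step 2 is a preference-ordered route table, sketched below. `ProviderRoute`, `selectProvider`, and the capability strings used with them are hypothetical and do not correspond to any existing elizaOS API or provider name.

```typescript
// Hypothetical shapes; capability names are illustrative placeholders.
interface ProviderRoute {
  id: string;
  capabilities: ReadonlySet<string>;
}

/**
 * Returns the first route (in preference order) that supports every required
 * capability, or undefined when no configured provider qualifies.
 */
function selectProvider(
  routes: readonly ProviderRoute[],
  required: readonly string[],
): ProviderRoute | undefined {
  return routes.find((route) => required.every((cap) => route.capabilities.has(cap)));
}
```

Returning `undefined` instead of defaulting to the first route lets the caller surface an explicit "no capable provider" error rather than sending the task to a model that is known to refuse it.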

---

# Highest-Priority Focus (Top 5–10 to address next)
1. **#7336 (WIP): Cloud cache singleton → request-scope fix + safe cache re-enable plan** (**P1**)
2. **#7316 follow-up: n8n clarification roundtrip paramPath mismatch + free_text infinite loop** (**P1**)
3. **Discord spam filter blocks URLs (legit link sharing)** (**P2**)
4. **Memory rot mitigation design/integration into core memory pipeline** (**P2**)
5. **V3 launch roadmap/docs single source of truth + upgrade notes** (**P2**)
6. **#7315 follow-up: ConnectorTargetCatalog respects enabled flag + cache-on-error behavior** (**P3**)
7. **Trading/plugin repos milestone tracking (auto-trader/social-alpha/spartan)** (**P3**)
8. **Model access ops: chatgpt/cyber capability gating** (**P3**)

---

# Patterns / Themes Indicating Deeper Issues
- **Architecture drift + emergency hotfixes:** Cache disabling/bypasses and large refactors (Clean Architecture) indicate prior coupling and unsafe singletons in serverless contexts.
- **Contract mismatch between LLM output ↔ API patchers ↔ real data models:** The n8n clarification path shows how easily format assumptions break without true end-to-end tests.
- **“Silent failure” UX modes:** Spam-filter drops, clarification loops, and memory rot are all failure modes that look like “nothing happened” or “AI is flaky,” which disproportionately harms trust.

---

# Process Improvement Recommendations
1. **Add end-to-end contract tests for any LLM-mediated workflow** (generate → resolve → deploy) using the same payloads returned by the system.
2. **Establish a “No module-level sockets on Workers” rule** plus a lightweight check (review checklist + lint pattern) for Cloudflare Worker code.
3. **Introduce “fail-loud” policies for user-facing flows** (blocked links, clarification loops, missing connector targets): explicit UI/HTTP errors with remediation tips.
4. **Create ADRs for major behavioral invariants** (cache scope, memory freshness, connector enabled semantics) and require them for architectural migrations.
5. **Pre-launch readiness checklist for V3/Cloud/Milady** tying docs + smoke tests + rollback plans to a single owner and date.