## User Feedback Analysis — 2026-05-03 (based on data through 2026-05-02)

### Data notes / quantification basis
- GitHub (elizaos/eliza, month-to-date snapshot): **9 new issues**, **15 new PRs** (22 merged in the snapshot window), **7 active contributors**.
- Discord summaries (2026-04-30 to 2026-05-02): mostly announcements/architecture discussion; **few direct “help” requests captured**. Most actionable pain points come from GitHub issues/PR review threads.

---

## 1) Pain Point Categorization (Top recurring friction areas)

### 1) Integration — Telegram connector reliability (High severity, high user impact)
**Frequency signal:** Telegram failures form a cluster: at least **3 distinct Telegram issues** are referenced (**#7240, #7241, #7245**), which is **~33% of this period’s new issues (3/9)**.

**What’s happening (examples):**
- **Silent message loss** due to **dual long-poll consumers** competing for the same bot token (milady wrapper + `@elizaos/plugin-telegram`) causing race conditions and dropped updates (**#7245**, open).
- **Bun + Telegraf launch/runtime issues** (**#7241**, referenced from #7245) motivate wrapper workarounds, which in turn introduce new failure modes such as the dual polling described above.
- **Token passing/setup bugs** in Telegram integration (mentioned in daily project summary; #7240).

**Who it affects most:** anyone trying to deploy Telegram bots (especially via milady wrapper flows) expecting “every message gets a reply.”

---

### 2) Technical Functionality — Auth/subscription credential paths brittle (High severity, medium frequency)
**Frequency signal:** Multiple auth-related mismatches surfaced across issues/PRs: Claude/Anthropic stealth preload missing (**#7210**, closed), auth schema drift (`auth.json`) referenced in daily summary (**#7238, #7243**), plus runtime/UI token flows touched by PR **#7212**.

**What’s happening (examples):**
- **Anthropic subscription OAuth fails on fresh clones** because `dev-ui.mjs` references a preload file (`claude-code-stealth.mjs`) that **doesn’t exist**, silently disabling the interceptor and yielding **401 Invalid authentication** (**#7210**, closed).
- **Schema expectations drift** (daily summary: outdated `auth.json` schema causing `hasCodexCliSubscriptionAuth` failures).

**Who it affects most:** self-hosters and developers using subscription-based auth flows (vs API keys), and anyone onboarding from a clean clone.

---

### 3) Technical Functionality — Database/schema drift breaks “fresh boot” chat (High severity, medium frequency)
**Frequency signal:** A single detailed issue, but a severe one: chat is effectively unusable on a fresh DB until the fix is applied (**#7222**, closed quickly).

**What’s happening (examples):**
- The `plugin-sql` runtime migrator lacks Drizzle table definitions for tables that application code queries (`entity_identities`, `entity_merge_candidates`, `fact_candidates`), causing repeated query failures and downstream “structured output parsing error” loops (**#7222**).

**Who it affects most:** new installs, fresh environments, and any wrapper that defaults to PGLite + plugin-sql.

---

### 4) Platform/Runtime Stability — Headless Linux keychain crash / secret storage (High severity, medium frequency)
**Frequency signal:** Noted as a **process-level segfault** on headless Linux, fixed via PR **#7230** and reinforced by vault work **#7197**.

**What’s happening (examples):**
- OS keychain access via native keyring bindings can **segfault** on headless Linux without D-Bus. Because the crash happens in native code, a `try/catch` cannot prevent process death (fixed by bypassing the keychain when D-Bus is unavailable: **#7230**, merged).
- Users running on servers/containers need deterministic fallback behavior (passphrase-based master key).

**Who it affects most:** VPS/container deployments, CI, headless servers—i.e., the exact audience for self-hosted agents.

---

### 5) Integration / Build System — ESM-only core migration and plugin compatibility churn (Medium severity, high frequency across repos)
**Frequency signal:** Daily project summary describes “resolving critical build compatibility issues across multiple plugins” by removing CJS bundles (plugin-ollama/openai/openrouter/pdf). This indicates repeated friction rather than a one-off.

**What’s happening (examples):**
- Plugins shipping incompatible module formats causing build failures until aligned with **ESM-only `@elizaos/core`**.
- Ongoing structure changes (e.g., **#7235** “add cloud and plugins, remove rust and python”) may amplify churn for downstream consumers.

**Who it affects most:** plugin authors, downstream wrappers, and anyone maintaining internal forks.

---

### 6) Documentation — “How do I deploy this safely / correctly?” gaps (Medium severity, high impact on onboarding)
**Frequency signal:** Discord contains architecture comparisons, v3 capability statements, and advanced topics (robotics, memory rot), but little step-by-step operational guidance captured. GitHub issues repeatedly mention multi-hour debug hunts caused by silent failures.

**What’s happening (examples):**
- Silent filtering of missing files in `dev-ui.mjs` (Anthropic stealth) caused confusing errors rather than a direct explanation (**#7210**).
- Telegram “sometimes replies” is a classic silent-failure onboarding killer (**#7245**).
- Headless keychain segfault is non-obvious without explicit docs and environment detection (**#7230**).

**Who it affects most:** newcomers self-hosting, and intermediates integrating connectors without deep runtime knowledge.

---

### 7) Technical Functionality (Long-lived agents) — Memory “rot” / staleness over months (High severity, emerging)
**Frequency signal:** One high-quality research report in Discord (sentient_dawn) describing a failure mode after ~3 months; not yet reflected as GitHub issues here, but strategically important.

**What’s happening (examples):**
- Retrieval-only memory (RAG/vector store) can drift as stale facts persist; contradictions emerge only when humans notice.
- Proposed mitigation: **reconciliation pass with freshness gates, periodic diffs, re-embedding under current ontology** (reported as implemented in production by sentient_dawn).

**Who it affects most:** power users running persistent agents (ops, trading, automation) rather than short chat sessions.

---

## 2) Usage Pattern Analysis (Actual vs intended usage)

### Observed real-world usage patterns
- **Trading agents / market automation**: marianodim building a spot trading agent (SOL, SUI + 5 tokens) with multi-strategy signals (Discord 2026-05-02). This aligns with ElizaOS positioning as “agent framework for autonomous agents, trading, automation.”
- **Robotics / hardware control**: shawmakesmagic integrated Eliza into a **Unitree robot** to walk on command (Discord 2026-05-01). This is an “unexpected but highly validating” use case showing device/runtime extensibility matters.
- **Self-hosted multi-client runtime**: PR **#7212** focuses on CORS + bearer auth + cross-platform builds (browser dashboards, Capacitor mobile, Electrobun desktop) routed to a remote agent. Users want “run my agent on a server, control it everywhere.”
- **Long-lived agent operations**: memory rot discussion indicates users are keeping agents alive for months, not just ephemeral sessions.

### Mismatches vs intended usage (or implied promises)
- **Connector promise vs reality**: Telegram “enabled” does not reliably mean “delivers all messages,” especially with wrappers + plugins both polling (#7245).
- **“Works on headless servers” expectation**: keychain-based secrets can hard-crash headless Linux without guardrails (#7230), contradicting expectations of safe server operation.
- **Subscription auth path**: the codebase suggests subscription OAuth is supported, but a missing preload artifact made it non-functional until fixed (#7210).

### Feature requests that align with actual usage
- **Connector lifecycle single-owner semantics** (Telegram as first priority; likely applicable to Discord/others).
- **First-class “server-mode” operational profile** (headless secrets, stable storage, deterministic migrations, connector diagnostics).
- **Long-lived memory maintenance primitives** (freshness, reconciliation jobs, staleness detection metrics).

---

## 3) Implementation Opportunities (Concrete solutions per major pain point)

### A) Telegram reliability (dual polling, launch errors, token/setup bugs)
**1) Enforce single polling owner per token (High impact, medium difficulty)**
- Add a **runtime guard**: if two Telegraf instances attempt to poll the same token, fail fast with a clear error (“duplicate poller detected”) rather than silently racing.
- Provide a **mutual exclusion config rule**: if a wrapper enables its own polling, it should automatically disable `@elizaos/plugin-telegram` (or vice versa), with an explicit log line.
- Similar approaches: Home Assistant integrations enforce single coordinator per device/account; many webhook frameworks forbid dual consumers unless explicitly sharded.
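
The runtime guard in the first bullet could be sketched as follows. This is a minimal in-process illustration, not the plugin's actual code: `PollerRegistry`, `DuplicatePollerError`, and the owner labels are hypothetical names, and the registry only sees pollers started inside one process. Two separate processes polling the same token would need a shared lock that this sketch does not cover.

```typescript
// Hypothetical names throughout: none of these are part of Telegraf or
// @elizaos/plugin-telegram. The guard is in-process only.
class DuplicatePollerError extends Error {
  constructor(botId: string, existingOwner: string, newOwner: string) {
    super(
      `duplicate poller detected for bot ${botId}: "${existingOwner}" is already ` +
        `polling, so "${newOwner}" must not start a second long-poll loop`,
    );
    this.name = "DuplicatePollerError";
  }
}

class PollerRegistry {
  // bot id -> label of the component that currently owns polling
  private owners = new Map<string, string>();

  /** Claims polling ownership; returns a function that releases the claim. */
  claim(botToken: string, owner: string): () => void {
    // Telegram bot tokens have the form "<bot id>:<secret>"; key on the id
    // so the secret never ends up in logs or error messages.
    const botId = botToken.split(":")[0] ?? botToken;
    const existing = this.owners.get(botId);
    if (existing !== undefined) {
      throw new DuplicatePollerError(botId, existing, owner);
    }
    this.owners.set(botId, owner);
    return () => {
      if (this.owners.get(botId) === owner) this.owners.delete(botId);
    };
  }
}
```

A wrapper and the plugin would each call `claim()` before starting their long-poll loop and call the returned release function on shutdown, so the second caller fails at startup instead of racing for updates.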

**2) Fix upstream Bun/Telegraf lifecycle so wrappers aren’t needed (High impact, higher difficulty)**
- Make `plugin-telegram` the single source of truth: ensure `bot.launch()` is awaited and stable on Bun/Windows; add integration tests for Bun runtime if feasible.
- Similar approaches: Discord.js / Slack Bolt libraries often ship explicit “runtime adapters” and tested lifecycle hooks per runtime.

**3) Add connector delivery observability (Medium impact, low-to-medium difficulty)**
- A “Connector Health” panel: last update timestamp, updates received count, handler success/failure counts, and warnings about competing consumers.
- Similar approaches: Kubernetes controllers expose reconciliation metrics; Slack apps often log event ack/failure counters.
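
The counters behind such a panel could look roughly like this. It is a sketch under stated assumptions: `ConnectorHealth` and its field names are hypothetical and not taken from elizaOS, timestamps are passed in as epoch milliseconds so the logic stays testable, and detection of competing consumers is left out.

```typescript
// Hypothetical data source for a "Connector Health" panel.
interface ConnectorHealthSnapshot {
  connector: string;
  updatesReceived: number;
  handlerSuccesses: number;
  handlerFailures: number;
  lastUpdateAt: number | null; // epoch milliseconds, null before first update
  warnings: string[];
}

class ConnectorHealth {
  private connector: string;
  private staleAfterMs: number;
  private updatesReceived = 0;
  private handlerSuccesses = 0;
  private handlerFailures = 0;
  private lastUpdateAt: number | null = null;

  constructor(connector: string, staleAfterMs: number) {
    this.connector = connector;
    this.staleAfterMs = staleAfterMs;
  }

  /** Call once for every update received from the platform. */
  recordUpdate(now: number): void {
    this.updatesReceived += 1;
    this.lastUpdateAt = now;
  }

  /** Call once per handler run with its outcome. */
  recordHandlerResult(ok: boolean): void {
    if (ok) this.handlerSuccesses += 1;
    else this.handlerFailures += 1;
  }

  snapshot(now: number): ConnectorHealthSnapshot {
    const warnings: string[] = [];
    if (this.lastUpdateAt === null) {
      warnings.push("no updates received since startup");
    } else if (now - this.lastUpdateAt > this.staleAfterMs) {
      warnings.push(`no updates for ${now - this.lastUpdateAt} ms`);
    }
    if (this.handlerFailures > 0) {
      warnings.push(`${this.handlerFailures} handler failure(s)`);
    }
    return {
      connector: this.connector,
      updatesReceived: this.updatesReceived,
      handlerSuccesses: this.handlerSuccesses,
      handlerFailures: this.handlerFailures,
      lastUpdateAt: this.lastUpdateAt,
      warnings,
    };
  }
}
```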

---

### B) Auth/subscription brittleness (stealth preload, schema drift, bearer pairing edge cases)
**1) Remove “silent failure” patterns in auth boot paths (High impact, low difficulty)**
- When a feature is enabled (e.g., stealth Claude path) but a required artifact is missing, **log a loud warning** and surface it in UI diagnostics (not a downstream parse error).
- Similar: Next.js and Vite warn loudly on missing env vars/import targets in dev.
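
As an illustration of the fail-loud behavior, a boot script could check its preload list up front instead of filtering out missing files. The helper names below are hypothetical, and `dev-ui.mjs` itself is not shown in this report, so this is only a sketch of the pattern. The `exists` callback is injected to keep the example self-contained.

```typescript
// Hypothetical helpers illustrating "fail loud" instead of silently
// dropping missing preload files.
interface PreloadCheck {
  present: string[];
  missing: string[];
}

function checkPreloads(
  required: readonly string[],
  exists: (path: string) => boolean,
): PreloadCheck {
  const present: string[] = [];
  const missing: string[] = [];
  for (const path of required) {
    if (exists(path)) present.push(path);
    else missing.push(path);
  }
  return { present, missing };
}

/** Returns the preload list, or throws naming every missing file. */
function requirePreloads(
  feature: string,
  required: readonly string[],
  exists: (path: string) => boolean,
): string[] {
  const { present, missing } = checkPreloads(required, exists);
  if (missing.length > 0) {
    throw new Error(
      `${feature} is enabled but required preload file(s) are missing: ${missing.join(", ")}`,
    );
  }
  return present;
}
```

Under Node or Bun, `existsSync` from `node:fs` could be passed as `exists`. With the file from #7210 absent, the error would name `claude-code-stealth.mjs` directly instead of surfacing later as a 401.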

**2) Consolidate auth schema + add versioned migrations (High impact, medium difficulty)**
- Version `auth.json` (and related local auth artifacts) with a `schemaVersion` and run migrations at boot.
- Similar: VS Code settings migrations; Prisma migrations for config-like state.
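
A boot-time migration runner for a versioned `auth.json` might look like the sketch below. The real file layout is not described in this report, so the version numbers and the `token` / `accessToken` fields are hypothetical and exist only to make the example run. The part that carries over is the pattern: read `schemaVersion`, apply one step per version, and refuse files that are newer than the running code.

```typescript
// Hypothetical schema history for illustration only.
type AuthFile = { schemaVersion?: number; [key: string]: unknown };
type Migration = (doc: AuthFile) => AuthFile;

const CURRENT_SCHEMA_VERSION = 2;

// migrations[n] upgrades a document from version n to version n + 1.
const migrations: Record<number, Migration> = {
  // v0 -> v1: files written before versioning get an explicit version.
  0: (doc) => ({ ...doc, schemaVersion: 1 }),
  // v1 -> v2: rename a (hypothetical) "token" field to "accessToken".
  1: (doc) => {
    const { token, ...rest } = doc;
    return { ...rest, accessToken: token, schemaVersion: 2 };
  },
};

function migrateAuthFile(doc: AuthFile): AuthFile {
  // A missing schemaVersion is treated as the pre-versioning layout (v0).
  let version = typeof doc.schemaVersion === "number" ? doc.schemaVersion : 0;
  if (version > CURRENT_SCHEMA_VERSION) {
    throw new Error(
      `auth.json schemaVersion ${version} is newer than supported version ${CURRENT_SCHEMA_VERSION}`,
    );
  }
  let current = doc;
  while (version < CURRENT_SCHEMA_VERSION) {
    const step = migrations[version];
    if (!step) {
      throw new Error(`no migration registered for auth.json schemaVersion ${version}`);
    }
    current = step(current);
    version += 1;
  }
  return current;
}
```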

**3) Add an end-to-end “fresh clone subscription auth” CI smoke test (Medium impact, medium difficulty)**
- A minimal CI job that simulates the presence of a subscription token and asserts a successful API call path (or at least correct interceptor installation).
- Similar: many SDKs run “golden path” auth tests against mocked endpoints.

---

### C) Database/schema drift (plugin-sql migrator vs abstract schema)
**1) Single source of truth for schema generation (High impact, medium difficulty)**
- Either generate the Drizzle `pgTable` definitions from the abstract schemas or eliminate the parallel abstraction to prevent drift.
- Similar: Prisma schema as the single source; Drizzle Kit generating migrations from a single definition set.

**2) Boot-time schema sanity checks (High impact, low difficulty)**
- On plugin init, verify that all tables referenced by core services exist; if not, fail with a guided remediation (“run migrator,” “update plugin-sql,” etc.).
- Similar: Django checks; Rails startup checks for pending migrations.
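
The boot-time check can be small. In the sketch below the three table names come from #7222, while the function names and the error wording are hypothetical. How the list of existing tables is obtained is also an assumption: on a Postgres-compatible store it could come from a query against `information_schema.tables`.

```typescript
// Table names are the ones reported in #7222; the rest is illustrative.
const REQUIRED_TABLES = [
  "entity_identities",
  "entity_merge_candidates",
  "fact_candidates",
] as const;

function findMissingTables(
  required: readonly string[],
  existing: Iterable<string>,
): string[] {
  const present = new Set(existing);
  return required.filter((table) => !present.has(table));
}

/** Throws with remediation guidance when a required table is absent. */
function assertSchemaReady(
  existing: Iterable<string>,
  required: readonly string[] = REQUIRED_TABLES,
): void {
  const missing = findMissingTables(required, existing);
  if (missing.length > 0) {
    throw new Error(
      `schema check failed: missing table(s) ${missing.join(", ")}. ` +
        `Run the migrator or update plugin-sql before accepting chat messages.`,
    );
  }
}
```

Failing here, at plugin init, would replace the repeated query failures and parse-error loops with a single message that names the missing tables.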

**3) Add “fresh PGLite boot chat” test (Medium impact, medium difficulty)**
- Automated test that boots with an empty PGLite store, sends a chat, and asserts no missing-table errors.

---

### D) Headless Linux secrets/keychain stability
**1) Provide explicit “server profile” defaults (High impact, low difficulty)**
- Document and/or auto-apply: `MILADY_VAULT_DISABLE_KEYCHAIN=1` or require passphrase master key on headless Linux, with a clear startup banner stating which secrets backend is active.
- Similar: GitHub Actions and many CLI tools auto-select non-keychain storage in CI/headless.
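
One way to express the default selection and the startup banner is sketched below. `MILADY_VAULT_DISABLE_KEYCHAIN` is the variable named above. Treating an unset `DBUS_SESSION_BUS_ADDRESS` as "no D-Bus session available" is an assumption made for this example and may differ from the detection used in #7230. The function and type names are hypothetical.

```typescript
type SecretsBackend = "os-keychain" | "passphrase";

interface BackendDecision {
  backend: SecretsBackend;
  reason: string;
}

// Pure function so the decision can be tested without touching the
// real environment or any native keyring binding.
function chooseSecretsBackend(
  platform: string,
  env: Record<string, string | undefined>,
): BackendDecision {
  if (env["MILADY_VAULT_DISABLE_KEYCHAIN"] === "1") {
    return { backend: "passphrase", reason: "MILADY_VAULT_DISABLE_KEYCHAIN=1" };
  }
  // Assumption: an unset session bus address is used as the headless signal.
  if (platform === "linux" && !env["DBUS_SESSION_BUS_ADDRESS"]) {
    return { backend: "passphrase", reason: "no D-Bus session bus detected on Linux" };
  }
  return { backend: "os-keychain", reason: "keychain assumed available" };
}

function startupBanner(decision: BackendDecision): string {
  return `[vault] secrets backend: ${decision.backend} (${decision.reason})`;
}
```

At startup this would be called as `chooseSecretsBackend(process.platform, process.env)`, with the banner logged before any keychain code path is reached.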

**2) Expand guardrails beyond `defaultMasterKey` (Medium impact, low difficulty)**
- Ensure any direct `osKeychainMasterKey()` call paths are guarded consistently (avoid regressions where callers bypass the safe chain).
- Similar: centralized credential provider chain patterns in AWS SDK.

**3) Operational docs: threat model + recommended deployments (Medium impact, medium difficulty)**
- A short “Secrets & Deployments” guide: keychain vs passphrase vs external password managers; container best practices.

---

### E) ESM migration / plugin build churn
**1) Publish a compatibility matrix + codemod (High impact, medium difficulty)**
- “Plugin author checklist”: ESM-only guidelines, package.json fields, tsdown settings, example plugin template.
- Provide a codemod or repo template that generates correct ESM outputs by default.
- Similar: SvelteKit / Next.js upgrade guides; ESLint configs with fixers.

**2) Add plugin CI that fails on CJS/ESM mismatches early (Medium impact, low-to-medium difficulty)**
- A shared GitHub Action in plugin repos that attempts to import the built artifact under Node + Bun ESM constraints.
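
Alongside the import test, a static check on `package.json` can catch the most common format mismatches before anything is built. The sketch below is hypothetical and deliberately strict: it assumes the policy described above, where plugins ship ESM only, so any `require` export condition counts as a problem.

```typescript
// Hypothetical static pre-check for an ESM-only plugin policy.
interface PackageJsonLike {
  type?: string;
  main?: string;
  exports?: unknown;
}

// Walks the "exports" tree looking for a "require" condition at any depth.
function hasRequireCondition(node: unknown): boolean {
  if (node === null || typeof node !== "object") return false;
  return Object.entries(node as Record<string, unknown>).some(
    ([key, value]) => key === "require" || hasRequireCondition(value),
  );
}

function findModuleFormatProblems(pkg: PackageJsonLike): string[] {
  const problems: string[] = [];
  if (pkg.type !== "module") {
    // Without "type": "module", Node treats .js files as CommonJS.
    problems.push('"type" is not "module"');
  }
  if (typeof pkg.main === "string" && pkg.main.endsWith(".cjs")) {
    problems.push(`"main" points at a CommonJS file: ${pkg.main}`);
  }
  if (hasRequireCondition(pkg.exports)) {
    problems.push('"exports" still declares a "require" condition');
  }
  return problems;
}
```

A CI step would parse the plugin's `package.json`, call `findModuleFormatProblems`, and fail the job when the returned list is non-empty.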

---

### F) Long-lived memory rot (emerging, strategic)
**1) First-class “memory reconciliation job” primitive (High impact, higher difficulty)**
- Implement a scheduled reconciliation pass: freshness gates on claims, diffs across sources, re-embedding under current ontology (as described by sentient_dawn).
- Similar: search index reindexing jobs; Kubernetes reconciliation loops.

**2) Add memory observability (Medium impact, medium difficulty)**
- Track contradiction rates, stale-fact detection hits, and last-reconciled timestamps per agent.
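
The freshness-gate part of a reconciliation pass, together with a stale-fact metric, could be sketched as follows. The approach reported in Discord is not specified in enough detail here to reproduce, so the record shape, function name, and age threshold are all hypothetical. Diffing across sources and re-embedding are out of scope for this example.

```typescript
// Hypothetical record shape for a stored memory fact.
interface StoredFact {
  id: string;
  claim: string;
  lastVerifiedAt: number; // epoch milliseconds
}

interface ReconciliationPlan {
  fresh: StoredFact[];
  needsReverification: StoredFact[];
  staleRatio: number; // share of facts past the gate; 0 when there are none
}

// Freshness gate: facts not verified within maxAgeMs are queued for
// re-verification instead of being served as-is.
function planReconciliation(
  facts: readonly StoredFact[],
  now: number,
  maxAgeMs: number,
): ReconciliationPlan {
  const fresh: StoredFact[] = [];
  const needsReverification: StoredFact[] = [];
  for (const fact of facts) {
    if (now - fact.lastVerifiedAt <= maxAgeMs) fresh.push(fact);
    else needsReverification.push(fact);
  }
  const staleRatio =
    facts.length === 0 ? 0 : needsReverification.length / facts.length;
  return { fresh, needsReverification, staleRatio };
}
```

A scheduled job could run this per agent, record `staleRatio` alongside the run timestamp, and hand `needsReverification` to whatever re-verification step the agent supports.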

---

## 4) Communication Gaps (Expectations vs reality)

### Recurring expectation mismatches
- **“Enabled connector == reliable replies”**: Telegram can appear “working” while silently dropping messages (#7245). Users need explicit warnings when multiple pollers are active.
- **“Fresh clone should just work”**: missing preload file for Anthropic subscription flow (#7210) contradicts developer expectations and wastes debugging time.
- **“Server-safe by default”**: headless Linux segfault from keychain access (#7230) is surprising and undermines trust for production deployments.

### Recurring questions / implied confusion signals
- “Why do I get 401 with a valid token?” (Anthropic stealth missing / wrong interceptor path, plus general auth schema drift).
- “Why does Telegram respond sometimes but not always?” (dual pollers, launch bugs, token bridge issues).
- “Why does chat fail with parsing errors on first message?” (missing SQL tables causing empty state + parser retries, #7222).

### Suggested documentation/onboarding improvements
- Add a **“First 30 minutes self-hosted”** guide that includes:
  - Required env vars and where they live now (vault vs `config.env` vs `process.env`).
  - Connector single-owner rule (especially Telegram).
  - Headless Linux secrets guidance (passphrase master key required unless keychain available).
- Add a **Troubleshooting decision tree**:
  - 401 auth → check stealth/interceptor installed → check token kind → check `auth.json` version.
  - Telegram flaky → check for multiple pollers → check runtime logs for both plugin + wrapper startup lines.
  - Fresh DB errors → verify migrations and table existence.

---

## 5) Community Engagement Insights

### Power users / high-signal contributors (and what they need)
- **Sw4pIO (GitHub issues #7210, #7222, #7245):** provides extremely actionable, repro-heavy reports. Needs:
  - Faster triage loop, clear ownership of integration areas (Telegram/auth/sql).
  - A place to contribute fixes safely (maintainer guidance, “good first PR” pointers tied to their reports).
- **NubsCarson (multiple merged PRs: #7212, #7230, #7232):** focuses on cross-platform reliability and CI. Needs:
  - Clear product priorities so infra work aligns with user pain (connectors + server profile).
- **sentient_dawn (Discord memory rot research):** advanced long-lived agent reliability work. Needs:
  - A formal RFC path to land memory reconciliation concepts upstream.
- **shawmakesmagic / Shaw:** drives v3 architecture direction and real-world demos (robotics). Needs:
  - Structured feedback loops to validate which v3 promises are most urgent (connectors, self-hosting, monetization).

### Newcomer friction points (inferred)
- Self-hosting and connector setup are still too easy to misconfigure (multiple pollers, env var differences, auth modes).
- Errors often surface far from root cause (parse error after missing tables; 401 after missing interceptor).

### Converting passive users into contributors
- Create “Connector Triage Days” (Telegram/Discord): assign owners, label issues, publish weekly status.
- Add a lightweight **RFC + field report** channel/thread (e.g., “Memory reliability RFCs”) to capture research like sentient_dawn’s into actionable roadmap items.
- Provide “repro bounty” recognition: highlight best bug reports (like #7222) and offer fast-path review for PRs that fix them.

---

## 6) Feedback Collection Improvements

### Current channel effectiveness
- The captured **Discord summaries** are heavy on announcements and light on support interactions, so they do not reflect day-to-day friction well.
- **GitHub issues** are high quality and reproducible, but represent a subset: mostly developers/power users.

### Improvements for more structured, actionable feedback
1) **Standardized “Connector Bug Report” template** (Telegram/Discord/etc.)
   - Requires: runtime (Bun/Node), OS, whether wrapper is used, token source, polling/webhook mode, logs showing which pollers started.
2) **In-app “Export diagnostics” bundle**
   - Redacts secrets; includes connector status, enabled plugins, schema versions, last migration run, and recent errors.
3) **Weekly short survey embedded in app-core UI (optional)**
   - 3 questions: “What did you try to do?”, “Did it work?”, “If not, what blocked you?” + auto-attach anonymized environment metadata.
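
For the diagnostics bundle in item 2, the redaction step is the part most worth getting right early. The sketch below is a hypothetical key-name-based redactor. The pattern list is an assumption and would need review against the real config keys, since name matching cannot catch a secret stored under an innocuous key.

```typescript
// Hypothetical pattern; extend it to match the project's real key names.
const SENSITIVE_KEY = /token|secret|password|passphrase|api[-_]?key|authorization/i;

// Returns a deep copy with the values of sensitive-looking keys replaced.
function redactSecrets(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => redactSecrets(item));
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SENSITIVE_KEY.test(key) ? "[REDACTED]" : redactSecrets(inner);
    }
    return out;
  }
  return value;
}
```

The bundle exporter would pass its assembled diagnostics object through `redactSecrets` immediately before serialization, so no unredacted copy is written to disk.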

### Underrepresented segments whose feedback is missing
- **Non-Discord users** (self-hosters who never join community).
- **Less-technical builders** using “app store / Milady” flows who may churn before filing GitHub issues.
- **Production operators** (SRE/DevOps) who care about logs, metrics, and uptime but may not report unless there’s a clear ops channel.

---

## Prioritized High-Impact Actions (next 1–3 weeks)

1) **Fix Telegram “single owner” polling + add explicit detection/warnings** (blocks user trust; prevents silent drops).  
2) **Add boot-time sanity checks (SQL tables + auth artifacts) with fail-loud messaging** (reduces multi-hour debug loops; improves first-run success).  
3) **Publish a “Server Profile” deployment guide + defaults (headless secrets, vault, env vars, connectors)** (aligns expectations for VPS/container usage).  
4) **Ship a plugin/connector compatibility matrix for ESM + a minimal plugin author template** (reduces ecosystem churn and repeated build breaks).  
5) **Start an RFC track for long-lived agent memory maintenance (reconciliation + freshness)** (strategic reliability differentiator; build on the approach sentient_dawn reports running in production).