# Council Briefing: 2025-01-21

## Monthly Goal

January 2025: Execution excellence—complete token migration with high success rate, launch ElizaOS Cloud, stabilize flagship agents, and build developer trust through reliability and clear documentation.

## Daily Focus

- Plugin and infrastructure throughput surged across the fleet, but the Council’s strategic bottleneck remains reliability/DX: model selection, database startup, and install friction are eroding trust faster than new capabilities can redeem it.

## Key Points for Deliberation

### 1. Topic: Reliability & DX Triage (Config, DB, Install)

**Summary of Topic:** Operational chatter indicates recurring failures in basic onboarding paths: model selection settings are ignored, SQLite/Supabase adapters fail unpredictably (notably with the node plugin), and package install/start failures continue to spawn new issues—directly conflicting with the execution-excellence directive.

#### Deliberation Items (Questions):

**Question 1:** Which reliability defect should be declared a Priority-0 “ship-stopper” for the next release train to protect developer trust?

  **Context:**
  - `Discord (2025-01-20, coders): Users reported character files with "model": "small" still default to large models (configuration confusion).`
  - `Discord (2025-01-20, coders): "Database connection not open" / SQLite connection problems, especially with node plugin.`

  **Multiple Choice Answers:**
    a) Fix model selection and modelClass enforcement (small/medium/large mapping) end-to-end.
        *Implication:* Reduces surprise cost/latency and restores configuration credibility—critical for Cloud and enterprise adoption (a minimal enforcement sketch follows this list).
    b) Stabilize database adapters and node plugin startup (SQLite + Supabase) with deterministic defaults and clearer errors.
        *Implication:* Improves first-run success rate and lowers support load, directly increasing builder retention.
    c) Resolve package installation/start failures (npm/pnpm packaging, missing modules, model download failures) via a hardened quickstart path.
        *Implication:* Maximizes onboarding throughput, but may defer deeper runtime correctness issues that reappear later.
    d) Other / More discussion needed / None of the above.
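
For concreteness, a minimal sketch of the enforcement in option (a). The small/medium/large labels mirror the values from the Discord reports; the mapping table, placeholder model IDs, and `resolveModel` helper are hypothetical illustrations, not the current ElizaOS API.

```typescript
// Hypothetical sketch of end-to-end model-class enforcement. The label names
// come from the community reports; the mapping and helper are illustrative,
// not the shipped ElizaOS API.

type ModelClass = "small" | "medium" | "large";

// Example mapping for one provider; a real deployment keeps one per provider.
// Model IDs are placeholders.
const PROVIDER_MODELS: Record<ModelClass, string> = {
  small: "provider-small-model",
  medium: "provider-medium-model",
  large: "provider-large-model",
};

function isModelClass(value: unknown): value is ModelClass {
  return value === "small" || value === "medium" || value === "large";
}

// Fail loudly instead of silently falling back to the largest (most
// expensive) model, which is the behavior users reported.
function resolveModel(requested: unknown): string {
  if (!isModelClass(requested)) {
    throw new Error(
      `Invalid "model" value ${JSON.stringify(requested)}; expected "small" | "medium" | "large".`
    );
  }
  return PROVIDER_MODELS[requested];
}

// A character file declaring { "model": "small" } now provably routes to the
// small-class model instead of defaulting to a large one.
console.log(resolveModel("small")); // -> "provider-small-model"
```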

**Question 2:** Do we formalize a single blessed “golden path” repo (main eliza) and effectively deprecate eliza-starter until it meets reliability targets?

  **Context:**
  - `Discord (2025-01-20, coders): Community advised using main eliza repository instead of eliza-starter due to dependency issues.`

  **Multiple Choice Answers:**
    a) Yes—declare main repo the golden path; mark eliza-starter as experimental until parity is restored.
        *Implication:* Short-term clarity and fewer broken installs; potential backlash from starter users but less fragmentation.
    b) No—invest immediately to fix eliza-starter and keep it as the primary onboarding path.
        *Implication:* Better long-term onboarding UX, but consumes bandwidth that could stabilize core runtime and Cloud launch.
    c) Hybrid—golden path is main repo now; starter remains supported only for a narrow “hello agent” scenario with CI gates.
        *Implication:* Balances focus and clarity, while keeping an entry ramp for non-experts without overpromising.
    d) Other / More discussion needed / None of the above.

**Question 3:** What is the Council’s minimum acceptable “first-run success rate” and what enforcement mechanism do we adopt to achieve it?

  **Context:**
  - `GitHub Daily Update (2025-01-21): New issues include inability to install @elizaos/agent (#2624) and agent start failures due to model download failures (#2623).`

  **Multiple Choice Answers:**
    a) Set a hard gate: ≥90% first-run success in CI smoke tests across OS targets before release.
        *Implication:* Strong trust signal, but may slow feature velocity and require test infra expansion (a smoke-test sketch follows this list).
    b) Set a soft target: ≥75% success with rapid hotfix cadence and transparent known-issues ledger.
        *Implication:* Keeps shipping momentum, but risks continued churn and reputational drag.
    c) Segmented targets: 95% for Cloud path, 70% for self-host; prioritize commercial reliability first.
        *Implication:* Optimizes for revenue and managed UX, but may alienate open-source self-hosters if neglected.
    d) Other / More discussion needed / None of the above.
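
Whichever threshold is chosen, the gate is only enforceable if "first-run success" is a concrete, scriptable signal. A minimal sketch using Node's built-in child_process; the start command, readiness marker, and timeout are assumptions to wire to the real documented quickstart.

```typescript
// Hypothetical CI smoke test: "first-run success" means the agent process
// emits a readiness line within a time budget. The command, READY_MARKER,
// and timeout are placeholders to adapt to the actual quickstart path.
import { spawn } from "node:child_process";

const READY_MARKER = "Agent started"; // assumed readiness log line
const TIMEOUT_MS = 120_000;

function firstRunSucceeds(command: string, args: string[]): Promise<boolean> {
  return new Promise((resolve) => {
    const child = spawn(command, args, { stdio: ["ignore", "pipe", "pipe"] });
    const timer = setTimeout(() => {
      child.kill();
      resolve(false); // never became ready: count as a first-run failure
    }, TIMEOUT_MS);

    child.stdout?.on("data", (chunk: Buffer) => {
      if (chunk.toString().includes(READY_MARKER)) {
        clearTimeout(timer);
        child.kill();
        resolve(true);
      }
    });
    child.on("error", () => {
      clearTimeout(timer); // spawn failed (missing binary, broken install)
      resolve(false);
    });
  });
}

// Run this across OS targets in CI and fail the release train when the
// aggregate success rate drops below the agreed threshold (e.g. 0.9 under a).
firstRunSucceeds("pnpm", ["start"]).then((ok) => process.exit(ok ? 0 : 1));
```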

---


### 2. Topic: Throughput vs Coherence (Plugin Expansion & Governance of Quality)

**Summary of Topic:** The ecosystem is adding plugins at a high tempo (NVIDIA NIM, Cronos EVM, Router Nitro, Holdstation swap, MongoDB adapter, etc.), but without stronger quality gates this growth can amplify support burden and reduce perceived reliability—contradicting “Execution Excellence.”

#### Deliberation Items (Questions):

**Question 1:** How should the Council govern plugin intake to preserve composability while preventing reliability debt from exploding?

  **Context:**
  - `GitHub Activity (Jan 20–22): "29 new pull requests (19 merged)... jump to 66 active contributors" (rapid intake).`
  - `Daily Report (2025-01-20): Multiple new plugins landed (e.g., NVIDIA NIM #2599, Holdstation swap #2596, Router Nitro #2590, Cronos EVM #2585).`

  **Multiple Choice Answers:**
    a) Adopt strict plugin admission standards: tests + minimal docs + security review required before merge/registry inclusion.
        *Implication:* Higher trust and lower breakage, but reduces contributor velocity and increases maintainer workload.
    b) Two-tier system: “Core/Verified” plugins with high gates; “Community/Experimental” plugins with lightweight gates and clear labeling.
        *Implication:* Preserves innovation while protecting newcomers; requires consistent labeling and registry tooling (a registry sketch follows this list).
    c) Max velocity: merge quickly, rely on community to surface issues; fix regressions post-merge.
        *Implication:* Short-term expansion, long-term support overload and perceived instability—risks North Star alignment.
    d) Other / More discussion needed / None of the above.
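
If the two-tier route in option (b) is taken, the tiers have to be machine-readable so registry tooling and docs can label plugins consistently. A sketch of what an entry and its admission gate might look like; the field names, example plugin name, and criteria are hypothetical, not an existing registry schema.

```typescript
// Hypothetical registry metadata for a two-tier plugin system. Field names
// and gate criteria are illustrative, not an existing registry schema.

type PluginTier = "core-verified" | "community-experimental";

interface RegistryEntry {
  name: string;             // e.g. "@elizaos/plugin-example" (hypothetical)
  tier: PluginTier;
  hasTests: boolean;        // CI-verified test suite present
  hasDocs: boolean;         // minimal usage docs present
  securityReviewed: boolean;
}

// Admission gate: "core-verified" requires all three checks; everything else
// is admitted as experimental with clear labeling and no verification claims.
function admissionTier(entry: Omit<RegistryEntry, "tier">): PluginTier {
  const verified = entry.hasTests && entry.hasDocs && entry.securityReviewed;
  return verified ? "core-verified" : "community-experimental";
}

console.log(
  admissionTier({
    name: "@elizaos/plugin-example",
    hasTests: true,
    hasDocs: true,
    securityReviewed: false,
  })
); // -> "community-experimental"
```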

**Question 2:** Do we pause net-new plugins for a defined stabilization window to align with execution excellence, or keep parallel lanes?

  **Context:**
  - `Discord (2025-01-20): Team prioritizing V2 development over PR activities; ongoing backlog includes model selection + DB issues.`

  **Multiple Choice Answers:**
    a) Pause net-new plugins for 1–2 sprints; focus on core stability, docs, and onboarding success rate.
        *Implication:* Improves reliability quickly, but may dampen community excitement and partner integrations.
    b) Parallel lanes: core team stabilizes; community plugins continue under a strict “experimental” banner.
        *Implication:* Maintains momentum while protecting core; requires clear governance and moderation bandwidth.
    c) No pause; rely on tooling (CI, linters, bots) to keep quality acceptable at scale.
        *Implication:* Works only if automation coverage is strong; otherwise risks repeated regressions and contributor frustration.
    d) Other / More discussion needed / None of the above.

---


### 3. Topic: Model & Provider Strategy (DeepSeek R1, NVIDIA NIM, Cost/Performance)

**Summary of Topic:** Community signal indicates a strategic opening: DeepSeek R1 claims near-frontier reasoning at drastically lower cost with permissive licensing, while NVIDIA NIM integration expands provider optionality—yet model selection bugs and inconsistent provider behavior undermine the ability to exploit these options safely.

#### Deliberation Items (Questions):

**Question 1:** Should the Council elevate DeepSeek R1 integration to a strategic priority, and if so, what role should it play (default vs optional vs Cloud-only)?

  **Context:**
  - `Discord (2025-01-20, partners/coders): "DeepSeek's R1... O1/Sonnet-level performance at 30x lower cost with MIT licensing."`
  - `Daily Report (2025-01-20): DeepSeek provider support and related fixes appear in the repo activity stream.`

  **Multiple Choice Answers:**
    a) Make R1 a first-class, documented option and recommend it for cost-optimized deployments.
        *Implication:* Increases competitiveness and developer delight, but increases surface area for provider-specific bugs (an illustrative config follows this list).
    b) Keep R1 experimental until model selection + provider parity issues are resolved.
        *Implication:* Protects reliability narrative; may miss a window to capture builders seeking cheaper reasoning.
    c) Offer R1 primarily via ElizaOS Cloud with curated configs and guardrails; keep self-host optional.
        *Implication:* Turns provider advantage into managed UX and revenue leverage, but may be seen as gating capability.
    d) Other / More discussion needed / None of the above.
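
To ground what a "first-class, documented option" would mean for a builder, a sketch of a cost-optimized character configuration. Only the `"model": "small"` label is attested in the Discord reports; every other field name here is an assumption to check against the shipped character-file schema.

```typescript
// Hypothetical cost-optimized character config illustrating option (a).
// Only the "model": "small" label is attested in the community reports;
// modelProvider and settings keys are assumptions to verify against the
// shipped character-file schema.
const character = {
  name: "BudgetAgent",
  modelProvider: "deepseek", // assumed provider key for the R1 integration
  model: "small",            // must actually be honored (see Topic 1, Question 1)
  settings: {
    reasoningModel: "deepseek-r1", // assumed opt-in key for reasoning-heavy tasks
  },
};

console.log(JSON.stringify(character, null, 2));
```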

**Question 2:** How do we reconcile “Open & Composable” with an exploding matrix of providers (OpenAI/Anthropic/DeepSeek/NVIDIA NIM/etc.) without sacrificing reliability?

  **Context:**
  - `GitHub Daily Update (2025-01-21): Added NVIDIA NIM plugin (#2599) and multiple provider-related improvements.`
  - `Discord (2025-01-20): Users report provider-specific failures (e.g., Anthropic issues in Discord; switching to OpenAI resolved an error).`

  **Multiple Choice Answers:**
    a) Define a provider compatibility contract (streaming, tools, vision, embeddings) and certify providers against it.
        *Implication:* Creates a reliable composability baseline and supports future certification programs (a contract sketch follows this list).
    b) Limit official support to a small set of “Council-approved” providers; others remain community-supported.
        *Implication:* Reduces QA load, but constrains openness and may slow ecosystem growth.
    c) Embrace full provider plurality; invest in runtime adapters and robust fallback logic to smooth differences.
        *Implication:* Most aligned with openness, but demands significant engineering investment in abstraction and testing.
    d) Other / More discussion needed / None of the above.
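
Option (a) only works if "compatibility" is testable. A sketch of a capability matrix and certification check covering exactly the four capabilities named above; the types and function are hypothetical illustrations, not an existing ElizaOS API.

```typescript
// Hypothetical provider compatibility contract covering the four capabilities
// named in option (a). Types and function names are illustrative.

interface ProviderCapabilities {
  streaming: boolean;  // token streaming for responsive UX
  tools: boolean;      // function/tool calling
  vision: boolean;     // image inputs
  embeddings: boolean; // embedding generation for memory/RAG
}

// A certification run executes one conformance test per claimed capability
// and refuses to certify on any mismatch between claimed and observed.
function certify(
  provider: string,
  claimed: ProviderCapabilities,
  observed: ProviderCapabilities
): { provider: string; certified: boolean; mismatches: string[] } {
  const mismatches = (Object.keys(claimed) as (keyof ProviderCapabilities)[])
    .filter((cap) => claimed[cap] !== observed[cap])
    .map(String);
  return { provider, certified: mismatches.length === 0, mismatches };
}

// Example: a provider claiming vision support that fails the vision probe is
// not certified, and the gap is surfaced explicitly rather than by users.
console.log(
  certify(
    "example-provider",
    { streaming: true, tools: true, vision: true, embeddings: true },
    { streaming: true, tools: true, vision: false, embeddings: true }
  )
); // -> certified: false, mismatches: ["vision"]
```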

**Question 3:** What is our canonical performance target: lower cost per agent, lower latency, or higher autonomy (memory/RAG/tooling), given current community pain points?

  **Context:**
  - `Discord (2025-01-20, coders): Need for better memory management so agents persist data between messages.`
  - `Discord (2025-01-20): Model selection confusion causing unintended use of large models (cost/latency risk).`

  **Multiple Choice Answers:**
    a) Prioritize cost control (correct model selection + cheaper reasoning providers) to maximize adoption.
        *Implication:* Boosts builder experimentation and Cloud unit economics, but may leave autonomy gaps unresolved.
    b) Prioritize autonomy (memory/RAG correctness and persistence) even if cost/latency stays higher short-term.
        *Implication:* Improves flagship-agent credibility and “agents that work,” but may reduce casual developer adoption.
    c) Prioritize latency/UX (streaming, responsiveness, client stability) to make agents feel alive across platforms.
        *Implication:* Strengthens perceived quality and retention, but without autonomy gains agents may remain shallow.
    d) Other / More discussion needed / None of the above.