{
  "interval": {
    "intervalStart": "2025-07-25T00:00:00.000Z",
    "intervalEnd": "2025-07-26T00:00:00.000Z",
    "intervalType": "day"
  },
  "repository": "elizaos/eliza",
  "overview": "From 2025-07-25 to 2025-07-26, elizaos/eliza had 0 new PRs (0 merged), 1 new issues, and 2 active contributors.",
  "topIssues": [
    {
      "id": "I_kwDOMT5cIs7AVn-q",
      "title": "Ticket Spec: `feat(scenarios): Implement final judgment and user-facing reports`",
      "author": "linear",
      "number": 5579,
      "repository": "elizaos/eliza",
      "body": "**Issue Title**: `feat(scenarios): Implement final judgment and user-facing reports`\n\n**Tags**: `cli`, `scenarios`, `feature`, `reporting`\n\n#### **Description**\n\nThis is the final ticket to complete the core functionality of the Scenario Runner. Its purpose is to transition from printing raw development objects to producing a clean, polished, human-readable report for the user.\n\nThis involves two key pieces of work:\n\n1. **Judgment Logic**: Implementing the logic defined in the `judgment` block of the scenario file. This will aggregate the individual results from the Evaluation Engine into a single, final `PASS` or `FAIL` outcome for the entire scenario.\n2. **Reporting**: Creating a dedicated `Reporter` class to format and display all information about the run—from setup to execution to final results—in a structured and aesthetically pleasing way in the console.\n\nThe outcome of this ticket will be a production-ready user experience for the `elizaos scenario run` command.\n\n#### **Acceptance Criteria**\n\n1. A new `Reporter` class is created in `packages/cli/src/scenarios/Reporter.ts`.\n2. The `handleRunScenario` function is refactored to use the `Reporter` for all console output, removing all intermediate `console.log` statements.\n3. The logic for the `judgment.strategy` is implemented. For `all_pass`, the scenario only passes if every single evaluation result is `success: true`.\n4. A final `SCENARIO STATUS` of `✅ PASS` or `❌ FAIL` is calculated and displayed at the end of the report.\n5. The CLI process exits with code `0` if the final scenario status is `PASS`.\n6. The CLI process exits with code `1` if the final scenario status is `FAIL`.\n7. The final console output is well-structured, using headers, indentation, and status icons (e.g., ✅, ❌, ℹ️) for clarity.\n\n#### **Technical Approach**\n\n**1. Create the** `Reporter` Class\n\nThis class will manage all `stdout`. 
Using a dedicated class makes it easy to change the reporting format in the future without altering the core runner logic.\n\n* **Create file**: `packages/cli/src/scenarios/Reporter.ts`\n\n```typescript\n// packages/cli/src/scenarios/Reporter.ts\nimport { Scenario } from './schema';\nimport { ExecutionResult } from './providers';\nimport { EvaluationResult } from './EvaluationEngine';\nimport chalk from 'chalk'; // Add chalk for colored output: bun add chalk\nexport class Reporter {\n  \n  public reportStart(scenario: Scenario) {\n    console.log(chalk.bold.cyan(`\\n▶️ RUNNING SCENARIO: ${scenario.name}`));\n    console.log(chalk.gray(`  ${scenario.description}\\n`));\n  }\n  public reportExecutionResult(result: ExecutionResult) {\n    console.log(chalk.bold('--- Execution Output ---'));\n    if (result.stdout) {\n      console.log(chalk.green('STDOUT:'));\n      console.log(result.stdout.trim().split('\\n').map(l => `  | ${l}`).join('\\n'));\n    }\n    if (result.stderr) {\n      console.log(chalk.yellow('STDERR:'));\n      console.log(result.stderr.trim().split('\\n').map(l => `  | ${l}`).join('\\n'));\n    }\n    console.log(chalk.bold('------------------------\\n'));\n  }\n  public reportEvaluationResults(results: EvaluationResult[]) {\n    console.log(chalk.bold('--- Evaluation Results ---'));\n    if (results.length === 0) {\n      console.log(chalk.gray('  No evaluations were run.'));\n    }\n    results.forEach(res => {\n      const status = res.success \n        ? chalk.green('✅ PASS') \n        : chalk.red('❌ FAIL');\n      console.log(`${status}: ${res.message}`);\n    });\n    console.log(chalk.bold('------------------------\\n'));\n  }\n  public reportFinalResult(finalSuccess: boolean) {\n    const finalStatus = finalSuccess\n      ? chalk.bold.green('✅ PASS')\n      : chalk.bold.red('❌ FAIL');\n    \n    console.log(chalk.bold.cyan(`SCENARIO STATUS: ${finalStatus}`));\n  }\n}\n```\n\n**2. 
Integrate Judgment and Reporting into the CLI Command**\n\nRefactor the main handler to use the `Reporter` and implement the final judgment logic.\n\n* **Modify file**: `packages/cli/src/commands/scenario.ts`\n\n```typescript\n// packages/cli/src/commands/scenario.ts\n// ... imports\nimport chalk from 'chalk'; // Needed for the fatal-error message in the catch block\nimport { Reporter } from '../scenarios/Reporter';\n// ...\nasync function handleRunScenario(args: ScenarioRunArgs, runtime: AgentRuntime) {\n  // --- SETUP ---\n  const reporter = new Reporter();\n  let finalStatus = false; // Default to fail\n  \n  // ... (file reading and validation) ...\n  const scenario: Scenario = validationResult.data;\n  reporter.reportStart(scenario);\n  const provider = /* ... provider init ... */;\n  const mockEngine = /* ... mock engine init ... */;\n  const evaluationEngine = /* ... evaluation engine init ... */;\n  try {\n    // --- MOCKING & SETUP ---\n    mockEngine.applyMocks(scenario.setup?.mocks);\n    await provider.setup(scenario);\n    // --- EXECUTION ---\n    const execResult = await provider.run(scenario);\n    reporter.reportExecutionResult(execResult);\n    // --- EVALUATION ---\n    const evalResults = await evaluationEngine.runEvaluations(scenario.run[0].evaluations, execResult);\n    reporter.reportEvaluationResults(evalResults);\n    // --- JUDGMENT ---\n    if (scenario.judgment.strategy === 'all_pass') {\n      finalStatus = evalResults.every(res => res.success);\n    } else {\n      // Future strategies like 'any_pass' would go here\n      finalStatus = false;\n    }\n  } catch(error: any) {\n    console.error(chalk.red(`\\nFATAL ERROR: An unrecoverable error occurred during the scenario run.`), error);\n    finalStatus = false;\n  } finally {\n    // --- CLEANUP & FINAL REPORT ---\n    mockEngine.revertMocks();\n    await provider.teardown();\n    reporter.reportFinalResult(finalStatus);\n    \n    // Set exit code for CI/CD environments\n    process.exit(finalStatus ? 
0 : 1);\n  }\n}\n```\n\n#### **Testing Strategy**\n\nThis ticket primarily involves verifying the console output and process exit code.\n\n1. **Use the** `evaluation-test.scenario.yaml` from the previous ticket. This scenario is designed to have a mix of passing and failing evaluations.\n   * **Run**: `packages/cli/bin/elizaos.js scenario run ./evaluation-test.scenario.yaml`\n   * **Verify**:\n     * The console output should be cleanly formatted according to the `Reporter`'s design.\n     * The \"Evaluation Results\" section should show a mix of `✅ PASS` and `❌ FAIL` messages.\n     * The final `SCENARIO STATUS` must be `❌ FAIL`.\n     * Check the exit code: `echo $?` should return `1`.\n2. **Create a new** `fully-passing.scenario.yaml` where all evaluations are guaranteed to succeed.\n\n   ```yaml\n   name: \"Test Fully Passing Scenario\"\n   environment: { type: 'local' }\n   run:\n     - input: \"echo 'Success'\"\n       evaluations:\n         - type: \"string_contains\"\n           value: \"Success\"\n   judgment:\n     strategy: all_pass\n   ```\n   * **Run**: `packages/cli/bin/elizaos.js scenario run ./fully-passing.scenario.yaml`\n   * **Verify**:\n     * The final `SCENARIO STATUS` must be `✅ PASS`.\n     * Check the exit code: `echo $?` should return `0`.",
      "createdAt": "2025-07-13T22:43:01Z",
      "closedAt": "2025-07-25T19:21:51Z",
      "state": "CLOSED",
      "commentCount": 0
    },
    {
      "id": "I_kwDOMT5cIs7AVnt4",
      "title": "Ticket Spec: `feat(scenarios): Implement evaluation engine and basic evaluators`",
      "author": "linear",
      "number": 5578,
      "repository": "elizaos/eliza",
      "body": "**Issue Title**: `feat(scenarios): Implement evaluation engine and basic evaluators`\n\n**Tags**: `cli`, `scenarios`, `feature`, `evaluation`\n\n#### **Description**\n\nThis ticket implements the core validation logic of the Scenario Runner: the **Evaluation Engine**. After a scenario's `run` block completes, this engine is responsible for executing a series of assertions (`evaluations`) to determine if the agent's actions and response meet the success criteria.\n\nThe work involves creating a generic `EvaluationEngine` that can manage and execute different types of evaluators. We will also implement the initial set of essential evaluators, covering checks against the agent's text response, the state of the execution environment's file system, the agent's internal action history, and even using an LLM as a judge for more nuanced assessments.\n\n#### **Acceptance Criteria**\n\n1. A new `EvaluationEngine` class and a generic `Evaluator` interface are created in `packages/cli/src/scenarios/EvaluationEngine.ts`.\n2. The `EvaluationEngine` maintains a registry of all available evaluator types.\n3. The following evaluators are implemented and registered with the engine:\n   * `string_contains`\n   * `regex_match`\n   * `file_exists`\n   * `trajectory_contains_action`\n   * `llm_judge`\n4. The main `handleRunScenario` function uses the `EvaluationEngine` to run all evaluations defined in the scenario after the `run` block completes.\n5. The results of each evaluation (pass/fail status and a descriptive message) are collected and printed clearly to the console.\n6. **Dependency**: The `ExecutionResult` object returned by `EnvironmentProvider` must be enhanced to include a representation of the final file system state, which is required for the `file_exists` evaluator.\n7. 
**Dependency**: The `AgentRuntime` must provide a method to access the agent's action history (e.g., `runtime.getActionHistory()`), which is required for the `trajectory_contains_action` evaluator.\n\n#### **Technical Approach**\n\n**1. Enhance** `EnvironmentProvider` Contract\n\nThe `file_exists` evaluator needs to know about files in the environment *after* the run. We must update the provider interface to support this.\n\n* **Modify file**: `packages/cli/src/scenarios/providers.ts`\n\n```typescript\n// packages/cli/src/scenarios/providers.ts\nexport interface ExecutionResult {\n  exitCode: number;\n  stdout: string;\n  stderr: string;\n  // NEW: A map of all file paths to their string content within the execution environment.\n  files: Record<string, string>; \n}\n```\n\n*Note: The* `LocalEnvironmentProvider` and `E2BEnvironmentProvider` tickets must be updated to implement this contract, recursively reading all files from their respective environments after execution.\n\n**2. Define Evaluator Interfaces and Engine**\n\nCreate the core engine and the contracts that all evaluators must follow.\n\n* **Create file**: `packages/cli/src/scenarios/EvaluationEngine.ts`\n\n```typescript\n// packages/cli/src/scenarios/EvaluationEngine.ts\nimport { AgentRuntime } from '@elizaos/core';\nimport { ExecutionResult } from './providers';\nimport { Evaluation as EvaluationSchema } from './schema'; // Zod schema type\nexport interface EvaluationResult {\n  success: boolean;\n  message: string;\n}\nexport interface Evaluator {\n  evaluate(\n    params: EvaluationSchema,\n    runResult: ExecutionResult,\n    runtime: AgentRuntime\n  ): Promise<EvaluationResult>;\n}\nexport class EvaluationEngine {\n  private evaluators = new Map<string, Evaluator>();\n  constructor(private runtime: AgentRuntime) {\n    // Register all known evaluators\n    this.register('string_contains', new StringContainsEvaluator());\n    this.register('regex_match', new RegexMatchEvaluator());\n    
this.register('file_exists', new FileExistsEvaluator());\n    this.register('trajectory_contains_action', new TrajectoryContainsActionEvaluator());\n    this.register('llm_judge', new LLMJudgeEvaluator(this.runtime));\n  }\n  \n  private register(type: string, evaluator: Evaluator) {\n    this.evaluators.set(type, evaluator);\n  }\n  public async runEvaluations(\n    evaluations: EvaluationSchema[],\n    runResult: ExecutionResult\n  ): Promise<EvaluationResult[]> {\n    const results: EvaluationResult[] = [];\n    for (const evaluation of evaluations) {\n      const evaluator = this.evaluators.get(evaluation.type);\n      if (!evaluator) {\n        results.push({ success: false, message: `Unknown evaluator type: '${evaluation.type}'` });\n        continue;\n      }\n      const result = await evaluator.evaluate(evaluation, runResult, this.runtime);\n      results.push(result);\n    }\n    return results;\n  }\n}\n// --- IMPLEMENTATIONS ---\n// In the same file for simplicity, or can be broken out.\nclass StringContainsEvaluator implements Evaluator {\n  async evaluate(params, runResult): Promise<EvaluationResult> {\n    if (params.type !== 'string_contains') throw new Error('Mismatched evaluator');\n    const success = runResult.stdout.includes(params.value);\n    return {\n      success,\n      message: `Checked if stdout contains \"${params.value}\". Result: ${success}`,\n    };\n  }\n}\nclass RegexMatchEvaluator implements Evaluator {\n  async evaluate(params, runResult): Promise<EvaluationResult> {\n    if (params.type !== 'regex_match') throw new Error('Mismatched evaluator');\n    const success = new RegExp(params.pattern).test(runResult.stdout);\n    return {\n      success,\n      message: `Checked if stdout matches regex \"${params.pattern}\". 
Result: ${success}`,\n    };\n  }\n}\nclass FileExistsEvaluator implements Evaluator {\n  async evaluate(params, runResult): Promise<EvaluationResult> {\n    if (params.type !== 'file_exists') throw new Error('Mismatched evaluator');\n    // Assumes 'file_exists' schema has a 'path' property\n    const success = Object.keys(runResult.files).includes(params.path);\n    return {\n      success,\n      message: `Checked if file \"${params.path}\" exists. Result: ${success}`,\n    };\n  }\n}\nclass TrajectoryContainsActionEvaluator implements Evaluator {\n  async evaluate(params, runResult, runtime): Promise<EvaluationResult> {\n    if (params.type !== 'trajectory_contains_action') throw new Error('Mismatched evaluator');\n    // Assumes runtime has a method to get action history\n    const history = await runtime.getActionHistory(); // This method needs to be implemented on AgentRuntime\n    const success = history.some(event => event.action === params.action);\n    return {\n      success,\n      message: `Checked if action trajectory contains \"${params.action}\". Result: ${success}`,\n    };\n  }\n}\nclass LLMJudgeEvaluator implements Evaluator {\n  constructor(private runtime: AgentRuntime) {}\n  async evaluate(params, runResult): Promise<EvaluationResult> {\n    if (params.type !== 'llm_judge') throw new Error('Mismatched evaluator');\n    const fullPrompt = `${params.prompt}\\n\\nAgent Response:\\n${runResult.stdout}\\n\\nWas the expectation of \"${params.expected}\" met? Respond with only \"yes\" or \"no\".`;\n    const response = await this.runtime.llm.prompt(fullPrompt); // Assumes runtime.llm.prompt exists\n    const success = response.toLowerCase().includes(params.expected.toLowerCase());\n    return {\n      success,\n      message: `LLM Judge assessment for prompt \"${params.prompt}\". 
Result: ${success}`,\n    };\n  }\n}\n```\n\n*Note: The schema file (*`schema.ts`) will also need to be updated to include definitions for `file_exists` and its `path` parameter.\n\n**3. Integrate into CLI Command Handler**\n\n* **Modify file**: `packages/cli/src/commands/scenario.ts`\n\n```typescript\n// packages/cli/src/commands/scenario.ts\n// ... imports\nimport { EvaluationEngine } from '../scenarios/EvaluationEngine';\n// ...\nasync function handleRunScenario(args: ScenarioRunArgs, runtime: AgentRuntime) {\n  // ... (provider and mock engine setup) ...\n  \n  const evaluationEngine = new EvaluationEngine(runtime);\n  try {\n    // ... (setup and run logic)\n    const result = await provider.run(scenario);\n    console.log('--- Execution Result ---');\n    console.log(JSON.stringify(result, null, 2));\n    // --- NEW EVALUATION LOGIC ---\n    elizaLogger.info('Running evaluations...');\n    const evaluationResults = await evaluationEngine.runEvaluations(scenario.run[0].evaluations, result);\n    console.log('--- Evaluation Results ---');\n    evaluationResults.forEach(res => {\n        const status = res.success ? '✅ PASS' : '❌ FAIL';\n        console.log(`${status}: ${res.message}`);\n    });\n    console.log('--------------------------');\n  } catch(error) {\n    // ...\n  } finally {\n    // ...\n  }\n}\n```\n\n#### **Testing Strategy**\n\n1. **Create a comprehensive test scenario (**`evaluation-test.scenario.yaml`): This file should be designed to test all new evaluators.\n\n   ```yaml\n   name: \"Test Evaluation Engine\"\n   environment: { type: 'local' }\n   setup:\n     virtual_fs:\n       \"start.txt\": \"This file already exists.\"\n   run:\n     - input: \"echo 'Agent response.' 
&& echo 'extra line' > result.txt\"\n       evaluations:\n         # Test Pass\n         - type: \"string_contains\"\n           value: \"Agent response\"\n         # Test Fail\n         - type: \"regex_match\"\n           pattern: \"^Agent response.$\" # Expected to fail: echo's trailing newline stops $ from matching (no m flag)\n         # Test Pass\n         - type: \"file_exists\"\n           path: \"result.txt\"\n         # Test Fail\n         - type: \"file_exists\"\n           path: \"nonexistent.txt\"\n         # Test Pass (requires mock for runtime.getActionHistory)\n         - type: \"trajectory_contains_action\"\n           action: \"run_shell_command\"\n         # Test Pass (requires mock for runtime.llm)\n         - type: \"llm_judge\"\n           prompt: \"Did the agent respond?\"\n           expected: \"yes\"\n   judgment:\n     strategy: all_pass\n   ```\n2. **Mock Runtime Dependencies**: In the test runner for this specific scenario, manually mock `runtime.getActionHistory()` and `runtime.llm.prompt()` to return expected values so the evaluators can be tested in isolation.\n3. **Run the test** and verify the console output. The `Evaluation Results` block should show a clear `PASS` or `FAIL` status for each of the six evaluations, matching the expected outcomes defined in the test case.",
      "createdAt": "2025-07-13T22:42:04Z",
      "closedAt": "2025-07-25T19:21:49Z",
      "state": "CLOSED",
      "commentCount": 0
    },
    {
      "id": "I_kwDOMT5cIs7AVnVW",
      "title": "Ticket Spec: `feat(scenarios): Implement mock engine for service calls`",
      "author": "linear",
      "number": 5577,
      "repository": "elizaos/eliza",
      "body": "**Issue Title**: `feat(scenarios): Implement mock engine for service calls`\n\n**Tags**: `cli`, `scenarios`, `feature`, `mocking`, `services`\n\n#### **Description**\n\nThis ticket introduces one of the most critical features of the Scenario Runner: the ability to mock service method calls. This allows us to isolate the agent's reasoning and planning capabilities from the external services it depends on. By replacing live API calls with deterministic, predefined responses, we can create fast, reliable, and repeatable tests that can run in any environment without side effects.\n\nThe core task is to develop a `MockEngine` that intercepts method calls on services requested via `runtime.getService()`. It will check if a mock is defined in the scenario's `setup.mocks` array for that specific call. If a match is found, it returns the mocked response; otherwise, it allows the original method to execute. This engine must also support conditional mocking based on the arguments passed to the method.\n\n#### **Acceptance Criteria**\n\n1. A new `MockEngine` class is created in `packages/cli/src/scenarios/MockEngine.ts`.\n2. The `handleRunScenario` function uses the `MockEngine` to apply mocks from the `setup.mocks` array before the `run` block is executed.\n3. When a scenario with a simple mock is run, calls to the specified service and method are intercepted, and the predefined `response` is returned to the agent's code. The original service method is **not** called.\n4. When a scenario uses a mock with a `when` clause, the mock is only applied if the arguments passed to the method deep-equal the arguments specified in the `when` clause.\n5. If a call is made that does *not* match a `when` clause, the call proceeds to the original, un-mocked service method.\n6. 
All mocks are cleanly reverted and removed after the scenario run completes (either successfully or in failure), ensuring no state leaks between tests.\n\n#### **Technical Approach**\n\nThe most robust and least intrusive way to implement this is by decorating the `AgentRuntime` instance at runtime. We will use a `Proxy`-based approach to wrap services as they are requested, which allows for transparent interception of method calls.\n\n**1. Create the** `MockEngine`\n\nThis class will encapsulate all mocking logic. We'll use the `lodash` library for reliable deep equality checks.\n\n* **Add dependency**: `bun add lodash @types/lodash`\n* **Create file**: `packages/cli/src/scenarios/MockEngine.ts`\n\n```typescript\n// packages/cli/src/scenarios/MockEngine.ts\nimport { AgentRuntime } from '@elizaos/core';\nimport { Scenario } from './schema';\nimport _ from 'lodash';\ntype MockDefinition = NonNullable<NonNullable<Scenario['setup']>['mocks']>[0];\nexport class MockEngine {\n  private originalGetService: AgentRuntime['getService'];\n  constructor(private runtime: AgentRuntime) {\n    this.originalGetService = this.runtime.getService.bind(this.runtime);\n  }\n  public applyMocks(mocks: MockDefinition[] = []) {\n    if (mocks.length === 0) return;\n    // A map to store mocks for efficient lookup. Key: \"serviceName.methodName\"\n    const mockRegistry = new Map<string, MockDefinition[]>();\n    for (const mock of mocks) {\n      const key = `${mock.service}.${mock.method}`;\n      if (!mockRegistry.has(key)) {\n        mockRegistry.set(key, []);\n      }\n      mockRegistry.get(key)!.push(mock);\n    }\n    // Replace the original getService with our mocked version\n    this.runtime.getService = <T>(name: string): T => {\n      const originalService = this.originalGetService<T>(name);\n      \n      // Return a proxy for the service. 
This proxy intercepts all method calls.\n      return new Proxy(originalService as any, {\n        get: (target, prop: string | symbol, receiver) => {\n          // Symbol-keyed lookups (e.g. from inspection) can never be mocked; pass them through.\n          if (typeof prop !== 'string') {\n            return Reflect.get(target, prop, receiver);\n          }\n          const key = `${name}.${prop}`;\n          \n          if (!mockRegistry.has(key)) {\n            // No mock for this method, return the original.\n            return Reflect.get(target, prop, receiver);\n          }\n          // Return a new function that will perform the mock logic\n          return (...args: any[]) => {\n            const potentialMocks = mockRegistry.get(key)!;\n            \n            // Find a conditional mock that matches the arguments\n            const conditionalMock = potentialMocks.find(m => \n              m.when && _.isEqual(args, m.when.args) // Assumes 'when' has an 'args' array\n            );\n            if (conditionalMock) {\n              return Promise.resolve(conditionalMock.response);\n            }\n            // Find a generic (non-conditional) mock\n            const genericMock = potentialMocks.find(m => !m.when);\n            if (genericMock) {\n              return Promise.resolve(genericMock.response);\n            }\n            // No matching mock found: call the original method, bound to the real service so `this` works\n            return Reflect.get(target, prop, receiver).apply(target, args);\n          };\n        },\n      }) as T;\n    };\n  }\n  public revertMocks() {\n    // Restore the original getService method to clean up.\n    this.runtime.getService = this.originalGetService;\n  }\n}\n```\n\n**2. Update the** `when` clause in the Schema\n\nThe schema needs to be slightly more specific about the shape of the `when` clause.\n\n* **Modify file**: `packages/cli/src/scenarios/schema.ts`\n\n```typescript\n// packages/cli/src/scenarios/schema.ts\n// ... (imports)\nconst MockSchema = z.object({\n  service: z.string(),\n  method: z.string(),\n  // The 'when' clause specifically checks arguments now.\n  when: z.object({\n    args: z.array(z.any())\n  }).optional(),\n  response: z.any(),\n});\n// ... 
(rest of the file)\n```\n\n**3. Integrate into the CLI Command Handler**\n\nWire the `MockEngine` into the main execution flow.\n\n* **Modify file**: `packages/cli/src/commands/scenario.ts`\n\n```typescript\n// packages/cli/src/commands/scenario.ts\n// ... imports\nimport { MockEngine } from '../scenarios/MockEngine';\n// ...\nasync function handleRunScenario(args: ScenarioRunArgs, runtime: AgentRuntime) {\n  // ... (file reading and validation logic)\n  const scenario: Scenario = validationResult.data;\n  \n  const provider = /* ... provider initialization ... */;\n  const mockEngine = new MockEngine(runtime); // Create instance\n  try {\n    elizaLogger.info('Applying mocks...');\n    mockEngine.applyMocks(scenario.setup?.mocks); // Apply mocks\n    elizaLogger.info(`Setting up '${scenario.environment.type}' environment...`);\n    await provider.setup(scenario);\n    elizaLogger.info('Executing run block...');\n    const result = await provider.run(scenario);\n    // ... print execution result ...\n  } catch(error) {\n    elizaLogger.error('An error occurred during scenario execution:', error);\n  } finally {\n    elizaLogger.info('Reverting mocks and tearing down environment...');\n    mockEngine.revertMocks(); // Crucial cleanup step\n    await provider.teardown();\n  }\n}\n```\n\n#### **Testing Strategy**\n\n1. **Assume an Action**: For testing, assume an action exists like `ask_github(repo: string)` which internally calls `githubService.searchIssues(repo)`. The `run.input` will be a prompt that triggers this action.\n2. 
**Create Test Scenario (**`mock-test.scenario.yaml`):\n\n   ```yaml\n   name: \"Test Mocking Engine\"\n   environment: { type: 'local' } # or e2b\n   plugins: [ '@elizaos/plugin-github' ] # Example plugin\n   setup:\n     mocks:\n       # A conditional mock\n       - service: \"github-service\"\n         method: \"searchIssues\"\n         when:\n           args: [ \"elizaos/eliza\" ]\n         response:\n           - { title: \"Mocked Issue for Eliza\" }\n       # A generic mock\n       - service: \"github-service\"\n         method: \"searchIssues\"\n         response:\n           - { title: \"Generic Mocked Issue\" }\n   run:\n     - input: \"Tell me about issues in the elizaos/eliza repo\"\n     # In a later ticket, evaluations will check the agent's response.\n     # For now, we check console logs.\n     - input: \"Tell me about issues in some other repo\"\n   #...\n   ```\n3. **Run and Verify**:\n   * Execute the scenario. The `run` block should effectively call `githubService.searchIssues` twice.\n   * **First call**: The log of the agent's response should show `\"Mocked Issue for Eliza\"`.\n   * **Second call**: The log should show `\"Generic Mocked Issue\"` because the arguments don't match the `when` clause.\n   * Add a `console.log` inside the *real* `searchIssues` method in the `plugin-github` source to confirm it is never actually executed during this test run.",
      "createdAt": "2025-07-13T22:40:50Z",
      "closedAt": "2025-07-25T19:21:48Z",
      "state": "CLOSED",
      "commentCount": 0
    },
    {
      "id": "I_kwDOMT5cIs7AVm9B",
      "title": "Ticket Spec: `feat(scenarios): Integrate with @elizaos/plugin-e2b for sandboxed execution`",
      "author": "linear",
      "number": 5576,
      "repository": "elizaos/eliza",
      "body": "**Issue Title**: `feat(scenarios): Integrate with @elizaos/plugin-e2b for sandboxed execution`\n\n**Tags**: `cli`, `scenarios`, `feature`, `environment-provider`, `e2b`\n\n#### **Description**\n\nThis ticket integrates our most critical execution target: the E2B secure sandbox. The work involves creating an `E2BEnvironmentProvider` that acts as a client to the existing `@elizaos/plugin-e2b`.\n\nUnlike the `LocalEnvironmentProvider`, this provider will not implement sandbox logic itself. Instead, it will delegate all sandbox operations (creation, file writing, command execution, and teardown) to the `e2b` service available within the `AgentRuntime`. This ensures a clean separation of concerns, where the Scenario Runner is simply a *client* of the sandbox infrastructure provided by the plugin.\n\nA key part of this task is updating the command handler to be \"runtime-aware,\" allowing it to access the necessary services to instantiate the correct provider.\n\n#### **Acceptance Criteria**\n\n1. A new `packages/cli/src/scenarios/E2BEnvironmentProvider.ts` file is created which implements the `EnvironmentProvider` interface.\n2. The main `handleRunScenario` function is refactored to accept the `AgentRuntime` instance as an argument.\n3. When a scenario specifies `environment: { type: 'e2b' }`, the runner attempts to fetch the `e2b` service from the runtime.\n4. If the `e2b` service is not found, the runner exits with a clear error message instructing the user to install the plugin and configure their API key.\n5. The `E2BEnvironmentProvider` uses the `e2b` service to create a sandbox during `setup`.\n6. The provider correctly implements `setup.virtual_fs` by calling `e2bService.writeFileToSandbox()` for each specified file.\n7. The provider executes the `run.input` command in the sandbox and correctly maps the result (`stdout`, `stderr`, `exitCode`) to the `ExecutionResult` object.\n8. 
The provider properly terminates the sandbox during `teardown` by calling `e2bService.killSandbox()`.\n9. The implementation respects the `E2B_API_KEY` and `E2B_MODE` environment variables as documented by the `plugin-e2b`.\n\n#### **Technical Approach**\n\n**1. Define the** `E2BService` Interface\n\nBased on the `plugin-e2b` documentation, we can define a type for the service we expect to receive from the runtime. This helps with type safety.\n\n* **Create/Modify file**: `packages/cli/src/scenarios/providers.ts`\n\n```typescript\n// packages/cli/src/scenarios/providers.ts\n// ... (existing interfaces)\n/**\n * Defines the shape of the E2B service we expect from the runtime.\n * Based on the @elizaos/plugin-e2b documentation.\n */\nexport interface E2BService {\n  createSandbox(config: { template?: string; timeoutMs?: number }): Promise<string>; // Returns sandboxId\n  runCommand(sandboxId: string, command: string, rootdir?: string): Promise<{ stdout: string; stderr: string; exitCode: number }>;\n  writeFileToSandbox(sandboxId: string, path: string, content: string): Promise<void>;\n  killSandbox(sandboxId: string): Promise<void>;\n}\n```\n\n**2. 
Implement the** `E2BEnvironmentProvider`\n\nThis class will contain all the logic for interacting with the E2B plugin.\n\n* **Create file**: `packages/cli/src/scenarios/E2BEnvironmentProvider.ts`\n\n```typescript\n// packages/cli/src/scenarios/E2BEnvironmentProvider.ts\nimport { EnvironmentProvider, ExecutionResult, E2BService } from './providers';\nimport { Scenario } from '../scenarios/schema';\nimport { AgentRuntime } from '@elizaos/core';\nexport class E2BEnvironmentProvider implements EnvironmentProvider {\n  private runtime: AgentRuntime;\n  private e2bService: E2BService;\n  private sandboxId: string | null = null;\n  constructor(runtime: AgentRuntime) {\n    this.runtime = runtime;\n    // Attempt to get the service immediately to fail fast.\n    const service = this.runtime.getService<E2BService>('e2b');\n    if (!service) {\n      throw new Error(\n        \"E2BEnvironmentProvider required, but 'e2b' service was not found in runtime. \" +\n        \"Please ensure @elizaos/plugin-e2b is installed and configured.\"\n      );\n    }\n    this.e2bService = service;\n  }\n  async setup(scenario: Scenario): Promise<void> {\n    this.sandboxId = await this.e2bService.createSandbox({});\n    \n    const virtualFs = scenario.setup?.virtual_fs;\n    if (this.sandboxId && virtualFs) {\n      for (const [filePath, content] of Object.entries(virtualFs)) {\n        await this.e2bService.writeFileToSandbox(this.sandboxId, filePath, content);\n      }\n    }\n  }\n  async run(scenario: Scenario): Promise<ExecutionResult> {\n    if (!this.sandboxId) {\n      throw new Error('E2B sandbox not initialized. Call setup() first.');\n    }\n    const command = scenario.run[0].input;\n    return await this.e2bService.runCommand(this.sandboxId, command);\n  }\n  async teardown(): Promise<void> {\n    if (this.sandboxId) {\n      await this.e2bService.killSandbox(this.sandboxId);\n      this.sandboxId = null;\n    }\n  }\n}\n```\n\n**3. 
Refactor CLI Command Handler for Runtime Awareness**\n\nThe command handler must now be able to access the `AgentRuntime`. This involves a structural change to how the command is defined in `yargs`.\n\n* **Modify file**: `packages/cli/src/commands/scenario.ts`\n\n```typescript\n// packages/cli/src/commands/scenario.ts\n// ... imports\nimport { AgentRuntime, createAgentRuntime } from '@elizaos/core';\nimport { LocalEnvironmentProvider } from '../scenarios/LocalEnvironmentProvider';\nimport { E2BEnvironmentProvider } from '../scenarios/E2BEnvironmentProvider';\nimport { EnvironmentProvider } from '../scenarios/providers';\n// ... (command, desc, and ScenarioRunArgs interface)\n// The builder function now needs to construct the runtime.\nexport const builder = (yargs: Argv): Argv => {\n  return yargs.command(\n    'run <filePath>',\n    'Execute a scenario from a YAML file.',\n    // ... (yargs options setup is the same)\n    async (argv: ScenarioRunArgs) => {\n      // Create the runtime here. This assumes a default or config-based setup.\n      // This part might need to be adjusted based on how the main 'start' command builds its runtime.\n      const runtime = await createAgentRuntime({ /* ... config ... */ });\n      await handleRunScenario(argv, runtime);\n    }\n  );\n};\n// The handler now accepts the runtime.\nasync function handleRunScenario(args: ScenarioRunArgs, runtime: AgentRuntime) {\n  // ... (file reading and validation logic is the same) ...\n  const scenario: Scenario = validationResult.data;\n  let provider: EnvironmentProvider;\n  try {\n    if (scenario.environment.type === 'e2b') {\n      provider = new E2BEnvironmentProvider(runtime);\n    } else {\n      provider = new LocalEnvironmentProvider();\n    }\n  } catch (error: any) {\n    elizaLogger.error(`Failed to initialize environment provider: ${error.message}`);\n    process.exit(1);\n  }\n  \n  // ... 
(the rest of the try/catch/finally block for setup/run/teardown is the same) ...\n}\n```\n\n#### **Testing Strategy**\n\n1. **Environment Setup**:\n   * Ensure you have Docker installed and running for `E2B_MODE=local`.\n   * Obtain a valid `E2B_API_KEY` from [e2b.dev](https://e2b.dev) and set it in a `.env` file for testing `E2B_MODE=production`.\n2. **Create a test scenario (**`e2b-exec.scenario.yaml`):\n\n   ```yaml\n   name: \"Test E2B File Execution\"\n   description: \"Verifies the E2BEnvironmentProvider can create a file and run a command.\"\n   environment:\n     type: 'e2b'\n   setup:\n     virtual_fs:\n       \"data.txt\": \"Hello from the sandbox!\"\n   run:\n     - input: \"cat data.txt\"\n       evaluations: [] # Not used yet\n   judgment:\n     strategy: all_pass\n   ```\n3. **Run Test Cases**:\n   * **Local Mode**: `E2B_MODE=local packages/cli/bin/elizaos.js scenario run ./e2b-exec.scenario.yaml`\n     * **Expected**: The command should succeed, and the `Execution Result` should show `{ \"exitCode\": 0, \"stdout\": \"Hello from the sandbox!\", \"stderr\": \"\" }`.\n   * **Production Mode**: `E2B_API_KEY=your_key E2B_MODE=production packages/cli/bin/elizaos.js scenario run ./e2b-exec.scenario.yaml`\n     * **Expected**: Same successful result as local mode.\n   * **Failure Case**: Modify the agent's configuration to *not* include the `@elizaos/plugin-e2b`. Run the same command again.\n     * **Expected**: The command should fail early with the \"service was not found in runtime\" error message.",
      "createdAt": "2025-07-13T22:39:39Z",
      "closedAt": "2025-07-25T19:21:46Z",
      "state": "CLOSED",
      "commentCount": 0
    },
    {
      "id": "I_kwDOMT5cIs7AVmyo",
      "title": "Ticket Spec: `feat(scenarios): Implement local environment provider`",
      "author": "linear",
      "number": 5575,
      "repository": "elizaos/eliza",
      "body": "**Issue Title**: `feat(scenarios): Implement local environment provider`\n\n**Tags**: `cli`, `scenarios`, `feature`, `environment-provider`\n\n#### **Description**\n\nThis ticket marks the transition from parsing scenarios to executing them. The primary goal is to build an extensible framework for running a scenario's `run` block within a specified environment. This will be achieved by creating an `EnvironmentProvider` interface that abstracts away the details of the execution target (e.g., local machine, E2B sandbox).\n\nThe first concrete implementation will be a `LocalEnvironmentProvider`, which executes the scenario's commands directly on the local machine using Node.js's built-in `child_process` and `fs` modules. To ensure clean, isolated runs, each scenario will be executed within a unique temporary directory that is created at the start of the run and destroyed at the end.\n\n#### **Acceptance Criteria**\n\n1. A new `packages/cli/src/scenarios/providers.ts` file is created, defining the `EnvironmentProvider` interface and related data structures (`ExecutionResult`, etc.).\n2. A new `packages/cli/src/scenarios/LocalEnvironmentProvider.ts` file is created, which implements the `EnvironmentProvider` interface.\n3. When a scenario with `environment: { type: 'local' }` is executed, the `LocalEnvironmentProvider` is instantiated and used.\n4. The provider creates a temporary directory (e.g., in `/tmp/eliza-scenario-run-XXXXXX`) for the execution.\n5. If `setup.virtual_fs` is defined, the specified files and their content are created within this temporary directory *before* the `run` block is executed.\n6. The `run.input` command is executed with the temporary directory as its working directory.\n7. The `stdout`, `stderr`, and `exitCode` from the executed command are captured in an `ExecutionResult` object.\n8. After the execution is complete, the temporary directory and all its contents are deleted.\n9. 
The main `handleRunScenario` function is updated to print the `ExecutionResult` to the console.\n\n#### **Technical Approach**\n\n**1. Define the Provider Interface**\n\nFirst, we will formally define the contracts for any environment provider.\n\n* **Create file**: `packages/cli/src/scenarios/providers.ts`\n\n```typescript\n// packages/cli/src/scenarios/providers.ts\nimport { Scenario } from '../scenarios/schema';\n/**\n * The result of executing the 'run' block in a scenario.\n */\nexport interface ExecutionResult {\n  exitCode: number;\n  stdout: string;\n  stderr: string;\n  // We may add more fields later, like file outputs.\n}\n/**\n * Defines the contract for an environment where a scenario can be executed.\n */\nexport interface EnvironmentProvider {\n  /**\n   * Prepares the environment for the run. This includes creating temporary\n   * directories and seeding the file system.\n   * @param scenario The full scenario definition object.\n   */\n  setup(scenario: Scenario): Promise<void>;\n  /**\n   * Executes the primary run command of the scenario.\n   * @param scenario The full scenario definition object.\n   */\n  run(scenario: Scenario): Promise<ExecutionResult>;\n  /**\n   * Cleans up any resources created during the setup and run phases.\n   */\n  teardown(): Promise<void>;\n}\n```\n\n**2. 
Implement the** `LocalEnvironmentProvider`\n\nNext, we'll create the concrete implementation for local execution.\n\n* **Create file**: `packages/cli/src/scenarios/LocalEnvironmentProvider.ts`\n\n```typescript\n// packages/cli/src/scenarios/LocalEnvironmentProvider.ts\nimport {\n  EnvironmentProvider,\n  ExecutionResult,\n} from './providers';\nimport { Scenario } from '../scenarios/schema';\nimport fs from 'fs/promises';\nimport path from 'path';\nimport os from 'os';\nimport { exec } from 'child_process';\nimport { promisify } from 'util';\nconst execAsync = promisify(exec);\nexport class LocalEnvironmentProvider implements EnvironmentProvider {\n  private tempDir: string | null = null;\n  async setup(scenario: Scenario): Promise<void> {\n    const tempDirPrefix = path.join(os.tmpdir(), 'eliza-scenario-run-');\n    this.tempDir = await fs.mkdtemp(tempDirPrefix);\n    const virtualFs = scenario.setup?.virtual_fs;\n    if (virtualFs) {\n      for (const [filePath, content] of Object.entries(virtualFs)) {\n        const fullPath = path.join(this.tempDir, filePath);\n        await fs.mkdir(path.dirname(fullPath), { recursive: true });\n        await fs.writeFile(fullPath, content);\n      }\n    }\n  }\n  async run(scenario: Scenario): Promise<ExecutionResult> {\n    if (!this.tempDir) {\n      throw new Error('Setup must be called before run.');\n    }\n    // For now, we only support the first run step.\n    const command = scenario.run[0].input;\n    try {\n      const { stdout, stderr } = await execAsync(command, { cwd: this.tempDir });\n      return { exitCode: 0, stdout, stderr };\n    } catch (error: any) {\n      // 'exec' throws an error for non-zero exit codes.\n      return {\n        exitCode: error.code || 1,\n        stdout: error.stdout,\n        stderr: error.stderr,\n      };\n    }\n  }\n  async teardown(): Promise<void> {\n    if (this.tempDir) {\n      await fs.rm(this.tempDir, { recursive: true, force: true });\n      this.tempDir = null;\n    
}\n  }\n}\n```\n\n**3. Integrate into the CLI Command Handler**\n\nFinally, update the main command logic to instantiate and use the provider.\n\n* **Modify file**: `packages/cli/src/commands/scenario.ts`\n\n```typescript\n// packages/cli/src/commands/scenario.ts\n// ... (imports)\nimport { LocalEnvironmentProvider } from '../scenarios/LocalEnvironmentProvider';\nimport { EnvironmentProvider } from '../scenarios/providers';\n// ... (command definition)\nasync function handleRunScenario(args: ScenarioRunArgs) {\n  // ... (file reading and validation logic from previous ticket)\n  // This function now needs to be 'async'\n  \n  const scenario: Scenario = validationResult.data;\n  // --- NEW PROVIDER LOGIC ---\n  let provider: EnvironmentProvider | null = null;\n  if (scenario.environment.type === 'local') {\n    provider = new LocalEnvironmentProvider();\n  } else {\n    elizaLogger.error(`Unsupported environment type: '${scenario.environment.type}'`);\n    process.exit(1);\n  }\n  try {\n    elizaLogger.info(`Setting up '${scenario.environment.type}' environment...`);\n    await provider.setup(scenario);\n    elizaLogger.info('Executing run block...');\n    const result = await provider.run(scenario);\n    console.log('--- Execution Result ---');\n    console.log(JSON.stringify(result, null, 2));\n    console.log('------------------------');\n  } catch(error) {\n    elizaLogger.error('An error occurred during scenario execution:', error);\n  } finally {\n    elizaLogger.info('Tearing down environment...');\n    await provider.teardown();\n  }\n}\n```\n\n#### **Testing Strategy**\n\n1. 
**Create a test scenario (**`local-exec.scenario.yaml`):\n\n   ```yaml\n   name: \"Test Local File Execution\"\n   description: \"Verifies the LocalEnvironmentProvider can create a file and execute a command.\"\n   environment:\n     type: 'local'\n   setup:\n     virtual_fs:\n       \"input/data.txt\": \"This is a test file.\"\n   run:\n     - input: \"cat input/data.txt\"\n       evaluations: [] # Not used yet\n   judgment:\n     strategy: all_pass\n   ```\n2. **Run the command:**\n   * `packages/cli/bin/elizaos.js scenario run ./local-exec.scenario.yaml`\n3. **Verify the output:**\n   * The `Execution Result` printed to the console should contain `{ \"exitCode\": 0, \"stdout\": \"This is a test file.\", \"stderr\": \"\" }`.\n   * Manually inspect the `/tmp` directory (or OS equivalent) during a paused debug session to confirm the temporary directory is created and subsequently deleted.\n4. **Create a failing test scenario (**`local-fail.scenario.yaml`):\n\n   ```yaml\n   # ... (same as above)\n   run:\n     - input: \"not-a-real-command\"\n   # ...\n   ```\n5. **Run the failing scenario** and verify that the `exitCode` in the output is non-zero and `stderr` contains a \"command not found\" message.",
      "createdAt": "2025-07-13T22:38:59Z",
      "closedAt": "2025-07-25T19:21:45Z",
      "state": "CLOSED",
      "commentCount": 0
    }
  ],
  "topPRs": [],
  "codeChanges": {
    "additions": 0,
    "deletions": 0,
    "files": 0,
    "commitCount": 0
  },
  "completedItems": [],
  "topContributors": [
    {
      "username": "tcm390",
      "avatarUrl": "https://avatars.githubusercontent.com/u/60634884?u=c6c41679b8322eaa0c81f72e0b4ed95e80f0ac16&v=4",
      "totalScore": 21.615519780756337,
      "prScore": 21.615519780756337,
      "issueScore": 0,
      "reviewScore": 0,
      "commentScore": 0,
      "summary": null
    },
    {
      "username": "borisudovicic",
      "avatarUrl": "https://avatars.githubusercontent.com/u/31806472?u=27713fbe603baae91ef519990facbacd6c23e93d&v=4",
      "totalScore": 2,
      "prScore": 0,
      "issueScore": 2,
      "reviewScore": 0,
      "commentScore": 0,
      "summary": null
    },
    {
      "username": "wtfsayo",
      "avatarUrl": "https://avatars.githubusercontent.com/u/82053242?u=98209a1f10456f42d4d2fa71db4d5bf4a672cbc3&v=4",
      "totalScore": 0.2,
      "prScore": 0,
      "issueScore": 0,
      "reviewScore": 0,
      "commentScore": 0.2,
      "summary": null
    }
  ],
  "newPRs": 0,
  "mergedPRs": 0,
  "newIssues": 1,
  "closedIssues": 8,
  "activeContributors": 2
}