openinterpreteropen-interpreter

62,257 stars5,358 forksPythonagpl-3.00 views

Open Interpreter

Open Interpreter is an autonomous agent runtime that translates natural language instructions into executable code to interact with local software and operating systems. It functions as an orchestration framework that connects language models to a secure execution environment, enabling the development of agents capable of managing system resources and performing complex tasks. To ensure safety, the system mandates explicit user verification before executing any generated code and provides robust isolation through containerized sandboxing.

The project distinguishes itself through its deep integration with local environments and its focus on secure, human-in-the-loop automation. It supports a wide range of hosted and local language models, allowing users to balance privacy and performance requirements. Beyond simple script execution, it features vision-enabled automation that analyzes screen content to simulate mouse and keyboard interactions, effectively allowing the agent to navigate graphical user interfaces as a human would.

The system provides a comprehensive suite of computer automation primitives, including tools for managing calendar events, email communications, and clipboard data. It is designed for extensibility, offering support for custom language runtimes and remote sandbox configurations to handle specialized execution needs. Users can manage the interpreter's behavior through detailed configuration settings, including options for stateful conversation persistence and telemetry controls.

The software is distributed as a Python-based package and can be installed and configured to run within isolated container environments to maintain host system security.

Features

LLM Orchestration Frameworks - A modular architecture that connects language models to local execution engines, allowing for flexible model swapping and custom API integration.
Agent Development Frameworks - Building and deploying autonomous agents that can execute code, manage system resources, and interact with external software interfaces.
Autonomous Agent Runtimes - A programmable environment that executes natural language instructions by generating and running code to interact with local software and operating systems.
LLM-Driven Code Generation - Translates natural language instructions into executable code snippets that interact with system APIs and local environments.
Code Execution Sandboxes - A secure environment that isolates arbitrary script execution within containers or remote environments to prevent unauthorized access to host system resources.
Container-Based Sandboxes - Isolates code execution within ephemeral container environments to prevent unauthorized access to host system resources and files.
Hosted Model Providers - The system supports a wide range of third-party cloud-based language model providers, including OpenAI, Anthropic, Google Vertex AI, AWS Sagemaker, and others, via configurable model flags.
Safe Execution Environments - The system activates a security layer that scans code and inspects external dependencies for malicious patterns or potential threats before allowing any operations to run on the machine.
Natural Language Automation - Using conversational commands to control desktop applications, manage files, and perform repetitive tasks across the operating system.
Containerized Execution Environments - The system runs the interpreter within an isolated container environment by building a custom image and launching the process inside that sandboxed instance to ensure consistent execution.
Human-in-the-Loop Gates - Requires explicit user verification before running generated code to ensure safety and maintain control over system operations.
Code Sandboxing Environments - Executing untrusted or generated code within isolated containerized environments to protect the host system from unauthorized access.
Containerized Execution Environments - The system runs code inside a containerized Linux environment to prevent direct access to host system files and ensure that tasks execute in a secure, restricted space.
Language Model Configurations - The system allows users to select between hosted or local language models to balance performance, cost, and privacy requirements based on the specific needs of the project.
Computer Automation Interfaces - A control layer that enables software to perform human-like tasks by simulating mouse movements, keyboard inputs, and visual screen analysis.
Code Execution Runtimes - The system runs code snippets directly within the environment to define variables, import libraries, or perform setup tasks before starting automated execution processes.
Local Model Servers - The system connects to local OpenAI-compatible inference servers by configuring base URLs and model identifiers to enable private, offline language model execution.
Custom Language Runtimes - The system allows users to define custom programming languages by implementing specific methods to handle code execution, process management, and termination for specialized runtime environments.
Dynamic Runtime Injection - Extends the execution environment by dynamically loading custom language handlers to support diverse programming runtimes.
Provider-Agnostic Model Interfaces - Standardizes communication with various local and hosted language models through a unified interface for inference and streaming.
Execution Confirmation Requirements - The system requests explicit user approval before running any generated code to maintain full visibility and control over system-level operations performed by the model.
Security Code Scanners - The system analyzes generated scripts and external packages for security risks or malicious patterns before execution to prevent accidental system damage or unauthorized operations.
Vision-Enabled UI Automation - Analyzes screen captures to identify visual elements and coordinates for simulating mouse and keyboard interactions programmatically.
Remote Sandbox Isolation - The system runs arbitrary code in a secure remote environment by defining a custom language class that replaces the default local engine to prevent unauthorized system access.
Custom Model Adapters - The system connects custom language models by replacing the standard completion function with a generator that accepts messages and streams output back to the system.
Local Language Model Integrations - Connecting and running private, offline language models to perform data processing and task automation without relying on external cloud services.
Stateful Conversation Persistence - Maintains session context and message history in local storage to allow for task resumption and long-term interaction tracking.
Keyboard Input Automation - The system executes keyboard shortcuts or types text into the active window to automate repetitive user input tasks and streamline interaction with external applications.
Mouse Control Automation - The system moves the cursor or performs clicks based on screen coordinates, identified text, or visual icons to interact with elements on the screen programmatically.
Cross-Platform Task Orchestrators - Automating complex workflows by integrating system-level operations like email, calendar management, and file manipulation into a unified execution environment.
Calendar Event Management - The system fetches, creates, or deletes calendar events to organize schedules and manage time-based tasks through direct interaction with personal or professional calendars.
Container Configurations - The system passes command-line flags to the interpreter process during startup to customize its behavior, instructions, or configuration settings while running inside an isolated container environment.
Volume Mounts - The system connects host folders to the container file system to provide the interpreter with direct access to specific local files for reading or manipulation during execution.
Display Screenshot Capture - The system captures screenshots of the primary display to provide visual context for automated operations and assist in analyzing the current state of the user interface.
Email Management - The system retrieves, sends, or counts emails from the system inbox to handle communications programmatically and automate routine messaging tasks within an email account.
Interpreter Configuration Managers - The system adjusts execution modes, enables vision capabilities, sets custom instructions, and controls system-level behaviors like telemetry, budget limits, and message templates to refine model operation.
Virtual Interface Configurations - The system configures the virtual computer interface by toggling offline modes, enabling debugging tools, adjusting image output formats, and importing necessary APIs for specific automation needs.