zylon-ai/private-gpt

57,116 stars, 7,611 forks, Python, apache-2.0

PrivateGPT

This project is a privacy-first backend service designed to facilitate retrieval-augmented generation by processing local documents into searchable vector representations. It provides a modular architecture that allows users to ingest diverse file formats, manage document metadata, and perform semantic searches to provide context-aware responses for chat and completion requests.
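The retrieve-then-generate flow described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not PrivateGPT's implementation. The bag-of-words `embed` function is a hypothetical stand-in for a real embedding model, and `retrieve` and `build_prompt` are illustrative names.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (hypothetical stand-in for a real embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Semantic-search step: return the k stored chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Augmentation step: prepend the retrieved context to the user's question."""
    context = "\n".join(retrieve(query, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


chunks = [
    "The cat sat on the mat.",
    "Invoices are due within 30 days.",
    "Payment terms: invoices due monthly.",
]
print(build_prompt("When are invoices due?", chunks))
```

The resulting prompt is what a language model would receive; the irrelevant chunk about the cat never reaches it.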

The system distinguishes itself through a database-agnostic abstraction layer that supports various storage backends, ranging from local disk storage to enterprise-grade vector databases. It offers flexible deployment options, enabling users to run language models entirely on private hardware or connect to external cloud-based providers through a unified interface. To improve the quality of generated output, the engine incorporates a reranking step that re-scores the retrieved document chunks and keeps only the most relevant ones before they are passed to the language model.
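A database-agnostic layer usually means the pipeline codes against a small interface and a factory picks the concrete backend from configuration. The sketch below shows that pattern with Python's `typing.Protocol`. The `VectorStore` interface, the in-memory backend, and the `make_vector_store` registry are hypothetical illustrations, not the project's actual classes.

```python
import math
from typing import Protocol

Vector = list[float]
Hit = tuple[str, str, float]  # (doc_id, text, score)


class VectorStore(Protocol):
    """Interface the rest of the pipeline depends on, whatever the backend."""

    def add(self, doc_id: str, text: str, vector: Vector) -> None: ...

    def query(self, vector: Vector, k: int) -> list[Hit]: ...


def _cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Simplest possible backend: brute-force cosine search over a Python list."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, str, Vector]] = []

    def add(self, doc_id: str, text: str, vector: Vector) -> None:
        self._rows.append((doc_id, text, vector))

    def query(self, vector: Vector, k: int) -> list[Hit]:
        hits = [(doc_id, text, _cosine(vector, v)) for doc_id, text, v in self._rows]
        hits.sort(key=lambda hit: hit[2], reverse=True)
        return hits[:k]


# Hypothetical registry: a disk- or database-backed class exposing the same
# add/query methods could be registered under another provider name.
BACKENDS: dict[str, type] = {"memory": InMemoryVectorStore}


def make_vector_store(provider: str) -> VectorStore:
    """Instantiate the backend named in configuration."""
    try:
        return BACKENDS[provider]()
    except KeyError:
        raise ValueError(f"unknown vector store provider: {provider!r}") from None
```

Because callers only use `add` and `query`, swapping the storage backend is a configuration change rather than a code change.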

The platform includes a comprehensive suite of tools for managing document intelligence pipelines, including automated parsing, text chunking, and embedding generation. Users can select environment-based profiles that match their hardware, such as CPU-only or GPU-accelerated setups, and can stream responses in real time to reduce perceived latency.
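Text chunking is the step that turns a parsed document into retrievable units. A common baseline is a fixed-size window with overlap, sketched below. This is an illustrative splitter, assumed for the example; the project may split differently (for instance, by sentence).

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
```

The overlap means a sentence that straddles a window boundary still appears intact in at least one chunk, at the cost of storing some text twice.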

The application is configured via runtime settings files and environment variables, with support for building custom container images to suit specific deployment requirements.
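Profile-based configuration typically layers profile-specific overrides on top of a base settings file. PrivateGPT's documentation selects profiles through a `PGPT_PROFILES` environment variable; the comma-separated parsing, left-to-right merge order, and settings keys below are assumptions made for illustration, and plain dictionaries stand in for the YAML files.

```python
import os
from copy import deepcopy
from typing import Optional


def deep_merge(base: dict, override: dict) -> dict:
    """Recursively overlay `override` onto `base` without mutating either."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def load_settings(base: dict, profiles: dict[str, dict], active: Optional[str] = None) -> dict:
    """Apply each comma-separated profile, left to right, on top of the base settings."""
    if active is None:
        active = os.environ.get("PGPT_PROFILES", "")
    settings = base
    for name in (p.strip() for p in active.split(",")):
        if name:
            settings = deep_merge(settings, profiles[name])
    return settings


# Illustrative keys only; the real settings schema may differ.
base_settings = {"llm": {"mode": "openai", "max_new_tokens": 256}, "vectorstore": {"database": "qdrant"}}
profiles = {
    "local": {"llm": {"mode": "llamacpp"}},
    "pg": {"vectorstore": {"database": "postgres"}},
}
print(load_settings(base_settings, profiles, active="local,pg"))
```

Only the keys a profile mentions change; everything else, such as `max_new_tokens` here, is inherited from the base settings.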

Features

Retrieval Augmented Generation Engines - A backend service that processes local documents to provide context-aware conversational responses using both local and cloud-based language models.
Local Model Runtimes - Run language models locally using specific configuration profiles to manage model parameters, context window sizes, and hardware-specific settings for private, offline processing.
Privacy-First AI Backends - A modular service architecture designed to run language models and document processing entirely on local infrastructure for data security.
Text Generation Services - Generate text completions from a prompt, incorporating ingested document context and system instructions, and return the output to the user in real time.
Retrieval-Augmented Generation Pipelines - Processes documents into vector embeddings and stores them to provide relevant context for language model completion requests.
Context-Aware Chat Interfaces - Generate conversational responses by automatically retrieving relevant document context and assembling it into a prompt for the completion model.
Document Retrieval - Retrieve relevant text segments from stored documents based on a search query while optionally filtering by document identifier and including surrounding context for each search result.
Private Retrieval Augmented Generation - Building secure, local-first applications that answer questions based on private document collections without sending sensitive data to external cloud providers.
Document Intelligence Pipelines - Automating the ingestion, parsing, and vectorization of diverse file formats to enable semantic search and intelligent analysis across internal knowledge bases.
Document Ingestion Pipelines - A data processing workflow that extracts text from diverse file formats and converts it into searchable vector representations for retrieval.
Vector Database Orchestrators - A management layer that handles document ingestion, text chunking, and vector embedding storage across various database providers for semantic search.
Local Language Model Hosting - Running large language models on private hardware to maintain full control over data privacy, security, and infrastructure costs.
Contextual Retrieval Services - Retrieve relevant text chunks from ingested documents based on a specific query to facilitate custom retrieval and generation logic for specialized tasks.
Document Ingestion Pipelines - Manage document ingestion by automatically parsing, splitting, extracting metadata, and generating embeddings for storage within a retrieval augmented generation pipeline.
File Ingestion Services - Extract text chunks and metadata from files and store them to provide searchable context for subsequent chat and completion requests within the system.
Document Ingestion Pipelines - Parses raw files into structured text chunks and metadata to enable efficient semantic search and retrieval during query execution.
Vector Database Abstractions - Uses a modular interface layer to support multiple storage backends like local disk, PostgreSQL, or specialized vector databases.
Application Configuration Managers - Configure the application by selecting language model, embedding, and vector store providers, and manage dependencies using environment-specific profiles.
Text Ingestion Services - Convert raw text into a searchable document by splitting it into chunks, and receive a unique document identifier that can later be used to filter the context of completion requests.
External Model Integrations - Configure the application to use external cloud-based language models by defining specific profiles with API keys, base URLs, and model identifiers.
Local Infrastructure Setups - Run the application entirely on local infrastructure by selecting local language model, embedding, and vector store providers and downloading necessary model files.
Text Embedding Generators - Generate vector representations of text strings that machine learning models and other analytical algorithms can consume for search or classification tasks.
Reranking Strategies - Improve retrieval accuracy by selecting the most relevant documents from an initially retrieved set before passing them to the language model for answer generation.
Streaming Response Architectures - Streams generated text tokens from the language model to the user interface in real time to reduce perceived latency.
Enterprise Vector Database Integrations - Connecting language models to scalable, production-grade vector storage backends to manage large-scale document retrieval and contextual information processing.
Vector Database Integrations - Configure a vector store by specifying connection details like host, port, and authentication keys within the application settings file.
Text Embedding Generators - Generate vector embeddings for arbitrary text input to support custom pipeline implementations and advanced search or analysis workflows across different data sources.
Vector Databases - Configure a vector store by installing the required dependencies and providing server connection details and security settings in the application settings file.
Chat Completion Services - Generate conversational text from the message history and document context, streaming the output to the user in real time.
Multi-Provider Model Integrations - Connects to both local and cloud-based language models through a unified interface to balance privacy and computational performance.
Hardware Profile Deployments - Deploy the service using various hardware profiles including CPU-only or GPU-accelerated configurations to match specific system capabilities and performance requirements.
Execution Modes - Select between query, search, and chat modes to control how the system uses ingested documents and conversation history to generate responses.
Reranking Retrieval Logic - Refines the initial set of retrieved document chunks using a secondary scoring pass to improve the accuracy of generated answers.
Execution Profiles - Execute the application using environment-specific profiles to manage local or cloud-based model inference, including support for various GPU-accelerated configurations.
Local Document Ingestion - Ingest local folders of documents into the system for querying, with options to watch for file changes and log processing results.
Supported File Formats - Process a wide range of document types including text, office documents, images, and code with automatic fallback to plain text for unsupported formats.
Document Deletion Operations - Permanently remove a previously stored document from the search index by passing its unique identifier to the deletion endpoint.
Document Retrieval Interfaces - Retrieve a list of all stored documents including their unique identifiers and metadata to enable precise filtering of context for chat or completion requests.
Document Deletion APIs - Delete specific documents from storage by sending a request to the ingestion API endpoint designed for document removal.
Text Summarization Services - Summarize provided text using a language model with options to include ingested document context, custom instructions, and streaming responses for real-time output.
System Prompt Configurations - Configure the system prompt for the language model to define specific roles, expertise, or response criteria for chat interactions.
Chroma Integrations - Configure Chroma as a local, disk-based vector store by installing the required dependencies and enabling it in the application settings file.
PostgreSQL Vector Stores - Configure a PostgreSQL vector store by installing the required dependencies and providing database connection credentials and schema details in the application settings file.
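Several of the features above (text ingestion that returns a document ID, listing and deleting documents, and chunk retrieval filtered by document ID with surrounding context) fit together as one small index. The class below is a hypothetical in-memory sketch. Its parameter names (`docs_ids`, `limit`, `prev_next_chunks`) are chosen to echo the feature descriptions and are assumptions, not a reproduction of the project's API, and term overlap stands in for vector similarity.

```python
import re
import uuid
from typing import Optional


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


class DocumentIndex:
    """Hypothetical in-memory index: ingest, list, delete, and filtered search."""

    def __init__(self) -> None:
        self._docs: dict[str, dict] = {}  # doc_id -> {"file_name": ..., "chunks": [...]}

    def ingest_text(self, file_name: str, text: str) -> str:
        # Naive sentence splitter; each sentence becomes one chunk.
        chunks = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
        doc_id = str(uuid.uuid4())
        self._docs[doc_id] = {"file_name": file_name, "chunks": chunks}
        return doc_id  # callers keep this ID to filter or delete later

    def list_documents(self) -> list[dict]:
        return [{"doc_id": d, "file_name": m["file_name"]} for d, m in self._docs.items()]

    def delete(self, doc_id: str) -> None:
        if self._docs.pop(doc_id, None) is None:
            raise KeyError(doc_id)

    def search(self, query: str, docs_ids: Optional[list[str]] = None,
               limit: int = 10, prev_next_chunks: int = 0) -> list[dict]:
        q = _tokens(query)
        hits = []
        for doc_id, meta in self._docs.items():
            if docs_ids is not None and doc_id not in docs_ids:
                continue  # restrict the search to the requested documents
            chunks = meta["chunks"]
            for i, chunk in enumerate(chunks):
                score = len(q & _tokens(chunk))  # term overlap stands in for vector similarity
                if score == 0:
                    continue
                hits.append({
                    "doc_id": doc_id,
                    "score": score,
                    "text": chunk,
                    # neighbouring chunks give the model extra surrounding context
                    "previous": chunks[max(0, i - prev_next_chunks):i],
                    "next": chunks[i + 1:i + 1 + prev_next_chunks],
                })
        hits.sort(key=lambda h: h["score"], reverse=True)
        return hits[:limit]


index = DocumentIndex()
a = index.ingest_text("a.txt", "Alpha intro. GPUs speed up inference. Alpha outro.")
b = index.ingest_text("b.txt", "GPUs are costly. Beta notes.")
print(index.search("GPUs inference", docs_ids=[a], prev_next_chunks=1))
```

Filtering by `docs_ids` before scoring is what lets a chat or completion request be grounded in a chosen subset of the ingested documents.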
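The reranking features above describe a two-stage retrieval: a fast first pass returns candidates, then a second scorer re-orders only those candidates. The sketch shows that shape with a pluggable scorer. `overlap_scorer` is a hypothetical, deliberately simple stand-in; in practice the second stage is often a heavier model such as a cross-encoder, which is why it is applied to a short candidate list rather than the whole store.

```python
import re
from typing import Callable

Scorer = Callable[[str, str], float]


def overlap_scorer(query: str, chunk: str) -> float:
    """Hypothetical second-stage scorer: share of query terms present in the chunk."""
    q = set(re.findall(r"\w+", query.lower()))
    c = set(re.findall(r"\w+", chunk.lower()))
    return len(q & c) / len(q) if q else 0.0


def rerank(query: str, candidates: list[str], scorer: Scorer, top_n: int = 2) -> list[str]:
    """Re-score first-stage candidates with `scorer` and keep the best `top_n`."""
    rescored = [(scorer(query, c), c) for c in candidates]
    rescored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [c for _, c in rescored[:top_n]]


candidates = ["gpu setup guide", "cpu only install steps", "install with gpu acceleration"]
print(rerank("install gpu", candidates, overlap_scorer, top_n=2))
```

Only the `top_n` survivors are passed on to the language model, which keeps the prompt short and focused.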
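Streaming works by forwarding each token as soon as the model produces it instead of waiting for the full answer. The sketch below frames tokens as OpenAI-style server-sent events (`data: ...` lines ending with a `[DONE]` sentinel). That wire format is an assumption chosen because it is a widely used convention, and `fake_model_tokens` is a hypothetical stand-in for a real model.

```python
import json
from typing import Iterable, Iterator


def fake_model_tokens(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for a model that yields tokens as they are generated."""
    yield from ["Private", " answers", " stay", " local."]


def to_sse(tokens: Iterable[str]) -> Iterator[str]:
    """Server side: wrap each token in an event as soon as it arrives."""
    for token in tokens:
        chunk = {"choices": [{"delta": {"content": token}}]}
        yield f"data: {json.dumps(chunk)}\n\n"
    yield "data: [DONE]\n\n"


def read_stream(events: Iterable[str]) -> str:
    """Client side: append each delta as it arrives until the [DONE] sentinel."""
    parts = []
    for event in events:
        body = event.removeprefix("data: ").strip()
        if body == "[DONE]":
            break
        parts.append(json.loads(body)["choices"][0]["delta"]["content"])
    return "".join(parts)


print(read_stream(to_sse(fake_model_tokens("Where does my data go?"))))
```

Because both sides are generators, the first token can be displayed before the last one is generated, which is where the reduction in perceived latency comes from.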
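The query, search, and chat execution modes differ mainly in which inputs reach the model. The dispatcher below is a hypothetical sketch of that split: search returns retrieved chunks without generation, query adds retrieved context to the prompt, and chat uses only the conversation history. The `retrieve` and `llm` callables are placeholders for whatever retrieval and model back ends are configured.

```python
from enum import Enum
from typing import Callable


class Mode(Enum):
    QUERY = "query"    # answer with retrieved document context
    SEARCH = "search"  # return matching chunks only, no generation
    CHAT = "chat"      # converse using history only, no document context


def respond(mode: Mode, message: str, history: list[str],
            retrieve: Callable[[str], list[str]], llm: Callable[[str], str]) -> str:
    if mode is Mode.SEARCH:
        return "\n".join(retrieve(message))
    transcript = "\n".join(history + [f"User: {message}"])
    if mode is Mode.QUERY:
        context = "\n".join(retrieve(message))
        return llm(f"Context:\n{context}\n\n{transcript}")
    return llm(transcript)


def fake_retrieve(query: str) -> list[str]:
    # Hypothetical retriever that always returns the same two chunks.
    return ["chunk A", "chunk B"]


def echo_llm(prompt: str) -> str:
    # Hypothetical model that echoes its prompt, so the prompt shape is visible.
    return prompt


print(respond(Mode.QUERY, "What is RAG?", [], fake_retrieve, echo_llm))
```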
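Format support with a plain-text fallback is usually a lookup table from file extension to parser, with a default for anything unrecognised. The sketch below covers only a few standard-library formats. The `READERS` table and reader functions are hypothetical; a real deployment would register dedicated parsers for office documents, PDFs, images, and code.

```python
import csv
import io
import json
from pathlib import Path
from typing import Callable


def read_plain(data: bytes) -> str:
    # Fallback reader: decode as UTF-8, replacing any undecodable bytes.
    return data.decode("utf-8", errors="replace")


def read_csv(data: bytes) -> str:
    rows = csv.reader(io.StringIO(read_plain(data)))
    return "\n".join(", ".join(row) for row in rows)


def read_json(data: bytes) -> str:
    return json.dumps(json.loads(data), indent=2)


# Hypothetical extension -> reader table.
READERS: dict[str, Callable[[bytes], str]] = {
    ".csv": read_csv,
    ".json": read_json,
    ".md": read_plain,
    ".txt": read_plain,
}


def extract_text(file_name: str, data: bytes) -> str:
    """Pick a reader by (case-insensitive) extension, defaulting to plain text."""
    reader = READERS.get(Path(file_name).suffix.lower(), read_plain)
    return reader(data)


print(extract_text("table.csv", b"a,b\n1,2\n"))
print(extract_text("notes.xyz", b"unknown format, read as text"))
```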