Document Parsing Pipelines — Data Processing Pipelines

We curate 2 GitHub repositories on document parsing pipelines, within the data processing pipelines category.

  • opendatalab/MinerU

    54,523 · GitHub

    MinerU is a document parsing pipeline designed to transform unstructured files into machine-readable, structured data. It uses deep learning models to perform layout analysis, identifying document regions and extracting complex content such as mathematical expressions. By combining these neural network inferences w...

    Python · ai4science · document-analysis · extract-data
  • docling-project/docling

    53,584 · GitHub

    Docling is a modular framework designed for document parsing, layout analysis, and structured data extraction. It transforms unstructured files and web content into a unified, hierarchical data model that preserves the spatial and semantic relationships between text, tables, images, and layout elements. By normalizing...

    Python · ai · convert · document-parser