11 repos

Big Data & Engineering — Data & Databases

We curate 11 GitHub repositories matching data & databases · Big Data & Engineering. Refine with filters or upvote what's useful.

Big Data & Engineering — Data & Databases

We'll search the best matching repositories with AI.
  • sindresorhus/awesome

    sindresorhus/awesome

    438,690GitHub

    This project is a community-curated knowledge base that organizes vast technical ecosystems into a hierarchical, human-readable directory. It serves as a comprehensive index of libraries, frameworks, and methodologies, designed to facilitate discovery and professional development across the entire spectrum of software

    awesomeawesome-listlists
  • vinta/awesome-python

    vinta/awesome-python

    283,687GitHub

    This project is a comprehensive, community-curated directory that organizes a vast landscape of Python software libraries, frameworks, and tools. It serves as a centralized knowledge base designed to facilitate ecosystem navigation and accelerate developer discovery across the entire software development lifecycle. Th

    Pythonawesomecollectionspython
  • tensorflow/tensorflow

    tensorflow/tensorflow

    193,864GitHub

    TensorFlow is a comprehensive machine learning framework designed for the construction, training, and deployment of complex mathematical models. It utilizes a graph-based execution model that represents operations as directed acyclic graphs, enabling automatic differentiation and efficient parallel processing. The syst

    C++deep-learningdeep-neural-networksdistributed
  • yt-dlp/yt-dlp

    yt-dlp/yt-dlp

    147,702GitHub

    This project is a command-line media downloader designed for the systematic retrieval and organization of digital content from diverse online platforms. It functions as an extensible extraction engine that utilizes a declarative format-selection pipeline to automate the identification, merging, and downloading of speci

    Pythonclidownloaderpython
  • d3/d3

    D3 is a modular library providing low-level primitives for creating data-driven visualizations. It functions as a flexible framework that allows for direct control over visual presentation by mapping abstract data dimensions to graphical properties, such as position, color, and size, without imposing predefined chart a

    Shellchartchartsd3
  • pytorch/pytorch

    pytorch/pytorch

    97,601GitHub

    PyTorch is a machine learning framework centered on a GPU-ready tensor library that supports multi-dimensional array operations across both CPU and accelerator hardware. It provides a foundational infrastructure for mathematical computation and dynamic neural network construction, utilizing a tape-based automatic diffe

    Pythonautograddeep-learninggpu
  • microsoft/markitdown

    microsoft/markitdown

    87,305GitHub

    This project is an AI-powered document processing engine designed to transform diverse file formats into structured Markdown. By leveraging multimodal language models, it performs complex layout analysis and semantic text extraction, allowing for the conversion of both unstructured files and scanned images into machine

    Pythonautogenautogen-extensionlangchain
  • macrozheng/mall

    macrozheng/mall

    82,926GitHub

    This project is an enterprise-grade Java framework designed for building scalable, full-stack e-commerce applications. It provides a comprehensive foundation for microservice-based distributed architectures, enabling the development of complex retail platforms that include product management, order processing, and secu

    Javadockerelasticsearchelk
  • bregman-arie/devops-exercises

    bregman-arie/devops-exercises

    81,169GitHub

    This project is a comprehensive educational curriculum designed to build proficiency across modern infrastructure, cloud-native technologies, and systems administration. It functions as a reference library and interview preparation resource, offering a structured collection of conceptual questions, practical coding cha

    Pythonansibleawsazure
  • elastic/elasticsearch

    elastic/elasticsearch

    76,163GitHub

    Elasticsearch is a distributed search engine and document store designed for the high-performance indexing and retrieval of massive volumes of unstructured data. It functions as a centralized analytics platform, providing a schema-flexible architecture that organizes information into searchable indices while maintainin

    Javaelasticsearchjavasearch-engine
  • awesomedata/awesome-public-datasets

    awesomedata/awesome-public-datasets

    72,846GitHub

    This project is a community-maintained, open-access directory of high-quality public datasets. It serves as a centralized reference point for researchers, developers, and data scientists to locate reliable information sources across a wide spectrum of industries and scientific fields. By providing a structured index, t

    aaron-swartzawesome-public-datasetsdatasets