Stars
21 Lessons, Get Started Building with Generative AI
Learn Python using your Java Knowledge
Converting a JSON schema to a Spark schema (struct) representation
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
A Model Context Protocol (MCP) server for discovering data products and requesting access in Data Mesh Manager, and executing queries on the data platform to access business data.
Interactive CLI for analyzing Kafka health and configuration according to best practices and industry standards.
An example showing how to apply software engineering best practices to Databricks notebooks.
The Metadata Platform for your Data and AI Stack
POC of a Spring Boot - DataHub integration reporting its data lineage.
Serialization format for row-based incremental data processing
Fastest SQL pipeline engine in a single C++ binary, for stream processing, analytics, observability and AI.
🚀 10x easier, 🚀 140x lower storage cost, 🚀 high performance, 🚀 petabyte scale - Elasticsearch/Splunk/Datadog alternative for 🚀 (logs, metrics, traces, RUM, Error tracking, Session replay).
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
Apache Kafka is an open-source distributed event streaming platform used by thousands of companies. uForwarder aims to address several pain points while using Apache Kafka for pub-sub message queue…
Used to generate mock Avro data
Open Source DeepWiki: AI-Powered Wiki Generator for GitHub/GitLab/Bitbucket Repositories. Join the Discord: https://discord.gg/gMwThUMeme
Docker container with a data volume from S3.
A Kubernetes controller to watch changes in ConfigMap and Secrets and do rolling upgrades on Pods with their associated Deployment, StatefulSet, DaemonSet and DeploymentConfig – [✩Star] if you're u…
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…
A library that provides an in-memory Kafka instance to run your tests against.
Secure and fast microVMs for serverless computing.
Open-source search and retrieval database for AI applications.
A curated list of awesome ASGI servers, frameworks, apps, libraries, and other resources