Stars
Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows, that encode lineage/tracing and metadata. Runs and scales everywhere python does.
Build applications that make decisions (chatbots, agents, simulations, etc...). Monitor, trace, persist, and execute on your own infrastructure.
Apache Superset is a Data Visualization and Data Exploration Platform
Apache Spark - A unified analytics engine for large-scale data processing
Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search
Sample project to demonstrate data engineering best practices
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Code for my videos on big data analytics with Apache Spark using Scala.