zhengwang100/LakeMLB

LakeMLB

LakeMLB (Data Lake / Lakehouse Machine Learning Benchmark)

Status: Work in Progress

Overview

LakeMLB is an evolving benchmark suite designed to evaluate the performance and scalability of machine learning models in data lake and lakehouse environments. It aims to provide a standardized framework for assessing how well ML algorithms handle large-scale, heterogeneous datasets while integrating with modern data architectures.

Key Objectives

  • Performance Evaluation: Benchmark training and inference performance across diverse data lake and lakehouse setups.
  • Scalability Analysis: Assess how models scale with increasing data volumes and complexity.
  • Data Integration: Test how ML models integrate with various data storage architectures, from traditional data lakes to modern lakehouses.
  • Reproducibility: Establish standardized tasks and metrics for fair comparisons between different ML approaches.

Features

  • Standardized Benchmarks: A set of tasks that simulate real-world data lake scenarios.
  • Comparative Metrics: Tools to measure throughput, accuracy, latency, and resource efficiency.
  • Extensibility: Open framework allowing the community to add new benchmarks, models, and datasets.
  • Transparency: Detailed guidelines and documentation to reproduce and validate experimental results.
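
The README does not yet document a metrics API, so the following is only a minimal sketch of how the latency and throughput measurements named under Comparative Metrics might be collected. The function name `measure_inference`, its `predict` and `batches` parameters, and the keys of the returned dictionary are all hypothetical and are not part of LakeMLB.

```python
import statistics
import time
from typing import Callable, Sequence


def measure_inference(predict: Callable[[Sequence[float]], object],
                      batches: Sequence[Sequence[float]]) -> dict:
    """Time a (hypothetical) predict function per batch.

    Returns mean and max per-batch latency in seconds, plus overall
    throughput in rows per second. Not a LakeMLB API; illustration only.
    """
    if not batches:
        raise ValueError("need at least one batch")

    latencies = []
    rows = 0
    for batch in batches:
        start = time.perf_counter()          # monotonic, high-resolution clock
        predict(batch)
        latencies.append(time.perf_counter() - start)
        rows += len(batch)

    total = sum(latencies)
    return {
        "batches": len(batches),
        "rows": rows,
        "mean_latency_s": statistics.mean(latencies),
        "max_latency_s": max(latencies),
        # Guard against a zero total on very coarse clocks.
        "throughput_rows_per_s": rows / total if total > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Stand-in "model": sums each batch. Replace with a real predict call.
    report = measure_inference(lambda batch: sum(batch), [[1.0, 2.0, 3.0], [4.0, 5.0]])
    print(report)
```

Timing each batch separately keeps per-batch latency and aggregate throughput in one pass. Accuracy and resource efficiency would need separate instrumentation, which this sketch does not attempt.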
