Papers

| Name | Finish Date |
| --- | --- |
| Writing reviews for systems conferences | TODO |
| Efficiently compiling efficient query plans for modern hardware | 2025-12-08 |
| Encapsulation of parallelism in the Volcano query processing system | 2025-12-08 |
| Parallel Database Systems: The Future of High Performance Database Processing | 2025-12-08 |
| The Case for Learned Index Structures | 2025-12-08 |
| C-store: a column-oriented DBMS | 2025-12-08 |
| Vectorwise: Beyond Column Stores | 2025-12-07 |
| R-trees: a dynamic index structure for spatial searching | 2025-12-07 |
| The Bw-Tree: A B-tree for new hardware platforms | 2025-12-04 |
| The Snowflake Elastic Data Warehouse | 2025-11-19 |
| A comparison of approaches to large-scale data analysis | 2025-11-17 |
| SuperFedNAS: Cost-Efficient Federated Neural Architecture Search for On-Device Inference | 2025-11-12 |
| Flame: Simplifying Topology Extension in Federated Learning | 2025-11-11 |
| Bigtable: A Distributed Storage System for Structured Data | 2025-11-11 |
| BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models | 2025-11-09 |
| DεpS: Delayed ε-Shrinking for Faster Once-For-All Training | 2025-11-09 |
| Debunking the CUDA Myth Towards GPU-based AI Systems: Evaluation of the Performance and Programmability of Intel's Gaudi NPU for AI Model Serving | 2025-11-07 |
| SqueezeLLM: Dense-and-Sparse Quantization | 2025-11-05 |
| The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks | 2025-11-04 |
| Fast Inference from Transformers via Speculative Decoding | 2025-10-28 |
| FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | 2025-10-28 |
| Cartridges: Lightweight and general-purpose long context representations via self-study | 2025-10-26 |
| UGPU: Dynamically Constructing Unbalanced GPUs for Enhanced Resource Efficiency | 2025-10-25 |
| Medha: Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations | 2025-10-24 |
| A Berkeley View of Systems Challenges for AI | 2025-10-14 |
| Hidden Technical Debt in Machine Learning Systems | 2025-10-14 |
| DeepSeek-V3 Technical Report | 2025-10-12 |
| DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving | 2025-10-06 |
| Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve | 2025-10-05 |
| Mosaic: A GPU Memory Manager with Application-Transparent Support for Multiple Page Sizes | 2025-10-04 |
| Orca: A Distributed Serving System for Transformer-Based Generative Models | 2025-10-03 |
| Loki: A System for Serving ML Inference Pipelines with Hardware and Accuracy Scaling | 2025-10-03 |
| Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads | 2025-09-29 |
| TetriSched: global rescheduling with adaptive plan-ahead in dynamic heterogeneous clusters | 2025-09-28 |
| MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters | 2025-09-28 |
| An Empirical Evaluation of Columnar Storage Formats | 2025-09-27 |
| A variable warp size architecture | 2025-09-27 |
| Gandiva: Introspective Cluster Scheduling for Deep Learning | 2025-09-23 |
| SuperServe: Fine-Grained Inference Serving for Unpredictable Workloads | 2025-09-21 |
| INFaaS: Automated Model-less Inference Serving | 2025-09-21 |
| Scalable GPU graph traversal | 2025-09-21 |
| InferLine: ML Prediction Pipeline Provisioning and Management for Tight Latency Objectives | 2025-09-16 |
| Clipper: A Low-Latency Online Prediction Serving System | 2025-09-16 |
| Varuna: Scalable, Low-cost Training of Massive Deep Learning Models | 2025-09-14 |
| Accelerating Large Graph Algorithms on the GPU Using CUDA | 2025-09-14 |
| Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning | 2025-09-14 |
| ModServe: Scalable and Resource-Efficient Large Multimodal Model Serving | 2025-09-12 |
| Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM | 2025-09-09 |
| ZeRO: Memory Optimizations Toward Training Trillion Parameter Models | 2025-09-09 |
| PyTorch Distributed: Experiences on Accelerating Data Parallel Training | 2025-09-07 |
| Scaling Laws for Neural Language Models | 2025-09-07 |
| Optimization Techniques for GPU Programming | 2025-09-06 |
| PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation | 2025-09-01 |
| Triton: an intermediate language and compiler for tiled neural network computations | 2025-08-31 |
| How to Read a Computer Science Research Paper | 2025-08-30 |
| How to Read a Paper | 2025-08-30 |
| Analyzing Modern NVIDIA GPU cores | 2025-08-29 |
| PyTorch: An Imperative Style, High-Performance Deep Learning Library | 2025-08-25 |
| TensorFlow: A system for large-scale machine learning | 2025-08-24 |
| A Few Useful Things to Know About Machine Learning | 2025-08-22 |
| What Goes Around Comes Around… And Around… | 2025-08-20 |
| Kafka: a Distributed Messaging System for Log Processing | 2023-02-26 |
| Blockstack: A Global Naming and Storage System Secured by Blockchains | 2022-08-13 |
| Bitcoin: A Peer-to-Peer Electronic Cash System | 2022-08-10 |
| Secure Untrusted Data Repository (SUNDR) | 2022-08-05 |
| Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS | 2022-08-04 |
| Scaling Memcache at Facebook | 2022-07-17 |
| Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing | 2022-07-16 |
| No compromises: distributed transactions with consistency, availability, and performance | 2022-07-14 |
| Spanner: Google’s Globally-Distributed Database | 2022-07-11 |
| Frangipani: A Scalable Distributed File System | 2022-07-10 |
| Chain Replication for Supporting High Throughput and Availability | 2022-06-29 |
| ZooKeeper: Wait-free coordination for Internet-scale systems | 2022-06-27 |
| In Search of an Understandable Consensus Algorithm (Extended Version) | 2022-06-19 |
| The Go Programming Language and Environment | 2022-06-06 |
| The Design of a Practical System for Fault-Tolerant Virtual Machines | 2022-06-05 |
| The Google File System | 2022-06-03 |
| MapReduce: Simplified Data Processing on Large Clusters | 2022-05-28 |
| The Evolution of the Unix Time-sharing System | 2022-05-25 |
| The UNIX Time-Sharing System | 2022-05-24 |
| RCU Usage In the Linux Kernel: One Decade Later | 2022-05-12 |
| Meltdown: Reading Kernel Memory from User Space | 2022-05-11 |
| Eliminating Receive Livelock in an Interrupt-driven Kernel | 2022-05-08 |
| The benefits and costs of writing a POSIX kernel in a high-level language | 2022-05-07 |
| Dune: Safe User-level Access to Privileged CPU Features | 2022-05-03 |
| The Performance of micro-Kernel-Based Systems | 2022-05-02 |
| Virtual Memory Primitives for User Programs | 2022-04-30 |
| Journaling the Linux ext2fs Filesystem | 2022-04-24 |