A Raft-based distributed time series database written in C, featuring consistent hashing for sharding, a SQL-like query language, and minimal dependencies.
This project explores distributed systems concepts with a focus on simplicity and educational value. It implements a complete distributed time series database with leader election, log replication, and automatic sharding.
NOT FOR PRODUCTION USE. This is not a mature project and some parts are still incomplete; if you try it out, you are likely to run into bugs and missing features.
The software evolves incrementally with the following features:
- Raft Consensus - Raft algorithm for leader election and log replication
  - UDP-based transport for efficient communication
  - Pluggable serialization (binary by default)
  - Write-Ahead Log (WAL) for durability
- Consistent Hashing - Consistent hashing for sharding keys across nodes
  - Pluggable TCP/UDP transport protocol
  - Pluggable serialization (binary by default)
  - Mesh topology with all nodes connected
- Time Series Query Language - SQL-like query language for time series operations
  - Database and time series management
  - Flexible timestamp formats (Unix epochs, ISO dates, relative times)
  - Aggregation functions (avg, min, max, latest)
  - Time range queries with sampling intervals
- Storage Backend - Custom storage implementation (WIP)
- Configuration - Static configuration files with flag overrides (WIP)
The cluster is organized as shards with replicas:
- Shards - Distribute data across the cluster using consistent hashing. Each shard is responsible for a portion of the key space.
- Replicas - Each shard has multiple replicas that use Raft consensus to maintain consistency. One replica is the leader, others are followers.
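To make the leader/follower relationship concrete, here is a minimal sketch of how a Raft leader can work out which log index is stored on a majority of a shard's replicas. The names `quorum_index`, `match_index` and `MAX_REPLICAS` are hypothetical and are not taken from this codebase; the rule itself (sort the replicated indices and pick the majority position) is the standard Raft approach.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_REPLICAS 16 /* hypothetical upper bound for this sketch */

/* Order uint64_t values from highest to lowest. */
static int cmp_u64_desc(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (y > x) - (y < x);
}

/*
 * match_index[i] is the highest log index known to be stored on replica i
 * (the leader's own last index included). Returns the highest index that is
 * present on a majority of the n replicas, or 0 if n is out of range.
 *
 * Note: full Raft additionally requires that the entry at this index belongs
 * to the leader's current term before it is committed; that check is omitted.
 */
static uint64_t quorum_index(const uint64_t *match_index, size_t n)
{
    uint64_t sorted[MAX_REPLICAS];

    if (n == 0 || n > MAX_REPLICAS)
        return 0;
    memcpy(sorted, match_index, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, cmp_u64_desc);
    /* After a descending sort, positions 0..n/2 hold n/2 + 1 values,
       which is a majority, so sorted[n/2] is stored on a majority. */
    return sorted[n / 2];
}
```

With three replicas per Raft group, an entry counts as replicated once two of them hold it, so one replica can lag or fail without blocking writes.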
Example topology (3 shards, 2 replicas each):
Shard 0: node-0 (leader) + raft-0-0, raft-0-1 (replicas)
Shard 1: node-1 (leader) + raft-1-0, raft-1-1 (replicas)
Shard 2: node-2 (leader) + raft-2-0, raft-2-1 (replicas)
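The sketch below illustrates one common way to map a key to a shard with consistent hashing: each shard is placed on a hash ring at several virtual positions, and a key belongs to the first position at or after its own hash. It is an illustration only. The FNV-1a hash, the `VNODES_PER_SHARD` value and all function names are assumptions and may differ from what this project actually does.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define VNODES_PER_SHARD 64 /* assumed value, not taken from the project */
#define MAX_POINTS 1024

typedef struct {
    uint32_t hash; /* position on the ring */
    int shard;     /* shard that owns this position */
} ring_point;

typedef struct {
    ring_point points[MAX_POINTS];
    size_t count;
} hash_ring;

/* 32-bit FNV-1a, picked here only because it is short. */
static uint32_t fnv1a(const char *data, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)data[i];
        h *= 16777619u;
    }
    return h;
}

static int cmp_point(const void *a, const void *b)
{
    uint32_t ha = ((const ring_point *)a)->hash;
    uint32_t hb = ((const ring_point *)b)->hash;
    return (ha > hb) - (ha < hb);
}

/* Place VNODES_PER_SHARD virtual nodes per shard on the ring, then sort
   them by hash. Returns 0 on success, -1 if the ring cannot hold them. */
static int ring_init(hash_ring *ring, int shards)
{
    if (shards <= 0 || (size_t)shards * VNODES_PER_SHARD > MAX_POINTS)
        return -1;
    ring->count = 0;
    for (int s = 0; s < shards; s++) {
        for (int v = 0; v < VNODES_PER_SHARD; v++) {
            char label[32];
            int n = snprintf(label, sizeof label, "shard-%d#%d", s, v);
            ring->points[ring->count].hash = fnv1a(label, (size_t)n);
            ring->points[ring->count].shard = s;
            ring->count++;
        }
    }
    qsort(ring->points, ring->count, sizeof(ring_point), cmp_point);
    return 0;
}

/* Binary search for the first ring point whose hash is >= the key's hash,
   wrapping around to the first point when the key hashes past the end. */
static int ring_lookup(const hash_ring *ring, const char *key)
{
    uint32_t h = fnv1a(key, strlen(key));
    size_t lo = 0, hi = ring->count;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ring->points[mid].hash < h)
            lo = mid + 1;
        else
            hi = mid;
    }
    return ring->points[lo == ring->count ? 0 : lo].shard;
}
```

For the topology above, `ring_init(&ring, 3)` followed by `ring_lookup(&ring, "cpu_usage")` would yield a shard number between 0 and 2. Adding a shard only moves the keys that fall on the new shard's ring positions, which is the main reason to prefer this scheme over `hash % shard_count`.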
- gcc or clang
- make
# Build everything (server, client, tests)
make
# Build specific targets
make raft-c # Server binary
make raft-cli # Client CLI
make raft-c-tests # Test suite
# Clean build artifacts
make clean
Use the convenience script to start a 3-shard cluster with 2 replicas each:
./start-cluster.sh
Or start nodes individually:
# Shard 0
./raft-c -c conf/node-0.conf
./raft-c -c conf/raft-0-0.conf
./raft-c -c conf/raft-0-1.conf
# Shard 1
./raft-c -c conf/node-1.conf
./raft-c -c conf/raft-1-0.conf
./raft-c -c conf/raft-1-1.conf
# Shard 2
./raft-c -c conf/node-2.conf
./raft-c -c conf/raft-2-0.conf
./raft-c -c conf/raft-2-1.conf
Logs are written to the logs/ directory.
./raft-cli -h 127.0.0.1 -p 27778
-- Create a database
CREATEDB metrics
-- Set active database
USE metrics
-- Create a time series
CREATE cpu_usage
-- Insert data points
INSERT INTO cpu_usage VALUES (now(), 78.5)
INSERT INTO cpu_usage VALUES ('2025-01-15 12:30:00', 82.3)
-- Insert multiple points
INSERT INTO cpu_usage VALUES
(1643673600, 78.5),
(1643673660, 80.2),
(1643673720, 75.1)
-- Query all values
SELECT value FROM cpu_usage
-- Query with time range
SELECT value FROM cpu_usage
BETWEEN '2025-01-01' AND '2025-01-31'
-- Aggregation queries
SELECT avg(value) FROM cpu_usage
BETWEEN '2025-01-01' AND '2025-01-31'
SELECT min(value), max(value) FROM cpu_usage
-- Downsampling with intervals
SELECT avg(value) FROM cpu_usage
BETWEEN '2025-01-01' AND '2025-01-31'
SAMPLE BY 1d
-- Limit results
SELECT value FROM cpu_usage LIMIT 100
SELECT latest(value) FROM cpu_usage
-- Meta commands
.databases
.timeseries
- Unix epoch: 1643673600
- ISO date: '2025-01-15 12:30:00'
- Relative time: now(), now() - 24h, now() - 7d
- Auto timestamp: Omit timestamp to use current time
- avg(value) - Average value
- min(value) - Minimum value
- max(value) - Maximum value
- latest(value) - Most recent value
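As a rough illustration of what these four functions compute, the following sketch derives all of them in a single pass over an array of points. The `ts_point` and `ts_summary` types and the `ts_summarize` function are hypothetical names invented for this example; the project's own data structures may look different.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int64_t ts;   /* timestamp; the unit (seconds here) is an assumption */
    double value;
} ts_point;

typedef struct {
    double avg;
    double min;
    double max;
    double latest; /* value of the point with the greatest timestamp */
} ts_summary;

/* Compute avg, min, max and latest over n points in one pass.
   Returns 0 on success, -1 when there are no points to aggregate. */
static int ts_summarize(const ts_point *pts, size_t n, ts_summary *out)
{
    if (n == 0)
        return -1;

    double sum = 0.0;
    double min = pts[0].value;
    double max = pts[0].value;
    size_t newest = 0;

    for (size_t i = 0; i < n; i++) {
        sum += pts[i].value;
        if (pts[i].value < min)
            min = pts[i].value;
        if (pts[i].value > max)
            max = pts[i].value;
        /* >= so that, on equal timestamps, the later array entry wins */
        if (pts[i].ts >= pts[newest].ts)
            newest = i;
    }

    out->avg = sum / (double)n;
    out->min = min;
    out->max = max;
    out->latest = pts[newest].value;
    return 0;
}
```

Note that `latest` is chosen by timestamp rather than by array position, so the result does not depend on the points being stored in time order.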
- ms - Milliseconds
- s - Seconds
- m - Minutes
- h - Hours
- d - Days
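The following sketch shows one way an interval such as `1d` or `500ms` could be turned into milliseconds and then used to group timestamps into buckets for `SAMPLE BY`. Both `interval_to_ms` and `bucket_start` are hypothetical helpers written for this example, and the choice of milliseconds as the internal unit is an assumption.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Convert "<number><unit>" (units: ms, s, m, h, d) to milliseconds.
   Returns -1 on malformed input. Overflow of very large values is not
   checked in this sketch. */
static int64_t interval_to_ms(const char *s)
{
    char *end;
    long long n;

    errno = 0;
    n = strtoll(s, &end, 10);
    if (end == s || errno != 0 || n < 0)
        return -1;

    if (strcmp(end, "ms") == 0) return n;
    if (strcmp(end, "s") == 0)  return n * 1000LL;
    if (strcmp(end, "m") == 0)  return n * 60LL * 1000LL;
    if (strcmp(end, "h") == 0)  return n * 3600LL * 1000LL;
    if (strcmp(end, "d") == 0)  return n * 86400LL * 1000LL;
    return -1;
}

/* Start of the bucket that contains ts_ms, for non-negative timestamps.
   A non-positive interval leaves the timestamp unchanged. */
static int64_t bucket_start(int64_t ts_ms, int64_t interval_ms)
{
    if (interval_ms <= 0)
        return ts_ms;
    return ts_ms - (ts_ms % interval_ms);
}
```

Points whose timestamps share the same `bucket_start` value fall into the same sampling window, so an aggregate such as `avg(value)` can be computed once per bucket.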
- CREATEDB <name> - Create a new database
- USE <name> - Set active database
- CREATE <timeseries> - Create a time series
- INSERT INTO <timeseries> VALUES (timestamp, value) - Insert data
- SELECT ... FROM <timeseries> - Query data
- DELETE <timeseries> - Delete a time series
Configuration files define node behavior and cluster topology.
# Cluster config
id 0
type shard
host 127.0.0.1:27778
shard_leaders 127.0.0.1:7778 127.0.0.1:7878 127.0.0.1:7978
# Raft replicas for this shard
raft_replicas 127.0.0.1:8778 127.0.0.1:8779 127.0.0.1:7778
raft_heartbeat_ms 150
Replica configuration files have a similar structure, but with type replica and the appropriate ports.
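The configuration format is a sequence of `key value [value ...]` lines with `#` comments. A minimal sketch of splitting one such line into its key and values is shown below. `conf_line`, `conf_parse_line` and `MAX_VALUES` are hypothetical names, and treating a `#` at the start of any token as the beginning of a comment is an assumption about the format.

```c
#include <stddef.h>
#include <string.h>

#define MAX_VALUES 8 /* hypothetical cap on values per directive */

typedef struct {
    char *key;                /* first token on the line */
    char *values[MAX_VALUES]; /* remaining tokens, in order */
    int nvalues;
} conf_line;

/* Split a config line in place (the buffer is modified: separators are
   overwritten with '\0'). Returns 1 if a directive was found, 0 for blank
   or comment-only lines. Values beyond MAX_VALUES are ignored. */
static int conf_parse_line(char *line, conf_line *out)
{
    static const char *ws = " \t\r\n";
    char *p = line + strspn(line, ws); /* skip leading whitespace */

    out->key = NULL;
    out->nvalues = 0;

    while (*p != '\0' && *p != '#') {
        size_t len = strcspn(p, ws); /* length of the current token */
        char *next = p + len;

        if (*next != '\0')
            *next++ = '\0'; /* terminate the token */
        if (out->key == NULL)
            out->key = p;
        else if (out->nvalues < MAX_VALUES)
            out->values[out->nvalues++] = p;
        p = next + strspn(next, ws); /* move to the next token */
    }
    return out->key != NULL;
}
```

Applied to the `raft_replicas` line above, this would produce the key `raft_replicas` and three address values, which a caller could then split on `:` into host and port.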
# Build and run tests
make raft-c-tests
./raft-c-tests
Test coverage includes (for now):
- Encoding/decoding (binary serialization)
- Statement parsing (SQL query parser)
- Time series operations (aggregations, sampling)
This is a didactic project focused on:
- Exploring distributed systems concepts
- Keeping implementation simple and dependency-free
- Prioritizing code clarity over performance
- Using straightforward approaches (e.g., select for I/O multiplexing)
It is not intended for production use. Features are added incrementally as learning opportunities.
To stop all running nodes:
pkill -f raft-c