Rolling Stats is a high-performance RESTful service designed to ingest batches of data points and return calculated statistics. The service aggregates data points by symbol and calculates statistics for the last 10^k values, where k ranges from 1 to 8.
This project leverages Elixir and Phoenix for the web server, combined with Rust for high-performance calculations. The integration between these technologies is achieved using Rustler.
- Build the Docker image:
docker build -t rolling_stats .- Run the container:
docker run -p 4000:4000 rolling_statsThe API will be accessible at http://localhost:4000.
- Install Elixir and Rust on your system.
- Install dependencies:
mix deps.get- Compile the project:
mix compile- Start the Phoenix server:
mix phx.serverThe API will be accessible at http://localhost:4000.
Adds a batch of data points for a specific symbol.
- URL:
/add_batch - Method: POST
- Content-Type: application/json
Request Payload:
{
"values": [1.0, 2.0, 3.0],
"symbol": "a"
}Response: 201 Created
Retrieves statistics for a specific symbol and k value.
- URL:
/stats - Method: GET
- Query Parameters:
symbol: The symbol to retrieve stats fork: The exponent for the number of data points (1-8)
Example: GET /stats?symbol=a&k=3
Response Payload:
{
"max": 3.0,
"min": 1.0,
"last": 3.0,
"average": 2.0,
"variance": 0.6666666666666666
}Response: 200 OK
Rolling Stats is built using a combination of technologies:
- Elixir and Phoenix: Provide a scalable and developer-friendly web application framework.
- Rust: Handles performance-critical calculations.
- Rustler: Enables seamless integration between Elixir and Rust through NIFs (Native Implemented Functions).
- Phoenix: Handles HTTP requests and routing.
- RollingStats.Native: Elixir module interfacing with Rust logic.
- lib.rs: Entry point for Rust logic, defining
add_batchandget_statsfunctions. - DashMap: Concurrent hashmap for non-blocking access to data stored per symbol.
- MultiRollingStats: Manages multiple
RollingStatsinstances for different exponents. - RollingStats: Efficiently maintains rolling statistics using a
VecDequeandBTreeMap.
Phoenix handles HTTP requests, routing them to the appropriate handlers. The RollingStats.Native
module in Elixir serves as an interface to the Rust logic. In Rust, lib.rs acts as the entry point
for the core logic, defining two key functions: add_batch and get_stats. These functions operate
on a static ALL_STATS structure.
ALL_STATS is implemented using DashMap, a concurrent hashmap that enables non-blocking access to
data stored under separate symbols, ensuring thread-safety and high performance. Each value
in DashMap is of type MultiRollingStats, which maintains a hashmap of RollingStats instances
for each exponent (corresponding to the appropriate number of data points).
The RollingStats struct is responsible for statistical calculations. It contains a VecDeque of
the last 10^k data points and necessary data for efficient statistics
calculation: max_size, sum, sum_squares and index. These values are updated on each batch
addition. The index is implemented using a BTreeMap, which maintains ordered values and allows
for O(1) time complexity when fetching min and max values, while new value insertions are performed
in O(log n) time.
Values of sum and sum_squares are calculated based on added and removed values from the queue on
each batch. This approach, which avoids iterating over the entire queue, enables the calculation of
rolling variance and average in O(1) time.
- Adding a data point: O(log n) time complexity (due to updating the
BTreeMapindex) - Retrieving statistics: O(1) time complexity (thanks to pre-calculated values)
Load tests are performed using Vegeta. To run the load tests:
- Navigate to the
load_testdirectory. - Execute the load test script:
./load_test.shThis script simulates 300 requests per second for 5 minutes, evenly split between adding batches and retrieving stats. Results are visualized in a latency plot saved as plot.html. It demonstrates that despite sending a total of 450M data points, the latency remained constant, empirically suggesting a low time complexity.
Run the unit test suite with:
mix test