Gigapi Metadata provides a high-performance indexing system for managing metadata about data files (typically Parquet files) organized in time-partitioned structures. It supports efficient querying, merging operations, and provides both local JSON file storage and distributed Redis storage backends.
- Dual Storage Backends: JSON file-based storage for local deployments and Redis for distributed systems
- Time-Partitioned Data: Optimized for date/hour partitioned data structures
- Merge Planning: Intelligent merge planning for data consolidation across different layers
- Async Operations: Promise-based asynchronous operations for better performance
- Efficient Querying: Time-range and folder-based querying capabilities
```shell
go get github.com/gigapi/metadata
```

The fundamental data structure representing metadata about a single data file:
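The sketch below reconstructs the entry's shape from the field names used in the batch example later in this document; consult the package source for the authoritative definition. The `Span` helper is ours, added only to make the snippet self-contained and runnable.

```go
package main

import "fmt"

// IndexEntry describes one data file tracked by the index. The field set
// mirrors the Batch example in this document; the real struct may carry
// additional fields.
type IndexEntry struct {
	Database  string // owning database
	Table     string // owning table
	Path      string // partition-relative path, e.g. "date=2024-01-15/hour=14/file1.parquet"
	SizeBytes int64  // file size in bytes
	MinTime   int64  // earliest timestamp in the file, Unix nanoseconds
	MaxTime   int64  // latest timestamp in the file, Unix nanoseconds
}

// Span returns the time range a file covers, in nanoseconds
// (an illustrative helper, not part of the library).
func Span(e IndexEntry) int64 { return e.MaxTime - e.MinTime }

func main() {
	e := IndexEntry{
		Database:  "my_database",
		Table:     "my_table",
		Path:      "date=2024-01-15/hour=14/file1.parquet",
		SizeBytes: 1000000,
		MinTime:   1705327200000000000,
		MaxTime:   1705327800000000000,
	}
	fmt.Println(Span(e)) // 600000000000 ns = 10 minutes of data
}
```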
For local file-based storage, suitable for single-node deployments, use the JSON index; for distributed deployments, use the Redis backend.

Before using the library, initialize the merge configurations, which define merge behavior across iterations.
Example configuration:
```go
import "github.com/gigapi/metadata"

// Configure merge settings: [timeout_sec, max_size_bytes, iteration_id]
metadata.MergeConfigurations = [][3]int64{
    {10, 10 * 1024 * 1024, 1}, // 10s timeout, 10MB max size, iteration 1
    {30, 50 * 1024 * 1024, 2}, // 30s timeout, 50MB max size, iteration 2
}
```

### JSON Index Usage

```go
// Create a JSON-based table index
tableIndex := metadata.NewJSONIndex("/data/root", "my_database", "my_table")

// Add metadata entries
entries := []*metadata.IndexEntry{
    {
        Database:  "my_database",
        Table:     "my_table",
        Path:      "date=2024-01-15/hour=14/file1.parquet",
        SizeBytes: 1000000,
        MinTime:   1705327200000000000, // nanoseconds
        MaxTime:   1705327800000000000,
    },
}

// Batch operation (async)
promise := tableIndex.Batch(entries, nil)
result, err := promise.Get()
```

### Redis Index Usage
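The original Redis example appears to have been lost from this document. The sketch below shows the likely shape of such a call; `NewRedisIndex` and its signature are stand-ins defined locally here, not the package's confirmed API, so check the package source for the real constructor.

```go
package main

import "fmt"

// RedisIndex and NewRedisIndex are local stand-ins for the library's
// Redis-backed index; the real constructor presumably takes a Redis
// connection URL plus database and table names, mirroring NewJSONIndex.
type RedisIndex struct {
	URL      string // Redis connection URL, e.g. "redis://localhost:6379/0"
	Database string
	Table    string
}

func NewRedisIndex(redisURL, database, table string) (*RedisIndex, error) {
	if redisURL == "" {
		return nil, fmt.Errorf("redis URL must not be empty")
	}
	return &RedisIndex{URL: redisURL, Database: database, Table: table}, nil
}

func main() {
	idx, err := NewRedisIndex("redis://localhost:6379/0", "my_database", "my_table")
	if err != nil {
		panic(err)
	}
	fmt.Println(idx.Database, idx.Table) // my_database my_table
}
```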
```go
// Query with time range
options := metadata.QueryOptions{
    After:  time.Now().Add(-24 * time.Hour),
    Before: time.Now(),
}
entries, err := tableIndex.GetQuerier().Query(options)
```

```go
// Get merge plan
planner := tableIndex.GetMergePlanner()
plan, err := planner.GetMergePlan("layer1", 1)
if plan != nil {
    // Execute merge (external process)
    // ...

    // Mark merge as complete
    err = planner.EndMerge(plan)
}
```

The main interface for table-level operations:
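Pulling together the calls used in the examples above, the table-level surface looks roughly like the sketch below. The method names come from those examples; the parameter and return types (in particular the `Batch` options argument and the promise's result type) are guesses, so treat this as an outline rather than the package's real definitions.

```go
package main

import "fmt"

// Sketch of the table-level interfaces implied by this document's examples.

type IndexEntry struct {
	Database, Table, Path       string
	SizeBytes, MinTime, MaxTime int64
}

type QueryOptions struct {
	// After / Before time bounds, folder filters, etc.
}

type MergePlan struct {
	// files to merge, target layer/iteration, etc.
}

// Promise is the async result handle returned by Batch; the concrete
// result type behind Get is a guess.
type Promise interface {
	Get() (any, error)
}

type Querier interface {
	Query(opts QueryOptions) ([]*IndexEntry, error)
}

type MergePlanner interface {
	GetMergePlan(layer string, iteration int64) (*MergePlan, error)
	EndMerge(plan *MergePlan) error
}

// TableIndex is the main table-level entry point used in the examples.
// The examples pass nil as Batch's second argument, so its type is unknown.
type TableIndex interface {
	Batch(entries []*IndexEntry, opts any) Promise
	GetQuerier() Querier
	GetMergePlanner() MergePlanner
}

func main() {
	var ti TableIndex // nil: declared only so the sketch compiles as a program
	fmt.Println(ti == nil)
}
```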
A corresponding interface is provided for database-level operations.
The system expects data organized in the following structure:

```
/root/
  ├── database1/
  │   ├── table1/
  │   │   ├── date=2024-01-15/
  │   │   │   ├── hour=00/
  │   │   │   ├── hour=01/
  │   │   │   └── ...
  │   │   └── date=2024-01-16/
  │   └── table2/
  └── database2/
```
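As a sketch of how a Unix-nanosecond timestamp maps onto this layout (the `partitionPath` helper is ours, assuming UTC-based partitioning; the library may compute this internally):

```go
package main

import (
	"fmt"
	"path/filepath"
	"time"
)

// partitionPath maps a Unix-nanosecond timestamp to the date=/hour=
// partition directory shown in the layout above. Illustrative only,
// assuming partitions are derived from UTC wall-clock time.
func partitionPath(root, database, table string, unixNano int64) string {
	t := time.Unix(0, unixNano).UTC()
	return filepath.Join(root, database, table,
		fmt.Sprintf("date=%s", t.Format("2006-01-02")),
		fmt.Sprintf("hour=%02d", t.Hour()))
}

func main() {
	// 1705327200000000000 ns = 2024-01-15 14:00:00 UTC
	fmt.Println(partitionPath("/root", "database1", "table1", 1705327200000000000))
}
```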
For the Redis backend, use standard Redis connection URLs:

- `redis://localhost:6379/0` - standard Redis
- `rediss://user:pass@host:6380/1` - Redis with TLS
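Connection URLs can be sanity-checked with the standard library before handing them to the index; a minimal sketch (the `checkRedisURL` helper is ours, not part of the package):

```go
package main

import (
	"fmt"
	"net/url"
)

// checkRedisURL does a light sanity check on a Redis connection URL:
// the scheme must be redis (plain) or rediss (TLS) and a host must be present.
func checkRedisURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	if u.Scheme != "redis" && u.Scheme != "rediss" {
		return fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	if u.Host == "" {
		return fmt.Errorf("missing host in %q", raw)
	}
	return nil
}

func main() {
	fmt.Println(checkRedisURL("redis://localhost:6379/0"))       // <nil>
	fmt.Println(checkRedisURL("rediss://user:pass@host:6380/1")) // <nil>
}
```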
All operations return errors through the Promise interface or standard Go error handling. The library uses async operations for better performance in high-throughput scenarios.
Both JSON and Redis implementations are thread-safe and can be used concurrently across multiple goroutines.
Run tests with a local Redis instance:

```shell
# Start Redis
docker run -d -p 6379:6379 redis:alpine

# Run tests
go test ./...
```

This project is licensed under the Apache License 2.0.
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
- The library is optimized for time-series data workloads with frequent writes and time-range queries
- The Redis backend is recommended for distributed deployments and high-throughput scenarios
- The JSON backend is suitable for single-node deployments and development environments
- Merge operations are designed to be executed by external processes, with the library managing the planning and coordination
- All time values are stored as Unix nanoseconds for high-precision temporal operations