MapOf

MapOf is a high-performance concurrent map optimized for read-dominant and mixed read/write workloads, outperforming sync.Map in many common scenarios. Its core design combines lock-free reads with bucket-level fine-grained synchronization on writes, together with cache-line–aware layout, parallel resizing, and pluggable hashing/equality. This yields high throughput with markedly improved tail-latency stability.

Design highlights

  • Lock-free read path; the write path employs bucket-level fine-grained synchronization and backoff to ensure forward progress under high contention.
  • Cache-line alignment and suppression of false sharing to reduce cross-line accesses and hot-spot interference.
  • CPU cache-line size adaptation (optionally specified at build time), with counter padding to reduce contention overhead.
  • Supports preallocation (WithPresize) and automatic shrinking (WithShrinkEnabled) to enhance memory elasticity and cold-start performance.
  • Zero-value usability and lazy initialization to reduce integration complexity.
  • Full sync.Map API compatibility, plus extensions: LoadOrStoreFn, ProcessEntry, Size, IsZero, Clone, and batch operations.
  • Parallel resizing to lower resize tail latency and improve scalability.
  • Pluggable hashing and equality: WithKeyHasher / IHashCode / IHashOpts / WithValueEqual / IEqual, enabling domain-specific tuning for keys/values.
  • Defaults to Go’s built-in hashing; when hash computation is expensive, embedded hash caching can be enabled via mapof_opt_embeddedhash.

Applicability (overview)

  • Suitable for read-intensive caches, online low-latency key–value access, and shared-state management requiring atomic conditional updates; under resource constraints, shrinking and custom hashing can further improve space efficiency and stability.

Validated across multiple platforms with adaptive optimizations for strong memory models (e.g., x86 TSO). See the benchmarks below for details.

📊 Comprehensive Benchmarks

Benchmark results consistently show MapOf outperforming the other implementations, with the fastest Store, LoadOrStore, Load, and Mixed operations.

goos: windows
goarch: amd64
pkg: github.com/llxisdsh/pb
cpu: AMD Ryzen Threadripper 3970X 32-Core Processor 
Benchmark Test (09/01/2025)
const countStore = 1_000_000
const countLoadOrStore = countStore
const countLoad = min(1_000_000, countStore)

func mixRand(i int) int {
	return i & (8 - 1)
}

func BenchmarkStore_original_syncMap(b *testing.B) {
	b.ReportAllocs()
	var m sync.Map
	runtime.GC()
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		i := 0
		for pb.Next() {
			m.Store(i, i)
			i++
			if i >= countStore {
				i = 0
			}
		}
	})
}

func BenchmarkLoadOrStore_original_syncMap(b *testing.B) {
	b.ReportAllocs()
	var m sync.Map
	runtime.GC()
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		i := 0
		for pb.Next() {
			_, _ = m.LoadOrStore(i, i)
			i++
			if i >= countLoadOrStore {
				i = 0
			}
		}
	})
}

func BenchmarkLoad_original_syncMap(b *testing.B) {

	b.ReportAllocs()
	var m sync.Map
	for i := 0; i < countLoad; i++ {
		m.Store(i, i)
	}
	runtime.GC()
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		i := 0
		for pb.Next() {
			_, _ = m.Load(i)
			i++
			if i >= countLoad {
				i = 0
			}
		}
	})
}

func BenchmarkMixed_original_syncMap(b *testing.B) {
	b.ReportAllocs()
	var m sync.Map
	for i := 0; i < countLoad; i++ {
		m.Store(i, i)
	}
	runtime.GC()

	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		i := 0
		for pb.Next() {
			r := mixRand(i)
			if r == 0 {
				m.Store(i, i)
			} else if r == 1 {
				m.Delete(i)
			} else if r == 2 {
				_, _ = m.LoadOrStore(i, i)
			} else {
				_, _ = m.Load(i)
			}
			i++
			if i >= countLoad<<1 {
				i = 0
			}
		}
	})
}
Implementation                Operation    Ops/sec        ns/op    B/op  Allocs/op
original_syncMap              Store           43,408,107   26.24     64      3
                              LoadOrStore     64,053,295   19.23     17      2
                              Load           387,208,314    3.80      0      0
                              Mixed          153,055,444    8.24     10      0
pb_MapOf 🏆                   Store        1,000,000,000    0.58      0      0
                              LoadOrStore  1,000,000,000    0.47      0      0
                              Load         1,000,000,000    0.18      0      0
                              Mixed        1,000,000,000    0.42      0      0
pb_FlatMapOf 🥈               Store        1,000,000,000    0.93      0      0
                              LoadOrStore  1,000,000,000    0.61      0      0
                              Load         1,000,000,000    0.22      0      0
                              Mixed        1,000,000,000    0.65      0      0
xsync_MapOf                   Store          136,297,424    8.47     16      1
                              LoadOrStore    273,987,921    4.00      0      0
                              Load           745,481,913    1.59      0      0
                              Mixed          473,409,570    2.27      2      0
pb_HashTrieMap                Store           62,662,484   18.30     48      1
                              LoadOrStore     86,982,994   11.95      1      0
                              Load           463,348,550    3.58      0      0
                              Mixed          198,404,397    6.16      6      0
alphadose_haxmap              Store           80,418,174   14.01      9      1
                              LoadOrStore     75,698,793   14.39      9      1
                              Load           767,338,982    1.92      0      0
                              Mixed          186,024,159    5.52      2      0
zhangyunhao116_skipmap        Store           29,587,840   39.17      9      1
                              LoadOrStore     37,524,507   30.38      1      0
                              Load           527,466,034    2.27      0      0
                              Mixed          183,319,764    6.22      1      0
riraccuia_ash                 Store           51,692,203   21.81     62      4
                              LoadOrStore     28,938,676   41.67     94      3
                              Load           269,917,735    4.64      7      0
                              Mixed          124,839,320    9.94     18      1
fufuok_cmap                   Store           29,590,686   38.21      1      0
                              LoadOrStore     43,682,427   27.92      0      0
                              Load           151,875,836    7.74      0      0
                              Mixed           25,265,125   46.51      0      0
mhmtszr_concurrent_swiss_map  Store           29,849,112   40.21      1      0
                              LoadOrStore     27,447,705   45.02      0      0
                              Load           163,669,803    7.66      0      0
                              Mixed           30,439,160   36.53      0      0
orcaman_concurrent_map        Store           34,420,544   33.46      1      0
                              LoadOrStore     50,175,614   25.41      1      0
                              Load           145,646,887    8.70      0      0
                              Mixed           48,974,994   26.18      0      0
RWLockShardedMap_256          Store           69,052,428   15.79      1      0
                              LoadOrStore    119,077,255    9.45      0      0
                              Load           287,863,598    4.17      0      0
                              Mixed          172,448,331    6.98      0      0
RWLockMap                     Store            4,238,468  269.00      1      0
                              LoadOrStore      9,066,169  163.10      1      0
                              Load            33,225,810   36.11      0      0
                              Mixed            9,591,573  127.70      0      0
snawoot_lfmap                 Store              364,705 3153.00   7754     48
                              LoadOrStore      8,887,498  205.60    518      2
                              Load           281,609,623    4.22      0      0
                              Mixed            1,899,039  630.00   2453     10
  • RWLockShardedMap_256: a 256-shard concurrent map using Go's native map + RWMutex per shard (benchmark reference; sketched below). Sharding theoretically boosts throughput for any map.
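
For context, a minimal sketch of that reference baseline (illustrative code, not from this repository):

import "sync"

const numShards = 256

// shard pairs Go's native map with an RWMutex, as in the baseline above.
type shard[V any] struct {
	sync.RWMutex
	m map[string]V
}

// ShardedMap is an illustrative sharded map: a cheap key hash picks the
// shard, so goroutines contend only within a shard.
type ShardedMap[V any] struct {
	shards [numShards]shard[V]
}

func NewShardedMap[V any]() *ShardedMap[V] {
	sm := &ShardedMap[V]{}
	for i := range sm.shards {
		sm.shards[i].m = make(map[string]V)
	}
	return sm
}

// fnv32a is FNV-1a; any cheap, well-distributed string hash works here.
func fnv32a(key string) uint32 {
	h := uint32(2166136261)
	for i := 0; i < len(key); i++ {
		h = (h ^ uint32(key[i])) * 16777619
	}
	return h
}

func (sm *ShardedMap[V]) Store(key string, value V) {
	s := &sm.shards[fnv32a(key)%numShards]
	s.Lock()
	s.m[key] = value
	s.Unlock()
}

func (sm *ShardedMap[V]) Load(key string) (V, bool) {
	s := &sm.shards[fnv32a(key)%numShards]
	s.RLock()
	v, ok := s.m[key]
	s.RUnlock()
	return v, ok
}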
Store Throughput Test (09/23/2025)
const total = 100_000_000

func testInsert_pb_MapOf(t *testing.T, total int, numCPU int, preSize bool) {
	time.Sleep(2 * time.Second)
	runtime.GC()
	
	var m *MapOf[int, int]
	if preSize {
		m = NewMapOf[int, int](WithPresize(total))
	} else {
		m = NewMapOf[int, int]()
	}

	var wg sync.WaitGroup
	wg.Add(numCPU)

	start := time.Now()

	batchSize := total / numCPU

	for i := 0; i < numCPU; i++ {
		go func(start, end int) {
			defer wg.Done()

			for j := start; j < end; j++ {
				m.Store(j, j)
			}
		}(i*batchSize, min((i+1)*batchSize, total))
	}

	wg.Wait()

	elapsed := time.Since(start)

	size := m.Size()
	if size != total {
		t.Errorf("Expected size %d, got %d", total, size)
	}

	t.Logf("Inserted %d items in %v", total, elapsed)
	t.Logf("Average: %.2f ns/op", float64(elapsed.Nanoseconds())/float64(total))
	t.Logf("Throughput: %.2f million ops/sec", float64(total)/(elapsed.Seconds()*1000000))

	// rand check
	for i := 0; i < 1000; i++ {
		idx := i * (total / 1000)
		if val, ok := m.Load(idx); !ok || val != idx {
			t.Errorf("Expected value %d at key %d, got %d, exists: %v", idx, idx, val, ok)
		}
	}
}

func TestInsert_pb_MapOf(t *testing.T) {
	t.Run("1 no_pre_size", func(t *testing.T) {
		testInsert_pb_MapOf(t, total, 1, false)
	})

	t.Run("64 no_pre_size", func(t *testing.T) {
		testInsert_pb_MapOf(t, total, runtime.GOMAXPROCS(0), false)
	})
	t.Run("1 pre_size", func(t *testing.T) {
		testInsert_pb_MapOf(t, total, 1, true)
	})

	t.Run("64 pre_size", func(t *testing.T) {
		testInsert_pb_MapOf(t, total, runtime.GOMAXPROCS(0), true)
	})
}
Implementation & Case         Throughput (M ops/s)  Performance Scale
pb_FlatMapOf (64/pre)         262.42                ━━━━━━━━━━━━━━━━━━━━━━━━
pb_FlatMapOf (64)             108.37                ━━━━━━━━━━━━━━━
pb_FlatMapOf (1/pre)          31.44                 ━━━━━━━━
pb_FlatMapOf (1)              22.93                 ━━━━━━
pb_MapOf (64/pre)             178.09                ━━━━━━━━━━━━━━━━━
pb_MapOf (64)                 94.21                 ━━━━━━━━━━━
pb_MapOf (1/pre)              26.83                 ━━━━━━━
pb_MapOf (1)                  21.45                 ━━━━━
xsync_MapV4 (64/pre)          91.83                 ━━━━━━━━━━━
xsync_MapV4 (64)              25.47                 ━━━━━━━
xsync_MapV4 (1/pre)           5.86                  ━━
xsync_MapV4 (1)               3.23                  ━━
pb_HashTrieMap (64)           25.54                 ━━━━━━━
pb_HashTrieMap (1)            1.73
zhangyunhao116_skipmap (64)   25.37                 ━━━━━━━
zhangyunhao116_skipmap (1)    3.38                  ━━
RWLockShardedMap_256 (64)     21.92                 ━━━━━
RWLockShardedMap_256 (1)      3.47                  ━━
original_syncMap (64)         21.84                 ━━━━━
original_syncMap (1)          1.42
alphadose_haxmap (64/pre)     2.88
alphadose_haxmap (64)         0.57
alphadose_haxmap (1/pre)      1.00
alphadose_haxmap (1)          0.92
  • (1): 1 goroutine without pre-allocation
  • (1/pre): 1 goroutine with pre-allocation
  • (64): 64 goroutines without pre-allocation
  • (64/pre): 64 goroutines with pre-allocation
Memory Usage Test (08/04/2025)
func Test_MemoryPeakReduction(t *testing.T) {
	if testing.Short() {
		t.Skip("Skipping memory test in short mode")
	}
	
	const numItems = 100000
	
	var m1, m2 runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&m1)
	
	m := NewMapOf[int, int]()
	
	for i := 0; i < numItems; i++ {
		m.Store(i, i)
	}
	runtime.GC()
	runtime.ReadMemStats(&m2)
	
	peak := m2.Alloc - m1.Alloc
	t.Logf("pb_MapOf memory usage: %d bytes, items: %d", peak, m.Size())
}

Benchmark results indicate that MapOf matches the native Go map in memory efficiency, and even outperforms it in most scenarios.

(Figures: Memory Usage Comparison, log scale and linear scale)

⚡ Performance Tips

To achieve exceptional performance, MapOf employs multiple optimization strategies:

Optimization Strategies

1. Built-in Optimizations (Automatic)

  • Integer Key Optimization: Uses raw key values as hash for integer types, delivering 4-6x performance improvement
  • String Key Optimization: Automatically enabled for short string and []byte keys (≤12 bytes), achieving 2-3x performance improvement
  • Memory Layout Optimization: Cache-line aligned structures prevent false sharing

Note: Default optimizations may increase hash collision rates in some scenarios. For domain-specific optimizations, see custom hash functions in the Usage section.

2. Configurable Optimizations

  • Pre-sizing: Use WithPresize(n) to avoid rehashing during initial population
  • Shrink Control: Use WithShrinkEnabled() to automatically reduce memory usage when map size decreases
  • Custom Hash Functions: Implement IHashCode interface or use WithKeyHasher() for specialized key types (see Usage section for examples)

3. Performance Guidelines

  • Custom hash functions provide significant gains for domain-specific key types
  • Interface implementations are detected automatically at initialization time
  • With* functions take precedence over interface implementations
  • The gains from a well-designed custom hash function typically outweigh the added collision risk

🚀 Quick Start

Installation

go get github.com/llxisdsh/pb@latest

Prerequisites

The MapOf implementation uses golang.org/x/sys to determine the system's CacheLineSize. For optimal performance, ensure your build environment has the latest version of this dependency:

go get golang.org/x/sys@latest

Usage

1. Basic Initialization

package main

import (
    "github.com/llxisdsh/pb"
)

func main() {
    // Zero-value initialization with lazy loading
    var cache pb.MapOf[string, int]

    // Optional explicit initialization; if used, call it once before any
    // other operation on the map.
    cache.InitWithOptions(pb.WithPresize(1000000))

    // Direct initialization with options
    cache2 := pb.NewMapOf[string, int](pb.WithPresize(1000000), pb.WithShrinkEnabled())
    _ = cache2
}

2. Basic Read/Write Operations

import (
    "fmt"
    "github.com/llxisdsh/pb"
)

func basicOperations() {
    cache := pb.NewMapOf[string, int]()

    // Store and load values
    cache.Store("key1", 100)
    cache.Store("key2", 200)

    value, exists := cache.Load("key1")
    if exists {
        fmt.Printf("key1: %d\n", value)
    }

    // Load or store atomically
    actual, loaded := cache.LoadOrStore("key3", 300)
    if loaded {
        fmt.Printf("key3 already exists: %d\n", actual)
    } else {
        fmt.Printf("key3 stored: %d\n", actual)
    }

    cache.Delete("key2")
}

3. Atomic Operations

import (
    "fmt"
    "github.com/llxisdsh/pb"
)

func atomicOperations() {
    cache := pb.NewMapOf[string, int]()
    cache.Store("key1", 100)

    // Atomic swap
    oldValue, swapped := cache.Swap("key1", 150)
    if swapped {
        fmt.Printf("Swapped key1 from %d to 150\n", oldValue)
    }

    // Compare and swap
    swapped = cache.CompareAndSwap("key1", 150, 175)
    if swapped {
        fmt.Println("Successfully swapped key1 from 150 to 175")
    }

    // Compare and delete
    deleted := cache.CompareAndDelete("key1", 175)
    if deleted {
        fmt.Println("Successfully deleted key1 with value 175")
    }

    // Lazy value generation
    result, loaded := cache.LoadOrStoreFn("computed:key", func() int {
        fmt.Println("Computing expensive value...")
        return 43
    })
    fmt.Printf("Result: %d, Loaded: %t\n", result, loaded)

    // ProcessEntry: Atomic conditional processing with complex business logic
    cache.ProcessEntry("user:789", func(entry *pb.EntryOf[string, int]) (*pb.EntryOf[string, int], int, bool) {
        if entry != nil {
            // Update existing entry
            newEntry := &pb.EntryOf[string, int]{Value: entry.Value + 1}
            return newEntry, entry.Value, true
        }
        // Create new entry
        return &pb.EntryOf[string, int]{Value: 1}, 0, false
    })

    // LoadAndDelete: Atomic read-and-delete operation
    deletedValue, wasPresent := cache.LoadAndDelete("user:456")
    fmt.Printf("Deleted value: %d, Was present: %t\n", deletedValue, wasPresent)

    // LoadAndUpdate: Atomic read-and-update operation
    previousValue, wasUpdated := cache.LoadAndUpdate("counter", 1)
    fmt.Printf("Previous: %d, Updated: %t\n", previousValue, wasUpdated)
}

4. Custom Hash Functions and Equality

import (
    "slices"
    "github.com/llxisdsh/pb"
)

// Configuration priority (highest to lowest):
//   1. Explicit With* functions (WithKeyHasher, WithValueEqual)
//   2. Interface implementations (IHashCode, IHashOpts, IEqual)
//   3. Default built-in implementations (fallback)

// Method 2 (interface-based): the declarations must live at package level;
// the interfaces are detected automatically at initialization.
type CustomKey struct {
    ID   int64
    Name string
}

// Implement IHashCode for custom hashing
func (c *CustomKey) HashCode(uintptr) uintptr {
    return uintptr(c.ID)
}

// Custom value type with IEqual for non-comparable types
type UserProfile struct {
    Name string
    Tags []string // slice makes this non-comparable
}

func (u *UserProfile) Equal(other UserProfile) bool {
    return u.Name == other.Name && slices.Equal(u.Tags, other.Tags)
}

func customOptimizations() {
    // Method 1: Using With* functions for runtime configuration

    // Built-in hasher: better for long strings (lower collisions) but slower for short strings (2-3x)
    stringCache := pb.NewMapOf[string, int](pb.WithBuiltInHasher[string]())
    _ = stringCache

    // Custom hash function for a composite struct key
    type UserID struct {
        UserID   int64
        TenantID int64
    }
    userCache := pb.NewMapOf[UserID, string](pb.WithKeyHasher(func(key UserID, seed uintptr) uintptr {
        return uintptr(key.UserID) ^ seed
    }))
    _ = userCache

    // Method 2: the interfaces declared above are detected and used automatically
    var interfaceCache pb.MapOf[CustomKey, UserProfile]
    _ = interfaceCache
    // Or
    interfaceCache2 := pb.NewMapOf[CustomKey, UserProfile]()
    _ = interfaceCache2
}

5. Iteration Operations

import (
    "fmt"
    "github.com/llxisdsh/pb"
)

func iterationOperations() {
    cache := pb.NewMapOf[string, int]()
    cache.Store("key1", 100)
    cache.Store("key2", 200)

    // Range over all key-value pairs
    cache.Range(func(key string, value int) bool {
        fmt.Printf("Key: %s, Value: %d\n", key, value)
        return true // continue iteration
    })

    // RangeEntry: Iterate over all entry pointers (more efficient)
    // Note: loaded parameter is guaranteed to be non-nil during iteration
    cache.RangeEntry(func(loaded *pb.EntryOf[string, int]) bool {
        fmt.Printf("%s: %d\n", loaded.Key, loaded.Value)
        return true
    })

    // All: Go 1.23+ iterator support
    for key, value := range cache.All() {
        fmt.Printf("%s: %d\n", key, value)
    }
}

6. Batch Operations

import (
    "fmt"
    "maps"
    "github.com/llxisdsh/pb"
)

func batchOperations() {
    cache := pb.NewMapOf[string, int]()

    // Store multiple values
    batchData := map[string]int{
        "batch1": 1000,
        "batch2": 2000,
        "batch3": 3000,
    }
    cache.FromMap(batchData)

    // Load multiple values
    keys := []string{"batch1", "batch2", "nonexistent"}
    for _, key := range keys {
        if value, ok := cache.Load(key); ok {
            fmt.Printf("%s: %d\n", key, value)
        } else {
            fmt.Printf("%s: not found\n", key)
        }
    }

    // Delete multiple values using BatchDelete
    cache.BatchDelete([]string{"batch1", "batch2"})

    // Batch process all entries
    // Note: loaded parameter is guaranteed to be non-nil during iteration
    cache.RangeProcessEntry(func(loaded *pb.EntryOf[string, int]) *pb.EntryOf[string, int] {
        if loaded.Value < 100 {
            // Double all values less than 100
            return &pb.EntryOf[string, int]{Value: loaded.Value * 2}
        }
        return loaded // Keep unchanged
    })

    // Process all entries with callback (Go 1.23+ iterator support)
    for e := range cache.ProcessAll() {
        switch e.Key() {
        case "batch1":
            e.Update(e.Value() + 500)
        case "batch2":
            e.Delete()
        default:
            // no-op
        }
    }

    // Process entries with specified keys (Go 1.23+ iterator support)
    for e := range cache.ProcessSpecified("batch1", "batch4") {
        if e.Loaded() && e.Value() < 1000 {
            e.Delete()
        }
    }

    // BatchProcess: Batch process iterator data
    data := map[string]int{"a": 1, "b": 2, "c": 3}
    cache.BatchProcess(maps.All(data), func(_ string, v int, loaded *pb.EntryOf[string, int]) (*pb.EntryOf[string, int], int, bool) {
        return &pb.EntryOf[string, int]{Value: v * 10}, v, true
    })
}

7. Capacity Management

import (
    "fmt"
    "github.com/llxisdsh/pb"
)

func capacityManagement() {
    cache := pb.NewMapOf[string, int]()

    // Get current size
    size := cache.Size()
    fmt.Printf("Current size: %d\n", size)

    // Check if map is zero-value
    if cache.IsZero() {
        fmt.Println("Cache is empty")
    }

    // Pre-allocate capacity to avoid frequent resizing
    cache.Grow(10000)

    // Manual shrink
    cache.Shrink()

    // Clear all entries
    cache.Clear()
}

8. Data Conversion and Serialization

import (
    "encoding/json"
    "fmt"
    "github.com/llxisdsh/pb"
)

func dataConversionAndSerialization() {
    cache := pb.NewMapOf[string, int]()
    cache.Store("key1", 100)
    cache.Store("key2", 200)

    // Clone: Deep copy the entire map
    clonedCache := cache.Clone()
    fmt.Printf("Cloned size: %d\n", clonedCache.Size())

    // Convert to Go map
    goMap := cache.ToMap()
    fmt.Printf("Go map: %+v\n", goMap)

    // Load from Go map
    newData := map[string]int{
        "new1": 100,
        "new2": 200,
    }
    cache.FromMap(newData)

    // JSON serialization
    jsonData, err := json.Marshal(&cache)
    if err == nil {
        fmt.Printf("JSON: %s\n", jsonData)
    }

    // JSON deserialization
    var newCache pb.MapOf[string, int]
    err = json.Unmarshal(jsonData, &newCache)
    if err == nil {
        fmt.Println("Successfully deserialized from JSON")
    }

    // String representation
    fmt.Printf("Cache contents: %s\n", cache.String())

    // Stats: Get detailed performance statistics
    stats := cache.Stats()
    fmt.Printf("Stats: Buckets=%d, Growths=%d, Shrinks=%d\n",
        stats.RootBuckets, stats.TotalGrowths, stats.TotalShrinks)
}

Compile-Time Optimizations

Optimize performance for specific use cases with build tags. Warning: Incorrect optimization choices may degrade performance. Choose carefully based on your actual scenario.

# === Cache Line Size Optimization ===
# Optimize based on target CPU's cache line size
# Auto-detection is default, but manual specification may be needed for cross-compilation
go build -tags mapof_opt_cachelinesize_32   # For some embedded systems
go build -tags mapof_opt_cachelinesize_64   # For most modern CPUs
go build -tags mapof_opt_cachelinesize_128  # For some high-end server CPUs
go build -tags mapof_opt_cachelinesize_256  # For some specialized architectures

# === Counter Performance Optimization ===
# Add padding around counters to reduce false sharing in high-concurrency scenarios
# Note: Increases memory usage - suitable for memory-abundant, high-concurrency environments
go build -tags mapof_opt_enablepadding

# === Hash Caching Optimization ===
# Cache hash values in entries, suitable for expensive hash computation scenarios
# Note: Go's built-in hashing is usually fast; this may increase memory overhead without performance gain
go build -tags mapof_opt_embeddedhash

# === Combined Optimization Examples ===

# Build with optimizations for FUJITSU A64FX's CMG-based ccNUMA architecture:
go build -tags "mapof_opt_cachelinesize_256 mapof_opt_enablepadding"

# Complex key types configuration:
go build -tags "mapof_opt_embeddedhash"

Optimization Selection Guide:

  • Default configuration: suitable for most scenarios, no additional tags needed
  • NUMA architectures: consider mapof_opt_enablepadding
  • Complex key types: if hash computation is expensive, consider mapof_opt_embeddedhash
  • Cross-compilation: may require manual mapof_opt_cachelinesize_* specification

Performance testing recommendation: before production use, benchmark different optimization combinations against your specific workload to determine the best configuration.

📚 Documentation

🙏 Acknowledgments

pb.MapOf builds upon the excellent work of xsync (v3, MIT licensed). We extend our gratitude to the xsync authors and the broader Go community. Its introduction is reproduced below:

MapOf is like a Go map[K]V but is safe for concurrent
use by multiple goroutines without additional locking or
coordination. It follows the interface of sync.Map with
a number of valuable extensions like Compute or Size.

A MapOf must not be copied after first use.

MapOf uses a modified version of Cache-Line Hash Table (CLHT)
data structure: https://github.com/LPD-EPFL/CLHT

CLHT is built around idea to organize the hash table in
cache-line-sized buckets, so that on all modern CPUs update
operations complete with at most one cache-line transfer.
Also, Get operations involve no write to memory, as well as no
mutexes or any other sort of locks. Due to this design, in all
considered scenarios MapOf outperforms sync.Map.

MapOf also borrows ideas from Java's j.u.c.ConcurrentHashMap
(immutable K/V pair structs instead of atomic snapshots)
and C++'s absl::flat_hash_map (meta memory and SWAR-based lookups).
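
The SWAR-based lookup mentioned in that last line can be illustrated with the classic broadcast-and-compare trick; this is a generic sketch of the technique, not MapOf's actual metadata code:

// matchByte returns a mask with 0x80 set in every byte lane of meta that
// equals the hash fragment h2: broadcast h2 to all lanes, XOR so matching
// lanes become zero, then apply the classic SWAR zero-byte test.
func matchByte(meta uint64, h2 byte) uint64 {
    const lsb = 0x0101010101010101 // low bit of each byte lane
    const msb = 0x8080808080808080 // high bit of each byte lane
    x := meta ^ (lsb * uint64(h2)) // matching lanes are now 0x00
    return (x - lsb) &^ x & msb    // 0x80 marks each zero lane
}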

FlatMapOf

FlatMapOf is a seqlock-based, flat-layout concurrent hash table. The table and key/value entries are stored inline to minimize pointer chasing and cache misses, providing more stable latency and throughput even for cold working sets. Prefer creating instances via NewFlatMapOf and configuring them via options.
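
A minimal usage sketch, assuming NewFlatMapOf accepts the same functional options as NewMapOf (the operations used here are listed under "API overview" below):

import (
    "fmt"
    "github.com/llxisdsh/pb"
)

func flatMapExample() {
    // Assumed: NewFlatMapOf takes the same option style as NewMapOf.
    m := pb.NewFlatMapOf[string, int](pb.WithPresize(1_000_000))

    m.Store("a", 1)
    if v, ok := m.Load("a"); ok {
        fmt.Println(v)
    }

    // Lazy value generation, as in MapOf.
    v, loaded := m.LoadOrStoreFn("b", func() int { return 2 })
    fmt.Println(v, loaded)
}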

Design and concurrency semantics (overview)

  • Read path (seqlock validation): Each bucket maintains a sequence. Readers load s1; if s1 is even, they read metadata/entries and then load s2. If s1==s2 (and even), the read is consistent; otherwise, readers spin and retry, and, if needed, fall back to a short locked slow path. (This protocol is sketched after the list.)
  • Write path (ordered publication): While holding the root-bucket lock (opLock), flip the bucket sequence to odd (enter write state) → apply modifications → flip back to even (publish a consistent view), and finally release the root lock.
  • Resizing (parallel copy): A help–copy–publish protocol partitions the table and migrates buckets in parallel. When migration completes, the new table is atomically published. Ongoing reads/writes are minimally affected.
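
The read/write protocol above follows the standard seqlock pattern. A minimal illustrative sketch of that pattern (not the actual FlatMapOf internals, which also handle backoff and the locked slow path):

import "sync/atomic"

// seqlockRead retries until it observes a consistent snapshot. read() must
// only load data; it may run concurrently with a writer. (Illustrative only:
// a production version must load the protected fields atomically.)
func seqlockRead[T any](seq *atomic.Uint64, read func() T) T {
    for {
        s1 := seq.Load()
        if s1&1 != 0 {
            continue // odd: a writer is in progress, spin and retry
        }
        v := read()
        if seq.Load() == s1 {
            return v // sequence unchanged and even: snapshot is consistent
        }
    }
}

// seqlockWrite flips the sequence to odd (enter write state), applies the
// modification, then flips back to even to publish a consistent view.
func seqlockWrite(seq *atomic.Uint64, write func()) {
    seq.Add(1) // even -> odd
    write()
    seq.Add(1) // odd -> even
}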

Memory layout and cache behavior

  • Inlined K/V entries: Entries are laid out contiguously with their bucket, reducing extra pointer dereferences and cross-line accesses. Under cold data and low cache-hit scenarios, this significantly reduces p99/p999 latency jitter.
  • Value size: V is not limited by machine word size; any value type is supported. Note that large V increases the per-bucket footprint (see “Limitations”).

Time and space characteristics (intuition)

  • Expected complexity: Load/Store are amortized O(1) under uniform hashing and a reasonable load factor.
  • Progress guarantees: Reads are lock-free in the absence of write contention; under contention, they use spin + backoff and fall back to a short locked slow path to avoid livelock. Writes are protected by fine-grained, bucket-level mutual exclusion.

Systematic comparison with MapOf

  • Concurrency control model:
    • FlatMapOf: bucket-level seqlock + root-bucket mutual exclusion; reads are “optimistic + sequence validation,” writes are “odd/even sequence toggle + ordered publication.”
    • MapOf: CLHT-inspired design; read path performs no writes (lock-free) and uses SWAR metadata for fast matching; writes use fine-grained synchronization at the bucket level.
  • Read latency and cold data:
    • FlatMapOf: the inline layout avoids multi-hop pointer chasing under cold states, yielding more predictable hit latency and smoother tail latency.
    • MapOf: extremely fast for hot data; with colder data and more cross-object pointer traversals, latency stability can be slightly worse than FlatMapOf.
  • Writes and contention:
    • FlatMapOf: writes toggle bucket sequences and hold the root lock; under sustained high write contention, reads may experience bounded retries or slow-path fallbacks, and throughput may degrade earlier than in a read-optimized MapOf.
    • MapOf: pointer-level updates with a mature write path; often easier to sustain throughput in write-heavy scenarios.
  • Memory footprint and density:
    • FlatMapOf: inline entries avoid “one allocation per entry,” but incur “hole cost” (reserved space in partially filled buckets); when V is large or occupancy is sparse, bucket size inflation increases baseline memory usage.
    • MapOf: buckets store pointers and entries are separate objects; with automatic shrinking (WithShrinkEnabled), MapOf is typically more memory-elastic and efficient, especially for large V or underutilized tables.
  • API capabilities and ecosystem:
    • FlatMapOf: provides core operations (Load/Store/Delete/LoadOrStore/LoadOrStoreFn/Range/Process/RangeProcess/Size/IsZero). It does not currently offer dedicated convenience APIs such as CompareAndSwap/CompareAndDelete/LoadAndUpdate/Swap, but such semantics can be expressed using Process.
    • MapOf: a more complete API surface including multiple CompareAnd*, Swap, and batch operations, plus richer customization of equality and hashing, making it more mature for migration and production use.
  • Hashing and optimizations:
    • Both support WithKeyHasher/IHashCode/IHashOpts; FlatMapOf also supports WithPresize and WithShrinkEnabled.

Advantages (FlatMapOf)

  • More stable hit latency and tail behavior in cold working sets and random-access workloads.
  • Inline layout reduces pointer chasing and cache misses; Range/scan-style operations benefit from better sequential locality.
  • Parallel resize support; arbitrary value sizes are supported.

Limitations and caveats (FlatMapOf)

  • Large V or sparse occupancy amplifies the “hole cost,” increasing memory consumption.
  • Under sustained write contention with longer critical sections, reads may retry or fall back to the slow path, reducing throughput.
  • Dedicated CompareAnd* convenience APIs are currently unavailable; use Process to compose equivalent atomic semantics.
  • Experimental implementation; future APIs and semantics may be refined. Prefer NewFlatMapOf for initialization and explicit configuration.

Usage guidance and boundaries

  • Prefer FlatMapOf for:
    • Read-dominant or mixed workloads with strong p99/p999 latency requirements on the online path;
    • Cold or low-cache-hit distributions where predictable hit latency is critical;
    • Small-to-medium value types where modest space trade-offs improve latency and throughput;
    • Range/scan tasks that benefit from improved sequential locality.
  • Prefer MapOf for:
    • Memory efficiency under resource constraints, or large value types with sparse occupancy;
    • A full set of CompareAnd* / Swap / LoadAndUpdate convenience APIs;
    • Write-heavy or high-contention workloads where a more mature write path is desirable.

API overview

  • Provided: Load, Store, Delete, LoadOrStore, LoadOrStoreFn, Range, Process, RangeProcess, Size, IsZero.
  • Compositional semantics: Process can express atomic read-modify-write, conditional updates, and conditional deletes (see the sketch after this list).
  • Construction and configuration: supports WithPresize and WithShrinkEnabled for capacity and shrinking; supports custom hashing and distribution via IHashCode/IHashOpts/WithKeyHasher.
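
As an illustration of this compositional style, CompareAndSwap-like semantics can be built on an entry-processing callback. The sketch below uses MapOf's ProcessEntry (documented earlier); FlatMapOf's Process is assumed to follow an analogous callback shape, so the same pattern should transfer:

// compareAndSwapVia expresses CompareAndSwap on top of an entry-processing
// callback: swap to newVal only if key currently holds oldVal.
func compareAndSwapVia(m *pb.MapOf[string, int], key string, oldVal, newVal int) bool {
    swapped := false
    m.ProcessEntry(key, func(e *pb.EntryOf[string, int]) (*pb.EntryOf[string, int], int, bool) {
        if e != nil && e.Value == oldVal {
            swapped = true
            // Returning a new entry atomically replaces the current one.
            return &pb.EntryOf[string, int]{Value: newVal}, newVal, true
        }
        // Returning the entry unchanged (or nil for a missing key) leaves the map as-is.
        return e, 0, false
    })
    return swapped
}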

HashTrieMap

HashTrieMap is an optimized reimplementation of Go's built-in HashTrieMap, delivering 50%+ performance improvements while maintaining full compatibility.

🎯 Key Improvements

  • Lazy Value Generation: LoadOrStoreFn support for expensive computations (usage sketch below)
  • 50%+ Performance Gain: Optimized initialization and atomic operations
  • Full Compatibility: Drop-in replacement for built-in HashTrieMap
  • Comprehensive Testing: All original tests pass plus additional validation

🔧 Technical Optimizations

  • Eliminated inited Field: Uses root pointer directly for state management
  • Lazy Initialization: Defers initialization until first write operation
  • Hash Caching: Caches hash values to accelerate expand operations
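
A hedged usage sketch of the LoadOrStoreFn extension (the generic type parameters and zero-value readiness are assumptions here, based on the lazy-initialization note above):

import (
    "fmt"
    "github.com/llxisdsh/pb"
)

func hashTrieExample() {
    var m pb.HashTrieMap[string, int] // assumed usable as a zero value (lazy init)

    // The function runs only if "answer" is absent, so an expensive
    // computation is skipped once the value exists.
    v, loaded := m.LoadOrStoreFn("answer", func() int {
        return 42 // stand-in for an expensive computation
    })
    fmt.Println(v, loaded)
}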

📄 License

Licensed under MIT License.
