PII Guard is an LLM-powered tool that detects and manages Personally Identifiable Information (PII) in logs, designed to support data privacy and GDPR compliance.
⚠️ This is a personal side project
Built to explore how Large Language Models can detect sensitive data in logs more intelligently than traditional regex-based approaches.
- About
- Why Use LLMs for PII Detection?
- PII Types Detected
- Architecture
- Getting Started
- Try It Out
- How to Test
- Project Structure
- Suggestions & Contributions
This project experiments with Large Language Models (LLMs), specifically the `gemma:3b` model running locally via Ollama, to evaluate how effectively they can identify PII in both structured and unstructured log data.
🧠 LLM-Based Detection with Ollama
- Uses `gemma:3b` through the Ollama runtime
- Analyzes logs using natural language understanding
- Handles real-world, messy logs better than regex
- Work in progress; contributions welcome!
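As a rough illustration of what a detection call through Ollama could look like, here is a minimal Python sketch. It assumes Ollama's default local endpoint (`http://localhost:11434/api/generate`) and uses a made-up prompt and a made-up `findings` response shape; the project's real prompt lives in `api/src/prompt/pii.prompt.ts` and may differ considerably. The function names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "gemma:3b"  # model tag as named in this README

# Hypothetical prompt, for illustration only; the real template lives in
# api/src/prompt/pii.prompt.ts and may differ.
PROMPT_TEMPLATE = (
    "List every piece of personally identifiable information in the log line "
    "below. Answer with a JSON object of the form "
    '{{"findings": [{{"type": "...", "value": "..."}}]}}.\n\nLog line:\n{log}'
)


def build_request(log_line: str) -> dict:
    """Build the JSON body for a non-streaming Ollama generate call."""
    return {
        "model": MODEL,
        "prompt": PROMPT_TEMPLATE.format(log=log_line),
        "stream": False,   # return one JSON object instead of a stream
        "format": "json",  # ask Ollama to constrain the output to valid JSON
    }


def parse_findings(ollama_reply: dict) -> list:
    """Extract the (assumed) findings list; return [] if the reply is malformed."""
    try:
        parsed = json.loads(ollama_reply.get("response", ""))
    except (json.JSONDecodeError, TypeError):
        return []
    findings = parsed.get("findings", []) if isinstance(parsed, dict) else []
    return findings if isinstance(findings, list) else []


def detect_pii(log_line: str) -> list:
    """Send one log line to a locally running Ollama and return the findings."""
    body = json.dumps(build_request(log_line)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return parse_findings(json.load(resp))
```

Calling `detect_pii(...)` requires a running Ollama instance with the model pulled; `build_request` and `parse_findings` can be exercised offline.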
- Identifies PII even when it's obfuscated, incomplete, or embedded in text
- Handles multilingual input and inconsistent formats
- Leverages semantic context instead of relying on static patterns
- Ideal for experimenting with privacy tooling powered by AI
Traditional detection rules often break under complexity; LLMs provide contextual intelligence.
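To make the point about static patterns concrete, the sketch below runs a deliberately simple email regex over a plain and an obfuscated log line. The regex, the function name, and both sample lines are invented for illustration and are not taken from the project.

```python
import re

# A typical, deliberately simple regex for plain email addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def regex_finds_email(line: str) -> bool:
    """Return True if the static pattern matches anywhere in the line."""
    return EMAIL_RE.search(line) is not None


# Hypothetical sample log lines.
plain = "login ok for leila.park@example.com"
obfuscated = "login ok for leila dot park at example dot com"

print(regex_finds_email(plain))       # the pattern matches the plain address
print(regex_finds_email(obfuscated))  # the pattern misses the spelled-out form
```

A human reader still recognizes the second line as an email address; that gap is the kind of case where contextual detection is expected to help.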
- `full-name`, `first-name`, `last-name`, `username`, `email`, `phone-number`, `mobile`, `address`, `postal-code`, `location`
- `racial-or-ethnic-origin`, `political-opinion`, `religious-belief`, `philosophical-belief`, `trade-union-membership`, `genetic-data`, `biometric-data`, `health-data`, `sex-life`, `sexual-orientation`
- `national-id`, `passport-number`, `driving-license-number`, `ssn`, `vat-number`, `credit-card`, `iban`, `bank-account`
- `ip-address`, `ip-addresses`, `mac-address`, `imei`, `device-id`, `device-metadata`, `browser-fingerprint`, `cookie-id`, `location-coordinates`
- `license-plate`
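For downstream tooling it can help to hold these tags in a lookup structure. The following sketch copies the tag strings from the list above; the group names and helper functions are hypothetical and chosen only for readability, since the project may organize the tags differently.

```python
# Group names are hypothetical; the tag strings are copied from the list above.
PII_TAGS = {
    "identity_and_contact": [
        "full-name", "first-name", "last-name", "username", "email",
        "phone-number", "mobile", "address", "postal-code", "location",
    ],
    "sensitive_categories": [
        "racial-or-ethnic-origin", "political-opinion", "religious-belief",
        "philosophical-belief", "trade-union-membership", "genetic-data",
        "biometric-data", "health-data", "sex-life", "sexual-orientation",
    ],
    "government_and_financial": [
        "national-id", "passport-number", "driving-license-number", "ssn",
        "vat-number", "credit-card", "iban", "bank-account",
    ],
    "network_and_device": [
        "ip-address", "ip-addresses", "mac-address", "imei", "device-id",
        "device-metadata", "browser-fingerprint", "cookie-id",
        "location-coordinates",
    ],
    "vehicle": ["license-plate"],
}

ALL_TAGS = frozenset(tag for tags in PII_TAGS.values() for tag in tags)


def group_of(tag: str):
    """Return the (hypothetical) group name for a tag, or None if unknown."""
    for group, tags in PII_TAGS.items():
        if tag in tags:
            return group
    return None


def unknown_tags(tags) -> list:
    """Return, sorted, any tags that are not in the supported list."""
    return sorted(set(tags) - ALL_TAGS)
```

A check like `unknown_tags(...)` is one way to catch a model reply that invents a label outside the supported set.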
This is how PII Guard works: [architecture diagram]
- Clone the repo and start everything with a single command:
make all-in-up
- Shut down everything with:
make all-in-down
This will launch the full stack:
- PostgreSQL
- Elasticsearch
- RabbitMQ
- Ollama (with `gemma:3b`)
- PII Guard dashboard and backend API
Visit the dashboard: http://localhost:3000
API endpoint: http://localhost:8888/api/jobs
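Once the stack is starting up, a quick reachability check can confirm that the two local ports used in this README (3000 and 8888) accept connections. This is a minimal sketch: the service names and helper functions are hypothetical, and an open port only shows that something is listening, not that the service is healthy.

```python
import socket

# Ports taken from the URLs in this README; the names are just labels.
SERVICES = {"dashboard": 3000, "api": 8888}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def stack_status(host: str = "localhost") -> dict:
    """Map each service label to whether its port currently accepts connections."""
    return {name: is_port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, up in stack_status().items():
        print(f"{name}: {'reachable' if up else 'not reachable'}")
```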
curl --location 'http://localhost:8888/api/jobs/flush' \
--header 'Content-Type: application/json' \
--data-raw '{
"version": "1.0.0",
"logs": [
"{\"timestamp\":\"2025-04-21T15:02:10Z\",\"service\":\"auth-service\",\"level\":\"INFO\",\"event\":\"user_login\",\"requestId\":\"1a9c7e21\",\"user\":{\"id\":\"u9001001\",\"name\":\"Leila Park\",\"email\":\"[email protected]\"},\"srcIp\":\"198.51.100.15\"}",
"{\"timestamp\":\"2025-04-21T15:02:12Z\",\"service\":\"cache-service\",\"level\":\"DEBUG\",\"event\":\"cache_miss\",\"requestId\":\"82c5cc9f\",\"cacheKey\":\"product_44291_variant_blue\",\"region\":\"us-east-1\"}"
]
}'
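The same request can be sent from a script. The sketch below mirrors the curl call in Python using only the standard library; the function names are hypothetical, and because the response format is not documented here, the reply is returned as raw text. Note that each entry in `logs` is itself a JSON-encoded string, as in the curl example.

```python
import json
import urllib.request

FLUSH_URL = "http://localhost:8888/api/jobs/flush"  # endpoint from the curl example


def build_flush_payload(events, version="1.0.0") -> dict:
    """Wrap log events in the request body; each event becomes a JSON string."""
    return {
        "version": version,
        "logs": [json.dumps(event, separators=(",", ":")) for event in events],
    }


def flush_logs(events):
    """POST the events to a locally running PII Guard API; returns (status, body)."""
    body = json.dumps(build_flush_payload(events)).encode("utf-8")
    req = urllib.request.Request(
        FLUSH_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status, resp.read().decode("utf-8")


if __name__ == "__main__":
    # Sample event adapted from the second log line in the curl example.
    sample = [{
        "timestamp": "2025-04-21T15:02:12Z",
        "service": "cache-service",
        "level": "DEBUG",
        "event": "cache_miss",
        "region": "us-east-1",
    }]
    print(flush_logs(sample))
```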
Please refer to the Testing PII Guard guide for instructions on running the test setup, including simulated log generation and stress testing.
This guide will help you set up a test environment to evaluate the performance and detection accuracy of PII Guard.
- API: `api/`
- Dashboard: `ui/`
- LLM Prompt Template: `api/src/prompt/pii.prompt.ts`
Got a bug to report? Feature request? Wild idea? Bring it on!
- Bug reports help improve stability
- Feature requests help shape the product
- Suggestions, feedback, and contributions are all welcome!