Our privacy-preserving federated learning system allows hospitals to collaborate on training a COVID-19 detection model without sharing sensitive patient data. The project is designed with healthcare privacy compliance in mind, following a defense-in-depth security approach with encrypted model weights, secure communication, and proper access controls.
- Privacy-preserving COVID-19 detection using chest X-rays
- Federated learning across multiple simulated hospitals using AWS EC2 instances
- Homomorphic encryption for secure weight sharing using TenSEAL (CKKS scheme)
- Optimized COVID-Net model for encrypted data processing
- Comprehensive evaluation metrics and benchmarking against published works
- HIPAA considerations built into the design
- Scalable architecture that can start small locally and scale to multi-institution deployment
Our system follows a general federated learning architecture with the following key components:
- Local Hospital Environments: Simulated using AWS EC2 instances
- Central Aggregator: For secure weight aggregation
- Homomorphic Encryption Layer: Using TenSEAL for privacy preservation
- COVID-Net CXR Model: Adapted for encrypted computations
- Model Initialization: Base COVID detection model created
- Local Training: Hospitals train on local patient data
- Secure Aggregation:
  - Local model updates encrypted using homomorphic encryption
  - Encrypted updates sent to aggregation server
  - Server performs addition on encrypted data
  - Result decrypted and applied to global model
- Model Distribution: Updated global model distributed to hospitals
- Iteration: Process repeats for multiple rounds
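
The round structure above can be illustrated with a small plain-Python simulation. This is only a sketch under simplifying assumptions: `local_train` is a hypothetical stand-in for real model training, the weights are short lists of floats, and the encryption step is left out so that only the sum-then-scale aggregation and the round loop are visible.

```python
from typing import List

Weights = List[float]


def local_train(global_weights: Weights, site_target: float, lr: float = 0.5) -> Weights:
    """Hypothetical stand-in for local training: move each weight toward a site-specific target."""
    return [w + lr * (site_target - w) for w in global_weights]


def aggregate(updates: List[Weights]) -> Weights:
    """Add the client updates element-wise, then scale by 1/n to obtain the average."""
    summed = [sum(column) for column in zip(*updates)]
    scale = 1.0 / len(updates)
    return [s * scale for s in summed]


def run_rounds(num_rounds: int, site_targets: List[float], dim: int = 4) -> Weights:
    """Run several federated rounds: distribute, train locally, aggregate, repeat."""
    global_weights = [0.0] * dim  # model initialization
    for _ in range(num_rounds):
        updates = [local_train(global_weights, t) for t in site_targets]  # local training
        global_weights = aggregate(updates)  # aggregation, then redistribution next round
    return global_weights


if __name__ == "__main__":
    print(run_rounds(num_rounds=5, site_targets=[1.0, 3.0], dim=2))
```

In the encrypted variant, only `aggregate` changes: the sum and the scalar multiplication are applied to ciphertexts instead of plain floats.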
| Component | AWS Service | Purpose |
|---|---|---|
| Compute | EC2 (t3.large) | Model training and aggregation |
| Storage | S3 with SSE | Model weights, checkpoints |
| Encryption | KMS | Key management for homomorphic encryption |
| Access Control | IAM | Fine-grained permissions |
| Security Groups | EC2 | Network isolation and protection |
- 1 central server EC2 instance for federated aggregation
- Multiple client EC2 instances representing hospital nodes
- S3 bucket for encrypted model exchange
- KMS for encryption key management
The project follows a staged implementation approach with clear checkpoints:
- Local Prototype - Simple classification model on medical images
- Basic Federated Learning - Implementation of federated architecture
- COVID Detection Model - Scale up to COVID-19 X-ray images
- Privacy Implementation - Add homomorphic encryption for secure aggregation
- AWS Deployment - Full system deployment with monitoring
- Homomorphic Encryption: Model weights encrypted with CKKS scheme
- Server-Side Encryption: S3 objects protected with AES-256
- Network Isolation: Security groups limiting communication
- IAM Roles: Principle of least privilege access
- KMS Integration: Centralized key management
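
As an illustration of the server-side encryption layer, the sketch below builds the arguments for an S3 `put_object` call that requests either AES-256 (SSE-S3) or KMS-managed encryption. The bucket name, object key, and KMS key ID are placeholders rather than values from this project, and `upload_encrypted_update` is a hypothetical helper name.

```python
from typing import Optional


def build_put_object_args(bucket: str, key: str, body: bytes,
                          kms_key_id: Optional[str] = None) -> dict:
    """Build keyword arguments for S3 put_object with server-side encryption enabled."""
    args = {"Bucket": bucket, "Key": key, "Body": body}
    if kms_key_id:
        # SSE-KMS: S3 encrypts the object with the given KMS key
        args["ServerSideEncryption"] = "aws:kms"
        args["SSEKMSKeyId"] = kms_key_id
    else:
        # SSE-S3: S3-managed AES-256 encryption
        args["ServerSideEncryption"] = "AES256"
    return args


def upload_encrypted_update(bucket: str, key: str, body: bytes,
                            kms_key_id: Optional[str] = None) -> dict:
    """Upload a serialized model update (hypothetical helper; requires AWS credentials)."""
    import boto3  # imported lazily so the argument builder works without boto3 installed

    s3 = boto3.client("s3")
    return s3.put_object(**build_put_object_args(bucket, key, body, kms_key_id))
```

Server-side encryption protects the object at rest in S3. It is separate from the homomorphic encryption of the weights, which happens before upload.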
The project uses the CKKS (Cheon-Kim-Kim-Song) scheme implemented in TenSEAL:
- Parameters:
  - Polynomial modulus degree: 8192
  - Coefficient modulus bit sizes: [60, 40, 40, 60]
  - Scale: 2^40
- Operations:
  - Encrypted addition of model weights
  - Scale-factor multiplication for averaging
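
The following sketch shows how these parameters might be used with TenSEAL. It is not the project's actual `encryption.py` or `aggregator.py`. The helper names are hypothetical, and for brevity a single process holds both the secret key and the ciphertexts, whereas a real deployment would give the aggregator only a public context. `check_ckks_params` is a plain-Python sanity check. It relies on the 128-bit security limits on total coefficient modulus bits used by Microsoft SEAL, the library TenSEAL builds on.

```python
from typing import Dict, List, Sequence

# Maximum total coefficient-modulus bits at 128-bit security, by polynomial modulus degree
MAX_COEFF_BITS_128 = {4096: 109, 8192: 218, 16384: 438}

POLY_MODULUS_DEGREE = 8192
COEFF_MOD_BIT_SIZES = [60, 40, 40, 60]
SCALE_BITS = 40


def check_ckks_params(degree: int, bit_sizes: Sequence[int]) -> Dict[str, object]:
    """Summarize what a CKKS parameter set allows."""
    total_bits = sum(bit_sizes)
    return {
        "total_bits": total_bits,
        "within_128_bit_security": total_bits <= MAX_COEFF_BITS_128[degree],
        "slots": degree // 2,  # values packed into one ciphertext
        "multiplicative_depth": len(bit_sizes) - 2,  # inner primes consumed by rescaling
    }


def encrypted_average(updates: List[Sequence[float]]) -> List[float]:
    """Encrypt each update, add the ciphertexts, scale by 1/n, and decrypt (sketch only)."""
    import tenseal as ts  # imported lazily; requires the tenseal package

    context = ts.context(
        ts.SCHEME_TYPE.CKKS,
        poly_modulus_degree=POLY_MODULUS_DEGREE,
        coeff_mod_bit_sizes=COEFF_MOD_BIT_SIZES,
    )
    context.global_scale = 2 ** SCALE_BITS

    encrypted = [ts.ckks_vector(context, list(u)) for u in updates]
    total = encrypted[0]
    for enc in encrypted[1:]:
        total = total + enc  # encrypted addition of model weights
    averaged = total * (1.0 / len(updates))  # scale-factor multiplication for averaging
    return averaged.decrypt()  # approximate values, since CKKS is an approximate scheme


if __name__ == "__main__":
    print(check_ckks_params(POLY_MODULUS_DEGREE, COEFF_MOD_BIT_SIZES))
```

With these parameters the coefficient modulus totals 200 bits, within the 218-bit limit for degree 8192. Averaging uses one of the two available multiplication levels.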
# Clone the repository
git clone https://github.com/ColmCoffey/cxr-secure-federated.git
# Navigate to the project directory
cd cxr-secure-federated
# Install dependencies
pip install -r requirements.txt
# Set up AWS CLI and configure credentials
aws configure

- AWS Account with appropriate permissions
- Python 3.8+
- TensorFlow 2.5+
- TenSEAL 0.3.0+
- Boto3
cxr-secure-federated/
├── client/                 # Client-side code
│   ├── model.py            # COVID detection model
│   ├── training.py         # Local training logic
│   └── encryption.py       # Encryption utilities
├── server/                 # Server-side code
│   ├── aggregator.py       # Secure aggregation logic
│   └── distribution.py     # Model distribution
├── deployment/             # AWS deployment resources
│   ├── terraform/          # IaC for AWS resources
│   ├── scripts/            # Deployment scripts
│   └── config/             # Configuration files
├── data/                   # Sample data (not patient data)
│   └── sample_images/      # Example X-ray images
└── docs/                   # Documentation
    ├── architecture.md     # Detailed architecture
    ├── security.md         # Security considerations
    └── performance.md      # Performance benchmarks
Confusion matrix:
[[194. 6.]
[ 17. 183.]]
Sens Negative: 0.970, Positive: 0.915
PPV Negative: 0.919, Positive: 0.968
Accuracy = 94.25%
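
For reference, the per-class figures above can be derived from the confusion matrix as follows. This sketch assumes rows are true labels and columns are predicted labels, ordered (Negative, Positive), which is the reading consistent with the reported numbers. The function name is illustrative.

```python
from typing import Dict, List


def per_class_metrics(cm: List[List[float]]) -> Dict[str, object]:
    """Compute sensitivity, PPV, and accuracy from a square confusion matrix.

    Rows are true classes, columns are predicted classes.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    # Sensitivity (recall): correct predictions divided by the true-class row total
    sensitivity = [cm[i][i] / sum(cm[i]) for i in range(n)]
    # PPV (precision): correct predictions divided by the predicted-class column total
    ppv = [cm[i][i] / sum(cm[j][i] for j in range(n)) for i in range(n)]
    return {"sensitivity": sensitivity, "ppv": ppv, "accuracy": correct / total}


if __name__ == "__main__":
    metrics = per_class_metrics([[194, 6], [17, 183]])
    for name in ("sensitivity", "ppv"):
        print(name, [round(v, 3) for v in metrics[name]])
    print("accuracy", round(metrics["accuracy"] * 100, 2))
```

Run on the matrix above, the function reproduces the reported sensitivity, PPV, and accuracy values to three decimal places.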
- Encryption Overhead: Benchmarks for homomorphic operations
- Communication Efficiency: Minimizing bandwidth usage
- Model Compression: Techniques to reduce model size
- Scaling Strategies: Horizontal and vertical scaling options
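
One way to reason about encryption overhead and bandwidth is to count how many CKKS ciphertexts a model update needs. With a polynomial modulus degree of 8192, each ciphertext packs 4096 values. The parameter count used below is a made-up example, not a measurement of the COVID-Net model.

```python
import math


def ciphertexts_needed(num_weights: int, poly_modulus_degree: int = 8192) -> int:
    """Number of CKKS ciphertexts needed to pack num_weights values."""
    slots = poly_modulus_degree // 2  # CKKS packs degree/2 values per ciphertext
    return math.ceil(num_weights / slots)


if __name__ == "__main__":
    # Hypothetical model with one million trainable parameters
    print(ciphertexts_needed(1_000_000))
```

Each client sends this many ciphertexts per round, so model compression reduces both encryption time and upload size roughly in proportion.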
- TensorFlow for model implementation
- TenSEAL for homomorphic encryption
- AWS EC2 for simulated hospital environments
- AWS S3 for secure storage
- AWS KMS for key management
- Python for scripting and data processing
Contact: [email protected]
- Submitting Pull Requests: Follow our code style and include clear descriptions of your changes.
- Reporting Issues: Provide detailed information about the issue and steps to reproduce it.
- Requesting Features: Clearly explain the feature and its benefits.
This project is licensed under the MIT License.
- COVID-Net project for the initial model architecture
- OpenMined and the TenSEAL developers for the homomorphic encryption library
- TensorFlow Federated team
- Wang, L., Lin, Z. Q., & Wong, A. (2020). COVID-Net: A Deep Convolutional Neural Network Design for Detection of COVID-19 Cases from Chest X-Ray Images. Scientific Reports, 10(1), 19549. https://arxiv.org/abs/2003.09871
- Wood, A., Najarian, K., & Kahrobaei, D. (2020). Homomorphic Encryption for Machine Learning in Medicine and Bioinformatics. ACM Computing Surveys.
- Brand, M., & Pradel, G. (2023). Practical Privacy-Preserving Machine Learning using Fully Homomorphic Encryption. Cryptology ePrint Archive, Paper 2023/1320. https://eprint.iacr.org/2023/1320
This project was developed as part of a research initiative to improve privacy-preserving machine learning in healthcare settings.