I’m a Lead Site Reliability Engineer passionate about building and running reliable, scalable, and secure systems. I thrive at the intersection of infrastructure, automation, and team leadership, enabling organizations to deliver robust services with confidence.
- SRE Leadership: Managing and mentoring Site Reliability Engineering teams, promoting best practices and a culture of reliability.
- SLI / SLO / SLA: Defining, implementing, and evolving Service Level Indicators (SLIs), Objectives (SLOs), and Agreements (SLAs) to align reliability with business goals.
- Infrastructure as Code: Deep expertise in Terraform and Ansible for reproducible, automated infrastructure provisioning.
- Kubernetes: Designing, deploying, and maintaining resilient container orchestration platforms at scale.
- Programming: Proficient in Go and Python for automation, tooling, and backend development.
- Security & Secrets Management: Implementing secure secrets-management workflows with HashiCorp Vault.
- Observability: Building monitoring, alerting, and incident response systems.
- Architect and operate cloud-native, highly available infrastructure.
- Automate everything: deployment, scaling, recovery, and compliance.
- Lead SRE teams to deliver reliable systems and foster a culture of continuous improvement.
- Translate business requirements into actionable SLOs and drive reliability initiatives.
- Build internal tools and platforms to empower developers and reduce toil.
- Mentor engineers in reliability engineering, automation, and best practices.