Platform • SRE • Automation • AI
Application Engineer (Platform / Product SRE)
Application Engineer (Platform/Product SRE) with 13+ years of experience improving reliability, performance and operational efficiency across large-scale environments. Strong background in cloud platforms (AWS, Kubernetes, OpenShift), infrastructure as code, observability and automation. Completed postgraduate studies in AI & Machine Learning (Texas), providing a solid foundation in applied ML and emerging AI-driven automation patterns. Currently advancing skills in Agentic AI approaches and exploring how they can be safely incorporated into SRE workflows for intelligent alerting, incident assistance and self-healing capabilities.
Quick snapshot
- 📍 Based in Reading, UK
- 🏢 Application Engineer (Platform / Product SRE)
- ☁️ AWS, OpenShift, Kubernetes, Azure DevOps
- 📊 Observability, SLOs & reliability
- 🧠 PG in AI & ML (Texas) • Agentic AI learner
About
I'm an Application Engineer working in the Platform/Product SRE space, with over 13 years of experience improving reliability, performance and operational efficiency across large-scale environments. My work spans cloud migrations, Kubernetes and OpenShift platforms, observability, DR automation and SRE best practices.
I enjoy building tools and frameworks that help product teams ship safely and operate confidently: from deployment inspectors and automated dashboards to DR utilities and chaos experiments. I like problems where reliability, automation and clean engineering meet.
With a postgraduate background in AI & Machine Learning from Texas, I'm now focused on how **Agentic AI** patterns can be applied responsibly within SRE teams, for example in intelligent alert enrichment, guided incident response and self-healing workflows, always with reliability and safety as the first concern.
Skills
Cloud & Platform
- AWS (EC2, RDS, EKS, VPC, CloudWatch)
- OpenShift / Kubernetes
- Azure DevOps (pipelines, repos)
- Terraform, Ansible, Helm
SRE & Observability
- SLIs • SLOs • Error Budgets
- Journey-based monitoring
- Datadog, ELK / Kibana, Grafana, Prometheus
- Incident response, RCA, resilience testing
Automation & CI/CD
- Jenkins, GitHub/GitLab CI, Azure DevOps
- Python, Shell, Groovy
- Deployment automation & platform tooling
- DR runbooks and automation
Foundations & AI Focus
- Linux systems, networking fundamentals
- ITIL, Agile ways of working
- Architecture & design reviews, mentoring
- PG in AI & ML (Texas) • Agentic AI (learning & exploration)
Experience
Discover Financial Services UK
Application Engineer (Platform / SRE) • Nov 2021 – Present • Reading, UK
- Built Postgres Analyzer automation to improve cost visibility and optimisation across AWS RDS fleets.
- Developed OCP Deployment Inspector to help product teams validate deployments against platform best practices for resiliency, probes, resources and security.
- Developed a Disaster Recovery automation framework (traffic flip, validation and reporting), significantly reducing manual effort and improving repeatability.
- Automated observability dashboards in Kibana for new and existing microservices, standardising logging and reducing onboarding time.
- Implemented chaos experiments using Chaos Toolkit to validate failover and error-handling behaviour for workloads on the platform.
- Designed a zero-downtime PostgreSQL upgrade process ensuring continuous replication and workload continuity.
- Supported PCF → OCP migrations, modernising application deployments, environments and monitoring.
- Defined SLIs/KPIs aligned with SLOs and contributed to journey-based monitoring for customer-critical flows.
- Acted as on-call engineer in RRT, leading incident response and driving RCAs to reduce MTTR and prevent repeat issues.
- Leveraging postgraduate AI/ML knowledge to evaluate feasibility and guardrails for emerging AI-enabled automation opportunities within SRE.
- Actively learning and experimenting with Agentic AI patterns to identify workflows where intelligent orchestration and operational decision-support can reduce manual toil and improve reliability signals.
Vodafone UK
Senior Site Reliability Engineer / Site Reliability Engineer • Sep 2016 – Nov 2021
- Executed large-scale migrations from on-prem infrastructure to AWS using Terraform and automated CI/CD pipelines.
- Deployed and managed Kubernetes clusters (EKS and on-prem) including automation via Ansible.
- Built observability dashboards using Prometheus, Splunk, AppDynamics and Datadog to align infra and app metrics.
- Implemented Azure DevOps pipelines for continuous delivery to AWS workloads.
- Performed capacity analysis, availability reviews and resilience improvements across environments.
- Handled incident response, root-cause analysis and post-incident automation to improve service availability.
Early Career — DevOps Consultant / Middleware Specialist
2012 – 2016
- Delivered CI/CD transformations using Jenkins, Maven and Git to reduce manual deployment effort.
- Automated Linux server provisioning and configuration using Kickstart, Chef and Ansible.
- Installed and supported enterprise middleware (WebLogic, WebSphere, Apache) in production environments.
- Provided proactive troubleshooting and monitoring to maintain platform stability and performance.
Selected Work & Tools
OCP Deployment Inspector
Platform tool that inspects OpenShift namespaces and deployments against best practices for resiliency, probes, resources and image policies, giving teams a clear view of deployment health.
Python • OpenShift • Kubernetes • CI/CD
Postgres Analyzer
Automation to review Postgres instances and configuration across regions, enabling cost optimisation and operational visibility for RDS fleets.
Python • AWS RDS • Reporting
DR Automation Framework
Utility to orchestrate DR activities including traffic flip, validation checks and reporting, reducing manual effort and improving confidence in failover plans.
AWS • Scripting • Runbooks • Automation
Observability Dashboard Factory
Automated creation of Kibana dashboards for services, helping teams get consistent logging views with minimal setup effort.
Kibana / ELK • Automation
Contact
I'm open to conversations around platform engineering, SRE, observability, automation and AI-assisted operations.
📍 Reading, United Kingdom