Site Reliability Engineering (SRE) Architect (m/f/x)
Architecting public cloud infrastructure and observability strategies for global firms. Expert-level cloud and Kubernetes experience required. Hybrid work, 4-day work week.
Requirements
- 10+ years software engineering, DevOps, or systems engineering experience
- At least 5 years senior SRE or systems architecture experience
- Expert-level knowledge of AWS, GCP, or Azure
- Expert-level knowledge of core cloud services (compute, storage, networking, managed databases)
- Deep, hands-on Kubernetes cluster design and management experience
- Deep, hands-on container-based microservices architecture experience
- Proven expertise architecting infrastructure with Terraform
- Proficiency with Ansible, Chef, or Puppet
- Extensive experience implementing monitoring and observability solutions
- Experience with Prometheus, Grafana, OpenTelemetry, Jaeger, or ELK Stack
- Experience with commercial observability tools (Datadog, New Relic)
- Strong proficiency in Go or Python for automation
- Strong proficiency in Go or Python for tooling
- Strong proficiency in Go or Python for building system integrations
- Deep understanding of distributed systems
- Deep understanding of networking protocols (TCP/IP, HTTP)
- Deep understanding of high-availability design patterns
- Experience working across multiple cloud environments (multi-cloud)
- Professional cloud certifications (e.g., AWS Certified Solutions Architect Professional, Google Professional Cloud Architect)
- Experience with service mesh technologies like Istio or Linkerd
- Knowledge of security best practices in cloud-native environments (DevSecOps)
- Demonstrated experience leading large-scale technology transformations
- Demonstrated experience influencing engineering culture
Tasks
- Design and architect infrastructure and application services on public cloud platforms
- Define long-term vision for system reliability and performance
- Establish standards for SLOs, SLIs, and error budgets
- Architect a comprehensive observability strategy
- Design systems for logging, metrics, tracing, and alerting
- Lead automation and IaC strategy
- Design reusable patterns and frameworks for infrastructure provisioning
- Identify and mitigate reliability risks
- Design and champion resilience patterns and disaster recovery plans
- Design and champion chaos engineering experiments
- Act as a thought leader in reliability engineering
- Mentor SREs and developers on reliability best practices
- Lead architectural review sessions for reliability
- Analyze major incidents to identify architectural weaknesses
- Drive design changes to prevent incident recurrence
- Evolve postmortem culture and incident response capabilities
Work Experience
- 10 years
Education
- Bachelor's degree OR
- Master's degree
Languages
- English – Business Fluent
Tools & Technologies
- AWS
- GCP
- Azure
- Kubernetes
- Terraform
- Ansible
- Chef
- Puppet
- Prometheus
- Grafana
- OpenTelemetry
- Jaeger
- ELK Stack
- Datadog
- New Relic
- Go
- Python
- Istio
- Linkerd
About the Company
Infosys Consulting - Europe
Industry
Consulting
Description
Infosys Consulting is a globally renowned management consulting firm focused on industry disruption and technology, partnering with clients on their transformation journeys.