Site Reliability Engineering (SRE) Architect (m/f/x)
Architecting public cloud infrastructure and observability strategies for global firms. Expert-level cloud and Kubernetes experience required. Hybrid work, 4-day work week.
Requirements
- 10+ years of software engineering, DevOps, or systems engineering experience
- At least 5 years of senior SRE or systems architecture experience
- Expert-level knowledge of AWS, GCP, or Azure
- Expert-level knowledge of core cloud services (compute, storage, networking, managed databases)
- Deep, hands-on Kubernetes cluster design and management experience
- Deep, hands-on container-based microservices architecture experience
- Proven expertise architecting infrastructure with Terraform
- Proficiency with Ansible, Chef, or Puppet
- Extensive experience implementing monitoring and observability solutions
- Experience with Prometheus, Grafana, OpenTelemetry, Jaeger, or ELK Stack
- Experience with commercial observability tools (Datadog, New Relic)
- Strong proficiency in Go or Python for automation, tooling, and building system integrations
- Deep understanding of distributed systems, networking protocols (TCP/IP, HTTP), and high-availability design patterns
- Experience working across multiple cloud environments (multi-cloud)
- Professional cloud certifications (e.g., AWS Certified Solutions Architect Professional, Google Professional Cloud Architect)
- Experience with service mesh technologies like Istio or Linkerd
- Knowledge of security best practices in cloud-native environments (DevSecOps)
- Demonstrated experience leading large-scale technology transformations
- Demonstrated experience influencing engineering culture
Responsibilities
- Design and architect infrastructure and application services on public cloud platforms
- Define long-term vision for system reliability and performance
- Establish standards for SLOs, SLIs, and error budgets
- Architect a comprehensive observability strategy
- Design systems for logging, metrics, tracing, and alerting
- Lead automation and IaC strategy
- Design reusable patterns and frameworks for infrastructure provisioning
- Identify and mitigate reliability risks
- Design and champion resilience patterns and disaster recovery plans
- Design and champion chaos engineering experiments
- Act as a thought leader in reliability engineering
- Mentor SREs and developers on reliability best practices
- Lead architectural review sessions for reliability
- Analyze major incidents to identify architectural weaknesses
- Drive design changes to prevent incident recurrence
- Evolve postmortem culture and incident response capabilities
Professional Experience
- 10 years
Education
- Bachelor's degree OR
- Master's degree
Languages
- English: business fluent
Tools & Technologies
- AWS
- GCP
- Azure
- Kubernetes
- Terraform
- Ansible
- Chef
- Puppet
- Prometheus
- Grafana
- OpenTelemetry
- Jaeger
- ELK Stack
- Datadog
- New Relic
- Go
- Python
- Istio
- Linkerd
About the Company
Infosys Consulting - Europe
Industry
Consulting
Description
Infosys Consulting is a globally renowned management consulting firm focused on industry disruption and technology, partnering with clients on their transformation journeys.