Staff Site Reliability Engineer (m/f/x)
Improve system reliability for a global travel booking platform, with a focus on observability tooling and incident prevention. The role requires a deep understanding of observability tooling and proven experience reducing MTTD, MTTR, and change failure rate. Benefits include work from anywhere (40 days/year) and flexible working arrangements.
Requirements
- Deep understanding of observability tooling (Datadog)
- Proven experience reducing MTTD, MTTR, change failure rate
- Strong coding skills in Java
- Comfortable reading/contributing in Go
- Frontend context for React/Vue collaboration
- Experience with Kubernetes
- Experience with AWS
- Experience with service mesh technologies (Istio/Envoy)
- Solid understanding of distributed systems
- Solid understanding of networking
- Solid understanding of container technology
- Hands-on experience with CI/CD
- Hands-on experience with automated testing strategies
- Hands-on experience with build systems
- Ability to influence engineers and teams
- Excellent written communication skills in English
- Excellent verbal communication skills in English
- Positive, proactive team player
- Passionate about operational excellence
- Led company-wide initiatives to improve DORA metrics
- Identified systemic gaps in automated testing
- Driven improvements reducing change failure rate
- Driven improvements reducing production incidents
- Embedded operational excellence practices into culture
- Driven cost-reduction outcomes through improvements
Responsibilities
- Prevent incidents and enhance user trust
- Enable faster incident resolution
- Drive operational excellence and reliability
- Partner with product teams to improve system reliability
- Reduce incident frequency and resolution times
- Lead post-incident reviews and implement improvements
- Build diagnostic and resolution tooling
- Promote blameless incident handling and continuous improvement
- Participate in on-call infrastructure rotation
- Advance Datadog observability practices
- Ensure meaningful SLOs and actionable alerts
- Enable efficient production debugging
- Improve change failure rate with automated testing
- Reduce deployment costs and risks
- Design and maintain well-documented development paths
- Collaborate with product teams on system design
- Guide teams on infrastructure best practices
- Identify and implement cost optimization
- Leverage AI for incident response and workflow improvement
Work experience
- approx. 4-6 years
Education
- Completed vocational training, OR
- Bachelor's degree, OR
- Master's degree
Languages
- English: business fluent
Tools & Technologies
- Datadog
- Java
- Go
- React
- Vue
- Kubernetes
- AWS
- Istio
- Envoy
Benefits
Other allowances
- Annual personal growth budget
Mentoring & Coaching
- Mentorship programs
Flexible working
- Work from anywhere (40 days/year)
- Flexible working arrangements
Team events & outings
- Quarterly team events
- Yearly company-wide events
Public transport tickets
- Monthly transportation budget
Health & fitness offerings
- Monthly fitness budget
- Health and wellness benefits
Employee discounts
- Discounts on GetYourGuide activities
Further training
- Language reimbursement program
About the company
GetYourGuide
Industry
Tourism
Description
GetYourGuide is the globally leading marketplace for unforgettable travel experiences, helping travelers discover the best things to do.