GetYourGuide

Staff Site Reliability Engineer (m/w/x)

Berlin
Full-time, With Home Office, Senior
AI/ML

Improve system reliability for a global travel booking platform, with a focus on observability tooling and incident prevention. The role requires a deep understanding of observability tooling and proven experience reducing MTTD, MTTR, and change failure rate. Benefits include work from anywhere (40 days/year) and flexible working arrangements.

Requirements

  • Deep understanding of observability tooling (Datadog)
  • Proven experience reducing MTTD, MTTR, change failure rate
  • Strong coding skills in Java
  • Comfortable reading/contributing in Go
  • Enough frontend context to collaborate with React/Vue teams
  • Experience with Kubernetes
  • Experience with AWS
  • Experience with service mesh technologies (Istio/Envoy)
  • Solid understanding of distributed systems
  • Solid understanding of networking
  • Solid understanding of container technology
  • Hands-on experience with CI/CD
  • Hands-on experience with automated testing strategies
  • Hands-on experience with build systems
  • Ability to influence engineers and teams
  • Excellent written communication skills in English
  • Excellent verbal communication skills in English
  • Positive, proactive team player
  • Passionate about operational excellence
  • Track record of leading company-wide initiatives to improve DORA metrics
  • Track record of identifying systemic gaps in automated testing
  • Track record of driving improvements that reduced change failure rate
  • Track record of driving improvements that reduced production incidents
  • Track record of embedding operational excellence practices into engineering culture
  • Track record of driving cost reductions through reliability improvements

Tasks

  • Prevent incidents and enhance user trust
  • Enable faster incident resolution
  • Drive operational excellence and reliability
  • Partner with product teams to improve system reliability
  • Reduce incident frequency and resolution times
  • Lead post-incident reviews and implement improvements
  • Build diagnostic and resolution tooling
  • Promote blameless incident handling and continuous improvement
  • Participate in on-call infrastructure rotation
  • Advance Datadog observability practices
  • Ensure meaningful SLOs and actionable alerts
  • Enable efficient production debugging
  • Improve change failure rate with automated testing
  • Reduce deployment costs and risks
  • Design and maintain well-documented development paths
  • Collaborate with product teams on system design
  • Guide teams on infrastructure best practices
  • Identify and implement cost optimization
  • Leverage AI for incident response and workflow improvement

Work Experience

  • approx. 4 to 6 years

Education

  • Vocational certification, OR
  • Bachelor's degree, OR
  • Master's degree

Languages

  • English: Business Fluent

Tools & Technologies

  • Datadog
  • Java
  • Go
  • React
  • Vue
  • Kubernetes
  • AWS
  • Istio
  • Envoy

Benefits

Additional Allowances

  • Annual personal growth budget

Mentorship & Coaching

  • Mentorship programs

Flexible Working

  • Work from anywhere (40 days/year)
  • Flexible working arrangements

Team Events

  • Quarterly team events
  • Yearly company-wide events

Public Transport Subsidies

  • Monthly transportation budget

Healthcare & Fitness

  • Monthly fitness budget
  • Health and wellness benefits

Corporate Discounts

  • Discounts on GetYourGuide activities

Learning & Development

  • Language reimbursement program

Nejo automatically captured this job from the GetYourGuide website and processed the information with the help of AI. Despite careful analysis, some information may be incomplete or inaccurate, so always verify all details in the original posting. Content and copyrights of the original posting belong to the advertising company.
