Talon.One

Senior Site Reliability Engineer(m/w/x)

Berlin
Full-time | With home office | Senior

Observability architecture and SLO-driven reliability for a high-scale promotions engine. Expertise in the Grafana stack and in monitoring design is required. The role comes with an annual €1,000 learning budget and options to work from abroad.

Requirements

  • Strong ownership for production health
  • Ability to establish SLO-driven reliability
  • Strong observability instincts and dashboarding
  • Experience with the Grafana stack and its pipelines
  • Experience designing monitoring and observability architectures
  • Understanding of Kubernetes and Google Cloud
  • Understanding of the OpenTelemetry protocol
  • Proactive mindset and solution-oriented approach
  • Strong communication skills under pressure
  • Ability to influence engineering practices

Tasks

  • Own availability, latency, and error rates
  • Define and introduce SLOs and error budgets
  • Establish reliability targets to drive engineering prioritization
  • Guide engineering with reliability designs and standards
  • Build and evolve observability across metrics, logs, and traces
  • Design end-to-end monitoring and observability architecture
  • Manage data pipelines and signal quality
  • Develop alert strategies, dashboards, and SLO implementations
  • Ensure cost-aware scalability of monitoring systems
  • Build reliability tooling and automation to eliminate toil
  • Address the root causes of incidents to drive structural improvements
  • Lead and improve incident management and on-call readiness
  • Manage severity handling and stakeholder communication
  • Conduct blameless postmortems with strong follow-through
  • Reduce noisy alerts and close reliability gaps
  • Automate recurring operational work
  • Work hands-on in Kubernetes and Google Cloud environments
  • Make deployments safer and more predictable
  • Apply GitOps principles for versioned and traceable changes

Work Experience

  • Approx. 4 to 6 years

Education

  • Bachelor's degree, or
  • Master's degree

Languages

  • English (business fluent)

Tools & Technologies

  • Grafana
  • Prometheus
  • Grafana Alloy
  • Loki
  • Tempo
  • Kubernetes
  • Google Cloud
  • OpenTelemetry

Benefits

Flexible Working

  • Flexible working hours
  • 90-day worldwide remote work

Workation & Sabbatical

  • Work from abroad

Learning & Development

  • €1,000 annual learning budget
  • Full LinkedIn Learning access
  • Free German language courses

More Vacation Days

  • 30 days of annual leave
  • Paid birthday leave
  • Paid moving day leave

Additional Allowances

  • Home office setup budget
  • Monthly home office allowance

Mental Health Support

  • Mental health support

Corporate Discounts

  • Discounted Urban Sports Club membership

Retirement Plans

  • 20% company pension subsidy

Public Transport Subsidies

  • Subsidised BVG transport ticket

Informal Culture

  • Dog-friendly office

Company Bike

  • BusinessBike leasing

Nejo automatically captured this job from the website of Talon.One and processed the information with the help of AI. Despite careful analysis, some information may be incomplete or inaccurate, so please always verify all details in the original posting. Content and copyrights of the original posting belong to the advertising company.

Similar Jobs

  • Scout24

    Senior Platform Engineer - Site Reliability(m/w/x)

    Full-time | With home office | Management
    Berlin
  • Redcare Pharmacy

    Senior Site Reliability Engineer(m/w/x)

    Full-time | With home office | Senior
    Berlin
  • Doctolib

    Senior Site Reliability Engineer - Observability(m/w/x)

    Full-time | With home office | Senior
    Berlin
  • Almedia

    Staff Site Reliability Engineer / DevOps(m/w/x)

    Full-time | With home office | Experienced
    Berlin
    from 125,000 / year
  • ImmoScout24

    Senior Platform Engineer - Site Reliability(m/w/x)

    Full-time | With home office | Management
    Berlin