Senior Site Reliability Engineer (m/f/x)
Observability architecture and SLO-driven reliability for a high-scale promotions engine. Expertise in Grafana stack and monitoring design required. Annual €1,000 learning budget and work-from-abroad options.
Requirements
- Strong ownership for production health
- Ability to establish SLO-driven reliability
- Strong observability instincts and dashboarding
- Experience with Grafana stack and pipelines
- Experience designing monitoring and observability architectures
- Understanding of Kubernetes and Google Cloud
- Understanding of the OpenTelemetry protocol
- Proactive mindset and solution-oriented approach
- Strong communication skills under pressure
- Ability to influence engineering practices
Responsibilities
- Own availability, latency, and error rates
- Define and introduce SLOs and error budgets
- Establish reliability targets to drive engineering prioritization
- Guide engineering with reliability designs and standards
- Build and evolve observability across metrics, logs, and traces
- Design end-to-end monitoring and observability architecture
- Manage data pipelines and signal quality
- Develop alert strategies, dashboards, and SLO implementations
- Ensure cost-aware scalability of monitoring systems
- Build reliability tooling and automation to eliminate toil
- Address underlying incident causes to drive structural improvements
- Lead and improve incident management and on-call readiness
- Manage severity handling and stakeholder communication
- Conduct blameless postmortems with strong follow-through
- Reduce noisy alerts and close reliability gaps
- Automate recurring operational work
- Work deeply in Kubernetes and Google Cloud environments
- Make deployments safer and more predictable
- Apply GitOps principles for versioned and traceable changes
Work experience
- approx. 4 to 6 years
Education
- Bachelor's degree, or
- Master's degree
Languages
- English (business fluent)
Tools & Technologies
- Grafana
- Prometheus
- Grafana Alloy
- Loki
- Tempo
- Kubernetes
- Google Cloud
- OpenTelemetry
Benefits
Flexible working
- Flexible working hours
- 90-day worldwide remote work
Workation & Sabbatical
- Work from abroad
Learning and development
- €1,000 annual learning budget
- Full LinkedIn Learning access
- Free German language courses
Additional leave
- 30 days of annual leave
- Paid birthday leave
- Paid moving day leave
Other allowances
- Home office setup budget
- Monthly home office allowance
Mental health
- Mental health support
Employee discounts
- Discounted Urban Sports Club membership
Company pension scheme
- 20% company pension subsidy
Public transport tickets
- Subsidised BVG transport ticket
Relaxed company culture
- Dog-friendly office
Company bike
- BusinessBike leasing
About the company
Talon.One
Industry
IT
Description
The company develops a flexible, highly scalable promotions engine using state-of-the-art technologies.