Senior Site Reliability Engineer (m/f/x)
Observability architecture and SLO-driven reliability for a high-scale promotions engine. Expertise in the Grafana stack and monitoring design is required. Annual €1,000 learning budget and work-from-abroad options.
Requirements
- Strong ownership of production health
- Ability to establish SLO-driven reliability
- Strong observability instincts and dashboarding
- Experience with the Grafana stack and pipelines
- Experience designing monitoring and observability architectures
- Understanding of Kubernetes and Google Cloud
- Understanding of the OpenTelemetry protocol
- Proactive mindset and solution-oriented approach
- Strong communication skills under pressure
- Ability to influence engineering practices
Tasks
- Own availability, latency, and error rates
- Define and introduce SLOs and error budgets
- Establish reliability targets to drive engineering prioritization
- Guide engineering with reliability designs and standards
- Build and evolve observability across metrics, logs, and traces
- Design end-to-end monitoring and observability architecture
- Manage data pipelines and signal quality
- Develop alert strategies, dashboards, and SLO implementations
- Ensure cost-aware scalability of monitoring systems
- Build reliability tooling and automation to eliminate toil
- Address underlying incident causes to drive structural improvements
- Lead and improve incident management and on-call readiness
- Manage severity handling and stakeholder communication
- Conduct blameless postmortems with strong follow-through
- Reduce noisy alerts and close reliability gaps
- Automate recurring operational work
- Work deeply in Kubernetes and Google Cloud environments
- Make deployments safer and more predictable
- Apply GitOps principles for versioned and traceable changes
Work Experience
- Approx. 4 to 6 years
Education
- Bachelor's degree or
- Master's degree
Languages
- English – Business Fluent
Tools & Technologies
- Grafana
- Prometheus
- Grafana Alloy
- Loki
- Tempo
- Kubernetes
- Google Cloud
- OpenTelemetry
Benefits
Flexible Working
- Flexible working hours
- 90-day worldwide remote work
Workation & Sabbatical
- Work from abroad
Learning & Development
- €1,000 annual learning budget
- Full LinkedIn Learning access
- Free German language courses
More Vacation Days
- 30 days of annual leave
- Paid birthday leave
- Paid moving day leave
Additional Allowances
- Home office setup budget
- Monthly home office allowance
Mental Health Support
- Mental health support
Corporate Discounts
- Discounted Urban Sports Club membership
Retirement Plans
- 20% company pension subsidy
Public Transport Subsidies
- Subsidised BVG transport ticket
Informal Culture
- Dog-friendly office
Company Bike
- BusinessBike leasing
About the Company
Talon.One
Industry
IT
Description
The company develops a flexible, highly scalable promotions engine using state-of-the-art technologies.