Talon.One

17d ago

Senior Site Reliability Engineer (m/f/x)

Berlin

Full-time, With Home Office, Senior

Nejo AI Summary

Observability architecture and SLO-driven reliability for a high-scale promotions engine. Expertise in Grafana stack and monitoring design required. Annual €1,000 learning budget and work-from-abroad options.

Requirements

Strong ownership for production health
Ability to establish SLO-driven reliability
Strong observability instincts and dashboarding
Experience with Grafana stack and pipelines
Experience designing monitoring and observability architectures
Understanding of Kubernetes and Google Cloud
Understanding of the OpenTelemetry protocol
Proactive mindset and solution-oriented approach
Strong communication skills under pressure
Ability to influence engineering practices

Tasks

Own availability, latency, and error rates
Define and introduce SLOs and error budgets
Establish reliability targets to drive engineering prioritization
Guide engineering with reliability designs and standards
Build and evolve observability across metrics, logs, and traces
Design end-to-end monitoring and observability architecture
Manage data pipelines and signal quality
Develop alert strategies, dashboards, and SLO implementations
Ensure cost-aware scalability of monitoring systems
Build reliability tooling and automation to eliminate toil
Address underlying incident causes to drive structural improvements
Lead and improve incident management and on-call readiness
Manage severity handling and stakeholder communication
Conduct blameless postmortems with strong follow-through
Reduce noisy alerts and close reliability gaps
Automate recurring operational work
Work deeply in Kubernetes and Google Cloud environments
Make deployments safer and more predictable
Apply GitOps principles for versioned and traceable changes

Work Experience

approx. 4 - 6 years

Education

Bachelor's degree OR Master's degree

Languages

English – Business Fluent

Tools & Technologies

Grafana
Prometheus
Grafana Alloy
Loki
Tempo
Kubernetes
Google Cloud
OpenTelemetry

Benefits

Flexible Working

Flexible working hours
90-day worldwide remote work

Workation & Sabbatical

Work from abroad

Learning & Development

€1,000 annual learning budget
Full LinkedIn Learning access
Free German language courses

More Vacation Days

30 days of annual leave
Paid birthday leave
Paid moving day leave

Additional Allowances

Home office setup budget
Monthly home office allowance

Mental Health Support

Mental health support

Corporate Discounts

Discounted Urban Sports Club membership

Retirement Plans

20% company pension subsidy

Public Transport Subsidies

Subsidised BVG transport ticket

Informal Culture

Dog-friendly office

Company Bike

BusinessBike leasing

The original job posting, in its most current version, is on the Talon.One website. Nejo automatically captured this job from the Talon.One website and processed the information with the help of AI. Despite careful analysis, some information may be incomplete or inaccurate. Please always verify all details in the original posting! Content and copyrights of the original posting belong to the advertising company.

