New Job?Nejo!

The AI Job Search Engine

Lambda

3mo ago

Senior Site Reliability Engineer, Managed Kubernetes (m/w/x)

Berlin

from 310,000 / year

Full-time, Remote, Senior

AI/ML

Description

In this role, you will manage Kubernetes clusters, ensuring their reliability and performance. Your daily responsibilities will include troubleshooting issues, automating processes, and collaborating with teams to enhance the cloud infrastructure.

Requirements

•6+ years of experience in SRE, operations engineering, or a similar role
•Strong programming skills in Go and Python
•Proven experience operating Kubernetes clusters in production environments
•Ability to work independently or as part of a team
•Ability to work with customers during incidents
•Familiarity with observability tools such as Prometheus, Grafana, and FluentBit
•Proven experience provisioning Kubernetes with tools such as kubeadm and Cluster API
•Deep Kubernetes expertise (CRDs, CSI, CNI) and experience coding Kubernetes Operators
•Exposure to HPC clusters, AI/ML workloads, or large-scale GPU clusters
•Experience with hybrid or multi-cloud Kubernetes environments
•Contributions to CNCF projects or Kubernetes SIGs
•Diversity of backgrounds, experiences, and skills welcomed

Work Experience

6 years

Tasks

•Operate and maintain bare-metal Kubernetes clusters
•Handle cluster degradation, recovery, resizing, and incident response
•Participate in a well-managed on-call rotation for critical incidents
•Assist customers with Kubernetes questions and workload integration
•Collaborate with HPC Ops and Datacenter Ops teams on cross-functional issues
•Use Python and Golang to create tooling and automate platform validation
•Design, build, and maintain scalable control plane services and custom controllers
•Develop automation for cluster lifecycle management, including provisioning and upgrades
•Define and implement SLOs and SLIs for Kubernetes services and platform reliability

Tools & Technologies

Go, Python, GitOps, ArgoCD, Helm, Kubernetes, Prometheus, Grafana, FluentBit, kubeadm, Cluster API

Languages

English – Business Fluent

Benefits

Healthcare & Fitness

•Health, dental, and vision coverage
•Wellness stipend

Public Transport Subsidies

•Commuter stipend

Retirement Plans

•401(k) plan with 2% company match

More Vacation Days

•Flexible Paid Time Off Plan

Nejo automatically captured this job from Lambda's website, where the original posting can be found in its most current version, and processed the information with the help of AI. Despite careful analysis, some information may be incomplete or inaccurate. Please always verify all details in the original posting! Content and copyrights of the original posting belong to the advertising company.

Not a perfect match?

100+ Similar Jobs in Berlin

GetYourGuide
Senior Site Reliability Engineer (m/w/x)
Full-time, Home office possible, Senior
Berlin
Redcare Pharmacy
Senior Site Reliability Engineer (m/w/x)
Full-time, Home office possible, Senior
Berlin
Nebius
Senior Site Reliability Engineer (m/w/x)
Full-time, Home office possible, Senior
Berlin
fiskaly
Site Reliability Engineer (m/w/x)
Full-time, Home office possible, Seniority not specified
from 80,000 / year
Berlin, Vienna
Wire Germany GmbH
Site Reliability Engineer / Systems Engineer (m/w/x)
Full-time, Home office possible, Senior
Berlin

About the Company

Lambda

Description

The company builds gigawatt-scale AI factories for training and inference and aims to make compute as ubiquitous as electricity.
