New job? Nejo!

The AI search engine for jobs

Lambda

3 months ago

Senior Site Reliability Engineer, Managed Kubernetes (m/f/x)

Berlin

from 310,000 / year

Full-time, Remote, Senior

AI/ML

Nejo AI summary

Description

In this role, you will manage Kubernetes clusters, ensuring their reliability and performance. Your daily responsibilities will include troubleshooting issues, automating processes, and collaborating with teams to enhance the cloud infrastructure.

Requirements

• 6+ years of experience in an SRE, operations engineering, or similar role
• Strong programming skills in Go and Python
• Proven experience operating Kubernetes clusters in production environments
• Ability to work independently or as part of a team
• Ability to work with customers during incidents
• Familiarity with observability tools such as Prometheus, Grafana, and FluentBit
• Proven experience provisioning Kubernetes using tools such as kubeadm or Cluster API
• Deep Kubernetes expertise: CRDs, CSI, CNI, and experience writing Kubernetes operators
• Exposure to HPC clusters, AI/ML workloads, or large-scale GPU clusters
• Experience with hybrid or multi-cloud Kubernetes environments
• Contributions to CNCF projects or Kubernetes SIGs
• Diversity of backgrounds, experiences, and skills welcomed

Work experience

6 years

Responsibilities

• Operate and maintain bare-metal Kubernetes clusters
• Handle cluster degradation, recovery, resizing, and incident response
• Participate in a well-managed on-call rotation for critical incidents
• Assist customers with Kubernetes questions and workload integration
• Collaborate with HPC Ops and Datacenter Ops teams on cross-functional issues
• Use Python and Golang to create tooling and automate platform validation
• Design, build, and maintain scalable control plane services and custom controllers
• Develop automation for cluster lifecycle management, including provisioning and upgrades
• Define and implement SLOs and SLIs for Kubernetes services and platform reliability

Tools & Technologies

Go, Python, GitOps, ArgoCD, Helm, Kubernetes, Prometheus, Grafana, FluentBit, kubeadm, Cluster API

Languages

English: business fluent

Benefits

Health & fitness

• Health, dental, and vision coverage
• Wellness stipend

Public transport tickets

• Commuter stipend

Company pension plan

• 401(k) plan with 2% company match

More vacation days

• Flexible paid time off plan

You can find the latest version of the original posting for this job here. Nejo automatically collected this job from the website of the company Lambda and prepared the information on Nejo with the help of AI. Despite careful analysis, individual details may be incomplete or inaccurate. Always check all details against the original posting. The content and copyright of the original posting belong to the company advertising the position.

Not quite right yet?

100+ similar jobs in Berlin

GetYourGuide
Senior Site Reliability Engineer (m/f/x)
Full-time, Home office possible, Senior
Berlin

Redcare Pharmacy
Senior Site Reliability Engineer (m/f/x)
Full-time, Home office possible, Senior
Berlin

Nebius
Senior Site Reliability Engineer (m/f/x)
Full-time, Home office possible, Senior
Berlin

fiskaly
Site Reliability Engineer (m/f/x)
Full-time, Home office possible, Seniority not specified
from 80,000 / year
Berlin, Vienna

Wire Germany GmbH
Site Reliability Engineer / Systems Engineer (m/f/x)
Full-time, Home office possible, Senior
Berlin

About the company

Lambda

Description

The company builds Gigawatt-scale AI Factories for Training and Inference and aims to make compute as ubiquitous as electricity.
