Site Reliability Engineer (m/f/x)
Operate and evolve cloud and on-prem infrastructure for custom AI solutions on Kubernetes. 2-5 years of experience with large-scale production infrastructure required. Remote-first setup, 30 days vacation.
Requirements
- 2-5 years of experience with large-scale production infrastructure
- Experience with distributed or service-oriented architectures
- Working knowledge of Infrastructure as Code (Terraform preferred)
- Solid troubleshooting skills across systems
- Pragmatic mindset balancing speed, simplicity, and reliability
- Ownership and accountability for systems end-to-end
- Ability to work independently while aligned with team goals
- Experience optimizing cloud costs at scale
- Interest or experience in Machine Learning / LLM systems
- Experience improving developer experience and platform tooling using AI agents
- Contributions to SRE practices like postmortems, SLIs/SLOs, and reliability engineering culture
Tasks
- Build and operate real-world infrastructure
- Design, configure, and evolve cloud and on-prem environments
- Make the self-hosted platform production-ready
- Deliver a platform deployable on any Kubernetes setup
- Improve CI/CD pipelines and GitOps setups
- Optimize GitHub workflows for faster, more reliable deployments
- Simplify systems and reduce infrastructure costs
- Maintain performance and reliability during optimization
- Champion reliability, scalability, and security best practices
- Implement best practices as working systems
Work Experience
- 2-5 years
Education
- Vocational certification OR
- Bachelor's degree OR
- Master's degree
Languages
- German – Business Fluent
Tools & Technologies
- AWS
- Kubernetes
- ArgoCD
- Terraform
- Datadog
- Prometheus
Benefits
Competitive Pay
- Stock options
Flexible Working
- Remote-first setup
- Flexible hours
Modern Equipment
- Choice of tech
More Vacation Days
- 30 days vacation
Family Support
- Family sick leave
Additional Allowances
- Monthly sports allowance
Mental Health Support
- Mental health support allowance
Learning & Development
- Annual learning & development budget
Team Events
- Monthly team socials
- In-person meetups
Informal Culture
- Dog-friendly HQ
About the Company
deepset GmbH
Industry
IT
Description
The company is an AI startup that empowers developers to build applications using natural language as the interface to data.