The AI search engine for jobs
Member of Technical Staff - Training Cluster Engineer (m/f/x)
Description
You design and maintain ML training clusters, ensuring their performance and security. By collaborating with research teams, you translate their computational needs into effective infrastructure solutions.
Requirements
- Production experience managing SLURM clusters
- Hands-on experience with Docker or similar container runtimes
- Proven track record managing GPU clusters
- Understanding of distributed training patterns
- Experience with Kubernetes for containerized workloads
- Experience with high-performance interconnects
- Track record of managing 1000+ GPU training runs
- Familiarity with high-performance storage solutions
- Experience running hybrid training/inference infrastructure
- Strong scripting skills in Python and Bash
Work experience
approx. 1 to 4 years
Responsibilities
- Design and maintain large-scale ML training clusters
- Deploy SLURM for distributed workload orchestration
- Implement node health monitoring systems
- Automate failure detection and recovery workflows
- Ensure cluster availability in collaboration with cloud providers
- Monitor performance together with colocation partners
- Establish security best practices for ML infrastructure
- Build developer-facing tools and APIs for ML workflows
- Collaborate with ML research teams on infrastructure needs
Tools & Technologies
Languages
English: business fluent
About the company
Black Forest Labs
Industry
IT
Description
Black Forest Labs is a cutting-edge startup pioneering generative image and video models. The company focuses on innovation and on developing advanced ML infrastructure.
Similar jobs
- Black Forest Labs: Member of Technical Staff - Large scale data infrastructure (m/f/x). Full-time; on-site only; Senior; Freiburg im Breisgau
- Prior Labs: ML Engineer, Cloud Platform (m/f/x). Full-time; on-site only; Experienced; from 140,000 per year; Berlin and Freiburg im Breisgau
- Prior Labs: MLOps / ML Systems Engineer (m/f/x). Full-time; on-site only; Senior; Berlin and Freiburg im Breisgau
- Black Forest Labs: Member of Technical Staff - Data Engineering (m/f/x). Full-time; on-site only; Experienced; Freiburg im Breisgau
- Black Forest Labs: Member of Technical Staff - Image / Video Applications (m/f/x). Full-time; on-site only; experience level not specified; Freiburg im Breisgau