Black Forest Labs

Member of Technical Staff - ML Infrastructure Engineer (m/w/x)

Freiburg im Breisgau
USD 180,000 - 300,000 / year
Full-time, With Home Office, Senior
AI/ML

Design and deploy ML infrastructure for generative AI models, supporting multi-week training runs and production inference. Experience building and managing ML infrastructure at scale is required. The role includes on-call duty for failed training runs; reasonable travel costs are covered.

Requirements

  • Built and managed ML infrastructure at scale
  • Understanding of how to support AI research infrastructure
  • Experience being paged for failed training runs
  • Experience debugging storage bottlenecks
  • Experience with infrastructure for long-term experiments
  • Strong proficiency in cloud platforms (AWS, Azure, GCP), with a focus on ML/AI services
  • Extensive Kubernetes experience
  • Extensive Slurm cluster management experience
  • Expertise in Infrastructure as Code (IaC) tools, and the discipline to use them
  • Managing and optimizing network-based cloud file systems for ML
  • Managing and optimizing object storage for ML workloads
  • Experience with CI/CD tools and practices in ML contexts
  • Strong understanding of cloud security principles and best practices
  • Experience with monitoring and observability tools
  • Familiarity with ML workflows
  • Familiarity with GPU infrastructure management
  • Understanding of researchers' needs for ML infrastructure
  • Handling complex migrations and breaking changes in production
  • Experience building custom autoscaling solutions for ML
  • Knowledge of cost optimization strategies for ML infrastructure
  • Familiarity with MLOps practices and tools
  • Experience with high-performance computing (HPC)
  • Understanding of data versioning and experiment tracking for ML
  • Knowledge of network optimization for distributed ML
  • Experience with multi-cloud and hybrid cloud architectures
  • Familiarity with container security and vulnerability scanning tools

Tasks

  • Design, deploy, and maintain ML infrastructure
  • Support multi-week training runs and production inference
  • Implement cloud-based ML training and inference clusters
  • Manage network-based cloud file systems and blob/S3 storage
  • Develop and maintain Infrastructure as Code (IaC)
  • Optimize CI/CD pipelines for ML workflows
  • Design custom autoscaling solutions for ML workloads
  • Ensure security best practices in ML infrastructure
  • Provide developer-friendly ML operations tools

Work Experience

  • approx. 4 - 6 years

Education

  • Bachelor's degree or Master's degree

Languages

  • English: Business Fluent

Tools & Technologies

  • AWS
  • Azure
  • GCP
  • Kubernetes
  • Slurm
  • Terraform
  • Ansible
  • CircleCI
  • GitHub Actions
  • ArgoCD
  • Prometheus
  • Grafana
  • Loki

Benefits

Additional Allowances

  • Reasonable travel costs covered

Nejo automatically captured this job from the website of Black Forest Labs and processed the information with the help of AI. Despite careful analysis, some information may be incomplete or inaccurate, so always verify all details in the original posting. Content and copyrights of the original posting belong to the advertising company.