Nebius
Posted last month

Technical Product Manager (Cluster Experience) (m/f/x)

Berlin
Full-time, With Home Office, Experienced
AI/ML
Data Science

Description

In this role, you will focus on enhancing the reliability and performance of GPU clusters for machine learning. You will guide product direction from discovery to adoption while collaborating with engineering teams and conducting in-depth customer research.

Requirements

  • 3–5+ years of experience in product management, ML infrastructure/MLOps, distributed systems engineering, or cloud architecture
  • Strong technical foundation in computer science, distributed systems, or ML infrastructure
  • Hands-on familiarity with ML training, ideally using orchestrators like Slurm, Kubernetes, Ray, or similar systems
  • Proven ability to ship technically complex features with multiple engineering teams
  • Excellent communicator capable of influencing engineering, research, and customer stakeholders
  • Experience with product analytics, data-driven prioritization, and experiment design
  • Strong willingness and ability to learn quickly in a fast-evolving ML and infrastructure environment
  • Experience working with GPU platforms, InfiniBand/RDMA networking, or HPC systems
  • Understanding of modern ML frameworks and libraries (PyTorch, DeepSpeed, FSDP, NCCL, etc.)
  • Knowledge of ML training efficiency: Goodput, MFU, scheduling, health checks
  • Exposure to LLM training, distributed training strategies (data parallelism, ZeRO, FSDP), or transformer inference
  • Experience in observability, performance tuning, or reliability engineering
  • Customer-facing technical experience (supporting ML or infrastructure workloads)
  • Strong candidates with backgrounds in ML infrastructure, distributed systems, SRE, or cloud engineering who want to grow into product are also welcome

Work Experience

3–5 years

Tasks

  • Own key tracks in Cluster Experience
  • Define product direction from problem discovery to delivery
  • Drive cross-functional execution across various teams
  • Conduct deep customer research through interviews and analytics
  • Identify bottlenecks across hardware, network, and runtime
  • Translate ML research ideas into scalable product features
  • Shape user interactions with clusters through dashboards and notifications

Tools & Technologies

Slurm, Kubernetes, Ray, PyTorch, DeepSpeed, FSDP, NCCL

Languages

English: Business Fluent

Benefits

Flexible Working

  • Flexible working arrangements

Other Benefits

  • Comprehensive benefits package

Career Advancement

  • Opportunities for professional growth

Informal Culture

  • Dynamic and collaborative work environment

Nejo automatically captured this job from the Nebius website and processed the information with the help of AI. Despite careful analysis, some information may be incomplete or inaccurate, so always verify all details in the original posting. Content and copyright of the original posting belong to the advertising company.