The AI Job Search Engine
Technical Product Manager (Cluster Experience) (m/f/x)
Description
In this role, you will focus on enhancing the reliability and performance of GPU clusters for machine learning. You will guide product direction from discovery to adoption while collaborating with engineering teams and conducting in-depth customer research.
Requirements
- 3–5+ years of experience in product management, ML infrastructure/MLOps, distributed systems engineering, or cloud architecture
- Strong technical foundation in computer science, distributed systems, or ML infrastructure
- Hands-on familiarity with ML training, ideally using orchestrators like Slurm, Kubernetes, Ray, or similar systems
- Proven ability to ship technically complex features with multiple engineering teams
- Excellent communicator capable of influencing engineering, research, and customer stakeholders
- Experience with product analytics, data-driven prioritization, and experiment design
- Strong willingness and ability to learn quickly in a fast-evolving ML and infrastructure environment
- Experience working with GPU platforms, InfiniBand/RDMA networking, or HPC systems
- Understanding of modern ML frameworks (PyTorch, DeepSpeed, FSDP, NCCL, etc.)
- Knowledge of ML training efficiency: Goodput, MFU, scheduling, health checks
- Exposure to LLM training, distributed data/ZeRO/FSDP strategies, or transformer inference
- Experience in observability, performance tuning, or reliability engineering
- Customer-facing technical experience (supporting ML or infrastructure workloads)
- Open to strong candidates with backgrounds in ML infrastructure, distributed systems, SRE, or cloud engineering who want to grow into product
Work Experience
3–5 years
Tasks
- Own key tracks in Cluster Experience
- Define product direction from problem discovery to delivery
- Drive cross-functional execution across various teams
- Conduct deep customer research through interviews and analytics
- Identify bottlenecks across hardware, network, and runtime
- Translate ML research ideas into scalable product features
- Shape user interactions with clusters through dashboards and notifications
Languages
English – Business Fluent
Benefits
Flexible Working
- Flexible working arrangements
Other Benefits
- Comprehensive benefits package
Career Advancement
- Opportunities for professional growth
Informal Culture
- Dynamic and collaborative work environment
About the Company
Nebius
Industry
IT
Description
The company is leading a new era in cloud computing to serve the global AI economy by creating tools and resources for real-world challenges.
- Synthflow AI
Technical Product Manager (m/f/x)
Full-time, Remote, Experienced, Berlin
- DeepL
Senior Technical Product Manager - AI Systems (m/f/x)
Full-time, With home office, Senior, Berlin
- Superhuman
Senior Product Manager, Engineering Platform (m/f/x)
Full-time, With home office, Senior, Berlin
- Bioptimus
AI Technical Operations Manager (m/f/x)
Full-time, With home office, Experienced, Berlin
- Cint
Technical Program Manager (m/f/x)
Full-time, With home office, Experienced, Berlin