The AI search engine for jobs
Technical Product Manager (Cluster Experience) (m/f/x)
Description
In this role, you will focus on enhancing the reliability and performance of GPU clusters for machine learning. You will guide product direction from discovery to adoption while collaborating with engineering teams and conducting in-depth customer research.
Requirements
- 3–5+ years of experience in product management, ML infrastructure/MLOps, distributed systems engineering, or cloud architecture
- Strong technical foundation in computer science, distributed systems, or ML infrastructure
- Hands-on familiarity with ML training, ideally using orchestrators such as Slurm, Kubernetes, or Ray
- Proven ability to ship technically complex features with multiple engineering teams
- Excellent communication skills and the ability to influence engineering, research, and customer stakeholders
- Experience with product analytics, data-driven prioritization, and experiment design
- Strong willingness and ability to learn quickly in a fast-evolving ML and infrastructure environment
- Experience working with GPU platforms, InfiniBand/RDMA networking, or HPC systems
- Understanding of modern ML frameworks and libraries (PyTorch, DeepSpeed, FSDP, NCCL, etc.)
- Knowledge of ML training efficiency: goodput, MFU, scheduling, health checks
- Exposure to LLM training, distributed data/ZeRO/FSDP strategies, or transformer inference
- Experience in observability, performance tuning, or reliability engineering
- Customer-facing technical experience (supporting ML or infrastructure workloads)
- Strong candidates with backgrounds in ML infrastructure, distributed systems, SRE, or cloud engineering who want to grow into product are also considered
Work experience
3–5 years
Responsibilities
- Own key tracks in Cluster Experience
- Define product direction from problem discovery to delivery
- Drive cross-functional execution across multiple teams
- Conduct deep customer research through interviews and analytics
- Identify bottlenecks across hardware, network, and runtime
- Translate ML research ideas into scalable product features
- Shape how users interact with clusters through dashboards and notifications
Languages
English – business fluent
Benefits
Flexible working
- Flexible working arrangements
Other benefits
- Comprehensive benefits package
Career and professional development
- Opportunities for professional growth
Relaxed company culture
- Dynamic and collaborative work environment
About the company
Nebius
Industry
IT
Description
The company is leading a new era in cloud computing to serve the global AI economy by creating tools and resources for real-world challenges.