Nebius
Posted last month

Technical Product Manager (Cluster Experience) (m/f/x)

Berlin
Full-time, hybrid (with home office), experienced
AI/ML
Data Science

Description

In this role, you will focus on enhancing the reliability and performance of GPU clusters for machine learning. You will guide product direction from discovery to adoption while collaborating with engineering teams and conducting in-depth customer research.

Requirements

  • 3–5+ years of experience in product management, ML infrastructure/MLOps, distributed systems engineering, or cloud architecture
  • Strong technical foundation in computer science, distributed systems, or ML infrastructure
  • Hands-on familiarity with ML training, ideally using orchestrators like Slurm, Kubernetes, Ray, or similar systems
  • Proven ability to ship technically complex features with multiple engineering teams
  • Excellent communicator capable of influencing engineering, research, and customer stakeholders
  • Experience with product analytics, data-driven prioritization, and experiment design
  • Strong willingness and ability to learn quickly in a fast-evolving ML and infrastructure environment
  • Experience working with GPU platforms, Infiniband/RDMA networking, or HPC systems
  • Understanding of modern ML frameworks (PyTorch, DeepSpeed, FSDP, NCCL, etc.)
  • Knowledge of ML training efficiency: Goodput, MFU, scheduling, health checks
  • Exposure to LLM training, distributed data/ZeRO/FSDP strategies, or transformer inference
  • Experience in observability, performance tuning, or reliability engineering
  • Customer-facing technical experience (supporting ML or infrastructure workloads)
  • Strong candidates with backgrounds in ML infrastructure, distributed systems, SRE, or cloud engineering who want to grow into product are also welcome

Work experience

3–5 years

Responsibilities

  • Own key tracks in Cluster Experience
  • Define product direction from problem discovery to delivery
  • Drive cross-functional execution across various teams
  • Conduct deep customer research through interviews and analytics
  • Identify bottlenecks across hardware, network, and runtime
  • Translate ML research ideas into scalable product features
  • Shape user interactions with clusters through dashboards and notifications

Tools & Technologies

Slurm, Kubernetes, Ray, PyTorch, DeepSpeed, FSDP, NCCL

Languages

English: business fluent

Benefits

Flexible working

  • Flexible working arrangements

Other benefits

  • Comprehensive benefits package

Career and professional development

  • Opportunities for professional growth

Relaxed company culture

  • Dynamic and collaborative work environment
The most recent version of the original job posting is available on the Nebius website. Nejo captured this job automatically from the Nebius company website and prepared the information with the help of AI. Despite careful analysis, individual details may be incomplete or inaccurate. Please always check all details against the original posting. The content and copyright of the original posting belong to the advertising company.
Not quite the right fit?
100+ similar jobs in Berlin
  • Synthflow AI

    Technical Product Manager (m/f/x)

    Full-time, remote, experienced
    Berlin
  • DeepL

    Senior Technical Product Manager - AI Systems (m/f/x)

    Full-time, hybrid (with home office), senior
    Berlin
  • Superhuman

    Senior Product Manager, Engineering Platform (m/f/x)

    Full-time, hybrid (with home office), senior
    Berlin
  • Bioptimus

    AI Technical Operations Manager (m/f/x)

    Full-time, hybrid (with home office), experienced
    Berlin
  • Cint

    Technical Program Manager (m/f/x)

    Full-time, hybrid (with home office), experienced
    Berlin