Senior Performance Engineer - Pretraining (m/f/x)
Engineering systems for large-scale foundation model training on massive GPU clusters at a generative AI solutions provider. A deep understanding of the CUDA programming model and a background in distributed systems engineering are required. Equity package and 30 days of paid vacation.
Requirements
- Proficiency in Python and PyTorch
- Engineering background in parallel or distributed systems
- Experience with modern machine learning techniques
- Deep understanding of the CUDA programming model
- Experience with distributed programming APIs
- Experience analyzing profiling traces
- Regular on-site collaboration in Heidelberg
- Contributions to distributed training frameworks
- Familiarity with low-precision training formats
- Understanding of NCCL, NVSHMEM, or CUDA IPC
- Track record of optimizing transformer training
- Experience with NVIDIA Blackwell architecture
Responsibilities
- Engineer systems for large-scale foundation model training
- Maximize hardware utilization on massive GPU clusters
- Eliminate bottlenecks from Python to GPU kernels
- Profile training loops using PyTorch Profiler and NVIDIA Nsight
- Identify system- and kernel-level performance bottlenecks
- Configure and tune composite parallelism strategies
- Optimize load balance and communication-to-computation trade-offs
- Partner with researchers to design hardware-efficient architectures
Professional experience
- approx. 4-6 years
Education
- Bachelor's degree, or
- Master's degree
Languages
- English (business fluent)
Tools & Technologies
- Python
- PyTorch
- CUDA
- NCCL
- MPI
- PyTorch Profiler
- NVIDIA Nsight
- TorchTitan
- Megatron-LM
- DeepSpeed
- MXFP4
- MXFP8
- NVSHMEM
- CUDA IPC
- NVIDIA Blackwell
Benefits
Flexible working
- Flexible working hours
- Hybrid working model
Attractive compensation
- Competitive salary
- Equity package
Extra vacation days
- 30 days of paid vacation
Health & fitness offerings
- Fitness and wellness offerings
Mental health support
- Mental health support
Company bike
- JobRad Bike Lease
Company pension plan
- Subsidized company pension plan
Public transport tickets
- Subsidized Germany-wide transportation ticket
Modern tech equipment
- Budget for technical equipment
About the company
Aleph Alpha
Industry
Research
Description
The company develops cutting-edge generative AI solutions with a strong emphasis on sovereignty, ethical development, and societal benefit.