Senior Performance Engineer - Pretraining (m/w/x)
Engineer systems for large-scale foundation model training on massive GPU clusters at a generative AI solutions provider. The role requires a deep understanding of the CUDA programming model and an engineering background in distributed systems. The offer includes an equity package and 30 days of paid vacation.
Requirements
- Proficiency in Python and PyTorch
- Engineering background in parallel or distributed systems
- Experience with modern machine learning techniques
- Deep understanding of the CUDA programming model
- Experience in distributed programming with APIs such as NCCL or MPI
- Experience analyzing profiling traces
- Regular on-site collaboration in Heidelberg
- Contributions to distributed training frameworks
- Familiarity with low-precision training formats
- Understanding of NCCL, NVSHMEM, or CUDA IPC
- Track record of optimizing transformer training
- Experience with NVIDIA Blackwell architecture
Tasks
- Engineer systems for large-scale foundation model training
- Maximize hardware utilization on massive GPU clusters
- Eliminate bottlenecks from Python to GPU kernels
- Profile training loops using PyTorch Profiler and NVIDIA Nsight
- Identify system- and kernel-level performance bottlenecks
- Configure and tune composite parallelism strategies
- Optimize load balance and communication-to-computation trade-offs
- Partner with researchers to design hardware-efficient architectures
Work Experience
- approx. 4 - 6 years
Education
- Bachelor's degree OR
- Master's degree
Languages
- English – Business Fluent
Tools & Technologies
- Python
- PyTorch
- CUDA
- NCCL
- MPI
- PyTorch Profiler
- NVIDIA Nsight
- TorchTitan
- Megatron-LM
- DeepSpeed
- MXFP4
- MXFP8
- NVSHMEM
- CUDA IPC
- NVIDIA Blackwell
Benefits
Flexible Working
- Flexible working hours
- Hybrid working model
Competitive Pay
- Competitive salary
- Equity package
More Vacation Days
- 30 days of paid vacation
Healthcare & Fitness
- Fitness and wellness offerings
Mental Health Support
- Mental health support
Company Bike
- JobRad Bike Lease
Retirement Plans
- Subsidized company pension plan
Public Transport Subsidies
- Subsidized Germany-wide transportation ticket
Modern Equipment
- Budget for technical equipment
About the Company
Aleph Alpha
Industry
Research
Description
The company develops cutting-edge generative AI solutions with a strong emphasis on sovereignty, ethical development, and societal benefit.