Aleph Alpha

Senior Performance Engineer - Pretraining (m/w/x)

Heidelberg
Full-time · With Home Office · Senior
AI/ML
Data Science

Engineer systems for large-scale foundation model training on massive GPU clusters at a generative AI solutions provider. The role requires a deep understanding of the CUDA programming model and a background in distributed systems engineering. Benefits include an equity package and 30 days of paid vacation.

Requirements

  • Proficiency in Python and PyTorch
  • Engineering background in parallel or distributed systems
  • Experience with modern machine learning techniques
  • Deep understanding of the CUDA programming model
  • Experience in distributed programming with communication APIs (e.g., NCCL, MPI)
  • Experience analyzing profiling traces
  • Willingness to collaborate regularly on-site in Heidelberg
  • Contributions to distributed training frameworks (e.g., TorchTitan, Megatron-LM, DeepSpeed)
  • Familiarity with low-precision training formats (e.g., MXFP8, MXFP4)
  • Understanding of NCCL, NVSHMEM, or CUDA IPC
  • Track record of optimizing transformer training
  • Experience with the NVIDIA Blackwell architecture

Tasks

  • Engineer systems for large-scale foundation model training
  • Maximize hardware utilization on massive GPU clusters
  • Eliminate bottlenecks across the stack, from Python code down to GPU kernels
  • Profile training loops using PyTorch Profiler and NVIDIA Nsight
  • Identify system- and kernel-level performance bottlenecks
  • Configure and tune composite parallelism strategies
  • Optimize load balance and communication-to-computation trade-offs
  • Partner with researchers to design hardware-efficient model architectures

Work Experience

  • Approximately 4 to 6 years

Education

  • Bachelor's or Master's degree

Languages

  • English (business fluent)

Tools & Technologies

  • Python
  • PyTorch
  • CUDA
  • NCCL
  • MPI
  • PyTorch Profiler
  • NVIDIA Nsight
  • TorchTitan
  • Megatron-LM
  • DeepSpeed
  • MXFP4
  • MXFP8
  • NVSHMEM
  • CUDA IPC
  • NVIDIA Blackwell

Benefits

Flexible Working

  • Flexible working hours
  • Hybrid working model

Competitive Pay

  • Competitive salary
  • Equity package

More Vacation Days

  • 30 days of paid vacation

Healthcare & Fitness

  • Fitness and wellness offerings

Mental Health Support

  • Mental health support

Company Bike

  • JobRad Bike Lease

Retirement Plans

  • Subsidized company pension plan

Public Transport Subsidies

  • Subsidized Germany-wide public transport ticket

Modern Equipment

  • Budget for technical equipment

Nejo automatically captured this job from the website of Aleph Alpha and processed the information with the help of AI. Despite careful analysis, some information may be incomplete or inaccurate, so always verify all details in the original posting. Content and copyrights of the original posting belong to the advertising company.

Similar jobs

  • Aleph Alpha

    Senior AI Researcher - Reinforcement Learning (m/w/x)

    Full-time · With Home Office · Senior
    Heidelberg
  • SAP

    Principal Machine Learning Expert / Development Architect (m/w/x)

    Full-time · With Home Office · Senior
    Walldorf
  • accredia placement GmbH

    MLOps Engineer (m/w/x)

    Full-time · Remote · Experienced
    Ludwigshafen am Rhein
  • Buhl Data Service GmbH

    Senior AI / Data Science Engineer (m/w/x)

    Full-time · With Home Office · Senior
    Mannheim
  • botario GmbH

    Senior Python Engineer - Voice AI Platform (m/w/x)

    Full-time · With Home Office · Senior
    Mainz, Berlin, München, Mannheim, Bremen
