Senior ML Engineer - Token Factory (m/f/x)
Optimize LLM inference performance and implement novel speculative decoding architectures at a cloud computing provider serving the global AI economy. Experience profiling GPU workloads and an understanding of the GPU memory hierarchy are required. Flexible working arrangements are offered.
Requirements
- Understanding of machine learning foundations
- Experience profiling GPU workloads
- Understanding of GPU memory hierarchy
- Familiarity with LLM architectures
- Understanding of neural network training
- Strong software engineering skills
- Experience with deep learning frameworks
- Proficiency in CI/CD and versioning
- Strong communication and leadership abilities
- Experience with open-source inference engines
- Experience with kernel languages
- Track record of delivering products
- Experience developing large distributed systems
- Open-source projects showcasing engineering prowess
- Excellent command of English language
Tasks
- Identify LLM inference bottlenecks
- Drive production speedups
- Maximize performance for LLM architectures
- Support and optimize inference engines
- Implement novel speculative decoding architectures
- Optimize dense and MoE components
- Contribute to open-source inference engines
- Design low-precision training pipelines
- Productionize FP8 and NVFP4 inference
- Improve throughput and cost-efficiency
Work Experience
- approx. 4 - 6 years
Education
- Bachelor's or Master's degree
Languages
- English – Business Fluent
Tools & Technologies
- Nsight
- PyTorch profiler
- Python
- CI/CD
- vLLM
- SGLang
- TensorRT-LLM
- Triton
- CuTe
- CUTLASS
- CUDA
Benefits
Flexible Working
- Flexible working arrangements
Competitive Pay
- Competitive salary
Other Benefits
- Comprehensive benefits package
Career Advancement
- Professional growth opportunities
Informal Culture
- Dynamic and collaborative work environment
About the Company
Nebius
Industry
IT
Description
The company is leading a new era in cloud computing to serve the global AI economy by creating tools and resources for real-world challenges.