Senior AI Software Engineer - Model Evaluation (m/w/x)
Design and implementation of evaluation methodologies for foundation AI models in industrial applications. Requires experience in LLM evaluation and benchmark design, plus strong Python skills. Benefits include 30 days of vacation, hybrid work, and wellness offerings.
Requirements
- Experience with LLM evaluation, benchmark design, dataset curation, and experimental design
- Familiarity with statistical methods for evaluation and experiment design
- Track record of shipping impactful technical work (research, infrastructure, or both)
- Strong Python skills and comfort with ML tooling
- Ability to reason about evaluation measures and their relevance
- Ownership mentality, from problem diagnosis through solution deployment
- Willingness to relocate to Heidelberg or travel regularly
- Understanding of foundation model training (data, scale, architecture effects)
- Experience with large-scale data processing or ML infrastructure
- German language proficiency (helpful for evaluating German capabilities, not required)
- PhD in machine learning, NLP, statistics, or related field (valued but not required)
Tasks
- Design and implement evaluation methodologies
- Select and maintain evaluation datasets
- Develop scoring infrastructure for pre-training
- Optimize evaluation pipelines for speed and reliability
- Build tools for interpreting benchmark results
- Identify and address model capability gaps
- Create or integrate new benchmarks
- Ensure rigorous German language evaluation
- Correlate pre-training metrics with performance outcomes
Work Experience
- approx. 4–6 years
Education
- Doctoral / PhD (valued but not required)
Languages
- German – Basic
Tools & Technologies
- Python
- PyTorch
- LLM evaluation
- ML tooling
- Evaluation frameworks
- Distributed systems
- Foundation model training
- Large-scale data processing
- ML infrastructure
Benefits
Flexible Working
- Flexible working hours
- Hybrid working model
More Vacation Days
- 30 days of paid vacation
Healthcare & Fitness
- Access to fitness & wellness offerings
Mental Health Support
- Mental health support
Retirement Plans
- Subsidized company pension plan
Public Transport Subsidies
- Subsidized Germany-wide transportation ticket
Additional Allowances
- Budget for additional technical equipment
Competitive Pay
- Virtual Stock Option Plan
Company Bike
- Bike Lease
About the Company
Aleph Alpha
Industry
IT
Description
The company develops cutting-edge generative AI solutions with a strong emphasis on sovereignty, ethical development, and societal benefit.