Senior AI Software Engineer - Model Evaluation (m/f/x)
Evaluate foundation models for clients in finance, manufacturing, and public administration. LLM evaluation, benchmark design, and Python skills are required. Benefits include 30 days of vacation and a subsidized transport ticket.
Requirements
- Experience with LLM evaluation, benchmark design, dataset curation, and experimental design
- Familiarity with statistical methods for evaluation and experiment design
- Track record of shipping impactful technical work (research, infrastructure, or both)
- Strong Python skills and comfort with ML tooling
- Ability to reason about evaluation measurements and their relevance
- Ownership mentality: seeing problems through from diagnosis to solution to deployment
- Willingness to relocate to Heidelberg or travel regularly
- Understanding of foundation model training (data, scale, architecture effects)
- Experience with large-scale data processing or ML infrastructure
- German language proficiency (helpful for evaluating the models' German-language capabilities; not required)
- PhD in machine learning, NLP, statistics, or related field (valued but not required)
Tasks
- Define evaluation criteria for models
- Build systems to measure model performance
- Ensure the training team has reliable evaluation signals
- Select and implement evaluation benchmarks
- Maintain dataset curation and scoring infrastructure
- Develop and optimize evaluation pipelines
- Ensure pipeline speed, reliability, and reproducibility
- Design how benchmark results are aggregated
- Create tools that make results interpretable
- Identify model capability gaps
- Integrate benchmarks for measuring progress
- Rigorously evaluate the models' German-language capabilities
- Correlate pre-training metrics with performance
Work Experience
- Approx. 4-6 years
Education
- Doctoral / PhD
Languages
- German – Basic
Tools & Technologies
- LLM
- Python
- PyTorch
- ML tooling
- Evaluation frameworks
- Distributed systems
Benefits
More Vacation Days
- 30 days of paid vacation
Healthcare & Fitness
- Access to fitness & wellness offerings via Wellhub
Mental Health Support
- Mental health support through nilo.health
Retirement Plans
- Substantially subsidized company pension plan
Public Transport Subsidies
- Subsidized Germany-wide transportation ticket
Additional Allowances
- Budget for additional technical equipment
Flexible Working
- Flexible working hours
- Hybrid working model
Competitive Pay
- Virtual Stock Option Plan
Company Bike
- JobRad® Bike Lease
About the Company
Aleph Alpha
Industry
IT
Description
The company develops cutting-edge generative AI solutions with a strong emphasis on sovereignty, ethical development, and societal benefit.