Aleph Alpha

Senior AI Researcher - Pre-training Data (m/w/x)

Heidelberg
Full-time, With Home Office, Senior
AI/ML

This role shapes the scientific methodology for pre-training corpora and co-engineers software for foundation models. It requires a deep understanding of ML theory and foundation model training dynamics. Benefits include 30 days of vacation, hybrid work, and wellness offerings.

Requirements

  • Deep understanding of ML theory, foundation model training dynamics, scaling laws, data-centric AI
  • Experience designing/evaluating ML experiments (data composition, curriculum learning, data quality)
  • Familiarity with statistical methods for evaluation and experiment design
  • Ability to reason about dataset information-theoretic properties and predictive power
  • Strong Python skills and comfort with ML tooling and deep learning frameworks
  • Willingness to relocate to Heidelberg or travel at least fortnightly
  • PhD in machine learning, NLP, or equivalent research experience
  • History of contributions to top-tier venues (NeurIPS, ICML, ICLR, ACL)
  • Experience training foundation models from scratch and diagnosing data-induced pathologies
  • Bonus: German language proficiency for curating/assessing German-language data

Tasks

  • Shape scientific methodology for pre-training corpora
  • Co-engineer software and systems for pre-training
  • Conduct theoretical and empirical research on data scaling
  • Design targeted ablations across various scales
  • Derive and test hypotheses from training dynamics
  • Develop algorithms for data quality estimation
  • Perform data curation and synthetic data generation
  • Contribute to engineering tasks for research support
  • Collaborate with engineers and researchers on pipelines
  • Write technical reports for internal and external readers
  • Present at technical meetings and conferences
  • Identify and implement novel data quality approaches
  • Iterate on synthetic data generation techniques
  • Research advanced curation methods and curriculum learning
  • Design rigorous ablation studies for data composition
  • Analyze effects of deduplication and scaling laws
  • Develop advanced algorithms for data scoring and selection
  • Partner with diverse teams to scale research prototypes
  • Ensure pre-training distributions support fine-tuning
  • Align pre-training with customer needs

Work Experience

  • approx. 4 - 6 years

Education

  • Doctoral / PhD

Languages

  • German: Basic

Tools & Technologies

  • Python
  • PyTorch

Benefits

Flexible Working

  • Flexible working hours
  • Hybrid working model

More Vacation Days

  • 30 days of paid vacation

Healthcare & Fitness

  • Fitness & wellness offerings

Mental Health Support

  • Mental health support

Retirement Plans

  • Subsidized company pension plan

Public Transport Subsidies

  • Subsidized Germany-wide transportation ticket

Additional Allowances

  • Budget for additional technical equipment

Competitive Pay

  • Virtual Stock Option Plan

Company Bike

  • Bike Lease

Nejo automatically captured this job from the website of Aleph Alpha and processed the information with the help of AI. Despite careful analysis, some information may be incomplete or inaccurate, so always verify all details in the original posting. Content and copyrights of the original posting belong to the advertising company.
