Aleph Alpha

Senior AI Engineer – Pre-training Data (m/w/x)

Heidelberg
Full-time · With Home Office · Senior
AI/ML

Defining and preparing large-scale data for foundation model pre-training in finance, manufacturing, and public administration. Strong Python, data engineering, and ML infrastructure experience required. 30 days of vacation, hybrid work, flexible hours.

Requirements

  • Track record of shipping impactful technical work
  • Strong Python skills
  • Comfort with data engineering and ML infrastructure
  • Experience with deep learning frameworks
  • Experience with workflow orchestration
  • Experience with object storage
  • Experience with columnar data formats
  • Experience with distributed processing
  • Ability to reason about dataset contribution to model training
  • Ownership mentality
  • Willingness to relocate to Heidelberg
  • Willingness to travel at least fortnightly
  • Experience with large-scale data processing for ML
  • Experience with corpus sourcing, curation, cleaning, deduplication, and filtering
  • Familiarity with data quality methods
  • Understanding of foundation model training
  • Experience with web-scale data sourcing
  • Experience with crawl processing
  • Rust proficiency
  • Infrastructure knowledge
  • Experience with Kubernetes
  • Experience with container orchestration
  • Experience with cloud-native ML infrastructure
  • PhD in machine learning, NLP, data engineering, or related field (valued but not required)
  • German language proficiency (helpful but not required)

Tasks

  • Define which data goes into model pre-training
  • Build data sourcing and preparation systems
  • Ensure high-quality data for training
  • Analyze data quality and corpus value
  • Optimize large-scale data processing pipelines
  • Develop tools for data visibility
  • Stay updated on pre-training data research
  • Design and run data experiments
  • Co-own end-to-end data pipelines
  • Design and maintain data infrastructure
  • Curate and iterate on data mixtures
  • Balance data domains, languages, and quality
  • Build data quality classifiers and heuristics
  • Monitor pipeline health and data metrics
  • Identify and address data coverage gaps
  • Collaborate with post-training teams
  • Ensure German-language data coverage
  • Establish signals linking training data to model performance
  • Maintain data lineage and provenance

Work Experience

  • Approx. 4–6 years

Education

  • Doctoral / PhD (valued but not required)

Languages

  • German (basic; helpful but not required)

Tools & Technologies

  • Python
  • deep learning frameworks
  • workflow orchestration
  • object storage
  • columnar data formats
  • distributed processing
  • Kubernetes
  • container orchestration
  • cloud-native ML infrastructure
  • Common Crawl
  • WARC pipelines
  • Rust

Benefits

Flexible Working

  • Flexible working hours
  • Hybrid working model

More Vacation Days

  • 30 days of paid vacation

Healthcare & Fitness

  • Fitness & wellness offerings

Mental Health Support

  • Mental health support

Retirement Plans

  • Subsidized company pension plan

Public Transport Subsidies

  • Subsidized Germany-wide transportation ticket

Additional Allowances

  • Budget for additional technical equipment

Competitive Pay

  • Virtual Stock Option Plan

Company Bike

  • Bike Lease

Nejo automatically captured this job from the website of Aleph Alpha and processed the information with the help of AI. Despite careful analysis, some information may be incomplete or inaccurate, so please verify all details in the original posting. Content and copyrights of the original posting belong to the advertising company.