Aleph Alpha

Senior AI Engineer – Pre-training Data (m/f/x)

Heidelberg
Full-time, with home office, Senior
AI/ML
Data Science

Define and build the systems behind foundation model pre-training data at a European AI leader. The role requires high engineering competence and strong Python skills. Benefits include 30 days of vacation, hybrid work, and fitness offerings.

Requirements

  • Significant research experience (industry or academia)
  • High engineering competence
  • Track record of shipping impactful technical work
  • Strong Python skills
  • Comfort with data engineering and ML infrastructure
  • Experience with deep learning frameworks
  • Experience with workflow orchestration
  • Experience with object storage
  • Experience with columnar data formats
  • Experience with distributed processing
  • Ability to reason about dataset contribution to model training
  • Understanding of dataset relevance for model training
  • Ownership mentality
  • Willingness to relocate to Heidelberg
  • Travel at least fortnightly
  • Experience with large-scale data processing for ML
  • Experience with corpus sourcing
  • Experience with corpus curation
  • Experience with corpus cleaning
  • Experience with corpus deduplication
  • Experience with corpus filtering
  • Familiarity with data quality methods
  • Understanding of foundation model training
  • Understanding of data composition effects on capabilities
  • Understanding of scale effects on capabilities
  • Understanding of mixing-ratio effects on capabilities
  • Experience with web-scale data sourcing
  • Experience with crawl processing
  • Rust proficiency
  • Infrastructure knowledge
  • Experience with Kubernetes
  • Experience with container orchestration
  • Experience with cloud-native ML infrastructure
  • PhD in machine learning, NLP, data engineering, or related field (valued but not required)
  • German language proficiency (bonus, not required)

Responsibilities

  • Define data for model training
  • Build systems for data sourcing and preparation
  • Ensure high-quality data for the training team
  • Work on full stack of data preparation
  • Analyze data quality and corpus value
  • Optimize large-scale data processing pipelines
  • Build tools for data visibility
  • Stay up to date with pre-training data research
  • Design and run data experiments
  • Co-own data pipelines end-to-end
  • Design and maintain data infrastructure
  • Curate and compose data mixtures
  • Balance data domains, languages, and quality
  • Build data quality tooling
  • Develop classifiers and heuristics
  • Monitor pipeline health and data quality
  • Close data gaps
  • Identify and address model weaknesses
  • Collaborate with post-training team
  • Support downstream fine-tuning and deployment
  • Ensure high-quality German-language data
  • Establish data-to-performance signal
  • Maintain data lineage and provenance

Professional experience

  • approx. 4 to 6 years

Education

  • Doctorate / Ph.D. (valued but not required)

Languages

  • German: basic knowledge

Tools & Technologies

  • Python
  • Deep learning frameworks
  • Workflow orchestration
  • Object storage
  • Columnar data formats
  • Distributed processing
  • Kubernetes
  • Container orchestration
  • Cloud-native ML infrastructure
  • Rust
  • Common Crawl
  • WARC pipelines

Benefits

Flexible working

  • Flexible working hours
  • Hybrid working model

More vacation days

  • 30 days of paid vacation

Health & fitness offerings

  • Fitness & wellness offerings

Mental health support

  • Mental health support

Company pension plan

  • Subsidized company pension plan

Public transport tickets

  • Subsidized Germany-wide transportation ticket

Other allowances

  • Budget for additional technical equipment

Attractive compensation

  • Virtual Stock Option Plan

Company bike

  • Bike Lease

Nejo automatically captured this job from the website of the company Aleph Alpha and prepared the information using AI. Despite careful analysis, individual details may be incomplete or inaccurate. Always check all details against the original posting. Content and copyright of the original posting belong to the advertising company.
