Aleph Alpha

Senior AI Engineer – Pre-training Data (m/f/x)

Heidelberg
Full-time, with home office, Senior
AI/ML

Defining and preparing large-scale data for foundation model pre-training in finance, manufacturing, and public administration. Strong Python, data engineering, and ML infrastructure experience required. 30 days vacation, hybrid work, flexible hours.

Requirements

  • Track record of shipping impactful technical work
  • Strong Python skills
  • Comfort with data engineering and ML infrastructure
  • Experience with deep learning frameworks
  • Experience with workflow orchestration
  • Experience with object storage
  • Experience with columnar data formats
  • Experience with distributed processing
  • Ability to reason about dataset contribution to model training
  • Ownership mentality
  • Willingness to relocate to Heidelberg
  • Travel at least fortnightly
  • Experience with large-scale data processing for ML
  • Experience with corpus sourcing, curation, cleaning, deduplication, and filtering
  • Familiarity with data quality methods
  • Understanding of foundation model training
  • Experience with web-scale data sourcing
  • Experience with crawl processing
  • Rust proficiency
  • Infrastructure knowledge
  • Experience with Kubernetes
  • Experience with container orchestration
  • Experience with cloud-native ML infrastructure
  • PhD in machine learning, NLP, data engineering, or related field (valued but not required)
  • German language proficiency (helpful but not required)

Responsibilities

  • Define data for model inputs
  • Build data sourcing and preparation systems
  • Ensure high-quality data for training
  • Analyze data quality and corpus value
  • Optimize large-scale data processing pipelines
  • Develop tools for data visibility
  • Stay updated on pre-training data research
  • Design and run data experiments
  • Co-own end-to-end data pipelines
  • Design and maintain data infrastructure
  • Curate and iterate on data mixtures
  • Balance data domains, languages, and quality
  • Build data quality classifiers and heuristics
  • Monitor pipeline health and data metrics
  • Identify and address data coverage gaps
  • Collaborate with post-training teams
  • Ensure German-language data coverage
  • Establish data-to-performance signals
  • Maintain data lineage and provenance

Work experience

  • approx. 4 to 6 years

Education

  • Doctorate / Ph.D. (valued but not required)

Languages

  • German: basic knowledge

Tools & Technologies

  • Python
  • deep learning frameworks
  • workflow orchestration
  • object storage
  • columnar data formats
  • distributed processing
  • Kubernetes
  • container orchestration
  • cloud-native ML infrastructure
  • Common Crawl
  • WARC pipelines
  • Rust

Benefits

Flexible working

  • Flexible working hours
  • Hybrid working model

More vacation days

  • 30 days of paid vacation

Health & fitness offerings

  • Fitness & wellness offerings

Mental health promotion

  • Mental health support

Company pension plan

  • Subsidized company pension plan

Public transport tickets

  • Subsidized Germany-wide transportation ticket

Other allowances

  • Budget for additional technical equipment

Attractive compensation

  • Virtual Stock Option Plan

Company bike

  • Bike Lease

The original posting, in its most recent version, is on the Aleph Alpha website. Nejo captured this job automatically from the website of the company Aleph Alpha and prepared the information on Nejo with the help of AI. Despite careful analysis, individual details may be incomplete or inaccurate. Always check all details against the original posting. Content and copyright of the original posting belong to the advertising company.
