Senior AI Engineer – Pre-training Data (m/f/x)
Define and prepare large-scale data for pre-training foundation models for finance, manufacturing, and public administration. Requires strong Python, data engineering, and ML infrastructure experience. Offers 30 days of vacation, hybrid work, and flexible hours.
Requirements
- Track record of shipping impactful technical work
- Strong Python skills
- Comfort with data engineering and ML infrastructure
- Experience with deep learning frameworks
- Experience with workflow orchestration
- Experience with object storage
- Experience with columnar data formats
- Experience with distributed processing
- Ability to reason about how a dataset contributes to model training
- Ownership mentality
- Willingness to relocate to Heidelberg
- Willingness to travel at least fortnightly
- Experience with large-scale data processing for ML
- Experience with corpus sourcing, curation, cleaning, deduplication, and filtering
- Familiarity with data quality methods
- Understanding of foundation model training
- Experience with web-scale data sourcing
- Experience with crawl processing
- Rust proficiency
- Infrastructure knowledge
- Experience with Kubernetes
- Experience with container orchestration
- Experience with cloud-native ML infrastructure
- PhD in machine learning, NLP, data engineering, or related field (valued but not required)
- German language proficiency (helpful but not required)
Responsibilities
- Define what data goes into the models
- Build data sourcing and preparation systems
- Ensure high-quality data for training
- Analyze data quality and corpus value
- Optimize large-scale data processing pipelines
- Develop tools that provide visibility into the data
- Stay up to date on pre-training data research
- Design and run data experiments
- Co-own end-to-end data pipelines
- Design and maintain data infrastructure
- Curate and iterate on data mixtures
- Balance data domains, languages, and quality
- Build data quality classifiers and heuristics
- Monitor pipeline health and data metrics
- Identify and address data coverage gaps
- Collaborate with post-training teams
- Ensure German-language data coverage
- Establish signals that link data choices to model performance
- Maintain data lineage and provenance
Work experience
- Approx. 4–6 years
Education
- Ph.D. (valued but not required)
Languages
- German – basic proficiency (helpful but not required)
Tools & Technologies
- Python
- Deep learning frameworks
- Workflow orchestration
- Object storage
- Columnar data formats
- Distributed processing
- Kubernetes
- Container orchestration
- Cloud-native ML infrastructure
- Common Crawl
- WARC pipelines
- Rust
Benefits
Flexible work
- Flexible working hours
- Hybrid working model
Extra vacation days
- 30 days of paid vacation
Health & fitness offerings
- Fitness & wellness offerings
Mental health support
- Mental health support
Company pension plan
- Subsidized company pension plan
Public transport tickets
- Subsidized Germany-wide transportation ticket
Other allowances
- Budget for additional technical equipment
Attractive compensation
- Virtual Stock Option Plan
Company bike
- Bike leasing
About the company
Aleph Alpha
Industry
IT
Description
The company develops cutting-edge generative AI solutions with a strong emphasis on sovereignty, ethical development, and societal benefit.