Senior AI Researcher - Pre-training Data (m/f/x)
Shape the scientific methodology for pre-training corpora and co-engineer software for foundation models. The role requires a deep understanding of ML theory and foundation model training dynamics. Benefits include 30 days of vacation, hybrid work, and wellness offerings.
Requirements
- Deep understanding of ML theory, foundation model training dynamics, scaling laws, data-centric AI
- Experience designing/evaluating ML experiments (data composition, curriculum learning, data quality)
- Familiarity with statistical methods for evaluation and experiment design
- Ability to reason about dataset information-theoretic properties and predictive power
- Strong Python skills and comfort with ML tooling and deep learning frameworks
- Willingness to relocate to Heidelberg or travel at least fortnightly
- PhD in machine learning, NLP, or equivalent research experience
- History of contributions to top-tier venues (NeurIPS, ICML, ICLR, ACL)
- Experience training foundation models from scratch and diagnosing data-induced pathologies
- Bonus: German language proficiency for curating/assessing German-language data
Responsibilities
- Shape scientific methodology for pre-training corpora
- Co-engineer software and systems for pre-training
- Conduct theoretical and empirical research on data scaling
- Design targeted ablations across various scales
- Derive and test hypotheses from training dynamics
- Develop algorithms for data quality estimation
- Perform data curation and synthetic data generation
- Contribute to engineering tasks for research support
- Collaborate with engineers and researchers on pipelines
- Write technical reports for internal and external readers
- Present at technical meetings and conferences
- Identify and implement novel data quality approaches
- Iterate on synthetic data generation techniques
- Research advanced curation methods and curriculum learning
- Design rigorous ablation studies for data composition
- Analyze effects of deduplication and scaling laws
- Develop advanced algorithms for data scoring and selection
- Partner with diverse teams to scale research prototypes
- Ensure pre-training distributions support fine-tuning
- Align pre-training with customer needs
Work experience
- approx. 4-6 years
Education
- Doctorate / Ph.D.
Languages
- German (basic knowledge)
Tools & Technologies
- Python
- PyTorch
Benefits
Flexible working
- Flexible working hours
- Hybrid working model
More vacation days
- 30 days of paid vacation
Health & fitness offerings
- Fitness & wellness offerings
Mental health support
- Mental health support
Company pension plan
- Subsidized company pension plan
Public transport tickets
- Subsidized Germany-wide transportation ticket
Other allowances
- Budget for additional technical equipment
Attractive compensation
- Virtual Stock Option Plan
Company bike
- Bike Lease
About the company
Aleph Alpha
Industry
IT
Description
The company develops cutting-edge generative AI solutions with a strong emphasis on sovereignty, ethical development, and societal benefit.