Senior Machine Learning Engineer - Multimodal Data (m/f/x)
Building multimodal AI data pipelines for agent training at a design platform. Production-grade data pipeline and ML DevOps experience required. Equity packages, flexible leave options.
Requirements
- Strong software engineering skills in Python
- Experience building production-grade data pipelines
- ML DevOps experience
- Practical prompt engineering experience: designing, testing, and refining prompts for LLM/VLM outputs
- Experience with ML data workflows: large-scale data processing and loading, data versioning, and format considerations for training (tokenization, batching, sharding)
- Hands-on experience with data pipelines for large-scale distributed ML training
- Familiarity with annotation tooling
- Familiarity with human-in-the-loop data collection
- Understanding of ML training requirements, including what makes "good data" for LLM/VLM fine-tuning and how to anticipate downstream issues
- Experience loading and writing large datasets to and from cloud infrastructure (AWS) and distributed storage systems
- Strong communication skills, including the ability to work with researchers to scope ambiguous problems and translate their needs into actionable plans
- Collaborative approach; comfortable taking ownership and iterating quickly
- Experience with preference data collection for RLHF
- Experience with reward modelling
- Familiarity with multimodal data (image-text pairs, video, design assets)
- Experience building synthetic data generation pipelines using LLMs
- Background in data quality metrics and monitoring systems
- Contributions to dataset releases or benchmarks in the ML community
Responsibilities
- Design and build data pipelines for agent training
- Collect, filter, deduplicate, format, and version data
- Build and maintain infrastructure for efficient data loading, storage, and retrieval
- Collaborate with research scientists to translate requirements into data specifications
- Create evaluation datasets and benchmarks with researchers
- Curate task distributions to identify real failure modes
- Develop tooling for dataset construction
- Implement human annotation workflows
- Generate synthetic data
- Collect preference data for RLHF/DPO-style training
- Ensure data quality through validation frameworks
- Monitor for data drift and contamination
- Establish standards for trustworthy and reproducible datasets
- Document datasets thoroughly, recording provenance, known limitations, intended use cases, and versioning history
- Implement comprehensive test coverage for data pipelines and ML workflows
- Conduct code reviews and refactoring
- Establish engineering best practices
- Identify data bottlenecks and propose solutions
- Contribute to team roadmaps to unblock research velocity
Work experience
- Approx. 4 to 6 years
Education
- Bachelor's or Master's degree
Languages
- English: business fluent
Tools & Technologies
- Python
- ML DevOps
- LLM
- VLM
- Ray
- AWS
- RLHF
Benefits
Attractive compensation
- Equity packages
Generous parental leave
- Inclusive parental leave policy
Other allowances
- Annual Vibe & Thrive allowance
Workation & Sabbatical
- Flexible leave options
About the company
Canva
Industry
IT
Description
The company is a fast-growing platform that redefines how the world experiences design.