Senior Machine Learning Engineer - Multimodal Data (m/w/x)
Building multimodal AI data pipelines for agent training at a design platform. Production-grade data pipeline and ML DevOps experience required. Equity packages, flexible leave options.
Requirements
- Strong software engineering skills in Python
- Experience building production-grade data pipelines
- ML DevOps experience
- Practical prompt engineering experience: designing, testing, and refining prompts for LLM/VLM outputs
- Experience with ML data workflows: large-scale data processing and loading, data versioning, and format considerations for training (tokenization, batching, sharding)
- Hands-on experience with data pipelines for large-scale distributed ML training
- Familiarity with annotation tooling
- Familiarity with human-in-the-loop data collection
- Understanding of ML training requirements: what makes "good data" for LLM/VLM fine-tuning, and the ability to anticipate downstream issues
- Experience loading/writing large datasets to/from cloud infrastructure (AWS) and distributed storage systems
- Strong communication skills
- Ability to work with researchers to scope ambiguous problems
- Ability to translate needs into actionable plans
- Collaborative approach
- Comfortable taking ownership
- Comfortable iterating quickly
- Experience with preference data collection for RLHF
- Experience with reward modelling
- Familiarity with multimodal data (image-text pairs, video, design assets)
- Experience building synthetic data generation pipelines using LLMs
- Background in data quality metrics
- Background in monitoring systems
- Contributions to dataset releases or benchmarks in the ML community
Tasks
- Design and build data pipelines for agent training
- Collect, filter, deduplicate, format, and version data
- Build and maintain infrastructure for efficient data loading, storage, and retrieval
- Collaborate with research scientists to translate requirements into data specifications
- Create evaluation datasets and benchmarks with researchers
- Curate task distributions to identify real failure modes
- Develop tooling for dataset construction
- Implement human annotation workflows
- Generate synthetic data
- Collect preference data for RLHF/DPO-style training
- Ensure data quality through validation frameworks
- Monitor for data drift and contamination
- Establish standards for trustworthy and reproducible datasets
- Document datasets thoroughly
- Record provenance, known limitations, intended use cases, and versioning history
- Implement comprehensive test coverage for data pipelines and ML workflows
- Conduct code reviews and refactoring
- Establish engineering best practices
- Identify data bottlenecks and propose solutions
- Contribute to team roadmaps to unblock research velocity
Work Experience
- approx. 4–6 years
Education
- Bachelor's degree OR
- Master's degree
Languages
- English – Business Fluent
Tools & Technologies
- Python
- ML DevOps
- LLM
- VLM
- Ray
- AWS
- RLHF
Benefits
Competitive Pay
- Equity packages
Generous Parental Leave
- Inclusive parental leave policy
Additional Allowances
- Annual Vibe & Thrive allowance
Workation & Sabbatical
- Flexible leave options
Not a perfect match?
- Becton, Dickinson and Company: Senior Machine Learning Engineer (m/w/x), Full-time/Part-time, With Homeoffice, Senior, Wien, from 52,136 / year
- Canva: Senior Backend Engineer - Research Enablement (m/w/x), Full-time, Remote, Senior, Wien
- Canva: Machine Learning Engineering Manager - Evaluations (m/w/x), Full-time, With Homeoffice, Management, Wien
- Canva: Senior Research Scientist - Reinforcement Learning, MoEs (m/w/x), Full-time, With Homeoffice, Senior, Wien
- RHI Magnesita GmbH: Senior Data Scientist (m/w/x), Full-time, With Homeoffice, Senior, Wien, from 65,000 / year
About the Company
Canva
Industry
IT
Description
The company is a fast-growing platform that redefines how the world experiences design.