Canva
Senior Machine Learning Engineer - Multimodal Data (m/w/x)
Vienna
Full-time · With Home Office · Senior
AI/ML
Nejo AI Summary
Building multimodal AI data pipelines for agent training at a design platform. Production-grade data pipeline and ML DevOps experience required. Equity packages, flexible leave options.
Requirements
- Strong software engineering skills in Python
- Experience building production-grade data pipelines
- ML DevOps experience
- Practical prompt engineering experience
- Designing, testing, refining prompts for LLM/VLM outputs
- Experience with ML data workflows
- Large-scale data processing and loading
- Data versioning experience
- Format considerations for training (tokenization, batching, sharding)
- Hands-on experience with data pipelines for large-scale distributed ML training
- Familiarity with annotation tooling
- Familiarity with human-in-the-loop data collection
- Understanding of ML training requirements
- Knowledge of "good data" for LLM/VLM fine-tuning
- Anticipation of downstream issues
- Experience loading/writing large datasets to/from cloud infrastructure (AWS)
- Experience loading/writing large datasets to/from distributed storage systems
- Strong communication skills
- Ability to work with researchers to scope ambiguous problems
- Ability to translate needs into actionable plans
- Collaborative approach
- Comfortable taking ownership
- Comfortable iterating quickly
- Experience with preference data collection for RLHF
- Experience with reward modelling
- Familiarity with multimodal data (image-text pairs, video, design assets)
- Experience building synthetic data generation pipelines using LLMs
- Background in data quality metrics
- Background in monitoring systems
- Contributions to dataset releases or benchmarks in ML community
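The formatting considerations named above (tokenization, batching, sharding) can be sketched minimally. The JSONL record layout, shard naming, and shard size below are illustrative assumptions for this sketch, not details from the posting:

```python
import itertools
import json
from pathlib import Path

def shard_records(records, shard_size, out_dir):
    """Write an iterable of JSON-serializable records into fixed-size
    JSONL shards (shard-00000.jsonl, shard-00001.jsonl, ...).

    shard_size and the JSONL format are assumptions for illustration;
    real training pipelines often use formats like WebDataset or Parquet.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    it = iter(records)
    paths = []
    for i in itertools.count():
        # Take the next fixed-size batch; stop when the iterator is drained.
        batch = list(itertools.islice(it, shard_size))
        if not batch:
            break
        path = out_dir / f"shard-{i:05d}.jsonl"
        path.write_text("\n".join(json.dumps(r) for r in batch))
        paths.append(path)
    return paths
```

Fixed-size shards like these let distributed training workers stream disjoint file subsets without coordinating record offsets.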
Tasks
- Design and build data pipelines for agent training
- Collect, filter, deduplicate, format, and version data
- Build and maintain infrastructure for efficient data loading, storage, and retrieval
- Collaborate with research scientists to translate requirements into data specifications
- Create evaluation datasets and benchmarks with researchers
- Curate task distributions to identify real failure modes
- Develop tooling for dataset construction
- Implement human annotation workflows
- Generate synthetic data
- Collect preference data for RLHF/DPO-style training
- Ensure data quality through validation frameworks
- Monitor for data drift and contamination
- Establish standards for trustworthy and reproducible datasets
- Document datasets thoroughly
- Record provenance, known limitations, intended use cases, and versioning history
- Implement comprehensive test coverage for data pipelines and ML workflows
- Conduct code reviews and refactoring
- Establish engineering best practices
- Identify data bottlenecks and propose solutions
- Contribute to team roadmaps to unblock research velocity
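One of the tasks above, deduplication, can be sketched as an exact-match pass over content hashes. The record shape and `key_fields` parameter are assumptions for this sketch; production pipelines typically layer near-duplicate detection (e.g. MinHash) on top of exact matching:

```python
import hashlib
import json

def dedupe_exact(records, key_fields=("text",)):
    """Drop exact duplicates by hashing the selected fields of each record.

    records: iterable of dicts. key_fields: which fields define identity
    (an illustrative assumption; choose per dataset).
    Returns the kept records in first-seen order.
    """
    seen = set()
    kept = []
    for rec in records:
        # Canonicalize the key fields so field order cannot affect the hash.
        key = json.dumps([rec.get(f) for f in key_fields], sort_keys=True)
        digest = hashlib.sha256(key.encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(rec)
    return kept
```

Hashing only the identity-defining fields keeps the pass memory-light even on large corpora, since only digests are retained.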
Work Experience
- approx. 4 - 6 years
Education
- Bachelor's degree, OR
- Master's degree
Languages
- English – Business Fluent
Tools & Technologies
- Python
- ML DevOps
- LLM
- VLM
- Ray
- AWS
- RLHF
Benefits
Competitive Pay
- Equity packages
Generous Parental Leave
- Inclusive parental leave policy
Additional Allowances
- Annual Vibe & Thrive allowance
Workation & Sabbatical
- Flexible leave options
The original job posting, in its most current version, can be found here. Nejo automatically captured this job from the website of Canva and processed the information with the help of AI. Despite careful analysis, some information may be incomplete or inaccurate, so please always verify all details against the original posting. Content and copyright of the original posting belong to the advertising company.
About the Company
Canva
Industry
IT
Description
The company is a fast-growing platform that redefines how the world experiences design.