
Canva

Senior Machine Learning Engineer - Multimodal Data (m/w/x)

Vienna
Full-time · With Home Office · Senior
AI/ML

Building multimodal AI data pipelines for agent training at a design platform. Production-grade data pipeline and ML DevOps experience required. Equity packages, flexible leave options.

Requirements

  • Strong software engineering skills in Python
  • Experience building production-grade data pipelines
  • ML DevOps experience
  • Practical prompt engineering experience
  • Designing, testing, and refining prompts for LLM/VLM outputs
  • Experience with ML data workflows
  • Large-scale data processing and loading
  • Data versioning experience
  • Format considerations for training (tokenization, batching, sharding)
  • Hands-on experience with data pipelines for large-scale distributed ML training
  • Familiarity with annotation tooling
  • Familiarity with human-in-the-loop data collection
  • Understanding of ML training requirements
  • Knowledge of "good data" for LLM/VLM fine-tuning
  • Anticipation of downstream issues
  • Experience loading/writing large datasets to/from cloud infrastructure (AWS)
  • Experience loading/writing large datasets to/from distributed storage systems
  • Strong communication skills
  • Ability to work with researchers to scope ambiguous problems
  • Ability to translate needs into actionable plans
  • Collaborative approach
  • Comfortable taking ownership
  • Comfortable iterating quickly
  • Experience with preference data collection for RLHF
  • Experience with reward modelling
  • Familiarity with multimodal data (image-text pairs, video, design assets)
  • Experience building synthetic data generation pipelines using LLMs
  • Background in data quality metrics
  • Background in monitoring systems
  • Contributions to dataset releases or benchmarks in ML community
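The formatting requirements above (tokenization, batching, sharding) can be sketched in a minimal, self-contained way. This is an illustrative stand-in, not Canva's actual pipeline: the whitespace `tokenize` substitutes for a trained subword tokenizer, and the JSONL shard layout is an assumed convention.

```python
import json
import os


def tokenize(text):
    # Stand-in tokenizer (whitespace split); a real pipeline would use a
    # trained subword tokenizer instead.
    return text.split()


def batch(items, batch_size):
    # Group tokenized examples into fixed-size batches for training.
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def shard(batches, shard_size, out_dir):
    # Write batches into numbered JSONL shard files so distributed data
    # loaders can stream them in parallel.
    paths = []
    for idx in range(0, len(batches), shard_size):
        path = os.path.join(out_dir, f"shard-{idx // shard_size:05d}.jsonl")
        with open(path, "w") as f:
            for b in batches[idx:idx + shard_size]:
                f.write(json.dumps(b) + "\n")
        paths.append(path)
    return paths
```

Sharded layouts like this are what make "large-scale data processing and loading" tractable: each worker reads only its own shard files rather than one monolithic dataset.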

Tasks

  • Design and build data pipelines for agent training
  • Collect, filter, deduplicate, format, and version data
  • Build and maintain infrastructure for efficient data loading, storage, and retrieval
  • Collaborate with research scientists to translate requirements into data specifications
  • Create evaluation datasets and benchmarks with researchers
  • Curate task distributions to identify real failure modes
  • Develop tooling for dataset construction
  • Implement human annotation workflows
  • Generate synthetic data
  • Collect preference data for RLHF/DPO-style training
  • Ensure data quality through validation frameworks
  • Monitor for data drift and contamination
  • Establish standards for trustworthy and reproducible datasets
  • Document datasets thoroughly
  • Record provenance, known limitations, intended use cases, and versioning history
  • Implement comprehensive test coverage for data pipelines and ML workflows
  • Conduct code reviews and refactoring
  • Establish engineering best practices
  • Identify data bottlenecks and propose solutions
  • Contribute to team roadmaps to unblock research velocity
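The collect/filter/deduplicate/validate steps above can be sketched with a few lines of standard-library Python. This is a hedged illustration of the general technique, not the company's implementation; the `text` field and `min_len` threshold are assumptions for the example.

```python
import hashlib


def dedupe(records):
    # Exact-match deduplication via content hashing; production pipelines
    # typically layer near-duplicate detection (e.g. MinHash) on top.
    seen, unique = set(), []
    for rec in records:
        digest = hashlib.sha256(rec["text"].encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(rec)
    return unique


def validate(records, min_len=1):
    # Minimal quality gate: the required field exists, is a string, and is
    # non-empty. Real validation frameworks add schema and drift checks.
    return [
        r for r in records
        if isinstance(r.get("text"), str) and len(r["text"]) >= min_len
    ]
```

Chaining `validate(dedupe(records))` gives the simplest possible version of the quality-assurance loop described in the tasks list.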

Work Experience

  • approx. 4 - 6 years

Education

  • Bachelor's degree, or
  • Master's degree

Languages

  • English (Business Fluent)

Tools & Technologies

  • Python
  • ML DevOps
  • LLM
  • VLM
  • Ray
  • AWS
  • RLHF

Benefits

Competitive Pay

  • Equity packages

Generous Parental Leave

  • Inclusive parental leave policy

Additional Allowances

  • Annual Vibe & Thrive allowance

Workation & Sabbatical

  • Flexible leave options
Find the original job posting in its most current version here. Nejo automatically captured this job from Canva's website and processed the information with the help of AI. Despite careful analysis, some information may be incomplete or inaccurate; please always verify all details in the original posting. Content and copyright of the original posting belong to the advertising company.