Tether Operations Limited

AI Research Engineer - Kernel & Inference Optimization (m/f/x)

Zürich
Full-time, Remote, Senior
AI/ML

Optimizing AI model serving pipelines for low-latency, high-throughput transactions. PhD in NLP/ML and custom compute shader experience required. Focus on Metal Shading Language (MSL) implementation.

Requirements

  • Degree in Computer Science or related field
  • PhD in NLP, Machine Learning, or related field
  • Solid track record in AI R&D with publications
  • Knowledge of Metal Shading Language (MSL)
  • Comfortable writing custom compute shaders
  • Proven experience in low-level kernel optimizations
  • Proven experience in inference optimization on mobile devices
  • Contributions leading to measurable improvements in inference latency, throughput, and memory footprint
  • Deep understanding of modern model serving architectures
  • Deep understanding of inference optimization techniques
  • Strong expertise in writing GPU kernels for mobile devices
  • Deep understanding of model serving frameworks and engines
  • Practical experience in developing end-to-end inference pipelines
  • Ability to apply empirical research to model serving challenges
  • Proficient in designing robust evaluation frameworks
  • Experience designing and optimizing high-performance inference engines
  • Experience with Tensor Parallelism, Pipeline Parallelism, and Expert Parallelism
  • Deep understanding of the math and structure of diffusion models
  • Deep understanding of the math and structure of Vision Transformers
  • Understanding of Pruning
  • Understanding of Quantization
  • Understanding of Flash attention
  • Understanding of KV Cache
  • Understanding of Speculative Decoding (Eagle)

Tasks

  • Drive innovation in model serving and inference architectures
  • Optimize model deployment and inference strategies
  • Design and deploy state-of-the-art model serving pipelines
  • Ensure high throughput and low latency in model serving
  • Optimize memory usage in model serving pipelines
  • Establish clear performance targets for latency and memory
  • Build and run controlled inference tests
  • Monitor key performance indicators in production
  • Document and validate performance across platforms
  • Identify and prepare high-quality test datasets
  • Set criteria for evaluating model performance
  • Analyze computational efficiency and diagnose bottlenecks
  • Address suboptimal batch processing and network delays
  • Optimize serving infrastructure for scalability and reliability
  • Integrate optimized serving frameworks into production
  • Define success metrics for real-world performance
  • Ensure continuous monitoring and iterative refinements

Work Experience

  • Approx. 4-6 years

Education

  • Bachelor's degree

Languages

  • English: Business Fluent

Tools & Technologies

  • Metal Shading Language (MSL)
  • Compute shaders
  • GPU kernels
  • Diffusion Models
  • Vision Transformers
  • Pruning
  • Quantization
  • Flash attention
  • KV Cache
  • Speculative Decoding (Eagle)
  • Tensor Parallelism
  • Pipeline Parallelism
  • Expert Parallelism

Nejo automatically captured this job from the website of Tether Operations Limited and processed the information with the help of AI. Despite careful analysis, some information may be incomplete or inaccurate. Always verify all details in the original posting. Content and copyrights of the original posting belong to the advertising company.
