Tether Operations Limited

5h ago

AI Research Engineer - Kernel & Inference Optimization (m/f/x)

Zürich

Full-time, Remote, Senior

AI/ML

Nejo AI Summary

Optimizing AI model serving pipelines for low-latency, high-throughput transactions. PhD in NLP/ML and custom compute shader experience required. Focus on Metal Shading Language (MSL) implementation.

Requirements

Degree in Computer Science or related field
PhD in NLP, Machine Learning, or related field
Solid track record in AI R&D with publications
Knowledge of Metal Shading Language (MSL)
Comfortable writing custom compute shaders
Proven experience in low-level kernel optimizations
Proven experience in inference optimization on mobile devices
Contributions leading to measurable improvements in inference latency, throughput, and memory footprint
Deep understanding of modern model serving architectures
Deep understanding of inference optimization techniques
Strong expertise in writing GPU kernels for mobile devices
Deep understanding of model serving frameworks and engines
Practical experience in developing end-to-end inference pipelines
Ability to apply empirical research to model serving challenges
Proficient in designing robust evaluation frameworks
Experience designing and optimizing high-performance inference engines
Experience with Tensor Parallelism, Pipeline Parallelism, and Expert Parallelism
Deep understanding of the math and structure of diffusion models
Deep understanding of the math and structure of Vision Transformers
Understanding of Pruning
Understanding of Quantization
Understanding of FlashAttention
Understanding of KV Cache
Understanding of Speculative Decoding (EAGLE)

Tasks

Drive innovation in model serving and inference architectures
Optimize model deployment and inference strategies
Design and deploy state-of-the-art model serving pipelines
Ensure high throughput and low latency in model serving
Optimize memory usage in model serving pipelines
Establish clear performance targets for latency and memory
Build and run controlled inference tests
Monitor key performance indicators in production
Document and validate performance across platforms
Identify and prepare high-quality test datasets
Set criteria for evaluating model performance
Analyze computational efficiency and diagnose bottlenecks
Address suboptimal batch processing and network delays
Optimize serving infrastructure for scalability and reliability
Integrate optimized serving frameworks into production
Define success metrics for real-world performance
Ensure continuous monitoring and iterative refinements

Work Experience

Approx. 4 to 6 years

Education

Bachelor's degree

Languages

English – Business Fluent

Tools & Technologies

Metal Shading Language (MSL)
Compute shaders
GPU kernels
Diffusion Models
Vision Transformers
Pruning
Quantization
FlashAttention
KV Cache
Speculative Decoding (EAGLE)
Tensor Parallelism
Pipeline Parallelism
Expert Parallelism

Find the original job posting in its most current version here. Nejo automatically captured this job from the website of Tether Operations Limited and processed the information on Nejo with the help of AI for you. Despite careful analysis, some information may be incomplete or inaccurate. Please always verify all details in the original posting! Content and copyrights of the original posting belong to the advertising company.
