Tether Operations Limited

AI Research Engineer - Kernel & Inference Optimization (m/f/x)

Zürich
Full-time, Remote, Senior
AI/ML

Optimizing AI model serving pipelines for low-latency, high-throughput transactions. PhD in NLP/ML and custom compute shader experience required. Focus on Metal Shading Language (MSL) implementation.

Requirements

  • Degree in Computer Science or related field
  • PhD in NLP, Machine Learning, or related field
  • Solid track record in AI R&D with publications
  • Knowledge of Metal Shading Language (MSL)
  • Comfortable writing custom compute shaders
  • Proven experience in low-level kernel optimizations
  • Proven experience in inference optimization on mobile devices
  • Contributions leading to measurable improvements in inference latency, throughput, and memory footprint
  • Deep understanding of modern model serving architectures
  • Deep understanding of inference optimization techniques
  • Strong expertise in writing GPU kernels for mobile devices
  • Deep understanding of model serving frameworks and engines
  • Practical experience in developing end-to-end inference pipelines
  • Ability to apply empirical research to model serving challenges
  • Proficient in designing robust evaluation frameworks
  • Experience designing and optimizing high-performance inference engines
  • Experience with Tensor Parallelism, Pipeline Parallelism, and Expert Parallelism
  • Deep understanding of the math and structure of diffusion models
  • Deep understanding of the math and structure of Vision Transformers
  • Understanding of Pruning
  • Understanding of Quantization
  • Understanding of FlashAttention
  • Understanding of KV caching
  • Understanding of speculative decoding (EAGLE)

Responsibilities

  • Drive innovation in model serving and inference architectures
  • Optimize model deployment and inference strategies
  • Design and deploy state-of-the-art model serving pipelines
  • Ensure high throughput and low latency in model serving
  • Optimize memory usage in model serving pipelines
  • Establish clear performance targets for latency and memory
  • Build and run controlled inference tests
  • Monitor key performance indicators in production
  • Document and validate performance across platforms
  • Identify and prepare high-quality test datasets
  • Set criteria for evaluating model performance
  • Analyze computational efficiency and diagnose bottlenecks
  • Address suboptimal batch processing and network delays
  • Optimize serving infrastructure for scalability and reliability
  • Integrate optimized serving frameworks into production
  • Define success metrics for real-world performance
  • Ensure continuous monitoring and iterative refinements

Work experience

  • approx. 4-6 years

Education

  • Bachelor's degree

Languages

  • English: business fluent

Tools & technologies

  • Metal Shading Language (MSL)
  • Compute shaders
  • GPU kernels
  • Diffusion Models
  • Vision Transformers
  • Pruning
  • Quantization
  • FlashAttention
  • KV cache
  • Speculative decoding (EAGLE)
  • Tensor Parallelism
  • Pipeline Parallelism
  • Expert Parallelism

Nejo captured this job posting automatically from the website of Tether Operations Limited and prepared the information with the help of AI. Despite careful analysis, individual details may be incomplete or inaccurate. Always check all details against the original posting. Content and copyright of the original posting belong to the advertising company.