Member of Technical Staff - Model Serving / API Backend Engineer (m/w/x)
Deploy generative AI models into production inference services on GPU infrastructure. Experience building and operating systems at scale is required; reasonable travel costs are covered.
Requirements
- Experience building and operating systems at scale
- Understanding of production systems vs. prototypes
- Comfort navigating ambiguity and making tradeoffs
- System improvement under real-world constraints
- Strong judgment on performance, reliability, cost tradeoffs
- Experience scaling APIs or ML systems under load
- Comfort in fast-moving, research-adjacent environments
- Ownership from system design through debugging and deployment
- Building and operating ML inference services in production
- Designing scalable API architectures with async processing
- Optimizing GPU workloads (batching, quantization, compilation, CUDA)
- Managing distributed systems and task queues under variable load
- Implementing monitoring and observability for production ML systems
- Debugging performance bottlenecks across model, infrastructure, network
- Experience with real-time or low-latency inference systems
- Experience with TensorRT, reduced precision, layer fusion, model compilation
- Frontend demo tooling (Streamlit, Gradio, React)
- CI/CD and automated testing for ML systems
- Security best practices for API and model serving
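The requirements above mention async API processing and GPU workload batching. As a purely illustrative sketch of dynamic batching (all names here, such as `BatchedRunner` and `infer_batch`, are invented for this example and are not part of the posting), concurrent requests can be coalesced into batches before a single model forward pass:

```python
import asyncio

async def infer_batch(inputs):
    # Stand-in for a batched GPU forward pass; doubles each input.
    return [x * 2 for x in inputs]

class BatchedRunner:
    """Collects concurrent requests into batches before running inference."""

    def __init__(self, max_batch=8, max_wait=0.005):
        self.queue = asyncio.Queue()
        self.max_batch = max_batch      # upper bound on batch size
        self.max_wait = max_wait        # seconds to wait for more requests

    async def infer(self, x):
        # Each caller enqueues its input with a future and awaits the result.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((x, fut))
        return await fut

    async def run(self):
        while True:
            # Block for the first request, then opportunistically gather more
            # until the batch is full or the wait deadline passes.
            item = await self.queue.get()
            batch = [item]
            loop = asyncio.get_running_loop()
            deadline = loop.time() + self.max_wait
            while len(batch) < self.max_batch:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            outputs = await infer_batch([x for x, _ in batch])
            for (_, fut), y in zip(batch, outputs):
                fut.set_result(y)

async def main():
    runner = BatchedRunner()
    worker = asyncio.create_task(runner.run())
    results = await asyncio.gather(*(runner.infer(i) for i in range(10)))
    worker.cancel()
    return results

results = asyncio.run(main())
print(results)  # -> [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

The tradeoff the role alludes to is visible in the two knobs: a larger `max_batch` or `max_wait` raises GPU utilization and throughput at the cost of per-request latency.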
Tasks
- Turn research checkpoints into production-ready inference services
- Design and maintain high-performance APIs
- Optimize inference latency and throughput on GPU infrastructure
- Build scalable serving architectures for unpredictable traffic
- Improve reliability, monitoring, and observability
- Prototype and ship demos showcasing new capabilities quickly
- Collaborate with researchers to shorten the path from idea to production endpoint
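One of the tasks above is optimizing inference latency, and the usual first step is measuring it with tail percentiles rather than averages. A minimal, hypothetical sketch (the `measure` helper is invented for illustration, not taken from the posting):

```python
import statistics
import time

def measure(fn, n=100):
    """Call fn() n times and report p50/p99/mean wall-clock latency in seconds."""
    latencies = []
    for _ in range(n):
        t0 = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - t0)
    latencies.sort()
    return {
        "p50": latencies[n // 2],                   # median latency
        "p99": latencies[min(n - 1, int(n * 0.99))],  # tail latency
        "mean": statistics.mean(latencies),
    }

stats = measure(lambda: sum(range(1000)))
```

Tracking p99 alongside p50 matters for serving systems because batching and queueing tend to hurt the tail long before they move the median.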
Work Experience
- Approx. 1–4 years
Education
- Vocational certification, or
- Bachelor's degree, or
- Master's degree
Languages
- English – Business Fluent
Tools & Technologies
- CUDA
- TensorRT
- Streamlit
- Gradio
- React
Benefits
Additional Allowances
- Reasonable travel costs covered
About the Company
Black Forest Labs
Industry
IT
Description
The company advances generative deep learning for media, creating models that transform ideas into images and videos.
Not a perfect match?
- Black Forest Labs – Member of Technical Staff - Large scale data infrastructure (m/w/x): Full-time, On-site, Senior, Freiburg im Breisgau
- Black Forest Labs – Member of Technical Staff - Training Cluster Engineer (m/w/x): Full-time, On-site, Experienced, Freiburg im Breisgau
- Prior Labs – ML Engineer, Cloud Platform (m/w/x): Full-time, On-site, Experienced, Berlin, Freiburg im Breisgau, from 140,000 / year
- Prior Labs – Senior ML Infrastructure Engineer (m/w/x): Full-time, On-site, Senior, Freiburg im Breisgau, Berlin
- Prior Labs – ML Engineer, Foundation Model (m/w/x): Full-time, On-site, Experienced, Berlin, Freiburg im Breisgau, from 120,000 / year