The AI Job Search Engine
HPC and AI Software Architect (m/w/x)
Optimize distributed AI training and inference systems, enhance communication libraries (NCCL, UCX, UCC), and co-design hardware features that accelerate data movement. A Ph.D. or equivalent industry experience in computer science, plus 2+ years in high-performance data movement or distributed computing, is required. The role has direct impact on groundbreaking AI, VR, and autonomous-vehicle solutions.
Requirements
- Ph.D. or equivalent industry experience in computer science, computer engineering, or a closely related field
- 2+ years of experience in systems programming, parallel or distributed computing, or high-performance data movement
- Strong programming background in C++, Python, and ideally CUDA or other GPU programming models
- Practical experience with AI frameworks (e.g., PyTorch, TensorFlow) and familiarity with communication libraries
- Experience in designing or optimizing software for high-throughput, low-latency systems
- Strong collaboration skills in a multi-national, interdisciplinary environment
- Expertise with NCCL, Gloo, UCX, or similar libraries used in distributed AI workloads
- Background in networking and communication protocols, RDMA, collective communications, or accelerator-aware networking
- Deep understanding of large model training, inference serving at scale, and associated communication bottlenecks
- Knowledge of quantization, tensor/activation fusion, or memory optimization for inference
- Familiarity with infrastructure for deployment of LLMs or transformer-based models, including sharding, pipelining, or hybrid parallelism
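The sharding and parallelism infrastructure mentioned in the last requirement can be pictured with a minimal sketch. This is a toy, single-process simulation in plain Python, not any framework's API: each "rank" holds a slice of a weight matrix's rows (the simplest form of tensor parallelism), computes its slice of `y = W @ x`, and the slices are concatenated afterwards, which a real system would do with an all-gather. The helper names are illustrative.

```python
# Toy row-wise weight sharding: each simulated "rank" stores a slice of
# the weight matrix's rows, computes its slice of y = W @ x with the full
# input x, and the slices are concatenated (an all-gather in a real system).

def shard_rows(W, world_size):
    """Split the rows of W into `world_size` near-equal contiguous shards."""
    base, rem = divmod(len(W), world_size)
    shards, start = [], 0
    for rank in range(world_size):
        height = base + (1 if rank < rem else 0)
        shards.append(W[start:start + height])
        start += height
    return shards

def matvec(W, x):
    """Plain dense matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Reference: single-device computation.
W = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9],
     [10, 11, 12]]
x = [1, 0, -1]
y_dense = matvec(W, x)

# Sharded: every rank gets the full x but only its rows of W.
y_sharded = []
for shard in shard_rows(W, world_size=2):
    y_sharded.extend(matvec(shard, x))   # concatenation stands in for all-gather

assert y_sharded == y_dense
print(y_sharded)  # [-2, -2, -2, -2]
```

Row-wise sharding needs only a gather at the end; splitting along the input dimension instead would require an all-reduce of partial sums, which is where the communication libraries named above come in.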
Tasks
- Design and prototype scalable software systems for distributed AI training and inference
- Optimize throughput, latency, and memory efficiency
- Develop and evaluate enhancements to communication libraries like NCCL, UCX, and UCC
- Collaborate with AI framework teams to improve communication backend integration and performance
- Co-design hardware features to accelerate data movement for inference and model serving
- Contribute to the evolution of runtime systems and AI-specific protocol layers
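The collective-communication work behind libraries like NCCL can be sketched in miniature. The ring all-reduce below simulates all ranks sequentially in one Python process, a toy of the pipelined GPU-to-GPU algorithm NCCL actually implements; the function name and structure are illustrative, not NCCL's API. Each rank's buffer is split into `world_size` chunks that travel around the ring twice: once accumulating partial sums (reduce-scatter), once distributing the finished sums (all-gather).

```python
def ring_allreduce(buffers):
    """Simulate a ring all-reduce (sum) over per-rank buffers.

    buffers: one equal-length list per rank. Each buffer is split into
    world_size chunks; after 2*(world_size - 1) ring steps, every rank
    holds the elementwise sum of all buffers. Mutates and returns buffers.
    """
    world = len(buffers)
    n = len(buffers[0])
    assert n % world == 0, "toy version: length must divide evenly"
    chunk = n // world

    def sl(i):
        """Slice covering chunk i of a buffer."""
        return slice(i * chunk, (i + 1) * chunk)

    # Reduce-scatter: at step s, rank r sends chunk (r - s) mod world to
    # its ring neighbor, which adds it in. Sends are snapshotted first to
    # mimic all ranks communicating simultaneously.
    for step in range(world - 1):
        sends = [(r, (r - step) % world) for r in range(world)]
        data = [buffers[r][sl(c)] for r, c in sends]
        for (r, c), payload in zip(sends, data):
            dst = (r + 1) % world
            for k, v in enumerate(payload):
                buffers[dst][c * chunk + k] += v

    # All-gather: fully reduced chunks circulate once more, overwriting.
    for step in range(world - 1):
        sends = [(r, (r + 1 - step) % world) for r in range(world)]
        data = [buffers[r][sl(c)] for r, c in sends]
        for (r, c), payload in zip(sends, data):
            buffers[(r + 1) % world][sl(c)] = payload

    return buffers

print(ring_allreduce([[1, 2], [3, 4]]))  # [[4, 6], [4, 6]]
```

The point of the ring schedule is bandwidth: each rank sends only `2 * (world - 1) / world` of its buffer in total, regardless of how many ranks participate, which is why variants of this algorithm underpin gradient all-reduce in data-parallel training.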
Work Experience
- 2 years
Education
- Doctoral / PhD
Languages
- English – Business Fluent
Tools & Technologies
- C++
- Python
- CUDA
- PyTorch
- TensorFlow
- NCCL
- Gloo
- UCX
Not a perfect match?
- NVIDIA Switzerland AG – HPC and AI Software Architecture Intern (m/w/x) – Full-time, Internship, On-site, Zürich
- NVIDIA Switzerland AG – Principal Software Architect, GPU Networking Research (m/w/x) – Full-time, On-site, Senior, Zürich
- NVIDIA – Deep Learning Solutions Architect – Inference Optimization (m/w/x) – Full-time, On-site, Senior, Zürich
- NVIDIA Switzerland AG – Research Scientist, ML Systems - PhD New College Grad (m/w/x) – Full-time, On-site, Experienced, Zürich
- Hewlett Packard Enterprise – Research Engineer HPC/AI Focus Daedalus System (m/w/x) – Full-time, On-site, Experienced, Basel, Zürich
About the Company
NVIDIA
Industry
IT
Description
The company is developing groundbreaking solutions in Virtual Reality, Artificial Intelligence, Deep Learning, and Autonomous Vehicles.