HPC and AI Software Architect (m/f/x)
The role covers optimizing distributed AI training and inference systems, enhancing communication libraries (NCCL, UCX, UCC), and co-designing hardware that accelerates data movement. It requires a Ph.D. or equivalent industry experience in computer science, plus 2+ years in high-performance data movement or distributed computing. The work feeds directly into AI, VR, and autonomous-vehicle solutions.
Requirements
- Ph.D. or equivalent industry experience in computer science, computer engineering, or a closely related field
- 2+ years of experience in systems programming, parallel or distributed computing, or high-performance data movement
- Strong programming background in C++, Python, and ideally CUDA or other GPU programming models
- Practical experience with AI frameworks (e.g., PyTorch, TensorFlow) and familiarity with communication libraries
- Experience in designing or optimizing software for high-throughput, low-latency systems
- Strong collaboration skills in a multi-national, interdisciplinary environment
- Expertise with NCCL, Gloo, UCX, or similar libraries used in distributed AI workloads
- Background in networking and communication protocols, RDMA, collective communications, or accelerator-aware networking
- Deep understanding of large model training, inference serving at scale, and associated communication bottlenecks
- Knowledge of quantization, tensor/activation fusion, or memory optimization for inference
- Familiarity with infrastructure for deployment of LLMs or transformer-based models, including sharding, pipelining, or hybrid parallelism
Tasks
- Design and prototype scalable software systems for distributed AI training and inference
- Optimize throughput, latency, and memory efficiency
- Develop and evaluate enhancements to communication libraries like NCCL, UCX, and UCC
- Collaborate with AI framework teams to improve communication backend integration and performance
- Co-design hardware features to accelerate data movement for inference and model serving
- Contribute to the evolution of runtime systems and AI-specific protocol layers
Work Experience
- 2+ years
Education
- Doctoral / PhD
Languages
- English – Business Fluent
Tools & Technologies
- C++
- Python
- CUDA
- PyTorch
- TensorFlow
- NCCL
- Gloo
- UCX
About the Company
NVIDIA
Industry
IT
Description
The company is developing groundbreaking solutions in Virtual Reality, Artificial Intelligence, Deep Learning, and Autonomous Vehicles.