Principal Site Reliability Engineer (m/w/x)
Lead SRE initiatives alongside computational and ML scientists at a global healthcare company. The role requires deep expertise in cloud and AI/ML systems and involves collaborating with DDC colleagues and managing on-premises infrastructure.
Requirements
- PhD in Computer Science or a related field with 2-7 years of experience, an MS with 5-10 years of experience, or a BS with 7-12 years of experience
- 5+ years leading technical operations in on-premises and cloud environments
- Deep understanding of on-premises infrastructure, cloud ecosystems (Kubernetes on AWS), and AI/ML systems
- Ability to manage technical risks and ensure system reliability and scalability
- Strategic thinking, with the ability to drive long-term optimization while acting with urgency
- Experience reducing technical debt, consolidating platforms, and deprecating legacy solutions
- Proven experience mentoring diverse teams
- Fostering a culture of collaboration, accountability, and continuous learning
- Strong oral and written communication skills
- Ability to engage stakeholders, provide clear direction, and navigate complex organizational structures
- Staying current with emerging technologies and industry best practices
- Guiding technical decision-making
Tasks
- Join the TSO leadership team
- Collaborate with DDC colleagues
- Work with computational, ML, and research scientists
- Lead and deliver enterprise-scale technical operations
- Manage operations for applications and software
- Contribute as an SRE for infrastructure
- Integrate expertise across on-premises operations, cloud services, and AI/ML technologies
- Ensure high performance and user enablement
- Manage relationships with external vendors
- Develop strategic plans aligned with organizational goals
- Integrate managed services
- Leverage AI as a key accelerator
- Negotiate vendor contracts
- Ensure adherence to SLAs
- Apply software engineering principles to operations
- Scale and maintain highly reliable production systems
- Balance rapid feature releases with system stability
- Establish service-level metrics (SLAs, SLOs, SLIs)
- Implement error budgets
- Drive continuous improvement and innovation
- Ensure adherence to security standards
- Ensure regulatory compliance
- Mitigate risks to safeguard systems and data
- Champion user onboarding
- Provide user training
- Offer technical support
- Enhance system usability
- Enhance troubleshooting capabilities
- Promote AI/ML adoption
- Foster a collaborative team culture
- Mentor team members
- Standardize processes
- Align with gCS values
Work Experience
- 2-12 years
Education
- Bachelor's degree, or
- Master's degree
Languages
- English – Business Fluent
Tools & Technologies
- Kubernetes
- AWS
- AI/ML
About the Company
Roche
- Industry: Pharmaceuticals
- Description: The company is dedicated to advancing science and ensuring access to healthcare for everyone.
Similar Jobs
- DSX Lead (m/w/x): Full-time, on-site, management, Basel (1201 F. Hoffmann-La Roche AG)
- Principal Software Engineer - Automation Core - Lab Automation (m/w/x): Full-time, on-site, senior, Basel (Roche)
- Senior Software Engineer / Principal Software Engineer (m/w/x): Full-time, on-site, senior, Basel (Roche)
- Principal Data Innovation Specialist (m/w/x): Full-time, on-site, senior, Basel (1201 F. Hoffmann-La Roche AG)
- Global Clinical Operations Excellence Lead - Study Systems Lead (m/w/x): Full-time, on-site, senior, Basel