Staff/Senior AWS Cloud Platform Engineer (m/w/x)
Optimizing incident management for a cloud-native IoT SuperNetwork on AWS. Hands-on experience with observability tools (Prometheus, Mimir, Grafana, Loki) in SaaS/telecom is required. The role focuses on mission-critical IoT use cases for a global platform.
Requirements
- Proven experience as a (Site) Reliability Engineer or in a similar role at a SaaS and/or telecom company
- Hands-on experience with observability tools (Prometheus, Mimir, Grafana, Loki, CloudWatch, Grafana IRM, Rootly)
- Experience in establishing and managing incident management processes
- Understanding of incident management frameworks and best practices
- Extensive experience with AWS cloud services (EC2, S3, RDS, Lambda, CloudWatch)
- Expert skills with modern infrastructure tooling (Kubernetes, Terraform, GitHub Actions, Jenkins)
- Good understanding of modern development practices and tooling (microservices architecture, 12-factor applications, Docker)
- Advanced documentation skills
- Exceptional problem-solving and critical thinking
- Passion for improving the developer experience
- Ability to work independently and as part of a team
- Knowledge of networking protocols and telecom systems
- Knowledge of secure software development
- Familiarity with at least one programming language (Python, Go, or Java)
- AWS Certification (e.g., AWS Certified DevOps Engineer, AWS Certified Solutions Architect)
Tasks
- Lead end-to-end incident management.
- Optimize incident management processes.
- Ensure timely incident detection and resolution.
- Document incidents thoroughly.
- Coordinate cross-functional incident teams.
- Conduct post-mortems and root cause analyses.
- Drive continuous workflow improvements.
- Design, implement, and continuously improve observability frameworks.
- Develop dashboards and alerts, as well as metrics and logging strategies.
- Monitor service health.
- Proactively detect anomalies.
- Support issue resolution.
- Ensure cost-optimized platform performance.
- Partner with cross-functional teams.
- Implement observability best practices.
- Provide training and guidance on tools.
- Leverage metrics data to drive engineering priorities.
- Design, build, and maintain resilient AWS cloud infrastructure.
- Implement security, scalability, and cost optimization best practices.
- Ensure high availability and disaster recovery.
- Ensure robust platform pipelines, shared infrastructure, and application services.
Work Experience
- approx. 4-6 years
Education
- Vocational certification, or
- Bachelor's degree, or
- Master's degree
Languages
- English – Business Fluent
Tools & Technologies
- Prometheus
- Mimir
- Grafana
- Loki
- CloudWatch
- Grafana IRM
- Rootly
- AWS
- EC2
- S3
- RDS
- Lambda
- Kubernetes
- Terraform
- GitHub Actions
- Jenkins
- Docker
- Python
- Go
- Java
About the Company
emnify
Industry
IT
Description
The company enhances innovative components, bridging telco languages and internet protocols.