emnify

2mo ago

Staff/Senior AWS Cloud Platform Engineer (m/w/x)

Berlin

Full-time, On-site, Senior

AI/ML

Nejo AI Summary

Optimizing incident management for a cloud-native IoT SuperNetwork on AWS. Hands-on experience with observability tools (Prometheus, Mimir, Grafana, Loki) in a SaaS or telecom setting is required. The focus is on mission-critical IoT use cases for a global platform.

Requirements

Proven experience as a (Site) Reliability Engineer or in a similar role at a SaaS and/or telecom company
Hands-on experience with observability tools (Prometheus, Mimir, Grafana, Loki, CloudWatch, Grafana IRM, Rootly)
Experience in establishing and managing incident management processes
Understanding of incident management frameworks and best practices
Extensive experience with AWS cloud services (EC2, S3, RDS, Lambda, CloudWatch)
Expert skills with modern infrastructure tooling (Kubernetes, Terraform, GitHub Actions, Jenkins)
Good understanding of modern development tooling (microservices architecture, 12-factor applications, Docker)
Advanced documentation skills
Exceptional problem-solving and critical thinking
Passion for enhancing development experiences
Ability to work independently and as part of a team
Knowledge of networking protocols and telecom systems
Knowledge of secure software development
Familiarity with programming languages (Python, Go, or Java)
AWS Certification (e.g., AWS Certified DevOps Engineer, AWS Certified Solutions Architect)

Tasks

Lead end-to-end incident management.
Optimize incident management processes.
Ensure timely incident detection and resolution.
Document incidents thoroughly.
Coordinate cross-functional incident teams.
Conduct post-mortems and root cause analyses.
Drive continuous workflow improvements.
Design and implement observability frameworks.
Continuously improve observability frameworks.
Develop dashboards, alerts, and metrics strategies.
Develop logging strategies.
Monitor service health.
Proactively detect anomalies.
Support issue resolution.
Ensure cost-optimized platform performance.
Partner with cross-functional teams.
Implement observability best practices.
Provide training and guidance on tools.
Leverage metrics data.
Drive engineering priorities.
Design resilient AWS cloud infrastructure.
Build resilient AWS cloud infrastructure.
Maintain resilient AWS cloud infrastructure.
Implement security best practices.
Implement scalability best practices.
Implement cost optimization best practices.
Ensure high availability and disaster recovery.
Ensure robust platform pipelines.
Ensure robust shared infrastructure.
Ensure robust application services.

Work Experience

approx. 4 - 6 years

Education

Vocational certification OR
Bachelor's degree OR
Master's degree

Languages

English – Business Fluent

Tools & Technologies

Prometheus
Mimir
Grafana
Loki
CloudWatch
Grafana IRM
Rootly
AWS
EC2
S3
RDS
Lambda
Kubernetes
Terraform
GitHub Actions
Jenkins
Docker
Python
Go
Java

Find the original job posting in its most current version here. Nejo automatically captured this job from the website of emnify and processed the information on Nejo with the help of AI for you. Despite careful analysis, some information may be incomplete or inaccurate. Please always verify all details in the original posting! Content and copyrights of the original posting belong to the advertising company.
