emnify

Staff/Senior AWS Cloud Platform Engineer (m/w/x)

Berlin
Full-time, On-site, Senior
AI/ML

Optimizing incident management for cloud-native IoT SuperNetwork on AWS. Hands-on experience with observability tools (Prometheus, Mimir, Grafana, Loki) in SaaS/telecom required. Focus on mission-critical IoT use cases for a global platform.

Requirements

  • Proven experience as a (Site) Reliability Engineer or in a similar role at a SaaS and/or telecom company
  • Hands-on experience with observability tools (Prometheus, Mimir, Grafana, Loki, CloudWatch, Grafana IRM, Rootly)
  • Experience in establishing and managing incident management processes
  • Understanding of incident management frameworks and best practices
  • Extensive experience with AWS cloud services (EC2, S3, RDS, Lambda, CloudWatch)
  • Expert skills with modern infrastructure tooling (Kubernetes, Terraform, GitHub Actions, Jenkins)
  • Good understanding of modern development tooling (microservices architecture, 12-factor applications, Docker)
  • Advanced documentation skills
  • Exceptional problem-solving and critical thinking
  • Passion for enhancing development experiences
  • Ability to work independently and as part of a team
  • Knowledge of networking protocols and telecom systems
  • Knowledge of secure software development
  • Familiarity with programming languages (Python, Go, or Java)
  • AWS Certification (e.g., AWS Certified DevOps Engineer, AWS Certified Solutions Architect)

Tasks

  • Lead end-to-end incident management.
  • Optimize incident management processes.
  • Ensure timely incident detection and resolution.
  • Document incidents thoroughly.
  • Coordinate cross-functional incident teams.
  • Conduct post-mortems and root cause analyses.
  • Drive continuous workflow improvements.
  • Design and implement observability frameworks.
  • Continuously improve observability frameworks.
  • Develop dashboards, alerts, and metrics strategies.
  • Develop logging strategies.
  • Monitor service health.
  • Proactively detect anomalies.
  • Support issue resolution.
  • Ensure cost-optimized platform performance.
  • Partner with cross-functional teams.
  • Implement observability best practices.
  • Provide training and guidance on tools.
  • Leverage metrics data.
  • Drive engineering priorities.
  • Design, build, and maintain resilient AWS cloud infrastructure.
  • Implement security best practices.
  • Implement scalability best practices.
  • Implement cost optimization best practices.
  • Ensure high availability and disaster recovery.
  • Ensure robust platform pipelines, shared infrastructure, and application services.

Work Experience

  • approx. 4 - 6 years

Education

  • Vocational certification, or
  • Bachelor's degree, or
  • Master's degree

Languages

  • English: business fluent

Tools & Technologies

  • Prometheus
  • Mimir
  • Grafana
  • Loki
  • CloudWatch
  • Grafana IRM
  • Rootly
  • AWS
  • EC2
  • S3
  • RDS
  • Lambda
  • Kubernetes
  • Terraform
  • GitHub Actions
  • Jenkins
  • Docker
  • Python
  • Go
  • Java

Nejo automatically captured this job from the emnify website and processed the information with the help of AI. Despite careful analysis, some information may be incomplete or inaccurate, so please verify all details in the original posting. Content and copyrights of the original posting belong to the advertising company.
