Workato

Senior Infrastructure Engineer - Observability (m/f/x)

Berlin, Frankfurt am Main, München
Full-time, On-site, Senior

Designing and scaling production logging, metrics, and tracing stacks across multiple data centers and Kubernetes clusters. Requires 8+ years of industry experience, including hands-on work with production observability systems. The role builds infrastructure for an AI-powered orchestration platform.

Requirements

  • 8+ years of industry experience
  • Solid hands-on production experience with observability systems
  • Strong plus: familiarity with OpenTelemetry, Kafka, Vector, VictoriaMetrics
  • Experience with logging pipelines: design, deployment, refactoring
  • Understanding of distributed tracing and SPM
  • Experience with Kubernetes cluster lifecycle management (EKS preferred)
  • Practical knowledge of storage trade-offs for observability data
  • Experience using AI to automate infrastructure or observability tasks
  • Familiarity with AI-assisted tooling selection and workflow integration
  • Experience with MCP (custom or open-source implementations)
  • Background in cloud account or environment migrations
  • Experience preparing infrastructure for compliance/audit processes
  • Understanding of network architecture, troubleshooting, incident resolution, and post-mortems
  • Experience with containers and Kubernetes (installation, configuration of operators)
  • Basic knowledge of Python, Golang, Java
  • Good communication and collaboration skills
  • Interest in modern large-scale distributed storage technologies and architectures
  • Good spoken English for technical discussions
  • Balance of hands-on and analytical approaches

Tasks

  • Design, deploy, and maintain production observability stacks (logs, metrics, traces)
  • Scale observability infrastructure across multiple data centers and Kubernetes clusters
  • Manage logging pipeline architecture and refactoring efforts
  • Improve distributed tracing coverage
  • Drive distributed tracing adoption across engineering teams
  • Manage EKS upgrades, node exporters, agents, and collectors
  • Automate operational tasks to reduce toil and improve system stability
  • Ensure compliance and audit readiness for access controls, data handling, and pipeline integrity
  • Evaluate and adopt new observability tooling

Work Experience

  • 8 years

Education

  • Bachelor's degree or Master's degree

Languages

  • English (business fluent)

Tools & Technologies

  • OpenTelemetry
  • Kafka
  • Vector
  • VictoriaMetrics
  • vmagent
  • alerting rules
  • Kubernetes
  • EKS
  • Containers
  • Python
  • Golang
  • Java
  • AI
  • MCP

Nejo automatically captured this job from the Workato website and processed the information with the help of AI. Despite careful analysis, some information may be incomplete or inaccurate, so always verify all details against the original posting. Content and copyright of the original posting belong to the advertising company.
