Senior Infrastructure Engineer - Observability (m/f/x)
Designing and scaling production logging, metrics, and tracing stacks across multiple data centers and Kubernetes clusters. 8+ years industry experience with hands-on production observability systems required. Building infrastructure for an AI-powered orchestration platform.
Requirements
- 8+ years industry experience
- Solid hands-on production experience with observability systems
- Strong plus: familiarity with OpenTelemetry, Kafka, Vector, VictoriaMetrics
- Experience with logging pipelines: design, deployment, refactoring
- Understanding of distributed tracing and SPM
- Experience with Kubernetes cluster lifecycle management (EKS preferred)
- Practical knowledge of storage trade-offs for observability data
- Experience using AI to automate infrastructure or observability tasks
- Familiarity with AI-assisted tooling selection and workflow integration
- Experience with MCP (custom or open-source implementations)
- Background in cloud account or environment migrations
- Experience preparing infrastructure for compliance/audit processes
- Understanding of network architecture, troubleshooting, incident resolution, and post-mortems
- Experience with containers and Kubernetes (installation, configuration of operators)
- Basic knowledge of Python, Golang, Java
- Good communication and collaboration skills
- Interest in modern large-scale distributed storage technologies and architectures
- Good spoken English for technical discussions
- Balance of hands-on and analytical approaches
Tasks
- Design, deploy, and maintain production observability stacks (logs, metrics, traces)
- Scale observability infrastructure across multiple data centers and Kubernetes clusters
- Manage logging pipeline architecture and refactoring efforts
- Improve distributed tracing coverage
- Drive distributed tracing adoption across engineering teams
- Manage EKS upgrades, node exporters, agents, and collectors
- Automate operational tasks to reduce toil and improve system stability
- Ensure compliance and audit readiness for access controls, data handling, and pipeline integrity
- Evaluate and adopt new observability tooling
Work Experience
- 8 years
Education
- Bachelor's degree OR
- Master's degree
Languages
- English – Business Fluent
Tools & Technologies
- OpenTelemetry
- Kafka
- Vector
- VictoriaMetrics
- vmagent
- alerting rules
- Kubernetes
- EKS
- Containers
- Python
- Golang
- Java
- AI
- MCP
About the Company
Workato
Industry
IT
Description
Workato helps businesses globally streamline operations by connecting data, processes, applications, and experiences.