Senior Infrastructure Engineer - Observability (m/f/x)
Designing and scaling production logging, metrics, and tracing stacks across multiple data centers and Kubernetes clusters. 8+ years industry experience with hands-on production observability systems required. Building infrastructure for an AI-powered orchestration platform.
Requirements
- 8+ years industry experience
- Solid hands-on production experience with observability systems
- Strong plus: familiarity with OpenTelemetry, Kafka, Vector, VictoriaMetrics
- Experience with logging pipelines: design, deployment, refactoring
- Understanding of distributed tracing and SPM
- Experience with Kubernetes cluster lifecycle management (EKS preferred)
- Practical knowledge of storage trade-offs for observability data
- Experience using AI to automate infrastructure or observability tasks
- Familiarity with AI-assisted tooling selection and workflow integration
- Experience with MCP (custom or open-source implementations)
- Background in cloud account or environment migrations
- Experience preparing infrastructure for compliance/audit processes
- Understanding of network architecture, troubleshooting, incident resolution, and post-mortems
- Experience with containers and Kubernetes (installation, configuration of operators)
- Basic knowledge of Python, Golang, Java
- Good communication and collaboration skills
- Interest in modern large-scale distributed storage technologies and architectures
- Good spoken English for technical discussions
- Balance of hands-on and analytical approaches
Tasks
- Design, deploy, and maintain production observability stacks (logs, metrics, traces)
- Scale observability infrastructure across multiple data centers and Kubernetes clusters
- Manage logging pipeline architecture and refactoring efforts
- Improve distributed tracing coverage
- Drive distributed tracing adoption across engineering teams
- Manage EKS upgrades, node exporters, agents, and collectors
- Automate operational tasks to reduce toil and improve system stability
- Ensure compliance and audit readiness for access controls, data handling, and pipeline integrity
- Evaluate and adopt new observability tooling
Work Experience
- 8 years
Education
- Bachelor's degree or Master's degree
Languages
- English – Business Fluent
Tools & Technologies
- OpenTelemetry
- Kafka
- Vector
- VictoriaMetrics
- vmagent
- alerting rules
- Kubernetes
- EKS
- Containers
- Python
- Golang
- Java
- AI
- MCP
About the Company
Workato
Industry
IT
Description
Workato helps businesses globally streamline operations by connecting data, processes, applications, and experiences.