Senior Infrastructure Engineer - Observability (m/f/x)
Design and scale production logging, metrics, and tracing stacks across multiple data centers and Kubernetes clusters. The role requires 8+ years of industry experience, including hands-on work with production observability systems, and builds infrastructure for an AI-powered orchestration platform.
Requirements
- 8+ years of industry experience
- Solid hands-on production experience with observability systems
- Strong plus: familiarity with OpenTelemetry, Kafka, Vector, VictoriaMetrics
- Experience with logging pipelines: design, deployment, refactoring
- Understanding of distributed tracing and SPM
- Experience with Kubernetes cluster lifecycle management (EKS preferred)
- Practical knowledge of storage trade-offs for observability data
- Experience using AI to automate infrastructure or observability tasks
- Familiarity with AI-assisted tooling selection and workflow integration
- Experience with MCP (custom or open-source implementations)
- Background in cloud account or environment migrations
- Experience preparing infrastructure for compliance/audit processes
- Understanding of network architecture, troubleshooting, incident resolution, and post-mortems
- Experience with containers and Kubernetes (installation, configuration of operators)
- Basic knowledge of Python, Golang, Java
- Good communication and collaboration skills
- Interest in modern large-scale distributed storage technologies and architectures
- Good spoken English for technical discussions
- Balance of hands-on and analytical approaches
Responsibilities
- Design, deploy, and maintain production observability stacks (logs, metrics, traces)
- Scale observability infrastructure across multiple data centers and Kubernetes clusters
- Manage logging pipeline architecture and refactoring efforts
- Improve distributed tracing coverage
- Drive distributed tracing adoption across engineering teams
- Manage EKS upgrades, node exporters, agents, and collectors
- Automate operational tasks to reduce toil and improve system stability
- Ensure compliance and audit readiness for access controls, data handling, and pipeline integrity
- Evaluate and adopt new observability tooling
Work experience
- 8 years
Education
- Bachelor's degree OR
- Master's degree
Languages
- English (business fluent)
Tools & Technologies
- OpenTelemetry
- Kafka
- Vector
- VictoriaMetrics
- vmagent
- alerting rules
- Kubernetes
- EKS
- Containers
- Python
- Golang
- Java
- AI
- MCP
About the company
Workato
Industry
IT
Description
Workato helps businesses globally streamline operations by connecting data, processes, applications, and experiences.