Source Job

Germany

  • Investigate and resolve complex production issues across cloud and customer environments with root cause analysis.
  • Debug across Linux, Kubernetes, networking, storage, and GPU-based systems as a senior escalation point.
  • Develop internal tools and automation to enhance troubleshooting efficiency and platform reliability.

Linux Kubernetes Cloud Infrastructure Networking Python

20 jobs similar to Senior Support Engineer

Jobs ranked by similarity.

Ireland

  • Diagnose and resolve complex production issues across Linux, Kubernetes, networking, storage, and GPU systems.
  • Act as a senior escalation point for critical incidents, collaborating with engineering teams on root cause analysis.
  • Develop tools and automation in Python, Bash, or Go to improve troubleshooting efficiency and observability.

The partner company provides advanced AI and cloud infrastructure solutions, supporting large-scale distributed computing and AI workloads. They operate in a fast-moving, collaborative environment with highly skilled engineering teams focused on cutting-edge technology and operational excellence.

Global

  • Troubleshoot and resolve issues in customer environments based on Linux, OpenStack, Kubernetes, and networking technologies, owning escalations end-to-end.
  • Reproduce customer issues in labs, confirm bug reports, and collaborate with the development team to improve product stability.
  • Communicate with customers during incidents via email and remote sessions, guiding them through troubleshooting and resolution processes.

Mirantis is the Kubernetes-native AI infrastructure company, enabling organizations to build and operate scalable, secure, and sovereign infrastructure for modern AI and data-intensive applications. With deep expertise in open source and Kubernetes, Mirantis empowers platform engineering teams across enterprises worldwide.

Global

  • Act as the final escalation point for complex Cloud infrastructure issues, analyzing logs and metrics to identify root causes.
  • Own high-severity incidents, coordinate resolution with Engineering, DevOps, and SRE teams, and contribute to preventive actions.
  • Mentor L1 and L2 support engineers, create runbooks and SOPs, and collaborate with Product teams to reproduce issues.

Gcore provides infrastructure and software solutions for AI, cloud, network, and security, powering real-time communication, streaming, enterprise AI, and secure web applications. With 550+ professionals globally, they collaborate with partners like Intel, NVIDIA, Dell, and Equinix to support the digital ecosystem.

Europe

  • Lead investigation and resolution of complex infrastructure, networking, and platform incidents.
  • Provide technical leadership for Kubernetes platform operations and drive automation initiatives.
  • Mentor engineers and develop operational standards, runbooks, and best practices.

Mirantis is the Kubernetes-native AI infrastructure company, enabling organizations to build and operate scalable, secure, and sovereign infrastructure for modern AI, machine learning, and data-intensive applications. Serving enterprises like Adobe, PayPal, and Volkswagen, Mirantis is committed to open standards and freedom from lock-in.

Europe

  • Operate and maintain Linux-based infrastructure, deploy and scale Kubernetes clusters, and implement automation with Ansible and GitOps.
  • Design networking architecture, build observability stacks, and lead incident response across the platform.
  • Manage virtualization layers and collaborate with development teams to optimize resource utilization and system availability.

Pragmatike develops cutting-edge solutions in Cloud Computing, focusing on ambitious projects with a culture of collaboration and innovation. The team is passionate and collaborative, working in a dynamic and flexible environment to shape tomorrow's technologies.

Europe

  • Lead the investigation and resolution of complex infrastructure, networking, and platform-related incidents.
  • Provide technical leadership for Kubernetes platform operations and supporting infrastructure services.
  • Mentor and support AI Infrastructure & Platform Operations Engineers, sharing technical knowledge through documentation and training.

Mirantis helps organizations ship code faster on public and private clouds, providing a public cloud experience on any infrastructure from the data center to the edge. The company serves many of the world's leading enterprises, including Adobe, DocuSign, Liberty Mutual, and PayPal, and is a leader in container management.

Europe

  • Monitor, operate, and support production AI infrastructure platforms.
  • Investigate and resolve infrastructure, networking, hardware, and platform-related incidents.
  • Collaborate with engineering teams, hardware vendors, and datacenter personnel to resolve technical issues.

Mirantis is the Kubernetes-native AI infrastructure company, enabling organizations to build and operate scalable, secure infrastructure for AI and data-intensive applications. The company is growing and invests heavily in AI infrastructure and platform services.

Europe

  • Support and improve hybrid production infrastructure for 15+ development teams handling 100+ products, 10K+ domains, and billions of hits per day.
  • Architect and plan improvements to a multi-datacenter development environment, advocating for migration to automated, elastic infrastructure using cloud, Kubernetes, and serverless technologies.
  • Document processes, monitor performance metrics, promote CI/CD practices, and mentor junior DevOps engineers.

Aylo is a tech pioneer that offers world-class adult entertainment and games on safe, popular platforms. With an international team of dynamic innovators, the company focuses on trust-and-safety protocols and has offices in Montreal, Austin, and Nicosia.

Global

  • Design, build, and operate scalable cloud infrastructure and infrastructure-as-code for globally distributed services.
  • Develop and maintain CI/CD pipelines to support rapid and reliable delivery of backend and client components.
  • Own service reliability by implementing observability (metrics, logs, tracing) and leading incident response with actionable improvements.

NetBird develops an open-source zero-trust network security platform that is easy to use and affordable for teams of all sizes. Since its launch in 2021, it has gained trust among thousands of companies and connects hundreds of thousands of users worldwide, driven by a community-focused culture.

Europe

  • Own performance optimization and reliability of large-scale GPU clusters and InfiniBand networking for HPC workloads.
  • Diagnose and resolve complex system-level issues across GPU, network, and compute layers, integrating new hardware components.
  • Develop automation for monitoring, fault detection, and proactive remediation in distributed compute environments.

Our partner is building a next-generation AI cloud infrastructure environment, focusing on large-scale high-performance computing systems. They foster a highly technical engineering culture with experts across systems, networking, and virtualization, offering career development and continuous learning opportunities.

US

  • Design, deploy, and manage production Kubernetes clusters with workload scheduling, resource quotas, network policies, and RBAC.
  • Build and optimize CI/CD pipelines using Infrastructure as Code and GitOps principles.
  • Implement observability solutions using Prometheus, Grafana, and OpenTelemetry for performance tuning and reliability.

VerTALENTS is a subsidiary of VerSprite Cybersecurity, specializing in technology staffing. The company connects top technical talent with industry clients through various methods, adding value to both clients and candidates for full-time and contracting opportunities.

Europe

  • Design, build, and operate scalable cloud infrastructure using Kubernetes, Terraform, and modern infrastructure-as-code practices.
  • Improve and evolve cloud networking architecture, including VPC/VNet design, peering, routing, DNS, TLS, ingress/egress, and load balancing.
  • Contribute to system reliability through on-call support, incident response, root cause analysis, and performance optimization.

Jobgether is an AI-powered job matching platform that connects candidates with hiring companies. They use automated review and matching to ensure fair candidate evaluation.

LATAM

  • Maintain and support core infrastructure with deep Linux expertise.
  • Design scalable networks using VLANs, routing, VPNs, and UniFi equipment.
  • Automate provisioning with Ansible, Bash/Python, and MAAS-based workflows.

A European deep-tech company is developing a decentralized, energy-efficient cloud platform using distributed bare-metal infrastructure. It is a startup or hyper-growth environment that values autonomy, speed, and problem-solving.

Poland

  • Design, write, and deliver software to implement and support large web-scale, highly performant, highly available infrastructure on GCP/AWS.
  • Monitor infrastructure, respond to incidents, correct and improve systems to prevent incidents, and plan capacity.
  • Tune large-scale clusters for optimal performance and efficiency and support system deployments and product releases.

OpenX develops digital advertising marketplaces and technologies to optimize ad delivery for publishers and advertisers. The company operates a large-scale cloud infrastructure in Poland and values teamwork, customer centricity, and continuous learning.

Global

  • Manage deployment and upkeep of internal network infrastructure including routers, switches, firewalls, and wireless equipment.
  • Install, configure, and maintain internal servers and services, implementing infrastructure changes and improvements.
  • Troubleshoot and resolve hardware and software issues while planning regular maintenance for system health and uptime.

Gcore provides infrastructure and software solutions for AI, cloud, network, and security, powering digital experiences worldwide. With 550+ employees and partnerships with Intel, NVIDIA, Dell, and Equinix, the company focuses on connecting the world to AI.

Unlimited PTO

  • Tackle complex customer issues spanning multiple systems, collaborating across Product, Engineering, and Customer Success.
  • Lead cross-functional incident response, root cause analysis, and develop high-quality documentation and runbooks.
  • Analyze recurring issues to advocate for product improvements and contribute to automation and knowledge-sharing initiatives.

Immuta is the Data Provisioning Company, helping organizations provision secure, governed data access at speed. Founded in 2015, it has $267 million in total funding, is trusted by Fortune 500 companies globally, and operates as a hybrid workplace with offices in Boston, Columbus, and College Park.

Germany Unlimited PTO

  • Design and maintain scalable infrastructure-as-code solutions using Terraform and Kubernetes.
  • Build and operate observability systems while leading incident response and reliability improvements.
  • Embed security and compliance practices into infrastructure and optimize system performance and cloud costs.

This partner company builds a next-generation platform enabling AI-driven services across global employment infrastructure. It is a highly distributed, async-first organization where engineers thrive in ownership and autonomy.

Global 18w maternity 18w paternity

  • Analyze and resolve customer technical issues, managing escalations with security and engineering teams.
  • Develop and maintain internal and external knowledge bases to educate customers on platform success.
  • Advocate for customers by collaborating with product teams to implement feature improvements.

UpGuard builds a Cyber Risk Posture Management platform that replaces manual security bottlenecks with AI-driven precision, processing over 100 billion risk signals daily. With a US$75M Series C and 99% team member approval as a Great Place to Work, they offer a fully remote, collaborative culture.

US Unlimited PTO 18w maternity 12w paternity

  • Manage the AMER Technical Support Engineering team and own the L2/L3 escalation boundary.
  • Develop the team through coaching, career frameworks, and skill gap identification.
  • Drive tooling and automation improvements including AI-assisted workflows.

Chainguard is the trusted source for open source, delivering hardened, secure, and production-ready builds of open source software. They are a venture-backed company serving Fortune 500 enterprises, with a culture that values customer obsession, intentional action, and trust.

Canada

  • Work with industry-leading customers to maintain Illumio technology deployments.
  • Log and update cases, inform customers of status, and provide solutions in a professional and timely manner.
  • Analyze problems and defects, recommend solutions, and collaborate with internal teams.

Illumio is a leader in ransomware and breach containment, providing a platform to stop the spread of cyberattacks across hybrid multi-cloud environments. The company is recognized as a Leader in the Forrester Wave for Microsegmentation and fosters a culture of belonging and collaboration.