Source Job

Ireland

  • Diagnose and resolve complex production issues across Linux, Kubernetes, networking, storage, and GPU systems.
  • Act as a senior escalation point for critical incidents, collaborating with engineering teams on root cause analysis.
  • Develop tools and automation in Python, Bash, or Go to improve troubleshooting efficiency and observability.

Linux Kubernetes Python Networking AWS

20 jobs similar to Senior Support Engineer

Jobs ranked by similarity.

Germany

  • Investigate and resolve complex production issues across cloud and customer environments, performing root cause analysis.
  • Debug across Linux, Kubernetes, networking, storage, and GPU-based systems as a senior escalation point.
  • Develop internal tools and automation to enhance troubleshooting efficiency and platform reliability.

Our partner is a company building cutting-edge AI and cloud infrastructure solutions. They foster an inclusive, innovation-driven culture with a strong focus on engineering excellence and continuous improvement.

Global

  • Troubleshoot and resolve issues in customer environments based on Linux, OpenStack, Kubernetes, and networking technologies, owning escalations end-to-end.
  • Reproduce customer issues in labs, confirm bug reports, and collaborate with the development team to improve product stability.
  • Communicate with customers during incidents via email and remote sessions, guiding them through troubleshooting and resolution processes.

Mirantis is the Kubernetes-native AI infrastructure company, enabling organizations to build and operate scalable, secure, and sovereign infrastructure for modern AI and data-intensive applications. With deep expertise in open source and Kubernetes, Mirantis empowers platform engineering teams across enterprises worldwide.

Europe

  • Lead investigation and resolution of complex infrastructure, networking, and platform incidents.
  • Provide technical leadership for Kubernetes platform operations and drive automation initiatives.
  • Mentor engineers and develop operational standards, runbooks, and best practices.

Mirantis is the Kubernetes-native AI infrastructure company, enabling organizations to build and operate scalable, secure, and sovereign infrastructure for modern AI, machine learning, and data-intensive applications. Serving enterprises like Adobe, PayPal, and Volkswagen, Mirantis is committed to open standards and freedom from lock-in.

Global

  • Act as the final escalation point for complex cloud infrastructure issues, analyzing logs and metrics to identify root causes.
  • Own high-severity incidents, coordinate resolution with Engineering, DevOps, and SRE teams, and contribute to preventive actions.
  • Mentor L1 and L2 support engineers, create runbooks and SOPs, and collaborate with Product teams to reproduce issues.

Gcore provides infrastructure and software solutions for AI, cloud, network, and security, powering real-time communication, streaming, enterprise AI, and secure web applications. With 550+ professionals globally, they collaborate with partners like Intel, NVIDIA, Dell, and Equinix to support the digital ecosystem.

Europe

  • Operate and maintain Linux-based infrastructure, deploy and scale Kubernetes clusters, and implement automation with Ansible and GitOps.
  • Design networking architecture, build observability stacks, and lead incident response across the platform.
  • Manage virtualization layers and collaborate with development teams to optimize resource utilization and system availability.

Pragmatike develops cutting-edge solutions in Cloud Computing, focusing on ambitious projects with a culture of collaboration and innovation. The team is passionate and collaborative, working in a dynamic and flexible environment to shape tomorrow's technologies.

Europe

  • Lead the investigation and resolution of complex infrastructure, networking, and platform-related incidents.
  • Provide technical leadership for Kubernetes platform operations and supporting infrastructure services.
  • Mentor and support AI Infrastructure & Platform Operations Engineers, sharing technical knowledge through documentation and training.

Mirantis helps organizations ship code faster on public and private clouds, providing a public cloud experience on any infrastructure from the data center to the edge. The company serves many of the world's leading enterprises, including Adobe, DocuSign, Liberty Mutual, and PayPal, and is a leader in container management.

Unlimited PTO

  • Tackle complex customer issues spanning multiple systems, collaborating across Product, Engineering, and Customer Success.
  • Lead cross-functional incident response, root cause analysis, and develop high-quality documentation and runbooks.
  • Analyze recurring issues to advocate for product improvements and contribute to automation and knowledge-sharing initiatives.

Immuta is the Data Provisioning Company, helping organizations provision secure, governed data access at speed. Founded in 2015, it has $267 million in total funding, is trusted by Fortune 500 companies globally, and operates as a hybrid workplace with offices in Boston, Columbus, and College Park.

Europe

  • Support and improve hybrid production infrastructure for 15+ development teams handling 100+ products, 10K+ domains, and billions of hits per day.
  • Architect and plan improvements to a multi-datacenter development environment, advocating for migration to automated, elastic infrastructure using cloud, Kubernetes, and serverless technologies.
  • Document processes, monitor performance metrics, promote CI/CD practices, and mentor junior DevOps engineers.

Aylo is a tech pioneer that offers world-class adult entertainment and games on safe, popular platforms. With an international team of dynamic innovators, the company focuses on trust-and-safety protocols and has offices in Montreal, Austin, and Nicosia.

Europe

  • Monitor, operate, and support production AI infrastructure platforms.
  • Investigate and resolve infrastructure, networking, hardware, and platform-related incidents.
  • Collaborate with engineering teams, hardware vendors, and datacenter personnel to resolve technical issues.

Mirantis is the Kubernetes-native AI infrastructure company, enabling organizations to build and operate scalable, secure infrastructure for AI and data-intensive applications. The company is growing and invests heavily in AI infrastructure and platform services.

Global 18w maternity 18w paternity

  • Analyze and resolve customer technical issues, managing escalations with security and engineering teams.
  • Develop and maintain internal and external knowledge bases that help customers succeed with the platform.
  • Advocate for customers by collaborating with product teams to implement feature improvements.

UpGuard builds a Cyber Risk Posture Management platform that replaces manual security bottlenecks with AI-driven precision, processing over 100 billion risk signals daily. With a US$75M Series C and 99% team member approval as a Great Place to Work, they offer a fully remote, collaborative culture.

Canada

  • Work with industry-leading customers to maintain Illumio technology deployments.
  • Log and update cases, inform customers of status, and provide solutions in a professional and timely manner.
  • Analyze problems and defects, recommend solutions, and collaborate with internal teams.

Illumio is a leader in ransomware and breach containment, providing a platform to stop the spread of cyberattacks across hybrid multi-cloud environments. The company is recognized as a Leader in the Forrester Wave for Microsegmentation and fosters a culture of belonging and collaboration.

UK

  • Design, implement, and operate scalable, secure, and highly available AWS cloud infrastructure.
  • Drive the reliability and performance of containerized applications using Amazon EKS and ECS.
  • Ensure stability, security, and efficiency of production Linux environments through system administration.

NiCE provides software products used by 25,000+ global businesses, including 85 of the Fortune 100, to deliver customer experiences, fight financial crime, and ensure public safety. The company has over 8,500 employees across 30+ countries and is an innovation powerhouse in AI, cloud, and digital.

India

  • Act as the senior technical authority in application support, ensuring stability, performance, and reliability of enterprise systems.
  • Partner with technology and business teams to define enhancements and production support strategies, and drive incident management.
  • Mentor junior analysts, influence operational practices, and improve system resilience in a global financial technology environment.

Jobgether is an AI-powered job matching platform that helps candidates get reviewed quickly and objectively against role requirements. They focus on using technology to connect top-fitting candidates with hiring companies.

US Unlimited PTO 18w maternity 12w paternity

  • Manage the AMER Technical Support Engineering team and own the L2/L3 escalation boundary.
  • Develop the team through coaching, career frameworks, and skill gap identification.
  • Drive tooling and automation improvements including AI-assisted workflows.

Chainguard is the trusted source for open source, delivering hardened, secure, and production-ready builds of open source software. They are a venture-backed company serving Fortune 500 enterprises, with a culture that values customer obsession, intentional action, and trust.

US

  • Design, deploy, and manage production Kubernetes clusters with workload scheduling, resource quotas, network policies, and RBAC.
  • Build and optimize CI/CD pipelines using Infrastructure as Code and GitOps principles.
  • Implement observability solutions using Prometheus, Grafana, and OpenTelemetry for performance tuning and reliability.

VerTALENTS is a subsidiary of VerSprite Cybersecurity, specializing in technology staffing. The company connects top technical talent with industry clients through various methods, adding value to both clients and candidates for full-time and contracting opportunities.

US

  • Independently troubleshoot enterprise CI/CD and infrastructure issues for top tech companies.
  • Design and implement proactive tools and processes, and make open source contributions.
  • Provide support via Slack, Zoom, and Community Forum with no on-call duties.

Buildkite is rethinking software delivery, building a fast, reliable, and secure CI/CD platform for high-growth tech companies like Airbnb and Canva. They are a remote-first company with a culture of kindness, autonomy, and collaboration.

Global

  • Design, build, and operate scalable cloud infrastructure and infrastructure-as-code for globally distributed services.
  • Develop and maintain CI/CD pipelines to support rapid and reliable delivery of backend and client components.
  • Own service reliability by implementing observability (metrics, logs, tracing) and leading incident response with actionable improvements.

NetBird develops an open-source zero-trust network security platform that is easy to use and affordable for teams of all sizes. Since its launch in 2021, it has earned the trust of thousands of companies and connects hundreds of thousands of users worldwide, driven by a community-focused culture.

UK

  • Design, build, and maintain CI/CD pipelines and Infrastructure as Code using tools like CloudFormation, Ansible, and Terraform.
  • Monitor and respond to infrastructure and application health, troubleshoot operational issues, and provide on-call support.
  • Maintain operational documentation, communicate proactively with teams, and ensure service delivery meets client expectations.

NICE Ltd. provides software used by 25,000+ global businesses, including 85 of the Fortune 100, to deliver customer experiences, fight financial crime, and ensure public safety. With over 8,500 employees across 30+ countries, NICE is recognized as a market leader in AI, cloud, and digital innovation.

US

  • Implement highly available, scalable infrastructure across AWS, GCP, and bare-metal environments.
  • Drive an "automation-first" culture by writing code in Python/Go to build self-healing systems.
  • Act as lead Incident Commander, develop response playbooks, and conduct post-incident analyses.

Zscaler accelerates digital transformation and secures customers with its cloud-native Zero Trust Exchange platform. The company processes over 200 billion transactions daily and fosters a culture of execution, collaboration, and accountability.

Europe

  • Own performance optimization and reliability of large-scale GPU clusters and InfiniBand networking for HPC workloads.
  • Diagnose and resolve complex system-level issues across GPU, network, and compute layers, integrating new hardware components.
  • Develop automation for monitoring, fault detection, and proactive remediation in distributed compute environments.

Our partner is building a next-generation AI cloud infrastructure environment, focusing on large-scale high-performance computing systems. They foster a highly technical engineering culture with experts across systems, networking, and virtualization, offering career development and continuous learning opportunities.