Source Job

Global

  • Travel frequently (up to 75%) to military installations to support system fielding, integration, and training.
  • Partner with operators and soldiers to tailor system configurations to mission needs.
  • Troubleshoot complex system and network issues and drive resolution to completion.

Linux, Kubernetes, VMware, Networking, Troubleshooting

6 jobs similar to Senior Forward Deployed Engineer

Jobs ranked by similarity.

Global

  • Travel extensively (up to 75%) to support fielding, training, and operational deployment at military installations.
  • Serve as the operational liaison between users and engineering teams to validate system performance and resolve issues.
  • Capture user feedback, workflows, and mission needs to inform product development and enhancements.

Research Innovations, Inc. (RII) is breaking through the big, slow status quo with transformative technology that fundamentally improves the world. We build advanced software solutions for government and military missions, applying agile development and user-centered design to solve complex, mission-critical problems.

  • Maintain the reliability and performance of customer environments remotely, supporting Mirantis OpenStack/k0s layers.
  • Diagnose and resolve system-level issues, requiring hands-on Linux administration experience.
  • Troubleshoot customer environments based on Linux, OpenStack, Kubernetes, networking, and other cloud technologies; detect, report, and resolve issues.

Mirantis helps enterprises move to the cloud on their terms, delivering a true cloud experience on any infrastructure, powered by Kubernetes. They serve many of the world’s leading enterprises and value openness, collaboration, risk-taking, and continuous growth.

$120,000–$160,000/yr
US, Unlimited PTO

  • Support on-site and remote deployments of Shift5's platform across rail, aviation, and defense environments, handling installation and troubleshooting.
  • Partner with Engineering and Customer Success teams to ensure successful system implementations and improve deployment processes.
  • Navigate operational technology environments and customer site constraints, contributing directly to scaling field operations and capabilities.

Shift5 builds the data platform for onboard operational technology (OT), delivering cybersecurity, predictive maintenance, and compliance capabilities for defense and commercial fleets. It is a fast-growing, mission-driven startup with a focus on operational readiness and resilience.

APAC

  • Partner directly with customer engineering teams running training and inference workloads in production.
  • Investigate failures involving distributed training, Kubernetes orchestration, GPU allocation, networking, and storage systems.
  • Identify recurring patterns across customer issues and drive long-term reliability improvements.

Lightning AI is the company behind PyTorch Lightning, building an end-to-end platform for developing, training, and deploying AI systems. They serve solo researchers, startups, and large enterprises, operating globally with offices in New York City, San Francisco, Seattle, and London.

US, 4w PTO, 14w maternity, 14w paternity

  • Own Render's core network infrastructure across multiple data centers and cloud providers, shaping how networking evolves as Render rapidly scales.
  • Design and build customer-facing networking capabilities that give users greater flexibility in how their services connect and communicate, and how traffic is routed.
  • Investigate complex networking issues across the stack, from the kernel and data plane to distributed systems and edge networking.

Render is building a modern cloud platform for developers creating AI-native, full-stack, multi-service applications, eliminating the tradeoff between hyperscaler power and developer-friendliness. They are a diverse and talented team that values craft, velocity, and user experience.

Engineer

FAL
$180,000–$250,000/yr
US

  • Build and maintain a Python fleet-tracking system that manages the full lifecycle of servers.
  • Build server management tooling that automates provisioning, health checks, GPU diagnostics, recovery, and alerting.
  • Create and maintain metrics, dashboards, and alerting for hardware health across the fleet.

FAL is committed to keeping a large fleet of GPU servers healthy and productive. They offer a collaborative and supportive culture with learning and growth opportunities.