Monitor, operate, and support production AI infrastructure platforms.
Investigate and resolve infrastructure, networking, hardware, and platform-related incidents.
Collaborate with engineering teams, hardware vendors, and datacenter personnel to resolve technical issues.
Mirantis is the Kubernetes-native AI infrastructure company, enabling organizations to build and operate scalable, secure infrastructure for AI and data-intensive applications. The company is growing and invests heavily in AI infrastructure and platform services.
Lead the investigation and resolution of complex infrastructure, networking, and platform-related incidents.
Provide technical leadership for Kubernetes platform operations and supporting infrastructure services.
Mentor and support AI Infrastructure & Platform Operations Engineers, sharing technical knowledge through documentation and training.
Mirantis helps organizations ship code faster on public and private clouds, providing a public cloud experience on any infrastructure from the data center to the edge. The company serves many of the world's leading enterprises, including Adobe, DocuSign, Liberty Mutual, and PayPal, and is a leader in container management.
Manage procurement requests from intake to signature.
Lead all vendor negotiations directly to secure the best outcomes.
Tailscale is building the new Internet by delivering software that makes it easy to securely interconnect people and their devices, no matter where they are. Founded in 2019 and fully distributed, they're backed by Accel, CRV, Insight, Heavybit, and Uncork Capital.
Lead teams responsible for supporting Government applications and systems such as the Travel Assistance Center.
Work with staff to address any performance problems and conduct performance reviews.
Communicate effectively in writing and verbally, with strong listening skills.
Peraton is a next-generation national security company that drives missions of consequence spanning the globe. As the world’s leading mission capability integrator and transformative enterprise IT provider, they deliver trusted, highly differentiated solutions and technologies.
Act as the primary NVIDIA AI Enterprise and vector database expert for HyperPOD customer environments, owning end-to-end triage across GPU, NVAIE services, and storage.
Author and maintain support triage runbooks and diagnostics bundles, and collaborate on observability dashboards for platform health and RAG metrics.
Build hands-on labs, PoCs, and reusable technical assets to accelerate support readiness and partner success.
DataDirect Networks (DDN) is a global market leader in AI and high-performance data storage, powering many of the world's most demanding AI data centers across industries like life sciences, healthcare, financial services, and research. They are a global company with strong innovation, customer-centricity, and a team of passionate professionals committed to shaping the future of AI and data management.
Build and own the company's procurement function from scratch, including strategy, vendor lifecycle, cost optimization, tooling, and team development.
Lead strategic sourcing across major spend categories like SaaS, AI, cloud, and professional services, and negotiate high-stakes agreements.
Partner with Finance, Legal, Compliance, IT, and Security leaders to create governance frameworks that balance speed with discipline.
Headway is building a new mental healthcare system everyone can access by solving the administrative barriers to insurance for providers. They are a Series D company with over $325M in funding and more than 75,000 providers using their software, serving over 1 million patients.
Lead installation, commissioning, startup, and validation activities for modular data center deployments across domestic and international environments.
Lead troubleshooting and root cause analysis efforts across electrical, mechanical, controls/BAS, networking, and monitoring systems.
Partner directly with Engineering, Manufacturing, Supply Chain, Deployment Leadership, and Customer Operations teams to drive successful deployment execution.
Armada is the hyperscaler for the edge, delivering modular AI infrastructure from first deployment to AI factory. With nearly half a billion dollars in funding, Armada is backed by top investors such as Microsoft (M12), Founders Fund, and BlackRock.
Define and evolve the architecture and roadmap for enterprise‑scale Data and AI platforms.
Design and build multi‑tenant, multi‑region, highly available AI platforms with governance.
Lead capacity planning and cost optimization strategies for GPU and CPU workloads.
NEORIS accelerates growth in Ibero‑America, combining global engineering with regional expertise. With over 60,000 professionals across 55+ countries, they offer technical specialization career paths and value responsibility, collaboration, creativity, and commitment.
Act as a trusted advisor to clients, providing technical expertise and guidance throughout engagements.
Conduct PoCs, workshops, presentations, and training sessions on GPU cloud technologies and best practices.
Collaborate with clients to understand their business requirements and develop solution architectures.
Lavendo partners with startups and high-growth companies to help them hire top-tier sales, go-to-market, and technical talent. They are an equal opportunity workplace and consider all qualified applicants without regard to race, color, religion, national origin, age, sex, marital status, ancestry, disability, genetic information, veteran or military status, gender identity or expression, sexual orientation, or any other characteristic protected by law.
Build and maintain a Python fleet tracking system that manages the full lifecycle of servers.
Build server management tooling that automates provisioning, health checks, GPU diagnostics, recovery and alerting.
Create and maintain metrics, dashboards, and alerting for hardware health across the fleet.
FAL is committed to keeping a large fleet of GPU servers healthy and productive. They offer a collaborative and supportive culture with learning and growth opportunities.
Lead the FAE function in APAC, providing hands-on technical support for our Metis AI Platform across OEM/ODMs and system integrators.
Build and grow a regional team, establishing processes and documentation for the APAC region.
Drive pre-sales engagement, design wins, and customer advocacy, collaborating with global R&D and Product teams.
Axelera AI is the leading provider of purpose-built AI hardware acceleration technology for AI inference, including computer vision and generative AI. Headquartered in Eindhoven, Netherlands, with R&D offices across Europe and employees in 20 countries, the company is a global industry pioneer with a collaborative international culture.
Execute installation, commissioning, startup, and infrastructure validation activities for modular data center deployments.
Troubleshoot infrastructure issues across power, cooling, controls, monitoring, and network-connected systems.
Partner with Senior Deployment Engineers, Engineering, Manufacturing, Supply Chain, and Customer Operations teams during deployment execution.
Armada is the hyperscaler for the edge, delivering modular AI infrastructure from first deployment to AI factory with speed, scale, and sovereignty. With nearly half a billion dollars in funding from top investors such as Microsoft, Founders Fund, and BlackRock, Armada is also supported by collaborations with partners including NVIDIA and Palantir.
Lead implementation engagements end-to-end, running the implementation support program.
Architect agent workflows with customers, designing agent architectures for specific use cases.
Build alongside customers, writing and iterating on agent prompts, skills, and configurations.
Warp is building the platform for agentic development, evolving from a terminal to a full agentic development environment. With over 750k active developers, it's one of the fastest-growing startups in the AI development space, backed by venture capital firms and passionate angels.
Design end-to-end AI integration architectures connecting LLM APIs, vector databases, and inference systems to existing backend infrastructure.
Build reusable ML infrastructure components like feature pipelines, model serving layers, and evaluation frameworks that multiple portfolio companies standardize on.
Establish AI system integration best practices and governance patterns that become repeatable playbooks across the holding company.
Emergence is a thematic holding company backed by the Pritzker Organization, focused exclusively on acquiring and scaling category-defining software businesses. They invest in focused portfolios and specialized operating groups with deep domain expertise and proven playbooks.
Own the agent layer of the platform, including architecture, prompts, tool surfaces, and multi-agent orchestration.
Drive translation and dependency-mapping accuracy across unfamiliar legacy paradigms.
Write production agent code daily, using subagents and multi-agent workflows as the normal way of working.
LTS applies frontier AI to modernize legacy systems in healthcare and government IT. It is a small, senior engineering team operating with high leverage and a culture of innovation and collaboration.
Lead AI tool discovery and vendor evaluation efforts across the evolving AI ecosystem.
Partner cross-functionally with departments to identify opportunities for AI integration within workflows and business processes.
Establish and maintain governance frameworks, including responsible AI policies, prompting standards, data privacy guidelines, and best practices.
Intersect develops, constructs, and operates power and data infrastructure. The company is on an aggressive growth trajectory and looking for people hungry to tackle the largest energy challenges on the planet.
Plan, coordinate, and execute hardware integration projects from concept through deployment.
Work closely with systems engineers, solution architects, and field teams to ensure alignment.
Act as a key liaison between technical teams and clients, managing expectations.
AHEAD builds platforms for digital business, weaving together advances in cloud infrastructure, automation, analytics, and software delivery to help enterprises with digital transformation. They prioritize creating a culture of belonging, where all perspectives are valued and heard.
Hire, develop and lead inclusive, engaged, and high performing Global Enterprise Engineering teams.
Serve as a technical coach, mentor, and problem solver, helping with technical escalations and navigating complex technical solutions.
Identify gaps, show initiative, and take ownership to build out programs and processes in support of the SE team.
Verkada is transforming how organizations protect their people and places with an integrated, AI-powered platform. A leader in cloud physical security, Verkada has expanded rapidly with 15 offices and 2,200+ full-time employees and is trusted by over 30,000 organizations worldwide.
Lead a team of architects to define platform and AI architecture strategy, establishing standards and patterns for automation and AI-enabled solutions.
Partner with business and product teams to identify automation and AI opportunities, evaluating emerging technologies and recommending tools.
Mentor architects, present to senior management, and support talent acquisition while ensuring alignment with enterprise architecture governance.
Velera is a premier payments credit union service organization and integrated fintech solutions provider serving over 4,000 financial institutions across North America. The remote-first company fosters a culture of inclusion, wellbeing, and belonging with a people-helping-people philosophy.
Maintain, improve, and extend an AI platform already running in production.
Handle a mix of backend development, data pipelines, DevOps, and infrastructure work.
Translate business and product requirements into technical decisions independently.
Provectus is an AI consultancy and solutions provider that helps businesses adopt AI technologies, offering development and integration services. They appear to foster a flexible, autonomous, and tech-forward culture.