Own the technical direction of Remote's SRE/Platform domain.
Define and drive the reliability strategy across the platform.
Identify and lead AI enablement initiatives across the engineering organisation.
Remote is solving modern organizations’ biggest challenge – navigating global employment compliantly with ease. With our core values at heart and a future-focused work culture, our team works tirelessly on ambitious problems, asynchronously, around the world.
Own and evolve CI/CD pipelines using GitHub Actions and OIDC-based authentication for microservices and agentic workloads.
Automate infrastructure provisioning using Infrastructure as Code tools such as Terraform and CloudFormation.
Operate and scale our Kubernetes platform, including autoscaling, ingress, and multi-tenant isolation for enterprise customers.
Zingtree is a next-generation intelligent process automation platform reimagining customer experience operations for enterprise support leaders. It is a small team with high ownership, emphasizing automation, collaboration, and transparency.
Deploy and maintain infrastructure using Terraform on AWS.
Operate and govern production-grade platforms running on Kubernetes / EKS.
Build and maintain CI/CD pipelines using GitHub Actions.
Muttdata is a dynamic startup committed to crafting innovative systems using cutting-edge Big Data and Machine Learning technologies. They are looking for a hands-on DevOps engineer to join a strategic initiative focused on deploying and operating Data & AI platforms.
Lead the design, implementation, and ongoing improvement of reliable, scalable, performant, and secure production platforms and services.
Work closely with cross-functional teams to build and maintain resilient infrastructure and deployment patterns.
Provide technical leadership and mentorship to engineers across the organisation, promoting strong engineering standards and operational best practice.
Cision empowers individuals to make an impact and values diverse perspectives. They foster curiosity, collaboration, and innovation while driving meaningful contributions to brands; they have offices in 24 countries throughout the Americas, EMEA and APAC.
Own and operate GPU and accelerator clusters for AI training, inference, and experimentation, ensuring reliability and cost-efficiency.
Build and optimize scheduling, orchestration, and serving systems using frameworks like vLLM and Triton to improve latency, throughput, and memory efficiency.
Partner with ML engineers to remove workflow bottlenecks and build observability for GPU utilization, capacity, and incident response.
Kraken is a crypto exchange platform building premium financial products for traders and institutions, accelerating global crypto adoption. It is a mission-driven, fully remote company with a world-class team of crypto experts spread across more than 70 countries.
Design and operate our Kubernetes ecosystem with a focus on high availability and zero-downtime operations.
Own and evolve our PaaS strategy, using GitOps and CI/CD to empower domain teams to deploy independently.
Define and implement our observability strategy across metrics, logs, and tracing.
Finom is a European tech startup headquartered in Amsterdam, revolutionizing financial services for entrepreneurs. They offer an all-in-one financial B2B solution integrating banking, accounting, financial management, and invoicing into a mobile-first platform, with about 346 million in funding.
Design and deploy GPU cluster architectures using tools like Ansible, Terraform, Kubernetes, and Slurm.
Lead technical deep-dives and workshops, and present solutions to stakeholders, translating complex concepts.
Automate provisioning and monitoring with Infrastructure as Code, and produce documentation and training materials.
Gcore is a global provider of infrastructure and software solutions for AI, cloud, network, and security, powering digital experiences worldwide. The company collaborates with leading technology partners and employs over 550 professionals building foundational technologies.
Design, build, and maintain the core infrastructure layer supporting GenAI products.
Implement secure access controls and authentication mechanisms integrated by default into the AI platform components.
Develop and manage observability, monitoring, and logging solutions for GenAI workloads and infrastructure.
PointClickCare is a healthcare technology company. This team will serve as the product owner for GenAI capabilities, closely integrated with key horizontal partners to ensure delivery of safe, scalable and high-impact AI Products.
Own and evolve Quansight's cloud infrastructure across AWS, Azure, and GCP.
Build, deploy, and maintain internal dashboards and reporting for operations and project management.
Lead infrastructure engagements for clients from scoping and architecture through delivery, upskilling client teams.
Quansight is rooted in the Python and PyData ecosystems. They provide services ranging from open-source software development to training and consulting, believing in a culture of do-ers, learners, and collaborators.
Drive the stability and reliability of Epic's GCP infrastructure.
Manage and harden our Docker and GKE container platform.
Maintain and improve CI/CD pipelines.
Epic is the leading digital reading platform for kids ages 12 and under, used by millions of children, families, and educators around the world. As Epic continues to grow, we are reimagining what reading can be through thoughtful technology, data, and global collaboration to make learning more engaging, accessible, and impactful.
Design, build, and maintain scalable, reliable systems on GCP.
Develop automation for infrastructure provisioning using Terraform, Ansible, or Deployment Manager.
Manage incident response, conduct postmortems, and implement improvements to reduce recurrence.
SupplyHouse.com is an industry-leading e-commerce company specializing in HVAC, plumbing, heating, and electrical supplies since 2004. They value every individual team member and cultivate a community where people come first with Generosity, Respect, Innovation, Teamwork, and GRIT.
Design, deploy, and operate critical systems balancing reliability, cost, and agility.
Perform troubleshooting and root-cause analysis of system operation issues.
Loadsmart is a logistics technology company valued at over $1 billion. We are a collection of industry veterans and user-centered engineers using innovative technology to fearlessly reinvent the future of freight.
Construct infrastructure as code, developing and enforcing best practice across configurations while preventing drift between Terraform configurations and infrastructure deployments.
SentiLink provides innovative identity and risk solutions, empowering institutions and individuals to transact with confidence. They are building the future of identity verification in the United States, replacing a clunky, ineffective, and expensive status quo with solutions that are 10x faster, smarter, and more accurate.
Build small to medium-sized infrastructure components using Terraform and AWS.
Ensure reliable build-and-deploy cycles by maintaining and debugging CI/CD workflows, including GitHub Actions and ArgoCD.
Learn to troubleshoot and resolve issues in containerized environments, including Kubernetes pods and EKS networking bottlenecks.
TrueML is a mission-driven financial software company that aims to create better customer experiences for distressed borrowers. The TrueML team includes inspired data scientists, financial services industry experts and customer experience fanatics building technology.
Design, deploy, and manage Kubernetes-based platforms in production.
Implement and manage automation frameworks for infrastructure provisioning and operations.
Administer and optimize VMware environments (vSphere, ESXi, vCenter).
EPlus believes technology is a people business and delivers solutions that make a real difference. Their team is passionate, skilled, and driven, valuing collaboration, innovation, and extraordinary results, and is dedicated to fostering, cultivating, and preserving a culture that represents diversity and enables inclusion.
Designing and operating always-on product environments for customer demos, internal use, and stakeholder access.
Building feature branch / preview environments to support UX and rapid feedback loops.
Integrating core system components across Fleet Management, Edge Management, OS, and related services.
Defense Unicorns delivers mission value by streamlining software delivery. They are composed of innovators, software engineers, and veterans with decades of experience delivering technology programs across the federal market.
Lead the design, implementation, and continuous improvement of our cloud infrastructure and DevOps practices.
Ensure that our systems are scalable, reliable, and secure, enabling seamless software delivery across environments.
Improve development velocity while increasing system reliability.
Cadence is building a remote care delivery system that keeps older people healthy, out of the hospital, and at home. They support tens of thousands of active patients nationwide with their AI‑powered system and scalable clinical model enabling proactive, population‑level care.
Own the overall health, availability, performance, security, cost, and day-to-day operations of the GCP platform and toolset.
Build and maintain Azure DevOps pipelines for infrastructure and application deployment.
Design, implement, maintain, and operate GCP infrastructure across DEV, QA, STAGE, PROD, and other environments.
Resultant is a consulting firm that helps clients make technology a strategic asset and use data to guide better decisions. They employ over 350 team members who operate remotely and from offices and hubs around the United States.
Build internal tooling to help other engineers and the rest of the company understand and operate our system.
Design and implement security best practices for our team and infrastructure.
Reduce toil through automation, including building and maintaining CI/CD infrastructure.
Openly is rebuilding insurance from the ground up by re-envisioning and enhancing every aspect of the customer experience. They are a rapidly growing team of exceptional, curious, empathetic people with a wide range of skill sets, spanning many departments.
Build and scale a strong culture of operational excellence by defining standards and coaching teams to own reliability and availability.
Drive mature DevOps/SRE practices, including incident response and PIRs, on-call readiness, runbooks, alerting, observability, and release/change management.
Guide teams in the design, development, evolution, and operation of large-scale, distributed cloud systems.
Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana around the globe. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, and their team thrives in an innovation-driven environment.