Design and implement highly scalable infrastructure for GitLab.com to support current and future growth.
Collaborate with cross-functional teams across the Infrastructure organization to plan and deliver projects that shape GitLab’s platform direction.
Operate and improve edge services and Kubernetes workloads, acting as a subject matter expert within the infrastructure department.
GitLab is an open-core software company that develops the most comprehensive AI-powered DevSecOps Platform, used by more than 100,000 organizations. They aim to enable everyone to contribute to and co-create the software that powers our world.
Architect, operate, improve and secure the platform the Garner Health app runs on
Boost development velocity and productivity
Build systems to a high engineering standard and hold others to the same high standard
Garner has developed a revolutionary approach to evaluating doctor performance and a unique incentive model that's reshaping the healthcare economy to ensure everyone can afford high-quality care. They have more than doubled their revenue annually over the last 5 years. Garner's award-winning culture is designed to cultivate teamwork, trust, autonomy, exceptional results, and individual growth.
Design and develop a highly available, scalable, and secure ClickHouse Cloud platform.
Build innovative deployment automation across cloud, hybrid, and on-prem systems.
Solve unique scaling, reliability, and performance challenges in regulated environments.
ClickHouse is a fast-growing private cloud company recognized on the 2025 Forbes Cloud 100 list. With over 2,000 customers and ARR that has more than quadrupled over the past year, ClickHouse leads the market in real-time analytics, data warehousing, observability, and AI workloads.
Designs, implements, and continuously improves observability strategies across services.
Focuses on understanding system behavior in production, identifying failure modes, performance bottlenecks, and reliability risks.
Evolves and maintains shared AWS CDK and CDK8s constructs, with emphasis on observability, autoscaling, and operational safeguards.
Truelogic is a leading provider of nearshore staff augmentation services. They have a team of 600+ highly skilled tech professionals based in Latin America, partnering with U.S. companies on impactful projects and valuing expertise and aspirations.
Collaborate with the platform team on environment preparation for platform integration and expansion.
Automate installation and upgrade processes to reduce time-to-value and improve repeatability across customer deployments.
Apply and validate security configurations using Security Technical Implementation Guides (STIGs) and Security Requirements Guides (SRGs).
Istari Digital is a digital engineering software company enabling customers to turn the physical world into the digital to accomplish their specific mission or business objectives. At Istari, they are passionate about their mission of creating the world's first open and scalable industrial metaverse.
Be a keen learner, working with cloud-native, highly scalable infrastructure and gaining expertise in container orchestration, networking, and observability.
Be a passionate problem solver, tackling scalability, reliability, and troubleshooting challenges in distributed systems.
Be a great communicator, engaging directly with developers, engineering teams, and product teams to understand infrastructure challenges and provide solutions.
Temporal provides an open-source programming model that simplifies code, improves application reliability, and helps developers focus on delivering features faster. They aim to be the reliable foundation of every developer’s toolbox and value curiosity, drive, collaboration, genuineness, and humility.
Build self-service systems that automate managing, deploying and operating services.
Automate environment observability and resilience. Enable all developers to troubleshoot and resolve problems.
Ensure we hit defined SLOs, including participation in an on-call rotation.
Cohere is focused on scaling intelligence to serve humanity by training and deploying frontier models for developers and enterprises. They are a team of researchers, engineers, and designers. They value diversity and strive to create an inclusive work environment.
Architect and maintain scalable, reliable infrastructure: Design and optimize infrastructure for high availability, fault tolerance, and performance across distributed systems.
Lead incident management and root cause analysis: Own incident response processes, ensure swift resolution of issues, and drive post-incident improvements to prevent recurrences.
Service monitoring and automation: Build and maintain automated monitoring, alerting, and healing systems that improve system health, reduce manual intervention, and minimize downtime.
VGS is the world's leader in payment tokenization, empowering clients and partners by tokenizing sensitive payment data and limiting compliance scope. They embed a universal token vault into their technology stack to manage the complexities of payment data tokenization across processors, networks, and more. Their culture values transparency, collaboration, grit, and humility.
Shape how Scalable runs microservices so that they are performant, secure, and cost-efficient.
Collaborate with cross-functional teams to understand scalability requirements.
Develop and maintain internal tooling around Monitoring, Developer Portal, and Load Testing.
Scalable Capital is a leading digital investment and banking platform with a full banking licence, empowering people across Europe to shape their own finances.
Design, create, and maintain software and systems to improve the availability, scalability, and efficiency of Thumbtack's services
Set the architectural direction of infrastructure and platform services while supporting the engineering organization
Design and implement tools and processes used for deployment, change, service, and infrastructure management
Thumbtack helps millions of people confidently care for their homes through personalized guidance, AI tools, and a hiring experience. They have a growing community of 300,000 local service businesses.
Configure and maintain cloud infrastructure automation using Terraform, focusing on CDN optimization and content delivery performance.
Develop capacity planning strategies and performance optimization initiatives for high-volume spatial content delivery.
Instrument services to understand system health.
Miris is a cutting-edge technology company building the future of 3D content delivery at global scale. Our mission is to empower creators and developers to deliver high-fidelity, photorealistic 3D experiences to billions of users instantly, seamlessly, and across all major platforms and devices.
Help build and operate core cloud-native systems including VKE, VLB, VCR, Vultr Inference, NAT Gateways, and our internal APIs. The ideal candidate has a strong understanding of Kubernetes components, container runtime internals, and modern IaC/automation practices. This role will have a direct impact on Vultr’s global cloud infrastructure footprint.
Vultr is on a mission to make high-performance cloud infrastructure easy to use, affordable, and locally accessible for enterprises and AI innovators around the world.
Contribute to our core product, primarily in Go, on services that power our applications.
Design and refine technical systems, helping to shape them to remain scalable, reliable, and elegant.
Collaborate closely across disciplines to explore problems, prototype ideas, and iterate quickly.
Humanitec is reshaping how enterprises build and run their cloud-native setups and helps teams build Internal Developer Platforms (IDPs) that unlock true developer self-service. They are a fully remote company where small teams work closely.
Create tools to enhance and support automated workflows.
Architect and implement solutions to support the team’s production deployments.
Work on a high-performing remote team with interesting problems to solve.
CrowdStrike is a global leader in cybersecurity, protecting the people, processes, and technologies that drive modern organizations. Since 2011, their mission has been to stop breaches, and they've redefined modern security with the world’s most advanced AI-native platform.
Automate infrastructure provisioning, configuration management, monitoring, and operational workflows using IaC and scripting languages.
Own the deployment, maintenance, and lifecycle management of systems supporting engineering, leveraging deep expertise in Kubernetes.
Troubleshoot complex infrastructure and application issues, driving root-cause analysis and developing long-term remediation solutions.
SingleStore delivers the cloud-native database with the speed and scale to power the world’s data-intensive applications. They are venture-backed and headquartered in San Francisco with offices in Sunnyvale, Raleigh, Seattle, Boston, London, Lisbon, Bangalore, Dublin and Kyiv.
Design and implement cloud-native infrastructure that powers core product capabilities at scale.
Build proprietary solutions (sync engines, observability pipelines, DNS management systems) that differentiate Files.com.
Engineer infrastructure for speed, resilience, and maintainability across high-volume, distributed workloads.
Files.com powers secure file transfer and automation for over 4,000 brands. They are a profitable, founder-led SaaS company with a flat, high-trust engineering organization, where engineers are empowered to take ownership of projects.
Design and manage infrastructure-as-code with Terraform and GitOps.
Build and maintain secure CI/CD pipelines with integrated security automation.
Deploy and operate Kubernetes/K3s clusters in AWS GovCloud (IL5/IL6).
Rackner is a cloud-native software consultancy delivering solutions for startups, enterprises, and the public sector. They enable digital transformation through DevSecOps, AI/ML, and cloud-first innovation, solving high-impact problems and delivering secure, scalable solutions for the Department of Defense and federal health programs.
Oversee the reliability, scalability, performance, and security of key production services.
Collaborate with cross-functional teams to develop and maintain resilient infrastructure.
Provide expert mentorship and guidance on best practices to engineers throughout the organization.
Cision is a global leader in PR, marketing and social media management technology and intelligence, helping brands and organizations connect with customers and stakeholders to drive business results. The company has offices in 24 countries throughout the Americas, EMEA and APAC.
Take an active role in influencing our roadmap and your own career objectives
Work with your team to deliver new features, then use the results to iterate and improve
Drive projects from initial ideation all the way to operations once they are in the hands of customers
Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana around the globe. The company helps more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack. Our team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything we do.
Run the production environment by monitoring availability and taking a holistic view of system health.
Build software and systems to manage platform infrastructure and applications.
Improve reliability, quality, and time-to-market of our suite of software solutions.
NICE software products are used by 25,000+ global businesses to deliver extraordinary customer experiences, fight financial crime and ensure public safety.