Lead reliability-focused design and readiness reviews.
Build, operate, and continuously improve our observability stack.
Own and evolve incident management practices.
Transcend is building the privacy platform that easily embeds privacy into your entire tech stack. They are growing quickly, backed by top-tier investors, and proud to serve some of the world's most iconic brands.
Automate the provisioning of all of Juniper Square’s infrastructure in code.
Partner with our Platform Engineering team to build developer tooling and improve the developer experience through joint initiatives and enhancements.
Partner with our Data Engineering team on improving our data posture and driving operational excellence.
Juniper Square's mission is to unlock the full potential of private markets by digitizing them to bring efficiency, transparency, and access. They are a values-driven organization with a hybrid workplace strategy, allowing employees to collaborate effectively across multiple countries and offering physical offices in several major cities.
Configure/operate monitoring, logging, and tracing tools for application performance.
Build dashboards and automation workflows for system reliability and uptime.
Collaborate with software engineering teams to design and implement robust systems.
Jobgether is a platform that uses AI-powered matching to connect job seekers with employers. They ensure applications are reviewed quickly and fairly, then share a shortlist with the hiring company for final decisions.
Design, build, and maintain automated CI/CD pipelines to enable fast, secure, and reliable deployments.
Provision, manage, and optimize core AWS services to support scalable, highly available applications.
Implement and maintain IaC frameworks to ensure infrastructure is version-controlled, repeatable, and auditable.
Arine is a healthcare technology and clinical services company dedicated to ensuring individuals receive the safest and most effective treatment. They are backed by leading healthcare investors and collaborate with top healthcare organizations, managing more than 18 million lives across prominent health plans.
Design, deploy and maintain a cloud infrastructure to support a Dataiku SaaS offering, mainly on AWS, Azure and GCP
Continuously improve the infrastructure, deployment and configuration to deliver more reliable, resilient, scalable and secure services
Automate all technical operations as much as possible
Dataiku is The Universal AI Platform™, giving organizations control over their AI talent, processes, and technologies to unleash the creation of analytics, models, and agents. They connect many data science technologies and integrate the best of data and AI tech.
Partner with engineers to build dev tools that empower developer workflows and deployment infrastructure.
Ensure reliability of multi-cloud Kubernetes clusters and pipelines.
Implement metrics, logging, analytics, and alerting for performance and security across all endpoints and applications.
Cresta is on a mission to turn every customer conversation into a competitive advantage by unlocking the true potential of the contact center. Their platform combines the best of AI and human intelligence to help contact centers discover customer insights and behavioral best practices.
Contribute to building and operating the infrastructure that supports the HackerOne platform.
Improve the reliability, security, and scalability of our systems.
Design and operate highly available cloud systems and apply best practices for reliability, observability, and security.
HackerOne is a global leader in Continuous Threat Exposure Management (CTEM). The HackerOne Platform unites agentic AI solutions with the ingenuity of the world’s largest community of security researchers to continuously discover, validate, prioritize, and remediate exposures across code, cloud, and AI systems. Its best-in-class AI-powered platform is trusted by the world’s top organizations.
Design, operate, and continuously improve the cloud infrastructure that powers our systems using infrastructure-as-code, monitoring, and observability.
Own critical parts of the platform: identify bottlenecks, propose and implement improvements, and drive reliability and performance at scale.
Run Kubernetes in production and evolve how we operate it.
Dune is on a mission to make crypto data accessible. They’re a collaborative multi-chain analytics platform used by thousands of developers, analysts, & investors to understand the on-chain world and the frontiers of finance. They are a team of ~60 employees working together across Europe and eastern US timezones.
Design, develop, and implement platform solutions that enhance the reliability, security, and scalability of the Database Platform infrastructure.
Provide technical leadership in AWS cloud infrastructure, networking, CI/CD, and security for cloud infrastructure solutions.
Mentor and coach team members, fostering a culture of knowledge sharing, technical excellence, and continuous improvement.
SYSTABUILD is building a shared cloud and platform foundation for a group of leading software companies in the construction, CAD and ERP domain. They are looking for a Lead Cloud Infrastructure Engineer to take a key role in designing, operating, and evolving their central cloud infrastructure and platform services.
Build and maintain CI/CD pipelines and infrastructure-as-code.
Lead observability and monitoring initiatives.
Truelogic is a nearshore staff augmentation services provider headquartered in New York. They deliver technology solutions to companies of all sizes, helping them achieve their digital transformation goals with a team of 600+ highly skilled tech professionals based in Latin America.
Operate and maintain large-scale data systems, ensuring stability and performance.
Design, implement, and optimize deployment processes using virtualization.
Monitor system health, analyze failures, and identify instability sources.
Jobgether is a platform that uses AI-powered matching to connect candidates with companies. They ensure applications are reviewed quickly, objectively, and fairly, then share a shortlist of top candidates directly with the hiring company.
Design and manage infrastructure configurations for high availability and performance
Automate processes for consistent and reliable deployment and scaling
Monitor application performance, execute optimizations and troubleshoot issues
Yazio helps millions of users every day on their journey to healthy nutrition and greater well-being. They are on their own growth journey to make Yazio the most successful nutrition app worldwide and have an international team with English as the company language.
Design and evolve production environments, define standards and best practices.
Partner with engineering and IT teams to build scalable, reliable systems.
Lead incident response practices, and set guardrails around security, reliability, and cost management.
They are looking for a Senior Site Reliability Engineer who can own the architecture, governance, and cost efficiency of their cloud and platform infrastructure. This is a remote contractor role, and they are seeking candidates located in LATAM.
Design, implement, and manage scalable cloud infrastructure.
Automate and optimize infrastructure management tasks.
Rival Group is a forward-thinking, results-driven organization obsessed with helping innovative brands get closer to their customers. They are a fast-growing tech company with an award-winning market research agency and offices in Chicago, Toronto, and Vancouver.
Cultivate and advocate for standard DevOps practices.
Implement standards to meet security requirements.
Provide DevOps services to alleviate the workload of development teams.
Appfire creates software that empowers teams to break silos and collaborate seamlessly. They are a remote-first company with 850+ employees across 28 countries, fostering an environment where everyone is respected.
Own the reliability, scalability, and performance of Peec AI’s core systems and infrastructure
Design, build, and maintain the tooling, automation, and monitoring that keep our services fast, secure, and highly available
Partner closely with product and engineering teams to ensure new features are reliable, observable, and easy to operate from day one
Peec AI is one of Europe’s fastest-growing Series A startups. They provide exciting and challenging work in the AI space.
You own uptime, observability, incident response, and root cause analysis.
Own the AWS architecture.
Make ML pipelines reliable.
Ferra is building AI infrastructure for structural steel estimation. They process large-scale construction drawing PDFs, run computer vision + LLM pipelines, and generate structured steel graphs, takeoffs, and export-ready models. The team is small and technical, which means high ownership, fast decisions, and work that has a direct impact on the core product.
Build and scale a strong culture of operational excellence by defining standards and coaching teams to own reliability and availability.
Drive mature DevOps/SRE practices, including incident response and PIRs, on-call readiness, runbooks, alerting, observability, and release/change management.
Guide teams in the design, development, evolution, and operation of large-scale, distributed cloud systems.
Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana, the open source visualization tool, around the globe. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack.
Developing infrastructure to support cloud-based applications.
Creating deployment architecture and continuous delivery pipelines.
Designing high-availability approaches, and implementing monitoring architecture.
Nearform is a digital and AI engineering consultancy with a reputation for experience-led modernization. They focus on creating transformative digital products for enterprise customers across the UK and Ireland. Nearformers form a close-knit community built on trust and camaraderie.
Develop automation code to provision and operate infrastructure at scale.
Build resilient, scalable, secure, and observable services with cost optimization.
Proactively identify and address security concerns across systems and infrastructure.
Globality uses AI to transform enterprise spending into a more efficient and inclusive process. They aim to revolutionize enterprise procurement with AI and have a culture built on trust, collaboration, and innovation, fostering an environment where every individual feels valued and included.