Respond to production incidents and contribute to post-incident analysis.
Identify and automate manual processes to improve efficiency and reduce risk.
Enhance monitoring tools and platforms to improve observability.
Restaurant365 is a SaaS company that provides a unique, centralized solution for accounting and back-office operations for restaurants. They focus on empowering team members to produce top-notch results while elevating their skills.
Optimize cloud usage to balance performance, cost, and reliability.
Implement Infrastructure as Code (IaC) using Terraform/Bicep for consistent and repeatable deployments.
Orchestry is a rapidly growing SaaS company in the Microsoft 365 ecosystem, helping organizations simplify, govern, and automate their digital workplace. They work globally with partners and enterprise customers and operate as a fully remote company by design, intentionally building the foundation for the future of the company including their people, leadership capability, and culture.
Own the operational stability and performance of Juul’s hybrid cloud infrastructure.
Lead automation efforts and architect for reliability.
Act as the final escalation point for critical incidents.
Juul Labs aims to transition the world’s billion adult smokers away from combustible cigarettes and eliminate their use, while also combating underage usage of their products. They are backed by leading technology investors and are committed to hiring great talent and building a diverse team.
Designs and maintains CI/CD pipelines using GitLab CI/CD.
Implements Infrastructure as Code (IaC) with tools like Terraform.
Automates complex workflows and enhances infrastructure scalability.
Everseen is a vision AI solutions provider for global retailers. They have over 900 employees globally, with headquarters in Cork, Ireland, European headquarters in Cork, Ireland, and a U.S. headquarters in Miami, with hubs in Romania, Serbia, India, Australia, and Spain.
Design and evolve production environments, define standards and best practices.
Partner with engineering and IT teams to build scalable, reliable systems.
Lead incident response practices, and set guardrails around security, reliability, and cost management.
They are looking for a Senior Site Reliability Engineer who can own the architecture, governance, and cost efficiency of their cloud and platform infrastructure. This role is a remote contractor role and they are seeking candidates located in LATAM.
Design, build, and maintain highly available, scalable infrastructure.
Manage and optimize infrastructure across GCP, AWS, Azure, and other cloud providers.
Develop comprehensive monitoring, logging, and alerting systems.
Bobsled is seeking a Site Reliability Engineer to enhance its data-sharing platform's reliability and scalability. We're a company that values growth, offering flexible work hours in a fully remote environment and fully sponsored individual coaching for all employees.
Operate and support Azure-based infrastructure and Rubrik backup solutions.
Manage and resolve incidents, changes, and problem tickets related to Azure and Rubik environments.
Contribute to continuous service improvements and automation initiatives.
Deutsche Telekom IT Solutions Slovakia entered the life of the Košice region in 2006. They have grown to be the second largest employer in the eastern part of the country with more than 3900 employees, providing innovative information and communication technology services.
Own the reliability, scalability, and performance of Peec AI’s core systems and infrastructure
Design, build, and maintain the tooling, automation, and monitoring that keep our services fast, secure, and highly available
Partner closely with product and engineering teams to ensure new features are reliable, observable, and easy to operate from day one
Peec AI is one of Europe’s fastest-growing Series A startups (no employee count/culture details given). They provide exciting and challenging work in the AI space.
Contribute to the design and implementation of Infrastructure as Code (IaC) solutions.
Build and optimize CI/CD pipelines using GitHub Actions and Azure DevOps.
Implement comprehensive monitoring, logging, and alerting strategies using Grafana.
InvestorFlow delivers industry specialized CRM, built on Salesforce, and digital portals. They help alternative asset firms find opportunities, create and manage relationships, and turn relationship insights into action. They serve over 175 clients and are headquartered in San Francisco, California.
Design, deploy and maintain a cloud infrastructure to support a Dataiku SaaS offering mainly on AWS and Azure and GCP
Continuously improve the infrastructure, deployment and configuration to deliver more reliable, resilient, scalable and secure services
Automate as much as possible all technical operations
Dataiku is The Universal AI Platform™, giving organizations control over their AI talent, processes, and technologies to unleash the creation of analytics, models, and agents. They connect many data science technologies and integrate the best of data and AI tech.
Automate the provisioning of all of Juniper Square’s infrastructure in code.
Partner with our Platform Engineering team on building developer tooling / improving developer experiences via joint initiatives and enhancements.
Partner with our Data Engineering team on improving our data posture and driving operational excellence.
Juniper Square's mission is to unlock the full potential of private markets by digitizing them to bring efficiency, transparency, and access. They are a values-driven organization with a hybrid workplace strategy, allowing employees to collaborate effectively across multiple countries and offering physical offices in several major cities.
Contribute to building and operating the infrastructure that supports the HackerOne platform.
Improve the reliability, security, and scalability of our systems.
Design and operate highly available cloud systems and apply best practices for reliability, observability, and security.
HackerOne is a global leader in Continuous Threat Exposure Management (CTEM). The HackerOne Platform unites agentic AI solutions with the ingenuity of the world’s largest community of security researchers to continuously discover, validate, prioritize, and remediate exposures across code, cloud, and AI systems. They combine the ingenuity of the largest security research community with a best-in-class AI-powered platform, trusted by the world’s top organizations.
Design, build, and maintain secure, scalable cloud infrastructure.
Own CI/CD pipelines and deployment workflows across services and environments.
Improve reliability, availability, and performance through monitoring, alerting, and incident response practices.
Jobgether is a company that uses an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. They identify the top-fitting candidates and share this short list directly with the hiring company.
Take charge of production and development environments to drive automation and reliability.
Turn complex manual steps into simple, repeatable, automated flows and CI/CD pipelines.
Own high-impact data and system changes in production, implementing automation and guardrails.
They are a global materials science and digital identification solutions company with locations in over 50 countries. They have approximately 35,000 employees worldwide and are committed to fostering a culture of curiosity and courage.
Ensure near-zero downtime with monitoring and alerting, self-healing automation, and continuous improvement
Create highly automated, available and scalable systems by applying software and infrastructure principles
Employ and advise clients on DevOps and SRE principles and practices, covering deployment pipelines, HA, service reliability, technical debt, and operational toil for live services running at scale
66degrees is an AI transformation partner. They guide enterprises from business challenges to quantifiable outcomes, helping businesses reach their inflection point where chaotic data becomes a strategic asset, complexity becomes clarity, and AI becomes an engine for growth. They believe in thriving through challenges and winning together.
Design, operate, and continuously improve the cloud infrastructure that powers our systems using infrastructure-as-code, monitoring, and observability.
Own critical parts of the platform: identify bottlenecks, propose and implement improvements, and drive reliability and performance at scale.
Run Kubernetes in production and evolve how we operate it.
Dune is on a mission to make crypto data accessible. They’re a collaborative multi-chain analytics platform used by thousands of developers, analysts, & investors to understand the on-chain world and the frontiers of finance. They are a team of ~60 employees working together across Europe and eastern US timezones.
Lead incident response as Incident Commander, coordinating teams, communications, and service restoration
Produce executive-level incident reports, run RCAs, and drive continuous improvement
Enforce change management and risk assessment for production changes
Truelogic is a leading provider of nearshore staff augmentation services headquartered in New York, delivering top-tier technology solutions to companies of all sizes. Their team of 600+ highly skilled tech professionals, based in Latin America, drives digital disruption by partnering with U.S. companies on their most impactful projects.
Design, build, and maintain shared platform services that support secure and scalable infrastructure across client and internal environments.
Develop and maintain infrastructure-as-code (IaC) using tools such as Terraform, ARM/Bicep, or similar frameworks.
Build automation for system provisioning, configuration management, patching, and lifecycle operations.
Sentinel Blue is bringing enterprise-class cybersecurity to small and medium sized businesses. They are pushing the envelope of how things are done and constantly seeking innovative ways to meet that mission.