Implementing improvements to the reliability, fault tolerance, scalability, and performance of our infrastructure
Managing incidents using your technical know-how to involve the appropriate teams and automate away manual practices
Improving observability across our systems (metrics, logs, tracing) to reduce time to detection and resolution
Newton is changing how Canadians trade crypto, with the goal of making financial freedom achievable for everyone by giving its customers the tools and knowledge needed to navigate the crypto world. They are a remote team spread across Canada that values pushing boundaries and getting things done.
Become a subject matter expert in applications supporting Ooma customers.
Collaborate with Development, QA and other SREs to evaluate, deploy, and debug applications.
Improve observability by implementing, refining, and adjusting application monitoring and thresholds.
Ooma empowers people to connect in smarter ways by creating powerful communication experiences through their cloud-based platform. They help small business owners stay connected, provide customized unified communications solutions, and offer smart home security solutions.
Deploy, manage, and secure Ivanti’s production Software-as-a-Service (SaaS) environments in AWS and Azure
Automate common and repetitive tasks
Participate in on-call rotations for 24x7 coverage (follow-the-sun model) for incident response, issue triage, and problem resolution
Ivanti's mission is to elevate human potential within organizations by managing, protecting and automating technology for continuous innovation. They are committed to building a diverse team and fostering an inclusive environment where everyone belongs.
Working with engineers across Yelp in supporting new features and services.
Integrating tools to monitor platform stability and performance.
Help scale our Kubernetes clusters and AWS-based infrastructure while maintaining our platform's SLOs.
Yelp's engineering culture values individual authenticity and encourages creative solutions. They focus on helping users, growing as engineers, and having fun in a collaborative environment.
Enhance system monitoring with tools like Prometheus, Grafana, and ELK Stack, ensuring visibility and alignment with business objectives.
Transition manual processes to automated solutions using IaC tools (e.g., Terraform, Ansible) to streamline deployments and improve operational efficiency.
Improve pipeline architecture for fast, reliable releases, ensuring scalability and resilience to handle high volumes of changes.
Upsun (formerly Platform.sh) is a cloud application platform designed for hybrid teams, enabling developers, DevOps engineers, and platform teams to build, ship, and scale confidently without backend infrastructure hassles. Upsunners are a remote, global workforce committed to open source and an open, welcoming environment, valuing curiosity, knowledge, and innovative ideas.
Arista Networks is a data-driven, client-to-cloud networking company for large data center, campus, and routing environments. They have over $8 billion in revenue and value diversity of thought and perspectives, fostering an inclusive environment for creativity and innovation.
Own SLI/SLO/SLA definitions for the Akuity SaaS platform and drive continuous improvement.
Participate in an on-call rotation and act as incident commander for high-severity production events.
Partner with engineering teams to build reliability into new features before they ship to production
Akuity helps enterprises ship software faster and more reliably with modern GitOps best practices. The Akuity Platform enables teams to manage the development and deployment across hundreds – if not thousands – of Kubernetes clusters from a single control plane.
Build and maintain Linux-based systems, including physical and virtual servers, virtualization platforms, and storage systems.
Manage and administer on-premises and cloud-based IT infrastructure (VMware, AWS).
Develop automation scripts using IaC tools to build highly automated and scalable Linux systems and cloud infrastructure.
Business Wire, a Berkshire Hathaway company, is the global market leader in press release distribution and regulatory disclosure. Organizations, large and small, depend on us to accurately publicize market-moving news and multimedia, and generate social engagements that develop interactions with their target audiences.
Collaborate with engineering teams to design and implement scalable, secure systems.
Establish and manage service level objectives (SLOs) and service level agreements (SLAs).
Enhance incident response processes and post-mortem analysis for outages.
ClickHouse, recognized on the 2025 Forbes Cloud 100 list, is one of the most innovative and fast-growing private cloud companies. With more than 3,000 customers and ARR that has grown over 250 percent year over year, ClickHouse leads the market in real-time analytics, data warehousing, observability, and AI workloads.
Define and evolve reliability standards for the SmarterDx platform.
Enhance observability systems (metrics, logs, traces, alerting) to provide actionable insights and reduce mean time to detect (MTTD) and resolve (MTTR).
Reduce operational toil through automation, self-healing systems, and improved deployment and rollback mechanisms.
SmarterDx, a Smarter Technologies company, builds clinical AI that is transforming how hospitals translate care into payment. Founded by physicians in 2020, their platform connects clinical context with revenue intelligence, helping health systems recover millions in missed revenue, improve quality scores, and appeal every denial.
Design infrastructure, networking, and software platform architecture.
Build and maintain automation of Continuous Integration and Continuous Deployment pipelines.
Troubleshoot infrastructure, internal applications, networking, and security issues.
Loadsmart is a technology company focused on the logistics and supply chain industry. They leverage data and technology to automate and optimize freight transportation, connecting shippers and carriers to streamline the shipping process. They are a mid-sized company passionate about transforming the future of freight.
Automate processes with Ansible and Terraform; manage system configurations.
Apriorit is a software engineering company established in 2002, specializing in system programming, cybersecurity, and more. With over 400 specialists, they maintain high standards in software development and teamwork, serving high-profile clients worldwide.
Execute expert-level real-time monitoring and incident dispositioning for critical client applications.
Correlate complex data across metrics, traces, and logs to perform deep-dive root cause analysis.
Lead the triage of complex alerting environments to filter noise and ensure that high-priority incidents are managed.
Atmosera empowers businesses to redefine what's possible with modern technology and human expertise. They enable organizations to accelerate innovation, enhance security, and optimize operational agility as a Microsoft Partner.
Analyze, troubleshoot and resolve operational challenges contributing to defined SLOs.
Manage site stability, performance, reliability, and maintain uptime for production environments.
CentralReach provides autism and IDD care software for Applied Behavior Analysis (ABA), multidisciplinary therapy, and special education. They are trusted by more than 200,000 users and are backed by Roper Technologies, Inc. (Nasdaq: ROP). Their culture is centered around impact, inclusion, and flexibility.
Collaborate with application engineering teams on platform infrastructure.
Enhance observability and spearhead the adoption of SRE best practices.
Build and maintain reliable CI/CD pipelines, tooling, and infrastructure.
Rula strives to provide quality, evidence-based, compassionate mental healthcare and aims to create a world where mental health is no longer stigmatized. They are a remote-first company operating in most U.S. states, and are dedicated to having a culture of inclusion that supports their employees.
Lead efforts to scale and improve our infrastructure.
Develop and support internal team tooling.
Troubleshoot, debug and resolve issues as part of a shared on-call rotation.
Lillio, formerly HiMama, empowers early childhood educators through innovative tools. They are a Series B, private-equity backed company recognized as an industry leader and selected in 2025 by Time Magazine as one of the world's top EdTech companies.
Support teams with self‑service tools for provisioning, building, testing, and deploying applications.
Improve system reliability, security, and scalability using automation and modern DevOps practices.
Maintain and enhance CI/CD pipelines (Jenkins, GitLab CI/CD).
ST Engineering iDirect is reshaping the future of global connectivity as a leader in satellite communications. Their groundbreaking technology empowers customers to grow, innovate, and transform their networks.
Monitor cloud infrastructure and application health using observability tools; respond to alerts.
Perform Tier 1 incident triage, document findings, and escalate appropriately to Development or SRE teams.
Monitor and support CI/CD pipelines to ensure successful builds and deployments.
Lumin Digital empowers credit unions and banks by creating cutting-edge digital experiences. They are a trailblazer in digital banking solutions with a culture that fosters trust, respect, and boldness, encouraging team members to explore and experiment with new ideas.
Resolve technical issues across infrastructure, deployments, databases, caching, and web performance.
Manage support tickets via Zendesk, with occasional live chat or voice support where needed.
Contribute to platform reliability by monitoring alert queues and participating in on-call rotation during working hours.
Upsun is a cloud application platform designed for hybrid teams, where AI agents and humans collaborate to solve complex problems, allowing developers, DevOps engineers, and platform teams to build, ship, and scale confidently. They are a remote, global workforce committed to open source and an open, welcoming environment.
Support Engineering and Platform automation efforts with development and scripting skills.
Automate operational processes using scripting languages.
Develop, implement, and continually improve system and network monitoring and alerting capabilities and procedures.
Cotiviti is focused on providing payment accuracy and analytics-driven solutions that drive measurable results. They offer team members a competitive benefits package and have a culture of valuing individual qualifications without regard to race, gender, or other protected characteristics.