Provide production support on a shift according to the team on-call roster.
Work on the customer and internal engineering/implementation team raised tickets while not on-call for production support.
Continuously monitor the health and performance of our services, systems, and infrastructure.
Granicus builds and maintains technology that is transforming the Govtech industry by bringing governments and its constituents together. They serve 5,500 federal, state, and local government agencies and more than 300 million citizen subscribers, and are known for being one of the best companies to work for.
Build internal tooling to help other engineers and the rest of the company understand and operate our system.
Design and implement security best practices for our team and infrastructure.
Reduce toil through automation, including building and maintaining CI/CD infrastructure.
Openly is rebuilding insurance from the ground up by re-envisioning and enhancing every aspect of the customer experience. They are a rapidly growing team of exceptional, curious, empathetic people with a wide range of skill sets, spanning many departments.
Improve the reliability, performance, and scalability of our production platform.
Operate reliable infrastructure, improve observability, and drive incident response.
Use data-driven reliability practices such as SLIs, SLOs, SLAs, and DORA metrics.
VRChat is a game-changing platform that provides an endless collection of social VR experiences. They empower their community to bring their imaginations to life and help shape the metaverse. Their team includes people from Netflix, Twitter, Meta, and Microsoft.
Construct infrastructure as code, developing and enforcing best practice across configurations while preventing drift between Terraform configurations and infrastructure deployments.
SentiLink provides innovative identity and risk solutions, empowering institutions and individuals to transaction with confidence. They are building the future of identity verification in the United States replacing a clunky, ineffective, and expensive status quo with solutions that are 10x faster, smarter, and more accurate.
Own the technical direction of Remote's SRE/Platform domain.
Define and drive the reliability strategy across the platform.
Identify and lead AI enablement initiatives across the engineering organisation.
Remote is solving modern organizations’ biggest challenge – navigating global employment compliantly with ease. With our core values at heart and a future-focused work culture, our team works tirelessly on ambitious problems, asynchronously, around the world.
Lead the design, implementation, and ongoing improvement of reliable, scalable, performant, and secure production platforms and services.
Work closely with cross-functional teams to build and maintain resilient infrastructure and deployment patterns.
Provide technical leadership and mentorship to engineers across the organisation, promoting strong engineering standards and operational best practice.
Cision empowers individuals to make an impact and values diverse perspectives. They foster curiosity, collaboration, and innovation while driving meaningful contributions to brands; they have offices in 24 countries throughout the Americas, EMEA and APAC.
Design systems with resilience, graceful degradation, and capacity in mind.
Define and measure SLOs and SLIs that actually reflect what our customers feel.
Use Datadog (logging, metrics, APM) together with CloudWatch to build signal-heavy, noise-light observability.
EarnIn is building products that deliver real-time financial flexibility for those with the unique needs of living paycheck to paycheck. They are growing fast and are excited to continue bringing world-class talent onboard to help shape the next chapter of their growth journey.
Assess and improve visibility by identifying gaps in dashboards, metrics, and logs.
Refine alerts and dashboards for critical services to catch issues earlier.
Automate routine checks and monitoring tasks to free up engineers.
PlayOn is where high school sports come to life through platforms like GoFan, NFHS Network, and MaxPreps. As a growth-stage company backed by KKR, we build the technology that powers high school athletics from ticketing and streaming to fundraising and merchandise.
Design, build, and maintain infrastructure using Infrastructure as Code tools such as Terraform.
Improve system reliability, scalability, resilience, and performance across the Mast platform.
Build systems and tooling that automate infrastructure management and operational workflows wherever possible.
Mast is on a mission to make complex lending simple by building modern, cloud-native lending technology purpose-built for specialist lenders. It is a high-performance team of engineers and lending experts that values radical honesty, transparency, and speed.
Own and evolve Launch Potato's cloud infrastructure, CI/CD platform, and compliance posture.
Build the SRE function from the ground up so product teams can ship faster without compromising reliability, security, or cost control.
Stand up the SRE practice from scratch: on-call rotation, PagerDuty configuration, SLA/SLO definitions for core infrastructure services, runbook library, and observability dashboards that tie site performance to business metrics.
Launch Potato is a digital media company that connects consumers with leading brands through data-driven content and technology. They are headquartered in South Florida with a remote-first team spanning over 15 countries, with a high-growth, high-performance culture.
Drive the stability and reliability of Epic's GCP infrastructure.
Manage and harden our Docker and GKE container platform.
Maintain and improve CI/CD pipelines.
Epic is the leading digital reading platform for kids ages 12 and under, used by millions of children, families, and educators around the world. As Epic continues to grow, we are reimagining what reading can be through thoughtful technology, data, and global collaboration to make learning more engaging, accessible, and impactful.
Design, build, and maintain scalable, reliable systems on GCP.
Develop automation for infrastructure provisioning using Terraform, Ansible, or Deployment Manager.
Manage incident response, conduct postmortems, and implement improvements to reduce recurrence.
SupplyHouse.com is an industry-leading e-commerce company specializing in HVAC, plumbing, heating, and electrical supplies since 2004. They value every individual team member and cultivate a community where people come first with Generosity, Respect, Innovation, Teamwork, and GRIT.
Architect for Scale, partnering with product and infrastructure teams to design highly available systems.
Drive Automation to eliminate repetitive operational work through tooling and systems.
Reddit is a community-based platform where users submit, vote, and comment on various topics. It hosts over 100,000 active communities and attracts millions of daily active users, making it one of the largest and most influential internet platforms.
Design and implement secure, scalable infrastructure in Azure, integrating security best practices.
Partner with the infrastructure team to enhance the reliability and performance of systems.
Lead security incident response efforts within the Azure ecosystem and automate responses.
Mesh's mission is to enable consumers to pay and be paid with any asset, bridging the gap by making crypto payments reliable and ubiquitous. Backed by leading investors and combining a powerful orchestration engine with a seamless consumer app to unlock liquidity for the world.
Oversee a specialized SRE team focused on the design, deployment, and maintenance of automation toolsets.
Establish and enforce standards for IaC to ensure consistent, repeatable, and secure deployments.
Drive the automated lifecycle of both physical and virtual assets, from initial template creation/deployment to automated patching, scaling, and decommissioning.
Galaxy is a global leader in digital assets and data center infrastructure, delivering solutions that accelerate progress in finance and artificial intelligence. Led by CEO and Founder Michael Novogratz, their team blends deep crypto expertise with institutional experience and a shared commitment to shaping the future of Web3 and AI.
Consults and implements new innovative technologies to satisfy innovation strategy.
Deutsche Telekom IT Solutions Slovakia entered the life of Košice region in 2006. They are the second largest employer in the eastern part of the country with more than 3900 employees, providing innovative information and communication technology services.
Build and maintain CI/CD pipelines and deployment infrastructure.
Leverage AI to automate analysis and resolution of production issues.
Fal is the generative media ecosystem powering the next generation of AI products. They build the infrastructure, tools, and model access that teams need to move from idea to production.
Building infrastructure as code and DevOps pipelines and reviewing solutions.
Researching and analyzing technical solutions, maintaining and enhancing documentation.
Proactively identifying blockers, risks, and issues, proposing solutions or escalating as appropriate.
Nava is a consultancy and public benefit corporation working to make government services simple and effective. They guide agencies constrained by legacy systems to a future with sharp user experiences built on secure, reliable, fault-tolerant cloud infrastructure.
Design, deploy, and operate critical systems balancing reliability, cost, and agility.
Perform troubleshooting and root-cause analysis of system operation issues.
Loadsmart is a logistics technology company valued at over $1 billion. We are a collection of industry veterans and user-centered engineers using innovative technology to fearlessly reinvent the future of freight.
Create, deploy, and manage high performing servers.
Deliver millions of requests globally with sub-second latency.
Shape something from the core, without legacy infrastructure.
Entefy is working to create the fastest data syncing experience ever built. They hold their data syncing, consistency and uptime to the highest standards and are looking for someone to manage high performing servers.