Our engineering team is responsible for taking products from ideation to being used by tens of thousands of users every week. We're responsible for everything from building the product to technical operations, and everything in between: handling outages better, improving reliability, reducing tech debt, and more. We are looking for a Software Engineer - Reliability who knows how to balance moving fast with moving safely, understands (and loves) SLOs and SLAs, and has experience scaling systems from 1 to 10 to 100.
Day to day, you will:
- Design and implement developer tooling that makes deployments safer, more reproducible, and easy to roll out gradually (or roll back).
- Architect (with the rest of the team) the future version(s) of our key systems, taking into account order-of-magnitude scaling every year.
- Partner with company operations and engineering to achieve and maintain compliance standards such as PCI and SOC 2.
- Add or improve introspection for all our key systems, identify scaling bottlenecks, and work with engineering on resolving them.
- Own shared services such as feature flagging, experimentation, and in-app messaging.
- Mentor engineering on how to design reliable, fault-tolerant systems and write easy-to-use runbooks for when things go wrong.