Design and operate highly scalable, fault-tolerant systems supporting production workloads across a distributed cloud environment.
Define and implement Service Level Objectives and error budgets to guide reliability decisions.
Automate operational processes to reduce manual toil and improve system consistency and resilience.

Team Collaboration:

Work closely with product and platform engineering teams to define and implement reliability standards.
Participate in incident response, on-call practices, and post-incident reviews, focusing on root cause analysis.
Advocate for a reliability-focused engineering culture, including blameless postmortems and operational excellence.

Qualifications and Experience:

5+ years of experience in site reliability engineering, infrastructure, or related software engineering disciplines.
Strong experience operating and scaling distributed systems in cloud environments, with AWS preferred.
Proficiency with Infrastructure as Code tooling, such as Terraform, and a deep understanding of system performance and reliability patterns.

Company Culture:

Operates as a values-based company with principles like being Fearless, Fast, Lovable, Owners, Win-win, and Inclusive.
Offers a remote-first environment that enables you to do your best work from anywhere, backed by top-tier investors.
Focuses on building an inclusive and supportive team dedicated to creating the future of business trust and audit software.

Fieldguide

Fieldguide is establishing a new state of trust for global commerce and capital markets by automating and streamlining the work of assurance and audit practitioners, specifically within cybersecurity, privacy, and financial audit. It is a remote-first, values-driven company backed by top investors, building an inclusive and supportive team to create the future of audit and advisory software.
