Job Description
As a Site Reliability Engineer at Axiom, you will be pivotal in upholding our promise of superior reliability and performance to our customers. Working with backend engineers and product teams, you will focus on building and operating scalable, reliable systems. SRE at Axiom centers on automating, measuring, and continuously improving the reliability and efficiency of our systems.
Your primary responsibilities include engineering and maintaining a robust, secure, and scalable infrastructure for Axiom Cloud. You will collaborate with engineering teams to define and refine service level objectives, and you will contribute to disaster recovery planning, capacity engineering, performance analysis, and system tuning. You will also foster best practices for code deployments, help educate the broader development team, roll out tooling and solutions that improve system reliability and reduce manual toil, and respond to and remediate service incidents.
The ideal candidate has more than two years of experience in a reliability-focused engineering environment, is passionate about system reliability, latency, performance, and efficiency, and is familiar with AWS tools and technologies. Hands-on experience with Docker, Kubernetes, and Amazon EKS, an understanding of infrastructure-as-code tools such as Terraform or Pulumi, and strong networking knowledge are required.
About Axiom
Axiom’s mission is to empower developers to get the best insights into their data, as fast as possible.