Site Reliability Engineering:

  • Drive the definition and adoption of SLIs and SLOs across multiple services or entire platforms, ensuring alignment with business goals.
  • Design and architect Infrastructure as Code (IaC) solutions for large-scale, complex environments, establishing standards and best practices.

Toil Reduction and Incident Management:

  • Implement and refine comprehensive monitoring, alerting, and logging to detect and address performance and availability issues proactively.
  • Lead the strategic effort to eliminate toil, identifying and championing major automation projects that deliver significant organizational efficiency.

Testing and Service Resiliency:

  • Implement cloud security best practices, including identity and access management (IAM), encryption, and compliance controls.
  • Proactively identify and address system weaknesses, and ensure services perform reliably under stress.

Collaboration and Knowledge Sharing:

  • Serve as a primary SRE liaison for development teams, influencing application architecture and design to meet reliability and scalability targets from inception.
  • Create and maintain documentation for cloud architectures, deployment processes, and best practices.

Noctua Technology, LLC

Noctua Technology, LLC drives digital transformation by treating operations as a software engineering challenge, with a focus on cloud-native systems. The company is a dynamic team seeking a Senior SRE to define strategy and bridge development and operations for its clients.