Senior Site Reliability Engineer

CertifyOS

Remote regions

US

Benefits

Unlimited PTO

Reliability & Observability:

  • Ensure uptime and reduce alert fatigue by building actionable SLIs, error budgets, and data quality signals.
  • Operate monitoring platforms like Google Cloud Monitoring, Datadog, Grafana, and Prometheus.

Infrastructure Automation:

  • Maintain Infrastructure as Code with Terraform or Pulumi, and CI/CD pipelines using GitHub Actions.
  • Automate scaling, resource optimization, and deployment patterns like canary and blue/green.

Incident Response:

  • Own incident response processes, root cause analysis, and escalation workflows.
  • Build runbooks and a postmortem culture that keep hard problems from recurring.

Scale & Efficiency:

  • Improve autoscaling behavior and resource utilization for distributed cloud-native systems.
  • Instrument data freshness and infrastructure health to monitor provider record accuracy.

CertifyOS

CertifyOS is building the data infrastructure that powers modern healthcare, automating provider licensing, enrollment, credentialing, and network monitoring through an API-first platform. The company is backed by leading investors, and its team has deep experience in provider data systems. It values authenticity, accountability, collaboration, results, and openness to feedback.
