You will design, build, and maintain observability platform tools and frameworks. This role involves designing and implementing systems that monitor and analyze the performance and health of software applications and infrastructure. You will collaborate closely with development, site reliability engineering, DevOps, and infrastructure teams.
Remote DevOps Jobs · Grafana
5 results
- Own and evolve the release process, ensuring safe, high-quality, and on-time deployments to production.
- Automate release workflows wherever possible to reduce manual effort and increase reliability.
- Implement observability and metrics to monitor release health, stability, and deployment success.
This role is responsible for the engineering, operations, support, deployment, and maintenance of core Distribution Engineering Monitoring and Control systems. Uses scripting and automation to develop, customize, and enhance monitoring and alerting tools for “on-air” environments. Drives investigations into broadcast issues and reports findings to leadership and operations in a timely manner.
- Responsible for ensuring the reliability, scalability, and performance of cloud-based applications.
- Design, implement, and maintain systems that support high-performing services while driving automation, observability, and operational excellence.
- Serve as a technical leader within agile squads, mentoring peers, enforcing best practices, and shaping the long-term architecture of critical systems.
The Site Reliability Engineer plays a key role in platform enablement by building and maintaining core infrastructure tooling that enables teams to deploy and operate services reliably using AWS and Kubernetes. This position focuses on managing and evolving internal Infrastructure as Code (IaC) constructs, primarily Python-based abstractions built with AWS CDK and CDK8s. The engineer works closely with backend teams, driving platform reliability and developer productivity.