Source Job

  • Responsible for custom architectural design, implementation, monitoring, and maintenance for production application environments.
  • Work with the Principal Software Engineer on technical architecture and design based on customer product requirements.
  • Hands-on commissioning, configuration, administration, documentation, and support for all on-prem & cloud (AWS) environments.

AWS Linux Unix Agile SDLC

20 jobs similar to Director, Site Reliability Engineering

Jobs ranked by similarity.

US

  • Lead and Mentor a High-Performing Team: Hire, develop, and retain top engineering talent.
  • Develop the Strategic Roadmap: Define and execute the strategy for security infrastructure, automation, and operations.
  • Oversee Secure and Resilient Infrastructure: Guide the architectural design and implementation of secure, scalable, and highly available infrastructure in our multi-cloud (predominantly AWS) environment.

Smartsheet helps people and teams achieve anything with seamless work management and smart, scalable solutions. They build tools that empower teams to automate the manual, uncover insights, and scale smarter; they welcome diverse perspectives and non-traditional paths.

$155,000–$165,000/yr
US Unlimited PTO

  • Lead maintenance and operations for production and development environments.
  • Architect and implement complex solutions spanning OS, virtualization, network, and cloud layers.
  • Lead automation initiatives for infrastructure provisioning and operational tasks.

NMI enables partners with choice in payments, challenging the one-size-fits-all approach. They power innovative tech for SMBs, entrepreneurs, and fintech startups, fostering a diverse and welcoming workplace with a dedicated Diversity, Equity & Inclusion action group.

As an Automation and Release Engineer, you will work closely with CIQ’s Engineering and Operations teams to architect, implement, and maintain automation, build, and test tooling for everything from Linux packages to entire operating system images for use in on-prem, cloud, and high-performance computing environments. This role requires a deep technical understanding of Linux, modern automation and build tooling, scripting and software development, and systems engineering, as well as an insatiable appetite for learning more.

CIQ is becoming the fastest-growing and most impactful young company for providing software infrastructure.

$95,000–$175,000/yr

  • Provide architecture plans for multiple cloud-based applications supporting stakeholders.
  • Analyze performance and ensure applications meet the scalability and reliability needs of internal teams.
  • Identify and troubleshoot performance bottlenecks and reliability issues across the stack.

Veeva Systems is a mission-driven organization and pioneer in industry cloud, helping life sciences companies bring therapies to patients faster.

$140,000–$190,000/yr
US Canada Unlimited PTO

  • Architect and maintain scalable, reliable infrastructure: Design and optimize infrastructure for high availability, fault tolerance, and performance across distributed systems.
  • Lead incident management and root cause analysis: Own incident response processes, ensure swift resolution of issues, and drive post-incident improvements to prevent recurrences.
  • Service monitoring and automation: Build and maintain automated monitoring, alerting, and healing systems that improve system health, reduce manual intervention, and minimize downtime.

VGS is the world's leader in payment tokenization, empowering clients and partners by tokenizing sensitive payment data and limiting compliance scope. They embed a universal token vault into their technology stack to manage the complexities of payment data tokenization across processors, networks, and more. The job posting doesn't specify company size, but they appear to have a culture that values transparency, collaboration, grit, and humility.

$125,000–$169,000/yr
Unlimited PTO

  • Design, scale, and operate resilient, cloud-native infrastructure in AWS with an emphasis on EKS, IAM, RBAC, and modern security-first practices.
  • Build and optimize CI/CD pipelines with GitHub Actions and GitHub Advanced Security enabling velocity without compromising safety.
  • Own observability across the stack using Datadog (metrics, logging, alerting, and tracing).

DexCare optimizes time in healthcare, streamlining patient access, reducing waits, and enhancing overall experiences. They are committed to creating an inclusive workplace where diversity drives innovation and belonging strengthens collaboration, enabling everyone to thrive.

Design, implement, monitor, and maintain Sysdig's infrastructure at scale across different clouds and on-prem. Collaborate with development teams to improve system reliability, performance, and scalability. Participate in on-call rotation, respond to incidents, conduct root cause analyses, and implement preventive measures.

Sysdig helps organizations secure innovation in the cloud with runtime insights, open innovation, and agentic AI, trusted by over 60% of the Fortune 500.

  • Design and implement AWS infrastructure for a headless eCommerce stack.
  • Build and maintain CI/CD pipelines for React and Next.js frontend deployments and AWS serverless backend services.
  • Implement security and compliance guardrails for systems handling sensitive and regulated data.

TechTorch is a high-growth enterprise technology consultancy that collaborates with leading private equity-backed businesses. They deliver AI-powered solutions and data-driven transformation initiatives, operating with the agility of a scale-up and the rigor demanded by sophisticated investors.

UK

Run the production environment by monitoring availability and taking a holistic view of system health. Build software and systems to manage platform infrastructure and applications. Improve reliability, quality, and time-to-market of our suite of software solutions.

NICE software products are used by 25,000+ global businesses to deliver extraordinary customer experiences, fight financial crime and ensure public safety.

India Unlimited PTO

Seeking an experienced Site Reliability Engineer to help build highly resilient and scalable systems by automating, measuring, and monitoring everything. Implement highly-available and scalable architectures for core and third-party components of Acquia Source. Implement metrics, monitoring, and incident response processes.

Acquia is an open source digital experience company providing technology to brands that allows them to embrace innovation and create customer moments that matter.

Canada

  • Build and deploy better services in partnership with Development groups.
  • Implement system and service telemetry to improve reliability and availability.
  • Design and evolve deployment systems and pipelines for reliability, security, and efficiency.

Jobgether is a platform that connects job seekers with companies. They utilize AI-powered matching to ensure applications are reviewed quickly and objectively.

$133,109–$239,596/yr

  • Lead a team of engineers focused on cloud platform services.
  • Oversee development of API Gateway and Swagger/OpenAPI specifications.
  • Manage AWS services including IAM, EC2, EFS, S3, Lambda, SNS, SQS, Kinesis, and EMR Studio.

Experian is a global data and technology company, powering opportunities for people and businesses around the world.

$105,271–$131,588/yr
US 4w PTO 4w paternity

  • Responsible for administration, support, troubleshooting and implementation of Azure DevOps.
  • Implement DevOps principles at an enterprise level and enable continuous integration and continuous delivery.
  • Streamline and optimize the application lifecycle, adding visibility to technical debt and increasing software delivery speed.

Lumicera Health Services is defining the “new norm” in specialty pharmacy to optimize patient well-being through our core principles of transparency and stewardship.

US Unlimited PTO

  • Manage the Tech Pod from a complete delivery perspective.
  • Work with your client to ensure project scope and alignment.
  • Guide your Tech Pod both technically and career-wise.

EverOps partners with global enterprise software and tech companies to perform complex deliveries and services. They are a premier Embedded Service Provider, helping clients address delivery and service issues in the DevOps space.

Brazil 26w maternity 4w paternity

Support the evolution of our platform by improving scalability, reliability, observability, and security. Proactively identify bottlenecks and unlock the autonomy of the entire engineering team. Maintain infrastructure & deployment pipelines and collaborate with engineering teams on architectural decisions and production-readiness practices.

Feegow joined the Docplanner Group, a health-tech company, in 2022 and is dedicated to developing innovative solutions for physicians and managers.

$160,000–$182,000/yr
US

  • Lead and mentor multiple teams across SRE, cloud infrastructure, and platform engineering functions.
  • Drive multi-team initiatives to deliver scalable, secure, and cost-efficient infrastructure leveraging AWS-native and serverless technologies.
  • Drive adoption of FinOps practices and partner with finance and product teams on budgeting and forecasting.

Model N is the leader in revenue optimization and compliance for pharmaceutical, medtech, and high-tech innovators. Model N is trusted by over 150 of the world’s leading companies across more than 120 countries.

US

  • Design and implement the next generation of our Continuous Integration and Continuous Delivery (CI/CD) pipelines, focusing on security, speed, and reliability.
  • Maintain and optimize the health of our monorepo, ensuring scalable dependency management and fast incremental builds.
  • Work with GCP to architect secure, scalable runtime environments.

Anchorage Digital is building the world’s most advanced digital asset platform for institutions to participate in crypto. As a diverse team of more than 600 members, they are united in one common goal: building the future of finance by providing the foundation upon which value moves safely in the new global economy.

Canada

  • Operate and optimize AWS environments for security, reliability, and scalability.
  • Implement and maintain security frameworks across cloud infrastructure.
  • Automate deployments and configurations using Infrastructure as Code (IaC) tools like Terraform.

SurveyMonkey is the world’s most popular platform for surveys and forms, built for business and loved by users, helping teams gather insights and information.

$133,109–$239,596/yr

As a Senior Technical Program Manager, oversee the technical portfolio, create roadmaps, define milestones, and ensure the scalability, security, and reliability of products and platforms. Present to executives, hold engineering teams accountable, and help establish TPM practices. Manage dependencies across multiple teams and mitigate risk.

Experian is a global data and technology company, powering opportunities for people and businesses around the world.

  • Design, implement, and manage infrastructure for our cloud-based platforms (AWS).
  • Create and automate deployment pipelines using CI/CD tools (GitLab / GitHub Actions).
  • Ensure system scalability, availability, and reliability through proactive monitoring and automation.

Prompt is revolutionizing healthcare by delivering highly automated and modern B2B enterprise software to rehab therapy businesses, the teams within, and the patients they serve.