Design, implement, and evolve large-scale, cloud-native infrastructure supporting MariaDB's global SaaS platform. Lead reliability and scalability initiatives, driving automation and resilience through infrastructure-as-code and GitOps practices. Proactively identify and remediate systemic reliability issues, ensuring high service availability and performance across multi-cloud environments.
Source Job
20 jobs similar to Senior Site Reliability Engineer
Jobs ranked by similarity.
Design, implement, monitor, and maintain Sysdig's infrastructure at scale across multiple clouds and on-prem. Collaborate with development teams to improve system reliability, performance, and scalability. Participate in on-call rotation, respond to incidents, conduct root cause analyses, and implement preventive measures.
Sysdig helps organizations secure innovation in the cloud with runtime insights, open innovation, and agentic AI, trusted by over 60% of the Fortune 500.
- Design and evolve infrastructure systems to ensure scalability, reliability, and cost efficiency.
- Lead and mentor a distributed infrastructure team, fostering a collaborative and inclusive culture.
- Oversee all cloud environments supporting MZLA’s products and business systems.
MZLA Technologies Corporation (MZLA) is a wholly owned, for-profit subsidiary of the Mozilla Foundation and home to Thunderbird. They are a small but growing team of 50+ people distributed across seven countries building an open-source email and productivity platform.
- Architect and maintain scalable, reliable infrastructure: Design and optimize infrastructure for high availability, fault tolerance, and performance across distributed systems.
- Lead incident management and root cause analysis: Own incident response processes, ensure swift resolution of issues, and drive post-incident improvements to prevent recurrences.
- Service monitoring and automation: Build and maintain automated monitoring, alerting, and healing systems that improve system health, reduce manual intervention, and minimize downtime.
VGS is the world's leader in payment tokenization, empowering clients and partners by tokenizing sensitive payment data and limiting compliance scope. They embed a universal token vault into their technology stack to manage the complexities of payment data tokenization across processors, networks, and more. While the job posting doesn't specify size, they appear to have a culture that values transparency, collaboration, grit, and humility.
- Share SRE expertise with teams across the company.
- Keep our build systems running with high reliability and availability.
- Improve and iterate on our existing reliability practices.
Octopus Deploy sets the standard for Continuous Delivery, empowering software teams to deliver value in an agile way.
Lead and manage the Platform Engineering team, providing technical guidance and mentorship. Design, build, and evangelize Golden Paths and Service Scaffolding to reduce friction across the development lifecycle. Oversee the design, implementation, and maintenance of Shared DB Platforms, ensuring optimal performance, integrity, and security across the organization.
Founded in 2012, EasyPost is a YC unicorn whose mission is to make shipping simple for businesses from garage startups to the Fortune 500.
- Oversee the reliability, scalability, performance, and security of key production services.
- Collaborate with cross-functional teams to develop and maintain resilient infrastructure.
- Provide expert mentorship and guidance on best practices to engineers throughout the organization.
Cision is a global leader in PR, marketing and social media management technology and intelligence, helping brands and organizations connect with customers and stakeholders to drive business results. The company has offices in 24 countries throughout the Americas, EMEA and APAC.
- Lead maintenance and operations for production and development environments.
- Architect and implement complex solutions spanning OS, virtualization, network, and cloud layers.
- Lead automation initiatives for infrastructure provisioning and operational tasks.
NMI enables partners with choice in payments, challenging the one-size-fits-all approach. They power innovative tech for SMBs, entrepreneurs, and fintech startups, fostering a diverse and welcoming workplace with a dedicated Diversity, Equity & Inclusion action group.
Run the production environment by monitoring availability and taking a holistic view of system health. Build software and systems to manage platform infrastructure and applications. Improve reliability, quality, and time-to-market of our suite of software solutions.
NICE software products are used by 25,000+ global businesses to deliver extraordinary customer experiences, fight financial crime and ensure public safety.
As an SRE, you will be responsible for ensuring the availability, performance, and cost effectiveness of these services. You will work with multiple feature development teams and the BAU/Support team to define and evolve our cloud & on-prem infrastructure & delivery pipelines, improving system observability and proactively identifying and mitigating reliability risks.
In 2019, our founders were working as engineers solving complex cross-domain problems within government organisations when TwinStream was formed.
Responsible for automating infrastructure, maintaining system reliability, and bridging the gap between operations and database management. Design, deploy, and manage scalable infrastructure on Google Cloud Platform (GCP). Implement and maintain CI/CD pipelines for seamless deployment.
Miratech is a global IT services and consulting company that brings together enterprise and start-up innovation to support digital transformation.
- Designing, building, and maintaining infrastructure that enables fast, reliable, and secure product delivery.
- Improving and maintaining CI/CD pipelines to streamline deployments and increase reliability.
- Contributing to infrastructure reliability and ensuring systems are designed for resilience and growth.
Incident.io is the leading AI incident response platform, built to help teams dramatically reduce incident response time and improve reliability. They have raised $100M from Index Ventures, Insight Partners, and Point Nine, alongside founders and executives from world-class technology companies.
- Lead the design, implementation, and continuous improvement of our cloud-native platform infrastructure.
- Create and maintain tooling and automation that improves efficiency and developer experience.
- Drive platform optimization initiatives focused on performance, cost efficiency, and reliability.
Intelerad's medical imaging solutions streamline the flow of information, simplifying complex processes, maximizing efficiencies, and shining a light on the unknown.
- Deploy and manage cloud infrastructure across all three clouds using Terraform IaC.
- Architect, build, and maintain reliable CI/CD pipelines in GitHub Actions and ArgoCD.
- Contribute to decisions around our departmental roadmap and project priorities.
Coalesce is the only data transformation and governance platform designed for the AI era, improving data professionals' lives since its founding in 2020.
- Design and plan cloud-native systems aligned with business goals and security best practices.
- Implement and support AI-based automation tools and services.
- Continuously tune cloud and automation workloads to improve reliability and performance.
PerfectServe offers unified healthcare communication solutions to help physicians, nurses, and care team members provide exceptional patient care.
Help build and operate core cloud-native systems including VKE, VLB, VCR, Vultr Inference, NAT Gateways, and our internal APIs. The ideal candidate has a strong understanding of Kubernetes components, container runtime internals, and modern IaC/automation practices. This role will have a direct impact on Vultr’s global cloud infrastructure footprint.
Vultr is on a mission to make high-performance cloud infrastructure easy to use, affordable, and locally accessible for enterprises and AI innovators around the world.
- Become a member of a highly collaborative engineering team offering a unique blend of Cloud Infrastructure Administration, Site Reliability Engineering, Security Operations, and Vulnerability Management.
- Coordinate with client product teams, engineering team members, and other stakeholders to monitor and maintain a secure and resilient cloud-hosted infrastructure to established SLAs.
- Innovate and implement using automated orchestration and configuration management techniques.
Coalfire is on a mission to make the world a safer place by solving our clients’ toughest cybersecurity challenges.
Support the evolution of our platform by improving scalability, reliability, observability, and security. Proactively identify bottlenecks and unlock the autonomy of the entire engineering team. Maintain infrastructure & deployment pipelines and collaborate with engineering teams on architectural decisions and production-readiness practices.
Feegow joined the Docplanner Group, a health-tech company, in 2022 and is dedicated to developing innovative solutions for physicians and managers.
Shape and scale critical infrastructure for one of the largest online platforms in the world. Build, maintain, and optimize multi-cloud compute systems for high-performance, reliable, and secure operations. Influence the technical direction of infrastructure platforms while mentoring and guiding other engineers.
This position is posted by Jobgether on behalf of a partner company.
As a Platform Engineer, enhance and maintain foundational tools and systems, working hands-on with Kubernetes clusters and AWS infrastructure. Build and maintain services that abstract and orchestrate our infrastructure, designing and implementing backend services like APIs and controllers. Develop software for complex projects, and manage infrastructure migrations and security tooling.
Monzo is on a mission to make money work for everyone, waving goodbye to the complicated ways of traditional banking, offering personal and business bank accounts.
- Create and test reliable cloud infrastructure services that support Webflow’s range of products.
- Balance reliability, scalability, and cost efficiency concerns while refactoring and modernizing existing services.
- Collaborate with product engineering teams to deliver new solutions for services and ways of working that might not exist yet.
Webflow is the leading visual development platform for building powerful websites without writing code.