Jobs Similar to Principal Site Reliability Engineer (AI-first SRE)

Senior Infrastructure Engineer

Marketing Intelligence Platform 9 days ago

Latin America Unlimited PTO

Audit and optimize cloud usage, capacity, and spend.
Improve reliability through better automation, monitoring, and alerting.
Partner with engineers to upgrade infrastructure components and roll out changes safely.

Our client builds a high-scale data and analytics platform used by sophisticated teams to make critical business decisions. They are trusted by 800+ companies and value collaboration, high ownership, and long-term system reliability.

Site Reliability Engineer (AI Forms Platform)

Filevine 7 days ago

US

Architect and deploy secure, scalable infrastructure using Terraform, CloudFormation, or similar tools.
Ensure the platform meets strict SLA requirements for enterprise clients, minimizing downtime.
Implement comprehensive monitoring, logging, and alerting to provide deep visibility into system health.

Filevine provides cloud-based workflow tools for legal professionals, helping them manage organizations and serve clients. They are recognized as a fast-growing and innovative technology company with a team of passionate professionals.

DevOps Engineer

Jobgether 14 days ago

US

Play a crucial part in designing and scaling secure cloud infrastructure.
Lead the charge in intelligent automation systems and ensure robust deployment processes.
Collaborate with product, engineering, and leadership to drive company success.

Jobgether is a company that connects job seekers with employers. They utilize an AI-powered matching process to ensure applications are reviewed quickly and objectively.

Senior DevOps Engineer/SRE

Cision 13 days ago

India

Oversee the reliability, scalability, performance, and security of key production services.
Collaborate with cross-functional teams to develop and maintain resilient infrastructure.
Provide expert mentorship and guidance on best practices to engineers throughout the organization.

Cision is a global leader in PR, marketing and social media management technology and intelligence, helping brands and organizations connect with customers and stakeholders to drive business results. The company has offices in 24 countries throughout the Americas, EMEA and APAC.

New Staff Site Reliability Engineer

Garner Health 5 days ago

$219,000–$245,000/yr

US Unlimited PTO

Architect, operate, improve, and secure the platform the Garner Health app runs on.
Boost development velocity and productivity.
Build systems to a high engineering standard and hold others to the same high standard.

Garner has developed a revolutionary approach to evaluating doctor performance and a unique incentive model that's reshaping the healthcare economy to ensure everyone can afford high-quality care. They have more than doubled their revenue annually over the last 5 years. Garner's award-winning culture is designed to cultivate teamwork, trust, autonomy, exceptional results, and individual growth.

Senior Site Reliability Engineer

NICE 28 days ago

UK

Run the production environment by monitoring availability and taking a holistic view of system health.
Build software and systems to manage platform infrastructure and applications.
Improve reliability, quality, and time-to-market of our suite of software solutions.

NICE software products are used by 25,000+ global businesses to deliver extraordinary customer experiences, fight financial crime and ensure public safety.

Platform Engineer

Incident.io 11 days ago

Designing, building, and maintaining infrastructure that enables fast, reliable, and secure product delivery.
Improving and maintaining CI/CD pipelines to streamline deployments and increase reliability.
Contributing to infrastructure reliability and ensuring systems are designed for resilience and growth.

Incident.io is the leading AI incident response platform, built to help teams dramatically reduce incident response time and improve reliability. They have raised $100M from Index Ventures, Insight Partners, and Point Nine, alongside founders and executives from world-class technology companies.

Sr. Infrastructure Engineer

VGS 22 days ago

$140,000–$190,000/yr

US Canada Unlimited PTO

Architect and maintain scalable, reliable infrastructure: Design and optimize infrastructure for high availability, fault tolerance, and performance across distributed systems.
Lead incident management and root cause analysis: Own incident response processes, ensure swift resolution of issues, and drive post-incident improvements to prevent recurrences.
Service monitoring and automation: Build and maintain automated monitoring, alerting, and healing systems that improve system health, reduce manual intervention, and minimize downtime.

VGS is the world's leader in payment tokenization, empowering clients and partners by tokenizing sensitive payment data and limiting compliance scope. They embed a universal token vault into their technology stack to manage the complexities of payment data tokenization across processors, networks, and more. The posting does not specify company size; the culture appears to value transparency, collaboration, grit, and humility.

Engineering Manager (Infra) - AI Reliability

Canva 21 days ago

ANZ

Building world-class AI infrastructure to support a 100+ person research team.
Designing and scaling multi-cloud systems that support high-performance model training and inference.
Improving monitoring, alerting and system observability for AI workloads.

Canva is redefining how the world experiences design. They have campuses in Sydney and Melbourne, co-working spaces in Brisbane, Perth, Adelaide and Auckland, and trust their employees to choose the balance that empowers them and their team to achieve their goals.

Site Reliability Engineer

Cohere 11 hours ago

Global 6w PTO 26w maternity

Build self-service systems that automate managing, deploying and operating services.
Automate environment observability and resilience. Enable all developers to troubleshoot and resolve problems.
Ensure we hit defined SLOs, including participation in an on-call rotation.

Cohere is focused on scaling intelligence to serve humanity by training and deploying frontier models for developers and enterprises. They are a team of researchers, engineers, and designers. They value diversity and strive to create an inclusive work environment.

Head of Reliability and Operations

RWS 25 days ago

Europe

Lead the Reliability & Operations function within the Developer & Production Enablement (DPE) division of RWS’s Product & Technology organization.
Take ownership of global production operations and lead the transition from manual, ticket-based workflows to platform-integrated automation.
Ensure stability today, while designing for scalability and autonomy in the future.

RWS's purpose is to unlock global understanding, valuing every language and culture, and celebrating diversity and inclusion to make the company strong.

Site Reliability Engineer

Miris 3 days ago

$89,155–$287,488/yr

Global

Configure and maintain cloud infrastructure automation using Terraform, focusing on CDN optimization and content delivery performance.
Develop capacity planning strategies and performance optimization initiatives for high-volume spatial content delivery.
Instrument services to understand system health.

Miris is a cutting-edge technology company building the future of 3D content delivery at global scale. Our mission is to empower creators and developers to deliver high-fidelity, photorealistic 3D experiences to billions of users instantly, seamlessly, and across all major platforms and devices.

Sr. Software Engineer, Site Reliability Engineer

Thumbtack 7 days ago

$179,400–$272,800/yr

US

Design, create, and maintain software and systems to improve the availability, scalability, and efficiency of Thumbtack's services.
Set the architectural direction of infrastructure and platform services while supporting the engineering organization.
Troubleshoot and debug critical systems throughout the SDLC.

Thumbtack helps millions of people confidently care for their homes by offering personalized guidance, AI tools, and a hiring experience. They have a growing community of 300,000 local service businesses and value a cross-functional, collaborative culture.

Intermediate Site Reliability Engineer, Tenant Scale

GitLab 4 days ago

Americas EMEA Unlimited PTO

Design and implement highly scalable infrastructure for GitLab.com to support current and future growth.
Collaborate with cross-functional teams across the Infrastructure organization to plan and deliver projects that shape GitLab’s platform direction.
Operate and improve edge services and Kubernetes workloads, acting as a subject matter expert within the infrastructure department.

GitLab is an open-core software company that develops the most comprehensive AI-powered DevSecOps Platform, used by more than 100,000 organizations. They aim to enable everyone to contribute to and co-create the software that powers our world.

Staff DevOps Infrastructure Engineer

NMI 16 days ago

$155,000–$165,000/yr

US Unlimited PTO

Lead maintenance and operations for production and development environments.
Architect and implement complex solutions spanning OS, virtualization, network, and cloud layers.
Lead automation initiatives for infrastructure provisioning and operational tasks.

NMI gives partners choice in payments, challenging the one-size-fits-all approach. They power innovative tech for SMBs, entrepreneurs, and fintech startups, fostering a diverse and welcoming workplace with a dedicated Diversity, Equity & Inclusion action group.

Cloud Infrastructure Engineer (GCP) Contract

Software Mind 22 days ago

US

Designing & maintaining GCP infrastructure (GKE, Bigtable, BigQuery, GCS, networking).
Building monitoring, alerting, logging, and observability from the ground up.
Improving our security posture across auth, IAM, policies, and data access.

Software Mind develops solutions that make an impact for companies around the globe. They build cross-functional engineering teams that take ownership and crave more, embracing openness, respect, and grit. They combine employment with enjoyment in their culture.

Principal Software Engineer, Infra

Jobgether 30 days ago

US

Shape and scale critical infrastructure for one of the largest online platforms in the world.
Build, maintain, and optimize multi-cloud compute systems for high-performance, reliable, and secure operations.
Influence the technical direction of infrastructure platforms while mentoring and guiding other engineers.

This position is posted by Jobgether on behalf of a partner company.

AI DevOps Engineer

Jobgether 11 days ago

India

Design and manage AWS infrastructure for AI services.
Implement Infrastructure as Code using Terraform.
Collaborate with cross-functional teams to enhance performance.

Jobgether uses an AI-powered matching process to ensure applications are reviewed quickly, objectively, and fairly against the role's core requirements. Their system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company.

Senior Software Engineer (Infra)

Canva 12 hours ago

Australia New Zealand

You’ll own challenging infrastructure problems end-to-end.
You’ll design scalable, maintainable services and contribute to technical proposals.
You’ll contribute to the roadmap for our Provisioning team.

Canva is a design platform that enables users to create a variety of visual content. They have campuses in Sydney and Melbourne, co-working spaces in other major cities, and offer a flexible work environment.

Sr. Software Engineer, Site Reliability Engineer

Thumbtack 7 days ago

$131,446–$169,975/yr

Canada

Design, create, and maintain software and systems to improve the availability, scalability, and efficiency of Thumbtack's services.
Set the architectural direction of infrastructure and platform services while supporting the engineering organization.
Design and implement tools and processes used for deployment, change, service, and infrastructure management.

Thumbtack helps millions of people confidently care for their homes through personalized guidance, AI tools, and a hiring experience. They have a growing community of 300,000 local service businesses.
