Architect and maintain scalable, reliable infrastructure: Design and optimize infrastructure for high availability, fault tolerance, and performance across distributed systems.
Lead incident management and root cause analysis: Own incident response processes, ensure swift resolution of issues, and drive post-incident improvements to prevent recurrences.
Service monitoring and automation: Build and maintain automated monitoring, alerting, and healing systems that improve system health, reduce manual intervention, and minimize downtime.
VGS is the world's leader in payment tokenization, empowering clients and partners by tokenizing sensitive payment data and limiting compliance scope. They embed a universal token vault into their technology stack to manage the complexities of payment data tokenization across processors, networks, and more. The job posting doesn't specify company size; the culture appears to value transparency, collaboration, grit, and humility.
Design, implement, monitor, and maintain Sysdig's infrastructure at scale across different clouds and on-prem. Collaborate with development teams to improve system reliability, performance, and scalability. Participate in on-call rotation, respond to incidents, conduct root cause analyses, and implement preventive measures.
Sysdig helps organizations secure innovation in the cloud with runtime insights, open innovation, and agentic AI, trusted by over 60% of the Fortune 500.
Design, implement, and evolve large-scale, cloud-native infrastructure supporting MariaDB's global SaaS platform. Lead reliability and scalability initiatives, driving automation and resilience through infrastructure-as-code and GitOps practices. Proactively identify and remediate systemic reliability issues, ensuring high service availability and performance across multi-cloud environments.
MariaDB is making a big impact on the world and is the backbone of applications used every day, including by 75% of Fortune 500 companies.
Design and evolve infrastructure systems to ensure scalability, reliability, and cost efficiency.
Lead and mentor a distributed infrastructure team, fostering a collaborative and inclusive culture.
Oversee all cloud environments supporting MZLA’s products and business systems.
MZLA Technologies Corporation (MZLA) is a wholly owned, for-profit subsidiary of the Mozilla Foundation and home to Thunderbird. They are a small but growing team of 50+ people distributed across seven countries building an open-source email and productivity platform.
Support the evolution of our platform by improving scalability, reliability, observability, and security. Proactively identify bottlenecks and unlock the autonomy of the entire engineering team. Maintain infrastructure & deployment pipelines and collaborate with engineering teams on architectural decisions and production-readiness practices.
Feegow joined the Docplanner Group, a health-tech company, in 2022 and is dedicated to developing innovative solutions for physicians and managers.
Lead and manage the Platform Engineering team, providing technical guidance and mentorship. Design, build, and evangelize Golden Paths and Service Scaffolding to reduce friction across the development lifecycle. Oversee the design, implementation, and maintenance of Shared DB Platforms, ensuring optimal performance, integrity, and security across the organization.
Founded in 2012, EasyPost is a YC unicorn whose mission is to make shipping simple for businesses from garage startups to the Fortune 500.
Responsible for automating infrastructure, maintaining system reliability, and bridging the gap between operations and database management. Design, deploy, and manage scalable infrastructure on Google Cloud Platform (GCP). Implement and maintain CI/CD pipelines for seamless deployment.
Miratech is a global IT services and consulting company that brings together enterprise and start-up innovation to support digital transformation.
Oversee the reliability, scalability, performance, and security of key production services.
Collaborate with cross-functional teams to develop and maintain resilient infrastructure.
Provide expert mentorship and guidance on best practices to engineers throughout the organization.
Cision is a global leader in PR, marketing and social media management technology and intelligence, helping brands and organizations connect with customers and stakeholders to drive business results. The company has offices in 24 countries throughout the Americas, EMEA and APAC.
Automate manual processes to provide efficiencies in managing services.
Create visual representations of systems and services using Grafana dashboards.
Collaborate with engineering teams to align development efforts with reliability, scalability, and business objectives.
Vultr is on a mission to make high-performance cloud infrastructure easy to use, affordable, and locally accessible for enterprises and AI innovators around the world.
Gcore provides infrastructure and software solutions for AI, cloud, network, and security, powering everything from real-time communication and streaming to enterprise AI and secure web applications. They are a global team of over 550 professionals building infrastructure and software that supports the entire digital ecosystem.
Help build and operate core cloud-native systems including VKE, VLB, VCR, Vultr Inference, NAT Gateways, and our internal APIs. The ideal candidate has a strong understanding of Kubernetes components, container runtime internals, and modern IaC/automation practices. This role will have a direct impact on Vultr’s global cloud infrastructure footprint.
Vultr is on a mission to make high-performance cloud infrastructure easy to use, affordable, and locally accessible for enterprises and AI innovators around the world.
Lead and Mentor a High-Performing Team: Hire, develop, and retain top engineering talent.
Develop the Strategic Roadmap: Define and execute the strategy for security infrastructure, automation, and operations.
Oversee Secure and Resilient Infrastructure: Guide the architectural design and implementation of secure, scalable, and highly available infrastructure in our multi-cloud (predominantly AWS) environment.
Smartsheet helps people and teams achieve anything with seamless work management and smart, scalable solutions. They build tools that empower teams to automate the manual, uncover insights, and scale smarter; they welcome diverse perspectives and non-traditional paths.
In this role, you’ll be at the intersection of security, automation, and distributed systems. You’ll take ownership of hardening complex hybrid environments, from bare-metal validators to multi-cloud clusters, ensuring our systems are both fast and fortress-strong. You’ll join a distributed, high-performing Blockchain DevOps team that values ownership, transparency, and innovation.
Figment powers the future of Web3 through industry-leading blockchain infrastructure as the leading provider of staking solutions.
Become a member of a highly collaborative engineering team offering a unique blend of Cloud Infrastructure Administration, Site Reliability Engineering, Security Operations, and Vulnerability Management.
Coordinate with client product teams, engineering team members, and other stakeholders to monitor and maintain a secure and resilient cloud-hosted infrastructure to established SLAs.
Innovate and implement using automated orchestration and configuration management techniques.
Coalfire is on a mission to make the world a safer place by solving our clients’ toughest cybersecurity challenges.
Play a crucial part in designing and scaling secure cloud infrastructure.
Lead the charge in intelligent automation systems and ensure robust deployment processes.
Collaborate with product, engineering, and leadership to drive company success.
Jobgether is a company that connects job seekers with employers. They utilize an AI-powered matching process to ensure applications are reviewed quickly and objectively.
Lead and mentor a team of Specialists, fostering a culture of ownership and continuous learning.
Enable Change and Problem Management teams to leverage Datadog observability tools for evaluating release quality.
Oversee implementation and optimization of CI/CD Observability pipelines to ensure Operational Readiness standards are met.
BWH Hotels has been a global leader in hospitality for nearly 80 years, inspiring travel through unique experiences. Headquartered in Phoenix, Arizona, BWH Hotels has a portfolio of 18 brands and fosters a workplace culture where contributions truly matter.
Deploy and manage cloud infrastructure across all three clouds using Terraform IaC.
Architect, build, and maintain reliable CI/CD pipelines in GitHub Actions and ArgoCD.
Contribute to decisions around our departmental roadmap and project priorities.
Coalesce is the only data transformation and governance platform designed for the AI era, improving data professionals' lives since its founding in 2020.