Architect and maintain self-healing systems with 99.9%+ availability targets.
Use AI/ML to automate infrastructure governance and detect configuration or IaC anti-patterns.
Implement adaptive SLIs/SLOs that evolve automatically from real-time data.
Groupon is a marketplace where customers discover new experiences and services every day and local businesses thrive. Even with thousands of employees spread across multiple continents, they still maintain a culture that inspires innovation, rewards risk-taking and celebrates success.
Oversee the reliability, scalability, performance, and security of key production services.
Collaborate with cross-functional teams to develop and maintain resilient infrastructure.
Provide expert mentorship and guidance on best practices to engineers throughout the organization.
Cision is a global leader in PR, marketing and social media management technology and intelligence, helping brands and organizations connect with customers and stakeholders to drive business results. The company has offices in 24 countries throughout the Americas, EMEA and APAC.
Ensure near-zero downtime with monitoring and alerting, self-healing automation, and continuous improvement
Create highly automated, available and scalable systems by applying software and infrastructure principles
Employ and advise clients on DevOps and SRE principles and practices, covering deployment pipelines, HA, service reliability, technical debt, and operational toil for live services running at scale
66degrees is an AI transformation partner. They guide enterprises from business challenges to quantifiable outcomes, helping businesses reach their inflection point where chaotic data becomes a strategic asset, complexity becomes clarity, and AI becomes an engine for growth. They believe in thriving through challenges and winning together.
Hire, lead, and support a high-performing Infrastructure Platforms team.
Connect business goals and customer needs with sound engineering.
Guide the security, reliability, performance, and scalability of core platform components.
GitLab is an open-core software company that develops the most comprehensive AI-powered DevSecOps Platform, used by more than 100,000 organizations. Their mission is to enable everyone to contribute to and co-create the software that powers our world.
Design and implement highly scalable infrastructure for GitLab.com to support current and future growth.
Collaborate with cross-functional teams across the Infrastructure organization to plan and deliver projects that shape GitLab’s platform direction.
Operate and improve edge services and Kubernetes workloads, acting as a subject matter expert within the infrastructure department.
GitLab is an open-core software company that develops the most comprehensive AI-powered DevSecOps Platform, used by more than 100,000 organizations. They aim to enable everyone to contribute to and co-create the software that powers our world.
Ensure the smooth operation and high availability of Clarifai's core services
Monitor system performance, identify bottlenecks, and implement optimizations to enhance reliability and efficiency
Design and implement scalable, secure, and cost-effective infrastructure solutions
Clarifai is a leading AI platform specializing in computer vision and generative AI, empowering organizations to transform unstructured data into actionable insights. Founded in 2013, they have $100M in funding and a globally distributed team, and they are committed to diversity and inclusion.
Architect, operate, improve and secure the platform the Garner Health app runs on
Boost development velocity and productivity
Build systems to a high engineering standard and hold others to the same high standard
Garner has developed a revolutionary approach to evaluating doctor performance and a unique incentive model that's reshaping the healthcare economy to ensure everyone can afford high-quality care. They have more than doubled their revenue annually over the last 5 years. Garner's award-winning culture is designed to cultivate teamwork, trust, autonomy, exceptional results, and individual growth.
Design and evolve infrastructure systems to ensure scalability, reliability, and cost efficiency.
Lead and mentor a distributed infrastructure team, fostering a collaborative and inclusive culture.
Oversee all cloud environments supporting MZLA’s products and business systems.
MZLA Technologies Corporation (MZLA) is a wholly owned, for-profit subsidiary of the Mozilla Foundation and home to Thunderbird. They are a small but growing team of 50+ people distributed across seven countries building an open-source email and productivity platform.
Lead maintenance and operations for production and development environments.
Architect and implement complex solutions spanning OS, virtualization, network, and cloud layers.
Lead automation initiatives for infrastructure provisioning and operational tasks.
NMI gives partners choice in payments, challenging the one-size-fits-all approach. They power innovative tech for SMBs, entrepreneurs, and fintech startups, fostering a diverse and welcoming workplace with a dedicated Diversity, Equity & Inclusion action group.
Lead and develop a high-performing GitLab SaaS Production Engineering team.
Drive the unification of platforms, tooling, and processes.
Collaborate with teams to define, prioritize, and manage the team roadmap.
GitLab is an open-core software company that develops the most comprehensive AI-powered DevSecOps Platform, used by more than 100,000 organizations. Their high-performance culture is driven by their values and continuous knowledge exchange, enabling their team members to reach their full potential.
Build self-service systems that automate managing, deploying and operating services.
Automate environment observability and resilience. Enable all developers to troubleshoot and resolve problems.
Ensure we hit defined SLOs, including participation in an on-call rotation.
Cohere is focused on scaling intelligence to serve humanity by training and deploying frontier models for developers and enterprises. They are a team of researchers, engineers, and designers. They value diversity and strive to create an inclusive work environment.
Own developer operations and platform reliability across Introzy’s product stack.
Lead how we run infrastructure on Render, design and evolve our observability and alerting, and shape our CI/CD and release practices.
Continuously improve internal developer experience so the engineering team can ship quickly and safely.
Introzy is a multi-app platform designed to unify networking, workflow, and productivity. As a subsidiary of Sanguine Technology Solutions, they are an early-stage company moving fast to deliver value, with a lean engineering team and a culture that embraces AI.
Architect and maintain scalable, reliable infrastructure: Design and optimize infrastructure for high availability, fault tolerance, and performance across distributed systems.
Lead incident management and root cause analysis: Own incident response processes, ensure swift resolution of issues, and drive post-incident improvements to prevent recurrences.
Service monitoring and automation: Build and maintain automated monitoring, alerting, and healing systems that improve system health, reduce manual intervention, and minimize downtime.
VGS is the world's leader in payment tokenization, empowering clients and partners by tokenizing sensitive payment data and limiting compliance scope. They embed a universal token vault into their technology stack to manage the complexities of payment data tokenization across processors, networks, and more. The job posting doesn't specify company size, but it describes a culture that values transparency, collaboration, grit, and humility.
Lead and Mentor a High-Performing Team: Hire, develop, and retain top engineering talent.
Develop the Strategic Roadmap: Define and execute the strategy for security infrastructure, automation, and operations.
Oversee Secure and Resilient Infrastructure: Guide the architectural design and implementation of secure, scalable, and highly available infrastructure in our multi-cloud (predominantly AWS) environment.
Smartsheet helps people and teams achieve anything with seamless work management and smart, scalable solutions. They build tools that empower teams to automate the manual, uncover insights, and scale smarter; they welcome diverse perspectives and non-traditional paths.
Configure and maintain cloud infrastructure automation using Terraform, focusing on CDN optimization and content delivery performance.
Develop capacity planning strategies and performance optimization initiatives for high-volume spatial content delivery.
Instrument services to understand system health.
Miris is a cutting-edge technology company building the future of 3D content delivery at global scale. Their mission is to empower creators and developers to deliver high-fidelity, photorealistic 3D experiences to billions of users instantly, seamlessly, and across all major platforms and devices.
Building world-class AI infrastructure to support a 100+ person research team.
Designing and scaling multi-cloud systems that support high-performance model training and inference.
Improving monitoring, alerting and system observability for AI workloads.
Canva is redefining how the world experiences design. They have campuses in Sydney and Melbourne, co-working spaces in Brisbane, Perth, Adelaide and Auckland, and trust their employees to choose the balance that empowers them and their team to achieve their goals.
Play a crucial part in designing and scaling secure cloud infrastructure.
Lead the charge in intelligent automation systems and ensure robust deployment processes.
Collaborate with product, engineering, and leadership to drive company success.
Jobgether is a company that connects job seekers with employers. They utilize an AI-powered matching process to ensure applications are reviewed quickly and objectively.
Automate infrastructure provisioning, configuration management, monitoring, and operational workflows using IaC and scripting languages.
Own the deployment, maintenance, and lifecycle management of systems supporting engineering, leveraging deep expertise in Kubernetes.
Troubleshoot complex infrastructure and application issues, driving root-cause analysis and developing long-term remediation solutions.
SingleStore delivers the cloud-native database with the speed and scale to power the world’s data-intensive applications. They are venture-backed and headquartered in San Francisco with offices in Sunnyvale, Raleigh, Seattle, Boston, London, Lisbon, Bangalore, Dublin and Kyiv.
Lead and mentor multiple teams across SRE, cloud infrastructure, and platform engineering functions.
Drive multi-team initiatives to deliver scalable, secure, and cost-efficient infrastructure leveraging AWS-native and serverless technologies.
Drive adoption of FinOps practices and partner with finance and product teams on budgeting and forecasting.
Model N is the leader in revenue optimization and compliance for pharmaceutical, medtech, and high-tech innovators. Model N is trusted by over 150 of the world’s leading companies across more than 120 countries.