Own SLI/SLO/SLA definitions for the Akuity SaaS platform and drive continuous improvement.
Participate in an on-call rotation and act as incident commander for high-severity production events.
Partner with engineering teams to build reliability into new features before they ship to production.
Akuity helps enterprises ship software faster and more reliably with modern GitOps best practices. The Akuity Platform enables teams to manage development and deployment across hundreds, if not thousands, of Kubernetes clusters from a single control plane.
Propel builds technology that strengthens the social safety net. They are a passionate team of ~100 Propellers who envision a future where every American has the tools and resources they need to thrive, offering a remote-first working environment with headquarters in Brooklyn.
Working with engineers across Yelp to support new features and services.
Integrating tools to monitor platform stability and performance.
Help scale our Kubernetes clusters and AWS-based infrastructure while maintaining our platform's SLOs.
Yelp's engineering culture values individual authenticity and encourages creative solutions. They focus on helping users, growing as engineers, and having fun in a collaborative environment.
Deliver a scalable internal infrastructure platform on public cloud environments.
Establish and evolve Kubernetes-based platform capabilities to support high-availability, production-grade workloads at scale.
Build a secure and reliable foundation that supports CI/CD pipelines and minimizes operational risk across engineering teams.
Chainlink is the industry-standard oracle platform bringing the capital markets onchain and powering the majority of decentralized finance (DeFi). Since inventing decentralized oracle networks, Chainlink has enabled tens of trillions in transaction value and now secures the vast majority of DeFi.
Building tools and applications to extend Calendly’s infrastructure platform
Evaluating and deploying cloud native open source tools
Exercising expertise in cloud infrastructure concepts and patterns
Calendly powers connections for their customers through impactful innovation. They have millions of users and are in the midst of exciting product growth.
Work closely with developers and operations teams to scale and optimize their infrastructure for sustained growth.
Design, deploy, and operate their core backend infrastructure using an automated, Infrastructure-as-Code approach.
Prioritize and own delivery in a small, highly efficient team — you set the bar, not just maintain it.
Relai is Europe's fastest growing Bitcoin-only app. They are looking for an experienced, results-oriented and impact-driven Senior DevOps Engineer who can help them scale their infrastructure and pursue their mission of bringing the best store of value to more people.
Collaborate with cross-functional teams to translate product requirements into technical solutions.
Develop and maintain core services for Chainguard.
Practice continuous improvement by iterating on how services are deployed, configured, monitored, and maintained on our platform.
Chainguard provides a secure foundation for software development and deployment by offering guarded open source software that is built from source and continuously updated. Founded by industry experts, they aim to be the safe source for open source and have built the largest library of open source software that is secure by default.
Build and manage GCP infrastructure across core services.
Support and execute migrations from on-premises or multi-cloud environments into GCP.
Implement and maintain infrastructure using Terraform for repeatable deployments.
Ontrac Solutions is a technology consulting firm that specializes in cutting-edge solutions driving business transformation. They partner with organizations to modernize infrastructure, streamline processes, and deliver results through innovation, collaboration and excellence.
Design, build, and maintain Kubernetes-based infrastructure and cloud environments.
Build and optimize CI/CD pipelines that enable fast, safe, and repeatable deployments.
Leverage AI coding tools and agentic workflows as a core part of your work.
Intrahealth, a subsidiary of HEALWELL AI Inc., is an enterprise-class EMR provider supporting approximately 20,000 providers and the care delivery of tens of millions of patients and clients across Canada, Australia and New Zealand. Intrahealth provides a suite of flexible software solutions to a wide variety of customers including health authorities, public health, community health, home care, and primary care professionals.
Support the availability and durability of critical services across production environments.
Develop automation for common operational tasks, reducing manual intervention and toil.
Partner with engineering, product, and operations teams to support resilient system design and operations.
Backblaze is the object storage leader in the open cloud movement, fueling customer success with cloud storage built purposefully to unlock budgets and unleash innovators. Founded in 2007, they scaled the business with less than $3 million in outside funding until 2021, and generate over $100m in revenue managing over three billion gigabytes of data storage for 500K+ customers in 175+ countries.
Design and implement infrastructure and tools that empower our product teams to rapidly and securely iterate, emphasizing reliability and automation.
Influence the strategic direction of our infrastructure and operational practices, ensuring that we are well-positioned to scale and support our growing organization.
Take a proactive role in the resolution of production issues, ensuring that we are well-prepared to handle incidents and that we learn from them in a blameless manner.
SSV Labs is the core team behind the SSV Network - pioneering decentralized infrastructure for Ethereum staking. They are building tools, protocols, and standards to make staking more secure, scalable, and trustless.
Work with other Engineering teams to design sustainable infrastructure and microservice solutions.
Automate tools and infrastructure to reduce manual work.
Monitor applications and participate in an on-call rotation as required.
Bloomreach is building the world’s premier agentic platform for personalization, revolutionizing how businesses connect with their customers by building and deploying AI agents to personalize the entire customer journey. They power personalization for more than 1,400 global brands.
Operate and improve platform tools so product teams can ship reliably, triaging tickets, fixing build issues, and handling routine service requests.
Maintain and extend self-service workflows by updating docs, examples, and guardrails under guidance from senior engineers.
Perform day-to-day Kubernetes operations: deploy/update Helm charts, manage namespaces, diagnose rollout issues, and follow runbooks for incident response.
ISHIR is a digital innovation and enterprise AI services provider. They work with startups and enterprises to shape the future through accelerated innovation, deep technical expertise, access to global digital talent and a passion for complex problem-solving. ISHIR attracts proactive individuals who thrive on challenges and promote self-reliance, open communication, and collaboration.
Define and drive the roadmap for deployment, configuration, infrastructure, and operational tooling across cloud and on-premise environments.
Work closely with engineering, design, customer-facing teams, and customers to identify and resolve deployment friction.
Improve how enterprise customers install, configure, upgrade, secure, and operate Rasa in production.
Rasa is a leader in generative conversational AI, enabling enterprises to build and deliver next-level AI assistants. The company was founded in 2016 and is remote-first with a global presence.
Cooperate closely with other Platform and Engineering teams on strategic initiatives
Improve, automate and grow the SmartRecruiters cloud platform
Respond to client threats and remediate issues
SmartRecruiters is the Recruiting AI Company that transforms hiring for the world’s leading enterprises. An SAP company, they deliver an AI-powered hiring platform that automates and optimizes the entire talent acquisition process. They are a values-driven tech company with strong financial backing and a bold vision.
Operating and evolving 100+ multi-cloud streaming clusters and related database infrastructure.
Diagnosing and eliminating cross-layer failure modes.
Designing safe upgrade and rollout strategies at scale.
Grafana Labs is a remote-first, open-source powerhouse with over 20M users of Grafana, its open source visualization tool. Grafana Labs helps more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, and its team thrives in an innovation-driven environment.
Building tools and applications to extend Calendly’s infrastructure platform
Evaluating and deploying cloud native open source tools
Exercising expertise in cloud infrastructure concepts and patterns
Calendly's product powers connections for millions through impactful innovation. They are in the midst of exciting growth and seek people who want to learn, grow, and do their best work.
Provide production support on a shift according to the team on-call roster.
Work on tickets raised by customers and by internal engineering/implementation teams while not on call for production support.
Continuously monitor the health and performance of our services, systems, and infrastructure.
Granicus is driven by the excitement of building, implementing, and maintaining technology that is transforming the Govtech industry by bringing governments and their constituents together. They have served 5,500 federal, state, and local government agencies and more than 300 million citizen subscribers.
Collaborate with application engineering teams on platform infrastructure.
Enhance observability and spearhead the adoption of SRE best practices.
Build and maintain reliable CI/CD pipelines, tooling, and infrastructure.
Rula strives to provide quality, evidence-based, compassionate mental healthcare and aims to create a world where mental health is no longer stigmatized. They are a remote-first company operating in most U.S. states, and are dedicated to having a culture of inclusion that supports their employees.
Build and roll out new features with your team, iterating based on results.
Drive projects from initial ideation to operational deployment.
Analyze, design, and build modular solutions for complex challenges.
Jobgether is a platform helping candidates find the right job. They use AI-powered matching to ensure every application is reviewed quickly, objectively, and fairly against the core requirements.