Lead maintenance and operations for production and development environments.
Architect and implement complex solutions spanning OS, virtualization, network, and cloud layers.
Lead automation initiatives for infrastructure provisioning and operational tasks.
NMI gives partners choice in payments, challenging the one-size-fits-all approach. They power innovative tech for SMBs, entrepreneurs, and fintech startups, fostering a diverse and welcoming workplace with a dedicated Diversity, Equity & Inclusion action group.
You will design, build, and maintain observability platform tools and frameworks. This role involves designing and implementing systems that monitor and analyze the performance/health of software applications and infrastructure. You will collaborate closely with development, site reliability engineering, DevOps, and infrastructure teams.
Wellmark is a mutual insurance company owned by its policyholders across Iowa and South Dakota, building its reputation on over 80 years of trust.
Design, implement, monitor, and maintain Sysdig's infrastructure at scale on different clouds and on-prem. Collaborate with development teams to improve system reliability, performance, and scalability. Participate in on-call rotation, respond to incidents, conduct root cause analyses, and implement preventive measures.
Sysdig helps organizations secure innovation in the cloud with runtime insights, open innovation, and agentic AI, trusted by over 60% of the Fortune 500.
Lead and manage the Platform Engineering team, providing technical guidance and mentorship. Design, build, and evangelize Golden Paths and Service Scaffolding to reduce friction across the development lifecycle. Oversee the design, implementation, and maintenance of Shared DB Platforms, ensuring optimal performance, integrity, and security across the organization.
Founded in 2012, EasyPost is a YC unicorn whose mission is to make shipping simple for businesses from garage startups to the Fortune 500.
Design, implement, and evolve large-scale, cloud-native infrastructure supporting MariaDB's global SaaS platform. Lead reliability and scalability initiatives, driving automation and resilience through infrastructure-as-code and GitOps practices. Proactively identify and remediate systemic reliability issues, ensuring high service availability and performance across multi-cloud environments.
MariaDB is making a big impact on the world and is the backbone of applications used every day, including at 75% of Fortune 500 companies.
Be a keen learner, working with cloud-native, highly scalable infrastructure and gaining expertise in container orchestration, networking, and observability.
Be a passionate problem solver, tackling scalability, reliability, and troubleshooting challenges in distributed systems.
Be a great communicator, engaging directly with developers, engineering teams, and product teams to understand infrastructure challenges and provide solutions.
Temporal provides an open-source programming model that simplifies code, improves application reliability, and helps developers focus on delivering features faster. They aim to be the reliable foundation of every developer’s toolbox and value curiosity, drive, collaboration, genuineness, and humility.
Design and evolve infrastructure systems to ensure scalability, reliability, and cost efficiency.
Lead and mentor a distributed infrastructure team, fostering a collaborative and inclusive culture.
Oversee all cloud environments supporting MZLA’s products and business systems.
MZLA Technologies Corporation (MZLA) is a wholly owned, for-profit subsidiary of the Mozilla Foundation and home to Thunderbird. They are a small but growing team of 50+ people distributed across seven countries building an open-source email and productivity platform.
Designs, implements, and continuously improves observability strategies across services.
Focuses on understanding system behavior in production, identifying failure modes, performance bottlenecks, and reliability risks.
Evolves and maintains shared AWS CDK and CDK8s constructs, with emphasis on observability, autoscaling, and operational safeguards.
Truelogic is a leading provider of nearshore staff augmentation services. They have a team of 600+ highly skilled tech professionals based in Latin America, partnering with U.S. companies on impactful projects and valuing expertise and aspirations.
Seeking an experienced Site Reliability Engineer to help build highly resilient and scalable systems by automating, measuring, and monitoring everything. Implement highly available and scalable architectures for core and third-party components of Acquia Source. Implement metrics, monitoring, and incident response processes.
Acquia is an open source digital experience company providing technology to brands that allows them to embrace innovation and create customer moments that matter.
Responsible for automating infrastructure, maintaining system reliability, and bridging the gap between operations and database management. Design, deploy, and manage scalable infrastructure on Google Cloud Platform (GCP). Implement and maintain CI/CD pipelines for seamless deployment.
Miratech is a global IT services and consulting company that brings together enterprise and start-up innovation to support digital transformation.
As a Senior Software Engineer, Enterprise Platform at Vanta, you will build and operate systems that power Vanta’s FedRAMP environments, including automated release, vulnerability remediation, and evidence generation pipelines that meet strict compliance timelines. You will also define and evolve Vanta’s production reliability framework, including SLOs, incident response patterns, observability standards, service catalog, metrics dashboards, and the Vanta SLA definition. You will identify and solve complex scalability and performance challenges, particularly related to service reliability and data throughput.
Vanta helps businesses earn and prove trust by empowering companies to practice better security and prove it with ease.
Help build and operate core cloud-native systems including VKE, VLB, VCR, Vultr Inference, NAT Gateways, and our internal APIs. The ideal candidate has a strong understanding of Kubernetes components, container runtime internals, and modern IaC/automation practices. This role will have a direct impact on Vultr’s global cloud infrastructure footprint.
Vultr is on a mission to make high-performance cloud infrastructure easy to use, affordable, and locally accessible for enterprises and AI innovators around the world.
Own challenging infrastructure problems end-to-end by understanding how engineers use the platform.
Design scalable, maintainable services and contribute to technical proposals.
Contribute to the roadmap, highlighting opportunities, validating approaches and helping keep our platform solutions current with cloud best practices.
Canva's intuitive suite of design products is powered by its large distributed infrastructure group, which sets large and ambitious goals.
Lead and mentor a team of Specialists, fostering a culture of ownership and continuous learning.
Enable Change and Problem Management teams to leverage Datadog observability tools for evaluating release quality.
Oversee implementation and optimization of CI/CD Observability pipelines to ensure Operational Readiness standards are met.
BWH Hotels has been a global leader in hospitality for nearly 80 years, inspiring travel through unique experiences. Headquartered in Phoenix, Arizona, BWH Hotels boasts a powerful portfolio of 18 brands and fosters a workplace culture where contributions truly matter.
Lead the Reliability & Operations function within the Developer & Production Enablement (DPE) division of RWS’s Product & Technology organization. Take ownership of global production operations and lead the transition from manual, ticket-based workflows to platform-integrated automation. Ensure stability today, while designing for scalability and autonomy in the future.
RWS's purpose is to unlock global understanding. They value every language and culture and celebrate diversity and inclusion to make the company strong.
Act as a trusted technical advisor to enterprise customers, bridging the gap between product and customer outcomes. Design, demonstrate, and validate Dash0’s technical capabilities in real-world environments through Proofs of Concept (POCs). Partner with sales and product teams to guide observability architecture discussions and ensure customers realize the full technical value of Dash0.
Dash0 is building a delightful, simple, and AI-centric platform that eliminates vendor lock-in and meaningless toil for observability.
Influence and align cross-functional teams on platform evolution.
Architect and evolve hypervisor integrations across thousands of hosts.
Drive advanced performance tuning across CPU, memory, I/O, networking, and storage layers.
Vultr is on a mission to make high-performance cloud infrastructure easy to use, affordable, and locally accessible for enterprises and AI innovators around the world.
Support the evolution of our platform by improving scalability, reliability, observability, and security. Proactively identify bottlenecks and unlock the autonomy of the entire engineering team. Maintain infrastructure & deployment pipelines and collaborate with engineering teams on architectural decisions and production-readiness practices.
Feegow joined the Docplanner Group, a health-tech company, in 2022 and is dedicated to developing innovative solutions for physicians and managers.