Architect and maintain scalable, reliable infrastructure: Design and optimize infrastructure for high availability, fault tolerance, and performance across distributed systems.
Lead incident management and root cause analysis: Own incident response processes, ensure swift resolution of issues, and drive post-incident improvements to prevent recurrences.
Service monitoring and automation: Build and maintain automated monitoring, alerting, and healing systems that improve system health, reduce manual intervention, and minimize downtime.
VGS is the world's leader in payment tokenization, empowering clients and partners by tokenizing sensitive payment data and limiting compliance scope. They embed a universal token vault into their technology stack to manage the complexities of payment data tokenization across processors, networks, and more. While the job posting doesn't specify size, they appear to have a culture that values transparency, collaboration, grit, and humility.
Design, scale, and operate resilient, cloud-native infrastructure in AWS with an emphasis on EKS, IAM, RBAC, and modern security-first practices.
Build and optimize CI/CD pipelines with GitHub Actions and GitHub Advanced Security, enabling velocity without compromising safety.
Own observability across the stack using Datadog (metrics, logging, alerting, and tracing).
DexCare optimizes time in healthcare, streamlining patient access, reducing waits, and enhancing overall experiences. They are committed to creating an inclusive workplace where diversity drives innovation and belonging strengthens collaboration, enabling everyone to thrive.
Own and improve database reliability, performance, and scalability; participate in incident response.
Partner with engineering teams to design, build, and operate scalable, fault-tolerant, and secure distributed systems.
Build tools, automation, and frameworks that eliminate toil, reduce operational overhead, and establish best practices.
Boulevard provides a client experience platform for appointment-based, self-care businesses, empowering customers to enhance client interactions. They value diversity, curiosity, and simple solutions, fostering an inclusive and open environment for employees to perform their best work.
Designs, implements, and continuously improves observability strategies across services.
Focuses on understanding system behavior in production, identifying failure modes, performance bottlenecks, and reliability risks.
Evolves and maintains shared AWS CDK and CDK8s constructs, with emphasis on observability, autoscaling, and operational safeguards.
Truelogic is a leading provider of nearshore staff augmentation services. They have a team of 600+ highly skilled tech professionals based in Latin America, partnering with U.S. companies on impactful projects and valuing expertise and aspirations.
Ensure the smooth operation and high availability of Clarifai's core services
Monitor system performance, identify bottlenecks, and implement optimizations to enhance reliability and efficiency
Design and implement scalable, secure, and cost-effective infrastructure solutions
Clarifai is a leading AI platform specializing in computer vision and generative AI, empowering organizations to transform unstructured data into actionable insights. Founded in 2013, they have a diverse, globally distributed team with $100M in funding and are committed to building a diverse and inclusive team.
Lead the Reliability & Operations function within the Developer & Production Enablement (DPE) division of RWS’s Product & Technology organization. Take ownership of global production operations and lead the transition from manual, ticket-based workflows to platform-integrated automation. Ensure stability today, while designing for scalability and autonomy in the future.
RWS's purpose is to unlock global understanding, valuing every language and culture, and celebrating diversity and inclusion to make the company strong.
Design and implement highly scalable infrastructure for GitLab.com to support current and future growth.
Collaborate with cross-functional teams across the Infrastructure organization to plan and deliver projects that shape GitLab’s platform direction.
Operate and improve edge services and Kubernetes workloads, acting as a subject matter expert within the infrastructure department.
GitLab is an open-core software company that develops the most comprehensive AI-powered DevSecOps Platform, used by more than 100,000 organizations. They aim to enable everyone to contribute to and co-create the software that powers our world.
Configure and maintain cloud infrastructure automation using Terraform, focusing on CDN optimization and content delivery performance
Develop capacity planning strategies and performance optimization initiatives for high-volume spatial content delivery.
Instrument services to understand system health.
Miris is a cutting-edge technology company building the future of 3D content delivery at global scale. Our mission is to empower creators and developers to deliver high-fidelity, photorealistic 3D experiences to billions of users instantly, seamlessly, and across all major platforms and devices.
Own the strategy and execution for Runtime Platform.
Set the technical direction, build and develop the team, and be accountable for outcomes.
Translate product needs into platform capabilities and build trust through consistent delivery.
Wealthsimple aims to help everyone achieve financial freedom by reimagining how people manage their money. As the largest fintech company in Canada, it has over 3 million users and manages more than $100 billion in assets, fostering inclusive and high-performing teams.
Build and evolve the infrastructure foundations that support Fanvue’s move toward a service-oriented architecture
Enable stream teams to deploy and operate services independently using platform-provided tooling and patterns
Design and maintain AWS infrastructure using AWS CDK (TypeScript), with a strong focus on safety, reuse, and automation
Fanvue is the fastest-growing creator monetisation platform in the creator economy. We are the leading AI-powered creator-first platform, designed to empower creators worldwide to directly monetise their audience.
Lead maintenance and operations for production and development environments.
Architect and implement complex solutions spanning OS, virtualization, network, and cloud layers.
Lead automation initiatives for infrastructure provisioning and operational tasks.
NMI gives partners choice in payments, challenging the one-size-fits-all approach. They power innovative tech for SMBs, entrepreneurs, and fintech startups, fostering a diverse and welcoming workplace with a dedicated Diversity, Equity & Inclusion action group.
Implement and maintain observability tools and dashboards (e.g., AWS CloudWatch, Datadog, Sentry, OpenTelemetry).
Assist with cloud cost visibility and optimization: analyze infrastructure usage patterns to identify waste, and implement aggressive tagging strategies.
Manage the tooling and processes for deploying applications to AWS EKS / Kubernetes / ECS / Serverless and facilitate modern deployment strategies.
True is a global platform of companies that optimizes value creation by placing executive talent, developing business leaders, creating diverse and inclusive networks, and using innovative technology to advance executive talent priorities. True was founded on the belief that doing good is the pathway to doing well, and that their growth and success are a by-product of their values: treating people right, listening to new ideas, and keeping culture at the heart of their business.
Oversee the reliability, scalability, performance, and security of key production services.
Collaborate with cross-functional teams to develop and maintain resilient infrastructure.
Provide expert mentorship and guidance on best practices to engineers throughout the organization.
Cision is a global leader in PR, marketing and social media management technology and intelligence, helping brands and organizations connect with customers and stakeholders to drive business results. The company has offices in 24 countries throughout the Americas, EMEA and APAC.
Lead and mentor multiple teams across SRE, cloud infrastructure, and platform engineering functions.
Drive multi-team initiatives to deliver scalable, secure, and cost-efficient infrastructure leveraging AWS-native and serverless technologies.
Drive adoption of FinOps practices and partner with finance and product teams on budgeting and forecasting.
Model N is the leader in revenue optimization and compliance for pharmaceutical, medtech, and high-tech innovators. Model N is trusted by over 150 of the world’s leading companies across more than 120 countries.
Run the production environment by monitoring availability and taking a holistic view of system health. Build software and systems to manage platform infrastructure and applications. Improve reliability, quality, and time-to-market of our suite of software solutions.
NICE software products are used by 25,000+ global businesses to deliver extraordinary customer experiences, fight financial crime and ensure public safety.
Maintaining and updating Glia’s core infrastructure.
Troubleshooting and resolving infrastructure-related issues.
Improving our security posture.
Glia provides an AI customer service solution for banks and credit unions, unifying AI and human agents across every voice and digital conversation through its ChannelLess® Architecture. Valued at over $1 billion, Glia powers over 700 financial institutions and is certified as a Great Place to Work, with 98% employee satisfaction.
Own and operate AWS Aurora (PostgreSQL) in a high-load production environment.
Design and evolve schemas for large transactional domains.
Analyze and optimize slow queries and production metrics.
Ruby Labs is a leading tech company that creates and operates innovative consumer products, offering opportunities across health, education, and entertainment. Their innovative teams are driving the future of consumer-led products.
Shape the way Scalable runs microservices in a performant, secure, and cost-efficient way. Collaborate with cross-functional teams to understand scalability requirements. Develop and maintain internal tooling around Monitoring, Developer Portal, and Load Testing.
Scalable Capital is a leading digital investment and banking platform with a full banking licence, empowering people across Europe to shape their own finances.