Design and implement highly scalable infrastructure for GitLab.com to support current and future growth.
Collaborate with cross-functional teams across the Infrastructure organization to plan and deliver projects that shape GitLab’s platform direction.
Operate and improve edge services and Kubernetes workloads, acting as a subject matter expert within the infrastructure department.
GitLab is an open-core software company that develops the most comprehensive AI-powered DevSecOps Platform, used by more than 100,000 organizations. They aim to enable everyone to contribute to and co-create the software that powers our world.
Architect and maintain self-healing systems with 99.9%+ availability targets.
Use AI/ML to automate infrastructure governance and detect configuration or IaC anti-patterns.
Implement adaptive SLIs/SLOs that evolve automatically from real-time data.
Groupon is a marketplace where customers discover new experiences and services every day and local businesses thrive. Even with thousands of employees spread across multiple continents, they still maintain a culture that inspires innovation, rewards risk-taking and celebrates success.
Own the reliability, performance, and operational health of production AI systems.
Lead efforts to refactor and harden the AI codebase.
Design and build monitoring, alerting, and debugging tools.
MixMode is a leading provider of AI-powered cybersecurity solutions at scale, pioneering a patented third-wave, context-aware AI approach. Large organizations with big data workloads trust MixMode to defend their most important assets.
Building world-class AI infrastructure to support a 100+ person research team.
Designing and scaling multi-cloud systems that support high-performance model training and inference.
Improving monitoring, alerting and system observability for AI workloads.
Canva is redefining how the world experiences design. They have campuses in Sydney and Melbourne, co-working spaces in Brisbane, Perth, Adelaide and Auckland, and trust their employees to choose the balance that empowers them and their team to achieve their goals.
Architect, operate, improve and secure the platform the Garner Health app runs on
Boost development velocity and productivity
Build systems to a high engineering standard and hold others to the same high standard
Garner has developed a revolutionary approach to evaluating doctor performance and a unique incentive model that's reshaping the healthcare economy to ensure everyone can afford high-quality care. They have more than doubled their revenue annually over the last 5 years. Garner's award-winning culture is designed to cultivate teamwork, trust, autonomy, exceptional results, and individual growth.
Understand and participate in the changing FedRAMP space.
Own and champion high operational standards of Confluent Cloud systems leveraged by federal agencies.
Innovate and design solutions to reduce toil, bolster operational maturity, and make day-to-day work life easier.
Confluent is rewriting how data moves and what the world can do with it. Their platform puts information in motion, streaming in near real-time so companies can react faster and build smarter. They value team players who ask hard questions, give honest feedback, and show up for each other.
Run the production environment by monitoring availability and taking a holistic view of system health. Build software and systems to manage platform infrastructure and applications. Improve reliability, quality, and time-to-market of our suite of software solutions.
NICE software products are used by 25,000+ global businesses to deliver extraordinary customer experiences, fight financial crime and ensure public safety.
Design, create, and maintain software and systems to improve the availability, scalability, and efficiency of Thumbtack's services
Set the architectural direction of infrastructure and platform services while supporting the engineering organization
Design and implement tools and processes used for deployment, change, service, and infrastructure management
Thumbtack helps millions of people confidently care for their homes through personalized guidance, AI tools, and a hiring experience. They have a growing community of 300,000 local service businesses.
Work with research teams to design and build our training infrastructure
Prototype new training frameworks and productionize solutions at scale
Design, optimize and test model integration infrastructure
Clarifai is a leading AI platform specializing in computer vision, NLP, LLMs, and audio recognition, helping organizations transform unstructured data into structured data. Founded in 2013, they operate remotely across multiple countries with backing from industry leaders, fostering a diverse and equal opportunity workplace.
Shape the way Scalable runs microservices in a performant, secure, and cost-efficient way. Collaborate with cross-functional teams to understand scalability requirements. Develop and maintain internal tooling around Monitoring, Developer Portal, and Load Testing.
Scalable Capital is a leading digital investment and banking platform with a full banking licence, empowering people across Europe to shape their own finances.
Oversee the reliability, scalability, performance, and security of key production services.
Collaborate with cross-functional teams to develop and maintain resilient infrastructure.
Provide expert mentorship and guidance on best practices to engineers throughout the organization.
Cision is a global leader in PR, marketing and social media management technology and intelligence, helping brands and organizations connect with customers and stakeholders to drive business results. The company has offices in 24 countries throughout the Americas, EMEA and APAC.
Hire, lead, and support a high-performing Infrastructure Platforms team.
Connect business goals and customer needs with sound engineering.
Guide the security, reliability, performance, and scalability of core platform components.
GitLab is an open-core software company that develops the most comprehensive AI-powered DevSecOps Platform, used by more than 100,000 organizations. Their mission is to enable everyone to contribute to and co-create the software that powers our world.
Design, create, and maintain software and systems to improve the availability, scalability, and efficiency of Thumbtack's services.
Set the architectural direction of infrastructure and platform services while supporting the engineering organization.
Troubleshoot and debug critical systems throughout the SDLC.
Thumbtack helps millions of people confidently care for their homes by offering personalized guidance, AI tools, and a hiring experience. They have a growing community of 300,000 local service businesses and value a cross-functional, collaborative culture.
Own deployment engineering projects, leading the technical execution of Parloa’s deployments inside large, complex enterprise environments.
Design for scale and resilience, architecting deployment solutions that meet enterprise-grade requirements for performance, reliability, and security.
Engineer solutions where none exist, building custom extensions, integrations, and configurations to close product gaps and meet enterprise requirements.
Parloa is a fast-growing startup in the world of Generative AI and customer service. Their voice-first GenAI platform automates customer service with natural-sounding conversations, and the company has more than 400 employees in Berlin, Munich, and New York.
Lead the Reliability & Operations function within the Developer & Production Enablement (DPE) division of RWS’s Product & Technology organization. Take ownership of global production operations and lead the transition from manual, ticket-based workflows to platform-integrated automation. Ensure stability today, while designing for scalability and autonomy in the future.
RWS's purpose is to unlock global understanding, valuing every language and culture, and celebrating diversity and inclusion to make the company strong.
Designs, implements, and continuously improves observability strategies across services.
Focuses on understanding system behavior in production, identifying failure modes, performance bottlenecks, and reliability risks.
Evolves and maintains shared AWS CDK and CDK8s constructs, with emphasis on observability, autoscaling, and operational safeguards.
Truelogic is a leading provider of nearshore staff augmentation services. They have a team of 600+ highly skilled tech professionals based in Latin America, partnering with U.S. companies on impactful projects and valuing expertise and aspirations.
As a Platform Engineer, enhance and maintain foundational tools and systems, working hands-on with Kubernetes clusters and AWS infrastructure. Build and maintain services that abstract and orchestrate our infrastructure, designing and implementing backend services like APIs and controllers. Develop software for complex projects, and manage infrastructure migrations and security tooling.
Monzo is on a mission to make money work for everyone, waving goodbye to the complicated ways of traditional banking, offering personal and business bank accounts.
Automate infrastructure provisioning, configuration management, monitoring, and operational workflows using IaC and scripting languages.
Own the deployment, maintenance, and lifecycle management of systems supporting engineering, leveraging deep expertise in Kubernetes.
Troubleshoot complex infrastructure and application issues, driving root-cause analysis and developing long-term remediation solutions.
SingleStore delivers the cloud-native database with the speed and scale to power the world’s data-intensive applications. They are venture-backed and headquartered in San Francisco with offices in Sunnyvale, Raleigh, Seattle, Boston, London, Lisbon, Bangalore, Dublin and Kyiv.