Source Job

$174,600–$220,000/yr
US

  • Lead capacity planning, autoscaling, and performance optimization across our application.
  • Define and enforce best practices for scalability, reliability, observability, and infrastructure resilience.
  • Conduct architectural reviews and propose improvements to enhance performance and cost efficiency.

AWS Terraform Kubernetes Python Automation

20 jobs similar to Staff Engineer Cloud Scalability

Jobs ranked by similarity.

$150,100–$188,100/yr
US Canada 2w PTO 12w maternity 12w paternity

  • Create and test reliable cloud infrastructure services that support Webflow’s range of products.
  • Balance reliability, scalability, and cost efficiency concerns while refactoring and modernizing existing services.
  • Collaborate with product engineering teams to deliver new solutions for services and ways of working that might not exist yet.

Webflow is the leading visual development platform for building powerful websites without writing code.

$120,000–$140,000/yr

  • Design and plan cloud-native systems aligned with business goals and security best practices.
  • Implement and support AI-based automation tools and services.
  • Continuously tune cloud and automation workloads to improve reliability and performance.

PerfectServe offers unified healthcare communication solutions to help physicians, nurses, and care team members provide exceptional patient care.

$160,000–$182,000/yr
US

  • Lead and mentor multiple teams across SRE, cloud infrastructure, and platform engineering functions.
  • Drive multi-team initiatives to deliver scalable, secure, and cost-efficient infrastructure leveraging AWS-native and serverless technologies.
  • Drive adoption of FinOps practices and partner with finance and product teams on budgeting and forecasting.

Model N is the leader in revenue optimization and compliance for pharmaceutical, medtech, and high-tech innovators. Model N is trusted by over 150 of the world’s leading companies across more than 120 countries.

Design, implement, and maintain cloud infrastructure and deployment pipelines across AWS environments. Ensure efficient CI/CD operations and infrastructure automation. Uphold high platform reliability and security standards.

Software Mind develops solutions that make an impact for companies around the globe.

$95,000–$175,000/yr

  • Provide architecture plans for multiple cloud-based applications supporting stakeholders.
  • Analyze performance and ensure applications meet the scalability and reliability needs of internal teams.
  • Identify and troubleshoot performance bottlenecks and reliability issues across the stack.

Veeva Systems is a mission-driven organization and pioneer in industry cloud, helping life sciences companies bring therapies to patients faster.

Germany

Shape the way Scalable runs microservices in a performant, secure, and cost-efficient way. Collaborate with cross-functional teams to understand scalability requirements. Develop and maintain internal tooling around Monitoring, Developer Portal, and Load Testing.

Scalable Capital is a leading digital investment and banking platform with a full banking licence, empowering people across Europe to shape their own finances.

$125,000–$169,000/yr
Unlimited PTO

  • Design, scale, and operate resilient, cloud-native infrastructure in AWS with an emphasis on EKS, IAM, RBAC, and modern security-first practices.
  • Build and optimize CI/CD pipelines with GitHub Actions and GitHub Advanced Security enabling velocity without compromising safety.
  • Own observability across the stack using Datadog (metrics, logging, alerting, and tracing).

DexCare optimizes time in healthcare, streamlining patient access, reducing waits, and enhancing overall experiences. They are committed to creating an inclusive workplace where diversity drives innovation and belonging strengthens collaboration, enabling everyone to thrive.

Nigeria

Design, deploy, and maintain cloud infrastructure solutions, adhering to security guidelines. Monitor cloud infrastructure and applications, addressing performance bottlenecks and security vulnerabilities. Implement automation tools/IaC to streamline provisioning and deployment of cloud resources.

Moniepoint is an all-in-one financial services platform for emerging markets and the second-fastest growing company in Africa.

Brazil 26w maternity 4w paternity

Support the evolution of our platform by improving scalability, reliability, observability, and security. Proactively identify bottlenecks and unlock the autonomy of the entire engineering team. Maintain infrastructure & deployment pipelines and collaborate with engineering teams on architectural decisions and production-readiness practices.

Feegow joined the Docplanner Group, a health-tech company, in 2022 and is dedicated to developing innovative solutions for physicians and managers.

$140,000–$190,000/yr
US Canada Unlimited PTO

  • Architect and maintain scalable, reliable infrastructure: Design and optimize infrastructure for high availability, fault tolerance, and performance across distributed systems.
  • Lead incident management and root cause analysis: Own incident response processes, ensure swift resolution of issues, and drive post-incident improvements to prevent recurrences.
  • Service monitoring and automation: Build and maintain automated monitoring, alerting, and healing systems that improve system health, reduce manual intervention, and minimize downtime.

VGS is the world's leader in payment tokenization, empowering clients and partners by tokenizing sensitive payment data and limiting compliance scope. They embed a universal token vault into their technology stack to manage the complexities of payment data tokenization across processors, networks, and more. The job posting does not specify company size, but their culture appears to value transparency, collaboration, grit, and humility.

$120,000–$205,000/yr
US

  • Dive into client environments to explore application workloads, infrastructure dependencies, and security controls.
  • Aid in the design and implementation of migration strategies to reduce risks and unlock automation opportunities.
  • Develop scalable and secure infrastructure using Infrastructure as Code (IaC) tools.

Kunai builds full-stack technology solutions for banks, credit and payment networks, infrastructure providers, and their customers.

  • Lead the design, implementation, and continuous improvement of our cloud-native platform infrastructure.
  • Create and maintain tooling and automation that improves efficiency and developer experience.
  • Drive platform optimization initiatives focused on performance, cost efficiency, and reliability.

Intelerad's medical imaging solutions streamline the flow of information, simplifying complex processes, maximizing efficiencies, and shining a light on the unknown.

Canada

  • Operate and optimize AWS environments for security, reliability, and scalability.
  • Implement and maintain security frameworks across cloud infrastructure.
  • Automate deployments and configurations using Infrastructure as Code (IaC) tools like Terraform.

SurveyMonkey is the world’s most popular platform for surveys and forms. Built for business and loved by users, it helps teams gather insights and information.

  • Design and implement foundational patterns and libraries for Python applications.
  • Develop and maintain robust CI/CD pipelines using tools such as Jenkins and ArgoCD.
  • Instrument observability through tools such as CloudWatch and Datadog to monitor and optimize application performance across multiple environments.

As a leader in aging care innovation, Honor provides the technology, tools, and services that empower older adults to live life on their own terms.

US Unlimited PTO

Architect, build, and maintain secure, scalable, HIPAA- and HITRUST-compliant infrastructure on multiple cloud platforms (AWS and Azure). Design, implement, and manage scalable, secure, and highly available cloud infrastructure. Collaborate with engineering, product, and security teams to design robust infrastructure solutions.

Abacus Insights is changing the way healthcare works for you and is on a mission to unlock the power of data.

India

  • Design and manage AWS infrastructure for AI services.
  • Implement Infrastructure as Code using Terraform.
  • Collaborate with cross-functional teams to enhance performance.

Jobgether uses an AI-powered matching process to ensure applications are reviewed quickly, objectively, and fairly against the role's core requirements. Their system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company.

  • Design, implement, and manage infrastructure for our cloud-based platforms (AWS).
  • Create and automate deployment pipelines using CI/CD tools (GitLab / GitHub Actions).
  • Ensure system scalability, availability, and reliability through proactive monitoring and automation.

Prompt is revolutionizing healthcare by delivering highly automated and modern B2B enterprise software to rehab therapy businesses, the teams within, and the patients they serve.

US Unlimited PTO

  • Implement and maintain observability tools and dashboards using tools such as AWS CloudWatch, Datadog, Sentry, and OpenTelemetry.
  • Assist with cloud cost visibility and optimization: analyze infrastructure usage patterns to identify waste, and implement aggressive tagging strategies.
  • Manage the tooling and processes for deploying applications to AWS EKS / Kubernetes / ECS / Serverless and facilitate modern deployment strategies.

True is a global platform of companies that optimizes value creation by placing executive talent, developing business leaders, creating diverse and inclusive networks, and using innovative technology to advance executive talent priorities. True was founded on the belief that doing good is the pathway to doing well, and that their growth and success are a by-product of their values: treating people right, listening to new ideas, and keeping culture at the heart of their business.

Design, implement, monitor, and maintain Sysdig's infrastructure at scale on different clouds and on-prem. Collaborate with development teams to improve system reliability, performance, and scalability. Participate in on-call rotation, respond to incidents, conduct root cause analyses, and implement preventive measures.

Sysdig helps organizations secure innovation in the cloud with runtime insights, open innovation, and agentic AI, trusted by over 60% of the Fortune 500.

UK

Run the production environment by monitoring availability and taking a holistic view of system health. Build software and systems to manage platform infrastructure and applications. Improve reliability, quality, and time-to-market of our suite of software solutions.

NICE software products are used by 25,000+ global businesses to deliver extraordinary customer experiences, fight financial crime and ensure public safety.