Source Job

  • Develop and maintain RESTful APIs that facilitate the effective management of GPU clusters, virtual machines, and dedicated servers.
  • Enhance CI/CD processes and infrastructure reliability through proactive service monitoring and problem resolution.
  • Design and implement system architectures that support high availability and disaster recovery principles.

Python Docker Kubernetes AWS Azure

20 jobs similar to Software Python Engineer (GPU Cloud)

Jobs ranked by similarity.

Europe

  • Develop and maintain scalable Python applications.
  • Design and implement chatbot applications using generative AI technologies.
  • Act as a trusted advisor within the team, guiding best practices, raising risks early, and ensuring solutions meet both technical and business needs.

Provectus is an AI consultancy and solutions provider. They help companies across industries embrace AI and solve their most complex challenges.

  • Design and implement foundational patterns and libraries for Python applications.
  • Develop and maintain robust CI/CD pipelines using tools such as Jenkins and ArgoCD.
  • Instrument observability through tools such as CloudWatch and DataDog to monitor and optimize application performance across multiple environments.

As a leader in aging care innovation, Honor provides the technology, tools, and services that empower older adults to live life on their own terms.

Australia US

  • Design resource management systems provisioning and orchestrating compute across AWS, GCP, and Azure using infrastructure-as-code.
  • Architect fault-tolerant infrastructure for distributed ML: GPU clusters, NVIDIA runtime, S3 checkpointing, large dataset management and streaming, and health monitoring.
  • Build systems that simulate and handle real-world network conditions — bandwidth shaping, latency injection, packet loss.

Pluralis Research is pioneering Protocol Learning, a fully decentralised way to train and deploy AI models that opens this layer to individuals rather than well-resourced corporates.

Mirantis is looking for a talented Systems/DevOps engineer to join our product team to design, implement, deploy, and test cloud infrastructure products built on open-source components. Deploy, test, and evaluate Mirantis K8S-related products. Manage and maintain bare metal test environments, ensuring optimal performance, security, and availability.

Mirantis is the Kubernetes-native AI infrastructure company, enabling organizations to build and operate scalable, secure, and sovereign infrastructure.

UK

Run the production environment by monitoring availability and taking a holistic view of system health. Build software and systems to manage platform infrastructure and applications. Improve reliability, quality, and time-to-market of our suite of software solutions.

NICE software products are used by 25,000+ global businesses to deliver extraordinary customer experiences, fight financial crime and ensure public safety.

Design, implement, monitor, and maintain Sysdig's infrastructure at scale across different clouds and on-prem. Collaborate with development teams to improve system reliability, performance, and scalability. Participate in on-call rotation, respond to incidents, conduct root cause analyses, and implement preventive measures.

Sysdig helps organizations secure innovation in the cloud with runtime insights, open innovation, and agentic AI, trusted by over 60% of the Fortune 500.

Europe 4w PTO

Design, build, and own AWS-based MLOps infrastructure, defining standards for security, automation, cost-efficiency, and governance. Architect and operate production Kubernetes clusters, including containerizing and deploying ML models using Docker and Helm. Build and maintain CI/CD pipelines for training, validation, and deployment of ML workloads, implementing canary, blue-green, and rollback strategies.

Jobgether is a Talent Matching Platform that partners with companies worldwide to efficiently connect top talent with the right opportunities through AI-driven job matching.

Responsible for automating infrastructure, maintaining system reliability, and bridging the gap between operations and database management. Design, deploy, and manage scalable infrastructure on Google Cloud Platform (GCP). Implement and maintain CI/CD pipelines for seamless deployment.

Miratech is a global IT services and consulting company that brings together enterprise and start-up innovation to support digital transformation.

US

  • Lead hands-on development of robust, scalable, and secure RESTful and event-driven APIs using FastAPI and OpenAPI 3.0+.
  • Own backend architecture and technical execution, serving as a primary contributor to the codebase.
  • Establish and enforce best practices for API design, versioning, documentation, and maintainability.

Truelogic is a leading provider of nearshore staff augmentation services headquartered in New York. Their team of 600+ highly skilled tech professionals, based in Latin America, drives digital disruption by partnering with U.S. companies on their most impactful projects.

US 3w PTO

  • Minimum 5 years of related experience in software engineering, or an equivalent combination of education/experience.
  • Proficiency in Python, React, and AWS Cloud. Build microservices that connect to NoSQL databases, DynamoDB preferred.
  • Build software components that integrate with a workflow engine and/or ESB to execute asynchronous business processes.

Railroad19 builds custom solutions and provides clients with top-tier development services. They are a specialized team of developers and architects with a culture built on hard work and a desire to be thought leaders in the industry. The company values your work, gives you the tools you need to succeed, and offers a work/life balance.

US Europe Unlimited PTO

  • Contribute to and review PRs for dapr/dapr-agents, dapr/python-sdk, dapr/durabletask-python, and dapr/docs upstream.
  • Collaborate with partners on integrating agentic frameworks into the Dapr ecosystem.
  • Help promote the Dapr project with blog posts, recordings, and demos.

Diagrid provides developers with APIs and tools that help them focus on their code and not on infrastructure.

US

  • Maintaining and improving Linux-based VPS infrastructure.
  • Managing Dockerized services and applications, implementing automation scripts and CI/CD pipelines.
  • Contributing to backend services and APIs using Python or TypeScript.

LAULAU is an AI startup. They are building a growing technology team.

India

  • Design and manage AWS infrastructure for AI services.
  • Implement Infrastructure as Code using Terraform.
  • Collaborate with cross-functional teams to enhance performance.

Jobgether uses an AI-powered matching process to ensure applications are reviewed quickly, objectively, and fairly against the role's core requirements. Their system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company.

  • Design, implement, and manage infrastructure for our cloud-based platforms (AWS).
  • Create and automate deployment pipelines using CI/CD tools (GitLab / GitHub Actions).
  • Ensure system scalability, availability, and reliability through proactive monitoring and automation.

Prompt is revolutionizing healthcare by delivering highly automated and modern B2B enterprise software to rehab therapy businesses, the teams within, and the patients they serve.

$141,487–$184,800/yr
Europe

  • Design scalable, future-proof data platforms optimized for AI research workloads.
  • Build efficient self-serve data processing pipelines leveraging GCP's advanced services.
  • Implement guardrails for cost, quality, and performance.

AssemblyAI is at the forefront of Speech AI, creating powerful models for speech-to-text and speech understanding via an API. They're a remote team of startup veterans and AI researchers looking to build one of the next great AI companies.

$215,000–$245,000/yr
US

  • Lead the architecture and development of our off-robot software stack — including fleet management services, telemetry ingestion, and customer-facing APIs.
  • Design and implement APIs and SDKs for integration with external systems, partner portals, and data analytics platforms.
  • Mentor and lead engineers, fostering a culture of reliability and technical excellence.

At Cobot, we’re creating the software backbone that powers our fleet and are looking for a Tech Lead Manager, Fleet Management to help us lead the way.

US

  • Design, develop, and maintain secure RESTful APIs using Python frameworks.
  • Build containerized microservices and deploy applications in AWS or other cloud environments.
  • Develop data processing and ETL pipelines, including SQL/NoSQL databases, and apply NLP techniques to documents.

Jobgether uses an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. They identify the top-fitting candidates, and this shortlist is then shared directly with the hiring company.

  • Design, build, and optimize high-performance systems in Python supporting AI data pipelines and evaluation workflows.
  • Develop full-stack tooling and backend services for large-scale data annotation, validation, and quality control.
  • Improve reliability, performance, and safety across existing Python codebases.

Alignerr connects top technical experts with leading AI labs to build, evaluate, and improve next-generation models. They work on real production systems and high-impact research workflows across data, tooling, and infrastructure.

US

  • Genuinely excited about technology, continuous learning, and shaping modern engineering practices.
  • Strong interest in Python development, search technologies, B2B services, and AWS-based serverless architectures.
  • Passionate about finding simple, efficient solutions to complex software challenges within a collaborative team culture.

NBCUniversal is a leading media and entertainment company creating world-class content across film, television, and streaming. As a subsidiary of Comcast Corporation, they own and operate major entertainment and news brands and have renowned theme parks and attractions.

$150,100–$188,100/yr
US Canada 2w PTO 12w maternity 12w paternity

  • Create and test reliable cloud infrastructure services that support Webflow’s range of products.
  • Balance reliability, scalability, and cost efficiency concerns while refactoring and modernizing existing services.
  • Collaborate with product engineering teams to deliver new solutions for services and ways of working that might not exist yet.

Webflow is the leading visual development platform for building powerful websites without writing code.