Design resource management systems that provision and orchestrate compute across AWS, GCP, and Azure using infrastructure-as-code.
Architect fault-tolerant infrastructure for distributed ML: GPU clusters, NVIDIA runtime, S3 checkpointing, large dataset management and streaming, and health monitoring.
Build systems that simulate and handle real-world network conditions: bandwidth shaping, latency injection, and packet loss.
Develop and maintain RESTful APIs that facilitate the effective management of GPU clusters, virtual machines, and dedicated servers.
Enhance CI/CD processes and infrastructure reliability through proactive service monitoring and problem resolution.
Design and implement system architectures that support high availability and disaster recovery principles.
Gcore is a global provider of infrastructure and software solutions for AI, cloud, network, and security, powering everything from real-time communication.
As a Senior MLE, debug complex AI implementations and optimize inference performance. Work directly with product teams building solutions and develop blueprints for proven patterns. Operate in a high-velocity environment where priorities shift rapidly based on team needs.
Join the team redefining how the world experiences design.
Design, build, and own AWS-based MLOps infrastructure, defining standards for security, automation, cost-efficiency, and governance. Architect and operate production Kubernetes clusters, including containerizing and deploying ML models using Docker and Helm. Build and maintain CI/CD pipelines for training, validation, and deployment of ML workloads, implementing canary, blue-green, and rollback strategies.
Jobgether is a Talent Matching Platform that partners with companies worldwide to efficiently connect top talent with the right opportunities through AI-driven job matching.
Lead engineering teams responsible for Edge Traffic Infrastructure, ensuring networking architecture remains resilient.
Define and deliver a vision, strategy, and roadmap for system ingress and egress points, tying it to business impact.
Collaborate with infrastructure, security, and engineering teams to build robust networking solutions.
Jobgether uses an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Their system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company.
Lead development stages for AI/ML projects from exploration to maintenance.
Design and implement scalable ML pipelines for large datasets with data scientists and network security experts.
Conduct experiments and analyze results using metrics and visualization techniques.
Corelight is a cybersecurity company that transforms network and cloud activity into evidence for elite defenders. Fueled by accelerating revenue and investments from top-tier venture capital organizations, they are rapidly expanding their team with a geographically dispersed yet connected employee base.
Architect, build, test, and monitor AWS-based workflows to solve critical business problems.
Develop microservices for ML-driven applications using Python or Java, ensuring scalability and resilience.
Guarantee high levels of service availability through participation in an on-call rotation.
PlayStation is a global leader in interactive and digital entertainment. They've thrilled gamers since 1994 and are a wholly owned subsidiary of Sony Corporation, striving to create an inclusive environment that empowers employees and embraces diversity.
Define and own the multi-year technical vision for Docker's foundational platform.
Establish strategic plans and objectives for major platform initiatives, ensuring effective achievement of Docker's business objectives.
Lead large cross-company programs that require coordination across multiple engineering organizations.
Docker makes app development easier so developers can focus on what matters. With over 20 million monthly users and 20 billion image pulls, Docker is a trusted tool for building, sharing, and running apps.
Design, implement, and manage infrastructure for our cloud-based platforms (AWS).
Create and automate deployment pipelines using CI/CD tools (GitLab / GitHub Actions).
Ensure system scalability, availability, and reliability through proactive monitoring and automation.
Prompt is revolutionizing healthcare by delivering highly automated and modern B2B enterprise software to rehab therapy businesses, the teams within, and the patients they serve.
Mirantis is looking for a talented Systems/DevOps engineer to join our product team and design, implement, deploy, and test cloud infrastructure products built on open-source components. Deploy, test, and evaluate Mirantis K8s-related products. Manage and maintain bare metal test environments, ensuring optimal performance, security, and availability.
Mirantis is the Kubernetes-native AI infrastructure company, enabling organizations to build and operate scalable, secure, and sovereign infrastructure.
Design and administer networks based on Cisco, Arista and Huawei technologies. Develop and implement Ansible and AWX playbooks for swift network device provisioning. Provide 24/7 on-call support, promptly addressing and resolving network incidents.
We are a leading trading platform that is ambitiously expanding to the four corners of the globe.
Design and manage AWS infrastructure for AI services.
Implement Infrastructure as Code using Terraform.
Collaborate with cross-functional teams to enhance performance.
Jobgether uses an AI-powered matching process to ensure applications are reviewed quickly, objectively, and fairly against the role's core requirements. Their system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company.
Design scalable, future-proof data platforms optimized for AI research workloads.
Build efficient self-serve data processing pipelines leveraging GCP's advanced services.
Implement guardrails for cost, quality, and performance.
AssemblyAI is at the forefront of Speech AI, creating powerful models for speech-to-text and speech understanding via an API. They're a remote team of startup veterans and AI researchers looking to build one of the next great AI companies.
Responsible for automating infrastructure, maintaining system reliability, and bridging the gap between operations and database management. Design, deploy, and manage scalable infrastructure on Google Cloud Platform (GCP). Implement and maintain CI/CD pipelines for seamless deployment.
Miratech is a global IT services and consulting company that brings together enterprise and start-up innovation to support digital transformation.
Building world-class AI infrastructure to support a 100+ person research team.
Designing and scaling multi-cloud systems that support high-performance model training and inference.
Improving monitoring, alerting and system observability for AI workloads.
Canva is redefining how the world experiences design. They have campuses in Sydney and Melbourne, co-working spaces in Brisbane, Perth, Adelaide and Auckland, and trust their employees to choose the balance that empowers them and their team to achieve their goals.
Run the production environment by monitoring availability and taking a holistic view of system health. Build software and systems to manage platform infrastructure and applications. Improve reliability, quality, and time-to-market of our suite of software solutions.
NICE software products are used by 25,000+ global businesses to deliver extraordinary customer experiences, fight financial crime and ensure public safety.
Deploy and monitor machine learning models in production using tools like Docker, Kubernetes, and MLflow to ensure scalability and reliability. Build and maintain data pipelines using Airflow, Spark, or Kafka to support model training and inference. Integrate ML models into business applications, collaborating with software engineers to operationalize solutions.
Jobgether is a Talent Matching Platform that partners with companies worldwide to efficiently connect top talent with the right opportunities through AI-driven job matching.
Be a keen learner, working with cloud-native, highly scalable infrastructure and gaining expertise in container orchestration, networking, and observability.
Be a passionate problem solver, tackling scalability, reliability, and troubleshooting challenges in distributed systems.
Be a great communicator, engaging directly with developers, engineering teams, and product teams to understand infrastructure challenges and provide solutions.
Temporal provides an open-source programming model that simplifies code, improves application reliability, and helps developers focus on delivering features faster. They aim to be the reliable foundation of every developer’s toolbox and value curiosity, drive, collaboration, genuineness, and humility.