Jobs Similar to Platform Support Engineer (APAC)

Platform Support Engineer (APAC)

Lightning AI 21 days ago

APAC

Partner directly with customer engineering teams running training and inference workloads in production.
Investigate failures involving distributed training, Kubernetes orchestration, GPU allocation, networking, and storage systems.
Identify recurring patterns across customer issues and drive long-term reliability improvements.

Kubernetes, PyTorch, CUDA, Linux

9 jobs similar to Platform Support Engineer (APAC)

Jobs ranked by similarity.

AI Compute and Infrastructure Engineer

Kraken 30 days ago

Global

Own and operate GPU and accelerator clusters for AI training, inference, and experimentation, ensuring reliability and cost-efficiency.
Build and optimize scheduling, orchestration, and serving systems using frameworks like vLLM and Triton to improve latency, throughput, and memory efficiency.
Partner with ML engineers to remove workflow bottlenecks and build observability for GPU utilization, capacity, and incident response.

Kraken is a crypto exchange platform building premium financial products for traders and institutions, accelerating global crypto adoption. It is a mission-driven, fully remote company with a world-class team of crypto experts spread across more than 70 countries.

Infrastructure Engineer (Observability)

Lightning AI 19 days ago

$180,000–$200,000/yr

Own and evolve a scalable observability platform spanning metrics, logs, traces, and events.
Design telemetry pipelines ingesting data from GPUs, CPUs, networking, containers, APIs, and BMC/Redfish.
Design and implement noise-resistant alerting systems to improve signal quality and reduce operational load.

Lightning AI builds an end-to-end platform for developing, training, and deploying AI systems, designed to take ideas from research to production with less friction. They combine developer-first software with cost-efficient, large-scale compute, serving solo researchers, startups, and large enterprises.

SRE

Fal 12 days ago

$180,000–$250,000/yr

Own and operate our Kubernetes infrastructure.
Build and maintain CI/CD pipelines and deployment infrastructure.
Leverage AI to automate analysis and resolution of production issues.

Fal is the generative media ecosystem powering the next generation of AI products. They build the infrastructure, tools, and model access that teams need to move from idea to production.

Model Performance Engineer

Fathom 13 days ago

Benchmark FP8 quantization across GPU families and ship a production config that achieves a speedup.
Evaluate serving frameworks with speculative decoding to improve performance.
Build a fine-tuning pipeline to enable faster model training and deployment.

Fathom eliminates the needless overhead of meetings with an AI assistant that captures, summarizes, and organizes key moments. They are a small company that creates magical experiences through focused builders and values a supportive environment.

Staff Software Developer, Machine Learning

Kinaxis 12 days ago

Canada

Define, drive, design, and build/ship end-to-end solutions that solve real customer problems.
Contribute to the end-to-end AI/ML software development lifecycle, ensuring reproducible research.
Drive architecture, design, and delivery of advanced ML systems in the Product R&D team.

Kinaxis is a global leader in modern supply chain orchestration. Known for its AI-infused platform and transparency across end-to-end supply chains, Kinaxis helps customers make faster, better decisions. The company has over 2000 employees worldwide and is recognized with Top Employer awards.

AI Solution Architect

Gcore 28 days ago

Poland

Design and deploy GPU cluster architectures using tools like Ansible, Terraform, Kubernetes, and Slurm.
Lead technical deep-dives, workshops, and present solutions to stakeholders, translating complex concepts.
Automate provisioning and monitoring with Infrastructure as Code, and produce documentation and training materials.

Gcore is a global provider of infrastructure and software solutions for AI, cloud, network, and security, powering digital experiences worldwide. The company collaborates with leading technology partners and employs over 550 professionals building foundational technologies.

Technical Support Engineer

Mirantis 26 days ago

Maintain the reliability and performance of customer environments remotely, supporting Mirantis OpenStack/k0s layers.
Diagnose and resolve system-level issues, requiring hands-on Linux administration experience.
Troubleshoot customer environments based on Linux, OpenStack, Kubernetes, networking, and other cloud technologies; detect, report, and resolve issues.

Mirantis helps enterprises move to the cloud on their terms, delivering a true cloud experience on any infrastructure, powered by Kubernetes. They serve many of the world’s leading enterprises and value openness, collaboration, risk-taking, and continuous growth.

Principal GenAI Platform Engineer (US)

PointClickCare 11 days ago

$179,000–$199,000/yr

Design, build, and maintain the core infrastructure layer supporting GenAI products.
Implement secure access controls and authentication mechanisms integrated by default into the AI platform components.
Develop and manage observability, monitoring, and logging solutions for GenAI workloads and infrastructure.

PointClickCare is a healthcare technology company. This team will serve as the product owner for GenAI capabilities, working closely with key horizontal partners to ensure delivery of safe, scalable, and high-impact AI products.

Sr. Product Marketing Manager - NVIDIA Partnership

MinIO 11 days ago

Own the messaging and content that defines MinIO's role in the NVIDIA AI Factory across NVIDIA products.
Develop the technical positioning and content for MinIO's integrations with NVIDIA technologies.
Build solutions content that shows how MinIO and NVIDIA infrastructure solve specific customer problems.

MinIO is the industry leader in high-performance object storage. It is the company behind the world’s fastest, most widely deployed object store, powering production infrastructure for more than half of the Fortune 500. The enterprise offering, AIStor, is engineered to handle the scale, speed, and pressure of modern AI and analytics, from terabytes to exabytes, all in a single namespace.
