Job Description

Implement complex computational algorithms on GPU and CPU under demanding latency and throughput requirements. Refactor existing solutions to improve their scalability.

Commercial experience developing and debugging high-performance GPU and CPU applications, with a strong focus on latency and throughput. Hands-on experience with third-party libraries and with designing custom CUDA kernels. Proficiency with profiling and performance analysis tools (Nsight Systems, Nsight Compute, nvprof). Solid understanding of data structures, algorithms, and object-oriented programming in C++. Proven ability to work effectively in remote or hybrid teams with variable, project-based responsibilities. Curiosity about emerging trends in GPU/HPC/ML and a proactive habit of learning and applying new techniques.