Job Description

As a Site Reliability Engineer (SRE) at Alpaca, you will be responsible for ensuring the reliability, scalability, and performance of our systems and services. You will work closely with development, operations and DevOps teams to build and maintain robust applications, ensuring they run smoothly and efficiently. This role requires a blend of software engineering and operations skills, with a strong ability to troubleshoot technical issues and resolve problems before they impact our users.

You will triage difficult technical problems and implement solutions. You will enhance our RabbitMQ and Redpanda observability stack by defining Service Level Objectives (SLOs) and alerts and by implementing profiling and logging, and you will improve the reliability of our RabbitMQ and Redpanda clients. You will also respond to and resolve incidents in a timely manner and implement improvements, and you will monitor system capacity and performance, recommending and implementing changes to handle future growth.

About Alpaca

Alpaca is a US-headquartered self-clearing broker-dealer and brokerage infrastructure provider for stocks, ETFs, options, crypto, fixed income, 24/5 trading, and more.
