As a Blockchain Site Reliability Engineer (SRE), you will be responsible for ensuring the reliability, availability, and performance of blockchain nodes and related infrastructure. You'll monitor, troubleshoot, and resolve incidents in production environments, while also building automation tools to improve efficiency and reduce operational risk. This role requires strong Linux systems expertise, solid on-call and incident response experience, and the ability to work under pressure to restore services quickly. You'll also collaborate with protocol engineers and open-source communities to ensure smooth upgrades and long-term system stability.
Key responsibilities include deploying, monitoring, and maintaining blockchain nodes across multiple networks, and ensuring system reliability and uptime by actively managing incidents and by troubleshooting and resolving node failures. You will also develop automation and maintenance tools (using Golang, Shell, Python, etc.) to streamline operations, and you will build and maintain monitoring, alerting, and logging systems to detect and address issues proactively.