Job Description
As a Senior Incident Manager, you will lead Databricks’ most critical production incidents while providing clear, accurate, and timely communication to customers, executives, and engineers. You’ll serve as both incident commander and reliability engineer, orchestrating multi-team responses, driving real-time status updates, and partnering with engineering to analyze and prevent failures. Your work will ensure Databricks maintains not only technical resilience but also customer and stakeholder confidence during high-impact events.
You will lead critical incidents, coordinating multi-disciplinary response efforts across Databricks’ cloud-based services to rapidly mitigate impact and restore operations. You will also drive technical root cause analysis and reliability improvements, collaborating with engineering teams to trace and document underlying causes across distributed systems, services, and data stores.
About Databricks
Databricks is the data and AI company, with more than 10,000 organizations worldwide relying on its Data Intelligence Platform to unify and democratize data, analytics and AI.