Monitor cloud infrastructure and application health using observability tools; respond to alerts.
Perform Tier 1 incident triage, document findings, and escalate appropriately to Development or SRE teams.
Monitor and support CI/CD pipelines to ensure successful builds and deployments.
Lumin Digital empowers credit unions and banks by creating cutting-edge digital experiences. They are a trailblazer in digital banking solutions with a culture that fosters trust, respect, and boldness, encouraging team members to explore and experiment with new ideas.
Own the technical relationship for a portfolio of monitoring company customers, acting as their trusted advisor and escalation point
Partner with customers to design, launch, and optimize call flows, including telephony routing, failover strategies, and integration patterns
Proactively monitor system performance, identify risks, and drive improvements to reliability, latency, and call success rates
RapidSOS is the leading public safety AI company that unlocks mission-critical intelligence for first responders and security teams – enabling faster, smarter and more accurate emergency response. They are in an exciting phase of growth, welcoming new members from across the globe to their mission-driven, ambitious, and inclusive team.
Provide operational leadership to multiple IT operations teams, ensuring effective service delivery.
Manage workforce direction, oversee technical processes, and drive improvement initiatives.
Manage and report on operational network statistics including KPIs, SLAs, and EOL configuration.
Jobgether is a platform that connects job seekers with potential employers. They leverage AI to match candidates with suitable roles and streamline the hiring process.
Develop and maintain observability solutions using platforms like Datadog, Prometheus and Grafana
Take a leading role in incident management, including coordinating response efforts, troubleshooting issues, and identifying follow-up actions
Partner with product engineering teams to architect reliable systems, recover from incidents, and learn from mistakes
Ditto is redefining how data moves at the edge, aiming to make resilient, real-time applications seamless for developers, regardless of network conditions. It's a globally distributed and fast-growing startup with over $145 million in funding that is committed to building a diverse and inclusive team.
Execute expert-level real-time monitoring and incident dispositioning for critical client applications.
Correlate complex data across metrics, traces, and logs to perform deep-dive root cause analysis.
Lead the triage of complex alerting environments to filter noise and ensure that high-priority incidents are managed.
Atmosera empowers businesses to redefine what's possible with modern technology and human expertise. They enable organizations to accelerate innovation, enhance security, and optimize operational agility as a Microsoft Partner.
Deploy standard infrastructure components and assist in cloud computing architectures and identity migrations.
Execute infrastructure tasks using scripting and assist in managing VDI and computing infrastructure in Azure.
Resolve alerts/tickets in a timely fashion and participate in the On-Call rotation; support root-cause analysis activities.
Aledade exists to empower the most transformational part of our health care landscape - independent primary care. They were founded in 2014 and have become the largest network of independent primary care in the country, helping practices, health centers and clinics deliver better care to their patients and thrive in value-based care. Aledade has a collaborative, inclusive and remote-first culture.
Implementing improvements to the reliability, fault tolerance, scalability, and performance of our infrastructure
Managing incidents using your technical know-how to involve the appropriate teams and automate away manual practices
Improving observability across our systems (metrics, logs, tracing) to reduce time to detection and resolution
Newton is changing how Canadians trade crypto, with the goal of making financial freedom achievable for everyone by giving their customers the tools and knowledge needed to navigate the crypto world. They are a remote team spread across Canada that values pushing boundaries and getting things done.
Provide production support on a shift according to the team on-call roster.
Work on tickets raised by customers and by internal engineering/implementation teams while not on call for production support.
Continuously monitor the health and performance of our services, systems, and infrastructure.
Granicus is driven by the excitement of building, implementing, and maintaining technology that is transforming the Govtech industry by bringing governments and their constituents together. They have served 5,500 federal, state, and local government agencies and more than 300 million citizen subscribers.
Manage and support infrastructure for Growth teams, including Nomad, Hashistack, databases, and any other underlying systems
Maintain and troubleshoot GitLab CI pipelines, ensuring reliable and fast build, test, and deployment cycles
Provide operational support across Onboarding, Acquire, and Engage teams, helping debug issues in staging and production environments
Kraken is a mission-focused company rooted in crypto values, aiming to accelerate the global adoption of crypto, so that everyone can achieve financial freedom and inclusion. As a fully remote company, they have Krakenites in 70+ countries who speak over 50 languages.
Daily operations of the MCC, including monitoring live services and managing incident response.
Responsible for customer requests and tickets within committed SLA response times.
Create, refine, and follow policies and procedures for incident management, escalation, and communication.
Rocket Science Group is a co-development game studio specializing in multiplayer, platform services, publishing technology, and live operations for PC, console, and mobile titles. They have teams in Europe and North America and work in partnership with the game industry’s top creators.
Monitor ticketing system service boards and inbound email, and receive inbound phone calls.
Act on requests and notifications in accordance with defined procedures, which vary by client.
Self-manage assigned tasks in accordance with Service Desk guidelines.
Trace3 is a leading Transformative IT Authority, providing unique technology solutions and consulting services to its clients. They employ more than 1,200 people all over the United States and their culture embodies the spirit of a startup with the advantage of a scalable business.
Manage event and information intake, including intelligence reports and monitoring ticket queues.
Triage alerts and correlate and analyze events to determine the scope of cybersecurity incidents.
Provide 24x7 on-call support and monitor and manage security incidents using SIEM, SOAR, and DLP tools.
Brightspeed provides fast, reliable internet connections and an awesome customer experience in twenty states throughout the Midwest and South. Backed by funds managed by Apollo Global Management, they are accelerating the upgrade of copper to fiber optic technologies.
Own SLI/SLO/SLA definitions for the Akuity SaaS platform and drive continuous improvement.
Participate in an on-call rotation and act as incident commander for high-severity production events.
Partner with engineering teams to build reliability into new features before they ship to production
Akuity helps enterprises ship software faster and more reliably with modern GitOps best practices. The Akuity Platform enables teams to manage development and deployment across hundreds – if not thousands – of Kubernetes clusters from a single control plane.
Define and evolve reliability standards for the SmarterDx platform.
Enhance observability systems (metrics, logs, traces, alerting) to provide actionable insights and reduce mean time to detect (MTTD) and resolve (MTTR).
Reduce operational toil through automation, self-healing systems, and improved deployment and rollback mechanisms.
SmarterDx, a Smarter Technologies company, builds clinical AI that is transforming how hospitals translate care into payment. Founded by physicians in 2020, their platform connects clinical context with revenue intelligence, helping health systems recover millions in missed revenue, improve quality scores, and appeal every denial.
Support Engineering and Platform automation efforts with development and scripting skills.
Automate operational processes using scripting languages.
Develop, implement, and continually improve system and network monitoring and alerting capabilities and procedures.
Cotiviti is focused on providing payment accuracy and analytics-driven solutions that drive measurable results. They offer team members a competitive benefits package and have a culture of valuing individual qualifications without regard to race, gender, or other protected characteristics.