Define and implement service level indicators (SLIs), service level objectives (SLOs), and error budgets
Design and maintain comprehensive monitoring and alerting systems
Lead incident response and post-mortem processes
Implement automation to reduce toil and improve reliability
Conduct capacity planning and performance optimization
Build tools for deployment, monitoring, and troubleshooting
Collaborate with engineering teams on reliability requirements
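
To make the first responsibility concrete, here is a minimal sketch of how an error budget can be derived from an SLI and an SLO. It assumes a request-based availability SLI (successful requests divided by total requests) measured over a single window. The names `ErrorBudget` and `error_budget`, and the 99.9% target in the usage example, are illustrative assumptions and do not come from any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Result of comparing an availability SLI against an SLO (hypothetical type)."""
    sli: float                 # observed fraction of successful requests
    allowed_failures: float    # failures the SLO tolerates in this window
    remaining_failures: float  # allowed failures minus observed failures
    consumed_fraction: float   # share of the budget already spent (1.0 = exhausted)


def error_budget(total_requests: int, failed_requests: int, slo_target: float) -> ErrorBudget:
    """Compute the error budget for one measurement window.

    slo_target is the SLO as a fraction, for example 0.999 for "99.9% of
    requests succeed". This is an assumed request-based definition; a
    time-based SLO would need different inputs.
    """
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")

    # SLI: the measured fraction of good events.
    sli = (total_requests - failed_requests) / total_requests
    # Error budget: the fraction of events the SLO allows to fail,
    # expressed here as a count of requests.
    allowed = (1.0 - slo_target) * total_requests
    return ErrorBudget(
        sli=sli,
        allowed_failures=allowed,
        remaining_failures=allowed - failed_requests,
        consumed_fraction=failed_requests / allowed,
    )


if __name__ == "__main__":
    # Illustrative numbers only: 1,000,000 requests, 400 failures, 99.9% SLO.
    budget = error_budget(total_requests=1_000_000, failed_requests=400, slo_target=0.999)
    print(f"SLI: {budget.sli:.4%}")
    print(f"Budget consumed: {budget.consumed_fraction:.1%}")
    print(f"Failures remaining in budget: {budget.remaining_failures:.0f}")
```

With these illustrative inputs, roughly 40% of the budget is spent. A team might alert when `consumed_fraction` grows faster than the window elapses, but the thresholds for that are a policy decision outside this sketch.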