Site Reliability Engineer (SRE)

Mumbai | Remote (India) | Full-time | Senior | Cloud/DevOps
SLIs/SLOs, incident response, reliability engineering.

About the Role

Ensure the reliability, availability, and performance of our production systems. You'll design SLOs, implement monitoring strategies, and lead incident response efforts.

What You'll Do

Define and implement SLIs, SLOs, and error budgets
Design and maintain comprehensive monitoring and alerting systems
Lead incident response and post-mortem processes
Implement automation to reduce toil and improve reliability
Conduct capacity planning and performance optimization
Build tools for deployment, monitoring, and troubleshooting
Collaborate with engineering teams on reliability requirements

What You'll Bring

5+ years of SRE or production operations experience
Strong programming skills (Python, Go, or similar)
Deep understanding of distributed systems and microservices
Experience with observability tools (Prometheus, Jaeger, ELK)
Knowledge of incident management and on-call practices
Understanding of chaos engineering and reliability testing
Experience with cloud platforms and auto-scaling

Nice to Have

Experience with large-scale distributed systems
Knowledge of performance engineering and optimization
Familiarity with machine learning for operations (AIOps)
Understanding of disaster recovery and business continuity

Why Join StackBinary?

Flexible working hours
Remote-friendly culture
Learning & development budget
High-ownership projects
Pragmatic engineering culture
Work with cutting-edge tech

Ready to Apply?

Join our team of builders who love shipping quality software.
Questions about this role?
careers@stackbinary.io