Job Description
About the Role:
Please note: This is a fully onsite role (5 days a week in the office).
We’re looking for an experienced Sr. DevOps/Site Reliability Engineer to build and optimize scalable, resilient cloud infrastructure. You’ll partner with development teams to improve automation and CI/CD, while also owning incident response and system reliability. This includes monitoring, troubleshooting, and ensuring our services remain highly available and performant.
A day in the life of our Sr. DevOps/SRE:
- Respond to monitoring alerts, participate in incident calls, and guide them to resolution.
- Collaborate with software development teams to facilitate their daily operations.
- Design, configure, and optimize CI/CD pipelines.
- Build, monitor, and maintain a resilient and scalable infrastructure.
- Maintain documentation for processes, architectures, and configurations.
Qualifications
Who we are looking for:
- Strong analytical and troubleshooting skills.
- Hands-on experience with AWS CloudOps.
- Understanding of cloud security best practices and industry standards.
- Willingness to participate in an on-call rotation schedule.
- Minimum of 7 years in a DevOps / SRE role.
- 7 years working with Linux and Windows systems.
- 3 years of advanced knowledge in Terraform module development.
- 3 years of production experience with Docker and Kubernetes (EKS).
- 5 years of expertise in AWS services (EC2, RDS, S3, ElastiCache, WAF, CDN, Route 53).
- Experience in cloud networking (Transit Gateway, subnets, routing, security groups).
- Strong knowledge of Jenkins and GitLab.
- Hands-on experience configuring IIS, NGINX, or other web servers.
- Proficient with monitoring solutions (Zabbix, Prometheus, Grafana, etc.).