Description:
Title - Site Reliability Engineer
Experience - 3 to 8 years
Location - Delhi
Site Reliability Engineer Job Description
A Site Reliability Engineer (SRE) is a professional who acts as a bridge between development and IT operations, taking on operational tasks to ensure the efficient functioning of computer systems. They are responsible for monitoring, automating, and improving the reliability, performance, and availability of software systems.
Key Responsibilities:
Working on-call shifts to respond to incidents and help prevent them from recurring
Running infrastructure with Chef, Ansible, Terraform, GitLab CI/CD, and Kubernetes
Monitoring computer systems and building alerts for various operational issues
Administering production jobs
Interpreting debugging information
"Draining" traffic away from a cluster
Rolling back a bad software push
Blocking or rate-limiting unwanted traffic
Bringing up additional serving capacity
Using monitoring systems (for alerting and dashboards)
Error Budgets and Learning from Failure:
Learning from failures and helping the rest of the software development and delivery team do the same
Implementing error budgets to keep metrics like customer satisfaction and usability above an acceptable level
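As a rough illustration of how an error budget can be tracked, here is a minimal sketch. It assumes the budget is defined against a request success-rate target (SLO); the function name, the 99.9% target, and the request counts are hypothetical and not taken from this posting.

```python
def error_budget_remaining(slo_target, total_requests, failed_requests):
    """Return the fraction of the error budget still unspent.

    slo_target is the required success ratio (for example 0.999).
    A negative result means the budget has been overspent.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1 - slo_target) * total_requests
    return 1 - failed_requests / allowed_failures


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests, 250 failures, 99.9% target.
    remaining = error_budget_remaining(0.999, 1_000_000, 250)
    print(f"Error budget remaining: {remaining:.1%}")
```

With these assumed numbers the target tolerates about 1,000 failed requests, so 250 failures leave roughly three quarters of the budget unspent. A team might slow down releases as the remaining fraction approaches zero.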
Measurement and Enhancement of Simplicity:
Striving for simplicity by taking an active part in collaborative discussions with other teams
Measuring simplicity in terms of training time, explanation time, administrative diversity, and age of systems
Prakhar Softwares Solutions is a CMMI Level 3, ISO 9001:2015, and ISO 27001:2013 certified company working across software development, staffing management, recruitment process outsourcing, and e-governance. We have 10 offices across India and work on various e-governance projects of national importance.