Nov 29, 2024
Job Description
Business Function
Group Technology and Operations (T&O) enables and empowers the bank with an efficient, nimble and resilient infrastructure through a strategic focus on productivity, quality & control, technology, people capability and innovation. In Group T&O, we manage the majority of the Bank's operational processes and aspire to delight our business partners through our multiple banking delivery channels.
Roles & Responsibilities
• Troubleshoot recurring failures and participate in incident triage
• Troubleshoot issues, both from a production as well as a performance standpoint
• Be on call to respond to application failures
• Monitor critical applications and services to minimize downtime and ensure their availability
• Ensure that the underlying infrastructure runs smoothly and that systems and tools work as expected
• Work across different teams, mainly operations and development
• Collaborate with developers to help with troubleshooting and provide consultation when alerts are issued
• Work on process improvements
• Guide new team members and mentor them as required
Requirements
• 3 to 4 years of proven work experience as a Site Reliability Engineer or in a similar role
• Experience in UNIX Shell Scripting
• Experience in SQL
• Experience with data analysis
• Good understanding of HDFS, Hive, Spark and APIs (SOAP and REST); S3 is good to have
• Experience with observability tools such as Grafana, Prometheus or AppDynamics is beneficial
• Ability to collaborate and communicate asynchronously
• Excellent problem-solving skills and logical process thinking.
• Ready to work in a 24x7 environment
• Ready to work in a challenging work environment