Site Reliability Engineer (SRE)

Urgent

Job Description

The Site Reliability Engineer (SRE) will play a crucial role in ensuring the reliability and performance of our digital services. This position requires a solid foundation in software and systems engineering, a deep understanding of SRE principles, and expertise in instrumentation tools. The SRE will collaborate closely with engineering and product teams to implement scalable solutions that support our growing digital capabilities while maintaining a seamless and reliable user experience.

Main Responsibilities:

  • Service Reliability: Define and maintain Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs) to ensure the reliability of services.
  • Incident Response: Lead the response to high-severity incidents, minimizing impact and driving service restoration.
  • Root Cause Analysis (RCA) / Postmortem: Conduct blameless postmortems after incidents, document lessons learned, and implement changes to prevent future occurrences.
  • Change Management: Review and approve changes to services, balancing risk minimization with the need for innovation and fast iteration.
  • Capacity Planning: Manage computing resources to maintain service performance as demand changes.
  • Performance Management: Monitor and optimize application and system performance, recommending code refactoring or architectural changes when necessary.
  • Automation: Automate repetitive tasks and manual processes to increase efficiency and reliability.
  • Tooling and Support: Develop and maintain internal tools for automation, monitoring, and alerting, using platforms such as Dynatrace, Splunk, Blue Triangle, Quantum Metric, and Adobe Analytics.
  • Cross-Functional Collaboration: Work with development, operations, and other engineering teams to design and support scalable and reliable services.

Requirements:

  • Bachelor’s degree in Computer Science, Engineering, or a related field, with a minimum of 3 years of experience in SRE support or a related role.
  • An additional 3 years of relevant work experience (6 years in total) may be substituted for the degree requirement.
  • Proficiency with instrumentation tools and monitoring solutions like Dynatrace and Splunk.
  • Experience with scripting languages such as Python, Shell, or Perl.
  • Strong Unix/Linux experience.
  • Excellent communication skills.
  • Strong problem-solving and analytical abilities.
  • Customer-focused mindset, with a commitment to delivering exceptional service and ensuring member satisfaction.

Desirable Skills:

  • Experience with cloud environments.
  • Certifications in cloud services and instrumentation tools.
  • Familiarity with container orchestration technologies like Kubernetes.
  • Proven ability to troubleshoot and diagnose technical issues at scale.

Benefits:

  • Competitive Salary
  • Private Medical Insurance
  • Dental Insurance
  • Life Insurance
  • On-site Doctor
  • Telehealth Services
  • Additional Maternity Leave
  • Paternity Leave
  • Personal Time Off
  • Cafeteria
  • Solidarity Association
  • Transportation