Senior Site Reliability Engineer

Senior Site Reliability Engineer Job Description Template

Our company is looking for a Senior Site Reliability Engineer to join our team.

Responsibilities:

  • Defining and implementing self-healing and self-management capabilities for our platforms;
  • Ensuring that the Error Budgets in place are tracked and defended;
  • Designing and measuring Service Level Objectives for our platforms, ensuring that they are effective measures of our clients’ success;
  • Bringing expertise to bear on the design and engineering of the product so that reliability and high-availability concerns are addressed up front;
  • Maintaining and improving our observability tools such as Prometheus, Grafana, Thanos, and Splunk;
  • Facilitating stand-ups and discussions with developers, engineering teams and project managers;
  • Configuring and deploying regular software releases using a continuous delivery pipeline;
  • Improving observability of our platform and applications to make the troubleshooting process straightforward;
  • Ensuring our engineering processes focus on security, scalability and performance;
  • Ensuring adherence to best practices for trustworthy computing and the secure development and implementation lifecycle;
  • Breaking requirements down into stories and tasks, along with work estimates;
  • Reporting directly to our SRE Architecture and Development Lead;
  • Researching and gathering project requirements;
  • Providing expertise and guidance to design and develop a wide range of key systems;
  • Designing, developing and implementing solutions that improve the stability, scalability, availability, and performance of Cookpad’s Global service.

Requirements:

  • Clear understanding of SRE principles and eagerness to put them into practice;
  • History of mentoring junior engineers, and the ability to influence across engineering teams;
  • Substantial experience in a platform engineering role, with exposure to infrastructure and middleware platforms;
  • Bachelor’s degree, equivalent qualification, or relevant work experience;
  • Strong English communication skills and the ability to build working relationships with coworkers in locations around the globe;
  • SRE/DevOps experience and comfort operating software in a Linux-based environment;
  • Passion for solving problems using open source software;
  • Experience in software engineering and automation;
  • Familiarity with at least one cloud environment, for example AWS, GCP, or Azure;
  • Strong coding skills in Ruby and Golang.

Benefits:

  • Sports and social activities;
  • Interest-free loans to buy a bike or a season ticket, so it’s even easier for you to get to work and start making a difference;
  • Volunteering and charitable giving;
  • Learning and training opportunities, including coaching, mentoring, events, community meetups and lots more;
  • Flexible working and family-friendly policies.