Site Reliability Engineer

Site Reliability Engineer Job Description Template

Our company is looking for a Site Reliability Engineer to join our team.

Responsibilities:

  • Participate in IT operations support as part of the SRE team to meet availability requirements;
  • Work closely with the Monitoring team to respond to, troubleshoot and fix unexpected issues as they arise;
  • Manage resource usage and capacity for the runtime environment;
  • Work with our tools and stack: Linux/SLES; xCAT/Puppet; Nginx/PHP/Java; MySQL, Memcached and MongoDB; Zabbix and Cacti; Dell and HP server hardware;
  • Work with the Professional Services team to deliver effective 2nd- and 3rd-line support;
  • Investigate, mitigate and resolve service-impacting events for applications owned by the Browse team;
  • Work with other software engineers to enable effective delivery of 4th-line support;
  • Contribute to the team culture and help keep things functional and fun;
  • Reduce projects' reliance on dedicated DevOps resources by enabling developers to self-serve;
  • Take charge of the team's Docker and Kubernetes (K8s) assets and educate engineers on their use;
  • Support customers and internal teams;
  • Respond in a timely manner to any disruption affecting our customers;
  • Build proofs of concept (POCs) for new ideas and products, balancing trade-offs between technical, analytical and product needs;
  • Assist teams in making the platform components production-ready and provide support on IT-related issues;
  • Improve operational visibility around core products, for example through tracking technologies and improved metrics.

Requirements:

  • Advanced troubleshooting skills;
  • Proficiency in Go, Node.js and Bash;
  • Solid knowledge of the host-side TCP/IP stack;
  • Extensive system administration experience (ideally 5+ years, with emphasis on supporting web stacks);
  • Experience with algorithms, data structures and software design;
  • Experience working with a JVM language (Java, Scala or Kotlin);
  • Good knowledge of at least one programming language and the willingness to dabble in others (Go, Ruby, Python);
  • Experience with DevOps culture and processes;
  • Ability to troubleshoot and tune performance of computer systems;
  • Strong cloud knowledge – ideally AWS;
  • Experience with immutable infrastructure;
  • Experience with Linux containers and orchestration (Docker, Kubernetes);
  • Exposure to cloud IaaS (AWS, GCP or other relevant);
  • Ability to solve novel problems from first principles;
  • Experience with scripting/automation languages such as PowerShell.