Site Reliability Engineer Job Description Template
Our company is looking for a Site Reliability Engineer to join our team.
Responsibilities:
- Participate in IT operations support as part of the SRE team to meet availability requirements;
- Work closely with the Monitoring team; react to, troubleshoot, and fix unexpected issues as they arise day to day;
- Manage resource usage and capacity for the runtime environment;
- Our tools and stack: Linux/SLES; xCAT/Puppet; Nginx/PHP/Java; MySQL, Memcached, and MongoDB; Zabbix and Cacti; Dell and HP server hardware;
- Work with the Professional Services team to deliver effective 2nd- and 3rd-line support;
- Investigate, resolve, and mitigate service-impacting events for applications owned by the Browse team;
- Work with other software engineers to enable effective delivery of 4th-line support;
- Contribute to the team culture and help us keep things functional and fun;
- Reduce projects' reliance on dedicated DevOps resources by enabling developers to self-serve;
- Take charge of the team's Docker and Kubernetes (K8s) assets, and educate engineers on their use;
- Support customers and internal teams;
- Respond in a timely manner to any disruption affecting our customers;
- Build proofs of concept (POCs) for new ideas and products, balancing trade-offs between technical, analytical, and product needs;
- Assist teams in making platform components production-ready, and provide support with IT-related issues;
- Improve operational visibility into core products, for example through tracking technologies and improved metrics.
Requirements:
- Advanced troubleshooting skills;
- Experience with Go, Node.js, and Bash;
- Solid knowledge of host-side TCP/IP stack;
- Extensive system administration experience (ideally 5+ years, with emphasis on supporting web stacks);
- Experience with algorithms, data structures and software design;
- Experience working with a JVM language (Java, Scala, Kotlin);
- Good knowledge of at least one programming language and a willingness to dabble in others (Go, Ruby, Python);
- Experience with DevOps culture and processes;
- Ability to troubleshoot and tune performance of computer systems;
- Strong cloud knowledge – ideally AWS;
- Experience with immutable infrastructure;
- Linux containers and orchestration (Docker, Kubernetes);
- Exposure to cloud IaaS (AWS, GCP or other relevant);
- Solving novel problems from first principles;
- Experience with scripting/automation languages such as PowerShell.