Systems Reliability Engineer


  • £40,000 - £45,000 per annum
  • Maidenhead
  • Posted: 31/07/2018
  • Permanent
  • Job Ref: 216102610

Job Details

Systems Reliability Engineer (SRE)

Location: Maidenhead, Berkshire

Responsibilities:

• Help to eliminate operational toil - seek to automate repetitive operations work

• Work with product development teams to ensure that our new features are able to meet SLAs

• Help mature the delivery process for teams: whether defining Jenkins pipelines, designing canary release deployments, building in automated fallbacks or optimizing the build chain, you help craft the appropriate solution for the product

• Work as part of the team transitioning existing systems from a hosted environment to AWS

• Optimize product service code to ensure that it's secure, scalable and performant

• Optimize testing capabilities to increase the assurances we have with each release

• Improve the fault detection for our services

• Create dashboards which help communicate the metrics for a given product service

• Work with product owners and product engineering teams to perform capacity planning

• Work with product engineering teams to understand performance and behavior patterns

• Be part of an on-call rotation for alerts that require engineering expertise to diagnose

• Help carry out root cause analysis for incidents, and design solutions (both software and human processes) that will help to ensure the same problem doesn't happen in the same way again

Critical Skills / Competencies:

• A positive attitude and willingness to learn

• Expertise in one or more of the following languages: Python / Go / Java / C# / C / C++

• A solid understanding of data structures and algorithms

• Experience with IaaS and Serverless services from a cloud provider

• A strong understanding of TCP/IP and DNS, and experience designing networks

• Linux system administration experience

• Strong conflict resolution competence

• Excellent written and verbal communication skills

• Experience implementing fault detection, and automating fixes

• Experience designing scalable services

• Experience designing distributed, fault-tolerant systems

• A good understanding of SQL databases

• Experience managing services in AWS

• Detail-oriented: the ideal candidate naturally digs as deep as they need to in order to understand the why

