Service Reliability Engineer - Monitoring

Total package
Competitive salary DOE plus additional benefits

Contract
Permanent

Location
Manchester, UK

Shift pattern
Flexible working between the hours of 08:00 and 18:00. Times to be agreed with your Line Manager.

You’ll want to know about the department that our role is in…

Our Service Reliability Team bridge the gap between operations and software development. Their responsibilities are wide-ranging: managing and monitoring system availability, performance, and efficiency; responding to incidents; and ensuring that software is deployed properly. The aim is that when a finished product reaches production there are no surprises, and end users receive a reliable service.

The department operates between the hours of 08:00 and 18:00, Monday to Friday. The breakdown of your normal hours of work will be by agreement with your Line Manager.

The role:

An SRE Monitoring Engineer is responsible for ensuring that all systems, applications, and networks function efficiently by continuously monitoring their performance, availability, and security. They set up and maintain monitoring tools, insights, and alerts to ensure software applications and systems are running properly. The focus is on systems and application monitoring (logs, metrics, and events), covering both our existing tooling and open-source monitoring tools.

Tasks & responsibilities include:

  • Implement and monitor system checks for early detection of potential problems
  • Develop visualizations in Grafana and Azure Application Insights for end-user experience, application, infrastructure, and security
  • Apply strong technical skills, good business knowledge, and investigative techniques to identify and resolve issues quickly and efficiently
  • Work on initiatives and continuous improvement processes around proactive application health, monitoring, reporting, and technical support
  • Act proactively to help the organisation uncover performance bottlenecks across the system

The successful candidate will have:

  • Experience working with various monitoring tools (Grafana, Application Insights, etc.)
  • Hands-on experience in designing and building dashboards
  • Hands-on experience with setting up and assisting incident management workflows
  • Automation skills and the ability to automate a full DevOps/GitOps pipeline, with an understanding of infrastructure and configuration, CI/CD pipelines, and application performance monitoring
  • Good technical knowledge of implementing, troubleshooting, and performance tuning hardware, operating systems, and system services

Additional ‘desirable’ but not essential skills:

  • Terraform
  • Kubernetes
  • Progressive Delivery tooling (e.g. Argo, Flux)

About the Travel Innovation Group

As renowned travel industry heavyweights (if we do say so ourselves), the Travel Innovation Group offer a wealth of unique services via our three companies: Lime, Aviate and Calrom.

So what exactly do we do? It all began with our boutique service and market-leading tech, connecting the travel trade with the world’s leading airlines and most recognisable travel brands. Thanks to this foundation, our growth continues to skyrocket with new, exciting products launching regularly, from cruise packages to luxury hotel booking services. What can we say – we’ve always been innovators at heart!

Our offices, people and partners now span the globe, but the hub of the action remains at our thriving Cheshire Oaks (UK) HQ, and we’re looking for exceptional talent to work with us, succeed with us and grow with us.