Service Reliability Engineer - Monitoring


Manchester, UK

Shift pattern
Flexible working between 08:00 and 18:00. Times to be agreed with your line manager.

Competitive DOE

About us and the department this role sits in:

As a fast-growing SaaS company, Calrom has a proven record of delivering innovative software for international airlines. We’re part of the Travel Innovation Group, made up of Lime, Calrom and Aviate, all of which provide unique services to the travel trade.

Our Service Reliability Team bridges the gap between operations and software development. Its responsibilities include managing and monitoring system availability, performance, and efficiency, responding to incidents, and ensuring that software is deployed properly, so that when the finished product reaches production there are no surprises and end users receive a reliable service.

The role:

An SRE Monitoring Engineer is responsible for ensuring that all systems, applications, and networks function efficiently by continuously monitoring their performance, availability, and security. They set up and maintain monitoring tools, insights, and alerts to ensure software applications and systems are running properly. The focus is on systems and application monitoring (logs, metrics, and events), covering existing and open-source monitoring tooling.

Tasks & responsibilities include:

  • Implement and monitor system checks for early detection of potential problems.
  • Develop visualizations in Grafana and Azure Application Insights covering end-user experience, application, infrastructure, and security.
  • Apply strong technical skills and good business knowledge, together with investigative techniques, to identify and resolve issues quickly and efficiently.
  • Work on initiatives and continuous improvement processes around proactive application health, monitoring, reporting, and technical support.
  • Act proactively to help uncover performance bottlenecks across the system.

We think you’ll be a great fit if you have:

  • Experience working with various monitoring tools (Grafana, Application Insights, etc.).
  • Hands-on experience designing and building dashboards.
  • Hands-on experience setting up and supporting incident management workflows.
  • Automation skills and the ability to automate a full DevOps/GitOps pipeline, with an understanding of infrastructure and configuration, CI/CD pipelines, and application performance monitoring.
  • Good technical knowledge of implementing, troubleshooting, and performance tuning hardware, operating systems, and system services.
  • Skills in Terraform, Kubernetes, or Progressive Delivery tooling such as Argo or Flux (desirable but not essential).

What you’ll get:

  • Flexible and home working policies.
  • 33 days annual leave, including bank holidays.
  • 5% matched company pension contribution.
  • Modern offices with great spaces to kick back and relax – there’s even an on-site gym and bar in our Cheshire Oaks office!
  • Internal training academy to support your learning and personal development.
  • Additional company benefits to support your wellbeing and happiness!