Alerting is a key part of monitoring for maintaining site reliability, and Prometheus is widely used for it. It is therefore important to be able to check that alerting rules are correct. Prometheus currently lacks a convenient way to visualise and test alert rules before they are put to use.

There are many long-standing issues and feature requests about this and related areas, and my GSoC project aims to solve some of them.

Deliverables

From cncf/soc
  • Persist “for” state for alerts [1]
  • Label Values Composite Index (TSDB) [1]
  • Unit testing for alerts in promtool [1]
  • Features for building and testing alert expressions [1] [2]
Nice to have
  • More features in TSDB CLI for easy debugging [1]

Student

Ganesh Vernekar

Mentors

  • gouthamve

2018