Auto-healing monitoring alert system: part 01

Sumudu Nissanka
Dec 15, 2018

(My experience with Icinga2 and Capistrano 3.)

Hi Friends,

Are you fed up with manually fixing server errors every time something fails? Are you still searching for a way to recover your servers automatically? An auto-healing monitoring alert system is here for you :).

Before moving on to my experience, let me briefly introduce the two main players: Icinga2 and Capistrano 3.

Icinga2

As a DevOps engineer, you may already use Icinga2 as your monitoring tool. It is an open source monitoring system that checks the availability of your network resources. Icinga2 is written in C++, and you can build it on Linux/UNIX and Windows. I chose Icinga2 as my monitoring tool because it offers a lot of features that make the work easier. Among them, the API feature is the one I used most to build my system.

Capistrano 3

Why did I choose Capistrano? Capistrano is an open source tool for running scripts remotely on servers. It can deploy a web application to any number of servers at the same time, and it lets you go through the logs of previous runs. In addition, it executes tasks on the servers automatically. These features make developers' work easier and save their time, because they don’t need to go through a long procedure when a server hits an error. Capistrano is written mostly in Ruby.

An auto-healing monitoring alert system automatically applies the solution for specific server errors without involving team members, and then sends the team a recovery notification that includes the error type and the solution applied to the server to recover from it. The system diagram below should give you an idea of how the system works.

System Diagram

Summary of the diagram

Here the Icinga2 server and the Capistrano server are installed on two separate instances. We wrote a Ruby file on the Capistrano server as a deploy file, and it can be run there by a cron job or a daemon. The file calls the Icinga2 API to get the services that are not in the OK state. This first API call is shown by the light green arrow. Once the details come back from the API call, the solution can be applied to the remote server, which is shown by the maroon arrow. After the solution task has run on the remote server, the process takes 50 to 60 seconds to generate the email notification, which is shown by the red arrow.
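
The first API call could look roughly like the sketch below. It is not the actual deploy file from this system, just a minimal illustration. It assumes the Icinga2 REST API is reachable on its usual port 5665, that an API user exists, and that the `service.state != ServiceOK` filter is what selects the failing services. The function names, the host name and the credentials are all made up for the example.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build (without sending) a request for every service that is not in the OK state.
# Icinga2 accepts a POST with the X-HTTP-Method-Override header so that the
# filter can travel in the JSON body.
def build_problem_request(base_url, user, password)
  uri = URI.join(base_url, '/v1/objects/services')
  request = Net::HTTP::Post.new(uri)
  request['Accept'] = 'application/json'
  request['X-HTTP-Method-Override'] = 'GET'
  request.basic_auth(user, password)
  request.body = JSON.generate('filter' => 'service.state != ServiceOK')
  [uri, request]
end

# Reduce the API response to the details the healing step needs.
def parse_problems(body)
  JSON.parse(body).fetch('results', []).map do |result|
    attrs = result.fetch('attrs', {})
    { host: attrs['host_name'], service: attrs['name'], state: attrs['state'].to_i }
  end
end

# Send the request. ca_file is an assumed path to the Icinga2 CA certificate,
# which is usually needed because Icinga2 signs its own certificates.
def fetch_problems(base_url, user, password, ca_file: nil)
  uri, request = build_problem_request(base_url, user, password)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  http.ca_file = ca_file if ca_file
  parse_problems(http.request(request).body)
end
```

A call such as `fetch_problems('https://icinga.example.com:5665', 'api-user', 'secret')` would then return one small hash per failing service, ready to be matched against a solution task.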

  1. The Icinga2 master server monitors the clients.
  2. The Capistrano server calls Icinga2 through its API to get details about the services that are not in the OK state.
  3. The Capistrano server receives the details from Icinga2.
  4. The Capistrano server selects the proper solution task from the task file.
  5. The Capistrano server runs the task remotely on the client servers according to the details.
  6. The Capistrano server calls Icinga2 via the API to get the current service status.
  7. The Icinga2 server sends back the recovery status and details for the service.
  8. The Capistrano server sends an email notification to the team.
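
Steps 4, 5 and 8 above can be sketched in a few lines of Ruby. This is only an illustration under assumptions: the service names, the `heal:` task names, the stage name and the message layout are hypothetical, and the command assumes that Capistrano's `--hosts` filter is used to limit the run to the affected server.

```ruby
# Hypothetical mapping from Icinga2 service names to Capistrano task names.
HEALING_TASKS = {
  'nginx' => 'heal:restart_nginx',
  'disk'  => 'heal:clean_tmp'
}.freeze

# Step 4: pick the solution task for a failing service (nil if none is known).
def select_task(service_name)
  HEALING_TASKS[service_name]
end

# Step 5: build the command that runs the task on the affected host only.
def cap_command(stage, task, host)
  ['cap', "--hosts=#{host}", stage, task]
end

# Step 8: compose the notification with the error type and the solution applied.
def recovery_message(host:, service:, task:, recovered:)
  status = recovered ? 'RECOVERED' : 'STILL FAILING'
  <<~MAIL
    Subject: [auto-heal] #{service} on #{host}: #{status}

    Error type: #{service} was not in the OK state
    Solution applied: #{task}
    Current status: #{status}
  MAIL
end
```

The command array can be handed to `system(*cap_command('production', task, host))`, and the `recovered` flag would come from the second API call in steps 6 and 7.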

Steps for installation

  1. Install Icinga2 and Capistrano on two separate instances.
  2. Enable the API feature on the Icinga2 server. For more details, follow the Icinga 2 API documentation.
  3. Keep the Ruby file on the Capistrano server [path : Capistrano Server/config/deploy/ruby file(auto healing file) ]
  4. Create a daemon or cron job to execute the Ruby file.
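
For the last step, a daemon can be as simple as a loop that runs one healing pass and then sleeps. The sketch below is one possible shape for it, not the one used in this system. The `run_daemon` name, the 60-second interval and the `passes` limit (there only so the loop can be stopped) are assumptions for the example.

```ruby
# Run the given block repeatedly, sleeping `interval` seconds between passes.
# `passes` limits the number of runs (nil means run forever); `sleeper` can be
# swapped out so the loop can be exercised without real waiting.
def run_daemon(interval:, passes: nil, sleeper: method(:sleep))
  count = 0
  loop do
    begin
      yield
    rescue StandardError => e
      # One failed pass should not kill the daemon.
      warn "healing pass failed: #{e.message}"
    end
    count += 1
    break if passes && count >= passes
    sleeper.call(interval)
  end
  count
end
```

Calling `run_daemon(interval: 60) { ... }` with the healing logic in the block keeps it running every minute. With a cron job, the loop is not needed and cron simply starts the Ruby file on a schedule.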

I hope to guide you through building this system yourself in my next article, so stay tuned :) :).

Sumudu Nissanka

Software Engineer @wso2 | Graduate @University of Colombo School of Computing | Former DevOps intern @wso2