Auto-healing monitoring alert system: Part 03

Sumudu Nissanka
Jan 21, 2019

(My Experience with Icinga2 and Capistrano3)

Hi my friends,

As promised in my previous post, today I will explain the code from that post. I hope you will have a clear idea of how it works by the end of this article. Before moving on to the explanation, note that you can change this code to suit your own services. Here I focus on two ways of monitoring servers in Icinga: services that are checked with NRPE and services that are checked without NRPE.

Main Code

The first two lines call the Icinga2 API to get the names and IP addresses of all hosts that are monitored by the Icinga2 server.

The API call returns a JSON object. We go through it, pick out each host name and its IP address, and add them to a Ruby hash.
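A minimal sketch of this step, not the original code: the method names, the credentials, and the base URL argument are assumptions made for illustration. It relies on the Icinga2 API returning a "results" array in which every entry carries a "name" and an "attrs" object.

```ruby
require 'json'
require 'net/http'
require 'openssl'
require 'uri'

# Fetch the raw JSON listing of all hosts from the Icinga2 API.
# base_url, user, and password are placeholders for your own setup.
def fetch_hosts_json(base_url, user, password)
  uri = URI.parse("#{base_url}/v1/objects/hosts?attrs=name&attrs=address")
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true
  # Assumption: an internal server with a self-signed certificate.
  # Verify the certificate properly in production.
  http.verify_mode = OpenSSL::SSL::VERIFY_NONE
  request = Net::HTTP::Get.new(uri.request_uri)
  request.basic_auth(user, password)
  request['Accept'] = 'application/json'
  http.request(request).body
end

# Turn the API response into a { host name => IP address } hash.
def hosts_to_hash(json_text)
  JSON.parse(json_text).fetch('results', []).each_with_object({}) do |entry, hosts|
    hosts[entry['name']] = (entry['attrs'] || {})['address']
  end
end
```

For example, hosts_to_hash(fetch_hosts_json("https://192.168.32.40:5665", "apiuser", "apipassword")) would give one hash entry per monitored host, where the user name and password stand in for your own API credentials.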

The next part of the code calls the API to get the services that are not in the OK state.
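One way to ask Icinga2 for only the failing services is a filter on service.state, where 0 means OK. The sketch below is an assumption about how such a request could be built, not the original code; the method name and the chosen attribute list are made up for illustration.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build a request for every service whose state is not OK (state 0).
# base_url, user, and password are placeholders for your own setup.
def non_ok_services_request(base_url, user, password)
  uri = URI.parse("#{base_url}/v1/objects/services")
  request = Net::HTTP::Post.new(uri.request_uri)
  request.basic_auth(user, password)
  request['Accept'] = 'application/json'
  request['Content-Type'] = 'application/json'
  # Icinga2 treats this POST as a GET, so the filter can travel in the body.
  request['X-HTTP-Method-Override'] = 'GET'
  payload = {
    filter: 'service.state != 0',
    attrs: %w[name host_name check_command state]
  }
  request.body = JSON.generate(payload)
  [uri, request]
end
```

The returned request can then be sent with Net::HTTP over SSL in the same way as any other call to the Icinga2 server.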

Some services are checked with NRPE commands and others are checked without them. If the service uses an NRPE command, the code calls the selectservice() function; otherwise it calls the withoutnrpeservice() function.
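The branching can be sketched as below. This is only an illustration under two assumptions: that NRPE-based services use the check command named "nrpe" with the remote command stored in vars.nrpe_command, and that both functions take the same three arguments. The two function bodies are stand-ins that only return a string, and handler_for and heal are hypothetical helper names.

```ruby
# Decide which function a failing service should go to.
# Assumption: NRPE-based services use the "nrpe" check command.
def handler_for(attrs)
  attrs['check_command'] == 'nrpe' ? :selectservice : :withoutnrpeservice
end

# Stand-ins for the real functions, so the sketch runs on its own.
def selectservice(service, hostip, servicename)
  "nrpe fix for #{service} on #{hostip} (#{servicename})"
end

def withoutnrpeservice(service, hostip, servicename)
  "plain fix for #{service} on #{hostip} (#{servicename})"
end

# entry is one element of the API "results" array; hosts is the
# { host name => IP address } hash built earlier.
def heal(entry, hosts)
  attrs = entry['attrs']
  hostip = hosts[attrs['host_name']]
  # For NRPE checks the remote command identifies the service;
  # otherwise the check command itself does.
  service = if attrs['check_command'] == 'nrpe'
              attrs.dig('vars', 'nrpe_command')
            else
              attrs['check_command']
            end
  send(handler_for(attrs), service, hostip, entry['name'])
end
```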

Function 01

This is the first function, sendemail(). It sends the email notification to the team. It takes 3 parameters.

service: the service that was recovered remotely, so that it can be named in the email.

host: the host that had the problem.

name: the most important parameter. It is used to get the status of the service after the fix, as shown below.

Call the Icinga2 API to get the service details after applying the fix:

uri = URI.parse("https://192.168.32.40:5665/v1/objects/services?service=#{name}")

With this API call we get the recovery status of the given service, which is why the name variable of the service is needed. Make sure to change the IP address to that of your Icinga2 server.
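To turn the response of that call into something readable for the email, the numeric state can be mapped to its name. This is a small sketch and recovery_status is a hypothetical helper name; it assumes the usual Icinga2 service states 0 to 3 and a response with a "results" array.

```ruby
require 'json'

# Map the state found in an Icinga2 service response to a readable name.
def recovery_status(json_text)
  names = %w[OK WARNING CRITICAL UNKNOWN]
  result = JSON.parse(json_text).fetch('results', []).first
  return 'UNKNOWN' if result.nil?

  state = result.dig('attrs', 'state')
  return 'UNKNOWN' if state.nil?

  # Icinga2 may report the state as a float such as 0.0, so convert it first.
  names.fetch(state.to_i, 'UNKNOWN')
end
```

If the result is "OK" after the fix, the service has recovered; any other value means the problem is still there and the team should look at it.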

Important

  • Change the mail server credentials
  • Add the email addresses for the from and to fields
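As a rough sketch of where those values go, the mail part could look like the code below. It uses Ruby's Net::SMTP, which may differ from what the original code uses. The method names, the subject line, and every address and credential are placeholders.

```ruby
# Compose the notification text. from and to are placeholder addresses.
def build_recovery_mail(service, host, status, from:, to:)
  <<~MAIL
    From: #{from}
    To: #{to}
    Subject: [auto-heal] #{service} on #{host}: #{status}

    An automatic fix was applied to #{service} on #{host}.
    Icinga2 now reports the service as #{status}.
  MAIL
end

# Send the message through your mail server.
# host, port, user, and password are the mail server credentials to change.
def deliver_mail(message, from:, to:, host:, port:, user:, password:)
  # net/smtp is a bundled gem on Ruby 3.1+, so it is loaded only when sending.
  require 'net/smtp'
  Net::SMTP.start(host, port, 'localhost', user, password, :login) do |smtp|
    smtp.send_message(message, from, to)
  end
end
```

Keeping the message building separate from the sending makes it easy to print the email and check it before pointing the code at a real mail server.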

Function 02

This function selects the fix for a service that is checked with NRPE commands.

It takes 3 parameters.

service: used as the input to the if condition that selects the fix for the service.

hostip: identifies which server to access remotely.

servicename: the service name, which normally combines the host name and the service type. It is passed to the sendemail() function to get the service status.

Note: use a sleep time of 60 seconds, because Icinga2 takes 40 seconds to run its next check.

Important

Keep this code block inside each branch of the if-elsif condition.

Here I used sample test cases for testing purposes.
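The overall shape of such a function can be sketched as follows. This is not the original code: the NRPE command names and the restart commands are made-up examples, plain ssh is swapped in for the remote execution step, and the run: and pause: arguments are extra hooks added only so that the sketch can be tried without touching a real server.

```ruby
# Hypothetical sketch of selectservice().
def selectservice(service, hostip, servicename,
                  run: ->(ip, cmd) { system('ssh', ip, cmd) },
                  pause: ->(seconds) { sleep(seconds) })
  fix = if service == 'check_nginx'      # made-up NRPE command name
          'sudo systemctl restart nginx'
        elsif service == 'check_mysql'   # made-up NRPE command name
          'sudo systemctl restart mysql'
        end
  return false if fix.nil? # no known fix for this service

  run.call(hostip, fix)
  # Icinga2 takes 40 seconds to run its next check, so wait 60 seconds.
  pause.call(60)
  # sendemail() would be called here with servicename to report the new status.
  true
end
```

To add another service in this sketch, you would add one more elsif branch with its NRPE command name and the command that fixes it.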

Function 03

This function handles services that are checked without NRPE commands. It works the same way as the previous function.

Steps and changes for adding a new service:

  1. First check whether the service is checked with NRPE commands or not. If it is, edit the selectservice() function; if not, edit the withoutnrpeservice() function.
  2. If you chose selectservice(), edit the if condition statements to match the NRPE commands of your service.
  3. If you chose withoutnrpeservice(), put the check_command value in the if condition.
  4. After that, add the code blocks mentioned above to the body under the if statements.

I hope you now have a good idea of how the code works. I hope to see you again in my next article series.

My previous articles

Thank you. :)

Sumudu Nissanka

Software Engineer @wso2 | Graduate @University of Colombo School of Computing | Former DevOps intern @wso2