Monday, 19 June 2017

Cloud Monitoring: Nagios vs. Prometheus

Written By: Amol Khanorkar

What is monitoring and why is it needed in cloud computing?

Monitoring infrastructure resources plays a crucial role in cloud computing: it underpins guarantees about an application's uptime, performance, availability, and security. As workloads move from traditional servers to cloud servers and infrastructure, we gain the opportunity to scale that infrastructure both vertically and horizontally.

We need to make sure that applications stay up and running and serve end-user requests with good performance and high availability. For this, we use a monitoring service to confirm that all services are up and running well. Although it is not often discussed, server monitoring has only gained in importance with the move to the cloud.

In this blog, we look at two monitoring tools, Nagios and Prometheus, and compare their features.

In short, why monitor?
- Know when things go wrong
- Be able to debug and gain insight
- Track trends to see changes over time and drive technical and business decisions

What is Nagios?
Nagios is widely recognized as a leading solution for monitoring servers in a variety of ways. Server monitoring is made easy in Nagios by the flexibility to monitor your servers with or without agents.

Nagios provides monitoring solutions largely in the following areas:
  • Network Monitoring
  • Server Monitoring
  • Application Monitoring
Nagios provides complete monitoring of network protocols – including TCP/IP and UDP protocols. Implementing effective network protocol monitoring with Nagios offers increased server, services, and application availability, as well as fast detection of network outages and protocol failures.
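
As a rough illustration of what a protocol check involves, here is a minimal sketch of a TCP port check written in the style of a Nagios plugin. It relies on the standard plugin convention of exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL), and 3 (UNKNOWN). The function name `check_tcp_port` and the threshold values are assumptions made for this example and are not part of Nagios itself.

```python
import socket
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp_port(host, port, warn_seconds=1.0, timeout=5.0):
    """Try to open a TCP connection and return (exit_code, message).

    Hypothetical helper for illustration: warn_seconds and timeout are
    example thresholds, not Nagios defaults.
    """
    start = time.monotonic()
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.monotonic() - start
    except OSError as exc:
        return CRITICAL, f"CRITICAL - cannot connect to {host}:{port} ({exc})"
    if elapsed > warn_seconds:
        return WARNING, f"WARNING - {host}:{port} answered in {elapsed:.3f}s"
    return OK, f"OK - {host}:{port} answered in {elapsed:.3f}s"
```

A wrapper script would print the returned message and exit with the returned code, since the exit code and the first line of output are what Nagios reads as the result of a check.
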
What is Prometheus?
Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. Since its inception in 2012, many companies and organizations have adopted Prometheus, and the project has a very active developer and user community. It is now a standalone open source project and maintained independently of any company.
Prometheus Offers:
  • Inclusive monitoring
  • Powerful data model
  • Manageable and reliable
  • Efficient and Scalable

Comparison and Features: 

For each parameter below, the Nagios entry is listed first and the Prometheus entry second.
Protocols / Plugins
  • Nagios: NRPE, NRDP, NSClient++ (mainly used to monitor Windows machines), and NCPA. Nagios executes scripts on the remote server, and the plugins then send metrics data to the centralized Nagios server. NRPE (Nagios Remote Plugin Executor) is built on a client-server protocol in which the remote server we intend to monitor listens on a specific TCP port. Data is fetched by pull: the Nagios server connects to open ports on its agents rather than the agents connecting to it.
  • Prometheus: Exporters expose metrics data on the remote servers, and Prometheus pulls it from them. Time series collection happens via a pull model over HTTP.
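
To make the pull model concrete, here is a minimal sketch using only the Python standard library: a tiny HTTP endpoint exposes metrics in the Prometheus text exposition format, and a `scrape` function plays the role of the server pulling them. The metric names, their values, and the helper names (`render_metrics`, `MetricsHandler`, `scrape`, `demo`) are assumptions made for this example. A real exporter would normally be built with an official client library rather than by hand.

```python
import http.server
import threading
import urllib.request

# Hypothetical in-memory metric values; a real exporter would read them
# from the system or application it instruments.
METRICS = {"demo_requests_total": 42, "demo_queue_length": 3}


def render_metrics(metrics):
    """Render name/value pairs in the Prometheus text exposition format."""
    return "".join(f"{name} {value}\n" for name, value in sorted(metrics.items()))


class MetricsHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def scrape(url):
    """Play the monitoring server's role: pull the current metrics over HTTP."""
    # An empty ProxyHandler disables any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as response:
        return response.read().decode("utf-8")


def demo():
    # Port 0 lets the OS choose a free port for this local demonstration.
    server = http.server.HTTPServer(("127.0.0.1", 0), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return scrape(f"http://127.0.0.1:{server.server_address[1]}/metrics")
    finally:
        server.shutdown()
        server.server_close()
```

The point of the design is that the monitored side only has to answer HTTP requests. The monitoring server decides when and how often to collect.
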
Query Language
  • Nagios: No native query language for data analysis.
  • Prometheus: The Prometheus query language allows you to slice and dice the dimensional data for ad-hoc exploration, graphing, and alerting.
Data Model
  • Nagios: The data is non-dimensional, so it is somewhat harder to read and analyze.
  • Prometheus: It has great support for service discovery on many major cloud and container platforms (Kubernetes, EC2, Azure, etc.). It can therefore not only pull metrics data directly from instances as they float around your dynamic cluster scheduler, but also attach identity metadata (as provided by the service discovery) to the time series collected from each instance. For example, you may map EC2 tags or Kubernetes labels into your Prometheus time series labels to give you more useful metrics.
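
The idea of dimensional data can be sketched in a few lines of plain Python. This is not PromQL and not a Prometheus API. It only mimics how labels attached to each time series let you filter and aggregate. The metric name, label names, and values are invented for the illustration, and `select` and `sum_by` are hypothetical helpers.

```python
from collections import defaultdict

# Each sample is (metric name, labels, value). All names and values here
# are invented for this illustration.
SAMPLES = [
    ("http_requests_total", {"instance": "10.0.0.1", "env": "prod", "pod": "web-1"}, 120.0),
    ("http_requests_total", {"instance": "10.0.0.2", "env": "prod", "pod": "web-2"}, 80.0),
    ("http_requests_total", {"instance": "10.0.0.3", "env": "staging", "pod": "web-3"}, 5.0),
]


def select(samples, name, **matchers):
    """Keep samples of one metric whose labels equal every given matcher."""
    return [
        (labels, value)
        for metric, labels, value in samples
        if metric == name and all(labels.get(k) == v for k, v in matchers.items())
    ]


def sum_by(selected, label):
    """Aggregate selected samples by one label, similar in spirit to `sum by`."""
    totals = defaultdict(float)
    for labels, value in selected:
        totals[labels.get(label, "")] += value
    return dict(totals)


print(sum_by(select(SAMPLES, "http_requests_total"), "env"))
```

In PromQL the comparable expressions would be `http_requests_total{env="staging"}` for the selection and `sum by (env) (http_requests_total)` for the aggregation.
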
Which is more suitable for Docker-based apps?
  • Nagios: Checking a Docker container is a little harder, because the Docker command typically has to be run as root, whereas the NRPE service on the remote host runs as a non-privileged user (usually called nagios).
  • Prometheus: We recommend Prometheus here. It works as a pull-based server that expects monitored servers to provide a web interface from which it can scrape data. Several exporters are available for Prometheus that capture metrics and expose them over HTTP for Prometheus to scrape, and there are libraries for creating custom exporters. As we are concerned with monitoring Docker containers, we will use container_exporter to capture metrics.
High Availability
Nagios is host-centric.
Yes, by duplication, but without clustering.
Prometheus can be implemented in a high-availability architecture.
Data Storage
  • Nagios: Keeps only the current state of checks and requires a plugin for storage.
  • Prometheus: Uses LevelDB for indexing and its own internal subsystem for storage. Data is stored in time series format, and flags can be set to control memory usage.
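Time Series Data Store Support
  • Nagios: Yes
  • Prometheus: Yes
Autodetect Pods/Machines
  • Nagios: Yes, but custom scripts need to be written
  • Prometheus: Yes
Metrics Collectors (DB/Infra)
  • Nagios: Yes
  • Prometheus: Yes
Docker Support
  • Nagios: Yes
  • Prometheus: Yes
Kubernetes Stats Support
  • Nagios: No; custom scripts are needed to fetch stats from the remote server
  • Prometheus: Yes
Maintenance Requirements
  • Nagios: MySQL + server
  • Prometheus: Tuning + lightweight DB
Ease of Configuration
  • Nagios: Simple to install
  • Prometheus: Dockerized, with YAML configuration files
UI / Dashboards
  • Nagios: Built-in dashboard
  • Prometheus: Grafana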
Alert Manager
  • Nagios: Notifications can be sent by email. Nagios can also be integrated with PagerDuty and other push notification services.
  • Prometheus: Alerts can be sent to a number of services, including email, generic webhooks, and Slack. All of the setup is performed via configuration, and the alerting rules are scriptable.
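
For the generic webhook option, the receiving side gets a JSON document describing the alerts. The sketch below shows one way such a body might be turned into readable lines. It assumes the payload carries an `alerts` list whose entries have `status`, `labels`, and `annotations` fields. The function name `summarize_alerts` and every value in the example body are invented for illustration.

```python
import json


def summarize_alerts(payload_text):
    """Turn a webhook JSON body into one human-readable line per alert."""
    payload = json.loads(payload_text)
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        lines.append(
            "[{status}] {name} on {instance}: {summary}".format(
                status=alert.get("status", "unknown").upper(),
                name=labels.get("alertname", "unnamed"),
                instance=labels.get("instance", "unknown instance"),
                summary=annotations.get("summary", "no summary"),
            )
        )
    return lines


# Hypothetical example body with invented label and annotation values.
EXAMPLE = json.dumps({
    "status": "firing",
    "alerts": [{
        "status": "firing",
        "labels": {"alertname": "InstanceDown", "instance": "10.0.0.1:9100"},
        "annotations": {"summary": "Instance has been down for 5 minutes"},
    }],
})

for line in summarize_alerts(EXAMPLE):
    print(line)
```
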
Scalability
  • Nagios: Yes. (Methods of achieving a distributed monitoring solution can sometimes be complicated. Before you embark on designing and deploying a distributed monitoring solution, you should outline the goals you wish to achieve with the solution you are proposing.)
  • Prometheus: Yes. (A single Prometheus server can easily handle millions of time series. That's enough for a thousand servers with a thousand time series each, scraped every 10 seconds. As your systems scale beyond that, Prometheus can scale too.)
License
  • Nagios: GPL 2.0, NagiosPL
  • Prometheus: Apache 2.0
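
The scalability figure quoted in the comparison can be checked with simple arithmetic. The short sketch below works out how many time series and how many samples per second the "thousand servers, thousand series, 10-second scrape" scenario implies. The function name `ingestion_load` is a hypothetical helper for this calculation only.

```python
def ingestion_load(servers, series_per_server, scrape_interval_seconds):
    """Return (total active series, samples ingested per second)."""
    total_series = servers * series_per_server
    samples_per_second = total_series / scrape_interval_seconds
    return total_series, samples_per_second


total, rate = ingestion_load(servers=1000, series_per_server=1000, scrape_interval_seconds=10)
print(f"{total:,} series, {rate:,.0f} samples/second")
# prints "1,000,000 series, 100,000 samples/second"
```
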

Conclusion:
Both tools have their own capabilities and features. Which one to choose depends on your infrastructure's needs and architecture: evaluate each tool, decide which will fulfill your goals, and then adopt it for monitoring and analysis.

My own conclusion is that Prometheus is the more popular and more widely used tool nowadays, and it offers a number of useful features that Nagios lacks.

I hope this blog helps you evaluate the tools. Please share your valuable inputs to improve it.

References:
https://prometheus.io/
https://www.nagios.org/

Tags: Server Monitoring, Cloud Computing, DevOps, Nagios, Prometheus

