NetXMS Simple Dashboard - Usage
First published on: 05-Nov-2013

NetXMS setup

For NetXMS Simple Dashboard to pick up the metric data that NetXMS collects, you have to put a specially formatted text into the "Description" field of each metric that you want to track.
This is mandatory: if you don't do it, the utility will simply ignore the metric.

The description must have the following format:
PT="chart" P0="[anything]" P1="[anything]" P2="[anything]" P3="[anything]" A="[P1/P2/P3]" F="[hist/real]"

We chose this fixed structure because it can describe what kind of data is being collected, whether it can be shown in the same graph as other metrics, and whether it is historical or realtime data (more details below).
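To illustrate the format, here is a minimal Python sketch of how such a description could be split into its key/value pairs. The regular expression and the "parse_description" helper are our own hypothetical illustration, not the utility's actual code; the sketch only checks PT, while the remaining rules are explained further below.

```python
import re

# Matches KEY="value" pairs such as P1="Network"; values may contain spaces and slashes.
PAIR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_description(text):
    """Return the KEY="value" pairs of a metric description as a dict.

    Returns None unless the description declares PT="chart", mirroring the
    rule that metrics without a valid description are ignored.
    """
    fields = dict(PAIR_RE.findall(text))
    if fields.get("PT") != "chart":
        return None
    return fields


desc = 'PT="chart" P0="srv" P1="Network" P2="eth0" P3="Avg KB/s in" A="P2" F="hist"'
print(parse_description(desc))
print(parse_description("CPU usage"))  # None: a plain description is ignored
```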


Examples:

  • Average KB/s received by the server on network interface eth0, historical data:
    PT="chart" P0="srv" P1="Network" P2="eth0" P3="Avg KB/s in" A="P2" F="hist"
  • Average KB/s sent by the server on network interface eth0, historical data:
    PT="chart" P0="srv" P1="Network" P2="eth0" P3="Avg KB/s out" A="P2" F="hist"

In NetXMS, the setup of the first of these two metrics looks like this:

After setting the descriptions of the 2 metrics above, save them by closing the configuration window (if you're using templates, close the template window so that the changes are committed to the database). Then wait up to 60 seconds (apparently NetXMS does not commit changes to the database immediately), and the new graph should show up in the utility.

Note that what you wrote in the description field is mirrored in the title and legend of the plotted graph.

Other examples:

  • Average MB/s read from disk vda
    PT="chart" P0="srv" P1="Disk" P2="vda" P3="Avg MB/s read" A="P2" F="hist"
  • Average MB/s written to disk vda
    PT="chart" P0="srv" P1="Disk" P2="vda" P3="Avg MB/s written" A="P2" F="hist"

  • System uptime
    PT="chart" P0="srv" P1="System" P2="Uptime" P3="days" A="P1" F="hist"

  • MB of RAM used on the server for buffers
    PT="chart" P0="srv" P1="RAM" P2="Physical" P3="MBs buffers" A="P2" F="hist"
  • MB of RAM used on the server for cache
    PT="chart" P0="srv" P1="RAM" P2="Physical" P3="MBs cache" A="P2" F="hist"

  • Percentage of used disk space on the root partition, where the Y-axis of the graph should always go up to 100 (used disk space cannot exceed 100%); MAX is an optional parameter
    PT="chart" P0="srv" P1="Disk" P2="/" P3="% used" A="P2" MAX="100" F="hist"
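The effect of MAX on the Y-axis can be sketched as follows. This is a hypothetical helper, not the utility's code, and it assumes that without MAX the axis simply scales to the largest plotted value.

```python
def y_axis_max(values, fields):
    """Upper bound of the Y-axis for one graph (hypothetical helper).

    If the description carries MAX="...", that fixed value is used;
    otherwise (assumption) the axis scales to the largest plotted value.
    """
    if "MAX" in fields:
        return float(fields["MAX"])
    return max(values) if values else 0.0


# Parsed fields of the root-partition example above.
disk = {"PT": "chart", "P0": "srv", "P1": "Disk", "P2": "/",
        "P3": "% used", "A": "P2", "MAX": "100", "F": "hist"}
samples = [41.0, 42.5, 43.1]

print(y_axis_max(samples, disk))  # 100.0
without_max = {k: v for k, v in disk.items() if k != "MAX"}
print(y_axis_max(samples, without_max))  # 43.1
```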

The structure of the description field is as follows:

  • PT (Plot Type)
    Must be "chart" (the parameter exists to keep open the option of plotting other kinds of graphs in the future).
  • P0, P1, P2, P3 (Property)
    Free text: you can write anything that describes what kind of metric it is.

    The convention we have used so far is:
    • P0
      The device being monitored:
      a server (so far we have always used "srv" to keep it short), a process, a router, whatever.
    • P1
      The type of device component that is being monitored:
      a disk, the RAM, the network, a property of the server, etc.
    • P2
      The actual name of the device component that is being monitored:
      sda, vdb, eth0, physical (RAM), a process name (e.g. "apache"), etc.
    • P3
      What is measured and in which unit:
      KB, KB/s, MB, a process count, days of uptime, etc.
  • A (Aggregation)
    Must be P1, P2 or P3 (P0 has not been tested and is probably not useful).
    It defines how metrics are grouped into graphs when the aggregation option is set to "Auto".

    E.g. if you have 4 metrics with the following descriptions...

    PT="chart" P0="srv" P1="Disk" P2="vda" P3="Avg MB/s read" A="P2" F="real"
    PT="chart" P0="srv" P1="Disk" P2="vda" P3="Avg MB/s written" A="P2" F="real"

    PT="chart" P0="srv" P1="Disk" P2="vdb" P3="Avg MB/s read" A="P2" F="real"
    PT="chart" P0="srv" P1="Disk" P2="vdb" P3="Avg MB/s written" A="P2" F="real"

    ...the utility (since "A" is set to "P2") will create only 2 graphs for these 4 metrics. Each graph's title is the unique combination of properties up to and including P2 (so the first graph will be "srv => Disk => vda" and the second one "srv => Disk => vdb"), and each graph contains both the "Avg MB/s read" and the "Avg MB/s written" charts.
  • (optional) MAX
    Must be a numeric value.
    Specify it if you want the graph's Y-axis to have a fixed, hardcoded maximum.

    This is useful when comparing graphs: e.g. if you have 10 VMs that can each be allocated at most 10 GB of RAM, a common MAX gives all their RAM graphs the same scale.
  • F (Frequency)
    Must be "hist" (historical data) or "real" (quasi-realtime data).

    We needed to perform:
    1) Detailed analysis of the latest data, to check the near-realtime health of the systems or the detailed trends right after introducing changes into the system.
    2) Historical trend analysis over much longer timespans than the realtime ones.

    As NetXMS currently does not offer data archiving (e.g. aggregating data older than X days into a lower resolution), we came up with a workaround: defining metrics that are identical except for their data retention period and sample frequency.
    We therefore created 2 categories:

    1) Metrics with a high sample frequency (we sample every 10 seconds) do not need to be kept for long (we keep them for 14 days); we assigned these to the "real" category, meaning "realtime".
    2) Metrics with a low sample frequency (we sample every 600 seconds) have to be kept for a long time (we keep them for 2 years); we assigned these to the "hist" category, meaning "historical".

    By assigning each metric to one or both of these categories, you can make sure you have the data you need to analyze your system effectively.
    Of course, which frequency to use for which metric is up to you, e.g.:
    - tracking the "uptime" of a server does not normally make much sense in realtime mode.
    - CPU and RAM should probably be tracked in both modes, effectively doubling the number of metrics NetXMS tracks for them.
    - free disk space probably does not need realtime tracking, so "hist" alone is enough.
    - the number of threads spawned by process X would probably only need to be tracked in realtime mode, as its effects usually show up within a few minutes, hours or days.
    Etc.
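The rules above and the "A" grouping can be sketched in Python as follows. This is a minimal sketch, not the utility's code: "parse", "validate" and "group_into_graphs" are hypothetical names, and two details are assumptions of ours: realtime and historical metrics are never mixed in one graph (F is part of the grouping key), and the legend entry is made of the properties after the one named in "A".

```python
import re
from collections import defaultdict

PAIR_RE = re.compile(r'(\w+)="([^"]*)"')  # KEY="value" pairs
LEVELS = ["P0", "P1", "P2", "P3"]


def parse(text):
    return dict(PAIR_RE.findall(text))


def validate(f):
    """Return a list of rule violations (an empty list means the description is valid)."""
    problems = []
    if f.get("PT") != "chart":
        problems.append('PT must be "chart"')
    missing = [k for k in LEVELS if k not in f]
    if missing:
        problems.append("missing " + ", ".join(missing))
    if f.get("A") not in ("P1", "P2", "P3"):
        problems.append("A must be P1, P2 or P3")
    if f.get("F") not in ("hist", "real"):
        problems.append('F must be "hist" or "real"')
    if "MAX" in f:
        try:
            float(f["MAX"])
        except ValueError:
            problems.append("MAX must be numeric")
    return problems


def group_into_graphs(descriptions):
    """Map (graph title, F) to the legend entries of the metrics plotted in that graph."""
    graphs = defaultdict(list)
    for text in descriptions:
        f = parse(text)
        if validate(f):
            continue  # metrics without a valid description are ignored
        depth = LEVELS.index(f["A"]) + 1
        title = " => ".join(f[k] for k in LEVELS[:depth])   # P0 up to and including A
        legend = " => ".join(f[k] for k in LEVELS[depth:])  # the remaining properties
        graphs[(title, f["F"])].append(legend)
    return dict(graphs)


# The 4 disk metrics from the "A" example above.
metrics = [
    'PT="chart" P0="srv" P1="Disk" P2="vda" P3="Avg MB/s read" A="P2" F="real"',
    'PT="chart" P0="srv" P1="Disk" P2="vda" P3="Avg MB/s written" A="P2" F="real"',
    'PT="chart" P0="srv" P1="Disk" P2="vdb" P3="Avg MB/s read" A="P2" F="real"',
    'PT="chart" P0="srv" P1="Disk" P2="vdb" P3="Avg MB/s written" A="P2" F="real"',
]
for (title, freq), lines in group_into_graphs(metrics).items():
    print(f"{title} [{freq}]: {lines}")
```

Running it prints two graphs, "srv => Disk => vda" and "srv => Disk => vdb", each containing the "Avg MB/s read" and "Avg MB/s written" entries.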

Once the description fields of your metrics are set up, you can use this utility to plot the graphs, drill down into the data, compare metrics, or make the graphs available to the general public.


Graphs

With your metric descriptions set up (see above), you can now plot the graphs.

As soon as you open the Graphs section of NetXMS Simple Dashboard, the utility tries to plot graphs for all devices and metrics that have a valid description (in the format described above).

You can restrict which devices and/or metric types are plotted by using the drop-down menus in the configuration grid on the same page.

All parameters should be self-explanatory, except perhaps "Max data points to forward to the plot engine" and "Data aggregation function".

Let's assume, for example, that you have set up a realtime metric as described above, sampled every 10 seconds with a retention policy of 14 days. After 2 weeks, the database will contain 120960 data points for that metric (14 days × 86400 seconds per day ÷ 10 seconds).
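The number of stored data points follows directly from the sample interval and the retention period. The small Python sketch below (the "stored_points" helper is hypothetical) computes it for the realtime metric above and, for comparison, for a "hist" metric using the 600-second / 2-year settings mentioned earlier (leap days ignored).

```python
SECONDS_PER_DAY = 24 * 60 * 60


def stored_points(interval_s, retention_days):
    """Number of samples kept for one metric once its retention period is full."""
    return retention_days * SECONDS_PER_DAY // interval_s


print(stored_points(10, 14))        # "real" metric: 120960
print(stored_points(600, 2 * 365))  # "hist" metric: 105120
```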

Sending all 120960 rows to the PHP engine will probably cause an out-of-memory error, and even if you manage to raise that limit, the plotting engine (whichever one you use) will take ages to draw the graph.

This is where "Max data points to forward to the plot engine" comes into play: if you select the full 2-week timespan when plotting the graph, the utility still retrieves all 120960 data points from the database, but while doing so it aggregates them into what we call "buckets", evenly distributed across the whole selected time range. This option defines how many buckets are used.
This option will most probably be deprecated in the next version: ideally you want one bucket for every pixel of the graph, so the number of buckets will simply be set equal to the width of the graph.
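The bucket idea can be sketched in Python as follows. This is a rough in-memory illustration, not the utility's code (which aggregates while reading from the database); the "bucketize" helper and the 800-pixel graph width are hypothetical, and the aggregation function defaults to max.

```python
def bucketize(points, start, end, n_buckets, agg=max):
    """Aggregate (timestamp, value) points into n_buckets equal-width time slots.

    Returns (bucket_start, aggregated_value) for every bucket that received data.
    """
    width = (end - start) / n_buckets
    buckets = [[] for _ in range(n_buckets)]
    for ts, value in points:
        if start <= ts <= end:
            # Clamp so that a point exactly at `end` lands in the last bucket.
            i = min(int((ts - start) / width), n_buckets - 1)
            buckets[i].append(value)
    return [(start + i * width, agg(vals)) for i, vals in enumerate(buckets) if vals]


# Two weeks of 10-second samples, reduced to one bucket per pixel of an 800-pixel-wide graph.
two_weeks = 14 * 24 * 60 * 60
points = [(t, t % 7) for t in range(0, two_weeks, 10)]
series = bucketize(points, 0, two_weeks, 800)
print(len(points), "->", len(series))  # 120960 -> 800
```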

 

In any case, since the data is aggregated, you have to choose how: this is what the "Data aggregation function" parameter is for.
If you select a large timespan, the many data points it contains cannot be displayed at a resolution higher than the width of your graph in pixels, so they have to be combined, and this parameter defines the function used to combine them.

In most cases "Max" is the one you'll need, since values are usually positive and you'll be interested in the highest value that falls into a particular pixel (or bucket).

In other cases you might be more interested in the average.
For example, if a single pixel represents the data collected today between 10:00 and 14:00, and 3 measurements with values 5, 8 and 1 were collected during that time, the graph will plot about 4.67 ((5 + 8 + 1) / 3), whereas with "Max" it would plot 8.
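The worked example above can be reproduced with Python's built-ins. The mapping of option labels to functions is a hypothetical illustration of the two choices discussed, not the utility's configuration.

```python
from statistics import mean

# The three measurements that fall into one pixel (10:00-14:00) in the example above.
bucket = [5, 8, 1]

AGGREGATIONS = {"Max": max, "Average": mean}  # hypothetical label -> function mapping

for name, fn in AGGREGATIONS.items():
    print(name, round(fn(bucket), 3))
# Max 8
# Average 4.667
```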


Status

Currently this page displays information for all metrics that have a threshold defined.

If a metric violates a defined threshold and triggers an alarm, the upper section shows the alarm message (highlighted in red during the first minute, yellow during the first hour and light yellow during the first day) and the lower section highlights the metric linked to it.
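The age-based highlighting can be sketched as a simple lookup. This is a hypothetical illustration, not the utility's code; two details are our own assumptions: an alarm older than one day gets no highlight at all (None), and each limit is exclusive (an alarm exactly one minute old is already yellow).

```python
from datetime import datetime, timedelta
from typing import Optional

# Age limits and colors as described above, checked from youngest to oldest.
HIGHLIGHTS = [
    (timedelta(minutes=1), "red"),
    (timedelta(hours=1), "yellow"),
    (timedelta(days=1), "light-yellow"),
]


def alarm_color(raised_at: datetime, now: datetime) -> Optional[str]:
    """Highlight color for an alarm of the given age; None once it is a day old (assumption)."""
    age = now - raised_at
    for limit, color in HIGHLIGHTS:
        if age < limit:
            return color
    return None


now = datetime(2013, 11, 5, 12, 0)
print(alarm_color(now - timedelta(seconds=30), now))  # red
print(alarm_color(now - timedelta(minutes=30), now))  # yellow
print(alarm_color(now - timedelta(hours=5), now))     # light-yellow
print(alarm_color(now - timedelta(days=2), now))      # None
```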