System Monitoring

Modified on Tue, 21 Jan at 1:35 PM

Labforward provides observability solutions for tracking system resources and health metrics, as well as tools for visualizing metrics and configuring alerts.


The solution is divided into internal and external monitoring systems. With the internal system, data never leaves the cluster. External monitoring allows central monitoring by Labforward or your own operations team.



Extra Resource Requirements

The monitoring system consists of one or more applications deployed inside the cluster. Hence, you need to ensure that the cluster has enough resources for them.


If your cluster already has 8 CPUs and 32 GB of memory, you do not need to increase either, because the cluster should already have enough resources to allocate to the monitoring components as required.


External Monitoring System

  • 1 extra CPU;
  • 3 extra GB of memory.


Internal Monitoring System

  • 1 extra CPU;
  • 5 extra GB of memory.
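To check a cluster against these figures, add the extra CPU and memory to what your installation already needs. The Python sketch below does that arithmetic. The base workload values (`base_cpus`, `base_memory_gb`) are hypothetical inputs, since this article only lists the extra amounts required for monitoring.

```python
# Extra resources each monitoring option adds, as listed in this section.
MONITORING_EXTRAS = {
    "external": {"cpus": 1, "memory_gb": 3},
    "internal": {"cpus": 1, "memory_gb": 5},
}


def required_resources(base_cpus: int, base_memory_gb: int, option: str) -> tuple[int, int]:
    """Return (cpus, memory_gb) for the base workload plus one monitoring option.

    base_cpus and base_memory_gb are hypothetical inputs: whatever your
    installation already needs without monitoring.
    """
    extra = MONITORING_EXTRAS[option]  # raises KeyError for an unknown option
    return base_cpus + extra["cpus"], base_memory_gb + extra["memory_gb"]


def has_capacity(cluster_cpus: int, cluster_memory_gb: int,
                 base_cpus: int, base_memory_gb: int, option: str) -> bool:
    """True if the cluster covers the base workload plus the monitoring extras."""
    cpus, memory_gb = required_resources(base_cpus, base_memory_gb, option)
    return cluster_cpus >= cpus and cluster_memory_gb >= memory_gb


# Hypothetical example: a base workload of 6 CPUs and 24 GB with internal monitoring.
print(required_resources(6, 24, "internal"))   # (7, 29)
print(has_capacity(8, 32, 6, 24, "internal"))  # True
```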


Monitoring Configurations

Labforward monitoring options can be deployed via the Config page in the admin console. Go to Monitoring Solution Configuration → Deploy Monitoring System, and select whether you want the internal or the external monitoring system.


Please note that enabling the built-in monitoring solution requires cluster-level permissions. If your instance is scoped to a namespace and shares a cluster with other applications, you need to either obtain these cluster-level permissions or deploy your own monitoring system.


External Monitoring System

By selecting Send data to external monitoring system, the platform deploys an agent inside the cluster. This agent collects the telemetry data and sends it to an external receiver outside the cluster. The current options are New Relic and Labforward Central Monitoring.


The system does not send any personal or confidential data, such as IP addresses, passwords, secrets, or users' personal information. The data can be used only for statistical analysis and monitoring support, and it cannot be correlated with any user.


The system will send:


  • Logs: detailed, timestamped records of events that happen in an application or system. Used to investigate and troubleshoot possible problems;
  • Metrics: aggregated values that provide a real-time view of the system’s performance and health. Used for real-time monitoring and alert systems;
  • Traces: records of requests as they travel through a distributed system. Used to track performance and identify slow operations.


Contact [email protected] to see how Labforward can assist you in monitoring your applications.


Internal Monitoring System 

By selecting Keep all data inside the internal monitoring system, the platform will deploy all required components inside your Kubernetes installation. No data will leave your environment.


By default, the platform starts collecting metrics using Prometheus.


You can also include a monitoring dashboard to visualize the cluster metrics. For more details, see the Internal Monitoring Dashboard section.


Lastly, you can deploy the Alert Manager, which monitors the metrics and sends notifications when the system exceeds the configured thresholds. The thresholds can be configured in the Show Metrics & Thresholds section. Notifications can be received via email or Slack.


After you deploy the new sequence containing the desired notification channel, you will start receiving notifications.


The main metrics are CPU, memory, filesystem (storage), and readiness (whether an application is unavailable). Labforward reserves the right to include other application-specific thresholds considered relevant for infrastructure monitoring.
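The idea behind a threshold alert can be sketched as a simple comparison: each metric's current usage against a limit, plus a check for applications that are not ready. This Python sketch is only an illustration; it is not how Prometheus or the Alert Manager are actually configured, and the `Threshold` class, the `evaluate` function, and all limit values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    metric: str   # "cpu", "memory" or "filesystem"
    limit: float  # hypothetical limit, as a fraction of capacity (0.0 to 1.0)


def evaluate(usage: dict[str, float], thresholds: list[Threshold],
             readiness: dict[str, bool]) -> list[str]:
    """Return one message per breached threshold and per application that is not ready."""
    messages = []
    for t in thresholds:
        value = usage.get(t.metric)
        if value is not None and value > t.limit:
            messages.append(f"{t.metric} at {value:.0%} exceeds {t.limit:.0%}")
    for app, ready in readiness.items():
        if not ready:
            messages.append(f"{app} is not ready")
    return messages


# Hypothetical values for illustration only.
alerts = evaluate(
    usage={"cpu": 0.92, "memory": 0.61, "filesystem": 0.48},
    thresholds=[Threshold("cpu", 0.90), Threshold("memory", 0.85), Threshold("filesystem", 0.85)],
    readiness={"app-a": True, "app-b": False},
)
print(alerts)  # ['cpu at 92% exceeds 90%', 'app-b is not ready']
```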


You can scrape the Prometheus components to include Labforward’s data in your own dashboards. Contact [email protected] for more details.
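As one way to read these metrics from your own tooling, the Python sketch below calls Prometheus's standard HTTP instant-query endpoint (`/api/v1/query`) and parses the JSON response. Note that this queries Prometheus rather than scraping it; Prometheus's `/federate` endpoint is the usual route when another Prometheus server scrapes it. How Labforward's Prometheus is exposed is not described here, so the base URL is a hypothetical placeholder; confirm the actual access method with Labforward.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a URL for Prometheus's instant-query endpoint, /api/v1/query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(payload: dict) -> dict[str, float]:
    """Map each series' label set to its sample value from an instant-vector response."""
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    series = {}
    for sample in payload["data"]["result"]:
        labels = ",".join(f"{k}={v}" for k, v in sorted(sample["metric"].items()))
        _timestamp, value = sample["value"]  # Prometheus returns the value as a string
        series[labels] = float(value)
    return series


def query(base_url: str, promql: str, timeout: float = 10.0) -> dict[str, float]:
    """Run an instant query over HTTP (requires network access to Prometheus)."""
    with urlopen(instant_query_url(base_url, promql), timeout=timeout) as response:
        return parse_vector(json.load(response))


# Hypothetical address; replace it with however Prometheus is reachable in your setup.
# print(query("http://prometheus.example.internal:9090", "up"))
```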


Internal Monitoring Dashboard

The internal solution can only collect metrics. Logs and traces are available only for the external solution.


You can include a monitoring dashboard to visualize the data by deploying Grafana.


To access the dashboard, configure the subdomain and the admin password. The address will be https://{{Monitoring Subdomain}}.{{domain}}. For example, with the subdomain monitoring and the domain test.labforward.app, the address is https://monitoring.test.labforward.app. The complete URL is shown in the help text of the Expose monitoring dashboard option.
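The address pattern can be expressed as a small helper that also rejects hostnames DNS would not accept. This Python sketch is illustrative only; the `dashboard_url` function and its validation (standard DNS label syntax) are assumptions, not part of the admin console.

```python
import re

# One DNS label: lowercase letters, digits and hyphens, 1 to 63 characters,
# not starting or ending with a hyphen.
_LABEL = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


def dashboard_url(subdomain: str, domain: str) -> str:
    """Return https://{subdomain}.{domain} after basic hostname checks (hypothetical helper)."""
    host = f"{subdomain.strip().strip('.')}.{domain.strip().strip('.')}".lower()
    for label in host.split("."):
        if not _LABEL.fullmatch(label):
            raise ValueError(f"invalid DNS label in {host!r}: {label!r}")
    return f"https://{host}"


print(dashboard_url("monitoring", "test.labforward.app"))
# https://monitoring.test.labforward.app
```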


This configuration exposes a new subdomain. To access it, the subdomain must be configured in your DNS and covered by your TLS certificate.


After logging in, select the Dashboards option in the left menu.


Grafana shows many dashboards, but only those with the “overview” tag are available for internal monitoring. Use the tag filter to display only these dashboards.


The three dashboards offer different perspectives of the cluster:


Cluster Overview - Cluster: Shows CPU, memory, and network usage from the Kubernetes cluster perspective. It also lists the Kubernetes Pods (running applications) and issues that may affect them, such as out-of-memory errors or unexpected shutdowns;


Cluster Overview - Node & Storage: Shows CPU, memory, network, storage, and other metrics from the operating system perspective. Note that persistent volumes are divided per application; however, the on-premise storage solution (OpenEBS) does not support this division. As a result, all applications share, and can use, all the available disk space.


The data is not duplicated: the used space shown is the sum of all applications' data, which in this example is 34.8 GiB out of a total of 72.5 GiB available. For alerting purposes, you will be notified when the Used data approaches the Total space available on the disk containing the /var/openebs directory, regardless of which application uses it.
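Because all applications share one disk, utilization is the summed data of all applications divided by the disk's real size. The Python sketch below assumes a hypothetical per-application split of the 34.8 GiB and a hypothetical 85% alert threshold; the actual threshold used by the Alert Manager is not stated in this article.

```python
def shared_disk_utilization(app_usage_gib: dict[str, float], disk_total_gib: float) -> float:
    """Fraction of the shared OpenEBS disk in use.

    All application volumes share the same disk, so each can grow into the
    whole disk. Adding up per-volume capacities would overcount; the
    meaningful ratio is the total used data over the disk's real size.
    """
    if disk_total_gib <= 0:
        raise ValueError("disk_total_gib must be positive")
    return sum(app_usage_gib.values()) / disk_total_gib


def near_full(utilization: float, threshold: float = 0.85) -> bool:
    """Hypothetical alert rule: flag the disk once usage reaches the threshold."""
    return utilization >= threshold


# The per-application split is hypothetical; the 34.8 GiB total and the
# 72.5 GiB disk match the example above.
usage = {"app-a": 20.0, "app-b": 10.0, "app-c": 4.8}
ratio = shared_disk_utilization(usage, 72.5)
print(f"{ratio:.0%}", near_full(ratio))  # 48% False
```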


Cluster Overview - Pods: Shows CPU, memory, and other details from the cluster perspective, broken down per Pod.


All the dashboards have a common set of settings:


  • Customer: The name used by Labforward in your application’s license;
  • Resolution: The time interval between data points retrieved from the data source. It determines the granularity of the data displayed on the graph or visualization. For example, with a 30s resolution, the CPU graph shows one CPU usage data point every 30 seconds;
  • Period (Last 1 hour): The time range considered for the data visualization;
  • Auto refresh interval: The interval at which the dashboard reloads to include new data.
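Resolution and Period together determine roughly how many points each graph plots: the period divided by the resolution. The Python sketch below shows this arithmetic; its duration parser only handles simple forms such as `30s`, `5m`, or `1h` and is an assumption for illustration, not Grafana's own parser.

```python
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def to_seconds(duration: str) -> int:
    """Convert a simple duration such as '30s', '5m' or '1h' to seconds."""
    if len(duration) < 2:
        raise ValueError(f"unsupported duration: {duration!r}")
    value, unit = duration[:-1], duration[-1]
    if unit not in _UNIT_SECONDS or not value.isdigit():
        raise ValueError(f"unsupported duration: {duration!r}")
    return int(value) * _UNIT_SECONDS[unit]


def intervals_in_period(period: str, resolution: str) -> int:
    """Number of resolution steps in the period, roughly one plotted point per step.

    A range query may return one extra point at the boundary.
    """
    step = to_seconds(resolution)
    if step == 0:
        raise ValueError("resolution must be positive")
    return to_seconds(period) // step


print(intervals_in_period("1h", "30s"))  # 120
print(intervals_in_period("1h", "5m"))   # 12
```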

