
Kubernetes Fury Logging

Overview​

Kubernetes Fury Logging uses a collection of open source tools to provide the most resilient and robust logging stack for the cluster.

The central piece of the stack is the open source search engine OpenSearch, combined with its analytics and visualization platform OpenSearch Dashboards. Logs are collected and enriched on each node by Fluent Bit, a node-level data collection agent, and shipped to OpenSearch via Fluentd. The Fluent Bit and Fluentd stack is managed by the Banzai Logging Operator. We also provide Loki as an alternative to OpenSearch.

High level diagram of the stack:

logging module

Module's repository: https://github.com/sighupio/fury-kubernetes-logging

Packages​

The following packages are included in the Kubernetes Fury Logging module:

| Package | Description |
|---|---|
| opensearch | Log storage and visualization. |
| logging-operator | Banzai logging operator, manages Fluent Bit/Fluentd and their configurations. |
| loki-distributed | Distributed Loki deployment to provide log visualization from Grafana. |
| minio-ha | Three-node HA MinIO deployment (optional, used as storage for Loki). |
ℹī¸ INFO

All the components are deployed in the logging namespace in the cluster.

Compatibility​

| Kubernetes Version | Compatibility | Notes |
|---|:---:|---|
| 1.27.x | ✅ | No known issues |
| 1.28.x | ✅ | No known issues |
| 1.29.x | ✅ | No known issues |
| 1.30.x | ✅ | No known issues |

Check the compatibility matrix for additional information about previous releases of the modules.

Introduction: Logging in Kubernetes​

Logs help developers and sysadmins to understand what is happening inside an application or a system, enabling them to debug and troubleshoot issues.

Pods and containers logs​

Containers are designed to support logging. The easiest method to log messages for a containerized application is to write them directly into the "standard output" (stdout) and "standard error" (stderr) streams, relying on the container engine or runtime.

This is often enough to debug a live application, but container engines/runtimes usually do not provide complete log management. For example, you may need to access the logs of a crashed or deleted container, which would no longer be available.

In a Kubernetes cluster, when an application (Pod) writes logs on stdout/stderr streams, logs are captured by the container runtime and saved in a file inside the node that is currently running the Pod.

The Kubelet component is in charge of keeping track of the log files, which are saved inside /var/log/pods by default, and of providing them through the Kubernetes API (for example with the kubectl logs command). It is also responsible for log rotation.
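
To make this concrete, here is a minimal sketch of a Pod that only logs to stdout (the Pod name, image and message format are purely illustrative):

```yaml
# Illustrative Pod that writes a line to stdout every few seconds.
# The container runtime captures the output under /var/log/pods on the node,
# and the Kubelet serves it through the API, e.g.: kubectl logs counter
apiVersion: v1
kind: Pod
metadata:
  name: counter
spec:
  containers:
    - name: count
      image: busybox
      args:
        - /bin/sh
        - -c
        - 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 5; done'
```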

In Kubernetes, logs should have dedicated storage and lifecycle management, independent from the lifecycle of nodes, pods and containers. This is commonly referred to as "cluster-level logging".

Cluster-level logging architectures require a separate backend to provide storage, analysis and querying of logs. Vanilla Kubernetes does not provide a cluster-level logging solution.

System components logging​

Kubernetes' system-level components, such as the Kubelet, container runtimes and the etcd database, are not executed as Pods inside the cluster: they run as system daemons. As such, they are not subject to the log management techniques mentioned above.

System-level components log their messages through systemd/journald and are accessible using the journalctl tool inside each node.

Best-practices for application logging in Kubernetes​

In this section you can find some commonly suggested best practices for configuring and designing a proper logging architecture for applications running in Kubernetes:

  • Write logs to the stdout and stderr streams and do not write logs to the filesystem. Leave log capturing and rotation to the underlying cluster-level logging management functionality.
  • Use structured logs whenever possible, for example with a JSON formatter, because they enable easier indexing and field mapping and provide powerful query capabilities when debugging (see the example after this list).
  • Find the right balance in the quantity of generated log messages. Too many log messages not only add "noise" when troubleshooting, they also put unnecessary pressure on the logging system, both on CPU and on storage. For example, it is good practice to disable DEBUG-level logging on applications running normally: you can always raise the logging level when problems arise, troubleshoot them, and lower it again.
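
For example, a structured (JSON-formatted) log line could look like the following; the field names are purely illustrative, what matters is that they stay consistent so they can be indexed and queried:

```json
{"timestamp": "2024-05-14T10:32:07Z", "level": "info", "msg": "order created", "order_id": "A-1042", "duration_ms": 37}
```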

If an application cannot write logs to the stdout/stderr streams (for example, legacy applications that you cannot modify), you can find some alternatives at this link, particularly the Tailer Webhook of the Logging Operator.
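
As a rough sketch of that approach with the Logging Operator's Tailer Webhook, a Pod can be annotated so that a file written inside the container is tailed and re-emitted on stdout. The annotation key, path and image below are assumptions to verify against the Logging Operator documentation for your version:

```yaml
# Hypothetical example: the tailer webhook injects a sidecar that tails the
# given file and writes it to stdout, where the normal log collection applies.
apiVersion: v1
kind: Pod
metadata:
  name: legacy-app
  annotations:
    # Assumed annotation key; check the Logging Operator docs before relying on it.
    sidecar.logging-extensions.banzaicloud.io/tail: /var/log/legacy/app.log
spec:
  containers:
    - name: app
      image: registry.example.com/legacy-app:1.0   # illustrative image
```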

KFD: Logging module​

The Logging module provided by KFD offers a cluster-level solution for logging inside a KFD cluster.

The module includes:

  • Fluent Bit agents running on each node, which collect and enrich logs coming from both Pods and system daemons and ship them to Fluentd.
  • Fluentd instances, which filter and ship log messages to the centralized log storage.
  • A centralized log storage system (OpenSearch or Loki from Grafana).
  • A system to view and query log messages (OpenSearch Dashboards or Grafana1)

KFD Logging diagram

This module can be configured in four different ways (see the example after this list):

  • Disabled
  • OpenSearch (default): installs the Logging Operator with pre-configured Flows and Outputs, an HA MinIO instance, and OpenSearch either as a single instance or with three instances to provide an HA installation.
  • Loki: installs the same components as the OpenSearch option, using Loki as the storage provider instead of OpenSearch.
  • customOutputs: installs Logging Operator with pre-configured Flows, without Outputs and storage. This option lets you configure the Outputs directly in the furyctl.yaml file, where you must specify the destination for each Flow (for example, an off-cluster instance of Loki).
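
A minimal sketch of selecting the mode in furyctl.yaml could look like the following; the accepted values for type are assumed here to be none, opensearch, loki and customOutputs, check the furyctl schema reference for your KFD version:

```yaml
spec:
  distribution:
    modules:
      logging:
        # Assumed accepted values: none | opensearch | loki | customOutputs
        type: loki
```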

Log collection​

The Fluent Bit and Fluentd stack is managed and configured through the Logging Operator (formerly the Banzai Cloud Logging Operator). The operator provides several CRDs (Custom Resource Definitions), including Flow, ClusterFlow, Output and ClusterOutput (an illustrative example is shown after the list of Flows below).

ℹī¸ INFO

To simplify the wording, this document will use Flow to indicate both Flows and ClusterFlows, and Output to indicate both Outputs and ClusterOutputs.

The Logging module includes the following Flows, each with its respective Output (using the same name):

  • audit: Kubernetes API server's audit logs. KFD configures the Kubernetes API server by default to record the most relevant security events of the cluster (audit logs).
  • events: Kubernetes events (equivalent to kubectl get events).
  • infra: logs written by Pods inside the "infra" namespaces (kube-system, logging, monitoring, etc.), which provide infrastructural services for a KFD cluster and are not application workloads. This includes, for example, logs from the logging system itself.
  • ingressNginx: logs written by the Ingress NGINX Controller's pods inside the cluster. Logs are processed by a parser and fields are mapped into a standardized structure.
  • kubernetes: logs written by Pods in non-"infra" namespaces, that is, the application workloads' logs.
  • systemdCommon: logs written by system daemons running inside the cluster nodes.
  • systemdEtcd: logs written by the etcd daemons.
  • errors: logs which cannot be processed by the logging stack are sent to an internal MinIO bucket to enable debugging in case of errors. The bucket has a 7-day retention policy.
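
To give an idea of how these pipelines are expressed with the Logging Operator CRDs, here is an illustrative ClusterFlow/ClusterOutput pair (names, namespace selection and the Loki endpoint are examples, not the exact resources shipped by the module):

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterFlow
metadata:
  name: infra-example        # illustrative name
  namespace: logging
spec:
  match:
    - select:
        namespaces:
          - kube-system
          - monitoring
  globalOutputRefs:
    - infra-example
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterOutput
metadata:
  name: infra-example
  namespace: logging
spec:
  loki:
    url: http://loki.mycompany:3100   # example endpoint
```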

💡 TIP

Each Flow has its dedicated index in OpenSearch, to provide a simple way to visualize logs from a specific category inside OpenSearch Dashboards.

Log storage​

By default, Flows are sent to a centralized log storage deployed inside the cluster. The Logging module provides two options for this storage:

  • OpenSearch (default): forked from Elasticsearch, it is an open-source, distributed suite that provides storage, analytics and querying capabilities for data. KFD can install OpenSearch with a single replica, which is suitable for dev/test environments, or with 3 replicas to provide high availability, which is more suitable for production environments.

    OpenSearch can be compared to a database: it uses an index system for the data and enables full-text search on logs. OpenSearch can be scaled vertically to provide more computational power when needed, for example when the quantity of ingested logs grows.

  • Grafana Loki: Loki, from Grafana Labs, is a highly available, distributed log aggregation system. It can be scaled horizontally and provides multi-tenancy.

    Loki uses a time-series database, saves chunks of data in an S3-compatible object storage system, and does not use an index system. Loki can filter logs using labels (similarly to Prometheus) and, thanks to its S3 interface, it lets you retain more data and access older logs more easily than with OpenSearch.

    On the downside, being a distributed system based on micro-services, Loki has more components to manage.

Remote Storage​

When configuring a KFD cluster via the furyctl.yaml file, if you set the Logging module's type to customOutputs, the Logging Operator is installed with pre-configured Flows but without Outputs, which have to be provided by the user, and no storage solution is installed inside the cluster.

Outputs are defined in the same furyctl.yaml file and you can specify any Output type supported by the Logging Operator. For example, you can send one Flow to a syslog server, another Flow to an off-cluster Loki instance, and all the other Flows to an S3 bucket in AWS. You can also choose not to send a Flow's logs anywhere by using the nullout option.

Example:

```yaml
spec:
  distribution:
    modules:
      logging:
        type: customOutputs
        customOutputs:
          audit: |-
            syslog:
              host: SYSLOG-HOST
              port: 123
              buffer:
                timekey: 1m
                timekey_wait: 10s
                timekey_use_utc: true
          events: |-
            nullout: {}
          infra: |-
            s3:
              aws_key_id:
                value: minio
              aws_sec_key:
                value: minio123
              s3_bucket: infra
              s3_region: local
              s3_endpoint: 'http://minio.mycompany:9000'
              force_path_style: 'true'
              path: logs/${tag}/%Y/%m/%d/
              buffer:
                timekey: 10m
                timekey_wait: 30s
                timekey_use_utc: true
          ingressNginx: |-
            nullout: {}
          kubernetes: |-
            loki:
              url: http://loki.mycompany:3100
              extract_kubernetes_labels: true
              configure_kubernetes_labels: true
              extra_labels:
                flow: kubernetes
              buffer:
                timekey: "1m"
                timekey_wait: "10s"
                timekey_use_utc: true
                chunk_limit_size: "2m"
                retry_max_interval: "30"
                retry_forever: true
                overflow_action: "block"
          systemdCommon: |-
            nullout: {}
          systemdEtcd: |-
            nullout: {}
          errors: |-
            nullout: {}
...
```

Querying and retrieving logs​

Once saved inside the dedicated storage, logs collected by the logging stack can be retrieved to be visualized and queried with a UI.

For OpenSearch-based logging, KFD provides OpenSearch Dashboards for visualization, search and query capabilities for both real-time and historical logs with DQL.

On the other hand, with Loki-based logging you can use the same tool used to visualize and query metrics: Grafana. With Grafana you can create custom dashboards that show both metrics and logs (and also traces, if you install the Tracing module) of a component on the same page, and you can explore both real-time and historical data with the Explore view and LogQL.
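
For example, assuming the Loki Output is configured with the extra flow label and the Kubernetes labels as in the snippet above, a LogQL query in Grafana's Explore view could look like the following (the exact label names depend on your Output configuration):

```logql
{flow="kubernetes", namespace="my-app"} |= "error"
```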

Provided Flows and ClusterFlows​

This module provides the following Flows and ClusterFlows out of the box:

  • configs/kubernetes: only the cluster-wide Pod logging configuration (infrastructural namespaces excluded).
  • configs/infra: only the infrastructural namespaces' logs.
  • configs/ingress-nginx: only the nginx-ingress-controller logging configuration.
  • configs/audit: all the Kubernetes audit logs related configurations (with master selector and tolerations).
  • configs/events: all the Kubernetes events related configurations (with master selector and tolerations).
  • configs/systemd: all the systemd related configurations.
  • configs/systemd/kubelet: kubelet, docker, ssh systemd service logs configuration.
  • configs/systemd/etcd: only the etcd service logs configuration (with master selector and tolerations).

Read more​

You can find more info on logging in Kubernetes at the following links:

Footnotes​

  1. Grafana is a component of KFD's Monitoring module, so it's not included inside the Logging module. They are configured to integrate nicely with each other out of the box in KFD. ↩