Kubernetes Fury Logging
Overview
Kubernetes Fury Logging uses a collection of open source tools to provide the most resilient and robust logging stack for the cluster.
The central piece of the stack is the open source search engine OpenSearch, combined with its analytics and visualization platform OpenSearch Dashboards. Logs are collected by Fluent Bit, a node-level data collection and enrichment agent, and pushed to OpenSearch via Fluentd. The Fluent Bit and Fluentd stack is managed by the Banzai Logging Operator. Loki is also provided as an alternative to OpenSearch.
High-level diagram of the stack:
Module's repository: https://github.com/sighupio/fury-kubernetes-logging
Packages
The following packages are included in the Kubernetes Fury Logging module:
| Package | Description |
|---|---|
| opensearch | Log storage and visualization. |
| logging-operator | Banzai Logging Operator, manages Fluent Bit/Fluentd and their configurations. |
| loki-distributed | Distributed Loki deployment to provide log visualization from Grafana. |
| minio-ha | Three-node HA MinIO deployment (optional, used as storage for Loki). |
All the components are deployed in the `logging` namespace in the cluster.
Compatibility
| Kubernetes Version | Compatibility | Notes |
|---|---|---|
| 1.27.x | ✅ | No known issues |
| 1.28.x | ✅ | No known issues |
| 1.29.x | ✅ | No known issues |
| 1.30.x | ✅ | No known issues |
Check the compatibility matrix for additional information about previous releases of the modules.
Introduction: Logging in Kubernetes
Logs help developers and sysadmins to understand what is happening inside an application or a system, enabling them to debug and troubleshoot issues.
Pod and container logs
Containers are designed to support logging. The easiest method to log messages from a containerized application is to write them directly to the "standard output" (`stdout`) and "standard error" (`stderr`) streams, relying on the container engine or runtime.
This is often enough to debug a live application, but container engines/runtimes often do not provide complete log management. For example, you may need to access logs from a crashed or deleted container, which would no longer be available.
In a Kubernetes cluster, when an application (Pod) writes logs to the `stdout`/`stderr` streams, the logs are captured by the container runtime and saved in a file on the node that is currently running the Pod.
The Kubelet component is in charge of keeping track of log files, which are saved inside `/var/log/pods` by default, and of providing them through the Kubernetes APIs (for example via the `kubectl logs` command). It is also responsible for log rotation.
In Kubernetes, logs should have dedicated storage and lifecycle management, independent of the lifecycle of nodes, Pods and containers. This is commonly referred to as "cluster-level logging".
Cluster-level logging architectures require a separate backend to provide storage, analysis and queries on logs. Vanilla Kubernetes does not provide a cluster-level solution for logging.
System components logging
Kubernetes' system-level components, such as the Kubelet, container runtimes and the `etcd` database, are not executed as Pods inside the cluster: they are system daemons. As such, they are not subject to the log management techniques mentioned above.
System-level components log their messages through `systemd`/`journald`, and the logs are accessible using the `journalctl` tool on each node.
Best practices for application logging in Kubernetes
In this section you can find some commonly suggested best practices for configuring and designing a proper logging architecture for applications inside Kubernetes:

- Write logs to the `stdout` and `stderr` streams and do not write logs to the filesystem. Leave the log capturing and rotation jobs to the underlying cluster-level logging management functionality.
- Use structured logs whenever possible, for example using a `json` formatter, because they enable easier indexing and mapping of fields and provide powerful query capabilities when debugging.
- Find the right balance in the quantity of generated log messages. Too many log messages not only add "noise" when troubleshooting, but also put unnecessary pressure on the logging system, both on CPU and on storage. For example, it is good practice to disable `DEBUG`-level logging on applications running normally: you can always increase the logging level when problems arise, troubleshoot them and lower the level again.
If an application cannot write logs to the `stdout`/`stderr` streams (for example, legacy applications that you cannot edit), you can find some alternatives at this link, particularly the Tailer Webhook of the Logging Operator.
KFD: Logging module
The Logging module provided by KFD offers a cluster-level solution for logging inside a KFD cluster.
The module includes:
- `Fluentbit` agents running on each node, which collect and enrich logs coming from both Pods and system daemons and ship them to `Fluentd`.
- `Fluentd` instances, which filter and ship log messages to the centralized log storage.
- A centralized log storage system (OpenSearch or Loki from Grafana).
- A system to view and query log messages (OpenSearch Dashboards or Grafana¹).
This module can be configured in four different ways:

- Disabled.
- OpenSearch (default): installs the Logging Operator with pre-configured Flows and Outputs, an HA MinIO instance, and OpenSearch in a single instance or with three instances to provide an HA installation.
- Loki: installs the same components as the OpenSearch option, using Loki as the storage provider instead of OpenSearch.
- customOutputs: installs the Logging Operator with pre-configured Flows, without Outputs and storage. This option lets you configure the Outputs directly in the `furyctl.yaml` file, where you must specify the destination for each Flow (for example, an off-cluster instance of Loki).
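As a sketch, a minimal `furyctl.yaml` fragment selecting the default OpenSearch option might look like the following. The `type` field is confirmed by the customOutputs example later in this document; the `opensearch.type` field and its values are assumptions based on the single-instance/three-instance options described above, so check the furyctl schema before using them:

```yaml
spec:
  distribution:
    modules:
      logging:
        # One of the four configurations described above:
        # disabled, opensearch (default), loki, customOutputs.
        type: opensearch
        # Assumption: selects single-instance vs three-instance HA OpenSearch.
        opensearch:
          type: triple
```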
Log collection
The `fluentbit` and `fluentd` stack is managed and configured using the Logging Operator (formerly Banzai Cloud Logging Operator). The operator provides some CRDs (Custom Resource Definitions), including:
- Flows and ClusterFlows: they define which log messages to collect and to which Output/ClusterOutput they must be shipped.
- Outputs and ClusterOutputs: they define the storage destination of Flows/ClusterFlows.
ℹ️ INFO

To simplify the wording, this document will use `Flow` to indicate both `Flows` and `ClusterFlows`, and `Output` to indicate both `Outputs` and `ClusterOutputs`.
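For illustration, here is a hedged sketch of the kind of resources the operator manages. The names, namespace, label selector and Loki endpoint are hypothetical; only the CRD kinds and the general shape follow the Logging Operator's API:

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: my-app            # hypothetical name
  namespace: my-namespace # Flows are namespaced; ClusterFlows are cluster-wide
spec:
  match:
    - select:
        labels:
          app: my-app     # collect only logs from Pods with this label
  localOutputRefs:
    - my-app-output       # ship the matched logs to this Output
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: Output
metadata:
  name: my-app-output
  namespace: my-namespace
spec:
  loki:
    url: http://loki.mycompany:3100  # hypothetical storage endpoint
```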
The Logging module includes the following Flows, each with its respective Output (using the same name):

- `audit`: Kubernetes API server's audit logs. KFD configures the Kubernetes API server by default to record the most relevant security events in the cluster (audit logs).
- `events`: Kubernetes events (equivalent to `kubectl get events`).
- `infra`: logs written by Pods inside the "infra" namespaces (`kube-system`, `logging`, `monitoring`, etc.), which provide infrastructural services for a KFD cluster and are not application workloads. This includes, for example, logs from the logging system itself.
- `ingressNginx`: logs written by Ingress NGINX Controller's Pods inside the cluster. Logs are processed by a parser and fields are mapped into a standardized structure.
- `kubernetes`: logs written by Pods in non-"infra" namespaces. Basically, this Flow includes application workloads' logs.
- `systemdCommon`: logs written by system daemons running on the cluster nodes.
- `systemdEtcd`: logs written by the `etcd` daemons.
- `errors`: logs which cannot be processed by the logging stack are sent to an internal MinIO bucket to enable debugging in case of errors. The bucket has a 7-day retention policy.
💡 TIP

Each Flow has its dedicated index in OpenSearch, to provide a simple way to visualize logs from a specific category inside OpenSearch Dashboards.
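As an illustrative sketch of how a per-Flow index could be expressed with a Logging Operator resource (the host, port and `index_name` values here are hypothetical examples, not the module's actual configuration):

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterOutput
metadata:
  name: infra
  namespace: logging
spec:
  opensearch:
    host: opensearch.logging.svc  # hypothetical in-cluster endpoint
    port: 9200
    index_name: infra             # dedicated index for the "infra" Flow
    buffer:
      timekey: 1m                 # flush a chunk of buffered logs every minute
      timekey_wait: 30s
```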
Log storage
By default, Flows are sent to a centralized log storage deployed inside the cluster. The Logging module provides two options for this storage:
- OpenSearch (default): forked from Elasticsearch, it's an open source, distributed suite that provides storage, analytics and querying capabilities for data. KFD can install OpenSearch with a single replica, which is suitable for dev/test environments, or with 3 replicas to provide high availability, which is more suitable for production environments.

  OpenSearch can be compared to a database: it uses an index system for the data and enables full-text search on logs. OpenSearch can be vertically scaled to provide more computational power when needed, for example when the quantity of ingested logs grows.

- Grafana Loki: Loki, from Grafana Labs, is a highly available, distributed system that enables log aggregation. It can be horizontally scaled and provides multi-tenancy.

  Loki uses a time-series database, saves chunks of data in an S3-compatible object storage system and does not use an index system. Loki can filter logs using labels (similar to Prometheus) and, thanks to its S3 interface, it lets you save more data and gives easier access to older logs compared to OpenSearch.

  On the downside, being a distributed system based on micro-services, Loki has more components to manage.
Remote Storage
When configuring a KFD cluster using the `furyctl.yaml` file, if you set the Logging module's `type` to `customOutputs`, the Logging Operator will be installed with pre-configured Flows without Outputs, which have to be customized by the user, and no storage solution will be installed inside the cluster.
Outputs are defined inside the same `furyctl.yaml` file and you can specify any Output type supported by the Logging Operator. For example, you can send one Flow to a syslog server, another Flow to an off-cluster Loki instance, and all other Flows to an S3 bucket in AWS. You can also choose not to send logs from a Flow anywhere using the `nullout` option.
Example:

```yaml
spec:
  distribution:
    modules:
      logging:
        type: customOutputs
        customOutputs:
          audit: |-
            syslog:
              host: SYSLOG-HOST
              port: 123
              buffer:
                timekey: 1m
                timekey_wait: 10s
                timekey_use_utc: true
          events: |-
            nullout: {}
          infra: |-
            s3:
              aws_key_id:
                value: minio
              aws_sec_key:
                value: minio123
              s3_bucket: infra
              s3_region: local
              s3_endpoint: 'http://minio.mycompany:9000'
              force_path_style: 'true'
              path: logs/${tag}/%Y/%m/%d/
              buffer:
                timekey: 10m
                timekey_wait: 30s
                timekey_use_utc: true
          ingressNginx: |-
            nullout: {}
          kubernetes: |-
            loki:
              url: http://loki.mycompany:3100
              extract_kubernetes_labels: true
              configure_kubernetes_labels: true
              extra_labels:
                flow: kubernetes
              buffer:
                timekey: "1m"
                timekey_wait: "10s"
                timekey_use_utc: true
                chunk_limit_size: "2m"
                retry_max_interval: "30"
                retry_forever: true
                overflow_action: "block"
          systemdCommon: |-
            nullout: {}
          systemdEtcd: |-
            nullout: {}
          errors: |-
            nullout: {}
```
Querying and retrieving logs
Once saved inside the dedicated storage, logs collected by the logging stack can be retrieved to be visualized and queried with a UI.
For OpenSearch-based logging, KFD provides OpenSearch Dashboards for visualization, search and query capabilities for both real-time and historical logs with DQL.
On the other hand, with Loki-based logging you can use the same tooling that is used to visualize and query metrics: Grafana. With Grafana you can create custom dashboards showing both the metrics and the logs (and also the traces, if you install the Tracing module) of a component on the same page, using both real-time and historical data with the Explore option and LogQL.
Provided Flows and ClusterFlows
This module provides the following Flows and ClusterFlows out of the box:
- configs/kubernetes: only the cluster-wide Pods logging configuration (infrastructural namespaces excluded).
- configs/infra: only the infrastructural namespaces' logs.
- configs/ingress-nginx: only the nginx-ingress-controller logging configuration.
- configs/audit: all the Kubernetes audit logs related configurations (with master selector and tolerations).
- configs/events: all the Kubernetes events related configurations (with master selector and tolerations).
- configs/systemd: all the systemd related configurations.
- configs/systemd/kubelet: kubelet, docker, ssh systemd service logs configuration.
- configs/systemd/etcd: only the etcd service logs configuration (with master selector and tolerations).
Read more
You can find more info on logging in Kubernetes at the following links:
- https://kubernetes.io/docs/concepts/cluster-administration/logging/
- https://kubernetes.io/docs/concepts/cluster-administration/system-logs/
- https://kube-logging.dev/
- https://opensearch.org/
- https://grafana.com/docs/loki/latest/
Footnotes
1. Grafana is a component of KFD's Monitoring module, so it's not included in the Logging module. The two modules are configured to integrate nicely with each other out of the box in KFD. ↩