Version: 1.28.6

Kubernetes Fury Tracing

Overview

Kubernetes Fury Tracing uses a collection of open source tools to provide the most resilient and robust tracing stack for the cluster.

The module contains the [Tempo][tempo-page] tool from Grafana.

Module's repository: https://github.com/sighupio/fury-kubernetes-tracing

Packages

Fury Kubernetes Tracing provides the following packages:

| Package           | Description                                                                       |
| ----------------- | --------------------------------------------------------------------------------- |
| tempo-distributed | Distributed Tempo deployment, with an optional MinIO instance as storage backend. |
info

All the components are deployed in the tracing namespace in the cluster.

Introduction: Tracing in Kubernetes

Observability enables understanding (through "observation") a system from the outside, without knowing its implementation details. Moreover, it makes troubleshooting easier by providing an answer to the question "why is this happening?".

To make such observations on a system, the applications that compose it must be correctly instrumented, meaning that they emit traces, metrics, and logs. An application is correctly instrumented when developers do not need to add any further instrumentation to troubleshoot it, because it already emits all the necessary information.

Distributed Tracing

Distributed tracing makes it possible to observe network requests as they propagate through complex, distributed systems. It provides better visibility into the health of the applications that compose a distributed system and enables debugging behaviours that can be hard to reproduce locally. It is essential for troubleshooting issues in a microservices architecture, where problems typically show non-deterministic or complex behaviours that are hard to debug in any environment other than the one showing them.

To fully understand distributed tracing, you must have a basic knowledge about its components: logs, spans, and traces.

Logs

A log message is a timestamped message emitted by application services or components. Log messages are not necessarily tied to a specific user request: they are written whenever a specific line of code tells the application to write them. Logs can be found in pretty much every application, as they are one of the oldest ways to debug issues in software.

Here is a simple example of a log message:

I, [2021-02-23T13:26:23.505892 #22473]  INFO -- : [6459ffe1-ea53-4044-aaa3-bf902868f730] Started GET "/" for ::1 at 2021-02-23 13:26:23 -0800

Log messages are not enough to trace code execution by themselves, as they generally do not provide context, such as the originating request. They become much more helpful when they are included inside a span, or correlated with a span and a trace.
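As an illustration (the field names and identifiers here are hypothetical, not tied to any specific logging library), a log record becomes correlatable once it carries the identifiers of the surrounding trace and span:

```python
import json
from datetime import datetime, timezone

def make_log_record(message: str, trace_id: str, span_id: str) -> str:
    """Render a structured log line that embeds tracing identifiers."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": "INFO",
        "message": message,
        # Correlation fields: a tracing backend can join this line to its span.
        "trace_id": trace_id,
        "span_id": span_id,
    }
    return json.dumps(record)

print(make_log_record('Started GET "/"', "6459ffe1ea534044aaa3bf902868f730", "00f067aa0ba902b7"))
```

With those two fields in place, a backend can jump from any log line to the exact operation that produced it, and vice versa.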

Spans

Spans represent a unit of work or an operation. They trace the operations triggered by a specific request, showing the full picture of what happened while the request was being executed.

Spans typically include a name, timestamps, structured log messages and other metadata to provide context about one operation.

Span attributes

Span attributes are metadata attached to the span itself. The following table contains some examples of attributes:

| Key                        | Value                                                                            |
| -------------------------- | -------------------------------------------------------------------------------- |
| http.request.method        | "GET"                                                                             |
| network.protocol.version   | "1.1"                                                                             |
| url.path                   | "/webshop/articles/4"                                                             |
| url.query                  | "?s=1"                                                                            |
| server.address             | "example.com"                                                                     |
| server.port                | 8080                                                                              |
| url.scheme                 | "https"                                                                           |
| http.route                 | "/webshop/articles/:article_id"                                                   |
| http.response.status_code  | 200                                                                               |
| client.address             | "192.0.2.4"                                                                       |
| client.socket.address      | "192.0.2.5" (this particular client is going through a proxy)                     |
| user_agent.original        | "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"  |
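In code, span attributes are plain key-value pairs attached to a span; a minimal sketch (the keys follow the OpenTelemetry semantic conventions shown above, and such a mapping would typically be passed to the tracing SDK when the span is created):

```python
# Span attributes are just key-value metadata; keys here follow the
# OpenTelemetry semantic conventions from the table above.
span_attributes = {
    "http.request.method": "GET",
    "network.protocol.version": "1.1",
    "url.path": "/webshop/articles/4",
    "url.query": "?s=1",
    "server.address": "example.com",
    "server.port": 8080,
    "url.scheme": "https",
    "http.route": "/webshop/articles/:article_id",
    "http.response.status_code": 200,
    "client.address": "192.0.2.4",
}
```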

Distributed traces

A distributed trace, often referred to simply as a trace, records the full path of a request as it propagates through a distributed architecture, such as microservices and/or serverless applications.

A trace is composed of one or more spans. The first span is also called the root span, and represents the point where all subsequent requests originate. Child spans provide deeper context on what happened during the request (or which steps compose a specific operation).

Without tracing, root cause analysis on issues that arise in a distributed system can be quite hard. Tracing enables a streamlined process for debugging issues in such cases, providing data that correlates different operations with each other as they flow inside the system.

Many observability backends show traces in the form of waterfall diagrams that show parent-child relations between spans.
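To make the parent-child relation concrete, here is a minimal, self-contained sketch (hypothetical data, not a real tracing SDK): a trace is modelled as a list of spans, the root span has no parent, and the waterfall view is just an indented depth-first walk of the tree:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Span:
    span_id: str
    name: str
    parent_id: Optional[str]  # None marks the root span
    duration_ms: int

# A hypothetical trace for one webshop request.
trace = [
    Span("a1", "GET /webshop/articles/4", None, 120),
    Span("b2", "auth-service: check session", "a1", 15),
    Span("c3", "catalog-service: load article", "a1", 80),
    Span("d4", "SELECT articles", "c3", 60),
]

def waterfall(spans, parent_id=None, depth=0):
    """Yield spans depth-first, indented like a waterfall diagram."""
    for s in spans:
        if s.parent_id == parent_id:
            yield f"{'  ' * depth}{s.name} ({s.duration_ms} ms)"
            yield from waterfall(spans, s.span_id, depth + 1)

print("\n".join(waterfall(trace)))
```

The indentation mirrors what an observability backend draws: the root span at the top, each child nested under the span that triggered it.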

Summary

Tracing makes use of traces to keep track of requests flowing inside a complex, distributed system and to have a clear picture of that application's health. Each component of this system (such as microservices) needs to be correctly instrumented to emit traces that can be correlated with each other by a tracing backend.

There are different ways to instrument applications, different protocols to be used and different backends to store and query traces. OpenTelemetry, a CNCF open-source project, provides a framework and a toolkit to instrument applications in a consistent way. Some options for storing and visualizing traces are Jaeger and [Grafana Tempo][tempo-page].

Some commercial solutions come with their own proprietary protocols, agents and instrumenting.

KFD: Tracing module

As explained in the Introduction chapter, application instrumentation is in charge of creating and sending traces to a backend. This backend can live either in the same Kubernetes cluster as the originating application or outside of it. The requirements are network reachability and protocol compatibility between the different components' traces, which can be achieved using, for example, OpenTelemetry.

The KFD Tracing module provides an in-cluster tracing backend to store and visualize traces. It is based on [Grafana Tempo][tempo-page], which is responsible for the collection and storage of traces, and uses Grafana to visualize the saved data¹.

Grafana Tempo is an open-source distributed tracing backend that is simple to use and highly scalable. Tempo is cost-efficient, as it only requires an object storage to work, and it integrates perfectly with Grafana, Prometheus and Loki. It is compatible with the most common open-source tracing protocols, such as Jaeger, Zipkin and OpenTelemetry.

The KFD Tracing module installs Tempo in its distributed variant, also called "microservices mode". It provides 3 ingesters by default, and its microservices are configured to scale automatically when needed.

Architecture

Grafana Tempo architecture

In its distributed variant, Tempo includes the following components:

  • Distributor: it collects spans in different formats (protocols), processes them, adds a hash and sends them to the Ingesters. The OTLP protocol has the best performance.

  • Ingester: it batches traces into blocks, creates filters and indexes for them, and sends them to the backend.

  • Query Frontend: it is responsible for sharding the search space of incoming queries.

  • Querier: it is responsible for finding the requested trace ID in either the Ingesters or the storage backend.

  • Compactor: as the name suggests, it compacts blocks to reduce the total number of blocks in the storage backend.

  • Storage: Tempo supports using an S3-compatible object storage as backend. The KFD Tracing module can be configured to use an external storage facility or to create an internal MinIO cluster.
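The Distributor's hashing step can be pictured with a toy sketch (purely illustrative, not Tempo's actual ring implementation): the trace ID is hashed, and the hash decides which Ingester receives the spans, so all spans of one trace land on the same Ingester:

```python
import hashlib

INGESTERS = ["ingester-0", "ingester-1", "ingester-2"]  # the module's default of 3

def pick_ingester(trace_id: str) -> str:
    """Hash the trace ID and map it onto one of the Ingesters."""
    digest = hashlib.sha256(trace_id.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(INGESTERS)
    return INGESTERS[index]

# Every span of a given trace is routed consistently to the same Ingester:
assert pick_ingester("6459ffe1ea53") == pick_ingester("6459ffe1ea53")
```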

Sending traces to Tempo

To send traces to Tempo, instrumented applications need to be configured with the endpoint of the Tempo Distributor, the Tempo component responsible for collecting traces. The default URL is:

tempo-distributed-distributor.tracing.svc.cluster.local:4317
note

4317 is the port for the OpenTelemetry Protocol (OTLP). The Distributor supports other protocols as well, but OTLP is recommended for performance reasons.
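For example, an application instrumented with the OpenTelemetry SDK can be pointed at the Distributor through the standard OpenTelemetry environment variables, shown here as plain key-value pairs as they would appear in a workload's Deployment manifest (the service name is a hypothetical example):

```python
# Standard OpenTelemetry SDK environment variables; the endpoint is the
# in-cluster Distributor service on the OTLP/gRPC port.
otel_env = {
    "OTEL_EXPORTER_OTLP_ENDPOINT":
        "http://tempo-distributed-distributor.tracing.svc.cluster.local:4317",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",  # 4317 is the OTLP/gRPC port
    "OTEL_SERVICE_NAME": "webshop",         # hypothetical service name
}
```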

warning

For production workloads, it is better to use something like the OpenTelemetry Collector instead of pushing traces directly to Tempo: the application can offload the traces quickly, minimizing the impact on its performance.


Visualizing traces

In a KFD cluster where the Tracing module has been installed and applications have been correctly instrumented, you can visualize the tracing data following these steps:

  1. Open the Grafana web UI of the cluster that hosts the applications, usually exposed with the https://grafana.internal.<cluster's name>/ URL, and login.
  2. Click on the "hamburger menu" at the top-left and select "Explore".
  3. On the Explore page, select "Tempo" as the data source in the first dropdown.
  4. Write a query or use the "Search" functionality to filter traces, and click "Run Query".
  5. Traces matching the filter will be shown in the center panel.
  6. Clicking on one of the traces opens a panel on the right with more details.

Read more

You can find more info on tracing in Kubernetes at the following links:

Footnotes

  1. Grafana is one of the Monitoring module's components and it's not included in the Tracing module.