Florian Ernst
13 March 2023 · 6 min read

Kubernetes Service Meshes - Cilium

Cilium as a Kubernetes Service Mesh: how to install, and how to get some insights out of it.

This is one in a small series of blog posts about Kubernetes Service Meshes, please check there for reference. While the previous post started out by introducing the general picture and the basic concepts, here I will deal with a specific implementation and its respective details.

Much of the content presented here is covered in more detail as part of the Novatec training “Docker und Kubernetes in der Praxis”. And of course all the technologies shown have more features than merely those presented here; I will only provide a glimpse into them.


I’ll first briefly talk about eBPF, a technology utilized by Cilium, as that is the differentiating factor when compared to classical Service Meshes as I will further show below.

eBPF overview

The extended Berkeley Packet Filter (eBPF) allows one to load and run custom programs sandboxed within the Linux kernel to extend or modify the way the kernel behaves, and it is thus ideally suited to dynamically implement observability, security, and networking functionality. Which is just what a Service Mesh strives to do, incidentally.

Implementing all of this at the kernel level saves the user the overhead of plugging Service Mesh functionality into each workload individually, allowing for a smoother and more encompassing user experience. And thanks to eBPF this can be achieved on the fly, without the need to load additional kernel modules or rewrite the kernel itself.

But of course this interface to the kernel has its limitations concerning the complexity and size of the custom programs to be added. So while eBPF provides the groundwork for affecting the kernel, additional user-space components are needed to handle further functionality.

Further information about eBPF can also be found in “What is eBPF?” by Liz Rice of Isovalent.


Cilium is fully open source, licensed under Apache License 2.0, and is a Cloud Native Computing Foundation incubating project.

Using eBPF as a foundation, Cilium provides both a Container Network Interface (CNI) to be plugged into a Kubernetes cluster, and now also a Service Mesh implementation with user space components interacting with the kernel via eBPF programs. This allows integrating workloads into a Service Mesh without the need for adding sidecar proxies, and thus enables a sidecar-less model.

In addition there is Hubble – a fully distributed networking and security observability platform built on top of Cilium and eBPF.

Isovalent, the original creators of Cilium, offer support, training, and enterprise products around the open-source Cilium tool.



As per Cilium Quick Installation I can download the CLI directly, both for Cilium and also for Hubble, and install the CLI tools to my local system. At the time of this writing these are versions 1.12.13 and 0.11.2, respectively.
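A download along the lines of the official quick-installation instructions might look like this (a sketch assuming a Linux amd64 host; adjust OS and architecture as needed):

```shell
# Fetch and install the Cilium CLI (Linux amd64 assumed)
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
curl -L --remote-name-all "https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-amd64.tar.gz"
sudo tar xzvf cilium-linux-amd64.tar.gz -C /usr/local/bin

# Fetch and install the Hubble CLI (Linux amd64 assumed)
HUBBLE_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/hubble/master/stable.txt)
curl -L --remote-name-all "https://github.com/cilium/hubble/releases/download/${HUBBLE_VERSION}/hubble-linux-amd64.tar.gz"
sudo tar xzvf hubble-linux-amd64.tar.gz -C /usr/local/bin
```

The `stable.txt` files track the latest stable release; pin the version numbers mentioned above instead if you want to reproduce this setup exactly.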

Yes, Cilium v1.13 has just been released, and I’m looking forward to actually testing it, but not just yet.


Using this CLI I can then deploy the Cilium components into a Kubernetes cluster. However, due to Cilium’s use of eBPF that cluster needs to fulfill certain requirements, making brownfield deployments somewhat harder to achieve. For my testing I will use Minikube configured as follows:
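A Minikube invocation along these lines satisfies Cilium’s requirements by leaving out a default CNI plugin (a sketch; the memory setting is an assumption and depends on your environment):

```shell
# Start a Minikube cluster without a preinstalled CNI so Cilium can take over
minikube start --network-plugin=cni --cni=false --memory=4096
```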

And once this cluster is ready I can start installing Cilium, already with metrics exposure enabled for later observability:
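The install command might look like this (a sketch; the Helm values follow the Cilium documentation for Prometheus integration):

```shell
# Install Cilium into the current cluster with Prometheus metrics enabled
cilium install \
  --helm-set prometheus.enabled=true \
  --helm-set operator.prometheus.enabled=true
```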

which might then yield output like the following:

And I will also activate the Hubble UI with metrics exposure enabled for later observability, and deploy the matching Prometheus & Grafana example setup:
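A sketch of these two steps, assuming the metrics list from the Cilium documentation and the monitoring example shipped with the Cilium 1.12 sources:

```shell
# Enable Hubble including its UI, with a set of HTTP-aware metrics
cilium hubble enable --ui \
  --helm-set hubble.metrics.enabled="{dns,drop,tcp,flow,icmp,http}"

# Deploy the matching Prometheus & Grafana example setup
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/v1.12/examples/kubernetes/addons/prometheus/monitoring-example.yaml
```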

As indicated above I can display the installation health status, or go under the hood for even more details:
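For example:

```shell
# Overall health of the Cilium installation
cilium status

# Under the hood: detailed agent-level status from inside a Cilium pod
kubectl -n kube-system exec ds/cilium -- cilium status --verbose
```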

Cilium also provides an extensive self-test for validating that my cluster has proper network connectivity, which can also interface with Hubble:
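The self-test is a single command:

```shell
# Run Cilium's connectivity self-test against the current cluster
cilium connectivity test
```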

which might easily take 15 minutes to finish and which then will yield output like the following:

Now that the Cilium control plane is running in my cluster I can start integrating workloads.

Integrate workloads

I will now deploy two sample applications that together build a simple todo list. First a backend:
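A deployment sketch; the image name and port are hypothetical placeholders standing in for the actual sample application:

```shell
# Hypothetical backend deployment and service (image name is a placeholder)
kubectl create deployment todobackend --image=novatec/todobackend:latest --port=8080
kubectl expose deployment todobackend --port=8080
```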

And then a frontend:
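Again a sketch with a hypothetical image name and port; the annotation format follows Cilium’s proxy-visibility convention of `<direction/port/L4 protocol/L7 protocol>`:

```shell
# Hypothetical frontend deployment and service (image name is a placeholder)
kubectl create deployment todoui --image=novatec/todoui:latest --port=8090
kubectl expose deployment todoui --port=8090 --type=NodePort

# Annotate the pods so Cilium's proxy parses their traffic as HTTP
kubectl annotate pod -l app=todoui \
  io.cilium.proxy-visibility="<Ingress/8090/TCP/HTTP>,<Egress/8080/TCP/HTTP>"
```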

Please note that I have already added a Cilium-specific Pod annotation io.cilium.proxy-visibility that will help Hubble to understand the traffic.

Examine the meshed workloads

When I now examine the newly-deployed workloads I can check what Cilium might automatically have added:
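A quick look at the pods suffices:

```shell
# Each pod still shows a single container (READY 1/1): nothing was injected
kubectl get pods
```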

Huh, no sidecar proxies have been added! So is Cilium not active, or what does this mean? Well, let’s put some load on the deployed todo app, so I issue in a separate command window:
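A load loop might look like this, assuming the frontend is exposed via a NodePort service named `todoui` on Minikube:

```shell
# Determine the frontend URL and generate continuous traffic against it
URL=$(minikube service todoui --url)
while true; do curl -s "$URL" > /dev/null; sleep 1; done
```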

Of course feel free to also access the todo app frontend pointing your browser of choice at that URL. What does Cilium know about this traffic now?

Let’s check with Hubble:
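For instance by observing the flows of the default namespace:

```shell
# Make the Hubble relay reachable locally, then follow the flows
cilium hubble port-forward &
hubble observe --namespace default --follow
```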

and I can see output like the following

So yes, the curl loop from the separate command window can be traced:

  • line 1: a GET on / reaches the todoui pod via its service
  • lines 2 and 3: this pod queries DNS for the todobackend service and receives a response
  • lines 4 and 5: the todoui pod sends a GET on /todos/ to the todobackend pod and receives a response with status 200
  • line 6: the todoui pod sends a response with status 200 back to curl

Thus so far so good: thanks to the power of eBPF Cilium is able to gain insight into my traffic without having to deploy any sidecar proxies.


Without any sidecar proxies it appears there is no overhead associated per pod. However, even if I cannot see it, the additional data gathering takes effort.

Due to the efficiency of eBPF this overhead remains low, though, allowing Cilium to provide great network performance in such a sidecar-less model, as shown in a CNI benchmark.

Of course there remains the control plane as well as the various helper applications:
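These can be listed as follows (the `cilium-monitoring` namespace is where the Prometheus & Grafana example setup lands):

```shell
# Cilium control plane and Hubble components
kubectl get pods -n kube-system | grep -E 'cilium|hubble'

# The monitoring helper applications
kubectl get pods -n cilium-monitoring
```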

Access the Hubble UI

I might prefer a different interface for accessing all this data on the fly. For that I can easily utilize the Hubble UI, i.e. a web interface.

So I’ll open a port-forwarding to this service:
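With Hubble UI enabled as above, its service lives in the kube-system namespace:

```shell
# Forward the Hubble UI service to localhost:8081
kubectl -n kube-system port-forward service/hubble-ui 8081:80
```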

Now I will be able to access this through my local browser at http://localhost:8081/, where I then can select the “default” namespace for viewing, and I will see the application like in the following picture:

So all I could query via CLI I will find here as well.

And yes, there also exists a cilium hubble ui command, but the above way via kubectl could be used even if the cilium binary were not available locally, hence the mention.

Traffic shaping

Now that I have made use of some basic functionality commonly provided by a Service Mesh, i.e. network observability, I’d like to start steering my traffic. That is, I will lock down access to the UI of my sample application, and in doing so go beyond what is already easily possible using standard Network Policies.

Apply a Cilium Network Policy

That UI component has a certain flaw intentionally built in: calls to a “/stress/” endpoint will cause just that, i.e. the component will uselessly burn some CPU cycles. To demonstrate this, I can cancel the while loop from above and issue in the same command window:
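A sketch, again assuming the frontend is exposed as a `todoui` service on Minikube:

```shell
# Hit the intentionally flawed endpoint and time the response
URL=$(minikube service todoui --url)
time curl "$URL/stress/"
```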

This access can then be seen in Hubble:

So the request was indeed served, but it took about 3.7 seconds.

Now I want to lock down access to my UI component to prevent abusing this “/stress/” endpoint. So I apply a L7-aware CiliumNetworkPolicy as follows:
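A policy along these lines does the job; note that the `app=todoui` label and port 8090 are assumptions matching a hypothetical frontend deployment:

```shell
# L7-aware policy: only allow GET on "/" towards the todoui pods
kubectl apply -f - <<EOF
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: todoui-allow-get-root
spec:
  endpointSelector:
    matchLabels:
      app: todoui
  ingress:
  - fromEntities:
    - all
    toPorts:
    - ports:
      - port: "8090"
        protocol: TCP
      rules:
        http:
        - method: GET
          path: "/"
EOF
```

Everything not matching the `rules.http` list is rejected by Cilium’s proxy.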

If I now retry my access from above I see a different result:

“Access denied”, huh! And this is represented in Hubble as follows:

So the request was intercepted by Cilium and then simply “DROPPED”. For once I apply a CiliumNetworkPolicy to a resource, only what is explicitly whitelisted is allowed, and everything else is denied.

Of course I would have largely crippled my UI component if all I allowed were GET on “/”, so I’d have to add paths and methods as specifically required by my application, e.g. also allowing POST on “/” etc. in the rules.http of my CiliumNetworkPolicy.

But this example serves to illustrate one feature I have gained by deploying Cilium into my cluster, just to give a taste of what is possible.

Reality check and further outlook

Gaining insight into my in-cluster network without having to deploy sidecar proxies along my containers does have its appeal. But I must not forget that even though I don’t see some of the complexity involved anymore, it is still present, albeit handled at a different place.

For instance, the eBPF programs injected into the kernel for the purposes of the Cilium Service Mesh bear a heavy load, and they must never fail or malfunction, as that might jeopardize overall system stability. Owing to this consideration, eBPF programs are currently limited in size to one million instructions (CPU instructions, that is, not lines of code). And before loading such a program the verifier component needs to evaluate all possible program states and branches, making it all too easy to reach this complexity limit.

As such, what can be done with eBPF programs is limited. And furthermore, observing and debugging these programs at runtime poses a considerable challenge, too.

Due to this, Cilium still relies on user-space components in addition to eBPF programs for much of the heavy lifting. These components no longer need to be deployed per pod as with classic Service Meshes, but they still run per cluster node as per-host proxies, which blurs the previously clear separation between individual workloads now that Service Mesh logic for all of them is handled together on the cluster host. This leads to a perception of less workload isolation, combined with a potentially higher blast radius in case of any problems.

Also, as of now Cilium does not yet offer some of the advanced features, or the easy configuration thereof, that I could find with more established Service Meshes like Istio; among these are full mTLS, network retries and timeouts, and fault or delay injection. The newly released Cilium 1.13 addresses this concern to a great degree, but that still remains to be evaluated.


Cilium provides a very welcome additional option in the field of Service Meshes, with its sidecar-less model. In its current form it already readily provides the basics one would expect from a Service Mesh, and more, such as:

  • cluster-internal network observability
  • cluster-internal traffic control

In doing so, Cilium remains rather resource-efficient due to building on top of eBPF, so if a Service Mesh is needed and its feature set suffices Cilium will be a good choice.

Image Sources: (c) Samuel Zeller on Unsplash
