If you are a performance engineer, network engineer, or security engineer, there is a good chance you will run into eBPF sooner or later. eBPF already has a huge community of users, and big players like Meta, Google, Cloudflare, and Netflix all use it in their daily operations.
Let me start the blog with a real story. About a year ago, a friend called me to talk tech, which is a very common thing between us. We share the technical challenges that we or our peers face at work, and these discussions turn into informative and creative knowledge-sharing sessions. In one such discussion, he described a specific challenge faced by his cousin, who works for a giant cloud provider. The challenge was to block certain IPs dynamically because they posed threats such as DoS (denial-of-service) attacks. The application developer’s brain in me impetuously replied that this should be handled at the firewall level, or that a middleware could check the origin of each packet, maintain a blacklist of malicious senders, and ignore their requests. (Yes, I come from a NodeJS and Go background, so the first solution that strikes me is a middleware.)
My friend patiently explained the scale and performance at which this needed to run, which was way beyond what I had imagined. After a noob’s doubt-clearing session, we agreed that the scale he wanted could only be achieved at the kernel level. I (sarcastically) wished him luck with writing a kernel patch, raising a PR, and hoping the kernel maintainers would include it in an upcoming release so that he could use the feature once it shipped. In reply to my sarcasm, he sent me a link to an article about something called “eBPF” (extended Berkeley Packet Filter). I skimmed the article, and it dawned on my ignorant mind that there are amazing inventions in the tech world that I am unaware of.
With eBPF, you can inject code directly into the kernel without writing a patch and waiting for the kernel maintainers to approve it.
“RUN YOUR CUSTOM CODE DIRECTLY IN OS KERNEL” - LIZ RICE
“SUPERPOWERS FOR LINUX” - BRENDAN GREGG
NB: I have added some video and blog links in the reference section; please check them out for some amazing talks and blog posts on eBPF.
eBPF came to life in 2014, introduced in Linux kernel 3.18, thereby unlocking the God mode of the Linux kernel. The natural question for anyone reading this blog is about the name. If this is an “extended Berkeley Packet Filter”, then there should be a plain BPF, a “Berkeley Packet Filter”. Well, you are right. The BSD Packet Filter is not a new concept; it dates from the early 90’s. This gem has been hiding under the radar for years; the Xennials were true innovators. BPF was very basic, and its only job was to filter packets at the kernel level, hence the name.
NB: I have added the original BPF paper, dated December 19, 1992, to the reference section; it is a very interesting read.
eBPF has come a long way from BPF: from a simple packet-filtering utility to something people compare to a microservices architecture for the kernel, or, as some call it, a microkernel. All the top tech companies that work at scale use eBPF on a daily basis. The CNCF community lives and breathes eBPF nowadays; if you are a DevOps engineer or sysadmin, you will have heard of Cilium and Falco, both popular production tools among Kubernetes users and both relying on eBPF. In 2018 the kernel community proposed bpfilter, an eBPF-based replacement for the kernel’s iptables implementation (well, replacing iptables with almost anything would be an improvement). The drawbacks of iptables are out of the scope of this article; please go to the reference section for a well-written article about them. Kubernetes has used iptables mostly for the following use cases:
- kube-proxy: the component that implements Services and load balancing using DNAT iptables rules
- Most CNI plugins use iptables for Network Policies
Cilium makes this more efficient by using eBPF in place of iptables, whose performance degrades as the number of rules grows. You can refer to the details here.
Program Execution Bozo’s Guide
To explain the importance of eBPF, I first need to explain how programs are executed in Linux. I will try to explain it from a 1000 ft view for everyone.
NB: Windows user? Well, why are you even reading this article? You guys do not have all these cool features.
Linux memory is divided into two regions:
- Kernel space
- User space
The image itself explains the difference between these two. From the kernel’s point of view, the programs that you write boil down to sequences of syscalls, which are the kernel’s APIs. Take the example of opening a file in your favorite programming language: it simply translates into an open syscall (open or openat) in the kernel.
When your application asks the kernel for something, a chunk of data in kernel space is frequently copied into user space. This is necessary because operating systems strictly partition the memory regions used by the kernel, making it impossible to simply hand a user space program a pointer to some region of kernel memory. This is known as “crossing the user/kernel boundary.” Because of the copy, operations like these can have significant performance consequences.
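To make the boundary crossing concrete, here is a minimal sketch in Python. It assumes a Linux or other POSIX system; the helper names `read_via_syscalls` and `demo` are made up for this illustration. The `os.open`, `os.read`, and `os.close` functions are thin wrappers around the corresponding syscalls, so each call below is one trip into the kernel, and `os.read` is the step where the kernel copies file data into a user space buffer.

```python
import os
import tempfile


def read_via_syscalls(path, size=4096):
    """Read up to `size` bytes from `path` using thin syscall wrappers."""
    fd = os.open(path, os.O_RDONLY)  # open/openat syscall: returns a file descriptor
    try:
        # read syscall: the kernel copies up to `size` bytes from kernel
        # space into a buffer owned by this (user space) process.
        return os.read(fd, size)
    finally:
        os.close(fd)  # close syscall: releases the descriptor


def demo():
    """Write a small temporary file, then read it back through syscalls."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello from user space\n")
        path = tmp.name
    try:
        return read_via_syscalls(path)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

If you save this as a script and run it under `strace -e trace=openat,read,close`, you should be able to spot these calls among the ones the interpreter itself makes at startup.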
While syscalls cover almost all cases, there are situations where they are not sufficient, such as when we need kernel-level performance or need to write a new driver. Depending on the kernel maintainers to patch the kernel for every such small use case is slow and impractical. This is where eBPF comes into the picture.
eBPF lets you write programs in user space that get packaged and injected directly into the kernel. These programs run in a VM inside the kernel with a limited instruction set, thus extending the capabilities of the base kernel.
eBPF makes it possible to run custom code in the kernel for purposes such as:
- Observability (tracing)
- Load Balancing
- Network related activity
Anyone who has tried to trace programs in the kernel knows how difficult it is. The half-baked utilities available on Linux systems are not enough to profile complex systems or even to extend the perf tooling.
eBPF is event-driven, which means programs are triggered by events such as:
- A system call
- Function entry/exit
- When a packet enters or leaves
- kprobes or uprobes
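To tie the packet hook back to the IP-blocking problem from the opening story, here is a user space model, written in Python, of the decision such a program could make for every incoming packet. This is only a sketch of the logic, not an eBPF program: a real one would be written in restricted C, attached to a hook such as XDP, and would keep the blocked addresses in a BPF map that user space can update at runtime. The function names (`block`, `unblock`, `filter_packet`, `make_frame`) are hypothetical; the two return codes use the same numeric values as the kernel's XDP actions.

```python
import ipaddress
import struct

# Same numeric values as XDP_DROP and XDP_PASS in the kernel's XDP actions.
XDP_DROP = 1
XDP_PASS = 2

ETH_P_IP = 0x0800       # EtherType for IPv4
ETH_HEADER_LEN = 14     # destination MAC + source MAC + EtherType
IPV4_MIN_HEADER_LEN = 20

# Stand-in for a BPF hash map keyed by IPv4 source address.
blocklist = set()


def block(ip):
    """User space side: add an address to the blocklist at runtime."""
    blocklist.add(int(ipaddress.IPv4Address(ip)))


def unblock(ip):
    """User space side: remove an address from the blocklist."""
    blocklist.discard(int(ipaddress.IPv4Address(ip)))


def filter_packet(frame):
    """Per-packet decision: drop IPv4 packets whose source is blocked."""
    # Bounds check before touching the headers; too-short frames are passed on.
    if len(frame) < ETH_HEADER_LEN + IPV4_MIN_HEADER_LEN:
        return XDP_PASS
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_IP:
        return XDP_PASS
    # The IPv4 source address sits at byte offset 12 of the IP header.
    (src,) = struct.unpack_from("!I", frame, ETH_HEADER_LEN + 12)
    return XDP_DROP if src in blocklist else XDP_PASS


def make_frame(src_ip, dst_ip="10.0.0.1"):
    """Build a minimal Ethernet + IPv4 frame for trying out the filter."""
    eth = bytes(12) + struct.pack("!H", ETH_P_IP)
    ip_header = bytearray(IPV4_MIN_HEADER_LEN)
    ip_header[0] = 0x45  # version 4, header length 5 * 4 bytes
    ip_header[12:16] = ipaddress.IPv4Address(src_ip).packed
    ip_header[16:20] = ipaddress.IPv4Address(dst_ip).packed
    return eth + bytes(ip_header)
```

Calling `block("203.0.113.7")` and then `filter_packet(make_frame("203.0.113.7"))` returns `XDP_DROP`, while frames from any other source still pass. The point of the design is that the lookup is a single hash check per packet, and the list can change without reloading anything.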
The programs are written in a language called restricted C, which is C with a limited set of features. A toolchain such as BCC (the BPF Compiler Collection), which builds on Clang/LLVM, compiles this into bytecode that is loaded into the kernel for execution. Before the program is allowed to run, the kernel’s verifier checks the bytecode at load time to ensure it contains nothing, such as an infinite loop, that could hang or crash the kernel.
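The idea behind that safety check can be illustrated with a toy model. The instruction format below is invented for this sketch and is not eBPF bytecode; the real in-kernel verifier does far more (for example, it also tracks register state and memory accesses), and newer kernels accept loops that it can prove are bounded. The toy rule here is simply: every jump must go forward and stay in range, and the program must end with an exit, so execution can never run forever.

```python
# Assumption: an arbitrary cap on program length, chosen for this toy.
MAX_INSNS = 4096


def verify(program):
    """Return (ok, reason) for a list of toy instruction tuples.

    Toy instructions (hypothetical format):
      ("mov", reg, value)  set a register
      ("jmp", offset)      jump relative to the next instruction
      ("exit",)            finish the program
    """
    if not program:
        return False, "empty program"
    if len(program) > MAX_INSNS:
        return False, "program too long"
    if program[-1][0] != "exit":
        return False, "last instruction must be exit"
    for pc, insn in enumerate(program):
        op = insn[0]
        if op == "jmp":
            offset = insn[1]
            if offset < 0:
                # A backward jump could form a loop that never ends.
                return False, f"backward jump at {pc} (possible loop)"
            target = pc + 1 + offset
            if target >= len(program):
                return False, f"jump out of range at {pc}"
        elif op not in ("mov", "exit"):
            return False, f"unknown instruction {op!r} at {pc}"
    return True, "ok"
```

For example, `verify([("mov", 0, 1), ("jmp", -2), ("exit",)])` is rejected because of the backward jump, while the same program with `("jmp", 0)` is accepted. Since every accepted program only moves forward and ends in an exit, it is guaranteed to terminate.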
Additional Trick Up Your Sleeve
eBPF is indeed a powerful tool to have up your sleeve. When working on high-performance projects, tweaking packets or extending tracing functionality gives you better observability of what’s happening in the system. Even though the chance of an application developer encountering eBPF is slim at present, if you are a performance engineer, network engineer, or security engineer, the chance of you encountering eBPF in the future is going to shoot sky-high.
There are some considerations to keep in mind while writing eBPF programs. There have been several privilege escalation attacks that leverage eBPF, since it runs inside the kernel with elevated privileges. eBPF programs can be a powerful aid when exploiting kernel memory vulnerabilities. Qualys published a detailed writeup of exploiting one such vulnerability, which is linked in the reference section.
As the Spider-Man movies say, “With great power comes great responsibility.” When you unlock the God mode of Linux, you are on your own; the guards that protected your program from corrupting the whole system are no longer available. eBPF has specific use cases; it is not a Swiss Army knife for all your performance issues. The community is pretty huge now, with big players like Meta, Google, Cloudflare, and Netflix all using the tech daily. The tech has loads of potential to grow, and recent years have seen dedicated conferences for eBPF enthusiasts.
This blog serves as a small opening for people who are unaware of this cool tech, so please do your own research. There are tons of resources available online about eBPF and the open source projects being built on top of it. I will be writing a follow-up article detailing how to write a sample eBPF program and execute it.
- The original BPF paper (1992) - https://www.tcpdump.org/papers/bpf-usenix93.pdf
- Brendan Gregg talks about eBPF -
- eBPF over iptables blog - https://cilium.io/blog/2018/04/17/why-is-the-kernel-community-replacing-iptables
- Qualys vulnerability - https://www.qualys.com/2021/07/20/cve-2021-33909/sequoia-local-privilege-escalation-linux.txt