Wouldn’t it be interesting to measure what percentage of the x86 ISA a program actually uses? First of all, we need to define what the term “instruction” even means when it comes to the behemoth that is x86. For a detailed explanation, I highly recommend reading this article. In this post, I’ll follow XED’s definition of an instruction: “Intel® XED classifies instructions as iclasses (ADD, SUB, MUL, etc.) of type xed_iclass_enum_t.” All of these iclasses are defined in xed-iclass-enum.h. At the time of writing, there are 1974 of them (ignoring XED_ICLASS_INVALID and XED_ICLASS_LAST).
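If you want to re-derive that total for your own XED checkout, something along these lines may work. It is only a sketch: count_iclasses is a hypothetical helper name (not part of XED), and it assumes the header lists one enumerator per line, each starting with XED_ICLASS_. The header's location depends on your build directory.

```shell
#!/bin/sh
# count_iclasses: hypothetical helper, not part of XED.
# Assumption: the header lists one enumerator per line, e.g. "  XED_ICLASS_ADD,".
count_iclasses() {
    awk '
        # Take the first token and strip a trailing comma or "=value" part.
        { name = $1; sub(/[,=].*$/, "", name) }
        # Count real iclasses only; skip the two sentinel enumerators.
        name ~ /^XED_ICLASS_[A-Z0-9_]+$/ &&
        name != "XED_ICLASS_INVALID" && name != "XED_ICLASS_LAST" { n++ }
        END { print n + 0 }
    ' "$1"
}
```

Usage: count_iclasses path/to/xed-iclass-enum.h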
Now, all we need to do is build XED, give it an executable, and count the number of unique iclasses it decodes. (By default, XED decodes at most 100 million instructions, so make sure to raise that limit when decoding a large file.) The results below were obtained using the following script:
xed -i file | awk '/^XDIS/ {list[$6]++} END {printf "%.2f%%\n", length(list) / 1974 * 100}'
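To see which instructions make up that percentage, the one-liner can be expanded into a small function. This is a sketch rather than the exact script behind the numbers below: iclass_usage is a hypothetical name, the total defaults to the 1974 quoted above, and it relies on the same assumption as the one-liner, namely that field 6 of every XDIS line names the instruction.

```shell
#!/bin/sh
# iclass_usage: hypothetical helper that reads `xed -i FILE` output on stdin.
# Assumption (same as the one-liner): field 6 of each XDIS line names the instruction.
# Optional argument: total number of iclasses (defaults to 1974).
iclass_usage() {
    awk -v total="${1:-1974}" '
        /^XDIS/ { seen[$6]++ }
        END {
            unique = 0
            for (name in seen) {
                unique++
                # Histogram lines are piped through sort, most frequent first.
                printf "%8d %s\n", seen[name], name | "sort -rn"
            }
            close("sort -rn")   # let sort finish before printing the summary
            printf "%d of %d (%.2f%%)\n", unique, total, unique / total * 100
        }
    '
}
```

Usage: xed -i file | iclass_usage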
- Hello world in C: 0.91%
- Busybox: 5.83%
- Tor: 5.93%
- Go: 10.59%
- LLVM: 13.12%
It turns out that very few x86 instructions are actually used: even something as massive as LLVM only uses 259 of the 1974 iclasses. It is tempting to see this as one reason x86 is so power-hungry compared to ARM or RISC-V, though a count of unused instructions does not by itself show that.