Spedi is a speculative disassembler for the variable-size Thumb ISA. Given an ELF file as input, Spedi can recover assembly instructions, identify functions, and recover switch tables and the call graph.
Spedi works directly on the binary without using symbol information. We found Spedi to outperform IDA Pro in our experiments.
Spedi recovers all possible Basic Blocks (BBs) available in the binary. BBs that share the same jump instruction are grouped in one Maximal Block (MB). Then, MBs are refined using overlap and CFG conflict analysis. Details can be found in our CASES'16 paper "Speculative disassembly of binary code". The paper is available here.
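The grouping step described above can be sketched as follows. This is a minimal illustration and not Spedi's actual code: the `BasicBlock` and `MaximalBlock` structs and the function `group_into_maximal_blocks` are hypothetical names, and the sketch assumes that "sharing the same jump instruction" can be modeled as sharing the address of the terminating branch. The overlap and CFG conflict analysis is not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical, simplified basic block: a half-open address range
// [start, end) whose terminating branch begins at branch_addr.
struct BasicBlock {
    std::uint32_t start;
    std::uint32_t end;
    std::uint32_t branch_addr;
};

// Hypothetical maximal block: every basic block ending in the same branch.
struct MaximalBlock {
    std::uint32_t branch_addr;
    std::vector<BasicBlock> blocks;
};

// Group basic blocks by the address of their terminating branch.
// The result is ordered by branch address because std::map is ordered.
std::vector<MaximalBlock> group_into_maximal_blocks(
        const std::vector<BasicBlock>& bbs) {
    std::map<std::uint32_t, MaximalBlock> by_branch;
    for (const BasicBlock& bb : bbs) {
        // The branch must lie inside the block it terminates.
        assert(bb.start <= bb.branch_addr && bb.branch_addr < bb.end);
        MaximalBlock& mb = by_branch[bb.branch_addr];
        mb.branch_addr = bb.branch_addr;
        mb.blocks.push_back(bb);
    }
    std::vector<MaximalBlock> result;
    for (auto& entry : by_branch) {
        result.push_back(std::move(entry.second));
    }
    return result;
}
```

Two candidate blocks starting at 0x100 and 0x102 that both end in a branch at 0x106 would land in the same maximal block, which is how speculative decodings from different start offsets end up being compared against each other.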
Spedi (almost) perfectly recovers assembly instructions from our benchmark binaries, with an average of 99.96%. In comparison, IDA Pro has an average of 95.83%, skewed by its relatively poor performance on the sha benchmark.
Spedi precisely recovers 97.46% of functions on average. That is, it identifies both the correct start address and the correct end address. Compare that to the 40.53% average achieved by IDA Pro.
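To make the "precise recovery" criterion concrete, here is a small sketch of how such a rate could be computed. It is an illustration only, not the evaluation script used for the paper: `FunctionRange` and `precise_recovery_rate` are hypothetical names, and the sketch assumes a function counts as recovered only when both its start and end addresses match the ground truth exactly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical function record: start address and end address.
struct FunctionRange {
    std::uint32_t start;
    std::uint32_t end;
};

// Fraction of ground-truth functions whose (start, end) pair appears
// exactly among the recovered functions.
double precise_recovery_rate(const std::vector<FunctionRange>& truth,
                             const std::vector<FunctionRange>& recovered) {
    assert(!truth.empty());
    std::set<std::pair<std::uint32_t, std::uint32_t>> found;
    for (const FunctionRange& f : recovered) {
        found.insert(std::make_pair(f.start, f.end));
    }
    std::size_t exact = 0;
    for (const FunctionRange& f : truth) {
        if (found.count(std::make_pair(f.start, f.end)) != 0) {
            ++exact;
        }
    }
    return static_cast<double>(exact) / static_cast<double>(truth.size());
}
```

Under this criterion a function whose start is found but whose end is off by a few bytes does not count, which makes the metric stricter than function-start detection alone.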
A nice property of our technique is that it is also fast and scales well with increasing benchmark size. For example, Spedi disassembles du (50K instructions) in about 150 ms. Note that there is still room for further optimization.
To cite Spedi in an academic work please use:
@inproceedings{BenKhadraSK2016,
author = {Ben Khadra, M. Ammar and Stoffel, Dominik and Kunz, Wolfgang},
title = {Speculative Disassembly of Binary Code},
booktitle = {Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems},
year = {2016},
location = {Pittsburgh, Pennsylvania},
articleno = {16},
doi = {10.1145/2968455.2968505},
acmid = {2968505},
publisher = {ACM},
}
Build the project and try it on one of the binaries in our benchmark suite available in this repository.
The following command instructs spedi to speculatively disassemble the .text section:
$ ./spedi -t -s -f $FILE > speculative.inst
Use the following command to disassemble the .text section based on ARM code mapping symbols, which provide the ground truth about correct instructions:
$ ./spedi -t -f $FILE > correct.inst
The easiest way to compare the two outputs is:
$ diff -y correct.inst speculative.inst |less
Currently, you need to manually modify main.cpp to show results related to switch table and call-graph recovery.
This tool is an academic proof-of-concept. Currently, it's not on our priority list. However, there are certain features that we have in mind for the future, namely:
bx and blx) should be analyzed. This paper provides some related details.
Recently, Andriesse et al. have been working on Nucleus, a tool for function identification in x64 binaries. Their paper "Compiler-Agnostic Function Detection in Binaries" was accepted at IEEE Euro S&P 2017. They use more or less the same function identification techniques implemented in Spedi. If you are interested in x64 support, you can have a look at their tool.
Note, however, that their tool is based on the assumption that recent x64 compilers allocate jump-table data in the .rodata section. That makes instruction recovery significantly easier since it can be done reliably with linear sweep. In comparison, Spedi handles the more general case of mixed code/data using speculative disassembly.
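The difference can be illustrated with a toy linear sweep over Thumb code. This is a sketch under stated assumptions and not part of Spedi: `read_halfword`, `thumb_insn_size` and `linear_sweep` are hypothetical helpers, the input is assumed to be little-endian, and no real decoding is performed. Only the instruction size is derived, using the Thumb-2 rule that a first halfword whose bits [15:11] are 0b11101, 0b11110 or 0b11111 starts a 32-bit instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a little-endian halfword at byte offset off.
inline std::uint16_t read_halfword(const std::vector<std::uint8_t>& code,
                                   std::size_t off) {
    assert(off + 1 < code.size());
    return static_cast<std::uint16_t>(code[off] | (code[off + 1] << 8));
}

// Size in bytes of the Thumb instruction whose first halfword is hw.
inline std::size_t thumb_insn_size(std::uint16_t hw) {
    const unsigned top5 = hw >> 11;
    return (top5 == 0x1D || top5 == 0x1E || top5 == 0x1F) ? 4 : 2;
}

// Linear sweep: walk from offset 0 and return the offset of every
// instruction start, assuming everything in the buffer is code.
std::vector<std::size_t> linear_sweep(const std::vector<std::uint8_t>& code) {
    std::vector<std::size_t> starts;
    std::size_t off = 0;
    while (off + 2 <= code.size()) {
        const std::size_t size = thumb_insn_size(read_halfword(code, off));
        if (off + size > code.size()) {
            break;  // truncated 32-bit instruction at the end of the buffer
        }
        starts.push_back(off);
        off += size;
    }
    return starts;
}
```

For the bytes `00 bf 00 f0 00 f8 70 47` (a 16-bit NOP, a 32-bit BL encoding, then `bx lr`) the sweep reports starts at offsets 0, 2 and 6. If a data halfword `ff ff` is embedded instead, as in `00 bf ff ff 70 47 00 bf`, the sweep treats the data as the first half of a 32-bit instruction and swallows the `bx lr` at offset 4, again reporting 0, 2 and 6. This kind of desynchronization is why linear sweep alone is unreliable on mixed code/data.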
This project depends on the Capstone disassembly library (v3.0.4).