13 changes: 13 additions & 0 deletions .github/workflows/CI.yaml
@@ -15,6 +15,19 @@ jobs:
      - uses: actions-rust-lang/setup-rust-toolchain@v1
      - run: cargo test

  # Check lints with clippy
  clippy:
    name: cargo clippy
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Ensure clippy is installed
      - uses: actions-rust-lang/setup-rust-toolchain@v1
        with:
          components: clippy
      - name: Clippy Check
        uses: clechasseur/rs-clippy-check@v4

  # Check formatting with rustfmt
  formatting:
    name: cargo fmt
9 changes: 8 additions & 1 deletion .gitignore
@@ -56,4 +56,11 @@ generated/
**/out

*.vsix
*.deb
*.deb

# cargo-mutants
mutants*

*.warp
*.sbin
!fixtures/*.warp
42 changes: 9 additions & 33 deletions Cargo.toml
@@ -1,28 +1,13 @@
[package]
name = "warp"
version = "0.1.0"
edition = "2021"
license = "Apache-2.0"
[workspace]
resolver = "2"
members = [
"rust",
"rust/fuzz",
"warp_cli"
]

[lib]
path = "rust/lib.rs"

[dependencies]
flatbuffers = "24.3.25"
bon = "2.3.0"
uuid = { version = "1.11.0", features = ["v5"]}
rand = "0.8.5"
flate2 = "1.0.34"

[features]
default = []
gen_flatbuffers = ["dep:flatbuffers-build"]

[dev-dependencies]
criterion = "0.5.1"

[build-dependencies]
flatbuffers-build = { git = "https://github.com/emesare/flatbuffers-build", features = ["vendored"], optional = true }
[workspace.dependencies]
warp = { path = "rust" }

[profile.release]
panic = "abort"
@@ -31,12 +16,3 @@ debug = "full"

[profile.bench]
lto = true

[[example]]
name = "simple"
path = "rust/examples/simple.rs"

[[bench]]
name = "void"
path = "rust/benches/void.rs"
harness = false
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,4 +1,4 @@
Copyright 2020-2024 Vector 35 Inc.
Copyright 2020-2025 Vector 35 Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
86 changes: 71 additions & 15 deletions README.md
@@ -15,16 +15,16 @@ common functions within any binary efficiently and accurately.

### Integration Requirements

To integrate with **WARP** function matching you must be able to:
To integrate with **WARP** function matching, you must be able to:

1. Disassemble instructions
2. Identify basic blocks that make up a function
3. Identify register groups with implicit extend operation
4. Identify relocatable instructions (see [What is considered a relocatable operand?](#what-is-considered-a-relocatable-operand))
4. Identify relocatable instructions (see [What is considered a relocatable instruction?](#what-is-considered-a-relocatable-instruction))

### Creating a Function GUID

The function GUID is the UUIDv5 of the basic block GUID's (sorted highest to lowest start address) that make up the function.
The function GUID is the UUIDv5 of the basic block GUIDs (sorted highest to lowest start address) that make up the function.

#### Example

@@ -56,44 +56,66 @@ function = uuid5(function_namespace, bb1.bytes + bb2.bytes + bb3.bytes)

#### What is the UUIDv5 namespace?

The namespace for Function GUID's is `0192a179-61ac-7cef-88ed-012296e9492f`.
The namespace for Function GUIDs is `0192a179-61ac-7cef-88ed-012296e9492f`.
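
As a rough illustration of the function GUID description above, the following dependency-free Rust sketch derives a UUIDv5 from a list of basic block GUIDs. It is not the crate's implementation (the crate uses the `uuid` dependency). The helper names `sha1`, `uuid_v5`, `function_guid` and `format_guid`, the `(start address, GUID)` tuple layout, and the sample GUID values are all assumptions made for this example. The sort order follows the README wording literally (highest start address first) and should be checked against the reference implementation.

```rust
/// Minimal SHA-1 (FIPS 180-4), included only to keep the sketch dependency-free.
fn sha1(data: &[u8]) -> [u8; 20] {
    let mut h: [u32; 5] = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0];
    // Padding: 0x80, zeros up to 56 mod 64, then the bit length as a big-endian u64.
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&((data.len() as u64) * 8).to_be_bytes());
    for block in msg.chunks(64) {
        let mut w = [0u32; 80];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }
        let [mut a, mut b, mut c, mut d, mut e] = h;
        for i in 0..80 {
            let (f, k): (u32, u32) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5A827999),
                20..=39 => (b ^ c ^ d, 0x6ED9EBA1),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8F1BBCDC),
                _ => (b ^ c ^ d, 0xCA62C1D6),
            };
            let t = a.rotate_left(5).wrapping_add(f).wrapping_add(e).wrapping_add(k).wrapping_add(w[i]);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = t;
        }
        h[0] = h[0].wrapping_add(a);
        h[1] = h[1].wrapping_add(b);
        h[2] = h[2].wrapping_add(c);
        h[3] = h[3].wrapping_add(d);
        h[4] = h[4].wrapping_add(e);
    }
    let mut out = [0u8; 20];
    for (i, v) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&v.to_be_bytes());
    }
    out
}

/// UUIDv5 (RFC 4122): SHA-1 over the namespace bytes followed by the name.
fn uuid_v5(namespace: u128, name: &[u8]) -> u128 {
    let mut input = namespace.to_be_bytes().to_vec();
    input.extend_from_slice(name);
    let digest = sha1(&input);
    let mut bytes = [0u8; 16];
    bytes.copy_from_slice(&digest[..16]);
    bytes[6] = (bytes[6] & 0x0f) | 0x50; // version 5
    bytes[8] = (bytes[8] & 0x3f) | 0x80; // RFC 4122 variant
    u128::from_be_bytes(bytes)
}

/// Function GUID namespace quoted in the README.
const FUNCTION_NAMESPACE: u128 = 0x0192a179_61ac_7cef_88ed_012296e9492f;

/// `blocks` holds hypothetical (start address, basic block GUID) pairs.
fn function_guid(blocks: &[(u64, u128)]) -> u128 {
    let mut sorted = blocks.to_vec();
    // Assumption: "highest to lowest start address", as the README text states.
    sorted.sort_by(|a, b| b.0.cmp(&a.0));
    let mut name = Vec::with_capacity(sorted.len() * 16);
    for (_, guid) in &sorted {
        name.extend_from_slice(&guid.to_be_bytes());
    }
    uuid_v5(FUNCTION_NAMESPACE, &name)
}

/// Renders a GUID in the usual hyphenated form.
fn format_guid(guid: u128) -> String {
    let s = format!("{:032x}", guid);
    format!("{}-{}-{}-{}-{}", &s[0..8], &s[8..12], &s[12..16], &s[16..20], &s[20..32])
}

fn main() {
    // Made-up basic block GUIDs, used purely for illustration.
    let blocks = [(0x1000u64, 1u128), (0x1010, 2), (0x1024, 3)];
    println!("function GUID: {}", format_guid(function_guid(&blocks)));
}
```

Because the blocks are sorted before hashing, the order in which an analysis discovers them does not change the resulting GUID.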

### Creating a Basic Block GUID

The basic block GUID is the UUIDv5 of the byte sequence of the instructions (sorted in execution order) with the following properties:

1. Zero out all instructions containing a relocatable operand.
1. Zero out all relocatable instructions.
2. Exclude all NOP instructions.
3. Exclude all instructions that set a register to itself if they are effectively NOPs.

#### When are instructions that set a register to itself removed?

To support hot-patching we must remove them as they can be injected by the compiler at the start of a function (see: [1] and [2]).
To support hot-patching, we must remove them as they can be injected by the compiler at the start of a function (see: [1] and [2]).
This does not affect the accuracy of the function GUID as they are only removed when the instruction is a NOP:

- Register groups with no implicit extension will be removed (see: [3] (under 3.4.1.1))

For the `x86_64` architecture this means `mov edi, edi` will _not_ be removed, but it _will_ be removed for the `x86` architecture.

#### What is considered a relocatable operand?
#### What is considered a relocatable instruction?

An operand that is used as a pointer to a mapped region.
An instruction with an operand that is used as a _constant_ pointer to a mapped region.

For the `x86` architecture the instruction `e8b55b0100` (or `call 0x15bba`) would be zeroed.
- For the `x86` architecture the instruction `e8b55b0100` (or `call 0x15bba`) would be zeroed.

An instruction which is used to calculate a _constant_ pointer to a mapped region, with a _constant_ offset.

- For the `aarch64` architecture the instruction `21403c91` (or `add x1, x1, #0xf10`) would be zeroed if the incoming `x1` was a pointer into a mapped region.
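
The three basic block rules above (zero relocatable instructions, drop NOPs, drop self-moves that are effectively NOPs) can be sketched as a byte filter. This is only an illustration: the `Instruction` struct, its flags, `basic_block_bytes` and `sample_block` are hypothetical names, and a real integration would derive the flags from its own disassembler. Zeroing is assumed here to mean replacing every byte of the instruction with `0x00` while keeping its length.

```rust
/// Hypothetical pre-decoded instruction; the flags would come from a disassembler.
#[derive(Clone)]
struct Instruction {
    bytes: Vec<u8>,
    is_nop: bool,
    is_relocatable: bool,
    /// The instruction sets a register to itself (for example `mov edi, edi`).
    is_self_move: bool,
    /// Writing the register implicitly extends into a wider register, so the
    /// self-move is not a NOP (for example `mov edi, edi` on x86_64).
    implicit_extend: bool,
}

impl Instruction {
    fn new(bytes: &[u8]) -> Self {
        Instruction {
            bytes: bytes.to_vec(),
            is_nop: false,
            is_relocatable: false,
            is_self_move: false,
            implicit_extend: false,
        }
    }
}

/// Produces the byte sequence that would be hashed for a basic block GUID.
fn basic_block_bytes(instructions: &[Instruction]) -> Vec<u8> {
    let mut out = Vec::new();
    for instr in instructions {
        // Rule 2: exclude NOP instructions.
        if instr.is_nop {
            continue;
        }
        // Rule 3: exclude self-moves only when they are effectively NOPs.
        if instr.is_self_move && !instr.implicit_extend {
            continue;
        }
        if instr.is_relocatable {
            // Rule 1: keep the length, but zero every byte (assumed interpretation).
            out.resize(out.len() + instr.bytes.len(), 0);
        } else {
            out.extend_from_slice(&instr.bytes);
        }
    }
    out
}

/// A small hot-patch style prologue. `implicit_extend` selects x86_64 behaviour.
fn sample_block(implicit_extend: bool) -> Vec<Instruction> {
    vec![
        // mov edi, edi
        Instruction { is_self_move: true, implicit_extend, ..Instruction::new(&[0x8b, 0xff]) },
        // push ebp
        Instruction::new(&[0x55]),
        // call 0x15bba (the relocatable example from the README)
        Instruction { is_relocatable: true, ..Instruction::new(&[0xe8, 0xb5, 0x5b, 0x01, 0x00]) },
        // nop
        Instruction { is_nop: true, ..Instruction::new(&[0x90]) },
        // ret
        Instruction::new(&[0xc3]),
    ]
}

fn main() {
    println!("x86:    {:02x?}", basic_block_bytes(&sample_block(false)));
    println!("x86_64: {:02x?}", basic_block_bytes(&sample_block(true)));
}
```

With these inputs the `x86` variant drops `mov edi, edi`, while the `x86_64` variant keeps it, matching the rule described above.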

#### What is the UUIDv5 namespace?

The namespace for Basic Block GUID's is `0192a178-7a5f-7936-8653-3cbaa7d6afe7`.
The namespace for Basic Block GUIDs is `0192a178-7a5f-7936-8653-3cbaa7d6afe7`.

### Function Constraints
### Constraints

Function constraints allow us to further disambiguate between functions with the same GUID, when creating the functions we store information about the following:
Constraints allow us to further disambiguate between functions with the same GUID. When creating the functions, we retrieve extra information
that is consistent between versions of the same function. Some examples are:

- Called functions
- Caller functions
- Adjacent functions

Each entry in the lists above is referred to as a "constraint" that can be used to further reduce the number of matches for a given function GUID.
Each extra piece of information is referred to as a "constraint" that can be used to further reduce the number of matches for a given function GUID.

#### Creating a Constraint

Constraints are made up of a GUID and, optionally, a matching offset. Adding a matching offset is preferred because it gives locality to the constraints.
For example, if function `A` calls function `B`, that is one constraint. If function `B` is also adjacent to function `A`, that is a second constraint, but
without a matching offset the two constraints may be merged into a single one, reducing the number of matching constraints.

- The adjacent function `B` as a constraint: `(9F188A12-3EA1-477D-B368-361936EEA213, -30)`
- The call to function `B` as a constraint: `(9F188A12-3EA1-477D-B368-361936EEA213, 48)`
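
To make the merging concern concrete, here is a small sketch that treats constraints as set members. The `Constraint` struct and `distinct_constraints` helper are hypothetical and are not the crate's types. The GUID and offsets are the ones from the example above.

```rust
use std::collections::HashSet;

/// Hypothetical constraint: a GUID plus an optional matching offset.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct Constraint {
    guid: u128,
    offset: Option<i64>,
}

/// Counts how many constraints remain once identical entries are merged.
fn distinct_constraints(constraints: &[Constraint]) -> usize {
    constraints.iter().copied().collect::<HashSet<_>>().len()
}

fn main() {
    // GUID of function `B` from the example above.
    let b = 0x9F188A12_3EA1_477D_B368_361936EEA213u128;

    // With offsets, "adjacent to B" and "calls B" remain two separate constraints.
    let with_offsets = [
        Constraint { guid: b, offset: Some(-30) },
        Constraint { guid: b, offset: Some(48) },
    ];
    // Without offsets, both collapse into a single constraint.
    let without_offsets = [Constraint { guid: b, offset: None }; 2];

    println!("with offsets:    {}", distinct_constraints(&with_offsets));
    println!("without offsets: {}", distinct_constraints(&without_offsets));
}
```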

#### Creating a Constraint GUID

The constraint GUID is the UUIDv5 of the relevant bytes that would be computable at creation time and lookup time.

##### What is the UUIDv5 namespace?

The namespace for Constraint GUIDs is `019701f3-e89c-7afa-9181-371a5e98a576`.

##### Why don't we require matching on constraints for trivial functions?

@@ -111,12 +133,46 @@ The main difference between **WARP** and **FLIRT** is the approach to identifica
#### Function Identification

- **WARP** the function identification is described [here](#function-identification).
- **FLIRT** uses incomplete function byte sequence with a mask where there is a single function entry (see: [IDA FLIRT Documentation] for a full description).
- **FLIRT** uses an incomplete function byte sequence with a mask where there is a single function entry (see: [IDA FLIRT Documentation] for a full description).

What this means in practice is **WARP** will have less false positives based solely off the initial function identification.
What this means in practice is **WARP** will have fewer false positives based solely off the initial function identification.
When the returned set contains more than one function, we can use the list of [Constraints](#constraints) to select the best possible match.
However, that comes at the cost of requiring a computed GUID to be created whenever the lookup is requested and that the function GUID is _**always**_ the same.

### WARP vs SigKit

Because WARP is a replacement for SigKit, it makes sense to talk not only about the function identification approach, but also about the integration with [Binary Ninja].

#### SigKit Function Identification

SigKit's function identification is similar to FLIRT's, so rather than repeat what is said above, see [here](#function-identification).

One difference to point out is SigKit relies on relocations during signature generation. Because of this, firmware or other types of binaries lacking relocations will likely fail to mask off the required instructions.

#### Binary Ninja Integration

The two main processes that exist for both SigKit and WARP integration with Binary Ninja are the function lookup process and the signature generation process.

##### Function lookup

SigKit's function lookup process is integrated as a core component of Binary Ninja and, as such, is not open source. However, the process is described [here](https://binary.ninja/2020/03/11/signature-libraries.html).

What this means is that **WARP**, unlike SigKit, can identify a greater number of smaller functions, ones which would otherwise have to be pruned in the generation process.
After looking up a function and successfully matching, **WARP** will also be able to apply type information.

##### Signature generation

SigKit's signature generation is provided through user Python scripts located [here](https://github.com/Vector35/sigkit/tree/master).

Because the signature generation is separate from the core integration, the process becomes very cumbersome: it is too convoluted for smaller samples and too slow for bigger samples.

#### What does this mean?

WARP can match on a greater number of functions, which would otherwise be pruned during the generation process. This is not without its tradeoffs: we generate the function UUID on both ends, meaning that the algorithm must be upgraded carefully to ensure that previously generated UUIDs are no longer valid.

Aside from just the matching of functions, we _never_ prune functions when they are added to the dataset. This means we can store multiple functions for any given UUID. This is a major advantage for users, who can now identify exactly what causes a collision and override it, or otherwise understand more about the function.

After matching on a function successfully, we can reconstruct the function signature, not just the symbol name. SigKit has no information about the function calling convention or the function type.

[1]: https://devblogs.microsoft.com/oldnewthing/20110921-00/?p=9583
[2]: https://devblogs.microsoft.com/oldnewthing/20221109-00/?p=107373
4 changes: 3 additions & 1 deletion about.toml
@@ -3,6 +3,7 @@ accepted = [
"Apache-2.0",
"MIT",
"Unicode-DFS-2016",
"Unicode-3.0",
"OFL-1.1",
"BSL-1.0",
"BSD-3-Clause",
@@ -11,5 +12,6 @@ accepted = [
"NOASSERTION",
"ISC",
"Zlib",
"OpenSSL"
"OpenSSL",
"NCSA"
]
31 changes: 31 additions & 0 deletions rust/Cargo.toml
@@ -0,0 +1,31 @@
[package]
name = "warp"
version = "1.0.0"
edition = "2021"
license = "Apache-2.0"

[dependencies]
flatbuffers = "25.2.10"
bon = "3.6.4"
uuid = { version = "1.17.0", features = ["v5"]}
flate2 = "1.1.2"
itertools = "0.14"

[features]
default = []
gen_flatbuffers = ["dep:flatbuffers-build"]

[dev-dependencies]
criterion = "0.6.0"
insta = { version = "1.43.1", features = ["yaml"] }

[build-dependencies]
flatbuffers-build = { git = "https://github.com/emesare/flatbuffers-build", rev = "44410b9", features = ["vendored"], optional = true }

[[bench]]
name = "type"
harness = false

[[bench]]
name = "chunk"
harness = false
64 changes: 64 additions & 0 deletions rust/benches/chunk.rs
@@ -0,0 +1,64 @@
use criterion::{criterion_group, criterion_main, Criterion};
use std::str::FromStr;
use warp::chunk::{Chunk, ChunkKind, CompressionType};
use warp::mock::{mock_function, mock_function_type_class, mock_type};
use warp::r#type::chunk::TypeChunk;
use warp::signature::chunk::SignatureChunk;
use warp::signature::function::FunctionGUID;
use warp::{WarpFile, WarpFileHeader};

pub fn chunk_benchmark(c: &mut Criterion) {
    let count = 10000;
    // Fill out a signature chunk with functions.
    let mut functions = Vec::new();
    for i in 0..count {
        functions.push(mock_function(&format!("function_{}", i)));
    }
    let mut _signature_chunk = SignatureChunk::new(&functions).expect("Failed to create chunk");
    let signature_chunk = Chunk::new(
        ChunkKind::Signature(_signature_chunk.clone()),
        CompressionType::None,
    );

    // Fill out a type chunk with types.
    let mut types = Vec::new();
    for i in 0..count {
        types.push(mock_type(
            &format!("type_{}", i),
            mock_function_type_class(),
        ));
    }
    let _type_chunk = TypeChunk::new(&types).expect("Failed to create chunk");
    let type_chunk = Chunk::new(ChunkKind::Type(_type_chunk), CompressionType::Zstd);
    let file = WarpFile::new(
        WarpFileHeader::new(),
        vec![signature_chunk.clone(), type_chunk],
    );
    c.bench_function("file to bytes", |b| {
        b.iter(|| {
            file.to_bytes();
        })
    });

    let known_function = functions.get(326).expect("Failed to get function 326").guid;
    c.bench_function("find known function", |b| {
        b.iter(|| {
            _signature_chunk
                .raw_functions_with_guid(&known_function)
                .count()
        })
    });

    let unknown_function = FunctionGUID::from_str("467aae0d-84d4-4804-90d2-a62159502367")
        .expect("Failed to get unknown function GUID");
    c.bench_function("find unknown function", |b| {
        b.iter(|| {
            _signature_chunk
                .raw_functions_with_guid(&unknown_function)
                .count()
        })
    });
}

criterion_group!(benches, chunk_benchmark);
criterion_main!(benches);
45 changes: 45 additions & 0 deletions rust/benches/type.rs
@@ -0,0 +1,45 @@
use criterion::{criterion_group, criterion_main, Criterion};
use warp::mock::{mock_int_type_class, mock_type};
use warp::r#type::class::{StructureClass, StructureMember, TypeClass};
use warp::r#type::guid::TypeGUID;
use warp::r#type::Type;

pub fn void_benchmark(c: &mut Criterion) {
    let void_type = Type::builder()
        .name("my_void".to_owned())
        .class(TypeClass::Void)
        .build();

    c.bench_function("uuid void", |b| {
        b.iter(|| {
            let _ = TypeGUID::from(&void_type);
        })
    });

    c.bench_function("computed void", |b| b.iter(|| void_type.to_bytes()));
}

pub fn struct_benchmark(c: &mut Criterion) {
    let int_type = mock_type("my_int", mock_int_type_class(None, false));
    let structure_member = StructureMember::builder()
        .name("member")
        .ty(int_type)
        .offset(0)
        .build();
    let struct_class = StructureClass::new(vec![structure_member]);
    let struct_type = Type::builder()
        .name("my_struct".to_owned())
        .class(TypeClass::Structure(struct_class))
        .build();

    c.bench_function("uuid struct", |b| {
        b.iter(|| {
            let _ = TypeGUID::from(&struct_type);
        })
    });

    c.bench_function("computed struct", |b| b.iter(|| struct_type.to_bytes()));
}

criterion_group!(benches, void_benchmark, struct_benchmark);
criterion_main!(benches);