Skip to content

Conversation

@snissn
Copy link

@snissn snissn commented Nov 18, 2025

Summary

This PR adds FFI bindings for the new tipset reservation session API required by the Multi‑Stage Tipset Gas Reservations work.

Specifically, it:

  • Extends the C FFI with Begin/End reservation session entrypoints and associated status codes.
  • Wires new Go wrappers in cgo/fvm.go to call the reservation API and surface structured errors to Lotus.
  • Introduces a new Rust reservations module in rust/src/fvm and hooks it into mod.rs.

These bindings are used by Lotus to start and end a reservation session around tipset execution while keeping the reservation implementation inside ref‑fvm.

Motivation

Lotus and ref‑fvm are adding tipset‑scope gas reservations to eliminate miner‑penalty exposure from intra‑tipset self‑drain scenarios, as described in the draft FIP
“Multi‑Stage Tipset Gas Reservations”.

filecoin‑ffi needs to expose the engine’s reservation session API (Begin/End and error reporting) to Go so that Lotus can orchestrate reservations at the tipset boundary
without introducing new on‑chain actors or migrations.

Changes

  • cgo/fvm.go:
    • Adds FFI wrappers that call the engine’s reservation Begin/End API.
    • Propagates reservation error codes and messages into Go.
  • cgo/fvm_reservations.go:
    • New Go helpers for reservation management, used by Lotus’ FVM VM wrapper.
  • rust/src/fvm/mod.rs:
    • Registers the new reservations submodule.
  • rust/src/fvm/reservations.rs:
    • Defines the Rust side of the reservation FFI surface.

Testing

  • go test ./... in this repo.
  • Rebuilt Lotus with this branch as the extern/filecoin-ffi submodule and ran the existing FVM/consensus test suites that exercise FFI calls.

@snissn
Copy link
Author

snissn commented Nov 18, 2025

@snissn snissn marked this pull request as ready for review November 21, 2025 00:47
Copilot AI review requested due to automatic review settings November 21, 2025 00:47
Copilot finished reviewing on behalf of snissn November 21, 2025 00:50
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds FFI bindings for FVM reservation session management, enabling Lotus to orchestrate gas reservations at tipset boundaries. The implementation spans Rust FFI exports, CGO wrappers, and Go public APIs.

Key Changes:

  • Adds FVM_BeginReservations and FVM_EndReservations FFI exports with status codes and error messages
  • Introduces reservation error types and status-to-error mapping in both Rust and Go
  • Implements error message allocation/deallocation across the FFI boundary

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
rust/src/fvm/machine.rs Implements FFI exports for Begin/End reservation sessions, error message handling, and tracks current machine in global state
rust/src/fvm/engine.rs Defines ReservationError enum and default trait methods for reservation operations
cgo/fvm.go Provides CGO wrappers that invoke C ABI and manage error message memory
fvm.go Exposes typed reservation errors and high-level BeginReservations/EndReservations methods on FVM instances
cgo/fvm_reservations_test.go Tests error message handling for invalid reservation calls

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +181 to +187
// Track the most recently created machine for reservation sessions.
{
let mut current = CURRENT_MACHINE
.lock()
.map_err(|e| anyhow!("current executor lock poisoned: {e}"))?;
*current = MachinePtr(&*boxed as *const InnerFvmMachine);
}
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The global CURRENT_MACHINE stores a raw pointer to the most recently created InnerFvmMachine, but this pointer is never cleared when the machine is destroyed via drop_fvm_machine. This creates a use-after-free vulnerability: if a machine is destroyed and then FVM_BeginReservations or FVM_EndReservations is called, the code will dereference a dangling pointer at lines 685 and 747.

Recommendation: Either:

  1. Clear CURRENT_MACHINE in the drop_fvm_machine destructor, or
  2. Use a different design such as passing the machine pointer as a parameter to the reservation functions, or
  3. Use a weak reference or reference counting scheme to detect when the machine has been destroyed.

Copilot uses AI. Check for mistakes.
Comment on lines +662 to +686
let machine_ptr = match CURRENT_MACHINE.lock() {
Ok(guard) => guard.0,
Err(_) => {
set_reservation_error_message_out(
error_msg_ptr_out,
error_msg_len_out,
"reservation invariant violated: current executor lock poisoned",
);
return FvmReservationStatus::ErrReservationInvariant;
}
};

if machine_ptr.is_null() {
set_reservation_error_message_out(
error_msg_ptr_out,
error_msg_len_out,
"reservations not implemented for current machine",
);
return FvmReservationStatus::ErrNotImplemented;
}

// SAFETY: the pointer is set when the machine is created and
// is expected to remain valid while reservations are used.
let inner = unsafe { &*machine_ptr };
let machine_mutex = match &inner.machine {
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The safety comment states "the pointer is set when the machine is created and is expected to remain valid while reservations are used," but there's no mechanism to enforce this. If multiple machines are created (which is allowed), CURRENT_MACHINE will point to the most recent one. If that machine is destroyed while an older machine is still in use, this pointer becomes dangling. Additionally, if reservations are used across multiple machines concurrently, this global state will cause race conditions where the wrong machine's reservations are accessed.

Recommendation: The reservation functions should take the machine pointer as an explicit parameter rather than relying on global state, ensuring type-safe lifetime management.

Suggested change
let machine_ptr = match CURRENT_MACHINE.lock() {
Ok(guard) => guard.0,
Err(_) => {
set_reservation_error_message_out(
error_msg_ptr_out,
error_msg_len_out,
"reservation invariant violated: current executor lock poisoned",
);
return FvmReservationStatus::ErrReservationInvariant;
}
};
if machine_ptr.is_null() {
set_reservation_error_message_out(
error_msg_ptr_out,
error_msg_len_out,
"reservations not implemented for current machine",
);
return FvmReservationStatus::ErrNotImplemented;
}
// SAFETY: the pointer is set when the machine is created and
// is expected to remain valid while reservations are used.
let inner = unsafe { &*machine_ptr };
let machine_mutex = match &inner.machine {
// The machine pointer/reference should be passed as an explicit parameter to this function.
// Remove reliance on CURRENT_MACHINE global state.
//
// Example: fn reservation_function(machine: &Machine, ...)
//
// Use the passed-in machine reference directly:
let machine_mutex = match &machine.machine {

Copilot uses AI. Check for mistakes.
Comment on lines +160 to +171
func (f *FVM) BeginReservations(plan []byte) error {
defer runtime.KeepAlive(f)
status, msg := cgo.FvmBeginReservations(cgo.AsSliceRefUint8(plan))
baseErr := ReservationStatusToError(status)
if baseErr == nil {
return nil
}
if msg == "" {
return baseErr
}
return fmt.Errorf("%w: %s", baseErr, msg)
}
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The BeginReservations and EndReservations methods are called on an FVM instance but don't actually pass f.executor to the underlying CGO functions. This means these methods operate on whatever machine is stored in the global CURRENT_MACHINE rather than the machine associated with this FVM instance. This can lead to incorrect behavior if multiple FVM instances exist concurrently - calling BeginReservations on one instance could affect a different instance's machine.

Recommendation: Pass f.executor to the CGO functions, and update the CGO and Rust FFI signatures to accept a machine pointer parameter.

Copilot uses AI. Check for mistakes.
Comment on lines +176 to +187
func (f *FVM) EndReservations() error {
defer runtime.KeepAlive(f)
status, msg := cgo.FvmEndReservations()
baseErr := ReservationStatusToError(status)
if baseErr == nil {
return nil
}
if msg == "" {
return baseErr
}
return fmt.Errorf("%w: %s", baseErr, msg)
}
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as in BeginReservations: the method doesn't pass f.executor to the underlying CGO function, operating on global state instead of the instance's machine.

Copilot uses AI. Check for mistakes.
Comment on lines +622 to +625
// Empty plan is a no-op and must not enter reservation mode.
if cbor_plan_len == 0 {
return FvmReservationStatus::Ok;
}
Copy link

Copilot AI Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The early return for an empty plan (lines 622-625) assumes that an empty plan should be a no-op. However, this means that calling BeginReservations with an empty plan will return Ok without actually entering reservation mode or validating that the machine supports reservations. If the caller then calls EndReservations, it may get an ErrSessionClosed error because no session was opened. This asymmetry could be confusing.

Recommendation: Consider documenting this behavior clearly, or alternatively, still validate that the machine supports reservations even for empty plans (though this may be intentional to avoid overhead).

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: 📌 Triage

Development

Successfully merging this pull request may close these issues.

1 participant