Conversation

@fynnsu (Collaborator) commented Jul 8, 2025

Note: this is a WIP and I've attached a bunch of clarifying questions below.

This PR adds initial base support for independent speculators: a standalone draft model wrapped as an IndependentSpeculator, with an accompanying IndependentSpeculatorConfig.

Comments/Questions:

  • I don't fully understand the "architectures" field on the eagle config. Is this just metadata for the end user, or does it need to be verified somewhere? I set it to default to ["LlamaForCausalLM"] based on what I saw in the EagleSpeculatorConfig, but I'm not sure about that.
  • Currently there is no real interaction between the draft and verifier model. They can be two completely separate models, but they should share the same tokenizer / token ids so that their logit distributions are comparable. Should I try to enforce this when the speculator is defined (or when a verifier is added)? Doing so would probably require finding the tokenizer for each model (if it exists), downloading it, and comparing the vocab lists (see the sketch after this list). Alternatively, we can say the user is responsible for ensuring the two models work together (as is already the case for the eagle speculator) and not enforce it in code.
  • The forward method currently just wraps self.draft.forward. The arguments I used for this method are based on the eagle speculator implementation; however, I'm not sure they're all guaranteed to exist for all PreTrainedModels. Do we currently limit the model types this works with? Is there an API limiting this? I could also change this to *args, **kwargs if the call varies significantly across models.
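For the second question, here is a minimal sketch of what an automated compatibility check could look like, assuming both models ship a tokenizer loadable via AutoTokenizer; the helper name tokenizers_compatible is hypothetical:

```python
from transformers import AutoTokenizer


def tokenizers_compatible(draft_id: str, verifier_id: str) -> bool:
    """Hypothetical helper: check that draft and verifier share a vocabulary.

    The two logit distributions are only comparable if both models map the
    same tokens to the same token ids.
    """
    try:
        draft_tok = AutoTokenizer.from_pretrained(draft_id)
        verifier_tok = AutoTokenizer.from_pretrained(verifier_id)
    except OSError:
        # At least one repo ships no tokenizer files; nothing to compare,
        # so fall back to trusting the user (as the eagle speculator does).
        return True
    # get_vocab() returns the token -> id mapping for each tokenizer.
    return draft_tok.get_vocab() == verifier_tok.get_vocab()
```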

@fynnsu force-pushed the feat/independent_speculator branch from 05874d4 to 7cfc8ae on July 10, 2025 13:50
@markurtz (Collaborator) left a comment

Left a few quick comments. Overall, I think there's more that needs to be considered and worked out here; additionally, some of the model loading and config loading pathways will likely benefit from the utilities being added in #48.

For discussion points to work through for the config:

  • The goal should be to augment the base pretrained model config so that the speculators_* properties are added while all other keys and functionality from the original config are kept exactly the same.
  • I think we should either have from_pretrained_config override the parent from_pretrained method or figure out how to support and combine the two. Ideally, from_pretrained on the base SpeculatorModelConfig would continue to work only for previously saved speculator configs, while going through the function on the independent config would either load the speculator config directly (if it is already a saved speculator config) or perform the conversion and injection if not. All of this should support loading from an HF stub, a local path, a pretrained config, or a dict (see the sketch below).
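To make the discussion concrete, a rough sketch of that dispatch, assuming the conversion boils down to injecting the speculators_* keys into the loaded config dict (the class name and method bodies are illustrative, not the real implementation):

```python
from typing import Any, Union

from transformers import PretrainedConfig


class IndependentSpeculatorConfigSketch:
    """Illustrative only: the from_pretrained dispatch described above."""

    @classmethod
    def from_pretrained(cls, source: Union[str, dict, PretrainedConfig], **kwargs: Any):
        # Accept an HF stub, a local path, a PretrainedConfig, or a dict.
        if isinstance(source, PretrainedConfig):
            config_dict = source.to_dict()
        elif isinstance(source, dict):
            config_dict = dict(source)
        else:
            config_dict, kwargs = PretrainedConfig.get_config_dict(source, **kwargs)

        if "speculators_model_type" not in config_dict:
            # Not a previously saved speculator config: convert it by
            # injecting the speculators_* extras (speculators_version and
            # speculators_config would be filled in the same way), leaving
            # every original key untouched.
            config_dict["speculators_model_type"] = "independent"

        return cls.from_dict(config_dict)

    @classmethod
    def from_dict(cls, config_dict: dict):
        ...  # delegate to the real config construction
```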

For discussion points to work through for the model:

  • We'll need to ensure that from_pretrained and save_pretrained both work with the config requirements laid out above, where we only inject extras, so that the model remains runnable outside of speculators as the ideal end state; essentially, we must guarantee we've only added extras to it.
  • The save_pretrained / state-dict conversion pathways will need to flatten the structure so that whatever attribute we use to store the model under doesn't leak into the key namespace when saving; essentially, the state_dict must be saved exactly as it would be outside the speculators save pathways (see the sketch below).
  • For the above point, we'd ideally have an invisible wrapper / injector that ensures the properties and pathways of the original pretrained model are either set on the IndependentSpeculator or forwarded to it.
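For the flattening point, a sketch of the key rewrite involved, assuming the draft model is stored under an attribute such as _draft_model (the attribute name is taken from later comments in this thread):

```python
import torch
from torch import nn


def flatten_draft_state_dict(
    speculator: nn.Module, attr: str = "_draft_model"
) -> dict[str, torch.Tensor]:
    """Strip the wrapper prefix so saved keys match the bare draft model.

    A model stored under `speculator._draft_model` would otherwise serialize
    keys like "_draft_model.model.embed_tokens.weight"; stripping the prefix
    restores "model.embed_tokens.weight", keeping the checkpoint loadable
    outside of speculators.
    """
    prefix = attr + "."
    return {
        key.removeprefix(prefix): tensor
        for key, tensor in speculator.state_dict().items()
    }
```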


Review thread on src/speculators/models/independent.py:

```diff
-speculators_model_type: str = "independent"
+speculators_model_type: Literal["independent"] = "independent"
 architectures: list[str] = Field(
     default_factory=lambda: ["LlamaForCausalLM"],
     description=("List of model architectures that can be used with the model"),
 )
```

@markurtz (Collaborator): I believe we should be able to get rid of this completely. Can you double-check that this is pulled from the original config and set appropriately in the from_pretrained_config pathway?

Essentially, we just want to ensure that saving/loading the config results in speculators_model_type, speculators_version, and speculators_config being added while keeping the rest of the config exactly the same.
Review thread on:

```python
draft_model: str = Field(
```

@markurtz (Collaborator): I'm not following why this would end up being used. We should have the verifier in the speculators_config, and the config we're injecting into would represent the draft/speculator model.

@fynnsu force-pushed the feat/independent_speculator branch from 83e148b to 856fcbf on August 6, 2025 15:22
Signed-off-by: Fynn Schmitt-Ulms <[email protected]>
@fynnsu (Collaborator, Author) commented Aug 6, 2025

I've updated the implementation so that we simply inject the speculators_* items into the existing model config. We now also have the following behavior:

```python
IndependentSpeculatorConfig.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")  # works, updates config
SpeculatorModelConfig.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")  # fails because model is not a speculator model

IndependentSpeculator.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")  # almost works (see note on default SpeculatorConfig)
SpeculatorModel.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")  # fails because model is not a speculator model

model: IndependentSpeculator = ...
model.save_pretrained("./indep")  # saves draft model only, but includes the `speculators_*` additions to its config
```

Changes required to get this working

  1. Removed from the SpeculatorModelConfig class in src/speculators/config.py:

```python
model_type: ClassVar[str] = "speculator_model"  # type: ignore[misc]
```

This fixed model_type to "speculator_model", which caused conflicts when loading another config (e.g. LlamaConfig) that sets this value to the respective model's type (e.g. "llama"). With the line removed, model_type becomes an instance variable (since it is still defined in PretrainedConfig), which can be updated at runtime. Note: to keep behavior the same for existing speculators model types, __init__ sets the instance variable to "speculator_model" by default (see the sketch below).
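A condensed sketch of that default (illustrative; the real __init__ does more):

```python
from transformers import PretrainedConfig


class SpeculatorModelConfigSketch(PretrainedConfig):
    """Illustrative: model_type as an instance attribute with a default."""

    def __init__(self, model_type: str = "speculator_model", **kwargs):
        super().__init__(**kwargs)
        # Now an instance variable rather than a ClassVar: loading e.g. a
        # LlamaConfig can overwrite it with "llama" at runtime, while existing
        # speculator model types keep the "speculator_model" default.
        self.model_type = model_type
```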

  2. Removed from the SpeculatorModelConfig class in src/speculators/config.py:

```python
_auto_class: ClassVar[Optional[str]] = ""  # type: ignore[misc]
```

This didn't seem to be needed for loading models (and currently appears to be set incorrectly). Additionally, if _auto_class is any value other than None, an extra line is added to the saved config, and the source file for the config class (in this case src/speculators/models/independent.py) is saved alongside it when save_pretrained is called.

  3. Added from_dict and from_pretrained to IndependentSpeculatorConfig:
     • from_dict adds logic to set the speculators_model_type value automatically if it doesn't exist (for automatic conversion support); see the sketch below.
     • from_pretrained is the same as the superclass' from_pretrained, except it doesn't error out if speculators_model_type is not defined.
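A sketch of the from_dict defaulting, assuming it can simply seed the discriminator key before delegating (illustrative):

```python
from transformers import PretrainedConfig


class IndependentConfigFromDictSketch(PretrainedConfig):
    """Illustrative: default the discriminator key during from_dict."""

    @classmethod
    def from_dict(cls, config_dict: dict, **kwargs):
        # A plain model config (e.g. a saved LlamaConfig) lacks the key, so
        # inject it here to support automatic conversion to a speculator.
        config_dict.setdefault("speculators_model_type", "independent")
        return super().from_dict(config_dict, **kwargs)
```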

  4. Added from_pretrained, save_pretrained, load_state_dict, state_dict, and forward methods that pass calls through to their respective counterparts on the self._draft_model object (see the sketch below). Note: from_pretrained requires a bit of extra logic because it is a classmethod, so we need to first instantiate the IndependentSpeculator before we can access the draft model.
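A sketch of those method passthroughs, assuming the wrapped model lives in self._draft_model (bodies illustrative; the real methods also handle the speculator config):

```python
from torch import nn
from transformers import AutoModelForCausalLM


class IndependentSpeculatorSketch(nn.Module):
    """Illustrative: method passthrough to the wrapped draft model."""

    def __init__(self, draft_model: nn.Module):
        super().__init__()
        self._draft_model = draft_model

    @classmethod
    def from_pretrained(cls, name_or_path: str, **kwargs):
        # The classmethod twist: the draft model has to be loaded and the
        # wrapper instantiated before any instance-level passthrough exists.
        return cls(AutoModelForCausalLM.from_pretrained(name_or_path, **kwargs))

    def save_pretrained(self, save_directory: str, **kwargs) -> None:
        # Delegate so the saved files match the bare draft model's output.
        self._draft_model.save_pretrained(save_directory, **kwargs)

    def forward(self, *args, **kwargs):
        return self._draft_model(*args, **kwargs)
```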

  5. Added __getattr__, __setattr__, and __delattr__ to IndependentSpeculator. These add passthrough support so that e.g. model.lm_head resolves to model._draft_model.lm_head (see the sketch below). Note: this was not strictly needed to support the from_pretrained/save_pretrained behavior above. The implementation is loosely based on the logic in PyTorch's OptimizedModule (the thing that gets returned if you torch.compile something): https://github.com/pytorch/pytorch/blob/4c01991b386e7b56da59f5cc68c2edd400a28871/torch/_dynamo/eval_frame.py#L442
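A standalone sketch of the attribute forwarding, following the same pattern as the linked OptimizedModule code (attribute name _draft_model assumed as above):

```python
from torch import nn


class AttributePassthroughSketch(nn.Module):
    """Illustrative: forward unknown attribute lookups to the draft model."""

    def __init__(self, draft_model: nn.Module):
        super().__init__()
        self._draft_model = draft_model

    def __getattr__(self, name: str):
        try:
            # nn.Module.__getattr__ resolves registered parameters, buffers,
            # and submodules (including _draft_model itself) first.
            return super().__getattr__(name)
        except AttributeError:
            # Anything else forwards to the draft model, so model.lm_head
            # resolves to model._draft_model.lm_head. The real implementation
            # forwards __setattr__ and __delattr__ analogously.
            return getattr(self._draft_model, name)
```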

Note on the default SpeculatorConfig value

Currently, the default value for speculators_config in SpeculatorModelConfig is None, even though the field type is SpeculatorConfig (not Optional[SpeculatorConfig]). If the field's type is changed to Optional[SpeculatorConfig], mypy flags a number of type errors from missing `is not None` checks before attribute access. One of those missing checks is currently causing IndependentSpeculator.from_pretrained to fail.

There are three potential fixes I see for this:

  1. Explicitly allow None values for speculators_config and update all attribute accesses to first check that the value is not None.
  2. Create a different valid SpeculatorConfig value to use as the default. Unfortunately, all attributes of SpeculatorConfig are required, with no default values.
  3. Remove the default value for speculators_config, forcing all calling code to pass in a valid SpeculatorConfig object.

Note on output of IndependentSpeculator.save_pretrained

I compared the output of loading and saving a Llama model using AutoModelForCausalLM against IndependentSpeculator:

```python
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
model.save_pretrained("./test_llama_model")

model = IndependentSpeculator.from_pretrained(
    "meta-llama/Llama-3.2-3B-Instruct",
    verifier=None,
    verifier_attachment_mode="detached",
)
model.save_pretrained("./test_independent_model")
```

```bash
diff -r test_independent_model/ test_llama_model/
```

Output:

```diff
diff -r test_independent_model/config.json test_llama_model/config.json
13d12
<   "has_no_defaults_at_init": false,
35,37c34
<   "speculators_config": null,
<   "speculators_model_type": "independent",
<   "speculators_version": "0.1.0.dev11",
---
>   "tie_word_embeddings": true,
```

Note: this means that all files in the two directories were identical except for the config.json files. The speculators_* values were correctly added to the config. There were two other differences (has_no_defaults_at_init and tie_word_embeddings), but I believe these are benign.

@fynnsu marked this pull request as ready for review on August 26, 2025 15:41
Successfully merging this pull request may close these issues:

  • Feature Request: Enable Initial Base Support for Independent Speculators