Description
Checklist
- 1. I have searched for related issues but did not find the help I needed
- 2. The problem has not been fixed in the latest version
- 3. I understand that if a bug report lacks environment information and a minimal reproducible example, it will be hard to reproduce and locate the problem, which lowers the chance of getting feedback
- 4. If this is a question rather than a bug, I will start a discussion at https://github.com/kvcache-ai/ktransformers/discussions instead; otherwise the issue will be closed
- 5. To facilitate community communication, I will use Chinese/English or attach a Chinese/English translation (if using another language). Non-Chinese/English content without a translation may be closed
Problem description
RAM: 64 GB
GPU: RTX 4060
Running deepseek-r1-70b-k4_m fails with the following error:

```
root@8ebfaf9abff7:/models/gguf_path# python -m ktransformers.local_chat --gguf_path /models/gguf_path/DeepSeek-R1-Distill-Llama-70B-K4/ --model_path /models/model_path/DeepSeek-R1-Distill-Llama-70B/ --cpu_infer 33
2025-10-22 00:50:45,817 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
found flashinfer
using custom modeling_xxx.py.
LlamaForCausalLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
  - If you're using `trust_remote_code=True`, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
  - If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
  - If you are not the owner of the model architecture class, please contact the model code owner to update it.
using default_optimize_rule for LlamaForCausalLM
Injecting model as ktransformers.operators.models . KLlamaModel
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/opt/conda/lib/python3.11/site-packages/ktransformers/local_chat.py", line 196, in <module>
    fire.Fire(local_chat)
  File "/opt/conda/lib/python3.11/site-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/ktransformers/local_chat.py", line 123, in local_chat
    optimize_and_load_gguf(model, optimize_config_path, gguf_path, config, default_device=device)
  File "/opt/conda/lib/python3.11/site-packages/ktransformers/optimize/optimize.py", line 128, in optimize_and_load_gguf
    inject(module, optimize_config, model_config, weights_loader)
  File "/opt/conda/lib/python3.11/site-packages/ktransformers/optimize/optimize.py", line 33, in inject
    inject_module=module_cls(key = inject_module_meta["key"], gguf_loader = gguf_loader, config = model_config, orig_module=child, **inject_module_meta["kwargs"])
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/ktransformers/operators/models.py", line 995, in __init__
    BaseInjectedModule.__init__(
TypeError: BaseInjectedModule.__init__() got multiple values for argument 'prefill_device'
```
Can someone help me figure out what is causing this?
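For context, this `TypeError` is Python's standard complaint when a parameter gets bound twice: once positionally and once through `**kwargs`. Below is a minimal sketch of that mechanism; the class and argument names mirror the traceback but are illustrative, not the actual ktransformers implementation:

```python
class BaseInjectedModule:
    def __init__(self, key, gguf_loader, config, orig_module,
                 prefill_device="cuda", **kwargs):
        self.prefill_device = prefill_device


class KLlamaModel(BaseInjectedModule):
    def __init__(self, key, gguf_loader, config, orig_module, **kwargs):
        # The subclass forwards prefill_device positionally ("cuda" below)
        # while also forwarding **kwargs. If kwargs already contains
        # "prefill_device" (e.g. from an optimize-rule YAML), the parent
        # receives the same argument twice.
        BaseInjectedModule.__init__(
            self, key, gguf_loader, config, orig_module, "cuda", **kwargs
        )


# kwargs coming from the rule file also carry prefill_device:
KLlamaModel(key="model", gguf_loader=None, config=None, orig_module=None,
            prefill_device="cuda")
# TypeError: BaseInjectedModule.__init__() got multiple values for
# argument 'prefill_device'
```

If that is what is happening here, the duplicate `prefill_device` would come from the kwargs in the default optimize rule for LlamaForCausalLM colliding with an argument that `KLlamaModel.__init__` (ktransformers/operators/models.py, line 995) already passes explicitly; comparing the rule YAML's kwargs against that `__init__` signature should confirm it.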
Steps to reproduce
```
root@8ebfaf9abff7:/models/gguf_path# python -m ktransformers.local_chat --gguf_path /models/gguf_path/DeepSeek-R1-Distill-Llama-70B-K4/ --model_path /models/model_path/DeepSeek-R1-Distill-Llama-70B/ --cpu_infer 33
```
Environment
CPU: i7-13650HX
RAM: 64 GB
GPU: RTX 4060
Docker image: approachingai/ktransformers:v0.3.2-AVX2