
Commit 625cb9f

irajagop and ochougul authored
Caching + API changes (quic#116)
* Move compile & export to base model - Remove abstract methods - Remove unused variables
* Remove unused Runtime enum
* Add type hinting to transforms variables
* Replace autoclass mappings with class variables
* No need to pass model_card_name
* No need of tokenizer
* Remove transform() and use transforms
* Add QEFFAutoModelForCausalLMwithCB for CB
* Add init and from_pretrained in sub-classes
* Move export & compile docs to base class
* Add export() and compile() to CausalLM class
* Add export() and compile() for CausalLMwithCB
* Add model_name and model_hash props to CausalLM
* Call export() in compile
* CausalLM init before setting vars
* Remove "QEff" name in cache dir creation
* Check class name suffix for correct model class
* tests: CausalLM init, from_pretrained and hash
* Fix license header
* fix: compute the order of input_names - No need of passing input_names to `_export()`
* test: Restructure configs, pass attn_impl="eager" - Fix hash test
* test: Added test for export() and compile()
* test: Parametrize with CB for init, hash
* Fix CB export()
* Refactor AutoClass to QEFFAutoClass
* test: export and compile for CB with exceptions
* Avoid deprecated abstractproperty
* Add warning regarding unsupported model
* Make compiler command constant
* Add better error message for compiler failure
* fix: Update causal_lm test
* fix: ApiRunner load tensors from external data
* Remove tests/test_loader.py
* Remove tests/utils.py and move to required place
* Remove separate class for CB - Added continuous_batching boolean argument - Refactor export()
* Enable CB tests for all models, fix codegen
* Use QEfficient.export instead of model.export
* fix: Add MPTForCausalLM as valid architecture
* fix: No need of passing model_card_name in infer
* tests: Use mark "on_qaic" to run in parallel
* ci: Run CLI tests without parallelism
* Add continuous batching for hashing
* tests: Use config_ids instead of config_id fn
* fix for CB check + Revert "Add MPTForCausalLM" - This reverts commit b6b5b3a.
* address PR comments
* Fixed doc, don't use mutable default, fix typo
* fix: Use same opset as export for custom-ops
* fix: opset version int to str
* fix: Revert onnx opset to 13, to work with old API
* added generate method back, added test for automodel code, added deprecation warning, fixed bugs
* run formatter and linter
* fixed test failures
* fixed gptbigcode 3d PKV bug in export and deprecated API test
* ran formatter
* fixed documentation

---------

Signed-off-by: Ilango Rajagopal <[email protected]>
Signed-off-by: Onkar Chougule <[email protected]>
Co-authored-by: Onkar Chougule <[email protected]>
1 parent 23ca9ca commit 625cb9f

31 files changed, +896 -794 lines changed
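For orientation, a minimal usage sketch of the API the commit message above describes: a single QEFFAutoModelForCausalLM class with from_pretrained, export() and compile() (compile() calls export() itself, per the message), and a continuous_batching flag that replaces the separate CB class. The class and method names come from this PR; the model card and the argument-free export()/compile() calls are illustrative placeholders, not arguments documented by this diff.

```python
# Illustrative sketch only; "gpt2" and the bare export()/compile() calls are placeholders.
from QEfficient import QEFFAutoModelForCausalLM

# continuous_batching=True replaces the earlier QEFFAutoModelForCausalLMwithCB class
qeff_model = QEFFAutoModelForCausalLM.from_pretrained("gpt2", continuous_batching=False)

onnx_path = qeff_model.export()   # export the transformed model to ONNX
qpc_path = qeff_model.compile()   # compile() invokes export() itself if needed
```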

QEfficient/base/common.py

Lines changed: 0 additions & 3 deletions
@@ -79,9 +79,6 @@ def from_pretrained(cls, pretrained_model_name_or_path: str, *args, **kwargs) ->
         Downloads HuggingFace model if already doesn't exist locally, returns QEffAutoModel object based on type of model.
         """
         if not os.path.isdir(pretrained_model_name_or_path):
-            # Save model_card_name if passed
-            model_card_name = kwargs.pop("model_card_name", pretrained_model_name_or_path)
-            kwargs.update({"model_card_name": model_card_name})
             pretrained_model_name_or_path = login_and_download_hf_lm(pretrained_model_name_or_path, *args, **kwargs)
         model_type = get_hf_model_type(hf_model_path=pretrained_model_name_or_path)
         qeff_auto_model_class = MODEL_TYPE_TO_QEFF_AUTO_MODEL_MAP[model_type]

QEfficient/base/modeling_qeff.py

Lines changed: 279 additions & 65 deletions
Large diffs are not rendered by default.

QEfficient/cloud/execute.py

Lines changed: 0 additions & 1 deletion
@@ -58,7 +58,6 @@ def main(
         prompt=prompt,
         prompts_txt_file_path=prompts_txt_file_path,
         generation_len=generation_len,
-        full_batch_size=full_batch_size,
     )

QEfficient/cloud/infer.py

Lines changed: 0 additions & 1 deletion
@@ -116,7 +116,6 @@ def main(
         prompt=prompt,
         prompts_txt_file_path=prompts_txt_file_path,
         generation_len=generation_len,
-        full_batch_size=full_batch_size,
     )

QEfficient/compile/compile_helper.py

Lines changed: 6 additions & 0 deletions
@@ -9,6 +9,7 @@
 import os
 import shutil
 import subprocess
+import warnings
 from typing import List, Optional, Tuple
 
 from QEfficient.utils.logging_utils import logger
@@ -51,6 +52,11 @@ def compile_kv_model_on_cloud_ai_100(
     device_group: Optional[List[int]] = None,
     **kwargs,
 ) -> Tuple[bool, str]:
+    warnings.warn(
+        "\033[93mUse `QEFFAutoModelForCausalLM.compile` instead, this method will be removed soon.\033[0m",
+        DeprecationWarning,
+        stacklevel=2,
+    )
     if kwargs:
         # FIXME
         raise NotImplementedError("Can't handle extra compilation args now!")

QEfficient/customop/ctx_scatter_gather.py

Lines changed: 3 additions & 1 deletion
@@ -8,7 +8,9 @@
 import onnxscript
 import torch
 
-ops = onnxscript.opset13
+from QEfficient.utils import constants
+
+ops = getattr(onnxscript, "opset" + str(constants.ONNX_EXPORT_OPSET))
 
 
 @onnxscript.script(onnxscript.values.Opset("com.qualcomm.cloud", 1))

QEfficient/customop/ctx_scatter_gather_cb.py

Lines changed: 3 additions & 1 deletion
@@ -8,7 +8,9 @@
 import onnxscript
 import torch
 
-ops = onnxscript.opset13
+from QEfficient.utils import constants
+
+ops = getattr(onnxscript, "opset" + str(constants.ONNX_EXPORT_OPSET))
 
 
 @onnxscript.script(onnxscript.values.Opset("com.qualcomm.cloud", 1))

QEfficient/customop/rms_norm.py

Lines changed: 3 additions & 1 deletion
@@ -9,7 +9,9 @@
 import torch
 from torch import nn
 
-ops = onnxscript.opset13
+from QEfficient.utils import constants
+
+ops = getattr(onnxscript, "opset" + str(constants.ONNX_EXPORT_OPSET))
 
 
 @onnxscript.script(onnxscript.values.Opset(domain="com.qti.aisw.onnx", version=1))
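All three custom-op files above now derive the onnxscript opset from `QEfficient.utils.constants.ONNX_EXPORT_OPSET` instead of hard-coding `opset13`, so the custom ops always match the opset used for export. A minimal sketch of the lookup; the value 13 is an assumption taken from the "Revert onnx opset to 13" item in the commit message, the real value lives in the constants module.

```python
# Sketch of the dynamic opset lookup used in the custom-op files above.
import onnxscript

ONNX_EXPORT_OPSET = 13  # stand-in for constants.ONNX_EXPORT_OPSET (assumed value)

ops = getattr(onnxscript, "opset" + str(ONNX_EXPORT_OPSET))
assert ops is onnxscript.opset13  # resolves to the same object the old hard-coded line used
```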

QEfficient/exporter/export_hf_to_cloud_ai_100.py

Lines changed: 14 additions & 33 deletions
@@ -13,7 +13,6 @@
 import torch
 from transformers import PreTrainedTokenizer, PreTrainedTokenizerFast
 
-import QEfficient
 from QEfficient.base.common import AUTO_MODEL_MAP_TO_MODEL_TYPE_MAP, QEFF_MODEL_TYPE, QEFFCommonLoader
 from QEfficient.base.modeling_qeff import QEFFBaseModel
 from QEfficient.exporter.export_utils import export_onnx, fix_onnx_fp16, generate_input_files, run_model_on_ort
@@ -168,11 +167,6 @@ def convert_to_cloud_kvstyle(
     Returns:
         :str: Path of exported ``ONNX`` file.
     """
-    warnings.warn(
-        "\033[93mThis function will be deprecated soon, use QEfficient.export instead\033[0m",
-        DeprecationWarning,
-        stacklevel=2,
-    )
     if os.path.exists(onnx_dir_path):
         logger.warning(f"Overriding {onnx_dir_path}")
         shutil.rmtree(onnx_dir_path)
@@ -323,7 +317,9 @@ def export_for_cloud(
     full_batch_size: Optional[int] = None,
 ) -> str:
     # Check if model architecture is supported for continuous batching.
-    if full_batch_size and qeff_model.model.config.architectures[0] not in get_lists_of_cb_qeff_models.architectures:
+    if full_batch_size and qeff_model.model.config.architectures[0].lower() not in {
+        x.lower() for x in get_lists_of_cb_qeff_models.architectures
+    }:
         raise NotImplementedError(
             f"Continuous batching is not supported for {qeff_model.model.config.architectures[0]}"
         )
@@ -356,24 +352,14 @@ def export_lm_model_for_cloud(
         logger.warning(f"Overriding {onnx_dir_path}")
         shutil.rmtree(onnx_dir_path)
 
-    if qeff_model.is_transformed:
-        model_name = export_kvstyle_transformed_model_to_onnx(
-            model_name=model_name,
-            transformed_model=qeff_model.model,
-            tokenizer=tokenizer,
-            onnx_dir_path=onnx_dir_path,
-            seq_len=seq_length,
-            full_batch_size=full_batch_size,
-        )  # type: ignore
-
-    else:
-        model_name = export_bertstyle_model_to_onnx(
-            model_name=model_name,
-            model=qeff_model.model,
-            tokenizer=tokenizer,
-            onnx_dir_path=onnx_dir_path,
-            seq_len=seq_length,
-        )  # type: ignore
+    model_name = export_kvstyle_transformed_model_to_onnx(
+        model_name=model_name,
+        transformed_model=qeff_model.model,
+        tokenizer=tokenizer,
+        onnx_dir_path=onnx_dir_path,
+        seq_len=seq_length,
+        full_batch_size=full_batch_size,
+    )
     return os.path.join(onnx_dir_path, f"{model_name}.onnx")
 
 
@@ -398,7 +384,7 @@ def qualcomm_efficient_converter(
 
     Usage 2: You can pass ``model_name`` and ``model_kv`` as an object of ``QEfficient.QEFFAutoModelForCausalLM``, In this case will directly export the ``model_kv.model`` to ``ONNX``
 
-    We will be deprecating this function and it will be replaced by ``QEffAutoModelForCausalLM.export``.
+    We will be deprecating this function and it will be replaced by ``QEFFAutoModelForCausalLM.export``.
 
     ``Mandatory`` Args:
         :model_name (str): The name of the model to be used.
@@ -423,7 +409,7 @@ def qualcomm_efficient_converter(
 
     """
     warnings.warn(
-        "\033[93mmodel_kv argument will be replaced by qeff_model of type QEFFBaseModel\033[0m",
+        "\033[93m`qualcomm_efficient_converter` method will be deprecated soon, use `QEFFAutoModelForCausalLM.export` instead\033[0m",
         DeprecationWarning,
         stacklevel=2,
     )
@@ -440,13 +426,8 @@ def qualcomm_efficient_converter(
         )
     )
 
-    # Transform if required
-    if model_kv.is_transformed and not kv:
-        raise AttributeError("Transformed model is passed while requesting to convert non-transformed model")
-    model_kv = model_kv if model_kv.is_transformed else QEfficient.transform(model_kv) if kv else model_kv
-
     if onnx_dir_path is None:
-        model_card_dir = os.path.join(QEFF_MODELS_DIR, str(model_kv.model_card_name))
+        model_card_dir = os.path.join(QEFF_MODELS_DIR, str(model_name))
         onnx_dir_path = os.path.join(model_card_dir, "onnx")
     os.makedirs(onnx_dir_path, exist_ok=True)

QEfficient/generation/cloud_infer.py

Lines changed: 4 additions & 4 deletions
@@ -5,7 +5,8 @@
 #
 # -----------------------------------------------------------------------------
 
-from typing import Dict, List, Optional
+from pathlib import Path
+from typing import Dict, List, Optional, Union
 from warnings import warn
 
 import numpy as np
@@ -43,7 +44,7 @@
 class QAICInferenceSession:
     def __init__(
         self,
-        qpc_path: str,
+        qpc_path: Union[Path, str],
         device_ids: Optional[List[int]] = None,
         activate: bool = True,
         enable_debug_logs: bool = False,
@@ -68,8 +69,7 @@ def __init__(
         if enable_debug_logs:
             if self.context.setLogLevel(qaicrt.QLogLevel.QL_DEBUG) != qaicrt.QStatus.QS_SUCCESS:
                 raise RuntimeError("Failed to setLogLevel")
-
-        qpc = qaicrt.Qpc(qpc_path)
+        qpc = qaicrt.Qpc(str(qpc_path))
         # Load IO Descriptor
         iodesc = aicapi.IoDesc()
         status, iodesc_data = qpc.getIoDescriptor()
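With the change above, QAICInferenceSession accepts either a str or a pathlib.Path for qpc_path and normalizes it with str() before handing it to qaicrt.Qpc. A hedged usage sketch; the QPC directory and device id are placeholders, and running it requires the Cloud AI 100 runtime.

```python
# Illustrative only: the QPC path below is a placeholder; device_ids is the
# parameter name shown in the constructor signature above.
from pathlib import Path

from QEfficient.generation.cloud_infer import QAICInferenceSession

qpc_dir = Path("/path/to/qpc")  # a pathlib.Path is now accepted directly
session = QAICInferenceSession(qpc_dir, device_ids=[0])  # converted via str(qpc_path) internally
```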
