This repository was archived by the owner on Sep 20, 2025. It is now read-only.
Merged
45 changes: 36 additions & 9 deletions docs/supported_models.md
@@ -1,9 +1,36 @@
# LLM
## Qwen series
- Qwen2.5-72B-Instruct-AWQ
- Install:
```

```
# VLM
# ASR
| ModelId                           | ModelSeries              | ModelType   | Supported Engines   | Supported Instances                                                                                                   | Supported Services            | China Region Support   |
|:----------------------------------|:-------------------------|:------------|:--------------------|:---------------------------------------------------------------------------------------------------------------------|:------------------------------|:-----------------------|
| glm-4-9b-chat | glm4 | llm | vllm | g5.12xlarge,g5.24xlarge,g5.48xlarge | sagemaker,sagemaker_async,ecs | ✅ |
| internlm2_5-20b-chat-4bit-awq | internlm2.5 | llm | vllm | g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.12xlarge,g5.16xlarge,g5.24xlarge,g5.48xlarge | sagemaker,sagemaker_async,ecs | ✅ |
| internlm2_5-20b-chat | internlm2.5 | llm | vllm | g5.12xlarge,g5.24xlarge,g5.48xlarge | sagemaker,sagemaker_async,ecs | ✅ |
| internlm2_5-7b-chat               | internlm2.5              | llm         | vllm                | g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.12xlarge,g5.16xlarge,g5.24xlarge,g5.48xlarge                                      | sagemaker,sagemaker_async,ecs | ✅                      |
| internlm2_5-7b-chat-4bit          | internlm2.5              | llm         | vllm                | g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.12xlarge,g5.16xlarge,g5.24xlarge,g5.48xlarge                                      | sagemaker,sagemaker_async,ecs | ❎                      |
| internlm2_5-1_8b-chat             | internlm2.5              | llm         | vllm                | g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.12xlarge,g5.16xlarge,g5.24xlarge,g5.48xlarge                                      | sagemaker,sagemaker_async,ecs | ✅                      |
| Qwen2.5-7B-Instruct | qwen2.5 | llm | vllm | g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.12xlarge,g5.16xlarge,g5.24xlarge,g5.48xlarge | sagemaker,sagemaker_async,ecs | ✅ |
| Qwen2.5-72B-Instruct-AWQ | qwen2.5 | llm | vllm,tgi | g5.12xlarge,g5.24xlarge,g5.48xlarge | sagemaker,sagemaker_async,ecs | ✅ |
| Qwen2.5-72B-Instruct | qwen2.5 | llm | vllm | g5.48xlarge | sagemaker,sagemaker_async,ecs | ✅ |
| Qwen2.5-72B-Instruct-AWQ-128k | qwen2.5 | llm | vllm | g5.12xlarge,g5.24xlarge,g5.48xlarge | sagemaker,sagemaker_async,ecs | ✅ |
| Qwen2.5-32B-Instruct | qwen2.5 | llm | vllm | g5.12xlarge,g5.24xlarge,g5.48xlarge | sagemaker,sagemaker_async,ecs | ✅ |
| Qwen2.5-0.5B-Instruct | qwen2.5 | llm | vllm,tgi | g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge,inf2.8xlarge | sagemaker,sagemaker_async,ecs | ✅ |
| Qwen2.5-1.5B-Instruct | qwen2.5 | llm | vllm | g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge | sagemaker,sagemaker_async,ecs | ✅ |
| Qwen2.5-3B-Instruct | qwen2.5 | llm | vllm | g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge | sagemaker,sagemaker_async,ecs | ✅ |
| Qwen2.5-14B-Instruct-AWQ | qwen2.5 | llm | vllm | g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge | sagemaker,sagemaker_async,ecs | ✅ |
| Qwen2.5-14B-Instruct | qwen2.5 | llm | vllm | g5.12xlarge,g5.24xlarge,g5.48xlarge | sagemaker,sagemaker_async,ecs | ✅ |
| QwQ-32B-Preview | qwen reasoning model | llm | huggingface,vllm | g5.12xlarge,g5.24xlarge,g5.48xlarge | sagemaker,sagemaker_async,ecs | ✅ |
| llama-3.3-70b-instruct-awq | llama | llm | tgi | g5.12xlarge,g5.24xlarge,g5.48xlarge | sagemaker,sagemaker_async,ecs | ❎ |
| DeepSeek-R1-Distill-Qwen-32B | deepseek reasoning model | llm | vllm | g5.12xlarge,g5.24xlarge,g5.48xlarge | sagemaker,sagemaker_async,ecs | ❎ |
| DeepSeek-R1-Distill-Qwen-14B | deepseek reasoning model | llm | vllm | g5.12xlarge,g5.24xlarge,g5.48xlarge | sagemaker,sagemaker_async,ecs | ❎ |
| DeepSeek-R1-Distill-Qwen-7B | deepseek reasoning model | llm | vllm | g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge | sagemaker,sagemaker_async,ecs | ❎ |
| DeepSeek-R1-Distill-Qwen-1.5B | deepseek reasoning model | llm | vllm | g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge | sagemaker,sagemaker_async,ecs | ❎ |
| DeepSeek-R1-Distill-Llama-8B | deepseek reasoning model | llm | vllm | g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge | sagemaker,sagemaker_async,ecs | ❎ |
| deepseek-r1-distill-llama-70b-awq | deepseek reasoning model | llm | tgi,vllm | g5.12xlarge,g5.24xlarge,g5.48xlarge | sagemaker,sagemaker_async,ecs | ✅ |
| Baichuan-M1-14B-Instruct | baichuan | llm | huggingface | g5.12xlarge,g5.24xlarge,g5.48xlarge | sagemaker,sagemaker_async,ecs | ❎ |
| Qwen2-VL-72B-Instruct-AWQ | qwen2vl | vlm | vllm | g5.12xlarge,g5.24xlarge,g5.48xlarge | sagemaker,sagemaker_async | ✅ |
| QVQ-72B-Preview-AWQ | qwen reasoning model | vlm | vllm | g5.12xlarge,g5.24xlarge,g5.48xlarge | sagemaker,sagemaker_async | ❎ |
| Qwen2-VL-7B-Instruct | qwen2vl | vlm | vllm | g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.12xlarge,g5.16xlarge,g5.24xlarge,g5.48xlarge,g6e.2xlarge | sagemaker,sagemaker_async | ✅ |
| InternVL2_5-78B-AWQ | internvl2.5 | vlm | lmdeploy | g5.12xlarge,g5.24xlarge,g5.48xlarge | sagemaker,sagemaker_async | ❎ |
| txt2video-LTX | comfyui | video | comfyui | g5.4xlarge,g5.8xlarge,g6e.2xlarge | sagemaker_async | ❎ |
| whisper | whisper | whisper | huggingface | g5.xlarge,g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge | sagemaker_async | ❎ |
| bge-base-en-v1.5 | bge | embedding | vllm | g5.xlarge,g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge | sagemaker | ✅ |
| bge-m3 | bge | embedding | vllm | g5.xlarge,g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge | sagemaker,ecs | ✅ |
| bge-reranker-v2-m3 | bge | rerank | vllm | g5.xlarge,g5.2xlarge,g5.4xlarge,g5.8xlarge,g5.16xlarge | sagemaker | ✅ |
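
For models listed under the `sagemaker` service, a deployed endpoint can be called with boto3. A minimal sketch, assuming a hypothetical endpoint name and an OpenAI-style JSON payload (the actual request schema depends on the serving engine):

```python
import json

import boto3

# Hypothetical endpoint name; substitute the name printed after deployment.
ENDPOINT_NAME = "Qwen2-5-7B-Instruct-endpoint"

client = boto3.client("sagemaker-runtime")
response = client.invoke_endpoint(
    EndpointName=ENDPOINT_NAME,
    ContentType="application/json",
    # Assumed OpenAI-style messages payload, typical for vllm-backed endpoints.
    Body=json.dumps({
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 256,
    }),
)
print(response["Body"].read().decode("utf-8"))
```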
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -37,7 +37,7 @@ modelscope = "^1.21.1"
# optional dependencies
cli = ["typer","rich","questionary","requests"]
langchain = ["langchain", "langchain-aws"] # langchain required
all = ["typer","rich","questionary","langchain", "langchain-aws","sagemaker","openai","jinja2","huggingface_hub","modelscope"] # all
all = ["typer","rich","questionary","langchain", "langchain-aws","sagemaker","openai","jinja2","huggingface_hub","hf_transfer","modelscope"] # all


[tool.poetry.group.dev.dependencies]
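The newly added `hf_transfer` dependency only takes effect when huggingface_hub's accelerated-download switch is enabled. A minimal sketch (the environment variable is the standard huggingface_hub mechanism; the repo id is just an illustration):

```python
import os

# Must be set before huggingface_hub starts downloading.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

from huggingface_hub import snapshot_download

# Fetch model weights using the Rust-based hf_transfer backend.
local_dir = snapshot_download(repo_id="Qwen/Qwen2.5-7B-Instruct")
print(local_dir)
```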
6 changes: 4 additions & 2 deletions src/dmaa/commands/deploy.py
@@ -269,6 +269,8 @@ def deploy(
if not check_service_support_on_cn_region(service_type,region):
raise ServiceNotSupported(region, service_type=service_type)

#

# support instance
supported_instances = model.supported_instances
supported_instances = supported_instances_filter(region,allow_local_deploy,supported_instances)
@@ -281,7 +283,7 @@
support_gpu_num = supported_instances[0].gpu_num
default_gpus_str = ",".join([str(i) for i in range(min(gpu_num,support_gpu_num))])
gpus_to_deploy = questionary.text(
"input the local gpus to deploy the model:",
"input the local gpu ids to deploy the model (e.g. 0,1,2):",
default=f"{default_gpus_str}"
).ask()
os.environ['CUDA_VISIBLE_DEVICES']=gpus_to_deploy
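
For context on the reworded prompt above: the answer is exported verbatim as `CUDA_VISIBLE_DEVICES`, so only the listed devices are visible to the serving engine. A hypothetical validation helper (not part of this PR) illustrating the expected `0,1,2` format:

```python
import os

def set_visible_gpus(gpu_ids: str) -> None:
    """Export CUDA_VISIBLE_DEVICES after checking the id list is well formed."""
    ids = [part.strip() for part in gpu_ids.split(",") if part.strip()]
    if not ids or not all(part.isdigit() for part in ids):
        raise ValueError(f"invalid GPU id list: {gpu_ids!r}")
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(ids)

set_visible_gpus("0,1,2")  # processes started afterwards see only GPUs 0-2
```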
@@ -394,7 +396,7 @@ def deploy(
console.print("[red]Invalid JSON format. Please try again.[/red]")

# model tag
if model_tag==MODEL_DEFAULT_TAG and not skip_confirm:
if model_tag==MODEL_DEFAULT_TAG and not skip_confirm and service_type != ServiceType.LOCAL:
while True:
model_tag = questionary.text(
"(Optional) Add a model deployment tag (custom label), you can skip by pressing Enter:",
16 changes: 16 additions & 0 deletions src/dmaa/models/chat_templates/deepseek_r1.jinja
@@ -0,0 +1,16 @@
{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='', is_first_sp=true) %}
{%- for message in messages %}{%- if message['role'] == 'system' %}
{%- if ns.is_first_sp %}{% set ns.system_prompt = ns.system_prompt + message['content'] %}{% set ns.is_first_sp = false %}
{%- else %}{% set ns.system_prompt = ns.system_prompt + '\n\n' + message['content'] %}
{%- endif %}{%- endif %}
{%- endfor %}{{ bos_token }}{{ ns.system_prompt }}
{%- for message in messages %}
{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{{'<|User|>' + message['content']}}{%- endif %}
{%- if message['role'] == 'assistant' and 'tool_calls' in message %}{%- set ns.is_tool = false -%}
{%- for tool in message['tool_calls'] %}
{%- if not ns.is_first %}{%- if message['content'] is none %}{{'<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- else %}{{'<|Assistant|>' + message['content'] + '<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}
{%- endif %}{%- set ns.is_first = true -%}{%- else %}{{'\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}
{%- endif %}{%- endfor %}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- if message['role'] == 'assistant' and not loop.last and 'tool_calls' not in message %}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{% if '</think>' in content %}{% set content = content.split('</think>')[-1] %}{% endif %}{{'<|Assistant|>' + content + '<|end▁of▁sentence|>'}}
{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}
{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}
{%- else %}{{'<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_tool %}{{'<|Assistant|>'}}{%- if messages[-1].role == "assistant" %}{{- messages[-1].content }}{%- endif %}{% endif %}
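
One way to exercise a template file like this is through transformers, whose `apply_chat_template` accepts a raw template string. A sketch, assuming the distill model id and the file path within this repo:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
with open("src/dmaa/models/chat_templates/deepseek_r1.jinja") as f:
    template = f.read()

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2 + 2?"},
]
prompt = tokenizer.apply_chat_template(
    messages,
    chat_template=template,      # override the tokenizer's built-in template
    tokenize=False,
    add_generation_prompt=True,  # end with '<|Assistant|>' to cue a reply
)
print(prompt)
```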
26 changes: 26 additions & 0 deletions src/dmaa/models/chat_templates/deepseek_r1_distill.jinja
@@ -0,0 +1,26 @@
{%- if not add_generation_prompt is defined %}
{%- set add_generation_prompt = false %}
{%- endif %}
{%- set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='') %}
{%- for message in messages %}
{%- if message['role'] == 'system' %}
{%- set ns.system_prompt = message['content'] %}
{%- endif %}
{%- endfor %}{{bos_token}}{{ns.system_prompt}}
{%- for message in messages %}
{%- if message['role'] == 'user' %}
{%- set ns.is_tool = false -%}{{'<|User|>' + message['content']}}
{%- endif %}
{%- if message['role'] == 'assistant' and message['content'] is none %}
{%- set ns.is_tool = false -%}
{%- for tool in message['tool_calls']%}
{%- if not ns.is_first %}{{'<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}
{%- set ns.is_first = true -%}
{%- else %}{{'\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- endfor %}
{%- endif %}
{%- if message['role'] == 'assistant' and not loop.last and message['content'] is not none %}
{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}
{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{% if '</think>' in content %}{% set content = content.split('</think>')[-1] %}{% endif %}{{'<|Assistant|>' + content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}
{%- if message['role'] == 'tool' %}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}
{%- set ns.is_output_first = false %}{%- else %}{{'\n<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}
{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_tool %}{{'<|Assistant|>'}}{%- if messages[-1].role == "assistant" %}{{- messages[-1].content }}{%- endif %}{% endif %}
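
Note that both templates drop earlier reasoning when re-rendering history: an assistant turn containing `</think>` keeps only the text after it. In isolation, that expression behaves like this sketch:

```python
# Mirrors the templates' content.split('</think>')[-1] handling.
content = "<think>chain of thought...</think>The answer is 4."
if "</think>" in content:
    content = content.split("</think>")[-1]
print(content)  # -> "The answer is 4."
```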