CUDA error: operation not supported

Hello, I encountered the following error while running Evo2:


> Found complete file in repo: evo2_7b.pt
> [08/07/25 08:58:08] INFO     StripedHyena - INFO - Initializing     model.py:616
>                              StripedHyena with config:                          
>                              {'model_name': 'shc-evo2-7b-8k-2T-v2',             
>                              'vocab_size': 512, 'hidden_size':                  
>                              4096, 'num_filters': 4096,                         
>                              'hcl_layer_idxs': [2, 6, 9, 13, 16,                
>                              20, 23, 27, 30], 'hcm_layer_idxs': [1,             
>                              5, 8, 12, 15, 19, 22, 26, 29],                     
>                              'hcs_layer_idxs': [0, 4, 7, 11, 14,                
>                              18, 21, 25, 28], 'attn_layer_idxs':                
>                              [3, 10, 17, 24, 31],                               
>                              'hcm_filter_length': 128,                          
>                              'hcl_filter_groups': 4096,                         
>                              'hcm_filter_groups': 256,                          
>                              'hcs_filter_groups': 256,                          
>                              'hcs_filter_length': 7, 'num_layers':              
>                              32, 'short_filter_length': 3,                      
>                              'num_attention_heads': 32,                         
>                              'short_filter_bias': False,                        
>                              'mlp_init_method':                                 
>                              'torch.nn.init.zeros_',                            
>                              'mlp_output_init_method':                          
>                              'torch.nn.init.zeros_', 'eps': 1e-06,              
>                              'state_size': 16, 'rotary_emb_base':               
>                              100000000000,                                      
>                              'rotary_emb_scaling_factor': 128,                  
>                              'use_interpolated_rotary_pos_emb':                 
>                              True, 'make_vocab_size_divisible_by':              
>                              8, 'inner_size_multiple_of': 16,                   
>                              'inner_mlp_size': 11264,                           
>                              'log_intermediate_values': False,                  
>                              'proj_groups': 1,                                  
>                              'hyena_filter_groups': 1,                          
>                              'column_split_hyena': False,                       
>                              'column_split': True, 'interleave':                
>                              True, 'evo2_style_activations': True,              
>                              'model_parallel_size': 1,                          
>                              'pipe_parallel_size': 1,                           
>                              'tie_embeddings': True,                            
>                              'mha_out_proj_bias': True,                         
>                              'hyena_out_proj_bias': True,                       
>                              'hyena_flip_x1x2': False,                          
>                              'qkv_proj_bias': False,                            
>                              'use_fp8_input_projections': False,                
>                              'max_seqlen': 1048576,                             
>                              'max_batch_size': 1, 'final_norm':                 
>                              True, 'use_flash_attn': True,                      
>                              'use_flash_rmsnorm': False,                        
>                              'use_flash_depthwise': False,                      
>                              'use_flashfft': False,                             
>                              'use_laughing_hyena': False,                       
>                              'inference_mode': True,                            
>                              'tokenizer_type':                                  
>                              'CharLevelTokenizer', 'prefill_style':             
>                              'fft', 'mlp_activation': 'gelu',                   
>                              'print_activations': False, 'Loader':              
>                              <class 'yaml.loader.FullLoader'>}                  
> Traceback (most recent call last):
>   File "/home-ssd/Users/evo2/evo2_offical_test/github_test_1.py", line 10, in <module>
>     evo2_model = Evo2('evo2_7b')
>                  ^^^^^^^^^^^^^^^
>   File "/home-ssd/Users/miniconda3/envs/flattn274/lib/python3.12/site-packages/evo2/models.py", line 48, in __init__
>     self.model = self.load_evo2_model(model_name, config_path)
>                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/home-ssd/Users/miniconda3/envs/flattn274/lib/python3.12/site-packages/evo2/models.py", line 255, in load_evo2_model
>     model = StripedHyena(global_config)
>             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/home-ssd/Users/miniconda3/envs/flattn274/lib/python3.12/site-packages/vortex/model/model.py", line 619, in __init__
>     self.embedding_layer = VocabParallelEmbedding(config)
>                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/home-ssd/Users/miniconda3/envs/flattn274/lib/python3.12/site-packages/vortex/model/layers.py", line 232, in __init__
>     super().__init__(
>   File "/home-ssd/Users/miniconda3/envs/flattn274/lib/python3.12/site-packages/torch/nn/modules/sparse.py", line 167, in __init__
>     torch.empty((num_embeddings, embedding_dim), **factory_kwargs),
>     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/home-ssd/Users/miniconda3/envs/flattn274/lib/python3.12/site-packages/torch/utils/_device.py", line 104, in __torch_function__
>     return func(*args, **kwargs)
>            ^^^^^^^^^^^^^^^^^^^^^
> RuntimeError: CUDA error: operation not supported
> Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


I have already set `use_fp8_input_projections` to `False`.
In addition, I have tried the following combinations:

* CUDA 12.8 + Torch 2.7 + FlashAttention 2.8.0.post2
* CUDA 12.8 + Torch 2.7 + FlashAttention 2.7.4.post1
* CUDA 12.8 + Torch 2.6 + FlashAttention 2.8.0.post2
* CUDA 12.4 + Torch 2.6 + FlashAttention 2.8.0.post2

The issue occurs in all of these setups.

My GPU is an NVIDIA A800.
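To help separate the Evo2/Vortex stack from the underlying PyTorch/CUDA setup, here is a minimal check that may be useful for reproducing this. It is only a sketch: the helper name `collect_env_report` is made up for illustration, it assumes GPU index 0, and it assumes the failing call is a plain `torch.empty` of shape `(vocab_size, hidden_size)` = `(512, 4096)` on the CUDA device, as the traceback and config above suggest. The dtype is left at the PyTorch default, which may differ from what the model actually uses.

```python
import importlib
import platform


def collect_env_report(shape=(512, 4096)):
    """Gather version info and retry the allocation that fails in the traceback.

    `shape` defaults to (vocab_size, hidden_size) from the logged config;
    this is an assumption about which tensor the embedding layer allocates.
    """
    report = {"python": platform.python_version()}

    # Import lazily so the script still reports something without PyTorch.
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        report["torch"] = None
        return report

    report["torch"] = torch.__version__
    report["torch_cuda_build"] = torch.version.cuda  # CUDA version PyTorch was built with
    report["cuda_available"] = torch.cuda.is_available()
    if not report["cuda_available"]:
        return report

    report["device_name"] = torch.cuda.get_device_name(0)
    report["compute_capability"] = torch.cuda.get_device_capability(0)

    # Same kind of call as the last frame in the traceback (torch.empty on CUDA).
    try:
        torch.empty(shape, device="cuda")
        torch.cuda.synchronize()
        report["alloc_ok"] = True
    except RuntimeError as exc:
        report["alloc_ok"] = False
        report["alloc_error"] = str(exc)
    return report


if __name__ == "__main__":
    for key, value in collect_env_report().items():
        print(f"{key}: {value}")
```

If `alloc_ok` comes back `False` here, the error would appear to be reproducible without Evo2 or FlashAttention involved, which would point at the driver, GPU, or PyTorch build rather than the model code. If it comes back `True`, the problem is more likely specific to how the model is constructed.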

How can I resolve this? Thank you.


CUDA error: operation not supported #170
