
LLaVA-Gemma processor is bugged #41206

@nhatkhtn


System Info

  • transformers version: 4.56.1
  • Platform: Linux-5.14.0-503.40.1.el9_5.x86_64-x86_64-with-glibc2.34
  • Python version: 3.12.11
  • Huggingface_hub version: 0.35.0
  • Safetensors version: 0.5.3
  • Accelerate version: 1.10.1
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (accelerator?): 2.7.1+cu128 (CUDA)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: No
  • Using GPU in script?: Yes
  • GPU type: NVIDIA B200

Who can help?

I think @zucchini-nlp can help with this.

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Run the following (official) code snippet from the model card:

import requests
from PIL import Image
from transformers import (
  LlavaForConditionalGeneration,
  AutoTokenizer,
  AutoProcessor,
  CLIPImageProcessor
)
# In this repo, needed for version < 4.41.1
# from processing_llavagemma import LlavaGemmaProcessor
# processor = LlavaGemmaProcessor(tokenizer=AutoTokenizer.from_pretrained(checkpoint), image_processor=CLIPImageProcessor.from_pretrained(checkpoint))

checkpoint = "Intel/llava-gemma-2b"

# Load model
model = LlavaForConditionalGeneration.from_pretrained(checkpoint)
processor = AutoProcessor.from_pretrained(checkpoint)

# Prepare inputs
# Use gemma chat template
prompt = processor.tokenizer.apply_chat_template(
    [{'role': 'user', 'content': "<image>\nWhat's the content of the image?"}],
    tokenize=False,
    add_generation_prompt=True
)
url = "https://www.ilankelman.org/stopsigns/australia.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(text=prompt, images=image, return_tensors="pt")

# Generate
generate_ids = model.generate(**inputs, max_length=30)
output = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(output)

The following error will show up:

TypeError                                 Traceback (most recent call last)
Cell In[46], line 32
     30 url = "https://www.ilankelman.org/stopsigns/australia.jpg"
     31 image = Image.open(requests.get(url, stream=True).raw)
---> 32 inputs = processor(text=prompt, images=image, return_tensors="pt").to(device)
     34 # Generate
     35 generate_ids = model.generate(**inputs)

File .../transformers/models/llava/processing_llava.py:156, in LlavaProcessor.__call__(self, images, text, audio, videos, **kwargs)
    154 pixel_values = image_inputs["pixel_values"]
    155 height, width = get_image_size(to_numpy_array(pixel_values[0]))
--> 156 num_image_tokens = (height // self.patch_size) * (
    157     width // self.patch_size
    158 ) + self.num_additional_image_tokens
    159 if self.vision_feature_select_strategy == "default":
    160     num_image_tokens -= 1

TypeError: unsupported operand type(s) for //: 'int' and 'NoneType'
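
A quick check in the same session (sketch only) shows that the processor is loaded without the fields that LlavaProcessor.__call__ needs to expand the <image> placeholder:

# Sketch: inspect the attributes used at processing_llava.py:156.
# patch_size comes back as None, which matches the '//' TypeError above.
print(processor.patch_size)
print(processor.num_additional_image_tokens)
print(processor.vision_feature_select_strategy)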

Comparing this repo with the LLaVA 1.5 repo, it seems that the processor_config.json file is missing, so the processor's patch_size is never set (hence the None in the error above). Setting the fields manually makes the model (seemingly) work:

processor.num_additional_image_tokens = 1
processor.patch_size = 14
processor.vision_feature_select_strategy = "default"
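
For reference, a hedged sketch of applying the same workaround once and saving a patched copy locally; the values are assumed to match the LLaVA 1.5 setup (CLIP ViT-L/14, patch size 14) and are not confirmed by the Intel repo:

# Sketch only: patch the missing fields and persist a local copy of the processor.
from transformers import AutoProcessor

checkpoint = "Intel/llava-gemma-2b"
processor = AutoProcessor.from_pretrained(checkpoint)

processor.patch_size = 14                             # assumed: ViT-L/14 vision tower
processor.num_additional_image_tokens = 1             # assumed: CLS token
processor.vision_feature_select_strategy = "default"

# save_pretrained should write these fields into a processor_config.json,
# so reloading from the local path no longer hits the TypeError (assumption).
processor.save_pretrained("./llava-gemma-2b-patched")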

Expected behavior

The official code snippet from the model card should run as-is, without errors.
