
Speed up inference performance for attention and SSM layers with caching #279

@bigximik

Description

🎯 Goal (What & Why)

Speed up inference performance for attention layers with KV caching

To be implemented based on the caching functionality in Hugging Face Transformers. The implementation needs to support data, model, and sequence parallelism.
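For context, this is the caching behavior HF `generate` already provides, and it is the contract a Fast-LLM model wrapped for HF generation would need to satisfy. A minimal usage sketch (using `gpt2` purely as a stand-in checkpoint, not a Fast-LLM model):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Fast-LLM KV cache test", return_tensors="pt")

# With use_cache=True, generate() carries past key/value states across steps:
# the prompt is processed once (prefill), and every later forward call only
# receives the most recently generated token plus the cache.
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=16, use_cache=True)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```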

🚀 Execution Plan

Step 1

Support a range of positions starting from an offset > 0 in Fast-LLM inference. This is required for HF `generate` with caching, since it passes only the last generated token to the model's forward pass (see the sketch below).
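A minimal sketch of why the offset matters, assuming a hypothetical helper (`build_position_ids` is not an existing Fast-LLM function): when the cache already holds `past_length` tokens, the incoming tokens must be assigned positions starting at `past_length`, not 0, so that positional/rotary embeddings and the attention mask line up with the cached keys and values.

```python
import torch

def build_position_ids(input_ids: torch.Tensor, past_length: int) -> torch.Tensor:
    # input_ids: [batch, new_len]. With an empty cache, past_length == 0 and
    # positions are 0..new_len-1 as usual. During cached decoding, new_len is
    # typically 1 and the single new token gets position past_length.
    batch, new_len = input_ids.shape
    positions = torch.arange(past_length, past_length + new_len, device=input_ids.device)
    return positions.unsqueeze(0).expand(batch, -1)
```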

Step 2

Implement a KV cache for HF `generate` (see the sketch below).
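A rough sketch of the kind of cache object involved, loosely mirroring the `update`/`get_seq_length` interface of Hugging Face's `DynamicCache`. This is illustrative only; it ignores the data-, model-, and sequence-parallel layouts the real implementation must handle.

```python
import torch

class SimpleKVCache:
    """Minimal per-layer KV cache: append new keys/values along the sequence dim."""

    def __init__(self, num_layers: int):
        self.keys = [None] * num_layers
        self.values = [None] * num_layers

    def update(self, key: torch.Tensor, value: torch.Tensor, layer_idx: int):
        # key/value: [batch, heads, new_seq, head_dim]
        if self.keys[layer_idx] is None:
            self.keys[layer_idx], self.values[layer_idx] = key, value
        else:
            self.keys[layer_idx] = torch.cat([self.keys[layer_idx], key], dim=2)
            self.values[layer_idx] = torch.cat([self.values[layer_idx], value], dim=2)
        # Return the full (past + new) keys/values for this layer's attention.
        return self.keys[layer_idx], self.values[layer_idx]

    def get_seq_length(self, layer_idx: int = 0) -> int:
        # Number of tokens already cached; used as the position offset in Step 1.
        return 0 if self.keys[layer_idx] is None else self.keys[layer_idx].shape[2]
```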

📌 Acceptance Criteria (Must-Haves for Completion)

  • The feature must be functional and tested.
  • The implementation must be documented in practical terms.
  • The PR must include a performance/impact summary.
  • No refactors unless directly necessary for feature completion.

🛠️ Project Management

  • Assign the project to the Fast-LLM project.
  • Set the Estimate field (in days) in the GitHub project.
  • Use the Size field to categorize the PR size (Small/Medium/Large).
  • Assign an owner when opening the issue.
