112 changes: 91 additions & 21 deletions src/arduino/app_bricks/cloud_llm/README.md
@@ -1,39 +1,109 @@
# Cloud LLM brick
# Cloud LLM Brick

This directory contains the implementation of the Cloud LLM brick, which provides an interface to interact with cloud-based Large Language Models (LLMs) through their REST API.
The Cloud LLM Brick provides a seamless interface to interact with cloud-based Large Language Models (LLMs) such as OpenAI's GPT, Anthropic's Claude, and Google's Gemini. It abstracts the complexity of REST APIs, enabling you to send prompts, receive responses, and maintain conversational context within your Arduino projects.

## Overview

The Cloud LLM brick allows users to send prompts to a specified LLM service and receive generated responses.
It can be configured to work with a curated set of LLM providers that offer RESTful APIs, notably: ChatGPT, Claude and Gemini.
This Brick acts as a gateway to powerful AI models hosted in the cloud. It is designed to handle the nuances of network communication, authentication, and session management. Whether you need a simple one-off answer or a continuous conversation with memory, the Cloud LLM Brick provides a unified API for different providers.

## Features

- **Multi-Provider Support**: Compatible with major LLM providers including Anthropic (Claude), OpenAI (GPT), and Google (Gemini).
- **Conversational Memory**: Built-in support for windowed history, allowing the AI to remember context from previous exchanges.
- **Streaming Responses**: Receive text chunks in real-time as they are generated, ideal for responsive user interfaces.
- **Configurable Behavior**: Customize system prompts, temperature (creativity), and request timeouts.
- **Simple API**: Unified `chat` and `chat_stream` methods regardless of the underlying model provider.

## Prerequisites

Before using the Cloud LLM brick, ensure you have the following:
- An account with a cloud-based LLM service (e.g., OpenAI, Cohere, etc.).
- API access credentials (API key or token) for the LLM service.
- Network connectivity to access the LLM service endpoint.
- **Internet Connection**: The board must be connected to the internet to reach the LLM provider's API.
- **API Key**: A valid API key for the chosen service (e.g., OpenAI API Key, Anthropic API Key).
- **Python Dependencies**: The Brick relies on LangChain integration packages (`langchain-anthropic`, `langchain-openai`, `langchain-google-genai`).
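
If these packages are not already available (App Lab typically provisions them with the Brick), they can be installed with pip. This is a sketch assuming a standard Python environment:

```bash
pip install langchain-anthropic langchain-openai langchain-google-genai
```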

## Features
## Code Example and Usage

### Basic Conversation

- Send prompts to a cloud-based LLM service.
- Receive and process responses from the LLM.
- Supports both one-shot requests and memory for follow-up questions and answers.
- Supports a curated set of LLM providers.
This example initializes the Brick with an OpenAI model and performs a simple chat interaction.

## Code example and usage
Here is a basic example of how to use the Cloud LLM brick:
**Note:** The API key is not hardcoded. It is retrieved automatically from the **Brick Configuration** in App Lab.

```python
from arduino.app_bricks.cloud_llm import CloudLLM
import os
from arduino.app_bricks.cloud_llm import CloudLLM, CloudModel
from arduino.app_utils import App

llm = CloudLLM(api_key="your_api_key_here")
# Initialize the Brick (API key is loaded from configuration)
llm = CloudLLM(
    model=CloudModel.OPENAI_GPT,
    system_prompt="You are a helpful assistant for an IoT device."
)

App.start_bricks()
def simple_chat():
    # Send a prompt and print the response
    response = llm.chat("What is the capital of Italy?")
    print(f"AI: {response}")

response = llm.chat("What is the capital of France?")
print(response)
# Run the application
App.run(simple_chat)
```

### Streaming with Memory

This example demonstrates how to enable conversational memory and process the response as a stream of tokens.

```python
from arduino.app_bricks.cloud_llm import CloudLLM, CloudModel
from arduino.app_utils import App

App.stop_bricks()
# Initialize with memory enabled (keeps last 10 messages)
# API Key is retrieved automatically from Brick Configuration
llm = CloudLLM(
    model=CloudModel.ANTHROPIC_CLAUDE
).with_memory(max_messages=10)

def chat_loop():
    while True:
        user_input = input("You: ")
        if user_input.lower() in ["exit", "quit"]:
            break

        print("AI: ", end="", flush=True)

        # Stream the response token by token
        for token in llm.chat_stream(user_input):
            print(token, end="", flush=True)
        print() # Newline after response

App.run(chat_loop)
```

## Configuration

The Brick is initialized with the following parameters:

| Parameter | Type | Default | Description |
| :-------------- | :-------------------- | :---------------------------- | :--------------------------------------------------------------------------------------------------------------------------------------- |
| `api_key` | `str` | `os.getenv("API_KEY")` | The authentication key for the LLM provider. **Recommended:** Set this via the **Brick Configuration** menu in App Lab instead of code. |
| `model` | `str` \| `CloudModel` | `CloudModel.ANTHROPIC_CLAUDE` | The specific model to use. Accepts a `CloudModel` enum or its string value. |
| `system_prompt` | `str` | `""` | A base instruction that defines the AI's behavior and persona. |
| `temperature` | `float` | `0.7` | Controls randomness. `0.0` is deterministic, `1.0` is creative. |
| `timeout` | `int` | `30` | Maximum time (in seconds) to wait for a response. |
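
As a sketch of how these parameters fit together (the values below are illustrative, and the hardcoded key is for demonstration only; prefer setting it via the Brick Configuration):

```python
from arduino.app_bricks.cloud_llm import CloudLLM, CloudModel

llm = CloudLLM(
    api_key="sk-...",                               # normally injected via Brick Configuration
    model=CloudModel.GOOGLE_GEMINI,                 # or the raw string "gemini-2.5-flash"
    system_prompt="Answer in one short sentence.",  # persona / constraints
    temperature=0.2,                                # lower = more deterministic
    timeout=60,                                     # wait up to 60 seconds per response
)
```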

### Supported Models

You can select a model using the `CloudModel` enum or by passing the corresponding raw string identifier.

| Enum Constant | Raw String ID | Provider Documentation |
| :---------------------------- | :------------------------- | :-------------------------------------------------------------------------- |
| `CloudModel.ANTHROPIC_CLAUDE` | `claude-3-7-sonnet-latest` | [Anthropic Models](https://docs.anthropic.com/en/docs/about-claude/models) |
| `CloudModel.OPENAI_GPT` | `gpt-4o-mini` | [OpenAI Models](https://platform.openai.com/docs/models) |
| `CloudModel.GOOGLE_GEMINI` | `gemini-2.5-flash` | [Google Gemini Models](https://ai.google.dev/gemini-api/docs/models/gemini) |
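
For example, the following two initializations select the same model, once through the enum and once through its raw string identifier (a minimal sketch; all other parameters are left at their defaults):

```python
from arduino.app_bricks.cloud_llm import CloudLLM, CloudModel

llm_from_enum = CloudLLM(model=CloudModel.OPENAI_GPT)
llm_from_string = CloudLLM(model="gpt-4o-mini")  # equivalent raw string ID
```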

## Methods

- **`chat(message)`**: Sends a message and returns the complete response string. Blocks until generation is finished.
- **`chat_stream(message)`**: Returns a generator yielding response tokens as they arrive.
- **`stop_stream()`**: Interrupts an active streaming generation.
- **`with_memory(max_messages)`**: Enables history tracking. `max_messages` defines the context window size.
- **`clear_memory()`**: Resets the conversation history.
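
The sketch below ties these methods together: it streams a reply, arms a timer that calls `stop_stream()` to cut generation short if it runs too long, and then wipes the context with `clear_memory()`. This is a minimal illustration assuming memory has been enabled with `with_memory()`:

```python
import threading

from arduino.app_bricks.cloud_llm import CloudLLM, CloudModel

llm = CloudLLM(model=CloudModel.ANTHROPIC_CLAUDE).with_memory(max_messages=10)

def ask(prompt: str) -> None:
    # Interrupt the stream after 5 seconds if the model is still generating.
    timer = threading.Timer(5.0, llm.stop_stream)
    timer.start()
    for token in llm.chat_stream(prompt):
        print(token, end="", flush=True)
    timer.cancel()
    print()

ask("Summarize what this device can do.")

# Forget the conversation so the next prompt starts from a clean slate.
llm.clear_memory()
```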
99 changes: 69 additions & 30 deletions src/arduino/app_bricks/cloud_llm/cloud_llm.py
@@ -31,9 +31,11 @@ class AlreadyGenerating(Exception):

@brick
class CloudLLM:
"""A simplified, opinionated wrapper for common LangChain conversational patterns.
"""A Brick for interacting with cloud-based Large Language Models (LLMs).

This class provides a single interface to manage stateless chat and chat with memory.
This class wraps LangChain functionality to provide a simplified, unified interface
for chatting with models like Claude, GPT, and Gemini. It supports both synchronous
'one-shot' responses and streaming output, with optional conversational memory.
"""

def __init__(
@@ -44,18 +46,24 @@ def __init__(
temperature: Optional[float] = 0.7,
timeout: int = 30,
):
"""Initializes the CloudLLM brick with the given configuration.
"""Initializes the CloudLLM brick with the specified provider and configuration.

Args:
api_key: The API key for the LLM service.
model: The model identifier as per LangChain specification (e.g., "anthropic:claude-3-sonnet-20240229")
or by using a CloudModels enum (e.g. CloudModels.OPENAI_GPT). Defaults to CloudModel.ANTHROPIC_CLAUDE.
system_prompt: The global system-level instruction for the AI.
temperature: The sampling temperature for response generation. Defaults to 0.7.
timeout: The maximum time to wait for a response from the LLM service, in seconds. Defaults to 30 seconds.
api_key (str): The API access key for the target LLM service. Defaults to the
'API_KEY' environment variable.
model (Union[str, CloudModel]): The model identifier. Accepts a `CloudModel`
enum member (e.g., `CloudModel.OPENAI_GPT`) or its corresponding raw string
value (e.g., `'gpt-4o-mini'`). Defaults to `CloudModel.ANTHROPIC_CLAUDE`.
system_prompt (str): A system-level instruction that defines the AI's persona
and constraints (e.g., "You are a helpful assistant"). Defaults to empty.
temperature (Optional[float]): The sampling temperature between 0.0 and 1.0.
Higher values make output more random/creative; lower values make it more
deterministic. Defaults to 0.7.
timeout (int): The maximum duration in seconds to wait for a response before
timing out. Defaults to 30.

Raises:
ValueError: If the API key is missing.
ValueError: If `api_key` is not provided (empty string).
"""
if api_key == "":
raise ValueError("API key is required to initialize CloudLLM brick.")
@@ -99,34 +107,34 @@ def __init__(
def with_memory(self, max_messages: int = DEFAULT_MEMORY) -> "CloudLLM":
"""Enables conversational memory for this instance.

This allows the chatbot to remember previous user and AI messages.
Calling this modifies the instance to be stateful.
Configures the Brick to retain a window of previous messages, allowing the
AI to maintain context across multiple interactions.

Args:
max_messages: The total number of past messages (user + AI) to
keep in the conversation window. Set to 0 to disable memory.
max_messages (int): The maximum number of messages (user + AI) to keep
in history. Older messages are discarded. Set to 0 to disable memory.
Defaults to 10.

Returns:
self: The current CloudLLM instance for method chaining.
CloudLLM: The current instance, allowing for method chaining.
"""
self._max_messages = max_messages

return self

def chat(self, message: str) -> str:
"""Sends a single message to the AI and gets a complete response synchronously.
"""Sends a message to the AI and blocks until the complete response is received.

This is the primary way to interact. It automatically handles memory
based on how the instance was configured.
This method automatically manages conversation history if memory is enabled.

Args:
message: The user's message.
message (str): The input text prompt from the user.

Returns:
The AI's complete response as a string.
str: The complete text response generated by the AI.

Raises:
RuntimeError: If the chat model is not initialized or if text generation fails.
RuntimeError: If the internal chain is not initialized or if the API request fails.
"""
if self._chain is None:
raise RuntimeError("CloudLLM brick is not started. Please call start() before generating text.")
@@ -137,19 +145,20 @@ def chat(self, message: str) -> str:
raise RuntimeError(f"Response generation failed: {e}")

def chat_stream(self, message: str) -> Iterator[str]:
"""Sends a single message to the AI and streams the response as a synchronous generator.
"""Sends a message to the AI and yields response tokens as they are generated.

Use this to get tokens as they are generated, perfect for a streaming UI.
This allows for processing or displaying the response in real-time (streaming).
The generation can be interrupted by calling `stop_stream()`.

Args:
message: The user's message.
message (str): The input text prompt from the user.

Yields:
str: Chunks of the AI's response as they become available.
str: Chunks of text (tokens) from the AI response.

Raises:
RuntimeError: If the chat model is not initialized or if text generation fails.
AlreadyGenerating: If the chat model is already streaming a response.
RuntimeError: If the internal chain is not initialized or if the API request fails.
AlreadyGenerating: If a streaming session is already active.
"""
if self._chain is None:
raise RuntimeError("CloudLLM brick is not started. Please call start() before generating text.")
@@ -168,18 +177,33 @@ def chat_stream(self, message: str) -> Iterator[str]:
self._keep_streaming.clear()

def stop_stream(self) -> None:
"""Signals the LLM to stop generating a response."""
"""Signals the active streaming generation to stop.

This sets an internal flag that causes the `chat_stream` iterator to break
early. It has no effect if no stream is currently running.
"""
self._keep_streaming.clear()

def clear_memory(self) -> None:
"""Clears the conversational memory.
"""Clears the conversational memory history.

This only has an effect if with_memory() has been called.
Resets the stored context. This is useful for starting a new conversation
topic without previous context interfering. Only applies if memory is enabled.
"""
if self._history:
self._history.clear()

def _get_session_history(self, session_id: str) -> WindowedChatMessageHistory:
"""Retrieves or creates the chat history for a given session.

Internal callback used by LangChain's `RunnableWithMessageHistory`.

Args:
session_id (str): The unique identifier for the session.

Returns:
WindowedChatMessageHistory: The history object managing the message window.
"""
if self._max_messages == 0:
self._history = InMemoryChatMessageHistory()
if self._history is None:
@@ -188,6 +212,21 @@ def _get_session_history(self, session_id: str) -> WindowedChatMessageHistory:


def model_factory(model_name: CloudModel, **kwargs) -> BaseChatModel:
"""Factory function to instantiate the specific LangChain chat model.

This function maps the supported `CloudModel` enum values to their respective
LangChain implementations.

Args:
model_name (CloudModel): The enum or string identifier for the model.
**kwargs: Additional arguments passed to the model constructor (e.g., api_key, temperature).

Returns:
BaseChatModel: An instance of a LangChain chat model wrapper.

Raises:
ValueError: If `model_name` does not match one of the supported `CloudModel` options.
"""
if model_name == CloudModel.ANTHROPIC_CLAUDE:
from langchain_anthropic import ChatAnthropic
