
Commit f1d346d

Update Cloud LLM brick docs (#27)
1 parent bb62885 commit f1d346d

2 files changed: +160 −51 lines changed

Lines changed: 91 additions & 21 deletions
````diff
@@ -1,39 +1,109 @@
-# Cloud LLM brick
+# Cloud LLM Brick
 
-This directory contains the implementation of the Cloud LLM brick, which provides an interface to interact with cloud-based Large Language Models (LLMs) through their REST API.
+The Cloud LLM Brick provides a seamless interface to interact with cloud-based Large Language Models (LLMs) such as OpenAI's GPT, Anthropic's Claude, and Google's Gemini. It abstracts the complexity of REST APIs, enabling you to send prompts, receive responses, and maintain conversational context within your Arduino projects.
 
 ## Overview
 
-The Cloud LLM brick allows users to send prompts to a specified LLM service and receive generated responses.
-It can be configured to work with a curated set of LLM providers that offer RESTful APIs, notably: ChatGPT, Claude and Gemini.
+This Brick acts as a gateway to powerful AI models hosted in the cloud. It is designed to handle the nuances of network communication, authentication, and session management. Whether you need a simple one-off answer or a continuous conversation with memory, the Cloud LLM Brick provides a unified API for different providers.
+
+## Features
+
+- **Multi-Provider Support**: Compatible with major LLM providers including Anthropic (Claude), OpenAI (GPT), and Google (Gemini).
+- **Conversational Memory**: Built-in support for windowed history, allowing the AI to remember context from previous exchanges.
+- **Streaming Responses**: Receive text chunks in real-time as they are generated, ideal for responsive user interfaces.
+- **Configurable Behavior**: Customize system prompts, temperature (creativity), and request timeouts.
+- **Simple API**: Unified `chat` and `chat_stream` methods regardless of the underlying model provider.
 
 ## Prerequisites
 
-Before using the Cloud LLM brick, ensure you have the following:
-- An account with a cloud-based LLM service (e.g., OpenAI, Cohere, etc.).
-- API access credentials (API key or token) for the LLM service.
-- Network connectivity to access the LLM service endpoint.
+- **Internet Connection**: The board must be connected to the internet to reach the LLM provider's API.
+- **API Key**: A valid API key for the chosen service (e.g., OpenAI API Key, Anthropic API Key).
+- **Python Dependencies**: The Brick relies on LangChain integration packages (`langchain-anthropic`, `langchain-openai`, `langchain-google-genai`).
 
-## Features
+## Code Example and Usage
+
+### Basic Conversation
 
-- Send prompts to a cloud-based LLM service.
-- Receive and process responses from the LLM.
-- Supports both one-shot requests and memory for follow-up questions and answers.
-- Supports a curated set of LLM providers.
+This example initializes the Brick with an OpenAI model and performs a simple chat interaction.
 
-## Code example and usage
-Here is a basic example of how to use the Cloud LLM brick:
+**Note:** The API key is not hardcoded. It is retrieved automatically from the **Brick Configuration** in App Lab.
 
 ```python
-from arduino.app_bricks.cloud_llm import CloudLLM
+import os
+from arduino.app_bricks.cloud_llm import CloudLLM, CloudModel
 from arduino.app_utils import App
 
-llm = CloudLLM(api_key="your_api_key_here")
+# Initialize the Brick (API key is loaded from configuration)
+llm = CloudLLM(
+    model=CloudModel.OPENAI_GPT,
+    system_prompt="You are a helpful assistant for an IoT device."
+)
 
-App.start_bricks()
+def simple_chat():
+    # Send a prompt and print the response
+    response = llm.chat("What is the capital of Italy?")
+    print(f"AI: {response}")
 
-response = llm.chat("What is the capital of France?")
-print(response)
+# Run the application
+App.run(simple_chat)
+```
+
+### Streaming with Memory
+
+This example demonstrates how to enable conversational memory and process the response as a stream of tokens.
+
+```python
+from arduino.app_bricks.cloud_llm import CloudLLM, CloudModel
+from arduino.app_utils import App
 
-App.stop_bricks()
+# Initialize with memory enabled (keeps last 10 messages)
+# API Key is retrieved automatically from Brick Configuration
+llm = CloudLLM(
+    model=CloudModel.ANTHROPIC_CLAUDE
+).with_memory(max_messages=10)
+
+def chat_loop():
+    while True:
+        user_input = input("You: ")
+        if user_input.lower() in ["exit", "quit"]:
+            break
+
+        print("AI: ", end="", flush=True)
+
+        # Stream the response token by token
+        for token in llm.chat_stream(user_input):
+            print(token, end="", flush=True)
+        print()  # Newline after response
+
+App.run(chat_loop)
 ```
+
+## Configuration
+
+The Brick is initialized with the following parameters:
+
+| Parameter | Type | Default | Description |
+| :-------- | :--- | :------ | :---------- |
+| `api_key` | `str` | `os.getenv("API_KEY")` | The authentication key for the LLM provider. **Recommended:** Set this via the **Brick Configuration** menu in App Lab instead of code. |
+| `model` | `str` \| `CloudModel` | `CloudModel.ANTHROPIC_CLAUDE` | The specific model to use. Accepts a `CloudModel` enum or its string value. |
+| `system_prompt` | `str` | `""` | A base instruction that defines the AI's behavior and persona. |
+| `temperature` | `float` | `0.7` | Controls randomness. `0.0` is deterministic, `1.0` is creative. |
+| `timeout` | `int` | `30` | Maximum time (in seconds) to wait for a response. |
+
+### Supported Models
+
+You can select a model using the `CloudModel` enum or by passing the corresponding raw string identifier.
+
+| Enum Constant | Raw String ID | Provider Documentation |
+| :------------ | :------------ | :--------------------- |
+| `CloudModel.ANTHROPIC_CLAUDE` | `claude-3-7-sonnet-latest` | [Anthropic Models](https://docs.anthropic.com/en/docs/about-claude/models) |
+| `CloudModel.OPENAI_GPT` | `gpt-4o-mini` | [OpenAI Models](https://platform.openai.com/docs/models) |
+| `CloudModel.GOOGLE_GEMINI` | `gemini-2.5-flash` | [Google Gemini Models](https://ai.google.dev/gemini-api/docs/models/gemini) |
+
+## Methods
+
+- **`chat(message)`**: Sends a message and returns the complete response string. Blocks until generation is finished.
+- **`chat_stream(message)`**: Returns a generator yielding response tokens as they arrive.
+- **`stop_stream()`**: Interrupts an active streaming generation.
+- **`with_memory(max_messages)`**: Enables history tracking. `max_messages` defines the context window size.
+- **`clear_memory()`**: Resets the conversation history.
````
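The Configuration and Supported Models tables above imply that the enum member and its raw string ID are interchangeable. A minimal sketch of that, following the README's `App.run` pattern (the parameter values and the `ping` helper are hypothetical; the API key is assumed to come from the Brick Configuration or the `API_KEY` environment variable):

```python
from arduino.app_bricks.cloud_llm import CloudLLM, CloudModel
from arduino.app_utils import App

# Two equivalent ways to select the same model: enum member or raw string ID
llm_from_enum = CloudLLM(model=CloudModel.GOOGLE_GEMINI, temperature=0.2, timeout=60)
llm_from_string = CloudLLM(model="gemini-2.5-flash", temperature=0.2, timeout=60)

def ping():
    # temperature=0.2 biases answers toward determinism; timeout=60 allows
    # slower generations to finish before the request is abandoned.
    print(llm_from_enum.chat("Reply with exactly one word: ping"))

App.run(ping)
```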

src/arduino/app_bricks/cloud_llm/cloud_llm.py

Lines changed: 69 additions & 30 deletions
```diff
@@ -31,9 +31,11 @@ class AlreadyGenerating(Exception):
 
 @brick
 class CloudLLM:
-    """A simplified, opinionated wrapper for common LangChain conversational patterns.
+    """A Brick for interacting with cloud-based Large Language Models (LLMs).
 
-    This class provides a single interface to manage stateless chat and chat with memory.
+    This class wraps LangChain functionality to provide a simplified, unified interface
+    for chatting with models like Claude, GPT, and Gemini. It supports both synchronous
+    'one-shot' responses and streaming output, with optional conversational memory.
     """
 
     def __init__(
@@ -44,18 +46,24 @@ def __init__(
         temperature: Optional[float] = 0.7,
         timeout: int = 30,
     ):
-        """Initializes the CloudLLM brick with the given configuration.
+        """Initializes the CloudLLM brick with the specified provider and configuration.
 
         Args:
-            api_key: The API key for the LLM service.
-            model: The model identifier as per LangChain specification (e.g., "anthropic:claude-3-sonnet-20240229")
-                or by using a CloudModels enum (e.g. CloudModels.OPENAI_GPT). Defaults to CloudModel.ANTHROPIC_CLAUDE.
-            system_prompt: The global system-level instruction for the AI.
-            temperature: The sampling temperature for response generation. Defaults to 0.7.
-            timeout: The maximum time to wait for a response from the LLM service, in seconds. Defaults to 30 seconds.
+            api_key (str): The API access key for the target LLM service. Defaults to the
+                'API_KEY' environment variable.
+            model (Union[str, CloudModel]): The model identifier. Accepts a `CloudModel`
+                enum member (e.g., `CloudModel.OPENAI_GPT`) or its corresponding raw string
+                value (e.g., `'gpt-4o-mini'`). Defaults to `CloudModel.ANTHROPIC_CLAUDE`.
+            system_prompt (str): A system-level instruction that defines the AI's persona
+                and constraints (e.g., "You are a helpful assistant"). Defaults to empty.
+            temperature (Optional[float]): The sampling temperature between 0.0 and 1.0.
+                Higher values make output more random/creative; lower values make it more
+                deterministic. Defaults to 0.7.
+            timeout (int): The maximum duration in seconds to wait for a response before
+                timing out. Defaults to 30.
 
         Raises:
-            ValueError: If the API key is missing.
+            ValueError: If `api_key` is not provided (empty string).
         """
         if api_key == "":
             raise ValueError("API key is required to initialize CloudLLM brick.")
```
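A short sketch of the two ways the docstring above says the key can be supplied (the key value is a placeholder; since the documented default reads `os.getenv("API_KEY")`, the variable is set before the module is imported):

```python
import os

# Placeholder key: the default api_key falls back to the API_KEY environment
# variable, so it must already be set when the brick module is loaded.
os.environ.setdefault("API_KEY", "sk-placeholder-not-a-real-key")

from arduino.app_bricks.cloud_llm import CloudLLM, CloudModel

# Option 1: rely on the API_KEY environment variable (the documented default)
llm = CloudLLM(model=CloudModel.OPENAI_GPT)

# Option 2: pass the key explicitly
llm = CloudLLM(api_key=os.environ["API_KEY"], model=CloudModel.OPENAI_GPT)

# An empty key is rejected at construction time
try:
    CloudLLM(api_key="")
except ValueError as err:
    print(err)  # API key is required to initialize CloudLLM brick.
```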
```diff
@@ -99,34 +107,34 @@ def __init__(
     def with_memory(self, max_messages: int = DEFAULT_MEMORY) -> "CloudLLM":
         """Enables conversational memory for this instance.
 
-        This allows the chatbot to remember previous user and AI messages.
-        Calling this modifies the instance to be stateful.
+        Configures the Brick to retain a window of previous messages, allowing the
+        AI to maintain context across multiple interactions.
 
         Args:
-            max_messages: The total number of past messages (user + AI) to
-                keep in the conversation window. Set to 0 to disable memory.
+            max_messages (int): The maximum number of messages (user + AI) to keep
+                in history. Older messages are discarded. Set to 0 to disable memory.
+                Defaults to 10.
 
         Returns:
-            self: The current CloudLLM instance for method chaining.
+            CloudLLM: The current instance, allowing for method chaining.
         """
         self._max_messages = max_messages
 
         return self
 
     def chat(self, message: str) -> str:
-        """Sends a single message to the AI and gets a complete response synchronously.
+        """Sends a message to the AI and blocks until the complete response is received.
 
-        This is the primary way to interact. It automatically handles memory
-        based on how the instance was configured.
+        This method automatically manages conversation history if memory is enabled.
 
         Args:
-            message: The user's message.
+            message (str): The input text prompt from the user.
 
         Returns:
-            The AI's complete response as a string.
+            str: The complete text response generated by the AI.
 
         Raises:
-            RuntimeError: If the chat model is not initialized or if text generation fails.
+            RuntimeError: If the internal chain is not initialized or if the API request fails.
         """
         if self._chain is None:
             raise RuntimeError("CloudLLM brick is not started. Please call start() before generating text.")
```
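A brief sketch of the stateful pattern these docstrings describe, assuming the constructor and `App.run` pattern shown earlier (the `remember` helper and model choice are illustrative):

```python
from arduino.app_bricks.cloud_llm import CloudLLM, CloudModel
from arduino.app_utils import App

# with_memory() returns self, so configuration chains onto construction
llm = CloudLLM(model=CloudModel.ANTHROPIC_CLAUDE).with_memory(max_messages=6)

def remember():
    llm.chat("My favorite color is green.")
    # With memory enabled, the follow-up can refer back to the earlier message
    print(llm.chat("What is my favorite color?"))

App.run(remember)
```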
```diff
@@ -137,19 +145,20 @@ def chat(self, message: str) -> str:
             raise RuntimeError(f"Response generation failed: {e}")
 
     def chat_stream(self, message: str) -> Iterator[str]:
-        """Sends a single message to the AI and streams the response as a synchronous generator.
+        """Sends a message to the AI and yields response tokens as they are generated.
 
-        Use this to get tokens as they are generated, perfect for a streaming UI.
+        This allows for processing or displaying the response in real-time (streaming).
+        The generation can be interrupted by calling `stop_stream()`.
 
         Args:
-            message: The user's message.
+            message (str): The input text prompt from the user.
 
         Yields:
-            str: Chunks of the AI's response as they become available.
+            str: Chunks of text (tokens) from the AI response.
 
         Raises:
-            RuntimeError: If the chat model is not initialized or if text generation fails.
-            AlreadyGenerating: If the chat model is already streaming a response.
+            RuntimeError: If the internal chain is not initialized or if the API request fails.
+            AlreadyGenerating: If a streaming session is already active.
         """
         if self._chain is None:
             raise RuntimeError("CloudLLM brick is not started. Please call start() before generating text.")
```
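On the consumer side, a sketch of draining the generator and handling `AlreadyGenerating`, assuming the exception class is importable from the same module (the `stream_once` helper is illustrative):

```python
from arduino.app_bricks.cloud_llm import AlreadyGenerating, CloudLLM, CloudModel
from arduino.app_utils import App

llm = CloudLLM(model=CloudModel.OPENAI_GPT)

def stream_once():
    try:
        # Tokens are yielded as they arrive, so they can be shown immediately
        for token in llm.chat_stream("List three uses for a microcontroller."):
            print(token, end="", flush=True)
        print()
    except AlreadyGenerating:
        # Raised when a second stream starts while one is still active
        print("A stream is already in progress on this instance.")

App.run(stream_once)
```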
```diff
@@ -168,18 +177,33 @@ def chat_stream(self, message: str) -> Iterator[str]:
             self._keep_streaming.clear()
 
     def stop_stream(self) -> None:
-        """Signals the LLM to stop generating a response."""
+        """Signals the active streaming generation to stop.
+
+        This sets an internal flag that causes the `chat_stream` iterator to break
+        early. It has no effect if no stream is currently running.
+        """
         self._keep_streaming.clear()
 
     def clear_memory(self) -> None:
-        """Clears the conversational memory.
+        """Clears the conversational memory history.
 
-        This only has an effect if with_memory() has been called.
+        Resets the stored context. This is useful for starting a new conversation
+        topic without previous context interfering. Only applies if memory is enabled.
         """
         if self._history:
             self._history.clear()
 
     def _get_session_history(self, session_id: str) -> WindowedChatMessageHistory:
+        """Retrieves or creates the chat history for a given session.
+
+        Internal callback used by LangChain's `RunnableWithMessageHistory`.
+
+        Args:
+            session_id (str): The unique identifier for the session.
+
+        Returns:
+            WindowedChatMessageHistory: The history object managing the message window.
+        """
         if self._max_messages == 0:
             self._history = InMemoryChatMessageHistory()
         if self._history is None:
```
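Assuming `stop_stream()` may be called from another thread, which its flag-based design suggests, a sketch that cuts a long generation short with a timer and then resets the remembered context (helper name and timings are illustrative):

```python
import threading

from arduino.app_bricks.cloud_llm import CloudLLM, CloudModel
from arduino.app_utils import App

llm = CloudLLM(model=CloudModel.GOOGLE_GEMINI).with_memory(max_messages=10)

def bounded_story():
    # Ask stop_stream() to break the iterator if generation runs past 5 seconds
    timer = threading.Timer(5.0, llm.stop_stream)
    timer.start()
    for token in llm.chat_stream("Tell me a very long story about a lighthouse."):
        print(token, end="", flush=True)
    timer.cancel()
    print()

    # Drop all remembered messages before starting an unrelated topic
    llm.clear_memory()

App.run(bounded_story)
```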
```diff
@@ -188,6 +212,21 @@ def _get_session_history(self, session_id: str) -> WindowedChatMessageHistory:
 
 
 def model_factory(model_name: CloudModel, **kwargs) -> BaseChatModel:
+    """Factory function to instantiate the specific LangChain chat model.
+
+    This function maps the supported `CloudModel` enum values to their respective
+    LangChain implementations.
+
+    Args:
+        model_name (CloudModel): The enum or string identifier for the model.
+        **kwargs: Additional arguments passed to the model constructor (e.g., api_key, temperature).
+
+    Returns:
+        BaseChatModel: An instance of a LangChain chat model wrapper.
+
+    Raises:
+        ValueError: If `model_name` does not match one of the supported `CloudModel` options.
+    """
     if model_name == CloudModel.ANTHROPIC_CLAUDE:
         from langchain_anthropic import ChatAnthropic
 
```
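Below the docstring, the body continues with one lazily imported provider per branch (only the Anthropic branch is visible in this hunk). As an illustration of the pattern only, not the actual source, a condensed sketch:

```python
from langchain_core.language_models.chat_models import BaseChatModel

from arduino.app_bricks.cloud_llm import CloudModel


def model_factory_sketch(model_name: CloudModel, **kwargs) -> BaseChatModel:
    """Illustrative sketch: map each CloudModel to its LangChain wrapper."""
    # Imports live inside each branch so that only the selected provider's
    # integration package needs to be installed.
    if model_name == CloudModel.ANTHROPIC_CLAUDE:
        from langchain_anthropic import ChatAnthropic
        return ChatAnthropic(model=model_name.value, **kwargs)
    if model_name == CloudModel.OPENAI_GPT:
        from langchain_openai import ChatOpenAI
        return ChatOpenAI(model=model_name.value, **kwargs)
    if model_name == CloudModel.GOOGLE_GEMINI:
        from langchain_google_genai import ChatGoogleGenerativeAI
        return ChatGoogleGenerativeAI(model=model_name.value, **kwargs)
    raise ValueError(f"Unsupported model: {model_name}")
```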
0 commit comments