diff --git a/README-details.md b/README-details.md index e004ddd..76b1bff 100644 --- a/README-details.md +++ b/README-details.md @@ -35,6 +35,7 @@ - [AI Repository by Goku Mohandas](https://www.linkedin.com/posts/asif-bhat_datascience-data-dataanalysis-activity-6643083915873615872-je6g) - [Digital Twins: Bringing artificial intelligence to Engineering](https://www.datasciencecentral.com/profiles/blogs/digital-twins-brining-artificial-intelligence-to-engineering) - See [Artificial Intelligence](./details/artificial-intelligence.md) +- See [AI Agents](./ai-agents/) ### Automation diff --git a/ai-agents/.gitkeep b/ai-agents/.gitkeep new file mode 100644 index 0000000..e69de29 diff --git a/ai-agents/README.md b/ai-agents/README.md new file mode 100644 index 0000000..72bc598 --- /dev/null +++ b/ai-agents/README.md @@ -0,0 +1,86 @@ +# AI Agents + +This document outlines key concepts and components related to building AI agents. + +## APIs + +Application Programming Interfaces (APIs) are crucial for AI agents to interact with external services and data sources. Some popular APIs and libraries used in AI agent development include: + +* **Ollama:** Allows running large language models (LLMs) locally. This is beneficial for privacy, offline capabilities, and cost savings. +* **LiteLLM:** Provides a unified interface to interact with various LLM APIs (e.g., OpenAI, Cohere, Anthropic). This simplifies switching between different models and providers. + +## Context and Instructions + +Providing clear and concise context and instructions is vital for an AI agent to perform tasks accurately and efficiently. + +* **Context:** This includes relevant background information, data, and previous interactions that the agent needs to understand the current task. +* **Instructions:** These are specific commands or guidelines that tell the agent what to do, how to do it, and what constraints to follow. Well-defined instructions help in guiding the agent's behavior and ensuring desired outcomes. + +## Tool Calls + +Tool calls enable AI agents to extend their capabilities by interacting with external tools and functions. + +* **Function Calling:** LLMs can be instructed to call predefined functions or tools to perform specific actions, such as retrieving information from a database, calling an external API, or executing a piece of code. +* **Output Parsing:** The agent needs to be able to parse the output from tool calls and integrate the results back into its workflow. + +## MCP (Model-View-Controller-Presenter) + +While not a direct acronym, the concepts from software architecture patterns like Model-View-Controller (MVC) and Model-View-Presenter (MVP) can be adapted for structuring AI agents. + +* **Model:** Represents the agent's knowledge, data, and the underlying LLM. +* **View:** Handles the interaction with the user or other systems (e.g., displaying information, receiving input). +* **Controller/Presenter:** Manages the flow of information and logic between the Model and the View. It interprets user input, invokes tools, updates the model, and determines what to present to the user. This helps in separating concerns and making the agent more modular and maintainable. + +## Pydantic + +Pydantic is a Python library for data validation and settings management using Python type hints. It is highly useful in AI agent development for: + +* **Data Validation:** Ensuring that data passed to and from LLMs, tools, and APIs conforms to expected schemas. 
+* **Structured Output:** Defining clear and validated output structures for LLM responses, making it easier to parse and use the generated information reliably. +* **Configuration Management:** Managing agent configurations and settings in a type-safe manner. + +## MCP Integration with SSE and Stdio + +Integrating MCP-based agents with various communication channels enhances their interactivity and usability. + +* **Server-Sent Events (SSE):** SSE is a web technology that allows a server to send real-time updates to a client over a single HTTP connection. For AI agents, this means the "View" can be a web interface that receives continuous updates from the agent (e.g., streaming responses, status updates) managed by the "Controller/Presenter". This is useful for long-running tasks or when providing incremental feedback to the user. +* **Standard Input/Output (stdio):** For command-line interface (CLI) based agents, the "View" interacts with the user via stdio. The "Controller/Presenter" processes text input and sends text output back to the terminal. This is a straightforward way to interact with agents for local development, scripting, or integration with other CLI tools. + +## Anonymous Agents and Feedback Loops + +Developing robust AI agents often involves iterative improvement based on user interactions and feedback. + +* **Anonymous Agents:** These are agents that interact with users without revealing their underlying identity or specific model. This can be useful for collecting unbiased feedback or for A/B testing different agent versions. +* **Feedback Loops:** Implementing mechanisms to capture explicit and implicit feedback from users is crucial. + * **Explicit Feedback:** Users directly providing ratings, corrections, or suggestions. + * **Implicit Feedback:** Analyzing user behavior, such as task completion rates, clarifications requested, or abandonment of interactions. + This feedback is then used to refine the agent's "Model" (e.g., fine-tuning the LLM, updating knowledge bases) or its "Controller/Presenter" logic (e.g., improving instruction interpretation, tool selection). + +## Agentic AI + +Agentic AI refers to systems that can operate autonomously to achieve goals, make decisions, and take actions in an environment. Key characteristics include: + +* **Goal-oriented:** Agents are designed with specific objectives to achieve. +* **Autonomous:** They can operate without constant human intervention. +* **Perception:** They can perceive their environment through sensors or data inputs. +* **Action:** They can take actions that affect their environment or internal state. +* **Learning/Adaptation:** Many agentic systems can learn from experience and adapt their behavior over time. This often involves complex reasoning, planning, and memory capabilities. + +## Tools and Frameworks for Building AI Agents + +Several tools and frameworks have emerged to simplify the development of AI agents: + +* **Langchain:** An open-source framework for building applications with LLMs. It provides modules for managing prompts, memory, chains (sequences of calls), indexes, and agents. +* **CrewAI:** A framework for orchestrating role-playing, autonomous AI agents. It helps in creating collaborative AI crews that can work together on complex tasks. +* **Langsmith:** A platform by Langchain for debugging, testing, evaluating, and monitoring LLM applications. It provides visibility into agent behavior and helps in identifying areas for improvement. 
+* **Google A2A (Agents for Automation):** While specific public details might vary, Google has research and products focused on AI agents for automating tasks and processes. +* **Google GenAI (Generative AI):** Google offers a suite of Generative AI tools and models (e.g., Gemini) that can be the core "Model" component of AI agents, providing powerful language understanding and generation capabilities. This includes Vertex AI for building and deploying AI models. +* **Other tools:** Many other specialized tools exist for aspects like vector databases (e.g., Pinecone, Weaviate) for semantic search, workflow orchestration (e.g., Apache Airflow with AI plugins), and more. + +## Examples + +- [Bary's MCP Headless Gmail Server](./examples/mcp-headless-gmail.md) + +## Additional Resources + +- For a curated list of MCP frameworks, tutorials, and tools relevant to AI agent development, please see our [MCPs, Tutorials, and Tools list](./resources/). diff --git a/ai-agents/examples/mcp-headless-gmail.md b/ai-agents/examples/mcp-headless-gmail.md new file mode 100644 index 0000000..5646f1b --- /dev/null +++ b/ai-agents/examples/mcp-headless-gmail.md @@ -0,0 +1,160 @@ +## Bary's MCP Headless Gmail Server + +A specific example of an MCP (Model Context Protocol) server is Bary Huang's `mcp-headless-gmail`. This server allows AI agents to interact with Gmail for tasks like reading and sending emails without requiring local credential or token setup directly on the machine running the agent. + +**Project Repository:** [https://github.com/baryhuang/mcp-headless-gmail](https://github.com/baryhuang/mcp-headless-gmail) + +### Key Features and Advantages + +* **Headless and Remote Operation:** Designed to run in environments like Docker containers or remote servers where direct browser access for OAuth is not feasible. This is a significant advantage over solutions requiring local file access. +* **Decoupled Architecture:** The client application (e.g., an AI agent or a tool like Claude Desktop) handles the Google OAuth 2.0 flow independently. The obtained credentials (access token, refresh token, client ID, client secret) are then passed as context to this MCP server with each request. This separates credential management from the server's email processing logic. +* **Gmail Focused:** Primarily provides tools for Gmail, making it a lightweight solution if only email capabilities are needed (e.g., for marketing automation agents). +* **Docker-Ready:** Provides a Dockerfile for easy containerization and deployment. +* **Core Functionality:** + * Get recent emails (with the first 1k characters of the body). + * Get full email body content (in 1k chunks using an offset). + * Send emails. + * Refresh access tokens using a dedicated tool. + * Automatic refresh token handling by the underlying Google API client. + +### Prerequisites + +* Python 3.10 or higher for running the server directly. +* Google API Credentials: + * Client ID + * Client Secret + * Access Token + * Refresh Token + To obtain these, you need to set up a project in the Google Cloud Console, enable the Gmail API, configure the OAuth consent screen, and create OAuth 2.0 client ID credentials (typically for "Desktop app" or "Web application" depending on your client). + * **Required Scopes:** + * `https://www.googleapis.com/auth/gmail.readonly` (for reading emails) + * `https://www.googleapis.com/auth/gmail.send` (for sending emails) + +### Installation and Setup + +There are two main ways to use the server: + +1. 
**Running with Python (Local Development/Custom Setup):** + ```bash + git clone https://github.com/baryhuang/mcp-headless-gmail.git + cd mcp-headless-gmail + pip install -e . + # Start the server + mcp-server-headless-gmail + ``` + +2. **Running with Docker (Recommended for Production/Isolation):** + * **Build the image:** + ```bash + docker build -t mcp-headless-gmail . + ``` + * Or use the pre-built image: `buryhuang/mcp-headless-gmail:latest` + * **Run the container:** + The server within the Docker container listens for MCP requests typically via stdio. + +### Integration Guide (MCP Client Configuration) + +The `mcp-headless-gmail` server is designed to be called by an MCP client. The client needs to be configured to invoke this server for Gmail-related tools. + +**Example Configuration (Conceptual - e.g., for a client like Claude Desktop):** + +The client configuration tells it how to start and communicate with the MCP server. + +* **Using Docker:** + ```json + { + "mcpServers": { + "gmail": { + "command": "docker", + "args": [ + "run", + "-i", // Interactive, keep STDIN open + "--rm", // Automatically remove the container when it exits + "buryhuang/mcp-headless-gmail:latest" + ] + } + } + } + ``` + +* **Using `npx` (for the npm package wrapper, if available):** + ```json + { + "mcpServers": { + "gmail": { + "command": "npx", + "args": [ + "@peakmojo/mcp-server-headless-gmail" + ] + } + } + } + ``` + +**Tool Call Structure:** + +When the AI agent needs to use a Gmail tool, it will make a request structured for the MCP server. Crucially, Google API credentials must be passed in the tool call context because the server itself is stateless and doesn't store them. + +1. **Refreshing Tokens (`gmail_refresh_token` tool):** + This should be the first step or used when an access token expires. + * **Input (with existing access and refresh tokens):** + ```json + { + "google_access_token": "your_access_token", + "google_refresh_token": "your_refresh_token", + "google_client_id": "your_client_id", + "google_client_secret": "your_client_secret" + } + ``` + * **Input (if access token expired, using only refresh token):** + ```json + { + "google_refresh_token": "your_refresh_token", + "google_client_id": "your_client_id", + "google_client_secret": "your_client_secret" + } + ``` + * **Output:** Will include a new `access_token` and its `expiry_time`. + +2. **Getting Recent Emails (`gmail_get_recent_emails` tool):** + * **Input:** + ```json + { + "google_access_token": "current_valid_access_token", + "max_results": 5, + "unread_only": false + } + ``` + * **Output:** List of emails with metadata and the first 1k characters of the body. + +3. **Getting Full Email Body Content (`gmail_get_email_content` tool):** + Used when an email body is >1k characters. + * **Input:** + ```json + { + "google_access_token": "current_valid_access_token", + "message_id": "message_id_from_get_recent_emails", // or "thread_id" + "offset": 0 // increment by 1000 for subsequent chunks + } + ``` + * **Output:** Chunk of the email body. Repeat with increasing offset until `contains_full_body` is true. + +4. **Sending an Email (`gmail_send_email` tool):** + * **Input:** + ```json + { + "google_access_token": "current_valid_access_token", + "to": "recipient@example.com", + "subject": "Hello from MCP Agent", + "body": "This is the plain text body.", + "html_body": "

<p>This is the HTML body.</p>
" + } + ``` + +### Security Considerations + +* **Credential Handling:** The primary security benefit is that the `mcp-headless-gmail` server itself does not store your Google credentials long-term. They are passed with each request (or at least the access token is). +* **Client Responsibility:** The client application (your AI agent, Claude Desktop, etc.) is responsible for securely obtaining, storing, and managing the Google OAuth credentials (especially the refresh token). +* **Secure Communication:** Ensure that the communication channel between your AI agent and the `mcp-headless-gmail` server is secure if they are running on different machines or in potentially insecure environments. + +This server provides a valuable component for AI agents needing to interact with Gmail in a secure and flexible manner, especially in automated or headless setups. By conforming to the MCP, it can be integrated into various agent frameworks and client applications. diff --git a/ai-agents/resources/.gitkeep b/ai-agents/resources/.gitkeep new file mode 100644 index 0000000..e69de29 diff --git a/ai-agents/resources/README.md b/ai-agents/resources/README.md new file mode 100644 index 0000000..e1df639 --- /dev/null +++ b/ai-agents/resources/README.md @@ -0,0 +1,84 @@ +# MCP Frameworks, Tutorials, and Tools for AI Agent Development + +This document provides a curated list of Model-Control-Presenter (MCP) frameworks, tutorials, tools, and platforms relevant to the development of AI agents. + +## MCP Frameworks + +* **[Langchain](https://www.langchain.com/)**: A comprehensive framework for developing applications powered by language models. It provides modular components for working with LLMs, including models, prompts, memory, indexes, chains, and agents, which align well with MCP concepts. +* **[CrewAI](https://www.crewai.com/)**: A framework for orchestrating role-playing, autonomous AI agents. CrewAI enables agents to collaborate to solve complex tasks, fitting an advanced MCP paradigm where multiple controller/presenter layers interact. +* **[Microsoft Autogen](https://microsoft.github.io/autogen/)**: A framework for enabling next-generation LLM applications with multi-agent conversations. Autogen allows developers to build complex workflows with multiple agents that can converse with each other and humans, embodying a distributed MCP architecture. +* **[Uagents](https://fetch.ai/docs/uea/framework/uagents/)**: A framework by Fetch.ai for building decentralized autonomous agents. It allows for the creation of agents that can perform tasks, communicate, and transact in a decentralized network. +* **[Fast-Agent](https://github.com/HumanAIGC/fastagent)**: An experimental framework for building autonomous agents with minimal code, often leveraging LLMs for decision-making and tool use. (Note: This appears to be one of several projects with similar names; link is to a prominent one.) + +## Awesome Lists + +* [awesome-mcp-servers](https://github.com/punkpeye/awesome-mcp-servers): A curated list of awesome Model Context Protocol (MCP) servers. +* [awesome-mcp-devtools](https://github.com/punkpeye/awesome-mcp-devtools): A curated list of awesome MCP developer tools, libraries, and utilities. +* [awesome-mcp-clients](https://github.com/punkpeye/awesome-mcp-clients): A curated list of awesome MCP clients and client libraries. 
+ +## MCP Servers + +This section highlights specific open-source MCP server and client implementations, many from Bary Huang and PeakMojo, showcasing practical applications of the MCP concept for enabling AI agents to interact with various external services and systems. It also includes conceptual examples of how MCP servers can be built using different technologies. + +* **[agentic-mcp-client (PeakMojo)](https://github.com/peakmojo/agentic-mcp-client)**: A standalone agent runner that executes tasks using MCP tools via Anthropic Claude, AWS Bedrock, and OpenAI APIs. Enables autonomous agent operation in cloud environments. +* **[voice-mcp-client (Bary Huang)](https://github.com/baryhuang/voice-mcp-client)**: An iOS/MacOS Swift MCP client using voice to interact with Python MCP servers natively. +* **[mcp-remote-macos-use (Bary Huang)](https://github.com/baryhuang/mcp-remote-macos-use)**: An MCP server enabling AI to control remote macOS systems (screen sharing, keyboard/mouse input) for interaction with any macOS application. +* **[mcp-hubspot (PeakMojo/Bary Huang)](https://github.com/peakmojo/mcp-hubspot)**: MCP server for HubSpot CRM integration, allowing AI models to interact with HubSpot data. Features vector storage and caching. +* **[mcp-headless-gmail (Bary Huang)](https://github.com/baryhuang/mcp-headless-gmail)**: An MCP server for headless Gmail integration, enabling AI assistants to read, search, and send emails without direct local credential setup. (Also detailed in the main `ai-agents/README.md`). +* **[mcp-server-zoom-noauth (PeakMojo)](https://github.com/peakmojo/mcp-server-zoom-noauth)**: An MCP server for accessing Zoom recordings and transcripts without requiring direct end-user authentication. +* **[my-apple-remembers (Bary Huang)](https://github.com/baryhuang/my-apple-remembers)**: MCP server for reading and managing Apple Notes on macOS. +* **[mcp-server-any-openapi (Bary Huang)](https://github.com/baryhuang/mcp-server-any-openapi)**: An MCP server that allows AI to discover and call any API endpoint via semantic search on OpenAPI specifications. +* **[mcp-server-aws-resources-python (Bary Huang)](https://github.com/baryhuang/mcp-server-aws-resources-python)**: MCP server for AWS resource management using Python's boto3. +* **`mcp-bridge` and `mcp-bridge-compose`**: These tools, presumably related to Bary Huang's MCP work, are not found as public standalone repositories. They might be internal utilities, integrated into other projects, or examples for using MCP servers with Docker Compose. +* **Conceptual MCP Server Examples**: The MCP pattern can be implemented in various languages and frameworks. Below are conceptual examples of how such servers might be named or built. Specific public repositories for these exact names by Bary Huang or PeakMojo were not found, so these serve as illustrative patterns: + * `mcp-server-python-flask`: An MCP server built with Python and the Flask web framework. + * `mcp-server-go-grpc`: An MCP server built with Go, using gRPC for communication. + * `mcp-server-java-spring`: An MCP server built with Java and the Spring Boot framework. + * `mcp-server-node-express`: An MCP server built with Node.js and the Express.js framework. + +## Tutorials + +* **[Langchain Quickstart](https://python.langchain.com/docs/get_started/quickstart)**: The official quickstart guide for Langchain, providing a hands-on introduction to its core concepts and how to build your first LLM application. 
+* **[CrewAI Quickstart](https://docs.crewai.com/quickstart/)**: Official CrewAI documentation to quickly get started building multi-agent collaborative systems. +* **[Microsoft Autogen Examples](https://microsoft.github.io/autogen/docs/Examples/AutoGen-Agent)**: A collection of examples showcasing how to use Autogen for various multi-agent scenarios. +* **[Build Your First AI Agent with Uagents](https://fetch.ai/docs/uea/guides/general/intro-to-uagent-course/build-your-first-uagent/)**: A step-by-step guide to creating your first agent using the Fetch.ai Uagents framework. +* **[Introduction to Agents with Haystack](https://haystack.deepset.ai/tutorials/23_introducing_agents)**: A tutorial by deepset Haystack on how to build and use agents for question answering and task execution. +* **[Creating a Simple AI Agent with LiteLLM](https://docs.litellm.ai/docs/simple_proxy#example-usage-1)**: Demonstrates basic LiteLLM proxy usage, a step towards agent tool calling by abstracting LLM provider interactions. + +## Communities + +### Hackathons and Events + +* Luma +* KXSB +* Agentic Foundation +* Google Labs Discord +* Programmers Hangout +* AI Native Developer +* Various LinkedIn groups focused on AI and agent development. + +## Tools (General Purpose for Agent Development) + +* **[LiteLLM](https://litellm.ai/)**: Provides a unified interface to interact with various LLM APIs (OpenAI, Cohere, Anthropic, etc.). Essential for the "Model" component in an MCP architecture, allowing flexibility. +* **[Ollama](https://ollama.com/)**: Allows running large language models (LLMs) locally. Useful for developing and testing agents with local models, ensuring privacy and reducing costs. +* **[LangSmith](https://www.langchain.com/langsmith)**: A platform by Langchain for debugging, testing, evaluating, and monitoring LLM applications. Crucial for understanding and improving agent behavior. +* **[Pydantic](https://pydantic-docs.helpmanual.io/)**: A Python library for data validation and settings management using Python type hints. Extremely useful for defining schemas for tool inputs/outputs and agent configurations. +* **[FastAPI](https://fastapi.tiangolo.com/)**: A modern, fast web framework for building APIs with Python. Often used to expose agent capabilities as services. +* **[Chainlit](https://chainlit.io/)**: An open-source Python package that makes it incredibly fast to build and share AI user interfaces. Can serve as the "View" component. +* **[Gradio](https://www.gradio.app/)**: A Python library that allows you to quickly create customizable UI components for your machine learning models. Useful for creating interactive "Views". +* **[OpenWebUI](https://openwebui.com/)**: An extensible, self-hosted AI interface that supports various LLMs and operates offline. Can serve as a user-facing "View" for interacting with agents. (GitHub: [open-webui/open-webui](https://github.com/open-webui/open-webui)) +* **[Vector Databases (e.g., Pinecone, Weaviate, Chroma)](https://www.pinecone.io/)**: Tools for storing and searching vector embeddings, critical for agents needing to retrieve information from large knowledge bases. +* **[Hugging Face Transformers](https://huggingface.co/docs/transformers/index)**: Provides thousands of pre-trained models and tools to access them. Excellent for sourcing open-source models for the "Model" part of an agent. +* **[Flowise AI](https://flowiseai.com/)**: A low-code/no-code tool for building LLM applications, including agents, using a visual interface. 
+* **[Embedchain](https://embedchain.ai/)**: A framework that simplifies creating and managing LLM-powered bots over any dataset. It handles loading, chunking, embedding, and storing data, facilitating the "Model" or knowledge retrieval aspect for agents. +* **[Claude Desktop](https://www.anthropic.com/claude#claude-app)**: A desktop application by Anthropic for interacting with their Claude AI models. While a product, it can be used as a "View" or testbed for agentic interactions if the underlying model supports tool use. +* **AgentSpace, Jules, Cline, Roo Code**: These tools are either not widely known public projects in the AI agent development space, specific internal tools, or may refer to more generic concepts. Specific public links for AI agent frameworks/tools with these names were not readily identifiable. + +## Platforms and Cloud Services + +* **[OpenAI API](https://platform.openai.com/docs/api-reference)**: Direct access to OpenAI models like GPT-4, GPT-3.5-turbo, which are often the "Model" component in many AI agents. +* **[Google AI Studio & Vertex AI](https://cloud.google.com/vertex-ai/docs/generative-ai/ai-studio/overview)**: Google Cloud's platform for building, deploying, and managing ML models, including generative AI models like Gemini. Provides infrastructure and tools for the "Model" and agent deployment. +* **[Google BigQuery](https://cloud.google.com/bigquery)**: A highly scalable, serverless data warehouse. Often used as a backend for storing and analyzing data used by AI agents or generated from their operations. + +This list is not exhaustive but provides a good starting point for developers looking to build AI agents using MCP principles. +The field is rapidly evolving, so it's recommended to also follow communities and publications in the AI space for the latest developments.
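+
+## Example: Combining LiteLLM, Ollama, and Pydantic
+
+As a closing illustration of how several of the tools above fit together, the hedged sketch below uses LiteLLM's `completion` call with Ollama as a local backend and validates the model's reply against a Pydantic (v2) schema. The model name and prompt are assumptions; any LiteLLM-supported provider can be substituted.
+
+```python
+# Sketch only. Assumptions: `pip install litellm pydantic` and a local Ollama
+# server on its default endpoint with a model already pulled.
+from litellm import completion
+from pydantic import BaseModel, ValidationError
+
+
+class TaskPlan(BaseModel):
+    goal: str
+    steps: list[str]
+
+
+def plan_task(goal: str) -> TaskPlan | None:
+    response = completion(
+        model="ollama/llama3",              # assumed model id; swap for any provider
+        api_base="http://localhost:11434",  # default Ollama endpoint
+        messages=[{
+            "role": "user",
+            "content": (
+                "Return ONLY JSON with keys 'goal' (string) and 'steps' "
+                f"(list of strings) for this goal: {goal}"
+            ),
+        }],
+    )
+    raw = response.choices[0].message.content
+    try:
+        # Pydantic validates the structured output before the agent acts on it.
+        return TaskPlan.model_validate_json(raw)
+    except ValidationError:
+        return None  # a real agent would retry or ask the model to repair the JSON
+
+
+if __name__ == "__main__":
+    print(plan_task("Summarize yesterday's unread emails"))
+```
+
+Swapping `model` for an OpenAI, Anthropic, or Vertex AI identifier changes the provider without touching the validation logic, which is the main reason LiteLLM and Pydantic pair well in an agent's "Model" layer.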