@gadenbuie gadenbuie commented Jun 2, 2025

Adds a minimal memory system using a single YAML file to store stable, user-provided facts about the analysis problem and dataset characteristics that won't change during the project lifecycle.

The memory is stored in a project-level YAML file (typically btw-memory.yaml):

project_context:
  problem_description: <string>       # High-level description of the business problem to solve
  objectives: <array[string]>         # Specific analysis goals and questions to answer
  success_criteria: <array[string]>   # How to measure if the analysis was successful
  constraints: <array[string]>        # Limitations, requirements, or restrictions for the analysis
  business_context: <array[string]>   # Domain knowledge, organizational context, or background information

data_sources: <array[DataSource]>
  # where DataSource:
  - name: <string>                    # Unique identifier for the dataset
    description: <string>             # Human-readable description of the dataset
    source: <string>                  # File path, SQL table name, or source description
    code: <string|multiline>          # R code to load this dataset
    notes: <array[string]>            # Contextual notes about this dataset
    variables: <array[Variable]>      # Array of variable metadata
      # where Variable:
      - name: <string>                # Variable/column name
        notes: <array[string]>        # Array of contextual notes about this variable
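
To make the schema concrete, here is a minimal sketch that builds the same structure as a nested R list and serializes it with the yaml package. All values are made up for illustration; only the field names come from the schema above. Unnamed `list()`s are used for array fields so that single-item arrays still serialize as YAML sequences.

```r
library(yaml)

# Hypothetical example values; only the field names come from the schema.
memory <- list(
  project_context = list(
    problem_description = "Understand which factors drive customer churn.",
    objectives = list("Identify the strongest predictors of churn"),
    success_criteria = list("Findings are reproducible from the raw data"),
    constraints = list("Use only data collected in 2024"),
    business_context = list("Customers are billed monthly")
  ),
  data_sources = list(
    list(
      name = "customers",
      description = "One row per customer account",
      source = "data/customers.csv",
      code = "customers <- read.csv(\"data/customers.csv\")",
      notes = list("Refreshed nightly"),
      variables = list(
        list(name = "tenure", notes = list("Measured in months"))
      )
    )
  )
)

# Print the YAML that would end up in a file such as btw-memory.yaml
cat(as.yaml(memory, indent = 2, indent.mapping.sequence = TRUE))
```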

This PR currently implements the project context memory; I've implemented parts of the data sources memory locally. But as a first experiment we can merge without the data sources memory.

Tool Functions

Project Context Tools

  • btw_tool_memory_project_context_add(key, content) - Append to project context section
  • btw_tool_memory_project_context_read(key) - Read project context (all or specific key)
  • btw_tool_memory_project_context_replace(key, contents) - Replace entire project context section

Data Source Tools

  • btw_tool_memory_data_source_add(name, key, content) - Append to datasets or variables
  • btw_tool_memory_data_source_read(name, key) - Read data sources (all or specific name/key)
  • btw_tool_memory_data_source_replace(name, key, contents) - Replace entire data sources section
  • btw_tool_memory_data_source_variable_add(data_source_name, variable_name, note) - Add notes to a specific variable
  • btw_tool_memory_data_source_variable_replace(data_source_name, variable_name, notes) - Replace notes for a specific variable
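
As a rough illustration of the add/read/replace semantics listed above, the following sketch applies them to an in-memory nested list. The helper names (`context_add()`, `context_read()`, `context_replace()`) are hypothetical and are not the PR's implementation; they only show the intended difference between appending to a key and replacing it. `content` is flattened with `unlist()` on the assumption that a model may pass a list rather than a character vector.

```r
# Hypothetical in-memory versions of the add/read/replace verbs.
# `memory` is a nested list shaped like the YAML schema.

context_add <- function(memory, key, content) {
  existing <- memory$project_context[[key]]
  # Append new entries after whatever is already stored under `key`
  memory$project_context[[key]] <- c(existing, as.character(unlist(content)))
  memory
}

context_read <- function(memory, key = NULL) {
  # With no key, return the whole project context section
  if (is.null(key)) memory$project_context else memory$project_context[[key]]
}

context_replace <- function(memory, key, contents) {
  # Discard whatever was stored under `key` and store `contents` instead
  memory$project_context[[key]] <- as.character(unlist(contents))
  memory
}

memory <- list(project_context = list())
memory <- context_add(memory, "objectives", "Describe the data")
memory <- context_add(memory, "objectives", list("Find outliers"))
memory <- context_replace(memory, "constraints", "No external data")
context_read(memory, "objectives")
```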

@simonpcouch simonpcouch left a comment

Not very thorough for now, but will come back to this with a fresh mind tomorrow morning.

My initial expectation was that if I launched a chat and asked to add something to memory (without creating the file myself), the model would be able to use a tool to create a btw-memory.yaml file. Instead, I saw:

Warning: Failed to evaluate 2 tool calls.
✖ [btw_tool_memory_project_context_add
  (toolu_01EA9ipXAxAHXAJBLDBc2yXY)]: cannot coerce type 'closure' to
  vector of type 'character'

The @contents from that tool result:

<ellmer::ContentToolResult>
 @ value  : NULL
 @ error  :List of 2
 .. $ message: chr "cannot coerce type 'closure' to vector of type 'character'"
 .. $ call   : language paste0(before, x, after)
 .. - attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
 @ extra  : list()
 @ request: <ellmer::ContentToolRequest>
 .. @ id       : chr "toolu_0143eU9keVgmG3wdi1ZQwyDm"
 .. @ name     : chr "btw_tool_memory_project_context_add"
 .. @ arguments:List of 3
 .. .. $ key    : chr "problem_description"
 .. .. $ content:List of 1
 .. ..  ..$ : chr "Develop the btw package with its source code located in the current directory."
 .. .. $ intent : chr "Storing information about the btw package development project"

@gadenbuie
Collaborator Author

> My initial expectation was that if I launched a chat and asked to add something to memory (without creating the file myself), the model would be able to use a tool to create a btw-memory.yaml file.

Totally! I did some refactoring before I committed and missed the "memory file doesn't exist yet" case in the _add() and _replace() verbs. Should be fixed for you by tomorrow.
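
One way to handle the "memory file doesn't exist yet" case is to fall back to an empty skeleton on read and let the first write create the file. The sketch below assumes the yaml package; `read_memory_or_empty()` and `add_project_context()` are hypothetical names, not the functions in this PR.

```r
library(yaml)

# Hypothetical helper: return the parsed memory, or an empty skeleton
# when the file has not been created yet.
read_memory_or_empty <- function(path) {
  if (!file.exists(path)) {
    return(list(project_context = list(), data_sources = list()))
  }
  read_yaml(path)
}

# Hypothetical helper: append to a project context key, creating the
# file on first use.
add_project_context <- function(path, key, content) {
  memory <- read_memory_or_empty(path)
  memory$project_context[[key]] <- c(
    memory$project_context[[key]],
    as.character(unlist(content))
  )
  write_yaml(memory, path)
  invisible(memory)
}

path <- file.path(tempdir(), "btw-memory.yaml")
unlink(path)  # start from the "file doesn't exist yet" state
add_project_context(path, "objectives", "Summarize the sales data")
file.exists(path)
```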

@simonpcouch simonpcouch left a comment

Could we add a couple of lines somewhere in the documentation that clarify the distinction between btw.md and btw's memory? Before playing with the PR, I had anticipated that "memory" would have a meaning like Claude Code's, where the tool can just integrate content into the claude.md file.

[Image: claude_memory]

}
}

yaml::write_yaml(data, path, indent.mapping.sequence = TRUE, indent = 2)
Collaborator

Do we think of this file as only to be edited by the model? Should there be some sort of cautionary comment at the top?

# Generated by btw: do not edit by hand

Collaborator

The reason I ask is because the formatting here is quite precise (for good reason). Feels like, if a user wants the model to know something, they ought to drop it in btw.md?

Collaborator Author

Good idea! I did want this file to be human editable (hence YAML and not JSON), but you're right that there's a precise structure to follow. I'm thinking we could add a comment header in this file explaining the structure and also letting people know that the file may be overwritten by the tool. The biggest consequence is that the yaml package can't preserve comments in a round trip read/write (so we'd always inject the standard instructions header).
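
A minimal sketch of the "always inject the header" idea, assuming the yaml package: serialize the data with `as.yaml()` and write it below a fixed comment block. The header wording and the `write_memory()` helper are hypothetical, not what the PR ships.

```r
library(yaml)

# Hypothetical header text; the wording is an assumption.
memory_header <- c(
  "# btw memory file",
  "# You can edit this file by hand, but keep the structure intact.",
  "# btw may rewrite this file; comments other than this header are not preserved."
)

# Hypothetical helper: serialize `data` and always re-inject the header.
write_memory <- function(data, path) {
  body <- as.yaml(data, indent = 2, indent.mapping.sequence = TRUE)
  writeLines(c(memory_header, "", body), path)
  invisible(path)
}

path <- tempfile(fileext = ".yaml")
write_memory(
  list(project_context = list(objectives = list("Explore the data"))),
  path
)

# The parser skips comment lines, so the header is not part of the data
# that comes back from a read.
read_yaml(path)
```

Because the header is regenerated on every write, any comments a user adds elsewhere in the file would still be lost on the next tool call.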

}


path_find_btw_memory <- function(path = NULL, must_exist = TRUE) {
Collaborator

Noting that in the list_files tool, we've just considered the working directory as the project directory:

btw_tool_files_list_files("../")
Condition
Error in `check_path_within_current_wd()`:
! You are not allowed to list or read files outside of the project directory. Make sure that `path` is relative to the current working directory.

Collaborator Author

True, but there are slightly different use cases. We use the "project directory" idea when looking for btw.md, but we don't want the list_files tool to access any file on the user's machine. Hence the "working directory and below" constraint on the list_files tool.
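
For reference, a "working directory and below" constraint can be sketched in base R by normalizing both paths and comparing prefixes. `is_within()` is a hypothetical helper, not btw's `check_path_within_current_wd()`, and it only resolves paths that already exist.

```r
# Hypothetical helper: TRUE when `path` resolves to `root` or somewhere below it.
# Only reliable for paths that exist, since normalizePath() cannot resolve
# missing ones.
is_within <- function(path, root = getwd()) {
  root <- normalizePath(root, winslash = "/", mustWork = TRUE)
  path <- normalizePath(path, winslash = "/", mustWork = FALSE)
  # Append a trailing slash so "/a/project2" is not treated as inside "/a/project"
  path == root || startsWith(path, paste0(sub("/$", "", root), "/"))
}

# Use a throwaway directory so the example runs anywhere
root <- file.path(tempdir(), "project")
dir.create(file.path(root, "data"), recursive = TRUE, showWarnings = FALSE)

is_within(file.path(root, "data"), root)  # a directory inside the project
is_within(file.path(root, ".."), root)    # the project's parent directory
```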

@gadenbuie
Collaborator Author

> btw.md and btw's memory? Before playing with the PR, I had anticipated that "memory" would have a meaning like Claude Code's, where the tool can just integrate content into the claude.md file.

Yeah great point. I considered this kind of approach, but I felt YAML was a better choice because it makes it easier for us to read, write and update just parts of the memory without resorting to parsing the markdown file.

My other feeling is that, as a support tool primarily for EDA, there's a lot of refinement that happens where both the user and the LLM start out not knowing much about the data and can learn together. My sense is that we wouldn't want to throw the whole memory at the LLM in the system prompt, but I could be wrong about that. Or maybe we want to include parts of the memory in the system prompt and the YAML file helps us do that filtering.
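
To illustrate the filtering idea, here is a small base R sketch that renders only selected project context keys as text that could be appended to a system prompt. `memory_prompt()` and the example values are hypothetical; how (or whether) btw would do this is still an open question in this thread.

```r
# Hypothetical helper: render only the requested project context keys.
memory_prompt <- function(memory, keys) {
  available <- names(memory$project_context)
  context <- memory$project_context[intersect(keys, available)]
  sections <- vapply(names(context), function(key) {
    # One heading per key, one bullet per stored entry
    paste0(key, ":\n", paste0("- ", unlist(context[[key]]), collapse = "\n"))
  }, character(1))
  paste(sections, collapse = "\n\n")
}

# Made-up memory contents for illustration
memory <- list(project_context = list(
  problem_description = "Explore drivers of churn",
  objectives = c("Find key predictors", "Flag data quality issues"),
  business_context = c("Billing is monthly")
))

cat(memory_prompt(memory, c("problem_description", "objectives")))
```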
