Hardened feature for importing answers from a JSON file by nonprofittechy · Pull Request #1027 · SuffolkLITLab/docassemble-AssemblyLine

nonprofittechy · 2026-03-10T19:44:59Z

Hardens the answer import functionality significantly, although I'd still caution authors that they might not need to turn this on for normal use.

fix #481

Expands the explicit deny list significantly to cover more built-in variables that we do not want to shadow
Adds an explicit set of allowed classes to load in; filters out DAFile to better prevent disclosure of system files. For now, defaults to just ALIndividual, ALPeopleList, and safe Docassemble built-in classes.
Explicitly filter out any variables that are not mentioned in the interview's context already, using same method playground uses to check what variables an interview mentions. This may be too strict, but since nobody uses JSON answers yet, I think it will probably be OK. Workaround would be for author to explicitly add a code block that sets a dummy value for these variables, so easy to workaround from author's PoV.
Checks for dangerous patterns like setting __dunder__ methods.
Limits on stack depth and JSON string size.

An interactive interview to test with:

modules:
  - .sessions
---
mandatory: True
code: |
  import json
  
  minimal_var = "minimal"

  payload = {
    "minimal_var": "I am allowed",
    "mako_var": "I am in mako",
    "field_var": "I am in field",
    "evil_var": "I am evil",
    "log": "I love logs",
    "menu_items": [
      {"hello": "Hello"}
    ],
    "url_args": {
      "hello": 123
    }
  }

  load_interview_json(json.dumps(payload))
  report = get_last_import_report()
  accepted = list(report.get("accepted", {}))
  if not accepted and type(report.get("accepted")) is dict:
      accepted = list(report["accepted"].keys())

  rejected = [r['path'] for r in report.get("rejected", [])]

  from docassemble.base.functions import current_context
  all_vars = sorted(list(_get_target_interview_variables(current_context().filename)))

  test_finished = True
---
question: "A simple page"
fields:
  - "Foo": field_var
---
question: "A simple mako"
subquestion: |
  ${ mako_var }
---
mandatory: True
question: Test Results
subquestion: |

  ${ url_args.get('hello') }
  <pre id="test_results">
  {
    "accepted": ${ json.dumps(accepted) },
    "rejected": ${ json.dumps(rejected) },
    "all_vars": ${ json.dumps(all_vars) }
  }
  </pre>

…yground's variable list

nonprofittechy · 2026-03-10T21:31:19Z

@ll-jgraszer your comments welcome too

Copilot

Pull request overview

This PR hardens the “import answers from JSON” feature in AssemblyLine by adding structural validation, strict variable/class filtering, and user-facing reporting to reduce the risk of malicious or malformed payloads (per #481).

Changes:

Added defensive JSON parsing and sanitization in sessions.load_interview_json() (limits, protected variable blocking, object/class allowlisting, class remapping, dunder/key pattern checks).
Added an import report API (get_last_import_report) and wired the saved-sessions interview flow to display warnings/rejections to the user.
Added unit tests and sample JSON fixtures covering adversarial cases and object graph handling; documented configuration options in the README.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
docassemble/AssemblyLine/sessions.py	Core hardening: parse limits, variable/class allowlists, object envelope validation, and import reporting.
docassemble/AssemblyLine/test_sessions.py	New unit tests covering malformed JSON, limit enforcement, protected vars, object allowlisting/remaps, and adversarial payloads.
docassemble/AssemblyLine/data/sources/answer_set_import_samples/*.json	Test fixtures for safe and malicious import scenarios (object graphs, dunder/internal keys, malformed JSON).
docassemble/AssemblyLine/data/questions/al_visual.yml	Adds an interview-level gate (`al_allow_answer_set_imports`) to the “Import answer set” menu item visibility logic.
docassemble/AssemblyLine/data/questions/al_settings.yml	Defines `al_allow_answer_set_imports` (default True) for interview-level control of imports.
docassemble/AssemblyLine/data/questions/al_saved_sessions.yml	Uses `is_valid_json()` for validation; calls `get_last_import_report()` and displays warnings/rejections.
docassemble/AssemblyLine/data/questions/al_document.yml	Minor string quoting consistency changes.
README.md	Adds configuration documentation for the hardened JSON import behavior and limits.

Copilot · 2026-03-10T21:45:07Z

docassemble/AssemblyLine/data/questions/al_saved_sessions.yml

+  % for item in al_sessions_last_import_report.get('rejected', []):
+  * `${ item.get('path', '?') }`: ${ item.get('reason', 'unknown reason') }
+  % endfor


al_sessions_import_rejected_template renders a bullet for every rejected item. With the current limits (max_keys up to 20k), this could generate an extremely large page and slow down rendering for a malicious/accidental large upload. Consider truncating the displayed list (e.g., first N rejections) and showing a summary count, while keeping the full report available for debugging/logging if needed.

Suggested change

% for item in al_sessions_last_import_report.get('rejected', []):

* `${ item.get('path', '?') }`: ${ item.get('reason', 'unknown reason') }

% endfor

% for item in al_sessions_last_import_report.get('rejected', [])[:50]:

* `${ item.get('path', '?') }`: ${ item.get('reason', 'unknown reason') }

% endfor

% if len(al_sessions_last_import_report.get('rejected', [])) > 50:

* ${ len(al_sessions_last_import_report.get('rejected', [])) - 50 } more variables were skipped and are not shown here.

% endif

Copilot · 2026-03-10T21:45:07Z

docassemble/AssemblyLine/sessions.py

+_last_import_report: Dict[str, Any] = {
+    "accepted": [],
+    "rejected": [],
+    "warnings": [],
+    "limits": {},
+    "contains_objects": False,
+    "remapped_classes": [],
+}


_last_import_report is stored as a module-level global, so it will be shared across concurrent Docassemble sessions/threads. That can leak one user's import report to another user and create race conditions where the UI shows the wrong report. Consider storing the report in docassemble's thread/session-local context (e.g., docassemble.base.functions.this_thread) or persisting it in the current session variables instead of a process-global.

Copilot · 2026-03-10T21:45:07Z

docassemble/AssemblyLine/sessions.py

+    allowed = get_config("assembly line", {}).get("answer set import allowed variables")
+    if not allowed:
+        return None
+    cleaned = {str(x).strip() for x in allowed if str(x).strip()}


answer set import allowed variables is assumed to be an iterable of entries. If a server config mistakenly provides a single string instead of a YAML list, this will iterate over characters and build a per-character allowlist (silently breaking imports). Recommend normalizing config values (treat str as a single-item list, ignore/raise on unexpected types) before building the set.

Suggested change

allowed = get_config("assembly line", {}).get("answer set import allowed variables")

if not allowed:

return None

cleaned = {str(x).strip() for x in allowed if str(x).strip()}

raw_allowed = get_config("assembly line", {}).get(

"answer set import allowed variables"

)

if raw_allowed is None:

return None

if isinstance(raw_allowed, str):

iterable = [raw_allowed]

elif isinstance(raw_allowed, (list, tuple, set)):

iterable = raw_allowed

else:

# Unexpected type; ignore invalid configuration and fall back to default policy.

return None

cleaned = {str(x).strip() for x in iterable if str(x).strip()}

Copilot · 2026-03-10T21:45:08Z

docassemble/AssemblyLine/sessions.py

+    configured = get_config("assembly line", {}).get(
+        "answer set import allowed object classes", []
+    )
+    for class_name in configured:


Same robustness issue as the variable allowlist: answer set import allowed object classes is iterated directly. If the config is a single string (instead of a YAML list), it will iterate character-by-character and add many invalid class names. Recommend validating the type and normalizing str to a one-element list (or rejecting invalid types) before the loop.

Suggested change

for class_name in configured:

# Normalize configured value:

# - If it's a string, treat it as a single class name.

# - If it's a list/tuple/set, iterate over its elements.

# - Otherwise, ignore it and return the defaults.

if isinstance(configured, str):

iterable = [configured]

elif isinstance(configured, (list, tuple, set)):

iterable = configured

else:

iterable = []

for class_name in iterable:

Copilot · 2026-03-10T21:45:08Z

README.md

+Default behavior:
+- Plain JSON values are imported by default, and object reconstruction is allowed only for allowlisted DAObject classes.
+- Top-level variable names must match `^[A-Za-z][A-Za-z0-9_]*$`.
+- Internal/protected variable names are blocked.
+- If `answer set import allowed variables` is not set, imports use a denylist-only policy for backwards compatibility.
+- Object payloads can be imported when classes are allowlisted; by default, known `docassemble.base` and `docassemble.AssemblyLine` DAObject descendants are allowed.


README says that when answer set import allowed variables is unset, imports use a denylist-only policy for backwards compatibility. However, the implementation also restricts imports to variables detected in the target interview (via _get_target_interview_variables) when available, which is stricter than denylist-only. Please update this section to match the actual default behavior (or adjust the code to match the documented policy).

LemmaLegal added 6 commits March 10, 2026 15:39

Starting place-hardened security for imports

e3eac36

Variable filtering to only allow variables that would show in the pla…

1935e34

…yground's variable list

Additional hardening pass

208645c

PR pre-review pass, cleanup redundant code

e5e72c1

Fix docsig error

e710691

More cleanup

4c5b904

nonprofittechy changed the title ~~Starting place-hardened security for imports~~ Hardened feature for importing answers from a JSON file Mar 10, 2026

LemmaLegal added 2 commits March 10, 2026 17:26

Give authors more control

242dd22

Format YAML

947eda3

nonprofittechy requested review from BryceStevenWilley and Copilot March 10, 2026 21:28

Copilot started reviewing on behalf of nonprofittechy March 10, 2026 21:28 View session

Try to fix mocking breaking ci/cd

fedc432

Copilot AI reviewed Mar 10, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hardened feature for importing answers from a JSON file#1027

Hardened feature for importing answers from a JSON file#1027
nonprofittechy wants to merge 9 commits intomainfrom
answer-set-import-review

nonprofittechy commented Mar 10, 2026 •

edited

Loading

Uh oh!

nonprofittechy commented Mar 10, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 10, 2026

Uh oh!

Copilot AI Mar 10, 2026

Uh oh!

Copilot AI Mar 10, 2026

Uh oh!

Copilot AI Mar 10, 2026

Uh oh!

Copilot AI Mar 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

-    allowed = get_config("assembly line", {}).get("answer set import allowed variables")
-    if not allowed:
-        return None
-    cleaned = {str(x).strip() for x in allowed if str(x).strip()}
+    raw_allowed = get_config("assembly line", {}).get(
+        "answer set import allowed variables"
+    )
+    if raw_allowed is None:
+        return None
+    if isinstance(raw_allowed, str):
+        iterable = [raw_allowed]
+    elif isinstance(raw_allowed, (list, tuple, set)):
+        iterable = raw_allowed
+    else:
+        # Unexpected type; ignore invalid configuration and fall back to default policy.
+        return None
+    cleaned = {str(x).strip() for x in iterable if str(x).strip()}

-    for class_name in configured:
+    # Normalize configured value:
+    # - If it's a string, treat it as a single class name.
+    # - If it's a list/tuple/set, iterate over its elements.
+    # - Otherwise, ignore it and return the defaults.
+    if isinstance(configured, str):
+        iterable = [configured]
+    elif isinstance(configured, (list, tuple, set)):
+        iterable = configured
+    else:
+        iterable = []
+    for class_name in iterable:

Conversation

nonprofittechy commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nonprofittechy commented Mar 10, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

nonprofittechy commented Mar 10, 2026 •

edited

Loading