Skip to content

Hardened feature for importing answers from a JSON file#1027

Draft
nonprofittechy wants to merge 9 commits intomainfrom
answer-set-import-review
Draft

Hardened feature for importing answers from a JSON file#1027
nonprofittechy wants to merge 9 commits intomainfrom
answer-set-import-review

Conversation

@nonprofittechy
Copy link
Member

@nonprofittechy nonprofittechy commented Mar 10, 2026

Hardens the answer import functionality significantly, although I'd still caution authors that they might not need to turn this on for normal use.

fix #481

  1. Expands the explicit deny list significantly to cover more built-in variables that we do not want to shadow
  2. Adds an explicit set of allowed classes to load in; filters out DAFile to better prevent disclosure of system files. For now, defaults to just ALIndividual, ALPeopleList, and safe Docassemble built-in classes.
  3. Explicitly filter out any variables that are not mentioned in the interview's context already, using same method playground uses to check what variables an interview mentions. This may be too strict, but since nobody uses JSON answers yet, I think it will probably be OK. Workaround would be for author to explicitly add a code block that sets a dummy value for these variables, so easy to workaround from author's PoV.
  4. Checks for dangerous patterns like setting __dunder__ methods.
  5. Limits on stack depth and JSON string size.

An interactive interview to test with:

modules:
  - .sessions
---
mandatory: True
code: |
  import json
  
  minimal_var = "minimal"

  payload = {
    "minimal_var": "I am allowed",
    "mako_var": "I am in mako",
    "field_var": "I am in field",
    "evil_var": "I am evil",
    "log": "I love logs",
    "menu_items": [
      {"hello": "Hello"}
    ],
    "url_args": {
      "hello": 123
    }
  }

  load_interview_json(json.dumps(payload))
  report = get_last_import_report()
  accepted = list(report.get("accepted", {}))
  if not accepted and type(report.get("accepted")) is dict:
      accepted = list(report["accepted"].keys())

  rejected = [r['path'] for r in report.get("rejected", [])]

  from docassemble.base.functions import current_context
  all_vars = sorted(list(_get_target_interview_variables(current_context().filename)))

  test_finished = True
---
question: "A simple page"
fields:
  - "Foo": field_var
---
question: "A simple mako"
subquestion: |
  ${ mako_var }
---
mandatory: True
question: Test Results
subquestion: |

  ${ url_args.get('hello') }
  <pre id="test_results">
  {
    "accepted": ${ json.dumps(accepted) },
    "rejected": ${ json.dumps(rejected) },
    "all_vars": ${ json.dumps(all_vars) }
  }
  </pre>

@nonprofittechy nonprofittechy changed the title Starting place-hardened security for imports Hardened feature for importing answers from a JSON file Mar 10, 2026
@nonprofittechy
Copy link
Member Author

@ll-jgraszer your comments welcome too

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the “import answers from JSON” feature in AssemblyLine by adding structural validation, strict variable/class filtering, and user-facing reporting to reduce the risk of malicious or malformed payloads (per #481).

Changes:

  • Added defensive JSON parsing and sanitization in sessions.load_interview_json() (limits, protected variable blocking, object/class allowlisting, class remapping, dunder/key pattern checks).
  • Added an import report API (get_last_import_report) and wired the saved-sessions interview flow to display warnings/rejections to the user.
  • Added unit tests and sample JSON fixtures covering adversarial cases and object graph handling; documented configuration options in the README.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
docassemble/AssemblyLine/sessions.py Core hardening: parse limits, variable/class allowlists, object envelope validation, and import reporting.
docassemble/AssemblyLine/test_sessions.py New unit tests covering malformed JSON, limit enforcement, protected vars, object allowlisting/remaps, and adversarial payloads.
docassemble/AssemblyLine/data/sources/answer_set_import_samples/*.json Test fixtures for safe and malicious import scenarios (object graphs, dunder/internal keys, malformed JSON).
docassemble/AssemblyLine/data/questions/al_visual.yml Adds an interview-level gate (al_allow_answer_set_imports) to the “Import answer set” menu item visibility logic.
docassemble/AssemblyLine/data/questions/al_settings.yml Defines al_allow_answer_set_imports (default True) for interview-level control of imports.
docassemble/AssemblyLine/data/questions/al_saved_sessions.yml Uses is_valid_json() for validation; calls get_last_import_report() and displays warnings/rejections.
docassemble/AssemblyLine/data/questions/al_document.yml Minor string quoting consistency changes.
README.md Adds configuration documentation for the hardened JSON import behavior and limits.

Comment on lines +250 to +252
% for item in al_sessions_last_import_report.get('rejected', []):
* `${ item.get('path', '?') }`: ${ item.get('reason', 'unknown reason') }
% endfor
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

al_sessions_import_rejected_template renders a bullet for every rejected item. With the current limits (max_keys up to 20k), this could generate an extremely large page and slow down rendering for a malicious/accidental large upload. Consider truncating the displayed list (e.g., first N rejections) and showing a summary count, while keeping the full report available for debugging/logging if needed.

Suggested change
% for item in al_sessions_last_import_report.get('rejected', []):
* `${ item.get('path', '?') }`: ${ item.get('reason', 'unknown reason') }
% endfor
% for item in al_sessions_last_import_report.get('rejected', [])[:50]:
* `${ item.get('path', '?') }`: ${ item.get('reason', 'unknown reason') }
% endfor
% if len(al_sessions_last_import_report.get('rejected', [])) > 50:
* ${ len(al_sessions_last_import_report.get('rejected', [])) - 50 } more variables were skipped and are not shown here.
% endif

Copilot uses AI. Check for mistakes.
Comment on lines +273 to +280
_last_import_report: Dict[str, Any] = {
"accepted": [],
"rejected": [],
"warnings": [],
"limits": {},
"contains_objects": False,
"remapped_classes": [],
}
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_last_import_report is stored as a module-level global, so it will be shared across concurrent Docassemble sessions/threads. That can leak one user's import report to another user and create race conditions where the UI shows the wrong report. Consider storing the report in docassemble's thread/session-local context (e.g., docassemble.base.functions.this_thread) or persisting it in the current session variables instead of a process-global.

Copilot uses AI. Check for mistakes.
Comment on lines +385 to +388
allowed = get_config("assembly line", {}).get("answer set import allowed variables")
if not allowed:
return None
cleaned = {str(x).strip() for x in allowed if str(x).strip()}
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

answer set import allowed variables is assumed to be an iterable of entries. If a server config mistakenly provides a single string instead of a YAML list, this will iterate over characters and build a per-character allowlist (silently breaking imports). Recommend normalizing config values (treat str as a single-item list, ignore/raise on unexpected types) before building the set.

Suggested change
allowed = get_config("assembly line", {}).get("answer set import allowed variables")
if not allowed:
return None
cleaned = {str(x).strip() for x in allowed if str(x).strip()}
raw_allowed = get_config("assembly line", {}).get(
"answer set import allowed variables"
)
if raw_allowed is None:
return None
if isinstance(raw_allowed, str):
iterable = [raw_allowed]
elif isinstance(raw_allowed, (list, tuple, set)):
iterable = raw_allowed
else:
# Unexpected type; ignore invalid configuration and fall back to default policy.
return None
cleaned = {str(x).strip() for x in iterable if str(x).strip()}

Copilot uses AI. Check for mistakes.
configured = get_config("assembly line", {}).get(
"answer set import allowed object classes", []
)
for class_name in configured:
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same robustness issue as the variable allowlist: answer set import allowed object classes is iterated directly. If the config is a single string (instead of a YAML list), it will iterate character-by-character and add many invalid class names. Recommend validating the type and normalizing str to a one-element list (or rejecting invalid types) before the loop.

Suggested change
for class_name in configured:
# Normalize configured value:
# - If it's a string, treat it as a single class name.
# - If it's a list/tuple/set, iterate over its elements.
# - Otherwise, ignore it and return the defaults.
if isinstance(configured, str):
iterable = [configured]
elif isinstance(configured, (list, tuple, set)):
iterable = configured
else:
iterable = []
for class_name in iterable:

Copilot uses AI. Check for mistakes.
Comment on lines +83 to +88
Default behavior:
- Plain JSON values are imported by default, and object reconstruction is allowed only for allowlisted DAObject classes.
- Top-level variable names must match `^[A-Za-z][A-Za-z0-9_]*$`.
- Internal/protected variable names are blocked.
- If `answer set import allowed variables` is not set, imports use a denylist-only policy for backwards compatibility.
- Object payloads can be imported when classes are allowlisted; by default, known `docassemble.base` and `docassemble.AssemblyLine` DAObject descendants are allowed.
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

README says that when answer set import allowed variables is unset, imports use a denylist-only policy for backwards compatibility. However, the implementation also restricts imports to variables detected in the target interview (via _get_target_interview_variables) when available, which is stricter than denylist-only. Please update this section to match the actual default behavior (or adjust the code to match the documented policy).

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enhance security of import from JSON feature

3 participants