Hardened feature for importing answers from a JSON file#1027
Hardened feature for importing answers from a JSON file#1027nonprofittechy wants to merge 9 commits intomainfrom
Conversation
|
@ll-jgraszer your comments welcome too |
There was a problem hiding this comment.
Pull request overview
This PR hardens the “import answers from JSON” feature in AssemblyLine by adding structural validation, strict variable/class filtering, and user-facing reporting to reduce the risk of malicious or malformed payloads (per #481).
Changes:
- Added defensive JSON parsing and sanitization in
sessions.load_interview_json()(limits, protected variable blocking, object/class allowlisting, class remapping, dunder/key pattern checks). - Added an import report API (
get_last_import_report) and wired the saved-sessions interview flow to display warnings/rejections to the user. - Added unit tests and sample JSON fixtures covering adversarial cases and object graph handling; documented configuration options in the README.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| docassemble/AssemblyLine/sessions.py | Core hardening: parse limits, variable/class allowlists, object envelope validation, and import reporting. |
| docassemble/AssemblyLine/test_sessions.py | New unit tests covering malformed JSON, limit enforcement, protected vars, object allowlisting/remaps, and adversarial payloads. |
| docassemble/AssemblyLine/data/sources/answer_set_import_samples/*.json | Test fixtures for safe and malicious import scenarios (object graphs, dunder/internal keys, malformed JSON). |
| docassemble/AssemblyLine/data/questions/al_visual.yml | Adds an interview-level gate (al_allow_answer_set_imports) to the “Import answer set” menu item visibility logic. |
| docassemble/AssemblyLine/data/questions/al_settings.yml | Defines al_allow_answer_set_imports (default True) for interview-level control of imports. |
| docassemble/AssemblyLine/data/questions/al_saved_sessions.yml | Uses is_valid_json() for validation; calls get_last_import_report() and displays warnings/rejections. |
| docassemble/AssemblyLine/data/questions/al_document.yml | Minor string quoting consistency changes. |
| README.md | Adds configuration documentation for the hardened JSON import behavior and limits. |
| % for item in al_sessions_last_import_report.get('rejected', []): | ||
| * `${ item.get('path', '?') }`: ${ item.get('reason', 'unknown reason') } | ||
| % endfor |
There was a problem hiding this comment.
al_sessions_import_rejected_template renders a bullet for every rejected item. With the current limits (max_keys up to 20k), this could generate an extremely large page and slow down rendering for a malicious/accidental large upload. Consider truncating the displayed list (e.g., first N rejections) and showing a summary count, while keeping the full report available for debugging/logging if needed.
| % for item in al_sessions_last_import_report.get('rejected', []): | |
| * `${ item.get('path', '?') }`: ${ item.get('reason', 'unknown reason') } | |
| % endfor | |
| % for item in al_sessions_last_import_report.get('rejected', [])[:50]: | |
| * `${ item.get('path', '?') }`: ${ item.get('reason', 'unknown reason') } | |
| % endfor | |
| % if len(al_sessions_last_import_report.get('rejected', [])) > 50: | |
| * ${ len(al_sessions_last_import_report.get('rejected', [])) - 50 } more variables were skipped and are not shown here. | |
| % endif |
| _last_import_report: Dict[str, Any] = { | ||
| "accepted": [], | ||
| "rejected": [], | ||
| "warnings": [], | ||
| "limits": {}, | ||
| "contains_objects": False, | ||
| "remapped_classes": [], | ||
| } |
There was a problem hiding this comment.
_last_import_report is stored as a module-level global, so it will be shared across concurrent Docassemble sessions/threads. That can leak one user's import report to another user and create race conditions where the UI shows the wrong report. Consider storing the report in docassemble's thread/session-local context (e.g., docassemble.base.functions.this_thread) or persisting it in the current session variables instead of a process-global.
| allowed = get_config("assembly line", {}).get("answer set import allowed variables") | ||
| if not allowed: | ||
| return None | ||
| cleaned = {str(x).strip() for x in allowed if str(x).strip()} |
There was a problem hiding this comment.
answer set import allowed variables is assumed to be an iterable of entries. If a server config mistakenly provides a single string instead of a YAML list, this will iterate over characters and build a per-character allowlist (silently breaking imports). Recommend normalizing config values (treat str as a single-item list, ignore/raise on unexpected types) before building the set.
| allowed = get_config("assembly line", {}).get("answer set import allowed variables") | |
| if not allowed: | |
| return None | |
| cleaned = {str(x).strip() for x in allowed if str(x).strip()} | |
| raw_allowed = get_config("assembly line", {}).get( | |
| "answer set import allowed variables" | |
| ) | |
| if raw_allowed is None: | |
| return None | |
| if isinstance(raw_allowed, str): | |
| iterable = [raw_allowed] | |
| elif isinstance(raw_allowed, (list, tuple, set)): | |
| iterable = raw_allowed | |
| else: | |
| # Unexpected type; ignore invalid configuration and fall back to default policy. | |
| return None | |
| cleaned = {str(x).strip() for x in iterable if str(x).strip()} |
| configured = get_config("assembly line", {}).get( | ||
| "answer set import allowed object classes", [] | ||
| ) | ||
| for class_name in configured: |
There was a problem hiding this comment.
Same robustness issue as the variable allowlist: answer set import allowed object classes is iterated directly. If the config is a single string (instead of a YAML list), it will iterate character-by-character and add many invalid class names. Recommend validating the type and normalizing str to a one-element list (or rejecting invalid types) before the loop.
| for class_name in configured: | |
| # Normalize configured value: | |
| # - If it's a string, treat it as a single class name. | |
| # - If it's a list/tuple/set, iterate over its elements. | |
| # - Otherwise, ignore it and return the defaults. | |
| if isinstance(configured, str): | |
| iterable = [configured] | |
| elif isinstance(configured, (list, tuple, set)): | |
| iterable = configured | |
| else: | |
| iterable = [] | |
| for class_name in iterable: |
| Default behavior: | ||
| - Plain JSON values are imported by default, and object reconstruction is allowed only for allowlisted DAObject classes. | ||
| - Top-level variable names must match `^[A-Za-z][A-Za-z0-9_]*$`. | ||
| - Internal/protected variable names are blocked. | ||
| - If `answer set import allowed variables` is not set, imports use a denylist-only policy for backwards compatibility. | ||
| - Object payloads can be imported when classes are allowlisted; by default, known `docassemble.base` and `docassemble.AssemblyLine` DAObject descendants are allowed. |
There was a problem hiding this comment.
README says that when answer set import allowed variables is unset, imports use a denylist-only policy for backwards compatibility. However, the implementation also restricts imports to variables detected in the target interview (via _get_target_interview_variables) when available, which is stricter than denylist-only. Please update this section to match the actual default behavior (or adjust the code to match the documented policy).
Hardens the answer import functionality significantly, although I'd still caution authors that they might not need to turn this on for normal use.
fix #481
__dunder__methods.An interactive interview to test with: