Skip to content

WIP Include challenge teams1 5#226

Open
Ultimate-Storm wants to merge 1138 commits intomainfrom
include_challenge_teams1-5
Open

WIP Include challenge teams1 5#226
Ultimate-Storm wants to merge 1138 commits intomainfrom
include_challenge_teams1-5

Conversation

@Ultimate-Storm
Copy link
Copy Markdown
Contributor

Five models added
still being tesed
auto collection of logs is working

oleschwen and others added 30 commits October 24, 2025 10:12
…g unit tests, use NVFlare controller classes instead
chore: Update APT versions in Dockerfile
chore: Update APT versions in Dockerfile
app = FastAPI()

def read_text(p: Path, limit: int = 50000) -> str:
if not p.exists():

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
This path depends on a
user-provided value
.

Copilot Autofix

AI 5 days ago

In general, to fix uncontrolled path usage you must (1) construct paths relative to a trusted base directory, (2) normalize the resulting path, and (3) verify that the normalized path still lies within the base directory before accessing the filesystem. If the check fails, return an error (or empty content) instead of reading the file.

For this code, the best non-breaking fix is to introduce a small helper that safely joins BASE with the user-provided components site, mode, and run_id, normalizes the result, and ensures that the final path is inside BASE. Then use this helper in heartbeat, console, and log to derive the directory path; from there, append the fixed filenames (heartbeat.json, nohup.out, etc.) using simple / concatenation on Path, which is safe once the directory has been validated. This preserves existing behavior for valid inputs while blocking traversal attempts.

Concretely, within server_tools/app.py:

  • Add a helper safe_run_path(site: str, mode: str, run_id: str) -> Path | None (or just Path with error handling) above the route handlers. This function should:
    • Build a candidate path: candidate = (BASE / site / mode / run_id).resolve().
    • Ensure candidate is within BASE, for example by comparing candidate’s parts to BASE’s parts or checking that str(candidate).startswith(str(BASE) + os.sep) after normalization; with Path, a robust, cross-platform check is BASE in candidate.parents or candidate == BASE is not quite right, so instead ensure BASE == candidate or BASE == next(iter(candidate.parents))—but the simplest is to check BASE == candidate or BASE in candidate.parents. Here, since candidate must be deeper than BASE, we verify BASE == candidate or BASE in candidate.parents.
    • If the check fails, return None.
  • Update heartbeat to:
    • Call safe_run_path(site, mode, run_id) to get run_dir.
    • If run_dir is None, return an empty string.
    • Build p = run_dir / name instead of using BASE / site / mode / run_id / name.
  • Update console similarly: compute run_dir with the helper, then p1 = run_dir / "nohup.out" and p2 = run_dir / "local_training_console_output.txt", guarding the case where run_dir is None.
  • Update log to use the helper and return an empty string if validation fails.

We already import Path and do not need additional third-party libraries; we only add one helper function, plus possibly Optional typing if desired (but that’s not strictly necessary). All changes stay within server_tools/app.py.


Suggested changeset 1
server_tools/app.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/server_tools/app.py b/server_tools/app.py
--- a/server_tools/app.py
+++ b/server_tools/app.py
@@ -12,6 +12,23 @@
         return ""
     return p.read_text(errors="replace")[-limit:]
 
+def safe_run_path(site: str, mode: str, run_id: str) -> Path | None:
+    """
+    Construct a path under BASE from user-provided components and ensure that
+    the resulting path is contained within BASE. Returns None if the path
+    would escape BASE.
+    """
+    candidate = (BASE / site / mode / run_id).resolve()
+    try:
+        # Ensure the resolved path is within BASE to prevent path traversal.
+        if candidate == BASE or BASE in candidate.parents:
+            return candidate
+    except Exception:
+        # In case of any unexpected error during resolution or checking,
+        # treat the path as unsafe.
+        pass
+    return None
+
 def parse_age(ts: str) -> str:
     if not ts:
         return "unknown"
@@ -90,16 +107,22 @@
 
 @app.get("/heartbeat/{site}/{mode}/{run_id}", response_class=PlainTextResponse)
 def heartbeat(site: str, mode: str, run_id: str):
+    run_dir = safe_run_path(site, mode, run_id)
+    if run_dir is None:
+        return ""
     for name in ["heartbeat.json", "heartbeat_final.json"]:
-        p = BASE / site / mode / run_id / name
+        p = run_dir / name
         if p.exists():
             return read_text(p)
     return ""
 
 @app.get("/console/{site}/{mode}/{run_id}", response_class=PlainTextResponse)
 def console(site: str, mode: str, run_id: str):
-    p1 = BASE / site / mode / run_id / "nohup.out"
-    p2 = BASE / site / mode / run_id / "local_training_console_output.txt"
+    run_dir = safe_run_path(site, mode, run_id)
+    if run_dir is None:
+        return ""
+    p1 = run_dir / "nohup.out"
+    p2 = run_dir / "local_training_console_output.txt"
     if p1.exists():
         return read_text(p1)
     if p2.exists():
@@ -108,4 +124,7 @@
 
 @app.get("/log/{site}/{mode}/{run_id}", response_class=PlainTextResponse)
 def log(site: str, mode: str, run_id: str):
-    return read_text(BASE / site / mode / run_id / "log.txt")
\ No newline at end of file
+    run_dir = safe_run_path(site, mode, run_id)
+    if run_dir is None:
+        return ""
+    return read_text(run_dir / "log.txt")
\ No newline at end of file
EOF
@@ -12,6 +12,23 @@
return ""
return p.read_text(errors="replace")[-limit:]

def safe_run_path(site: str, mode: str, run_id: str) -> Path | None:
"""
Construct a path under BASE from user-provided components and ensure that
the resulting path is contained within BASE. Returns None if the path
would escape BASE.
"""
candidate = (BASE / site / mode / run_id).resolve()
try:
# Ensure the resolved path is within BASE to prevent path traversal.
if candidate == BASE or BASE in candidate.parents:
return candidate
except Exception:
# In case of any unexpected error during resolution or checking,
# treat the path as unsafe.
pass
return None

def parse_age(ts: str) -> str:
if not ts:
return "unknown"
@@ -90,16 +107,22 @@

@app.get("/heartbeat/{site}/{mode}/{run_id}", response_class=PlainTextResponse)
def heartbeat(site: str, mode: str, run_id: str):
run_dir = safe_run_path(site, mode, run_id)
if run_dir is None:
return ""
for name in ["heartbeat.json", "heartbeat_final.json"]:
p = BASE / site / mode / run_id / name
p = run_dir / name
if p.exists():
return read_text(p)
return ""

@app.get("/console/{site}/{mode}/{run_id}", response_class=PlainTextResponse)
def console(site: str, mode: str, run_id: str):
p1 = BASE / site / mode / run_id / "nohup.out"
p2 = BASE / site / mode / run_id / "local_training_console_output.txt"
run_dir = safe_run_path(site, mode, run_id)
if run_dir is None:
return ""
p1 = run_dir / "nohup.out"
p2 = run_dir / "local_training_console_output.txt"
if p1.exists():
return read_text(p1)
if p2.exists():
@@ -108,4 +124,7 @@

@app.get("/log/{site}/{mode}/{run_id}", response_class=PlainTextResponse)
def log(site: str, mode: str, run_id: str):
return read_text(BASE / site / mode / run_id / "log.txt")
run_dir = safe_run_path(site, mode, run_id)
if run_dir is None:
return ""
return read_text(run_dir / "log.txt")
Copilot is powered by AI and may make mistakes. Always verify output.
def read_text(p: Path, limit: int = 50000) -> str:
if not p.exists():
return ""
return p.read_text(errors="replace")[-limit:]

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
This path depends on a
user-provided value
.

Copilot Autofix

AI 5 days ago

In general, to fix this kind of issue you must ensure that any path built from user input is confined to an intended root directory. That is usually done by: (1) constructing the candidate path under a known base directory, (2) normalizing / resolving it to remove .. and similar, and (3) verifying that the final path is still inside the base directory. If the check fails, return an error or empty content instead of touching the filesystem.

For this codebase, the least intrusive and most consistent fix is to add a small helper that safely joins BASE with site, mode, and run_id plus the target filename, using Path.resolve() to normalize and then checking that the resolved path is within BASE. We then use this helper in heartbeat, console, and log instead of directly doing BASE / site / mode / run_id / .... If the resulting path is outside BASE, we treat it as non‑existent and return an empty string, preserving the current observable behavior (these handlers already return "" on missing files). We can also perform a quick early rejection of path components containing directory separators to further reduce risk, but the critical protection is the resolve-and-check step.

Concretely, in server_tools/app.py:

  • Add a function, e.g. safe_path(*parts: str) -> Path | None, just after BASE and app (or after read_text), that:
    • Builds candidate = BASE.joinpath(*parts).
    • Resolves it: resolved = candidate.resolve().
    • Checks resolved.is_relative_to(BASE) if available (Python 3.9+), otherwise falls back to checking BASE in resolved.parents or resolved == BASE.
    • Returns resolved if it is inside BASE, otherwise returns None.
  • Optionally, inside safe_path, reject any part that contains / or \ to force components to be single path segments.
  • Update heartbeat, console, and log to:
    • Call safe_path(site, mode, run_id, filename) to obtain each path.
    • If safe_path returns None or the file doesn’t exist, skip/return "" as they currently do.
  • Leave read_text unchanged; it will now only ever be called with paths that have passed the safety check.

This centralizes the validation in one helper, fixes all 9 variants at once, and does not change successful responses for valid paths under BASE.

Suggested changeset 1
server_tools/app.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/server_tools/app.py b/server_tools/app.py
--- a/server_tools/app.py
+++ b/server_tools/app.py
@@ -7,6 +7,33 @@
 BASE = Path("/srv/mediswarm/live")
 app = FastAPI()
 
+def safe_path(*parts: str) -> Path | None:
+    """
+    Construct a path under BASE from the given string components, ensuring
+    that the resulting resolved path stays within BASE. Returns None if
+    the path would escape BASE.
+    """
+    # Reject components that try to smuggle in path separators
+    for part in parts:
+        if "/" in part or "\\" in part:
+            return None
+    candidate = BASE.joinpath(*parts)
+    try:
+        resolved = candidate.resolve()
+    except FileNotFoundError:
+        # Even if intermediate components do not exist, resolve as far as possible
+        resolved = candidate.resolve(strict=False)
+    try:
+        # Python 3.9+: use is_relative_to if available
+        is_inside = resolved.is_relative_to(BASE)  # type: ignore[attr-defined]
+    except AttributeError:
+        # Fallback for older Python versions
+        resolved_base = BASE.resolve()
+        is_inside = resolved == resolved_base or resolved_base in resolved.parents
+    if not is_inside:
+        return None
+    return resolved
+
 def read_text(p: Path, limit: int = 50000) -> str:
     if not p.exists():
         return ""
@@ -91,21 +118,24 @@
 @app.get("/heartbeat/{site}/{mode}/{run_id}", response_class=PlainTextResponse)
 def heartbeat(site: str, mode: str, run_id: str):
     for name in ["heartbeat.json", "heartbeat_final.json"]:
-        p = BASE / site / mode / run_id / name
-        if p.exists():
+        p = safe_path(site, mode, run_id, name)
+        if p is not None and p.exists():
             return read_text(p)
     return ""
 
 @app.get("/console/{site}/{mode}/{run_id}", response_class=PlainTextResponse)
 def console(site: str, mode: str, run_id: str):
-    p1 = BASE / site / mode / run_id / "nohup.out"
-    p2 = BASE / site / mode / run_id / "local_training_console_output.txt"
-    if p1.exists():
+    p1 = safe_path(site, mode, run_id, "nohup.out")
+    p2 = safe_path(site, mode, run_id, "local_training_console_output.txt")
+    if p1 is not None and p1.exists():
         return read_text(p1)
-    if p2.exists():
+    if p2 is not None and p2.exists():
         return read_text(p2)
     return ""
 
 @app.get("/log/{site}/{mode}/{run_id}", response_class=PlainTextResponse)
 def log(site: str, mode: str, run_id: str):
-    return read_text(BASE / site / mode / run_id / "log.txt")
\ No newline at end of file
+    p = safe_path(site, mode, run_id, "log.txt")
+    if p is None:
+        return ""
+    return read_text(p)
\ No newline at end of file
EOF
@@ -7,6 +7,33 @@
BASE = Path("/srv/mediswarm/live")
app = FastAPI()

def safe_path(*parts: str) -> Path | None:
"""
Construct a path under BASE from the given string components, ensuring
that the resulting resolved path stays within BASE. Returns None if
the path would escape BASE.
"""
# Reject components that try to smuggle in path separators
for part in parts:
if "/" in part or "\\" in part:
return None
candidate = BASE.joinpath(*parts)
try:
resolved = candidate.resolve()
except FileNotFoundError:
# Even if intermediate components do not exist, resolve as far as possible
resolved = candidate.resolve(strict=False)
try:
# Python 3.9+: use is_relative_to if available
is_inside = resolved.is_relative_to(BASE) # type: ignore[attr-defined]
except AttributeError:
# Fallback for older Python versions
resolved_base = BASE.resolve()
is_inside = resolved == resolved_base or resolved_base in resolved.parents
if not is_inside:
return None
return resolved

def read_text(p: Path, limit: int = 50000) -> str:
if not p.exists():
return ""
@@ -91,21 +118,24 @@
@app.get("/heartbeat/{site}/{mode}/{run_id}", response_class=PlainTextResponse)
def heartbeat(site: str, mode: str, run_id: str):
for name in ["heartbeat.json", "heartbeat_final.json"]:
p = BASE / site / mode / run_id / name
if p.exists():
p = safe_path(site, mode, run_id, name)
if p is not None and p.exists():
return read_text(p)
return ""

@app.get("/console/{site}/{mode}/{run_id}", response_class=PlainTextResponse)
def console(site: str, mode: str, run_id: str):
p1 = BASE / site / mode / run_id / "nohup.out"
p2 = BASE / site / mode / run_id / "local_training_console_output.txt"
if p1.exists():
p1 = safe_path(site, mode, run_id, "nohup.out")
p2 = safe_path(site, mode, run_id, "local_training_console_output.txt")
if p1 is not None and p1.exists():
return read_text(p1)
if p2.exists():
if p2 is not None and p2.exists():
return read_text(p2)
return ""

@app.get("/log/{site}/{mode}/{run_id}", response_class=PlainTextResponse)
def log(site: str, mode: str, run_id: str):
return read_text(BASE / site / mode / run_id / "log.txt")
p = safe_path(site, mode, run_id, "log.txt")
if p is None:
return ""
return read_text(p)
Copilot is powered by AI and may make mistakes. Always verify output.
def heartbeat(site: str, mode: str, run_id: str):
for name in ["heartbeat.json", "heartbeat_final.json"]:
p = BASE / site / mode / run_id / name
if p.exists():

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
This path depends on a
user-provided value
.

Copilot Autofix

AI 5 days ago

In general, to fix uncontrolled path usage, you should normalize any path that includes user-controlled components and then verify that the resulting path is still within an expected safe root directory. For Python paths built with pathlib.Path, this means constructing the path under BASE, resolving or normalizing it (e.g., via .resolve()), and then checking that the resolved path is a descendant of BASE. If the check fails, the endpoint should return an error instead of touching the filesystem.

For this codebase, the best way to fix the problem without changing existing functionality is:

  1. Add a small helper function, e.g. safe_join, that:

    • Takes a base Path and one or more path components (*parts).
    • Constructs the full path with base.joinpath(*parts).
    • Resolves it to an absolute path (.resolve()).
    • Verifies that this resolved path is within base using is_relative_to (Python 3.9+) or a fallback using relative_to and catching ValueError.
    • Raises HTTPException(status_code=404) or similar if the check fails.
  2. Use this helper in all three endpoints (heartbeat, console, and log) instead of directly doing BASE / site / mode / run_id / .... This ensures site, mode, and run_id cannot cause traversal outside BASE, addressing all three alert variants in one place. Existing behavior for valid paths (under BASE) remains the same.

  3. Since we want to return HTTP-friendly errors, we should import HTTPException from fastapi. No other behavioral changes are needed; we keep the same return types (PlainTextResponse) and keep using read_text.

Concretely, inside server_tools/app.py:

  • Add from fastapi import HTTPException to the imports.
  • Define safe_join just below BASE/app or near read_text.
  • Update:
    • In heartbeat, change p = BASE / site / mode / run_id / name to p = safe_join(BASE, site, mode, run_id, name).
    • In console, change p1 = BASE / site / mode / run_id / "nohup.out" and p2 = ... similarly to use safe_join.
    • In log, change read_text(BASE / site / mode / run_id / "log.txt") to use safe_join for the path argument.
Suggested changeset 1
server_tools/app.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/server_tools/app.py b/server_tools/app.py
--- a/server_tools/app.py
+++ b/server_tools/app.py
@@ -1,12 +1,31 @@
 from pathlib import Path
 import json
 from datetime import datetime, timezone
-from fastapi import FastAPI
+from fastapi import FastAPI, HTTPException
 from fastapi.responses import HTMLResponse, PlainTextResponse
 
 BASE = Path("/srv/mediswarm/live")
 app = FastAPI()
 
+def safe_join(base: Path, *parts: str) -> Path:
+    """
+    Safely join user-controlled path components to a base directory.
+    Ensures the resulting path is within the base directory.
+    """
+    candidate = base.joinpath(*parts).resolve()
+    try:
+        # Python 3.9+: use is_relative_to if available
+        is_relative = getattr(candidate, "is_relative_to", None)
+        if callable(is_relative):
+            if not candidate.is_relative_to(base):
+                raise HTTPException(status_code=404)
+        else:
+            candidate.relative_to(base)
+    except ValueError:
+        # candidate is not under base
+        raise HTTPException(status_code=404)
+    return candidate
+
 def read_text(p: Path, limit: int = 50000) -> str:
     if not p.exists():
         return ""
@@ -91,15 +105,15 @@
 @app.get("/heartbeat/{site}/{mode}/{run_id}", response_class=PlainTextResponse)
 def heartbeat(site: str, mode: str, run_id: str):
     for name in ["heartbeat.json", "heartbeat_final.json"]:
-        p = BASE / site / mode / run_id / name
+        p = safe_join(BASE, site, mode, run_id, name)
         if p.exists():
             return read_text(p)
     return ""
 
 @app.get("/console/{site}/{mode}/{run_id}", response_class=PlainTextResponse)
 def console(site: str, mode: str, run_id: str):
-    p1 = BASE / site / mode / run_id / "nohup.out"
-    p2 = BASE / site / mode / run_id / "local_training_console_output.txt"
+    p1 = safe_join(BASE, site, mode, run_id, "nohup.out")
+    p2 = safe_join(BASE, site, mode, run_id, "local_training_console_output.txt")
     if p1.exists():
         return read_text(p1)
     if p2.exists():
@@ -108,4 +116,4 @@
 
 @app.get("/log/{site}/{mode}/{run_id}", response_class=PlainTextResponse)
 def log(site: str, mode: str, run_id: str):
-    return read_text(BASE / site / mode / run_id / "log.txt")
\ No newline at end of file
+    return read_text(safe_join(BASE, site, mode, run_id, "log.txt"))
\ No newline at end of file
EOF
@@ -1,12 +1,31 @@
from pathlib import Path
import json
from datetime import datetime, timezone
from fastapi import FastAPI
from fastapi import FastAPI, HTTPException
from fastapi.responses import HTMLResponse, PlainTextResponse

BASE = Path("/srv/mediswarm/live")
app = FastAPI()

def safe_join(base: Path, *parts: str) -> Path:
"""
Safely join user-controlled path components to a base directory.
Ensures the resulting path is within the base directory.
"""
candidate = base.joinpath(*parts).resolve()
try:
# Python 3.9+: use is_relative_to if available
is_relative = getattr(candidate, "is_relative_to", None)
if callable(is_relative):
if not candidate.is_relative_to(base):
raise HTTPException(status_code=404)
else:
candidate.relative_to(base)
except ValueError:
# candidate is not under base
raise HTTPException(status_code=404)
return candidate

def read_text(p: Path, limit: int = 50000) -> str:
if not p.exists():
return ""
@@ -91,15 +105,15 @@
@app.get("/heartbeat/{site}/{mode}/{run_id}", response_class=PlainTextResponse)
def heartbeat(site: str, mode: str, run_id: str):
for name in ["heartbeat.json", "heartbeat_final.json"]:
p = BASE / site / mode / run_id / name
p = safe_join(BASE, site, mode, run_id, name)
if p.exists():
return read_text(p)
return ""

@app.get("/console/{site}/{mode}/{run_id}", response_class=PlainTextResponse)
def console(site: str, mode: str, run_id: str):
p1 = BASE / site / mode / run_id / "nohup.out"
p2 = BASE / site / mode / run_id / "local_training_console_output.txt"
p1 = safe_join(BASE, site, mode, run_id, "nohup.out")
p2 = safe_join(BASE, site, mode, run_id, "local_training_console_output.txt")
if p1.exists():
return read_text(p1)
if p2.exists():
@@ -108,4 +116,4 @@

@app.get("/log/{site}/{mode}/{run_id}", response_class=PlainTextResponse)
def log(site: str, mode: str, run_id: str):
return read_text(BASE / site / mode / run_id / "log.txt")
return read_text(safe_join(BASE, site, mode, run_id, "log.txt"))
Copilot is powered by AI and may make mistakes. Always verify output.
def console(site: str, mode: str, run_id: str):
p1 = BASE / site / mode / run_id / "nohup.out"
p2 = BASE / site / mode / run_id / "local_training_console_output.txt"
if p1.exists():

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
This path depends on a
user-provided value
.

Copilot Autofix

AI 5 days ago

In general, to fix uncontrolled path usage you should normalize the constructed path and enforce that it stays within an allowed root directory (here BASE), or validate the individual path components against an allow‑list or strict pattern. Normalization is important to neutralize .. segments and similar tricks before comparing paths.

For this codebase, the least invasive fix that preserves existing functionality is:

  1. Add a small helper function that:
    • Takes site, mode, and run_id.
    • Builds a path rooted at BASE.
    • Resolves it to an absolute path (e.g. Path.resolve(strict=False)).
    • Verifies that the resolved path is still under BASE (e.g. using Path.relative_to or a prefix check on resolved absolute paths).
    • Raises an HTTP 404/400 (or similar) if validation fails.
  2. Use this helper in all three endpoints (/heartbeat, /console, /log) so the validation is centralized and fixes all CodeQL variants at once.

Concretely, within server_tools/app.py:

  • Import HTTPException from fastapi (you’re already importing from fastapi, so this is consistent).
  • Define a function like get_run_dir(site: str, mode: str, run_id: str) -> Path just after the BASE/app definitions or near rows(). This function should:
    • Compute candidate = BASE / site / mode / run_id.
    • Compute resolved_base = BASE.resolve() once (can be global) and resolved_candidate = candidate.resolve().
    • Check resolved_candidate.is_dir() (optional but safer).
    • Ensure resolved_candidate is under resolved_base, for example by:
      try:
          resolved_candidate.relative_to(resolved_base)
      except ValueError:
          raise HTTPException(status_code=404, detail="Not found")
    • Return resolved_candidate.
  • Update:
    • heartbeat() to call run_dir = get_run_dir(site, mode, run_id) and then look for heartbeat files under run_dir.
    • console() to call the same helper and then build p1/p2 from run_dir.
    • log() to call the same helper and then read_text(run_dir / "log.txt").

This keeps behavior the same for valid inputs, but rejects any attempt to escape BASE via .. or absolute paths, addressing all three tainted parameters in one place.


Suggested changeset 1
server_tools/app.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/server_tools/app.py b/server_tools/app.py
--- a/server_tools/app.py
+++ b/server_tools/app.py
@@ -1,12 +1,25 @@
 from pathlib import Path
 import json
 from datetime import datetime, timezone
-from fastapi import FastAPI
+from fastapi import FastAPI, HTTPException
 from fastapi.responses import HTMLResponse, PlainTextResponse
 
 BASE = Path("/srv/mediswarm/live")
 app = FastAPI()
 
+def get_run_dir(site: str, mode: str, run_id: str) -> Path:
+    """
+    Construct and validate a run directory path rooted at BASE.
+    Raises HTTPException if the resulting path is outside BASE.
+    """
+    base_resolved = BASE.resolve()
+    candidate = (BASE / site / mode / run_id).resolve()
+    try:
+        candidate.relative_to(base_resolved)
+    except ValueError:
+        raise HTTPException(status_code=404, detail="Not found")
+    return candidate
+
 def read_text(p: Path, limit: int = 50000) -> str:
     if not p.exists():
         return ""
@@ -90,16 +98,18 @@
 
 @app.get("/heartbeat/{site}/{mode}/{run_id}", response_class=PlainTextResponse)
 def heartbeat(site: str, mode: str, run_id: str):
+    run_dir = get_run_dir(site, mode, run_id)
     for name in ["heartbeat.json", "heartbeat_final.json"]:
-        p = BASE / site / mode / run_id / name
+        p = run_dir / name
         if p.exists():
             return read_text(p)
     return ""
 
 @app.get("/console/{site}/{mode}/{run_id}", response_class=PlainTextResponse)
 def console(site: str, mode: str, run_id: str):
-    p1 = BASE / site / mode / run_id / "nohup.out"
-    p2 = BASE / site / mode / run_id / "local_training_console_output.txt"
+    run_dir = get_run_dir(site, mode, run_id)
+    p1 = run_dir / "nohup.out"
+    p2 = run_dir / "local_training_console_output.txt"
     if p1.exists():
         return read_text(p1)
     if p2.exists():
@@ -108,4 +111,5 @@
 
 @app.get("/log/{site}/{mode}/{run_id}", response_class=PlainTextResponse)
 def log(site: str, mode: str, run_id: str):
-    return read_text(BASE / site / mode / run_id / "log.txt")
\ No newline at end of file
+    run_dir = get_run_dir(site, mode, run_id)
+    return read_text(run_dir / "log.txt")
\ No newline at end of file
EOF
@@ -1,12 +1,25 @@
from pathlib import Path
import json
from datetime import datetime, timezone
from fastapi import FastAPI
from fastapi import FastAPI, HTTPException
from fastapi.responses import HTMLResponse, PlainTextResponse

BASE = Path("/srv/mediswarm/live")
app = FastAPI()

def get_run_dir(site: str, mode: str, run_id: str) -> Path:
"""
Construct and validate a run directory path rooted at BASE.
Raises HTTPException if the resulting path is outside BASE.
"""
base_resolved = BASE.resolve()
candidate = (BASE / site / mode / run_id).resolve()
try:
candidate.relative_to(base_resolved)
except ValueError:
raise HTTPException(status_code=404, detail="Not found")
return candidate

def read_text(p: Path, limit: int = 50000) -> str:
if not p.exists():
return ""
@@ -90,16 +98,18 @@

@app.get("/heartbeat/{site}/{mode}/{run_id}", response_class=PlainTextResponse)
def heartbeat(site: str, mode: str, run_id: str):
run_dir = get_run_dir(site, mode, run_id)
for name in ["heartbeat.json", "heartbeat_final.json"]:
p = BASE / site / mode / run_id / name
p = run_dir / name
if p.exists():
return read_text(p)
return ""

@app.get("/console/{site}/{mode}/{run_id}", response_class=PlainTextResponse)
def console(site: str, mode: str, run_id: str):
p1 = BASE / site / mode / run_id / "nohup.out"
p2 = BASE / site / mode / run_id / "local_training_console_output.txt"
run_dir = get_run_dir(site, mode, run_id)
p1 = run_dir / "nohup.out"
p2 = run_dir / "local_training_console_output.txt"
if p1.exists():
return read_text(p1)
if p2.exists():
@@ -108,4 +111,5 @@

@app.get("/log/{site}/{mode}/{run_id}", response_class=PlainTextResponse)
def log(site: str, mode: str, run_id: str):
return read_text(BASE / site / mode / run_id / "log.txt")
run_dir = get_run_dir(site, mode, run_id)
return read_text(run_dir / "log.txt")
Copilot is powered by AI and may make mistakes. Always verify output.
p2 = BASE / site / mode / run_id / "local_training_console_output.txt"
if p1.exists():
return read_text(p1)
if p2.exists():

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.
This path depends on a
user-provided value
.
This path depends on a
user-provided value
.

Copilot Autofix

AI 5 days ago

In general, to fix uncontrolled path usage you should ensure that any path derived from user input is (a) normalized to remove .. and similar constructs and (b) verified to remain inside an expected base directory, rejecting requests that resolve outside that base. This can be implemented by constructing the full path with the base directory, normalizing it, and then checking that it is still within the base directory before using it.

For this specific code, the best fix with minimal behavior change is:

  1. Add a helper function, e.g. safe_join, that:
    • Takes base: Path and a variable number of string path components.
    • Constructs a path with base.joinpath(*parts).
    • Normalizes it with .resolve() (or os.path.realpath) to eliminate .., symlinks, and redundant separators.
    • Verifies that the resolved path is still within base using path.is_relative_to(base) (Python 3.9+) or an equivalent prefix check on .resolve().
    • Raises an HTTP 404/400 (using FastAPI’s HTTPException) if the path escapes the base directory.
  2. Use this helper when building p1, p2, and the log path in the console and log endpoints, and similarly for the heartbeat endpoint’s p path.
  3. Import HTTPException and status from fastapi to provide appropriate error responses.

Concretely, within server_tools/app.py:

  • Add from fastapi import FastAPI, HTTPException, status (replacing the existing FastAPI-only import) or add a separate import line for HTTPException and status.
  • Define a safe_join helper near the top of the file (after BASE and before the endpoints).
  • Replace direct BASE / site / mode / run_id / ... constructions within heartbeat, console, and log with calls to safe_join(BASE, site, mode, run_id, filename).

This preserves existing functionality (valid paths under /srv/mediswarm/live still work) while blocking directory traversal and similar abuses across all alert variants in one place.


Suggested changeset 1
server_tools/app.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/server_tools/app.py b/server_tools/app.py
--- a/server_tools/app.py
+++ b/server_tools/app.py
@@ -1,12 +1,44 @@
 from pathlib import Path
 import json
 from datetime import datetime, timezone
-from fastapi import FastAPI
+from fastapi import FastAPI, HTTPException, status
 from fastapi.responses import HTMLResponse, PlainTextResponse
 
 BASE = Path("/srv/mediswarm/live")
 app = FastAPI()
 
+def safe_join(base: Path, *parts: str) -> Path:
+    """
+    Safely join untrusted path segments to a base directory.
+
+    Ensures that the resulting path is contained within the base directory.
+    """
+    # Construct the candidate path relative to the base
+    candidate = base.joinpath(*parts)
+    try:
+        resolved = candidate.resolve()
+    except FileNotFoundError:
+        # Even if the path does not exist, we still want to validate
+        resolved = candidate.parent.resolve() / candidate.name
+
+    try:
+        # Python 3.9+: use is_relative_to for clarity
+        if not resolved.is_relative_to(base.resolve()):
+            raise HTTPException(
+                status_code=status.HTTP_404_NOT_FOUND,
+                detail="Requested resource not found.",
+            )
+    except AttributeError:
+        # Fallback for older Python versions
+        base_resolved = base.resolve()
+        if base_resolved not in resolved.parents and resolved != base_resolved:
+            raise HTTPException(
+                status_code=status.HTTP_404_NOT_FOUND,
+                detail="Requested resource not found.",
+            )
+
+    return resolved
+
 def read_text(p: Path, limit: int = 50000) -> str:
     if not p.exists():
         return ""
@@ -91,15 +118,15 @@
 @app.get("/heartbeat/{site}/{mode}/{run_id}", response_class=PlainTextResponse)
 def heartbeat(site: str, mode: str, run_id: str):
     for name in ["heartbeat.json", "heartbeat_final.json"]:
-        p = BASE / site / mode / run_id / name
+        p = safe_join(BASE, site, mode, run_id, name)
         if p.exists():
             return read_text(p)
     return ""
 
 @app.get("/console/{site}/{mode}/{run_id}", response_class=PlainTextResponse)
 def console(site: str, mode: str, run_id: str):
-    p1 = BASE / site / mode / run_id / "nohup.out"
-    p2 = BASE / site / mode / run_id / "local_training_console_output.txt"
+    p1 = safe_join(BASE, site, mode, run_id, "nohup.out")
+    p2 = safe_join(BASE, site, mode, run_id, "local_training_console_output.txt")
     if p1.exists():
         return read_text(p1)
     if p2.exists():
@@ -108,4 +129,4 @@
 
 @app.get("/log/{site}/{mode}/{run_id}", response_class=PlainTextResponse)
 def log(site: str, mode: str, run_id: str):
-    return read_text(BASE / site / mode / run_id / "log.txt")
\ No newline at end of file
+    return read_text(safe_join(BASE, site, mode, run_id, "log.txt"))
\ No newline at end of file
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants