Skip to content

VHP4Safety/pbpk-mcp

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

84 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

PBPK MCP Server

CI DOI License Release Python

Part of ToxMCP Suite → https://github.com/ToxMCP/toxmcp

Public MCP endpoint for physiologically based pharmacokinetic (PBPK) simulation, qualification, and dossier export.
Expose PK-Sim / MoBi .pkml, MCP-ready rxode2 .R, deterministic and population runs, and OECD-oriented validation/reporting to any MCP-aware agent (Codex CLI, Gemini CLI, Claude Code, etc.).

Architecture

flowchart LR
    subgraph Clients["Clients and Agents"]
        Codex["Codex CLI / Desktop"]
        Gemini["Gemini CLI"]
        Claude["Claude Code"]
        Scripts["Scripts / notebooks"]
    end

    subgraph API["FastAPI MCP Service"]
        Router["HTTP entrypoints\n/health and /mcp"]
        Registry["Tool registry\npbpk-mcp.v1 contracts"]
        Tools["Tool handlers\ndiscovery, validation,\nexecution, reporting"]
    end

    subgraph Contracts["Discovery and Qualification Layer"]
        Catalog["Filesystem model catalog\nADAPTER_MODEL_PATHS"]
        Manifest["Static manifest validation\nvalidate_model_manifest"]
        Qualification["Qualification metadata\ncapabilities, profile,\nvalidation, qualificationState"]
        Verification["Executable verification\nrun_verification_checks"]
        Reports["OECD dossier export\nexport_oecd_report"]
    end

    subgraph Backends["Execution Backends"]
        OSPSuite["OSPSuite backend\nPK-Sim / MoBi (.pkml)"]
        Rxode2["rxode2 backend\nMCP-ready R (.R)"]
        Convert["Conversion-only inputs\n.pksim5 and .mmd -> .pkml or .R"]
    end

    subgraph Runtime["Runtime and QA Path"]
        Jobs["Celery + Redis jobs\nasync runs and result handles"]
        Overlay["Source-overlay runtime\nbind mounts + worker image"]
        Tests["Unit + live smoke tests\nrelease readiness checks"]
    end

    Clients --> Router
    Router --> Registry
    Registry --> Tools
    Tools --> Catalog
    Tools --> Manifest
    Tools --> Qualification
    Qualification --> Verification
    Qualification --> Reports
    Tools --> OSPSuite
    Tools --> Rxode2
    Convert --> OSPSuite
    Convert --> Rxode2
    Tools --> Jobs
    Overlay --> Tools
    Qualification --> Tests
    Overlay --> Tests
Loading

The current implementation follows a layered model:

  • FastAPI + JSON-RPC expose /mcp and /health, and keep transport concerns separate from PBPK model logic.
  • Tool handlers are the agent-facing API. They validate inputs, call adapters, and emit structured responses with normalized pbpk-mcp.v1 payloads.
  • Discovery and manifest validation make model files visible before load, and keep curation checks separate from runtime execution.
  • Execution backends route .pkml files to ospsuite and MCP-ready .R modules to rxode2, while treating .pksim5 and .mmd as conversion-only inputs.
  • Qualification metadata keeps capabilities, profile, validation, and qualificationState separate so runnable does not get conflated with scientifically qualified.
  • Executable verification adds lightweight but structured runtime checks on top of qualification metadata, including parameter-unit consistency, structural flow/volume consistency, deterministic smoke, result-integrity, and repeat-run reproducibility checks without conflating them with formal qualification evidence.
  • Packaged-default runtime + overlay smoke tests keep the documented tool surface synchronized with the live API while still leaving an explicit maintainer override path for workspace iteration.
  • Release metadata checks now verify that package version markers, compose/env SERVICE_VERSION, README release markers, the top changelog entry, and the matching docs/releases/ note stay aligned.
  • Release artifact evidence now retains a machine-readable report with sdist/wheel hashes and the linked contract-manifest identity during tag builds.

See docs/architecture/dual_backend_pbpk_mcp.md for the fuller architecture narrative, docs/architecture/capability_matrix.md for the published support matrix, docs/architecture/mcp_payload_conventions.md for the response contract, docs/deployment/runtime_patch_flow.md for the operator path behind the current local deployment model, and docs/pbpk_model_onboarding_checklist.md for the recommended trust pipeline when onboarding another chemical model. See docs/architecture/exposure_led_ngra_role.md for the explicit boundary statement on what PBPK MCP does and does not own inside exposure-led NGRA workflows. See benchmarks/regulatory_goldset/regulatory_goldset_summary.md for the current gold-set documentation benchmark summary derived from the fetched public-code PBPK corpus. Internal examples such as the bundled synthetic reference model remain bounded MCP rehearsal models, not benchmark exemplars.

What's new in v0.4.3

  • Added benchmark-derived dossier guidance, onboarding guidance, and explicit trust-surface reporting so MCP can show what is still missing before a PBPK model can credibly approach regulatory-grade use.
  • Tightened the public runtime posture by keeping the analyst UI retired, preserving viewer-only anonymous development mode, and keeping critical actions behind confirmation and sign-off surfaces.
  • Hardened release and audit evidence by shipping a reproducible gold-set benchmark dossier, a verifiable synthetic reference-model audit pack, and stricter contract/release-bundle integrity checks on the live stack.

Why this project exists

PBPK workflows often juggle PK-Sim or MoBi transfer files, R-authored mechanistic models, runtime-specific scripts, and incomplete qualification notes. That makes them difficult for coding agents to automate safely and difficult for reviewers to inspect consistently.

The PBPK MCP server wraps those workflows in a single, programmable interface:

  • Unified MCP surface – discovery, manifest checks, load, validation, execution, results, and dossier export share one tool catalog.
  • Dual-backend execution.pkml models run on ospsuite; MCP-ready .R models run on rxode2.
  • Qualification-aware workflows – runtime capability, scientific profile, preflight validation, and derived qualificationState stay separate.
  • NGRA-ready PBPK objects – validation and dossier export expose PBPK-side typed objects such as assessmentContext, pbpkQualificationSummary, uncertaintySummary, uncertaintyHandoff, internalExposureEstimate, a typed external uncertainty-register reference, and a typed external PoD reference handoff, now with explicit boundary/support metadata plus additive semanticCoverage for PBPK-side uncertainty semantics, without collapsing PBPK MCP into a full decision engine.
  • Exposure-led boundary clarityassessmentContext.workflowRole, assessmentContext.populationSupport, pbpkQualificationSummary.evidenceBasis, pbpkQualificationSummary.workflowClaimBoundaries, and pbpkQualificationSummary.reviewStatus now make IVIVE/exposure dependencies, supported population contexts, no-direct-in-vivo status, reviewer-attention requirements, and claim boundaries explicit.
  • Static curation feedback for NGRA gapsvalidate_model_manifest now exposes manifest.ngraCoverage plus a compact curationSummary, so missing workflow-role, population-support, evidence-basis, and claim-boundary declarations are visible before load in both machine-readable and reviewer-facing form; that summary now also carries an anti-misread block, explicit summary-transport risk for thin/forwarded views, a normalized cautionSummary with advisory-versus-blocking handling, a machine-readable exportBlockPolicy, and rendering guardrails for analyst-facing clients.
  • Regulatory benchmark bar – the workspace now also ships a gold-set benchmark dossier under benchmarks/regulatory_goldset/ plus an advisory curationSummary.regulatoryBenchmarkReadiness block, so MCP reviewers can compare a model dossier to public regulatory-grade PBPK packaging without promoting research models to regulatory-ready status. That advisory block now also carries per-dimension evaluation paths, benchmark examples, and concrete next-artifact guidance for closing documentation gaps.
  • Reviewer-friendly discovery summariesdiscover_models and /mcp/resources/models now surface manifestStatus, qualificationState, and a compact curationSummary so curated illustrative examples can be distinguished from exploratory or partially declared models at discovery time, with explicit anti-misread guidance and no official “bare label only” rendering path.
  • Human-review summary – OECD report export now includes a compact humanReviewSummary block that pulls those boundary fields into one reviewer-facing surface, including reviewer-status, unresolved-dissent context, normalized cautionSummary, summary-transport risk, exportBlockPolicy, and rendering guardrails, without creating a second hidden score or decision engine.
  • Anti-misread report section – OECD report export now also includes a mandatory misreadRiskSummary block so common over-interpretations are stated explicitly next to the primary report summary instead of being left implicit.
  • Audit-backed operator sign-off – trust-bearing outputs now surface an additive operatorReviewSignoff summary plus an explicit operatorReviewGovernance block derived from the real workflow contract, and the REST surface now exposes viewer-readable /review_signoff/history so recorded acknowledgement, bounded-use sign-off, rejection, and revocation remain auditable without being mistaken for a second qualification score, override path, or decision authority.
  • No bundled analyst UI in the current release – trust-bearing review remains on the MCP and REST surfaces for now; /console and /console/api/* are intentionally absent until a stronger reviewer-facing client exists.
  • Thin-client trust-surface contract – trust-bearing MCP tool results now also expose a top-level trustSurfaceContract so thin clients can find the nested summary surfaces, required adjacent caveats, and primary block reasons they must carry together instead of rediscovering those paths tool by tool.
  • Traceable predictive evidence – performance bundles can now carry traceability supplements, and the report path now warns when bundled benchmark rows do not actually line up with the declared datasets, target outputs, or acceptance criteria.
  • Discovery before execution – models are discoverable from disk before they are loaded into a live session.
  • Release-tested local deployment – the packaged local runtime and the explicit source-overlay maintainer profile are both exercised with unit tests, live-stack tests, and a readiness check.
  • Model-specific executable qualification checks – MCP-ready models can add runtime verification hooks for checks such as flow/volume consistency, mass balance, or numerical stability without overstating regulatory qualification.

rxode2 is a native PBPK execution engine in this project. It is not limited to Berkeley Madonna conversion workflows.


Feature snapshot

Capability Description
🧬 Dual-backend PBPK execution Route .pkml models to ospsuite and MCP-ready .R models to rxode2 through one MCP surface.
🗂️ Model discovery and curation Discover supported model files from ADAPTER_MODEL_PATHS, inspect unloaded models, and run static manifest checks before load.
🛡️ OECD-oriented qualification Keep capabilities, profile, validation, and qualificationState explicit; expose applicability, provenance, uncertainty, implementation verification, software-platform qualification, and qualification gaps.
🧱 NGRA-ready PBPK objects Emit typed PBPK-side objects such as assessmentContext, pbpkQualificationSummary, uncertaintySummary, uncertaintyHandoff, internalExposureEstimate, a typed uncertaintyRegisterReference, a typed pointOfDepartureReference, and a thin BER-ready reference bundle in dossier export, with explicit boundary/support flags plus additive uncertainty semanticCoverage for downstream NGRA orchestration and without embedding BER decision logic in PBPK MCP.
📐 Published object schemas Publish machine-readable JSON Schemas and example payloads for the PBPK-side NGRA handoff object family under schemas/ so downstream tooling can validate the public object layer directly.
🔌 External PBPK normalization Normalize externally generated PBPK outputs, qualification metadata, and optional PoD references through ingest_external_pbpk_bundle without pretending PBPK MCP executed the upstream engine.
Executable verification Run run_verification_checks to capture preflight validation, parameter coverage, parameter-unit consistency, structural flow/volume consistency, deterministic smoke, deterministic result integrity, repeat-run reproducibility, optional population smoke, and verification-evidence summaries in one payload.
📈 Deterministic and population jobs Submit asynchronous deterministic and population simulations, then retrieve result handles, stored results, and PK summaries.
🧾 Dossier export Export a structured OECD-style report with model metadata, validation context, checklist state, parameter provenance, performance evidence, uncertainty evidence, verification evidence, and software-platform qualification evidence when declared.
⚙️ Source-overlay deployment hygiene Recreate the local stack through the documented compose entrypoints and validate the live contract with readiness and smoke gates.
🤖 Agent friendly Verified through MCP HTTP surfaces such as /mcp/list_tools, /mcp/call_tool, and /mcp/resources/models, with live-stack regression checks.

Table of contents

  1. Architecture
  2. Published schemas
  3. Capability matrix
  4. Quick start
  5. Configuration
  6. Tool catalog
  7. Running the server
  8. Integrating with coding agents
  9. Output artifacts
  10. Security checklist
  11. Current limitations
  12. Development notes
  13. Contributing
  14. Security policy
  15. Code of conduct
  16. Citation
  17. Roadmap
  18. License

Quickstart TL;DR

# 1) build the worker image
git clone https://github.com/ToxMCP/pbpk-mcp.git
cd pbpk-mcp

# 2) start the local stack
./scripts/build_rxode2_worker_image.sh
./scripts/deploy_rxode2_stack.sh

# 3) verify
curl -s http://localhost:8000/health | jq .
curl -s http://localhost:8000/mcp/list_tools | jq '.tools[].name'

Published schemas

The PBPK-side NGRA handoff objects are now published as machine-readable JSON Schemas under schemas/, with matching examples under schemas/examples/.

Published object family:

  • schemas/assessmentContext.v1.json
  • schemas/pbpkQualificationSummary.v1.json
  • schemas/uncertaintySummary.v1.json
  • schemas/uncertaintyHandoff.v1.json
  • schemas/uncertaintyRegisterReference.v1.json
  • schemas/internalExposureEstimate.v1.json
  • schemas/pointOfDepartureReference.v1.json
  • schemas/berInputBundle.v1.json

Design intent:

  • require the stable core fields only
  • allow additive convenience fields
  • keep BER calculation and final decision policy outside PBPK MCP
  • keep exposure assessment, IVIVE policy, reverse dosimetry, and direct regulatory dose derivation outside PBPK MCP
  • make the PBPK-side handoff layer consumable by downstream validators and orchestrators without scraping examples out of tests

See schemas/README.md for the short schema guide and tests/test_ngra_object_schemas.py for the validation gate that keeps the published schemas aligned with live payload generation. The same schema family is also exposed through the live MCP resource surface at /mcp/resources/schemas. In the current local packaged runtime, the live schema/capability/contract-manifest resources treat the packaged mcp_bridge.contract content as authoritative rather than depending on copied JSON under /app/var/contract. scripts/check_installed_package_contract.py is the maintainer gate that proves the generated package fallback still matches the published JSON artifacts after a non-editable local install. When you explicitly opt into the workspace source-overlay profile, scripts/runtime_src_overlay.pth promotes /app/src ahead of the installed package.

PBPK MCP now also publishes a machine-readable contract manifest in:

  • docs/architecture/contract_manifest.json
  • /mcp/resources/contract-manifest

That manifest inventories the published PBPK-side schema family, the capability matrix, the legacy artifacts intentionally excluded from the PBPK-side object family, and the stable resource endpoints that expose the contract. The live schema, capability-matrix, and contract-manifest resources now also expose SHA-256 values so downstream clients can verify that the running API matches the published artifact inventory. The shared schema/capability/contract-manifest route logic now lives in packaged src/mcp_bridge/routes/resources_base.py, and packaged src/mcp_bridge/routes/resources.py now owns the full generic /mcp/resources surface including /mcp/resources/models. The same is now true for tools as well: packaged src/mcp_bridge/tools/registry_base.py and src/mcp_bridge/tools/registry.py now own the generic discovery/static-manifest/result/import descriptors alongside the rest of the documented PBPK workflow surface. The next 0.4.x debt-reduction slices are now live too: generic discovery, manifest, load/session-status, preflight validation, executable verification, dossier export, deterministic-result retrieval, external-import normalization, population workflow tools, the shared model_catalog / model_manifest helpers, the top-level mcp namespace, and the generic adapter contract/runtime now live in packaged src/. The default local runtime no longer needs a patch-copy step or a source overlay for code: the worker image bakes scripts/ospsuite_bridge.R and the bundled reference .R model into their runtime locations, docker-compose.celery.yml runs the packaged baseline, and docker-compose.overlay.yml is now the explicit maintainer profile when workspace src/ and scripts/ should override the image.

Capability matrix

PBPK MCP now publishes a dedicated capability matrix for adopters in:

  • docs/architecture/capability_matrix.md
  • docs/architecture/capability_matrix.json

The machine-readable matrix is also exposed from the running server at /mcp/resources/capability-matrix.

This matrix is the crisp public answer to:

  • what is discoverable
  • what is statically validatable
  • what is loadable
  • what is deterministic-run capable
  • what is population-run capable
  • what is dossier-capable

Current headline boundaries:

  • .pkml is discoverable, loadable, validatable, verification-capable, deterministic-run capable, and dossier-capable through ospsuite
  • contract-complete MCP-ready .R is discoverable, loadable, validatable, verification-capable, deterministic-run capable, and dossier-capable through rxode2
  • population execution is currently rxode2-only and remains conditional on model capability
  • incomplete .R files may still be discoverable before they are runnable
  • .pksim5 and .mmd are conversion-only and do not appear as runtime-supported catalog entries

The machine-readable JSON file is intentionally small and stable so downstream tooling can consume it directly without scraping prose from the README.

scripts/generate_contract_artifacts.py --check is the maintainer gate that keeps the contract manifest and generated packaged fallback aligned with the published JSON artifacts.

Quick start

git clone https://github.com/ToxMCP/pbpk-mcp.git
cd pbpk-mcp

./scripts/build_rxode2_worker_image.sh
./scripts/deploy_rxode2_stack.sh

Heads-up: ./scripts/deploy_rxode2_stack.sh now runs the packaged local runtime by default and waits for stable /health and /mcp/list_tools responses before returning. When you need workspace src/ and scripts/ to override the image for active development, use ./scripts/deploy_source_overlay_stack.sh.

Once the server is running:

  • HTTP MCP endpoint: http://localhost:8000/mcp
  • Health check: http://localhost:8000/health
  • Model discovery resource: http://localhost:8000/mcp/resources/models
  • Architecture docs: docs/architecture/dual_backend_pbpk_mcp.md
  • Runtime operator docs: docs/deployment/runtime_patch_flow.md
  • New-model trust pipeline: docs/pbpk_model_onboarding_checklist.md

Verification (smoke test)

Once the server is running:

# health
curl -s http://localhost:8000/health | jq .

# list MCP tools
curl -s http://localhost:8000/mcp/list_tools | jq .

# discover reference_compound
curl -s -X POST http://localhost:8000/mcp/call_tool \
  -H "Content-Type: application/json" \
  -d '{"tool":"discover_models","arguments":{"search":"reference_compound","limit":10}}' | jq .

# static manifest validation
curl -s -X POST http://localhost:8000/mcp/call_tool \
  -H "Content-Type: application/json" \
  -d '{"tool":"validate_model_manifest","arguments":{"filePath":"/app/var/models/rxode2/reference_compound/reference_compound_population_rxode2_model.R"}}' | jq .

For the full live-stack gate, run:

python3 scripts/release_readiness_check.py

That live readiness gate now also reruns the curated manifest publication check for the bundled synthetic reference model and pregnancy examples, and it fails if those examples regress to implicit NGRA workflow-role, population-support, evidence-basis, or claim-boundary declarations.

For the standalone static gate on that same curated publication set, run:

python3 scripts/validate_model_manifests.py --strict --require-explicit-ngra --curated-publication-set

To smoke-test the actual discovered model inventory in the current workspace, run:

python3 scripts/workspace_model_smoke.py --auth-dev-secret pbpk-local-dev-secret
python3 scripts/workspace_model_smoke.py --include-population --auth-dev-secret pbpk-local-dev-secret

The script discovers runtime-supported models through /mcp/resources/models, runs static manifest validation, loads each model, submits a deterministic simulation, retrieves stored results, and optionally runs a small population smoke for rxode2 models that declare population support. It writes a JSON report to var/workspace_model_smoke_report.json. On hardened local stacks, the load/run operations require operator or admin access, so pass --auth-dev-secret pbpk-local-dev-secret for the default development profile or provide a real bearer token.

For the GitHub-hosted verification path, the repository also carries:

  • a lightweight CI workflow for patch/runtime contract checks on pushes and pull requests
  • a heavier Model Smoke workflow that builds the Docker-backed stack, runs the live readiness gate, executes workspace_model_smoke.py --include-population --auth-dev-secret pbpk-local-dev-secret, and uploads the resulting JSON reports as workflow artifacts

Configuration

The default local packaged stack is defined in docker-compose.celery.yml. An explicit source-overlay development profile is available in docker-compose.overlay.yml, and a stricter operator overlay is available in docker-compose.hardened.yml for deployments that should disable anonymous access and require explicit auth configuration. Key environment variables currently shaping the runtime are:

Variable Default Description
ADAPTER_MODEL_PATHS /app/var Filesystem roots scanned by discover_models and /mcp/resources/models.
ADAPTER_BACKEND subprocess Uses the R / OSPSuite subprocess bridge instead of an in-memory mock path.
JOB_BACKEND celery Enables asynchronous job submission through Redis-backed Celery workers.
CELERY_BROKER_URL redis://redis:6379/0 Queue broker for asynchronous jobs.
CELERY_RESULT_BACKEND redis://redis:6379/1 Result storage for job handles and async result chaining.
ADAPTER_R_PATH / ADAPTER_R_HOME container defaults Point the bridge to the R runtime used by rxode2 and OSPSuite tooling.
ADAPTER_TIMEOUT_MS 30000 Default adapter timeout in milliseconds.
AUDIT_ENABLED true in local compose Enables audit-backed sign-off, verification traceability, and readable history surfaces.
R_MAX_VSIZE 2G Caps R virtual memory use inside the current local worker setup.
DOTNET_GCHeapLimitPercent 60 Constrains the .NET heap used by the OSPSuite runtime.
PBPK_ENABLE_SRC_OVERLAY false Keeps /app/src out of import precedence by default; docker-compose.overlay.yml sets it to true for workspace-override development.
SERVICE_VERSION 0.4.3 Exposed through /health and compose-level runtime metadata.
AUTH_ALLOW_ANONYMOUS true Development-friendly local default; do not expose beyond localhost without hardening.
PBPK_BIND_HOST 127.0.0.1 Host/interface used by the hardened overlay when publishing the API port.
PBPK_BIND_PORT 8000 Host port used by the hardened overlay when publishing the API port.
AUTH_ISSUER_URL unset Required by the hardened overlay. OIDC issuer used for token validation.
AUTH_AUDIENCE unset Required by the hardened overlay. Audience expected in bearer tokens.
AUTH_JWKS_URL unset Required by the hardened overlay. JWKS endpoint used to validate bearer tokens.

For the default packaged local runtime, use docker-compose.celery.yml through ./scripts/deploy_rxode2_stack.sh. When you need workspace overrides, use ./scripts/deploy_source_overlay_stack.sh, which layers docker-compose.overlay.yml over the same local stack and enables /app/src precedence. For a more production-like local or operator-managed deployment, use ./scripts/deploy_hardened_stack.sh, which layers docker-compose.hardened.yml over the packaged local runtime and waits for stable readiness before returning.

Legacy aliases such as R_PATH, R_HOME, R_LIBS, MCP_MODEL_SEARCH_PATHS, ADAPTER_TIMEOUT_SECONDS, and AUDIT_TRAIL_ENABLED are still accepted for compatibility, but startup now warns that they are scheduled for removal in v0.5.0.

See docker-compose.celery.yml, docker-compose.overlay.yml, docker-compose.hardened.yml, docs/deployment/runtime_patch_flow.md, and docs/deployment/rxode2_worker_image.md for the current operator-facing deployment surface.


Tool catalog

Category Highlight tools Notes
Discovery and curation discover_models, validate_model_manifest, load_simulation Discover models before load, inspect static manifest state, and load supported .pkml or MCP-ready .R files into the live session registry.
Qualification and reporting validate_simulation_request, run_verification_checks, export_oecd_report, ingest_external_pbpk_bundle Run OECD-oriented preflight checks, executable verification with unit/integrity/reproducibility checks, structured dossier export with profile, validation, qualificationState, evidence sections, and a descriptive oecdCoverage map to OECD Tables 3.1/3.2, and normalize externally generated PBPK outputs into the same typed PBPK-side objects.
Simulation control list_parameters, get_parameter_value, set_parameter_value, run_simulation Inspect and modify simulation parameters, then submit deterministic runs asynchronously.
Async status and results get_job_status, get_results, calculate_pk_parameters, cancel_job Track async jobs, retrieve stored deterministic results, compute PK summaries, and cancel queued/running jobs.
Population and exploration run_population_simulation, get_population_results, run_sensitivity_analysis Run population workflows and sensitivity analyses on the backends that declare those capabilities.

The normalized pbpk-mcp.v1 contract currently applies to:

  • load_simulation
  • validate_model_manifest
  • validate_simulation_request
  • run_verification_checks
  • export_oecd_report
  • get_job_status
  • get_results

Example workflow

For the current public workflow:

  1. Call discover_models or inspect /mcp/resources/models.
  2. Call validate_model_manifest before loading a new model.
  3. Call load_simulation.
  4. Call validate_simulation_request for the intended context of use.
  5. Call run_verification_checks when you want a lightweight executable verification summary before broader use or release.
  6. Submit run_simulation or run_population_simulation.
  7. Poll get_job_status, then fetch get_results or get_population_results.
  8. Call export_oecd_report when you need a dossier/report package.

Supported model policy

Supported runtime inputs:

  • .pkml via ospsuite
  • MCP-ready .R via rxode2

Conversion-only inputs:

  • .pksim5 PK-Sim project files
  • Berkeley Madonna .mmd

Both conversion-only formats are rejected early with explicit guidance to export to .pkml or convert to an MCP-ready .R model.

Qualification model:

PBPK MCP keeps these concepts separate:

  • capabilities for operational/runtime support
  • profile for declared scientific metadata
  • validation for preflight applicability and guardrail assessment
  • qualificationState for the derived summary label
  • ngraObjects for PBPK-side typed NGRA-ready objects such as assessmentContext, pbpkQualificationSummary, uncertaintySummary, uncertaintyHandoff, internalExposureEstimate, and uncertaintyRegisterReference
    • uncertaintySummary now also carries additive semanticCoverage so downstream consumers can distinguish declared-only versus quantified PBPK-side variability, sensitivity, and residual uncertainty without inferring that from raw evidence rows alone; it now includes quantifiedRowCount, declaredOnlyRowCount, and a stricter variabilityQuantificationStatus
  • oecdCoverage for a descriptive mapping from exported dossier fields onto OECD reporting-template and evaluation-checklist sections; it does not modify oecdChecklistScore or qualification state

Current derived qualification states include:

  • exploratory
  • illustrative-example
  • research-use
  • regulatory-candidate
  • qualified-within-context

This is the core policy boundary for the server:

  • executable does not mean qualified
  • in-bounds does not mean suitable for regulatory use

For .R models, richer OECD-oriented reporting is enabled by hooks such as:

  • pbpk_model_profile()
  • pbpk_validate_request(...)
  • pbpk_parameter_table(...)
  • pbpk_performance_evidence(...)
  • pbpk_uncertainty_evidence(...)
  • pbpk_verification_evidence(...)
  • pbpk_run_verification_checks(...)
  • pbpk_platform_qualification_evidence(...)

For example, the reference_compound rxode2 model now exports bounded local sensitivity evidence and a compact variability-propagation summary through uncertaintyEvidence, using the currently loaded parameter state rather than a hard-coded placeholder row. Its performanceEvidence is also explicitly classified as runtime-only/internal evidence so executable smoke checks cannot be mistaken for predictive validation.

export_oecd_report now also adds oecdCoverage, an additive coverage map aligned to OECD PBK Guidance Tables 3.1 and 3.2. It is intentionally descriptive: it shows which dossier sections/questions are covered by the current report payload and which remain incomplete, but it does not alter oecdChecklistScore, qualificationState, or any downstream decision boundary.

Researchers can also attach generic companion performance bundles next to either .pkml or .R models without modifying the bridge code. See performance_evidence_bundles.md and the starter template at performance_evidence_bundle.template.json. Those bundles can now carry both row-level evidence and a traceability-only profileSupplement for predictive dataset records, acceptance criteria, and target outputs. The MCP validates row-level claims conservatively and keeps the supplement descriptive rather than silently upgrading the qualification boundary.

modelPerformance can now also declare structured datasetRecords and acceptanceCriteria inside goodnessOfFit, predictiveChecks, or evaluationData. The bridge normalizes those fields and exposes additive traceability counts so predictive support is not reduced to a single status token.

The same pattern now exists for uncertainty evidence. See uncertainty_evidence_bundles.md and uncertainty_evidence_bundle.template.json. The MCP surfaces warnings when uncertainty rows are missing method/summary or scope information, and variability-propagation rows are only treated as quantified propagation evidence when they actually include quantitative outputs such as value, lowerBound, upperBound, mean, or sd.

The same companion-bundle pattern now exists for richer parameter tables. See parameter_table_bundles.md and parameter_table_bundle.template.json. export_oecd_report and validate_model_manifest now surface row-level coverage for sources, citations, distributions, study conditions, and rationale so dossier gaps are visible without custom bridge code.

Structured peer-engagement traceability can now stay inside the existing peerReview profile section. If a model declares reviewRecords, priorRegulatoryUse, revisionStatus, or revisionHistory, the bridge now normalizes those fields and exposes dossier-ready coverage counts instead of treating peer review as a single free-text status flag.

For .pkml models, richer qualification metadata can be supplied with sidecars such as:

  • model.profile.json
  • model.pbpk.json

Running the server

The default local stack is:

  • pbpk_mcp-api-1 on http://127.0.0.1:8000
  • pbpk_mcp-worker-1 on the same pbpk_mcp-worker-rxode2:latest image for execution
  • pbpk_mcp-redis-1 for job brokering and result storage

The preferred local operator entrypoint is:

./scripts/deploy_rxode2_stack.sh

That command recreates the packaged local runtime so the live tool catalog remains aligned with the documented contract.

When you want to iterate against the current workspace source tree instead of the baked image baseline, use:

./scripts/deploy_source_overlay_stack.sh

That profile layers docker-compose.overlay.yml on top of the packaged stack, bind-mounts src/, scripts/, and var/, enables PBPK_ENABLE_SRC_OVERLAY=true, and then waits for stable /health and /mcp/list_tools responses before returning.

When you want the same runtime contract with stricter deployment defaults, use:

AUTH_ISSUER_URL="https://issuer.example" \
AUTH_AUDIENCE="pbpk-mcp" \
AUTH_JWKS_URL="https://issuer.example/.well-known/jwks.json" \
./scripts/deploy_hardened_stack.sh

That overlay disables anonymous access, switches the stack to ENVIRONMENT=production, binds the published API port through PBPK_BIND_HOST / PBPK_BIND_PORT, and still waits for stable /health and /mcp/list_tools responses before returning.


Integrating with coding agents

Add http://localhost:8000/mcp as an MCP provider in Codex CLI/Desktop, Gemini CLI, Claude Code, or other MCP-aware hosts.

Useful companion surfaces:

  • /mcp/list_tools for the live tool catalog
  • /mcp/call_tool for direct HTTP tool execution
  • /mcp/resources/models for discoverable model files
  • /mcp/resources/simulations for already-loaded live sessions
  • /mcp/resources/schemas for the published PBPK-side object schemas and example payloads
  • /mcp/resources/capability-matrix for the live machine-readable runtime-support matrix

Critical tools still declare MCP-side guardrails such as critical and requiresConfirmation in the tool catalog. The current live regression suite checks that the documented workflow is actually exposed through /mcp/list_tools.


Output artifacts

The server currently produces and exposes:

  • discovered model inventories with loaded/unloaded state
  • static manifest validation reports
  • load-time model metadata with backend, capabilities, profile, validation, and qualificationState
  • validation/report payloads with typed PBPK-side NGRA-ready objects such as assessmentContext, pbpkQualificationSummary, uncertaintySummary, uncertaintyHandoff, internalExposureEstimate, uncertaintyRegisterReference, and pointOfDepartureReference
    • uncertainty objects now include machine-readable PBPK-side uncertainty semantics for quantified variability, declared-only residual uncertainty, and still-missing components
  • a BER-ready reference bundle in dossier export that can become ready-for-external-ber-calculation when an external podRef and a resolved PBPK exposure target are both available
  • imported external PBPK run records and NGRA-ready object bundles from ingest_external_pbpk_bundle
  • executable verification summaries with structured check results, smoke-run artifact handles, parameter-unit consistency, result-integrity/reproducibility checks, and verification-evidence snapshots
  • deterministic result handles and stored deterministic result payloads
  • population summary payloads and chunk handles
  • PK metric outputs from calculate_pk_parameters
  • OECD-style dossier/report exports with checklist state, missing-evidence hints, performance evidence, uncertainty evidence, verification evidence, software-platform qualification evidence, and parameter provenance when declared
  • local/internal audit generators can produce verifiable evidence packs for bounded rehearsal models, but those study-specific outputs are not part of the public release surface and should not be treated as benchmark or distribution artifacts

Security checklist

  • ✅ Dual-backend runtime policy is explicit: .pkml and MCP-ready .R are supported; .pksim5 and .mmd are conversion-only.
  • ✅ Discovery, manifest validation, runtime validation, and OECD dossier export are exposed as separate tool surfaces.
  • capabilities, profile, validation, and qualificationState remain distinct, so runtime support is not mislabeled as scientific qualification.
  • ✅ The default local deployment path is now packaged by default, and the source-overlay maintainer path is explicit and opt-in.
  • ✅ Live readiness checks verify /mcp/list_tools, /mcp/resources/models, discovery parity, and conversion-only rejection behavior.
  • ✅ A documented hardened overlay now disables anonymous access by default and requires explicit auth settings before the stack will start.
  • 🔲 The default compose file is still development-oriented; keep it on localhost and use the hardened overlay plus environment-specific ingress controls when moving beyond local development.
  • 🔲 Qualification evidence completeness still depends on the model or sidecar; executable does not imply regulatory readiness.

Current limitations

Format and backend limitations

  • Raw Berkeley Madonna .mmd files are not directly executable.
  • Only .pkml and MCP-ready .R are supported runtime formats.
  • PK-Sim / MoBi project files such as .pksim5 are not loaded directly and must be exported to .pkml.
  • An .R file can be discoverable without being runnable if it does not implement the required contract.
  • Backend capabilities are intentionally not identical across ospsuite and rxode2.
  • Population simulation is currently implemented for rxode2 models, not as a generic OSPSuite .pkml capability.

Qualification limitations

  • validate_model_manifest is a static pre-load check; it does not by itself prove executable correctness or scientific validity.
  • Runtime guardrails are not the same as external scientific validation.
  • ingest_external_pbpk_bundle is a normalization/import path for externally generated PBPK outputs. It does not execute vendor engines such as GastroPlus or Simcyp, and it does not parse proprietary project files directly.
  • run_verification_checks is intentionally lightweight implementation verification; it now includes parameter-unit consistency, structural flow/volume consistency, deterministic integrity, and reproducibility checks, but it still does not replace formal dimensional analysis, full unit-consistency proof across equations, solver qualification, software-platform qualification, or external qualification evidence.
  • export_oecd_report now carries the latest stored run_verification_checks snapshot as executableVerification when one has been run for that loaded simulation. The report does not silently rerun verification during export.
  • Model-specific hooks can add executable qualification checks such as flow/volume consistency, mass balance, or solver-stability comparisons, but those checks should still be interpreted as implementation evidence within the declared context, not as blanket regulatory qualification.
  • Bundled uncertaintyEvidence can now include bounded local sensitivity screens and compact variability-propagation summaries for specific models, but those rows are still internal quantitative evidence. They are not the same thing as variance-based global sensitivity analysis, posterior uncertainty propagation, or externally reviewed qualification studies.
  • research-use or illustrative-example models should not be represented as regulatory-ready.
  • OECD-style metadata completeness can be high while scientific evidence remains incomplete.
  • Exported performanceEvidence is now explicitly classified and bounded, but many models will still only carry runtime/internal evidence rather than observed-versus-predicted datasets, predictive evaluation sets, or external qualification studies.
  • Parameter provenance may be structured and exportable while still lacking full per-parameter citations, study conditions, or identifiability evidence.

Operational limitations

  • rxode2 image builds are heavy and should be prebuilt rather than compiled inside a capped runtime worker.
  • Runtime workers should stay conservatively capped, such as 4 GiB.
  • Durable rxode2 image builds can take a long time on laptop hardware because of C/C++ compilation.
  • The default local deployment now runs the packaged baseline, but active workspace development should still go through the documented deploy entrypoints so the intended runtime mode is explicit.
  • The hardened overlay improves auth defaults and bind behavior, but it is still an operator-side compose profile rather than a full production deployment package with ingress, secret rotation, or managed identity infrastructure.

Development notes

Recommended checks:

python3 scripts/check_runtime_contract_env.py
python3 -m unittest -v tests/test_load_simulation_contract.py
python3 -m unittest -v tests/test_model_manifest.py
python3 -m unittest -v tests/test_oecd_bridge.py
python3 -m unittest -v tests/test_model_discovery_live_stack.py
python3 -m unittest -v tests/test_oecd_live_stack.py
python3 scripts/release_readiness_check.py
python3 scripts/workspace_model_smoke.py --auth-dev-secret pbpk-local-dev-secret

make runtime-contract-test in the public repository now runs the same dependency preflight first, runs the named misuse-prevention gate, reruns the strict curated-manifest NGRA gate for the bundled example models, builds a temporary sdist and wheel outside the repo worktree, validates that the normative contract files survive source distribution packaging, and performs an installed-package check of mcp_bridge.contract against the built wheel so the published contract artifacts are validated across both source and distribution boundaries.

The same strict curated-manifest NGRA gate is now part of make runtime-contract-test, scripts/release_readiness_check.py, and the Release Artifacts workflow, so bundled example models cannot quietly regress to implicit NGRA workflow role, population support, evidence basis, or claim-boundary declarations on the normal release-prep path.

That gate is now selected through --curated-publication-set, so the Makefile, workflow, and manual release checklist all use one curated model inventory instead of duplicating path lists.

Human-process release artifacts now live alongside the runtime gates:

  • docs/hardening_migration_notes.md
  • docs/pbpk_model_onboarding_checklist.md
  • docs/pbk_reviewer_signoff_checklist.md
  • docs/post_release_audit_plan.md

Repository automation is split intentionally:

  • normal CI should catch contract drift in the local runtime layer quickly
  • the dedicated Release Artifacts workflow should be used on tags or release-prep runs to validate and retain the exact published sdist and wheel
  • the heavier catalog-wide model smoke should run as a dedicated workflow or release gate because it builds the full worker image and executes the live stack

The published pbpk-mcp.v1 contract manifest now distinguishes:

  • normative artifacts that define the machine-readable contract
  • supporting artifacts that help users adopt and publish the contract correctly
  • legacy-excluded artifacts that remain in the repo but are intentionally outside the PBPK-side object family

Maintainer workflow

If you are maintaining the local stack in the current convergence stage:

  1. Change the authoritative runtime files in the packaged modules under src/, the local runtime helpers under scripts/, scripts/ospsuite_bridge.R, or the bundled .R model modules.
    • generic MCP namespaces, tool modules, and adapter/runtime files now live under src/
    • the local operator/runtime helper surface now lives in the deploy scripts, worker Dockerfile, and overlay hook rather than a tracked patches/ implementation layer
  2. If the worker image baseline should change, rebuild it with ./scripts/build_rxode2_worker_image.sh.
  3. Recreate the packaged local stack with ./scripts/deploy_rxode2_stack.sh.
    • When you need workspace overrides for active development, use ./scripts/deploy_source_overlay_stack.sh.
    • When you need stricter auth defaults, use ./scripts/deploy_hardened_stack.sh with AUTH_ISSUER_URL, AUTH_AUDIENCE, and AUTH_JWKS_URL set.
  4. Verify the live API with:
    • curl -s http://localhost:8000/health
    • curl -s http://localhost:8000/mcp/list_tools
    • python3 scripts/release_readiness_check.py
    • python3 scripts/workspace_model_smoke.py --auth-dev-secret pbpk-local-dev-secret
    • python3 scripts/workspace_model_smoke.py --include-population --auth-dev-secret pbpk-local-dev-secret

Important boundaries:

  • scripts/deploy_rxode2_stack.sh is the preferred packaged local operator entrypoint.
  • scripts/deploy_source_overlay_stack.sh is the explicit maintainer entrypoint when workspace src/ and scripts/ should override the image.
  • scripts/deploy_hardened_stack.sh is the stricter operator entrypoint when you need non-anonymous auth defaults on the same packaged local stack.
  • docker/rxode2-worker.Dockerfile now owns the baked baseline worker assets directly, including scripts/ospsuite_bridge.R and the bundled reference .R model.
  • the local stack now defaults to the packaged runtime, while the overlay .pth remains only for the explicit source-overlay maintainer profile.

Useful repository guideposts

  • scripts/ospsuite_bridge.R
  • scripts/runtime_src_overlay.pth
  • docker/rxode2-worker.Dockerfile
  • scripts/workspace_model_smoke.py
  • src/mcp_bridge/model_catalog.py
  • src/mcp_bridge/model_manifest.py
  • src/mcp/__init__.py
  • src/mcp/tools/discover_models.py
  • src/mcp/tools/load_simulation.py
  • src/mcp/tools/get_job_status.py
  • src/mcp/tools/validate_simulation_request.py
  • src/mcp/tools/run_verification_checks.py
  • src/mcp/tools/validate_model_manifest.py
  • src/mcp/tools/export_oecd_report.py
  • src/mcp/tools/get_results.py
  • src/mcp/tools/ingest_external_pbpk_bundle.py
  • src/mcp/tools/run_population_simulation.py
  • src/mcp_bridge/adapter/__init__.py
  • src/mcp_bridge/adapter/interface.py
  • src/mcp_bridge/adapter/ospsuite.py
  • docs/architecture/dual_backend_pbpk_mcp.md
  • docs/deployment/runtime_patch_flow.md
  • docs/integration_guides/rxode2_adapter.md
  • docs/integration_guides/ospsuite_profile_sidecars.md
  • docs/github_publication_checklist.md
  • docs/hardening_migration_notes.md
  • docs/pbpk_model_onboarding_checklist.md
  • docs/pbk_reviewer_signoff_checklist.md
  • docs/post_release_audit_plan.md

Contributing

Use the clean GitHub clone as the release source of truth when publishing. Before proposing a release or publication update, run the readiness checks above and review the publication checklist in docs/github_publication_checklist.md, the migration notes in docs/hardening_migration_notes.md, the model-onboarding checklist in docs/pbpk_model_onboarding_checklist.md, the reviewer checklist in docs/pbk_reviewer_signoff_checklist.md, and the post-release audit plan in docs/post_release_audit_plan.md.

For the published repo, follow CONTRIBUTING.md.


Security policy

The default local compose stack is development-oriented and currently allows anonymous local use. Do not expose it beyond localhost without hardening authentication, ingress, and runtime resource limits.

For the published repo, follow SECURITY.md.


Code of conduct

Community interactions for the published repo should follow CODE_OF_CONDUCT.md.


Citation

If you use PBPK MCP in your work, please cite the BioRxiv preprint:

Djidrovski, I. ToxMCP: Guardrailed, Auditable Agentic Workflows for Computational Toxicology via the Model Context Protocol. bioRxiv (2026). https://doi.org/10.64898/2026.02.06.703989

@article{djidrovski2026toxmcp,
  title   = {ToxMCP: Guardrailed, Auditable Agentic Workflows for Computational Toxicology via the Model Context Protocol},
  author  = {Djidrovski, Ivo},
  journal = {bioRxiv},
  year    = {2026},
  doi     = {10.64898/2026.02.06.703989},
  url     = {https://doi.org/10.64898/2026.02.06.703989}
}

Citation metadata: CITATION.cff


Roadmap

  • Keep reducing the remaining source-overlay-specific maintainer surface now that the default local runtime is packaged by default.
  • Add richer quantitative uncertainty, sensitivity, and implementation-verification evidence to strengthen regulatory-facing qualification.
  • Expand dossier-grade parameter provenance and model-performance evidence across curated example and production models.
  • Keep improving release automation so image build, hot patching, docs, and version markers stay aligned.

License

Apache-2.0

About

MCP server for PBPK modeling workflows (simulation control + PK analytics).

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Python 79.6%
  • R 18.8%
  • Other 1.6%