A minimal, monochrome web interface for the pyreflect neutron reflectivity analysis package.
Neutron reflectivity is an experimental technique used to study the internal structure of thin films and layered materials at the nanometer scale, with applications in batteries, semiconductors, polymers, magnetic materials, and surface chemistry. Experiments measure how neutrons reflect off a material, but translating that data into a meaningful depth profile is a difficult inverse problem that traditionally requires expert knowledge and slow, manual fitting. pyreflect uses machine learning to automate and accelerate this process by learning the relationship between measured reflectivity curves and underlying material structure. This interface makes that capability accessible and interactive, enabling faster analysis, easier exploration of material behavior, and quicker real-world scientific and engineering decisions.
Full documentation here: https://deepwiki.com/Northeastern-Research-ORNL-1/pyreflect-interface/5-storage-and-persistence
- v0.1.2 01/21/2026 — Model bundles (.npy + .pth) on Hugging Face, pipeline documentation, production hardening with allowlist-only higher limits, checkpoints, and job controls.
- v0.1.1 01/14/2026 — GitHub auth, explore/history sidebar, download bundle support, and GPU compute.
This interface aims to fully expose all parameters from the pyreflect package, making them adjustable through the UI without requiring users to dig into the code.
| Category | Exposed | Total | Coverage |
|---|---|---|---|
| Film Layer Properties | 4 | 4 | ✅ 100% |
| Generator Settings | 3 | 8 | 🔶 38% |
| CNN Training | 4 | 7 | 🔶 57% |
| AE/MLP (Chi Prediction) | 3 | 5 | 🔶 60% |
| Overall | 14 | 24 | 58% |
| Parameter | Location | Default | Notes |
|---|---|---|---|
| `sld` | Film Layer | varies | Scattering Length Density (0–10) |
| `isld` | Film Layer | 0 | Imaginary SLD (0–1) |
| `thickness` | Film Layer | varies | Layer thickness in Å (0–1000) |
| `roughness` | Film Layer | varies | Interface roughness in Å (0–200) |
| `numCurves` | Generator | 1000 | Number of synthetic curves |
| `numFilmLayers` | Generator | 5 | Number of material layers |
| `layerBound` | Generator | — | Per-layer min/max bounds |
| `batchSize` | Training | 32 | CNN training batch size |
| `epochs` | Training | 10 | CNN training epochs |
| `layers` | Training | 12 | CNN convolutional layers |
| `dropout` | Training | 0.0 | CNN dropout rate |
| `latentDim` | Training | 16 | Autoencoder latent dimension |
| `aeEpochs` | Training | 50 | Autoencoder training epochs |
| `mlpEpochs` | Training | 50 | MLP training epochs |
These parameters directly affect the physics simulation via refl1d.
- `qResolution` — Beam Q resolution (default: 0.0294855)
- `qMin` — Minimum Q value (default: 0.0081 Å⁻¹)
- `qMax` — Maximum Q value (default: 0.1975 Å⁻¹)
- `numQPoints` — Number of Q points (default: 308)
- `scale` — Overall intensity scale factor (default: 1.0)
- `background` — Background signal level (default: 0.0)
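To make the roles of these settings concrete, here is a small Python sketch; the function names and the assumption of a linearly spaced Q grid are illustrative, not pyreflect's actual API:

```python
# Illustrative sketch only: names and the linear Q spacing are assumptions,
# not pyreflect's API.

def make_q_grid(q_min=0.0081, q_max=0.1975, num_points=308):
    """Build an evenly spaced Q grid matching the defaults above."""
    step = (q_max - q_min) / (num_points - 1)
    return [q_min + i * step for i in range(num_points)]

def apply_scale_background(reflectivity, scale=1.0, background=0.0):
    """Measured curve = scale * ideal reflectivity + constant background."""
    return [scale * r + background for r in reflectivity]

q = make_q_grid()   # 308 points spanning [0.0081, 0.1975]
scaled = apply_scale_background([1.0, 0.1], scale=1.0, background=0.0)
```

With the defaults, `scale=1.0` and `background=0.0` leave the simulated curve unchanged.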
Common ML hyperparameters most researchers want to tune.
- `learningRate` — Optimizer learning rate (default: 0.001)
- `validationSplit` — Train/validation split ratio (default: 0.2)
- `optimizer` — Optimizer type: Adam, AdamW, SGD (default: Adam)
Architecture parameters for power users; exposed in an "Advanced" panel.
- `kernelSize` — CNN Conv1d kernel size (default: 51)
- `sldOutputPoints` — SLD profile output resolution (default: 900)
- `vaeBeta` — VAE KL divergence weight (0 = AE, >0 = VAE)
- `aeHiddenLayers` — Autoencoder hidden layer sizes (default: [500, 300, 200, 72])
Data preprocessing options for experimental workflows.
- `applyLogTransform` — Log10 transform of the NR y-axis (default: true)
- `normalizationMethod` — `minmax` or `zscore` (default: minmax)
- `clipMin` — Minimum clip value for the log transform (default: 1e-8)
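A minimal sketch of how these three options could compose, assuming the conventional semantics (clip, then log10, then normalize); this is not the exact pyreflect implementation:

```python
import math

# Sketch with assumed semantics (clip -> log10 -> normalize);
# not the exact pyreflect implementation.

def preprocess_nr(reflectivity, apply_log_transform=True,
                  normalization_method="minmax", clip_min=1e-8):
    y = [max(r, clip_min) for r in reflectivity]   # clipMin guards log10(0)
    if apply_log_transform:                        # applyLogTransform
        y = [math.log10(v) for v in y]
    if normalization_method == "minmax":           # normalizationMethod
        lo, hi = min(y), max(y)
        return [(v - lo) / (hi - lo) for v in y]
    if normalization_method == "zscore":
        mean = sum(y) / len(y)
        std = (sum((v - mean) ** 2 for v in y) / len(y)) ** 0.5
        return [(v - mean) / std for v in y]
    return y

out = preprocess_nr([1.0, 1e-3, 0.0])   # the zero is clipped to 1e-8
```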
| Date | Phase | Changes |
|---|---|---|
| 2026-01-26 | — | Initial roadmap created |
| — | Phase 1 | pending |
| — | Phase 2 | pending |
| — | Phase 3 | pending |
| — | Phase 4 | pending |
- App: https://pyreflect.shlawg.com
- API: https://api.shlawg.com
The hosted deployment runs with the full stack enabled: Redis job queue + Modal GPU burst workers, MongoDB history persistence, and Hugging Face model storage.
See docs/FLOW.md for a detailed pipeline diagram.
- Adjustable Parameters: Film layers (SLD, thickness, roughness), generator settings, training configuration
- Manual Layer Bounds: Set min/max variation ranges per layer parameter for synthetic data generation (notebook-parity `layer_bound` support)
- Ground Truth vs Predicted: NR and SLD charts show both ground truth and model predictions
- Graph Visualization: Downloadable & interactive NR curves, SLD profiles, training loss, Chi parameter plots
- Real-time Updates: Instant parameter feedback with generate-on-demand
- Editable Values: Click any numeric value to type custom inputs beyond slider limits
- Live Streaming Logs: Real-time training progress streamed from backend via SSE
- Timing + Warnings: Generation/training/inference timings and backend warnings streamed to console
- Data Upload: Drag-and-drop upload for `.npy` datasets and `.pth` model weights
- Background Jobs: Redis + RQ queue for non-blocking training runs
- Controls: Per-job buttons to stop, cancel, pause, resume, download, and more
- GPU Training: Modal GPU burst workers (spin up on demand, scale to zero)
- Checkpointing: Periodic checkpoint saves to HuggingFace for crash recovery and pause/resume
- Cloud Storage: Hugging Face model artifacts + MongoDB history persistence
- State Persistence: Parameters and results persist across browser refreshes
- Reset + Collapse: One-click reset to example defaults and per-layer collapse/expand controls
| Parameter | Local | Production |
|---|---|---|
| Curves | 100,000 | 5,000 |
| Epochs | 1,000 | 50 |
| Batch Size | 512 | 64 |
| CNN Layers | 20 | 12 |
| Dropout | 0.9 | 0.5 |
| Latent Dim | 128 | 32 |
| AE/MLP Epochs | 500 | 100 |
Higher limits in production are allowlist-only.
- The frontend sends `X-User-ID` as your GitHub username (login).
- The backend uses `LIMITS_WHITELIST_USER_IDS` (comma-separated GitHub usernames) to decide who gets local/unlimited limits.
- If you are not allowlisted, the UI shows a lock icon and the Limits modal provides contact info.
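A sketch of what the backend's check plausibly looks like; the env var name is from above, but the function name and case-insensitive matching are assumptions, not the backend's actual code:

```python
import os

# Sketch of the allowlist check. The env var name matches
# LIMITS_WHITELIST_USER_IDS; the function name and case-insensitive
# matching are assumptions about the backend.

def is_allowlisted(x_user_id):
    raw = os.environ.get("LIMITS_WHITELIST_USER_IDS", "")
    allowed = {u.strip().lower() for u in raw.split(",") if u.strip()}
    return bool(x_user_id) and x_user_id.lower() in allowed

os.environ["LIMITS_WHITELIST_USER_IDS"] = "alice, bob"
assert is_allowlisted("Alice")      # matches despite spacing/case
assert not is_allowlisted(None)     # anonymous users get default limits
```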
```
pyreflect-interface/
├── src/
│   ├── interface/          # Next.js frontend
│   └── backend/            # FastAPI backend
│       ├── main.py         # API server
│       ├── settings.yml    # Config (auto-generated)
│       └── data/           # Uploaded datasets & models
│           └── curves/     # NR/SLD curve files
└── README.md
```
Note: The `pyreflect` package is installed directly from GitHub rather than bundled in this repo.
Each training run creates a folder on HuggingFace with all artifacts bundled together:
```
models/{model_id}/
├── {model_id}.pth    # Trained CNN model weights
├── nr_train.npy      # NR curves (N × 2 × 308)
└── sld_train.npy     # SLD profiles (N × 2 × 900)
```
Object Storage: https://huggingface.co/datasets/Northeastern-Research-ORNL-1/models/tree/main
The .npy training data files are uploaded immediately after data generation (before training begins). This ensures:
- Fault Tolerance: If training fails (e.g., OOM, timeout), the generated data is preserved.
- Retry Efficiency: Retries can reuse the existing `.npy` files instead of regenerating them.
- Data Reuse: Datasets can be downloaded and shared between team members or used for external analysis.
```mermaid
flowchart LR
    subgraph Browser["Browser"]
        UI[Next.js Frontend]
        Charts["Dual-line Charts"]
        LS[(localStorage)]
    end

    UI <--> LS
    UI --> Charts
    UI -->|REST + SSE| API[FastAPI Backend]
    API -->|groundTruth + predicted| Charts
```
```mermaid
flowchart LR
    subgraph Backend["FastAPI Backend"]
        API[REST API]
        SSE[SSE Stream]
        Pipeline[ML Pipeline]
        DataStore[(Data Store)]
    end
    subgraph Queue["Redis + RQ"]
        RQ[(training queue)]
        Meta[(job meta)]
    end
    subgraph Modal["Modal GPU"]
        Poller[poll_queue]
        Worker[T4 GPU Worker]
    end
    subgraph Checkpoints["HuggingFace"]
        HFModels[(Models Repo)]
        HFCheckpoints[(Checkpoints Repo)]
    end

    API --> RQ
    API -->|trigger| Poller
    Poller -->|spawn| Worker
    Worker -->|consume| RQ
    Worker -->|save/load| HFCheckpoints
    Worker -->|upload model| HFModels
    Worker <-->|progress| Meta
    API --> Pipeline
    API --> DataStore
```
```mermaid
flowchart LR
    Backend[FastAPI Backend]
    Worker[Modal GPU Worker]
    subgraph Storage["Integrations"]
        Mongo[(MongoDB)]
        HF[(Hugging Face)]
    end
    subgraph PyReflect["pyreflect"]
        Gen[DataGenerator]
        CNN[CNN Model]
        Physics[refl1d / refnx]
    end

    Backend --> Mongo
    Backend --> HF
    Worker --> Mongo
    Worker --> HF
    Backend --> PyReflect
```
For detailed diagrams of the data generation, preprocessing, training, model saving, and inference phases, see docs/FLOW.md.
| Field | Source | Description |
|---|---|---|
| `nr.groundTruth` | refl1d | True reflectivity from physics simulation |
| `nr.computed` | refl1d | Same as groundTruth (future: compute from predicted SLD) |
| `sld.groundTruth` | refl1d | True SLD profile from physics simulation |
| `sld.predicted` | CNN | Model prediction given the NR curve as input |
| Endpoint | Method | Description |
|---|---|---|
| `/api/health` | GET | Health check |
| `/api/limits` | GET | Current limits + access status |
| `/api/defaults` | GET | Default parameters |
| `/api/status` | GET | Backend status and data files |
| Endpoint | Method | Description |
|---|---|---|
| `/api/generate` | POST | Generate NR/SLD curves (non-streaming) |
| `/api/generate/stream` | POST | Generate with SSE log stream |
| Endpoint | Method | Description |
|---|---|---|
| `/api/history` | GET | List saved generations |
| `/api/history` | POST | Save a generation manually |
| `/api/history/{id}` | GET | Get full details of a save |
| `/api/history/{id}` | PATCH | Rename a saved generation |
| `/api/history/{id}` | DELETE | Delete a saved generation and its model |
| Endpoint | Method | Description |
|---|---|---|
| `/api/models/upload` | POST | Receive model upload from worker |
| `/api/models/{model_id}` | GET | Download a saved model |
| `/api/models/{model_id}` | DELETE | Delete a local model file |
| `/api/models/{model_id}/info` | GET | Get model size and source |
| `/api/upload` | POST | Upload files (+ optional roles) |
| Endpoint | Method | Description |
|---|---|---|
| `/api/jobs/submit` | POST | Submit job to queue (non-blocking) |
| `/api/jobs/{job_id}` | GET | Get job status, progress, and result |
| `/api/jobs/{job_id}` | DELETE | Cancel a queued job |
| `/api/jobs/{job_id}/name` | PATCH | Rename a queued job |
| `/api/jobs/{job_id}/retry` | POST | Retry a failed/finished job |
| `/api/jobs/{job_id}/stop` | POST | Stop job immediately (no checkpoint) |
| `/api/jobs/{job_id}/pause` | POST | Pause job and save checkpoint |
| `/api/jobs/{job_id}/delete` | DELETE | Delete a job record (non-running only) |
| `/api/jobs/{job_id}/claim` | POST | Attach a job to a user (login mid-run) |
| `/api/jobs/purge` | DELETE | Delete non-running jobs for a user |
| `/api/jobs/{job_id}/force-purge` | POST | Force purge a zombie job (admin) |
| Endpoint | Method | Description |
|---|---|---|
| `/api/checkpoints` | GET | List all available checkpoints |
| `/api/checkpoints/{job_id}/resume` | POST | Resume training from checkpoint |
| `/api/checkpoints/{job_id}` | DELETE | Delete a checkpoint |
| Endpoint | Method | Description |
|---|---|---|
| `/api/queue` | GET | Queue status and worker info |
| `/api/queue/spawn` | POST | Trigger remote worker spawn (debug) |
| `/api/queue/cleanup` | POST | Trigger stale job cleanup (admin) |
The system automatically detects and cleans up "zombie" jobs: jobs stuck in the "started" state because their worker died unexpectedly (Modal container killed, OOM, heartbeat timeout, etc.).
```mermaid
flowchart TB
    subgraph Normal["Normal Job Flow"]
        Submit[Job Submitted]
        Queue[(Redis Queue)]
        Worker[Modal GPU Worker]
        Complete[Job Complete]
    end
    subgraph Failure["Worker Death (Zombie Scenario)"]
        Started[Job Started]
        Death[Worker Dies]
        Zombie[Zombie Job<br/>stuck in 'started']
    end
    subgraph Detection["Automatic Cleanup"]
        Cleanup[Stale Job Detector<br/>runs every 60s]
        Check{updated_at<br/>older than 10min?}
        Purge[Purge from Redis]
        MarkFailed[Mark as Failed]
    end

    Submit --> Queue --> Worker --> Complete
    Started --> Death --> Zombie
    Zombie --> Cleanup
    Cleanup --> Check
    Check -->|Yes| Purge --> MarkFailed
    Check -->|No| Wait[Keep Monitoring]
```
Workers update `job.meta.updated_at` roughly every second during execution. The stale job detector:

- Scans the started registries (`rq:wip:training`, `rq:started:training`)
- Checks each job's `meta.updated_at` timestamp
- If the timestamp is older than `STALE_JOB_THRESHOLD_S` (default: 600 seconds / 10 minutes), marks the job as stale
- Purges stale jobs from the Redis registries and marks them as failed
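The detection rule above boils down to a timestamp comparison; a simplified sketch (the real detector scans RQ registries in Redis rather than a plain dict):

```python
import time

# Simplified sketch of the stale-job rule; the real detector scans RQ
# registries in Redis rather than a plain dict.

STALE_JOB_THRESHOLD_S = 600  # matches the default configuration

def find_stale_jobs(jobs, now=None):
    """jobs maps job_id -> last meta.updated_at (unix seconds)."""
    now = time.time() if now is None else now
    return [job_id for job_id, updated_at in jobs.items()
            if now - updated_at > STALE_JOB_THRESHOLD_S]

now = 10_000.0
jobs = {"healthy": now - 30,   # heartbeat 30s ago -> keep monitoring
        "zombie": now - 900}   # heartbeat 15 min ago -> purge + mark failed
assert find_stale_jobs(jobs, now) == ["zombie"]
```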
| Environment Variable | Default | Description |
|---|---|---|
| `STALE_JOB_THRESHOLD_S` | 600 | Seconds before a job is considered stale |
| `STALE_JOB_CLEANUP_INTERVAL_S` | 60 | How often the cleanup task runs |
Manual cleanup (admin only):
```bash
# Dry-run: see what would be cleaned
curl -X POST "http://localhost:8000/api/queue/cleanup?dry_run=true" \
  -H "X-Admin-Token: YOUR_ADMIN_TOKEN"

# Actually clean up stale jobs
curl -X POST "http://localhost:8000/api/queue/cleanup" \
  -H "X-Admin-Token: YOUR_ADMIN_TOKEN"

# Force purge a specific job
curl -X POST "http://localhost:8000/api/jobs/JOB_ID/force-purge" \
  -H "X-Admin-Token: YOUR_ADMIN_TOKEN"
```

The `/api/jobs/{job_id}/stop` endpoint:
- Sets `meta.stop_requested = true` (checked by the worker between phases/epochs)
- Sends the RQ `stop-job` command to kill the workhorse process immediately
- Removes the job from the queue/started registries
- Updates meta to show "stopped" status in UI
This handles both graceful stops (worker sees flag) and hard stops (worker process killed).
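The graceful-stop path can be sketched as a training loop that polls the flag between epochs (an in-memory dict stands in for the Redis job meta here; the flag name matches `meta.stop_requested` above):

```python
# Sketch of the graceful-stop path: the worker polls a stop flag between
# epochs. An in-memory dict stands in for Redis job meta.

def train(epochs, meta):
    for epoch in range(1, epochs + 1):
        if meta.get("stop_requested"):           # checked between epochs
            return {"status": "stopped", "epoch": epoch - 1}
        # ... one epoch of training would run here ...
        if epoch == 2:
            meta["stop_requested"] = True        # simulate a Stop request
    return {"status": "finished", "epoch": epochs}

result = train(10, {})   # stops after the flag is observed at epoch 3
```

A hard stop skips this loop entirely: the workhorse process is killed, so the flag is never observed.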
Training jobs can be paused and resumed across worker restarts or crashes. Checkpoints are stored on HuggingFace Hub in a dedicated dataset repo.
```mermaid
flowchart TB
    subgraph Training["Training Loop"]
        Epoch[Epoch N]
        Check{N % 5 == 0?}
        Save[Save Checkpoint to HF]
        Continue[Continue Training]
    end
    subgraph Pause["Pause Flow"]
        PauseBtn[User clicks Pause]
        SetFlag[Set pause_requested in Redis]
        Worker[Worker checks flag]
        SaveImmediate[Save checkpoint immediately]
        Exit[Exit with status: paused]
    end
    subgraph Resume["Resume Flow"]
        ResumeBtn[User clicks Resume]
        NewJob[Create new job with same params]
        LoadCheckpoint[Load checkpoint from HF]
        RestoreState[Restore model + optimizer state]
        ContinueFrom[Continue from epoch N]
    end
    subgraph Storage["HuggingFace Hub"]
        HFRepo[(Checkpoints Repo<br/>job_id.pth)]
    end

    Epoch --> Check
    Check -->|Yes| Save --> Continue
    Check -->|No| Continue
    Save --> HFRepo
    PauseBtn --> SetFlag --> Worker --> SaveImmediate --> HFRepo
    SaveImmediate --> Exit
    ResumeBtn --> NewJob --> LoadCheckpoint
    HFRepo --> LoadCheckpoint
    LoadCheckpoint --> RestoreState --> ContinueFrom
```
Each checkpoint (`{job_id}.pth`) contains:
| Field | Description |
|---|---|
| `epoch` | Last completed epoch number |
| `model_state_dict` | Full model weights |
| `optimizer_state_dict` | Optimizer state (Adam momentum, etc.) |
| `train_losses` | Training loss history |
| `val_losses` | Validation loss history |
| `best_val_loss` | Best validation loss seen |
| `nr_stats`, `sld_stats` | Normalization statistics |
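To illustrate the payload shape, here is a sketch that round-trips these fields; real checkpoints are written with `torch.save`/`torch.load`, but plain `pickle` is used here so the sketch runs without PyTorch, and the weight values are stand-ins:

```python
import io
import pickle

# Sketch of the checkpoint payload. Real checkpoints use torch.save /
# torch.load; plain pickle is used here so the sketch runs without
# PyTorch. Weight values are stand-ins.

checkpoint = {
    "epoch": 15,
    "model_state_dict": {"conv1.weight": [0.1, 0.2]},          # stand-in
    "optimizer_state_dict": {"state": {}, "param_groups": []},
    "train_losses": [0.9, 0.5, 0.3],
    "val_losses": [1.0, 0.6, 0.4],
    "best_val_loss": 0.4,
    "nr_stats": {"min": -8.0, "max": 0.0},
    "sld_stats": {"min": 0.0, "max": 10.0},
}

buf = io.BytesIO()
pickle.dump(checkpoint, buf)    # analogous to torch.save(checkpoint, path)
buf.seek(0)
restored = pickle.load(buf)     # analogous to torch.load(path)
assert restored["epoch"] == 15  # resume would continue from epoch 16
```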
Pause vs Stop:
| Action | Saves Checkpoint? | Can Resume? | Use Case |
|---|---|---|---|
| Pause | Yes | Yes | Want to continue later |
| Stop | No | No | Abandon training |
Configuration:
| Environment Variable | Default | Description |
|---|---|---|
| `CHECKPOINT_EVERY_N_EPOCHS` | 5 | Save checkpoint every N epochs |
| `HF_CHECKPOINT_REPO_ID` | — | HuggingFace dataset repo for checkpoints |
The checkpoint repo should be a HuggingFace dataset type repo (e.g., org/checkpoints).
```bash
cd src/backend
uv sync
uv run uvicorn main:app --reload --port 8000
```

Backend runs at http://localhost:8000
```bash
cd src/interface
bun install
bun dev
```

Frontend runs at http://localhost:3000
For GPU-accelerated training (serverless, pay-per-use), deploy the Modal worker.
Important:
- Your backend must enqueue to a Redis instance reachable from Modal (`REDIS_URL`).
- Disable the backend's local worker so jobs aren't consumed on CPU (`START_LOCAL_RQ_WORKER=false`).
- `REDIS_URL=redis://localhost:6379` will NOT work with Modal (localhost resolves inside the Modal container).
```bash
cd src/backend

# Install backend + dev deps (includes Modal CLI)
uv sync
# If you still see `modal: command not found`, force-install the dev group:
# uv sync --group dev

# Auth (pick one)
# Option A: browser/OAuth flow
uv run modal setup
#
# Option B: token flow (Modal dashboard -> Settings -> Tokens)
uv run modal token set --token-id <token-id> --token-secret <token-secret>

# Add your Redis secret (must match backend REDIS_URL).
# Modal containers can't read your local `.env`, and you shouldn't bake secrets into the image.
uv run modal secret create --force pyreflect-redis REDIS_URL="redis://:PASSWORD@YOUR_PUBLIC_REDIS_HOST:6379"

# Deploy (cron polls Redis and spawns a GPU RQ worker only when jobs are pending)
uv run modal deploy modal_worker.py
```

The worker automatically:
- Spins up a T4 GPU when jobs are queued
- Runs the same `service.jobs.run_training_job` code as local workers (progress, results, model uploads)
- Scales down when idle (no cost)
Verify end-to-end:
- Backend: `GET /api/queue` should show `local_worker_enabled: false` and `remote_workers_compatible: true`.
- When you enqueue a training job, `queued_jobs` should briefly become `> 0`.
- Modal logs should show `pending=<N>` and then `Starting RQ SimpleWorker ... (burst mode)`:

```bash
cd src/backend
uv run modal app logs pyreflect-worker --timestamps
```

Stop/Undeploy:
```bash
cd src/backend
uv run modal app stop pyreflect-worker
```

If your Redis runs on your own machine, Modal can only reach it if it is reachable from the public internet. That usually means your machine has a public IP (or you set up port forwarding) and Redis is configured to accept remote connections securely.
Minimum checklist (Redis host):
- Configure Redis to listen on a reachable interface (`bind 0.0.0.0` or your public NIC) and require auth (`requirepass` or ACLs).
- Open a firewall / router port-forward for TCP `6379` to the Redis host.
- Confirm connectivity from outside your network: `redis-cli -h <public-host> -a <password> ping` (should return `PONG`).
If you can't safely expose Redis publicly, use a managed Redis (Upstash / Redis Cloud) and point both the backend and Modal at it.
No. `uv run modal deploy ...` deploys the Modal app to Modal's infrastructure, where it runs independently. Starting uvicorn only starts the API server.
`modal deploy` registers your functions + schedule. In this project, the GPU worker is spawned by `poll_queue` on a cron (`* * * * *`).
To start immediately (for testing), run the poller once:
```bash
cd src/backend
uv run modal run modal_worker.py::poll_queue
```

If a port is already in use:

```bash
# Kill process on port 8000
lsof -ti:8000 | xargs kill -9

# Kill process on port 3000
lsof -ti:3000 | xargs kill -9
```

To deploy with resource limits (prevents abuse):
Option 1: Environment variable
```bash
PRODUCTION=true uv run uvicorn main:app --port 8000
```

Option 2: Create a `.env` file in `src/backend/`
```
# .env
PRODUCTION=true

# CORS (comma-separated origins)
CORS_ORIGINS=http://localhost:3000,https://your-app.vercel.app

# Redis queue (required for background jobs in the UI)
REDIS_URL=redis://localhost:6379
RQ_JOB_TIMEOUT=2h

# Disable local worker if using Modal/remote GPU workers
START_LOCAL_RQ_WORKER=false

# Optional: enable history + model downloads
#MONGODB_URI=mongodb+srv://...
#HF_TOKEN=hf_...
#HF_REPO_ID=your-username/pyreflect-models

# Optional: override individual limits
MAX_CURVES=5000
MAX_EPOCHS=50
MAX_BATCH_SIZE=64
MAX_CNN_LAYERS=12
MAX_DROPOUT=0.5
MAX_LATENT_DIM=32
MAX_AE_EPOCHS=100
MAX_MLP_EPOCHS=100
```

Then run normally:
```bash
uv run uvicorn main:app --port 8000
```

If you want the backend + Redis on your own machine (and Modal only for GPU), the minimum flow is:
- On the bare-metal host, run Redis and make it reachable from Modal (see "Bare-metal Redis" above).
- Point the backend at that same `REDIS_URL` and disable the local worker:

```bash
cd src/backend
cp .env.example .env
# Edit:
#   REDIS_URL=redis://:PASSWORD@<your-public-host>:6379
#   START_LOCAL_RQ_WORKER=false
uv sync
uv run uvicorn main:app --host 0.0.0.0 --port 8000
```

- Run the frontend either on the same host or locally, pointing it at your backend:
```bash
cd src/interface
NEXT_PUBLIC_API_URL=http://<baremetal-host>:8000 bun dev
```

Note: Modal workers do not share your bare-metal filesystem. If you need model files to persist, configure Hugging Face uploads (`HF_TOKEN`, `HF_REPO_ID`) or another shared storage mechanism.
```bash
cd src/interface
vercel
```

| Variable | Value |
|---|---|
| `NEXT_PUBLIC_API_URL` | `https://your-backend.railway.app` (or wherever the backend is hosted) |
In your backend `.env`, add your Vercel URL:

```
CORS_ORIGINS=http://localhost:3000,https://your-app.vercel.app
```

Production limits:
| Parameter | Local | Production |
|---|---|---|
| Curves | 100,000 | 5,000 |
| Epochs | 1,000 | 50 |
| Batch Size | 512 | 64 |
| CNN Layers | 20 | 12 |
| Dropout | 0.9 | 0.5 |
| Latent Dim | 128 | 32 |
| AE/MLP Epochs | 500 | 100 |
- Adjust parameters in the left sidebar:
  - Film Layers: Add/remove layers; adjust SLD, thickness, roughness
  - Generator: Set the number of curves and layers
  - Training: Configure batch size, epochs, dropout, etc.
- Click GENERATE to compute and visualize:
  - NR Chart: Ground truth (solid) vs Computed (dashed)
  - SLD Profile: Ground truth (solid black) vs Predicted (dashed red)
  - Training Loss: Training and validation loss curves
  - Chi Parameters: Scatter plot of actual vs predicted SLD values
- Tips:
  - Click any numeric value to type a custom number (e.g., 50000 curves)
  - Watch the console for real-time training progress, warnings, and timing
  - Use RESET to restore the example defaults
  - Use COLLAPSE/EXPAND to manage long film layer lists
  - Export individual graphs as CSV or all data as JSON
  - Charts show model predictions compared to ground truth after training
For pretrained models or existing datasets, use the Data & Models section:
You do not need to manually place files in backend folders if you upload through the UI.
Pick the correct role, and the backend stores the file and updates `settings.yml`.
For your files specifically:
- `NR_EXP.npy` → upload as `experimental_nr`
- `nr-5-train.npy` → upload as `nr_train`
- `sld-5-train.npy` → upload as `sld_train`
- `trained_nr_sld_model_no_dropout.pt` → upload as `nr_sld_model`
- (optional) `normalization_stat.npy` → upload as `normalization_stats`
Role mapping (what goes where):
| Upload role | Expected content | Stored on disk | Updated settings.yml key |
|---|---|---|---|
| `nr_train` | NR training curves (`.npy`) | `src/backend/data/curves/` | `nr_predict_sld.file.nr_train` |
| `sld_train` | SLD training curves (`.npy`) | `src/backend/data/curves/` | `nr_predict_sld.file.sld_train` |
| `experimental_nr` | Experimental NR curves (`.npy`) | `src/backend/data/expt/` | `nr_predict_sld.file.experimental_nr_file` |
| `nr_sld_model` | NR→SLD weights (`.pth` / `.pt`) | `src/backend/data/models/` | `nr_predict_sld.models.model` |
| `normalization_stats` | Normalization stats (`.npy` / `.npz` / `.json`) | `src/backend/data/` (saved canonically as `.npy`) | `nr_predict_sld.models.normalization_stats` |
| `sld_chi_experimental_profile` | Experimental SLD profile (`.npy`) | `src/backend/data/` | `sld_predict_chi.file.model_experimental_sld_profile` |
| `sld_chi_model_sld_file` | SLD→Chi SLD training file (`.npy`) | `src/backend/data/` | `sld_predict_chi.file.model_sld_file` |
| `sld_chi_model_chi_params_file` | SLD→Chi chi-params file (`.npy`) | `src/backend/data/` | `sld_predict_chi.file.model_chi_params_file` |
Shape handling and canonicalization:
- Canonical schema:
  - `nr_train`: `(N, 2, 308)`
  - `experimental_nr`: `(N, 2, 308)`
  - `sld_train`: `(N, 2, 900)`
- Accepted raw variants include `(N, 2, L)`, `(2, L)`, `(L, 2)`, and NR `(L, 3)` / `(3, L)`.
  - For 3-channel NR inputs, the third channel is treated as uncertainty/error and dropped.
  - This is why you can start with 3 values per point (`q, R, dR`) and end with 2 channels (`q, R`) in canonical data.
- Hard checks run before train/infer:
  - minimum point count
  - finite values only (no NaN/Inf)
  - strict NR q-range gate: `[0.0081, 0.1975]` (out-of-range is rejected)
- If an `experimental_nr` upload fails q-range validation, the UI console adds a targeted warning: `warning: experimental_nr q must stay in [0.0081, 0.1975]. Crop out-of-range rows, then re-upload.`
- Curves are resampled to fixed grids (`308` NR, `900` SLD).
- NR preprocessing remains training-compatible: `log10(clip(R, 1e-8))`.
Conceptual Notes:
- Why force `308` points?
  The NR model was trained to read exactly `308` input points, always in the same order on the q-axis. If the input length changes, the model no longer sees the structure it learned.
- Why is there a q-range limit (`0.0081` to `0.1975`)?
  Training data used that q window, and normalization/preprocessing were built for the same window. Staying in range keeps inference consistent with training.
- Why not just use my file's max q (for example `0.277`)?
  Then each input index maps to a different q position than the model expects. That is a domain mismatch and can degrade predictions silently.
- What does auto-crop do?
  It keeps only rows where `q` is inside `[0.0081, 0.1975]`, drops the rest, then resamples to `308` points. This preserves model compatibility but intentionally discards out-of-range information.
- Why go from 3 values to 2 values for NR?
  Raw experimental files may contain `(q, R, dR)`. The canonical model input is `(q, R)` only, so `dR` (uncertainty) is dropped for this NR→SLD model path.
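The crop-and-resample behavior can be sketched in pure Python; the backend's actual implementation is not shown here (it presumably uses NumPy), and the linear interpolation onto the fixed grid is an assumption:

```python
# Pure-Python sketch of auto-crop + resample. Linear interpolation onto
# the fixed grid is an assumption about the backend's behavior.

Q_MIN, Q_MAX, N_POINTS = 0.0081, 0.1975, 308

def canonicalize_nr(rows):
    """rows: (q, R) or (q, R, dR) tuples -> 308 (q, R) pairs on the grid."""
    # Crop out-of-range rows and drop the dR channel if present
    kept = sorted((row[0], row[1]) for row in rows if Q_MIN <= row[0] <= Q_MAX)
    qs = [q for q, _ in kept]
    Rs = [R for _, R in kept]

    def interp(q):
        if q <= qs[0]:
            return Rs[0]
        if q >= qs[-1]:
            return Rs[-1]
        k = next(i for i in range(1, len(qs)) if qs[i] >= q)
        t = (q - qs[k - 1]) / (qs[k] - qs[k - 1])
        return Rs[k - 1] + t * (Rs[k] - Rs[k - 1])

    step = (Q_MAX - Q_MIN) / (N_POINTS - 1)
    return [(Q_MIN + i * step, interp(Q_MIN + i * step))
            for i in range(N_POINTS)]

# Three-column input; the q = 0.277 row is outside the window and is dropped
raw = [(0.0081, 1.0, 0.01), (0.1975, 0.5, 0.02), (0.277, 0.1, 0.02)]
canon = canonicalize_nr(raw)
```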
Operational notes:
- Each upload writes a local conversion report to `src/backend/data/upload_reports/`.
- If Hugging Face storage is configured, upload lineage is published as `uploads/{user_or_anonymous}/{upload_id}/{role}/...`, containing raw + canonical + report.
Which files are required depends on workflow/mode:
- `workflow=nr_sld`, `mode=train`: `nr_train`, `sld_train` (+ `nr_sld_model` and `normalization_stats` only if auto-generate is disabled)
- `workflow=nr_sld`, `mode=infer`: `experimental_nr`, `nr_sld_model`, `normalization_stats`
- `workflow=nr_sld_chi`, `mode=train`: `nr_train`, `sld_train`, `sld_chi_model_sld_file`, `sld_chi_model_chi_params_file` (+ optional model/stats as above)
- `workflow=nr_sld_chi`, `mode=infer`: `experimental_nr`, `nr_sld_model`, `normalization_stats`, `sld_chi_model_sld_file`, `sld_chi_model_chi_params_file`
- `workflow=sld_chi`: `sld_chi_experimental_profile`, `sld_chi_model_sld_file`, `sld_chi_model_chi_params_file`
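A hypothetical helper that mirrors this matrix; the role sets are transcribed from the list above, but the function itself is illustrative, not a backend API:

```python
# Hypothetical helper mirroring the required-roles matrix; the role sets
# are transcribed from the docs, the function itself is illustrative.

REQUIRED_ROLES = {
    ("nr_sld", "train"): {"nr_train", "sld_train"},
    ("nr_sld", "infer"): {"experimental_nr", "nr_sld_model",
                          "normalization_stats"},
    ("nr_sld_chi", "train"): {"nr_train", "sld_train",
                              "sld_chi_model_sld_file",
                              "sld_chi_model_chi_params_file"},
    ("nr_sld_chi", "infer"): {"experimental_nr", "nr_sld_model",
                              "normalization_stats",
                              "sld_chi_model_sld_file",
                              "sld_chi_model_chi_params_file"},
    ("sld_chi", None): {"sld_chi_experimental_profile",
                        "sld_chi_model_sld_file",
                        "sld_chi_model_chi_params_file"},
}

def missing_roles(workflow, mode, uploaded):
    """Return the upload roles still needed for this workflow/mode."""
    return sorted(REQUIRED_ROLES[(workflow, mode)] - set(uploaded))

missing = missing_roles("nr_sld", "infer", {"experimental_nr"})
```

Note the sketch omits the conditional extras for the train modes (model/stats become required only when auto-generate is disabled).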
Train mode model/stats behavior:
- Auto-generate model + stats: trains a fresh NR→SLD model and writes fresh normalization stats.
- Reuse existing model + stats: skips NR→SLD retraining and uses the currently configured/uploaded model + stats paths.
- First run, model-only bootstrap (with reuse enabled): if normalization stats are missing, the backend derives stats once from `nr_train` / `sld_train`, then reuses the uploaded model.
  - This still requires `nr_train` and `sld_train` to be present.
Hugging Face quick access:
- In the UI Required Uploads panel, a quick link appears (when `HF_REPO_ID` is configured) to browse model artifacts: `https://huggingface.co/datasets/{HF_REPO_ID}/tree/main/models`
- Frontend: Next.js 16, React 19, TypeScript, Recharts
- Backend: FastAPI, Pydantic, NumPy
- ML Package: pyreflect (PyTorch, refl1d, refnx)
- pyreflect - NR-SCFT-ML package by Yuqing Qiao
- Based on research by Brian Qu, Dr. Rajeev Kumar, Prof. Miguel Fuentes-Cabrera