Update proteomics example data with collection sites and update all tutorials accordingly by enryH · Pull Request #62 · Multiomics-Analytics-Group/acore

enryH · 2026-03-03T14:39:08Z

Summary

Add collection sites to proteomics dataset.

List of changes proposed in this PR (pull-request)

update proteomics creation script (includes preprocessing)
update all tutorials to use new curated dataset

Checks

Look at CONTRIBUTING.md for guidance on how to contribute.

- needs to be set to main before merge this time.

- update how group renaming is used

Copilot

Pull request overview

Updates the Alzheimer proteomics example dataset to include a collection-site field and aligns documentation/tutorial notebooks to use the newly curated combined dataset for downstream examples.

Changes:

Pin pingouin to <0.6.0 due to upstream breaking column-name changes.
Extend Alzheimer proteomics example-data generation to add collection_site from metadata and persist it in the combined dataset.
Update API example notebooks/scripts to load the curated combined dataset from this repo and account for the new collection_site column.

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 7 comments.

Show a summary per file

| File | Description |
| --- | --- |
| `pyproject.toml` | Pins `pingouin` to avoid breaking changes. |
| `docs/example_data/alzheimer_proteomics.py` | Adds metadata load and creates `collection_site` column in the combined dataset. |
| `docs/example_data/alzheimer_proteomics.ipynb` | Notebook equivalent of the example-data update (adds `collection_site`). |
| `docs/api_examples/normalization_analysis.py` | Switches example to curated combined dataset and updates plotting defaults. |
| `docs/api_examples/normalization_analysis.ipynb` | Notebook equivalent of the normalization example update. |
| `docs/api_examples/diff_regulation_anova_ttest_two_groups.py` | Points to curated dataset and drops `collection_site` for numeric-only analysis. |
| `docs/api_examples/diff_regulation_anova_ttest_two_groups.ipynb` | Notebook equivalent of the two-group diff regulation update. |
| `docs/api_examples/diff_regulation_ancova.py` | Points to curated dataset and drops `collection_site` for numeric-only analysis. |
| `docs/api_examples/diff_regulation_ancova.ipynb` | Notebook equivalent of the ANCOVA example update. |
| `docs/api_examples/batch_correction.py` | Updates batch-correction example to use `collection_site` as the batch variable and the curated dataset. |
| `docs/api_examples/batch_correction.ipynb` | Notebook equivalent of the batch-correction example update. |
| `.github/workflows/cicd.yml` | Minor formatting + sets matrix `fail-fast: false`. |
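
Several of the updated examples drop `collection_site` before numeric-only analysis. A minimal sketch of that step with pandas, assuming a combined table that holds one string-valued `collection_site` column next to numeric protein intensities. The sample IDs, protein column names, and values below are made up for illustration and are not taken from the curated dataset:

```python
import pandas as pd

# Hypothetical stand-in for the combined dataset (not the real data).
combined = pd.DataFrame(
    {
        "collection_site": ["site_1", "site_2", "site_1"],
        "protein_A": [20.1, 19.8, 21.3],
        "protein_B": [15.4, 16.0, 15.1],
    },
    index=pd.Index(["s1", "s2", "s3"], name="sample_id"),
)

# Keep the site labels aside, e.g. for later use as a batch variable.
site = combined["collection_site"]

# Drop the string column so that only numeric features remain.
omics = combined.drop(columns=["collection_site"])
```

Keeping `site` as a separate Series means the numeric table can go straight into imputation or normalization while the labels stay available for grouping.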


Copilot · 2026-03-03T17:42:36Z

docs/api_examples/batch_correction.py

+PCs, fig = run_and_plot_pca(standard_normalize(X), y, n_components=4)
+ax = plot_umap(X, y)


After batch correction, PCA is run on standard_normalize(X) but UMAP is plotted from raw X. Since run_umap expects scaled input, the UMAP plot should likely use the same standardized data to match the PCA view (or the variable/function naming should be adjusted if unscaled is intended).

Suggested change

-PCs, fig = run_and_plot_pca(standard_normalize(X), y, n_components=4)
-ax = plot_umap(X, y)
+X_scaled = standard_normalize(X)
+PCs, fig = run_and_plot_pca(X_scaled, y, n_components=4)
+ax = plot_umap(X_scaled, y)
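
To illustrate why the same standardized matrix should feed both views, here is a small NumPy sketch. It does not use acore: `zscore` is a hypothetical stand-in for `standard_normalize` (whose exact behavior is not shown in this PR), and `first_pc` is a plain SVD-based first principal axis rather than `run_and_plot_pca`. The data are synthetic, with one feature on a much larger scale than the other:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: feature 0 has a ~1000x larger spread than feature 1.
X = np.column_stack([rng.normal(0, 1000, 200), rng.normal(0, 1, 200)])


def zscore(X):
    """Center each column and scale it to unit sample standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def first_pc(X):
    """Return the first principal axis via SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]


# Standardize once and reuse the result for every downstream projection.
X_scaled = zscore(X)

pc_raw = first_pc(X)            # dominated by the large-scale feature
pc_scaled = first_pc(X_scaled)  # both features enter on the same scale
```

On the raw matrix the first axis points almost entirely along the large-scale feature, so a plot built from `X` and one built from `X_scaled` are not comparable.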

Copilot · 2026-03-03T17:42:37Z

docs/example_data/alzheimer_proteomics.py

    "https://raw.githubusercontent.com/RasmussenLab/njab/"
    "HEAD/docs/tutorial/data/alzheimer/"
 )
+META: str = "meta.csv"  # clincial data


Typo in comment: “clincial” should be “clinical”.

Suggested change

-META: str = "meta.csv"  # clincial data
+META: str = "meta.csv"  # clinical data

Copilot · 2026-03-03T17:42:37Z

docs/example_data/alzheimer_proteomics.ipynb

    "    \"https://raw.githubusercontent.com/RasmussenLab/njab/\"\n",
    "    \"HEAD/docs/tutorial/data/alzheimer/\"\n",
    ")\n",
+    "META: str = \"meta.csv\"  # clincial data\n",


Typo in comment string: “clincial” should be “clinical”.

Suggested change

-    "META: str = \"meta.csv\"  # clincial data\n",
+    "META: str = \"meta.csv\"  # clinical data\n",

Copilot · 2026-03-03T17:42:38Z

docs/api_examples/normalization_analysis.py

 # jupyter:
 #   jupytext:
-#     cell_metadata_filter: tags,-all
+#     cell_data_filter: tags,-all


The Jupytext header uses cell_data_filter, but the rest of the repo’s Jupytext headers use cell_metadata_filter. If this key is not recognized, tags like hide-input/hide-output may not round-trip correctly when syncing notebooks. Use the same cell_metadata_filter key here for consistency and compatibility.

Suggested change

-#     cell_data_filter: tags,-all
+#     cell_metadata_filter: tags,-all
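
A key slip like this can be caught with a small check. The sketch below is a hypothetical helper, not part of Jupytext or this repo: it scans a commented YAML header for keys ending in `_filter` and flags any that are not one of the two filter options Jupytext documents (`cell_metadata_filter` and `notebook_metadata_filter`). Whether an unknown key is silently ignored or rejected by a given Jupytext version is not verified here.

```python
import re

# Filter options documented by Jupytext; any other *_filter key is flagged.
KNOWN_FILTER_KEYS = {"cell_metadata_filter", "notebook_metadata_filter"}


def unknown_filter_keys(header: str) -> list[str]:
    """Return *_filter keys in a commented YAML header that are not known."""
    keys = re.findall(r"^#\s*(\w+_filter)\s*:", header, flags=re.MULTILINE)
    return [key for key in keys if key not in KNOWN_FILTER_KEYS]


# Header fragment as it appears in the diff under review.
header = """\
# ---
# jupyter:
#   jupytext:
#     cell_data_filter: tags,-all
# ---
"""

flagged = unknown_filter_keys(header)
```

Here `flagged` contains `cell_data_filter`, while the same header with `cell_metadata_filter` yields an empty list.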

Copilot · 2026-03-03T17:42:38Z

docs/api_examples/normalization_analysis.ipynb

 "metadata": {
  "jupytext": {
-   "cell_metadata_filter": "tags,-all"
+   "cell_data_filter": "tags,-all"


Notebook metadata sets jupytext.cell_data_filter, but other notebooks in this repo use jupytext.cell_metadata_filter. If cell_data_filter isn’t supported by your Jupytext tooling, tag filtering may be ignored when converting/syncing. Consider switching back to cell_metadata_filter for consistency with the rest of the docs.

Suggested change

-   "cell_data_filter": "tags,-all"
+   "cell_metadata_filter": "tags,-all"

Copilot · 2026-03-03T17:42:38Z

docs/api_examples/batch_correction.py

 # %%time
 X = median_impute(omics)
 X = acore.batch_correction.combat_batch_correction(
-    X.join(y),
-    batch_col="site",
+    X.join(y.astype("category")),
+    batch_col=y.name,
 )


batch_col=y.name will break if group_label is set to None, because Series.rename(None) leaves y.name as None. Given the parameter is typed Optional and documented as “optional rename”, consider ensuring y always has a non-empty name (e.g., fall back to group) and/or passing batch_col from the known group/group_label variable instead of y.name.
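
The fallback described above could look like the following pandas sketch. `label_groups` is a hypothetical helper, and the `group` / `group_label` names are assumed from the comment rather than copied from the tutorial code; the metadata values are made up:

```python
from typing import Optional

import pandas as pd


def label_groups(
    meta: pd.DataFrame, group: str, group_label: Optional[str] = None
) -> pd.Series:
    """Select the grouping column and make sure it ends up with a usable name.

    Falls back to the original column name when no (or an empty) label is given.
    """
    name = group_label or group
    return meta[group].rename(name)


# Hypothetical metadata table for illustration only.
meta = pd.DataFrame(
    {"collection_site": ["site_1", "site_2", "site_1"]},
    index=["s1", "s2", "s3"],
)

# The failure mode from the review comment: the name is lost entirely.
y_unnamed = meta["collection_site"].rename(None)

# With the fallback, y.name is always a valid column name for batch_col.
y = label_groups(meta, "collection_site", group_label=None)
batch_col = y.name
```

With this in place, `batch_col` is a string in both the renamed and the non-renamed case, so it can be used as a column name after joining `y` onto the feature table.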

Copilot · 2026-03-03T17:42:39Z

docs/api_examples/batch_correction.py

 omics_imp = median_impute(omics)
 omics_imp_scaled = standard_normalize(omics_imp)
-PCs, fig = run_and_plot_pca(omics_imp, y, METACOL_LABEL, n_components=4)
-ax = plot_umap(omics_imp, y, METACOL_LABEL)
-
+PCs, fig = run_and_plot_pca(omics_imp, y, n_components=4)
+ax = plot_umap(omics_imp, y)


omics_imp_scaled is computed but not used; run_and_plot_pca/plot_umap are instead called with the unscaled omics_imp. Since acore.decomposition.umap.run_umap explicitly expects scaled input (X_scaled), this likely produces inconsistent plots. Either use omics_imp_scaled in the PCA/UMAP calls or drop the unused scaling step to avoid confusion.

enryH added 11 commits March 3, 2026 12:55

🚧 add collection site to alzheimer data

a72a41c

🚧 update to fetch data from this branch

b22cf50

- needs to be set to main before merge this time.

✨ update example to use the curated proteomcis dataset

b2fc54b

🎨 set and use relabeled grouping variable

13a545a

✨ use curated dataset (less verbose preprocessing)

e08c697

- update how group renaming is used

🐛 update alzheimer ipynb

277da7d

🐛 update metadata

6191675

🐛 only delete once

bbb76a6

🐛 see if all version fail

d40c188

🔧 use older pingouin version

911924f

🐛 collection site is a string variable

07b9eaf

enryH marked this pull request as ready for review March 3, 2026 17:33

💥 let action fail, but cicd on main work

d9e525e

enryH requested a review from Copilot March 3, 2026 17:37

Copilot started reviewing on behalf of enryH March 3, 2026 17:37

enryH changed the title ~~Update batch corr example~~ Update proteomics example data with collection sites and update all tutorials accordingly Mar 3, 2026

enryH merged commit 32e389a into main Mar 3, 2026
13 of 14 checks passed

enryH deleted the update_batch_corr_example branch March 3, 2026 17:42

enryH merged 12 commits into main from update_batch_corr_example
