ImagingDataCommons
diff --git a/SUMMARY.md b/SUMMARY.md
Lines changed: 6 additions & 6 deletions
diff --git a/data/data-versioning.md b/data/data-versioning.md
Lines changed: 1 addition & 1 deletion
diff --git a/data/organization-of-data/README.md b/data/organization-of-data/README.md
Lines changed: 3 additions & 3 deletions
diff --git a/data/organization-of-data/deprecated-functionality/organization-of-data-v1.md b/data/organization-of-data/organization-of-data-v1.md
data/organization-of-data/deprecated-functionality/organization-of-data-v1.md renamed to data/organization-of-data/organization-of-data-v1.md
Lines changed: 1 addition & 1 deletion
diff --git a/data/organization-of-data/deprecated-functionality/organization-of-data-v2-through-v13-deprecated/README.md b/data/organization-of-data/organization-of-data-v2-through-v13-deprecated/README.md
data/organization-of-data/deprecated-functionality/organization-of-data-v2-through-v13-deprecated/README.md renamed to data/organization-of-data/organization-of-data-v2-through-v13-deprecated/README.md
diff --git a/data/organization-of-data/deprecated-functionality/organization-of-data-v2-through-v13-deprecated/clinical.md b/data/organization-of-data/organization-of-data-v2-through-v13-deprecated/clinical.md
data/organization-of-data/deprecated-functionality/organization-of-data-v2-through-v13-deprecated/clinical.md renamed to data/organization-of-data/organization-of-data-v2-through-v13-deprecated/clinical.md
Lines changed: 8 additions & 8 deletions
@@ -28,14 +28,14 @@
 * [Data versioning](data/data-versioning.md)
 * [Organization of data](data/organization-of-data/README.md)
   * [Files and metadata](data/organization-of-data/files-and-metadata.md)
-  * [Resolving CRDC Globally Unique Identifiers (GUIDs)](data/organization-of-data/guids-and-uuids.md)
   * [Clinical data](data/organization-of-data/clinical.md)
+  * [Resolving CRDC Globally Unique Identifiers (GUIDs)](data/organization-of-data/guids-and-uuids.md)
   * [Deprecated functionality](data/organization-of-data/deprecated-functionality/README.md)
-    * [Organization of data in v1 (deprecated)](data/organization-of-data/deprecated-functionality/organization-of-data-v1.md)
-    * [Organization of data, v2 through V13 (deprecated)](data/organization-of-data/deprecated-functionality/organization-of-data-v2-through-v13-deprecated/README.md)
-      * [Files and metadata](data/organization-of-data/deprecated-functionality/organization-of-data-v2-through-v13-deprecated/files-and-metadata.md)
-      * [Resolving CRDC Globally Unique Identifiers (GUIDs)](data/organization-of-data/deprecated-functionality/organization-of-data-v2-through-v13-deprecated/guids-and-uuids.md)
-      * [Clinical data](data/organization-of-data/deprecated-functionality/organization-of-data-v2-through-v13-deprecated/clinical.md)
+    * [Organization of data in v1 (deprecated)](data/organization-of-data/organization-of-data-v1.md)
+    * [Organization of data, v2 through V13 (deprecated)](data/organization-of-data/organization-of-data-v2-through-v13-deprecated/README.md)
+      * [Files and metadata](data/organization-of-data/organization-of-data-v2-through-v13-deprecated/files-and-metadata.md)
+      * [Resolving CRDC Globally Unique Identifiers (GUIDs)](data/organization-of-data/organization-of-data-v2-through-v13-deprecated/guids-and-uuids.md)
+      * [Clinical data](data/organization-of-data/organization-of-data-v2-through-v13-deprecated/clinical.md)
 * [Downloading data](data/downloading-data/README.md)
   * [Downloading data with s5cmd](data/downloading-data/downloading-data-with-s5cmd.md)
   * [Directly loading DICOM objects from Google Cloud or AWS in Python](data/downloading-data/direct-loading.md)
 
@@ -68,4 +68,4 @@ A corollary is that only a single version of an instance, series or study is in
 
 Note that instances, series and studies do not have an explicit version number in their metadata. Versioning of an object is implicit in the associated UUIDs.
 
-As we will see in [Organization of data](organization-of-data/deprecated-functionality/organization-of-data-v1.md), the UUID of a (version of an) instance, and the UUID of the (version of a) series to which it belongs, are used in forming the object (file) name of the corresponding GCS and AWS objects. In addition, each instance version has a corresponding GA4GH DRS object, identified by a GUID based on the instance version's UUID. Refer to the [GA4GH DRS Objects](organization-of-data/deprecated-functionality/organization-of-data-v2-through-v13-deprecated/guids-and-uuids.md) section for details.
+As we will see in [Organization of data](organization-of-data/organization-of-data-v1.md), the UUID of a (version of an) instance, and the UUID of the (version of a) series to which it belongs, are used in forming the object (file) name of the corresponding GCS and AWS objects. In addition, each instance version has a corresponding GA4GH DRS object, identified by a GUID based on the instance version's UUID. Refer to the [GA4GH DRS Objects](organization-of-data/organization-of-data-v2-through-v13-deprecated/guids-and-uuids.md) section for details.
@@ -1,15 +1,15 @@
 # Organization of data
 
-This section describes the current organization of IDC data. The organization of data was static from IDC Version 2  through IDC Version 13 except that [clinical data](deprecated-functionality/organization-of-data-v2-through-v13-deprecated/clinical.md) was added in Version 11. Development of the clinical data resource is an ongoing project. From IDC v14, our data [became available](https://registry.opendata.aws/nci-imaging-data-commons/) from the Amazon AWS Open Data Registry, and the files in storage buckets were organized into series-level folders.
+This section describes the current organization of IDC data. The organization of data was static from IDC Version 2  through IDC Version 13 except that [clinical data](organization-of-data-v2-through-v13-deprecated/clinical.md) was added in Version 11. Development of the clinical data resource is an ongoing project. From IDC v14, our data [became available](https://registry.opendata.aws/nci-imaging-data-commons/) from the Amazon AWS Open Data Registry, and the files in storage buckets were organized into series-level folders.
 
 ### IDC data model
 
 ### [Files and metadata](files-and-metadata.md)
 
 ### [GA4GH DRS objects](https://learn.canceridc.dev/data/organization-of-data/guids-and-uuids)
 
-### [Clinical Data](deprecated-functionality/organization-of-data-v2-through-v13-deprecated/clinical.md)
+### [Clinical Data](organization-of-data-v2-through-v13-deprecated/clinical.md)
 
 ### [Organization of data, v1 through V13 (deprecated)](./#organization-of-data-v1-through-v13-deprecated)
 
-### [Organization of data in v1 (deprecated)](deprecated-functionality/organization-of-data-v1.md)
+### [Organization of data in v1 (deprecated)](organization-of-data-v1.md)
@@ -10,7 +10,7 @@ description: >-
 IDC approach to storage and management of DICOM data is relying on the Google Cloud Platform [Healthcare API](https://cloud.google.com/healthcare/docs/how-tos/dicom). We maintain three representations of the data, which are fully synchronized and correspond to the same dataset, but are intended to serve different use cases.
 
 {% hint style="warning" %}
-In order to access the resources listed below, it is assumed you have completed the ["getting started" steps](../../../introduction/google-cloud-platform/getting-started-with-gcp.md) to access Google Cloud console!
+In order to access the resources listed below, it is assumed you have completed the ["getting started" steps](../../introduction/google-cloud-platform/getting-started-with-gcp.md) to access Google Cloud console!
 {% endhint %}
 
 All of the resources listed below are accessible under the [`canceridc-data` GCP project](https://console.cloud.google.com/home/dashboard?project=canceridc-data).
 
@@ -1,7 +1,7 @@
 # Clinical data
 
 {% hint style="info" %}
-Check out our [IDC clinical data exploration Colab notebook](https://github.com/ImagingDataCommons/IDC-Examples/blob/master/notebooks/clinical\_data\_intro.ipynb) tutorial for a brief hands-on introduction into IDC clinical data!
+Check out our [IDC clinical data exploration Colab notebook](https://github.com/ImagingDataCommons/IDC-Examples/blob/master/notebooks/clinical_data_intro.ipynb) tutorial for a brief hands-on introduction into IDC clinical data!
 {% endhint %}
 
 ### Background
@@ -12,19 +12,19 @@ Clinical data is often critical in understanding imaging data, and is essential
 
 Not only the terms used in the clinical data accompanying individual collection are not harmonized, but the format of the spreadsheets is also collection-specific. In order to search and navigate clinical data, one has to parse those collection specific tables, and there is no interface to support searching across collections.
 
-With the release v11 of IDC, we make the attempt to lower the barriers for accessing clinical data accompanying IDC imaging collections. We parse collection-specific tables, and organize the underlying data into BigQuery tables that can be accessed using standard SQL queries. You can also see the summary of clinical data available for IDC collections in [this dashboard](https://datastudio.google.com/u/0/reporting/04cf5976-4ea0-4fee-a749-8bfd162f2e87/page/p\_s7mk6eybqc).
+With the release v11 of IDC, we make the attempt to lower the barriers for accessing clinical data accompanying IDC imaging collections. We parse collection-specific tables, and organize the underlying data into BigQuery tables that can be accessed using standard SQL queries. You can also see the summary of clinical data available for IDC collections in [this dashboard](https://datastudio.google.com/u/0/reporting/04cf5976-4ea0-4fee-a749-8bfd162f2e87/page/p_s7mk6eybqc).
 
 ### Clinical data BigQuery tables
 
-As of Version 11, IDC has provided a public [BigQuery dataset](https://console.cloud.google.com/bigquery?p=bigquery-public-data\&d=idc\_clinical\_current) with clinical data associated with several of its imaging collections. The clinical data tables associated with a particular version are in the dataset `bigquery-public-data.idc_<idc_version_number>_clinical`. In addition the dataset `bigquery-public-data.idc_current_clinical` has an identically named view for each table in the BQ clinical dataset corresponding to the current IDC release.
+As of Version 11, IDC has provided a public [BigQuery dataset](https://console.cloud.google.com/bigquery?p=bigquery-public-data\&d=idc_clinical_current) with clinical data associated with several of its imaging collections. The clinical data tables associated with a particular version are in the dataset `bigquery-public-data.idc_<idc_version_number>_clinical`. In addition the dataset `bigquery-public-data.idc_current_clinical` has an identically named view for each table in the BQ clinical dataset corresponding to the current IDC release.
 
-There are currently 130 tables with clinical data representing 70 different collections. Most of this data was curated from Excel and CSV files downloaded from [The Cancer Imaging Archive (TCIA) wiki](https://wiki.cancerimagingarchive.net/). For most collections data is placed in a single table named `<collection_id>_clinical`, where `<collection_id>` is the name of the collection in a standardized format (i.e. the `idc_webapp_collection_id` column in the `dicom_all` view in the [idc\_current dataset](https://console.cloud.google.com/bigquery?p=bigquery-public-data\&d=idc\_clinical\&page=dataset)).
+There are currently 130 tables with clinical data representing 70 different collections. Most of this data was curated from Excel and CSV files downloaded from [The Cancer Imaging Archive (TCIA) wiki](https://wiki.cancerimagingarchive.net/). For most collections data is placed in a single table named `<collection_id>_clinical`, where `<collection_id>` is the name of the collection in a standardized format (i.e. the `idc_webapp_collection_id` column in the `dicom_all` view in the [idc\_current dataset](https://console.cloud.google.com/bigquery?p=bigquery-public-data\&d=idc_clinical\&page=dataset)).
 
-Collections from the ACRIN project have different types of clinical data spread across CSV files, and so this data is represented by several BigQuery tables. The clinical data for collections in the [CPTAC program](https://proteomics.cancer.gov/programs/cptac) program is not curated from TCIA but instead is copied from a [BigQuery table](https://console.cloud.google.com/bigquery?p=isb-cgc-bq\&d=cptac\&t=clinical\_gdc\_current\&page=table) in the ISB-CGC project, which in turn was sourced from the [Genomics Data Commons (GDC) api](https://gdc.cancer.gov/developers/gdc-application-programming-interface-api). Similarly clinical data for collections in the [TCGA program](https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga) is copied from the table `tcga_clinical_rel9` in the `idc_current` dataset, which was also created using the [Genomics Data Commons (GDC) api](https://gdc.cancer.gov/developers/gdc-application-programming-interface-api). Every clinical data table contains two fields we have introduced, `dicom_patient_id` and `source_batch`. `dicom_patient_id` is identical to the `PatientID` field in the DICOM files that correspond to the given patient. The `dicom_patient_id` value is determined by inspecting the patient column in the clinical data file. In some of the collections' clinical data, the patients are separated into different 'batches' i.e. different source files, or different sheets in the same Excel file. The `source_batch` field is an integer indicating the 'batch' for the given patient. For most collections, in which all patients data is found in the same location, the `source_batch` value is zero.
+Collections from the ACRIN project have different types of clinical data spread across CSV files, and so this data is represented by several BigQuery tables. The clinical data for collections in the [CPTAC program](https://proteomics.cancer.gov/programs/cptac) program is not curated from TCIA but instead is copied from a [BigQuery table](https://console.cloud.google.com/bigquery?p=isb-cgc-bq\&d=cptac\&t=clinical_gdc_current\&page=table) in the ISB-CGC project, which in turn was sourced from the [Genomics Data Commons (GDC) api](https://gdc.cancer.gov/developers/gdc-application-programming-interface-api). Similarly clinical data for collections in the [TCGA program](https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga) is copied from the table `tcga_clinical_rel9` in the `idc_current` dataset, which was also created using the [Genomics Data Commons (GDC) api](https://gdc.cancer.gov/developers/gdc-application-programming-interface-api). Every clinical data table contains two fields we have introduced, `dicom_patient_id` and `source_batch`. `dicom_patient_id` is identical to the `PatientID` field in the DICOM files that correspond to the given patient. The `dicom_patient_id` value is determined by inspecting the patient column in the clinical data file. In some of the collections' clinical data, the patients are separated into different 'batches' i.e. different source files, or different sheets in the same Excel file. The `source_batch` field is an integer indicating the 'batch' for the given patient. For most collections, in which all patients data is found in the same location, the `source_batch` value is zero.
 
-Most of the clinical tables are legible by themselves. Tables from the ACRIN collection are an exception as the column names and some of the column values are coded. To provide for clarity and ease of use of all clinical data, we have created two metadata tables, [`table_metadata`](https://console.cloud.google.com/bigquery?p=bigquery-public-data\&d=idc\_clinical\_current\&t=table\_metadata\&page=table) and [`column_metadata`](https://console.cloud.google.com/bigquery?p=bigquery-public-data\&d=idc\_clinical\_current\&t=column\_metadata\&page=table) that provide information about the structure and provenance of all data in this dataset. `table_metadata` has table-level metadata about each clinical collection, while `column_metadata` has column-level metadata.
+Most of the clinical tables are legible by themselves. Tables from the ACRIN collection are an exception as the column names and some of the column values are coded. To provide for clarity and ease of use of all clinical data, we have created two metadata tables, [`table_metadata`](https://console.cloud.google.com/bigquery?p=bigquery-public-data\&d=idc_clinical_current\&t=table_metadata\&page=table) and [`column_metadata`](https://console.cloud.google.com/bigquery?p=bigquery-public-data\&d=idc_clinical_current\&t=column_metadata\&page=table) that provide information about the structure and provenance of all data in this dataset. `table_metadata` has table-level metadata about each clinical collection, while `column_metadata` has column-level metadata.
 
-Structure of the[`table_metadata`](https://console.cloud.google.com/bigquery?p=bigquery-public-data\&d=idc\_clinical\_current\&t=table\_metadata\&page=table) table:
+Structure of the[`table_metadata`](https://console.cloud.google.com/bigquery?p=bigquery-public-data\&d=idc_clinical_current\&t=table_metadata\&page=table) table:
 
 * `collection_id` (STRING, NULLABLE) - the collection\_id of the collection in the given table. The collection id is in a format used internally by the IDC Web App (with only lowercase letters, numbers and '\_' allowed). It is equivalent to the `idc_webapp_id` field in the `dicom_all` view in the `idc_current` dataset.
 * `table_name` (STRING,NULLABLE) - name of the table
@@ -45,7 +45,7 @@ Structure of the[`table_metadata`](https://console.cloud.google.com/bigquery?p=b
 * `source_info.table_last_modified` (STRING, NULLABLE) - CPTAC and TCGA collections only. The date and time the source BigQuery table was most recently modified, as recorded when last copied
 * `source_info.table_size` (STRING, NULLABLE) - CPTAC and TCGA collections only. The size of the source BigQuery table as recorded when last copied
 
-Structure of [`column_metadata`](https://console.cloud.google.com/bigquery?p=bigquery-public-data\&d=idc\_clinical\_current\&t=column\_metadata\&page=table) table:
+Structure of [`column_metadata`](https://console.cloud.google.com/bigquery?p=bigquery-public-data\&d=idc_clinical_current\&t=column_metadata\&page=table) table:
 
 * `collection_id` (STRING,NULLABLE) - the collection\_id of the collection in the given table. The collection id is in a format used internally by the IDC Web App (with only lowercase letters, numbers and '\_' allowed). It is equivalent to the `idc_webapp_id` field in the `dicom_all` view in the `idc_current` dataset.
 * `case_col` (BOOLEAN, NULLABLE) - true if the BigQuery column contains the patient or case id, i.e. if this column is used to determine the value of the `dicom_patient_id` column
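
Reviewer note: the naming rules described in `clinical.md` (dataset `bigquery-public-data.idc_<idc_version_number>_clinical`, per-collection table `<collection_id>_clinical`, and the `column_metadata` table) can be sketched as a small helper. This is a minimal illustration, not part of the change set. The function names, the form of the version label (for example `v11`), and the collection id `nsclc_radiomics` are assumptions made for the example, and the sketch follows the dataset name given in the prose rather than the one in the console links.

```python
"""Illustrative helpers for IDC clinical table names (hypothetical, not IDC code)."""

PROJECT = "bigquery-public-data"


def clinical_dataset(version_label: str = "current") -> str:
    # Pattern from the text: idc_<idc_version_number>_clinical.
    # "current" selects the dataset of views for the current release.
    # Assumption: the caller passes the label exactly as it appears in the name.
    return f"{PROJECT}.idc_{version_label}_clinical"


def clinical_table(collection_id: str, version_label: str = "current") -> str:
    # For most collections the data sits in a single table
    # named <collection_id>_clinical.
    return f"{clinical_dataset(version_label)}.{collection_id}_clinical"


def column_metadata_query(collection_id: str, version_label: str = "current") -> str:
    # Standard SQL text selecting the column-level metadata rows of one collection.
    # String formatting is used only for illustration; prefer query parameters
    # when the collection id comes from untrusted input.
    table = f"{clinical_dataset(version_label)}.column_metadata"
    return (
        f"SELECT *\n"
        f"FROM `{table}`\n"
        f"WHERE collection_id = '{collection_id}'"
    )


if __name__ == "__main__":
    print(clinical_table("nsclc_radiomics"))
    print(column_metadata_query("nsclc_radiomics"))
```

The generated query text can be pasted into the BigQuery console or passed to a BigQuery client. ACRIN collections are spread over several tables, so `clinical_table` does not cover them.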