.. _manifest_generation:

Generate a manifest
===================

A **manifest** is a structured file containing metadata that adheres to a specific data model. This page covers different ways to generate a manifest.

Prerequisites
-------------

**Before Using the Schematic CLI**

- **Install and Configure Schematic**: Ensure you have installed `schematic` and set up its dependencies. See the :ref:`installation` section for more details.
- **Understand Important Concepts**: Familiarize yourself with the key concepts outlined on the :ref:`index` of the documentation.
- **Configuration File**: Learn more about each attribute in the configuration file by referring to the relevant documentation.

**Using the Schematic API in Production**

Visit the **Schematic API (Production Environment)**. This opens the **Swagger UI**, where you can explore all available API endpoints.

Run help command
----------------

To learn about the subcommands available for manifest generation, run:

.. code-block:: bash

   schematic manifest -h

To learn about all the options available for manifest generation, run:

.. code-block:: bash

   schematic manifest --config path/to/config.yml get -h

Generate an empty manifest
---------------------------

Option 1: Use the CLI
~~~~~~~~~~~~~~~~~~~~~

You can generate a manifest by running the following command:

.. code-block:: bash

   schematic manifest -c /path/to/config.yml get -dt <data_type> -s

- **-c /path/to/config.yml**: Specifies the configuration file containing your data model location.
- **-dt <data_type>**: Defines the data type for the manifest (e.g., `"Patient"`, `"Biospecimen"`).
- **-s**: Generates a manifest as a Google Sheet.

To generate the manifest as an Excel spreadsheet instead, run:

.. code-block:: bash

   schematic manifest -c /path/to/config.yml get -dt <data_type> --output-xlsx

To generate the manifest as a CSV file, run:

.. code-block:: bash

   schematic manifest -c /path/to/config.yml get -dt <data_type> --output-csv

Option 2: Use the API
~~~~~~~~~~~~~~~~~~~~~

1. Visit the ``manifest/generate`` endpoint.
2. Click "Try it out" to enable input fields.
3. Enter the following parameters and execute the request:

   - **schema_url**: The URL of your data model.

     - If your data model is hosted on **GitHub**, the URL should follow this format:

       - JSON-LD: `https://raw.githubusercontent.com//data-model.jsonld`
       - CSV: `https://raw.githubusercontent.com//data-model.csv`

   - **data_type**: The data type or schema model for your manifest (e.g., `"Patient"`, `"Biospecimen"`).

     - You can specify multiple data types or enter `"all manifests"` to generate manifests for all available data types.

   - **output_format**: The desired format for the generated manifest. Options include `"excel"` or `"google_sheet"`.

This will generate a manifest directly from the API.

Generate a manifest using a dataset on Synapse
----------------------------------------------

Option 1: Use the CLI
~~~~~~~~~~~~~~~~~~~~~~

.. note::

   See the :ref:`installation` section for details on obtaining Synapse credentials and setting up the Synapse configuration file.

The **top-level dataset** can be either an empty folder or a folder containing files. Here is an example of a top-level dataset:

.. code-block:: text

   syn12345678/
   ├── sample1.fastq
   ├── sample2.fastq
   └── sample3.fastq

Here you should use syn12345678 to generate a manifest.

Here is another example of a top-level dataset, this time with subfolders:

.. code-block:: text

   syn12345678/
   ├── subfolder1/
   │   ├── sample1.fastq
   │   └── sample2.fastq
   └── subfolder2/
       ├── sample3.fastq
       └── sample4.fastq

Here you should also use syn12345678 to generate a manifest:

.. code-block:: bash

   schematic manifest -c /path/to/config.yml get -dt <data_type> -s -d <dataset_id>

- **-c /path/to/config.yml**: Specifies the configuration file containing the data model location and asset view (`master_fileview_id`).
- **-dt <data_type>**: Defines the data type/schema model for the manifest (e.g., `"Patient"`, `"Biospecimen"`).
- **-d <dataset_id>**: Retrieves the existing manifest associated with a specific dataset on Synapse.

Option 2: Use the API
~~~~~~~~~~~~~~~~~~~~~~

To generate a manifest using the **Schematic API**, follow these steps:

1. Visit the ``manifest/generate`` endpoint.
2. Click **"Try it out"** to enable input fields.
3. Enter the required parameters and execute the request:

   - **schema_url**: The URL of your data model.

     - If your data model is hosted on **GitHub**, the URL should follow this format:

       - JSON-LD: `https://raw.githubusercontent.com//data-model.jsonld`
       - CSV: `https://raw.githubusercontent.com//data-model.csv`

   - **output_format**: The desired format for the generated manifest.

     - Options include `"excel"` or `"google_sheet"`.

   - **data_type**: The data type or schema model for your manifest (e.g., `"Patient"`, `"Biospecimen"`).

     - You can specify multiple data types or enter `"all manifests"` to generate manifests for all available data types.

   - **dataset_id**: The **top-level Synapse dataset ID**.

     - This can be a **Synapse Project ID** or a **Folder ID**.

   - **asset_view**: The **Synapse ID of the fileview** containing the top-level dataset for which you want to generate a manifest.

Generate a manifest using a dataset on Synapse and pull annotations
--------------------------------------------------------------------

.. note::

   When you pull annotations from Synapse, the existing metadata (annotations) associated with files or folders in a Synapse dataset is automatically retrieved and pre-filled into the generated manifest. This saves time and keeps the manifest consistent with the Synapse dataset.

For example, consider the following dataset:

.. code-block:: text

   syn12345678/
   ├── file1.txt
   ├── file2.txt
   └── file3.txt

The corresponding annotations might look like this:

- **file1.txt**

  - Annotation Key: `species`
  - Annotation Value: `test1`

- **file2.txt**

  - Annotation Key: `species`
  - Annotation Value: `test2`

- **file3.txt**

  - Annotation Key: `species`
  - Annotation Value: `test3`

When annotation pulling is enabled, the generated manifest includes the annotations above.

Option 1: Use the CLI
~~~~~~~~~~~~~~~~~~~~~~

.. note::

   Ensure your **Synapse credentials** are configured before running the command. You can obtain a **personal access token** from Synapse.

The **top-level dataset** can be either an empty folder or a folder containing files.

.. code-block:: bash

   schematic manifest -c /path/to/config.yml get -dt <data_type> -s -d <dataset_id> -a

- **-c /path/to/config.yml**: Specifies the configuration file containing the data model location and asset view (`master_fileview_id`).
- **-a**: Pulls annotations from Synapse and fills out the manifest with the annotations.
- **-dt <data_type>**: Defines the data type/schema model for the manifest (e.g., `"Patient"`, `"Biospecimen"`).
- **-d <dataset_id>**: Retrieves the existing manifest associated with a specific dataset on Synapse.

Option 2: Use the API
~~~~~~~~~~~~~~~~~~~~~~

To generate a manifest using the **Schematic API**, follow these steps:

1. Visit the ``manifest/generate`` endpoint.
2. Click **"Try it out"** to enable input fields.
3. Enter the required parameters and execute the request:

   - **schema_url**: The URL of your data model.

     - If your data model is hosted on **GitHub**, the URL should follow this format:

       - JSON-LD: `https://raw.githubusercontent.com//data-model.jsonld`
       - CSV: `https://raw.githubusercontent.com//data-model.csv`

   - **output_format**: The desired format for the generated manifest.

     - Options include `"excel"` or `"google_sheet"`.

   - **data_type**: The data type or schema model for your manifest (e.g., `"Patient"`, `"Biospecimen"`).

     - You can specify multiple data types or enter `"all manifests"` to generate manifests for all available data types.

   - **dataset_id**: The **top-level Synapse dataset ID**.

     - This can be a **Synapse Project ID** or a **Folder ID**.

   - **asset_view**: The **Synapse ID of the fileview** containing the top-level dataset for which you want to generate a manifest.
   - **use_annotations**: A boolean value that determines whether to pull annotations from Synapse and fill out the manifest with the annotations.

     - Set this value to `true` to pull annotations.
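
The API parameters described on this page can also be assembled into a request URL outside the Swagger UI. The following is a minimal sketch, not the official client. It assumes the endpoint accepts these parameters as a query string, and it uses a hypothetical ``BASE_URL``, a hypothetical data model URL, and a hypothetical asset view ID (``syn23456789``) that you would replace with your own values. Only the parameter names and the ``manifest/generate`` path come from this page.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical placeholder: replace with the base URL of your Schematic API deployment.
BASE_URL = "https://schematic.example.org/v1"


def build_manifest_url(schema_url, data_type, output_format="excel",
                       dataset_id=None, asset_view=None, use_annotations=False):
    """Assemble a manifest/generate request URL from the parameters on this page."""
    params = {
        "schema_url": schema_url,
        "data_type": data_type,
        "output_format": output_format,
    }
    # dataset_id and asset_view only apply when generating from a Synapse dataset.
    if dataset_id is not None:
        params["dataset_id"] = dataset_id
    if asset_view is not None:
        params["asset_view"] = asset_view
    # Only send use_annotations when annotations should be pulled.
    if use_annotations:
        params["use_annotations"] = "true"
    # urlencode percent-escapes the values, including the nested schema URL.
    return f"{BASE_URL}/manifest/generate?{urlencode(params)}"


url = build_manifest_url(
    schema_url="https://example.org/data-model.jsonld",  # hypothetical data model URL
    data_type="Patient",
    output_format="google_sheet",
    dataset_id="syn12345678",   # top-level dataset from the examples on this page
    asset_view="syn23456789",   # hypothetical fileview ID
    use_annotations=True,
)
print(url)

# Decode the query string again to inspect what would be sent.
print(parse_qs(urlsplit(url).query))
```

Omitting ``dataset_id``, ``asset_view``, and ``use_annotations`` yields the request for an empty manifest. Authentication is deliberately left out of this sketch.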