Generate a manifest

A manifest is a structured file containing metadata that adheres to a specific data model. This page covers different ways to generate a manifest.

Prerequisites

Before Using the Schematic CLI

Install and Configure Schematic: Ensure you have installed schematic and set up its dependencies. See the Installation section for more details.
Understand Important Concepts: Familiarize yourself with the key concepts outlined on the “Welcome to Schematic’s documentation!” page.
Configuration File: Learn more about each attribute in the configuration file by referring to the relevant documentation.

Using the Schematic API in Production

Visit the Schematic API (Production Environment): https://schematic.api.sagebionetworks.org/v1/ui/#/

This will open the Swagger UI, where you can explore all available API endpoints.

Run help command

To list the subcommands available for manifest generation, run:

schematic manifest -h

To see all the options available for generating a manifest, run:

schematic manifest --config path/to/config.yml get -h

Generate an empty manifest

Option 1: Use the CLI

You can generate a manifest by running the following command:

schematic manifest -c /path/to/config.yml get -dt <your_data_type> -s

-c /path/to/config.yml: Specifies the configuration file containing your data model location.

-dt <your_data_type>: Defines the data type for the manifest (e.g., “Patient”, “Biospecimen”).

-s: Generates a manifest as a Google Sheet.

To generate the manifest as an Excel spreadsheet instead, run:

schematic manifest -c /path/to/config.yml get -dt <your_data_type> --output-xlsx <your-output-manifest-path.xlsx>

To generate the manifest as a CSV file, run:

schematic manifest -c /path/to/config.yml get -dt <your_data_type> --output-csv <your-output-manifest-path.csv>
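
For scripted workflows, the three command variants above can be assembled programmatically. The following is a minimal sketch in Python that only builds the argument list from the flags shown on this page. The function name build_manifest_command, its parameters, and the labels "google_sheet", "excel", and "csv" are hypothetical helpers for illustration and are not part of Schematic.

```python
import shlex

# Flags taken from the commands on this page; the helper itself is hypothetical.
OUTPUT_FLAGS = {"excel": "--output-xlsx", "csv": "--output-csv"}


def build_manifest_command(config_path, data_type, output="google_sheet", output_path=None):
    """Return the argument list for `schematic manifest ... get`."""
    cmd = ["schematic", "manifest", "-c", config_path, "get", "-dt", data_type]
    if output == "google_sheet":
        cmd.append("-s")  # -s requests a Google Sheet
    elif output in OUTPUT_FLAGS:
        if not output_path:
            raise ValueError(f"output_path is required for {output!r} output")
        cmd += [OUTPUT_FLAGS[output], output_path]
    else:
        raise ValueError(f"unknown output: {output!r}")
    return cmd


if __name__ == "__main__":
    cmd = build_manifest_command(
        "/path/to/config.yml", "Patient", "csv", "patient_manifest.csv"
    )
    print(shlex.join(cmd))
    # To execute it, schematic must be installed and configured:
    # import subprocess
    # subprocess.run(cmd, check=True)
```

Returning a list rather than a single string lets the command be passed to subprocess.run without shell quoting concerns.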

Option 2: Use the API

Visit the manifest/generate endpoint.
Click “Try it out” to enable input fields.
Enter the following parameters and execute the request:
- schema_url: The URL of your data model.
  - If your data model is hosted on GitHub, the URL should follow this format:
  - JSON-LD: https://raw.githubusercontent.com/<your-repo-path>/data-model.jsonld
  - CSV: https://raw.githubusercontent.com/<your-repo-path>/data-model.csv
- data_type: The data type or schema model for your manifest (e.g., “Patient”, “Biospecimen”).
  - You can specify multiple data types or enter “all manifests” to generate manifests for all available data types.
- output_format: The desired format for the generated manifest. Options include “excel” or “google_sheet”.

This will generate a manifest directly from the API.
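
The same request can be prepared outside the Swagger UI. The sketch below, in Python, only builds the request URL from the three parameters listed above. It assumes that the endpoint sits under the same /v1/ base as the Swagger UI linked on this page and that it accepts its parameters as a query string; confirm both in the Swagger UI before relying on it. The function name and the example schema URL are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: same /v1/ base as the Swagger UI linked on this page.
BASE_URL = "https://schematic.api.sagebionetworks.org/v1"


def manifest_generate_url(schema_url, data_type, output_format="excel"):
    """Build a manifest/generate URL; parameter names are those listed above."""
    params = {
        "schema_url": schema_url,
        "data_type": data_type,
        "output_format": output_format,  # "excel" or "google_sheet"
    }
    # urlencode percent-encodes the schema URL so it is safe inside a query string.
    return f"{BASE_URL}/manifest/generate?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical data model location, for illustration only.
    print(manifest_generate_url("https://example.org/data-model.jsonld", "Patient"))
```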

Generate a manifest using a dataset on Synapse

Option 1: Use the CLI

Note

See the Installation section for details on how to obtain Synapse credentials and set up the Synapse configuration file.

The top-level dataset can be either an empty folder or a folder containing files.

The following is an example of a top-level dataset:

syn12345678/
├── sample1.fastq
├── sample2.fastq
└── sample3.fastq

Here, use syn12345678 as the dataset ID to generate a manifest.

See another example of a top-level dataset with subfolders:

syn12345678/
├── subfolder1/
│   ├── sample1.fastq
│   └── sample2.fastq
└── subfolder2/
    ├── sample3.fastq
    └── sample4.fastq

Here you should use syn12345678 to generate a manifest.

schematic manifest -c /path/to/config.yml get -dt <your_data_type> -s -d <synapse_dataset_id>

-c /path/to/config.yml: Specifies the configuration file containing the data model location and asset view (master_fileview_id).
-dt <your_data_type>: Defines the data type/schema model for the manifest (e.g., “Patient”, “Biospecimen”).
-d <synapse_dataset_id>: Retrieves the existing manifest associated with a specific dataset on Synapse.
-s: Generates a manifest as a Google Sheet.
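
When this command is generated from a script, it can help to catch a malformed dataset ID before calling Schematic. The Python sketch below checks that the ID looks like the examples on this page (syn followed by digits) and then assembles the command shown above. The helper name and the validation step are assumptions added for illustration; Schematic performs its own checks.

```python
import re

# Synapse IDs on this page have the form "syn" followed by digits, e.g. syn12345678.
SYNAPSE_ID_PATTERN = re.compile(r"syn\d+")


def build_dataset_manifest_command(config_path, data_type, dataset_id):
    """Return the argument list for generating a manifest for a Synapse dataset."""
    if not SYNAPSE_ID_PATTERN.fullmatch(dataset_id):
        raise ValueError(f"not a Synapse ID: {dataset_id!r}")
    return [
        "schematic", "manifest", "-c", config_path, "get",
        "-dt", data_type,
        "-s",              # generate a Google Sheet
        "-d", dataset_id,  # top-level dataset, also when it contains subfolders
    ]


if __name__ == "__main__":
    print(build_dataset_manifest_command("/path/to/config.yml", "Biospecimen", "syn12345678"))
```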

Option 2: Use the API

To generate a manifest using the Schematic API, follow these steps:

Visit the manifest/generate endpoint.
Click “Try it out” to enable input fields.
Enter the required parameters and execute the request:
- schema_url: The URL of your data model.
  - If your data model is hosted on GitHub, the URL should follow this format:
    
    JSON-LD: https://raw.githubusercontent.com/<your-repo-path>/data-model.jsonld
    
    CSV: https://raw.githubusercontent.com/<your-repo-path>/data-model.csv
- output_format: The desired format for the generated manifest.
  - Options include “excel” or “google_sheet”.
- data_type: The data type or schema model for your manifest (e.g., “Patient”, “Biospecimen”).
  - You can specify multiple data types or enter “all manifests” to generate manifests for all available data types.
- dataset_id: The top-level Synapse dataset ID.
  - This can be a Synapse Project ID or a Folder ID.
- asset_view: The Synapse ID of the fileview containing the top-level dataset for which you want to generate a manifest.

Generate a manifest using a dataset on Synapse and pull annotations

Note

When you pull annotations from Synapse, the existing metadata (annotations) associated with files or folders in a Synapse dataset is automatically retrieved and pre-filled into the generated manifest. This saves time and ensures consistency between the Synapse dataset and the manifest.

For example, consider the following dataset:

syn12345678/
├── file1.txt
├── file2.txt
└── file3.txt

The corresponding annotations might look like this:

file1.txt - Annotation Key: species - Annotation Value: test1
file2.txt - Annotation Key: species - Annotation Value: test2
file3.txt - Annotation Key: species - Annotation Value: test3

The generated manifest will include the above annotations pulled from Synapse when enabled.
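
To make the idea of pre-filling concrete, the following toy sketch in Python turns the example annotations above into manifest-like rows. It is an illustration of the concept only, not Schematic's implementation: the column names Filename and tissue are hypothetical, and in a real manifest the columns are determined by your data model.

```python
import csv
import io

# Toy data mirroring the example above: file name -> annotations.
EXAMPLE_ANNOTATIONS = {
    "file1.txt": {"species": "test1"},
    "file2.txt": {"species": "test2"},
    "file3.txt": {"species": "test3"},
}


def prefill_rows(annotations, columns):
    """Return one row per file; columns without a matching annotation stay blank."""
    rows = []
    for filename, file_annotations in annotations.items():
        row = {"Filename": filename}  # hypothetical column name
        for column in columns:
            row[column] = file_annotations.get(column, "")
        rows.append(row)
    return rows


def rows_to_csv(rows, columns):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["Filename", *columns], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    columns = ["species", "tissue"]  # "tissue" has no annotation, so it stays empty
    print(rows_to_csv(prefill_rows(EXAMPLE_ANNOTATIONS, columns), columns))
```

Columns that have no corresponding annotation are left empty in this sketch, so they can still be filled in by hand.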

Option 1: Use the CLI

Note

Ensure your Synapse credentials are configured before running the command. You can obtain a personal access token from Synapse by following the instructions here: https://python-docs.synapse.org/tutorials/authentication/#prerequisites

The top-level dataset can be either an empty folder or a folder containing files.

schematic manifest -c /path/to/config.yml get -dt <your_data_type> -s -d <synapse_dataset_id> -a

-c /path/to/config.yml: Specifies the configuration file containing the data model location and asset view (master_fileview_id).

-a: Pulls annotations from Synapse and fills out the manifest with the annotations.

-dt <your_data_type>: Defines the data type/schema model for the manifest (e.g., “Patient”, “Biospecimen”).

-d <synapse_dataset_id>: Retrieves the existing manifest associated with a specific dataset on Synapse.

Option 2: Use the API

To generate a manifest using the Schematic API, follow these steps:

Visit the manifest/generate endpoint.
Click “Try it out” to enable input fields.
Enter the required parameters and execute the request:
- schema_url: The URL of your data model.
  - If your data model is hosted on GitHub, the URL should follow this format:
    
    JSON-LD: https://raw.githubusercontent.com/<your-repo-path>/data-model.jsonld
    
    CSV: https://raw.githubusercontent.com/<your-repo-path>/data-model.csv
- output_format: The desired format for the generated manifest.
  - Options include “excel” or “google_sheet”.
- data_type: The data type or schema model for your manifest (e.g., “Patient”, “Biospecimen”).
  - You can specify multiple data types or enter “all manifests” to generate manifests for all available data types.
- dataset_id: The top-level Synapse dataset ID.
  - This can be a Synapse Project ID or a Folder ID.
- asset_view: The Synapse ID of the fileview containing the top-level dataset for which you want to generate a manifest.
- use_annotations: A boolean value that determines whether to pull annotations from Synapse and fill out the manifest with the annotations.
  - Set this value to true to pull annotations.
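
As a rough sketch of how the parameters above fit together in a single request, the Python snippet below builds the request URL with use_annotations enabled. It assumes that the endpoint sits under the same /v1/ base as the Swagger UI linked on this page and that it takes query-string parameters; any authentication the service requires is outside the scope of this sketch. The function name and the example IDs and URL are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: same /v1/ base as the Swagger UI linked on this page.
BASE_URL = "https://schematic.api.sagebionetworks.org/v1"


def annotated_manifest_url(schema_url, data_type, dataset_id, asset_view,
                           output_format="excel", use_annotations=True):
    """Build a manifest/generate URL for a Synapse dataset."""
    params = {
        "schema_url": schema_url,
        "data_type": data_type,
        "dataset_id": dataset_id,   # Synapse Project ID or Folder ID
        "asset_view": asset_view,   # Synapse ID of the fileview
        "output_format": output_format,
        # Booleans are sent as lowercase text: "true" or "false".
        "use_annotations": str(use_annotations).lower(),
    }
    return f"{BASE_URL}/manifest/generate?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(annotated_manifest_url(
        "https://example.org/data-model.jsonld",
        "Biospecimen",
        "syn12345678",
        "syn87654321",
    ))
```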