CLI documentation¶
Human Cell Atlas Command Line Interface
For general help, run dbio help.
For help with individual commands, run dbio <command> --help.
usage: dbio [-h] [--version] [--log-level {INFO,WARNING,DEBUG,ERROR,CRITICAL}]
{clear-dbio-cache,help,dss,auth} ...
Named Arguments¶
--version | Show the program's version number and exit. |
--log-level | Possible choices: INFO, WARNING, DEBUG, ERROR, CRITICAL Default: "INFO" |
Sub-commands:¶
clear-dbio-cache¶
Clear the cached DataBiosphere API definitions. This can help resolve errors communicating with the API.
dbio clear-dbio-cache [-h]
dss¶
Interact with the Data Storage System
dbio dss [-h]
{get-bundles-all,get-bundles-checkout,delete-bundle,get-bundle,patch-bundle,put-bundle,post-bundles-checkout,get-collections,put-collection,delete-collection,get-collection,patch-collection,get-events,get-event,get-file,head-file,put-file,post-search,get-subscriptions,put-subscription,delete-subscription,get-subscription,login,logout,upload,download,download-manifest,create-version,download-collection}
...
Sub-commands:¶
get-bundles-all¶
Lists all the bundles available in the data store. Responses will be returned in a paginated format; at most 500 values will be returned at a time. Tombstoned bundles will be omitted from the list of bundles available.
dbio dss get-bundles-all [-h] --replica {aws,gcp} [--prefix PREFIX]
[--token TOKEN] [--per-page PER_PAGE]
[--search-after SEARCH_AFTER] [--no-paginate]
Named Arguments¶
--replica | Possible choices: aws, gcp Replica to fetch from. |
--prefix | Used to specify the beginning of a particular bundle UUID. Capitalized letters will be lower-cased as is done when users submit a uuid (all uuids have lower-cased letters upon ingestion into the dss). Characters other than letters, numbers, and dashes are not allowed and will error. The specified character(s) will return all available bundle uuids starting with that character(s). |
--token | Token to manage retries. End users constructing queries should not set this parameter. |
--per-page | Max number of results to return per page. |
--search-after | Search-After-Context. An internal state pointer parameter for use with pagination. This parameter is referenced by the Link header as described in the “Pagination” section. The API client should not need to set this parameter directly; it should instead directly fetch the URL given in the Link header. |
--no-paginate | Do not automatically page the responses. Default: True |
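The --prefix rules above (capitalized letters are lower-cased; only letters, numbers, and dashes are accepted) can be sketched as a small Python helper. This is an illustration of the documented behavior, not code from the CLI:

```python
import re

def normalize_bundle_prefix(prefix: str) -> str:
    """Lower-case a bundle UUID prefix and reject characters other than
    letters, numbers, and dashes, mirroring the --prefix rules."""
    normalized = prefix.lower()
    if not re.fullmatch(r"[a-z0-9-]*", normalized):
        raise ValueError(f"invalid characters in prefix: {prefix!r}")
    return normalized

# Capitalized letters are lower-cased, as is done on ingestion into the dss.
print(normalize_bundle_prefix("AB-12"))  # -> ab-12
```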
get-bundles-checkout¶
Use this route with the checkout_job_id identifier returned by POST /bundles/{uuid}/checkout.
dbio dss get-bundles-checkout [-h] --replica {aws,gcp} --checkout-job-id
CHECKOUT_JOB_ID
Named Arguments¶
--replica | Possible choices: aws, gcp Replica to fetch from. |
--checkout-job-id | A RFC4122-compliant ID for the checkout job request. |
delete-bundle¶
Delete the bundle with the given UUID. This deletion is applied across replicas.
dbio dss delete-bundle [-h] --reason REASON --uuid UUID --replica {aws,gcp}
[--version VERSION]
Named Arguments¶
--reason | User-friendly reason for the bundle or timestamp-specific bundle deletion. |
--uuid | A RFC4122-compliant ID for the bundle. |
--replica | Possible choices: aws, gcp Replica to write to. |
--version | Timestamp of bundle creation in DSS_VERSION format. |
get-bundle¶
Given a bundle UUID, return the latest version of that bundle. If the version is provided, that version of the bundle is returned instead.
dbio dss get-bundle [-h] --uuid UUID [--version VERSION] --replica {aws,gcp}
[--directurls DIRECTURLS] [--presignedurls PRESIGNEDURLS]
[--token TOKEN] [--per-page PER_PAGE]
[--start-at START_AT] [--no-paginate]
Named Arguments¶
--uuid | Bundle unique ID. |
--version | Timestamp of bundle creation in DSS_VERSION format. |
--replica | Possible choices: aws, gcp Replica to fetch from. |
--directurls | When set to true, the response will contain API-specific (cloud-native) URLs that are tied to the specified replica. This parameter is mutually exclusive with the presignedurls parameter. The use of presigned URLs is recommended for data access. Cloud-native URLs are currently provided for a limited set of use cases and may not be provided in the future. If cloud-native URLs are required, please contact the data store team regarding the credentials necessary to use them. |
--presignedurls | Include presigned URLs in the response. This is mutually exclusive with the directurls parameter. |
--token | Token to manage retries. End users constructing queries should not set this parameter. |
--per-page | Max number of results to return per page. |
--start-at | An internal state pointer parameter for use with pagination. This parameter is referenced by the Link header as described in the “Pagination” section. The API client should not need to set this parameter directly; it should instead directly fetch the URL given in the Link header. |
--no-paginate | Do not automatically page the responses. Default: True |
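The pagination contract for --token and --start-at says clients should follow the URL given in the Link header rather than set these parameters directly. A minimal sketch of extracting that URL (the header value shown is hypothetical):

```python
import re
from typing import Optional

def next_page_url(link_header: str) -> Optional[str]:
    """Return the rel="next" URL from an HTTP Link header, or None when
    there are no further pages."""
    for part in link_header.split(","):
        match = re.search(r'<([^>]+)>\s*;\s*rel="next"', part)
        if match:
            return match.group(1)
    return None

header = '<https://dss.example.org/v1/bundles/all?search_after=abc>; rel="next"'
print(next_page_url(header))
```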
patch-bundle¶
Add or remove files from a bundle. A specific version of the bundle to update must be provided, and a new version will be written. Bundle manifests exceeding 20,000 files will not be included in the Elasticsearch index document.
dbio dss patch-bundle [-h] [--add-files ADD_FILES]
[--remove-files REMOVE_FILES] --uuid UUID --replica
{aws,gcp} --version VERSION
Named Arguments¶
--add-files | List of new files to add to the bundle. File names must be unique. |
--remove-files | List of files to remove from the bundle. Files must match exactly to be removed. Files not found in the bundle are ignored. |
--uuid | A RFC4122-compliant ID of the bundle to update. |
--replica | Possible choices: aws, gcp Replica to update the bundle on. Updates are propagated to other replicas. |
--version | Timestamp of the bundle to update in DSS_VERSION format (required). |
put-bundle¶
Create a new version of a bundle with a given UUID. The list of file UUIDs and versions to be included must be provided.
dbio dss put-bundle [-h] --creator-uid CREATOR_UID --files FILES [FILES ...]
--uuid UUID --version VERSION --replica {aws,gcp}
Named Arguments¶
--creator-uid | User ID who is creating this bundle. |
--files | This is a list of dictionaries describing each of the files. Each dictionary includes the fields: - The "uuid" of a file already previously uploaded with "PUT file/{uuid}". - The "version" timestamp of the file. - The "name" of the file. This can be most anything, and is the name the file will have when downloaded. - The "indexed" field, which specifies whether a file should be indexed or not. Bundle manifests exceeding 20,000 files will not be included in the Elasticsearch index document. Example representing 2 files with dummy values: [{'uuid': 'ce55fd51-7833-469b-be0b-5da88ebebfcd', 'version': '2017-06-16T193604.240704Z', 'name': 'dinosaur_dna.fa', 'indexed': False}, {'uuid': 'ae55fd51-7833-469b-be0b-5da88ebebfca', 'version': '0303-04-23T193604.240704Z', 'name': 'dragon_dna.fa', 'indexed': False}] |
--uuid | A RFC4122-compliant ID for the bundle. |
--version | Timestamp of bundle creation in DSS_VERSION format. |
--replica | Possible choices: aws, gcp Replica to write to. |
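The --files value described above is easiest to assemble programmatically. A sketch using the dummy values from the example (the UUIDs and names are placeholders):

```python
import json

# Each entry references a file previously uploaded with "PUT file/{uuid}".
files = [
    {
        "uuid": "ce55fd51-7833-469b-be0b-5da88ebebfcd",
        "version": "2017-06-16T193604.240704Z",
        "name": "dinosaur_dna.fa",
        "indexed": False,
    },
    {
        "uuid": "ae55fd51-7833-469b-be0b-5da88ebebfca",
        "version": "0303-04-23T193604.240704Z",
        "name": "dragon_dna.fa",
        "indexed": False,
    },
]

# Every dictionary carries exactly the four documented fields.
required = {"uuid", "version", "name", "indexed"}
assert all(set(entry) == required for entry in files)

print(json.dumps(files))
```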
post-bundles-checkout¶
Initiate asynchronous checkout of a bundle. The response JSON contains a field, checkout_job_id
, that can be used to query the status of the checkout via the GET /bundles/checkout/{checkout_job_id}
API method. FIXME: document the error code returned when the bundle or specified version does not exist. TODO: After some time period, the data will be removed. TBD: This could be based on initial checkout time or last access time.
dbio dss post-bundles-checkout [-h] [--destination DESTINATION]
[--email EMAIL] --uuid UUID [--version VERSION]
--replica {aws,gcp}
Named Arguments¶
--destination | User-owned destination storage bucket. |
--email | An email address to send status updates to. |
--uuid | A RFC4122-compliant ID for the bundle. |
--version | Timestamp of bundle creation in DSS_VERSION format. If this is not provided, the latest version is used. |
--replica | Possible choices: aws, gcp Replica to fetch from. |
get-collections¶
Return a list of a user’s collections.
Collections are sets of links to files, bundles, other collections, or fragments of JSON metadata files. Each entry in the input set of links is checked for referential integrity (the link target must exist in the replica referenced). Up to 1000 items can be referenced in a new collection, or added or removed using PATCH /collections
. New collections are private to the authenticated user.
Collection items are de-duplicated (if an identical item is given multiple times, it will only be added once).
Collections are replicated across storage replicas similarly to files and bundles.
dbio dss get-collections [-h] [--per-page PER_PAGE] [--start-at START_AT]
[--no-paginate]
Named Arguments¶
--per-page | Max number of results to return per page. |
--start-at | An internal state pointer parameter for use with pagination. This parameter is referenced by the Link header as described in the “Pagination” section. The API client should not need to set this parameter directly; it should instead directly fetch the URL given in the Link header. |
--no-paginate | Do not automatically page the responses. Default: True |
put-collection¶
Create a new collection.
Collections are sets of links to files, bundles, other collections, or fragments of JSON metadata files. Each entry in the input set of links is checked for referential integrity (the link target must exist in the replica referenced). Up to 1000 items can be referenced in a new collection, or added or removed using PATCH /collections
. New collections are private to the authenticated user.
Collection items are de-duplicated (if an identical item is given multiple times, it will only be added once).
Collections are replicated across storage replicas similarly to files and bundles.
dbio dss put-collection [-h] --contents CONTENTS [CONTENTS ...] --description
DESCRIPTION --details DETAILS --name NAME --replica
{aws,gcp} --uuid UUID --version VERSION
Named Arguments¶
--contents | A list of objects describing links to files, bundles, other collections, and metadata fragments that are part of the collection. |
--description | A long description of the collection, formatted in Markdown. |
--details | Supplementary JSON metadata for the collection. |
--name | A short name identifying the collection. |
--replica | Possible choices: aws, gcp Replica to write to. |
--uuid | A RFC4122-compliant ID for the collection. |
--version | Timestamp of collection creation in DSS_VERSION format. |
delete-collection¶
Delete a collection.
dbio dss delete-collection [-h] --uuid UUID --replica {aws,gcp}
Named Arguments¶
--uuid | A RFC4122-compliant ID for the collection. |
--replica | Possible choices: aws, gcp Replica to delete from. |
get-collection¶
Given a collection UUID, return the associated collection object.
dbio dss get-collection [-h] --uuid UUID --replica {aws,gcp}
[--version VERSION]
Named Arguments¶
--uuid | A RFC4122-compliant ID for the collection. |
--replica | Possible choices: aws, gcp Replica to fetch from. |
--version | Timestamp of collection creation in DSS_VERSION format. If this is not provided, the latest version is returned. |
patch-collection¶
Add or remove items from a collection. A specific version of the collection to update must be provided, and a new version will be written.
dbio dss patch-collection [-h] [--add-contents ADD_CONTENTS]
[--description DESCRIPTION] [--details DETAILS]
[--name NAME] [--remove-contents REMOVE_CONTENTS]
--uuid UUID --replica {aws,gcp} --version VERSION
Named Arguments¶
--add-contents | List of new items to add to the collection. Items are de-duplicated (if an identical item is already present in the collection or given multiple times, it will only be added once). |
--description | New description for the collection. |
--details | New details for the collection. |
--name | New name for the collection. |
--remove-contents | List of items to remove from the collection. Items must match exactly to be removed. Items not found in the collection are ignored. |
--uuid | A RFC4122-compliant ID of the collection to update. |
--replica | Possible choices: aws, gcp Replica to update the collection on. Updates are propagated to other replicas. |
--version | Timestamp of the collection to update in DSS_VERSION format (required). |
get-events¶
Return URLs where event data is available, along with a manifest of contents.
dbio dss get-events [-h] [--from-date FROM_DATE] [--to-date TO_DATE] --replica
{aws,gcp} [--per-page PER_PAGE] [--token TOKEN]
[--no-paginate]
Named Arguments¶
--from-date | Timestamp to begin replaying events, in DSS_VERSION format. If this is not provided, replay from the earliest event. |
--to-date | Timestamp to stop replaying events, in DSS_VERSION format. If this is not provided, replay to the latest event. |
--replica | Possible choices: aws, gcp Replica to fetch from. |
--per-page | Max number of results to return per page. |
--token | Token to manage retries. End users constructing queries should not set this parameter. |
--no-paginate | Do not automatically page the responses. Default: True |
get-event¶
Given a bundle UUID and version, return the bundle metadata document.
dbio dss get-event [-h] --uuid UUID --version VERSION --replica {aws,gcp}
Named Arguments¶
--uuid | Bundle unique ID. |
--version | Timestamp of bundle creation in DSS_VERSION format. |
--replica | Possible choices: aws, gcp Replica to fetch from. |
get-file¶
Given a file UUID, return the latest version of that file. If the version is provided, that version of the file is returned instead.
Headers will contain the data store metadata for the file.
This endpoint returns an HTTP redirect to another HTTP endpoint serving the file contents.
NOTE When using the DataBiosphere DSS CLI, this will stream the file to stdout and may need to be piped. For example,
dbio dss get-file --uuid UUID --replica aws > result.txt
dbio dss get-file [-h] --uuid UUID --replica {aws,gcp} [--version VERSION]
[--token TOKEN] [--directurl DIRECTURL]
[--content-disposition CONTENT_DISPOSITION]
Named Arguments¶
--uuid | A RFC4122-compliant ID for the file. |
--replica | Possible choices: aws, gcp Replica to fetch from. |
--version | Timestamp of file creation in DSS_VERSION format. If this is not provided, the latest version is returned. |
--token | Token to manage retries. End users constructing queries should not set this parameter. |
--directurl | When set to true, the response will contain an API-specific (cloud-native) URL that is tied to the specified replica. The use of presigned URLs is recommended for data access. Cloud-native URLs are currently provided for a limited set of use cases and may not be provided in the future. If cloud-native URLs are required, please contact the data store team regarding the credentials necessary to use them. |
--content-disposition | Optional; does not work when directurl=true (only works with the default presigned URL response). If this parameter is provided, the response from fetching the returned presigned URL will include the specified Content-Disposition header. This can be useful to indicate to a browser that a file should be downloaded rather than opened in a new tab, and can also supply the original filename in the response. Example: content_disposition="attachment; filename=data.json" |
head-file¶
Given a file UUID, return the metadata for the latest version of that file. If the version is provided, that version’s metadata is returned instead. The metadata is returned in the headers.
dbio dss head-file [-h] --uuid UUID --replica {aws,gcp} [--version VERSION]
Named Arguments¶
--uuid | A RFC4122-compliant ID for the file. |
--replica | Possible choices: aws, gcp Replica to fetch from. |
--version | Timestamp of file creation in DSS_VERSION format. If this is not provided, the latest version is returned. |
put-file¶
Create a new version of a file with a given UUID. The contents of the file are provided by the client by reference using a cloud object storage URL. The file on the cloud object storage service must have metadata set listing the file checksums and content-type. The metadata fields required are:
- dss-sha256: SHA-256 checksum of the file
- dss-sha1: SHA-1 checksum of the file
- dss-s3_etag: S3 ETag checksum of the file. See https://stackoverflow.com/q/12186993 for the general algorithm for how the checksum is calculated. For files smaller than 64MB, this is the MD5 checksum of the file. For files larger than 64MB but smaller than 640,000MB, we use 64MB chunks. For files larger than 640,000MB, we use a chunk size equal to the total file size divided by 10000, rounded up to the nearest MB. MB, in this section, refers to 1,048,576 bytes. Note that 640,000MB is not the same as 640GB!
- dss-crc32c: CRC-32C checksum of the file
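The dss-s3_etag chunking rules can be written out directly. A sketch (not part of the CLI) that computes the chunk size and an S3-style multipart ETag for in-memory data:

```python
import hashlib
import math

MB = 1_048_576  # in this section, MB means 1,048,576 bytes

def etag_chunk_size(file_size: int) -> int:
    """Chunk size for the dss-s3_etag checksum, per the rules above."""
    if file_size < 64 * MB:
        return file_size            # single part: plain MD5 of the file
    if file_size < 640_000 * MB:
        return 64 * MB              # fixed 64MB chunks
    # total size / 10000, rounded up to the nearest MB
    return math.ceil(file_size / 10_000 / MB) * MB

def s3_etag(data: bytes) -> str:
    """S3-style ETag: MD5 for small files, otherwise the MD5 of the
    concatenated per-chunk MD5 digests, suffixed with the part count."""
    if len(data) < 64 * MB:
        return hashlib.md5(data).hexdigest()
    chunk = etag_chunk_size(len(data))
    parts = [hashlib.md5(data[i:i + chunk]).digest()
             for i in range(0, len(data), chunk)]
    return hashlib.md5(b"".join(parts)).hexdigest() + f"-{len(parts)}"

print(etag_chunk_size(100 * MB))  # -> 67108864, i.e. 64MB
```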
dbio dss put-file [-h] --creator-uid CREATOR_UID --source-url SOURCE_URL
--uuid UUID --version VERSION
Named Arguments¶
--creator-uid | User ID who is creating this file. |
--source-url | Cloud bucket URL for source data. Example: "s3://bucket_name/serious_dna.fa". |
--uuid | A RFC4122-compliant ID for the file. |
--version | Timestamp of file creation in DSS_VERSION format. If this is not provided, the latest version is returned. |
post-search¶
Accepts an Elasticsearch JSON query and returns matching bundle identifiers. Index Design: The metadata search index is implemented as a document-oriented database using Elasticsearch. The index stores all information relevant to a bundle within each bundle document, largely eliminating the need for object-relational mapping. This design is optimized for queries that filter the data.
To illustrate this concept, say our index stored information on three entities, foo
, bar
, and baz
. A foo can have many bars, and bars can have many bazes. If we were to index bars in a document-oriented design, the information on the foo a bar comes from and the bazes it contains is combined into a single document. An example sketch of this is shown below in JSON Schema.
{
"definitions": {
"bar": {
"type": "object",
"properties": {
"uuid": {
"type": "string",
"format": "uuid"
},
"foo": {
"type": "object",
"properties": {
"uuid": {
"type": "string",
"format": "uuid"
},
...
}
},
"bazes": {
"type": "array",
"items": {
"type": "string",
"format": "uuid"
}
},
...
}
}
}
}
This closely resembles the structure of DSS bundle documents: projects have many bundles and bundles have many files. Each bundle document is a concatenation of the metadata on the project it belongs to and the files it contains. Limitations to Index Design: There are limitations to the design of DSS’s metadata search index. A few important ones are listed below.
- Joins between bundle metadata must be conducted client-side
- Querying is schema-specific; fields or values changed between schema versions will break queries that use those fields and values
- A new search index must be built for each schema version
- A lot of metadata is duplicated between documents
dbio dss post-search [-h] --es-query ES_QUERY [--output-format {summary,raw}]
--replica {aws,gcp} [--per-page PER_PAGE]
[--search-after SEARCH_AFTER] [--no-paginate]
Named Arguments¶
--es-query | Elasticsearch query |
--output-format | Possible choices: summary, raw Specifies the output format. |
--replica | Possible choices: aws, gcp Replica to search. |
--per-page | Max number of results to return per page. When using output_format raw, the per_page size is limited to no more than 10 to avoid excessively large response sizes. |
--search-after | Search-After-Context. An internal state pointer parameter for use with pagination. This parameter is referenced by the Link header as described in the “Pagination” section. The API client should not need to set this parameter directly; it should instead directly fetch the URL given in the Link header. |
--no-paginate | Do not automatically page the responses. Default: True |
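An --es-query value is a standard Elasticsearch query DSL document passed as JSON. A sketch of two queries (the field name in the second is illustrative; as noted under the index-design limitations, such queries are schema-specific and break when the schema changes):

```python
import json

# Match every indexed bundle.
match_all = {"query": {"match_all": {}}}

# Filter on a metadata field (the field name here is hypothetical).
by_taxon = {
    "query": {
        "match": {
            "files.biomaterial_json.biomaterials.content.biomaterial_core.ncbi_taxon_id": 9606
        }
    }
}

print(json.dumps(match_all))
```

On the command line this could then be passed as, e.g., dbio dss post-search --replica aws --es-query '{"query": {"match_all": {}}}'.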
get-subscriptions¶
Return a list of associated subscriptions.
dbio dss get-subscriptions [-h] --replica {aws,gcp}
[--subscription-type {elasticsearch,jmespath}]
Named Arguments¶
--replica | Possible choices: aws, gcp Replica to fetch from. |
--subscription-type | Possible choices: elasticsearch, jmespath Type of subscriptions to fetch (elasticsearch or jmespath). |
put-subscription¶
Register an HTTP endpoint that is to be notified when a given event occurs. Each user is allowed 100 subscriptions, a limit that may be increased in the future. Concerns about notification service limitations should be routed to the DSS development team.
dbio dss put-subscription [-h] [--attachments ATTACHMENTS] --callback-url
CALLBACK_URL
[--encoding {application/json,multipart/form-data}]
[--es-query ES_QUERY] [--form-fields FORM_FIELDS]
[--hmac-key-id HMAC_KEY_ID]
[--hmac-secret-key HMAC_SECRET_KEY]
[--jmespath-query JMESPATH_QUERY]
[--method {POST,PUT}]
[--payload-form-field PAYLOAD_FORM_FIELD] --replica
{aws,gcp}
Named Arguments¶
--attachments | The set of bundle metadata items to be included in the payload of a notification request to a subscription endpoint. Each property in this object represents an attachment to the notification payload. Each attachment will be a child property of the attachments property of the notification payload. For example, given a subscription registered with
{
    "attachments": {
        "taxon": {
            "type": "jmespath",
            "expression": "files.biomaterial_json.biomaterials[].content.biomaterial_core.ncbi_taxon_id[]"
        }
    }
}
the corresponding notification payload will contain the following entry:
"attachments": {
    "taxon": [9606, 9606]
}
If a general error occurs during the processing of attachments, the notification will be sent with
"attachments": {
    "taxon": [9606, 9606],
    "_errors": {
        "biomaterial": "Some error occurred"
    }
}
The value of the |
--callback-url | The subscriber's URL. An HTTP request is made to the specified URL for every attempt to deliver a notification to the subscriber. If the HTTP response code is 2XX, the delivery attempt is considered successful. Otherwise, more attempts will be made with an exponentially increasing delay between attempts, until an attempt is successful or a maximum number of attempts is reached. Occasionally, duplicate notifications may be sent. It is up to the receiver of the notification to tolerate duplicate notifications. |
--encoding | Possible choices: application/json, multipart/form-data The MIME type describing the encoding of the request body. |
--es-query | An Elasticsearch query for restricting the set of bundles for which the subscriber is notified. The subscriber will only be notified for newly indexed bundles that match the given query. If this parameter is present the subscription will be of type elasticsearch, otherwise it will be of type jmespath. |
--form-fields | A collection of static form fields to be supplied in the request body, alongside the actual notification payload. The value of each field must be a string. For example, if the subscription has this property set, the multipart/form-data request body would look like: ----------------2769baffc4f24cbc83ced26aa0c2f712
Content-Disposition: form-data; name="foo"
bar
----------------2769baffc4f24cbc83ced26aa0c2f712
Content-Disposition: form-data; name="payload"
{"transaction_id": "301c9079-3b20-4311-a131-bcda9b7f08ba", "subscription_id": ...
Since the type of this property is |
--hmac-key-id | An optional key ID to use with hmac_secret_key. |
--hmac-secret-key | The key for signing requests to the subscriber's URL. The signature will be constructed according to https://tools.ietf.org/html/draft-cavage-http-signatures and transmitted in the HTTP Authorization header. |
--jmespath-query | A JMESPath query for restricting the set of bundles for which the subscriber is notified. The subscriber will only be notified for new bundles that match the given query. If es_query is specified, the subscription will be of type elasticsearch. If es_query is not present, the subscription will be of type jmespath. |
--method | Possible choices: POST, PUT The HTTP request method to use when delivering a notification to the subscriber. |
--payload-form-field | The name of the form field that will hold the notification payload when the request is made. If the default name of the payload field collides with that of a field in form_fields, this property can be used to rename the payload and avoid the collision. This property is ignored unless encoding is multipart/form-data. |
--replica | Possible choices: aws, gcp Replica to write to. |
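When hmac_secret_key is set, notification requests are signed per the draft-cavage HTTP signatures scheme referenced above. A sketch of the underlying HMAC step only (the signing-string layout shown is illustrative; the full header format is defined by the draft):

```python
import base64
import hashlib
import hmac

def hmac_sha256_signature(signing_string: str, secret_key: bytes) -> str:
    """Base64-encoded HMAC-SHA256 over the signing string."""
    digest = hmac.new(secret_key, signing_string.encode(), hashlib.sha256).digest()
    return base64.b64encode(digest).decode()

# Illustrative signing string assembled from request lines and headers.
signing_string = "(request-target): post /notify\ndate: Tue, 07 Jun 2014 20:51:35 GMT"
signature = hmac_sha256_signature(signing_string, b"example-secret-key")

# The subscriber recomputes the signature with the shared key and
# compares in constant time before trusting the notification.
assert hmac.compare_digest(
    signature, hmac_sha256_signature(signing_string, b"example-secret-key"))
print(signature)
```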
delete-subscription¶
Delete a registered event subscription. The associated query will no longer trigger a callback if a matching document is added to the system.
dbio dss delete-subscription [-h] --uuid UUID --replica {aws,gcp}
[--subscription-type {elasticsearch,jmespath}]
Named Arguments¶
--uuid | A RFC4122-compliant ID for the subscription. |
--replica | Possible choices: aws, gcp Replica to delete from. |
--subscription-type | Possible choices: elasticsearch, jmespath Type of subscriptions to fetch (elasticsearch or jmespath). |
get-subscription¶
Given a subscription UUID, return the associated subscription.
dbio dss get-subscription [-h] --uuid UUID --replica {aws,gcp}
[--subscription-type {elasticsearch,jmespath}]
Named Arguments¶
--uuid | A RFC4122-compliant ID for the subscription. |
--replica | Possible choices: aws, gcp Replica to fetch from. |
--subscription-type | Possible choices: elasticsearch, jmespath Type of subscriptions to fetch (elasticsearch or jmespath). |
login¶
This command may open a browser window to ask for your consent to use web service authentication credentials.
Use --remote if using the CLI in a remote environment.
dbio dss login [-h] [--access-token ACCESS_TOKEN] [--remote]
Named Arguments¶
--access-token | Default: “” |
--remote | Default: False |
logout¶
Clear dbio dss authentication credentials previously configured with dbio dss login.
dbio dss logout [-h]
upload¶
Upload a directory of files from the local filesystem and create a bundle containing the uploaded files. This method requires the use of a client-controlled object storage bucket to stage the data for upload.
dbio dss upload [-h] --src-dir SRC_DIR --replica REPLICA --staging-bucket
STAGING_BUCKET [--timeout-seconds TIMEOUT_SECONDS]
[--no-progress] [--bundle-uuid BUNDLE_UUID]
Named Arguments¶
--src-dir | File path to a directory of files to upload to the replica. |
--replica | The replica to upload to. The supported replicas are: aws for Amazon Web Services, and gcp for Google Cloud Platform. [aws, gcp] |
--staging-bucket | A client-controlled AWS S3 storage bucket to upload from. |
--timeout-seconds | The time to wait for a file to upload to the replica. Default: 1200 |
--no-progress | if set, will not report upload progress. Note that even if this flag is not set, progress will not be reported if the logging level is higher than INFO or if the session is not interactive. Default: False |
--bundle-uuid | |
download¶
Download a bundle and save it to the local filesystem as a directory.
By default, all data and metadata files are downloaded. To disable the downloading of data, use the --no-data flag if using the CLI or pass the no_data=True argument if calling the download() API method. Likewise, to disable the downloading of metadata, use the --no-metadata flag for the CLI or pass the no_metadata=True argument if calling the download() API method.
If a retryable exception occurs, we wait a bit and retry again. The delay increases each time we fail and decreases each time we successfully read a block. We set a quota for the number of failures that goes up with every successful block read and down with each failure.
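The retry behavior described above can be sketched as a loop in which the delay doubles on failure and halves on success, while a failure quota moves the other way. This is an illustration of the documented policy, not the CLI's actual implementation:

```python
import time

def read_all_blocks(read_block, num_blocks, num_retries=10, min_delay=0.25):
    """Read blocks 0..num_blocks-1 via read_block(i), retrying transient
    IOErrors with an adaptive delay and an adaptive failure quota."""
    quota = num_retries
    delay = min_delay
    blocks = []
    i = 0
    while i < num_blocks:
        try:
            blocks.append(read_block(i))
        except IOError:
            quota -= 1                         # quota goes down with each failure
            if quota < 0:
                raise
            time.sleep(delay)
            delay *= 2                         # delay increases each time we fail
        else:
            i += 1
            quota += 1                         # quota goes up with each success
            delay = max(delay / 2, min_delay)  # and the delay decreases
    return blocks
```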
dbio dss download [-h] --bundle-uuid BUNDLE_UUID --replica REPLICA
[--version VERSION] [--download-dir DOWNLOAD_DIR]
[--metadata-filter METADATA_FILTER [METADATA_FILTER ...]]
[--data-filter DATA_FILTER [DATA_FILTER ...]]
[--no-metadata] [--no-data] [--num-retries NUM_RETRIES]
[--min-delay-seconds MIN_DELAY_SECONDS]
Named Arguments¶
--bundle-uuid | The UUID of the bundle to download. |
--replica | The replica to download from. The supported replicas are: aws for Amazon Web Services, and gcp for Google Cloud Platform. [aws, gcp] |
--version | The version to download. If not specified, the latest version is downloaded. The version is a timestamp of bundle creation in RFC3339 format. Default: "" |
--download-dir | The directory into which to download. Default: "" |
--metadata-filter | One or more shell patterns against which all metadata files in the bundle will be matched case-sensitively. A file is considered a metadata file if the indexed property in the manifest is set. If and only if a metadata file matches any of the patterns in metadata_files will it be downloaded. Default: ('*',) |
--data-filter | One or more shell patterns against which all data files in the bundle will be matched case-sensitively. A file is considered a data file if the indexed property in the manifest is not set. If and only if a data file matches any of the patterns in data_files will it be downloaded. Default: ('*',) |
--no-metadata | Exclude metadata files. Cannot be set when --metadata-filter is also set. Default: False |
--no-data | Exclude data files. Cannot be set when --data-filter is also set. Default: False |
--num-retries | The initial quota of download failures to accept before exiting due to failures. The number of retries increases and decreases as file chunks succeed and fail. Default: 10 |
--min-delay-seconds | The minimum number of seconds to wait in between retries. Default: 0.25 |
download-manifest¶
Files are always downloaded to a cache/filestore directory called '.dbio'. This directory is created in the current directory where the download is initiated. A copy of the manifest used is also written to the current directory. This manifest has an added column that lists the paths of the files within the '.dbio' filestore.
The default layout is none. In this layout, all of the files are downloaded to the filestore, and the recommended way of accessing the files is by parsing the manifest copy that is written to the download directory.
The bundle layout also downloads all of the files to the filestore. For each bundle mentioned in the manifest, a directory is created. All relevant metadata files for each bundle are linked into these directories, in addition to relevant data files mentioned in the manifest.
Each row in the manifest represents one file in DSS. The manifest must have a header row. The header row must declare the following columns:
- bundle_uuid - the UUID of the bundle containing the file in DSS.
- bundle_version - the version of the bundle containing the file in DSS.
- file_name - the name of the file as specified in the bundle.
- file_uuid - the UUID of the file in the DSS.
- file_sha256 - the SHA-256 hash of the file.
- file_size - the size of the file.
The TSV may have additional columns. Those columns will be ignored. The ordering of the columns is insignificant because the TSV is required to have a header row.
This download format will serve as the main storage format for downloaded files. If a user specifies a different format for download (coming in the future) the files will first be downloaded in this format, then hard-linked to the user’s preferred format.
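A manifest meeting the column requirements above can be validated and read with the standard csv module. A sketch with dummy row values:

```python
import csv
import io

REQUIRED_COLUMNS = {"bundle_uuid", "bundle_version", "file_name",
                    "file_uuid", "file_sha256", "file_size"}

# A two-line TSV with one extra column, which is ignored.
manifest_tsv = (
    "bundle_uuid\tbundle_version\tfile_name\tfile_uuid\t"
    "file_sha256\tfile_size\textra\n"
    "ce55fd51-7833-469b-be0b-5da88ebebfcd\t2017-06-16T193604.240704Z\t"
    "dinosaur_dna.fa\tae55fd51-7833-469b-be0b-5da88ebebfca\t"
    "deadbeef\t1024\tignored\n"
)

reader = csv.DictReader(io.StringIO(manifest_tsv), delimiter="\t")
# Column order is insignificant, but the header row must declare every
# required column; additional columns are ignored.
missing = REQUIRED_COLUMNS - set(reader.fieldnames)
if missing:
    raise ValueError(f"manifest is missing columns: {sorted(missing)}")

rows = list(reader)
print(rows[0]["file_name"])  # -> dinosaur_dna.fa
```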
dbio dss download-manifest [-h] --manifest MANIFEST --replica REPLICA
[--layout LAYOUT] [--no-metadata] [--no-data]
[--num-retries NUM_RETRIES]
[--min-delay-seconds MIN_DELAY_SECONDS]
[--download-dir DOWNLOAD_DIR]
Named Arguments¶
--manifest | The path to a TSV (tab-separated values) file listing files to download. If the directory for download already contains the manifest, the manifest will be overwritten to include a column with paths into the filestore. |
--replica | The replica from which to download. The supported replicas are: aws for Amazon Web Services, and gcp for Google Cloud Platform. [aws, gcp] |
--layout | The layout of the downloaded files. Currently two options are supported, ‘none’ (the default), and ‘bundle’. Default: “none” |
--no-metadata | Exclude metadata files. Cannot be set when --metadata-filter is also set. Default: False |
--no-data | Exclude data files. Cannot be set when --data-filter is also set. Default: False |
--num-retries | The initial quota of download failures to accept before exiting due to failures. The retry quota increases and decreases as file chunks succeed and fail. Default: 10 |
--min-delay-seconds | |
The minimum number of seconds to wait in between retries for downloading any file. Default: 0.25 | |
--download-dir | The directory into which to download Default: “” |
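The interaction between --num-retries and --min-delay-seconds can be sketched as an adaptive retry quota: the budget of tolerated failures grows as chunks succeed and shrinks as they fail, and the download aborts when it is exhausted. The chunk and fetch details below are hypothetical, for illustration only:

```python
# Illustrative retry-quota loop: not the actual dbio implementation.
import time

def download_chunks(chunks, fetch, num_retries=10, min_delay_seconds=0.25):
    quota = num_retries
    pending = list(chunks)
    while pending:
        chunk = pending.pop(0)
        try:
            fetch(chunk)
            quota += 1  # a success earns back retry budget
        except Exception:
            quota -= 1  # a failure spends it
            if quota <= 0:
                raise RuntimeError("retry quota exhausted")
            time.sleep(min_delay_seconds)
            pending.append(chunk)  # requeue the failed chunk
```

Under this model a long run of successes lets the download survive a later burst of transient failures, which suits large manifests with occasional flaky chunks.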
download-collection¶
Download a collection and save it to the local filesystem as a directory.
dbio dss download-collection [-h] --uuid UUID --replica REPLICA
[--version VERSION] [--download-dir DOWNLOAD_DIR]
Named Arguments¶
--uuid | The uuid of the collection to download |
--replica | The replica to download from. The supported replicas are: aws for Amazon Web Services, and gcp for Google Cloud Platform. [aws, gcp] |
--version | The version to download; if not specified, the latest version is downloaded. The version is a timestamp of creation in RFC3339 format. |
--download-dir | The directory into which to download Default: “” |
auth¶
Interact with the authorization and authentication system.
dbio auth [-h]
{get-login,get-logout,get-openid-configuration,get-jwks.json,get-oauth-authorize,post-oauth-revoke,post-oauth-token,get-oauth-userinfo,post-oauth-userinfo,get-echo,post-v1-policies-evaluate,get-v1-users,post-v1-user,get-v1-user,put-v1-user,get-v1-user-owns,get-v1-user-groups,put-v1-user-group,get-v1-user-roles,put-v1-user-role,put-v1-user-policy,get-v1-groups,post-v1-group,get-v1-group,delete-v1-group,get-v1-group-roles,put-v1-group-role,get-v1-group-users,put-v1-group-user,put-v1-group-policy,get-v1-roles,post-v1-role,get-v1-role,delete-v1-role,put-v1-role-policy,get-v1-resources,get-v1-resource,post-v1-resource,delete-v1-resource,get-v1-resource-actions,put-v1-resource-action,delete-v1-resource-actions,get-v1-resource-policies,get-v1-resource-policy,post-v1-resource-policy,put-v1-resource-policy,delete-v1-resource-policy,get-v1-resource-ids,get-v1-resource-id,post-v1-resource-id,delete-v1-resource-id,get-v1-resource-id-members,put-v1-resource-id-member,login,logout}
...
Sub-commands:¶
get-login¶
Send the user agent to an identity provider selector and generate a user account to establish the user’s identity. This is a redirect endpoint.
dbio auth get-login [-h] --redirect-uri REDIRECT_URI [--state STATE]
Named Arguments¶
--redirect-uri | Where to redirect to once login is complete. |
--state | An opaque parameter that is returned back to the redirect_uri . |
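As a sketch of how these parameters fit together, a caller constructs a login URL carrying the redirect_uri and an opaque state value that the provider echoes back. The base URL and path here are hypothetical, for illustration only:

```python
# Build a login URL with redirect_uri and an opaque state parameter.
from urllib.parse import urlencode

def login_url(base, redirect_uri, state=None):
    params = {"redirect_uri": redirect_uri}
    if state is not None:
        params["state"] = state
    return f"{base}/login?{urlencode(params)}"

print(login_url("https://auth.example.org",
                "https://app.example.org/cb", state="xyz"))
```

After login completes, the client should compare the returned state against the value it sent to guard against cross-site request forgery.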
get-logout¶
Log the user out from current sessions with the OIDC provider. You can log the user out from a specific application if you know the client_id for the application. Otherwise the user will be logged out of the default application specified by oauth2_config.
dbio auth get-logout [-h] [--client-id CLIENT_ID]
Named Arguments¶
--client-id |
get-openid-configuration¶
This endpoint is part of OIDC, see documentation at Provider Config
dbio auth get-openid-configuration [-h] --host HOST
Named Arguments¶
--host | Must be auth.ucsc.ucsc-cgp-redwood.org . |
get-jwks.json¶
Provide the public key used to sign all JWTs minted by the OIDC provider. See JSON Web Key Set for more info.
dbio auth get-jwks.json [-h]
get-oauth-authorize¶
This endpoint is part of OIDC and is used to redirect to an openid provider. See Auth Request
dbio auth get-oauth-authorize [-h] [--redirect-uri REDIRECT_URI]
[--state STATE] [--client-id CLIENT_ID]
[--scope SCOPE] [--respone-type RESPONE_TYPE]
[--nonce NONCE] [--prompt PROMPT]
Named Arguments¶
--redirect-uri | |
--state | |
--client-id | |
--scope | |
--respone-type | |
--nonce | |
--prompt |
post-oauth-revoke¶
Revokes a client’s refresh token, making all future token refresh requests fail.
dbio auth post-oauth-revoke [-h] --client-id CLIENT_ID --token TOKEN
Named Arguments¶
--client-id | |
--token | The refresh token to revoke. |
post-oauth-token¶
This endpoint is part of OIDC and is used to redirect to an openid provider. See Token Endpoint, and Refresh Tokens
dbio auth post-oauth-token [-h]
get-oauth-userinfo¶
This endpoint is part of OIDC and is used to redirect to an openid provider. See User Info
dbio auth get-oauth-userinfo [-h]
post-oauth-userinfo¶
This endpoint is part of OIDC and is used to redirect to an openid provider. See User Info
dbio auth post-oauth-userinfo [-h]
post-v1-policies-evaluate¶
Given a set of principals, actions, and resources, return a set of access control decisions.
dbio auth post-v1-policies-evaluate [-h] --principal PRINCIPAL --action ACTION
[ACTION ...] --resource RESOURCE
[RESOURCE ...]
Named Arguments¶
--principal | Attested user identifier. |
--action | The action the principal is attempting to perform. |
--resource | The resource the principal will perform the action against. |
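The evaluation model can be sketched as a cross product: every (action, resource) pair is allowed iff some policy grants it to the principal. The policy structure below is hypothetical and much simpler than the service’s actual policy language:

```python
# Toy access-control evaluation: one allow/deny decision per
# (action, resource) pair for a given principal.
def evaluate(policies, principal, actions, resources):
    results = {}
    for action in actions:
        for resource in resources:
            results[(action, resource)] = any(
                principal in p["principals"]
                and action in p["actions"]
                and resource in p["resources"]
                for p in policies
            )
    return results

policies = [{"principals": {"alice"},
             "actions": {"dss:GetBundle"},
             "resources": {"arn:res:1"}}]
print(evaluate(policies, "alice", ["dss:GetBundle"], ["arn:res:1", "arn:res:2"]))
```

A real evaluator would also handle explicit denies, wildcards, and group membership, but the shape of the response (a decision per pair) is the same.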
get-v1-users¶
Paginate through all users.
dbio auth get-v1-users [-h] [--next-token NEXT_TOKEN] [--per-page PER_PAGE]
[--no-paginate]
Named Arguments¶
--next-token | |
--per-page | |
--no-paginate | Do not automatically page the responses Default: True |
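When paging manually with --no-paginate, the usual pattern is to follow the returned next-token until it is absent. A minimal sketch, where `fetch_page` stands in for the HTTP call and is hypothetical:

```python
# Generic next-token pagination loop, as used by the list endpoints.
def iterate_pages(fetch_page, per_page=100):
    token = None
    while True:
        page = fetch_page(next_token=token, per_page=per_page)
        yield from page["results"]
        token = page.get("next_token")
        if not token:
            break  # no token on the last page
```

The same loop applies to every `get-v1-*` listing command in this section, since they all share the next-token/per-page parameters.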
post-v1-user¶
Create a new user with the specified groups, roles, and IAM policy.
dbio auth post-v1-user [-h] --user-id USER_ID [--groups GROUPS]
[--roles ROLES] [--policy POLICY]
Named Arguments¶
--user-id | Used to identify users, groups, and roles. |
--groups | |
--roles | |
--policy | A resource policy, used for controlling access to a resource. |
get-v1-user¶
Retrieve information about the user’s status and the policies attached.
dbio auth get-v1-user [-h] --user-id USER_ID
Named Arguments¶
--user-id | User ID (email). |
put-v1-user¶
Enable or disable a user. A disabled user will return false for all evaluations with that user as principal.
dbio auth put-v1-user [-h] --user-id USER_ID --status STATUS
Named Arguments¶
--user-id | User ID (email). |
--status |
get-v1-user-owns¶
Paginate through a list of resources owned by a user.
dbio auth get-v1-user-owns [-h] --user-id USER_ID [--next-token NEXT_TOKEN]
[--per-page PER_PAGE] --resource-type RESOURCE_TYPE
[--no-paginate]
Named Arguments¶
--user-id | User ID (email). |
--next-token | |
--per-page | |
--resource-type | |
--no-paginate | Do not automatically page the responses Default: True |
get-v1-user-groups¶
Paginate through a list of groups of which a user is a member.
dbio auth get-v1-user-groups [-h] --user-id USER_ID [--next-token NEXT_TOKEN]
[--per-page PER_PAGE] [--no-paginate]
Named Arguments¶
--user-id | User ID (email). |
--next-token | |
--per-page | |
--no-paginate | Do not automatically page the responses Default: True |
put-v1-user-group¶
Modify group(s) in which a user is a member.
dbio auth put-v1-user-group [-h] [--groups GROUPS] --user-id USER_ID --action
ACTION
Named Arguments¶
--groups | |
--user-id | User ID (email). |
--action |
get-v1-user-roles¶
Paginate through all roles attached to a user.
dbio auth get-v1-user-roles [-h] --user-id USER_ID [--next-token NEXT_TOKEN]
[--per-page PER_PAGE] [--no-paginate]
Named Arguments¶
--user-id | User ID (email). |
--next-token | |
--per-page | |
--no-paginate | Do not automatically page the responses Default: True |
put-v1-user-role¶
Modify the role(s) attached to a user.
dbio auth put-v1-user-role [-h] [--roles ROLES] --user-id USER_ID --action
ACTION
Named Arguments¶
--roles | |
--user-id | User ID (email). |
--action |
put-v1-user-policy¶
Modify or add the user’s IAM policy.
dbio auth put-v1-user-policy [-h] [--policy POLICY] --user-id USER_ID
Named Arguments¶
--policy | A resource policy, used for controlling access to a resource. |
--user-id | User ID (email). |
get-v1-groups¶
Paginate through all groups.
dbio auth get-v1-groups [-h] [--next-token NEXT_TOKEN] [--per-page PER_PAGE]
[--no-paginate]
Named Arguments¶
--next-token | |
--per-page | |
--no-paginate | Do not automatically page the responses Default: True |
post-v1-group¶
Create a new group, attach an IAM policy, and assign roles.
dbio auth post-v1-group [-h] [--group-id GROUP_ID] [--policy POLICY]
[--roles ROLES]
Named Arguments¶
--group-id | Used to identify users, groups, and roles. |
--policy | A resource policy, used for controlling access to a resource. |
--roles |
get-v1-group¶
Get properties of a group, including the group’s IAM policy.
dbio auth get-v1-group [-h] --group-id GROUP_ID
Named Arguments¶
--group-id | The name of the group. |
delete-v1-group¶
Remove all users, policies, and roles from the group, and delete the group.
dbio auth delete-v1-group [-h] --group-id GROUP_ID
Named Arguments¶
--group-id | The name of the group. |
get-v1-group-roles¶
Paginate through all roles assigned to the group.
dbio auth get-v1-group-roles [-h] --group-id GROUP_ID
[--next-token NEXT_TOKEN] [--per-page PER_PAGE]
[--no-paginate]
Named Arguments¶
--group-id | The name of the group. |
--next-token | |
--per-page | |
--no-paginate | Do not automatically page the responses Default: True |
put-v1-group-role¶
Modify the role(s) assigned to a group.
dbio auth put-v1-group-role [-h] [--roles ROLES] --group-id GROUP_ID --action
ACTION
Named Arguments¶
--roles | |
--group-id | The name of the group. |
--action |
get-v1-group-users¶
Paginate through all users in a group.
dbio auth get-v1-group-users [-h] --group-id GROUP_ID
[--next-token NEXT_TOKEN] [--per-page PER_PAGE]
[--no-paginate]
Named Arguments¶
--group-id | The name of the group. |
--next-token | |
--per-page | |
--no-paginate | Do not automatically page the responses Default: True |
put-v1-group-user¶
Modify the user(s) assigned to a group.
dbio auth put-v1-group-user [-h] [--users USERS] --group-id GROUP_ID --action
ACTION
Named Arguments¶
--users | |
--group-id | The name of the group. |
--action |
put-v1-group-policy¶
Modify or create a policy attached to a group.
dbio auth put-v1-group-policy [-h] [--policy POLICY] --group-id GROUP_ID
Named Arguments¶
--policy | A resource policy, used for controlling access to a resource. |
--group-id | The name of the group. |
get-v1-roles¶
Paginate through all roles.
dbio auth get-v1-roles [-h] [--next-token NEXT_TOKEN] [--per-page PER_PAGE]
[--no-paginate]
Named Arguments¶
--next-token | |
--per-page | |
--no-paginate | Do not automatically page the responses Default: True |
post-v1-role¶
Create a new role and attach an IAM policy.
dbio auth post-v1-role [-h] --role-id ROLE_ID --policy POLICY
Named Arguments¶
--role-id | Used to identify users, groups, and roles. |
--policy | A resource policy, used for controlling access to a resource. |
get-v1-role¶
Get properties of a role.
dbio auth get-v1-role [-h] --role-id ROLE_ID
Named Arguments¶
--role-id | The name of the role. |
delete-v1-role¶
Remove the role from all users and groups, and finally delete the role.
dbio auth delete-v1-role [-h] --role-id ROLE_ID
Named Arguments¶
--role-id | The name of the role. |
put-v1-role-policy¶
Modify the IAM policy attached to the role.
dbio auth put-v1-role-policy [-h] [--policy POLICY] --role-id ROLE_ID
Named Arguments¶
--policy | A resource policy, used for controlling access to a resource. |
--role-id | The name of the role. |
get-v1-resources¶
List available resource types
dbio auth get-v1-resources [-h] [--next-token NEXT_TOKEN]
[--per-page PER_PAGE] [--no-paginate]
Named Arguments¶
--next-token | |
--per-page | |
--no-paginate | Do not automatically page the responses Default: True |
get-v1-resource¶
List all resources for this resource type
dbio auth get-v1-resource [-h] --resource-type-name RESOURCE_TYPE_NAME
[--next-token NEXT_TOKEN] [--per-page PER_PAGE]
[--no-paginate]
Named Arguments¶
--resource-type-name | |
The name of a type of resources to which a resource policy can be applied. | |
--next-token | |
--per-page | |
--no-paginate | Do not automatically page the responses Default: True |
post-v1-resource¶
Create a new resource type.
dbio auth post-v1-resource [-h] [--owner-policy OWNER_POLICY]
[--actions ACTIONS] --resource-type-name
RESOURCE_TYPE_NAME
Named Arguments¶
--owner-policy | A resource policy, used for controlling access to a resource. |
--actions | |
--resource-type-name | |
The name of a type of resources to which a resource policy can be applied. |
delete-v1-resource¶
Delete an existing resource type if there are no ids of that resource type stored.
dbio auth delete-v1-resource [-h] --resource-type-name RESOURCE_TYPE_NAME
Named Arguments¶
--resource-type-name | |
The name of a type of resources to which a resource policy can be applied. |
get-v1-resource-actions¶
List all actions for this resource type.
dbio auth get-v1-resource-actions [-h] --resource-type-name RESOURCE_TYPE_NAME
Named Arguments¶
--resource-type-name | |
The name of a type of resources to which a resource policy can be applied. |
put-v1-resource-action¶
Add a new action for a resource type.
dbio auth put-v1-resource-action [-h] [--actions ACTIONS] --resource-type-name
RESOURCE_TYPE_NAME
Named Arguments¶
--actions | |
--resource-type-name | |
The name of a type of resources to which a resource policy can be applied. |
delete-v1-resource-actions¶
Delete an existing action for a resource type.
dbio auth delete-v1-resource-actions [-h] [--actions ACTIONS]
--resource-type-name RESOURCE_TYPE_NAME
Named Arguments¶
--actions | |
--resource-type-name | |
The name of a type of resources to which a resource policy can be applied. |
get-v1-resource-policies¶
List the available policies for this resource type.
dbio auth get-v1-resource-policies [-h] --resource-type-name
RESOURCE_TYPE_NAME
[--next-token NEXT_TOKEN]
[--per-page PER_PAGE] [--no-paginate]
Named Arguments¶
--resource-type-name | |
The name of a type of resources to which a resource policy can be applied. | |
--next-token | |
--per-page | |
--no-paginate | Do not automatically page the responses Default: True |
get-v1-resource-policy¶
Retrieve information about resource policy.
dbio auth get-v1-resource-policy [-h] --resource-type-name RESOURCE_TYPE_NAME
--policy-name POLICY_NAME
Named Arguments¶
--resource-type-name | |
The name of a type of resources to which a resource policy can be applied. | |
--policy-name |
post-v1-resource-policy¶
Add a new resource policy. This makes the resource policy available for all resources of that type. Once added, the new resource policy can be applied to members by using PUT /v1/resource/{resource_type_name}/id/{resource_id}/members.
dbio auth post-v1-resource-policy [-h] --policy POLICY
[--policy-type {ResourcePolicy,IAMPolicy}]
--resource-type-name RESOURCE_TYPE_NAME
--policy-name POLICY_NAME
Named Arguments¶
--policy | A resource policy, used for controlling access to a resource. |
--policy-type | Possible choices: ResourcePolicy, IAMPolicy |
--resource-type-name | |
The name of a type of resources to which a resource policy can be applied. | |
--policy-name |
put-v1-resource-policy¶
Modify an existing resource policy. This will affect all members that use this resource policy to access a {resource_type_name}/{resource_ids}.
dbio auth put-v1-resource-policy [-h] [--policy POLICY] --resource-type-name
RESOURCE_TYPE_NAME --policy-name POLICY_NAME
Named Arguments¶
--policy | A resource policy, used for controlling access to a resource. |
--resource-type-name | |
The name of a type of resources to which a resource policy can be applied. | |
--policy-name |
delete-v1-resource-policy¶
Delete an existing resource policy. This will affect all members that use this resource policy to access a {resource_type_name}/{resource_ids}.
dbio auth delete-v1-resource-policy [-h] --resource-type-name
RESOURCE_TYPE_NAME --policy-name
POLICY_NAME
Named Arguments¶
--resource-type-name | |
The name of a type of resources to which a resource policy can be applied. | |
--policy-name |
get-v1-resource-ids¶
List ids of all resources matching the specified resource type.
dbio auth get-v1-resource-ids [-h] --resource-type-name RESOURCE_TYPE_NAME
[--next-token NEXT_TOKEN] [--per-page PER_PAGE]
[--no-paginate]
Named Arguments¶
--resource-type-name | |
The name of a type of resources to which a resource policy can be applied. | |
--next-token | |
--per-page | |
--no-paginate | Do not automatically page the responses Default: True |
get-v1-resource-id¶
Check if a resource of type resource_type_name with id resource_id exists.
dbio auth get-v1-resource-id [-h] --resource-type-name RESOURCE_TYPE_NAME
--resource-id RESOURCE_ID
Named Arguments¶
--resource-type-name | |
The name of a type of resources to which a resource policy can be applied. | |
--resource-id | The id of the resource. |
post-v1-resource-id¶
Create a new resource.
dbio auth post-v1-resource-id [-h] --resource-type-name RESOURCE_TYPE_NAME
--resource-id RESOURCE_ID
Named Arguments¶
--resource-type-name | |
The name of a type of resources to which a resource policy can be applied. | |
--resource-id | The id of the resource. |
delete-v1-resource-id¶
Delete a resource.
dbio auth delete-v1-resource-id [-h] --resource-type-name RESOURCE_TYPE_NAME
--resource-id RESOURCE_ID
Named Arguments¶
--resource-type-name | |
The name of a type of resources to which a resource policy can be applied. | |
--resource-id | The id of the resource. |
get-v1-resource-id-members¶
List all members that have access to this resource. Members include users and groups with any level of access to this resource.
dbio auth get-v1-resource-id-members [-h] --resource-type-name
RESOURCE_TYPE_NAME --resource-id
RESOURCE_ID [--next-token NEXT_TOKEN]
[--per-page PER_PAGE] [--no-paginate]
Named Arguments¶
--resource-type-name | |
The name of a type of resources to which a resource policy can be applied. | |
--resource-id | The id of the resource. |
--next-token | |
--per-page | |
--no-paginate | Do not automatically page the responses Default: True |
put-v1-resource-id-member¶
Give a principal access defined by {policy_name} to {resource_type_name}/{resource_id}.
dbio auth put-v1-resource-id-member [-h] --resource-type-name
RESOURCE_TYPE_NAME --resource-id
RESOURCE_ID
Named Arguments¶
--resource-type-name | |
The name of a type of resources to which a resource policy can be applied. | |
--resource-id | The id of the resource. |
login¶
This command may open a browser window to ask for your consent to use web service authentication credentials.
Use --remote if using the CLI in a remote environment.
dbio auth login [-h] [--access-token ACCESS_TOKEN] [--remote]
Named Arguments¶
--access-token | Default: “” |
--remote | Default: False |
logout¶
Clear dbio auth authentication credentials previously configured with dbio auth login.
dbio auth logout [-h]