dtoolcore

API for creating and interacting with dtool datasets.

class dtoolcore.DataSet(uri, admin_metadata, config_path=None)[source]

Class for reading the contents of a dataset.

base_uri

Return the base URI of the dataset.

delete_tag(tag)

Delete a tag from a dataset.

Parameters:tag – tag
Raises:DtoolCoreKeyError if the tag does not exist
classmethod from_uri(uri, config_path=None)[source]

Return an existing dtoolcore.DataSet from a URI.

Parameters:uri – unique resource identifier where the existing dtoolcore.DataSet is stored
Returns:dtoolcore.DataSet
generate_manifest(progressbar=None)

Return manifest generated from knowledge about contents.

get_annotation(annotation_name)

Return annotation.

Parameters:annotation_name – name of the annotation
Raises:DtoolCoreKeyError if the annotation does not exist
Returns:annotation
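
Because get_annotation() raises DtoolCoreKeyError for a missing annotation, callers that want a fallback value can check list_annotation_names() first. A minimal sketch, assuming only the two documented methods; the helper name annotation_or_default is hypothetical and not part of dtoolcore:

```python
def annotation_or_default(dataset, annotation_name, default=None):
    """Return the named annotation, or `default` if it has not been set.

    Hypothetical helper: `dataset` is any object exposing the documented
    list_annotation_names() and get_annotation() methods.
    """
    # Checking the names first avoids triggering DtoolCoreKeyError.
    if annotation_name in dataset.list_annotation_names():
        return dataset.get_annotation(annotation_name)
    return default
```
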
get_overlay(overlay_name)[source]

Return overlay as a dictionary.

Parameters:overlay_name – name of the overlay
Returns:overlay as a dictionary
get_readme_content()

Return the content of the README describing the dataset.

Returns:content of README as a string
identifiers

Return iterable of dataset item identifiers.

item_content_abspath(identifier)[source]

Return absolute path at which item content can be accessed.

Parameters:identifier – item identifier
Returns:absolute path from which the item content can be accessed
item_properties(identifier)[source]

Return properties of the item with the given identifier.

Parameters:identifier – item identifier
Returns:dictionary of item properties from the manifest
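
The identifiers property and item_properties() are typically used together to walk a dataset's manifest. A minimal sketch, assuming the properties dictionary contains "relpath" and "size_in_bytes" keys (these key names are an assumption, they are not stated above); summarise_items is a hypothetical helper:

```python
def summarise_items(dataset):
    """Return (relpath, size) pairs for every item, sorted by relpath.

    Hypothetical helper: `dataset` is any object exposing the documented
    `identifiers` property and item_properties() method.
    """
    rows = []
    for identifier in dataset.identifiers:
        props = dataset.item_properties(identifier)
        # "relpath" and "size_in_bytes" are assumed manifest keys.
        rows.append((props["relpath"], props["size_in_bytes"]))
    return sorted(rows)
```
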
list_annotation_names()

Return list of annotation names.

list_overlay_names()[source]

Return list of overlay names.

list_tags()

Return the dataset’s tags as a list.

name

Return the name of the dataset.

put_annotation(annotation_name, annotation)

Store annotation so that it is accessible by the given name.

Parameters:
  • annotation_name – name of the annotation
  • annotation – JSON serialisable value or data structure
Raises:

DtoolCoreInvalidNameError if the annotation name is invalid

put_overlay(overlay_name, overlay)[source]

Store overlay so that it is accessible by the given name.

Parameters:
  • overlay_name – name of the overlay
  • overlay – overlay must be a dictionary where the keys are identifiers in the dataset
Raises:
  • DtoolCoreTypeError if the overlay is not a dictionary
  • DtoolCoreValueError if identifiers in overlay and dataset do not match
  • DtoolCoreInvalidNameError if the overlay name is invalid
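
Since an overlay must be a dictionary keyed by the dataset's identifiers, one way to avoid a mismatch is to build it directly from the identifiers property. A minimal sketch; build_overlay and its value_for_item callback are hypothetical names, not part of dtoolcore:

```python
def build_overlay(dataset, value_for_item):
    """Return a dict with exactly one entry per dataset item identifier.

    `value_for_item` is a caller-supplied function that receives the item's
    properties dictionary and returns the overlay value for that item.
    """
    return {
        identifier: value_for_item(dataset.item_properties(identifier))
        for identifier in dataset.identifiers
    }
```

The result could then be passed as the overlay argument of put_overlay(), together with an overlay name of the caller's choosing.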

put_readme(content)[source]

Update the README of the dataset and back up the previous README.

The client is responsible for ensuring that the content is valid YAML.

Parameters:content – string to put into the README
put_tag(tag)

Annotate the dataset with a tag.

Parameters:tag – tag
Raises:DtoolCoreInvalidNameError if the tag is invalid
Raises:DtoolCoreValueError if the tag is not a string
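
The three tag methods (list_tags(), put_tag() and delete_tag()) can be combined to bring a dataset's tags in line with a desired set. A minimal sketch using only those documented methods; sync_tags is a hypothetical helper:

```python
def sync_tags(dataset, desired_tags):
    """Make the dataset's tags match `desired_tags`.

    Returns a tuple (added, removed) of sorted tag lists.
    """
    current = set(dataset.list_tags())
    desired = set(desired_tags)
    added = sorted(desired - current)
    removed = sorted(current - desired)
    for tag in added:
        dataset.put_tag(tag)
    for tag in removed:
        # Only tags reported by list_tags() are deleted, so delete_tag()
        # is not expected to raise DtoolCoreKeyError here.
        dataset.delete_tag(tag)
    return added, removed
```
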
update_name(new_name)

Update the name of the proto dataset.

Parameters:new_name – the new name of the proto dataset
uri

Return the URI of the dataset.

uuid

Return the UUID of the dataset.

class dtoolcore.DataSetCreator(name, base_uri, readme_content='', creator_username=None)[source]

Context manager for creating a dataset.

Inside the context manager one works on a proto dataset. When exiting the context manager the proto dataset is automatically frozen into a dataset, unless an exception has been raised in the context manager.

add_item_metadata(handle, key, value)[source]

Add metadata to a specific item in the dtoolcore.ProtoDataSet.

Parameters:
  • handle – handle representing the relative path of the item in the dtoolcore.ProtoDataSet
  • key – metadata key
  • value – metadata value
name

Return the dataset name.

prepare_staging_abspath_promise(handle)[source]

Return abspath and handle to stage a file.

Provides an absolute path that output can be written to. It is the responsibility of this method to create any missing subdirectories. It is the responsibility of the user to create the file associated with the abspath.

Use the abspath to create a file in the staging directory. The file will be added to the dataset when exiting the context manager.

The handle can be used to generate an identifier for the item in the dataset using the dtoolcore.utils.generate_identifier() function.

Parameters:handle – Unix-like relative path
Returns:absolute path to the file in staging area that the user promises to create
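
The staging promise described above can be sketched as follows. This assumes prepare_staging_abspath_promise() returns just the absolute path, as the Returns entry states. The function name create_text_dataset and the creator_cls parameter are hypothetical; creator_cls exists only so the sketch can be run against a stand-in class without a storage backend.

```python
def create_text_dataset(name, base_uri, handle, text, creator_cls=None):
    """Create a dataset holding a single text file and return its URI."""
    if creator_cls is None:
        # Imported lazily so the sketch can be exercised with a stand-in.
        import dtoolcore
        creator_cls = dtoolcore.DataSetCreator

    with creator_cls(name, base_uri) as creator:
        # The creator makes missing subdirectories; we must create the file.
        fpath = creator.prepare_staging_abspath_promise(handle)
        with open(fpath, "w") as fh:
            fh.write(text)
        uri = creator.uri
    # Leaving the with-block without an exception freezes the proto dataset.
    return uri
```
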
put_annotation(annotation_name, annotation)[source]

Store annotation so that it is accessible by the given name.

Parameters:
  • annotation_name – name of the annotation
  • annotation – JSON serialisable value or data structure
Raises:

DtoolCoreInvalidNameError if the annotation name is invalid

put_item(fpath, relpath)[source]

Put an item into the dataset.

Parameters:
  • fpath – path to the item on disk
  • relpath – relative path name given to the item in the dataset as a handle
Returns:

the handle given to the item

put_readme(content)[source]

Update the README of the dataset and back up the previous README.

The client is responsible for ensuring that the content is valid YAML.

Parameters:content – string to put into the README
put_tag(tag)[source]

Annotate the dataset with a tag.

Parameters:tag – tag
Raises:DtoolCoreInvalidNameError if the tag is invalid
Raises:ValueError if the tag is not a string
staging_directory

Return the staging directory.

An ephemeral directory that only exists within the DataSetCreator context manager. It can be used as a location to write output files that need to be added to the dataset.

The easiest way to add a file here is to use the dtoolcore.DataSetCreator.prepare_staging_abspath_promise() method to get a path to write content to.

If you write files directly to the staging directory you will need to register them using the dtoolcore.DataSetCreator.register_output_file() method.

uri

Return the dataset URI.

class dtoolcore.DerivedDataSetCreator(name, base_uri, source_dataset, readme_content='', creator_username=None)[source]

Context manager for creating a derived dataset.

A derived dataset automatically records information about its source dataset (name, URI and UUID): “source_name”, “source_uri”, and “source_uuid” are added as annotations and to the descriptive metadata in the README.

Inside the context manager one works on a proto dataset. When exiting the context manager the proto dataset is automatically frozen into a dataset, unless an exception has been raised in the context manager.
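
A typical use is to process every item of a source dataset into a new one. A minimal sketch, assuming item properties carry a "relpath" key (an assumption) and that prepare_staging_abspath_promise() returns the staging path. The names derive_uppercased and creator_cls are hypothetical; creator_cls only allows a stand-in class to be substituted for dtoolcore.DerivedDataSetCreator.

```python
def derive_uppercased(source_dataset, name, base_uri, creator_cls=None):
    """Write an upper-cased copy of every text item into a derived dataset."""
    if creator_cls is None:
        # Imported lazily so the sketch can be exercised with a stand-in.
        import dtoolcore
        creator_cls = dtoolcore.DerivedDataSetCreator

    with creator_cls(name, base_uri, source_dataset) as creator:
        for identifier in source_dataset.identifiers:
            # "relpath" is an assumed manifest key, reused here as the handle.
            handle = source_dataset.item_properties(identifier)["relpath"]
            src = source_dataset.item_content_abspath(identifier)
            dest = creator.prepare_staging_abspath_promise(handle)
            with open(src) as fin, open(dest, "w") as fout:
                fout.write(fin.read().upper())
        uri = creator.uri
    return uri
```
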

add_item_metadata(handle, key, value)

Add metadata to a specific item in the dtoolcore.ProtoDataSet.

Parameters:
  • handle – handle representing the relative path of the item in the dtoolcore.ProtoDataSet
  • key – metadata key
  • value – metadata value
name

Return the dataset name.

prepare_staging_abspath_promise(handle)

Return abspath and handle to stage a file.

Provides an absolute path that output can be written to. It is the responsibility of this method to create any missing subdirectories. It is the responsibility of the user to create the file associated with the abspath.

Use the abspath to create a file in the staging directory. The file will be added to the dataset when exiting the context manager.

The handle can be used to generate an identifier for the item in the dataset using the dtoolcore.utils.generate_identifier() function.

Parameters:handle – Unix-like relative path
Returns:absolute path to the file in staging area that the user promises to create
put_annotation(annotation_name, annotation)

Store annotation so that it is accessible by the given name.

Parameters:
  • annotation_name – name of the annotation
  • annotation – JSON serialisable value or data structure
Raises:

DtoolCoreInvalidNameError if the annotation name is invalid

put_item(fpath, relpath)

Put an item into the dataset.

Parameters:
  • fpath – path to the item on disk
  • relpath – relative path name given to the item in the dataset as a handle
Returns:

the handle given to the item

put_readme(content)

Update the README of the dataset and back up the previous README.

The client is responsible for ensuring that the content is valid YAML.

Parameters:content – string to put into the README
put_tag(tag)

Annotate the dataset with a tag.

Parameters:tag – tag
Raises:DtoolCoreInvalidNameError if the tag is invalid
Raises:ValueError if the tag is not a string
staging_directory

Return the staging directory.

An ephemeral directory that only exists within the DataSetCreator context manager. It can be used as a location to write output files that need to be added to the dataset.

The easiest way to add a file here is to use the dtoolcore.DataSetCreator.prepare_staging_abspath_promise() method to get a path to write content to.

If you write files directly to the staging directory you will need to register them using the dtoolcore.DataSetCreator.register_output_file() method.

uri

Return the dataset URI.

exception dtoolcore.DtoolCoreBrokenStagingPromise[source]
errno

exception errno

filename

exception filename

strerror

exception strerror

exception dtoolcore.DtoolCoreInvalidNameError[source]
exception dtoolcore.DtoolCoreKeyError[source]
exception dtoolcore.DtoolCoreTypeError[source]
exception dtoolcore.DtoolCoreValueError[source]
class dtoolcore.ProtoDataSet(uri, admin_metadata, config_path=None)[source]

Class for building up a dataset.

add_item_metadata(handle, key, value)[source]

Add metadata to a specific item in the dtoolcore.ProtoDataSet.

Parameters:
  • handle – handle representing the relative path of the item in the dtoolcore.ProtoDataSet
  • key – metadata key
  • value – metadata value
base_uri

Return the base URI of the dataset.

create()[source]

Create the required directory structure and admin metadata.

delete_tag(tag)

Delete a tag from a dataset.

Parameters:tag – tag
Raises:DtoolCoreKeyError if the tag does not exist
freeze(progressbar=None)[source]

Convert dtoolcore.ProtoDataSet to dtoolcore.DataSet.

classmethod from_uri(uri, config_path=None)[source]

Return an existing dtoolcore.ProtoDataSet from a URI.

Parameters:uri – unique resource identifier where the existing dtoolcore.ProtoDataSet is stored
Returns:dtoolcore.ProtoDataSet
generate_manifest(progressbar=None)

Return manifest generated from knowledge about contents.

get_annotation(annotation_name)

Return annotation.

Parameters:annotation_name – name of the annotation
Raises:DtoolCoreKeyError if the annotation does not exist
Returns:annotation
get_readme_content()

Return the content of the README describing the dataset.

Returns:content of README as a string
list_annotation_names()

Return list of annotation names.

list_tags()

Return the dataset’s tags as a list.

name

Return the name of the dataset.

put_annotation(annotation_name, annotation)

Store annotation so that it is accessible by the given name.

Parameters:
  • annotation_name – name of the annotation
  • annotation – JSON serialisable value or data structure
Raises:

DtoolCoreInvalidNameError if the annotation name is invalid

put_item(fpath, relpath)[source]

Put an item into the dataset.

Parameters:
  • fpath – path to the item on disk
  • relpath – relative path name given to the item in the dataset as a handle, i.e. a Unix-like relpath
Returns:

the handle given to the item
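
Because the handle is a Unix-like relpath, paths collected on disk may need their separators normalised before being passed to put_item(). A minimal sketch that adds a whole directory and then calls freeze(); add_directory_and_freeze is a hypothetical helper and expects a proto dataset that has already been created:

```python
import os


def add_directory_and_freeze(proto_dataset, directory):
    """Put every file under `directory` into the proto dataset, then freeze.

    Returns the sorted list of handles reported by put_item().
    """
    handles = []
    for root, _dirs, files in os.walk(directory):
        for filename in sorted(files):
            fpath = os.path.join(root, filename)
            # Handles are Unix-like relpaths, so normalise the OS separator.
            relpath = os.path.relpath(fpath, directory).replace(os.sep, "/")
            handles.append(proto_dataset.put_item(fpath, relpath))
    proto_dataset.freeze()
    return sorted(handles)
```
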

put_readme(content)[source]

Put content into the README of the dataset.

The client is responsible for ensuring that the content is valid YAML.

Parameters:content – string to put into the README
put_tag(tag)

Annotate the dataset with a tag.

Parameters:tag – tag
Raises:DtoolCoreInvalidNameError if the tag is invalid
Raises:DtoolCoreValueError if the tag is not a string
update_name(new_name)

Update the name of the proto dataset.

Parameters:new_name – the new name of the proto dataset
uri

Return the URI of the dataset.

uuid

Return the UUID of the dataset.

dtoolcore.copy(src_uri, dest_base_uri, config_path=None, progressbar=None)[source]

Copy a dataset to another location.

Parameters:
  • src_uri – URI of dataset to be copied
  • dest_base_uri – base of URI for copy target
  • config_path – path to dtool configuration file
Returns:

URI of new dataset

dtoolcore.copy_resume(src_uri, dest_base_uri, config_path=None, progressbar=None)[source]

Resume copying a dataset to another location.

Items that have been copied to the destination and have the same size as in the source dataset are skipped. All other items are copied across and the dataset is frozen.

Parameters:
  • src_uri – URI of dataset to be copied
  • dest_base_uri – base of URI for copy target
  • config_path – path to dtool configuration file
Returns:

URI of new dataset
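
The skip rule described for copy_resume() can be illustrated in isolation. The sketch below is not dtoolcore's implementation; it is a standalone model of the stated rule, with a hypothetical function name and plain dictionaries mapping item identifiers to sizes in bytes:

```python
def items_needing_copy(source_sizes, dest_sizes):
    """Return identifiers that a resumed copy would still have to transfer.

    An item is skipped only if it is already present at the destination
    with the same size as in the source; everything else is copied.
    """
    return sorted(
        identifier
        for identifier, size in source_sizes.items()
        if dest_sizes.get(identifier) != size
    )
```

For example, an item that was only partially written before an interruption has a different size at the destination, so it is copied again.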

dtoolcore.create_derived_proto_dataset(name, base_uri, source_dataset, readme_content='', creator_username=None)[source]

Return dtoolcore.ProtoDataSet instance.

It adds the “source_name”, “source_uri”, and “source_uuid” as annotations.

Parameters:
  • name – dataset name
  • base_uri – base URI for proto dataset
  • source_dataset – source dataset
  • readme_content – content of README as a string
  • creator_username – creator username
dtoolcore.create_proto_dataset(name, base_uri, readme_content='', creator_username=None)[source]

Return dtoolcore.ProtoDataSet instance.

Parameters:
  • name – dataset name
  • base_uri – base URI for proto dataset
  • readme_content – content of README as a string
  • creator_username – creator username
dtoolcore.generate_admin_metadata(name, creator_username=None)[source]

Return admin metadata as a dictionary.

dtoolcore.generate_proto_dataset(admin_metadata, base_uri, config_path=None)[source]

Return dtoolcore.ProtoDataSet instance.

Parameters:
  • admin_metadata – dataset administrative metadata
  • base_uri – base URI for proto dataset
  • config_path – path to dtool configuration file
dtoolcore.iter_datasets_in_base_uri(base_uri)[source]

Yield dtoolcore.DataSet instances present in the base URI.

Parameters:base_uri – base URI
Returns:iterator yielding dtoolcore.DataSet instances
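
The yielded instances expose the uuid, name and uri properties documented for dtoolcore.DataSet, so the iterator can be collected into a lookup table. A minimal sketch; index_datasets is a hypothetical helper that accepts any iterable of dataset-like objects:

```python
def index_datasets(datasets):
    """Return {uuid: (name, uri)} for an iterable of dataset objects."""
    return {ds.uuid: (ds.name, ds.uri) for ds in datasets}
```

It could be called as index_datasets(dtoolcore.iter_datasets_in_base_uri(base_uri)), where base_uri is whatever base URI the caller is working with.
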
dtoolcore.iter_proto_datasets_in_base_uri(base_uri)[source]

Yield dtoolcore.ProtoDataSet instances present in the base URI.

Parameters:base_uri – base URI
Returns:iterator yielding dtoolcore.ProtoDataSet instances