dtoolcore¶
API for creating and interacting with dtool datasets.
-
class
dtoolcore.
DataSet
(uri, admin_metadata, config_path=None)[source]¶ Class for reading the contents of a dataset.
-
base_uri
¶ Return the base URI of the dataset.
-
delete_tag
(tag)¶ Delete a tag from a dataset.
Parameters: tag – tag Raises: DtoolCoreKeyError if the tag does not exist
-
classmethod
from_uri
(uri, config_path=None)[source]¶ Return an existing dtoolcore.DataSet from a URI.
Parameters: uri – unique resource identifier where the existing dtoolcore.DataSet is stored Returns: dtoolcore.DataSet
-
generate_manifest
(progressbar=None)¶ Return manifest generated from knowledge about contents.
-
get_annotation
(annotation_name)¶ Return annotation.
Parameters: annotation_name – name of the annotation Raises: DtoolCoreKeyError if the annotation does not exist Returns: annotation
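Because get_annotation raises rather than returning a default, callers that treat an annotation as optional need to catch the error. The following is a minimal sketch of that pattern. It assumes DtoolCoreKeyError can be imported from the top-level dtoolcore package; the helper name get_annotation_or_default is hypothetical and not part of dtoolcore.

```python
try:
    from dtoolcore import DtoolCoreKeyError
except ImportError:
    # Fallback so the sketch runs without dtoolcore installed.
    # Assumption: the real exception behaves like a KeyError subclass.
    class DtoolCoreKeyError(KeyError):
        pass


def get_annotation_or_default(dataset, annotation_name, default=None):
    """Return the named annotation, or `default` if it does not exist."""
    try:
        return dataset.get_annotation(annotation_name)
    except DtoolCoreKeyError:
        return default
```

The `dataset` argument can be any object exposing get_annotation, for example a dtoolcore.DataSet or dtoolcore.ProtoDataSet.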
-
get_overlay
(overlay_name)[source]¶ Return overlay as a dictionary.
Parameters: overlay_name – name of the overlay Returns: overlay as a dictionary
-
get_readme_content
()¶ Return the content of the README describing the dataset.
Returns: content of README as a string
-
identifiers
¶ Return iterable of dataset item identifiers.
-
item_content_abspath
(identifier)[source]¶ Return absolute path at which item content can be accessed.
Parameters: identifier – item identifier Returns: absolute path from which the item content can be accessed
-
item_properties
(identifier)[source]¶ Return properties of the item with the given identifier.
Parameters: identifier – item identifier Returns: dictionary of item properties from the manifest
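The identifiers property, item_properties and item_content_abspath are typically used together when reading a dataset. The sketch below combines them; the helper name list_items is hypothetical, and it only relies on the three members documented above.

```python
def list_items(dataset):
    """Return (identifier, properties, abspath) for each item in the dataset."""
    rows = []
    for identifier in dataset.identifiers:
        properties = dataset.item_properties(identifier)
        abspath = dataset.item_content_abspath(identifier)
        rows.append((identifier, properties, abspath))
    return rows
```

With dtoolcore installed, the expected call would be along the lines of list_items(DataSet.from_uri(uri)), where uri is a placeholder for the URI of an existing dataset.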
-
list_annotation_names
()¶ Return list of annotation names.
Return the dataset’s tags as a list.
-
name
¶ Return the name of the dataset.
-
put_annotation
(annotation_name, annotation)¶ Store annotation so that it is accessible by the given name.
Parameters: - annotation_name – name of the annotation
- annotation – JSON serialisable value or data structure
Raises: DtoolCoreInvalidNameError if the annotation name is invalid
-
put_overlay
(overlay_name, overlay)[source]¶ Store overlay so that it is accessible by the given name.
Parameters: - overlay_name – name of the overlay
- overlay – overlay must be a dictionary where the keys are identifiers in the dataset
Raises: DtoolCoreTypeError if the overlay is not a dictionary, DtoolCoreValueError if identifiers in overlay and dataset do not match, DtoolCoreInvalidNameError if the overlay name is invalid
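Since an overlay must have exactly the dataset's identifiers as keys, one straightforward way to build it is to compute a value for every identifier. The sketch below does that; build_overlay and value_for_item are hypothetical names, not part of dtoolcore.

```python
def build_overlay(dataset, value_for_item):
    """Return a dict mapping every item identifier to a computed value.

    `value_for_item` is called with the properties dictionary of each item.
    """
    return {
        identifier: value_for_item(dataset.item_properties(identifier))
        for identifier in dataset.identifiers
    }
```

The result could then be stored with dataset.put_overlay("is_csv", overlay). A callable such as lambda props: props["relpath"].endswith(".csv") would assume that the item properties contain a "relpath" key, which is not stated in this reference.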
-
put_readme
(content)[source]¶ Update the README of the dataset and backup the previous README.
The client is responsible for ensuring that the content is valid YAML.
Parameters: content – string to put into the README
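As the client has to supply valid YAML, it can help to generate the README string from a dictionary rather than write it by hand. The sketch below uses only the standard library and relies on the assumption that a JSON document is also valid YAML 1.2; a dedicated YAML library would produce more conventional output. The helper name readme_from_metadata is hypothetical.

```python
import json


def readme_from_metadata(metadata):
    """Serialise a metadata dictionary to a string suitable for a README."""
    return json.dumps(metadata, indent=2, sort_keys=True)
```

The returned string could then be passed to put_readme, for example dataset.put_readme(readme_from_metadata({"description": "demo"})).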
-
put_tag
(tag)¶ Annotate the dataset with a tag.
Parameters: tag – tag Raises: DtoolCoreInvalidNameError if the tag is invalid Raises: DtoolCoreValueError if the tag is not a string
-
update_name
(new_name)¶ Update the name of the proto dataset.
Parameters: new_name – the new name of the proto dataset
-
uri
¶ Return the URI of the dataset.
-
uuid
¶ Return the UUID of the dataset.
-
-
class
dtoolcore.
DataSetCreator
(name, base_uri, readme_content='', creator_username=None)[source]¶ Context manager for creating a dataset.
Inside the context manager one works on a proto dataset. When exiting the context manager the proto dataset is automatically frozen into a dataset, unless an exception has been raised in the context manager.
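The workflow described above can be sketched as a helper that fills a creator object inside the with block. The helper name populate is hypothetical; it only calls put_item, put_tag and put_annotation as documented for this class. The dataset name and paths in the usage comment are placeholders.

```python
def populate(creator, files, tags=(), annotations=None):
    """Add files, tags and annotations through a DataSetCreator-like object.

    `files` maps each handle (relative path in the dataset) to a path on disk.
    Returns the handles reported by put_item.
    """
    handles = [creator.put_item(fpath, relpath) for relpath, fpath in files.items()]
    for tag in tags:
        creator.put_tag(tag)
    for name, value in (annotations or {}).items():
        creator.put_annotation(name, value)
    return handles


# Usage sketch (requires dtoolcore; name and paths are placeholders):
# from dtoolcore import DataSetCreator
# with DataSetCreator("my-dataset", "/tmp/datasets") as creator:
#     populate(creator, {"data/raw.csv": "/path/to/raw.csv"}, tags=["raw"])
```

If the block raises, the proto dataset is not frozen, so a failure partway through populate does not produce a finished dataset.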
-
add_item_metadata
(handle, key, value)[source]¶ Add metadata to a specific item in the dtoolcore.ProtoDataSet.
Parameters: - handle – handle representing the relative path of the item in the dtoolcore.ProtoDataSet
- key – metadata key
- value – metadata value
-
name
¶ Return the dataset name.
-
prepare_staging_abspath_promise
(handle)[source]¶ Return abspath and handle to stage a file.
Provides an absolute path to which output can be written. This method creates any missing subdirectories; the user is responsible for creating the file at the returned path.
Use the abspath to create a file in the staging directory. The file will be added to the dataset when exiting the context manager.
The handle can be used to generate an identifier for the item in the dataset using the dtoolcore.utils.generate_identifier() function.
Parameters: handle – Unix-like relpath Returns: absolute path to the file in staging area that the user promises to create
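The promise can be illustrated with a small helper that requests a staging path and then creates the file there. This is a minimal sketch; the name stage_text_file is hypothetical, and the only dtoolcore call it makes is prepare_staging_abspath_promise.

```python
def stage_text_file(creator, handle, text):
    """Write `text` to the staging path promised for `handle`; return the path."""
    abspath = creator.prepare_staging_abspath_promise(handle)
    # The creator is documented to create missing subdirectories, so the
    # caller only has to create the file itself.
    with open(abspath, "w") as fh:
        fh.write(text)
    return abspath
```

Inside a with DataSetCreator(...) as creator: block, a call such as stage_text_file(creator, "results/summary.txt", "done") would fulfil the promise for that handle.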
-
put_annotation
(annotation_name, annotation)[source]¶ Store annotation so that it is accessible by the given name.
Parameters: - annotation_name – name of the annotation
- annotation – JSON serialisable value or data structure
Raises: DtoolCoreInvalidNameError if the annotation name is invalid
-
put_item
(fpath, relpath)[source]¶ Put an item into the dataset.
Parameters: - fpath – path to the item on disk
- relpath – relative path name given to the item in the dataset as a handle
Returns: the handle given to the item
-
put_readme
(content)[source]¶ Update the README of the dataset and backup the previous README.
The client is responsible for ensuring that the content is valid YAML.
Parameters: content – string to put into the README
-
put_tag
(tag)[source]¶ Annotate the dataset with a tag.
Parameters: tag – tag Raises: DtoolCoreInvalidNameError if the tag is invalid Raises: ValueError if the tag is not a string
-
staging_directory
¶ Return the staging directory.
An ephemeral directory that only exists within the DataSetCreator context manager. It can be used as a location to write output files that need to be added to the dataset.
The easiest way to add a file here is to use the dtoolcore.DataSetCreator.get_staging_fpath() method to get a path to write content to. If you write files directly to the staging directory you will need to register them using the dtoolcore.DataSetCreator.register_output_file() method.
-
uri
¶ Return the dataset URI.
-
-
class
dtoolcore.
DerivedDataSetCreator
(name, base_uri, source_dataset, readme_content='', creator_username=None)[source]¶ Context manager for creating a derived dataset.
A derived dataset automatically records information about its source dataset (name, URI and UUID): “source_name”, “source_uri”, and “source_uuid” are added as annotations and to the descriptive metadata in the readme.
Inside the context manager one works on a proto dataset. When exiting the context manager the proto dataset is automatically frozen into a dataset, unless an exception has been raised in the context manager.
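The source annotations written by this class can be read back from the frozen dataset with get_annotation. The sketch below collects them into a dictionary; source_info and SOURCE_ANNOTATIONS are hypothetical names, while the three annotation names are the ones listed above.

```python
SOURCE_ANNOTATIONS = ("source_name", "source_uri", "source_uuid")


def source_info(dataset):
    """Return the source-dataset annotations of a derived dataset as a dict."""
    return {name: dataset.get_annotation(name) for name in SOURCE_ANNOTATIONS}
```

For a dataset that was not created as a derived dataset, get_annotation would be expected to raise DtoolCoreKeyError for these names.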
-
add_item_metadata
(handle, key, value)¶ Add metadata to a specific item in the dtoolcore.ProtoDataSet.
Parameters: - handle – handle representing the relative path of the item in the dtoolcore.ProtoDataSet
- key – metadata key
- value – metadata value
-
name
¶ Return the dataset name.
-
prepare_staging_abspath_promise
(handle)¶ Return abspath and handle to stage a file.
Provides an absolute path to which output can be written. This method creates any missing subdirectories; the user is responsible for creating the file at the returned path.
Use the abspath to create a file in the staging directory. The file will be added to the dataset when exiting the context manager.
The handle can be used to generate an identifier for the item in the dataset using the dtoolcore.utils.generate_identifier() function.
Parameters: handle – Unix-like relpath Returns: absolute path to the file in staging area that the user promises to create
-
put_annotation
(annotation_name, annotation)¶ Store annotation so that it is accessible by the given name.
Parameters: - annotation_name – name of the annotation
- annotation – JSON serialisable value or data structure
Raises: DtoolCoreInvalidNameError if the annotation name is invalid
-
put_item
(fpath, relpath)¶ Put an item into the dataset.
Parameters: - fpath – path to the item on disk
- relpath – relative path name given to the item in the dataset as a handle
Returns: the handle given to the item
-
put_readme
(content)¶ Update the README of the dataset and backup the previous README.
The client is responsible for ensuring that the content is valid YAML.
Parameters: content – string to put into the README
-
put_tag
(tag)¶ Annotate the dataset with a tag.
Parameters: tag – tag Raises: DtoolCoreInvalidNameError if the tag is invalid Raises: ValueError if the tag is not a string
-
staging_directory
¶ Return the staging directory.
An ephemeral directory that only exists within the DataSetCreator context manager. It can be used as a location to write output files that need to be added to the dataset.
The easiest way to add a file here is to use the dtoolcore.DataSetCreator.get_staging_fpath() method to get a path to write content to. If you write files directly to the staging directory you will need to register them using the dtoolcore.DataSetCreator.register_output_file() method.
-
uri
¶ Return the dataset URI.
-
-
exception
dtoolcore.
DtoolCoreBrokenStagingPromise
[source]¶ -
errno
¶ exception errno
-
filename
¶ exception filename
-
strerror
¶ exception strerror
-
-
class
dtoolcore.
ProtoDataSet
(uri, admin_metadata, config_path=None)[source]¶ Class for building up a dataset.
-
add_item_metadata
(handle, key, value)[source]¶ Add metadata to a specific item in the dtoolcore.ProtoDataSet.
Parameters: - handle – handle representing the relative path of the item in the dtoolcore.ProtoDataSet
- key – metadata key
- value – metadata value
-
base_uri
¶ Return the base URI of the dataset.
-
delete_tag
(tag)¶ Delete a tag from a dataset.
Parameters: tag – tag Raises: DtoolCoreKeyError if the tag does not exist
-
freeze
(progressbar=None)[source]¶ Convert dtoolcore.ProtoDataSet to dtoolcore.DataSet.
-
classmethod
from_uri
(uri, config_path=None)[source]¶ Return an existing dtoolcore.ProtoDataSet from a URI.
Parameters: uri – unique resource identifier where the existing dtoolcore.ProtoDataSet is stored Returns: dtoolcore.ProtoDataSet
-
generate_manifest
(progressbar=None)¶ Return manifest generated from knowledge about contents.
-
get_annotation
(annotation_name)¶ Return annotation.
Parameters: annotation_name – name of the annotation Raises: DtoolCoreKeyError if the annotation does not exist Returns: annotation
-
get_readme_content
()¶ Return the content of the README describing the dataset.
Returns: content of README as a string
-
list_annotation_names
()¶ Return list of annotation names.
Return the dataset’s tags as a list.
-
name
¶ Return the name of the dataset.
-
put_annotation
(annotation_name, annotation)¶ Store annotation so that it is accessible by the given name.
Parameters: - annotation_name – name of the annotation
- annotation – JSON serialisable value or data structure
Raises: DtoolCoreInvalidNameError if the annotation name is invalid
-
put_item
(fpath, relpath)[source]¶ Put an item into the dataset.
Parameters: - fpath – path to the item on disk
- relpath – relative path name given to the item in the dataset as a handle, i.e. a Unix-like relpath
Returns: the handle given to the item
-
put_readme
(content)[source]¶ Put content into the README of the dataset.
The client is responsible for ensuring that the content is valid YAML.
Parameters: content – string to put into the README
-
put_tag
(tag)¶ Annotate the dataset with a tag.
Parameters: tag – tag Raises: DtoolCoreInvalidNameError if the tag is invalid Raises: DtoolCoreValueError if the tag is not a string
-
update_name
(new_name)¶ Update the name of the proto dataset.
Parameters: new_name – the new name of the proto dataset
-
uri
¶ Return the URI of the dataset.
-
uuid
¶ Return the UUID of the dataset.
-
-
dtoolcore.
copy
(src_uri, dest_base_uri, config_path=None, progressbar=None)[source]¶ Copy a dataset to another location.
Parameters: - src_uri – URI of dataset to be copied
- dest_base_uri – base of URI for copy target
- config_path – path to dtool configuration file
Returns: URI of new dataset
-
dtoolcore.
copy_resume
(src_uri, dest_base_uri, config_path=None, progressbar=None)[source]¶ Resume copying a dataset to another location.
Items that have been copied to the destination and have the same size as in the source dataset are skipped. All other items are copied across and the dataset is frozen.
Parameters: - src_uri – URI of dataset to be copied
- dest_base_uri – base of URI for copy target
- config_path – path to dtool configuration file
Returns: URI of new dataset
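The skip rule described above (an item is skipped when it already exists at the destination with the same size) can be illustrated in isolation. The sketch below is not dtoolcore's implementation; it only models the rule with plain dictionaries, and the function name items_needing_copy is hypothetical.

```python
def items_needing_copy(source_sizes, destination_sizes):
    """Return identifiers that a resumed copy would still have to transfer.

    Both arguments map item identifiers to sizes in bytes. An item is
    skipped only if it is present at the destination with the same size.
    """
    return [
        identifier
        for identifier, size in source_sizes.items()
        if destination_sizes.get(identifier) != size
    ]
```

An item that was only partially written to the destination has a different size and is therefore copied again, as is an item that is missing altogether.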
-
dtoolcore.
create_derived_proto_dataset
(name, base_uri, source_dataset, readme_content='', creator_username=None)[source]¶ Return dtoolcore.ProtoDataSet instance.
It adds the “source_name”, “source_uri”, and “source_uuid” as annotations.
Parameters: - name – dataset name
- base_uri – base URI for proto dataset
- source_dataset – source dataset
- readme_content – content of README as a string
- creator_username – creator username
-
dtoolcore.
create_proto_dataset
(name, base_uri, readme_content='', creator_username=None)[source]¶ Return dtoolcore.ProtoDataSet instance.
Parameters: - name – dataset name
- base_uri – base URI for proto dataset
- readme_content – content of README as a string
- creator_username – creator username
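When the context managers are not used, a proto dataset returned by this function has to be populated and frozen explicitly. The sketch below shows that sequence using only put_item, freeze and uri as documented for dtoolcore.ProtoDataSet; the helper name freeze_with_items is hypothetical, and the name and paths in the usage comment are placeholders.

```python
def freeze_with_items(proto_dataset, files):
    """Put each file into the proto dataset, then freeze it.

    `files` maps handles (relative paths in the dataset) to paths on disk.
    Returns the URI of the dataset.
    """
    for relpath, fpath in files.items():
        proto_dataset.put_item(fpath, relpath)
    proto_dataset.freeze()
    return proto_dataset.uri


# Usage sketch (requires dtoolcore; name and paths are placeholders):
# from dtoolcore import create_proto_dataset
# proto = create_proto_dataset("my-dataset", "/tmp/datasets")
# uri = freeze_with_items(proto, {"data/raw.csv": "/path/to/raw.csv"})
```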
-
dtoolcore.
generate_admin_metadata
(name, creator_username=None)[source]¶ Return admin metadata as a dictionary.
-
dtoolcore.
generate_proto_dataset
(admin_metadata, base_uri, config_path=None)[source]¶ Return dtoolcore.ProtoDataSet instance.
Parameters: - admin_metadata – dataset administrative metadata
- base_uri – base URI for proto dataset
- config_path – path to dtool configuration file
-
dtoolcore.
iter_datasets_in_base_uri
(base_uri)[source]¶ Yield dtoolcore.DataSet instances present in the base URI.
Parameters: base_uri – base URI Returns: iterator yielding dtoolcore.DataSet instances
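Because the function yields dataset objects lazily, a common follow-up is to collect a few of their properties into a lookup table. The sketch below builds such an index from any iterable of dataset-like objects; the helper name index_by_uuid is hypothetical, and it only reads the uuid, name and uri properties documented for dtoolcore.DataSet.

```python
def index_by_uuid(datasets):
    """Return a dict mapping each dataset's UUID to its (name, URI)."""
    return {ds.uuid: (ds.name, ds.uri) for ds in datasets}
```

With dtoolcore installed, the expected call would be along the lines of index_by_uuid(iter_datasets_in_base_uri(base_uri)), where base_uri is a placeholder.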
-
dtoolcore.
iter_proto_datasets_in_base_uri
(base_uri)[source]¶ Yield dtoolcore.ProtoDataSet instances present in the base URI.
Parameters: base_uri – base URI Returns: iterator yielding dtoolcore.ProtoDataSet instances