PYME.IO.cluster_directory module

Work in progress: refactor some of the directory-listing code out of clusterIO in order to:

  1. reduce the size of clusterIO

  2. reduce duplication between, e.g., locate and listdirectory

  3. ultimately allow pluggable directory management / caching, e.g. to enable a central directory server (it is unclear whether this would solve the current performance issues, but it is potentially worth a try).

class PYME.IO.cluster_directory.DirectoryInfoManager(ns=None, serverfilter='')

Bases: object

cglob(pattern)

Find files matching a given glob on the cluster. Analogous to the python glob.glob function.

Parameters
pattern : string, glob
serverfilter : cluster name (optional)
Returns
a list of files matching the glob
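A minimal sketch of how a cluster glob could be evaluated: the pattern is split into path components, and each component is matched with fnmatch against the entries of the directories matched so far. LISTINGS and cglob_sketch are hypothetical stand-ins for the merged per-server listings; this is not the PYME implementation, which queries the data servers over the network.

```python
import fnmatch

# Hypothetical flat view of the cluster namespace: each directory maps to the
# names it contains, with subdirectories marked by a trailing slash.
LISTINGS = {
    "": ["data/", "README.txt"],
    "data/": ["run1/", "a.h5", "b.h5", "notes.txt"],
    "data/run1/": ["c.h5"],
}


def cglob_sketch(pattern, listings=LISTINGS):
    """Return sorted paths matching `pattern`, one path component at a time."""
    matches = [""]  # directory prefixes matched so far; "" is the root
    parts = pattern.strip("/").split("/")
    for i, part in enumerate(parts):
        last = i == len(parts) - 1
        next_matches = []
        for prefix in matches:
            for entry in listings.get(prefix, []):
                is_dir = entry.endswith("/")
                name = entry.rstrip("/")
                if not fnmatch.fnmatchcase(name, part):
                    continue
                if last:
                    # Final component: files and directories both count.
                    next_matches.append(prefix + name)
                elif is_dir:
                    # Intermediate component: only descend into directories.
                    next_matches.append(prefix + name + "/")
        matches = next_matches
    return sorted(matches)
```

For example, cglob_sketch("data/*/*.h5") descends into data/run1/ and returns ["data/run1/c.h5"].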
property dataservers

Find all the data servers belonging to the cluster, caching the results
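The caching behaviour described for dataservers can be sketched with functools.cached_property: discovery runs on first access and the result is reused afterwards. DirectorySketch, the injected discover callable and invalidate are hypothetical; the real property may use a different discovery mechanism and expiry policy.

```python
import functools


class DirectorySketch:
    """Hypothetical stand-in for DirectoryInfoManager's server discovery."""

    def __init__(self, discover):
        # `discover` is an assumed callable returning {server_name: (host, port)}.
        self._discover = discover
        self.discovery_calls = 0

    @functools.cached_property
    def dataservers(self):
        # Computed on first access, then stored in the instance __dict__.
        self.discovery_calls += 1
        return self._discover()

    def invalidate(self):
        # Drop the cached value so the next access re-runs discovery.
        self.__dict__.pop("dataservers", None)
```

A time-based expiry would suit a cluster whose membership changes, at the cost of occasional re-discovery; the sketch leaves that choice to an explicit invalidate() call.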

exists(name)

Test whether a file exists on the cluster. Analogous to os.path.exists for local files.

Parameters
name : string, file path
serverfilter : name of the cluster (optional)
Returns
True if the file exists, else False
isdir(name)

Tests whether a given path on the cluster is a directory. Analogous to os.path.isdir.
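Both exists and isdir can in principle be answered from the listing of the parent directory, using the trailing-slash convention that listdir uses for directories. The sketch below assumes a hypothetical in-memory LISTINGS dict in place of querying the cluster; exists_sketch and isdir_sketch are illustrative names, not PYME functions.

```python
import posixpath

# Hypothetical merged view of the cluster namespace: directory -> entries,
# with subdirectories marked by a trailing slash.
LISTINGS = {
    "": ["data/"],
    "data/": ["run1/", "a.h5"],
    "data/run1/": [],
}


def _parent_entries(name, listings):
    """Split `name` and return (basename, entries of its parent directory)."""
    parent, base = posixpath.split(name.strip("/"))
    parent_key = parent + "/" if parent else ""
    return base, listings.get(parent_key, [])


def exists_sketch(name, listings=LISTINGS):
    if name.strip("/") == "":
        return True  # assumption: the cluster root always exists
    base, entries = _parent_entries(name, listings)
    # A name exists if it is listed either as a file or as a directory.
    return base in entries or base + "/" in entries


def isdir_sketch(name, listings=LISTINGS):
    if name.strip("/") == "":
        return True
    base, entries = _parent_entries(name, listings)
    # Only entries with a trailing slash are directories.
    return base + "/" in entries
```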

list_single_node_dir(dirurl, nRetries=1, timeout=10, strict_caching=False)

List the directory on a single node

Parameters
dirurl
nRetries
timeout
Returns
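A sketch of the retry-with-timeout pattern implied by the nRetries and timeout parameters. The fetch callable (for example a wrapper around an HTTP GET) is hypothetical, and treating nRetries as the total number of attempts is an assumption; strict_caching is not modelled.

```python
import time


def list_single_node_dir_sketch(dirurl, fetch, nRetries=1, timeout=10, retry_delay=0.0):
    """Fetch one node's directory listing, retrying on network failures.

    `fetch(url, timeout=...)` is a hypothetical callable that returns the
    listing or raises TimeoutError / ConnectionError.
    """
    # Assumption: nRetries counts total attempts, not extra retries.
    attempts = max(1, nRetries)
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(dirurl, timeout=timeout)
        except (TimeoutError, ConnectionError) as err:
            last_error = err
            # Optional back-off between attempts (not after the last one).
            if retry_delay and attempt < attempts - 1:
                time.sleep(retry_delay)
    # Every attempt failed: surface the most recent error to the caller.
    raise last_error
```

Re-raising the last error lets the caller decide whether a single unresponsive node should abort a cluster-wide listing or merely be skipped.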
listdir(dirname)

Lists the contents of a directory on the cluster. Similar to os.listdir, but directories are indicated by a trailing slash

listdirectory(dirname, timeout=5)

Lists the contents of a directory on the cluster.

Returns a dictionary mapping filenames to clusterListing.FileInfo named tuples.
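One way to picture listdirectory and listdir together: per-server listings are merged into a single {name: FileInfo} dictionary, and listdir derives os.listdir-style names from it, appending a slash to directories. The FileInfo fields, the type flags and the merge rule (OR the type flags, keep the larger size) are assumptions for illustration; the real clusterListing.FileInfo may differ.

```python
from collections import namedtuple

# Hypothetical stand-in for clusterListing.FileInfo and its type flags.
FileInfo = namedtuple("FileInfo", ["type", "size"])
FILETYPE_NORMAL = 0
FILETYPE_DIRECTORY = 1


def merge_listings(per_server):
    """Merge {server: {name: FileInfo}} into a single {name: FileInfo} dict.

    Assumption: a name seen on several servers is merged by OR-ing the type
    flags and keeping the larger size.
    """
    merged = {}
    for listing in per_server.values():
        for name, info in listing.items():
            if name in merged:
                old = merged[name]
                merged[name] = FileInfo(old.type | info.type, max(old.size, info.size))
            else:
                merged[name] = info
    return merged


def listdir_from(listing):
    """Return sorted os.listdir-style names, with a trailing slash on directories."""
    return sorted(
        name + "/" if info.type & FILETYPE_DIRECTORY else name
        for name, info in listing.items()
    )
```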

locate_file(filename, return_first_hit=False)

Searches the cluster to find which server(s) a given file is stored on

Parameters
filename : str

The file name

return_first_hit : bool

Whether to find all locations, or to return as soon as the first copy is found

Returns
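The effect of return_first_hit can be sketched as an early exit from the search over servers. server_listings is a hypothetical {server: set of paths} mapping standing in for network queries, and returning a list of server names is an assumption, since the return value is not documented here.

```python
def locate_file_sketch(filename, server_listings, return_first_hit=False):
    """Return the names of servers whose listing contains `filename`.

    `server_listings` is a hypothetical {server_name: set_of_paths} mapping.
    """
    locations = []
    for server, paths in server_listings.items():
        if filename in paths:
            locations.append(server)
            if return_first_hit:
                # One copy is enough (e.g. for reading); skip the remaining servers.
                break
    return locations
```

Stopping at the first hit saves queries when any copy will do, while the full search is needed when every replica must be found.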
register_file(filename, url, size)

Call after uploading a new file so we can update our caches
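A sketch of what updating caches after an upload might involve: patch the cached listing of the file's directory, if one is held, and record where the file now lives. ListingCache and its methods are hypothetical and only mirror the documented register_file(filename, url, size) signature.

```python
import posixpath


class ListingCache:
    """Hypothetical per-directory cache of {name: size} listings plus file locations."""

    def __init__(self):
        self._dirs = {}
        self._locations = {}

    def cache_listing(self, dirname, entries):
        self._dirs[dirname] = dict(entries)

    def register_file(self, filename, url, size):
        dirname, base = posixpath.split(filename)
        listing = self._dirs.get(dirname)
        # Only patch directories already cached; an uncached directory will
        # pick the new file up the next time it is listed.
        if listing is not None:
            listing[base] = size
        # Remember where the file was uploaded so later lookups can skip a search.
        self._locations.setdefault(filename, []).append(url)

    def listing(self, dirname):
        return self._dirs.get(dirname)

    def locations(self, filename):
        return self._locations.get(filename, [])
```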

PYME.IO.cluster_directory.get_ns()