PYME.IO.cluster_directory module
Work in progress … refactor some of the listing stuff out of clusterIO to:
- reduce the size of clusterIO
- reduce duplication between, e.g., locate and listdirectory
- ultimately allow pluggable directory management / caching, e.g. to enable a central directory server (unclear if this would solve current performance issues, but potentially worth a try).
- class PYME.IO.cluster_directory.DirectoryInfoManager(ns=None, serverfilter='')
Bases: object
- cglob(pattern)
Find files matching a given glob on the cluster. Analogous to the Python glob.glob function.
- Parameters
- pattern : string glob
- serverfilter : cluster name (optional)
- Returns
- a list of files matching the glob
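As a rough illustration of the glob semantics described above, the sketch below matches a pattern against a hypothetical in-memory list of cluster paths using the standard-library fnmatch module. The names CLUSTER_FILES and glob_like are invented for this example and are not part of PYME; a real query would presumably go through DirectoryInfoManager.cglob(pattern) against a running cluster.

```python
import fnmatch

# Hypothetical flat listing of cluster paths (illustration only, not PYME data).
CLUSTER_FILES = [
    "data/2023/run_001.h5",
    "data/2023/run_002.h5",
    "data/2023/notes.txt",
    "data/2024/run_003.h5",
]


def glob_like(pattern, paths):
    """Return the paths that match a glob pattern.

    Matching is done one path segment at a time so that '*' does not
    cross a '/' boundary, mirroring the behaviour of glob.glob.
    """
    pattern_parts = pattern.split("/")
    matches = []
    for path in paths:
        parts = path.split("/")
        if len(parts) != len(pattern_parts):
            continue
        if all(fnmatch.fnmatchcase(part, pat)
               for part, pat in zip(parts, pattern_parts)):
            matches.append(path)
    return matches


h5_2023 = glob_like("data/2023/*.h5", CLUSTER_FILES)
all_runs = glob_like("data/*/run_*.h5", CLUSTER_FILES)
```

Here h5_2023 holds the two .h5 files under data/2023, and all_runs holds all three run files across both years.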
- property dataservers
Find all the data servers belonging to the cluster, caching the results
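The caching behaviour of a property like this can be sketched with functools.cached_property. This is only one possible approach and not necessarily how PYME implements it (the real property may, for instance, expire its cache after some time). ServerDirectory and fake_discover are hypothetical names used purely for illustration.

```python
from functools import cached_property


class ServerDirectory:
    """Toy stand-in for a directory manager (hypothetical, not PYME code)."""

    def __init__(self, discover):
        # 'discover' is any callable returning an iterable of server addresses.
        self._discover = discover

    @cached_property
    def dataservers(self):
        # Runs the (potentially slow) discovery once; later accesses reuse
        # the stored list instead of querying again.
        return list(self._discover())


calls = []


def fake_discover():
    # Assumed discovery routine; records each invocation so we can count them.
    calls.append(1)
    return ["node-a:8080", "node-b:8080"]


directory = ServerDirectory(fake_discover)
first = directory.dataservers
second = directory.dataservers
```

After both accesses, fake_discover has been invoked only once and both names refer to the same cached list.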
- exists(name)
Test whether a file exists on the cluster. Analogous to os.path.exists for local files.
- Parameters
- name : string, file path
- serverfilter : name of the cluster (optional)
- Returns
- True if file exists, else False
- isdir(name)
Tests if a given path on the cluster is a directory. Analogous to os.path.isdir.
- list_single_node_dir(dirurl, nRetries=1, timeout=10, strict_caching=False)
List the directory on a single node
- Parameters
- dirurl
- nRetries
- timeout
- Returns
- listdir(dirname)
Lists the contents of a directory on the cluster. Similar to os.listdir, but directories are indicated by a trailing slash
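Because directories are marked with a trailing slash, a caller can separate directories from files without a further isdir query per entry. A minimal sketch, assuming the listing is a plain list of name strings; the sample entries and the split_listing helper are hypothetical.

```python
# Hypothetical result of a listdir-style call (illustration only).
listing = ["raw/", "analysis/", "series_001.h5", "readme.txt"]


def split_listing(entries):
    """Split entries into (directories, files) using the trailing-slash marker."""
    dirs = [entry.rstrip("/") for entry in entries if entry.endswith("/")]
    files = [entry for entry in entries if not entry.endswith("/")]
    return dirs, files


dirs, files = split_listing(listing)
```

With the sample listing, dirs contains raw and analysis (slashes stripped) and files contains the two remaining entries.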
- listdirectory(dirname, timeout=5)
Lists the contents of a directory on the cluster.
Returns a dictionary mapping filenames to clusterListing.FileInfo named tuples.
- locate_file(filename, return_first_hit=False)
Searches the cluster to find which server(s) a given file is stored on.
- Parameters
- filename : str
The file name
- return_first_hit : bool
Whether to return as soon as the first copy is found (True) rather than trying to find all locations (False)
- Returns
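The effect of return_first_hit can be sketched against a hypothetical in-memory map of servers to the files they hold. The return format is not documented above, so the list of URLs used here is an assumption, as are the server addresses and the locate helper itself.

```python
# Hypothetical mapping of server base URL -> files stored there (not PYME data).
SERVER_CONTENTS = {
    "http://node-a:8080/": {"data/run_001.h5", "data/run_002.h5"},
    "http://node-b:8080/": {"data/run_001.h5"},
    "http://node-c:8080/": {"data/run_003.h5"},
}


def locate(filename, servers, return_first_hit=False):
    """Return URLs of every server copy of filename.

    With return_first_hit=True the search stops at the first copy found,
    which saves querying the remaining servers.
    """
    hits = []
    for base_url, files in servers.items():
        if filename in files:
            hits.append(base_url + filename)
            if return_first_hit:
                break
    return hits


all_copies = locate("data/run_001.h5", SERVER_CONTENTS)
first_copy = locate("data/run_001.h5", SERVER_CONTENTS, return_first_hit=True)
missing = locate("data/absent.h5", SERVER_CONTENTS)
```

Stopping at the first hit is the cheaper option when the caller only needs somewhere to read the file from, while the full search is needed when every replica matters.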
- register_file(filename, url, size)
Call after uploading a new file so we can update our caches
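The idea of updating a cache after an upload can be illustrated with a toy in-memory directory. TinyDirectoryCache is a hypothetical class written for this example; it only mirrors the register_file(filename, url, size) signature and says nothing about how PYME stores its caches internally.

```python
class TinyDirectoryCache:
    """Toy in-memory directory cache (hypothetical, not PYME code)."""

    def __init__(self):
        # Maps filename -> list of (url, size) tuples, one per known copy.
        self._locations = {}

    def register_file(self, filename, url, size):
        # Record the new copy so later lookups need no network round trip.
        self._locations.setdefault(filename, []).append((url, size))

    def exists(self, name):
        return name in self._locations

    def locate_file(self, name):
        # Return a copy so callers cannot mutate the cache by accident.
        return list(self._locations.get(name, []))


cache = TinyDirectoryCache()
before = cache.exists("data/run_004.h5")
cache.register_file("data/run_004.h5", "http://node-a:8080/data/run_004.h5", 1024)
after = cache.exists("data/run_004.h5")
copies = cache.locate_file("data/run_004.h5")
```

Registering immediately after the upload means the file is visible to exists and locate_file without waiting for any cached listing to be refreshed.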
- PYME.IO.cluster_directory.get_ns()