cmeta.utils.files module

Reusable functions for safe loading, storing and caching of files

cMeta author and developer: (C) 2025-2026 Grigori Fursin

See the cMeta COPYRIGHT and LICENSE files in the project root for details.

Functions

cmeta.utils.files.apply_sharding_to_path(path: str, name: str, slices: list)[source]

Apply sharding to construct a full sharded directory path.

Combines a base path with sharded directory components generated from a name.

Parameters:
  • path – Base directory path to prepend to sharded path.

  • name – Name to shard.

  • slices – List of integers specifying shard lengths (e.g., [2, 2]).

Returns:

Dictionary with ‘return’: 0, ‘sharded_parts’: list of path components,

and ‘sharded_path’: full sharded path string. On error, ‘return’ > 0.

Return type:

dict

Raises:

Exception – Propagated runtime errors, if any.
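The return convention described above (a dictionary whose 'return' key is 0 on success) can be illustrated with a small stand-in. This is only a sketch: `apply_sharding_to_path_sketch` is a hypothetical name, and the assumption that the original name is the last path component is taken from the `shard_name` description further down, not from the cMeta source.

```python
import os


def apply_sharding_to_path_sketch(path: str, name: str, slices: list) -> dict:
    """Hypothetical stand-in that mirrors the documented return shape."""
    # Cut consecutive pieces of `name`, one per slice length.
    parts, pos = [], 0
    for length in slices:
        parts.append(name[pos:pos + length])
        pos += length
    # Assumption: the original name is kept as the final path component.
    parts.append(name)
    return {
        "return": 0,
        "sharded_parts": parts,
        "sharded_path": os.path.join(path, *parts),
    }


result = apply_sharding_to_path_sketch("repo", "example", [2, 2])
if result["return"] == 0:
    # Callers check 'return' before using the other keys.
    print(result["sharded_path"])
```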

cmeta.utils.files.ask_to_delete(con, force, path, name=None, text=None, space=False)[source]

Prompt for deletion confirmation in console mode.

Parameters:
  • con – If True, print output to console.

  • force – If True, force operation.

  • path – Filesystem path.

  • name – Object or artifact name.

  • text – Value for text.

  • space – Indentation prefix for console output.

Returns:

Operation result.

Return type:

dict

Raises:

Exception – Propagated runtime errors, if any.

cmeta.utils.files.cpath(path)[source]

Normalize and quote a path string for shell usage.

Parameters:

path – Filesystem path.

Returns:

Operation result.

Return type:

dict

Raises:

Exception – Propagated runtime errors, if any.

cmeta.utils.files.diff_env(old: dict[str, str], new: dict[str, str])[source]

Diff two environment dicts and return added/removed variables.

Parameters:
  • old (dict[str, str]) – Old environment dictionary.

  • new (dict[str, str]) – New environment dictionary.

Returns:

Dictionary with ‘env_added’ and ‘env_removed’ keys containing

usable environment dictionaries with fuzzy PATH expansion.

Return type:

dict

Raises:

Exception – Propagated runtime errors, if any.
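As a rough illustration of the added/removed split, the sketch below compares two environment dictionaries. `diff_env_sketch` is a hypothetical name; it treats changed values as "added" (an assumption) and does not attempt the fuzzy PATH expansion mentioned above.

```python
def diff_env_sketch(old: dict, new: dict) -> dict:
    """Hypothetical plain diff of two environment dictionaries."""
    # Variables that are new, or whose value differs from the old one.
    added = {k: v for k, v in new.items() if old.get(k) != v}
    # Variables that existed before but are missing now.
    removed = {k: v for k, v in old.items() if k not in new}
    return {"env_added": added, "env_removed": removed}


diff = diff_env_sketch({"A": "1", "B": "2"}, {"A": "1", "C": "3"})
print(diff["env_added"], diff["env_removed"])
```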

cmeta.utils.files.files_decode(files_base64)[source]

Decode base64-encoded file contents. Takes a dict of the form {filename: base64_string} and returns a dict of the form {filename: binary_bytes}.

Parameters:

files_base64 – Dictionary mapping file names to base64-encoded strings.

Returns:

Operation result.

Return type:

dict

Raises:

Exception – Propagated runtime errors, if any.

cmeta.utils.files.files_encode(files)[source]

Encode files as base64 strings. Takes a list of file paths and returns a dict of the form {filename: base64_string}.

Parameters:

files – List of paths of the files to encode.

Returns:

Operation result.

Return type:

dict

Raises:

Exception – Propagated runtime errors, if any.
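The encode/decode pair above amounts to a base64 round trip. A minimal sketch with the standard library follows; the function names are hypothetical, and using the file's basename as the dictionary key is an assumption, not something the documentation states.

```python
import base64
import os


def files_encode_sketch(files):
    """Hypothetical: read each file and map its name to a base64 string."""
    encoded = {}
    for filepath in files:
        with open(filepath, "rb") as f:
            # Assumption: the key is the basename of the path.
            encoded[os.path.basename(filepath)] = base64.b64encode(f.read()).decode("ascii")
    return encoded


def files_decode_sketch(files_base64):
    """Hypothetical inverse: map each name back to the raw bytes."""
    return {name: base64.b64decode(text) for name, text in files_base64.items()}
```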

cmeta.utils.files.gen_temp_filepath(template=None)[source]

Generate a temporary file path using an optional template.

Parameters:

template – Optional template for the temporary file path.

Returns:

Operation result.

Return type:

dict

Raises:

Exception – Propagated runtime errors, if any.

cmeta.utils.files.get_creation_time(path)[source]

Return creation (or closest available) timestamp for a path.

Parameters:

path – Filesystem path.

Returns:

Operation result.

Return type:

dict

Raises:

Exception – Propagated runtime errors, if any.

cmeta.utils.files.get_latest_modification_time(path)[source]

Return last modification timestamp for a path as a datetime object.

Parameters:

path – Filesystem path.

Returns:

Operation result.

Return type:

dict

Raises:

Exception – Propagated runtime errors, if any.

cmeta.utils.files.get_latest_tree_modification_time(path)[source]

Return the maximum modification time (mtime) of the directory or any file/directory inside it (recursively). Works on Linux, macOS, and Windows.

Parameters:

path – Filesystem path.

Returns:

Operation result.

Return type:

dict

Raises:

Exception – Propagated runtime errors, if any.
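One portable way to compute such a tree-wide maximum is to walk the directory with `os.walk`. The sketch below is an assumption about the approach, not the module's code; `latest_tree_mtime_sketch` is a hypothetical name and it returns a raw timestamp rather than the documented dict.

```python
import os


def latest_tree_mtime_sketch(path: str) -> float:
    """Hypothetical: newest mtime of `path` or anything beneath it."""
    latest = os.path.getmtime(path)
    for root, dirs, files in os.walk(path):
        for entry in dirs + files:
            try:
                latest = max(latest, os.path.getmtime(os.path.join(root, entry)))
            except OSError:
                # Entry vanished or is unreadable; skip it.
                continue
    return latest
```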

cmeta.utils.files.is_dir_empty(path, clean=False)[source]

Check if a directory is empty and optionally remove it.

Parameters:
  • path – Filesystem path.

  • clean – If True, remove the directory when it is empty.

Returns:

Operation result.

Return type:

dict

Raises:

Exception – Propagated runtime errors, if any.

cmeta.utils.files.is_dir_within_path(base: str, directory: str)[source]

Check whether a child directory name resolves under a base path.

Parameters:
  • base – Base path to check against.

  • directory – Child directory name to resolve under the base path.

Returns:

Operation result.

Return type:

dict

Raises:

Exception – Propagated runtime errors, if any.

cmeta.utils.files.is_path_within(base: str, target: str)[source]

Check if base path is within target path.

Determines if the base path is a subdirectory or file within the target path.

Parameters:
  • base (str) – Base path to check.

  • target (str) – Target path to check against.

Returns:

True if base is within target, False otherwise.

Return type:

bool

Raises:

Exception – Propagated runtime errors, if any.
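A containment check of this kind is often written with `os.path.commonpath`. The sketch below follows the parameter roles as documented (base is tested for being inside target); the name `is_path_within_sketch` is hypothetical, and treating a path as being within itself is an assumption.

```python
import os


def is_path_within_sketch(base: str, target: str) -> bool:
    """Hypothetical: True if `base` is `target` or lies beneath it."""
    base_abs = os.path.abspath(base)
    target_abs = os.path.abspath(target)
    try:
        # commonpath compares whole components, so "/a/bc" is not inside "/a/b".
        return os.path.commonpath([base_abs, target_abs]) == target_abs
    except ValueError:
        # Raised, for example, when the paths are on different Windows drives.
        return False
```

Comparing whole path components avoids the false positives of a plain string-prefix test.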

cmeta.utils.files.load_files(path, load_files, fail_on_error=False, logger=None, timeout=1)[source]

Load selected files from a directory using safe readers.

Parameters:
  • path – Filesystem path.

  • load_files – Files to load from the directory.

  • fail_on_error – If True, raises exception on error instead of returning error dict.

  • logger – Optional logger for debug messages.

  • timeout – Timeout value in seconds.

Returns:

Operation result.

Return type:

dict

Raises:

Exception – Propagated runtime errors, if any.

cmeta.utils.files.lock_path(path: str, timeout: int = 3, fail_on_error: bool = False, logger=None)[source]

Acquire a lock on a file or directory path.

Parameters:
  • path (str) – Path to lock (file or directory).

  • timeout (int) – Seconds to wait for lock acquisition. Default is 3.

  • fail_on_error (bool) – If True, raises exception on error instead of returning error dict.

  • logger – Optional logger for debug messages.

Returns:

Dictionary with ‘return’: 0 and ‘file_lock’ on success,

or ‘return’ > 0 and ‘error’ on failure.

Return type:

dict

Raises:

Exception – Propagated runtime errors, if any.
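To make the timeout semantics concrete, here is a sketch of a lock-file scheme built on exclusive file creation. It is not the module's implementation (which returns a FileLock object); the function names, the `poll` parameter, and the convention of appending the lock suffix to the path are all assumptions for illustration.

```python
import os
import time

LOCK_SUFFIX = ".lock"  # same value as the module constant of that name


def lock_path_sketch(path, timeout=3, poll=0.1):
    """Hypothetical: create path + LOCK_SUFFIX exclusively, waiting up to `timeout` seconds."""
    lock_file = path + LOCK_SUFFIX
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if another holder already created the file.
            fd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return {"return": 0, "file_lock": lock_file}
        except FileExistsError:
            if time.monotonic() >= deadline:
                return {"return": 1, "error": "timeout acquiring lock: " + lock_file}
            time.sleep(poll)


def unlock_path_sketch(file_lock):
    """Hypothetical: release the lock by removing the lock file."""
    os.remove(file_lock)
```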

cmeta.utils.files.md5sum(path, chunk_size=100000)[source]

Calculate the MD5 checksum of a file.

Parameters:
  • path – Filesystem path.

  • chunk_size – Number of bytes to read per iteration when hashing file content.

Returns:

  • return (int): return code == 0 if no error and >0 if error

  • (error) (str): error string if return>0

  • md5sum (str): md5sum of the given file

Return type:

(CM return dict)

Raises:

Exception – Propagated runtime errors, if any.
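Reading in chunks keeps memory use constant for large files. A minimal sketch of chunked hashing with `hashlib`, returning a dictionary shaped like the one described above, could look as follows; `md5sum_sketch` is a hypothetical name and the error code 1 is an arbitrary placeholder.

```python
import hashlib


def md5sum_sketch(path, chunk_size=100000):
    """Hypothetical: hash a file in chunks of `chunk_size` bytes."""
    try:
        digest = hashlib.md5()
        with open(path, "rb") as f:
            # iter() with a sentinel stops when read() returns empty bytes.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
    except OSError as e:
        return {"return": 1, "error": str(e)}
    return {"return": 0, "md5sum": digest.hexdigest()}
```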

cmeta.utils.files.parse_env_dump(data: str) dict[str, str][source]

Parse KEY=VALUE lines into an environment dictionary.

Parameters:

data – Text containing KEY=VALUE lines.

Returns:

Dictionary mapping variable names to their values.

Return type:

dict[str, str]

Raises:

Exception – Propagated runtime errors, if any.
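Parsing KEY=VALUE lines can be sketched with `str.partition`, which splits on the first "=" only, so values may themselves contain "=". The name `parse_env_dump_sketch` is hypothetical, and silently skipping lines without "=" is an assumption about the desired behavior.

```python
def parse_env_dump_sketch(data: str) -> dict:
    """Hypothetical: turn KEY=VALUE lines into a dictionary."""
    env = {}
    for line in data.splitlines():
        key, sep, value = line.partition("=")
        # Keep only lines that contain "=" and have a non-empty key.
        if sep and key:
            env[key] = value
    return env


env = parse_env_dump_sketch("A=1\nPATH=/usr/bin:/bin\nEMPTY=\nnoequals\nX=a=b")
print(env)
```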

cmeta.utils.files.quote_path(path)[source]

Add double quotes around a path when it contains spaces.

Parameters:

path – Filesystem path.

Returns:

Operation result.

Return type:

dict

Raises:

Exception – Propagated runtime errors, if any.

cmeta.utils.files.read_file(filepath: str, fail_on_error: bool = False, logger=None, encoding: str = None)[source]

Read file without locking (convenience wrapper for safe_read_file).

Parameters:
  • filepath (str) – Path to the file to read.

  • fail_on_error (bool) – If True, raises exception on error instead of returning error dict.

  • logger – Optional logger for debug messages.

  • encoding (str | None) – Character encoding for text files.

Returns:

Dictionary with ‘return’: 0 and ‘data’, or ‘return’ > 0 and ‘error’.

Return type:

dict

Raises:

Exception – Propagated runtime errors, if any.

cmeta.utils.files.remove_files_and_dirs_in_path(path, pattern='*', ignore=None)[source]

Recursively remove files and directories in path matching pattern, while ignoring any names matching items in ignore.

Parameters:
  • path (str or Path) – Root directory to operate on.

  • pattern (str) – Wildcard pattern (fnmatch-style) to match files/directories.

  • ignore (list[str], optional) – List of wildcard patterns for files/directories to ignore. Matching is done against the basename.

Example:

remove_files_and_dirs_in_path(path=".", pattern="*.log", ignore=["keep.log", "important_*"])

Returns:

Operation result.

Return type:

dict

Raises:

Exception – Propagated runtime errors, if any.
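The interplay of the match pattern and the ignore list can be sketched with `fnmatch` against each basename. This is an illustrative stand-in, not the module's code: `remove_matching_sketch` is a hypothetical name and it returns the list of removed paths instead of the documented dict.

```python
import fnmatch
import os
import shutil


def remove_matching_sketch(path, pattern="*", ignore=None):
    """Hypothetical: delete entries whose basename matches `pattern` but no `ignore` pattern."""
    ignore = ignore or []
    removed = []
    # Walk bottom-up so children are handled before their parent directories.
    for root, dirs, files in os.walk(path, topdown=False):
        for name in files + dirs:
            if not fnmatch.fnmatch(name, pattern):
                continue
            if any(fnmatch.fnmatch(name, pat) for pat in ignore):
                continue
            full = os.path.join(root, name)
            if os.path.isdir(full) and not os.path.islink(full):
                shutil.rmtree(full)
            else:
                os.remove(full)
            removed.append(full)
    return removed
```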

cmeta.utils.files.safe_delete_directory(dirpath: str, timeout: int = 3, fail_on_error: bool = False, logger=None)[source]

Safely and recursively deletes a directory with all its contents. Works cross-platform (Windows, Linux, macOS) and handles special cases like .git directories with read-only attributes.

If lock acquisition fails but directory doesn’t exist, returns success.

Parameters:
  • dirpath (str) – Full path to directory to delete.

  • timeout (int) – Lock timeout in seconds.

  • fail_on_error (bool) – Whether to raise exceptions or return error dict.

  • logger – Logger instance for debug messages.

Returns:

Dict with ‘return’ (0=success, non-zero=error) and optional ‘error’.

Raises:

Exception – Propagated runtime errors, if any.

cmeta.utils.files.safe_delete_directory_if_empty(dirpath: str)[source]

Delete directory only if it’s empty (no files or subdirectories).

Quickly checks if directory is empty and removes it. Ignores all errors (permissions, race conditions, etc.) for safe cleanup operations.

Parameters:

dirpath (str) – Path to the directory to potentially delete.

Returns:

Operation result.

Return type:

dict

Raises:

Exception – Propagated runtime errors, if any.

cmeta.utils.files.safe_delete_directory_if_empty_with_sharding(artifact_path: str, sharding_slices: list = None)[source]

Safely delete empty directories up the hierarchy based on sharding configuration.

Parameters:
  • artifact_path (str) – Path to the artifact directory.

  • sharding_slices (list | None) – Sharding configuration from category meta.

Returns:

A cMeta dictionary with the following keys
  • return (int): 0 if success, >0 if error.

  • error (str): Error message if return > 0.

Return type:

dict

Raises:

Exception – Propagated runtime errors, if any.
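One way to picture the cleanup is to remove the artifact directory and then each shard directory above it, stopping at the first one that is not empty. The sketch below assumes one parent level per entry in the sharding list; the function name is hypothetical and the real traversal may differ.

```python
import os


def prune_empty_sharded_dirs_sketch(artifact_path, sharding_slices=None):
    """Hypothetical: remove empty directories from the artifact upwards."""
    # Assumption: the artifact directory plus one parent per shard level.
    levels = 1 + len(sharding_slices or [])
    current = artifact_path
    for _ in range(levels):
        try:
            os.rmdir(current)  # raises OSError if the directory is not empty
        except OSError:
            break
        current = os.path.dirname(current)
    return {"return": 0}
```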

cmeta.utils.files.safe_read_file(filepath: str, encoding: str = None, lock: bool = False, keep_locked: bool = False, timeout: int = 3, retry_if_not_found: int = 0, fail_on_error: bool = False, logger=None, get_last_modified: bool = False)[source]

Safely read file with optional locking and retry logic.

Provides thread/process-safe file reading with file locking support. Cleans up lock on error. If keep_locked=True and lock=True, maintains the lock after successful read (caller must release).

WARNING: This function uses blocking I/O operations. Not suitable for async contexts - use aiofiles and async locking instead.

Parameters:
  • filepath – Path to the file to read.

  • encoding – Character encoding for text files. If None, auto-detected.

  • lock – If True, acquires file lock before reading.

  • keep_locked – If True with lock=True, keeps lock after read (returns in result).

  • timeout – Seconds to wait for lock acquisition. Default is 3.

  • retry_if_not_found – Number of retry attempts if file not found.

  • fail_on_error – If True, raises exception on error instead of returning error dict.

  • logger – Optional logger for debug messages.

  • get_last_modified (bool) – If True, include file modification timestamp in the result.

Returns:

Dictionary with ‘return’: 0, ‘data’, ‘filepath’, and optionally ‘last_modified’

and ‘file_lock’ (if keep_locked=True). Returns ‘return’ > 0 and ‘error’ on failure.

Return type:

dict

Raises:

Exception – Propagated runtime errors, if any.

cmeta.utils.files.safe_read_file_via_cache(filepath: str, cache: dict, timeout: int = 10, fail_on_error: bool = False, logger=None)[source]

Reads a file with caching based on file modification timestamp. Automatically reloads if file has been modified since last cache.

WARNING: This function is NOT thread-safe for async usage. The cache dictionary can be corrupted by concurrent access, and it uses blocking I/O operations.

Parameters:
  • filepath (str) – Path to the file to read.

  • cache (dict) – Dictionary to store cached data (modified in-place).

  • timeout (int) – Lock timeout for file operations.

  • fail_on_error (bool) – Whether to raise exceptions or return error dict.

  • logger – Optional logger for debug messages.

Returns:

Dict with ‘return’ (0=success, non-zero=error) and ‘data’ or ‘error’.

Raises:

Exception – Propagated runtime errors, if any.
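The caching idea (reload only when the modification timestamp changes) can be sketched as follows. The cache layout, keyed by file path with 'mtime' and 'data' entries, is an assumption made for this example, as is the name `read_via_cache_sketch`; locking is left out.

```python
import os


def read_via_cache_sketch(filepath: str, cache: dict) -> dict:
    """Hypothetical: return cached text unless the file's mtime has changed."""
    try:
        mtime = os.path.getmtime(filepath)
        entry = cache.get(filepath)
        if entry is None or entry["mtime"] != mtime:
            # Cache miss or stale entry: reload and update the cache in place.
            with open(filepath, "r", encoding="utf-8") as f:
                entry = {"mtime": mtime, "data": f.read()}
            cache[filepath] = entry
    except OSError as e:
        return {"return": 1, "error": str(e)}
    return {"return": 0, "data": entry["data"]}
```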

cmeta.utils.files.safe_read_yaml_or_json(filepath: str, lock: bool = False, keep_locked: bool = False, timeout: int = 3, fail_on_error: bool = False, retry_if_not_found: int = 0, logger=None)[source]

Safely reads a YAML or JSON file by trying YAML first, then JSON. Removes any existing extension from filepath and tries .yaml, then .json.

Parameters:
  • filepath (str) – Path to file (extension will be ignored/removed).

  • lock (bool) – Whether to use file locking.

  • keep_locked (bool) – Whether to keep lock after successful read.

  • timeout (int) – Lock timeout.

  • fail_on_error (bool) – Whether to raise exceptions or return error dict.

  • retry_if_not_found (int) – Number of retries if file not found.

  • logger – Logger instance for debug messages.

Returns:

Dict with ‘return’ (0=success, non-zero=error) and ‘data’ or ‘error’. If keep_locked=True and lock=True, also returns ‘file_lock’.

Raises:

Exception – Propagated runtime errors, if any.
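The extension handling described above (drop whatever extension was given, then try .yaml before .json) can be sketched without any parsing. Both helper names are hypothetical.

```python
import os


def candidate_paths_sketch(filepath: str) -> list:
    """Hypothetical: candidate files in the order they would be tried."""
    base, _ext = os.path.splitext(filepath)
    return [base + ".yaml", base + ".json"]


def first_existing_sketch(filepath: str):
    """Hypothetical: first candidate that exists on disk, or None."""
    for candidate in candidate_paths_sketch(filepath):
        if os.path.isfile(candidate):
            return candidate
    return None


print(candidate_paths_sketch("meta/_cmeta.json"))
```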

cmeta.utils.files.safe_write_file(filepath: str, data, timeout: int = 3, file_lock=None, atomic: bool = False, encoding: str = None, fail_on_error: bool = False, logger=None, sort_keys: bool = True)[source]

Safely write data to file with locking and optional atomic write.

Provides thread/process-safe file writing with file locking support. Supports atomic writes via temp file + rename for data integrity.

WARNING: This function uses blocking I/O operations. Not suitable for async contexts - use aiofiles and async locking instead.

Parameters:
  • filepath (str) – Path where file should be written.

  • data – Data to write (dict/list for JSON/YAML, any object for pickle/text).

  • timeout (int) – Seconds to wait for lock acquisition. Default is 3.

  • file_lock – Existing lock to use. If None, acquires new lock.

  • atomic (bool) – If True, writes to temp file then renames for atomicity.

  • encoding (str | None) – Character encoding for text files. If None, auto-detected.

  • fail_on_error (bool) – If True, raises exception on error instead of returning error dict.

  • logger – Optional logger for debug messages.

  • sort_keys (bool) – If True, sorts dictionary keys in JSON/YAML output.

Returns:

Dictionary with ‘return’: 0 on success, or ‘return’ > 0 and ‘error’ on failure.

Return type:

dict

Raises:

Exception – Propagated runtime errors, if any.
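The "temp file + rename" technique behind the atomic option can be sketched as below for the JSON case. This is an illustration under assumptions: `atomic_write_json_sketch` is a hypothetical name, the temp file is created in the destination directory so that the rename stays on one filesystem, and locking is omitted.

```python
import json
import os
import tempfile


def atomic_write_json_sketch(filepath: str, data, sort_keys: bool = True) -> dict:
    """Hypothetical: write JSON to a temp file, then rename it over the target."""
    directory = os.path.dirname(os.path.abspath(filepath))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, sort_keys=sort_keys, indent=2)
        # os.replace swaps the file in one step, so readers never see a partial file.
        os.replace(tmp_path, filepath)
    except (OSError, TypeError, ValueError) as e:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        return {"return": 1, "error": str(e)}
    return {"return": 0}
```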

cmeta.utils.files.shard_name(name: str, slices=None)[source]

Apply sharding to a single path component.

Generates shard directory names from a name string based on specified slice lengths. If the name is shorter than required, uses underscore-filled placeholders to ensure predictable directory structure.

Parameters:
  • name – Name to shard (file or directory name).

  • slices – List of integers specifying shard lengths (e.g., [2, 2] creates 2-char shards). None means no sharding.

Returns:

List containing shard directory names followed by the original name.

Example: shard_name(‘example’, [2, 2]) -> [‘ex’, ‘am’, ‘example’]

Return type:

list

Raises:

Exception – Propagated runtime errors, if any.
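The slicing rule can be sketched in a few lines. The exact placeholder format for short names is not specified above, so padding each shard with underscores up to its slice length is an assumption; `shard_name_sketch` is a hypothetical name.

```python
def shard_name_sketch(name: str, slices=None) -> list:
    """Hypothetical: shard directory names followed by the original name."""
    if not slices:
        return [name]
    parts, pos = [], 0
    for length in slices:
        piece = name[pos:pos + length]
        # Assumption: short or missing pieces are padded with underscores.
        parts.append(piece.ljust(length, "_"))
        pos += length
    parts.append(name)
    return parts


print(shard_name_sketch("example", [2, 2]))
```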

cmeta.utils.files.unlock_path(path: str, file_lock, fail_on_error: bool = False, logger=None)[source]

Release a lock on a file or directory path.

Parameters:
  • path (str) – Path to unlock (file or directory).

  • file_lock – FileLock object from lock_path().

  • fail_on_error (bool) – If True, raises exception on error instead of returning error dict.

  • logger – Optional logger for debug messages.

Returns:

Error dict with ‘return’ > 0 and ‘error’ on failure, None on success.

Return type:

dict | None

Raises:

Exception – Propagated runtime errors, if any.

cmeta.utils.files.unzip(filename: str, path: str = None, remove_directories: int = 0, skip_directories: list = None, overwrite: bool = True, clean: bool = False, fail_on_error: bool = False)[source]

Extract a ZIP archive to a directory.

Parameters:
  • filename (str) – Path to ZIP file to extract.

  • path (str | None) – Destination directory (defaults to current directory).

  • remove_directories (int) – Number of leading directory levels to strip from paths.

  • skip_directories (list | None) – List of directory names to skip during extraction.

  • overwrite (bool) – If True, overwrite existing files.

  • clean (bool) – If True, delete ZIP file after successful extraction.

  • fail_on_error (bool) – If True, raises exception on error instead of returning error dict.

Returns:

Dictionary with ‘return’: 0 on success, or ‘return’ > 0 and ‘error’ on failure.

Return type:

dict

Raises:

Exception – Propagated runtime errors, if any.
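Stripping leading directory levels during extraction can be sketched with the standard `zipfile` module. This covers only the `remove_directories` idea; `unzip_strip_sketch` is a hypothetical name, and the check that skips entries resolving outside the destination is an added safety assumption.

```python
import os
import zipfile


def unzip_strip_sketch(filename, path=".", remove_directories=0):
    """Hypothetical: extract a ZIP, dropping the first N directory levels."""
    root = os.path.abspath(path)
    with zipfile.ZipFile(filename) as archive:
        for info in archive.infolist():
            # Archive member names always use "/" as the separator.
            parts = info.filename.split("/")[remove_directories:]
            if info.is_dir() or not parts or not parts[-1]:
                continue
            target = os.path.abspath(os.path.join(root, *parts))
            if os.path.commonpath([root, target]) != root:
                continue  # skip entries that would escape the destination
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with archive.open(info) as src, open(target, "wb") as dst:
                dst.write(src.read())
    return {"return": 0}
```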

cmeta.utils.files.write_file(filepath: str, data, encoding: str = None, fail_on_error: bool = False, logger=None, sort_keys: bool = True, file_format: str = None, newline: str = '\n')[source]

Write data to file with format-specific serialization.

Automatically serializes data based on file format (JSON, YAML, pickle, or text).

Parameters:
  • filepath (str) – Path where file should be written.

  • data – Data to write (dict/list for JSON/YAML, any object for pickle/text).

  • encoding (str | None) – Character encoding for text files. If None, auto-detected.

  • fail_on_error (bool) – If True, raises exception on error instead of returning error dict.

  • logger – Optional logger for debug messages.

  • sort_keys (bool) – If True, sorts dictionary keys in JSON/YAML output.

  • file_format (str | None) – Force specific format (‘json’, ‘yaml’, ‘pickle’, ‘text’). If None, auto-detected.

  • newline (str) – Newline character for text files. Default is ‘\n’.

Returns:

Dictionary with ‘return’: 0 on success, or ‘return’ > 0 and ‘error’ on failure.

Return type:

dict

Raises:

Exception – Propagated runtime errors, if any.

cmeta.utils.files.zip_directory(source_dir: str, output_path: str, skip_directories: list = None, fail_on_error: bool = True, logger=None, skip_files: list = None)[source]

Creates a zip archive from a directory.

Parameters:
  • source_dir (str) – Path to the directory to zip.

  • output_path (str) – Path where the zip file will be created.

  • skip_directories (list | None) – List of directory names to skip (e.g., [‘.git’, ‘__pycache__’]).

  • fail_on_error (bool) – Whether to raise exceptions or return error dict.

  • logger – Logger instance for debug messages.

  • skip_files (list) – List of file names or glob patterns to exclude from the archive.

Returns:

Dict with ‘return’ (0=success, non-zero=error) and optional ‘error’.

Raises:

Exception – Propagated runtime errors, if any.
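The effect of the two skip lists can be sketched with `os.walk` and `zipfile`. This is a hypothetical stand-in (`zip_directory_sketch`), not the module's code; it assumes directory names are matched exactly, file names are matched with fnmatch-style patterns, and the output archive lives outside the source directory.

```python
import fnmatch
import os
import zipfile


def zip_directory_sketch(source_dir, output_path, skip_directories=None, skip_files=None):
    """Hypothetical: zip a directory tree while skipping some directories and files."""
    skip_directories = set(skip_directories or [])
    skip_files = skip_files or []
    with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, dirs, files in os.walk(source_dir):
            # Prune in place so os.walk does not descend into skipped directories.
            dirs[:] = [d for d in dirs if d not in skip_directories]
            for name in files:
                if any(fnmatch.fnmatch(name, pat) for pat in skip_files):
                    continue
                full = os.path.join(root, name)
                # Store paths relative to the source directory.
                archive.write(full, os.path.relpath(full, source_dir))
    return {"return": 0}
```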

Constants

cmeta.utils.files.ERROR_CODE_FILE_NOT_FOUND = 16

cmeta.utils.files.LOCK_SUFFIX = '.lock'

cmeta.utils.files.RETRY_DELAY = 0.1

cmeta.utils.files.RETRY_DELETE_ATTEMPTS = 5

cmeta.utils.files.RETRY_NOT_FOUND_FILE = 10

cmeta.utils.files.RETRY_NOT_FOUND_INDEX_FILE = 2

cmeta.utils.files.RETRY_REPLACE_FILE = 10

cmeta.utils.files.RETRY_TIMESTAMP_FILE = 10
