cmeta.utils.files module

Reusable functions for safe loading, storing and caching of files

cMeta author and developer: (C) 2025-2026 Grigori Fursin

See the cMeta COPYRIGHT and LICENSE files in the project root for details.

Functions

cmeta.utils.files.apply_sharding_to_path(path: str, name: str, slices: list)[source]

Apply sharding to construct a full sharded directory path.

Combines a base path with sharded directory components generated from a name.

Parameters:
  • path – Base directory path to prepend to sharded path.

  • name – Name to shard.

  • slices – List of integers specifying shard lengths (e.g., [2, 2]).

Returns:

Dictionary with ‘return’: 0, ‘sharded_parts’: list of path components, and ‘sharded_path’: full sharded path string. On error, ‘return’ > 0.

Return type:

dict
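The sketch below illustrates how a base path and shard components could be combined into the documented result dictionary. It is a minimal stand-in, not the cMeta source: the name `apply_sharding_sketch`, the validation rule, and the error message are assumptions, and short names are not padded here.

```python
import os


def apply_sharding_sketch(path, name, slices):
    """Hypothetical illustration of the documented return structure."""
    # Assumption: every slice length must be a positive integer.
    if not all(isinstance(n, int) and n > 0 for n in slices):
        return {"return": 1, "error": "slices must be positive integers"}

    parts, pos = [], 0
    for length in slices:
        parts.append(name[pos:pos + length])  # next shard directory name
        pos += length
    parts.append(name)  # the original name is the final component

    return {
        "return": 0,
        "sharded_parts": parts,
        "sharded_path": os.path.join(path, *parts),
    }
```

With `path="repo"`, `name="example"` and `slices=[2, 2]`, the sketch joins `repo`, `ex`, `am` and `example` using the platform path separator.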

cmeta.utils.files.files_decode(files_base64)[source]

Decode base64-encoded file contents back to bytes.

Parameters:

files_base64 (dict) – Mapping {filename: base64_string}.

Returns:

Mapping {filename: binary_bytes}.

Return type:

dict

cmeta.utils.files.files_encode(files)[source]

Encode file contents as base64 strings.

Parameters:

files (list) – List of file paths.

Returns:

Mapping {filename: base64_string}.

Return type:

dict
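As an illustration of the encode/decode round trip, the following sketch uses only the standard library. The function names are hypothetical, and using the file's base name as the dictionary key is an assumption; the real functions may key entries differently.

```python
import base64
import os


def encode_files_sketch(paths):
    """Read each file and map its base name to a base64 string."""
    encoded = {}
    for p in paths:
        with open(p, "rb") as f:
            # Assumption: the key is the base name, not the full path.
            encoded[os.path.basename(p)] = base64.b64encode(f.read()).decode("ascii")
    return encoded


def decode_files_sketch(files_base64):
    """Map each filename to the decoded binary content."""
    return {name: base64.b64decode(text) for name, text in files_base64.items()}
```

Decoding the output of the encoder yields the original bytes, which makes the pair suitable for embedding small files in JSON payloads.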

cmeta.utils.files.get_creation_time(path)[source]
cmeta.utils.files.get_latest_modification_time(path)[source]
cmeta.utils.files.get_latest_tree_modification_time(path)[source]

Return the maximum modification time (mtime) of the directory or any file/directory inside it (recursively). Works on Linux, macOS, and Windows.
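A recursive maximum over modification times can be sketched with `os.walk`. This is an illustrative equivalent under the assumption that unreadable or vanished entries are skipped; the name `latest_tree_mtime_sketch` is hypothetical.

```python
import os


def latest_tree_mtime_sketch(path):
    """Return the newest mtime of `path` or anything below it."""
    latest = os.path.getmtime(path)
    for root, dirs, files in os.walk(path):
        for name in dirs + files:
            try:
                latest = max(latest, os.path.getmtime(os.path.join(root, name)))
            except OSError:
                # Assumption: entries that disappear mid-walk are ignored.
                continue
    return latest
```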

cmeta.utils.files.is_path_within(base: str, target: str)[source]

Check if base path is within target path.

Determines if the base path is a subdirectory or file within the target path.

Parameters:
  • base (str) – Base path to check.

  • target (str) – Target path to check against.

Returns:

True if base is within target, False otherwise.

Return type:

bool
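One common way to implement such a containment check is `os.path.commonpath`, shown below as a sketch. Following the documented semantics, it returns True when `base` lies inside `target`. Treating identical paths as "within" and returning False for paths on different drives are assumptions, as is the name `is_within_sketch`.

```python
import os


def is_within_sketch(base, target):
    """True if `base` is `target` itself or lies somewhere below it."""
    base_abs = os.path.normcase(os.path.abspath(base))
    target_abs = os.path.normcase(os.path.abspath(target))
    try:
        return os.path.commonpath([base_abs, target_abs]) == target_abs
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False
```

Comparing whole path components avoids the classic prefix bug in which `/a/bc` appears to be inside `/a/b`.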

cmeta.utils.files.lock_path(path: str, timeout: int = 3, fail_on_error: bool = False, logger=None)[source]

Acquire a lock on a file or directory path.

Parameters:
  • path (str) – Path to lock (file or directory).

  • timeout (int) – Seconds to wait for lock acquisition. Default is 3.

  • fail_on_error (bool) – If True, raises exception on error instead of returning error dict.

  • logger – Optional logger for debug messages.

Returns:

Dictionary with ‘return’: 0 and ‘file_lock’ on success, or ‘return’ > 0 and ‘error’ on failure.

Return type:

dict
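The following sketch shows the general idea of acquiring a lock with a timeout and returning a result dictionary. It swaps in a simpler technique than a FileLock object: exclusive creation of a sidecar file. The sidecar naming (path plus '.lock'), the polling interval, and both function names are assumptions made for illustration.

```python
import os
import time

LOCK_SUFFIX = ".lock"  # same value as the LOCK_SUFFIX constant of this module
RETRY_DELAY = 0.1      # same value as the RETRY_DELAY constant of this module


def lock_path_sketch(path, timeout=3):
    """Try to create `path + LOCK_SUFFIX` exclusively until `timeout` expires."""
    lock_file = path + LOCK_SUFFIX
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return {"return": 0, "file_lock": lock_file}
        except FileExistsError:
            if time.monotonic() >= deadline:
                return {"return": 1, "error": f"timeout acquiring lock on {path}"}
            time.sleep(RETRY_DELAY)


def unlock_path_sketch(file_lock):
    """Release the lock by removing the sidecar file."""
    os.remove(file_lock)
```

A caller would check `result['return']`, do its work, and then pass `result['file_lock']` to the unlock helper, ideally in a `try`/`finally` block.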

cmeta.utils.files.quote_path(path)[source]
cmeta.utils.files.read_file(filepath: str, fail_on_error: bool = False, logger=None, encoding: str | None = None)[source]

Read file without locking (convenience wrapper for safe_read_file).

Parameters:
  • filepath (str) – Path to the file to read.

  • fail_on_error (bool) – If True, raises exception on error instead of returning error dict.

  • logger – Optional logger for debug messages.

  • encoding (str | None) – Character encoding for text files.

Returns:

Dictionary with ‘return’: 0 and ‘data’, or ‘return’ > 0 and ‘error’.

Return type:

dict

cmeta.utils.files.safe_delete_directory(dirpath: str, timeout: int = 3, fail_on_error: bool = False, logger=None)[source]

Safely and recursively deletes a directory with all its contents. Works cross-platform (Windows, Linux, macOS) and handles special cases like .git directories with read-only attributes.

If lock acquisition fails but the directory does not exist, returns success.

Parameters:
  • dirpath (str) – Full path to directory to delete.

  • timeout (int) – Lock timeout in seconds.

  • fail_on_error (bool) – Whether to raise exceptions or return error dict.

  • logger – Logger instance for debug messages.

Returns:

Dict with ‘return’ (0=success, non-zero=error) and optional ‘error’.

cmeta.utils.files.safe_delete_directory_if_empty(dirpath: str)[source]

Delete directory only if it’s empty (no files or subdirectories).

Quickly checks if directory is empty and removes it. Ignores all errors (permissions, race conditions, etc.) for safe cleanup operations.

Parameters:

dirpath (str) – Path to the directory to potentially delete.

cmeta.utils.files.safe_delete_directory_if_empty_with_sharding(artifact_path: str, sharding_slices: list | None = None)[source]

Safely delete empty directories up the hierarchy based on sharding configuration.

Parameters:
  • artifact_path (str) – Path to the artifact directory.

  • sharding_slices (list | None) – Sharding configuration from category meta.

Returns:

A cMeta dictionary with the following keys
  • return (int): 0 if success, >0 if error.

  • error (str): Error message if return > 0.

Return type:

dict
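Cleaning up empty shard directories can be sketched as removing the artifact directory and then one parent per shard level, stopping at the first directory that is not empty. The helper names are hypothetical, and the rule "one parent per entry in sharding_slices" is an assumption about how the hierarchy is walked.

```python
import os


def remove_if_empty(dirpath):
    """Remove `dirpath` only if it is empty; ignore every OS error."""
    try:
        os.rmdir(dirpath)  # os.rmdir refuses to delete non-empty directories
        return True
    except OSError:
        return False


def prune_sharded_dirs_sketch(artifact_path, sharding_slices=None):
    """Remove the artifact directory and its empty shard parents."""
    current = artifact_path
    # Assumption: the artifact directory plus one parent per shard level.
    for _ in range(1 + len(sharding_slices or [])):
        if not remove_if_empty(current):
            break  # a non-empty directory ends the upward walk
        current = os.path.dirname(current)
    return {"return": 0}
```

Because the walk stops at the first non-empty directory, sibling artifacts that share a shard prefix are left untouched.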

cmeta.utils.files.safe_read_file(filepath: str, encoding: str | None = None, lock: bool = False, keep_locked: bool = False, timeout: int = 3, retry_if_not_found: int = 0, fail_on_error: bool = False, logger=None)[source]

Safely read file with optional locking and retry logic.

Provides thread/process-safe file reading with file locking support. Cleans up lock on error. If keep_locked=True and lock=True, maintains the lock after successful read (caller must release).

WARNING: This function uses blocking I/O operations. Not suitable for async contexts - use aiofiles and async locking instead.

Parameters:
  • filepath – Path to the file to read.

  • encoding – Character encoding for text files. If None, auto-detected.

  • lock – If True, acquires file lock before reading.

  • keep_locked – If True with lock=True, keeps lock after read (returns in result).

  • timeout – Seconds to wait for lock acquisition. Default is 3.

  • retry_if_not_found – Number of retry attempts if file not found.

  • fail_on_error – If True, raises exception on error instead of returning error dict.

  • logger – Optional logger for debug messages.

Returns:

Dictionary with ‘return’: 0, ‘data’, ‘filepath’, and optionally ‘last_modified’ and ‘file_lock’ (if keep_locked=True). Returns ‘return’ > 0 and ‘error’ on failure.

Return type:

dict
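The retry behaviour controlled by retry_if_not_found can be illustrated with a short loop. This sketch covers only the retry logic, without locking or encoding detection. The function name, the fixed UTF-8 default, and the use of error code 16 for a missing file are assumptions; the two constants mirror values listed in the Constants section.

```python
import time

RETRY_DELAY = 0.1               # mirrors the module constant
ERROR_CODE_FILE_NOT_FOUND = 16  # mirrors the module constant


def read_with_retry_sketch(filepath, encoding="utf-8", retry_if_not_found=0):
    """Read a text file, retrying a few times if it does not exist yet."""
    attempts = retry_if_not_found + 1
    for attempt in range(attempts):
        try:
            with open(filepath, "r", encoding=encoding) as f:
                return {"return": 0, "data": f.read(), "filepath": filepath}
        except FileNotFoundError:
            if attempt < attempts - 1:
                time.sleep(RETRY_DELAY)  # give a concurrent writer time to finish
    return {
        "return": ERROR_CODE_FILE_NOT_FOUND,
        "error": f"file not found: {filepath}",
    }
```

Retrying helps when another process replaces the file via rename, which can leave a brief window in which the path does not exist.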

cmeta.utils.files.safe_read_file_via_cache(filepath: str, cache: dict, timeout: int = 10, fail_on_error: bool = False, logger=None)[source]

Reads a file with caching based on file modification timestamp. Automatically reloads if file has been modified since last cache.

WARNING: This function is NOT thread-safe and is not suitable for async usage. The cache dictionary can be corrupted by concurrent access, and the function uses blocking I/O operations.

Parameters:
  • filepath (str) – Path to the file to read.

  • cache (dict) – Dictionary to store cached data (modified in-place).

  • timeout (int) – Lock timeout for file operations.

  • fail_on_error (bool) – Whether to raise exceptions or return error dict.

  • logger – Optional logger for debug messages.

Returns:

Dict with ‘return’ (0=success, non-zero=error) and ‘data’ or ‘error’.
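Timestamp-based caching can be sketched as follows: the cache stores the file's modification time next to the data and reloads when the two differ. The cache layout, the extra ‘cached’ key in the result, and the function name are hypothetical and used only to make the behaviour visible.

```python
import os


def read_via_cache_sketch(filepath, cache):
    """Return cached data unless the file's mtime has changed."""
    mtime = os.path.getmtime(filepath)
    entry = cache.get(filepath)
    if entry is not None and entry["mtime"] == mtime:
        # Assumption: 'cached' is an illustrative key, not part of cMeta.
        return {"return": 0, "data": entry["data"], "cached": True}

    with open(filepath, "r", encoding="utf-8") as f:
        data = f.read()
    cache[filepath] = {"mtime": mtime, "data": data}  # modified in place
    return {"return": 0, "data": data, "cached": False}
```

Since the dictionary is mutated in place without any lock, concurrent callers sharing one cache would need their own synchronization.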

cmeta.utils.files.safe_read_yaml_or_json(filepath: str, lock: bool = False, keep_locked: bool = False, timeout: int = 3, fail_on_error: bool = False, retry_if_not_found: int = 0, logger=None)[source]

Safely reads a YAML or JSON file by trying YAML first, then JSON. Removes any existing extension from filepath and tries .yaml, then .json.

Parameters:
  • filepath (str) – Path to file (extension will be ignored/removed).

  • lock (bool) – Whether to use file locking.

  • keep_locked (bool) – Whether to keep lock after successful read.

  • timeout (int) – Lock timeout.

  • fail_on_error (bool) – Whether to raise exceptions or return error dict.

  • retry_if_not_found (int) – Number of retries if file not found.

  • logger – Logger instance for debug messages.

Returns:

Dict with ‘return’ (0=success, non-zero=error) and ‘data’ or ‘error’. If keep_locked=True and lock=True, also returns ‘file_lock’.
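The extension handling described above can be sketched as: strip any extension, then probe `.yaml` before `.json`. The sketch omits locking and retries. The function name, the ‘filepath’ key in the result, and the use of error code 16 are assumptions; YAML parsing relies on the third-party PyYAML package, which is imported only when a `.yaml` file is actually found.

```python
import json
import os

ERROR_CODE_FILE_NOT_FOUND = 16  # mirrors the module constant


def read_yaml_or_json_sketch(filepath):
    """Ignore the given extension and load <base>.yaml or <base>.json."""
    base, _ext = os.path.splitext(filepath)
    for ext in (".yaml", ".json"):  # YAML takes precedence
        candidate = base + ext
        if not os.path.isfile(candidate):
            continue
        with open(candidate, "r", encoding="utf-8") as f:
            if ext == ".json":
                data = json.load(f)
            else:
                import yaml  # PyYAML, assumed to be installed
                data = yaml.safe_load(f)
        return {"return": 0, "data": data, "filepath": candidate}
    return {
        "return": ERROR_CODE_FILE_NOT_FOUND,
        "error": f"neither {base}.yaml nor {base}.json found",
    }
```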

cmeta.utils.files.safe_write_file(filepath: str, data, timeout: int = 3, file_lock=None, atomic: bool = False, encoding: str | None = None, fail_on_error: bool = False, logger=None, sort_keys: bool = True)[source]

Safely write data to file with locking and optional atomic write.

Provides thread/process-safe file writing with file locking support. Supports atomic writes via temp file + rename for data integrity.

WARNING: This function uses blocking I/O operations. Not suitable for async contexts - use aiofiles and async locking instead.

Parameters:
  • filepath (str) – Path where file should be written.

  • data – Data to write (dict/list for JSON/YAML, any object for pickle/text).

  • timeout (int) – Seconds to wait for lock acquisition. Default is 3.

  • file_lock – Existing lock to use. If None, acquires new lock.

  • atomic (bool) – If True, writes to temp file then renames for atomicity.

  • encoding (str | None) – Character encoding for text files. If None, auto-detected.

  • fail_on_error (bool) – If True, raises exception on error instead of returning error dict.

  • logger – Optional logger for debug messages.

  • sort_keys (bool) – If True, sorts dictionary keys in JSON/YAML output.

Returns:

Dictionary with ‘return’: 0 on success, or ‘return’ > 0 and ‘error’ on failure.

Return type:

dict
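The "temp file + rename" approach to atomic writes can be illustrated with the standard library. This sketch handles JSON only and takes no lock; the function name, the `.tmp` suffix, and the generic error code 1 are assumptions.

```python
import json
import os
import tempfile


def atomic_write_json_sketch(filepath, data, sort_keys=True):
    """Write JSON to a temp file in the same directory, then rename it."""
    directory = os.path.dirname(os.path.abspath(filepath))
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2, sort_keys=sort_keys)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, filepath)  # atomically replaces any existing file
    except Exception as e:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        return {"return": 1, "error": str(e)}
    return {"return": 0}
```

Readers then see either the old file or the complete new one, never a partially written file.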

cmeta.utils.files.shard_name(name: str, slices=None)[source]

Apply sharding to a single path component.

Generates shard directory names from a name string based on specified slice lengths. If the name is shorter than required, uses underscore-filled placeholders to ensure predictable directory structure.

Parameters:
  • name – Name to shard (file or directory name).

  • slices – List of integers specifying shard lengths (e.g., [2, 2] creates 2-char shards). None means no sharding.

Returns:

List containing shard directory names followed by the original name.

Example: shard_name(‘example’, [2, 2]) -> [‘ex’, ‘am’, ‘example’]

Return type:

list
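A minimal re-implementation that reproduces the documented example is sketched below. The exact padding rule is an assumption: here each shard is right-padded with underscores to its slice length, and slices=None returns a list containing only the name. The function name is hypothetical.

```python
def shard_name_sketch(name, slices=None):
    """Split leading characters of `name` into fixed-length shard names."""
    if not slices:
        return [name]  # assumption: no sharding yields just the name

    parts, pos = [], 0
    for length in slices:
        chunk = name[pos:pos + length]
        # Assumption: short or missing chunks are padded with underscores.
        parts.append(chunk.ljust(length, "_"))
        pos += length
    parts.append(name)
    return parts


print(shard_name_sketch("example", [2, 2]))  # ['ex', 'am', 'example']
```

Under this padding assumption a one-character name such as 'a' with slices [2, 2] would map to 'a_', '__', 'a', so every artifact sits at the same directory depth.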

cmeta.utils.files.unlock_path(path: str, file_lock, fail_on_error: bool = False, logger=None)[source]

Release a lock on a file or directory path.

Parameters:
  • path (str) – Path to unlock (file or directory).

  • file_lock – FileLock object from lock_path().

  • fail_on_error (bool) – If True, raises exception on error instead of returning error dict.

  • logger – Optional logger for debug messages.

Returns:

None on success; on failure, an error dict with ‘return’ > 0 and ‘error’.

Return type:

dict | None

cmeta.utils.files.unzip(filename: str, path: str | None = None, remove_directories: int = 0, skip_directories: list | None = None, overwrite: bool = True, clean: bool = False, fail_on_error: bool = False)[source]

Extract a ZIP archive to a directory.

Parameters:
  • filename (str) – Path to ZIP file to extract.

  • path (str | None) – Destination directory (defaults to current directory).

  • remove_directories (int) – Number of leading directory levels to strip from paths.

  • skip_directories (list | None) – List of directory names to skip during extraction.

  • overwrite (bool) – If True, overwrite existing files.

  • clean (bool) – If True, delete ZIP file after successful extraction.

  • fail_on_error (bool) – If True, raises exception on error instead of returning error dict.

Returns:

Dictionary with ‘return’: 0 on success, or ‘return’ > 0 and ‘error’ on failure.

Return type:

dict
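The effect of remove_directories and skip_directories can be sketched with the standard `zipfile` module. This is an illustration only: it always overwrites, never deletes the archive, and the function name is hypothetical. Skipping entries that contain `..` is an added safety assumption, not documented behaviour.

```python
import os
import zipfile


def unzip_sketch(filename, path=None, remove_directories=0, skip_directories=None):
    """Extract a ZIP, stripping leading path levels and skipping named directories."""
    dest = path or os.getcwd()
    skip = set(skip_directories or [])
    with zipfile.ZipFile(filename) as zf:
        for info in zf.infolist():
            # Drop the first `remove_directories` components of the member path.
            parts = info.filename.split("/")[remove_directories:]
            if info.is_dir() or not parts or not parts[-1]:
                continue
            if skip.intersection(parts[:-1]) or ".." in parts:
                continue  # skipped directory, or unsafe relative component
            target = os.path.join(dest, *parts)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as dst:
                dst.write(src.read())
    return {"return": 0}
```

Stripping one level is useful for archives that wrap everything in a single top-level folder, as many repository downloads do.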

cmeta.utils.files.write_file(filepath: str, data, encoding: str | None = None, fail_on_error: bool = False, logger=None, sort_keys: bool = True, file_format: str | None = None, newline: str = '\n')[source]

Write data to file with format-specific serialization.

Automatically serializes data based on file format (JSON, YAML, pickle, or text).

Parameters:
  • filepath (str) – Path where file should be written.

  • data – Data to write (dict/list for JSON/YAML, any object for pickle/text).

  • encoding (str | None) – Character encoding for text files. If None, auto-detected.

  • fail_on_error (bool) – If True, raises exception on error instead of returning error dict.

  • logger – Optional logger for debug messages.

  • sort_keys (bool) – If True, sorts dictionary keys in JSON/YAML output.

  • file_format (str | None) – Force specific format (‘json’, ‘yaml’, ‘pickle’, ‘text’). If None, auto-detected.

  • newline (str) – Newline character for text files. Default is ‘\n’.

Returns:

Dictionary with ‘return’: 0 on success, or ‘return’ > 0 and ‘error’ on failure.

Return type:

dict

cmeta.utils.files.zip_directory(source_dir: str, output_path: str, skip_directories: list | None = None, fail_on_error: bool = True, logger=None)[source]

Creates a zip archive from a directory.

Parameters:
  • source_dir (str) – Path to the directory to zip.

  • output_path (str) – Path where the zip file will be created.

  • skip_directories (list | None) – List of directory names to skip (e.g., [‘.git’, ‘__pycache__’]).

  • fail_on_error (bool) – Whether to raise exceptions or return error dict.

  • logger – Logger instance for debug messages.

Returns:

Dict with ‘return’ (0=success, non-zero=error) and optional ‘error’.
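Archiving a directory while skipping selected subdirectories can be sketched with `os.walk` and `zipfile`. The function name and the choice of DEFLATE compression are assumptions; the sketch also assumes output_path lies outside source_dir so the archive does not include itself.

```python
import os
import zipfile


def zip_directory_sketch(source_dir, output_path, skip_directories=None):
    """Zip `source_dir`, storing member paths relative to it."""
    skip = set(skip_directories or [])
    with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(source_dir):
            # Pruning `dirs` in place stops os.walk from descending into them.
            dirs[:] = [d for d in dirs if d not in skip]
            for name in files:
                full = os.path.join(root, name)
                zf.write(full, os.path.relpath(full, source_dir))
    return {"return": 0}
```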

Constants

cmeta.utils.files.ERROR_CODE_FILE_NOT_FOUND = 16


cmeta.utils.files.LOCK_SUFFIX = '.lock'


cmeta.utils.files.RETRY_DELAY = 0.1


cmeta.utils.files.RETRY_DELETE_ATTEMPTS = 5


cmeta.utils.files.RETRY_NOT_FOUND_FILE = 10


cmeta.utils.files.RETRY_NOT_FOUND_INDEX_FILE = 2


cmeta.utils.files.RETRY_REPLACE_FILE = 10


cmeta.utils.files.RETRY_TIMESTAMP_FILE = 10
