cmeta.utils.files module
Reusable functions for safe loading, storing and caching of files
cMeta author and developer: (C) 2025-2026 Grigori Fursin
See the cMeta COPYRIGHT and LICENSE files in the project root for details.
Functions
- cmeta.utils.files.apply_sharding_to_path(path: str, name: str, slices: list)[source]
Apply sharding to construct a full sharded directory path.
Combines a base path with sharded directory components generated from a name.
- Parameters:
path – Base directory path to prepend to sharded path.
name – Name to shard.
slices – List of integers specifying shard lengths (e.g., [2, 2]).
- Returns:
- Dictionary with ‘return’: 0, ‘sharded_parts’: list of path components,
and ‘sharded_path’: full sharded path string. On error, ‘return’ > 0.
- Return type:
dict
- cmeta.utils.files.files_decode(files_base64)[source]
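Decode base64-encoded file contents back to bytes.
- Parameters:
files_base64 (dict) – Mapping {filename: base64_string}.
- Returns:
Dictionary {filename: binary_bytes}.
- Return type:
dict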
- cmeta.utils.files.files_encode(files)[source]
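Encode the contents of files as base64 strings.
- Parameters:
files (list) – List of file paths.
- Returns:
Dictionary {filename: base64_string}.
- Return type:
dict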
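The encode/decode pair can be illustrated with a minimal standalone sketch built on the standard `base64` module. It does not call cMeta. The function names are hypothetical, and keying the result by base name rather than full path is an assumption.

```python
import base64
import os
import tempfile


def encode_files_sketch(paths):
    """Map each file name to its content encoded as a base64 ASCII string."""
    encoded = {}
    for path in paths:
        with open(path, "rb") as f:
            # Assumption: the dictionary key is the base name, not the full path.
            encoded[os.path.basename(path)] = base64.b64encode(f.read()).decode("ascii")
    return encoded


def decode_files_sketch(files_base64):
    """Inverse operation: map each file name to the decoded bytes."""
    return {name: base64.b64decode(text) for name, text in files_base64.items()}


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "hello.bin")
    with open(sample, "wb") as f:
        f.write(b"\x00\x01cMeta")
    round_trip = decode_files_sketch(encode_files_sketch([sample]))
```

Base64 text is safe to embed in JSON or YAML, which is the usual reason for such a pair.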
- cmeta.utils.files.get_latest_tree_modification_time(path)[source]
Return the maximum modification time (mtime) of the directory or any file/directory inside it (recursively). Works on Linux, macOS, and Windows.
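One portable way to compute such a value is to walk the tree with `os.walk`. This is a standalone sketch with a hypothetical function name; the cMeta implementation may differ.

```python
import os
import tempfile


def latest_tree_mtime_sketch(path):
    """Return the newest mtime among `path` and everything below it."""
    latest = os.path.getmtime(path)
    for root, dirs, files in os.walk(path):
        for name in dirs + files:
            try:
                latest = max(latest, os.lstat(os.path.join(root, name)).st_mtime)
            except OSError:
                continue  # entry disappeared between listing and stat
    return latest


with tempfile.TemporaryDirectory() as tmp:
    nested = os.path.join(tmp, "a", "b")
    os.makedirs(nested)
    target = os.path.join(nested, "f.txt")
    with open(target, "w") as f:
        f.write("x")
    # Push one deeply nested file into the future so it must win.
    future = os.path.getmtime(target) + 1000
    os.utime(target, (future, future))
    newest = latest_tree_mtime_sketch(tmp)
```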
- cmeta.utils.files.is_path_within(base: str, target: str)[source]
Check if base path is within target path.
Determines if the base path is a subdirectory or file within the target path.
- Parameters:
base (str) – Base path to check.
target (str) – Target path to check against.
- Returns:
True if base is within target, False otherwise.
- Return type:
bool
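A containment check of this kind is commonly written with `os.path.commonpath` on absolute paths, which avoids the false positives of a plain string-prefix test. The sketch below is standalone. It uses the hypothetical names `inner` and `outer`, and treating equal paths as "within" is an assumption.

```python
import os


def is_within_sketch(inner, outer):
    """Return True if `inner` is `outer` itself or lies somewhere below it."""
    inner_abs = os.path.abspath(inner)
    outer_abs = os.path.abspath(outer)
    try:
        return os.path.commonpath([inner_abs, outer_abs]) == outer_abs
    except ValueError:
        # Raised, for example, when the paths are on different Windows drives.
        return False
```

Normalizing with `os.path.abspath` first means that a path such as `/data/repo/../secret` is not reported as being inside `/data/repo`.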
- cmeta.utils.files.lock_path(path: str, timeout: int = 3, fail_on_error: bool = False, logger=None)[source]
Acquire a lock on a file or directory path.
- Parameters:
path (str) – Path to lock (file or directory).
timeout (int) – Seconds to wait for lock acquisition. Default is 3.
fail_on_error (bool) – If True, raises exception on error instead of returning error dict.
logger – Optional logger for debug messages.
- Returns:
- Dictionary with ‘return’: 0 and ‘file_lock’ on success,
or ‘return’ > 0 and ‘error’ on failure.
- Return type:
dict
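The timeout behaviour can be illustrated with a standalone lock-file sketch. It is not the cMeta implementation: the real function returns a lock object in 'file_lock', whereas this sketch returns the lock file path. Placing the lock file at `path + LOCK_SUFFIX` and polling every `RETRY_DELAY` seconds are assumptions; only the two constant values come from this module.

```python
import os
import tempfile
import time

LOCK_SUFFIX = ".lock"  # same value as the module constant
RETRY_DELAY = 0.1      # same value as the module constant


def acquire_lock_sketch(path, timeout=3):
    """Try to create `path + LOCK_SUFFIX` exclusively until `timeout` expires."""
    lock_file = path + LOCK_SUFFIX
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if another holder already made the file.
            fd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return {"return": 0, "file_lock": lock_file}
        except FileExistsError:
            if time.monotonic() >= deadline:
                return {"return": 1, "error": f"timeout waiting for lock on {path}"}
            time.sleep(RETRY_DELAY)


def release_lock_sketch(lock_file):
    """Remove the lock file; a missing file is not treated as an error."""
    try:
        os.remove(lock_file)
    except FileNotFoundError:
        pass


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "meta.json")
    first = acquire_lock_sketch(target, timeout=1)
    second = acquire_lock_sketch(target, timeout=0.3)  # lock is still held
    release_lock_sketch(first["file_lock"])
    third = acquire_lock_sketch(target, timeout=1)
    release_lock_sketch(third["file_lock"])
```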
- cmeta.utils.files.read_file(filepath: str, fail_on_error: bool = False, logger=None, encoding: str | None = None)[source]
Read file without locking (convenience wrapper for safe_read_file).
- Parameters:
filepath (str) – Path to the file to read.
fail_on_error (bool) – If True, raises exception on error instead of returning error dict.
logger – Optional logger for debug messages.
encoding (str | None) – Character encoding for text files.
- Returns:
Dictionary with ‘return’: 0 and ‘data’, or ‘return’ > 0 and ‘error’.
- Return type:
dict
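The return convention shared by most functions in this module (a dictionary with 'return' and either a payload or 'error', plus an optional `fail_on_error` switch) can be sketched as follows. This is a standalone illustration with a hypothetical function name, not the cMeta code. The error code 1 and the UTF-8 default are assumptions.

```python
def read_text_sketch(filepath, fail_on_error=False, encoding="utf-8"):
    """Read a text file and report the outcome as a cMeta-style dictionary."""
    try:
        with open(filepath, "r", encoding=encoding) as f:
            return {"return": 0, "data": f.read()}
    except OSError as e:
        if fail_on_error:
            raise  # caller asked for exceptions instead of an error dict
        return {"return": 1, "error": str(e)}


result = read_text_sketch("/nonexistent/cmeta-sketch.txt")
if result["return"] > 0:
    print("read failed:", result["error"])
```

Callers check `result['return']` before touching `result['data']`, which keeps error handling explicit without try/except at every call site.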
- cmeta.utils.files.safe_delete_directory(dirpath: str, timeout: int = 3, fail_on_error: bool = False, logger=None)[source]
Safely and recursively deletes a directory with all its contents. Works cross-platform (Windows, Linux, macOS) and handles special cases such as .git directories with read-only attributes.
If lock acquisition fails but directory doesn’t exist, returns success.
- Parameters:
dirpath (str) – Full path to directory to delete.
timeout (int) – Lock timeout in seconds.
fail_on_error (bool) – Whether to raise exceptions or return error dict.
logger – Logger instance for debug messages.
- Returns:
Dict with ‘return’ (0=success, non-zero=error) and optional ‘error’.
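Read-only entries, as found in .git directories on Windows, are commonly handled with the error-callback recipe from the `shutil.rmtree` documentation. The sketch below is standalone and omits locking and retries. The function names are hypothetical and the cMeta implementation may work differently.

```python
import os
import shutil
import stat
import sys
import tempfile


def _make_writable_and_retry(func, path, _exc):
    """Error callback: clear the read-only bit, then retry the failed call."""
    os.chmod(path, stat.S_IWRITE)
    func(path)


def delete_tree_sketch(dirpath):
    """Recursively delete `dirpath`; a missing directory counts as success."""
    if not os.path.isdir(dirpath):
        return {"return": 0}
    try:
        if sys.version_info >= (3, 12):
            shutil.rmtree(dirpath, onexc=_make_writable_and_retry)
        else:
            shutil.rmtree(dirpath, onerror=_make_writable_and_retry)
    except OSError as e:
        return {"return": 1, "error": str(e)}
    return {"return": 0}


with tempfile.TemporaryDirectory() as tmp:
    victim = os.path.join(tmp, "repo")
    os.makedirs(os.path.join(victim, ".git", "objects"))
    locked = os.path.join(victim, ".git", "objects", "pack")
    with open(locked, "w") as f:
        f.write("data")
    os.chmod(locked, stat.S_IREAD)  # simulate a read-only git object
    first = delete_tree_sketch(victim)
    second = delete_tree_sketch(victim)  # directory is already gone
    still_there = os.path.exists(victim)
```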
- cmeta.utils.files.safe_delete_directory_if_empty(dirpath: str)[source]
Delete directory only if it’s empty (no files or subdirectories).
Quickly checks if directory is empty and removes it. Ignores all errors (permissions, race conditions, etc.) for safe cleanup operations.
- Parameters:
dirpath (str) – Path to the directory to potentially delete.
- cmeta.utils.files.safe_delete_directory_if_empty_with_sharding(artifact_path: str, sharding_slices: list | None = None)[source]
Safely delete empty directories up the hierarchy based on sharding configuration.
- Parameters:
artifact_path (str) – Path to the artifact directory.
sharding_slices (list | None) – Sharding configuration from category meta.
- Returns:
- A cMeta dictionary with the following keys:
return (int): 0 on success, > 0 on error.
error (str): Error message if return > 0.
- Return type:
dict
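Cleaning up a sharded layout typically means removing the artifact directory and then each now-empty shard directory above it. The sketch below is standalone and assumes one parent level per entry in `sharding_slices`. The function name is hypothetical and the cMeta logic may differ.

```python
import os
import tempfile


def remove_empty_shard_dirs_sketch(artifact_path, sharding_slices=None):
    """Remove the artifact directory and its empty shard parents, bottom up."""
    # Assumption: one shard directory level exists per slice.
    levels = 1 + len(sharding_slices or [])
    current = artifact_path
    for _ in range(levels):
        try:
            os.rmdir(current)  # succeeds only if the directory is empty
        except OSError:
            break  # not empty, missing, or not permitted: stop climbing
        current = os.path.dirname(current)
    return {"return": 0}


with tempfile.TemporaryDirectory() as tmp:
    empty_artifact = os.path.join(tmp, "ab", "cd", "abcd-artifact")
    busy_artifact = os.path.join(tmp, "ab", "zz", "abzz-artifact")
    os.makedirs(empty_artifact)
    os.makedirs(busy_artifact)
    with open(os.path.join(busy_artifact, "meta.json"), "w") as f:
        f.write("{}")
    result = remove_empty_shard_dirs_sketch(empty_artifact, [2, 2])
    removed_cd = not os.path.exists(os.path.join(tmp, "ab", "cd"))
    kept_ab = os.path.isdir(os.path.join(tmp, "ab"))
    kept_busy = os.path.isfile(os.path.join(busy_artifact, "meta.json"))
```

Because `os.rmdir` refuses to delete non-empty directories, the shared shard `ab` survives as long as another artifact still lives under it.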
- cmeta.utils.files.safe_read_file(filepath: str, encoding: str | None = None, lock: bool = False, keep_locked: bool = False, timeout: int = 3, retry_if_not_found: int = 0, fail_on_error: bool = False, logger=None)[source]
Safely read file with optional locking and retry logic.
Provides thread/process-safe file reading with file locking support. Cleans up lock on error. If keep_locked=True and lock=True, maintains the lock after successful read (caller must release).
WARNING: This function uses blocking I/O operations. Not suitable for async contexts - use aiofiles and async locking instead.
- Parameters:
filepath – Path to the file to read.
encoding – Character encoding for text files. If None, auto-detected.
lock – If True, acquires file lock before reading.
keep_locked – If True with lock=True, keeps lock after read (returns in result).
timeout – Seconds to wait for lock acquisition. Default is 3.
retry_if_not_found – Number of retry attempts if file not found.
fail_on_error – If True, raises exception on error instead of returning error dict.
logger – Optional logger for debug messages.
- Returns:
- Dictionary with ‘return’: 0, ‘data’, ‘filepath’, and optionally ‘last_modified’
and ‘file_lock’ (if keep_locked=True). Returns ‘return’ > 0 and ‘error’ on failure.
- Return type:
dict
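The retry-on-missing-file behaviour can be illustrated with a standalone sketch. It leaves out locking and encoding detection. Sleeping `RETRY_DELAY` seconds between attempts and returning `ERROR_CODE_FILE_NOT_FOUND` when all attempts fail are assumptions; only the constant values are taken from this module.

```python
import os
import tempfile
import time

ERROR_CODE_FILE_NOT_FOUND = 16  # same value as the module constant
RETRY_DELAY = 0.1               # same value as the module constant


def read_with_retry_sketch(filepath, retry_if_not_found=0, encoding="utf-8"):
    """Read a text file, retrying a few times if it does not exist yet."""
    attempts = retry_if_not_found + 1
    for attempt in range(attempts):
        try:
            with open(filepath, "r", encoding=encoding) as f:
                data = f.read()
            return {"return": 0, "data": data, "filepath": filepath,
                    "last_modified": os.path.getmtime(filepath)}
        except FileNotFoundError:
            if attempt + 1 < attempts:
                time.sleep(RETRY_DELAY)  # another process may still be writing
    return {"return": ERROR_CODE_FILE_NOT_FOUND,
            "error": f"file not found: {filepath}"}


with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "present.txt")
    with open(present, "w", encoding="utf-8") as f:
        f.write("hello")
    ok = read_with_retry_sketch(present)
    missing = read_with_retry_sketch(os.path.join(tmp, "absent.txt"),
                                     retry_if_not_found=2)
```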
- cmeta.utils.files.safe_read_file_via_cache(filepath: str, cache: dict, timeout: int = 10, fail_on_error: bool = False, logger=None)[source]
Reads a file with caching based on file modification timestamp. Automatically reloads if file has been modified since last cache.
WARNING: This function is NOT safe for concurrent or async use. Concurrent access can corrupt the cache dictionary, and the function uses blocking I/O operations.
- Parameters:
filepath (str) – Path to the file to read.
cache (dict) – Dictionary to store cached data (modified in-place).
timeout (int) – Lock timeout for file operations.
fail_on_error (bool) – Whether to raise exceptions or return error dict.
logger – Optional logger for debug messages.
- Returns:
Dict with ‘return’ (0=success, non-zero=error) and ‘data’ or ‘error’.
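Timestamp-based caching can be sketched as follows. This is a standalone illustration without locking. The cache layout (one entry per path holding 'mtime' and 'data') and the extra 'from_cache' key are assumptions made for the example, not the cMeta format.

```python
import os
import tempfile


def read_via_cache_sketch(filepath, cache):
    """Return cached text unless the file's mtime changed since it was cached."""
    mtime = os.path.getmtime(filepath)
    entry = cache.get(filepath)
    if entry is not None and entry["mtime"] == mtime:
        return {"return": 0, "data": entry["data"], "from_cache": True}
    with open(filepath, "r", encoding="utf-8") as f:
        data = f.read()
    cache[filepath] = {"mtime": mtime, "data": data}  # cache is modified in place
    return {"return": 0, "data": data, "from_cache": False}


with tempfile.TemporaryDirectory() as tmp:
    config = os.path.join(tmp, "config.txt")
    with open(config, "w", encoding="utf-8") as f:
        f.write("v1")
    cache = {}
    first = read_via_cache_sketch(config, cache)
    second = read_via_cache_sketch(config, cache)
    with open(config, "w", encoding="utf-8") as f:
        f.write("v2")
    # Force a visibly newer timestamp in case both writes share one clock tick.
    st = os.stat(config)
    os.utime(config, (st.st_atime, st.st_mtime + 5))
    third = read_via_cache_sketch(config, cache)
```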
- cmeta.utils.files.safe_read_yaml_or_json(filepath: str, lock: bool = False, keep_locked: bool = False, timeout: int = 3, fail_on_error: bool = False, retry_if_not_found: int = 0, logger=None)[source]
Safely reads a YAML or JSON file by trying YAML first, then JSON. Removes any existing extension from filepath and tries .yaml, then .json.
- Parameters:
filepath (str) – Path to file (extension will be ignored/removed).
lock (bool) – Whether to use file locking.
keep_locked (bool) – Whether to keep lock after successful read.
timeout (int) – Lock timeout.
fail_on_error (bool) – Whether to raise exceptions or return error dict.
retry_if_not_found (int) – Number of retries if file not found.
logger – Logger instance for debug messages.
- Returns:
Dict with ‘return’ (0=success, non-zero=error) and ‘data’ or ‘error’. If keep_locked=True and lock=True, also returns ‘file_lock’.
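The extension handling described above (drop any extension, try .yaml, then .json) can be sketched with the standard library alone. The function name is hypothetical, and the sketch only resolves which file would be read; parsing is left out.

```python
import os
import tempfile


def find_yaml_or_json_sketch(filepath):
    """Return the first existing candidate among <base>.yaml and <base>.json."""
    base, _ext = os.path.splitext(filepath)  # any given extension is ignored
    for ext in (".yaml", ".json"):
        candidate = base + ext
        if os.path.isfile(candidate):
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "meta.json"), "w") as f:
        f.write("{}")
    # Only the JSON file exists, so it is found even when .yaml is requested.
    only_json = os.path.basename(find_yaml_or_json_sketch(os.path.join(tmp, "meta.yaml")))
    with open(os.path.join(tmp, "meta.yaml"), "w") as f:
        f.write("{}")
    # With both present, YAML takes precedence.
    both = os.path.basename(find_yaml_or_json_sketch(os.path.join(tmp, "meta.json")))
    missing = find_yaml_or_json_sketch(os.path.join(tmp, "other"))
```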
- cmeta.utils.files.safe_write_file(filepath: str, data, timeout: int = 3, file_lock=None, atomic: bool = False, encoding: str | None = None, fail_on_error: bool = False, logger=None, sort_keys: bool = True)[source]
Safely write data to file with locking and optional atomic write.
Provides thread/process-safe file writing with file locking support. Supports atomic writes via temp file + rename for data integrity.
WARNING: This function uses blocking I/O operations. Not suitable for async contexts - use aiofiles and async locking instead.
- Parameters:
filepath (str) – Path where file should be written.
data – Data to write (dict/list for JSON/YAML, any object for pickle/text).
timeout (int) – Seconds to wait for lock acquisition. Default is 3.
file_lock – Existing lock to use. If None, acquires new lock.
atomic (bool) – If True, writes to temp file then renames for atomicity.
encoding (str | None) – Character encoding for text files. If None, auto-detected.
fail_on_error (bool) – If True, raises exception on error instead of returning error dict.
logger – Optional logger for debug messages.
sort_keys (bool) – If True, sorts dictionary keys in JSON/YAML output.
- Returns:
Dictionary with ‘return’: 0 on success, or ‘return’ > 0 and ‘error’ on failure.
- Return type:
dict
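The "temp file + rename" technique behind atomic writes can be sketched as follows. This is a standalone illustration for JSON only, without locking. The function name and the `.tmp` suffix are assumptions, not cMeta details.

```python
import json
import os
import tempfile


def atomic_write_json_sketch(filepath, data, sort_keys=True):
    """Write JSON to a temp file in the target directory, then rename it."""
    directory = os.path.dirname(os.path.abspath(filepath))
    # The temp file must be on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, sort_keys=sort_keys, indent=2)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, filepath)  # replaces any existing file in one step
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return {"return": 0}


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "meta.json")
    atomic_write_json_sketch(target, {"b": 1, "a": 2})
    atomic_write_json_sketch(target, {"b": 3, "a": 4})
    with open(target, encoding="utf-8") as f:
        loaded = json.load(f)
    leftovers = [name for name in os.listdir(tmp) if name.endswith(".tmp")]
```

Readers see either the old file or the complete new one, never a partially written file, which is why this pattern is preferred for metadata files.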
- cmeta.utils.files.shard_name(name: str, slices=None)[source]
Apply sharding to a single path component.
Generates shard directory names from a name string based on specified slice lengths. If the name is shorter than required, uses underscore-filled placeholders to ensure predictable directory structure.
- Parameters:
name – Name to shard (file or directory name).
slices – List of integers specifying shard lengths (e.g., [2, 2] creates 2-char shards). None means no sharding.
- Returns:
- List containing shard directory names followed by the original name.
Example: shard_name('example', [2, 2]) -> ['ex', 'am', 'example']
- Return type:
list
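The sharding rule can be reproduced in a few lines. This is a standalone sketch with a hypothetical function name. It matches the documented example, but the exact padding of short names (right-padding each chunk with underscores) is an assumption.

```python
import os


def shard_name_sketch(name, slices=None):
    """Return shard directory names followed by the original name."""
    parts = []
    position = 0
    for length in slices or []:
        chunk = name[position:position + length]
        # Assumption: a chunk shorter than the slice is right-padded with "_".
        parts.append(chunk.ljust(length, "_"))
        position += length
    return parts + [name]


parts = shard_name_sketch("example", [2, 2])
# Joining the parts onto a base directory gives a full sharded path.
full_path = os.path.join("repo", *parts)
```

Sharding spreads many entries over nested directories, which keeps each directory listing short.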
- cmeta.utils.files.unlock_path(path: str, file_lock, fail_on_error: bool = False, logger=None)[source]
Release a lock on a file or directory path.
- Parameters:
path (str) – Path to unlock (file or directory).
file_lock – FileLock object from lock_path().
fail_on_error (bool) – If True, raises exception on error instead of returning error dict.
logger – Optional logger for debug messages.
- Returns:
Error dict with ‘return’ > 0 and ‘error’ on failure, None on success.
- Return type:
dict | None
- cmeta.utils.files.unzip(filename: str, path: str | None = None, remove_directories: int = 0, skip_directories: list | None = None, overwrite: bool = True, clean: bool = False, fail_on_error: bool = False)[source]
Extract a ZIP archive to a directory.
- Parameters:
filename (str) – Path to ZIP file to extract.
path (str | None) – Destination directory (defaults to current directory).
remove_directories (int) – Number of leading directory levels to strip from paths.
skip_directories (list | None) – List of directory names to skip during extraction.
overwrite (bool) – If True, overwrite existing files.
clean (bool) – If True, delete ZIP file after successful extraction.
fail_on_error (bool) – If True, raises exception on error instead of returning error dict.
- Returns:
Dictionary with ‘return’: 0 on success, or ‘return’ > 0 and ‘error’ on failure.
- Return type:
dict
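The effect of `remove_directories` and `skip_directories` can be illustrated with the standard `zipfile` module. This standalone sketch is not the cMeta implementation. Applying the skip list after stripping the leading levels, and rejecting entries that would land outside the destination, are assumptions.

```python
import os
import tempfile
import zipfile


def unzip_sketch(filename, path, remove_directories=0, skip_directories=None):
    """Extract a ZIP file, stripping leading directory levels from each entry."""
    skip = set(skip_directories or [])
    dest_root = os.path.abspath(path)
    with zipfile.ZipFile(filename) as zf:
        for info in zf.infolist():
            parts = [p for p in info.filename.split("/") if p]
            if info.is_dir() or len(parts) <= remove_directories:
                continue
            parts = parts[remove_directories:]
            if any(p in skip for p in parts[:-1]):
                continue  # entry lives under a skipped directory
            dest = os.path.abspath(os.path.join(dest_root, *parts))
            if os.path.commonpath([dest_root, dest]) != dest_root:
                continue  # refuse entries that would escape the destination
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            with zf.open(info) as src, open(dest, "wb") as out:
                out.write(src.read())
    return {"return": 0}


with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("top/a.txt", "A")
        zf.writestr("top/.git/config", "ignored")
        zf.writestr("top/sub/b.txt", "B")
    out_dir = os.path.join(tmp, "out")
    result = unzip_sketch(archive, out_dir, remove_directories=1,
                          skip_directories=[".git"])
    extracted = sorted(
        os.path.relpath(os.path.join(root, name), out_dir).replace(os.sep, "/")
        for root, _dirs, files in os.walk(out_dir)
        for name in files
    )
```

Stripping one level is useful for archives that wrap everything in a single top-level folder, such as source downloads from code hosting services.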
- cmeta.utils.files.write_file(filepath: str, data, encoding: str | None = None, fail_on_error: bool = False, logger=None, sort_keys: bool = True, file_format: str | None = None, newline: str = '\n')[source]
Write data to file with format-specific serialization.
Automatically serializes data based on file format (JSON, YAML, pickle, or text).
- Parameters:
filepath (str) – Path where file should be written.
data – Data to write (dict/list for JSON/YAML, any object for pickle/text).
encoding (str | None) – Character encoding for text files. If None, auto-detected.
fail_on_error (bool) – If True, raises exception on error instead of returning error dict.
logger – Optional logger for debug messages.
sort_keys (bool) – If True, sorts dictionary keys in JSON/YAML output.
file_format (str | None) – Force specific format (‘json’, ‘yaml’, ‘pickle’, ‘text’). If None, auto-detected.
newline (str) – Newline character for text files. Default is '\n'.
- Returns:
Dictionary with ‘return’: 0 on success, or ‘return’ > 0 and ‘error’ on failure.
- Return type:
dict
- cmeta.utils.files.zip_directory(source_dir: str, output_path: str, skip_directories: list | None = None, fail_on_error: bool = True, logger=None)[source]
Creates a zip archive from a directory.
- Parameters:
source_dir (str) – Path to the directory to zip.
output_path (str) – Path where the zip file will be created.
skip_directories (list | None) – List of directory names to skip (e.g., ['.git', '__pycache__']).
fail_on_error (bool) – Whether to raise exceptions or return error dict.
logger – Logger instance for debug messages.
- Returns:
Dict with ‘return’ (0=success, non-zero=error) and optional ‘error’.
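Skipping directories while archiving is usually done by pruning `os.walk` in place. The sketch below is standalone, uses only the standard library, and has a hypothetical function name. It assumes the output file lies outside `source_dir`.

```python
import os
import tempfile
import zipfile


def zip_directory_sketch(source_dir, output_path, skip_directories=None):
    """Archive `source_dir`, omitting any directory whose name is in the skip list."""
    skip = set(skip_directories or [])
    with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(source_dir):
            # Editing `dirs` in place stops os.walk from descending into them.
            dirs[:] = [d for d in dirs if d not in skip]
            for name in files:
                full = os.path.join(root, name)
                zf.write(full, os.path.relpath(full, source_dir))
    return {"return": 0}


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src")
    os.makedirs(os.path.join(src, ".git"))
    os.makedirs(os.path.join(src, "pkg"))
    for rel in ("keep.txt", os.path.join(".git", "config"), os.path.join("pkg", "mod.py")):
        with open(os.path.join(src, rel), "w") as f:
            f.write("x")
    archive = os.path.join(tmp, "out.zip")
    result = zip_directory_sketch(src, archive, skip_directories=[".git", "__pycache__"])
    with zipfile.ZipFile(archive) as zf:
        names = sorted(zf.namelist())
```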
Constants
- cmeta.utils.files.ERROR_CODE_FILE_NOT_FOUND = 16
- cmeta.utils.files.LOCK_SUFFIX = '.lock'
- cmeta.utils.files.RETRY_DELAY = 0.1
- cmeta.utils.files.RETRY_DELETE_ATTEMPTS = 5
- cmeta.utils.files.RETRY_NOT_FOUND_FILE = 10
- cmeta.utils.files.RETRY_NOT_FOUND_INDEX_FILE = 2
- cmeta.utils.files.RETRY_REPLACE_FILE = 10
- cmeta.utils.files.RETRY_TIMESTAMP_FILE = 10