3.1.2. Module io

3.1.2.1. Description

Utilities for selecting, filtering, regrouping, and mapping files or folders. They can be derived into attributes or data loader functions as required. Within informatics, the IO module lives mainly in the namespace info.toolbox.libs.io; the main entry info.me is also available for convenience.

leaf_folders

search leaf folders recursively from a root folder.

search_from_root

search (specific) files recursively from root folder.

generic_filter

filter a sequence of str using regex pattern.

files_regroup

regroup a str sequence based on a list of patterns.

dict_filter

filter values of dict using regex pattern.

archive

python object(s) persistence toolkit.

unarchive

toolkit to load python persistence object(s).

3.1.2.2. Docstrings

leaf_folders

search leaf folders recursively from a root folder.

Arguments:
Parameters:
  • data (str) – path-like string of root folder

  • file_system (Literal['desktop', 'hdfs']) – file system mode; 'desktop' for desktop file system and 'hdfs' for distributed file system; 'desktop' as default

Returns:

a generator for paths of leaf folders from root folder

Return type:

Generator

Raises:

TypeError – data is not assigned properly

Examples:
Code 3.24 list all leaf folders from root
# for file folder tree as:
# --- root
#  |--- folder1
#  | | --- file1
#  | | --- file2
#  |--- folder2
#    | --- file3
#    | --- sub_folder
#       | --- file4
#       | --- file5

from info.me import io

root = 'root'  # path-like string of the root folder shown above
for leaf_folder in io.leaf_folders(data=root):
    print(leaf_folder)

# expected output:
# root/folder1
# root/folder2/sub_folder
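
For readers who want to see the idea without the library, the following is a minimal standard-library sketch of a leaf-folder search on a desktop file system. It assumes that a leaf folder is any folder without sub-folders, which matches the expected output above; leaf_folders_sketch and build_demo_tree are hypothetical names, and this is not the implementation used by io.leaf_folders.

```python
import os
import tempfile
from typing import Iterator


def leaf_folders_sketch(root: str) -> Iterator[str]:
    """Yield every folder under root that contains no sub-folders (assumed leaf rule)."""
    for current, sub_dirs, _files in os.walk(root):
        if not sub_dirs:  # no child folders, so this folder is a leaf
            yield current


def build_demo_tree(base: str) -> str:
    """Recreate the folder tree of Code 3.24 under base (hypothetical helper)."""
    root = os.path.join(base, "root")
    layout = {
        "folder1": ["file1", "file2"],
        "folder2": ["file3"],
        os.path.join("folder2", "sub_folder"): ["file4", "file5"],
    }
    for folder, files in layout.items():
        os.makedirs(os.path.join(root, folder), exist_ok=True)
        for name in files:
            open(os.path.join(root, folder, name), "w").close()
    return root


with tempfile.TemporaryDirectory() as base:
    demo_root = build_demo_tree(base)
    # paths are made relative to base so they read like the listing above
    leaves = sorted(os.path.relpath(p, base) for p in leaf_folders_sketch(demo_root))
    print(leaves)
```

folder2 is not reported because it still contains sub_folder, even though it also holds a file of its own.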
Logs:

Added in version 0.0.3.

Changed in version 0.0.4: support search in distributed file system

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

search_from_root

search (specific) files recursively from root folder.

Arguments:
Parameters:
  • data (str) – path-like string of root folder

  • search_condition (Callable[[str], bool]) – function to determine selected files; lambda x: x as default

  • file_system (Literal['desktop', 'hdfs']) – file system mode; 'desktop' for desktop file system and 'hdfs' for distributed file system; 'desktop' as default

Returns:

a generator for selected files from root folder

Return type:

Generator

Raises:

TypeError – data is not assigned properly

Examples:
Code 3.25 list all python scripts from root
from info.me import io
for script in io.search_from_root(data='.', search_condition=lambda x: x.endswith('.py')):
    print(script)
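
The same kind of search can be sketched with the standard library alone. The block below is an illustration under the assumption that the condition receives the full file path; search_from_root_sketch is a hypothetical name and not the code behind io.search_from_root.

```python
import os
import tempfile
from typing import Callable, Iterator


def search_from_root_sketch(
    root: str,
    condition: Callable[[str], bool] = lambda path: True,
) -> Iterator[str]:
    """Yield every file path under root for which condition(path) is truthy."""
    for current, _sub_dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(current, name)
            if condition(path):
                yield path


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "pkg"))
    # 'numpy' ends in 'py' but has no '.py' extension, so it must not be selected
    for rel in ("main.py", "notes.txt", os.path.join("pkg", "util.py"), "numpy"):
        open(os.path.join(root, rel), "w").close()
    scripts = sorted(
        os.path.relpath(p, root)
        for p in search_from_root_sketch(root, lambda p: p.endswith(".py"))
    )
    print(scripts)
```

Testing the '.py' extension with str.endswith keeps extension-less files such as 'numpy' out of the result.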
Logs:

Added in version 0.0.2.

Changed in version 0.0.4: support search in distributed file system

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

generic_filter

filter a sequence of str using a regex pattern. returns a generator over the filtered str, or over the objects mapped from those str.

Arguments:
Parameters:
  • data (Iterable[str]) – a str sequence

  • filter_pattern (str) – regular expression pattern used for filtering; r'.*' as default

  • apply_map (Callable[[str], Any]) – mapping function applied to each filtered str; None as default

Returns:

a generator of filtered str, or mapped object from those str

Return type:

Generator

Raises:

TypeError – data is not assigned properly

Examples:
Code 3.26 filtering and mapping for names
from info.me import io
names = ['Kane', 'Elfin', 'Kyle', 'Elena', 'Dark', 'Kate', 'David', 'Ezra', 'Deborah']

for init_with_E in io.generic_filter(data=names, filter_pattern=r'^E'):
    print(init_with_E)

for len_of_init_with_E in io.generic_filter(data=names, filter_pattern=r'^E', apply_map=lambda x: len(x)):
    print(len_of_init_with_E)
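
The filter-then-map behaviour described above can be approximated with the re module. This is a sketch only: generic_filter_sketch is a hypothetical name, and the use of re.search (rather than re.match or re.fullmatch) is an assumption about how the pattern is applied.

```python
import re
from typing import Any, Callable, Iterable, Iterator, Optional


def generic_filter_sketch(
    data: Iterable[str],
    filter_pattern: str = r".*",
    apply_map: Optional[Callable[[str], Any]] = None,
) -> Iterator[Any]:
    """Yield items matching filter_pattern, optionally passed through apply_map."""
    pattern = re.compile(filter_pattern)
    for item in data:
        if pattern.search(item):  # assumption: search semantics
            yield item if apply_map is None else apply_map(item)


names = ['Kane', 'Elfin', 'Kyle', 'Elena', 'Dark', 'Kate', 'David', 'Ezra', 'Deborah']

starts_with_e = list(generic_filter_sketch(names, r'^E'))
lengths = list(generic_filter_sketch(names, r'^E', apply_map=len))

print(starts_with_e)  # ['Elfin', 'Elena', 'Ezra']
print(lengths)        # [5, 5, 4]
```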
Logs:

Added in version 0.0.2.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

files_regroup

regroup a str sequence based on a list of patterns. regular expressions are supported.

Arguments:
Parameters:
  • data (Iterable[str]) – a str sequence

  • regroup_labels (list[str]) – a regular expression list; [r'.*'] as default

Returns:

a dict with the patterns as keys, and the elements matching each pattern as the respective values

Return type:

dict[str, Sequence[str]]

Raises:

TypeError – data is not assigned properly

Examples:
Code 3.27 regroup names by initial
from info.me import io
names = ['Kane', 'Elfin', 'Kyle', 'Elena', 'Dark', 'Kate', 'David', 'Ezra', 'Deborah']

for k, v in io.files_regroup(data=names, regroup_labels=[r'^K', r'^D', r'^E']).items():
    print(k, v)
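
As an illustration of the grouping described above, the sketch below builds the pattern-to-matches dict with the re module. files_regroup_sketch is a hypothetical name; it assumes re.search semantics and that an element matching several patterns is listed under each of them, neither of which is confirmed for io.files_regroup.

```python
import re
from typing import Dict, Iterable, List, Sequence


def files_regroup_sketch(
    data: Iterable[str],
    regroup_labels: Sequence[str] = (r".*",),
) -> Dict[str, List[str]]:
    """Map each pattern to the list of items it matches, keeping input order."""
    items = list(data)  # materialise once so every pattern sees all items
    return {
        label: [item for item in items if re.search(label, item)]
        for label in regroup_labels
    }


names = ['Kane', 'Elfin', 'Kyle', 'Elena', 'Dark', 'Kate', 'David', 'Ezra', 'Deborah']
groups = files_regroup_sketch(names, [r'^K', r'^D', r'^E'])

for pattern, members in groups.items():
    print(pattern, members)
# ^K ['Kane', 'Kyle', 'Kate']
# ^D ['Dark', 'David', 'Deborah']
# ^E ['Elfin', 'Elena', 'Ezra']
```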
Logs:

Added in version 0.0.2.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

dict_filter

filter values of dict using regex pattern. returns a dict whose values are the elements matching the pattern, or the objects mapped from those elements.

Arguments:
Parameters:
  • data (dict[str, ndarray]) – a dict composed of keywords and str sequence values

  • match_pattern (str) – regular expression pattern used for filtering; r'.*' as default

  • using_map (Callable[[str], Any]) – mapping function applied to each matched str; None as default

Returns:

dict whose values are the elements matching the pattern, or the objects mapped from those elements

Return type:

dict[str, Iterable[Any]]

Raises:

TypeError – data is not assigned properly

Examples:
Code 3.28 select female names and calculate name length
from info.me import io

names = ['Kane_F', 'Elfin_M', 'Kyle_M', 'Elena_M', 'Dark_F', 'Kate_F', 'David_M', 'Ezra_F', 'Deborah_F']
name_group = io.files_regroup(data=names, regroup_labels=[r'^K', r'^D', r'^E'])

for k, v in io.dict_filter(data=name_group, match_pattern=r'F$').items():
    print(k, v)

for k, v in io.dict_filter(data=name_group, match_pattern=r"F$", using_map=lambda x: len(x)-2).items():
    print(k, v)
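
The value-wise filtering can be sketched in plain Python as follows. dict_filter_sketch is a hypothetical name, the regrouped dict is written out by hand, and keeping keys whose filtered list is empty is an assumption rather than documented behaviour of io.dict_filter.

```python
import re
from typing import Any, Callable, Dict, Iterable, List, Optional


def dict_filter_sketch(
    data: Dict[str, Iterable[str]],
    match_pattern: str = r".*",
    using_map: Optional[Callable[[str], Any]] = None,
) -> Dict[str, List[Any]]:
    """Keep, per key, only the values matching match_pattern; optionally map them."""
    pattern = re.compile(match_pattern)
    result: Dict[str, List[Any]] = {}
    for key, values in data.items():
        matched = [value for value in values if pattern.search(value)]
        result[key] = matched if using_map is None else [using_map(v) for v in matched]
    return result


# hand-written equivalent of the regrouped names used in Code 3.28
name_group = {
    r'^K': ['Kane_F', 'Kyle_M', 'Kate_F'],
    r'^D': ['Dark_F', 'David_M', 'Deborah_F'],
    r'^E': ['Elfin_M', 'Elena_M', 'Ezra_F'],
}

female = dict_filter_sketch(name_group, r'F$')
female_lengths = dict_filter_sketch(name_group, r'F$', using_map=lambda x: len(x) - 2)

print(female)          # {'^K': ['Kane_F', 'Kate_F'], '^D': ['Dark_F', 'Deborah_F'], '^E': ['Ezra_F']}
print(female_lengths)  # {'^K': [4, 4], '^D': [4, 7], '^E': [4]}
```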
Logs:

Added in version 0.0.2.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

archive

python object(s) persistence toolkit. appends the pyp or pyp.gz suffix to the file name(s).

Arguments:
Parameters:
  • data (object) – an object, or several objects held in an iterable container, to be saved

  • to_file (Union[str, list[str]]) – file name(s) for objects to be saved

  • compress_in (Optional[str]) – file name of the compressed archive; if assigned, a _header.pyp is generated to record the file list; None as default, which disables compression

Variables:
  • ~compress_algorithm (Optional[int]) – compression method code; 0 for STORED; 8 for DEFLATED; 12 for BZIP2; and 14 for LZMA; 8 as default; available only if compress_in is not None

  • ~compress_level (Optional[int]) – compression level; integers 0 to 9 are accepted for DEFLATED, and 1 to 9 for BZIP2; 5 as default; available only if compress_in is not None
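
The method codes listed above coincide with the compression constants of Python's standard zipfile module, and the accepted level ranges coincide with its compresslevel rules. Whether archive relies on zipfile internally is an assumption; the short sketch below only prints those constants and checks a level against the ranges, with level_is_accepted being a hypothetical helper.

```python
import zipfile

# numeric codes of the standard-library compression constants
COMPRESSION_CODES = {
    "STORED": zipfile.ZIP_STORED,
    "DEFLATED": zipfile.ZIP_DEFLATED,
    "BZIP2": zipfile.ZIP_BZIP2,
    "LZMA": zipfile.ZIP_LZMA,
}


def level_is_accepted(method: int, level: int) -> bool:
    """Check level against the ranges zipfile documents for each method."""
    if method == zipfile.ZIP_DEFLATED:
        return 0 <= level <= 9
    if method == zipfile.ZIP_BZIP2:
        return 1 <= level <= 9
    return True  # STORED and LZMA ignore the level


for name, code in COMPRESSION_CODES.items():
    print(name, code)
```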

Returns:

None

Return type:

NoneType

Examples:

Without compression, Python objects can be saved either together in one file or individually:

Code 3.29 data persistence for python objects
from info.me import archive
objs = [py_obj1, py_obj2, ..., py_objn]

archive(data=objs, to_file='all')  # generate 'all.pyp'

names = [f"case{idx+1}" for idx in range(len(objs))]
archive(data=objs, to_file=names)  # generate 'case1.pyp', 'case2.pyp', ..., 'casen.pyp'

Or integrate all individual cases into a compressed file:

Code 3.30 data persistence for python objects with compression
archive(data=objs, to_file=names, compress_in='compress')  # generates 'compress.pyp.gz'
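
To make the compressed layout easier to picture, here is a rough sketch that stores each object as one entry of a zip container together with a header entry listing the file names. It assumes pickle for serialisation and zipfile for the container, neither of which is confirmed by this documentation; archive_sketch is a hypothetical name, and unlike archive it returns the written path for convenience.

```python
import os
import pickle
import tempfile
import zipfile
from typing import Any, List, Sequence


def archive_sketch(
    objs: Sequence[Any],
    names: Sequence[str],
    compress_in: str,
    method: int = zipfile.ZIP_DEFLATED,
    level: int = 5,
) -> str:
    """Pickle each object into its own entry and record the entry list in a header."""
    entries: List[str] = [f"{name}.pyp" for name in names]
    target = f"{compress_in}.pyp.gz"
    with zipfile.ZipFile(target, "w", compression=method, compresslevel=level) as zf:
        zf.writestr("_header.pyp", pickle.dumps(entries))  # assumed header content
        for entry, obj in zip(entries, objs):
            zf.writestr(entry, pickle.dumps(obj))
    return target


objs = [{"id": 1}, [1, 2, 3], "text"]
names = [f"case{idx + 1}" for idx in range(len(objs))]

with tempfile.TemporaryDirectory() as folder:
    path = archive_sketch(objs, names, os.path.join(folder, "compress"))
    with zipfile.ZipFile(path) as zf:
        stored = zf.namelist()
        header = pickle.loads(zf.read("_header.pyp"))
        restored = [pickle.loads(zf.read(entry)) for entry in header]

print(stored)
print(restored == objs)
```

Reading the header first means the loader does not need to know the case names in advance.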
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

unarchive

toolkit to load python persistence object(s).

Arguments:
Parameters:

  • data (Union[str, list[str]]) – archived file with pyp.gz or pyp suffix, or list of archived files with pyp suffix

Variables:
  • ~compress_algorithm (Optional[int]) – compression method code; 0 for STORED; 8 for DEFLATED; 12 for BZIP2; and 14 for LZMA; 8 as default

  • ~compress_level (Optional[int]) – compression level; integers 0 to 9 are accepted for DEFLATED, and 1 to 9 for BZIP2; 5 as default

Returns:

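the loaded object for a single pyp file, or an iterable over the loaded objects for a list of pyp files or a pyp.gz file, as used in Code 3.31

Return type:

object, or an iterable of objects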

Examples:
Code 3.31 load persistent python objects
from info.me import unarchive
names = ['case1.pyp', 'case2.pyp', ..., 'casen.pyp']

case2 = unarchive(data=names[1])  # load 'case2.pyp'

for f in unarchive(data=names):  # or load all cases one by one
    print(f)

for f in unarchive(data='compress.pyp.gz'):  # or load from a 'pyp.gz' compressed file, if it exists
    print(f)
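
The single-file and multi-file calls above can be pictured with the following sketch, which returns one object for a str argument and a generator for a list. It assumes pickle as the storage format, which this documentation does not state, and leaves out the compressed case; load_one and unarchive_sketch are hypothetical names.

```python
import os
import pickle
import tempfile
from typing import Any, Iterator, List, Union


def load_one(path: str) -> Any:
    """Load one pickled object from path (assumed storage format)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def unarchive_sketch(data: Union[str, List[str]]) -> Union[Any, Iterator[Any]]:
    """Return the object for a single path, or a generator over several paths."""
    if isinstance(data, str):
        return load_one(data)
    return (load_one(path) for path in data)


with tempfile.TemporaryDirectory() as folder:
    objs = [{"id": 1}, [1, 2, 3]]
    paths = [os.path.join(folder, f"case{idx + 1}.pyp") for idx in range(len(objs))]
    for path, obj in zip(paths, objs):
        with open(path, "wb") as handle:
            pickle.dump(obj, handle)

    single = unarchive_sketch(paths[1])     # one object
    loaded = list(unarchive_sketch(paths))  # all objects, consumed while files exist

print(single)
print(loaded)
```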
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06


Authors:

Chen Zhang

Version:

0.0.5

Created on:

Jun 27, 2023