3.1.2. Module io
3.1.2.1. Description
Utilities for selecting, filtering, regrouping, and mapping files or folders. They can be used as attributes or
as data loader functions as required. Within informatics, the IO module lives mainly in the namespace
info.toolbox.libs.io. It is also available through the main entry info.me for convenience.
leaf_folders | search leaf folders recursively from a root folder.
search_from_root | search (specific) files recursively from root folder.
generic_filter | filter a sequence of str using regex pattern.
files_regroup | regroup a str sequence based on a list of patterns.
dict_filter | filter values of dict using regex pattern.
archive | python object(s) persistence toolkit.
unarchive | toolkit to load python persistence object(s).
3.1.2.2. Docstrings
- leaf_folders
search leaf folders recursively from a root folder.
- Arguments:
- Parameters:
data (str) – path-like string of root folder
file_system (Literal['desktop', 'hdfs']) – file system mode; 'desktop' for desktop file system and 'hdfs' for distributed file system; 'desktop' as default
- Returns:
a generator for paths of leaf folders from root folder
- Return type:
Generator
- Raises:
TypeError – data is not assigned properly
- Examples:
    # for a folder tree such as:
    # --- root
    #  |--- folder1
    #  |     | --- file1
    #  |     | --- file2
    #  |--- folder2
    #        | --- file3
    #        | --- sub_folder
    #               | --- file4
    #               | --- file5

    from info.me import io

    # 'root' is the path of the root folder shown above
    for leaf_folder in io.leaf_folders(data='root'):
        print(leaf_folder)

    # expected output:
    # root/folder1
    # root/folder2/sub_folder
- Logs:
Added in version 0.0.3.
Changed in version 0.0.4: support search in distributed file system
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
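The leaf folder search described above can be approximated with the standard library alone. The following is a minimal sketch for the 'desktop' case only, not the implementation used by informatics; the helper name iter_leaf_folders is hypothetical, and treating a root without sub-folders as a leaf itself is an assumption.

```python
import os
from typing import Iterator


def iter_leaf_folders(root: str) -> Iterator[str]:
    """Yield each folder under root that contains no sub-folders.

    Hypothetical helper: it mirrors the documented behavior on a local
    file system only and does not cover the 'hdfs' mode.
    """
    if not isinstance(root, str):
        raise TypeError("root must be a path-like string")
    for current, sub_folders, _files in os.walk(root):
        if not sub_folders:  # no child directories, so this folder is a leaf
            yield current
```

Because os.walk is lazy, the helper stays a generator, so large trees are traversed only as far as the caller consumes them.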
- search_from_root
search (specific) files recursively from root folder.
- Arguments:
- Parameters:
data (str) – path-like string of root folder
search_condition (Callable[[str], bool]) – function to determine selected files; lambda x: x as default
file_system (Literal['desktop', 'hdfs']) – file system mode; 'desktop' for desktop file system and 'hdfs' for distributed file system; 'desktop' as default
- Returns:
a generator for selected files from root folder
- Return type:
Generator
- Raises:
TypeError – data is not assigned properly
- Examples:
    from info.me import io

    for script in io.search_from_root(data='.', search_condition=lambda x: x[-2:] == 'py'):
        print(script)
- Logs:
Added in version 0.0.2.
Changed in version 0.0.4: support search in distributed file system
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
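As a rough illustration of the recursive file search, the sketch below walks a local folder and applies a predicate to each file. It is not the library's code: the name iter_files is hypothetical, and passing the full path (rather than the bare file name) to search_condition is an assumption.

```python
import os
from typing import Callable, Iterator


def iter_files(root: str,
               search_condition: Callable[[str], bool] = lambda path: True
               ) -> Iterator[str]:
    """Yield paths of files under root accepted by search_condition.

    Hypothetical helper for the local file system only.
    """
    for current, _sub_folders, files in os.walk(root):
        for name in files:
            path = os.path.join(current, name)
            if search_condition(path):  # assumption: predicate sees the full path
                yield path
```

When selecting by extension, a predicate such as lambda p: p.endswith('.py') is slightly stricter than comparing the last two characters with 'py', since the slice test also accepts any name that merely ends in those letters.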
- generic_filter
filter a sequence of str using a regex pattern. return a generator over the filtered str, or over the objects mapped from those str.
- Arguments:
- Parameters:
data (Iterable[str]) – a str sequence
filter_pattern (str) – regular expression pattern for filter; r'.*' as default
apply_map (Callable[[str], Any]) – mapping function for filtered str as input; None as default
- Returns:
a generator of filtered str, or mapped object from those str
- Return type:
Generator
- Raises:
TypeError – data is not assigned properly
- Examples:
    from info.me import io

    names = ['Kane', 'Elfin', 'Kyle', 'Elena', 'Dark', 'Kate', 'David', 'Ezra', 'Deborah']

    for init_with_E in io.generic_filter(data=names, filter_pattern=r'^E'):
        print(init_with_E)

    # apply_map is the mapping argument documented under Parameters
    for len_of_init_with_E in io.generic_filter(data=names, filter_pattern=r'^E', apply_map=lambda x: len(x)):
        print(len_of_init_with_E)
- Logs:
Added in version 0.0.2.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
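The filter-then-map behavior can be sketched with the re module. This is an illustration under stated assumptions rather than the library's implementation: the helper name filter_strings is hypothetical, and matching with re.search (instead of re.match or re.fullmatch) is an assumption.

```python
import re
from typing import Any, Callable, Iterable, Iterator, Optional


def filter_strings(data: Iterable[str],
                   filter_pattern: str = r".*",
                   apply_map: Optional[Callable[[str], Any]] = None
                   ) -> Iterator[Any]:
    """Yield the str that match filter_pattern, optionally mapped.

    Hypothetical helper; re.search is an assumed matching rule.
    """
    regex = re.compile(filter_pattern)
    for item in data:
        if regex.search(item):
            yield item if apply_map is None else apply_map(item)


names = ['Kane', 'Elfin', 'Kyle', 'Elena', 'Dark', 'Kate', 'David', 'Ezra', 'Deborah']
print(list(filter_strings(names, r'^E')))       # ['Elfin', 'Elena', 'Ezra']
print(list(filter_strings(names, r'^E', len)))  # [5, 5, 4]
```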
- files_regroup
regroup a str sequence based on a list of patterns. regular expressions are supported.
- Arguments:
- Parameters:
data (Iterable[str]) – a str sequence
regroup_labels (list[str]) – a list of regular expressions; [r'.*'] as default
- Returns:
a dict with the patterns as keywords, and the elements matching each pattern as the respective values
- Return type:
dict[str, Sequence[str]]
- Raises:
TypeError – data is not assigned properly
- Examples:
    from info.me import io

    names = ['Kane', 'Elfin', 'Kyle', 'Elena', 'Dark', 'Kate', 'David', 'Ezra', 'Deborah']

    for k, v in io.files_regroup(data=names, regroup_labels=[r'^K', r'^D', r'^E']).items():
        print(k, v)
- Logs:
Added in version 0.0.2.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
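The regrouping idea, one bucket per pattern, can be sketched as below. The helper name regroup is hypothetical, and two behaviors are assumptions not stated in the docstring: matching uses re.search, and a str matching several patterns is placed in every matching bucket.

```python
import re
from typing import Dict, Iterable, List, Sequence


def regroup(data: Iterable[str],
            regroup_labels: Sequence[str] = (r".*",)) -> Dict[str, List[str]]:
    """Group str by the regular expressions they match (hypothetical helper)."""
    compiled = {label: re.compile(label) for label in regroup_labels}
    groups: Dict[str, List[str]] = {label: [] for label in regroup_labels}
    for item in data:
        for label, regex in compiled.items():
            if regex.search(item):  # assumption: search, not fullmatch
                groups[label].append(item)
    return groups


names = ['Kane', 'Elfin', 'Kyle', 'Elena', 'Dark', 'Kate', 'David', 'Ezra', 'Deborah']
for label, members in regroup(names, [r'^K', r'^D', r'^E']).items():
    print(label, members)
# ^K ['Kane', 'Kyle', 'Kate']
# ^D ['Dark', 'David', 'Deborah']
# ^E ['Elfin', 'Elena', 'Ezra']
```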
- dict_filter
filter values of dict using regex pattern. return a dict whose values are the elements that match the pattern, or the objects mapped from those elements.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – a dict composed of keywords and str sequence values
match_pattern (str) – regular expression pattern for filter; r'.*' as default
using_map (Callable[[str], Any]) – mapping function for matched str as input; None as default
- Returns:
a dict whose values are the elements that match the pattern, or the objects mapped from those elements
- Return type:
dict[str, Iterable[Any]]
- Raises:
TypeError – data is not assigned properly
- Examples:
    from info.me import io

    names = ['Kane_F', 'Elfin_M', 'Kyle_M', 'Elena_M', 'Dark_F', 'Kate_F', 'David_M', 'Ezra_F', 'Deborah_F']
    name_group = io.files_regroup(data=names, regroup_labels=[r'^K', r'^D', r'^E'])

    for k, v in io.dict_filter(data=name_group, match_pattern=r'F$').items():
        print(k, v)

    for k, v in io.dict_filter(data=name_group, match_pattern=r"F$", using_map=lambda x: len(x)-2).items():
        print(k, v)
- Logs:
Added in version 0.0.2.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
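Filtering the values of a grouped dict can be illustrated as follows. This is only a sketch: the helper name filter_dict_values is hypothetical, matching with re.search is an assumption, and keeping keywords whose value list ends up empty is also an assumption.

```python
import re
from typing import Any, Callable, Dict, Iterable, List, Optional


def filter_dict_values(data: Dict[str, Iterable[str]],
                       match_pattern: str = r".*",
                       using_map: Optional[Callable[[str], Any]] = None
                       ) -> Dict[str, List[Any]]:
    """Keep, per keyword, the str that match; optionally map them.

    Hypothetical helper, not the library's implementation.
    """
    regex = re.compile(match_pattern)
    result: Dict[str, List[Any]] = {}
    for key, values in data.items():
        kept = [value for value in values if regex.search(value)]
        result[key] = kept if using_map is None else [using_map(v) for v in kept]
    return result


name_group = {
    '^K': ['Kane_F', 'Kyle_M', 'Kate_F'],
    '^D': ['Dark_F', 'David_M', 'Deborah_F'],
    '^E': ['Elfin_M', 'Elena_M', 'Ezra_F'],
}
print(filter_dict_values(name_group, r'F$'))
# {'^K': ['Kane_F', 'Kate_F'], '^D': ['Dark_F', 'Deborah_F'], '^E': ['Ezra_F']}
print(filter_dict_values(name_group, r'F$', lambda x: len(x) - 2))
# {'^K': [4, 4], '^D': [4, 7], '^E': [4]}
```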
- archive
python object(s) persistence toolkit. the pyp or pyp.gz suffix is appended to the file name(s).
- Arguments:
- Parameters:
data (object) – object, or objects to be saved contained in an iterable container
to_file (Union[str, list[str]]) – file name(s) for objects to be saved
compress_in (Optional[str]) – file name to use when compression is enabled; if assigned, a _header.pyp will be generated to note down the file list; None as default to not activate
- Variables:
~compress_algorithm (Optional[int]) – compression method code; 0 for STORED; 8 for DEFLATED; 12 for BZIP2; and 14 for LZMA; 8 as default; available only if compress_in is not None
~compress_level (Optional[int]) – int of 0 (DEFLATED), 1 to 9 (DEFLATED and BZIP2) are accepted; 5 as default; available only if compress_in is not None
- Returns:
NoReturn
- Return type:
NoneType
- Examples:
Without compression, python objects can be saved either together in one file or individually:
    from info.me import archive

    objs = [py_obj1, py_obj2, ..., py_objn]
    archive(data=objs, to_file='all')  # generate 'all.pyp'

    names = [f"case{idx+1}" for idx in range(len(objs))]
    archive(data=objs, to_file=names)  # generate 'case1.pyp', 'case2.pyp', ..., 'casen.pyp'
Or integrate all individual cases into one compressed file:
    archive(data=objs, to_file=names, compress_in='compress')  # generate 'compress.pyp.gz'
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
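The method codes listed for compress_algorithm (0, 8, 12, 14) coincide with the compression constants of Python's standard zipfile module, and the accepted compress_level ranges match zipfile's compresslevel rules. Whether archive actually builds on zipfile is an assumption; the sketch below only shows what those codes and levels mean in the standard library. The names CODES and packed_size are hypothetical.

```python
import io
import zipfile
from typing import Optional

# The documented codes are numerically equal to zipfile's constants.
CODES = {
    0: zipfile.ZIP_STORED,     # no compression
    8: zipfile.ZIP_DEFLATED,   # levels 0 to 9
    12: zipfile.ZIP_BZIP2,     # levels 1 to 9
    14: zipfile.ZIP_LZMA,      # level has no effect
}

payload = b"informatics " * 1000  # repetitive sample data


def packed_size(method: int, level: Optional[int] = None) -> int:
    """Return the size of an in-memory zip holding payload (hypothetical helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=method, compresslevel=level) as zf:
        zf.writestr("payload.bin", payload)
    return len(buffer.getvalue())


for code in CODES:
    print(code, packed_size(code, 5 if code in (8, 12) else None))
```

On repetitive data such as this sample, every compressing method should produce a much smaller archive than code 0, which stores the bytes unchanged.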
- unarchive
toolkit to load python persistence object(s).
- Arguments:
- Parameters:
data (Union[str, list[str]]) – archived file with pyp.gz or pyp suffix, or list of archived files with pyp suffix
- Variables:
~compress_algorithm (Optional[int]) – compression method code; 0 for STORED; 8 for DEFLATED; 12 for BZIP2; and 14 for LZMA; 8 as default
~compress_level (Optional[int]) – int of 0 (DEFLATED), 1 to 9 (DEFLATED and BZIP2) are accepted; 5 as default
- Returns:
NoReturn
- Return type:
NoneType
- Examples:
    from info.me import unarchive

    names = ['case1.pyp', 'case2.pyp', ..., 'casen.pyp']
    case2 = unarchive(data=names[1])  # load 'case2.pyp'

    for f in unarchive(data=names):  # or load all cases one by one
        print(f)

    for f in unarchive(data='compress.pyp.gz'):  # or load from a 'pyp.gz' compressed file, if it exists
        print(f)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
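To make the save and load round trip concrete, here is a minimal sketch built on pickle and zipfile. It is not the pyp or pyp.gz format, whose on-disk layout is not described in this section; the helper names save_objects and load_objects are hypothetical, and reusing the name _header.pyp for the file list is only an analogy to the header mentioned for compress_in.

```python
import io
import pickle
import zipfile
from typing import Any, Iterator, Sequence


def save_objects(objects: Sequence[Any], names: Sequence[str], target) -> None:
    """Pickle each object into its own member of one compressed archive."""
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED,
                         compresslevel=5) as zf:
        # The header records member order so loading can replay it.
        zf.writestr("_header.pyp", pickle.dumps(list(names)))
        for name, obj in zip(names, objects):
            zf.writestr(f"{name}.pyp", pickle.dumps(obj))


def load_objects(source) -> Iterator[Any]:
    """Yield the stored objects one by one, in the order given by the header."""
    with zipfile.ZipFile(source) as zf:
        for name in pickle.loads(zf.read("_header.pyp")):
            yield pickle.loads(zf.read(f"{name}.pyp"))


buffer = io.BytesIO()  # in-memory stand-in for a file on disk
save_objects([{"a": 1}, [1, 2, 3]], ["case1", "case2"], buffer)
for obj in load_objects(buffer):
    print(obj)
```

As with any pickle-based persistence, only files from a trusted source should be loaded, because unpickling can execute arbitrary code.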
- Authors:
Chen Zhang
- Version:
0.0.5
- Created on:
Jun 27, 2023