_`Pipelining the data process` ============================== As the preferred language for artificial intelligence, Python is featured as its rich ecosystem, as well as the convenience for fast implementation and developing. Data processing involving in different technical approaches requires systematical integration. Thus, the unified data controlling among those utilities contributes to accelerate verifying prototypes, optimize algorithm performance, as well as lower maintenance cost. .. figure:: ../images/python_ecosystem.jpg :name: python ecosystem :width: 650 :align: center ecosystem of Python Data processing is akin to an assembly line, where an increase in the number of steps results in a exponential growth of factors that can impact the final result. While manually configuring all possible options for trial may seem feasible, it often leads to a chaotic outcome. An uniform protocol, or programming norm, is therefore not only of advantages in integrating various tools developed by teams in different fields in Python ecosystem, but also time-saving for building practical pipelines or applications, on basis of each naive functional module. Following examples demonstrate how to establish pipelines for automating complex tasks. _`Normalized scientific computing` ---------------------------------- Scientific computing flow implemented through informatics functions is of high completeness. And their units are readily to be flexibly reused when create new processing flow. :numref:`flexibility and reusability of unit` is a snippet in implementation for exporting :numref:`Figure %s `. .. code-block:: python :caption: flexibility and reusability of unit :name: flexibility and reusability of unit u1, u2, u3, u4, u5, v1, v2 = [Unit(mappings=[_]) for _ in [load, binarize, identification_cells, colorize, tsb.connected_domain, imshow, hist]] to_fig1 = u1 >> v1 to_fig2 = u1 >> u2 >> v1 to_fig3 = Unit(mappings=[F(lambda **kw: [res := (u1 | (u1 >> u2 >> u3 >> u4))(**kw), np.dstack([res[1], np.linalg.norm(res[0], ord=2, axis=2)])][-1])]) >> v1 to_fig4 = Unit(mappings=[F(lambda **kw: np.array([_.sum() for _ in (u1 >> u2 >> u3 >> u5)(**kw)]))]) >> v2 :code:`to_fig3` corresponds to the case (c). Obtain this figure must overlap the random colored cell nucleus masks, superpositioned with grey scale image, then pass on an image viewer unit. It is the reason unit :code:`u1` is arranged paralleled with a sequential processing line :code:`u1 >> u2 >> u3 >> u4`. To export the (c) case in :numref:`Figure %s `, call :code:`to_fig3(data=file)`. If a researcher desires other parameters, call :code:`to_fig3(data=file, **user_defined_config)`. Or in more complicated situation, if the researcher want to compare outcomes from an identical pipe in different parameters, those derived pipes can also be readily obtained by: :code:`p = to_fig3.shadow(**config1) | to_fig3.shadow(**config2)`. _`Automation experiment` ------------------------ There are also meta tools, for automation computing. The following example concerned the difference between global prewitt and canny filters on a natural image: .. code-block:: python :caption: auto experiment pipeline :name: auto experiment pipeline :emphasize-lines: 15 from info.me import Unit, F from info.me import tensorn as tsn from info.vis import visualization as vis from info.ins import datasets import numpy as np img = datasets.cat() config = vis.FigConfigs.Histogram.update(width=1.2, name=['prewitt', 'canny']) evaluate = F(lambda **kw: [res := kw.get('data'), print(np.std(res[0]-res[1])), vis.Canvas.play(data=np.array([res[0].ravel(), res[1].ravel()]), fig_type='histogram', cvs_legend=True, fig_configs=config), res][-1]) u1, u2, u3, u4, u5, v1 = [Unit(mappings=[_]) for _ in [tsn.cropper, tsn.gaussian_filter, tsn.resize, tsn.prewitt_filter, tsn.canny_filter, evaluate]] p = u1 >> u2 >> u3 >> (u4 | u5) >> v1 p.required_args # {'data', 'new_size', 'k_shape', 'crop_range'} It includes data processing functions dealing with cropping, de-noising, and resampling, followed by another paralleled unit of filters. The user-customized process is implemented via lambda calculus: print out the standard deviation of difference between two paralleled output, display their pixel distribution difference, then return those two filtered figures. As most functions in tensor namespace, including the :code:`F` lambda, have been already registered as informatics version, the :code:`p` can automatically analyze what keyword arguments are the required at least. Making a parameter pool based on the required arguments. The following code can auto trigger the experiments then dump each running case. .. code-block:: python :caption: run auto experiment :name: run auto experiment to_test = { 'data': [img], 'crop_range': [[(0.2, 0.2), (0.8, 0.8)], [(0.3, 0.3), (0.7, 0.7)]], 'k_shape': [(3, 3), (6, 6), (9, 9)], 'new_size': [(400, 400), (600, 600)] } from info.me import autotesting as tst res = tst.experiments(data=p, params_pool=to_test, to_file='./experiment_results') Prompt will info the current condition and calculated standard deviation, running time, and the final result case by case; then the histogram figure will be popped up like :numref:`Figure %s `. .. figure:: ../images/experiment_flow_demo.jpg :name: experiment flow histogram :width: 450 :align: center histogram for pixels distribution after prewitt and canny filters All experiment results will be collected into a persistence file titled `experiment_results.pyp` inplace. _`Automation testing` --------------------- Different from automation experiment which can export the computed results, the automation testing only records the exit code. If the pipeline exits with raised exception, related information will also be noted. Similar as :code:`experiments` in :numref:`run auto experiment`, this meta implementation :code:`funtest` is in the same namespace. It can test for informatics functions, unit and pipelines defined via this framework. .. figure:: ../images/auto_test.jpg :name: automation testing result :width: 500 :align: center automation testing result for resize function :numref:`Figure %s ` is the test result for :code:`resize` function. Class type remains in *result* column. The cost time, arguments for each test item are also be recorded. ---- :Authors: Chen Zhang :Version: 0.0.5 :|create|: Feb 7, 2024