3.1.6. Option networks

3.1.6.1. Description

Neural-network toolchain and conventional architectures implemented via metaprogramming. PyTorch is required for normal use.

This module mainly lives in the info.toolbox.networks namespace; in practice, use the main entry point info.net for convenience.

`Module`	a flexible neural network base class with enhanced training/inference capabilities.
`full_connected_neural`	a configurable fully connected neural network module with flexible architecture options.
`convolutional_neural`	a configurable convolutional neural network (CNN) module with flexible architecture.
`unet`	a configurable U-Net architecture for semantic segmentation with dynamic dimensionality support.
`transformer`	a highly configurable transformer architecture supporting multiple attention mechanisms and embedding methods.

3.1.6.2. Docstrings

class Module

a flexible neural network base class with enhanced training/inference capabilities. This module extends PyTorch’s nn.Module with additional features, including:

configurable training/inference sessions
automatic data type handling
built-in training loop with stopping conditions
support for both regression and classification tasks
generator-based online learning support
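
To illustrate what a training loop governed by stopping conditions does, here is a minimal pure-Python sketch (no PyTorch) that fits y ≈ w * x by gradient descent. It is not the library's implementation: the 'epochs' key mirrors the stop_conditions={'epochs': ...} argument used in the examples of this page, while the 'tolerance' key and the train() function are hypothetical.

```python
def train(xs, ys, lr=0.05, stop_conditions=None):
    """Fit y ≈ w * x with full-batch gradient descent on the mean squared error."""
    # hypothetical keys: 'epochs' caps the number of passes over the data,
    # 'tolerance' stops early once an epoch improves the loss by less than this
    conditions = {'epochs': 100, 'tolerance': 1e-10}
    conditions.update(stop_conditions or {})
    n = len(xs)
    w, previous, history = 0.0, float('inf'), []
    for _ in range(conditions['epochs']):
        loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
        history.append(loss)
        if previous - loss < conditions['tolerance']:
            break  # converged: the improvement fell below the tolerance
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
        previous = loss
    return w, history
```

Either condition ends the loop: the epoch cap bounds the cost, while the tolerance check stops early once the loss plateaus.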

Logs:: Added in version 1.0.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

full_connected_neural

a configurable fully connected neural network module with flexible architecture options. This implementation provides a multi-layer perceptron (MLP) with customizable layer dimensions, activation functions, dropout, and data type specifications. Its input size can either be fixed or inferred lazily, with the first layer's weights initialized at the first forward pass.

Arguments:

Parameters:

structure (list[int]) – list specifying layer dimensions; its first element can be None to enable lazy initialization; e.g. [None, 256, 128] for lazy input or [784, 256, 128] for fixed input
activation (Union[Callable, list[Callable]]) – activation function(s) between layers; either a single function used between all layers or a list with one function per layer; e.g. nn.ReLU, or [nn.ReLU, nn.Sigmoid] for a different activation per layer
bias (bool) – whether to include bias terms in linear layers; True as default
dropout (Optional[float]) – dropout probability (0-1) applied after last hidden layer; None as default to disable dropout
ctype_option (_Ctype) – torch datatype for network parameters; 'float32' as default
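
The way structure, activation and dropout combine can be sketched in plain Python. This is a hedged reading of the parameter descriptions above, not the library's code: layer_plan() is a hypothetical helper, activations are represented by name strings, and it assumes a list of activations supplies exactly one entry per gap between consecutive linear layers, with dropout placed after the last hidden activation.

```python
from typing import List, Optional, Sequence, Union


def layer_plan(structure: Sequence[Optional[int]],
               activation: Union[str, List[str]],
               dropout: Optional[float] = None) -> list:
    """Describe the layer stack an MLP with these arguments would contain."""
    if len(structure) < 2:
        raise ValueError("structure needs at least an input and an output size")
    if any(size is None for size in structure[1:]):
        raise ValueError("only the first element of structure may be None")
    n_hidden = len(structure) - 2  # activations sit between linear layers
    acts = list(activation) if isinstance(activation, list) else [activation] * n_hidden
    if len(acts) != n_hidden:
        raise ValueError(f"expected {n_hidden} activations, got {len(acts)}")
    plan = []
    for i, (fan_in, fan_out) in enumerate(zip(structure[:-1], structure[1:])):
        # a None input size becomes a lazy layer that infers fan_in later
        plan.append(("LazyLinear" if fan_in is None else "Linear", fan_in, fan_out))
        if i < n_hidden:
            plan.append((acts[i],))
            if dropout is not None and i == n_hidden - 1:
                plan.append(("Dropout", dropout))  # after the last hidden layer
    return plan
```

For instance, layer_plan([None, 40, 50, 5], ["LeakyReLU", "Tanh"], dropout=0.2) describes a stack shaped like model2 in Code 3.130.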

Returns:

a fully connected neural network

Return type:

Module

Examples:

Code 3.130 multi-layer neural network

import torch as tch
from info.net import full_connected_neural
num_samples, input_size, output_size = 120, 10, 5
x, y_classification, y_regression = (tch.randn(num_samples, input_size),
                                     tch.randint(0, output_size, (num_samples,)),
                                     tch.randn(num_samples, output_size))
x_train, x_validation = x[:100], x[100:]
yc_train, yc_validation = y_classification[:100], y_classification[100:]
yr_train, yr_validation = y_regression[:100], y_regression[100:]

# apply on classification task
model1 = full_connected_neural(structure=[input_size, 40, output_size], activation=tch.nn.ReLU)
with model1.train_session() as md:
    md.solve(train=x_train, target=yc_train, validation=(x_validation, yc_validation))

# apply on regression task, with specified configuration
model2 = full_connected_neural(structure=[None, 40, 50, output_size], bias=False,
                               activation=[tch.nn.LeakyReLU, tch.nn.Tanh], dropout=0.2)
with model2.train_session(criterion=tch.nn.MSELoss()) as md:
    md.solve(train=x_train, target=yr_train, validation=(x_validation, yr_validation))

Notes:

this implementation provides a flexible fully connected network that supports:

both static and dynamic (lazy) input dimension handling
per-layer activation function specification
configurable precision through dtype options
optional dropout regularization
automatic training configuration (SGD optimizer with CrossEntropy loss by default)

the network uses PyTorch’s nn.LazyLinear when the input dimension is unspecified (None); this layer infers its input size during the first forward pass.
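
The lazy-initialization mechanism can be mimicked without PyTorch. The LazyDense class below is a hypothetical, dependency-free stand-in for nn.LazyLinear that only demonstrates the idea: the weight matrix does not exist until the first input arrives, at which point the input size is fixed and later inputs must match it.

```python
import random


class LazyDense:
    """Fully connected layer whose input size is inferred on the first call."""

    def __init__(self, out_features, seed=0):
        self.out_features = out_features
        self.in_features = None  # unknown until the first input arrives
        self.weight = None
        self._rng = random.Random(seed)

    def __call__(self, x):
        if self.weight is None:
            # first call: fix in_features and materialize the weights
            self.in_features = len(x)
            self.weight = [[self._rng.uniform(-1.0, 1.0) for _ in range(self.in_features)]
                           for _ in range(self.out_features)]
        elif len(x) != self.in_features:
            raise ValueError(f"expected {self.in_features} inputs, got {len(x)}")
        # plain matrix-vector product, no bias
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weight]
```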

See Also:

Module

Logs:: Added in version 1.0.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

convolutional_neural

a configurable convolutional neural network (CNN) module with flexible architecture. It cascades configurable convolutional blocks with a multi-stage fully connected network (MLP). The modular design lets both the feature extraction part (convolutional blocks) and the prediction head (MLP layers) be customized, from baseline configurations to application-specific topologies built through parameterized layer composition.

Arguments:

Parameters:

conv_structure (list[int]) – list specifying the number of output channels for each convolutional block
mpl_structure (list[int]) – list specifying the layer sizes for the final MLP
activation (Callable) – global activation function; nn.ReLU as default
in_dimension (int) – spatial dimension of input; must be 1, 2, or 3; 2 as default, suited to natural-image tasks
conv_kernel (dict) – parameter dict containing 'kernel_size', 'stride', 'padding' or 'dilation'; each value can be a positive integer (applied to all dimensions) or a tuple of positive integers giving per-dimension values that match the input's spatial dimensions; {'kernel_size': 3, 'stride': 1, 'padding': 1} as default configuration
batch_norm (dict) – batch normalization configuration; None as default to disable batch normalization; if provided, a dict with any of 'eps', 'momentum', 'affine', or 'track_running_stats' as keys (the keyword arguments of PyTorch's BatchNorm layers), each with a value allowed for that key
pre_activation (bool) – whether to use pre-activation ordering, i.e. apply the activation before the convolution instead of after it; False as default
pool (dict) – single-entry dict of pooling configuration; the key is one of 'Max', 'FractionalMax', 'AdaptiveAvg', 'AdaptiveMax', 'Avg', 'LP', 'MaxUn', and the value is a parameter dict of the same form as conv_kernel; {'Max': {'kernel_size': 2, 'stride': 2}} as default, i.e. conventional max pooling
dropout (float) – dropout probability from 0 to 1; None as default to disable dropout
conv_customization (list[dict]) – list of dicts customizing each convolutional block; each dict can override the global block parameters listed above (from activation to dropout); None as default to apply the global configuration to every block; e.g., with conv_structure=[16, 32], [{'pre_activation': True}, {'dropout': 0.4}] enables pre-activation in the first block and a 0.4 dropout in the second
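
How per-block overrides might combine with the global settings can be sketched as a plain dictionary merge. This assumes, without confirmation from the source, that each block starts from a copy of the global parameters and that conv_customization holds one dict per block; block_configs() and the out_channels key are hypothetical.

```python
from copy import deepcopy


def block_configs(conv_structure, global_config, conv_customization=None):
    """Build one config per convolutional block: global defaults, then per-block overrides."""
    overrides = conv_customization if conv_customization is not None else [{}] * len(conv_structure)
    if len(overrides) != len(conv_structure):
        raise ValueError("conv_customization needs one dict per entry of conv_structure")
    configs = []
    for channels, override in zip(conv_structure, overrides):
        config = deepcopy(global_config)  # never mutate the shared defaults
        config.update(override)           # block-specific keys win
        config['out_channels'] = channels
        configs.append(config)
    return configs
```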

Returns:

a convolutional neural network

Return type:

Module

Examples:

Code 3.131 convolutional neural network

import torch as tch
from info.net import convolutional_neural

sp1, sp2, nums, n = (128, 128), (64, 64, 32), 20, 10
x_2d, x_3d, y_c, y_r = (tch.randn(nums, 3, *sp1), tch.randn(nums, 1, *sp2), tch.randint(0, 10, (nums,)),
                        tch.randn(nums, n))

# natural image classification task:
model1 = convolutional_neural(conv_structure=[16, 32], mpl_structure=[128, n])
with model1.train_session() as md:
    md.solve(train=x_2d, target=y_c, loading_mode='local')

# 3D image classification task:
model2 = convolutional_neural(conv_structure=[16, 32], mpl_structure=[128, n], in_dimensions=3)
with model2.train_session() as md:
    md.solve(train=x_3d, target=y_c, loading_mode='local')

# online loading for 3D images with customized configuration:
model3 = convolutional_neural(conv_structure=[16, 32], mpl_structure=[128, n], in_dimensions=3,
                              pre_activation=True, dropout=0.13)
with model3.train_session(criterion=tch.nn.MultiMarginLoss()) as md:  # multi-class hinge loss on class-index targets
    md.solve(train=(_ for _ in x_3d), target=(_ for _ in y_c), stop_conditions={'epochs': 40})

# or application on regression task:
model4 = convolutional_neural(conv_structure=[16, 32], mpl_structure=[128, n])
with model4.train_session(criterion=tch.nn.MSELoss()) as md:
    md.solve(train=x_2d, target=y_r, loading_mode='local')

Notes:

features of this implementation:

dynamic input dimension handling via lazy layers
configurable per-block parameters
automatic flattening before MLP
default Adam optimizer (lr=0.001) and CrossEntropyLoss

it employs an MLP backend (see full_connected_neural) and inherits most of its features, such as adaptability to both classification and regression tasks.
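
Why lazy layers and automatic flattening matter becomes clear when tracking the spatial size through the blocks: the flattened length fed to the MLP depends on the input resolution. The sketch below uses the output-size formula documented for PyTorch's Conv and MaxPool layers, floor((n + 2p - d(k - 1) - 1) / s) + 1, and assumes each block is one convolution followed by one pooling step; conv_out() and flatten_size() are hypothetical helpers, not the library's code.

```python
import math


def _per_dim(value, ndim):
    """Expand an int to one value per spatial dimension, or validate a tuple."""
    if isinstance(value, int):
        return (value,) * ndim
    if len(value) != ndim:
        raise ValueError(f"expected {ndim} values, got {len(value)}")
    return tuple(value)


def conv_out(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output spatial size of a convolution or max/avg pooling window."""
    ndim = len(size)
    k, s, p, d = (_per_dim(v, ndim) for v in (kernel_size, stride, padding, dilation))
    return tuple((n + 2 * pi - di * (ki - 1) - 1) // si + 1
                 for n, ki, si, pi, di in zip(size, k, s, p, d))


def flatten_size(spatial, conv_structure, conv_kernel=None, pool=None):
    """Length of the flattened feature vector after all conv + pool blocks."""
    conv_kernel = conv_kernel or {'kernel_size': 3, 'stride': 1, 'padding': 1}
    pool = pool or {'Max': {'kernel_size': 2, 'stride': 2}}
    (pool_params,) = pool.values()  # single-entry dict, e.g. {'Max': {...}}
    for _ in conv_structure:
        spatial = conv_out(spatial, **conv_kernel)
        spatial = conv_out(spatial, **pool_params)
    return conv_structure[-1] * math.prod(spatial)
```

With the inputs of Code 3.131 and conv_structure=[16, 32], this gives 32 * 32 * 32 = 32768 features for the 128 x 128 images and 32 * 16 * 16 * 8 = 65536 for the 64 x 64 x 32 volumes; a fixed-size first MLP layer would have to be rebuilt for each resolution, which lazy layers avoid.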

See Also:

Module
full_connected_neural

Logs:: Added in version 1.0.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

unet

a configurable U-Net architecture for semantic segmentation with dynamic dimensionality support. This implementation follows the classic U-Net encoder-decoder structure with skip connections, while providing extensive customization options through parameterized components.

Arguments:

Parameters:

mirrored_channels (list[int]) – channel dimensions for each level of the encoder-decoder blocks; the decoder path mirrors the encoder channel structure
in_dimension (int) – spatial dimension of input; must be 1, 2, or 3; 2 as default, suited to natural-image tasks
activation (Callable) – activation function factory; default uses in-place ReLU for memory efficiency
conv_kernel (dict) – parameter dict containing 'kernel_size', 'stride', 'padding' or 'dilation'; each value can be a positive integer (applied to all dimensions) or a tuple of positive integers giving per-dimension values that match the input's spatial dimensions; {'kernel_size': 3, 'stride': 1, 'padding': 1} as default configuration
batch_norm (dict) – batch normalization configuration; None as default to disable batch normalization; if provided, a dict with any of 'eps', 'momentum', 'affine', or 'track_running_stats' as keys (the keyword arguments of PyTorch's BatchNorm layers), each with a value allowed for that key
pre_activation (bool) – whether to use pre-activation ordering, i.e. apply the activation before the convolution instead of after it; False as default
pool (dict) – single-entry dict of pooling configuration; the key is one of 'Max', 'FractionalMax', 'AdaptiveAvg', 'AdaptiveMax', 'Avg', 'LP', 'MaxUn', and the value is a parameter dict of the same form as conv_kernel; {'Max': {'kernel_size': 2, 'stride': 2}} as default, i.e. conventional max pooling
dropout (float) – dropout probability from 0 to 1; None as default to disable dropout
export_channel (int) – number of output channels; positive integer no greater than 3; 1 as default
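
The channel bookkeeping implied by mirrored_channels can be sketched as follows. This is an assumption-laden illustration, not the library's code: unet_plan() is hypothetical, each entry is treated as one encoder level, and the upsampled decoder features are assumed to be concatenated directly with the matching skip features (the original U-Net design instead halves the channels with an up-convolution first).

```python
def unet_plan(mirrored_channels, in_channels, export_channel=1):
    """Return (in, out) channel pairs for encoder levels, decoder levels and the output head."""
    encoder, c = [], in_channels
    for out in mirrored_channels:
        encoder.append((c, out))
        c = out
    decoder = []
    for skip in reversed(mirrored_channels[:-1]):
        # upsampled features (c channels) concatenated with the skip features
        decoder.append((c + skip, skip))
        c = skip
    head = (c, export_channel)  # final projection to the requested output channels
    return encoder, decoder, head
```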

Returns:

a U-Net instance

Return type:

Module

Examples:

Code 3.132 U-Net demonstration

import torch as tch
from info import net
from info.net import unet

# standard 2D U-Net for binary segmentation
x, y = tch.randn(5, 1, 20, 40), tch.randint(0, 2, (5, 1, 20, 40)).float()
model1 = unet(mirrored_channels=[64, 128, 256, 512], in_dimensions=2)
with model1.train_session() as md:
    md.solve(train=x, target=y)

# 3D U-Net with custom normalization
x, y = tch.randn(5, 1, 20, 40, 35), tch.randint(0, 2, (5, 1, 20, 40, 35)).float()
model2 = unet(mirrored_channels=[16, 32], in_dimensions=3, batch_norm={'eps': 1e-6, 'momentum': 0.01},
              activation=(lambda: tch.nn.LeakyReLU(0.1)))
with model2.train_session(criterion=net.dice(1e-3)) as md:
    md.solve(train=x, target=y, loading_mode='local')

# 3D U-Net natively supports multimodal fusion (4 input channels here)
x_multi, y = tch.randn(5, 4, 20, 40, 35), tch.randint(0, 2, (5, 1, 20, 40, 35)).float()
model3 = unet(mirrored_channels=[16, 32], in_dimensions=3)
with model3.train_session() as md:
    md.solve(train=x_multi, target=y)

# 3D U-Net for multiple segmentations, trained using a mixture loss
x, y_multi = tch.randn(5, 1, 20, 40, 35), tch.randint(0, 2, (5, 3, 20, 40, 35)).float()
model4 = unet(mirrored_channels=[16, 32], in_dimensions=3, export_channel=3)
mixture_loss = (lambda m1, m2: 0.9 * net.dice(1e-3)(m1, m2) + 0.1 * tch.nn.CrossEntropyLoss()(m1, m2))
with model4.train_session(criterion=mixture_loss) as md:
    md.solve(train=x, target=y_multi, loading_mode='local')

Notes:

architectural features:

symmetric encoder-decoder structure with skip connections
automatic handling of input dimensions (1D/2D/3D)
dynamic channel sizing through mirrored_channels argument
lazy initialization for input flexibility
nearest-exact interpolation for precise feature map alignment

the default training configuration uses the Adam optimizer with a learning rate of 0.001 and a dice loss with a smoothing factor of 1e-5; to customize it, override the criterion or optimizer argument when invoking the train session.
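
One common soft-Dice formulation, 1 - (2 * sum(p * t) + s) / (sum(p) + sum(t) + s), is sketched below in plain Python to show what the smoothing factor s does. Whether net.dice uses exactly this form is an assumption, and dice_loss() is a hypothetical name; the smoothing term keeps the ratio defined (and the loss at zero) when both masks are empty.

```python
def dice_loss(pred, target, smooth=1e-5):
    """Soft Dice loss on flat sequences of probabilities and {0, 1} labels."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)
```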

See Also:

Module

Logs:: Added in version 1.0.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

transformer

a highly configurable transformer architecture supporting multiple attention mechanisms and embedding methods. This implementation provides dynamic dimensionality handling, flexible positional encoding strategies, and modular attention configurations suitable for sequence-to-sequence tasks.
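
For reference, the default 'sinusoid' encoding (encoding_configs {'max_length': 5000, 'base': 10000}) presumably follows the canonical transformer formula PE(pos, 2k) = sin(pos / base^(2k/d)) and PE(pos, 2k+1) = cos(pos / base^(2k/d)). The pure-Python table below is a sketch of that formula under this assumption, with a hypothetical function name, not the library's code.

```python
import math


def sinusoid_encoding(max_length, dimension_model, base=10000.0):
    """Return a max_length x dimension_model table of sinusoidal position encodings."""
    table = []
    for pos in range(max_length):
        row = []
        for i in range(dimension_model):
            # each sin/cos pair (2k, 2k+1) shares the frequency base ** (-2k / d)
            angle = pos / base ** ((i - i % 2) / dimension_model)
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table
```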

Arguments:

Parameters:

dimension_model (int) – hidden dimension size; must be a positive integer
num_heads (int) – number of attention heads, a positive integer; if it does not evenly divide dimension_model, it is adjusted heuristically
vocabulary_size (dict[Literal['in', 'out'], int]) – dictionary specifying input and output vocabulary sizes; {'in': 10000, 'out': 8000} as default
embedding_func (dict[Literal['in', 'out', 'endmost'], Optional[Callable]]) – custom embedding functions for the input, the output, and the final layer, given as a dict with 'in', 'out' and 'endmost' as keys and an embedding function as each value; None (the default for every key) lets the embedding be initialized automatically
encoding_meth (Literal['sinusoid', 'trainable', 'relative', 'rotation']) – positional encoding method, one of 'sinusoid', 'trainable', 'relative', and 'rotation'; 'sinusoid' as default, as in the canonical transformer
encoding_configs (dict) – configuration dict for positional encoding (method-specific parameters); default configuration uses {'max_length': 5000, 'base': 10000} for 'sinusoid' encoding, {'max_relative': 3} for 'relative', and {'theta': 10000.0, 'start_pos': 0} for 'rotation'
dimension_feed_forward (int) – dimension of feed forward network; 2048 as default
activation (Callable) – global activation function; torch.nn.ReLU as default
num_layers (Union[int, tuple[int, int]]) – numbers of encoder and decoder layers; a tuple allows an asymmetric encoder-decoder, e.g. (6, 3) for 6 encoder layers and 3 decoder layers
attn_init (dict) – initialization parameters for the attention layers; the standard configuration is {'bias': True, 'add_bias_kv': False, 'add_zero_attn': False, 'batch_first': True}; with the 'relative' and 'rotation' encoding methods, 'kdim' and 'vdim' are set from dimension_model for cross attention and left as None for self attention, whereas with 'sinusoid' and 'trainable' they are None for both; 'dropout' is taken from the global dropout argument
attn_forward (dict) – keyword arguments passed to the attention forward call; the defaults are 'need_weights': True, 'attn_mask': None, 'average_attn_weights': True, and 'is_causal': False; user-provided values override these defaults
dropout (float) – global dropout rate; 0.1 as default
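
The source does not say how num_heads is adjusted or how an int num_layers is expanded. One plausible reading, given purely as a hypothetical sketch (adjust_heads() and split_layers() are not library functions): pick the divisor of dimension_model closest to the requested head count, preferring the smaller one on ties, and treat an int num_layers as the same depth for encoder and decoder.

```python
def adjust_heads(dimension_model, num_heads):
    """Return the divisor of dimension_model closest to num_heads (ties go to the smaller)."""
    if dimension_model <= 0 or num_heads <= 0:
        raise ValueError("dimension_model and num_heads must be positive")
    divisors = [h for h in range(1, dimension_model + 1) if dimension_model % h == 0]
    return min(divisors, key=lambda h: (abs(h - num_heads), h))


def split_layers(num_layers):
    """Normalize num_layers to an (encoder, decoder) pair."""
    if isinstance(num_layers, int):
        return num_layers, num_layers
    encoder, decoder = num_layers
    return encoder, decoder
```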

Returns:

a Transformer model

Return type:

Module

Examples:

Code 3.133 transformer demonstration

import torch as tch
from info.net import transformer

batch, seq1, seq2, voc1, voc2 = 32, 20, 15, 10000, 8000
src, tgt = tch.randint(0, voc1, (batch, seq1)), tch.randint(0, voc2, (batch, seq2))
src_msk, tgt_msk = tch.randn(src.shape) > 0, tch.randn(tgt.shape) > 0

# application on basic sequence-to-sequence task (e.g., machine translation):
model1 = transformer(dimension_model=512, num_heads=8)
with model1.train_session() as md:
    md.solve(train=(src, src_msk), target=(tgt, tgt_msk))

# flexible input types: NumPy arrays, tensors and generators can be mixed
src_np, tgt_pt, tgt_msk_gen = src.numpy(), tgt, (_ for _ in tgt_msk.clone())
model2 = transformer(dimension_model=512, num_heads=8)
with model2.train_session() as md:
    md.solve(train=(src_np, src_msk), target=(tgt_pt, tgt_msk_gen))

# parameter-reduced memory-efficient model for edge devices, importing locally stored data for training
model3 = transformer(dimension_model=256, num_heads=4, dimension_feed_forward=1024, num_layers=(4, 2),
                     dropout=0.05)
with model3.train_session() as md:
    md.solve(train=(src, src_msk), target=(tgt, tgt_msk), loading_mode='local')

# rotary position embedding to capture long-range dependencies in sequences
model4 = transformer(dimension_model=512, num_heads=8, encoding_meth='rotation',
                     encoding_configs={'max_length': 4096, 'theta': 10000.0, 'start_pos': 0})
with model4.train_session(optimizer=tch.optim.Adam(model4.parameters(), lr=0.005)) as md:
    md.solve(train=(src, src_msk), target=(tgt, tgt_msk))

# transfer learning using pre-trained embedding function
emb_func = base_model.from_pretrain(...)  # base_model: placeholder for your own pre-trained model
model5 = transformer(dimension_model=512, num_heads=8, encoding_meth='trainable',
                     embedding_func={'in': emb_func, 'out': None, 'endmost': None})
with model5.train_session() as md:
    md.solve(train=(src, src_msk), target=(tgt, tgt_msk), stop_conditions={'epochs': 50})
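
The 'rotation' option used by model4 corresponds to rotary position embedding, configured by theta and start_pos. A minimal sketch of the standard rotary scheme, assuming the library follows it: consecutive feature pairs are rotated by an angle of (start_pos + position) * theta^(-2k/d), so the dot product of a rotated query and key depends only on their relative offset. rotary() is a hypothetical name.

```python
import math


def rotary(vector, position, theta=10000.0, start_pos=0):
    """Rotate pairs (x[2k], x[2k+1]) by angle (start_pos + position) * theta ** (-2k / d)."""
    d = len(vector)
    if d % 2:
        raise ValueError("rotary encoding needs an even dimension")
    pos = start_pos + position
    out = []
    for i in range(0, d, 2):  # i = 2k indexes the first element of each pair
        angle = pos * theta ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vector[i], vector[i + 1]
        out.extend([x * c - y * s, x * s + y * c])  # 2D rotation of the pair
    return out
```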

Notes:

architectural features:

flexibility on encoding method options
dynamic attention mechanism selection
configurable encoder-decoder asymmetry
extensibility for integrating ongoing research developments
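
For the 'relative' option, the default {'max_relative': 3} suggests the common clipped-relative-position scheme: the offset j - i between key and query positions is clipped to [-3, 3] and shifted so it indexes one of 2 * 3 + 1 learned embeddings. The helper below only builds that index matrix; it is a sketch under this assumption, with a hypothetical name, not the library's code.

```python
def relative_position_index(length, max_relative=3):
    """Matrix of clipped offsets j - i, shifted into range(2 * max_relative + 1)."""
    k = max_relative
    return [[max(-k, min(k, j - i)) + k for j in range(length)] for i in range(length)]
```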

See Also:

Multi-head Attention

Logs:: Added in version 1.0.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

Authors:: Chen Zhang
Version:: 0.0.6
Created on:: Jun 11, 2025
