3.1.5. Modules for learning¶

3.1.5.1. Module Bayes¶

3.1.5.2. Description¶

For infrastructure supporting Bayesian statistical inference, and for building blocks of online or self-adaptive algorithms, use the functions in the namespace info.toolbox.libs.bayes._frame, or alternatively import bayes from the main entry info.me.

`Bayes`	Essential Bayes framework.
`GaussianWishart`	Initializer of the gauss-wishart distribution.
`bernoulli`	Bayesian framework of bernoulli kernel.
`categorical`	Bayesian framework of categorical kernel.
`binomial`	Bayesian framework of binomial kernel.
`multinomial`	Bayesian framework of multinomial kernel.
`poisson`	Bayesian framework of poisson kernel.
`gaussian`	Bayesian framework of gauss kernel.

3.1.5.3. Docstrings¶

class Bayes¶

Essential Bayes framework.

Arguments:

Parameters:

name (str) – the name of the kernel likelihood function
kernel (dist) – the kernel likelihood distribution
prior (dist) – the Bayesian prior distribution
likelihood_check (Callable[[...], bool]) – checker that validates the data or distribution under investigation
update_conjugate (Callable[[...], dist]) – method to update the Bayesian conjugate posterior distribution; its first argument should be the corresponding Bayesian conjugate prior, followed by the likelihood data set, given either as a numpy array or as a distribution
update_predictive (Callable[[...], dist]) – method to update the Bayesian posterior predictive distribution; its first argument should be the updated Bayesian conjugate prior, followed if necessary by the likelihood data set, given either as a numpy array or as a distribution

Returns:

a kind of Bayesian family

Return type:

Bayes

Properties:

name:: Name of the Bayesian framework. Using the family name of the likelihood function is recommended at declaration.

kernel:: Distribution of the likelihood function used in initialization.

conjugate:: The Bayesian prior distribution. Its associated parameters are updated by invoking the method update_posterior with a likelihood data set or distribution as input.

predictive:: The Bayesian predictive distribution, conditioned on conjugate being the Bayesian posterior.

update_conjugate:: The callable that combines the conjugate prior and the likelihood data set into the conjugate posterior.

update_predictive:: The callable that combines the conjugate posterior and the likelihood data set into the posterior predictive.

Methods:

update_posterior:: Update the conjugate and predictive distributions from a likelihood distribution or data set.

compare_posterior:: Compute the conjugate posterior for a likelihood distribution or data set, without actually updating the properties conjugate and predictive.

Notes:

A Bayes instance is a set of correlated distributions together with their calculation rules, which follow the principles of Bayes theory. In the actual implementation, without loss of scientific rigor, informatics also exploits the degeneration relationships between distributions that arise from dimensional collapse. For example, the bernoulli, binomial, and categorical likelihoods are all implemented through the multinomial framework. Their relationships are listed in Table 4.3, and the concrete reduction is described in the chapter on the multinomial distribution. Similarly, the Gauss family is implemented through the multivariate Gauss distribution, based on the Gauss Bayesian framework established in the chapter on continuous Gauss.

Customarily, the callables update_conjugate and update_predictive take the conjugate distribution and the likelihood function or data set as input arguments, and return the corresponding Bayesian posterior and predictive distributions, respectively.
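
As a rough illustration of the callable shapes described above, the sketch below writes a pair of update rules for a Bernoulli likelihood with a Beta prior using only SciPy. The function bodies, the reliance on the frozen distribution's `.args`, and the single-argument form of `update_predictive` are assumptions made for this example; they are not taken from the library's implementation.

```python
import numpy as np
from scipy import stats as st


def update_conjugate(prior, data):
    """Hypothetical rule: Beta prior + 0/1 observations -> Beta posterior."""
    a, b = prior.args                      # shape parameters of the frozen Beta
    data = np.asarray(data)
    successes = float(data.sum())
    failures = float((1 - data).sum())
    return st.beta(a + successes, b + failures)


def update_predictive(posterior):
    """Hypothetical rule: Beta posterior -> Bernoulli posterior predictive."""
    a, b = posterior.args
    return st.bernoulli(a / (a + b))


prior = st.beta(4, 5)
posterior = update_conjugate(prior, [1, 0, 1, 1])   # three successes, one failure
predictive = update_predictive(posterior)
```

Both functions return frozen SciPy distributions, so the posterior from one call can be fed back in as the prior for the next batch of data.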

See also:

bernoulli
categorical
binomial
multinomial
poisson
gaussian

Logs:: Added in version 0.0.5.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

class GaussianWishart¶

Initializer of the gauss-wishart distribution.

Arguments:

Parameters:

mean (ndarray) – mean vector of gauss distribution.
beta (Numeric) – coefficient for the distribution of precision matrix.
nu (int) – degrees of freedom; should be no less than the number of dimensions of w minus 1.
w (ndarray) – a positive definite matrix.

Returns:

a gauss-wishart distribution if all arguments are valid; otherwise 0.

Return type:

Union[GauWisTP, int]

Examples:

Code 3.117 gauss wishart instance¶

from info.me import bayes as bys
import numpy as np

_temp = np.random.random((20, 3))
gauss_wishart = bys.GaussianWishart(np.random.random(3), 3.4, 5, _temp.T @ _temp)
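
To make the roles of the four arguments concrete, the sketch below draws one sample from a Gauss-Wishart distribution using SciPy alone, following the usual textbook construction: first a precision matrix from a Wishart distribution, then a mean vector from a Gaussian whose covariance is the inverse of beta times that precision. Whether `GaussianWishart` uses exactly this parameterization internally is an assumption here.

```python
import numpy as np
from scipy import stats as st

rng = np.random.default_rng(0)

mean, beta, nu = np.zeros(3), 3.4, 5
_temp = rng.random((20, 3))
w = _temp.T @ _temp                      # positive definite scale matrix

# step 1: precision matrix Lambda ~ Wishart(nu, w)
precision = st.wishart(df=nu, scale=w).rvs(random_state=rng)

# step 2: mean vector mu ~ N(mean, (beta * Lambda)^-1)
cov = np.linalg.inv(beta * precision)
cov = (cov + cov.T) / 2                  # remove round-off asymmetry
mu = st.multivariate_normal(mean, cov).rvs(random_state=rng)
```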

Logs:: Added in version 0.0.5.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

bernoulli¶

Bayesian framework of bernoulli kernel.

Arguments:

Parameters:

kernel (Union[BernTP, BinTP, MultTP]) – a certain bernoulli distribution, a certain binomial distribution with only one trial, or a certain multinomial distribution with one trial and two categories.
prior (Union[BetaTP, DirTP]) – a certain beta distribution or dirichlet distribution with a two-length alpha; None as default to use uniform prior.

Returns:

the bernoulli Bayesian instance

Return type:

Bayes

Examples:

Code 3.118 Bayesian of bernoulli kernel¶

from info.me import bayes as bys
from scipy import stats as st
import numpy as np

model1 = bys.bernoulli(kernel=st.bernoulli(0.3), prior=st.beta(4, 5))
model1.update_posterior(posterior=np.array([[1, 0], [0, 1], [1, 0], [1, 0]]))

# or equivalently using the categorical kernel with one trial:
model2 = bys.categorical(kernel=st.multinomial(1, [0.7, 0.3]), prior=st.dirichlet([4, 5]))
model2.update_posterior(posterior=np.array([[1, 0], [0, 1], [1, 0], [1, 0]]))

Notes:: Implementation is based on the degeneration with \(M = 1\) and \(K = 2\) of the Bayesian multinomial distribution. Although the kernel and prior can be initialized from several types of valid distributions, the kernel, conjugate and predictive distributions are all represented in the multinomial context.
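
The \(K = 2\) collapse can be checked numerically with SciPy alone: a two-component Dirichlet posterior and the matching Beta posterior describe the same belief about the first category. The sketch below applies the standard conjugate rule (add the column counts to the prior concentrations) to the one-hot data from Code 3.118; it illustrates the mathematics only and does not call the library.

```python
import numpy as np
from scipy import stats as st

data = np.array([[1, 0], [0, 1], [1, 0], [1, 0]])   # one-hot observations
alpha_prior = np.array([4.0, 5.0])

# conjugate update: add the per-column counts to the concentrations
alpha_post = alpha_prior + data.sum(axis=0)          # -> [7, 6]

dir_post = st.dirichlet(alpha_post)
beta_post = st.beta(alpha_post[0], alpha_post[1])

# mean and variance of the first Dirichlet component match the Beta
same_mean = np.isclose(dir_post.mean()[0], beta_post.mean())
same_var = np.isclose(dir_post.var()[0], beta_post.var())
```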

See also:

bernoulli
categorical
multinomial
beta
dirichlet
dirichlet_multinomial

Logs:: Added in version 0.0.5.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

categorical¶

Bayesian framework of categorical kernel.

Arguments:

Parameters:

kernel (MultTP) – a certain multinomial distribution instance with one trial.
prior (DirTP) – a certain dirichlet distribution; None as default to use uniform dirichlet prior.

Returns:

the categorical Bayesian instance

Return type:

Bayes

Examples:

Code 3.119 Bayesian of categorical kernel¶

from info.me import bayes as bys
from scipy import stats as st
import numpy as np

model = bys.categorical(kernel=st.multinomial(1, [0.3, 0.2, 0.5]), prior=st.dirichlet([3, 2, 4]))
model.update_posterior(posterior=np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 0, 1]]))

Notes:: Implementation is based on the first degeneration case, \(M = 1\), of the Bayesian multinomial distribution. Because scipy currently provides no explicit application programming interface for the categorical distribution, the collapsed multinomial distribution with a single trial is used instead.
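
A short SciPy-only sketch of the single-trial multinomial standing in for a categorical distribution: every draw is a one-hot row, and the category index is recovered with `argmax`. The variable names are illustrative.

```python
import numpy as np
from scipy import stats as st

p = [0.3, 0.2, 0.5]
one_hot = st.multinomial(1, p).rvs(size=6, random_state=0)   # shape (6, 3)

# each row contains exactly one 1; its position is the category label
labels = one_hot.argmax(axis=1)
```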

See also:

multinomial
dirichlet
dirichlet_multinomial

Logs:: Added in version 0.0.5.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

binomial¶

Bayesian framework of binomial kernel.

Arguments:

Parameters:

kernel (Union[BinTP, MultTP]) – a certain binomial distribution or a certain two-category multinomial distribution with multiple trials.
prior (Union[BetaTP, DirTP]) – a certain beta distribution or dirichlet distribution with a two-length alpha; None as default to use uniform dirichlet prior.

Returns:

the binomial Bayesian instance

Return type:

Bayes

Examples:

Code 3.120 Bayesian of binomial kernel¶

from info.me import bayes as bys
from scipy import stats as st
import numpy as np

model1 = bys.binomial(kernel=st.binom(5, 0.3), prior=st.beta(4, 5))
model1.update_posterior(posterior=np.array([[4, 1], [3, 2], [2, 3], [5, 0]]))

# or equivalently in the multinomial context:
model2 = bys.multinomial(kernel=st.multinomial(5, [0.7, 0.3]), prior=st.dirichlet([4, 5]))
model2.update_posterior(posterior=st.multinomial(20, [0.7, 0.3]))

Notes:: Implementation is based on the second degeneration case, \(K = 2\), of the Bayesian multinomial distribution. Although the kernel and prior can be initialized from several types of valid distributions, the kernel, conjugate and predictive distributions are all represented in the multinomial context.
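
The equivalence between a binomial kernel and a two-category multinomial kernel can be verified directly in SciPy. The sketch below compares the two probability mass functions for the parameters used in Code 3.120, placing the success count in the second column to match `[0.7, 0.3]`.

```python
import numpy as np
from scipy import stats as st

n, p = 5, 0.3
binom = st.binom(n, p)
multi = st.multinomial(n, [1 - p, p])

k = np.arange(n + 1)                                 # possible success counts
pmf_binom = binom.pmf(k)
pmf_multi = multi.pmf(np.column_stack([n - k, k]))   # rows are [failures, successes]
```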

See also:

binom
multinomial
beta
dirichlet
dirichlet_multinomial

Logs:: Added in version 0.0.5.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

multinomial¶

Bayesian framework of multinomial kernel.

Arguments:

Parameters:

kernel (MultTP) – a certain multinomial distribution instance with multiple trials.
prior (DirTP) – a certain dirichlet distribution; None as default to use uniform dirichlet prior.

Returns:

the multinomial Bayesian instance

Return type:

Bayes

Examples:

Code 3.121 Bayesian of multinomial kernel¶

from info.me import bayes as bys
from scipy import stats as st
import numpy as np

model = bys.multinomial(kernel=st.multinomial(5, [0.3, 0.2, 0.5]), prior=st.dirichlet([3, 2, 4]))
model.update_posterior(posterior=np.array([[1, 1, 3], [2, 1, 2], [1, 0, 4], [2, 1, 2]]))

Notes:: Implementation is based on the general case, \(M > 1\) and \(K > 2\), of the Bayesian multinomial distribution.
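
For orientation, the standard Dirichlet-multinomial conjugate arithmetic for the data in Code 3.121 is sketched below with NumPy only. It shows the textbook update (prior concentrations plus observed counts) and the resulting predictive mean for five trials; it is not a transcript of what `update_posterior` does internally.

```python
import numpy as np

alpha_prior = np.array([3.0, 2.0, 4.0])
counts = np.array([[1, 1, 3], [2, 1, 2], [1, 0, 4], [2, 1, 2]])

# conjugate update: add the total count of each category
alpha_post = alpha_prior + counts.sum(axis=0)        # -> [9, 5, 15]

# posterior mean of the category probabilities
post_mean = alpha_post / alpha_post.sum()

# expected counts of the Dirichlet-multinomial predictive with 5 trials
pred_mean = 5 * post_mean
```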

See also:

multinomial
dirichlet
dirichlet_multinomial

Logs:: Added in version 0.0.5.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

poisson¶

Bayesian framework of poisson kernel.

Arguments:

Parameters:

kernel (PoiTP) – a certain poisson distribution instance.
prior (Union[GamTP, ExpTP, ErlTP]) – a certain gamma, exponential, or erlang distribution; None as default to use \(\mathrm{Gam}(x|1, 1)\) prior.

Returns:

the poisson Bayesian instance

Return type:

Bayes

Examples:

Code 3.122 Bayesian of poisson kernel¶

from info.me import bayes as bys
from scipy import stats as st
import numpy as np

model = bys.poisson(kernel=st.poisson(2.3), prior=st.gamma(1, 0, 0.5))
model.update_posterior(posterior=np.array([0, 3, 2, 1, 4, 6]))
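
The textbook Gamma-Poisson arithmetic behind such an update can be sketched with SciPy alone. The example assumes the prior `st.gamma(1, 0, 0.5)` is read in SciPy's convention (shape 1, scale 0.5, hence rate 2) and uses the negative binomial form of the posterior predictive; it does not call the library.

```python
import numpy as np
from scipy import stats as st

shape0, rate0 = 1.0, 2.0                 # st.gamma(1, 0, 0.5): scale 0.5 = rate 2
data = np.array([0, 3, 2, 1, 4, 6])

# conjugate update: shape gains the event total, rate gains the sample size
shape_n = shape0 + data.sum()            # 17
rate_n = rate0 + data.size               # 8

posterior = st.gamma(shape_n, scale=1 / rate_n)

# posterior predictive of one new count is negative binomial
predictive = st.nbinom(shape_n, rate_n / (rate_n + 1))
```

Both the posterior mean of the rate and the predictive mean of a new count equal `shape_n / rate_n`.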

Notes:

Implementation is based on the derivation for the poisson distribution. In addition, because the exponential and erlang distributions are special forms of the gamma distribution, every prior is reinterpreted here in the gamma context.

For example, the initialization method in Code 3.122 can also be equivalently achieved by:

Code 3.123 Bayesian of poisson with other priors¶

model1 = bys.poisson(kernel=st.poisson(2.3), prior=st.expon(2))
model2 = bys.poisson(kernel=st.poisson(2.3), prior=st.erlang(1, 0, 0.5))
model1.conjugate.dist.name == model2.conjugate.dist.name == 'gamma'  # True
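
That the exponential and erlang distributions are special cases of the gamma distribution can be checked in SciPy itself. The sketch below uses explicit `scale=` keywords because SciPy's positional order is shape (where applicable), then `loc`, then `scale`; how the library maps positional arguments of the prior is not assumed here.

```python
import numpy as np
from scipy import stats as st

x = np.linspace(0.1, 3.0, 7)

# all three describe a rate-2 exponential decay: 2 * exp(-2 x)
pdf_gamma = st.gamma(1, scale=0.5).pdf(x)
pdf_expon = st.expon(scale=0.5).pdf(x)
pdf_erlang = st.erlang(1, scale=0.5).pdf(x)
```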

See also:

poisson
gamma
nbinom

Logs:: Added in version 0.0.5.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

gaussian¶

Bayesian framework of gauss kernel.

Arguments:

Parameters:

kernel (Union[GauTP, MGauTP]) – a certain gauss distribution instance.
prior (Union[GauTP, MGauTP, GamTP, WisTP, GauWisTP]) – a certain prior distribution. For a univariate gauss kernel, the prior can be a univariate gauss, a gamma, or a one-dimensional multivariate gauss, wishart, or gauss-wishart distribution. For a multivariate gauss kernel, the prior can be a multivariate gauss, wishart, or gauss-wishart distribution. See Table 4.5 for their detailed relationships and derivation. None as default to automatically employ the gauss-wishart \(\mathcal{NW}(\boldsymbol{x} | \boldsymbol{0}_D, 1, D, \boldsymbol{I}_D)\).

Returns:

the gauss Bayesian instance

Return type:

Bayes

Examples:

Code 3.124 Bayesian of gauss kernel¶

from info.me import bayes as bys
from scipy import stats as st
import numpy as np

mean, cov = np.array([1, 2]), np.diag([1.5, 1])
dis = st.multivariate_normal(mean+0.7, cov+0.45)

# framework to infer mean vector:
model1 = bys.gaussian(kernel=st.multivariate_normal(mean, cov), prior=st.multivariate_normal(mean+1, cov+0.3))
model1.update_posterior(posterior=dis.rvs(size=30))

# framework to infer precision matrix:
model2 = bys.gaussian(kernel=st.multivariate_normal(mean, cov), prior=st.wishart(3, np.linalg.inv(cov+0.31)))
model2.update_posterior(posterior=dis.rvs(size=30))

# framework to infer both mean vector and precision matrix:
model3 = bys.gaussian(kernel=st.multivariate_normal(mean, cov),
                      prior=bys.GaussianWishart(mean+0.92, 1.4, 3, np.linalg.inv(cov+0.31)))
model3.update_posterior(posterior=dis.rvs(size=30))
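
For the first case (inferring the mean vector with a known kernel covariance), the standard conjugate formulas can be sketched with NumPy alone: the posterior precision is the prior precision plus N times the kernel precision, and the posterior mean is the precision-weighted combination of the prior mean and the data. The numbers mirror Code 3.124, but the sketch is independent of the library and its internals.

```python
import numpy as np

rng = np.random.default_rng(0)
mean, cov = np.array([1.0, 2.0]), np.diag([1.5, 1.0])
x = rng.multivariate_normal(mean + 0.7, cov + 0.45, size=30)

prior_mean, prior_cov = mean + 1, cov + 0.3
prec, prior_prec = np.linalg.inv(cov), np.linalg.inv(prior_cov)

# posterior precision = prior precision + N * kernel precision
post_cov = np.linalg.inv(prior_prec + len(x) * prec)

# posterior mean = precision-weighted mix of prior mean and observations
post_mean = post_cov @ (prior_prec @ prior_mean + prec @ x.sum(axis=0))
```

With 30 observations the posterior mean lies much closer to the sample mean than to the prior mean, and the posterior covariance is far tighter than the prior covariance.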

Notes:

Implementation is based on the derivation for the gauss distribution family. According to that derivation, the application programming interface of gauss here also supports Bayesian inference in the context of univariate gauss:

Code 3.125 Bayesian of univariate gauss kernel¶

mean, var, dis = 1.2, 1.5, st.norm(1.5, 0.7)

# framework to infer mean:
model1 = bys.gaussian(kernel=st.norm(mean, var), prior=st.norm(1.1, 0.9))
model1.update_posterior(posterior=dis.rvs(size=30)[..., np.newaxis])

# framework to infer precision:
model2 = bys.gaussian(kernel=st.norm(mean, var), prior=st.gamma(3, 5))
model2.update_posterior(posterior=dis.rvs(size=30)[..., np.newaxis])

# framework to infer both mean and precision:
model3 = bys.gaussian(kernel=st.norm(mean, var),
                      prior=bys.GaussianWishart(np.array([1.1]), 0.5, 3, np.array([[1.8]])))
model3.update_posterior(posterior=dis.rvs(size=30)[..., np.newaxis])

See also:

norm
multivariate_normal
gamma
wishart
GaussianWishart

Logs:: Added in version 0.0.5.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

3.1.5.4. Module anomaly¶

3.1.5.5. Description¶

Utilities for training models for anomaly and change detection. The functions and classes here are mainly in the namespace info.toolbox.libs.anomaly. All of these objects are also integrated into info.me.

`Hotelling`	Hotelling T² constructor for multivariate gaussian distribution.
`NaiveBayes`	NaiveBayes framework.
`Neighbors`	Neighbor algorithm frame for modeling based on empirical distribution.
`VonMisesFisher`	Algorithm frame for spherical like data.

3.1.5.6. Docstrings¶

class Hotelling¶

Hotelling T² constructor for multivariate gaussian distribution.

Arguments:

Parameters:

data (ndarray) – \(\boldsymbol{R}^{n \times m}\) matrix with \(n\) observations of \(m\) dimensions
significance_level (float) – significance level used for anomaly threshold determination; 0.05 as default

Returns:

Hotelling T² distribution

Return type:

Hotelling

Property:

settings:: Hotelling configuration when initializing

model:: \(\boldsymbol{R}^{n \times m}\) data container; the number of observations \(n\) increases when data are appended through update

mean:: \(\boldsymbol{R}^m\) mean vector of all observations

sigma:: \(\boldsymbol{R}^{m \times m}\) covariance matrix of all observations

threshold:: threshold calculated for determining anomalous observations, under assigned significance level

Methods:

update:: append new observations via data, then synchronize related properties

predict_dissimilarity:: calculate anomaly scores of new observations, via data keyword assignment; will return a numeric sequence

predict:: determine whether anomalous or not for new observations, via data keyword assignment; will return a boolean sequence

Examples:

Code 3.126 Hotelling T² for anomaly determination¶

from info.me import anomaly as ano
from scipy.stats import multinomial
import numpy as np

p = np.array([0.03, 0.06, 0.1, 0.34, 0.22, 0.11, 0.08, 0.06])
obs = np.array([multinomial.rvs(50, p) for _ in range(100)])

model = ano.Hotelling(data=obs)
model.predict(data=np.vstack([obs, np.array([multinomial.rvs(50, np.roll(p, 4)) for _ in range(100)])]))
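
The idea behind the score and threshold can be sketched with NumPy and SciPy alone. The example below computes the squared Mahalanobis distance of new points from the training mean and flags those above a chi-square quantile. The Gaussian toy data, the helper name `t2_scores`, and the chi-square approximation for the threshold are assumptions of this sketch; the threshold rule used by `Hotelling` itself may differ.

```python
import numpy as np
from scipy import stats as st

rng = np.random.default_rng(0)
train = rng.normal(size=(200, 3))            # n = 200 observations, m = 3 dimensions

mean = train.mean(axis=0)
sigma = np.cov(train, rowvar=False)
sigma_inv = np.linalg.inv(sigma)


def t2_scores(x):
    """Squared Mahalanobis distance of each row of x from the training mean."""
    d = np.atleast_2d(x) - mean
    return np.einsum('ij,ij->i', d @ sigma_inv, d)


# chi-square approximation of the threshold at significance level 0.05
threshold = st.chi2.ppf(1 - 0.05, df=train.shape[1])

new = np.vstack([np.zeros(3), np.full(3, 10.0)])
flags = t2_scores(new) > threshold           # True marks an anomalous row
```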

Notes:: Hotelling T² is a classic method for detecting outliers among I.I.D. observations that follow a multivariate gaussian distribution (for the definition, see Equation 3.18). It can be seen as the multivariate extension of the univariate t-test. The related section collects the detailed mathematical derivation of this method.

Logs:: Added in version 0.0.5.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

class NaiveBayes¶

NaiveBayes framework. Awaiting completion.

Arguments:

Parameters:

data (ndarray) – \(\boldsymbol{R}^{n \times m}\) matrix with \(n\) observations of \(m\) dimensions
labels (ndarray) – 1D boolean labels of data, False for normal instances and True for anomalous ones
prior (list[DirTP]) – list of two dirichlet distributions, for normal and anomalous instances respectively; None as default to initialize two dirichlet distributions with all \(\alpha\) equal to 1
validation_rate (float) – the ratio of test data in cross validation, used to determine the threshold; 0.2 as default, corresponding to 5-fold validation
model_lightweight (bool) – whether to cache the data points; True saves the data, label and the calculated anomalous statistic; False merely updates those two models; True as default

Returns:

naive Bayes model

Return type:

NaiveBayes

Examples:

Code 3.127 Naive Bayes for anomaly determination¶

from info.me import anomaly as ano
from scipy.stats import multinomial
import numpy as np

p = np.array([0.03, 0.06, 0.1, 0.34, 0.22, 0.11, 0.08, 0.06])
obs = np.vstack([np.array([multinomial.rvs(50, p) for _ in range(100)]),
                 np.array([multinomial.rvs(50, np.roll(p, 4)) for _ in range(100)])])
cls = np.concatenate([np.array([0 for _ in range(100)]), np.array([1 for _ in range(100)])]).astype(bool)

model = ano.NaiveBayes(data=obs, labels=cls)
model.predict(data=np.vstack([np.array([multinomial.rvs(50, p) for _ in range(20)]),
                              np.array([multinomial.rvs(50, np.roll(p, 4)) for _ in range(20)])]))

Notes:: Awaiting completion.

Logs:: Added in version 0.0.5.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

class Neighbors¶

Neighbor algorithm frame for modeling based on empirical distribution. For definition of empirical distribution, see Equation 4.102, and supplementary material for the principles.

Arguments:

Parameters:

data (ndarray) – \(\boldsymbol{R}^{n \times m}\) matrix with \(n\) observations of \(m\) dimensions
labels (ndarray) – non-negative integer array of labels consistent with data; for anomaly detection, labeling the normal data as 0 and anomalies of other patterns with other integers is suggested.
distance_measure (int) – order of norm to calculate distance; 2 as default for Euclidean
kamap_optimizer (Callable) – method to determine the optimal \(k_{i,j}\) and the threshold \(a_{i,j}\) that distinguish the \(i\)- and \(j\)-class; the callable should accept three arguments (the \(k\) vs. \(a\) map, and the axes of \(k\) and \(a\)) and return the optimal values \(k_{\mathrm{opt}}\) and \(a_{\mathrm{opt}}\); None as default to call a built-in method that determines these two optimal values via the local minimum of the first-order differentiation of \(k\), and the global maximum of \(a\)
nearing_mode (Literal) – which method to use to initiate the transformation; valid options are 'KNN' and 'LMNN'; 'KNN' computes in the original Cartesian space, 'LMNN' in a computed Riemannian space (see related definitions); 'KNN' as default
k_determine (int) – the maximum number of \(k\) during initial training; 10 as default
eta_determine (float) – the coefficient used for updating the (sub)gradient during initial training, if spatial calculation and transformation are necessary; 0.05 as default
prior_prob_determine (list[float]) – prior weights assigned to all classes, consistent with \(\boldsymbol{\alpha}\) of a certain dirichlet distribution; None as default to use all-equal weights

Returns:

optimal data set based on an empirical distribution

Return type:

Neighbors

Property:

settings:: Neighbors configuration when initializing

x:: \(\boldsymbol{R}^{n \times m}\) data container; the number of observations \(n\) increases when data are appended through update

y:: \(\boldsymbol{R}^n\) vector of integers consistent with x

trans:: \(\boldsymbol{C}^{m \times m}\) transformation; real domain for 'KNN' while complex domain for 'LMNN'

thre:: dict constructed as dict[tuple[i, j], tuple[k_ij, a_ij]] determined by kamap_optimizer; i, j are indicators for different classes

Methods:

update:: append new observations and corresponding labels via data and labels, then synchronize related properties

predict_dissimilarity:: calculate anomaly scores of new observations, via data keyword assignment; will return a dict with construction as dict[tuple[i, j], ndarray]; i and j are indicators for different classes

predict:: determine the most likely class of each new observation, via data keyword assignment; will return a sequence of integers

Examples:

Code 3.128 Neighbors frame for multi classification¶

from info.me import anomaly as ano
from scipy.stats import multinomial
import numpy as np

p1 = np.array([0.03, 0.06, 0.1, 0.34, 0.22, 0.11, 0.08, 0.06])
p2 = np.array([0.03, 0.08, 0.06, 0.12, 0.35, 0.2, 0.11, 0.05])
obs = np.vstack([np.array([multinomial.rvs(50, p) for _ in range(100)]) for p in [p1, p2]])
cls = np.array([0 for _ in range(100)] + [1 for _ in range(100)])

model = ano.Neighbors(data=obs, labels=cls)
model.predict(data=np.vstack([obs, np.array([multinomial.rvs(50, np.roll(p1, 4)) for _ in range(100)])]))

Notes:

Initialization differs according to the selected nearing_mode:

KNN:

\(k\)-nearest neighbors uses the unit matrix \(\boldsymbol{I}\) as transformation, for calculation in original Cartesian space
LMNN:

Large margin nearest neighbors (LMNN) needs to initialize a Riemannian space. In each updating step, use the gradient of Equation 4.109 (\(\boldsymbol{R} = \boldsymbol{R} - \eta (\partial \Psi (\boldsymbol{R}) / \partial \boldsymbol{R})\)).

In Equation 4.109, the main term has the form \(d^2_{\boldsymbol{R}} (\boldsymbol{a}, \boldsymbol{b}) = (\boldsymbol{a} - \boldsymbol{b})^\top \boldsymbol{R} (\boldsymbol{a} - \boldsymbol{b})\). Because \(\boldsymbol{m}^\top \boldsymbol{A} \boldsymbol{n} = \mathrm{Tr} (\boldsymbol{m}^\top \boldsymbol{A} \boldsymbol{n})\) and \((\partial \mathrm{Tr} [ \boldsymbol{m}^\top \boldsymbol{A} \boldsymbol{n} ]) / (\partial \boldsymbol{A}) = \boldsymbol{m} \boldsymbol{n}^\top\), it follows that:

(3.14)¶\[\frac{\partial d^2_{\boldsymbol{R}} (\boldsymbol{a}, \boldsymbol{b})}{\partial \boldsymbol{R}} = \frac{\partial \mathrm{Tr} [ (\boldsymbol{a} - \boldsymbol{b})^\top \boldsymbol{R} (\boldsymbol{a} - \boldsymbol{b}) ]}{\partial \boldsymbol{R}} = (\boldsymbol{a} - \boldsymbol{b}) (\boldsymbol{a} - \boldsymbol{b})^\top\]
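
Equation 3.14 can be verified numerically. The sketch below compares the analytic gradient, the outer product of the difference vector with itself, against a forward finite-difference estimate taken entry by entry. The vectors and the helper name `dist2` are arbitrary choices for this check.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=3), rng.normal(size=3)
d = a - b
r = np.eye(3)


def dist2(mat):
    """Squared distance (a - b)^T R (a - b) for a given matrix R."""
    return d @ mat @ d


analytic = np.outer(d, d)                # right-hand side of Equation 3.14

eps = 1e-6
numeric = np.zeros_like(r)
for i in range(3):
    for j in range(3):
        step = np.zeros_like(r)
        step[i, j] = eps                 # perturb a single entry of R
        numeric[i, j] = (dist2(r + step) - dist2(r)) / eps
```

Because the squared distance is linear in \(\boldsymbol{R}\), the finite-difference estimate agrees with the analytic gradient up to round-off.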

As a result, the gradient is computed in a series of linear subspaces of the original space. To guarantee the positive semi-definite constraint in Equation 4.109, take the eigendecomposition of the updated \(\boldsymbol{R}=\boldsymbol{L}\boldsymbol{\Lambda}\boldsymbol{L}^\top\) and floor the negative eigenvalues in \(\boldsymbol{\Lambda}\) to 0 (denoted as \([\boldsymbol{\Lambda}]_{+}\)). The Riemannian space is then updated through \(\boldsymbol{L} [\boldsymbol{\Lambda}]_{+} \boldsymbol{L}^\top\).

Repeat the previous calculation until \(\boldsymbol{R}\) converges. The final \(\boldsymbol{R}^*\) is the optimal Riemannian space based on the training data.
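
The eigenvalue-flooring step can be sketched in a few lines of NumPy. The helper name `project_psd` and the example matrix are illustrative; in an actual LMNN loop the input would be the matrix obtained after each gradient step.

```python
import numpy as np


def project_psd(r):
    """Floor negative eigenvalues at 0 and rebuild the matrix."""
    r = (r + r.T) / 2                            # work on the symmetric part
    vals, vecs = np.linalg.eigh(r)               # r = L diag(vals) L^T
    return vecs @ np.diag(np.clip(vals, 0, None)) @ vecs.T


r = np.array([[1.0, 2.0], [2.0, 1.0]])           # eigenvalues 3 and -1
r_plus = project_psd(r)                          # keeps only the eigenvalue 3
```

A matrix that is already positive semi-definite is left unchanged by this projection, so it is safe to apply after every update.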

Logs:: Added in version 0.0.5.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

class VonMisesFisher¶

Algorithm frame for spherical-like data. For the theoretical definition of the Von Mises Fisher distribution, see Equation 4.110. The associated derivation is also provided.

Arguments:

Parameters:

data (ndarray) – \(\boldsymbol{R}^{n \times m}\) matrix with \(n\) observations of \(m\) dimensions
significance_level (float) – significance level used for anomaly threshold determination; 0.05 as default

Returns:

the Von Mises Fisher distribution

Return type:

VonMisesFisher

Property:

settings:: VonMisesFisher configuration when initializing

model:: \(\boldsymbol{R}^{n \times m}\) data container; the number of observations \(n\) increases when data are appended through update

mean:: mean vector \(\boldsymbol{s}\) of all observations as referred in the supplementary materials

a:: degree of anomalies for all observations

m:: the estimation on degree of freedom for the calculated \(\chi^2\) distribution

s:: the scale factor for the calculated \(\chi^2\) distribution

dis:: the Von Mises Fisher distribution

Methods:

update:: append new observations via data, then synchronize related properties

predict_dissimilarity:: calculate anomaly scores of new observations, via data keyword assignment; will return a numeric sequence

predict:: determine whether anomalous or not for new observations, via data keyword assignment; will return a boolean sequence

Examples:

Code 3.129 Von Mises Fisher for anomaly determination¶

from info.me import anomaly as ano
from scipy.stats import multinomial
import numpy as np

p = np.array([0.03, 0.06, 0.1, 0.34, 0.22, 0.11, 0.08, 0.06])
obs = np.array([multinomial.rvs(50, p) for _ in range(100)])

model = ano.VonMisesFisher(data=obs)
model.predict(data=np.vstack([obs, np.array([multinomial.rvs(50, np.roll(p, 4)) for _ in range(100)])]))
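
For intuition about directional data, the sketch below projects observations onto the unit sphere, estimates a mean direction, and scores each point by one minus its cosine similarity to that direction. This is one common definition of a degree of anomaly for spherical data and is an assumption of this sketch; the exact statistic computed by `VonMisesFisher` is the one given in the supplementary derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.normal(loc=[5.0, 1.0, 1.0], scale=0.5, size=(100, 3))

# keep only the direction of each observation
unit = raw / np.linalg.norm(raw, axis=1, keepdims=True)

# mean direction: normalized sum of the unit vectors
mean_dir = unit.sum(axis=0)
mean_dir /= np.linalg.norm(mean_dir)

# assumed score: 0 when aligned with the mean direction, 2 when opposite
a = 1.0 - unit @ mean_dir
```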

Notes:

From the supplementary derivation, the degree of anomaly in the context of the Von Mises Fisher distribution follows a certain \(\chi^2(a | m, s)\).

Using the substitution \(\Gamma(m/2) = (2/m) \Gamma((m/2)+1) = (2/(m+2)) (2/m) \Gamma((m/2)+2)\), the moment estimation for solving \(E[a]\) and \(E[a^2]\) can be obtained through:

(3.15)¶\[\begin{split}E[a] &= \int_0^{\infty} da \cdot a \cdot \chi^2 (a | m, s) \\ &= \int_0^{\infty} da \cdot a \cdot \frac{1}{2s\Gamma(\frac{m}{2})} (\frac{a}{2s})^{\frac{m}{2}-1} \exp(-\frac{a}{2s}) \\ &= ms \cdot \int_0^{\infty} da \cdot (\frac{a}{2s}) \cdot \frac{1}{2s\Gamma(\frac{m}{2}+1)} (\frac{a}{2s})^{\frac{m}{2}-1} \exp(-\frac{a}{2s}) \\ &= ms \cdot \int_0^{\infty} da \cdot \chi^2(a | m+2, s) = ms\end{split}\]

(3.16)¶\[\begin{split}E[a^2] &= \int_0^{\infty} da \cdot a^2 \chi^2 (a | m, s) \\ &= \int_0^{\infty} da \cdot a^2 \cdot \frac{1}{2s\Gamma(\frac{m}{2})} (\frac{a}{2s})^{\frac{m}{2}-1} \exp(-\frac{a}{2s}) \\ &= m(m+2)s^2 \cdot \int_0^{\infty} da \cdot (\frac{a}{2s})^2 \cdot \frac{1}{2s\Gamma(\frac{m}{2}+2)} (\frac{a}{2s})^{\frac{m}{2}-1} \exp(-\frac{a}{2s}) \\ &= m(m+2)s^2 \cdot \int_0^{\infty} da \cdot \chi^2(a | m+4, s) = m(m+2)s^2\end{split}\]

\(\hat{m}\) and \(\hat{s}\) denote the estimates of \(m\) and \(s\) respectively. Combining Equation 3.15 and Equation 3.16 gives \(E[a^2] - (E[a])^2 = 2ms^2\), so the following formula can be established:

(3.17)¶\[\hat{m} = \frac{2(E[a])^2}{E[a^2] - (E[a])^2};\ \hat{s} = \frac{E[a^2] - (E[a])^2}{2E[a]}\]

Compared with the form of \(\chi^2 (M-1, 0.5\kappa)\), the estimate \(\hat{s}\) is nothing but \(0.5\kappa\), while \(\hat{m}\) is generally no greater than \(M-1\). From the viewpoint of informatics, the estimate \(\hat{m}\) represents, to some extent, the effective number of dimensions that take part in the subsequent modeling and calculations.
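
The moment estimates can be tried out on synthetic data with NumPy. Under the density written in Equation 3.15, a sample of \(a\) is a standard chi-square variable with \(m\) degrees of freedom multiplied by \(s\), so \(E[a] = ms\) and \(E[a^2] - (E[a])^2 = 2ms^2\). The sketch solves these two relations for \(m\) and \(s\); the true values and the sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
m_true, s_true = 5, 0.5

# scaled chi-square samples: a = s * X with X ~ chi2(m)
a = s_true * rng.chisquare(m_true, size=200_000)

e1 = a.mean()                  # estimate of E[a]   = m s
e2 = (a ** 2).mean()           # estimate of E[a^2] = m (m + 2) s^2
var = e2 - e1 ** 2             # estimate of 2 m s^2

m_hat = 2 * e1 ** 2 / var      # 2 (m s)^2 / (2 m s^2) = m
s_hat = var / (2 * e1)         # 2 m s^2 / (2 m s)     = s
```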

Logs:: Added in version 0.0.5.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

Authors:: Chen Zhang
Version:: 0.0.5
Created on:: Apr 23, 2024
