3.1.5. Modules for learning¶
3.1.5.1. Module Bayes¶
3.1.5.2. Description¶
This module provides the infrastructure for Bayesian statistical inference, together with building blocks for online and self-adaptive algorithms. Use the functions in the namespace info.toolbox.libs.bayes._frame, or
alternatively import bayes from the main entry info.me.
Bayes | Essential Bayes framework.
GaussianWishart | the initializer of gauss-wishart distribution.
bernoulli | Bayesian framework of bernoulli kernel.
categorical | Bayesian framework of categorical kernel.
binomial | Bayesian framework of binomial kernel.
multinomial | Bayesian framework of multinomial kernel.
poisson | Bayesian framework of poisson kernel.
gaussian | Bayesian framework of gauss kernel.
3.1.5.3. Docstrings¶
- class Bayes¶
Essential Bayes framework.
- Arguments:
- Parameters:
name (str) – the name of the kernel likelihood function
kernel (dist) – the kernel likelihood distribution
prior (dist) – the Bayesian prior distribution
likelihood_check (Callable[[...], bool]) – likelihood checker to validate the investigated data or distribution
update_conjugate (Callable[[...], dist]) – method to update the Bayesian conjugate posterior distribution; its arguments should be the corresponding Bayesian conjugate prior, and followed by the likelihood data set in presentation of numpy array or the form of certain distribution
update_predictive (Callable[[...], dist]) – method to update the Bayesian posterior predictive distribution; its arguments should be the updated Bayesian conjugate prior, and followed by the likelihood data set in presentation of numpy array or the form of certain distribution if necessary
- Returns:
a kind of Bayesian family
- Return type:
- Properties:
- name:
Name of the Bayesian framework. It is recommended to use the family name of the likelihood function when declaring it.
- kernel:
Distribution of the likelihood function used in initialization.
- conjugate:
The Bayesian prior distribution. Its associated parameters are updated by invoking the method update_posterior with the likelihood data set or distribution as input.
- predictive:
The Bayesian predictive distribution, given conjugate as the Bayesian posterior.
- update_conjugate:
The callable that combines the conjugate prior and the likelihood data set into the conjugate posterior.
- update_predictive:
The callable that combines the conjugate posterior and the likelihood data set into the posterior predictive.
- Methods:
- update_posterior:
Update the conjugate and predictive distributions via a likelihood distribution or data set.
- compare_posterior:
Test the conjugate posterior under a likelihood distribution or data set, without actually updating the conjugate and predictive properties.
- Notes:
A Bayes instance is a set of correlated distributions together with their calculation rules, which follow the principles of Bayes theory. In the actual implementation, without loss of scientific rigor, the degeneration relationships among distributions induced by dimensional collapse are also taken into account. For example, the frameworks whose likelihood functions are bernoulli, binomial, and categorical are all implemented through the multinomial framework. Their relationships are listed in Table 4.3, and the concrete reduction is described in the chapter on the multinomial distribution. Similarly, the Gauss family is implemented through the multivariate Gauss distribution, on the basis of the Gauss Bayesian framework established in the chapter on the continuous Gauss distribution.
Customarily, the parameters update_conjugate and update_predictive take the conjugate distribution and the likelihood function or data set as input arguments, then return the corresponding Bayesian posterior and predictive distributions, respectively.
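The calling convention described above can be illustrated with a minimal sketch for the dirichlet-multinomial pair. It uses plain Python lists instead of scipy distributions, and both function bodies are assumptions made for illustration, not the callables shipped with the library:

```python
def update_conjugate(prior_alpha, data):
    """Dirichlet prior plus rows of category counts -> Dirichlet posterior.

    Hypothetical stand-in: distributions are represented by their alpha lists.
    """
    totals = [sum(column) for column in zip(*data)]  # counts per category
    return [alpha + count for alpha, count in zip(prior_alpha, totals)]


def update_predictive(posterior_alpha):
    """Predictive probability of each category for one further trial."""
    total = sum(posterior_alpha)
    return [alpha / total for alpha in posterior_alpha]


# prior alpha and one-hot observations borrowed from the categorical example
posterior = update_conjugate([3, 2, 4], [[0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 0, 1]])
predictive = update_predictive(posterior)
```

Here the predictive step needs only the updated conjugate, which matches the "if necessary" wording for the data argument of update_predictive.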
- See also:
- Logs:
Added in version 0.0.5.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- class GaussianWishart¶
the initializer of gauss-wishart distribution.
- Arguments:
- Parameters:
mean (ndarray) – mean vector of gauss distribution.
beta (Numeric) – coefficient for the distribution of precision matrix.
nu (int) – degree of freedom, should be no less than 1 minus the number of dimensions of w.
w (ndarray) – a positive definite matrix.
- Returns:
a gauss-wishart distribution if all arguments are valid, otherwise 0.
- Return type:
Union[GauWisTP, int]
- Examples:
from info.me import bayes as bys
import numpy as np

# _temp.T @ _temp yields a symmetric 3 x 3 matrix used as w
_temp = np.random.random((20, 3))
gauss_wishart = bys.GaussianWishart(np.random.random(3), 3.4, 5, _temp.T @ _temp)
- Logs:
Added in version 0.0.5.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- bernoulli¶
Bayesian framework of bernoulli kernel.
- Arguments:
- Parameters:
kernel (Union[BernTP, BinTP, MultTP]) – a certain bernoulli distribution, a certain binomial distribution with only one trial, or a certain multinomial distribution with one trial and two categories.
prior (Union[BetaTP, DirTP]) – a certain beta distribution or dirichlet distribution with a two-length alpha; None as default to use uniform prior.
- Returns:
the bernoulli Bayesian instance
- Return type:
- Examples:
from info.me import bayes as bys
from scipy import stats as st
import numpy as np

model1 = bys.bernoulli(kernel=st.bernoulli(0.3), prior=st.beta(4, 5))
model1.update_posterior(posterior=np.array([[1, 0], [0, 1], [1, 0], [1, 0]]))

# or equivalently using the categorical kernel with one trial:
model2 = bys.categorical(kernel=st.multinomial(1, [0.7, 0.3]), prior=st.dirichlet([4, 5]))
model2.update_posterior(posterior=np.array([[1, 0], [0, 1], [1, 0], [1, 0]]))
- Notes:
Implementation is based on the degeneration with \(M = 1\) and \(K = 2\) of the Bayesian multinomial distribution. Although the kernel and prior can be initialized from several types of valid distributions, the kernel, conjugate, and predictive distributions are all implemented in the multinomial context.
- See also:
bernoulli, multinomial, beta, dirichlet, dirichlet_multinomial
- Logs:
Added in version 0.0.5.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- categorical¶
Bayesian framework of categorical kernel.
- Arguments:
- Parameters:
kernel (MultTP) – a certain multinomial distribution instance with one trial.
prior (DirTP) – a certain dirichlet distribution; None as default to use uniform dirichlet prior.
- Returns:
the categorical Bayesian instance
- Return type:
- Examples:
from info.me import bayes as bys
from scipy import stats as st
import numpy as np

model = bys.categorical(kernel=st.multinomial(1, [0.3, 0.2, 0.5]), prior=st.dirichlet([3, 2, 4]))
# each row is a one-hot observation of a single trial
model.update_posterior(posterior=np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 0, 1]]))
- Notes:
Implementation is based on the first degeneration situation, \(M = 1\), of the Bayesian multinomial distribution. Because scipy currently has no explicit application programming interface for the categorical distribution, the collapsed multinomial distribution with a single trial is used instead.
- See also:
multinomial, dirichlet, dirichlet_multinomial
- Logs:
Added in version 0.0.5.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- binomial¶
Bayesian framework of binomial kernel.
- Arguments:
- Parameters:
kernel (Union[BinTP, MultTP]) – a certain binomial distribution or a certain two-category multinomial distribution with multiple trials.
prior (Union[BetaTP, DirTP]) – a certain beta distribution or dirichlet distribution with a two-length alpha; None as default to use uniform dirichlet prior.
- Returns:
the binomial Bayesian instance
- Return type:
- Examples:
from info.me import bayes as bys
from scipy import stats as st
import numpy as np

model1 = bys.binomial(kernel=st.binom(5, 0.3), prior=st.beta(4, 5))
model1.update_posterior(posterior=np.array([[4, 1], [3, 2], [2, 3], [5, 0]]))

# or equivalently in the multinomial context:
model2 = bys.multinomial(kernel=st.multinomial(5, [0.7, 0.3]), prior=st.dirichlet([4, 5]))
model2.update_posterior(posterior=st.multinomial(20, [0.7, 0.3]))
- Notes:
Implementation is based on the second degeneration situation, \(K = 2\), of the Bayesian multinomial distribution. Although the kernel and prior can be initialized from several types of valid distributions, the kernel, conjugate, and predictive distributions are all implemented in the multinomial context.
- See also:
binom, multinomial, beta, dirichlet, dirichlet_multinomial
- Logs:
Added in version 0.0.5.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- multinomial¶
Bayesian framework of multinomial kernel.
- Arguments:
- Parameters:
kernel (MultTP) – a certain multinomial distribution instance with multiple trials.
prior (DirTP) – a certain dirichlet distribution; None as default to use uniform dirichlet prior.
- Returns:
the multinomial Bayesian instance
- Return type:
- Examples:
from info.me import bayes as bys
from scipy import stats as st
import numpy as np

model = bys.multinomial(kernel=st.multinomial(5, [0.3, 0.2, 0.5]), prior=st.dirichlet([3, 2, 4]))
# each row holds the category counts of 5 trials
model.update_posterior(posterior=np.array([[1, 1, 3], [2, 1, 2], [1, 0, 4], [2, 1, 2]]))
- Notes:
Implementation is on the basis of the general situation with \(M > 1\) and \(K > 2\) of the Bayesian multinomial distribution.
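The degeneration cases stated in the notes of bernoulli, categorical, binomial, and multinomial can be summarized in a small lookup. The helper name family_name is hypothetical; it only restates the conditions on the number of trials \(M\) and the number of categories \(K\) given in this section:

```python
def family_name(trials, categories):
    """Map (M, K) of a multinomial kernel to the family it degenerates to."""
    if trials < 1 or categories < 2:
        raise ValueError("need at least one trial and two categories")
    if trials == 1 and categories == 2:
        return "bernoulli"      # M = 1 and K = 2
    if trials == 1:
        return "categorical"    # first degeneration, M = 1
    if categories == 2:
        return "binomial"       # second degeneration, K = 2
    return "multinomial"        # general situation, M > 1 and K > 2


names = [family_name(m, k) for m, k in [(1, 2), (1, 3), (5, 2), (5, 3)]]
```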
- See also:
multinomial, dirichlet, dirichlet_multinomial
- Logs:
Added in version 0.0.5.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- poisson¶
Bayesian framework of poisson kernel.
- Arguments:
- Parameters:
kernel (PoiTP) – a certain poisson distribution instance.
prior (Union[GamTP, ExpTP, ErlTP]) – a certain gamma distribution, or an exponential or erlang distribution as a special form of it; None as default to use \(\mathrm{Gam}(x|1, 1)\) prior.
- Returns:
the poisson Bayesian instance
- Return type:
- Examples:
from info.me import bayes as bys
from scipy import stats as st
import numpy as np

# st.gamma(1, 0, 0.5): shape 1, loc 0, scale 0.5
model = bys.poisson(kernel=st.poisson(2.3), prior=st.gamma(1, 0, 0.5))
model.update_posterior(posterior=np.array([0, 3, 2, 1, 4, 6]))
- Notes:
Implementation is based on the derivation given for the poisson distribution. In addition, because the exponential and erlang distributions are special forms of the gamma distribution, any such prior is reinterpreted here in the gamma context.
For example, the initialization method in Code 3.122 can also be equivalently achieved by:
# with bys and st imported as in the example above;
# st.expon(0, 0.5) is loc 0, scale 0.5, which matches st.gamma(1, 0, 0.5)
model1 = bys.poisson(kernel=st.poisson(2.3), prior=st.expon(0, 0.5))
model2 = bys.poisson(kernel=st.poisson(2.3), prior=st.erlang(1, 0, 0.5))
model1.conjugate.dist.name == model2.conjugate.dist.name == 'gamma'  # True
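As a rough illustration of what a gamma-poisson update involves, the following sketch applies the textbook conjugate rule in the shape and rate parametrization (rate = 1 / scale in scipy terms, so st.gamma(1, 0, 0.5) corresponds to shape 1 and rate 2). The function names are hypothetical, and the code is a standalone sketch rather than the library's implementation. It also checks numerically that a gamma density with shape 1 coincides with an exponential density:

```python
import math


def update_gamma_poisson(shape, rate, counts):
    """Textbook conjugate update: add the total count and the sample size."""
    return shape + sum(counts), rate + len(counts)


def gamma_pdf(x, shape, rate):
    """Gamma density in the shape and rate parametrization."""
    return rate ** shape * x ** (shape - 1) * math.exp(-rate * x) / math.gamma(shape)


def expon_pdf(x, rate):
    """Exponential density with the given rate."""
    return rate * math.exp(-rate * x)


# prior shape 1, rate 2, and the data set used in the example above
shape, rate = update_gamma_poisson(1.0, 2.0, [0, 3, 2, 1, 4, 6])
post_mean = shape / rate  # posterior mean of the poisson rate
```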
- See also:
poisson, gamma, nbinom
- Logs:
Added in version 0.0.5.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- gaussian¶
Bayesian framework of gauss kernel.
- Arguments:
- Parameters:
kernel (Union[GauTP, MGauTP]) – a certain gauss distribution instance.
prior (Union[GauTP, MGauTP, GamTP, WisTP, GauWisTP]) – a certain prior distribution; for a univariate gauss kernel, it supports initialization with a univariate gauss, a gamma, or a one-dimensional multivariate gauss, wishart, or gauss-wishart distribution; for a multivariate gauss kernel, it supports initialization with a multivariate gauss, wishart, or gauss-wishart distribution; for their detailed relationships and derivation, see Table 4.5; None as default to automatically employ the gauss-wishart \(\mathcal{NW}(\boldsymbol{x} | \boldsymbol{0}_D, 1, D, \boldsymbol{I}_D)\).
- Returns:
the gauss Bayesian instance
- Return type:
- Examples:
from info.me import bayes as bys
from scipy import stats as st
import numpy as np

mean, cov = np.array([1, 2]), np.diag([1.5, 1])
dis = st.multivariate_normal(mean+0.7, cov+0.45)

# framework to infer mean vector:
model1 = bys.gaussian(kernel=st.multivariate_normal(mean, cov), prior=st.multivariate_normal(mean+1, cov+0.3))
model1.update_posterior(posterior=dis.rvs(size=30))

# framework to infer precision matrix:
model2 = bys.gaussian(kernel=st.multivariate_normal(mean, cov), prior=st.wishart(3, np.linalg.inv(cov+0.31)))
model2.update_posterior(posterior=dis.rvs(size=30))

# framework to infer both mean vector and precision matrix:
model3 = bys.gaussian(kernel=st.multivariate_normal(mean, cov), prior=bys.GaussianWishart(mean+0.92, 1.4, 3, np.linalg.inv(cov+0.31)))
model3.update_posterior(posterior=dis.rvs(size=30))
- Notes:
Implementation is based on the derivation given for the gauss distribution family. According to that derivation, the gaussian interface also supports Bayesian inference in the univariate gauss context:
# with bys, st and np imported as in the example above
mean, var, dis = 1.2, 1.5, st.norm(1.5, 0.7)

# framework to infer mean:
model1 = bys.gaussian(kernel=st.norm(mean, var), prior=st.norm(1.1, 0.9))
model1.update_posterior(posterior=dis.rvs(size=30)[..., np.newaxis])

# framework to infer precision:
model2 = bys.gaussian(kernel=st.norm(mean, var), prior=st.gamma(3, 5))
model2.update_posterior(posterior=dis.rvs(size=30)[..., np.newaxis])

# framework to infer both mean and precision:
model3 = bys.gaussian(kernel=st.norm(mean, var), prior=bys.GaussianWishart(np.array([1.1]), 0.5, 3, np.array([[1.8]])))
model3.update_posterior(posterior=dis.rvs(size=30)[..., np.newaxis])
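For the simplest of these cases, inferring only the mean of a univariate gauss kernel whose standard deviation is treated as known, the textbook conjugate update can be sketched in a few lines. The function name and the choice of returning a (mean, standard deviation) pair are assumptions for illustration; how the library parametrizes its posterior internally is not documented here:

```python
def update_normal_mean(prior_mean, prior_std, data, known_std):
    """Normal prior on the mean, normal likelihood with known standard deviation."""
    prior_precision = 1.0 / prior_std ** 2
    data_precision = len(data) / known_std ** 2
    post_precision = prior_precision + data_precision   # precisions add up
    post_mean = (prior_precision * prior_mean + sum(data) / known_std ** 2) / post_precision
    return post_mean, post_precision ** -0.5


# prior N(0, 1) and two observations at 2.0 with unit noise
post_mean, post_std = update_normal_mean(0.0, 1.0, [2.0, 2.0], 1.0)
```

The posterior mean is a precision-weighted average of the prior mean and the data, so it moves from 0 toward 2 as observations accumulate.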
- See also:
norm, multivariate_normal, gamma, wishart
- Logs:
Added in version 0.0.5.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
3.1.5.4. Module anomaly¶
3.1.5.5. Description¶
Utilities for training models for anomaly and change detection. The functions and classes here mainly reside in the
namespace info.toolbox.libs.anomaly. All of these objects are also integrated into info.me.
Hotelling | Hotelling T2 constructor for multivariate gaussian distribution.
NaiveBayes | NaiveBayes framework.
Neighbors | Neighbor algorithm frame for modeling based on empirical distribution.
VonMisesFisher | Algorithm frame for spherical like data.
3.1.5.6. Docstrings¶
- class Hotelling¶
Hotelling T2 constructor for multivariate gaussian distribution.
- Arguments:
- Parameters:
data (ndarray) – \(\boldsymbol{R}^{n \times m}\) matrix with \(n\) observations of \(m\) dimensions
significance_level (float) – significance level used for anomaly threshold determination; 0.05 as default
- Returns:
Hotelling T2 distribution
- Return type:
- Property:
- settings:
Hotelling configuration when initializing
- model:
\(\boldsymbol{R}^{n \times m}\) data container; the number of observations \(n\) increases when the data is updated
- mean:
\(\boldsymbol{R}^m\) mean vector of all observations
- sigma:
\(\boldsymbol{R}^{m \times m}\) covariance matrix of all observations
- threshold:
threshold calculated for determining anomalous observations, under assigned significance level
- Methods:
- update:
append new observations via data, then synchronize related properties
- predict_dissimilarity:
calculate anomaly scores of new observations, via data keyword assignment; will return a numeric sequence
- predict:
determine whether anomalous or not for new observations, via data keyword assignment; will return a boolean sequence
- Examples:
from info.me import anomaly as ano
from scipy.stats import multinomial
import numpy as np

p = np.array([0.03, 0.06, 0.1, 0.34, 0.22, 0.11, 0.08, 0.06])
obs = np.array([multinomial.rvs(50, p) for _ in range(100)])
model = ano.Hotelling(data=obs)
# predict on the training data stacked with 100 draws from a shifted distribution
model.predict(data=np.vstack([obs, np.array([multinomial.rvs(50, np.roll(p, 4)) for _ in range(100)])]))
- Notes:
Hotelling T2 is a classic method for detecting outliers among I.I.D. observations that follow a multivariate gaussian distribution (for the definition, see Equation 3.18). It can be seen as the multivariate extension of the univariate t-test. The related section collects the detailed mathematical derivation of this method.
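The anomaly score behind this method is the squared Mahalanobis distance of an observation from the sample mean. A minimal numpy sketch follows; the function name is hypothetical, the covariance uses the maximum-likelihood estimate, and both the threshold derived from the significance level and any handling of singular covariance matrices are omitted, so this should not be read as the library's implementation:

```python
import numpy as np


def hotelling_scores(train, new):
    """Squared Mahalanobis distance of each row of `new` from the training mean."""
    mean = train.mean(axis=0)
    sigma = np.cov(train, rowvar=False, bias=True)   # maximum-likelihood covariance
    diff = new - mean
    # solve sigma @ z = diff.T instead of forming the explicit inverse
    solved = np.linalg.solve(sigma, diff.T).T
    return np.einsum("ij,ij->i", diff, solved)


rng = np.random.default_rng(0)
train = rng.normal(size=(200, 3))
probe = np.array([[0.0, 0.0, 0.0], [6.0, 6.0, 6.0]])
scores = hotelling_scores(train, probe)
```

An observation would then be flagged as anomalous when its score exceeds the threshold associated with the chosen significance level.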
- Logs:
Added in version 0.0.5.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- class NaiveBayes¶
NaiveBayes framework. Awaiting completion.
- Arguments:
- Parameters:
data (ndarray) – \(\boldsymbol{R}^{n \times m}\) matrix with \(n\) observations of \(m\) dimensions
label (ndarray) – 1D boolean label of data, False for normal instances while True for anomalous ones
prior (list[DirTP]) – list composed of the dirichlet distributions of normal and anomalous instances respectively; None as default to initialize two dirichlet distributions with 1 for all \(\alpha\)
validation_rate (float) – the ratio of test data in cross validation, to determine the threshold; 0.2 as default to use 5-fold validation
model_lightweight (bool) – whether to cache the data points; True will save the data, label and the calculated anomalous statistic; False merely updates those two models; the default value is True
- Returns:
naive Bayes model
- Return type:
- Examples:
from info.me import anomaly as ano
from scipy.stats import multinomial
import numpy as np

p = np.array([0.03, 0.06, 0.1, 0.34, 0.22, 0.11, 0.08, 0.06])
# 100 normal observations followed by 100 anomalous ones
obs = np.vstack([np.array([multinomial.rvs(50, p) for _ in range(100)]),
                 np.array([multinomial.rvs(50, np.roll(p, 4)) for _ in range(100)])])
cls = np.concatenate([np.array([0 for _ in range(100)]), np.array([1 for _ in range(100)])]).astype(bool)
model = ano.NaiveBayes(data=obs, labels=cls)
model.predict(data=np.vstack([np.array([multinomial.rvs(50, p) for _ in range(20)]),
                              np.array([multinomial.rvs(50, np.roll(p, 4)) for _ in range(20)])]))
- Notes:
Awaiting completion.
- Logs:
Added in version 0.0.5.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- class Neighbors¶
Neighbor algorithm frame for modeling based on empirical distribution. For definition of empirical distribution, see Equation 4.102, and supplementary material for the principles.
- Arguments:
- Parameters:
data (ndarray) – \(\boldsymbol{R}^{n \times m}\) matrix with \(n\) observations of \(m\) dimensions
labels (ndarray) – non-negative integer array of labels consistent with data; for anomaly detection, it is suggested to label the normal data as 0 and to use other integers for anomalies of other patterns.
distance_measure (int) – order of the norm used to calculate distance; 2 as default for Euclidean
kamap_optimizer (Callable) – method to determine the optimal \(k_{i,j}\), and threshold \(a_{i,j}\) to distinguish the \(i\)- and \(j\)-class; the callable should accept the \(k\) vs. \(a\) map and the axes of \(k\) and \(a\) as three arguments, then return the optimal values of \(k_{\mathrm{opt}}\) and \(a_{\mathrm{opt}}\); None as default to call a built-in method that determines these two optimal values via the local minimum of the 1st order differentiation of \(k\), and the global maximum of \(a\)
nearing_mode (Literal) – which method to use to initiate the transformation; valid options are 'KNN' and 'LMNN'; 'KNN' is for computation in the original Cartesian space, 'LMNN' is in a computed Riemannian space (see related definitions); 'KNN' as default
k_determine (int) – the maximum number of \(k\) during initial training; 10 as default
eta_determine (float) – the coefficient used for updating the (sub)gradient during initial training, if spatial calculation and transformation are necessary; 0.05 as default
prior_prob_determine (list[float]) – prior weights assigned to all classes; consistent with \(\boldsymbol{\alpha}\) of a certain dirichlet distribution; None as default using all-equal weights
- Returns:
optimal data set based on an empirical distribution
- Return type:
- Property:
- settings:
Neighbors configuration when initializing
- x:
\(\boldsymbol{R}^{n \times m}\) data container; the number of observations \(n\) increases when the data is updated
- y:
\(\boldsymbol{R}^n\) vector of integers consistent with x
- trans:
\(\boldsymbol{C}^{m \times m}\) transformation; real domain for 'KNN' while complex domain for 'LMNN'
- thre:
dict constructed as dict[tuple[i, j], tuple[k_ij, a_ij]] determined by kamap_optimizer; i, j are indicators for different classes
- Methods:
- update:
append new observations and corresponding labels via data and labels, then synchronize related properties
- predict_dissimilarity:
calculate anomaly scores of new observations, via data keyword assignment; will return a dict constructed as dict[tuple[i, j], ndarray]; i and j are indicators for different classes
- predict:
determine the most likely class, via data keyword assignment; will return a sequence of integers
- Examples:
from info.me import anomaly as ano
from scipy.stats import multinomial
import numpy as np

p1 = np.array([0.03, 0.06, 0.1, 0.34, 0.22, 0.11, 0.08, 0.06])
p2 = np.array([0.03, 0.08, 0.06, 0.12, 0.35, 0.2, 0.11, 0.05])
# 100 observations of class 0 (p1) followed by 100 observations of class 1 (p2)
obs = np.vstack([np.array([multinomial.rvs(50, p) for _ in range(100)]) for p in [p1, p2]])
cls = np.array([0 for _ in range(100)] + [1 for _ in range(100)])
model = ano.Neighbors(data=obs, labels=cls)
model.predict(data=np.vstack([obs, np.array([multinomial.rvs(50, np.roll(p1, 4)) for _ in range(100)])]))
- Notes:
Initialization differs according to the selected method:
KNN:
\(k\)-nearest neighbors uses the unit matrix \(\boldsymbol{I}\) as transformation, for calculation in the original Cartesian space
LMNN:
Large margin nearest neighbors (LMNN) needs to initialize a Riemannian space. Each updating step uses the gradient of Equation 4.109 (\(\boldsymbol{R} = \boldsymbol{R} - \eta (\partial \Psi (\boldsymbol{R}) / \partial \boldsymbol{R})\)).
In Equation 4.109, the main term has the form \(d^2_{\boldsymbol{R}} (\boldsymbol{a}, \boldsymbol{b}) = (\boldsymbol{a} - \boldsymbol{b})^\top \boldsymbol{R} (\boldsymbol{a} - \boldsymbol{b})\). Because \(\boldsymbol{m}^\top \boldsymbol{A} \boldsymbol{n} = \mathrm{Tr} (\boldsymbol{m}^\top \boldsymbol{A} \boldsymbol{n})\) and \((\partial \mathrm{Tr} [ \boldsymbol{m}^\top \boldsymbol{A} \boldsymbol{n} ]) / (\partial \boldsymbol{A}) = \boldsymbol{m} \boldsymbol{n}^\top\), it follows that:
(3.14)¶\[\frac{\partial d^2_{\boldsymbol{R}} (\boldsymbol{a}, \boldsymbol{b})}{\partial \boldsymbol{R}} = \frac{\partial \mathrm{Tr} [ (\boldsymbol{a} - \boldsymbol{b})^\top \boldsymbol{R} (\boldsymbol{a} - \boldsymbol{b}) ]}{\partial \boldsymbol{R}} = (\boldsymbol{a} - \boldsymbol{b}) (\boldsymbol{a} - \boldsymbol{b})^\top\]
As a result, the gradient is computed in a series of linear subspaces of the original space. To guarantee the positive semi-definite constraint in Equation 4.109, take the eigen decomposition of the updated \(\boldsymbol{R}=\boldsymbol{L}\boldsymbol{\Lambda}\boldsymbol{L}^\top\) and floor the negative eigenvalues in \(\boldsymbol{\Lambda}\) to 0 (denoted as \([\boldsymbol{\Lambda}]_{+}\)). The Riemannian space is then updated through \(\boldsymbol{L} [\boldsymbol{\Lambda}]_{+} \boldsymbol{L}^\top\).
Repeat the previous calculation until \(\boldsymbol{R}\) converges. The final \(\boldsymbol{R}^*\) is the optimal Riemannian space based on the training data.
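The two building blocks of this procedure, the gradient of a single distance term (Equation 3.14) and the flooring of negative eigenvalues, can be sketched with numpy. The function names are hypothetical, and the full objective \(\Psi\) of Equation 4.109 is not reproduced here, so this is an illustration of the individual steps rather than a complete LMNN trainer:

```python
import numpy as np


def distance_gradient(a, b):
    """Gradient of (a - b)^T R (a - b) with respect to R, as in Equation 3.14."""
    diff = a - b
    return np.outer(diff, diff)


def project_psd(r):
    """Floor the negative eigenvalues of a symmetric matrix at 0."""
    eigvals, eigvecs = np.linalg.eigh(r)             # r = L diag(eigvals) L^T
    return eigvecs @ np.diag(np.clip(eigvals, 0.0, None)) @ eigvecs.T


r = np.array([[1.0, 2.0], [2.0, 1.0]])               # eigenvalues 3 and -1
r_plus = project_psd(r)
grad = distance_gradient(np.array([1.0, 0.0]), np.array([0.0, 2.0]))
```

One iteration would subtract \(\eta\) times the accumulated gradient of \(\Psi\) from \(\boldsymbol{R}\) and then apply project_psd to the result.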
- Logs:
Added in version 0.0.5.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- class VonMisesFisher¶
Algorithm frame for spherical like data. For the theoretical definition of the Von Mises Fisher distribution, see Equation 4.110. The associated derivation is also provided.
- Arguments:
- Parameters:
data (ndarray) – \(\boldsymbol{R}^{n \times m}\) matrix with \(n\) observations of \(m\) dimensions
significance_level (float) – significance level used for anomaly threshold determination; 0.05 as default
- Returns:
the Von Mises Fisher distribution
- Return type:
- Property:
- settings:
VonMisesFisher configuration when initializing
- model:
\(\boldsymbol{R}^{n \times m}\) data container; the number of observations \(n\) increases when the data is updated
- mean:
mean vector \(\boldsymbol{s}\) of all observations as referred in the supplementary materials
- a:
degree of anomalies for all observations
- m:
the estimation on degree of freedom for the calculated \(\chi^2\) distribution
- s:
the scale factor for the calculated \(\chi^2\) distribution
- dis:
the Von Mises Fisher distribution
- Methods:
- update:
append new observations via data, then synchronize related properties
- predict_dissimilarity:
calculate anomaly scores of new observations, via data keyword assignment; will return a numeric sequence
- predict:
determine whether anomalous or not for new observations, via data keyword assignment; will return a boolean sequence
- Examples:
from info.me import anomaly as ano
from scipy.stats import multinomial
import numpy as np

p = np.array([0.03, 0.06, 0.1, 0.34, 0.22, 0.11, 0.08, 0.06])
obs = np.array([multinomial.rvs(50, p) for _ in range(100)])
model = ano.VonMisesFisher(data=obs)
# predict on the training data stacked with 100 draws from a shifted distribution
model.predict(data=np.vstack([obs, np.array([multinomial.rvs(50, np.roll(p, 4)) for _ in range(100)])]))
- Notes:
From the supplementary derivation, it is known that the degree of anomaly in the context of the Von Mises Fisher distribution follows a certain \(\chi^2(a | m, s)\).
Using the substitution \(\Gamma(m/2) = (2/m) \Gamma((m/2)+1) = (2/(m+2)) (2/m) \Gamma((m/2)+2)\), the moments \(E[a]\) and \(E[a^2]\) needed for moment estimation can be obtained through:
(3.15)¶\[\begin{split}E[a] &= \int_0^{\infty} da \cdot a \cdot \chi^2 (a | m, s) \\ &= \int_0^{\infty} da \cdot a \cdot \frac{1}{2s\Gamma(\frac{m}{2})} (\frac{a}{2s})^{\frac{m}{2}-1} \exp(-\frac{a}{2s}) \\ &= ms \cdot \int_0^{\infty} da \cdot (\frac{a}{2s}) \cdot \frac{1}{2s\Gamma(\frac{m}{2}+1)} (\frac{a}{2s})^{\frac{m}{2}-1} \exp(-\frac{a}{2s}) \\ &= ms \cdot \int_0^{\infty} da \cdot \chi^2(a | m+2, s) = ms\end{split}\]
(3.16)¶\[\begin{split}E[a^2] &= \int_0^{\infty} da \cdot a^2 \cdot \chi^2 (a | m, s) \\ &= \int_0^{\infty} da \cdot a^2 \cdot \frac{1}{2s\Gamma(\frac{m}{2})} (\frac{a}{2s})^{\frac{m}{2}-1} \exp(-\frac{a}{2s}) \\ &= m(m+2)s^2 \cdot \int_0^{\infty} da \cdot (\frac{a}{2s})^2 \cdot \frac{1}{2s\Gamma(\frac{m}{2}+2)} (\frac{a}{2s})^{\frac{m}{2}-1} \exp(-\frac{a}{2s}) \\ &= m(m+2)s^2 \cdot \int_0^{\infty} da \cdot \chi^2(a | m+4, s) = m(m+2)s^2\end{split}\]
\(\hat{m}\) and \(\hat{s}\) denote the estimates of \(m\) and \(s\), respectively. Considering Equation 3.15 and Equation 3.16 simultaneously, the following formula can be established:
(3.17)¶\[\hat{m} = \frac{2(E[a])^2}{E[a^2] - (E[a])^2};\ \hat{s} = \frac{E[a^2] - (E[a])^2}{2E[a]}\]
Compared with the form of \(\chi^2 (M-1, 0.5\kappa)\), the estimate \(\hat{s}\) is nothing else but \(0.5\kappa\), while \(\hat{m}\) is generally no greater than \(M-1\). From the viewpoint of informatics, the estimate \(\hat{m}\) represents, to some extent, the effective number of dimensions that take part in the subsequent modeling and calculations.
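Solving Equation 3.15 and Equation 3.16 for \(m\) and \(s\) makes the estimators explicit: since \(E[a] = ms\) and \(E[a^2] = m(m+2)s^2\), the variance is \(E[a^2] - (E[a])^2 = 2ms^2\), so \(m = 2(E[a])^2 / \mathrm{Var}[a]\) and \(s = \mathrm{Var}[a] / (2E[a])\). The following sketch, with hypothetical function names, applies these relations to exact moments and to a list of samples:

```python
def estimate_from_moments(first, second):
    """Return (m_hat, s_hat) from E[a] and E[a^2] of a scaled chi-square."""
    variance = second - first ** 2          # equals 2 * m * s**2
    m_hat = 2.0 * first ** 2 / variance     # 2 (m s)^2 / (2 m s^2) = m
    s_hat = variance / (2.0 * first)        # 2 m s^2 / (2 m s) = s
    return m_hat, s_hat


def estimate_from_samples(samples):
    """Plug the empirical first and second moments into the estimators."""
    n = len(samples)
    first = sum(samples) / n
    second = sum(x * x for x in samples) / n
    return estimate_from_moments(first, second)


# exact moments for m = 3, s = 0.5: E[a] = 1.5 and E[a^2] = 3 * 5 * 0.25 = 3.75
m_hat, s_hat = estimate_from_moments(1.5, 3.75)
m_emp, s_emp = estimate_from_samples([1.0, 3.0])
```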
- Logs:
Added in version 0.0.5.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- Authors:
Chen Zhang
- Version:
0.0.5
- Created on:
Apr 23, 2024