3.1.4. Modules for analysis¶
3.1.4.1. Module hypotest¶
3.1.4.1.1. Description¶
Quantitative statistics on multi-grouped data. Building a proper hypothesis test and quantitative analysis requires some basic knowledge of mathematical statistics.
The hypothesis test module in informatics lives mainly in the namespace info.toolbox.libs.hypotest. For convenience,
importing from the main entry (for example, from info.me import hypotest) is also supported.
The prefix hypoi denotes tests that require independent data populations, so the sizes of the
populations need not be identical. The prefix hypoj denotes tests on joint pairs, which generally require the two
samples to be intrinsically paired and of the same size. The prefix hypos denotes simulation methods that use random sampling.
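The difference between the independent and the paired families can be illustrated without calling the library itself. The sketch below only shows the shape of the input dict and how "each possible pair" of groups can be enumerated; the helper names pairwise_groups and is_paired_compatible are hypothetical and are not part of hypotest.

```python
from itertools import combinations

# Independent groups (hypoi_*): group sizes may differ.
independent = {
    "group1": [0.1, 0.4, 0.7],
    "group2": [0.2, 0.5],
    "group3": [0.3, 0.6, 0.9, 1.2],
}

# Paired groups (hypoj_*): every group needs the same length,
# because the i-th values of two groups belong to the same subject.
paired = {"before": [5.1, 4.8, 6.0], "after": [5.6, 5.0, 6.4]}


def pairwise_groups(data):
    """Hypothetical helper: list every unordered pair of group names."""
    return list(combinations(data, 2))


def is_paired_compatible(data):
    """True when all groups share one length, as paired tests require."""
    return len({len(values) for values in data.values()}) <= 1


print(pairwise_groups(independent))
print(is_paired_compatible(independent), is_paired_compatible(paired))
```

With three groups there are three unordered pairs, which is why the pair-wise tests return one result per pair rather than one per group.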
hypoi_f | perform one-way ANOVA test among multi-grouped data.
hypoi_t | perform pair-wise independent T test among multi-grouped data.
hypoi_sw | perform Shapiro-Wilk test on each group among multi-grouped data.
hypoi_normality | perform Omnibus Normality test on each group among multi-grouped data.
hypoi_ks | perform Kolmogorov-Smirnov test among multi-grouped data.
hypoi_cvm | perform Cramér-von Mises test among multi-grouped data.
hypoi_ag | perform Alexander Govern test among multi-grouped data.
hypoi_thsd | perform Tukey's range test among multi-grouped data.
hypoi_kw | perform Kruskal-Wallis H-test among multi-grouped data.
hypoi_mood | perform Mood's median and scale test among multi-grouped data.
hypoi_bartlett | perform Bartlett's test among multi-grouped data.
hypoi_levene | perform Levene test among multi-grouped data.
hypoi_fk | perform Fligner-Killeen test among multi-grouped data.
hypoi_ad | perform Anderson-Darling test among multi-grouped data.
hypoi_rank | perform rank sum test among multi-grouped data.
hypoi_es | perform Epps-Singleton test on each possible pair among multi-grouped data.
hypoi_u | perform Mann–Whitney U test on each possible pair among multi-grouped data.
hypoi_bm | perform Brunner-Munzel test on each possible pair among multi-grouped data.
hypoi_ab | perform Ansari-Bradley test on each possible pair among multi-grouped data.
hypoi_skew | perform skew test on each group among multi-grouped data.
hypoi_kurtosis | perform kurtosis test on each group among multi-grouped data.
hypoi_jb | perform Jarque-Bera test on each group among multi-grouped data.
hypoi_pd | perform Cressie-Read power divergence test on each group among multi-grouped data.
hypoi_chi2 | perform Chi-Squared test on each group among multi-grouped data.
hypoj_pearson | compute Pearson correlation coefficient on each possible pair among multi-grouped data.
hypoj_spearman | compute Spearman correlation coefficient on each possible pair among multi-grouped data.
hypoj_kendall | compute Kendall's tau correlation coefficient on each possible pair among multi-grouped data.
hypoj_t | perform pair-wise related T test among multi-grouped data.
hypoj_rank | perform signed-rank test among multi-grouped data.
hypoj_friedman | perform Friedman test among multi-grouped data.
hypoj_mgc | perform Multiscale Graph Correlation test on each possible pair among multi high-dimensional data.
hypos_mc | perform Monte Carlo hypothesis test on each group among multi-grouped data.
hypos_permu | perform Permutation test on each possible permutation of groups among multi-grouped data.
3.1.4.1.2. Docstrings¶
- hypoi_f¶
perform one-way ANOVA test among multi-grouped data.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
- Returns:
F statistic and \(p\)-value
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}
res = ht.hypoi_f(data=data)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
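For readers who want to see what the F statistic returned by a one-way ANOVA measures, the following sketch computes it from the textbook definition using only the standard library. The function name one_way_f is hypothetical, and the code is not claimed to be how hypoi_f is implemented internally.

```python
from statistics import fmean


def one_way_f(data):
    """F statistic of one-way ANOVA for a dict of groups (illustrative only)."""
    groups = [list(values) for values in data.values()]
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


data = {"group1": [1, 2, 3], "group2": [2, 3, 4], "group3": [3, 4, 5]}
f_stat = one_way_f(data)
print(f_stat)
```

A large F value means the group means differ more than the scatter inside each group would suggest.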
- hypoi_t¶
perform pair-wise independent T test among multi-grouped data. statistic uses Equation 4.6.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
equal_var (bool) – trigger to determine whether the groups under comparison have identical variance; False as default
trim (float) – fraction of data to trim from both tails (Yuen’s T test); valid values range from 0.0 to 0.5; 0.0 as default
permutations (Optional[int]) – \(\mathbb{N}\), number of permutations used to compute a numerical solution for the \(p\)-value; 0 or None for the analytical solution using the t distribution without permutations; None as default
random_state (Optional[int]) – random state in Monte Carlo; effective when permutations is activated; None as default
nan_policy (Literal['propagate', 'raise', 'omit']) – strategy for null values contained in data; 'propagate' returns nan; 'raise' will throw an exception; 'omit' will ignore null values; 'propagate' as default
alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default
- Returns:
t statistics and \(p\)-values on pair-wised groups
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}
res = ht.hypoi_t(data=data)
- See also:
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
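The effect of equal_var can be seen by computing the two standard forms of the independent t statistic by hand. This is a sketch based on the usual textbook formulas (pooled variance versus Welch); Equation 4.6 is not reproduced in this section, so its notation may differ, and the function name independent_t is hypothetical.

```python
from math import sqrt
from statistics import fmean, variance


def independent_t(a, b, equal_var=False):
    """t statistic for two independent samples (textbook formulas)."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    if equal_var:
        # Student's version: pool the two variances.
        pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
        se = sqrt(pooled * (1 / na + 1 / nb))
    else:
        # Welch's version: keep the variances separate.
        se = sqrt(va / na + vb / nb)
    return (fmean(a) - fmean(b)) / se


a, b = [1, 2, 3], [2, 4, 6, 8, 10]
t_welch = independent_t(a, b)
t_pooled = independent_t(a, b, equal_var=True)
print(t_welch, t_pooled)
```

The two statistics coincide when the groups have equal sizes and diverge when both sizes and variances differ, as in the sample above.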
- hypoi_sw¶
perform Shapiro-Wilk test on each group among multi-grouped data.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
- Returns:
shapiro statistic and \(p\)-value via Monte Carlo simulation
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}
res = ht.hypoi_sw(data=data)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- hypoi_normality¶
perform Omnibus Normality test on each group among multi-grouped data.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
nan_policy (Literal['propagate', 'raise', 'omit']) – strategy for null values contained in data; 'propagate' returns nan; 'raise' will throw an exception; 'omit' will ignore null values; 'omit' as default
- Returns:
statistic and \(p\)-value on each group
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}
res = ht.hypoi_normality(data=data)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
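The three nan_policy strategies that recur throughout this module can be summarised in a few lines. The helper apply_nan_policy below is hypothetical and only mirrors the documented behaviour; in particular, returning None to stand for "the result becomes nan" is an assumption made for this sketch.

```python
from math import isnan


def apply_nan_policy(values, nan_policy="propagate"):
    """Hypothetical helper mirroring the three documented strategies."""
    if nan_policy not in ("propagate", "raise", "omit"):
        raise ValueError(f"unknown nan_policy: {nan_policy!r}")
    if not any(isnan(v) for v in values):
        return list(values)
    if nan_policy == "raise":
        raise ValueError("input contains nan")
    if nan_policy == "omit":
        # Drop null values and continue with the remaining observations.
        return [v for v in values if not isnan(v)]
    # 'propagate': signal that any statistic on this group should be nan.
    return None


group = [0.2, float("nan"), 0.9]
print(apply_nan_policy(group, "omit"))
print(apply_nan_policy(group, "propagate"))
```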
- hypoi_ks¶
perform Kolmogorov-Smirnov test among multi-grouped data. calculating on each group, as well as on pair-wised groups.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
dist (Union[dist, list[dist]]) – distribution pre-defined as criterion (or criteria); rv_frozen object, or list of those objects in scipy; the standard univariate Gaussian scipy.stats.norm(loc=0, scale=1) as default
alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default
method (Literal['exact', 'asymp', 'auto']) – the method to calculate the \(p\)-value; 'exact' uses exact distribution(s); 'asymp' uses asymptotic distribution(s); 'auto' uses one of the above options; 'auto' as default
n_sample (int) – number of samples generated from pre-defined distribution(s); 20 as default
- Returns:
statistic and \(p\)-value on each group, and pair-wised groups
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}
res = ht.hypoi_ks(data=data)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- hypoi_cvm¶
perform Cramér-von Mises test among multi-grouped data. calculating on each group, as well as on pair-wised groups.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
dist (Union[dist, list[dist]]) – distribution pre-defined as criterion (or criteria); rv_frozen object, or list of those objects in scipy; the standard univariate Gaussian scipy.stats.norm(loc=0, scale=1) as default
method (Literal['exact', 'asymp', 'auto']) – the method to calculate the \(p\)-value; 'exact' uses exact distribution(s); 'asymp' uses asymptotic distribution(s); 'auto' uses one of the above options; 'auto' as default
- Returns:
statistic and \(p\)-value on each group, and pair-wised groups
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}
res = ht.hypoi_cvm(data=data)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- hypoi_ag¶
perform Alexander Govern test among multi-grouped data.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
nan_policy (Literal['propagate', 'raise', 'omit']) – strategy for null values contained in data; 'propagate' returns nan; 'raise' will throw an exception; 'omit' will ignore null values; 'propagate' as default
- Returns:
statistic and \(p\)-value
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}
res = ht.hypoi_ag(data=data)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- hypoi_thsd¶
perform Tukey’s range test among multi-grouped data.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
- Variables:
~full_return (bool) – if True, the low and high bounds of the confidence interval will be returned as extra information as well; False as default
- Returns:
statistic and \(p\)-value on pair-wised groups
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}
res = ht.hypoi_thsd(data=data)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- hypoi_kw¶
perform Kruskal-Wallis H-test among multi-grouped data.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
nan_policy (Literal['propagate', 'raise', 'omit']) – strategy for null values contained in data; 'propagate' returns nan; 'raise' will throw an exception; 'omit' will ignore null values; 'propagate' as default
- Returns:
statistic and \(p\)-value
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}
res = ht.hypoi_kw(data=data)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- hypoi_mood¶
perform Mood’s median and scale test among multi-grouped data.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
ties (Literal['below', 'above', 'ignore']) – determines how values equal to the grand median are classified; 'below' and 'above' count them as below and above respectively; 'ignore' will not count them; 'below' as default
power_lambda (float) – number used for power divergence; 1.0 as default for Pearson’s chi-squared statistic
nan_policy (Literal['propagate', 'raise', 'omit']) – strategy for null values contained in data; 'propagate' returns nan; 'raise' will throw an exception; 'omit' will ignore null values; 'propagate' as default
alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default
- Variables:
~full_return (bool) – if True, the median and contingency table from the median test will be returned as extra information as well; False as default
- Returns:
statistics and \(p\)-values for median and scale tests
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}
res = ht.hypoi_mood(data=data)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- hypoi_bartlett¶
perform Bartlett’s test among multi-grouped data.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
- Returns:
statistic and \(p\)-value
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}
res = ht.hypoi_bartlett(data=data)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- hypoi_levene¶
perform Levene test among multi-grouped data.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
center (Literal['mean', 'median', 'trimmed']) – the reference center used to determine the absolute distance of each observation; 'mean' uses the mean; 'median' uses the median; 'trimmed' uses the mean calculated from trimmed data; 'median' as default
proportiontocut (float) – fraction to be trimmed from the leftmost and rightmost ends; effective when center is 'trimmed'; valid values range from 0.0 to 0.5; 0.05 as default
- Returns:
statistic and \(p\)-value
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}
res = ht.hypoi_levene(data=data)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
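The role of center and proportiontocut is easier to see on a small sample with an outlier. The sketch below computes the absolute deviations that a Levene-type test works on; the function names are hypothetical, and the trimming rule (cut int(proportiontocut * n) observations from each sorted end) is an assumption, since the source does not spell it out.

```python
from statistics import fmean, median


def center_of(values, center="median", proportiontocut=0.05):
    """Reference point used before taking absolute deviations."""
    if center == "mean":
        return fmean(values)
    if center == "median":
        return median(values)
    if center == "trimmed":
        # Assumed rule: cut the same number of observations from both ends.
        ordered = sorted(values)
        cut = int(proportiontocut * len(ordered))
        return fmean(ordered[cut:len(ordered) - cut])
    raise ValueError(f"unknown center: {center!r}")


def absolute_deviations(values, **kwargs):
    """Distance of every observation from the chosen center."""
    c = center_of(values, **kwargs)
    return [abs(v - c) for v in values]


sample = [1, 2, 3, 4, 100]
dev_mean = absolute_deviations(sample, center="mean")
dev_median = absolute_deviations(sample, center="median")
dev_trimmed = absolute_deviations(sample, center="trimmed", proportiontocut=0.2)
print(dev_mean, dev_median, dev_trimmed)
```

With the outlier 100 in the sample, the mean-based deviations are inflated for every observation, while the median and the trimmed mean keep the center near the bulk of the data.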
- hypoi_fk¶
perform Fligner-Killeen test among multi-grouped data.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
center (Literal['mean', 'median', 'trimmed']) – the reference center used to determine the absolute distance of each observation; 'mean' uses the mean; 'median' uses the median; 'trimmed' uses the mean calculated from trimmed data; 'median' as default
proportiontocut (float) – fraction to be trimmed from the leftmost and rightmost ends; effective when center is 'trimmed'; valid values range from 0.0 to 0.5; 0.05 as default
- Returns:
statistic and \(p\)-value
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}
res = ht.hypoi_fk(data=data)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- hypoi_ad¶
perform Anderson-Darling test among multi-grouped data.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
midrank (bool) – type of Anderson-Darling test; True applies to continuous and discrete distributions; False uses the right side empirical distribution; True as default
- Variables:
~full_return (bool) – if True, critical values at different significance levels will be returned as extra information; False as default
- Returns:
statistic and \(p\)-value
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}
res = ht.hypoi_ad(data=data)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- hypoi_rank¶
perform rank sum test among multi-grouped data.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default
- Returns:
statistic and \(p\)-value
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}
res = ht.hypoi_rank(data=data)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- hypoi_es¶
perform Epps-Singleton test on each possible pair among multi-grouped data.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
es_t (tuple[float, float]) – points at which the characteristic function is to be evaluated; (0.4, 0.8) as default
- Returns:
statistic and \(p\)-value on each group
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}
res = ht.hypoi_es(data=data)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- hypoi_u¶
perform Mann–Whitney U test on each possible pair among multi-grouped data.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
method (Literal['asymptotic', 'exact', 'auto']) – the method to calculate the \(p\)-value; 'exact' uses exact distribution(s); 'asymptotic' uses approximate distribution(s); 'auto' uses one of the above options; 'auto' as default, which chooses 'exact' when the size of one of the samples is no greater than 8 and there are no ties, otherwise 'asymptotic'
alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default
u_continuity (bool) – whether to apply continuity correction; effective when method is 'asymptotic'; True as default
- Returns:
statistic and \(p\)-value on each group
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}
res = ht.hypoi_u(data=data)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
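Ties matter for the choice between 'exact' and 'asymptotic' because they change how the U statistic is counted. As a rough illustration, the sketch below computes U by direct pair counting with the common convention that a tie contributes one half; the function name mann_whitney_u is hypothetical and the code is not taken from the library.

```python
def mann_whitney_u(x, y):
    """U statistic of x versus y by direct pair counting."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5  # a tie counts as half a win
    return u


x, y = [1.1, 2.5, 3.9], [0.7, 2.5, 4.2, 5.0]
u_x = mann_whitney_u(x, y)
u_y = mann_whitney_u(y, x)
print(u_x, u_y)
```

The two statistics always sum to the product of the sample sizes, so reporting either one is sufficient.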
- hypoi_bm¶
perform Brunner-Munzel test on each possible pair among multi-grouped data.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
nan_policy (Literal['propagate', 'raise', 'omit']) – strategy for null values contained in data; 'propagate' returns nan; 'raise' will throw an exception; 'omit' will ignore null values; 'propagate' as default
alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default
bm_dis (Literal['t', 'normal']) – determines whether the \(p\)-value is calculated from the t or the normal distribution; "t" as default
- Returns:
statistic and \(p\)-value on each group
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}
res = ht.hypoi_bm(data=data)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- hypoi_ab¶
perform Ansari-Bradley test on each possible pair among multi-grouped data.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default
- Returns:
statistic and \(p\)-value on each group
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}
res = ht.hypoi_ab(data=data)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- hypoi_skew¶
perform skew test on each group among multi-grouped data.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
nan_policy (Literal['propagate', 'raise', 'omit']) – strategy for null values contained in data; 'propagate' returns nan; 'raise' will throw an exception; 'omit' will ignore null values; 'propagate' as default
alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default
- Returns:
statistic and \(p\)-value on each group
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}
res = ht.hypoi_skew(data=data)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- hypoi_kurtosis¶
perform kurtosis test on each group among multi-grouped data.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
nan_policy (Literal['propagate', 'raise', 'omit']) – strategy for null values contained in data; 'propagate' returns nan; 'raise' will throw an exception; 'omit' will ignore null values; 'propagate' as default
alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default
- Returns:
statistic and \(p\)-value on each group
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}
res = ht.hypoi_kurtosis(data=data)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- hypoi_jb¶
perform Jarque-Bera test on each group among multi-grouped data.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
- Returns:
statistic and \(p\)-value on each group
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}
res = ht.hypoi_jb(data=data)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- hypoi_pd¶
perform Cressie-Read power divergence test on each group among multi-grouped data.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
f_exp (Iterable[int]) – expected frequencies of all categories; None as default, which assumes equal frequencies for all categories
ddf (int) – number to be subtracted from the degrees of freedom; 0 as default, which uses \(k-1\) degrees of freedom where \(k\) is the number of all observations
pd_lambda (Numeric) – real value that determines the power of the statistic; 1 as default for the Pearson version
- Returns:
statistic and \(p\)-value on each group
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}
res = ht.hypoi_pd(data=data)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- hypoi_chi2¶
perform Chi-Squared test on each group among multi-grouped data.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
f_exp (Iterable[int]) – expected frequencies of all categories; None as default, which assumes equal frequencies for all categories
- Returns:
statistic and \(p\)-value on each group
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}
res = ht.hypoi_chi2(data=data)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- hypoj_pearson¶
compute Pearson correlation coefficient on each possible pair among multi-grouped data.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default
- Returns:
statistic and \(p\)-value
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}
res = ht.hypoj_pearson(data=data)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- hypoj_spearman¶
compute Spearman correlation coefficient on each possible pair among multi-grouped data.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
nan_policy (Literal['propagate', 'raise', 'omit']) – strategy for null values contained in data; 'propagate' returns nan; 'raise' will throw an exception; 'omit' will ignore null values; 'propagate' as default
alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default
- Returns:
statistic and \(p\)-value
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}
res = ht.hypoj_spearman(data=data)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- hypoj_kendall¶
compute Kendall’s tau correlation coefficient on each possible pair among multi-grouped data.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
nan_policy (Literal['propagate', 'raise', 'omit']) – strategy for null values contained in data; 'propagate' returns nan; 'raise' will throw an exception; 'omit' will ignore null values; 'propagate' as default
alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default
method (Literal['asymptotic', 'exact', 'auto']) – the method to calculate the \(p\)-value; 'exact' uses exact distribution(s); 'approx' uses double the single-tailed probability to approximate the two-tailed one; 'asymp' uses asymptotic distribution(s); 'auto' uses one of the above options; 'auto' as default
kendall_tau (Literal['b', 'c', 'w']) – determines the type of \(\tau\) to be calculated; 'b' uses Kendall \(\tau\); 'c' uses Stuart’s \(\tau\); 'w' will activate weighted \(\tau\).
rank (bool) – whether to use decreasing lexicographical rank; if False, the index of each element will be processed as its rank; effective when weighted \(\tau\) is activated; True as default
weigher (Optional[Callable]) – trigger to determine whether a weight is used when computing rank \(r\); an acceptable mapping must be able to convert a positive integer into a weight (e.g. \(f(r) = (1+r)^{-1}\)); None as default to use no weight
additive (bool) – determines how the weight is applied to the statistic; if True, the weight will be processed as a term to be added; otherwise as a factor to be multiplied; effective when weighted \(\tau\) is activated; True as default
- Returns:
statistic and \(p\)-value
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}
res = ht.hypoj_kendall(data=data)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- hypoj_t¶
perform pair-wise related T test among multi-grouped data. statistic uses Equation 4.9.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
nan_policy (Literal['propagate', 'raise', 'omit']) – strategy for null values contained in data; 'propagate' returns nan; 'raise' will throw an exception; 'omit' will ignore null values; 'propagate' as default
alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default
- Variables:
~full_return (bool) – if True, the degrees of freedom will be returned as extra information as well; False as default
- Returns:
t statistics and \(p\)-values on pair-wised groups
- Return type:
dict
- Examples:
- See also:
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
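A related (paired) t test reduces to a one-sample test on the element-wise differences, which is why the paired groups must have the same size. The sketch below uses the standard textbook formula and the standard library only; Equation 4.9 is not reproduced in this section, so its notation may differ, and the function name paired_t is hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev


def paired_t(a, b):
    """t statistic and degrees of freedom for two paired samples."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same size")
    # Work on the per-subject differences.
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t_stat = fmean(diffs) / (stdev(diffs) / sqrt(n))
    return t_stat, n - 1


before, after = [1, 2, 3, 4], [2, 4, 5, 8]
t_stat, dof = paired_t(before, after)
print(t_stat, dof)
```

The second return value corresponds to the degrees of freedom that the ~full_return option is documented to expose.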
- hypoj_rank¶
perform signed-rank test among multi-grouped data.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
zero_method (Literal['wilcox', 'pratt', 'zsplit']) – method for counting the pairs with equal values; 'wilcox' ignores those cases; 'pratt' includes those cases only in the ranking process; 'zsplit' includes those cases in the ranking process and splits them half-and-half between positive and negative counts; 'wilcox' as default
correction (bool) – whether to apply continuity correction to adjust the rank statistic if the normal approximation is used; False as default
alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default
method (Literal['exact', 'approx', 'auto']) – the method to calculate the \(p\)-value; 'exact' uses exact distribution(s); 'approx' uses approximate distribution(s); 'auto' uses one of the above options; 'auto' as default
- Variables:
~full_return (bool) – if True, the \(Z\) statistic will be returned; False as default
- Returns:
statistic and \(p\)-value
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}
res = ht.hypoj_rank(data=data)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- hypoj_friedman¶
perform Friedman test among multi-grouped data.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
- Returns:
statistic and \(p\)-value
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}
res = ht.hypoj_friedman(data=data)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- hypoj_mgc¶
perform Multiscale Graph Correlation test on each possible pair among multi high-dimensional data.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
distance_criteria (Callable) – criterion to measure the distance between two elements when calculating the distance matrix; lambda x, y: np.linalg.norm(x-y, ord=2, axis=0) as default to calculate the Euclidean distance
n_resamples (int) – number of resampled permutations used to calculate the \(p\)-value; 1000 as default
random_state (Optional[int]) – random state to control random sample generation; None as default
- Variables:
~full_return (bool) – if True, the scale map, optimal scales, and random points for the null distribution will be returned as extra information as well; False as default
- Returns:
statistic and \(p\)-value
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random((10, 5)) for _ in range(3)}
res = ht.hypoj_mgc(data=data)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
- hypos_mc¶
perform Monte Carlo hypothesis test on each group among multi-grouped data.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
dist (Union[dist, list[dist]]) – predefined distribution(s); the standard univariate Gaussian norm(loc=0, scale=1) as default
n_resamples (int) – number of resampled data points generated from the predefined distribution(s); 9999 as default
agg_statistics (dict[str, Callable]) – dict composed of name and aggregation function mapping used to calculate the statistic; {'mean': lambda x: numpy.mean(x)} as default
batch (Optional[int]) – number of samples used for each call of the values in agg_statistics; None as default, which equals n_resamples
alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default
- Variables:
~full_return (bool) – if True, random points for the null distribution will be returned as extra information as well; False as default
- Returns:
statistic and \(p\)-value on each group
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}
res = ht.hypos_mc(data=data)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
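The idea behind a Monte Carlo hypothesis test can be sketched in a few lines: draw many samples from the predefined null distribution, compute the aggregation statistic on each, and count how often it is at least as extreme as the observed one. The code below is a simplified illustration using the standard library, a standard normal null and the mean as statistic; the function name monte_carlo_p, the symmetric two-sided rule based on absolute values, and the (extreme + 1) / (n_resamples + 1) estimator are assumptions, not the library's confirmed behaviour.

```python
import random
from statistics import fmean


def monte_carlo_p(sample, statistic=fmean, n_resamples=999, seed=0):
    """Two-sided Monte Carlo p-value against a standard normal null."""
    rng = random.Random(seed)
    observed = statistic(sample)
    extreme = 0
    for _ in range(n_resamples):
        # Draw a null sample of the same size from N(0, 1).
        null_sample = [rng.gauss(0.0, 1.0) for _ in range(len(sample))]
        if abs(statistic(null_sample)) >= abs(observed):
            extreme += 1
    # The +1 terms count the observed sample itself, so p is never 0.
    return observed, (extreme + 1) / (n_resamples + 1)


shifted = [2.6, 3.1, 2.9, 3.4, 2.7, 3.0, 3.3, 2.8]
obs, p_value = monte_carlo_p(shifted)
print(obs, p_value)
```

Because the sample is centered near 3 while the null is centered at 0, almost no resampled mean reaches the observed one and the estimated p-value is small.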
- hypos_permu¶
perform Permutation test on each possible permutation of groups among multi-grouped data.
- Arguments:
- Parameters:
data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values
permu_type (Literal['independent', 'samples', 'pairings']) – permutation type; 'samples' and 'pairings' require all data under comparison to have the same size; 'independent' assumes all input data are independent; 'independent' as default
n_resamples (int) – number of resampled datapoints generated from predefined distribution(s); 9999 as default
binding_groups (int) – number of groups for each call of the test; an acceptable value is an integer equal to or greater than 2; 2 as default
agg_statistics (dict[str, Callable]) – dict composed of name and aggregation mapping used to calculate the statistic; {'std_of_mean': lambda *x: np.std([np.mean(_) for _ in x])} as default
batch (Optional[int]) – number of samples used for each call of the values in agg_statistics; None as default, which equals n_resamples
alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default
- Variables:
~full_return (bool) – if True, random points for null distribution will be returned as extra information as well; False as default
- Returns:
statistic and \(p\)-value on each group
- Return type:
dict
- Examples:
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}
res = ht.hypos_permu(data=data)
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
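The idea behind an 'independent' permutation test with binding_groups=2 can be sketched with the standard library as follows. This is an illustration under stated assumptions, not the code of hypos_permu: permutation_pvalue and std_of_means are hypothetical names, the groups are enumerated with itertools.combinations (the library may enumerate them differently), and only the one-sided 'greater' p-value is computed because the statistic is non-negative.

```python
import random
import statistics
from itertools import combinations


def std_of_means(*groups):
    """Spread of the group means, mirroring the documented default statistic."""
    return statistics.pstdev([statistics.fmean(g) for g in groups])


def permutation_pvalue(groups, statistic=std_of_means, n_resamples=9999, seed=None):
    """Hypothetical helper: independent-sample permutation test ('greater')."""
    rng = random.Random(seed)
    sizes = [len(g) for g in groups]
    pooled = [x for g in groups for x in g]
    observed = statistic(*groups)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # reassign observations to groups at random
        start, resampled = 0, []
        for size in sizes:
            resampled.append(pooled[start:start + size])
            start += size
        hits += statistic(*resampled) >= observed
    # The +1 terms count the observed arrangement itself.
    return observed, (hits + 1) / (n_resamples + 1)


rng = random.Random(0)
data = {f"group{i + 1}": [rng.random() for _ in range(20)] for i in range(3)}
binding_groups = 2  # number of groups compared in each call
res = {
    names: permutation_pvalue([data[n] for n in names], n_resamples=999, seed=1)
    for names in combinations(data, binding_groups)
}
```

Because observations are pooled and reassigned freely, the groups may have different sizes here; the 'samples' and 'pairings' types would instead shuffle within matched observations, which is why they need equal sizes.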
3.1.4.2. Module factors¶
3.1.4.2.1. Description¶
The module factors supports scientific experiment design, data exploration, and related tasks. It allows researchers to extract meaningful patterns and relationships from complex datasets. Refer to the supplementary material for its scientific background.
Similarly, import through the main entry info.me is available (like from info.me import priori_scoring).
priori_scoring: priori scoring implementation for multi factors analysis.
3.1.4.2.2. Docstrings¶
- priori_scoring¶
priori scoring implementation for multi factors analysis.
- Arguments:
- Parameters:
data (DataFrame) – table indexed by multi-factor labels, whose columns are un-ranked
constructor (dict[str, list[str]]) – constructor used to parse the factors and corresponding levels in the index of data; a dict with factor names as keys and lists of level names as the corresponding values
response_dimensions (list[str]) – list of factors expected to affect the final numeric values; the selection should follow common sense or expertise in the field
inertia_dimensions (Optional[list[str]]) – list of factors not expected to affect the final numeric values; None as default, which automatically uses the unselected factors based on constructor and response_dimensions
measure (Optional[Callable]) – callback aggregation function that maps the rearranged pseudo-tensor to a scalar; None as default, which uses normality combined with ANOVA to measure the extent to which the data depart from the priori hypothesis
empty_value (Optional[Any]) – value used to fill non-existent factor combinations; a customized measure function should be able to handle this value; numpy.nan as default
score_output (Optional[bool]) – whether to export the final scores for all column names; False as default
- Returns:
a dict composed of importance level, and column names (and corresponding scores) in that level
- Return type:
dict[str, ndarray]
- Examples:
from info.me import priori_scoring
from itertools import product
import numpy as np
import pandas as pd

cons = {
    'A': ['a1', 'a2'],
    'B': ['b1', 'b2', 'b3'],
    'C': ['c1', 'c2']
}
index = np.repeat(['-'.join(_) for _ in product(*[v for k, v in cons.items()])], 10)
where_c1 = np.array(['c1' in _ for _ in index])
columns = np.array([f"group_{_+1}" for _ in range(20)])
_values = np.random.random((len(index), len(columns)))
values = np.array([vec * 10.8 if c1 else vec * 0.3 for c1, vec in zip(where_c1, _values)])
df = pd.DataFrame(values, index=index, columns=columns)
#            group_1   group_2   group_3  ...  group_18  group_19  group_20
# a1-b1-c1  8.330263  0.224121  6.843401  ...  3.152262  9.911961  7.717418
# a1-b1-c1  5.859479  1.535437  4.032080  ...  8.949758  0.506480  6.763901
# ...            ...       ...       ...  ...       ...       ...       ...
# a2-b3-c2  0.205181  0.175918  0.293796  ...  0.020738  0.017385  0.094473
# a2-b3-c2  0.162649  0.077234  0.133392  ...  0.122661  0.200381  0.172522

res = priori_scoring(data=df, constructor=cons, response_dimensions=['C'], score_output=True)
# {'importance_level_0': array([['group_11', 10.583273152581747]]),  # most discriminative
#  'importance_level_1': array([['group_1', 5.543840398683746],
#                               ['group_2', 6.006970191046672],
#                               ['group_3', 4.691317734172809],
#  ...}
- Logs:
Added in version 0.0.3.
– Created by Chen Zhang; Last updated on 01:34, 2025-09-06
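The relationship between constructor, response_dimensions and inertia_dimensions may be easier to follow with a small standard-library sketch. It only illustrates how index labels such as 'a1-b1-c1' could be parsed into factor levels and grouped by the response factor. The helpers parse_label and split_dimensions are hypothetical, the '-' separator is taken from the example above, and none of this is claimed to be how priori_scoring works internally.

```python
from itertools import product

cons = {"A": ["a1", "a2"], "B": ["b1", "b2", "b3"], "C": ["c1", "c2"]}

# Every factor combination, joined with '-' as in the example index.
labels = ["-".join(levels) for levels in product(*cons.values())]


def parse_label(label, constructor, sep="-"):
    """Hypothetical helper: map 'a1-b1-c1' to {'A': 'a1', 'B': 'b1', 'C': 'c1'}."""
    parsed = {}
    for token in label.split(sep):
        for factor, levels in constructor.items():
            if token in levels:
                parsed[factor] = token
    return parsed


def split_dimensions(constructor, response_dimensions, inertia_dimensions=None):
    """Default the inertia dimensions to the factors not chosen as responses."""
    if inertia_dimensions is None:
        inertia_dimensions = [f for f in constructor if f not in response_dimensions]
    return list(response_dimensions), list(inertia_dimensions)


response, inertia = split_dimensions(cons, ["C"])

# Group the index labels by their level(s) on the response dimension(s).
by_level = {}
for label in labels:
    key = tuple(parse_label(label, cons)[f] for f in response)
    by_level.setdefault(key, []).append(label)
```

With response_dimensions=['C'], the twelve factor combinations fall into two groups of six (one per level of C), and A and B become the inertia dimensions by default.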
- Authors:
Chen Zhang
- Version:
0.0.5
- Created on:
Jun 30, 2023