Statistical functions

Hypothesis testing

sdt.stats.permutation_test(data1, data2, statistic=<function mean>, data_column=None, **kwargs)

Permutation test for two independent samples

Test the null hypothesis that two samples are not distinguishable by statistic. The null distribution is created from the difference of statistic applied to (permuted) datasets.

Parameters:

data1 (DataFrame | ndarray) – First dataset
data2 (DataFrame | ndarray) – Second dataset
statistic (Callable[[ndarray], float]) – Which statistic (e.g., mean, median, …) to compare between data1 and data2
data_column (Optional) – Column to use if data1 or data2 is a pandas.DataFrame
**kwargs – Passed to scipy.stats.permutation_test()

Returns:

Test result containing observed difference of statistic (i.e., statistic(data1) - statistic(data2)), p-value, and null distribution generated from permuting datasets.

Return type:

PermutationTestResult
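
To illustrate the idea behind the test described above, here is a minimal standard-library sketch of a two-sided permutation test on the difference of a statistic. It is not the sdt.stats implementation (which, per the parameter list, forwards to scipy.stats.permutation_test()); the function name permutation_test_sketch, the n_resamples and seed parameters, and the sample values are assumptions made for this example.

```python
import random
from statistics import mean


def permutation_test_sketch(data1, data2, statistic=mean, n_resamples=2000, seed=0):
    """Two-sided permutation test on statistic(data1) - statistic(data2).

    Hypothetical helper for illustration only; not part of sdt.stats.
    """
    rng = random.Random(seed)
    observed = statistic(data1) - statistic(data2)

    # Under the null hypothesis the labels are exchangeable, so pool the data
    # and reassign datapoints to the two samples at random.
    pooled = list(data1) + list(data2)
    n1 = len(data1)
    null = []
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        null.append(statistic(pooled[:n1]) - statistic(pooled[n1:]))

    # Fraction of permuted differences at least as extreme as the observed one.
    # The +1 terms count the observed arrangement itself and avoid p = 0.
    n_extreme = sum(abs(d) >= abs(observed) for d in null)
    pvalue = (n_extreme + 1) / (n_resamples + 1)
    return observed, pvalue, null


# Made-up example data: two clearly separated samples
sample_a = [1.0, 1.2, 0.9, 1.1, 1.3, 1.0]
sample_b = [2.0, 2.2, 1.9, 2.1, 2.3, 2.0]
observed, pvalue, null = permutation_test_sketch(sample_a, sample_b)
```

The three returned values mirror the contents listed for the test result above: the observed difference, the p-value, and the null distribution.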

sdt.stats.grouped_permutation_test(data1, data2, statistic=<function mean>, data_column=None, group_column=None, **kwargs)

Grouped permutation test for two partly correlated samples

Test the null hypothesis that two samples are not distinguishable by statistic. The null distribution is created from the difference of statistic applied to (permuted) datasets. Groups of datapoints are kept intact during permutation, so datapoints within a group may be correlated (as in single-molecule trajectories); see [Schn2022].

Parameters:

data1 (DataFrame | Iterable[ndarray] | SeriesGroupBy) – First grouped dataset. Either a column of a pandas DataFrame GroupBy, e.g., pandas.DataFrame(…).groupby("particle")["some_value"], or an iterable of arrays where each array represents a correlated block of data.
data2 (DataFrame | Iterable[ndarray] | SeriesGroupBy) – Second grouped dataset. Same accepted forms as data1.
statistic (Callable[[ndarray], float]) – Which statistic (e.g., mean, median, …) to compare between data1 and data2
data_column (Optional) – Column to use if data1 or data2 is a pandas.DataFrame
group_column (Optional) – Column used to determine groups if data1 or data2 is a pandas.DataFrame
**kwargs – Passed to scipy.stats.permutation_test(). Note that this cannot be vectorized.

Returns:

Test result containing observed difference of statistic (i.e., statistic(data1) - statistic(data2)), p-value, and null distribution generated from permuting datasets.

Return type:

PermutationTestResult
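
The following standard-library sketch shows one plausible reading of the grouped scheme described above: whole groups, rather than individual datapoints, are shuffled between the two datasets, so any correlation within a group is preserved. It is an illustration under that assumption, not the sdt.stats implementation; grouped_permutation_test_sketch, n_resamples, seed, and the example values are hypothetical.

```python
import random
from statistics import mean


def grouped_permutation_test_sketch(groups1, groups2, statistic=mean,
                                    n_resamples=2000, seed=0):
    """Permutation test that reassigns whole groups (e.g., trajectories).

    Hypothetical helper for illustration only; not part of sdt.stats.
    """
    rng = random.Random(seed)

    def stat_of(groups):
        # The statistic is computed on all datapoints of all groups in a dataset
        return statistic([value for group in groups for value in group])

    observed = stat_of(groups1) - stat_of(groups2)

    # Pool the groups, not the datapoints: each group stays intact and in order.
    pooled = list(groups1) + list(groups2)
    n1 = len(groups1)
    null = []
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        null.append(stat_of(pooled[:n1]) - stat_of(pooled[n1:]))

    # Two-sided p-value with the usual +1 correction
    n_extreme = sum(abs(d) >= abs(observed) for d in null)
    pvalue = (n_extreme + 1) / (n_resamples + 1)
    return observed, pvalue, null


# Made-up example: each inner list stands for one correlated block of data
groups_a = [[1.0, 1.1, 0.9], [1.2, 1.3], [0.8, 0.9, 1.0], [1.1, 1.0], [0.9, 1.2]]
groups_b = [[2.0, 2.1], [1.9, 2.2, 2.0], [2.3, 2.1], [1.8, 2.0, 1.9], [2.2, 1.9]]
observed, pvalue, null = grouped_permutation_test_sketch(groups_a, groups_b)
```

Because only group labels are exchanged, the number of distinct permutations depends on the number of groups, not the number of datapoints. With few groups, the smallest attainable p-value is therefore limited.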

Probability density function estimation

sdt.stats.avg_shifted_hist(x, nbins, nshifts, limits=None, density=False)

Average shifted histogram

Parameters:

x (ArrayLike) – values for which to generate average shifted histogram
nbins (int | Literal['terrell-scott', 'rice', 'sqrt', 'sturges', 'scott']) – either literal number of histogram bins or name of the method to determine the number (see https://en.wikipedia.org/wiki/Histogram)
nshifts (int) – number of shifts
limits (Tuple[float, float] | None) – histogram x axis range. If None, use min and max.
density (bool) – y axis scale. If True, generate a probability density; if False, use the number of events

Returns:

y axis value for each bin and the bin edges
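
As a rough illustration of the technique, the sketch below builds an average shifted histogram in plain Python. It counts events in nbins * nshifts narrow bins and then applies triangular weights, which is equivalent to averaging nshifts histograms of nbins bins each, shifted against each other by a fraction of the bin width. This is not the sdt.stats implementation: avg_shifted_hist_sketch is a hypothetical name, only integer nbins is handled (the named rules such as 'sturges' are omitted), the returned layout (one value per narrow bin) is an assumption, and narrow bins outside limits are treated as empty.

```python
def avg_shifted_hist_sketch(x, nbins, nshifts, limits=None, density=False):
    """Average shifted histogram on a fine grid of nbins * nshifts bins.

    Hypothetical helper for illustration only; not part of sdt.stats.
    Assumes limits[0] < limits[1] and at least one value inside limits.
    """
    lo, hi = limits if limits is not None else (min(x), max(x))
    n_fine = nbins * nshifts
    delta = (hi - lo) / n_fine  # width of one narrow bin

    # Count events in the narrow bins; a value equal to hi goes in the last bin
    fine = [0] * n_fine
    for value in x:
        if lo <= value <= hi:
            k = min(int((value - lo) / delta), n_fine - 1)
            fine[k] += 1

    # Averaging the shifted histograms amounts to a triangular-weighted sum
    # over neighbouring narrow bins.
    y = []
    for k in range(n_fine):
        total = 0.0
        for i in range(1 - nshifts, nshifts):
            if 0 <= k + i < n_fine:
                total += (1 - abs(i) / nshifts) * fine[k + i]
        y.append(total)

    if density:
        n_events = sum(fine)
        bin_width = delta * nshifts  # width of the original (wide) bins
        y = [value / (n_events * bin_width) for value in y]

    edges = [lo + k * delta for k in range(n_fine + 1)]
    return y, edges


# Made-up example data, with limits wide enough that no weight is cut off
values = [2.0, 2.1, 2.2, 2.9, 3.0, 3.1, 3.8]
y, edges = avg_shifted_hist_sketch(values, nbins=6, nshifts=4,
                                   limits=(0.0, 6.0), density=True)
```

With density=True and limits that extend at least one wide bin beyond the data, the returned values integrate to 1 over the narrow bins. Data close to the limits lose some weight in this sketch because bins outside the range are ignored.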

References

[Schn2022]

Schneider, M. C. & Schütz, G. J.: “Don’t Be Fooled by Randomness: Valid p-Values for Single Molecule Microscopy”, Frontiers in Bioinformatics, 2022, 2, 811053
