Statistical functions
Hypothesis testing
- sdt.stats.permutation_test(data1, data2, statistic=<function mean>, data_column=None, **kwargs)
Permutation test for two independent samples
Test the null hypothesis that two samples are not distinguishable by statistic. The null distribution is created from the difference of statistic applied to (permuted) datasets.
- Parameters:
data1 (DataFrame | ndarray) – Datasets
data2 (DataFrame | ndarray) – Datasets
statistic (Callable[[ndarray], float]) – Which statistic (e.g., mean, median, …) to compare for data1 and data2
data_column (Optional) – Use this column if data1 or data2 is a pandas.DataFrame
**kwargs – Passed to scipy.stats.permutation_test()
- Returns:
Test result containing the observed difference of statistic (i.e., statistic(data1) - statistic(data2)), the p-value, and the null distribution generated by permuting the datasets.
- Return type:
PermutationTestResult
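The procedure described above can be illustrated with a minimal NumPy-only sketch. This is not the sdt implementation, which according to the parameter list forwards to scipy.stats.permutation_test(). The function name permutation_test_sketch, the n_resamples and seed parameters, and the two-sided p-value with a +1 correction are assumptions made for illustration only.

```python
import numpy as np


def permutation_test_sketch(data1, data2, statistic=np.mean,
                            n_resamples=999, seed=0):
    """Hypothetical helper: two-sample permutation test on
    statistic(data1) - statistic(data2)."""
    rng = np.random.default_rng(seed)
    data1 = np.asarray(data1, dtype=float)
    data2 = np.asarray(data2, dtype=float)

    # Observed difference of the statistic between the two samples
    observed = statistic(data1) - statistic(data2)

    # Under the null hypothesis the sample labels are exchangeable, so
    # pool all datapoints and reassign them at random.
    pooled = np.concatenate([data1, data2])
    n1 = len(data1)
    null = np.empty(n_resamples)
    for i in range(n_resamples):
        perm = rng.permutation(pooled)
        null[i] = statistic(perm[:n1]) - statistic(perm[n1:])

    # Two-sided p-value (assumption); the +1 terms count the observed
    # arrangement itself, so the p-value is never exactly zero.
    n_extreme = np.sum(np.abs(null) >= np.abs(observed))
    pvalue = (n_extreme + 1) / (n_resamples + 1)
    return observed, pvalue, null


# Usage with synthetic data: two samples whose means differ by 0.5
rng = np.random.default_rng(42)
sample1 = rng.normal(0.0, 1.0, size=100)
sample2 = rng.normal(0.5, 1.0, size=100)
observed, pvalue, null = permutation_test_sketch(sample1, sample2)
print(f"observed difference: {observed:.3f}, p-value: {pvalue:.3f}")
```

Passing a different callable as statistic (for example np.median) changes the quantity being compared without changing the resampling scheme.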
- sdt.stats.grouped_permutation_test(data1, data2, statistic=<function mean>, data_column=None, group_column=None, **kwargs)
Grouped permutation test for two partly correlated samples
Test the null hypothesis that two samples are not distinguishable by statistic. The null distribution is created from the difference of statistic applied to (permuted) datasets. Groups of datapoints are left in order. Thus datapoints within groups may be correlated (such as single-molecule trajectories); see [Schn2022].
- Parameters:
data1 (DataFrame | Iterable[ndarray] | SeriesGroupBy) – Grouped datasets. Either a column of a pandas DataFrame GroupBy, e.g. pandas.DataFrame(…).groupby("particle")["some_value"], or an iterable of arrays where each array represents a correlated block of data.
data2 (DataFrame | Iterable[ndarray] | SeriesGroupBy) – Grouped datasets. Either a column of a pandas DataFrame GroupBy, e.g. pandas.DataFrame(…).groupby("particle")["some_value"], or an iterable of arrays where each array represents a correlated block of data.
statistic (Callable[[ndarray], float]) – Which statistic (e.g., mean, median, …) to compare for data1 and data2
data_column (Optional) – Use this column if data1 or data2 are pandas.DataFrames
group_column (Optional) – Use this column to determine groups if data1 or data2 are pandas.DataFrames
**kwargs – Passed to scipy.stats.permutation_test(). Note that this cannot be vectorized.
- Returns:
Test result containing the observed difference of statistic (i.e., statistic(data1) - statistic(data2)), the p-value, and the null distribution generated by permuting the datasets.
- Return type:
PermutationTestResult
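A minimal NumPy-only sketch of one way to permute grouped data is shown here. It assumes that whole groups are reassigned between the two samples while the datapoints inside each group stay together. Whether sdt implements the test exactly this way is not confirmed by this page. The function name grouped_permutation_sketch and its n_resamples and seed parameters are hypothetical.

```python
import numpy as np


def grouped_permutation_sketch(groups1, groups2, statistic=np.mean,
                               n_resamples=999, seed=0):
    """Hypothetical helper: permutation test that shuffles whole groups
    (e.g., trajectories) instead of individual datapoints."""
    rng = np.random.default_rng(seed)
    groups1 = [np.asarray(g, dtype=float) for g in groups1]
    groups2 = [np.asarray(g, dtype=float) for g in groups2]
    blocks = groups1 + groups2
    n1 = len(groups1)

    def difference(order):
        # The first n1 blocks form sample 1, the remaining ones sample 2.
        first = np.concatenate([blocks[i] for i in order[:n1]])
        second = np.concatenate([blocks[i] for i in order[n1:]])
        return statistic(first) - statistic(second)

    observed = difference(np.arange(len(blocks)))

    # Only the assignment of blocks to samples is permuted; datapoints
    # within a block are never separated, so correlations within a
    # block are preserved under the null distribution.
    null = np.array([difference(rng.permutation(len(blocks)))
                     for _ in range(n_resamples)])

    # Two-sided p-value with +1 correction (assumption)
    n_extreme = np.sum(np.abs(null) >= np.abs(observed))
    pvalue = (n_extreme + 1) / (n_resamples + 1)
    return observed, pvalue, null


# Usage with synthetic data: each group shares a random offset, which
# makes the datapoints within a group correlated.
rng = np.random.default_rng(7)
sample1 = [rng.normal(rng.normal(0.0, 1.0), 0.1, size=30) for _ in range(12)]
sample2 = [rng.normal(rng.normal(0.3, 1.0), 0.1, size=30) for _ in range(12)]
observed, pvalue, null = grouped_permutation_sketch(sample1, sample2)
print(f"observed difference: {observed:.3f}, p-value: {pvalue:.3f}")
```

In this synthetic example the effective sample size is closer to the number of groups than to the number of datapoints, which is the situation the grouped test is meant for.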
Probability density function estimation
- sdt.stats.avg_shifted_hist(x, nbins, nshifts, limits=None, density=False)
Average shifted histogram
- Parameters:
x (ArrayLike) – values for which to generate average shifted histogram
nbins (int | Literal['terrell-scott', 'rice', 'sqrt', 'sturges', 'scott']) – either literal number of histogram bins or name of the method to determine the number (see https://en.wikipedia.org/wiki/Histogram)
nshifts (int) – number of shifts
limits (Tuple[float, float] | None) – histogram x axis range. If None, use min and max.
density (bool) – y axis scale. If True, generate probability density, if False use the number of events
- Returns:
y axis value for each bin and bin edges
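To illustrate the technique, here is a minimal sketch of an average shifted histogram built from a fine histogram smoothed with triangular weights. It is not the sdt implementation. The name ash_sketch is hypothetical, only an integer nbins is supported, and extending the bin edges beyond the limits (so that no counts are lost at the borders) is an assumption of this sketch.

```python
import numpy as np


def ash_sketch(x, nbins, nshifts, limits=None, density=False):
    """Hypothetical helper: average of `nshifts` histograms with `nbins`
    bins each, shifted against each other by 1/nshifts of a bin width."""
    x = np.asarray(x, dtype=float)
    lo, hi = (x.min(), x.max()) if limits is None else limits
    width = (hi - lo) / nbins   # bin width of each shifted histogram
    delta = width / nshifts     # shift between consecutive histograms

    # Histogram on a fine grid: every coarse bin is split into nshifts
    # fine bins of width delta.
    n_fine = nbins * nshifts
    fine_counts, _ = np.histogram(x, bins=n_fine, range=(lo, hi))

    # Averaging the shifted histograms is equivalent to weighting the
    # fine counts with a triangle: (1, 2, ..., nshifts, ..., 2, 1) / nshifts
    kernel = np.concatenate([np.arange(1, nshifts + 1),
                             np.arange(nshifts - 1, 0, -1)]) / nshifts
    y = np.convolve(fine_counts, kernel, mode="full")

    # "full" mode adds nshifts - 1 fine bins on each side of the limits.
    edges = lo + (np.arange(len(y) + 1) - (nshifts - 1)) * delta

    if density:
        # Scale so that the area under the curve equals 1
        y = y / (fine_counts.sum() * width)
    return y, edges


# Usage with synthetic data
rng = np.random.default_rng(0)
data = rng.normal(size=1000)
y, edges = ash_sketch(data, nbins=10, nshifts=5, density=True)
print(len(y), len(edges))
```

With nshifts=1 the triangle collapses to a single weight and the result reduces to an ordinary histogram; larger values give a smoother estimate at the same bin width.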
References
[Schn2022] Schneider, M. C. & Schütz, G. J.: “Don’t Be Fooled by Randomness: Valid p-Values for Single Molecule Microscopy”, Frontiers in Bioinformatics, 2022, 2, 811053