Statistical functions

Hypothesis testing

sdt.stats.permutation_test(data1, data2, statistic=<function mean>, data_column=None, **kwargs)

Permutation test for two independent samples

Test the null hypothesis that two samples are not distinguishable by statistic. The null distribution is created from the difference of statistic applied to (permuted) datasets.

Parameters:
  • data1 (DataFrame | ndarray) – First dataset

  • data2 (DataFrame | ndarray) – Second dataset

  • statistic (Callable[[ndarray], float]) – Which statistic (e.g., mean, median, …) to compare for data1 and data2

  • data_column (Optional) – Column to use if data1 or data2 is a pandas.DataFrame

  • **kwargs – Passed to scipy.stats.permutation_test()

Returns:

Test result containing observed difference of statistic (i.e., statistic(data1) - statistic(data2)), p-value, and null distribution generated from permuting datasets.

Return type:

PermutationTestResult
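As a rough illustration of the kind of test described above, the following sketch calls scipy.stats.permutation_test() directly, since that is where **kwargs are forwarded. It is not the sdt implementation: the sample data, the helper name mean_difference, and the chosen keyword values are assumptions made for this example only.

```python
import numpy as np
from scipy import stats

# Synthetic samples (assumed for illustration only)
rng = np.random.default_rng(0)
data1 = rng.normal(loc=0.0, scale=1.0, size=50)
data2 = rng.normal(loc=1.0, scale=1.0, size=60)


def mean_difference(x, y):
    # Hypothetical helper following the convention stated above:
    # statistic(data1) - statistic(data2)
    return np.mean(x) - np.mean(y)


result = stats.permutation_test(
    (data1, data2),
    mean_difference,
    permutation_type="independent",  # samples are pooled and reassigned
    vectorized=False,                # the statistic receives 1D arrays
    n_resamples=999,
    random_state=0,
)

print("observed difference:", result.statistic)
print("p-value:", result.pvalue)
print("null distribution size:", result.null_distribution.shape)
```

Keyword arguments such as n_resamples or alternative are the sort of options one would expect to pass through **kwargs.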

sdt.stats.grouped_permutation_test(data1, data2, statistic=<function mean>, data_column=None, group_column=None, **kwargs)

Grouped permutation test for two partly correlated samples

Test the null hypothesis that two samples are not distinguishable by statistic. The null distribution is created from the difference of statistic applied to (permuted) datasets. Groups of datapoints are kept intact when permuting, so datapoints within a group may be correlated (e.g., datapoints from the same single-molecule trajectory); see [Schn2022].

Parameters:
  • data1 (DataFrame | Iterable[ndarray] | SeriesGroupBy) – Grouped datasets. Either a column of a pandas DataFrame GroupBy, e.g. pandas.DataFrame(…).groupby("particle")["some_value"] or an iterable of arrays where each array represents a correlated block of data.

  • data2 (DataFrame | Iterable[ndarray] | SeriesGroupBy) – Grouped datasets. Either a column of a pandas DataFrame GroupBy, e.g. pandas.DataFrame(…).groupby("particle")["some_value"] or an iterable of arrays where each array represents a correlated block of data.

  • statistic (Callable[[ndarray], float]) – Which statistic (e.g., mean, median, …) to compare for data1 and data2

  • data_column (Optional) – Column to use if data1 or data2 is a pandas.DataFrame

  • group_column (Optional) – Column used to determine groups if data1 or data2 is a pandas.DataFrame

  • **kwargs – Passed to scipy.stats.permutation_test(). Note that this cannot be vectorized.

Returns:

Test result containing observed difference of statistic (i.e., statistic(data1) - statistic(data2)), p-value, and null distribution generated from permuting datasets.

Return type:

PermutationTestResult
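The idea of permuting whole groups rather than single datapoints can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the sdt implementation: the function name grouped_permutation_sketch, the two-sided p-value with a +1 correction, and the synthetic data are all choices made for this example.

```python
import numpy as np


def grouped_permutation_sketch(groups1, groups2, statistic=np.mean,
                               n_resamples=999, seed=0):
    """Hypothetical helper: permute whole groups between two samples."""
    rng = np.random.default_rng(seed)
    groups = list(groups1) + list(groups2)
    n1 = len(groups1)

    # Observed difference, statistic(data1) - statistic(data2)
    observed = (statistic(np.concatenate(groups1))
                - statistic(np.concatenate(groups2)))

    null = np.empty(n_resamples)
    for i in range(n_resamples):
        # Shuffle group labels; datapoints inside a group stay together
        order = rng.permutation(len(groups))
        perm1 = np.concatenate([groups[j] for j in order[:n1]])
        perm2 = np.concatenate([groups[j] for j in order[n1:]])
        null[i] = statistic(perm1) - statistic(perm2)

    # Two-sided p-value with +1 correction (an assumption of this sketch)
    n_extreme = np.count_nonzero(np.abs(null) >= abs(observed))
    pvalue = (n_extreme + 1) / (n_resamples + 1)
    return observed, pvalue, null


# Synthetic example: each group has its own random offset, so datapoints
# within a group are correlated
rng = np.random.default_rng(1)
groups1 = [rng.normal(rng.normal(0.0, 0.5), 1.0, size=20) for _ in range(8)]
groups2 = [rng.normal(rng.normal(0.0, 0.5), 1.0, size=20) for _ in range(8)]

observed, pvalue, null = grouped_permutation_sketch(groups1, groups2)
print(observed, pvalue)
```

Because groups move as blocks, the effective sample size is the number of groups, not the number of datapoints.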

Probability density function estimation

sdt.stats.avg_shifted_hist(x, nbins, nshifts, limits=None, density=False)

Average shifted histogram

Parameters:
  • x (ArrayLike) – values for which to generate average shifted histogram

  • nbins (int | Literal['terrell-scott', 'rice', 'sqrt', 'sturges', 'scott']) – either literal number of histogram bins or name of the method to determine the number (see https://en.wikipedia.org/wiki/Histogram)

  • nshifts (int) – number of shifts

  • limits (Tuple[float, float] | None) – histogram x axis range. If None, use min and max.

  • density (bool) – y axis scale. If True, generate a probability density; if False, use the number of events.

Returns:

y axis value for each bin and bin edges
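For readers unfamiliar with the technique, an average shifted histogram can be sketched with NumPy as follows. This is an illustration only, not the sdt implementation: the function name ash_sketch is hypothetical, nbins is assumed to be an integer (the named bin-count rules are not handled), and the result is assumed to be reported on the fine grid of nbins * nshifts bins, which may differ from what sdt returns.

```python
import numpy as np


def ash_sketch(x, nbins, nshifts, limits=None, density=False):
    """Hypothetical helper: average of nshifts shifted histograms."""
    x = np.asarray(x, dtype=float)
    lo, hi = (x.min(), x.max()) if limits is None else limits

    # Fine grid: every coarse bin is split into nshifts narrow bins
    counts, edges = np.histogram(x, bins=nbins * nshifts, range=(lo, hi))

    # Averaging nshifts histograms, each shifted by one narrow bin, equals
    # smoothing the fine counts with triangular weights (1, 2, ..., m, ..., 2, 1) / m
    weights = np.concatenate([np.arange(1, nshifts + 1),
                              np.arange(nshifts - 1, 0, -1)]) / nshifts
    full = np.convolve(counts, weights)
    y = full[nshifts - 1:nshifts - 1 + len(counts)]

    if density:
        # Normalize so that the histogram integrates to 1
        y = y / np.sum(y * np.diff(edges))
    return y, edges


rng = np.random.default_rng(0)
samples = rng.normal(size=500)
y, edges = ash_sketch(samples, nbins=10, nshifts=5, density=True)
print(y.shape, edges.shape)
```

With nshifts=1 the weights reduce to a single 1 and the sketch returns an ordinary histogram.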

References

[Schn2022]

Schneider, M. C. & Schütz, G. J.: “Don’t Be Fooled by Randomness: Valid p-Values for Single Molecule Microscopy”, Frontiers in Bioinformatics, 2022, 2, 811053