Diffusion analysis#

The sdt.motion module provides tools for analyzing single molecule tracking data.

  • Calculate mean square displacements (MSDs). This can be done directly (Msd) or by fitting a multi-exponential model to the distribution of square displacements (MsdDist). The latter allows for identification of sub-components with different diffusion constants.

  • Fit diffusion models to the MSD data to get quantitative results using Msd.fit() or MsdDist.fit().

  • Plot results.

  • Find immobilized parts of trajectories using find_immobilizations().

Examples

First, load some tracking data

>>> trc = sdt.io.load("tracks.h5")

Calculate the esemble (i.e., pool all data from all tracks) MSD. Use 1000 runs of bootstrapping to calculate error bars.

>>> m = Msd(trc, frame_rate=10, n_boot=1000, pixel_size=0.16, n_lag=5)
>>> msd, err = m.get_msd()
>>> msd
lagt
0.1    0.028557
0.2    0.054788
0.3    0.083447
0.4    0.113715
0.5    0.142936
Name: ensemble, dtype: float64

Fit Brownian motion to the MSD data:

>>> bm = m.fit("brownian")
>>> fit, fit_err = bm.get_results()
>>> fit
D      0.065576
eps    0.020269
Name: ensemble, dtype: float64

An MSD plot can be created using

>>> bm.plot()

To calculate MSDs and fit results for each particle individually, set ensemble=False:

>>> m2 = Msd(trc, frame_rate=10, ensemble=False, pixel_size=0.16, n_lag=5)
>>> msd2, err2 = m2.get_msd()
>>> msd2.head()
lagt           0.1       0.2       0.3       0.4       0.5
particle
0         0.034462  0.069578  0.096958  0.112310  0.131722
2         0.025913  0.011190  0.020903  0.017632  0.019309
3         0.017579  0.035281  0.067338  0.102785  0.151069
13        0.024869  0.047189  0.069974  0.103514  0.128603
14        0.031036  0.061392  0.093781  0.091846  0.092186
>>> fit2, fit_err2 = m2.fit("brownian").get_results()
>>> fit2
parameter         D       eps
particle
0          0.087792 -0.003802
2         -0.036809  0.098699
3          0.044256 -0.001375
13         0.055800  0.009597
14         0.075891  0.002834

Similarly, multiple diffusing components can be identified by analysing the MSD distributions.

>>> m3 = MsdDist(trc, frame_rate=10, n_components=2, n_lag=5)

Now, one gets the MSDs and errors for each component:

>>> c1, c2 = m3.get_msd()
>>> c1.msd
lagt
0.1    0.084859
0.2    1.245575
0.3    0.093846
0.4    0.649084
0.5    1.220108
Name: ensemble, dtype: float64
>>> c2.msd
0.1    1.161134
0.2    2.474728
0.3    3.216000
0.4    4.600476
0.5    5.857745
Name: ensemble, dtype: float64

Fitting of a model works similarly to Msd, but yields results for each component:

>>> bm3 = m3.fit("brownian", n_lag=np.inf)
>>> r1, r2 = bm3.get_results()
>>> r1.fit
D         0.418502
eps       0.197795
weight    0.083748
Name: ensemble, dtype: float64
>>> r2.fit
D         2.879742
eps       0.039767
weight    0.916252
Name: ensemble, dtype: float64

This can also be plotted:

>>> bm3.plot()

Analysis can be carried out for each track individually by setting ensemble=False.

Immobilized parts of particle trajectories can be found using find_immobilizations(). It adds a column named “immob” to the DataFrame that identify mobile and immobile parts of trajectories. Where “immob” >= 0, the particle is immobile and where “immob” < 0, it is mobile.

>>> find_immobilizations(trc, 1, 10)
>>> trc.head()
           x          y       mass  frame  particle  immob
0  13.276002  58.640033  1711.1778    1.0       0.0      0
1  13.058399  58.870968  1742.6831    2.0       0.0      0
2  12.624088  58.132507  2527.7880    3.0       0.0      0
3  13.651173  57.980727  2461.9288    5.0       0.0      0
4  13.482774  58.122870  2508.9154    6.0       0.0      0

MSD calculation#

class sdt.motion.Msd(data, frame_rate, n_lag=20, n_boot=100, ensemble=True, e_name='ensemble', random_state=None, pixel_size=1, columns={})[source]#

Calculate and analyze mean square displacements (MSDs)

from single moleclue tracking data.

Parameters:
  • data (pandas.DataFrame or iterable of pandas.DataFrame) – Tracking data. Either a single DataFrame or a collection of DataFrames.

  • frame_rate (float) – Frame rate

  • n_lag (int or inf, optional) – Number of lag times (time steps) to consider at most. Defaults to 100.

  • n_boot (int, optional) – Number of bootstrapping iterations for calculation of errors. Set to 0 to turn off bootstrapping for performance gain, in which case there will be no errors on the fit results. Defaults to 100.

  • ensemble (bool, optional) – Whether to calculate the MSDs for the whole data set or for each trajectory individually. Defaults to True.

  • pixel_size (float, optional) – Pixel size; multiply coordinates by this factor. Defaults to 1 (no scaling).

  • e_name (str, optional) – If the ensemble parameter is True, use this as the name for the dataset. It shows up in the MSD DataFrame (see get_msd()) and in plots. Defaults to “ensemble”.

  • random_state (numpy.random.RandomState or None, optional) – numpy.random.RandomState instance to use for random sampling for bootstrapping. If None, use create a new instance with seed=None. Defaults to None.

  • columns (dict, optional) – Override default column names as defined in config.columns. Relevant names are coords, particle, and time. This means, if your DataFrame has coordinate columns “x”, “y”, and “z” and the time column “alt_frame”, set columns={"coords": ["x", "y", "z"], "time": "alt_frame"}.

class Result(msd, msd_err)#

Create new instance of Result(msd, msd_err)

msd#

Alias for field number 0

msd_err#

Alias for field number 1

get_msd(series=True)[source]#

Get MSD and error DataFrames

The columns contain data for different lag times. Each row corresponds to one trajectory. The row index is either the particle number if a single DataFrame was passed to __init__() or a tuple identifying both the DataFrame and the particle. If __init__() was called with ensemble=True, there will only be one row with index e_name.

The series parameter controls whether to return a pandas.DataFrame or a pandas.Series if there is only one row.

Parameters:

series (bool, optional) – If True and and there is only one entry (e.g., due to ensemble=True in the constructor), pandas.Series objects will be returned. Otherwise, return pandas.DataFrame. Defaults to True.

Returns:

  • msd (pandas.DataFrame or pandas.Series) – Mean square displacements

  • msd_err (pandas.DataFrame or pandas.Series) – Standard errors of the mean square displacements. If bootstrapping was used, these are the standard deviations of the MSD results from bootstrapping. Otherwise, these are caleculated as the standard deviation of square displacements divided by the number of samples.

fit(model='brownian', *args, **kwargs)[source]#

Fit a model function to the MSD data

Parameters:
  • model ({"anomalous", "brownian"}, optional) – Type of model to fit. Defaults to “brownian”.

  • n_lag (int or inf, optional) – Number of lag times to use for fitting. Defaults to 2 for the Brownian model and inf for anomalous diffusion.

  • exposure_time (float, optional) – Exposure time. Defaults to 0, i.e. no exposure time compensation

  • initial (tuple of float, len 3, optional) – Initial guess for fitting anomalous diffusion. The tuple entries are diffusion coefficient, positional accuracy, and alpha.

class sdt.motion.MsdDist(data, frame_rate, n_components=2, n_lag=10, n_boot=0, ensemble=True, fit_method='lsq', poly_order=30, e_name='ensemble', random_state=None, pixel_size=1, assign_method='msd', columns={})[source]#

Calculate and analyze mean square displacement (MSD) distribution

from single moleclue tracking data. This differs from sdt.motion.Msd by the fact that the distributions are analyzed, which allows for detection of multiple components with different diffusion coefficients [Schu1997].

Only valid for 2D data.

Parameters:
  • data (pandas.DataFrame or iterable of pandas.DataFrame) – Tracking data. Either a single DataFrame or a collection of DataFrames.

  • frame_rate (float) – Frame rate

  • n_components (int, optional) – Number of diffusing components. Defaults to 2.

  • n_lag (int or inf, optional) – Number of lag times (time steps) to consider at most. Defaults to 10.

  • n_boot (int, optional) – Number of bootstrapping iterations for calculation of errors. Set to 0 to turn off bootstrapping for performance gain, in which case there will be no errors on the fit results. Defaults to 0.

  • ensemble (bool, optional) – Whether to calculate the MSDs for the whole data set or for each trajectory individually. Defaults to True.

  • fit_method ({"prony", "lsq", "weighted-lsq"}, optional) – Method to use for cumulative distribution function (CDF) fitting. “prony” is a modified Prony’s method, “lsq” is least squares fitting, and “weighted-lsq” is weighted least squares fitting to account for the fact that the CDF data are concentrated at x=0. Defaults to “lsq”.

  • pixel_size (float, optional) – Pixel size; multiply coordinates by this factor. Defaults to 1 (no scaling).

  • assign_method ({"msd", "weight"}, optional) – If “msd”, components are assigned according to increasing MSDs, i.e., lowest MSD will be component 1, next will be component 2, and so on. If “weight”, components are assigned according to increasing weights. Defaults to “msd”.

  • e_name (str, optional) – If the ensemble parameter is True, use this as the name for the dataset. It shows up in the MSD DataFrame (see get_msd()) and in plots. Defaults to “ensemble”.

  • random_state (numpy.random.RandomState or None, optional) – numpy.random.RandomState instance to use for random sampling for bootstrapping. If None, use create a new instance with seed=None. Defaults to None.

  • columns (dict, optional) – Override default column names as defined in config.columns. Relevant names are coords, particle, and time. This means, if your DataFrame has coordinate columns “x” and “z” and the time column “alt_frame”, set columns={"coords": ["x", "z"], "time": "alt_frame"}.

class Result(msd, msd_err, weight, weight_err)#

Create new instance of Result(msd, msd_err, weight, weight_err)

msd#

Alias for field number 0

msd_err#

Alias for field number 1

weight#

Alias for field number 2

weight_err#

Alias for field number 3

get_msd(series=True)[source]#

Get MSD and error DataFrames

The columns contain data for different lag times. Each row corresponds to one trajectory. The row index is either the particle number if a single DataFrame was passed to __init__() or a tuple identifying both the DataFrame and the particle. If __init__() was called with ensemble=True, there will only be one row with index e_name.

If there is only one row, the series parameter controls whether to return a pandas.DataFrame or a pandas.Series.

Parameters:

series (bool, optional) – If True and and there is only one entry (e.g., due to ensemble=True in the constructor), pandas.Series objects will be returned. Otherwise, return pandas.DataFrame. Defaults to True.

Returns:

Each tuple describes one component. The tuple members are pandas.DataFrame or pandas.Series (depending on the series parameter), where msd holds the mean square displacements, msd_err contains standard errors of the MSDs, weight hold the relative weight of the component calculated for the different lag times, and weight_err contains standard errors of the weights. If no bootstrapping was performed, all errors are NaN.

Return type:

list of namedTuple([“msd”, “msd_err”, “weight”, “weight_err”])

fit(model='brownian', *args, **kwargs)[source]#

Fit a model function to the MSD data

Parameters:
  • model ({"anomalous", "brownian"}) – Type of model to fit

  • n_lag (int or inf, optional) – Number of lag times to use for fitting. Defaults to 2 for the Brownian model and inf for anomalous diffusion.

  • exposure_time (float, optional) – Exposure time. Defaults to 0, i.e. no exposure time compensation

  • initial (tuple of float, len 3, optional) – Initial guess for fitting anomalous diffusion. The tuple entries are diffusion coefficient, positional accuracy, and alpha.

Diffusion models for fitting#

class sdt.motion.AnomalousDiffusion(msd_data, n_lag=inf, exposure_time=0.0, initial=(0.5, 0.05, 1.0))[source]#

Fit anomalous diffusion parameters to MSD values

Fit a function \(msd(t_\text{lag}) = 4 D t_\text{lag}^\alpha + 4 \epsilon^2\) to the tlag-vs.-MSD graph, where \(D\) is the diffusion coefficient, \(\epsilon\) is the positional accuracy (uncertainty), and \(\alpha\) the anomalous diffusion exponent.

Parameters:
  • msd_data (msd_base.MsdData) – MSD data

  • n_lag (int or inf, optional) – Maximum number of lag times to use for fitting. Defaults to inf, i.e. using all.

  • exposure_time (float, optional) – Exposure time. Defaults to 0, i.e. no exposure time correction

  • initial (tuple of float, optional) – Initial guesses for the fitting for \(D\), \(\epsilon\), and \(\alpha\). Defaults to (0.5, 0.05, 1.).

static exposure_time_corr(t, alpha, exposure_time, n=100, force_numeric=False)[source]#

Correct lag times for the movement of particles during exposure

When particles move during exposure, it appears as if the lag times change according to

\[t_\text{app}^\alpha = \lim_{n\rightarrow\infty} \frac{1}{n^2} \sum_{m_1 = 0}^{n-1} \sum_{m_2 = 0}^{n-1} |t + \frac{t_\text{exp}}{n}(m_1 - m_2)|^\alpha - |\frac{t_\text{exp}}{n}(m_1 - m_2)|^\alpha.\]

For \(\alpha=1\), \(t_\text{app} = t - t_\text{exp} / 3\). For \(t_\text{exp} = 0\) or \(\alpha = 2\), \(t_\text{app} = t\). For other parameter values, the sum is computed numerically using a sufficiently large n (100 by default).

See [Goul2000] for details.

Parameters:
  • t (numpy.ndarray) – Lag times

  • alpha (float) – Anomalous diffusion exponent

  • exposure_time (float) – Exposure time

  • n (int, optional) – Number of summands for the numeric calculation. The complexity of the algorithm is O(n²). Defaults to 100.

  • force_numeric (bool, optional) – If True, do not return the analytical solutions for \(\alpha \in \{1, 2\}\) and \(t_\text{exp} = 0\), but calculate numerically. Useful for testing.

Returns:

Apparent lag times that account for diffusion during exposure

Return type:

numpy.ndarray

static theoretical(t, d, eps, alpha=1, exposure_time=0, squeeze_result=True)[source]#

Calculate theoretical MSDs for different lag times

Calculate \(msd(t_\text{lag}) = 4 D t_\text{app}^\alpha + 4 \epsilon^2\), where \(t_\text{app}\) is the apparent time lag which takes into account particle motion during exposure; see exposure_time_corr().

Parameters:
  • t (array-like or scalar) – Lag times

  • d (float) – Diffusion coefficient

  • eps (float) – Positional accuracy.

  • alpha (float, optional) – Anomalous diffusion exponent. Defaults to 1.

  • exposure_time (float, optional) – Exposure time. Defaults to 0.

  • squeeze_result (bool, optional) – If True, return the result as a scalar type or 1D array if possible. Otherwise, always return a 2D array. Defaults to True.

Returns:

Calculated theoretical MSDs

Return type:

numpy.ndarray or scalar

class Result(fit, fit_err)#

Create new instance of Result(fit, fit_err)

fit#

Alias for field number 0

fit_err#

Alias for field number 1

get_results(series=True)[source]#

Get fit results

The columns contain fitted parameters. Each row corresponds to one trajectory. The row index is either the particle number or a tuple identifying both the DataFrame and the particle; c.f. Msd.

If there is only one row, the series parameter controls whether to return a pandas.DataFrame or a pandas.Series.

Parameters:

series (bool, optional) – If True and and there is only one entry (e.g., due to ensemble=True in the Msd constructor), pandas.Series objects will be returned. Otherwise, return pandas.DataFrame. Defaults to True.

Returns:

  • fit (pandas.DataFrame or pandas.Series) – Fit results. Columns are the fit paramaters. Each row represents one particle.

  • fit_err (pandas.DataFrame or pandas.Series) – Fit results standard errors. If no bootstrapping was performed for calculation of MSDs, this is all NaNs.

plot(show_legend=True, ax=None)[source]#

Plot lag time vs. MSD with fitted theoretical curve

Parameters:
  • show_legend (bool, optional) – Whether to add a legend to the plot. Defaults to True.

  • ax (matplotlib.axes.Axes or None, optional) – Axes to use for plotting. If None, use pyplot.gca(). Defaults to None.

Returns:

Axes used for plotting

Return type:

matplotlib.axes.Axes

class sdt.motion.BrownianMotion(msd_data, n_lag=2, exposure_time=0)[source]#

Fit Brownian motion parameters to MSD values

Fit a function \(\mathit{msd}(t_\text{lag}) = 4 D t_\text{lag} + 4 \epsilon^2\) to the tlag-vs.-MSD graph, where \(D\) is the diffusion coefficient and \(\epsilon\) is the positional accuracy (uncertainty).

Parameters:
  • msd_data (msd_base.MsdData) – MSD data

  • n_lag (int or inf, optional) – Maximum number of lag times to use for fitting. Defaults to 2.

  • exposure_time (float, optional) – Exposure time. Defaults to 0, i.e. no exposure time correction

static theoretical(t, d, eps, exposure_time=0)[source]#

Calculate theoretical MSDs for different lag times

Calculate \(msd(t_\text{lag}) = 4 D t_\text{app} + 4 \epsilon^2\), where \(t_\text{app}\) is the apparent time lag which takes into account particle motion during exposure; see exposure_time_corr().

Parameters:
  • t (array-like or scalar) – Lag times

  • d (float) – Diffusion coefficient

  • eps (float) – Positional accuracy.

  • alpha (float, optional) – Anomalous diffusion exponent. Defaults to 1.

  • exposure_time (float, optional) – Exposure time. Defaults to 0.

  • squeeze_result (bool, optional) – If True, return the result as a scalar type or 1D array if possible. Otherwise, always return a 2D array. Defaults to True.

Returns:

Calculated theoretical MSDs

Return type:

numpy.ndarray or scalar

Immobilization detection#

sdt.motion.find_immobilizations(tracks, max_dist, min_duration, label_mobile=True, longest_only=False, rtol=0.0, atol=0, engine='numba', columns={})[source]#

Find immobilizations in particle trajectories

Analyze trajectories and mark parts where all localizations of a particle stay within a circle of radius max_dist from their center of mass for at least a certain number (min_length) of frames as an immobilization.

The tracks DataFrame gets a new column (“immob”) appended (or overwritten), where every immobilzation is assigned a different number greater or equal than 0. Parts of trajectories that are not immobilized are assigned -1 (if label_mobile is False) or one negative number per mobile section starting at -2 (if label_mobile is True).

find_immobilizations_int() uses a slightly different criterion for finding immobilizations.

Parameters:
  • tracks (pandas.DataFrame) – Tracking data

  • max_dist (float) – Maximum radius within a particle may move while still being considered immobilized

  • min_duration (int) – Minimum duration the particle has to stay at the same place for a part of a trajectory to be considered immobilized. Duration is the difference of the maximum frame number and the minimum frame number. E. g. if the frames 1, 2, and 4 would lead to a duration of 3.

  • label_mobile (bool, optional) – Whether to give each mobile track section a distinct label (a negative number starting at -2). Defaults to True.

  • longest_only (bool, optional) – If True, search only for the longest immobilzation in each track to speed up the process. Defaults to False.

  • rtol (float, optional) – The fraction of localizations that may not be within max_dist of the center of mass while still recognizing the sub-track as immobile. One also has to set the a_tol parameter to something non-zero if using this since a sub-track will not be considered immobile if either r_tol or a_tol are exceeded. Defaults to 0.

  • atol (int, optional) – The number of localizations that may not be within max_dist of the center of mass while still recognizing the sub-track as immobile. One also has to set the r_tol parameter to something non-zero if using this since a sub-track will not be considered immobile if either r_tol or a_tol are exceeded. Defaults to 0.

Returns:

Although the tracks DataFrame is modified in place, it is also returned for convenience.

Return type:

pandas.DataFrame

See also

find_immobilizations_int

Uses a slightly different criterion for finding immobilizations

Parameters:
  • engine ({"numba", "python"}, optional) – If engine is “numba” and the numba package is installed, use the much faster numba-accelerated alogrithm. Otherwise, fall back to a pure python one. Defaults to “numba”

  • columns (dict, optional) – Override default column names as defined in config.columns. Relevant names are coords, time, and particle. This means, if your DataFrame has coordinate columns “x” and “z” and the time column “alt_frame”, set columns={"coords": ["x", "z"], "time": "alt_frame"}.

sdt.motion.label_mobile(tracks, engine='numba', columns={})[source]#

Give each mobile section of a track a label (unique number)

When calling find_immobilizations() with label_mobile=False, all mobile sections of tracks will have -1 in their immob column. This function assigns a unique (negative) number to each section, starting at -2.

The data DataFrame will be modified in place, but also returned.

Parameters:
  • tracks (DataFrame) – Tracking data with an immob column where all mobile sections of tracks are assigned -1.

  • engine (Literal['numba', 'python']) – If engine is “numba” and the numba package is installed, use the much faster numba-accelerated alogrithm. Otherwise, fall back to a pure python one. Defaults to “numba”

  • columns (Mapping) – Override default column names as defined in config.columns. The only relevant name is particle.

Return type:

tracks with updated "immob" column.

sdt.motion.find_immobilizations_int(tracks, max_dist, min_duration, label_mobile=True, longest_only=False, columns={})[source]#

Find immobilizations in particle trajectories

Analyze trajectories and mark parts where all localizations of a particle stay within a circle of radius max_dist for at least a certain number (min_length) of frames as an immobilization. In other words: If consecutive localizations stay within the intersection of circles of radius max_dist and coordinates of the localizations as centers, they are considered immobilized.

This is different from find_immobilizations().

The tracks DataFrame gets a new column (“immob”) appended (or overwritten), where every immobilzation is assigned a different number greater or equal than 0. Parts of trajectories that are not immobilized are assigned -1 (if label_mobile is False) or one negative number per mobile section starting at -2 (if label_mobile is True).

Parameters:
  • tracks (pandas.DataFrame) – Tracking data

  • max_dist (float) – Maximum radius within a particle may move while still being considered immobilized

  • min_duration (int) – Minimum duration the particle has to stay at the same place for a part of a trajectory to be considered immobilized. Duration is the difference of the maximum frame number and the minimum frame number. E. g. if the frames 1, 2, and 4 would lead to a duration of 3.

  • label_mobile (bool, optional) – Whether to give each mobile track section a distinct label (a negative number starting at -2). Defaults to True.

  • longest_only (bool, optional) – If True, search only for the longest immobilzation in each track to speed up the process. Defaults to False.

Returns:

Although the tracks DataFrame is modified in place, it is also returned for convenience.

Return type:

pandas.DataFrame

See also

find_immobilizations

Uses a slightly different criterion for finding immobilizations

Parameters:

columns (dict, optional) – Override default column names as defined in config.columns. Relevant names are coords, time, and particle. This means, if your DataFrame has coordinate columns “x” and “z” and the time column “alt_frame”, set columns={"coords": ["x", "z"], "time": "alt_frame"}.

References

[Goul2000]

Goulian, M. & Simon, S. M.: “Tracking Single Proteins within Cells”, Biophysical Journal, Elsevier BV, 2000, 79, 2188–2198

[Schu1997]

Schütz, G.; Schindler, H. & Schmidt, T.: “Single-molecule microscopy on model membranes reveals anomalous diffusion”, Biophysical Journal, Elsevier BV, 1997, 73, 1073–1080