Diffusion analysis#
The sdt.motion
module provides tools for analyzing single molecule
tracking data.
Calculate mean square displacements (MSDs). This can be done directly (
Msd
) or by fitting a multi-exponential model to the distribution of square displacements (MsdDist
). The latter allows for identification of sub-components with different diffusion constants.Fit diffusion models to the MSD data to get quantitative results using
Msd.fit()
orMsdDist.fit()
.Plot results.
Find immobilized parts of trajectories using
find_immobilizations()
.
Examples
First, load some tracking data
>>> trc = sdt.io.load("tracks.h5")
Calculate the esemble (i.e., pool all data from all tracks) MSD. Use 1000 runs of bootstrapping to calculate error bars.
>>> m = Msd(trc, frame_rate=10, n_boot=1000, pixel_size=0.16, n_lag=5)
>>> msd, err = m.get_msd()
>>> msd
lagt
0.1 0.028557
0.2 0.054788
0.3 0.083447
0.4 0.113715
0.5 0.142936
Name: ensemble, dtype: float64
Fit Brownian motion to the MSD data:
>>> bm = m.fit("brownian")
>>> fit, fit_err = bm.get_results()
>>> fit
D 0.065576
eps 0.020269
Name: ensemble, dtype: float64
An MSD plot can be created using
>>> bm.plot()
To calculate MSDs and fit results for each particle individually, set
ensemble=False
:
>>> m2 = Msd(trc, frame_rate=10, ensemble=False, pixel_size=0.16, n_lag=5)
>>> msd2, err2 = m2.get_msd()
>>> msd2.head()
lagt 0.1 0.2 0.3 0.4 0.5
particle
0 0.034462 0.069578 0.096958 0.112310 0.131722
2 0.025913 0.011190 0.020903 0.017632 0.019309
3 0.017579 0.035281 0.067338 0.102785 0.151069
13 0.024869 0.047189 0.069974 0.103514 0.128603
14 0.031036 0.061392 0.093781 0.091846 0.092186
>>> fit2, fit_err2 = m2.fit("brownian").get_results()
>>> fit2
parameter D eps
particle
0 0.087792 -0.003802
2 -0.036809 0.098699
3 0.044256 -0.001375
13 0.055800 0.009597
14 0.075891 0.002834
Similarly, multiple diffusing components can be identified by analysing the MSD distributions.
>>> m3 = MsdDist(trc, frame_rate=10, n_components=2, n_lag=5)
Now, one gets the MSDs and errors for each component:
>>> c1, c2 = m3.get_msd()
>>> c1.msd
lagt
0.1 0.084859
0.2 1.245575
0.3 0.093846
0.4 0.649084
0.5 1.220108
Name: ensemble, dtype: float64
>>> c2.msd
0.1 1.161134
0.2 2.474728
0.3 3.216000
0.4 4.600476
0.5 5.857745
Name: ensemble, dtype: float64
Fitting of a model works similarly to Msd
, but yields results for
each component:
>>> bm3 = m3.fit("brownian", n_lag=np.inf)
>>> r1, r2 = bm3.get_results()
>>> r1.fit
D 0.418502
eps 0.197795
weight 0.083748
Name: ensemble, dtype: float64
>>> r2.fit
D 2.879742
eps 0.039767
weight 0.916252
Name: ensemble, dtype: float64
This can also be plotted:
>>> bm3.plot()
Analysis can be carried out for each track individually by setting
ensemble=False
.
Immobilized parts of particle trajectories can be found using
find_immobilizations()
. It adds a column named “immob” to the
DataFrame that identify mobile and immobile parts of trajectories. Where
“immob” >= 0, the particle is immobile and where “immob” < 0, it is mobile.
>>> find_immobilizations(trc, 1, 10)
>>> trc.head()
x y mass frame particle immob
0 13.276002 58.640033 1711.1778 1.0 0.0 0
1 13.058399 58.870968 1742.6831 2.0 0.0 0
2 12.624088 58.132507 2527.7880 3.0 0.0 0
3 13.651173 57.980727 2461.9288 5.0 0.0 0
4 13.482774 58.122870 2508.9154 6.0 0.0 0
MSD calculation#
- class sdt.motion.Msd(data, frame_rate, n_lag=20, n_boot=100, ensemble=True, e_name='ensemble', random_state=None, pixel_size=1, columns={})[source]#
Calculate and analyze mean square displacements (MSDs)
from single moleclue tracking data.
- Parameters:
data (pandas.DataFrame or iterable of pandas.DataFrame) – Tracking data. Either a single DataFrame or a collection of DataFrames.
frame_rate (float) – Frame rate
n_lag (int or inf, optional) – Number of lag times (time steps) to consider at most. Defaults to 100.
n_boot (int, optional) – Number of bootstrapping iterations for calculation of errors. Set to 0 to turn off bootstrapping for performance gain, in which case there will be no errors on the fit results. Defaults to 100.
ensemble (bool, optional) – Whether to calculate the MSDs for the whole data set or for each trajectory individually. Defaults to True.
pixel_size (float, optional) – Pixel size; multiply coordinates by this factor. Defaults to 1 (no scaling).
e_name (str, optional) – If the ensemble parameter is True, use this as the name for the dataset. It shows up in the MSD DataFrame (see
get_msd()
) and in plots. Defaults to “ensemble”.random_state (numpy.random.RandomState or None, optional) –
numpy.random.RandomState
instance to use for random sampling for bootstrapping. If None, use create a new instance withseed=None
. Defaults to None.columns (dict, optional) – Override default column names as defined in
config.columns
. Relevant names are coords, particle, and time. This means, if your DataFrame has coordinate columns “x”, “y”, and “z” and the time column “alt_frame”, setcolumns={"coords": ["x", "y", "z"], "time": "alt_frame"}
.
- class Result(msd, msd_err)#
Create new instance of Result(msd, msd_err)
- msd#
Alias for field number 0
- msd_err#
Alias for field number 1
- get_msd(series=True)[source]#
Get MSD and error DataFrames
The columns contain data for different lag times. Each row corresponds to one trajectory. The row index is either the particle number if a single DataFrame was passed to
__init__()
or a tuple identifying both the DataFrame and the particle. If__init__()
was called withensemble=True
, there will only be one row with index e_name.The series parameter controls whether to return a
pandas.DataFrame
or apandas.Series
if there is only one row.- Parameters:
series (bool, optional) – If True and and there is only one entry (e.g., due to
ensemble=True
in the constructor),pandas.Series
objects will be returned. Otherwise, returnpandas.DataFrame
. Defaults to True.- Returns:
msd (pandas.DataFrame or pandas.Series) – Mean square displacements
msd_err (pandas.DataFrame or pandas.Series) – Standard errors of the mean square displacements. If bootstrapping was used, these are the standard deviations of the MSD results from bootstrapping. Otherwise, these are caleculated as the standard deviation of square displacements divided by the number of samples.
- fit(model='brownian', *args, **kwargs)[source]#
Fit a model function to the MSD data
- Parameters:
model ({"anomalous", "brownian"}, optional) – Type of model to fit. Defaults to “brownian”.
n_lag (int or inf, optional) – Number of lag times to use for fitting. Defaults to 2 for the Brownian model and inf for anomalous diffusion.
exposure_time (float, optional) – Exposure time. Defaults to 0, i.e. no exposure time compensation
initial (tuple of float, len 3, optional) – Initial guess for fitting anomalous diffusion. The tuple entries are diffusion coefficient, positional accuracy, and alpha.
- class sdt.motion.MsdDist(data, frame_rate, n_components=2, n_lag=10, n_boot=0, ensemble=True, fit_method='lsq', poly_order=30, e_name='ensemble', random_state=None, pixel_size=1, assign_method='msd', columns={})[source]#
Calculate and analyze mean square displacement (MSD) distribution
from single moleclue tracking data. This differs from
sdt.motion.Msd
by the fact that the distributions are analyzed, which allows for detection of multiple components with different diffusion coefficients [Schu1997].Only valid for 2D data.
- Parameters:
data (pandas.DataFrame or iterable of pandas.DataFrame) – Tracking data. Either a single DataFrame or a collection of DataFrames.
frame_rate (float) – Frame rate
n_components (int, optional) – Number of diffusing components. Defaults to 2.
n_lag (int or inf, optional) – Number of lag times (time steps) to consider at most. Defaults to 10.
n_boot (int, optional) – Number of bootstrapping iterations for calculation of errors. Set to 0 to turn off bootstrapping for performance gain, in which case there will be no errors on the fit results. Defaults to 0.
ensemble (bool, optional) – Whether to calculate the MSDs for the whole data set or for each trajectory individually. Defaults to True.
fit_method ({"prony", "lsq", "weighted-lsq"}, optional) – Method to use for cumulative distribution function (CDF) fitting. “prony” is a modified Prony’s method, “lsq” is least squares fitting, and “weighted-lsq” is weighted least squares fitting to account for the fact that the CDF data are concentrated at x=0. Defaults to “lsq”.
pixel_size (float, optional) – Pixel size; multiply coordinates by this factor. Defaults to 1 (no scaling).
assign_method ({"msd", "weight"}, optional) – If “msd”, components are assigned according to increasing MSDs, i.e., lowest MSD will be component 1, next will be component 2, and so on. If “weight”, components are assigned according to increasing weights. Defaults to “msd”.
e_name (str, optional) – If the ensemble parameter is True, use this as the name for the dataset. It shows up in the MSD DataFrame (see
get_msd()
) and in plots. Defaults to “ensemble”.random_state (numpy.random.RandomState or None, optional) –
numpy.random.RandomState
instance to use for random sampling for bootstrapping. If None, use create a new instance withseed=None
. Defaults to None.columns (dict, optional) – Override default column names as defined in
config.columns
. Relevant names are coords, particle, and time. This means, if your DataFrame has coordinate columns “x” and “z” and the time column “alt_frame”, setcolumns={"coords": ["x", "z"], "time": "alt_frame"}
.
- class Result(msd, msd_err, weight, weight_err)#
Create new instance of Result(msd, msd_err, weight, weight_err)
- msd#
Alias for field number 0
- msd_err#
Alias for field number 1
- weight#
Alias for field number 2
- weight_err#
Alias for field number 3
- get_msd(series=True)[source]#
Get MSD and error DataFrames
The columns contain data for different lag times. Each row corresponds to one trajectory. The row index is either the particle number if a single DataFrame was passed to
__init__()
or a tuple identifying both the DataFrame and the particle. If__init__()
was called withensemble=True
, there will only be one row with index e_name.If there is only one row, the series parameter controls whether to return a
pandas.DataFrame
or apandas.Series
.- Parameters:
series (bool, optional) – If True and and there is only one entry (e.g., due to
ensemble=True
in the constructor),pandas.Series
objects will be returned. Otherwise, returnpandas.DataFrame
. Defaults to True.- Returns:
Each tuple describes one component. The tuple members are
pandas.DataFrame
orpandas.Series
(depending on the series parameter), where msd holds the mean square displacements, msd_err contains standard errors of the MSDs, weight hold the relative weight of the component calculated for the different lag times, and weight_err contains standard errors of the weights. If no bootstrapping was performed, all errors are NaN.- Return type:
list of namedTuple([“msd”, “msd_err”, “weight”, “weight_err”])
- fit(model='brownian', *args, **kwargs)[source]#
Fit a model function to the MSD data
- Parameters:
model ({"anomalous", "brownian"}) – Type of model to fit
n_lag (int or inf, optional) – Number of lag times to use for fitting. Defaults to 2 for the Brownian model and inf for anomalous diffusion.
exposure_time (float, optional) – Exposure time. Defaults to 0, i.e. no exposure time compensation
initial (tuple of float, len 3, optional) – Initial guess for fitting anomalous diffusion. The tuple entries are diffusion coefficient, positional accuracy, and alpha.
Diffusion models for fitting#
- class sdt.motion.AnomalousDiffusion(msd_data, n_lag=inf, exposure_time=0.0, initial=(0.5, 0.05, 1.0))[source]#
Fit anomalous diffusion parameters to MSD values
Fit a function \(msd(t_\text{lag}) = 4 D t_\text{lag}^\alpha + 4 \epsilon^2\) to the tlag-vs.-MSD graph, where \(D\) is the diffusion coefficient, \(\epsilon\) is the positional accuracy (uncertainty), and \(\alpha\) the anomalous diffusion exponent.
- Parameters:
msd_data (msd_base.MsdData) – MSD data
n_lag (int or inf, optional) – Maximum number of lag times to use for fitting. Defaults to inf, i.e. using all.
exposure_time (float, optional) – Exposure time. Defaults to 0, i.e. no exposure time correction
initial (tuple of float, optional) – Initial guesses for the fitting for \(D\), \(\epsilon\), and \(\alpha\). Defaults to
(0.5, 0.05, 1.)
.
- static exposure_time_corr(t, alpha, exposure_time, n=100, force_numeric=False)[source]#
Correct lag times for the movement of particles during exposure
When particles move during exposure, it appears as if the lag times change according to
\[t_\text{app}^\alpha = \lim_{n\rightarrow\infty} \frac{1}{n^2} \sum_{m_1 = 0}^{n-1} \sum_{m_2 = 0}^{n-1} |t + \frac{t_\text{exp}}{n}(m_1 - m_2)|^\alpha - |\frac{t_\text{exp}}{n}(m_1 - m_2)|^\alpha.\]For \(\alpha=1\), \(t_\text{app} = t - t_\text{exp} / 3\). For \(t_\text{exp} = 0\) or \(\alpha = 2\), \(t_\text{app} = t\). For other parameter values, the sum is computed numerically using a sufficiently large n (100 by default).
See [Goul2000] for details.
- Parameters:
t (numpy.ndarray) – Lag times
alpha (float) – Anomalous diffusion exponent
exposure_time (float) – Exposure time
n (int, optional) – Number of summands for the numeric calculation. The complexity of the algorithm is O(n²). Defaults to 100.
force_numeric (bool, optional) – If True, do not return the analytical solutions for \(\alpha \in \{1, 2\}\) and \(t_\text{exp} = 0\), but calculate numerically. Useful for testing.
- Returns:
Apparent lag times that account for diffusion during exposure
- Return type:
numpy.ndarray
- static theoretical(t, d, eps, alpha=1, exposure_time=0, squeeze_result=True)[source]#
Calculate theoretical MSDs for different lag times
Calculate \(msd(t_\text{lag}) = 4 D t_\text{app}^\alpha + 4 \epsilon^2\), where \(t_\text{app}\) is the apparent time lag which takes into account particle motion during exposure; see
exposure_time_corr()
.- Parameters:
t (array-like or scalar) – Lag times
d (float) – Diffusion coefficient
eps (float) – Positional accuracy.
alpha (float, optional) – Anomalous diffusion exponent. Defaults to 1.
exposure_time (float, optional) – Exposure time. Defaults to 0.
squeeze_result (bool, optional) – If True, return the result as a scalar type or 1D array if possible. Otherwise, always return a 2D array. Defaults to True.
- Returns:
Calculated theoretical MSDs
- Return type:
numpy.ndarray or scalar
- class Result(fit, fit_err)#
Create new instance of Result(fit, fit_err)
- fit#
Alias for field number 0
- fit_err#
Alias for field number 1
- get_results(series=True)[source]#
Get fit results
The columns contain fitted parameters. Each row corresponds to one trajectory. The row index is either the particle number or a tuple identifying both the DataFrame and the particle; c.f.
Msd
.If there is only one row, the series parameter controls whether to return a
pandas.DataFrame
or apandas.Series
.- Parameters:
series (bool, optional) – If True and and there is only one entry (e.g., due to
ensemble=True
in theMsd
constructor),pandas.Series
objects will be returned. Otherwise, returnpandas.DataFrame
. Defaults to True.- Returns:
fit (pandas.DataFrame or pandas.Series) – Fit results. Columns are the fit paramaters. Each row represents one particle.
fit_err (pandas.DataFrame or pandas.Series) – Fit results standard errors. If no bootstrapping was performed for calculation of MSDs, this is all NaNs.
- plot(show_legend=True, ax=None)[source]#
Plot lag time vs. MSD with fitted theoretical curve
- Parameters:
show_legend (bool, optional) – Whether to add a legend to the plot. Defaults to True.
ax (matplotlib.axes.Axes or None, optional) – Axes to use for plotting. If None, use
pyplot.gca()
. Defaults to None.
- Returns:
Axes used for plotting
- Return type:
matplotlib.axes.Axes
- class sdt.motion.BrownianMotion(msd_data, n_lag=2, exposure_time=0)[source]#
Fit Brownian motion parameters to MSD values
Fit a function \(\mathit{msd}(t_\text{lag}) = 4 D t_\text{lag} + 4 \epsilon^2\) to the tlag-vs.-MSD graph, where \(D\) is the diffusion coefficient and \(\epsilon\) is the positional accuracy (uncertainty).
- Parameters:
msd_data (msd_base.MsdData) – MSD data
n_lag (int or inf, optional) – Maximum number of lag times to use for fitting. Defaults to 2.
exposure_time (float, optional) – Exposure time. Defaults to 0, i.e. no exposure time correction
- static theoretical(t, d, eps, exposure_time=0)[source]#
Calculate theoretical MSDs for different lag times
Calculate \(msd(t_\text{lag}) = 4 D t_\text{app} + 4 \epsilon^2\), where \(t_\text{app}\) is the apparent time lag which takes into account particle motion during exposure; see
exposure_time_corr()
.- Parameters:
t (array-like or scalar) – Lag times
d (float) – Diffusion coefficient
eps (float) – Positional accuracy.
alpha (float, optional) – Anomalous diffusion exponent. Defaults to 1.
exposure_time (float, optional) – Exposure time. Defaults to 0.
squeeze_result (bool, optional) – If True, return the result as a scalar type or 1D array if possible. Otherwise, always return a 2D array. Defaults to True.
- Returns:
Calculated theoretical MSDs
- Return type:
numpy.ndarray or scalar
Immobilization detection#
- sdt.motion.find_immobilizations(tracks, max_dist, min_duration, label_mobile=True, longest_only=False, rtol=0.0, atol=0, engine='numba', columns={})[source]#
Find immobilizations in particle trajectories
Analyze trajectories and mark parts where all localizations of a particle stay within a circle of radius max_dist from their center of mass for at least a certain number (min_length) of frames as an immobilization.
The tracks DataFrame gets a new column (“immob”) appended (or overwritten), where every immobilzation is assigned a different number greater or equal than 0. Parts of trajectories that are not immobilized are assigned -1 (if label_mobile is False) or one negative number per mobile section starting at -2 (if label_mobile is True).
find_immobilizations_int()
uses a slightly different criterion for finding immobilizations.- Parameters:
tracks (pandas.DataFrame) – Tracking data
max_dist (float) – Maximum radius within a particle may move while still being considered immobilized
min_duration (int) – Minimum duration the particle has to stay at the same place for a part of a trajectory to be considered immobilized. Duration is the difference of the maximum frame number and the minimum frame number. E. g. if the frames 1, 2, and 4 would lead to a duration of 3.
label_mobile (bool, optional) – Whether to give each mobile track section a distinct label (a negative number starting at -2). Defaults to True.
longest_only (bool, optional) – If True, search only for the longest immobilzation in each track to speed up the process. Defaults to False.
rtol (float, optional) – The fraction of localizations that may not be within max_dist of the center of mass while still recognizing the sub-track as immobile. One also has to set the a_tol parameter to something non-zero if using this since a sub-track will not be considered immobile if either r_tol or a_tol are exceeded. Defaults to 0.
atol (int, optional) – The number of localizations that may not be within max_dist of the center of mass while still recognizing the sub-track as immobile. One also has to set the r_tol parameter to something non-zero if using this since a sub-track will not be considered immobile if either r_tol or a_tol are exceeded. Defaults to 0.
- Returns:
Although the tracks DataFrame is modified in place, it is also returned for convenience.
- Return type:
pandas.DataFrame
See also
find_immobilizations_int
Uses a slightly different criterion for finding immobilizations
- Parameters:
engine ({"numba", "python"}, optional) – If engine is “numba” and the numba package is installed, use the much faster numba-accelerated alogrithm. Otherwise, fall back to a pure python one. Defaults to “numba”
columns (dict, optional) – Override default column names as defined in
config.columns
. Relevant names are coords, time, and particle. This means, if your DataFrame has coordinate columns “x” and “z” and the time column “alt_frame”, setcolumns={"coords": ["x", "z"], "time": "alt_frame"}
.
- sdt.motion.label_mobile(tracks, engine='numba', columns={})[source]#
Give each mobile section of a track a label (unique number)
When calling
find_immobilizations()
withlabel_mobile=False
, all mobile sections of tracks will have -1 in their immob column. This function assigns a unique (negative) number to each section, starting at -2.The data DataFrame will be modified in place, but also returned.
- Parameters:
tracks (DataFrame) – Tracking data with an immob column where all mobile sections of tracks are assigned -1.
engine (Literal['numba', 'python']) – If engine is “numba” and the numba package is installed, use the much faster numba-accelerated alogrithm. Otherwise, fall back to a pure python one. Defaults to “numba”
columns (Mapping) – Override default column names as defined in
config.columns
. The only relevant name is particle.
- Return type:
tracks with updated
"immob"
column.
- sdt.motion.find_immobilizations_int(tracks, max_dist, min_duration, label_mobile=True, longest_only=False, columns={})[source]#
Find immobilizations in particle trajectories
Analyze trajectories and mark parts where all localizations of a particle stay within a circle of radius max_dist for at least a certain number (min_length) of frames as an immobilization. In other words: If consecutive localizations stay within the intersection of circles of radius max_dist and coordinates of the localizations as centers, they are considered immobilized.
This is different from
find_immobilizations()
.The tracks DataFrame gets a new column (“immob”) appended (or overwritten), where every immobilzation is assigned a different number greater or equal than 0. Parts of trajectories that are not immobilized are assigned -1 (if label_mobile is False) or one negative number per mobile section starting at -2 (if label_mobile is True).
- Parameters:
tracks (pandas.DataFrame) – Tracking data
max_dist (float) – Maximum radius within a particle may move while still being considered immobilized
min_duration (int) – Minimum duration the particle has to stay at the same place for a part of a trajectory to be considered immobilized. Duration is the difference of the maximum frame number and the minimum frame number. E. g. if the frames 1, 2, and 4 would lead to a duration of 3.
label_mobile (bool, optional) – Whether to give each mobile track section a distinct label (a negative number starting at -2). Defaults to True.
longest_only (bool, optional) – If True, search only for the longest immobilzation in each track to speed up the process. Defaults to False.
- Returns:
Although the tracks DataFrame is modified in place, it is also returned for convenience.
- Return type:
pandas.DataFrame
See also
find_immobilizations
Uses a slightly different criterion for finding immobilizations
- Parameters:
columns (dict, optional) – Override default column names as defined in
config.columns
. Relevant names are coords, time, and particle. This means, if your DataFrame has coordinate columns “x” and “z” and the time column “alt_frame”, setcolumns={"coords": ["x", "z"], "time": "alt_frame"}
.
References
Goulian, M. & Simon, S. M.: “Tracking Single Proteins within Cells”, Biophysical Journal, Elsevier BV, 2000, 79, 2188–2198
Schütz, G.; Schindler, H. & Schmidt, T.: “Single-molecule microscopy on model membranes reveals anomalous diffusion”, Biophysical Journal, Elsevier BV, 1997, 73, 1073–1080