Single molecule FRET analysis#
The sdt.fret
module provides functionality to analyze single
molecule FRET data. This includes
tracking and measurement of FRET-related quantities using the
SmFRETTracker
class.analyzing and filtering of the data with help of
SmFRETAnalyzer
.functions for plotting results, such as
smfret_scatter()
,smfret_hist()
, anddraw_track()
.
Examples
Load data for tracking:
>>> chromatic_corr = multicolor.Registrator.load("cc.npz")
>>> donor_loc = io.load("donor.h5")
>>> acceptor_loc = io.load("acceptor.h5")
>>> donor_img = io.ImageSequence("donor.tif").open()
>>> acceptor_img = io.ImageSequence("acceptor.tif").open()
Tracking of single molecule FRET signals. This involves merging features from donor and acceptor channels, the actual tracking, getting the brightness of the donor and the acceptor for each localization from the raw images and much more.
>>> tracker = SmFRETTracker("dddda", chromatic_corr)
>>> trc = tracker.track(donor_img, acceptor_img, donor_loc, acceptor_loc)
Now these data can be analyzed and filtered. Calculate FRET-related quantities such as FRET efficiency, stoichiometry, etc.:
>>> ana = SmFRETAnalyzer(trc)
>>> ana.calc_fret_values()
Let us reject any tracks where the acceptor does not bleach in a single step:
>>> ana.acceptor_bleach_step("acceptor")
Remove any tracks where the mass upon acceptor excitation does not exceed 500 counts at least once
>>> ana.query_particles("fret_a_mass > 500", 1)
Accept only localizations that lie in pixels where the boolean mask is True:
>>> mask = numpy.load("mask.npy")
>>> ana.image_mask(mask, "donor")
Filtered data can be accessed via the SmFRETAnalyzer.tracks
attribute.
Draw a scatter plot of FRET efficiency vs. stoichiometry:
>>> smfret_scatter({"data1": filt.tracks}, ("fret", "eff), ("fret", "stoi"))
Tracking#
- class sdt.fret.SmFRETTracker(excitation_seq='da', registrator=None, link_radius=5, link_mem=1, min_length=1, feat_radius=4, bg_frame=2, bg_estimator='mean', neighbor_radius='auto', interpolate=True, coloc_dist=2.0, acceptor_channel=2, link_quiet=True, link_options={}, columns={})[source]#
Class for tracking of smFRET data
There is support for dumping and loading to/from YAML using
sdt.io.yaml
.- Parameters:
excitation_seq (str) – Set the
excitation_seq
attribute.registrator (sdt.multicolor.registrator.Registrator) – Registrator used to overlay channels. If None, use the identity transform.
link_radius (float) – Maximum movement of features between frames. See search_range option of
trackpy.link_df()
.link_mem (int) – Maximum number of frames for which a feature may not be detected. See memory option of
trackpy.link_df()
.min_length (int) – Minimum length of tracks.
feat_radius (int) – Radius of circle that is a little larger than features. See radius option of
brightness.from_raw_image()
.bg_frame (int) – Size of frame around features for background determination. See bg_frame option of
brightness.from_raw_image()
.bg_estimator (str | Callable[[ndarray], float]) – Statistic to estimate background. See bg_estimator option of
brightness.from_raw_image()
.neighbor_radius (float) – How far two features may be apart while still being considered close enough so that one influences the brightness measurement of the other. This is related to the radius option of
brightness.from_raw_image()
. If “auto”, use the smallest value that avoids overlaps.interpolate (bool) – Whether to interpolate coordinates of features that have been missed by the localization algorithm.
coloc_dist (float) – After overlaying donor and acceptor channel features, this gives the maximum distance up to which donor and acceptor signal are considered to come from the same molecule.
acceptor_channel (int) – Whether the acceptor channel is number 1 or 2 in registrator.
link_options (Dict) – Specify additional options to
trackpy.link_df()
. “search_range” and “memory” will be overwritten by the link_radius and link_mem parameters.link_quiet (bool) – If True, call
trackpy.quiet()
.columns (Dict) – Override default column names as defined in
config.columns
. Relevant names are coords, time, mass, signal, bg, bg_dev. This means, if your DataFrame has coordinate columns “x” and “z” and the time column “alt_frame”, setcolumns={"coords": ["x", "z"], "time": "alt_frame"}
. This parameters sets thecolumns
attribute.
- frame_selector: FrameSelector#
A
FrameSelector
instance with the matchingexcitation_seq
.
- registrator: Registrator#
multicolor.Registrator used to overlay channels
- link_options: Dict#
Options passed to
trackpy.link_df()
- min_length: int#
Minimum length of tracks
- brightness_options: Dict#
Options passed to
brightness.from_raw_image()
. Make sure to adjustneighbor_radius
if you change either the mask or the radius option!
- interpolate: bool#
Whether to interpolate coordinates of features that have been missed by the localization algorithm.
- coloc_dist: float#
After overlaying donor and acceptor channel features, this gives the maximum distance up to which donor and acceptor signal are considered to come from the same molecule.
- acceptor_channel: int#
Can be either 1 or 2, depending the acceptor is the first or the second channel in
registrator
.
- columns: Dict#
Column names in DataFrames. Defaults are taken from
config.columns
.
- neighbor_radius: float#
How far two features may be apart while still being considered close enough so that one influences the brightness measurement of the other. This is related to the radius option of
brightness.from_raw_image()
.
- property excitation_seq: str#
Excitation sequence. “d” stands for donor, “a” for acceptor, anything else describes other kinds of frames which are irrelevant for tracking.
One needs only specify the shortest sequence that is repeated, i. e. “ddddaddddadddda” is the same as “dddda”.
- track(donor_img, acceptor_img, donor_loc, acceptor_loc, d_mass=False)[source]#
Track smFRET data
Localization data for both the donor and the acceptor channel is merged (since a FRET construct has to be visible in at least one channel). The merged data is than linked into trajectories using py:func:trackpy.link_df. For this the
trackpy
package needs to be installed. Additionally, the feature brightness is determined for both donor and acceptor for raw image data usingbrightness.from_raw_image()
. These data are written into a apandas.DataFrame
whose columns have a MultiIndex containing the “donor” and “acceptor” items in the top level.A column specifying whether the entry originates from donor or acceptor excitation is also added: (“fret”, “exc_type”). It is “d” for donor and “a” for acceptor excitation; see the
flag_excitation_type()
method.- Parameters:
donor_img (Sequence[ndarray]) – Raw image frames for donor and acceptor channel. This need to be of type list, but anything that returns image data when indexed with a frame number will do.
acceptor_img (Sequence[ndarray]) – Raw image frames for donor and acceptor channel. This need to be of type list, but anything that returns image data when indexed with a frame number will do.
donor_loc (DataFrame) – Localization data for donor and acceptor channel
acceptor_loc (DataFrame) – Localization data for donor and acceptor channel
d_mass (bool) –
If True, get total brightness upon donor excitation by from the sum of donor and acceptor image. If False, the donor excitation brightness can still be calculated as the sum of donor and acceptor brightness in
analyze()
. Defaults to False.Note: Until
slicerator
with support for multiple inputs to pipelines is released, setting this to True will load all of donor_img and acceptor_img into memory, even ifpims
is used.
- Returns:
The columns are indexed with a
pandas.MultiIndex
.The top index level consists of “donor” (tracking data for the
donor channel), “acceptor” (tracking data for the acceptor
channel), and “fret”. The latter contains a column with the
particle number (“particle”), an indicator (0 / 1) whether there
is a near neighbor (“has_neighbor”), and an indicator whether the
data point was interpolated (“interp”) because it was not in the
localization data in either channel.
- Return type:
DataFrame
- flag_excitation_type(tracks)[source]#
Add a column indicating excitation type (donor/acceptor/…)
Add (“fret”, “exc_type”) column in place. It is of “category” type.
- Parameters:
tracks (DataFrame) – Result of
track()
Analysis and Filtering#
- class sdt.fret.SmFRETAnalyzer(tracks, cp_detector=None, copy=False, reset_filters=True, keep_filters=[], columns={})[source]#
Class for analyzing and filtering of smFRET data
This provides various analysis and filtering methods which act on the
tracks
attribute.Correction of FRET efficiencies and stoichiometries for donor leakage and direct acceptor excitation is implemented according to [Hell2018], while different detection efficiencies for the different fluorophores are accounted for as described in [MacC2010] as well the different excitation efficiencies according to [Lee2005].
- Parameters:
tracks (pandas.core.frame.DataFrame) – smFRET tracking data as produced by
SmFRETTracker
by running itsSmFRETTracker.track()
method.cp_detector (Any) – Changepoint detetctor. If None, create a
changepoint.Pelt
instance withmodel="l2"
,min_size=1
, andjump=1
.copy (bool) – If True, copy tracks to the
tracks
attribute.reset_filters (bool) – If True, reset filters in
tracks
. See alsoreset_filters()
.keep_filters (str | Iterable[str]) – Filter columns to keep upon resetting. See also
reset_filters()
.columns (Dict) – Override default column names as defined in
config.columns
. Relevant names are coords and time. This means, if your DataFrame has coordinate columns “x” and “z” and the time column “alt_frame”, setcolumns={"coords": ["x", "z"], "time": "alt_frame"}
. This parameters sets thecolumns
attribute.
- bleach_thresh: Sequence[float] = (500.0, 500.0)#
Intensity (mass) thresholds upon donor and acceptor excitation, respecitively, below which a signal is considered bleached. Used for bleaching step analysis.
- tracks: DataFrame#
smFRET tracking data
- cp_detector: Any#
Changepoint detector class instance used to perform bleching step detection.
- columns: Dict#
Column names to use in DataFrames. See
config.set_columns()
for details.
- leakage: float#
Correction factor for donor leakage into the acceptor channel; \(\alpha\) in [Hell2018]
- direct_excitation: float#
Correction factor for direct acceptor excitation by the donor laser; \(\delta\) in [Hell2018]
- detection_eff: float#
Correction factor(s) for the detection efficiency difference beteen donor and acceptor fluorophore; \(\gamma\) in [Hell2018].
Can be a scalar for global correction or a
pandas.Series
for individual correction. In the latter case, the index is the particle number and the value is the correction factor.
- excitation_eff: float#
Correction factor(s) for the excitation efficiency difference beteen donor and acceptor fluorophore; \(\beta\) in [Hell2018].
- reset_filters(keep=[])[source]#
Reset filters
This drops filter columns from
tracks
.- Parameters:
keep (str | Iterable[str]) – Filter column name(s) to keep
- calc_fret_values(keep_d_mass=False, invalid_nan=True, a_mass_interp='nearest-up', skip_neighbors=True)[source]#
Calculate FRET-related values
This needs to be called before the filtering methods and before calculating the true FRET efficiencies and stoichiometries. However, any corrections to the donor and acceptor localization data (such as
flatfield_correction()
) need to be done before this.Calculated values apparent FRET efficiencies and stoichiometries, the total brightness (mass) upon donor excitation, and the acceptor brightness (mass) upon direct excitation, which is interpolated for donor excitation datapoints in order to allow for calculation of stoichiometries.
For each localization in tracks, the total brightness upon donor excitation is calculated by taking the sum of
("donor", "mass")
and("acceptor", "mass")
values. It is added as a("fret", "d_mass")
column to the tracks DataFrame. The apparent FRET efficiency (acceptor brightness (mass) divided by sum of donor and acceptor brightnesses) is added as a("fret", "eff_app")
column to the tracks DataFrame.The apparent stoichiometry value \(S_\text{app}\) is given as
\[S_\text{app} = \frac{I_{DD} + I_{DA}}{I_{DD} + I_{DA} + I_{AA}}\]as in [Hell2018]. \(I_{DD}\) is the donor brightness upon donor excitation, \(I_{DA}\) is the acceptor brightness upon donor excitation, and \(I_{AA}\) is the acceptor brightness upon acceptor excitation. The latter is calculated by interpolation for frames with donor excitation.
\(I_{AA}\) is append as a
("fret", "a_mass")
column. The stoichiometry value is added in the("fret", "stoi_app")
column.- Parameters:
keep_d_mass (bool) – If a
("fret", "d_mass")
column is already present in tracks, use that instead of overwriting it with the sum of("donor", "mass")
and("acceptor", "mass")
values. Useful iftrack()
was called withd_mass=True
.invalid_nan (bool) – If True, all “d_mass”, “eff_app”, and “stoi_app” values for excitation types other than donor excitation are set to NaN, since the values don’t make sense.
a_mass_interp (str) – How to interpolate the acceptor mass upon direct excitation in donor excitation frames. Sensible values are “linear” for linear interpolation; “nearest” to take the value of the closest direct acceptor excitation frame (using the previous frame in case of a tie); “nearest-up”, which is similar to “nearest” but takes the next frame in case of a tie; “next” and “previous” to use the next and previous frames, respectively.
skip_neighbors (bool) – If True, skip localizations where
("fret", "has_neighbor")
is True when interpolating acceptor mass upon direct excitation.
- apply_filters(include_negative=False, ignore=[], ret_type='data')[source]#
Apply filters to
tracks
This removes all entries from
tracks
that have been marked as filtered using e.g.query()
,query_particle()
,bleach_step()
, andimage_mask()
.- Parameters:
include_negative (bool) – If False, include only entries for which all
"filter"
column values are zero. If True, include also entries with negative"filter"
column values.ignore (str | Sequence[str]) –
"filter"
column(s) to ignore when deciding whether to include an entry or not. For instance, settingignore="bleach_step"
will not consider the("filter", "bleach_step")
column values.ret_type (str) – If
"data"
, return a copy oftracks
excluding all entries that have been marked as filtered, i.e., that have a positive (or nonzero, see the include_negative parameter) entry in any"filter"
column. If"mask"
, return a boolean array indicating whether an entry is to be removed or not
- Returns:
Copy of
tracks
with all filtered rows removed orcorresponding boolean mask.
- Return type:
DataFrame | array
- mass_changepoints(channel, stats='median', stat_margin=1, **kwargs)[source]#
Segment tracks by changepoint detection in brightness time trace
Changepoint detection is run on the donor or acceptor brightness (
mass
) time trace, depending on the channels argument. This appends py:attr:tracks with a("fret", "d_seg")
or (“fret”, “a_seg”)` column for donor or acceptor, resp. For each localization, this holds the number of the segment it belongs to. Furthermore, statistics (such as median brightness) can/should be calculated, which can later be used to analyze stepwise bleaching (seebleach_step()
).:py:attr:`tracks` will be sorted according to ``(“fret”, “particle”)`` and ``(“donor”, self.columns[“time”])`` in the process.
- Parameters:
channel (str) – In which channel (
"donor"
or"acceptor"
) to perform changepoint detection.stats (Callable | str | Iterable[Callable | str]) – Statistics to calculate for each track segment. For each entry
s
, a column named"{channel}_seg_{s}"
is appendend, wherechannel
isd
for donor anda
for acceptor.s
can be the name of a numpy function or a callable returning a statistic, such asnumpy.mean()
.stat_margin (int) – Number of data points around a changepoint to exclude from statistics calculation. This can prevent bias in the statistics due to recording a bleaching event in progress.
**kwargs – Keyword arguments to pass to
cp_detector
find_changepoints() method.
Examples
Pass
penalty=1e6
to the changepoint detector’sfind_changepoints
method, perform detection both channels:>>> ana.mass_changepoints("donor", penalty=1e6) >>> ana.mass_changepoints("acceptor", penalty=1e6)
- bleach_step(condition='donor or acceptor', stat='median', reason='bleach_step')[source]#
Find tracks with acceptable fluorophore bleaching behavior
What “acceptable” means is specified by the condition parameter.
("fret", "d_seg")
,("fret", f"d_seg_{stat}")
,("fret", "a_seg")
, and("fret", f"a_seg_{stat}")
(where{stat}
is replaced by the value of the stat parameter) columns need to be present intracks
for this to work, which can be achieved by performing changepoint in both channels usingsegment_mass()
.The donor considered bleached if its
("fret", f"d_seg_{stat}")
is belowbleach_thresh
[0]
. The acceptor considered bleached if its("fret", f"a_seg_{stat}")
is belowbleach_thresh
[0]
- Parameters:
condition (str) – If
"donor"
, accept only tracks where the donor bleaches in a single step and the acceptor shows either no bleach step or completely bleaches in a single step. Likewise,"acceptor"
will accept only tracks where the acceptor bleaches fully in one step and the donor shows no partial bleaching.donor or acceptor
requires that one channel bleaches in a single step while the other either also bleaches in one step or not at all (no partial bleaching). If"no partial"
, there may be no partial bleaching, but bleaching is not required.stat (str) – Statistic to use to determine bleaching steps. Has to be one that was passed to via
stats
parameter tomass_changepoints()
.reason (str) – Filtering reason / column name to use.
Examples
Consider acceptors with a brightness
("fret", "a_mass")
of less than 500 counts and donors with a brightness("fret", "d_mass")
of less than 800 counts bleached. Remove all tracks that don’t show acceptable bleaching behavior.>>> ana.bleach_thresh = (800, 500) >>> ana.bleach_step("donor or acceptor")
- query(expr, mi_sep='_', reason='query')[source]#
Filter features according to column values
Flatten the column MultiIndex and filter the resulting DataFrame’s eval method.
- Parameters:
expr (str) – Filter expression. See
pandas.DataFrame.eval()
for details.mi_sep (str) – Separate multi-index levels by this character / string.
reason (str) – Filtering reason / column name to use.
Examples
Remove lines where (“fret”, “a_mass”) <= 500 from
tracks
>>> filt.query("fret_a_mass > 500")
- query_particles(expr, min_abs=1, min_rel=0.0, mi_sep='_', reason='query_p')[source]#
Remove particles that don’t fulfill expr enough times
Any particle that does not fulfill expr at least min_abs times AND during at least a fraction of min_rel of its length is removed from
tracks
.The column MultiIndex is flattened for this purpose.
- Parameters:
expr (str) – Filter expression. See
pandas.DataFrame.eval()
for details.min_abs (int) – Minimum number of times a particle has to fulfill expr. If negative, this means “all but
abs(min_abs)
”. If 0, it has to be fulfilled in all frames.min_rel (float) – Minimum fraction of data points that have to fulfill expr for a particle not to be removed.
mi_sep (str) – Use this to separate levels when flattening the column MultiIndex. Defaults to “_”.
reason (str) – Filtering reason / column name to use.
Examples
Remove any particles where not (“fret”, “a_mass”) > 500 at least twice from
tracks
.>>> filt.query_particles("fret_a_mass > 500", 2)
Remove any particles where (“fret”, “a_mass”) <= 500 in more than one frame:
>>> filt.query_particles("fret_a_mass > 500", -1)
Remove any particle where not (“fret”, “a_mass”) > 500 for at least 75 % of the particle’s data points, with a minimum of two data points:
>>> filt.query_particles("fret_a_mass > 500", 2, min_rel=0.75)
- image_mask(mask, channel, reason='image_mask')[source]#
Filter using a boolean mask image
Remove all lines where coordinates lie in a region where mask is False.
- Parameters:
mask (ndarray | List[Dict]) –
Mask image(s). If this is a single array, apply it to the whole
tracks
DataFrame.This can also be a list of dicts, where each dict
d
has to have a “key” and a “mask” (ndarray) entry. Then eachd["mask"]
is applied separately to the correspondingself.tracks.loc[d["key"]]
. Additionally, the dicts may also have “start” and “stop” entries, in which case the mask will be applied only to datapoints with frames greater or equal start and less than stop; all others will be discarded. With this it is possible to apply multiple masks to the same key depending on the frame number.channel (str) – Channel to use for the filtering
reason (str) – Filtering reason / column name to use.
Examples
Create a 2D boolean mask to remove any features that do not have x and y coordinates between 50 and 100 in the donor channel.
>>> mask = numpy.zeros((200, 200), dtype=bool) >>> mask[50:100, 50:100] = True >>> filt.image_mask(mask, "donor")
If
tracks
has a MultiIndex index, where e.g. the first level is “file1”, “file2”, … and different masks should be applied for each file, this is possible by passing a list of dicts. Furthermore, we can apply one mask to all frames up to 100 and another to the rest:>>> masks = [] >>> for i in range(1, 10): >>> masks.append({"key": "file%i" % i, ... "mask": numpy.ones((10 * i, 10 * i), dtype=bool), ... "stop": 100}) >>> masks.append({"key": "file%i" % i, ... "mask": numpy.ones((20 * i, 20 * i), dtype=bool), ... "start": 100}) >>> filt.image_mask(masks, "donor")
- flatfield_correction(donor_corr, acceptor_corr)[source]#
Apply flatfield correction to donor and acceptor localization data
If present, donor and acceptor
f"{self.columns['mass']}_pre_flat"
andf"{self.columns['signal']}_pre_flat"
columns are used as inputs for flatfield correction, results are written toself.columns["mass"]
and self.columns[“signal”]`. Otherwise,self.columns["mass"]
and self.columns[“signal”]` are copied tof"{self.columns['mass']}_pre_flat"
andf"{self.columns['signal']}_pre_flat"
first.Any values derived from those (e.g., apparent FRET efficiencies) need to be recalculated manually.
- calc_leakage()[source]#
Calculate donor leakage (bleed-through) into the acceptor channel
For this to work,
tracks
must be a dataset of donor-only molecules. In this case, the leakage \(alpha\) can be computed using the formula [Hell2018]\[\alpha = \frac{\langle E_\text{app}\rangle}{1 - \langle E_\text{app}\rangle},\]where \(\langle E_\text{app}\rangle\) is the mean apparent FRET efficiency of a donor-only population.
The leakage \(\alpha\) together with the direct acceptor excitation \(\delta\) can be used to calculate the real fluorescence due to FRET,
\[F_\text{DA} = I_\text{DA} - \alpha I_\text{DD} - \delta I_\text{AA}.\]This sets the
leakage
attribute. Seefret_correction()
for how use this to calculate corrected FRET values.
- calc_direct_excitation()[source]#
Calculate direct acceptor excitation by the donor laser
For this to work,
tracks
must be a dataset of acceptor-only molecules. In this case, the direct acceptor excitation \(delta\) can be computed using the formula [Hell2018]\[\alpha = \frac{\langle S_\text{app}\rangle}{1 - \langle S_\text{app}\rangle},\]where \(\langle ES\text{app}\rangle\) is the mean apparent FRET stoichiometry of an acceptor-only population.
The leakage \(\alpha\) together with the direct acceptor excitation \(\delta\) can be used to calculate the real fluorescence due to FRET,
\[F_\text{DA} = I_\text{DA} - \alpha I_\text{DD} - \delta I_\text{AA}.\]This sets the
direct_excitation
attribute. Seefret_correction()
for how use this to calculate corrected FRET values.
- calc_detection_eff(min_seg_len=5, how=<function nanmedian>, stat=<function median>)[source]#
Calculate detection efficiency ratio of dyes
The detection efficiency ratio is the ratio of decrease in acceptor brightness to the increase in donor brightness upon acceptor photobleaching [MacC2010]:
\[\gamma = \frac{\langle I_\text{DA}^\text{pre}\rangle - \langle I_\text{DA}^\text{post}\rangle}{ \langle I_\text{DD}^\text{post}\rangle - \langle I_\text{DD}^\text{pre}\rangle}\]This needs molecules with exactly one donor and one acceptor fluorophore to work. Tracks need to be segmented already (see
segment_a_mass()
).The correction can be calculated for each track individually or some statistic (e.g. the median) of the indivdual \(gamma\) values can be used as a global correction factor for all tracks.
The detection efficiency \(\gamma\) can be used to calculate the real fluorescence of the donor fluorophore,
\[F_\text{DD} = \gamma I_\text{DD}.\]This sets the
detection_eff
attribute. Seefret_correction()
for how use this to calculate corrected FRET values.- Parameters:
min_seg_len (int) – How many data points need to be present before and after the bleach step to ensure a reliable calculation of the mean intensities. If there are fewer data points, a value of NaN will be assigned.
how (Callable | str) – If “individual”, the \(\gamma\) value for each track will be stored and used to correct the values individually when calling
fret_correction()
. If a function, apply this function to the \(\gamma\) array and its return value as a global correction factor. A sensible example for such a function would benumpy.nanmean()
. Beware that some \(\gamma\) may be NaN.stat (Callable | str) – Statistic to use to determine bleaching steps. If a string, it has to be the name of a function from
numpy
.
- calc_excitation_eff(n_components=1, component=0, random_seed=0)[source]#
Calculate excitation efficiency ratio of dyes
This is a measure of how efficient the direct acceptor excitation is compared to the donor excitation. It depends on the fluorophores and also on the excitation laser intensities.
It can be calculated using the formula [Lee2005]
\[\beta = \frac{1 - \langle S_\gamma \rangle}{ \langle S_\gamma\rangle},\]where \(S_\gamma\) is calculated like the apparent stoichiometry, but with the donor and acceptor fluorescence upon donor excitation already corrected using the leakage, direct excitation, and detection efficiency factors.
This needs molecules with exactly one donor and one acceptor fluorophore to work. Tracks need to be segmented already (see
segment_a_mass()
). Theleakage
,direct_excitation
, anddetection_eff
attributes need to be set correctly.The excitation efficiency \(\beta\) can be used to correct the acceptor fluorescence upon acceptor excitation,
\[F_\text{AA} = I_\text{AA} / \beta.\]This sets the
excitation_eff
attribute. Seefret_correction()
for how use this to calculate corrected FRET values.- Parameters:
n_components (int) – If > 1, perform a Gaussian mixture fit on the 2D apparent efficiency-vs.-stoichiomtry dataset. This helps to choose only the correct component with one donor and one acceptor. Defaults to 1.
component (int) – If n_components > 1, use this to choos the component number. Components are ordered according to decreasing mean apparent FRET efficiency.
gaussian_mixture_split()
can be used to check which component is the desired one. Defaults to 0.random_seed (int) – Seed for the random number generator used to initialize the Gaussian mixture model fit.
- calc_detection_excitation_effs(n_components, components=None, random_seed=0)[source]#
Get detection and excitation efficiency from multi-state sample
States are found in efficiency-vs.-stoichiometry space using a Gaussian mixture fit. Detection efficiency factor \(\gamma\) and excitation efficiency factor \(\delta\) are found performing a linear fit to the equation
\[S^{-1} = 1 + \beta\gamma + (1 - \gamma)\beta E\]to the Gaussian mixture fit results, where \(S\) are the components’ mean stoichiometries (corrected for leakage and direct excitation) and \(E\) are the corresponding FRET efficiencies (also corrected for leakage and direct excitation) [Hell2018].
- Parameters:
n_components (int) – Number of components for Gaussian mixture model
components (Sequence[int] | None) – List of indices of components to use for the linear fit. If None, use all.
random_seed (int) – Seed for the random number generator used to initialize the Gaussian mixture model fit.
- fret_correction(invalid_nan=True)[source]#
Apply corrections to calculate real FRET-related values
By correcting the measured acceptor and donor intensities upon donor excitation (\(I_\text{DA}\) and \(I_\text{DD}\)) and acceptor intensity upon acceptor excitation (\(I_\text{AA}\)) for donor leakage into the acceptor channel \(\alpha\), acceptor excitation by the donor laser \(\delta\), detection efficiencies \(\gamma\), and excitation efficiencies \(\beta\) using [Hell2018]
\[\begin{split}F_\text{DA} &= I_\text{DA} - \alpha I_\text{DD} - \delta I_\text{AA} \\ F_\text{DD} &= \gamma I_\text{DD} \\ F_\text{AA} &= I_\text{AA} / \beta\end{split}\]the real FRET efficiency and stoichiometry values can be calculated:
\[\begin{split}E &= \frac{F_\text{DA}}{F_\text{DA} + F_\text{DD}} \\ S &= \frac{F_\text{DA} + F_\text{DD}}{F_\text{DA} + F_\text{DD} + F_\text{AA}}\end{split}\]\(F_\text{DA}\) will be appended to
tracks
as the("fret", "f_da")
column; \(F_\text{DD}\) as("fret", "f_dd")
; \(F_\text{DA}\) as("fret", "f_aa")
; \(E\) as("fret", "eff")
; and \(S\) as("fret", "stoi")
.- Parameters:
invalid_nan (bool) – If True, all “eff”, and “stoi” values for excitation types other than donor excitation are set to NaN, since the values don’t make sense. Defaults to True.
Plotting#
- sdt.fret.smfret_scatter(track_data, xdata=('fret', 'eff'), ydata=('fret', 'stoi'), frame=None, columns=2, size=5, xlim=(None, None), ylim=(None, None), xlabel=None, ylabel=None, scatter_args={}, grid=True, ax=None)[source]#
Make scatter plots of multiple smFRET datasets
- Parameters:
track_data (dict of str: pandas.DataFrame) – dict keys are used to identify the smFRET datasets (dict values). The DataFrames have to have the same format as e.g. produced by
SmFRETTracker
.x_data (tuple of str, optional) – Column indices of data to plot on the x (y) axis. Defaults to
("fret", "eff")
for x_data and("fret", "stoi")
for y_data.y_data (tuple of str, optional) – Column indices of data to plot on the x (y) axis. Defaults to
("fret", "eff")
for x_data and("fret", "stoi")
for y_data.frame (int or None, optional) – If given, only plot data from a certain frame. Defaults to None.
columns (int, optional) – In how many columns to lay out plots. Defaults to 2.
size (int, optional) – Size per plot. Defaults to 5.
xlim (tuple of float, optional) – Set x (y) axis limits. Defaults to
(None, None)
, i.e. automatic determination.ylim (tuple of float, optional) – Set x (y) axis limits. Defaults to
(None, None)
, i.e. automatic determination.xlabel (str or None, optional) – Label for x (y) axis. If None, use x_data (y_data). Defaults to None.
ylabel (str or None, optional) – Label for x (y) axis. If None, use x_data (y_data). Defaults to None.
scatter_args (dict, optional) – Further arguments to pass as keyword arguments to the scatter function. Defaults to {}.
grid (bool, optional) – Whether to draw a grid in the plots. Defaults to True.
ax (array-like of matplotlib.axes.Axes or None, optional) – Axes to use for plotting. If None, a new figure with axes is created. Defaults to None.
- Returns:
fig (matplotlib.figure.Figure) – figure object used for plotting
ax (numpy.ndarray of mpl.axes.Axes) – axes objects of the plots
- sdt.fret.smfret_hist(track_data, data=('fret', 'eff'), frame=None, columns=2, size=5, xlim=(None, None), xlabel=None, ylabel=None, group_re=None, hist_args={}, ax=None)[source]#
Make histogram plots of multiple smFRET datasets
- Parameters:
track_data (dict of str: pandas.DataFrame) – dict keys are used to identify the smFRET datasets (dict values). The DataFrames have to have the same format as e.g. produced by
SmFRETTracker
.data (tuple of str, optional) – Column indices of data. Defaults to
("fret", "eff")
.y_data (tuple of str, optional) – Column indices of data. Defaults to
("fret", "eff")
.frame (int or None, optional) – If given, only plot data from a certain frame. Defaults to None.
columns (int, optional) – In how many columns to lay out plots. Defaults to 2.
size (int, optional) – Size per plot. Defaults to 5.
xlim (tuple of float, optional) – Set x axis limits. Defaults to
(None, None)
, i.e. automatic determination.xlabel (str or None, optional) – Label for x (y) axis. If None, use data for the x axis and
"# events"
on the y axis. Defaults to None.ylabel (str or None, optional) – Label for x (y) axis. If None, use data for the x axis and
"# events"
on the y axis. Defaults to None.group_re (tuple(str, int, int) or None, optional) – The first entry in the tuple should be a regular expression with at least two groups. Datasets from track_data will be plotted in the same plot if whichever regex group is specified by the second entry in the tuple is the same. The third entry identifies the label of the dataset in the plot. If None, do no grouping. Defaults to None.
hist_args (dict, optional) – Further arguments to pass as keyword arguments to the histogram plotting function. Defaults to {}.
ax (array-like of matplotlib.axes.Axes or None, optional) – Axes to use for plotting. If None, a new figure with axes is created. Defaults to None.
- Returns:
fig (matplotlib.figure.Figure) – figure object used for plotting
ax (numpy.ndarray of mpl.axes.Axes) – axes objects of the plots
- sdt.fret.draw_track(tracks, track_no, donor_img, acceptor_img, size, n_cols=8, figure=None, columns={})[source]#
Draw donor and acceptor images for a track
For each frame in a track, draw the raw image in the proximity of the feature localization.
Note: This is rather slow.
- Parameters:
tracks (pandas.DataFrame) – smFRET tracking data as e.g. produced by
SmFRETTracker
track_no (int) – Track/particle number
donor_img (list-like of numpy.ndarray) – Image sequences of the donor and acceptor channels
acceptor_img (list-like of numpy.ndarray) – Image sequences of the donor and acceptor channels
size (int) – For each feature, draw a square of
2 * size + 1
size pixels.n_cols (int, optional) – Arrange images in that many columns. Defaults to 8.
figure (matplotlib.figure.Figure or None, optional) – Use this figure to draw. If None, use
matplotlib.pyplot.gcf()
. Defaults to None.columns (dict, optional) – Override default column names as defined in
config.columns
. The only relevant name is coords. This means, if your DataFrame has coordinate columns “x” and “z”, setcolumns={"coords": ["x", "z"]}
.
- Returns:
The figure object used for plotting
- Return type:
matplotlib.figure.Figure
Helpers#
- sdt.fret.numeric_exc_type(df)[source]#
Context manager temporarily turning (“fret”, “exc_type”) column to int
This is useful e.g. in
helper.split_dataframe()
so that the resulting split array does not have object dtype due to (“fret”, “exc_type”) being categorical.Example
>>> tracks["fret", "exc_type"].dtype CategoricalDtype(categories=['a', 'd'], ordered=False) >>> with numeric_exc_type(tracks) as exc_cats: >>> tracks["fret", "exc_type"].dtype dtype('int64') >>> exc_cats[0] "a"
exc_cats
is an array that holds old categories. It can be used to find out which (new) integer corresponds to which categoryWhen leaving the
with
block, the old categorical column is restored. This works only for the original DataFrame, but not for any copies made within the block!- Parameters:
df (pandas.DataFrame) – Dataframe for which to temporarily use an integer (“fret”, “exc_type”) column.
- Yields:
pandas.Index – Maps integers to categories
- sdt.fret.gaussian_mixture_split(data, n_components, columns=[('fret', 'eff_app'), ('fret', 'stoi_app')], random_seed=0)[source]#
Fit Gaussian mixture model and predict component for each particle
First, all datapoints are used to fit a Gaussian mixture model. Then each particle is assigned the component in which most of its datapoints lie.
This requires scikit-learn (sklearn).
- Parameters:
data (DataFrame) – Single-molecule FRET data
n_components (int) – Number of components in the mixture
columns (Sequence[Tuple[str, str]]) – Which columns to fit.
random_seed (int) – Seed for the random number generator used to initialize the Gaussian mixture model fit.
- Returns:
Component label for each entry in data and mean values for each
component, one line per component.
- Return type:
Tuple[ndarray, ndarray]
- sdt.fret.apply_track_filters(tracks, include_negative=False, ignore=[], ret_type='data')[source]#
Apply filters to tracks
This removes all entries from tracks that have been marked as filtered via a
"filter"
column.- Parameters:
include_negative (bool) – If False, include only entries for which all
"filter"
column values are zero. If True, include also entries with negative"filter"
column values.ignore (str | Sequence[str]) –
"filter"
column(s) to ignore when deciding whether to include an entry or not. For instance, settingignore="bleach_step"
will not consider the("filter", "bleach_step")
column values.ret_type (str) – If
"data"
, return a copy of tracks excluding all entries that have been marked as filtered, i.e., that have a positive (or nonzero, see the include_negative parameter) entry in any"filter"
column. If"mask"
, return a boolean array indicating whether an entry is to be removed or nottracks (DataFrame) –
- Returns:
Copy of tracks with all filtered rows removed or corresponding boolean
mask.
- Return type:
DataFrame | array
References
Hellenkamp, B. et al.: “Precision and accuracy of single-molecule FRET measurements—a multi-laboratory benchmark study”, Nature methods, Nature Publishing Group, 2018, 15, 669