Single molecule FRET analysis#

The sdt.fret module provides functionality to analyze single molecule FRET data. This includes

Examples

Load data for tracking:

>>> chromatic_corr = multicolor.Registrator.load("cc.npz")
>>> donor_loc = io.load("donor.h5")
>>> acceptor_loc = io.load("acceptor.h5")
>>> donor_img = io.ImageSequence("donor.tif").open()
>>> acceptor_img = io.ImageSequence("acceptor.tif").open()

Tracking of single molecule FRET signals. This involves merging features from donor and acceptor channels, the actual tracking, getting the brightness of the donor and the acceptor for each localization from the raw images and much more.

>>> tracker = SmFRETTracker("dddda", chromatic_corr)
>>> trc = tracker.track(donor_img, acceptor_img, donor_loc, acceptor_loc)

Now these data can be analyzed and filtered. Calculate FRET-related quantities such as FRET efficiency, stoichiometry, etc.:

>>> ana = SmFRETAnalyzer(trc)
>>> ana.calc_fret_values()

Let us reject any tracks where the acceptor does not bleach in a single step:

>>> ana.acceptor_bleach_step("acceptor")

Remove any tracks where the mass upon acceptor excitation does not exceed 500 counts at least once

>>> ana.query_particles("fret_a_mass > 500", 1)

Accept only localizations that lie in pixels where the boolean mask is True:

>>> mask = numpy.load("mask.npy")
>>> ana.image_mask(mask, "donor")

Filtered data can be accessed via the SmFRETAnalyzer.tracks attribute.

Draw a scatter plot of FRET efficiency vs. stoichiometry:

>>> smfret_scatter({"data1": filt.tracks}, ("fret", "eff), ("fret", "stoi"))

Tracking#

class sdt.fret.SmFRETTracker(excitation_seq='da', registrator=None, link_radius=5, link_mem=1, min_length=1, feat_radius=4, bg_frame=2, bg_estimator='mean', neighbor_radius='auto', interpolate=True, coloc_dist=2.0, acceptor_channel=2, link_quiet=True, link_options={}, columns={})[source]#

Class for tracking of smFRET data

There is support for dumping and loading to/from YAML using sdt.io.yaml.

Parameters:
  • excitation_seq (str) – Set the excitation_seq attribute.

  • registrator (sdt.multicolor.registrator.Registrator) – Registrator used to overlay channels. If None, use the identity transform.

  • link_radius (float) – Maximum movement of features between frames. See search_range option of trackpy.link_df().

  • link_mem (int) – Maximum number of frames for which a feature may not be detected. See memory option of trackpy.link_df().

  • min_length (int) – Minimum length of tracks.

  • feat_radius (int) – Radius of circle that is a little larger than features. See radius option of brightness.from_raw_image().

  • bg_frame (int) – Size of frame around features for background determination. See bg_frame option of brightness.from_raw_image().

  • bg_estimator (str | Callable[[ndarray], float]) – Statistic to estimate background. See bg_estimator option of brightness.from_raw_image().

  • neighbor_radius (float) – How far two features may be apart while still being considered close enough so that one influences the brightness measurement of the other. This is related to the radius option of brightness.from_raw_image(). If “auto”, use the smallest value that avoids overlaps.

  • interpolate (bool) – Whether to interpolate coordinates of features that have been missed by the localization algorithm.

  • coloc_dist (float) – After overlaying donor and acceptor channel features, this gives the maximum distance up to which donor and acceptor signal are considered to come from the same molecule.

  • acceptor_channel (int) – Whether the acceptor channel is number 1 or 2 in registrator.

  • link_options (Dict) – Specify additional options to trackpy.link_df(). “search_range” and “memory” will be overwritten by the link_radius and link_mem parameters.

  • link_quiet (bool) – If True, call trackpy.quiet().

  • columns (Dict) – Override default column names as defined in config.columns. Relevant names are coords, time, mass, signal, bg, bg_dev. This means, if your DataFrame has coordinate columns “x” and “z” and the time column “alt_frame”, set columns={"coords": ["x", "z"], "time": "alt_frame"}. This parameters sets the columns attribute.

frame_selector: FrameSelector#

A FrameSelector instance with the matching excitation_seq.

registrator: Registrator#

multicolor.Registrator used to overlay channels

Options passed to trackpy.link_df()

min_length: int#

Minimum length of tracks

brightness_options: Dict#

Options passed to brightness.from_raw_image(). Make sure to adjust neighbor_radius if you change either the mask or the radius option!

interpolate: bool#

Whether to interpolate coordinates of features that have been missed by the localization algorithm.

coloc_dist: float#

After overlaying donor and acceptor channel features, this gives the maximum distance up to which donor and acceptor signal are considered to come from the same molecule.

acceptor_channel: int#

Can be either 1 or 2, depending the acceptor is the first or the second channel in registrator.

columns: Dict#

Column names in DataFrames. Defaults are taken from config.columns.

neighbor_radius: float#

How far two features may be apart while still being considered close enough so that one influences the brightness measurement of the other. This is related to the radius option of brightness.from_raw_image().

property excitation_seq: str#

Excitation sequence. “d” stands for donor, “a” for acceptor, anything else describes other kinds of frames which are irrelevant for tracking.

One needs only specify the shortest sequence that is repeated, i. e. “ddddaddddadddda” is the same as “dddda”.

track(donor_img, acceptor_img, donor_loc, acceptor_loc, d_mass=False)[source]#

Track smFRET data

Localization data for both the donor and the acceptor channel is merged (since a FRET construct has to be visible in at least one channel). The merged data is than linked into trajectories using py:func:trackpy.link_df. For this the trackpy package needs to be installed. Additionally, the feature brightness is determined for both donor and acceptor for raw image data using brightness.from_raw_image(). These data are written into a a pandas.DataFrame whose columns have a MultiIndex containing the “donor” and “acceptor” items in the top level.

A column specifying whether the entry originates from donor or acceptor excitation is also added: (“fret”, “exc_type”). It is “d” for donor and “a” for acceptor excitation; see the flag_excitation_type() method.

Parameters:
  • donor_img (Sequence[ndarray]) – Raw image frames for donor and acceptor channel. This need to be of type list, but anything that returns image data when indexed with a frame number will do.

  • acceptor_img (Sequence[ndarray]) – Raw image frames for donor and acceptor channel. This need to be of type list, but anything that returns image data when indexed with a frame number will do.

  • donor_loc (DataFrame) – Localization data for donor and acceptor channel

  • acceptor_loc (DataFrame) – Localization data for donor and acceptor channel

  • d_mass (bool) –

    If True, get total brightness upon donor excitation by from the sum of donor and acceptor image. If False, the donor excitation brightness can still be calculated as the sum of donor and acceptor brightness in analyze(). Defaults to False.

    Note: Until slicerator with support for multiple inputs to pipelines is released, setting this to True will load all of donor_img and acceptor_img into memory, even if pims is used.

Returns:

  • The columns are indexed with a pandas.MultiIndex.

  • The top index level consists of “donor” (tracking data for the

  • donor channel), “acceptor” (tracking data for the acceptor

  • channel), and “fret”. The latter contains a column with the

  • particle number (“particle”), an indicator (0 / 1) whether there

  • is a near neighbor (“has_neighbor”), and an indicator whether the

  • data point was interpolated (“interp”) because it was not in the

  • localization data in either channel.

Return type:

DataFrame

flag_excitation_type(tracks)[source]#

Add a column indicating excitation type (donor/acceptor/…)

Add (“fret”, “exc_type”) column in place. It is of “category” type.

Parameters:

tracks (DataFrame) – Result of track()

classmethod to_yaml(dumper, data)[source]#

Dump as YAML

Pass this as the representer parameter to yaml.Dumper.add_representer()

classmethod from_yaml(loader, node)[source]#

Construct from YAML

Pass this as the constructor parameter to yaml.Loader.add_constructor()

Analysis and Filtering#

class sdt.fret.SmFRETAnalyzer(tracks, cp_detector=None, copy=False, reset_filters=True, keep_filters=[], columns={})[source]#

Class for analyzing and filtering of smFRET data

This provides various analysis and filtering methods which act on the tracks attribute.

Correction of FRET efficiencies and stoichiometries for donor leakage and direct acceptor excitation is implemented according to [Hell2018], while different detection efficiencies for the different fluorophores are accounted for as described in [MacC2010] as well the different excitation efficiencies according to [Lee2005].

Parameters:
  • tracks (pandas.core.frame.DataFrame) – smFRET tracking data as produced by SmFRETTracker by running its SmFRETTracker.track() method.

  • cp_detector (Any) – Changepoint detetctor. If None, create a changepoint.Pelt instance with model="l2", min_size=1, and jump=1.

  • copy (bool) – If True, copy tracks to the tracks attribute.

  • reset_filters (bool) – If True, reset filters in tracks. See also reset_filters().

  • keep_filters (str | Iterable[str]) – Filter columns to keep upon resetting. See also reset_filters().

  • columns (Dict) – Override default column names as defined in config.columns. Relevant names are coords and time. This means, if your DataFrame has coordinate columns “x” and “z” and the time column “alt_frame”, set columns={"coords": ["x", "z"], "time": "alt_frame"}. This parameters sets the columns attribute.

bleach_thresh: Sequence[float] = (500.0, 500.0)#

Intensity (mass) thresholds upon donor and acceptor excitation, respecitively, below which a signal is considered bleached. Used for bleaching step analysis.

tracks: DataFrame#

smFRET tracking data

cp_detector: Any#

Changepoint detector class instance used to perform bleching step detection.

columns: Dict#

Column names to use in DataFrames. See config.set_columns() for details.

leakage: float#

Correction factor for donor leakage into the acceptor channel; \(\alpha\) in [Hell2018]

direct_excitation: float#

Correction factor for direct acceptor excitation by the donor laser; \(\delta\) in [Hell2018]

detection_eff: float#

Correction factor(s) for the detection efficiency difference beteen donor and acceptor fluorophore; \(\gamma\) in [Hell2018].

Can be a scalar for global correction or a pandas.Series for individual correction. In the latter case, the index is the particle number and the value is the correction factor.

excitation_eff: float#

Correction factor(s) for the excitation efficiency difference beteen donor and acceptor fluorophore; \(\beta\) in [Hell2018].

reset_filters(keep=[])[source]#

Reset filters

This drops filter columns from tracks.

Parameters:

keep (str | Iterable[str]) – Filter column name(s) to keep

calc_fret_values(keep_d_mass=False, invalid_nan=True, a_mass_interp='nearest-up', skip_neighbors=True)[source]#

Calculate FRET-related values

This needs to be called before the filtering methods and before calculating the true FRET efficiencies and stoichiometries. However, any corrections to the donor and acceptor localization data (such as flatfield_correction()) need to be done before this.

Calculated values apparent FRET efficiencies and stoichiometries, the total brightness (mass) upon donor excitation, and the acceptor brightness (mass) upon direct excitation, which is interpolated for donor excitation datapoints in order to allow for calculation of stoichiometries.

For each localization in tracks, the total brightness upon donor excitation is calculated by taking the sum of ("donor", "mass") and ("acceptor", "mass") values. It is added as a ("fret", "d_mass") column to the tracks DataFrame. The apparent FRET efficiency (acceptor brightness (mass) divided by sum of donor and acceptor brightnesses) is added as a ("fret", "eff_app") column to the tracks DataFrame.

The apparent stoichiometry value \(S_\text{app}\) is given as

\[S_\text{app} = \frac{I_{DD} + I_{DA}}{I_{DD} + I_{DA} + I_{AA}}\]

as in [Hell2018]. \(I_{DD}\) is the donor brightness upon donor excitation, \(I_{DA}\) is the acceptor brightness upon donor excitation, and \(I_{AA}\) is the acceptor brightness upon acceptor excitation. The latter is calculated by interpolation for frames with donor excitation.

\(I_{AA}\) is append as a ("fret", "a_mass") column. The stoichiometry value is added in the ("fret", "stoi_app") column.

Parameters:
  • keep_d_mass (bool) – If a ("fret", "d_mass") column is already present in tracks, use that instead of overwriting it with the sum of ("donor", "mass") and ("acceptor", "mass") values. Useful if track() was called with d_mass=True.

  • invalid_nan (bool) – If True, all “d_mass”, “eff_app”, and “stoi_app” values for excitation types other than donor excitation are set to NaN, since the values don’t make sense.

  • a_mass_interp (str) – How to interpolate the acceptor mass upon direct excitation in donor excitation frames. Sensible values are “linear” for linear interpolation; “nearest” to take the value of the closest direct acceptor excitation frame (using the previous frame in case of a tie); “nearest-up”, which is similar to “nearest” but takes the next frame in case of a tie; “next” and “previous” to use the next and previous frames, respectively.

  • skip_neighbors (bool) – If True, skip localizations where ("fret", "has_neighbor") is True when interpolating acceptor mass upon direct excitation.

apply_filters(include_negative=False, ignore=[], ret_type='data')[source]#

Apply filters to tracks

This removes all entries from tracks that have been marked as filtered using e.g. query(), query_particle(), bleach_step(), and image_mask().

Parameters:
  • include_negative (bool) – If False, include only entries for which all "filter" column values are zero. If True, include also entries with negative "filter" column values.

  • ignore (str | Sequence[str]) – "filter" column(s) to ignore when deciding whether to include an entry or not. For instance, setting ignore="bleach_step" will not consider the ("filter", "bleach_step") column values.

  • ret_type (str) – If "data", return a copy of tracks excluding all entries that have been marked as filtered, i.e., that have a positive (or nonzero, see the include_negative parameter) entry in any "filter" column. If "mask", return a boolean array indicating whether an entry is to be removed or not

Returns:

  • Copy of tracks with all filtered rows removed or

  • corresponding boolean mask.

Return type:

DataFrame | array

mass_changepoints(channel, stats='median', stat_margin=1, **kwargs)[source]#

Segment tracks by changepoint detection in brightness time trace

Changepoint detection is run on the donor or acceptor brightness (mass) time trace, depending on the channels argument. This appends py:attr:tracks with a ("fret", "d_seg") or (“fret”, “a_seg”)` column for donor or acceptor, resp. For each localization, this holds the number of the segment it belongs to. Furthermore, statistics (such as median brightness) can/should be calculated, which can later be used to analyze stepwise bleaching (see bleach_step()).

:py:attr:`tracks` will be sorted according to ``(“fret”, “particle”)`` and ``(“donor”, self.columns[“time”])`` in the process.

Parameters:
  • channel (str) – In which channel ("donor" or "acceptor") to perform changepoint detection.

  • stats (Callable | str | Iterable[Callable | str]) – Statistics to calculate for each track segment. For each entry s, a column named "{channel}_seg_{s}" is appendend, where channel is d for donor and a for acceptor. s can be the name of a numpy function or a callable returning a statistic, such as numpy.mean().

  • stat_margin (int) – Number of data points around a changepoint to exclude from statistics calculation. This can prevent bias in the statistics due to recording a bleaching event in progress.

  • **kwargs – Keyword arguments to pass to cp_detector find_changepoints() method.

Examples

Pass penalty=1e6 to the changepoint detector’s find_changepoints method, perform detection both channels:

>>> ana.mass_changepoints("donor", penalty=1e6)
>>> ana.mass_changepoints("acceptor", penalty=1e6)
bleach_step(condition='donor or acceptor', stat='median', reason='bleach_step')[source]#

Find tracks with acceptable fluorophore bleaching behavior

What “acceptable” means is specified by the condition parameter.

("fret", "d_seg"), ("fret", f"d_seg_{stat}"), ("fret", "a_seg"), and ("fret", f"a_seg_{stat}") (where {stat} is replaced by the value of the stat parameter) columns need to be present in tracks for this to work, which can be achieved by performing changepoint in both channels using segment_mass().

The donor considered bleached if its ("fret", f"d_seg_{stat}") is below bleach_thresh [0]. The acceptor considered bleached if its ("fret", f"a_seg_{stat}") is below bleach_thresh [0]

Parameters:
  • condition (str) – If "donor", accept only tracks where the donor bleaches in a single step and the acceptor shows either no bleach step or completely bleaches in a single step. Likewise, "acceptor" will accept only tracks where the acceptor bleaches fully in one step and the donor shows no partial bleaching. donor or acceptor requires that one channel bleaches in a single step while the other either also bleaches in one step or not at all (no partial bleaching). If "no partial", there may be no partial bleaching, but bleaching is not required.

  • stat (str) – Statistic to use to determine bleaching steps. Has to be one that was passed to via stats parameter to mass_changepoints().

  • reason (str) – Filtering reason / column name to use.

Examples

Consider acceptors with a brightness ("fret", "a_mass") of less than 500 counts and donors with a brightness ("fret", "d_mass") of less than 800 counts bleached. Remove all tracks that don’t show acceptable bleaching behavior.

>>> ana.bleach_thresh = (800, 500)
>>> ana.bleach_step("donor or acceptor")
query(expr, mi_sep='_', reason='query')[source]#

Filter features according to column values

Flatten the column MultiIndex and filter the resulting DataFrame’s eval method.

Parameters:
  • expr (str) – Filter expression. See pandas.DataFrame.eval() for details.

  • mi_sep (str) – Separate multi-index levels by this character / string.

  • reason (str) – Filtering reason / column name to use.

Examples

Remove lines where (“fret”, “a_mass”) <= 500 from tracks

>>> filt.query("fret_a_mass > 500")
query_particles(expr, min_abs=1, min_rel=0.0, mi_sep='_', reason='query_p')[source]#

Remove particles that don’t fulfill expr enough times

Any particle that does not fulfill expr at least min_abs times AND during at least a fraction of min_rel of its length is removed from tracks.

The column MultiIndex is flattened for this purpose.

Parameters:
  • expr (str) – Filter expression. See pandas.DataFrame.eval() for details.

  • min_abs (int) – Minimum number of times a particle has to fulfill expr. If negative, this means “all but abs(min_abs)”. If 0, it has to be fulfilled in all frames.

  • min_rel (float) – Minimum fraction of data points that have to fulfill expr for a particle not to be removed.

  • mi_sep (str) – Use this to separate levels when flattening the column MultiIndex. Defaults to “_”.

  • reason (str) – Filtering reason / column name to use.

Examples

Remove any particles where not (“fret”, “a_mass”) > 500 at least twice from tracks.

>>> filt.query_particles("fret_a_mass > 500", 2)

Remove any particles where (“fret”, “a_mass”) <= 500 in more than one frame:

>>> filt.query_particles("fret_a_mass > 500", -1)

Remove any particle where not (“fret”, “a_mass”) > 500 for at least 75 % of the particle’s data points, with a minimum of two data points:

>>> filt.query_particles("fret_a_mass > 500", 2, min_rel=0.75)
image_mask(mask, channel, reason='image_mask')[source]#

Filter using a boolean mask image

Remove all lines where coordinates lie in a region where mask is False.

Parameters:
  • mask (ndarray | List[Dict]) –

    Mask image(s). If this is a single array, apply it to the whole tracks DataFrame.

    This can also be a list of dicts, where each dict d has to have a “key” and a “mask” (ndarray) entry. Then each d["mask"] is applied separately to the corresponding self.tracks.loc[d["key"]]. Additionally, the dicts may also have “start” and “stop” entries, in which case the mask will be applied only to datapoints with frames greater or equal start and less than stop; all others will be discarded. With this it is possible to apply multiple masks to the same key depending on the frame number.

  • channel (str) – Channel to use for the filtering

  • reason (str) – Filtering reason / column name to use.

Examples

Create a 2D boolean mask to remove any features that do not have x and y coordinates between 50 and 100 in the donor channel.

>>> mask = numpy.zeros((200, 200), dtype=bool)
>>> mask[50:100, 50:100] = True
>>> filt.image_mask(mask, "donor")

If tracks has a MultiIndex index, where e.g. the first level is “file1”, “file2”, … and different masks should be applied for each file, this is possible by passing a list of dicts. Furthermore, we can apply one mask to all frames up to 100 and another to the rest:

>>> masks = []
>>> for i in range(1, 10):
>>>     masks.append({"key": "file%i" % i,
...                   "mask": numpy.ones((10 * i, 10 * i), dtype=bool),
...                   "stop": 100})
>>>     masks.append({"key": "file%i" % i,
...                   "mask": numpy.ones((20 * i, 20 * i), dtype=bool),
...                   "start": 100})
>>> filt.image_mask(masks, "donor")
flatfield_correction(donor_corr, acceptor_corr)[source]#

Apply flatfield correction to donor and acceptor localization data

If present, donor and acceptor f"{self.columns['mass']}_pre_flat" and f"{self.columns['signal']}_pre_flat" columns are used as inputs for flatfield correction, results are written to self.columns["mass"] and self.columns[“signal”]`. Otherwise, self.columns["mass"] and self.columns[“signal”]` are copied to f"{self.columns['mass']}_pre_flat" and f"{self.columns['signal']}_pre_flat" first.

Any values derived from those (e.g., apparent FRET efficiencies) need to be recalculated manually.

Parameters:
  • donor_corr (Corrector) – Corrector instances for donor and acceptor channel, respectivey.

  • acceptor_corr (Corrector) – Corrector instances for donor and acceptor channel, respectivey.

calc_leakage()[source]#

Calculate donor leakage (bleed-through) into the acceptor channel

For this to work, tracks must be a dataset of donor-only molecules. In this case, the leakage \(alpha\) can be computed using the formula [Hell2018]

\[\alpha = \frac{\langle E_\text{app}\rangle}{1 - \langle E_\text{app}\rangle},\]

where \(\langle E_\text{app}\rangle\) is the mean apparent FRET efficiency of a donor-only population.

The leakage \(\alpha\) together with the direct acceptor excitation \(\delta\) can be used to calculate the real fluorescence due to FRET,

\[F_\text{DA} = I_\text{DA} - \alpha I_\text{DD} - \delta I_\text{AA}.\]

This sets the leakage attribute. See fret_correction() for how use this to calculate corrected FRET values.

calc_direct_excitation()[source]#

Calculate direct acceptor excitation by the donor laser

For this to work, tracks must be a dataset of acceptor-only molecules. In this case, the direct acceptor excitation \(delta\) can be computed using the formula [Hell2018]

\[\alpha = \frac{\langle S_\text{app}\rangle}{1 - \langle S_\text{app}\rangle},\]

where \(\langle ES\text{app}\rangle\) is the mean apparent FRET stoichiometry of an acceptor-only population.

The leakage \(\alpha\) together with the direct acceptor excitation \(\delta\) can be used to calculate the real fluorescence due to FRET,

\[F_\text{DA} = I_\text{DA} - \alpha I_\text{DD} - \delta I_\text{AA}.\]

This sets the direct_excitation attribute. See fret_correction() for how use this to calculate corrected FRET values.

calc_detection_eff(min_seg_len=5, how=<function nanmedian>, stat=<function median>)[source]#

Calculate detection efficiency ratio of dyes

The detection efficiency ratio is the ratio of decrease in acceptor brightness to the increase in donor brightness upon acceptor photobleaching [MacC2010]:

\[\gamma = \frac{\langle I_\text{DA}^\text{pre}\rangle - \langle I_\text{DA}^\text{post}\rangle}{ \langle I_\text{DD}^\text{post}\rangle - \langle I_\text{DD}^\text{pre}\rangle}\]

This needs molecules with exactly one donor and one acceptor fluorophore to work. Tracks need to be segmented already (see segment_a_mass()).

The correction can be calculated for each track individually or some statistic (e.g. the median) of the indivdual \(gamma\) values can be used as a global correction factor for all tracks.

The detection efficiency \(\gamma\) can be used to calculate the real fluorescence of the donor fluorophore,

\[F_\text{DD} = \gamma I_\text{DD}.\]

This sets the detection_eff attribute. See fret_correction() for how use this to calculate corrected FRET values.

Parameters:
  • min_seg_len (int) – How many data points need to be present before and after the bleach step to ensure a reliable calculation of the mean intensities. If there are fewer data points, a value of NaN will be assigned.

  • how (Callable | str) – If “individual”, the \(\gamma\) value for each track will be stored and used to correct the values individually when calling fret_correction(). If a function, apply this function to the \(\gamma\) array and its return value as a global correction factor. A sensible example for such a function would be numpy.nanmean(). Beware that some \(\gamma\) may be NaN.

  • stat (Callable | str) – Statistic to use to determine bleaching steps. If a string, it has to be the name of a function from numpy.

calc_excitation_eff(n_components=1, component=0, random_seed=0)[source]#

Calculate excitation efficiency ratio of dyes

This is a measure of how efficient the direct acceptor excitation is compared to the donor excitation. It depends on the fluorophores and also on the excitation laser intensities.

It can be calculated using the formula [Lee2005]

\[\beta = \frac{1 - \langle S_\gamma \rangle}{ \langle S_\gamma\rangle},\]

where \(S_\gamma\) is calculated like the apparent stoichiometry, but with the donor and acceptor fluorescence upon donor excitation already corrected using the leakage, direct excitation, and detection efficiency factors.

This needs molecules with exactly one donor and one acceptor fluorophore to work. Tracks need to be segmented already (see segment_a_mass()). The leakage, direct_excitation, and detection_eff attributes need to be set correctly.

The excitation efficiency \(\beta\) can be used to correct the acceptor fluorescence upon acceptor excitation,

\[F_\text{AA} = I_\text{AA} / \beta.\]

This sets the excitation_eff attribute. See fret_correction() for how use this to calculate corrected FRET values.

Parameters:
  • n_components (int) – If > 1, perform a Gaussian mixture fit on the 2D apparent efficiency-vs.-stoichiomtry dataset. This helps to choose only the correct component with one donor and one acceptor. Defaults to 1.

  • component (int) – If n_components > 1, use this to choos the component number. Components are ordered according to decreasing mean apparent FRET efficiency. gaussian_mixture_split() can be used to check which component is the desired one. Defaults to 0.

  • random_seed (int) – Seed for the random number generator used to initialize the Gaussian mixture model fit.

calc_detection_excitation_effs(n_components, components=None, random_seed=0)[source]#

Get detection and excitation efficiency from multi-state sample

States are found in efficiency-vs.-stoichiometry space using a Gaussian mixture fit. Detection efficiency factor \(\gamma\) and excitation efficiency factor \(\delta\) are found performing a linear fit to the equation

\[S^{-1} = 1 + \beta\gamma + (1 - \gamma)\beta E\]

to the Gaussian mixture fit results, where \(S\) are the components’ mean stoichiometries (corrected for leakage and direct excitation) and \(E\) are the corresponding FRET efficiencies (also corrected for leakage and direct excitation) [Hell2018].

Parameters:
  • n_components (int) – Number of components for Gaussian mixture model

  • components (Sequence[int] | None) – List of indices of components to use for the linear fit. If None, use all.

  • random_seed (int) – Seed for the random number generator used to initialize the Gaussian mixture model fit.

fret_correction(invalid_nan=True)[source]#

Apply corrections to calculate real FRET-related values

By correcting the measured acceptor and donor intensities upon donor excitation (\(I_\text{DA}\) and \(I_\text{DD}\)) and acceptor intensity upon acceptor excitation (\(I_\text{AA}\)) for donor leakage into the acceptor channel \(\alpha\), acceptor excitation by the donor laser \(\delta\), detection efficiencies \(\gamma\), and excitation efficiencies \(\beta\) using [Hell2018]

\[\begin{split}F_\text{DA} &= I_\text{DA} - \alpha I_\text{DD} - \delta I_\text{AA} \\ F_\text{DD} &= \gamma I_\text{DD} \\ F_\text{AA} &= I_\text{AA} / \beta\end{split}\]

the real FRET efficiency and stoichiometry values can be calculated:

\[\begin{split}E &= \frac{F_\text{DA}}{F_\text{DA} + F_\text{DD}} \\ S &= \frac{F_\text{DA} + F_\text{DD}}{F_\text{DA} + F_\text{DD} + F_\text{AA}}\end{split}\]

\(F_\text{DA}\) will be appended to tracks as the ("fret", "f_da") column; \(F_\text{DD}\) as ("fret", "f_dd"); \(F_\text{DA}\) as ("fret", "f_aa"); \(E\) as ("fret", "eff"); and \(S\) as ("fret", "stoi").

Parameters:

invalid_nan (bool) – If True, all “eff”, and “stoi” values for excitation types other than donor excitation are set to NaN, since the values don’t make sense. Defaults to True.

Plotting#

sdt.fret.smfret_scatter(track_data, xdata=('fret', 'eff'), ydata=('fret', 'stoi'), frame=None, columns=2, size=5, xlim=(None, None), ylim=(None, None), xlabel=None, ylabel=None, scatter_args={}, grid=True, ax=None)[source]#

Make scatter plots of multiple smFRET datasets

Parameters:
  • track_data (dict of str: pandas.DataFrame) – dict keys are used to identify the smFRET datasets (dict values). The DataFrames have to have the same format as e.g. produced by SmFRETTracker.

  • x_data (tuple of str, optional) – Column indices of data to plot on the x (y) axis. Defaults to ("fret", "eff") for x_data and ("fret", "stoi") for y_data.

  • y_data (tuple of str, optional) – Column indices of data to plot on the x (y) axis. Defaults to ("fret", "eff") for x_data and ("fret", "stoi") for y_data.

  • frame (int or None, optional) – If given, only plot data from a certain frame. Defaults to None.

  • columns (int, optional) – In how many columns to lay out plots. Defaults to 2.

  • size (int, optional) – Size per plot. Defaults to 5.

  • xlim (tuple of float, optional) – Set x (y) axis limits. Defaults to (None, None), i.e. automatic determination.

  • ylim (tuple of float, optional) – Set x (y) axis limits. Defaults to (None, None), i.e. automatic determination.

  • xlabel (str or None, optional) – Label for x (y) axis. If None, use x_data (y_data). Defaults to None.

  • ylabel (str or None, optional) – Label for x (y) axis. If None, use x_data (y_data). Defaults to None.

  • scatter_args (dict, optional) – Further arguments to pass as keyword arguments to the scatter function. Defaults to {}.

  • grid (bool, optional) – Whether to draw a grid in the plots. Defaults to True.

  • ax (array-like of matplotlib.axes.Axes or None, optional) – Axes to use for plotting. If None, a new figure with axes is created. Defaults to None.

Returns:

  • fig (matplotlib.figure.Figure) – figure object used for plotting

  • ax (numpy.ndarray of mpl.axes.Axes) – axes objects of the plots

sdt.fret.smfret_hist(track_data, data=('fret', 'eff'), frame=None, columns=2, size=5, xlim=(None, None), xlabel=None, ylabel=None, group_re=None, hist_args={}, ax=None)[source]#

Make histogram plots of multiple smFRET datasets

Parameters:
  • track_data (dict of str: pandas.DataFrame) – dict keys are used to identify the smFRET datasets (dict values). The DataFrames have to have the same format as e.g. produced by SmFRETTracker.

  • data (tuple of str, optional) – Column indices of data. Defaults to ("fret", "eff").

  • y_data (tuple of str, optional) – Column indices of data. Defaults to ("fret", "eff").

  • frame (int or None, optional) – If given, only plot data from a certain frame. Defaults to None.

  • columns (int, optional) – In how many columns to lay out plots. Defaults to 2.

  • size (int, optional) – Size per plot. Defaults to 5.

  • xlim (tuple of float, optional) – Set x axis limits. Defaults to (None, None), i.e. automatic determination.

  • xlabel (str or None, optional) – Label for x (y) axis. If None, use data for the x axis and "# events" on the y axis. Defaults to None.

  • ylabel (str or None, optional) – Label for x (y) axis. If None, use data for the x axis and "# events" on the y axis. Defaults to None.

  • group_re (tuple(str, int, int) or None, optional) – The first entry in the tuple should be a regular expression with at least two groups. Datasets from track_data will be plotted in the same plot if whichever regex group is specified by the second entry in the tuple is the same. The third entry identifies the label of the dataset in the plot. If None, do no grouping. Defaults to None.

  • hist_args (dict, optional) – Further arguments to pass as keyword arguments to the histogram plotting function. Defaults to {}.

  • ax (array-like of matplotlib.axes.Axes or None, optional) – Axes to use for plotting. If None, a new figure with axes is created. Defaults to None.

Returns:

  • fig (matplotlib.figure.Figure) – figure object used for plotting

  • ax (numpy.ndarray of mpl.axes.Axes) – axes objects of the plots

sdt.fret.draw_track(tracks, track_no, donor_img, acceptor_img, size, n_cols=8, figure=None, columns={})[source]#

Draw donor and acceptor images for a track

For each frame in a track, draw the raw image in the proximity of the feature localization.

Note: This is rather slow.

Parameters:
  • tracks (pandas.DataFrame) – smFRET tracking data as e.g. produced by SmFRETTracker

  • track_no (int) – Track/particle number

  • donor_img (list-like of numpy.ndarray) – Image sequences of the donor and acceptor channels

  • acceptor_img (list-like of numpy.ndarray) – Image sequences of the donor and acceptor channels

  • size (int) – For each feature, draw a square of 2 * size + 1 size pixels.

  • n_cols (int, optional) – Arrange images in that many columns. Defaults to 8.

  • figure (matplotlib.figure.Figure or None, optional) – Use this figure to draw. If None, use matplotlib.pyplot.gcf(). Defaults to None.

  • columns (dict, optional) – Override default column names as defined in config.columns. The only relevant name is coords. This means, if your DataFrame has coordinate columns “x” and “z”, set columns={"coords": ["x", "z"]}.

Returns:

The figure object used for plotting

Return type:

matplotlib.figure.Figure

Helpers#

sdt.fret.numeric_exc_type(df)[source]#

Context manager temporarily turning (“fret”, “exc_type”) column to int

This is useful e.g. in helper.split_dataframe() so that the resulting split array does not have object dtype due to (“fret”, “exc_type”) being categorical.

Example

>>> tracks["fret", "exc_type"].dtype
CategoricalDtype(categories=['a', 'd'], ordered=False)
>>> with numeric_exc_type(tracks) as exc_cats:
>>>     tracks["fret", "exc_type"].dtype
dtype('int64')
>>>     exc_cats[0]
"a"

exc_cats is an array that holds old categories. It can be used to find out which (new) integer corresponds to which category

When leaving the with block, the old categorical column is restored. This works only for the original DataFrame, but not for any copies made within the block!

Parameters:

df (pandas.DataFrame) – Dataframe for which to temporarily use an integer (“fret”, “exc_type”) column.

Yields:

pandas.Index – Maps integers to categories

sdt.fret.gaussian_mixture_split(data, n_components, columns=[('fret', 'eff_app'), ('fret', 'stoi_app')], random_seed=0)[source]#

Fit Gaussian mixture model and predict component for each particle

First, all datapoints are used to fit a Gaussian mixture model. Then each particle is assigned the component in which most of its datapoints lie.

This requires scikit-learn (sklearn).

Parameters:
  • data (DataFrame) – Single-molecule FRET data

  • n_components (int) – Number of components in the mixture

  • columns (Sequence[Tuple[str, str]]) – Which columns to fit.

  • random_seed (int) – Seed for the random number generator used to initialize the Gaussian mixture model fit.

Returns:

  • Component label for each entry in data and mean values for each

  • component, one line per component.

Return type:

Tuple[ndarray, ndarray]

sdt.fret.apply_track_filters(tracks, include_negative=False, ignore=[], ret_type='data')[source]#

Apply filters to tracks

This removes all entries from tracks that have been marked as filtered via a "filter" column.

Parameters:
  • include_negative (bool) – If False, include only entries for which all "filter" column values are zero. If True, include also entries with negative "filter" column values.

  • ignore (str | Sequence[str]) – "filter" column(s) to ignore when deciding whether to include an entry or not. For instance, setting ignore="bleach_step" will not consider the ("filter", "bleach_step") column values.

  • ret_type (str) – If "data", return a copy of tracks excluding all entries that have been marked as filtered, i.e., that have a positive (or nonzero, see the include_negative parameter) entry in any "filter" column. If "mask", return a boolean array indicating whether an entry is to be removed or not

  • tracks (DataFrame) –

Returns:

  • Copy of tracks with all filtered rows removed or corresponding boolean

  • mask.

Return type:

DataFrame | array

References

[Hell2018] (1,2,3,4,5,6,7,8)

Hellenkamp, B. et al.: “Precision and accuracy of single-molecule FRET measurements—a multi-laboratory benchmark study”, Nature methods, Nature Publishing Group, 2018, 15, 669

[MacC2010] (1,2)

McCann, J. J. et al.: “Optimizing Methods to Recover Absolute FRET Efficiency from Immobilized Single Molecules” Biophysical Journal, Elsevier BV, 2010, 99, 961–970

[Lee2005] (1,2)

Lee, N. et al.: “Accurate FRET Measurements within Single Diffusing Biomolecules Using Alternating-Laser Excitation”, Biophysical Journal, Elsevier BV, 2005, 88, 2939–2953