Data input/output#

sdt.io provides convenient ways to save and load all kinds of data.

  • Image sequences can be saved as multi-page TIFF files, including metadata, with the help of save_as_tiff(). Using SdtTiffStack, these files can be easily read again.

  • Single molecule data produced by the sdt package and various MATLAB tools can be read using the load() function. Most of these formats can also be written by save().

  • Further, there are helpers for common filesystem-related tasks, such as chdir() and get_files().

  • YAML is a format for storing data in a way that is both human- and machine-readable. The sdt.io.yaml submodule extends PyYAML to give a nice representation of numpy.ndarrays. Further, it provides a mechanism to easily add representations for custom data types.

    sdt.io.yaml has support for ROI types from sdt.roi, as well as slice, OrderedDict, and numpy.ndarray.

Examples

Open an image sequence. Create substacks without actually loading any data; data is only loaded when single frames are accessed.

>>> seq = ImageSequence("images.SPE").open()
>>> len(seq)
100
>>> seq2 = seq[::2]  # No data is loaded here
>>> len(seq2)
50
>>> frame = seq2[1]  # Load frame 1 (i.e., frame 2 in the original `seq`)
>>> frame.shape
(100, 150)
>>> seq.close()

Save an image sequence to a TIFF file using save_as_tiff():

>>> with ImageSequence("images.SPE") as seq:
...     save_as_tiff("images.tif", seq)
>>> seq = [frame1, frame2, frame3]  # list of arrays representing images
>>> save_as_tiff("images2.tif", seq)
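
Metadata can be stored alongside the frames via the metadata argument. As a minimal sketch (the metadata key below is made up for illustration), a single dict is saved with the first frame, while an iterable of dicts saves one entry per frame:

>>> meta = {"exposure_time": 0.1}  # hypothetical metadata; serialized to YAML in the ImageDescription tag
>>> save_as_tiff("images3.tif", seq, metadata=meta)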

load() can read many types of single molecule data into a pandas.DataFrame:

>>> d1 = load("features.h5")
>>> d1.head()
           x          y      signal          bg         mass      size  frame
0  97.333295  61.423270  252.900938  217.345552  1960.274055  1.110691      0
1  60.857730  82.120585  315.317311  229.205847   724.322652  0.604647      0
2  83.271210   6.144862  215.995479  224.119462   911.167854  0.819383      0
3   8.354563  33.013809  177.990405  216.341051  1284.869645  1.071868      0
4  46.215290  40.053183  207.207850  219.746090  1719.788381  1.149329      0
>>> d2 = load("tracks.trc")
>>> d2.head()
           x          y       mass  frame  particle
0  14.328209  53.256334  17558.629    1.0       0.0
1  14.189825  53.204634  17850.164    2.0       0.0
2  14.371586  53.391367  18323.903    3.0       0.0
3  14.363836  53.415152  16024.740    4.0       0.0
4  14.528098  53.242159  14341.417    5.0       0.0
>>> d3 = load("features.pkc")
>>> d3.head()
           x           y      size         mass          bg      bg_dev  frame
0  39.888750   97.023047  1.123692  8332.624410  506.853598  102.278242    0.0
1  41.918963  102.717941  1.062784  8197.686482  306.632393  126.153321    0.0
2  38.584142   66.143237  0.883132  7314.566544  273.506181   29.597416    0.0
3  68.595091   96.649889  0.904778  6837.369352  275.512017   29.935145    0.0
4  55.593909  109.955202  1.094519  7331.581064  279.787186   38.772275    0.0

Single molecule data can be saved in various formats using save():

>>> save("output.h5", d1)
>>> save("output.trc", d2)

Temporarily change the working directory using chdir():

>>> with chdir("subdir"):
...     pass  # here the working directory is "subdir"
>>> # here we are back in the original working directory

Recursively search a directory for files matching a regular expression by means of get_files():

>>> names, ids = get_files(r"^image_.*_(\d{3}).tif$", "subdir")
>>> names
['image_xxx_001.tif', 'image_xxx_002.tif', 'image_yyy_003.tif']
>>> ids
[(1,), (2,), (3,)]
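
Since each entry of ids is a tuple of the integers captured by the regular expression's groups, it can, for instance, be paired directly with the corresponding file name:

>>> for (num,), name in zip(ids, names):
...     print(num, name)
1 image_xxx_001.tif
2 image_xxx_002.tif
3 image_yyy_003.tif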

sdt.io.yaml extends PyYAML’s yaml package and can be used in place of it:

>>> import numpy
>>> from io import StringIO  # standard library io, not sdt.io
>>> from sdt.io import yaml
>>> sio = StringIO()  # write to StringIO instead of a file
>>> a = numpy.arange(10).reshape((2, -1))  # example data to be dumped
>>> yaml.safe_dump(a, sio)  # sdt.io.yaml.safe_dump instead of PyYAML safe_dump
>>> print(sio.getvalue())
!array
[[0, 1, 2, 3, 4],
 [5, 6, 7, 8, 9]]
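
The serialized array can be read back with the matching loader; as an illustration:

>>> b = yaml.safe_load(sio.getvalue())
>>> b
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])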

Image files#

class sdt.io.ImageSequence(uri, **kwargs)[source]#

Sliceable, lazy-loading interface to multi-image files

Single images can be retrieved by index, while substacks can be created by slicing and fancy indexing using lists/arrays of indices or boolean indices. Creating substacks does not load data into memory, allowing for dealing with files containing many images.

Examples

Load a single frame (here, the one at index 3):

>>> with ImageSequence("some_file.tif") as stack:
...     img = stack[3]

Use fancy indexing to create substacks:

>>> stack = ImageSequence("some_file.tif").open()
>>> len(stack)
30
>>> substack1 = stack[1::2]  # Slice, will not load any data
>>> len(substack1)
15
>>> np.all(substack1[1] == stack[3])  # Actually load data using int index
True
>>> substack2 = stack[[3, 5]]  # Create lazy substack using list of indices
>>> substack3 = stack[[True, False] * (len(stack) // 2)]  # or boolean index
>>> stack.close()
Parameters:
  • uri (str | pathlib.Path | bytes | IO) – File or file location or data to read from.

  • format – File format. Use None for automatic detection.

  • **kwargs – Keyword arguments passed to imageio.v3.imopen() when opening the file.

property is_slice: bool#

Whether this instance is the result of slicing another instance

uri: str | Path | bytes | IO#

File or file location or data to read from.

reader_args: Mapping#

Keyword arguments passed to imageio.v3.imopen() when opening the file.

open()[source]#

Open the file

Return type:

self

close()[source]#

Close the file

get_data(t, **kwargs)[source]#

Get a single frame

Parameters:
  • t (int) – Frame number

  • **kwargs – Additional keyword arguments to pass to the imageio plugin’s read() method.

Returns:

Image data. This has a frame_no attribute holding the original frame number.

Return type:

Image

get_metadata(t=None)[source]#

Get metadata for a frame

If t is not given, return the global metadata.

Parameters:

t (int | None) – Frame number

Returns:

Metadata dictionary. A "frame_no" entry with the original frame number (i.e., before slicing the sequence) is also added.

Return type:

Dict
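
To illustrate the frame_no bookkeeping (assuming a file "images.tif" with at least ten frames), both get_data() and get_metadata() report the position in the original, unsliced sequence:

>>> stack = ImageSequence("images.tif").open()
>>> sub = stack[5:]                   # lazy substack starting at original frame 5
>>> sub.get_data(0).frame_no          # loads the data of original frame 5
5
>>> sub.get_metadata(0)["frame_no"]
5
>>> stack.close()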

get_meta_data(t=None)[source]#

Alias for get_metadata()

Parameters:

t (int | None) –

Return type:

Dict

property closed: bool#

True if the file is currently closed.

sdt.io.save_as_tiff(filename, frames, metadata=None, contiguous=True)[source]#

Write a sequence of images to a TIFF stack

If the items in frames contain a dict named metadata, an attempt is made to serialize it to YAML and save it as the TIFF file’s ImageDescription tags.

Parameters:
  • filename (str | Path) – Name of the output file

  • frames (Iterable[ndarray]) – Frames to be written to TIFF file.

  • metadata (None | Iterable[Mapping] | Mapping) – Metadata to be written. If a single dict, save with the first frame. If an iterable, save each entry with the corresponding frame.

  • contiguous (bool) – Whether to write to the TIFF file contiguously or not. This has implications when reading the data. If using PIMS, set to True. If using imageio, use "I" mode for reading if True. Setting to False allows for per-image metadata.
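
As a sketch of per-frame metadata (frame data and metadata keys below are made up for illustration), contiguous needs to be disabled so that each image gets its own ImageDescription tag:

>>> import numpy as np
>>> frames = [np.zeros((64, 64), dtype=np.uint16) for _ in range(3)]
>>> meta = [{"index": i} for i in range(3)]  # one hypothetical dict per frame
>>> save_as_tiff("per_frame_meta.tif", frames, metadata=meta, contiguous=False)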

Single molecule data#

Generic functions#

sdt.io.load(filename, typ='auto', fmt='auto', color='red')[source]#

Load localization or tracking data from file

Use the load_*() function appropriate for the file type in order to load the data. The file type is determined by the file’s extension or the fmt parameter.

Supported file types:

  • HDF5 files (*.h5)

  • ThunderSTORM CSV files (*.csv)

  • particle_tracking_2D positions (*_positions.mat)

  • particle_tracking_2D tracks (*_tracks.mat)

  • pkc files (*.pkc)

  • pks files (*.pks)

  • trc files (*.trc)

Parameters:
  • filename (str or pathlib.Path) – Name of the file

  • typ (str, optional) – If the file is HDF5, load this key (usually either “features” or “tracks”), unless it is “auto”. In that case try to read “tracks” and if that fails, try to read “features”. If the file is in particle_tracker format, this can be either “auto”, “features” or “tracks”. Defaults to “auto”.

  • fmt ({"auto", "hdf5", "particle_tracker", "pkc", "pks", "trc", "csv"}, optional) – Input format. If “auto”, infer the format from filename. Otherwise, read the given format.

  • color ({"red", "green"}, optional) – For pkc files, specify whether to load the red (default) or green channel.

Returns:

Loaded data

Return type:

pandas.DataFrame
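
For example, the HDF5 key or the file format can be forced explicitly (the file names are hypothetical):

>>> feat = load("features.h5", typ="features")  # read the "features" key
>>> trc = load("odd_extension.dat", fmt="trc")  # force trc format despite the extension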

sdt.io.save(filename, data, typ='auto', fmt='auto')[source]#

Save feature/tracking data

This supports HDF5, trc, and particle_tracker formats.

Parameters:
  • filename (str or pathlib.Path) – Name of the file to write to

  • data (pandas.DataFrame) – Data to save

  • typ ({"auto", "features", "tracks"}) – Specify whether to save feature data or tracking data. If “auto”, consider data tracking data if a “particle” column is present, otherwise treat as feature data.

  • fmt ({"auto", "hdf5", "particle_tracker", "trc"}) – Output format. If “auto”, infer the format from filename. Otherwise, write the given format.
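
For instance (with a hypothetical pandas.DataFrame data containing a "particle" column), tracking data can be written to HDF5 or forced into a specific format regardless of the extension:

>>> save("tracks.h5", data)              # "particle" column present, so saved as tracking data
>>> save("tracks.dat", data, fmt="trc")  # force trc output despite the extension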

Specific functions#

sdt.io.load_msdplot(filename)[source]#

Load msdplot data from .mat file

Parameters:

filename (str or pathlib.Path) – Name of the file to load

Returns:

dict with keys “d”, “stderr”, “qianerr”, “pa”, and “emsd” – d is the diffusion coefficient in μm²/s, stderr its standard error, qianerr its Qian error, pa the positional accuracy in nm, and emsd a pandas.DataFrame containing the msd-vs.-tlag data.

Return type:

dict

sdt.io.load_pt2d(filename, typ, load_protocol=True)[source]#

Load a _positions.mat file created by particle_tracking_2D

Use scipy.io.loadmat() to load the file and convert data to a pandas.DataFrame.

Parameters:
  • filename (str or pathlib.Path) – Name of the file to load

  • typ ({"features", "tracks"}) – Specify whether to load feature data (positions.mat) or tracking data (tracks.mat)

  • load_protocol (bool, optional) – Look for a _protocol.mat file (i.e., replace the “_positions.mat” part of filename with “_protocol.mat”) in order to load the column names. This may be buggy for some older versions of particle_tracking_2D. If reading the protocol fails, this behaves as if load_protocol=False. Defaults to True.

Returns:

Loaded data.

Return type:

pandas.DataFrame

sdt.io.load_pkmatrix(filename, green=False)[source]#

Load a pkmatrix from a .mat file

Use scipy.io.loadmat() to load the file and convert data to a pandas.DataFrame.

Parameters:
  • filename (str or pathlib.Path) – Name of the file to load

  • green (bool) – If True, load pkmatrix_green, which is the right half of the image when using prepare_peakposition in 2 color mode. Otherwise, load pkmatrix. Defaults to False.

Returns:

Loaded data.

Return type:

pandas.DataFrame

sdt.io.load_pks(filename)[source]#

Load a pks matrix from a MATLAB file

Use scipy.io.loadmat() to load the file and convert data to a pandas.DataFrame.

Parameters:

filename (str or pathlib.Path) – Name of the file to load

Returns:

Loaded data.

Return type:

pandas.DataFrame

sdt.io.load_trc(filename)[source]#

Load tracking data from a .trc file

Parameters:

filename (str or pathlib.Path) – Name of the file to load

Returns:

Loaded data.

Return type:

pandas.DataFrame

sdt.io.load_csv(filename)[source]#

Load localization data from a CSV file created by ThunderSTORM

Parameters:

filename (str or pathlib.Path) – Name of the file to load

Returns:

Single molecule data

Return type:

pandas.DataFrame

sdt.io.save_pt2d(filename, data, typ='tracks')[source]#

Save feature/tracking data in particle_tracker format

Parameters:
  • filename (str or pathlib.Path) – Name of the file to write to

  • data (pandas.DataFrame) – Data to save

  • typ ({"features", "tracks"}) – Specify whether to save feature data or tracking data.

sdt.io.save_trc(filename, data)[source]#

Save tracking data in trc format

Parameters:
  • filename (str) – Name of the file to write to

  • data (pandas.DataFrame) – Data to save

YAML#

sdt.io.yaml.load(stream, Loader=<class 'sdt.io.yaml.Loader'>)[source]#

Wrap PyYAML’s yaml.load() using Loader

sdt.io.yaml.load_all(stream, Loader=<class 'sdt.io.yaml.Loader'>)[source]#

Wrap PyYAML’s yaml.load_all() using Loader

sdt.io.yaml.safe_load(stream)[source]#

Wrap PyYAML’s yaml.load() using SafeLoader

sdt.io.yaml.safe_load_all(stream)[source]#

Wrap PyYAML’s yaml.load_all() using SafeLoader

sdt.io.yaml.dump(data, stream=None, Dumper=<class 'sdt.io.yaml.Dumper'>, **kwds)[source]#

Wrap PyYAML’s yaml.dump() using Dumper

sdt.io.yaml.dump_all(documents, stream=None, Dumper=<class 'sdt.io.yaml.Dumper'>, **kwds)[source]#

Wrap PyYAML’s yaml.dump_all() using Dumper

sdt.io.yaml.safe_dump(data, stream=None, **kwds)[source]#

Wrap PyYAML’s yaml.dump() using SafeDumper

sdt.io.yaml.safe_dump_all(documents, stream=None, **kwds)[source]#

Wrap PyYAML’s yaml.dump_all() using SafeDumper

class sdt.io.yaml.Loader(stream)[source]#

An ArrayLoader with support for many more types of the sdt package, e.g. roi.ROI.

Initialize the scanner.

class sdt.io.yaml.SafeLoader(stream)[source]#

A SafeArrayLoader with support for many more types of the sdt package, e.g. roi.ROI.

Initialize the scanner.

class sdt.io.yaml.Dumper(stream, default_style=None, default_flow_style=False, canonical=None, indent=None, width=None, allow_unicode=None, line_break=None, encoding=None, explicit_start=None, explicit_end=None, version=None, tags=None, sort_keys=True)[source]#

An ArrayDumper with support for many more types of the sdt package, e.g. roi.ROI.

class sdt.io.yaml.SafeDumper(stream, default_style=None, default_flow_style=False, canonical=None, indent=None, width=None, allow_unicode=None, line_break=None, encoding=None, explicit_start=None, explicit_end=None, version=None, tags=None, sort_keys=True)[source]#

A SafeArrayDumper with support for many more types of the sdt package, e.g. roi.ROI.

sdt.io.yaml.register_yaml_class(cls)[source]#

Add support for representing and loading a class

A representer is added to Dumper and SafeDumper. A loader is added to Loader and SafeLoader.

The class should have a yaml_tag attribute and may have a yaml_flow_style attribute.

If to_yaml or from_yaml class methods exist, they will be used for representing or constructing class instances (see PyYAML’s yaml.Dumper.add_representer() and yaml.Loader.add_constructor() for details). Otherwise, the default Dumper.represent_yaml_object() and Loader.construct_yaml_object() are used.

Parameters:

cls (type) – Class to add support for
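
A minimal sketch of registering a custom class (the Point class below is made up for illustration; the default representer and constructor, which operate on the instance’s __dict__, are assumed):

>>> from sdt.io import yaml
>>> class Point:
...     yaml_tag = "!Point"
...     def __init__(self, x=0, y=0):
...         self.x = x
...         self.y = y
>>> yaml.register_yaml_class(Point)
>>> s = yaml.safe_dump(Point(1, 2))  # dumped with the "!Point" tag
>>> p = yaml.safe_load(s)
>>> (p.x, p.y)
(1, 2)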