Helpers for writing new code#

Helper classes and functions#

The sdt.helper package provides some common tools to be used in higher level functions. This includes

  • a singleton class decorator (Singleton) and a thread-safe version of it (ThreadSafeSingleton)

  • functions for common tasks involving pandas.DataFrame: flatten_multiindex(), split_dataframe()

  • the Slicerator and Pipeline classes as well as the pipeline() decorator for creation of lazy-loading, fancy-sliceable iterators.

  • the numba module, which defines stubs for important numba objects in case numba is not installed. That way, things like the jit decorator will not raise an error during import if numba is not present.

  • the raise_in_thread() function, which allows for raising exceptions in specific threads.

Examples

Fast splitting of pandas.DataFrame can be achieved using split_dataframe():

>>> df = pandas.DataFrame([[0, 1], [1, 1], [2, 2]], columns=["a", "b"])
>>> split = split_dataframe(df, "b")
>>> for b, arr in split:
...     print("b:", b)
...     print(arr)
b: 1
[[0 1]
 [1 1]]
b: 2
[[2 2]]

To convert a pandas.MultiIndex into a normal index, use flatten_multiindex(). This is necessary e.g. to be able to call pandas.DataFrame.query().

>>> mi = pandas.MultiIndex.from_product([["A", "B"], ["a", "b"]])
>>> df = pandas.DataFrame([[1, 2, 3, 4]], columns=mi)
>>> df
   A     B
   a  b  a  b
0  1  2  3  4
>>> df.columns = flatten_multiindex(df.columns)
>>> df
   A_a  A_b  B_a  B_b
0    1    2    3    4

A singleton type can be created with help of the Singleton and ThreadSafeSingleton decorators. Both behave the same way, but ThreadSafeSingleton additionally uses a mutex to ensure thread safety.

>>> @helper.Singleton
... class Example:
...     def __init__(self):
...         self.x = 1
>>> Example()  # Try constructing an instance, which is not allowed
Traceback (most recent call last):
  ...
TypeError: Singletons must be accessed by instance
>>> Example.instance
<__main__.Example object at 0x7f28c068c780>
>>> Example.instance.x
1

Use the numba submodule to avoid a hard dependency on numba:

>>> from sdt.helper import numba
>>> @numba.jit(nopython=True)  # This will not raise an error
... def f(x):
...     return x

However, trying to call f() will raise an error if numba is not installed.

To check whether numba is available, one can use

>>> from sdt.helper import numba
>>> if numba.numba_available:
...     pass  # numba is installed
... else:
...     pass  # numba is not installed
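
The stub mechanism can be sketched as follows. This is a hypothetical, simplified version for illustration; the real sdt.helper.numba module covers more numba objects than just jit:

```python
# Sketch of the stub idea (hypothetical, simplified): try the real import;
# on failure, provide a no-op ``jit`` that only fails when the decorated
# function is actually called, not at import time.
try:
    from numba import jit
    numba_available = True
except ImportError:
    numba_available = False

    def jit(*args, **kwargs):
        def stub_decorator(func):
            def raiser(*f_args, **f_kwargs):
                raise RuntimeError("numba is not installed")
            return raiser
        return stub_decorator
```

Modules decorated this way import cleanly everywhere; the cost of missing numba is only paid if a jit-compiled function is actually invoked.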

Programming reference#

sdt.helper.split_dataframe(df, split_column, columns=None, sort=True, type='array', keep_index=False)[source]#

Split a DataFrame according to the values of a column

This is somewhat like pandas.DataFrame.groupby(), but it (optionally) turns the data into numpy.ndarrays, which makes it a lot faster.

Parameters:
  • df (DataFrame) – DataFrame to be split

  • split_column (Any) – Column to group/split data by.

  • columns (Any | None) – Column(s) to return. If None, use all columns.

  • sort (bool) – For this function to work, the DataFrame needs to be sorted. If this parameter is True, do the sorting in the function. If the DataFrame is already sorted (according to split_column), set this to False for efficiency. Defaults to True.

  • type (str) – If "array", return split data as a single numpy.ndarray (fast). If "array_list", return split data as a list of arrays. Each list entry corresponds to one column (also fast, preserves columns’ dtype). If "DataFrame", return pandas.DataFrame (slow).

  • keep_index (bool) – If True, the index of the DataFrame df is prepended to the columns of the split array. Only applicable if type="array" or type="array_list".

Returns:

Split DataFrame. The first entry of each tuple is the corresponding split_column entry, the second is the data, whose type depends on the type parameter.

Return type:

list of tuple(scalar, array)
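
The speedup over groupby() comes from operating on a single sorted array. A minimal sketch of that strategy (hypothetical function name, covering only the type="array" case for a 2D array):

```python
import numpy as np

def split_by_column(arr, col):
    # Hypothetical sketch of the fast path: sort rows by the split column
    # (stable, so row order within each group is preserved), locate the
    # positions where its value changes, and slice the sorted array once
    # per group.
    order = np.argsort(arr[:, col], kind="stable")
    arr = arr[order]
    keys = arr[:, col]
    bounds = np.nonzero(np.diff(keys))[0] + 1
    return [(chunk[0, col], chunk) for chunk in np.split(arr, bounds)]
```

Each group is a view-like slice of one contiguous array, avoiding the per-group DataFrame construction that makes groupby() comparatively slow.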

sdt.helper.flatten_multiindex(idx, sep='_')[source]#

Flatten pandas MultiIndex

by concatenating the different levels’ names.

Examples

>>> mi = pandas.MultiIndex.from_product([["A", "B"], ["a", "b"]])
>>> mi
MultiIndex(levels=[['A', 'B'], ['a', 'b']],
        labels=[[0, 0, 1, 1], [0, 1, 0, 1]])
>>> flatten_multiindex(mi)
['A_a', 'A_b', 'B_a', 'B_b']
Parameters:
  • idx (pandas.MultiIndex) – MultiIndex to flatten

  • sep (str, optional) – String to separate index levels. Defaults to “_”.

Returns:

Flattened index entries

Return type:

list of str
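
Since a MultiIndex iterates as tuples of level values, the core of this operation can be sketched in a single comprehension (hypothetical helper name):

```python
def flatten_index_sketch(index_tuples, sep="_"):
    # Join each tuple's levels into one string, e.g. ("A", "a") -> "A_a".
    # str() handles non-string levels such as integers.
    return [sep.join(str(level) for level in entry) for entry in index_tuples]
```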

class sdt.helper.Singleton(cls)[source]#

Class decorator to create singleton objects

Based on reyoung/singleton (released under MIT license).

Examples

>>> @Singleton
... class Example:
...     def __init__(self):
...         self.x = 1
>>> Example.instance
<__main__.Example object at 0x7fe65a904a20>
Parameters:

cls (class) – Decorator class type

initialize(*args, **kwargs)[source]#

Initialize singleton object if it has not been initialized

Parameters:
  • *args – Passed to the singleton object’s __init__()

  • **kwargs – Passed to the singleton object’s __init__()

property is_initialized#

True if instance is initialized

property instance#

Singleton instance

class sdt.helper.ThreadSafeSingleton(cls)[source]#

Thread-safe version of the Singleton class decorator

Parameters:

cls (class) – Decorator class type

initialize(*args, **kwargs)[source]#

Initialize singleton object if it has not been initialized

Parameters:
  • *args – Passed to the singleton object’s __init__()

  • **kwargs – Passed to the singleton object’s __init__()

property is_initialized#

True if instance is initialized

property instance#

Singleton instance
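
The thread safety amounts to guarding lazy instantiation with a mutex. A minimal sketch of the idea (hypothetical class, not the actual implementation; argument-passing via initialize() is omitted for brevity):

```python
import threading

class ThreadSafeSingletonSketch:
    # The decorated class is replaced by this wrapper, which creates the
    # one instance lazily under a lock and forbids direct construction.
    def __init__(self, cls):
        self._cls = cls
        self._lock = threading.Lock()
        self._instance = None

    def __call__(self, *args, **kwargs):
        # Mirror the documented behavior: direct construction is an error.
        raise TypeError("Singletons must be accessed by instance")

    @property
    def instance(self):
        # The lock ensures only one thread ever runs the constructor,
        # even if several threads race on first access.
        with self._lock:
            if self._instance is None:
                self._instance = self._cls()
            return self._instance
```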

class sdt.helper.Slicerator(ancestor, indices=None, length=None, propagate_attrs=None)[source]#

A generator that supports fancy indexing

When sliced using any iterable with a known length, it returns another object like itself, a Slicerator. When sliced with an integer, it returns the data payload.

Also, the attributes of the parent object can be propagated, exposed through the child Slicerators. By default, no attributes are propagated. Attributes can be white-listed by using the optional parameter propagate_attrs.

Methods taking an index will be remapped if they are decorated with index_attr. They also have to be present in the propagate_attrs list.

Parameters:
  • ancestor (object) –

  • indices (iterable) – Indices into ancestor. Required if len(ancestor) is invalid.

  • length (integer) – Length of indices. This is required if indices is a generator, that is, if len(indices) is invalid.

  • propagate_attrs (list of str, optional) – list of attributes to be propagated into Slicerator

Examples

Slicing on a Slicerator returns another Slicerator:

>>> v = Slicerator([0, 1, 2, 3], range(4), 4)
>>> v1 = v[:2]
>>> type(v[:2])
Slicerator
>>> v2 = v[::2]
>>> type(v2)
Slicerator
>>> v2[0]
0

Unless the slice itself has an unknown length, which makes slicing impossible:

>>> v3 = v2[(i for i in [0])]  # argument is a generator
>>> type(v3)
generator
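
The fancy-indexing behavior can be illustrated with a stripped-down stand-in (hypothetical class; the real Slicerator additionally handles attribute propagation, lazy ancestors, and generator indices):

```python
class FancySliceList:
    # Integer keys return the data payload; slices and iterables of
    # indices return another lazy view of the same underlying data.
    def __init__(self, data, indices=None):
        self._data = data
        self._indices = list(range(len(data))) if indices is None else list(indices)

    def __len__(self):
        return len(self._indices)

    def __getitem__(self, key):
        if isinstance(key, int):
            return self._data[self._indices[key]]
        if isinstance(key, slice):
            return FancySliceList(self._data, self._indices[key])
        # any other iterable of indices
        return FancySliceList(self._data, [self._indices[i] for i in key])
```

Note how repeated slicing only remaps indices; no data is copied until an integer index finally retrieves a payload.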
classmethod from_func(func, length, propagate_attrs=None)[source]#

Make a Slicerator from a function that accepts an integer index

Parameters:
  • func (callable) – callable that accepts an integer as its argument

  • length (int) – number of elements; used to support reverse slicing like [-1]

  • propagate_attrs (list, optional) – list of attributes to be propagated into Slicerator

classmethod from_class(some_class, propagate_attrs=None)[source]#

Make an existing class support fancy indexing via Slicerator objects

When sliced using any iterable with a known length, it returns a Slicerator. When sliced with an integer, it returns the data payload.

Also, the attributes of the parent object can be propagated, exposed through the child Slicerators. By default, no attributes are propagated. Attributes can be white_listed in the following ways:

  1. using the optional parameter propagate_attrs; the contents of this list will overwrite any other list of propagated attributes

  2. using the @propagate_attr decorator inside the class definition

  3. using a propagate_attrs class attribute inside the class definition

The difference between options 2 and 3 appears when subclassing. As option 2 is bound to the method, the method will always be propagated. On the contrary, option 3 is bound to the class, so this can be overwritten by the subclass.

Methods taking an index will be remapped if they are decorated with index_attr. This decorator does not ensure that the method is propagated.

The existing class should support indexing (__getitem__() method) and it should define a length (__len__()).

The result will look exactly like the existing class (__name__, __doc__, __module__, and __repr__() will be propagated), but the original __getitem__() will be renamed to _get(), and the new __getitem__() will produce a Slicerator object when sliced.

Parameters:
  • some_class (type) –

  • propagate_attrs (list, optional) – List of attributes to be propagated into Slicerator. This will overwrite any other propagation list.

class sdt.helper.Pipeline(proc_func, *ancestors, propagate_attrs=None, propagate_how='first')[source]#

A class to support lazy function evaluation on an iterable.

When a Pipeline object is indexed, it returns an element of its ancestor modified with a process function.

Parameters:
  • proc_func (callable) – function that processes data returned by Slicerator. The function acts element-wise and is only evaluated when data is actually returned

  • *ancestors (objects) – Object to be processed.

  • propagate_attrs (set of str or None, optional) – Names of attributes to be propagated through the pipeline. If this is None, go through ancestors and look at _propagate_attrs and propagate_attrs attributes and search for attributes having a _propagate_flag attribute. Defaults to None.

  • propagate_how ({'first', 'last'} or int, optional) – Where to look for attributes to propagate. If this is an integer, it specifies the index of the ancestor (in ancestors). If it is ‘first’, go through all ancestors starting with the first one until one is found that has the attribute. If it is ‘last’, go through the ancestors in reverse order. Defaults to ‘first’.

Example

Construct the pipeline object that multiplies elements by two:

>>> ancestor = [0, 1, 2, 3, 4]
>>> times_two = Pipeline(lambda x: 2*x, ancestor)

Whenever the pipeline object is indexed, it takes the correct element from its ancestor, and then applies the process function.

>>> times_two[3]
6
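
At its core this is lazy element-wise mapping: indexing fetches an element from the ancestor and only then applies the function. A bare-bones sketch (hypothetical class; the real Pipeline also supports multiple ancestors and attribute propagation):

```python
class LazyMap:
    # Nothing is computed at construction time; proc_func runs only when
    # an element is actually requested via indexing.
    def __init__(self, proc_func, ancestor):
        self._proc_func = proc_func
        self._ancestor = ancestor

    def __getitem__(self, i):
        return self._proc_func(self._ancestor[i])

    def __len__(self):
        return len(self._ancestor)
```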

See also

pipeline

sdt.helper.pipeline(func=None, **kwargs)[source]#

Decorator to enable lazy evaluation of a function.

When the function is applied to a Slicerator or Pipeline object, it returns another lazily-evaluated Pipeline object.

When the function is applied to any other object, it falls back on its normal behavior.

Parameters:
  • func (callable or type) – Function or class type for lazy evaluation

  • retain_doc (bool, optional) – If True, don’t modify func’s doc string to say that it has been made lazy. Defaults to False.

  • ancestor_count (int or 'all', optional) – Number of inputs to the pipeline. For instance, a function taking three parameters that adds up the elements of two Slicerators and a constant offset would have ancestor_count=2. If ‘all’, all the function’s arguments are used for the pipeline. Defaults to 1.

Returns:

Lazy function evaluation Pipeline for func.

Return type:

Pipeline

See also

Pipeline

Examples

Apply the pipeline decorator to your image processing function.

>>> @pipeline
... def color_channel(image, channel):
...     return image[channel, :, :]

In order to preserve the original function’s doc string (i.e., not add a note saying that it was made lazy), use the decorator like so:

>>> @pipeline(retain_doc=True)
... def color_channel(image, channel):
...     '''This doc string will not be changed'''
...     return image[channel, :, :]

Passing a Slicerator to the function returns a Pipeline that “lazily” applies the function when the images come out. Different functions can be applied to the same underlying images, creating independent objects.

>>> red_images = color_channel(images, 0)
>>> green_images = color_channel(images, 1)

Pipeline functions can also be composed.

>>> @pipeline
... def rescale(image):
...     return (image - image.min())/image.ptp()
>>> rescale(color_channel(images, 0))

The function can still be applied to ordinary images. The decorator only takes effect when a Slicerator object is passed.

>>> single_img = images[0]
>>> red_img = color_channel(single_img, 0)  # normal behavior

Pipeline functions can take more than one Slicerator.

>>> @pipeline(ancestor_count=2)
... def sum_offset(img1, img2, offset):
...     return img1 + img2 + offset

sdt.helper.raise_in_thread(thread_id, exception_type)[source]#

Raises an exception in a thread

This can be used e.g. to stop a thread

class StopThread(Exception):
    pass

def worker():
    try:
        ...  # do stuff
    except StopThread:
        pass

th = threading.Thread(target=worker)
th.start()
# a little later…
raise_in_thread(th.ident, StopThread)

Note that the exception is not raised while worker() is running C code, but only when it returns to Python.

Adapted from http://tomerfiliba.com/recipes/Thread2/.

Parameters:
  • thread_id (int) – ID of the thread. See threading.get_ident() and threading.Thread.ident.

  • exception_type (type) – Type of the exception to raise. Note that this should be a type, not an instance.

Mechanism for getting and setting default function parameters#

Typically, pandas.DataFrames containing single molecule localization data have x coordinates in the “x” column, y coordinates in the “y” column, the total intensity in the “mass” column, and so on. Sometimes, however, this is not the case, e.g. when multiple DataFrames have been concatenated using a MultiIndex. In that case, it is necessary to be able to tell a function that takes the DataFrame as input that it has to look for the x coordinate in, e.g., the ("channel1", "x") column.

The sdt.config module contains function decorators that provide sensible default values (e.g. ["x", "y"] for coordinate columns), which can be changed by the user. There are the set_columns() decorator, which is used for setting DataFrame column names, and the use_defaults() decorator, which is used for all other kinds of default arguments.

set_columns() gets its defaults from the columns dict, which can be changed by the user for a global effect. Similarly, use_defaults() reads its defaults from the rc dict.

Examples

Define a function that will take the DataFrame column names from the column argument:

>>> @set_columns
... def get_mass(data, columns={}):
...     return data[columns["mass"]]

Thanks to set_columns(), the columns dict will have sensible default values (which can be changed globally by the user by setting the corresponding items in columns). Additionally, any user of the get_mass function can override the column names when calling the function.
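
The merging such a decorator performs can be sketched as follows (hypothetical decorator and defaults dict; the real implementation reads sdt.config.columns):

```python
import functools

# Stand-in for sdt.config.columns (assumed defaults for this sketch)
default_columns = {"coords": ["x", "y"], "mass": "mass", "time": "frame"}

def set_columns_sketch(func):
    # Fill in a default for every key the caller did not override in the
    # ``columns`` argument; caller-supplied entries win over defaults.
    @functools.wraps(func)
    def wrapper(*args, columns={}, **kwargs):
        merged = {**default_columns, **columns}
        return func(*args, columns=merged, **kwargs)
    return wrapper
```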

Programming reference#

sdt.config.set_columns(func)[source]#

Decorator to set default column names for DataFrames

Use this on functions that accept a dict as the columns argument. Values from columns will be added for any key not present in the dict argument. This is intended as a way to be able to use functions on DataFrames with non-standard column names.

Parameters:

func (function) – Function to be decorated

Returns:

Modified function

Return type:

function

Examples

Create some data:

>>> a = numpy.arange(6).reshape((-1, 2))
>>> df = pandas.DataFrame(a, columns=["mass", "other_mass"])
>>> df
    mass  other_mass
0     0           1
1     2           3
2     4           5

Example function which should return the “mass” column from a single molecule data DataFrame:

>>> @set_columns
... def get_mass(data, columns={}):
...     return data[columns["mass"]]
>>> get_mass(df)
0    0
1    2
2    4
Name: mass, dtype: int64

However, if for some reason the “other_mass” column should be used instead, this can be achieved by

>>> get_mass(df, columns={"mass": "other_mass"})
0    1
1    3
2    5
Name: other_mass, dtype: int64
sdt.config.use_defaults(func)[source]#

Decorator to apply default values to functions

If any function argument whose name is a key in rc is None, set its value to what is specified in rc.

Parameters:

func (function) – Function to be decorated

Returns:

Modified function

Return type:

function

Examples

>>> @use_defaults
... def f(channel_names=None):
...     return channel_names
>>> config.rc["channel_names"]
['channel1', 'channel2']
>>> f()
['channel1', 'channel2']
>>> f(["ch1", "ch2", "ch3"])
['ch1', 'ch2', 'ch3']
>>> config.rc["channel_names"] = ["channel4"]
>>> f()
['channel4']
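
The underlying mechanism can be sketched with inspect.signature (hypothetical re-implementation using a stand-in rc dict; the real decorator reads sdt.config.rc):

```python
import functools
import inspect

rc = {"channel_names": ["channel1", "channel2"]}  # stand-in for sdt.config.rc

def use_defaults_sketch(func):
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        # Replace every None-valued argument that has an entry in ``rc``
        for name, value in bound.arguments.items():
            if value is None and name in rc:
                bound.arguments[name] = rc[name]
        return func(*bound.args, **bound.kwargs)

    return wrapper
```

Looking the values up at call time (rather than at decoration time) is what makes later changes to the config dict take effect globally.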
sdt.config.columns = {'bg': 'bg', 'bg_dev': 'bg_dev', 'coords': ['x', 'y'], 'mass': 'mass', 'particle': 'particle', 'signal': 'signal', 'time': 'frame'}#

Default column names in pandas.DataFrame

sdt.config.rc = {'channel_names': ['channel1', 'channel2']}#

Global config dictionary