Helpers for writing new code#
Helper classes and functions#
The sdt.helper package provides some common tools to be used in higher-level functions. This includes

- a singleton class decorator (Singleton) and a thread-safe version of it (ThreadSafeSingleton),
- functions for common tasks involving pandas.DataFrame: flatten_multiindex(), split_dataframe(),
- the Slicerator and Pipeline classes as well as the pipeline() decorator for creation of lazy-loading, fancy-sliceable iterators,
- the numba module, which defines stubs for important numba objects in case numba is not installed; that way, things like the jit decorator will not raise an error during import if numba is not present,
- the raise_in_thread() function, which allows for raising exceptions in specific threads.
Examples
Fast splitting of a pandas.DataFrame can be achieved using split_dataframe():
>>> df = pandas.DataFrame([[0, 1], [1, 1], [2, 2]], columns=["a", "b"])
>>> split = split_dataframe(df, "b")
>>> for b, arr in split:
...     print("b:", b)
...     print(arr)
b: 1
[[0 1]
 [1 1]]
b: 2
[[2 2]]
To convert a pandas.MultiIndex into a normal index, use flatten_multiindex(). This is necessary e.g. to be able to call pandas.DataFrame.query().
>>> mi = pandas.MultiIndex.from_product([["A", "B"], ["a", "b"]])
>>> df = pandas.DataFrame([[1, 2, 3, 4]], columns=mi)
>>> df
   A     B
   a  b  a  b
0  1  2  3  4
>>> df.columns = flatten_multiindex(df.columns)
>>> df
   A_a  A_b  B_a  B_b
0    1    2    3    4
A singleton type can be created with the help of the Singleton and ThreadSafeSingleton decorators. Both behave the same way, but ThreadSafeSingleton additionally uses a mutex to ensure thread safety.
>>> @helper.Singleton
... class Example:
...     def __init__(self):
...         self.x = 1
>>> Example() # Try constructing an instance, which is not allowed
Traceback (most recent call last):
File "<ipython-input-19-a4a1b2f1680f>", line 1, in <module>
Example()
File "/home/lukas/Software/sdt-python/sdt/helper/singleton.py", line 63, in __call__
raise TypeError("Singletons must be accessed by instance")
TypeError: Singletons must be accessed by instance
>>> Example.instance
<__main__.Example object at 0x7f28c068c780>
>>> Example.instance.x
1
Use the numba submodule to avoid a hard dependency on numba:
>>> from sdt.helper import numba
>>> @numba.jit(nopython=True)  # This will not raise an error
... def f(x):
...     return x
However, trying to call f() will raise an error if numba is not installed. To check whether numba is available, one can use
>>> from sdt.helper import numba
>>> if numba.numba_available:
...     pass  # numba is installed
... else:
...     pass  # numba is not installed
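Building on this flag, a pure-Python fallback can be provided for an accelerated routine. The following is a minimal sketch; the names _fast_sum and safe_sum are made up for illustration:

import numpy as np

from sdt.helper import numba

@numba.jit(nopython=True)
def _fast_sum(arr):
    # Accelerated implementation; only ever called if numba is installed
    s = 0.0
    for i in range(arr.shape[0]):
        s += arr[i]
    return s

def safe_sum(arr):
    # Hypothetical wrapper: dispatch to the jitted version when numba
    # is available, otherwise fall back to plain numpy
    if numba.numba_available:
        return _fast_sum(arr)
    return float(np.sum(arr))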
Programming reference#
- sdt.helper.split_dataframe(df, split_column, columns=None, sort=True, type='array', keep_index=False)[source]#
Split a DataFrame according to the values of a column.
This is somewhat like pandas.DataFrame.groupby(), but (optionally) turns the data into a numpy.array, which makes it a lot faster.
- Parameters:
df (DataFrame) – DataFrame to be split
split_column (Any) – Column to group/split data by.
columns (Any | None) – Column(s) to return. If None, use all columns.
sort (bool) – For this function to work, the DataFrame needs to be sorted. If this parameter is True, do the sorting in the function. If the DataFrame is already sorted (according to split_column), set this to False for efficiency. Defaults to True.
type (str) – If "array", return split data as a single numpy.ndarray (fast). If "array_list", return split data as a list of arrays; each list entry corresponds to one column (also fast, preserves columns’ dtype). If "DataFrame", return pandas.DataFrame (slow).
keep_index (bool) – If True, the index of the DataFrame df will be prepended to the columns of the split array. Only applicable if type="array" or type="array_list".
- Returns:
Split DataFrame. The first entry of each tuple is the corresponding split_column entry, the second is the data, whose type depends on the type parameter.
- Return type:
list of tuple(scalar, array)
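For instance, keep_index can be used to retain the original index in the split data. A minimal sketch based on the parameter descriptions above (the exact array layout may differ):

import pandas

from sdt.helper import split_dataframe

df = pandas.DataFrame([[0, 1], [1, 1], [2, 2]], columns=["a", "b"])
# keep_index=True prepends the DataFrame's index as an additional
# first column of each split array
for b, arr in split_dataframe(df, "b", keep_index=True):
    print("b:", b)
    print(arr)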
- sdt.helper.flatten_multiindex(idx, sep='_')[source]#
Flatten a pandas.MultiIndex by concatenating the different levels’ names.
Examples
>>> mi = pandas.MultiIndex.from_product([["A", "B"], ["a", "b"]])
>>> mi
MultiIndex(levels=[['A', 'B'], ['a', 'b']],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]])
>>> flatten_multiindex(mi)
['A_a', 'A_b', 'B_a', 'B_b']
- Parameters:
idx (pandas.MultiIndex) – MultiIndex to flatten
sep (str, optional) – String to separate index levels. Defaults to “_”.
- Returns:
Flattened index entries
- Return type:
list of str
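Per the description of the sep parameter, the level separator can be customized; continuing the example above:

>>> flatten_multiindex(mi, sep=".")
['A.a', 'A.b', 'B.a', 'B.b']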
- class sdt.helper.Singleton(cls)[source]#
Class decorator to create singleton objects
Based on reyoung/singleton (released under MIT license).
Examples
>>> @Singleton
... class Example:
...     def __init__(self):
...         self.x = 1
>>> Example.instance
<__main__.Example object at 0x7fe65a904a20>
- Parameters:
cls (class) – Decorator class type
- initialize(*args, **kwargs)[source]#
Initialize singleton object if it has not been initialized
- Parameters:
*args – Passed to the singleton object’s __init__()
**kwargs – Passed to the singleton object’s __init__()
- property instance#
Singleton instance
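If the singleton’s __init__() takes arguments, initialize() can be used to supply them. A sketch with a made-up Config class, assuming initialize() is called before instance is first accessed:

>>> @Singleton
... class Config:
...     def __init__(self, path="default.cfg"):
...         self.path = path
>>> Config.initialize("my.cfg")
>>> Config.instance.path
'my.cfg'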
- class sdt.helper.ThreadSafeSingleton(cls)[source]#
Thread-safe version of the Singleton class decorator
- Parameters:
cls (class) – Decorator class type
- initialize(*args, **kwargs)[source]#
Initialize singleton object if it has not been initialized
- Parameters:
*args – Passed to the singleton object’s __init__()
**kwargs – Passed to the singleton object’s __init__()
- property instance#
Singleton instance
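The mutex matters when the first access to instance may happen from several threads at once. A sketch with a hypothetical Counter class:

import threading

from sdt.helper import ThreadSafeSingleton

@ThreadSafeSingleton
class Counter:
    def __init__(self):
        self.value = 0

def worker():
    # Concurrent first access: the internal mutex ensures __init__()
    # runs only once, so all threads see the same instance
    print(Counter.instance.value)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()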
- class sdt.helper.Slicerator(ancestor, indices=None, length=None, propagate_attrs=None)[source]#
A generator that supports fancy indexing
When sliced using any iterable with a known length, it returns another object like itself, a Slicerator. When sliced with an integer, it returns the data payload.
Also, the attributes of the parent object can be propagated, exposed through the child Slicerators. By default, no attributes are propagated. Attributes can be white-listed by using the optional parameter propagate_attrs.
Methods taking an index will be remapped if they are decorated with index_attr. They also have to be present in the propagate_attrs list.
- Parameters:
ancestor (object) –
indices (iterable) – Giving indices into ancestor. Required if len(ancestor) is invalid.
length (integer) – Length of indices. This is required if indices is a generator, that is, if len(indices) is invalid.
propagate_attrs (list of str, optional) – list of attributes to be propagated into Slicerator
Examples
Slicing on a Slicerator returns another Slicerator:
>>> v = Slicerator([0, 1, 2, 3], range(4), 4)
>>> v1 = v[:2]
>>> type(v[:2])
Slicerator
>>> v2 = v[::2]
>>> type(v2)
Slicerator
>>> v2[0]
0
Unless the slice itself has an unknown length, which makes slicing impossible:
>>> v3 = v2((i for i in [0]))  # argument is a generator
>>> type(v3)
generator
- classmethod from_func(func, length, propagate_attrs=None)[source]#
Make a Slicerator from a function that accepts an integer index
- Parameters:
func (callable) – callable that accepts an integer as its argument
length (int) – Number of elements; used to support reverse slicing like [-1].
propagate_attrs (list, optional) – list of attributes to be propagated into Slicerator
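A minimal sketch based on the description above, with a made-up index function:

>>> squares = Slicerator.from_func(lambda i: i * i, length=5)
>>> squares[3]
9
>>> list(squares[::2])
[0, 4, 16]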
- classmethod from_class(some_class, propagate_attrs=None)[source]#
Make an existing class support fancy indexing via Slicerator objects
When sliced using any iterable with a known length, it returns a Slicerator. When sliced with an integer, it returns the data payload.
Also, the attributes of the parent object can be propagated, exposed through the child Slicerators. By default, no attributes are propagated. Attributes can be white-listed in the following ways:
1. using the optional parameter propagate_attrs; the contents of this list will overwrite any other list of propagated attributes
2. using the @propagate_attr decorator inside the class definition
3. using a propagate_attrs class attribute inside the class definition
The difference between options 2 and 3 appears when subclassing. As option 2 is bound to the method, the method will always be propagated. On the contrary, option 3 is bound to the class, so this can be overwritten by the subclass.
Methods taking an index will be remapped if they are decorated with index_attr. This decorator does not ensure that the method is propagated.
The existing class should support indexing (__getitem__() method) and it should define a length (__len__()).
The result will look exactly like the existing class (__name__, __doc__, __module__, __repr__() will be propagated), but __getitem__() will be renamed to _get() and __getitem__() will produce a Slicerator object when sliced.
- Parameters:
some_class (type) –
propagate_attrs (list, optional) – List of attributes to be propagated into the Slicerator; this will overwrite any other propagation list.
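A sketch of the mechanism with a hypothetical container class, assuming from_class() returns the modified class and that only the stated requirements (__getitem__() and __len__()) are needed:

from sdt.helper import Slicerator

class Frames:
    # Minimal sequence-like container
    def __init__(self, data):
        self._data = data

    def __getitem__(self, i):
        return self._data[i]

    def __len__(self):
        return len(self._data)

Frames = Slicerator.from_class(Frames)

f = Frames([10, 11, 12, 13])
print(f[2])    # integer index returns the data payload: 12
sub = f[1:3]   # slicing returns a Slicerator
print(sub[0])  # 11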
- class sdt.helper.Pipeline(proc_func, *ancestors, propagate_attrs=None, propagate_how='first')[source]#
A class to support lazy function evaluation on an iterable.
When a Pipeline object is indexed, it returns an element of its ancestor modified with a process function.
- Parameters:
proc_func (callable) – function that processes data returned by Slicerator. The function acts element-wise and is only evaluated when data is actually returned
*ancestors (objects) – Objects to be processed.
propagate_attrs (set of str or None, optional) – Names of attributes to be propagated through the pipeline. If this is None, go through ancestors and look at _propagate_attrs and propagate_attrs attributes and search for attributes having a _propagate_flag attribute. Defaults to None.
propagate_how ({'first', 'last'} or int, optional) – Where to look for attributes to propagate. If this is an integer, it specifies the index of the ancestor (in ancestors). If it is ‘first’, go through all ancestors starting with the first one until one is found that has the attribute. If it is ‘last’, go through the ancestors in reverse order. Defaults to ‘first’.
Example
Construct the pipeline object that multiplies elements by two:
>>> ancestor = [0, 1, 2, 3, 4]
>>> times_two = Pipeline(lambda x: 2*x, ancestor)
Whenever the pipeline object is indexed, it takes the correct element from its ancestor, and then applies the process function.
>>> times_two[3]
6
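Multiple ancestors can be passed as well; a sketch continuing the example above, under the assumption that the process function receives one element from each ancestor per index:

>>> offsets = [10, 20, 30, 40, 50]
>>> summed = Pipeline(lambda x, y: x + y, ancestor, offsets)
>>> summed[2]
32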
- sdt.helper.pipeline(func=None, **kwargs)[source]#
Decorator to enable lazy evaluation of a function.
When the function is applied to a Slicerator or Pipeline object, it returns another lazily-evaluated Pipeline object.
When the function is applied to any other object, it falls back on its normal behavior.
- Parameters:
func (callable or type) – Function or class type for lazy evaluation
retain_doc (bool, optional) – If True, don’t modify func’s doc string to say that it has been made lazy. Defaults to False
ancestor_count (int or 'all', optional) – Number of inputs to the pipeline. For instance, a function taking three parameters that adds up the elements of two Slicerators and a constant offset would have ancestor_count=2. If ‘all’, all the function’s arguments are used for the pipeline. Defaults to 1.
- Returns:
Lazy function evaluation Pipeline for func.
- Return type:
Pipeline
Examples
Apply the pipeline decorator to your image processing function.
>>> @pipeline
... def color_channel(image, channel):
...     return image[channel, :, :]
In order to preserve the original function’s doc string (i.e. not add a note saying that it was made lazy), use the decorator like so:
>>> @pipeline(retain_doc=True)
... def color_channel(image, channel):
...     '''This doc string will not be changed'''
...     return image[channel, :, :]
Passing a Slicerator to the function returns a Pipeline that “lazily” applies the function when the images come out. Different functions can be applied to the same underlying images, creating independent objects.
>>> red_images = color_channel(images, 0)
>>> green_images = color_channel(images, 1)
Pipeline functions can also be composed.
>>> @pipeline
... def rescale(image):
...     return (image - image.min())/image.ptp()
>>> rescale(color_channel(images, 0))
The function can still be applied to ordinary images. The decorator only takes effect when a Slicerator object is passed.
>>> single_img = images[0]
>>> red_img = color_channel(single_img, 0)  # normal behavior
Pipeline functions can take more than one Slicerator.
>>> @pipeline(ancestor_count=2)
... def sum_offset(img1, img2, offset):
...     return img1 + img2 + offset
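Called with two Slicerators, such a function again returns a Pipeline; a sketch of the expected usage (images1 and images2 are assumed to be Slicerators of equal length):

>>> summed = sum_offset(images1, images2, 10)
>>> summed[0]  # lazily evaluates images1[0] + images2[0] + 10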
- sdt.helper.raise_in_thread(thread_id, exception_type)[source]#
Raises an exception in a thread
This can be used e.g. to stop a thread:
class StopThread(Exception):
    pass

def worker():
    try:
        ...  # do stuff
    except StopThread:
        pass

th = threading.Thread(target=worker)
th.start()
# a little later…
raise_in_thread(th.ident, StopThread)
Note that the exception is not raised while worker() is running C code, but only when it returns to Python.
Adapted from http://tomerfiliba.com/recipes/Thread2/.
- Parameters:
thread_id (int) – ID of the thread. See threading.get_ident() and threading.Thread.ident.
exception_type (type) – Type of the exception to raise. Note that this should be a type, not an instance.
Mechanism for getting and setting default function parameters#
Typically, pandas.DataFrames containing single-molecule localization data have x coordinates in the “x” column, y coordinates in the “y” column, the total intensity in the “mass” column, and so on. Sometimes this is not the case, however, e.g. when multiple DataFrames have been concatenated using a MultiIndex. In that case, it is necessary to be able to tell a function that takes the DataFrame as an input that it has to look for the x coordinate e.g. in the ("channel1", "x") column.
The sdt.config module contains function decorators that provide sensible default values (e.g. ["x", "y"] for coordinate columns), which can be changed by the user. There is the set_columns() decorator, which is used for setting DataFrame column names, and the use_defaults() decorator, which handles all other kinds of default arguments.
set_columns() gets its defaults from columns, which can be changed by the user for a global effect. Similarly, use_defaults() reads rc.
Examples
Define a function that will take the DataFrame column names from the columns argument:
>>> @set_columns
... def get_mass(data, columns={}):
...     return data[columns["mass"]]
Thanks to set_columns(), the columns dict will have sensible default values (which can be changed globally by the user by setting the corresponding items in columns). Additionally, any user of the get_mass function can override the column names when calling the function.
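For a global effect, the defaults in sdt.config.columns can be modified directly; e.g., to make all decorated functions look for the total intensity in a hypothetical “total_intensity” column:

>>> from sdt import config
>>> config.columns["mass"] = "total_intensity"

Subsequent calls to get_mass() (and to any other decorated function) will then use that column unless overridden per call.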
Programming reference#
- sdt.config.set_columns(func)[source]#
Decorator to set default column names for DataFrames
Use this on functions that accept a dict as the columns argument. Values from columns will be added for any key not present in the dict argument. This is intended as a way to be able to use functions on DataFrames with non-standard column names.
- Parameters:
func (function) – Function to be decorated
- Returns:
Modified function
- Return type:
function
Examples
Create some data:
>>> a = numpy.arange(6).reshape((-1, 2))
>>> df = pandas.DataFrame(a, columns=["mass", "other_mass"])
>>> df
   mass  other_mass
0     0           1
1     2           3
2     4           5
Example function which should return the “mass” column from a single molecule data DataFrame:
>>> @set_columns
... def get_mass(data, columns={}):
...     return data[columns["mass"]]
>>> get_mass(df)
0    0
1    2
2    4
Name: mass, dtype: int64
However, if for some reason the “other_mass” column should be used instead, this can be achieved by
>>> get_mass(df, columns={"mass": "other_mass"})
0    1
1    3
2    5
Name: other_mass, dtype: int64
- sdt.config.use_defaults(func)[source]#
Decorator to apply default values to functions
If any function argument whose name is a key in rc is None, set its value to what is specified in rc.
- Parameters:
func (function) – Function to be decorated
- Returns:
Modified function
- Return type:
function
Examples
>>> @use_defaults
... def f(channel_names=None):
...     return channel_names
>>> f()
['channel1', 'channel2']
>>> f(["ch1", "ch2", "ch3"])
['ch1', 'ch2', 'ch3']
>>> config.rc["channel_names"] = ["channel4"]
>>> f()
['channel4']
- sdt.config.columns = {'bg': 'bg', 'bg_dev': 'bg_dev', 'coords': ['x', 'y'], 'mass': 'mass', 'particle': 'particle', 'signal': 'signal', 'time': 'frame'}#
Default column names in pandas.DataFrame
- sdt.config.rc = {'channel_names': ['channel1', 'channel2']}#
Global config dictionary