Optimization and fitting algorithms#
Fitting of 1D and 2D Gaussian functions#
Gaussian1DModel
and Gaussian2DModel
are models for the
lmfit package for easy fitting of 1D and
2D Gaussian functions to data. For further information on how to use these,
please refer to the lmfit
documentation.
Examples
1D fit example: First create some data to work on.
>>> x = numpy.arange(100) # Create some data
>>> y = numpy.exp(-(x - 50)**2 / 8)
Now fit model to the data:
>>> m = optimize.Gaussian1DModel() # Create model
>>> p = m.guess(y, x) # Initial guess
>>> res = m.fit(y, params=p, x=x) # Do the fitting
>>> res.best_values # Show fitted parameters
{'offset': 4.4294473935549931e-136,
'sigma': 1.9999999999999996,
'center': 50.0,
'amplitude': 1.0}
>>> res.eval(x=50.3) # Evaluate fitted Gaussian at x=50.3
0.98881304461123321
2D fit example: Create data, a little more complicated in 2D.
>>> coords = numpy.indices((50, 100)) # Create data
>>> x, y = coords
>>> center = numpy.array([[20, 40]]).T
>>> centered_flat = coords.reshape((2, -1)) - center
>>> cov = numpy.linalg.inv(numpy.array([[8, 0], [0, 18]]))
>>> z = 2 * numpy.exp(-np.sum(centered_flat * (cov @ centered_flat), axis=0))
>>> z = z.reshape(x.shape)
Do the fitting:
>>> m = optimize.Gaussian2DModel() # Create model
>>> p = m.guess(z, x, y) # Initial guess
>>> res = m.fit(z, params=p, x=x, y=y) # Do the fitting
>>> res.best_values # Show fitted parameters
{'rotation': 0.0,
'offset': 2.6045547770814313e-55,
'sigmay': 3.0,
'centery': 40.0,
'sigmax': 1.9999999999999996,
'centerx': 20.0,
'amplitude': 2.0}
>>> res.eval(x=20.5, y=40.5) # Evaluate fitted Gaussian at x=20.5, y=40.5
1.9117294272505907
Models#
- class sdt.optimize.Gaussian1DModel(*args, **kwargs)[source]#
Model class for fitting a 1D Gaussian
Derives from
lmfit.Model
.Parameters are amplitude, center, sigma, offset.
The model function will normally take an independent variable (generally, the first argument) and a series of arguments that are meant to be parameters for the model. It will return an array of data to model some data as for a curve-fitting problem.
- Parameters:
func (callable) – Function to be wrapped.
independent_vars (
list
ofstr
, optional) – Arguments to func that are independent variables (default is None).param_names (
list
ofstr
, optional) – Names of arguments to func that are to be made into parameters (default is None).nan_policy ({'raise', 'propagate', 'omit'}, optional) – How to handle NaN and missing values in data. See Notes below.
prefix (str, optional) – Prefix used for the model.
name (str, optional) – Name for the model. When None (default) the name is the same as the model function (func).
**kws (dict, optional) – Additional keyword arguments to pass to model function.
Notes
1. Parameter names are inferred from the function arguments, and a residual function is automatically constructed.
2. The model function must return an array that will be the same size as the data being modeled.
3. nan_policy sets what to do when a NaN or missing value is seen in the data. Should be one of:
‘raise’ : raise a ValueError (default)
‘propagate’ : do nothing
‘omit’ : drop missing data
Examples
The model function will normally take an independent variable (generally, the first argument) and a series of arguments that are meant to be parameters for the model. Thus, a simple peak using a Gaussian defined as:
>>> import numpy as np >>> def gaussian(x, amp, cen, wid): ... return amp * np.exp(-(x-cen)**2 / wid)
can be turned into a Model with:
>>> gmodel = Model(gaussian)
this will automatically discover the names of the independent variables and parameters:
>>> print(gmodel.param_names, gmodel.independent_vars) ['amp', 'cen', 'wid'], ['x']
- class sdt.optimize.Gaussian2DModel(*args, **kwargs)[source]#
Model class for fitting a 2D Gaussian
Derives from
lmfit.Model
.Parameters are amplitude, centerx, sigmax, centery, sigmay, offset, rotation.
The model function will normally take an independent variable (generally, the first argument) and a series of arguments that are meant to be parameters for the model. It will return an array of data to model some data as for a curve-fitting problem.
- Parameters:
func (callable) – Function to be wrapped.
independent_vars (
list
ofstr
, optional) – Arguments to func that are independent variables (default is None).param_names (
list
ofstr
, optional) – Names of arguments to func that are to be made into parameters (default is None).nan_policy ({'raise', 'propagate', 'omit'}, optional) – How to handle NaN and missing values in data. See Notes below.
prefix (str, optional) – Prefix used for the model.
name (str, optional) – Name for the model. When None (default) the name is the same as the model function (func).
**kws (dict, optional) – Additional keyword arguments to pass to model function.
Notes
1. Parameter names are inferred from the function arguments, and a residual function is automatically constructed.
2. The model function must return an array that will be the same size as the data being modeled.
3. nan_policy sets what to do when a NaN or missing value is seen in the data. Should be one of:
‘raise’ : raise a ValueError (default)
‘propagate’ : do nothing
‘omit’ : drop missing data
Examples
The model function will normally take an independent variable (generally, the first argument) and a series of arguments that are meant to be parameters for the model. Thus, a simple peak using a Gaussian defined as:
>>> import numpy as np >>> def gaussian(x, amp, cen, wid): ... return amp * np.exp(-(x-cen)**2 / wid)
can be turned into a Model with:
>>> gmodel = Model(gaussian)
this will automatically discover the names of the independent variables and parameters:
>>> print(gmodel.param_names, gmodel.independent_vars) ['amp', 'cen', 'wid'], ['x']
Auxiliary functions#
- sdt.optimize.guess_gaussian_parameters(data, *indep_vars)[source]#
Initial guess of parameters of the Gaussian
This function does a crude estimation of the parameters:
The offset is guessed by looking at the edges of data.
The center of the Gaussian is approximated by the center of mass.
sigma as well as the angle of rotation are estimated using calculating the covariance matrix.
The amplitude is taken as the maximum above offset.
- Parameters:
data (numpy.ndarray) – Data for which the parameter should be guessed
indep_vars (numpy.ndarrays) – Independent variable arrays (x, y, z, …)
- Returns:
This dict has the follwing keys:
”amplitude” (float)
”center” (np.ndarray): coordinates of the guessed center
”sigma” (np.ndarray)
”offset” (float): addititve offset
”rotation” (float): guessed angle of rotation. Works (currently) only for 2D data.
The keys match the arguments of
gaussian_1d()
andgaussian_2d()
so that the dict can be passed directly to the function.- Return type:
collections.OrderedDict
Random sample consensus (RANSAC)#
Perform a fit to noisy data (i.e., data containing outliers) by repeatedly
randomly choosing a (small) subset of the data
fitting parameters
determining of the goodness of the fit for the rest of the data, only accepting those points within a predefined error margin
refining the fit by fitting all remaining data
Finally take the parameters from the fit with the smallest overall error.
Examples
Assuming that z
is an array of noisy Gaussian values for x
and y
, a
fit can be performed as follows:
>>> model = optimize.Gaussian2DModel()
>>> model.set_param_hint("rotation", vary=False) # Set restriction
>>> r = optimize.RANSAC(model, max_error=1, n_fit=10, n_iter=100,
initial_guess=model.guess)
>>> best_fit, inlier_idx = r.fit(z, x=x, y=y)
Programming interface#
- class sdt.optimize.RANSAC(model, max_error, n_fit, n_iter=1000, max_outliers=0.5, norm=2, random_state=None, initial_guess=None, independent_vars=None)[source]#
Perform a fit to noisy data using RANSAC
Take a subset of the data, fit to the subset and compute the error of the remaining datapoints. Repeat. Finally use the best fit.
- Parameters:
model (Any) – Fitting model instance. Needs to have a
fit
method whose first argument are the dependent variable values, which can take the independent variable values via a keyword argument and which returns a variable featuring aneval
method taking the independent variable values as a keyword argument and returning the corresponding dependent values according to the fit. lmfit models are an example for this.max_error (float) – Maximum error between data and fit for a datapoint to be considered an inlier.
n_fit (int) – Number of datapoints to use for fitting.
n_iter (int) – Number of fitting iterations.
max_outliers (float) – Maximum fraction of outliers for a fit not to be rejected.
norm (int) – Which (p-) norm to use for error calculation.
random_state (numpy.random.mtrand.RandomState) – Used for randomly drawing the sample to fit in each iteration.
initial_guess (Callable[[], numpy.ndarray] | None) – If given, this is called before fitting in each iteration. The result is passed as the
param
keyword argument to the model’sfit
method. E.g., if using an lmfit model, theguess
method can be used.independent_vars (Sequence[str]) – Names of independent variables. If not given, it will try to use the
indepenent_vars
attribute of the model if present (e.g., for lmfit models) and fall back to["x"]
otherwise.
- model: Any#
Fitting model instance
- n_fit: int#
Number of datapoints to use for fitting
- n_iter: int#
Number of fitting iterations
- max_outliers: float#
Maximum fraction of outliers for a fit not to be rejected
- max_error: float#
Maximum error between data and fit for a datapoint to be considered an inlier
- norm: int#
p-norm to use for error calculation
- independent_vars: Sequence[str]#
Names of independent variables
- random_state: RandomState#
Used for randomly drawing the sample to fit in each iteration
- initial_guess: Callable[[], ndarray] | None#
If given, this is called before fitting in each iteration and should return an initial guess for fitting values.
- fit(data, **kwargs)[source]#
Perform fitting
- Parameters:
data (ndarray) – Dependent variable values
**kwargs – Other parameters, including independent variable values. This is passed to
model.fit()
.
- Returns:
best_fit – The return value of
model.fit()
which produced the best fit.best_idx – Indices of inliers of the best fit.
- Raises:
RuntimeError – No fit produced at most
max_outliers
outliers.- Return type:
Tuple[Any, ndarray]
Fitting affine transformations to point pairs#
Given a set of pairs of points, an affine transformation between first and second pair entries can be found by means of a linear least squares fit.
Examples
Let xy
be row-wise coordinates of points and xy_t
their transformed
counterparts, i.e., xy[0, :]
describes a point corresponding to
xy_t[i, :]
. Then the best-fitting transformation can be found using
>>> r = optimize.AffineModel().fit(xy_t, xy)
>>> r.transform
array([[...]])
Programming interface#
- class sdt.optimize.AffineModel[source]#
Fit an affine transformation to pairs of points
This provides a similar programming interface as lmfit and other fitting classes in the
sdt.optimize
module.- fit(data, x)[source]#
Perform the fit
- Parameters:
data (ndarray) – Row-wise coordinates of transformed points
x (ndarray) – Row-wise coordinates of original points
- Return type:
Fit results
- class sdt.optimize.AffineModelResult(model, transform)[source]#
Result of fitting an affine transformation to pairs of points
- Parameters:
model (sdt.optimize.affine_fit.AffineModel) – Model instance used for fitting
transform (numpy.ndarray) – Transformation matrix
- model: AffineModel#
Model instance used for fitting
- best_values: Dict[str, float]#
Fit parameters in a lmfit-compatible way. Contains only a “transform” entry.
- transform: ndarray#
Transformation matrix
Fitting of a sum of exponential functions#
The ExpSumModel
class allows for fitting a sum of exponential
functions
to data represented by pairs \((x_k, y_k)\). This is done by using a modified Prony’s method.
The mathematics behind this can be found in Theory as well as at [1]. This module includes rewrite of the GPLv3-licensed code by Greg von Winckel, which is also available at [1]. Further insights about the algorithm may be gained by reading anything about Prony’s method and [2].
Examples
Given an array x
of values of the independent variable and an array y
of corresponding values of the sum of the exponentials, the parameters
offset
(\(\alpha\) in above formula), mant
(\(\beta_i\)) and
exp
(\(\lambda_i\)) can be found by calling
>>> res = optimize.ExpSumModel(n_exp=2).fit(y, x)
>>> res.offset
3.5
>>> res.mant
array([0.8, 0.2])
>>> res.exp
array([-0.2, -1.5])
Programming interface#
- class sdt.optimize.ExpSumModel(n_exp, poly_order=30)[source]#
Fit a sum of exponential functions to data
Determine the best parameters \(\alpha, \beta_k, \lambda_k\) by fitting \(y(x) = \alpha + \sum_{k=1}^p \beta_k \text{e}^{\lambda_k x}\) to the data using a modified Prony’s method.
- Parameters:
n_exp (int) – Number of exponential summands
poly_order (int) – Order of polynomial used for approximation
- n_exp: int#
Number of exponential summands
- poly_order: int#
Order of polynomial used for approximation
- fit(y, x, initial_guess=None)[source]#
Perform the fit
Determine the best parameters \(\alpha, \beta_k, \lambda_k\).
- Parameters:
y (ndarray) – Function values corresponding to x.
x (ndarray) – Independent variable values
initial_guess (ndarray | None) – An initial guess for determining the parameters of the ODE (if you don’t know what this is about, don’t bother). The array is 1D and has n_exp + 1 entries. If None, use
numpy.ones(n_exp + 1)
.
- Return type:
Fit result.
- class sdt.optimize.ExpSumModelResult(model, offset, mant, exp, ode_coeff)[source]#
Result of fitting a sum of exponential functions to data
- Parameters:
model (sdt.optimize.exp_fit.ExpSumModel) – Model used for fitting
x – Independent variable
offset (float) – Additive parameter
mant (numpy.ndarray) – Mantissa coefficients
exp (numpy.ndarray) – Coefficients in the exponent
ode_coeff (numpy.ndarray) – ODE coefficients (from the modified Prony’s method)
- model: ExpSumModel#
Model instance used for fitting
- offset: float#
Additive parameter \(\alpha\)
- mant: ndarray#
Mantissa coefficients \(\beta_k\)
- exp: ndarray#
Mantissa coefficients \(\lambda_k\)
- ode_coeff: ndarray#
ODE coefficients (from the modified Prony’s method)
- best_values: Dict[str, float]#
Fit parameters in a lmfit-compatible way
- class sdt.optimize.ProbExpSumModel(n_exp, poly_order=30)[source]#
Fit a mixture of exponential distribution CDFs
Determine the best parameters \(\alpha, \beta_k, \lambda_k\) by fitting \(\alpha + \sum_{k=1}^p \beta_k \text{e}^{\lambda_k t}\). Additionally, there are the constraints \(\sum_{k=1}^p -\beta_k = 1\) and \(\alpha = 1\).
Notes
Since \(\sum_{i=1}^p -\beta_i = 1\) and \(\alpha = 1\), assuming \(\lambda_k\) already known (since they are gotten by fitting the coefficients of the ODE), there is only the constrained linear least squares problem
\[1 + \sum_{k=1}^{p-1} \beta_k \text{e}^{\lambda_k t} + (-1 - \sum_{k=1}^{p-1} \beta_k) \text{e}^{\lambda_p t} = y\]left to solve. This is equivalent to
\[\sum_{k=1}^{p-1} \beta_k (\text{e}^{\lambda_k t} - \text{e}^{\lambda_p t}) = y - 1 + \text{e}^{\lambda_p t},\]which yields \(\beta_1, …, \beta_{p-1}\). \(\beta_p\) can then be determined from the constraint.
- Parameters:
n_exp (int) – Number of exponential summands
poly_order (int) – Order of polynomial used for approximation
- fit(y, x, initial_guess=None)[source]#
Perform the fit
Determine the best parameters \(\alpha, \beta_k, \lambda_k\).
- Parameters:
y (ndarray) – Function values corresponding to x.
x (ndarray) – Independent variable values
initial_guess (ndarray | None) – An initial guess for determining the parameters of the ODE (if you don’t know what this is about, don’t bother). The array is 1D and has n_exp + 1 entries. If None, use
numpy.ones(n_exp + 1)
.
- Return type:
Fit result.
Theory#
\(y(t)\) is the general solution of the ordinary differential equation (ODE)
if
In other words: The \(\lambda_i\) are the roots of the polynomial \(p(z) = \sum_{j=0}^p a_j z^j\).
For numerical calculations, \(y(t)\) is approximated by a Legendre series,
Since this is a polynomial, any derivative is again a polynomial and can thus be written as sum of Legendre polynomials,
where \(D\) is the Legendre differentiation matrix.
For the purpose of solving the ODE, let \(\alpha = 1\) (i. e. divide the whole ODE by \(alpha\)). Its approximated version is then
with \(e_1 = (1, 0, \ldots, 0)\) being the first canonical unit vector.
\(y(x)\) is supposed to be the best approximation of the original data \(z\), meaning that
From that follows
This leads to
where \(w_i\) are weights for numerical integration.
In order to determine the model parameters, first determine the first \(p\) Legendre coefficients \(\hat{y}_k\). Then, for some set of parameters \(a_j\), determine the rest of the Legendre coefficient by solving the ODE (which is a linear system of equations in Legendre space) and compare to the original data \(z\). Do least squares fitting of \(a_j\) in that manner. This yields some optimal \(a_j\) values. From that, it is straight-forward to determine the exponential factors \(\lambda_i\) by finding the roots of the polynomial.
A linear least squares fit can then be used to determine the remaining parameters \(\alpha\) and \(\beta_i\).