openest.models.ddp_model module¶

class openest.models.ddp_model.DDPModel(p_format=None, source=None, xx_is_categorical=False, xx=None, yy_is_categorical=False, yy=None, pp=None, unaccounted=None, scaled=True)[source]¶

Bases: openest.models.univariate_model.UnivariateModel, openest.models.memoizable.MemoizableUnivariate

Discrete-Discrete-Probability (DDP) Format

A DDP file describes a dose-response relationship with a limited collection of response outcomes. The dose and response values may be either categorical or sampled at a collection of numerical levels.

<y-value-1>, …, <y-value-N> and <x-value-1>, …, <x-value-N> are either strings (for named categories) or numerical values.

The format of a DDP file is:

<format>,<y-value-1>,<y-value-2>,...
<x-value-1>,p(y1|x1),p(y2|x1),...
<x-value-2>,p(y1|x2),p(y2|x2),...

Below is a sample categorical DDP file:

ddp1,live,dead
control,.5,.5
treated,.9,.1
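
As a rough illustration of the layout above, the following sketch parses DDP-formatted text using only the standard library. The helper name `parse_ddp_text` and its return shape are hypothetical and are not part of openest; the library's own readers are `from_file` and `init_from`. Index values are kept as strings here, whereas a real reader would likely convert numerical indices.

```python
import csv
import io


def parse_ddp_text(text, delimiter=","):
    """Parse DDP-formatted text into (p_format, yy, xx, pp).

    Hypothetical helper for illustration only; not the openest API.
    """
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    header, body = rows[0], rows[1:]
    p_format = header[0]            # first header cell: "ddp1" or "ddp2"
    yy = header[1:]                 # response (y) values, kept as strings
    xx = [row[0] for row in body]   # dose (x) values, kept as strings
    pp = [[float(v) for v in row[1:]] for row in body]  # one row per x value
    return p_format, yy, xx, pp


sample = "ddp1,live,dead\ncontrol,.5,.5\ntreated,.9,.1\n"
p_format, yy, xx, pp = parse_ddp_text(sample)
print(p_format, yy, xx, pp)
```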

Below is a sample numerical DDP file:

ddp1,-10.0,-.33333333333,3.33333333333,10.0
0,0.5,0.5,0.0,0.0
3333333333,0.0,0.5,0.5,0.0
6666666667,0.0,0.0,0.5,0.5
0,0.0,0.0,0.0,0.5

Parameters:

p_format (str) –
Probability format. May be one of the following values:
- ddp1 - the p(.) values are simple probabilities (0 ≤ p(.) ≤ 1 and sum p(y|x) = 1 over y for each x)
- ddp2 - the p(.) values are log probabilities
source (str) – Metadata attribute. Name of file this object was read in from.
xx_is_categorical (bool) – Indicates whether xx is categorical. False indicates numeric data.
xx (list-like) – X axis index
yy_is_categorical (bool) – Indicates whether yy is categorical. False indicates numeric data.
yy (list-like) – Y axis index
pp (array-like) – underlying data array of p(.) values, with one row per x value and one column per y value (presumably a numpy array)
unaccounted (numpy.array) – column of remaining probability. unaccounted = 1-sum(pp, axis=1).
scaled (bool) – Indicates whether data has been scaled. If scaled, re-scale so pp.sum(axis=1)==1.
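
The relationship between pp, unaccounted, and scaled can be sketched in plain Python as follows. The function names are hypothetical, nested lists stand in for the array, and leaving all-zero rows unchanged is an assumption rather than documented behavior.

```python
def unaccounted_mass(pp):
    """Remaining probability per row, mirroring 1 - sum(pp, axis=1)."""
    return [1.0 - sum(row) for row in pp]


def rescale_rows(pp):
    """Rescale each row so that it sums to 1.

    Rows that sum to zero are returned unchanged (an assumption).
    """
    out = []
    for row in pp:
        total = sum(row)
        out.append([p / total for p in row] if total > 0 else list(row))
    return out


pp = [[0.5, 0.5, 0.0], [0.0, 0.25, 0.25]]
remaining = unaccounted_mass(pp)
rescaled = rescale_rows(pp)
print(remaining, rescaled)
```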

add_to_y(a)[source]¶: add value a to each element of index y (numeric only)

static combine(one, two)[source]¶

copy()[source]¶: copy data and return DDPModel with the same data

static create_lin(yy, xxs)[source]¶

Create a DDP model by supplying y index and dictionary of p-values

Parameters:

yy (list-like) – y-index labels
xxs (dict) – dictionary keyed by x-index values, whose values are the p-values for each row

draw_sample(x=None)[source]¶

Randomly sample label from y-index using p values in row x

If x is None (default), use the first row. Uses self.get_closest(x) to find the nearest matching x-index label for x.
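
One way to picture this sampling step is a weighted draw over the y-index labels. The sketch below uses `random.choices` from the standard library; `draw_label` is a hypothetical helper, and the library may implement the draw differently.

```python
import random


def draw_label(yy, row, rng=random):
    """Draw one y label with probability proportional to the row's p values.

    Hypothetical helper for illustration only; not the openest API.
    """
    return rng.choices(yy, weights=row, k=1)[0]


rng = random.Random(0)   # seeded generator so repeated runs draw the same labels
yy = ["live", "dead"]
row = [0.9, 0.1]         # the "treated" row of the categorical sample file
draws = [draw_label(yy, row, rng) for _ in range(1000)]
print(draws.count("live"), draws.count("dead"))
```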

eval_pval(x, p, threshold=0.001)[source]¶

Inverse CDF Evaluation

Returns the value of $y$ that corresponds to a given p-value: $F^{-1}(p | x)$.
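
For a discrete distribution, a common definition of the inverse CDF is $F^{-1}(p | x) = \min\{y : F(y | x) \ge p\}$. The sketch below applies that definition to a single row, assuming a numerical y index sorted in ascending order. `inverse_cdf` is a hypothetical helper; it ignores the threshold argument, whose exact role is not described here, and the library may interpolate between levels instead.

```python
from itertools import accumulate


def inverse_cdf(yy, row, p):
    """Return the smallest y whose cumulative probability reaches p.

    Assumes yy is sorted ascending and row holds linear probabilities.
    """
    for y, cumulative in zip(yy, accumulate(row)):
        if cumulative >= p:
            return y
    return yy[-1]   # p exceeds the accumulated mass; fall back to the largest y


yy = [-10.0, 0.0, 10.0]
row = [0.25, 0.5, 0.25]
median = inverse_cdf(yy, row, 0.5)
print(median)
```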

eval_pval_index(ii, p, threshold=0.001)[source]¶

filter_x(xx)[source]¶: slice the DDPModel data, keeping only the rows whose x-index values match xx

static from_file(filename, delimiter)[source]¶: read DDP file from file path

get_closest(x=None)[source]¶

return closest index on x axis

If x index is categorical, coerce x to string and find first matching index. If numeric, find the closest value.

If x is None (default), return 0
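
The lookup rules above can be sketched as follows. `closest_index` is a hypothetical stand-in, and raising `ValueError` for an unknown categorical label is an assumption, since that case is not described here.

```python
def closest_index(xx, x=None, categorical=False):
    """Return the position in xx that best matches x.

    Hypothetical helper for illustration only; not the openest API.
    """
    if x is None:
        return 0
    if categorical:
        # Coerce to strings and return the first exact match.
        # list.index raises ValueError when the label is absent.
        return [str(v) for v in xx].index(str(x))
    # Numeric index: choose the position with the smallest absolute distance.
    return min(range(len(xx)), key=lambda i: abs(xx[i] - x))


print(closest_index([0.0, 5.0, 10.0], 6.2))
print(closest_index(["control", "treated"], "treated", categorical=True))
```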

get_mean(x=None)[source]¶

Returns the mean of the y-index labels weighted by p values in row x

If x is None (default), use the first row. Uses self.get_closest(x) to find the nearest matching x-index label for x.

get_sdev(x=None)[source]¶

Returns the std dev of the y-index labels weighted by p values in row x

If x is None (default), use the first row. Uses self.get_closest(x) to find the nearest matching x-index label for x.
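
The weighted mean and standard deviation for a single row can be sketched as below, assuming a numerical y index. The helper names are hypothetical, and dividing by the row total (so that unscaled rows are still handled) is an assumption about how unaccounted probability is treated.

```python
import math


def weighted_mean(yy, row):
    """Mean of the y values weighted by the row's p values."""
    total = sum(row)
    return sum(y * p for y, p in zip(yy, row)) / total


def weighted_sdev(yy, row):
    """Population standard deviation of the y values under the same weights."""
    mu = weighted_mean(yy, row)
    total = sum(row)
    variance = sum(p * (y - mu) ** 2 for y, p in zip(yy, row)) / total
    return math.sqrt(variance)


yy = [-10.0, 0.0, 10.0]
row = [0.25, 0.5, 0.25]
mean = weighted_mean(yy, row)
sdev = weighted_sdev(yy, row)
print(mean, sdev)
```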

get_xx()[source]¶: returns x axis index

get_yy()[source]¶: returns y axis index

init_from(file, delimiter, status_callback=None, source=None)[source]¶: Read DDP data set from file

init_from_other(ddp)[source]¶: copy attributes of other DDP dataset to this one

interpolate_x(newxx, kind='quadratic')[source]¶

Custom interpolation method along the x axis. Wrapper around scipy.interpolate.interp1d.

Parameters:

newxx (list-like) – new x axis
kind (str) – interpolation method, passed to scipy.interpolate.interp1d

interpolate_y(newyy, kind='quadratic')[source]¶

Custom interpolation method along the y axis. Wrapper around scipy.interpolate.interp1d.

Parameters:

newyy (list-like) – new y axis
kind (str) – interpolation method, passed to scipy.interpolate.interp1d

kind()[source]¶: returns model type (“ddp_model”)

lin_p()[source]¶: convert any DDPModel to ddp1 (linear probability) format

log_p()[source]¶: convert any DDPModel to ddp2 (log probability) format
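
The two probability formats are related by an elementwise logarithm and exponential. The sketch below assumes natural logarithms and maps a zero probability to negative infinity; both choices are assumptions, and the function names are hypothetical.

```python
import math


def to_log(pp):
    """Convert linear probabilities (ddp1-style) to log probabilities (ddp2-style)."""
    return [[math.log(p) if p > 0 else float("-inf") for p in row] for row in pp]


def to_lin(log_pp):
    """Convert log probabilities back to linear probabilities."""
    return [[math.exp(v) for v in row] for row in log_pp]


pp = [[0.5, 0.5], [0.9, 0.1], [1.0, 0.0]]
log_pp = to_log(pp)
round_trip = to_lin(log_pp)
print(log_pp)
print(round_trip)
```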

static merge(models)[source]¶

recategorize_x(oldxx, newxx)[source]¶

rescale(as_ddp=True)[source]¶: rescale the probabilities. Can also rescale non-DDP data (that is, data treated as a sampling of a continuous distribution)

scale_p(a)[source]¶: coerce to ddp2 (log probability) format and scale by a

scale_y(a)[source]¶: multiply index y (numeric only) by scale factor a

to_ddp(ys=None)[source]¶: coerce to DDP, interpolating along y axis if necessary

transpose()[source]¶: transpose data structure

write(file, delimiter)[source]¶: write CSV to file object

write_file(filename, delimiter)[source]¶: write CSV to file path
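
For reference, writing the DDP layout to a file object can be sketched with the standard `csv` module. `write_ddp` is a hypothetical helper that takes the model's pieces as plain arguments; it does not reproduce the signature or number formatting of the library's `write`.

```python
import csv
import io


def write_ddp(fp, p_format, yy, xx, pp, delimiter=","):
    """Write a header row followed by one row per x value.

    Hypothetical helper for illustration only; not the openest API.
    """
    writer = csv.writer(fp, delimiter=delimiter, lineterminator="\n")
    writer.writerow([p_format] + list(yy))   # <format>,<y-value-1>,...
    for x, row in zip(xx, pp):
        writer.writerow([x] + list(row))     # <x-value>,p(y1|x),...


buf = io.StringIO()
write_ddp(buf, "ddp1", ["live", "dead"], ["control", "treated"],
          [[0.5, 0.5], [0.9, 0.1]])
text = buf.getvalue()
print(text)
```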