openest.models.ddp_model module

class openest.models.ddp_model.DDPModel(p_format=None, source=None, xx_is_categorical=False, xx=None, yy_is_categorical=False, yy=None, pp=None, unaccounted=None, scaled=True)[source]

Bases: openest.models.univariate_model.UnivariateModel, openest.models.memoizable.MemoizableUnivariate

Discrete-Discrete-Probability (DDP) Format

A DDP file describes a dose-response relationship with a limited collection of response outcomes. The dose and response values may be either categorical or sampled at a collection of numerical levels.

<y-value-1>, …, <y-value-N> and <x-value-1>, …, <x-value-N> are either strings (for named categories) or numerical values.

The format of a DDP file is:

<format>,<y-value-1>,<y-value-2>,...
<x-value-1>,p(y1|x1),p(y2|x1),...
<x-value-2>,p(y1|x2),p(y2|x2),...

Below is a sample categorical DDP file:

ddp1,live,dead
control,.5,.5
treated,.9,.1

Below is a sample numerical DDP file:

ddp1,-10.0,-.33333333333,3.33333333333,10.0
0.0,0.5,0.5,0.0,0.0
13.3333333333,0.0,0.5,0.5,0.0
26.6666666667,0.0,0.0,0.5,0.5
40.0,0.0,0.0,0.0,0.5
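
The layout described above can be read with the standard csv module. The following is a minimal sketch, not the library's own reader: parse_ddp is a hypothetical helper, and it assumes a comma delimiter and categorical labels kept as strings.

```python
import csv
import io


def parse_ddp(text, delimiter=","):
    """Split DDP-formatted text into (p_format, yy, xx, pp).

    Hypothetical helper for illustration; labels are kept as strings.
    """
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    header, body = rows[0], rows[1:]
    p_format = header[0]                      # e.g. "ddp1" or "ddp2"
    yy = header[1:]                           # y-axis labels from the header row
    xx = [row[0] for row in body]             # x-axis label leads each data row
    pp = [[float(v) for v in row[1:]] for row in body]  # one row of p(y|x) per x
    return p_format, yy, xx, pp


sample = "ddp1,live,dead\ncontrol,.5,.5\ntreated,.9,.1\n"
p_format, yy, xx, pp = parse_ddp(sample)
```

For the categorical sample, this gives yy = ['live', 'dead'], xx = ['control', 'treated'], and one probability row per x value.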
Parameters:
  • p_format (str) –

    Probability format. May be one of the following values:

    • ddp1 - the p(.) values are simple probabilities (0 <= p(.) <= 1 and sum p(y|x) = 1)
    • ddp2 - the p(.) values are log probabilities
  • source (str) – Metadata attribute. Name of file this object was read in from.
  • xx_is_categorical (bool) – Indicates whether xx is categorical. False indicates numeric data.
  • xx (list-like) – X axis index
  • yy_is_categorical (bool) – Indicates whether yy is categorical. False indicates numeric data.
  • yy (list-like) – Y axis index
  • pp (array-like) – underlying array of p(y|x) values, with one row per x value and one column per y value
  • unaccounted (numpy.array) – column of remaining probability. unaccounted = 1-sum(pp, axis=1).
  • scaled (bool) – Indicates whether data has been scaled. If scaled, re-scale so pp.sum(axis=1)==1.
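
The relationship between pp, unaccounted, and scaling can be illustrated in plain Python. This is a sketch under stated assumptions: the helper names are hypothetical, and dividing each row by its sum is only one plausible reading of the re-scaling described for scaled.

```python
def unaccounted(pp):
    """Probability mass missing from each row: 1 - sum over y."""
    return [1.0 - sum(row) for row in pp]


def rescale_rows(pp):
    """Divide each row by its sum so it totals 1 (assumed rescaling rule).

    Rows that sum to zero are returned unchanged to avoid dividing by zero.
    """
    scaled = []
    for row in pp:
        total = sum(row)
        scaled.append([p / total for p in row] if total > 0 else list(row))
    return scaled


# Probability rows from the numerical sample file
pp = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 0.5],
]
missing = unaccounted(pp)
rescaled = rescale_rows(pp)
```

In the numerical sample, only the last row (x = 40.0) has unaccounted probability, since its entries sum to 0.5.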
add_to_y(a)[source]

add value a to each element of index y (numeric only)

static combine(one, two)[source]
copy()[source]

copy data and return DDPModel with the same data

static create_lin(yy, xxs)[source]

Create a DDP model by supplying y index and dictionary of p-values

Parameters:
  • yy (list-like) – y-index labels
  • xxs (dict) – dictionary mapping each x-index value to the p-values for that row
draw_sample(x=None)[source]

Randomly sample a label from the y-index using the p-values in row x

If x is None (default), use the first row. Uses self.get_closest(x) to find the nearest match for x-index label x
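
Drawing a label weighted by one row of p-values can be sketched with random.choices from the standard library. draw_label is a hypothetical stand-in, not the method's actual implementation, and it takes the row directly rather than looking it up by x.

```python
import random


def draw_label(yy, p_row, rng=random):
    """Pick one y label with probability proportional to p_row."""
    return rng.choices(yy, weights=p_row, k=1)[0]


# Seeded generator so repeated runs draw the same label
rng = random.Random(0)
label = draw_label(["live", "dead"], [0.9, 0.1], rng)
```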

eval_pval(x, p, threshold=0.001)[source]

Inverse CDF Evaluation

Returns the value of $y$ that corresponds to a given p-value: $F^{-1}(p | x)$.
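
For a discrete row, the inverse CDF can be sketched as a walk along the cumulative sum until it reaches p. This is an illustration only: inverse_cdf is a hypothetical helper, it ignores the threshold argument (whose role is not documented here), and falling back to the last y value when the row's mass runs out is an assumption.

```python
def inverse_cdf(yy, p_row, p):
    """Return the first y whose cumulative probability reaches p."""
    cumulative = 0.0
    for y, prob in zip(yy, p_row):
        cumulative += prob
        if cumulative >= p:
            return y
    return yy[-1]  # assumed fallback when the row sums to less than p


# Second row of the numerical sample (x = 13.3333333333)
yy = [-10.0, -0.33333333333, 3.33333333333, 10.0]
row = [0.0, 0.5, 0.5, 0.0]
low = inverse_cdf(yy, row, 0.25)
high = inverse_cdf(yy, row, 0.75)
```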

eval_pval_index(ii, p, threshold=0.001)[source]
filter_x(xx)[source]

Slice the DDPModel data to the rows whose x-index values match xx

static from_file(filename, delimiter)[source]

read DDP file from file path

get_closest(x=None)[source]

return closest index on x axis

If x index is categorical, coerce x to string and find first matching index. If numeric, find the closest value.

If x is None (default), return 0
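
The lookup rules above can be sketched as follows. closest_index is a hypothetical helper; in particular, raising ValueError when a categorical label is missing is this sketch's behavior, not something the documentation states.

```python
def closest_index(xx, x=None, categorical=False):
    """Return the row index for x following the rules described above."""
    if x is None:
        return 0
    if categorical:
        # Compare as strings; list.index raises ValueError if nothing matches
        return [str(v) for v in xx].index(str(x))
    # Numeric axis: index of the value with the smallest absolute distance
    return min(range(len(xx)), key=lambda i: abs(xx[i] - x))


numeric_xx = [0.0, 13.3333333333, 26.6666666667, 40.0]
nearest = closest_index(numeric_xx, 15.0)
named = closest_index(["control", "treated"], "treated", categorical=True)
```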

get_mean(x=None)[source]

Returns the mean of the y-index labels weighted by p values in row x

If x is None (default), use the first row. Uses self.get_closest(x) to find the nearest match for x-index label x

get_sdev(x=None)[source]

Returns the std dev of the y-index labels weighted by p values in row x

If x is None (default), use the first row. Uses self.get_closest(x) to find the nearest match for x-index label x
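
The weighted mean and standard deviation of a row can be written out directly. This sketch uses hypothetical helper names and assumes the weights are normalized by the row sum; whether the library normalizes in the same way is not stated here.

```python
import math


def weighted_mean(yy, p_row):
    """Mean of the y values weighted by p_row (normalized by the row sum)."""
    total = sum(p_row)
    return sum(y * p for y, p in zip(yy, p_row)) / total


def weighted_sdev(yy, p_row):
    """Population standard deviation of y under the weights p_row."""
    mean = weighted_mean(yy, p_row)
    total = sum(p_row)
    variance = sum(p * (y - mean) ** 2 for y, p in zip(yy, p_row)) / total
    return math.sqrt(variance)


mean = weighted_mean([0.0, 10.0], [0.5, 0.5])
sdev = weighted_sdev([0.0, 10.0], [0.5, 0.5])
```

With two equally likely outcomes at 0 and 10, both the mean and the standard deviation are 5.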

get_xx()[source]

returns x axis index

get_yy()[source]

returns y axis index

init_from(file, delimiter, status_callback=None, source=None)[source]

Read DDP data set from file

init_from_other(ddp)[source]

copy attributes of other DDP dataset to this one

interpolate_x(newxx, kind='quadratic')[source]

custom interpolation method. wrapper around scipy.interpolate.interp1d.

Parameters:
  • newxx (list-like) – new x axis
  • kind (str) – interpolation method, passed to scipy.interpolate.interp1d
interpolate_y(newyy, kind='quadratic')[source]

custom interpolation method. wrapper around scipy.interpolate.interp1d.

Parameters:
  • newyy (list-like) – new y axis
  • kind (str) – interpolation method, passed to scipy.interpolate.interp1d
kind()[source]

returns model type (“ddp_model”)

lin_p()[source]

convert any DDPModel to ddp1 (linear probability) format

log_p()[source]

convert any DDPModel to ddp2 (log probability) format
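
The relationship between the two formats can be sketched as an element-wise log and exp. The helper names are hypothetical, and two details are assumptions: the natural logarithm is used (the base is not stated here), and a zero probability is mapped to negative infinity.

```python
import math


def to_log(pp):
    """Linear probabilities -> log probabilities (natural log assumed)."""
    return [[math.log(p) if p > 0 else float("-inf") for p in row] for row in pp]


def to_lin(log_pp):
    """Log probabilities -> linear probabilities."""
    return [[math.exp(lp) for lp in row] for row in log_pp]


pp = [[0.5, 0.5], [0.9, 0.1]]
round_trip = to_lin(to_log(pp))  # should match pp up to floating-point error
```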

static merge(models)[source]
recategorize_x(oldxx, newxx)[source]
rescale(as_ddp=True)[source]

Rescale the model. Also works on non-DDP data (that is, data that are a sampling of a continuous distribution)

scale_p(a)[source]

coerce to ddp2 (log probability) format and scale by a

scale_y(a)[source]

multiply index y (numeric only) by scale factor a

to_ddp(ys=None)[source]

coerce to DDP, interpolating along y axis if necessary

transpose()[source]

transpose data structure

write(file, delimiter)[source]

write CSV to file object

write_file(filename, delimiter)[source]

write CSV to file path