climb.tool.impl.data_suite.third_party.copulas.multivariate package

Submodules

climb.tool.impl.data_suite.third_party.copulas.multivariate.base module

class climb.tool.impl.data_suite.third_party.copulas.multivariate.base.Multivariate(random_seed=None)[source]

Bases: object

Abstract class for a multivariate copula object.

cdf(X)[source]

Compute the cumulative distribution value for each point in X.

Parameters:

X (pandas.DataFrame) – Values for which the cumulative distribution will be computed.

Returns:

Cumulative distribution values for points in X.

Return type:

numpy.ndarray

Raises:

NotFittedError – if the model is not fitted.

check_fit()[source]

Check whether this model has already been fit to a random variable.

Raise a NotFittedError if it has not.

Raises:

NotFittedError – if the model is not fitted.
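The guard described above can be sketched with a toy model. This is a minimal illustration, not the library's implementation: the `NotFittedError` class defined here and the `ToyModel` class with its `sample_mean` method are hypothetical stand-ins, and only the `fitted` flag and the `check_fit` name come from this page.

```python
class NotFittedError(Exception):
    """Stand-in for the library's NotFittedError (hypothetical definition)."""


class ToyModel:
    """Toy model illustrating the fit / check_fit guard; not the library class."""

    fitted = False  # class-level default, set to True by fit()

    def fit(self, rows):
        # "Fitting" here just stores the column means of a list of rows.
        n_rows = len(rows)
        self.means = [sum(col) / n_rows for col in zip(*rows)]
        self.fitted = True

    def check_fit(self):
        if not self.fitted:
            raise NotFittedError("This model is not fitted.")

    def sample_mean(self):
        self.check_fit()  # guard every method that needs fitted state
        return self.means


model = ToyModel()
try:
    model.sample_mean()
    raised = False
except NotFittedError:
    raised = True

model.fit([[1.0, 10.0], [3.0, 30.0]])
means = model.sample_mean()
```

Calling `check_fit` at the top of each method that depends on fitted state keeps the error message consistent across `cdf`, `pdf` and `sample` style methods.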

cumulative_distribution(X)[source]

Compute the cumulative distribution value for each point in X.

Parameters:

X (pandas.DataFrame) – Values for which the cumulative distribution will be computed.

Returns:

Cumulative distribution values for points in X.

Return type:

numpy.ndarray

Raises:

NotFittedError – if the model is not fitted.

fit(X)[source]

Fit the model to a table of values from multiple random variables.

Parameters:

X (pandas.DataFrame) – Values of the random variables.

fitted = False

classmethod from_dict(params)[source]

Create a new instance from a parameters dictionary.

Parameters:

params (dict) – Parameters of the distribution, in the same format as the one returned by the to_dict method.

Returns:

Instance of the distribution defined on the parameters.

Return type:

Multivariate

classmethod load(path)[source]

Load a Multivariate instance from a pickle file.

Parameters:

path (str) – Path to the pickle file where the distribution has been serialized.

Returns:

Loaded instance.

Return type:

Multivariate

log_probability_density(X)[source]

Compute the log of the probability density for each point in X.

Parameters:

X (pandas.DataFrame) – Values for which the log probability density will be computed.

Returns:

Log probability density values for points in X.

Return type:

numpy.ndarray

Raises:

NotFittedError – if the model is not fitted.

pdf(X)[source]

Compute the probability density for each point in X.

Parameters:

X (pandas.DataFrame) – Values for which the probability density will be computed.

Returns:

Probability density values for points in X.

Return type:

numpy.ndarray

Raises:

NotFittedError – if the model is not fitted.

probability_density(X)[source]

Compute the probability density for each point in X.

Parameters:

X (pandas.DataFrame) – Values for which the probability density will be computed.

Returns:

Probability density values for points in X.

Return type:

numpy.ndarray

Raises:

NotFittedError – if the model is not fitted.

sample(num_rows=1)[source]

Sample values from this model.

Parameters:

num_rows (int) – Number of rows to sample.

Returns:

Array of shape (num_rows, *) with values randomly sampled from this model distribution.

Return type:

numpy.ndarray

Raises:

NotFittedError – if the model is not fitted.

save(path)[source]

Serialize this multivariate instance using pickle.

Parameters:

path (str) – Path to where this distribution will be serialized.

to_dict()[source]

Return a dict with the parameters to replicate this object.

Returns:

Parameters of this distribution.

Return type:

dict
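The `to_dict` / `from_dict` contract can be illustrated with a small round trip. This is a sketch under stated assumptions: `ToyDistribution` and its `mean` and `scale` fields are hypothetical, and whether the real parameter dictionaries are JSON-compatible is not stated on this page.

```python
import json


class ToyDistribution:
    """Hypothetical stand-in used only to show the to_dict/from_dict contract."""

    def __init__(self, mean=0.0, scale=1.0):
        self.mean = mean
        self.scale = scale

    def to_dict(self):
        # Plain Python types only, so the result can be JSON-encoded.
        return {"type": "ToyDistribution", "mean": self.mean, "scale": self.scale}

    @classmethod
    def from_dict(cls, params):
        # Accepts the same format that to_dict produces.
        return cls(mean=params["mean"], scale=params["scale"])


original = ToyDistribution(mean=1.5, scale=2.0)
encoded = json.dumps(original.to_dict())
restored = ToyDistribution.from_dict(json.loads(encoded))
round_trip_ok = restored.to_dict() == original.to_dict()
```

A dictionary round trip of this kind is a text-friendly alternative to the pickle-based `save` and `load` methods.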

climb.tool.impl.data_suite.third_party.copulas.multivariate.gaussian module

class climb.tool.impl.data_suite.third_party.copulas.multivariate.gaussian.GaussianMultivariate(*args, **kwargs)[source]

Bases: Multivariate

Class for a multivariate distribution that uses the Gaussian copula.

Parameters:

distribution (str or dict) – Fully qualified name of the class to be used for modeling the marginal distributions or a dictionary mapping column names to the fully qualified distribution names.
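To make the idea of a Gaussian copula with separately modeled marginals concrete, here is a minimal two-variable sketch using only the standard library. It does not use this package's API: the function names, the correlation `rho` and the exponential marginals are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def sample_gaussian_copula(num_rows, rho, seed=0):
    """Draw (u1, u2) pairs whose dependence follows a Gaussian copula."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    pairs = []
    for _ in range(num_rows):
        z1 = rng.gauss(0.0, 1.0)
        # Build z2 with correlation rho to z1 (2x2 Cholesky factor).
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # The standard normal CDF maps each coordinate to (0, 1).
        pairs.append((std_normal.cdf(z1), std_normal.cdf(z2)))
    return pairs


def exponential_ppf(u, rate):
    """Inverse CDF of an exponential marginal (illustrative choice)."""
    return -math.log(1.0 - u) / rate


uniforms = sample_gaussian_copula(num_rows=500, rho=0.8)
# Push each uniform coordinate through its own marginal inverse CDF.
samples = [(exponential_ppf(u1, 1.0), exponential_ppf(u2, 0.5)) for u1, u2 in uniforms]
```

The copula carries only the dependence structure. Each column gets its shape from its own marginal, which is the role the `distribution` parameter plays when it maps column names to distribution classes.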

columns = None

covariance = None

cumulative_distribution(X)[source]

Compute the cumulative distribution value for each point in X.

Parameters:

X (pandas.DataFrame) – Values for which the cumulative distribution will be computed.

Returns:

Cumulative distribution values for points in X.

Return type:

numpy.ndarray

Raises:

NotFittedError – if the model is not fitted.

fit(X, *args, **kwargs)

Fit the model to a table of values from multiple random variables.

Parameters:

X (pandas.DataFrame) – Values of the random variables.

classmethod from_dict(copula_dict)[source]

Create a new instance from a parameters dictionary.

Parameters:

copula_dict (dict) – Parameters of the distribution, in the same format as the one returned by the to_dict method.

Returns:

Instance of the distribution defined on the parameters.

Return type:

Multivariate

probability_density(X)[source]

Compute the probability density for each point in X.

Parameters:

X (pandas.DataFrame) – Values for which the probability density will be computed.

Returns:

Probability density values for points in X.

Return type:

numpy.ndarray

Raises:

NotFittedError – if the model is not fitted.

sample(*args, **kwargs)

Sample values from this model.

Parameters:

num_rows (int) – Number of rows to sample.

Returns:

Array of shape (num_rows, *) with values randomly sampled from this model distribution.

Return type:

numpy.ndarray

Raises:

NotFittedError – if the model is not fitted.

to_dict()[source]

Return a dict with the parameters to replicate this object.

Returns:

Parameters of this distribution.

Return type:

dict

univariates = None

climb.tool.impl.data_suite.third_party.copulas.multivariate.tree module

class climb.tool.impl.data_suite.third_party.copulas.multivariate.tree.CenterTree(random_seed=None)[source]

Bases: Tree

get_anchor()[source]

Find anchor variable with highest sum of dependence with the rest.

Returns:

Anchor variable.

Return type:

int
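One plausible reading of "highest sum of dependence" is sketched below on a plain nested-list tau matrix. Using absolute values and skipping the diagonal are assumptions made for this illustration. The library method takes no arguments and reads the matrix from the fitted tree.

```python
def get_anchor(tau_matrix):
    """Return the index of the variable with the largest total |tau| to the others."""
    best_index, best_score = 0, float("-inf")
    for i, row in enumerate(tau_matrix):
        # Sum absolute dependence with every other variable, skipping the diagonal.
        score = sum(abs(value) for j, value in enumerate(row) if j != i)
        if score > best_score:
            best_index, best_score = i, score
    return best_index


tau = [
    [1.0, 0.2, -0.1],
    [0.2, 1.0, 0.7],
    [-0.1, 0.7, 1.0],
]
anchor = get_anchor(tau)  # variable 1 has the largest total dependence (0.9)
```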

tree_type = 0

class climb.tool.impl.data_suite.third_party.copulas.multivariate.tree.DirectTree(random_seed=None)[source]

Bases: Tree

tree_type = 1

class climb.tool.impl.data_suite.third_party.copulas.multivariate.tree.Edge(index, left, right, copula_name, copula_theta)[source]

Bases: object

classmethod from_dict(edge_dict)[source]

Create a new instance from a parameters dictionary.

Parameters:

edge_dict (dict) – Parameters of the Edge, in the same format as the one returned by the to_dict method.

Returns:

Instance of the edge defined on the parameters.

Return type:

Edge

classmethod get_child_edge(index, left_parent, right_parent)[source]

Construct a child edge from two parent edges.

Parameters:
  • index (int) – Index of the new Edge.

  • left_parent (Edge) – Left parent

  • right_parent (Edge) – Right parent

Returns:

The new child edge.

Return type:

Edge

classmethod get_conditional_uni(left_parent, right_parent)[source]

Identify the pair of univariate values from the parents.

Parameters:
  • left_parent (Edge) – left parent

  • right_parent (Edge) – right parent

Returns:

Univariate values of the left and right parents.

Return type:

tuple[np.ndarray, np.ndarray]

get_likelihood(uni_matrix)[source]

Compute likelihood given a U matrix.

Parameters:

uni_matrix (numpy.array) – Matrix to compute the likelihood.

Returns:

likelihood and conditional values.

Return type:

tuple[np.ndarray, np.ndarray, np.ndarray]

is_adjacent(another_edge)[source]

Check if two edges are adjacent.

Parameters:

another_edge (Edge) – The other edge to compare against.

Returns:

True if the two edges are adjacent.

Return type:

bool
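A minimal sketch of an adjacency test follows, assuming that two edges count as adjacent when they share at least one node. `SimpleEdge` is a hypothetical stand-in that carries only the two node indices. The library's own check may apply additional conditions.

```python
from collections import namedtuple

# Hypothetical minimal edge: only the two node indices matter here.
SimpleEdge = namedtuple("SimpleEdge", ["left", "right"])


def is_adjacent(edge, another_edge):
    """Return True if the two edges share at least one node."""
    return bool({edge.left, edge.right} & {another_edge.left, another_edge.right})


a = SimpleEdge(0, 1)
b = SimpleEdge(1, 2)
c = SimpleEdge(2, 3)
a_b_adjacent = is_adjacent(a, b)  # share node 1
a_c_adjacent = is_adjacent(a, c)  # no node in common
```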

static sort_edge(edges)[source]

Sort iterable of edges first by left node indices then right.

Parameters:

edges (list[Edge]) – List of edges to be sorted.

Returns:

Sorted list by left and right node indices.

Return type:

list[Edge]
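The ordering described above can be expressed as a sort on a two-element key. The sketch uses a hypothetical `SimpleEdge` tuple rather than the library's `Edge` class.

```python
from collections import namedtuple

SimpleEdge = namedtuple("SimpleEdge", ["left", "right"])  # hypothetical stand-in


def sort_edge(edges):
    """Sort by left node index, breaking ties with the right node index."""
    return sorted(edges, key=lambda edge: (edge.left, edge.right))


edges = [SimpleEdge(2, 0), SimpleEdge(0, 3), SimpleEdge(0, 1), SimpleEdge(1, 2)]
ordered = sort_edge(edges)
```

Because `sorted` returns a new list, the input iterable is left unchanged.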

to_dict()[source]

Return a dict with the parameters to replicate this Edge.

Returns:

Parameters of this Edge.

Return type:

dict

class climb.tool.impl.data_suite.third_party.copulas.multivariate.tree.RegularTree(random_seed=None)[source]

Bases: Tree

tree_type = 2

class climb.tool.impl.data_suite.third_party.copulas.multivariate.tree.Tree(random_seed=None)[source]

Bases: Multivariate

Helper class to instantiate a single tree in the vine model.

fit(index, n_nodes, tau_matrix, previous_tree, edges=None)[source]

Fit this tree object.

Parameters:
  • index (int) – index of the tree.

  • n_nodes (int) – number of nodes in the tree.

  • tau_matrix (numpy.array) – Kendall’s tau matrix of the data, shape (n_nodes, n_nodes).

  • previous_tree (Tree) – tree object of previous level.
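For readers unfamiliar with the `tau_matrix` argument, the sketch below builds a Kendall's tau matrix of shape (n_nodes, n_nodes) from columns of data in pure Python. It computes the simple tau-a variant. Which variant the library expects is not stated here, so treat the choice as an assumption.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def tau_matrix(columns):
    """Symmetric matrix of pairwise tau values, shape (n_nodes, n_nodes)."""
    n = len(columns)
    return [[kendall_tau(columns[i], columns[j]) for j in range(n)] for i in range(n)]


columns = [
    [1, 2, 3, 4],
    [10, 20, 30, 40],  # perfectly concordant with the first column
    [4, 3, 2, 1],      # perfectly discordant with the first column
]
tau = tau_matrix(columns)
```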

fitted = False

classmethod from_dict(tree_dict, previous=None)[source]

Create a new instance from a parameters dictionary.

Parameters:

tree_dict (dict) – Parameters of the Tree, in the same format as the one returned by the to_dict method.

Returns:

Instance of the tree defined on the parameters.

Return type:

Tree

get_adjacent_matrix()[source]

Get adjacency matrix.

Returns:

adjacency matrix

Return type:

numpy.ndarray

get_likelihood(uni_matrix)[source]

Compute likelihood of the tree given a U matrix.

Parameters:

uni_matrix (numpy.array) – univariate matrix to evaluate likelihood on.

Returns:

likelihood of the current tree, next level conditional univariate matrix

Return type:

tuple[float, numpy.array]

get_tau_matrix()[source]

Get tau matrix for adjacent pairs.

Returns:

tau matrix for the current tree

Return type:

numpy.ndarray

prepare_next_tree()[source]

Prepare conditional U matrix for next tree.

to_dict()[source]

Return a dict with the parameters to replicate this Tree.

Returns:

Parameters of this Tree.

Return type:

dict

tree_type = None

class climb.tool.impl.data_suite.third_party.copulas.multivariate.tree.TreeTypes(value)[source]

Bases: Enum

Enumeration of the supported tree types.

CENTER = 0

DIRECT = 1

REGULAR = 2

climb.tool.impl.data_suite.third_party.copulas.multivariate.tree.get_tree(tree_type)[source]

Get a Tree instance of the specified type.

Parameters:

tree_type (str or TreeTypes) – Type of tree of which to get an instance.

Returns:

Instance of a Tree of the specified type.

Return type:

Tree
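Accepting either a string or an enum member, as `get_tree` does, is usually handled by normalizing the argument first. The sketch below redefines `TreeTypes` locally with the values listed on this page. `resolve_tree_type` is a hypothetical helper, and the case-insensitive name matching is an assumption about how strings are interpreted.

```python
from enum import Enum


class TreeTypes(Enum):
    CENTER = 0
    DIRECT = 1
    REGULAR = 2


def resolve_tree_type(tree_type):
    """Accept either a TreeTypes member or its name as a string."""
    if isinstance(tree_type, TreeTypes):
        return tree_type
    # Assumption: string names are matched case-insensitively.
    return TreeTypes[tree_type.upper()]


from_string = resolve_tree_type("center")
from_member = resolve_tree_type(TreeTypes.REGULAR)
```

An unknown name raises `KeyError` from the enum lookup, which surfaces typos early.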

climb.tool.impl.data_suite.third_party.copulas.multivariate.vine module

class climb.tool.impl.data_suite.third_party.copulas.multivariate.vine.VineCopula(*args, **kwargs)[source]

Bases: Multivariate

Vine copula model.

A vine is a graphical representation of one factorization of the \(n\)-variate probability distribution in terms of \(n(n - 1)/2\) bivariate copulas by means of the chain rule.

It consists of a sequence of levels, with as many levels as variables. Each level is a tree (no isolated nodes and no loops), so a level with \(n\) nodes has exactly \(n - 1\) edges.

Each node in tree \(T_1\) is a variable, and each edge is a coupling of two variables constructed with a bivariate copula.

Each node in tree \(T_{k+1}\) is a coupling in \(T_{k}\), expressed by the copula of the variables, while each edge is a coupling between two vertices that must have one variable in common, which becomes a conditioning variable in the bivariate copula. Thus, every level has one node fewer than the previous one. Once all the trees are drawn, the factorization is the product of all the nodes.
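The count of bivariate copulas can be checked directly from the description above: each level is a tree with one edge fewer than it has nodes, and each level has one node fewer than the previous one. The helper name `vine_edge_counts` is hypothetical.

```python
def vine_edge_counts(n_variables):
    """Edges per level: a tree with m nodes has m - 1 edges."""
    counts = []
    n_nodes = n_variables
    while n_nodes >= 1:
        counts.append(n_nodes - 1)
        n_nodes -= 1  # each level has one node fewer than the previous one
    return counts


counts = vine_edge_counts(5)
total = sum(counts)  # equals n(n - 1)/2 for n = 5
```

For five variables the levels contribute 4, 3, 2, 1 and 0 edges, so the total matches the n(n - 1)/2 pair copulas in the factorization.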

Parameters:
  • vine_type (str) – Type of the vine copula: one of ‘center’, ‘direct’ or ‘regular’.

  • random_seed (int) – Random seed to use.

model

Distribution to compute univariates.

Type:

copulas.univariate.Univariate

u_matrix

Univariates.

Type:

numpy.array

n_sample

Number of samples.

Type:

int

n_var

Number of variables.

Type:

int

columns

Names of the variables.

Type:

pandas.Series

tau_mat

Kendall’s tau correlation matrix of the data.

Type:

numpy.array

truncated

Max level used to build the vine.

Type:

int

depth

Vine depth.

Type:

int

trees

List of trees used by this vine.

Type:

list[Tree]

ppfs

Percent point functions (inverse CDFs) from the univariates used by this vine.

Type:

list[callable]

fit(X, *args, **kwargs)

Fit the model to a table of values from multiple random variables.

Parameters:

X (pandas.DataFrame) – Values of the random variables.

classmethod from_dict(vine_dict)[source]

Create a new instance from a parameters dictionary.

Parameters:

vine_dict (dict) – Parameters of the Vine, in the same format as the one returned by the to_dict method.

Returns:

Instance of the Vine defined on the parameters.

Return type:

VineCopula

get_likelihood(uni_matrix)[source]

Compute likelihood of the vine.

sample(*args, **kwargs)

Sample values from this model.

Parameters:

num_rows (int) – Number of rows to sample.

Returns:

Array of shape (num_rows, *) with values randomly sampled from this model distribution.

Return type:

numpy.ndarray

Raises:

NotFittedError – if the model is not fitted.

to_dict()[source]

Return a dict with the parameters to replicate this Vine.

Returns:

Parameters of this Vine.

Return type:

dict

train_vine(tree_type)[source]

Build the vine.

  1. For the construction of the first tree \(T_1\), assign one node to each variable and then couple them by maximizing the measure of association considered. Different vines impose different constraints on this construction, so applying them yields different trees at this level.

  2. Select the copula that best fits the pair of variables coupled by each edge in \(T_1\).

  3. Let \(C_{ij}(u_i, u_j)\) be the copula for a given edge \((u_i, u_j)\) in \(T_1\). Then for every edge in \(T_1\), compute either

    \[{v^1}_{j|i} = \frac{\partial C_{ij}(u_i, u_j)}{\partial u_j}\]

    or similarly \({v^1}_{i|j}\), which are conditional CDFs. When finished with all the edges, construct the new matrix \(v^1\), which has one column fewer than \(u\).

  4. Set \(k = 2\).

  5. Assign one node of \(T_k\) to each edge of \(T_{k-1}\). The structure of \(T_{k-1}\) imposes a set of constraints on which edges of \(T_k\) are realizable. Hence the next step is to get a linked list of the accessible nodes for every node in \(T_k\).

  6. As in step 1, nodes of \(T_k\) are coupled by maximizing the measure of association considered while satisfying the constraints imposed by the kind of vine employed plus the set of constraints imposed by tree \(T_{k-1}\).

  7. Select the copula that best fits each edge created in \(T_k\).

  8. Recompute matrix \(v^k\) as in step 3, but taking \(T_k\) and \(v^{k-1}\) instead of \(T_1\) and \(u\).

  9. Set \(k = k + 1\) and repeat from step 5 until all the trees are constructed.
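Step 1 of the procedure above, in the case where the vine type imposes no extra constraint, amounts to finding a maximum spanning tree over the pairwise association values. The sketch below uses Prim's algorithm on absolute Kendall's tau. It is an illustration of that step only, not the library's code, and the function name and the use of absolute values are assumptions.

```python
def maximum_spanning_tree(tau_matrix):
    """Prim's algorithm on |tau|: returns n - 1 edges as (left, right) index pairs."""
    n = len(tau_matrix)
    in_tree = {0}  # start from an arbitrary node
    edges = []
    while len(in_tree) < n:
        best = None
        # Pick the strongest link from a node in the tree to a node outside it.
        for i in in_tree:
            for j in range(n):
                if j in in_tree:
                    continue
                weight = abs(tau_matrix[i][j])
                if best is None or weight > best[0]:
                    best = (weight, i, j)
        _, left, right = best
        edges.append((left, right))
        in_tree.add(right)
    return edges


tau = [
    [1.0, 0.8, 0.1, 0.3],
    [0.8, 1.0, 0.6, 0.2],
    [0.1, 0.6, 1.0, -0.7],
    [0.3, 0.2, -0.7, 1.0],
]
first_tree_edges = maximum_spanning_tree(tau)
```

With four variables the result has three edges, consistent with the rule that a tree on \(n\) nodes has \(n - 1\) edges. Constrained vine types would restrict which of these couplings are allowed.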

Parameters:

tree_type (str or TreeTypes) – Type of trees to use.

Module contents

class climb.tool.impl.data_suite.third_party.copulas.multivariate.GaussianMultivariate(*args, **kwargs)[source]

Bases: Multivariate

Class for a multivariate distribution that uses the Gaussian copula.

Parameters:

distribution (str or dict) – Fully qualified name of the class to be used for modeling the marginal distributions or a dictionary mapping column names to the fully qualified distribution names.

columns = None

covariance = None

cumulative_distribution(X)[source]

Compute the cumulative distribution value for each point in X.

Parameters:

X (pandas.DataFrame) – Values for which the cumulative distribution will be computed.

Returns:

Cumulative distribution values for points in X.

Return type:

numpy.ndarray

Raises:

NotFittedError – if the model is not fitted.

fit(X, *args, **kwargs)

Fit the model to a table of values from multiple random variables.

Parameters:

X (pandas.DataFrame) – Values of the random variables.

classmethod from_dict(copula_dict)[source]

Create a new instance from a parameters dictionary.

Parameters:

copula_dict (dict) – Parameters of the distribution, in the same format as the one returned by the to_dict method.

Returns:

Instance of the distribution defined on the parameters.

Return type:

Multivariate

probability_density(X)[source]

Compute the probability density for each point in X.

Parameters:

X (pandas.DataFrame) – Values for which the probability density will be computed.

Returns:

Probability density values for points in X.

Return type:

numpy.ndarray

Raises:

NotFittedError – if the model is not fitted.

sample(*args, **kwargs)

Sample values from this model.

Parameters:

num_rows (int) – Number of rows to sample.

Returns:

Array of shape (num_rows, *) with values randomly sampled from this model distribution.

Return type:

numpy.ndarray

Raises:

NotFittedError – if the model is not fitted.

to_dict()[source]

Return a dict with the parameters to replicate this object.

Returns:

Parameters of this distribution.

Return type:

dict

univariates = None

class climb.tool.impl.data_suite.third_party.copulas.multivariate.Multivariate(random_seed=None)[source]

Bases: object

Abstract class for a multivariate copula object.

cdf(X)[source]

Compute the cumulative distribution value for each point in X.

Parameters:

X (pandas.DataFrame) – Values for which the cumulative distribution will be computed.

Returns:

Cumulative distribution values for points in X.

Return type:

numpy.ndarray

Raises:

NotFittedError – if the model is not fitted.

check_fit()[source]

Check whether this model has already been fit to a random variable.

Raise a NotFittedError if it has not.

Raises:

NotFittedError – if the model is not fitted.

cumulative_distribution(X)[source]

Compute the cumulative distribution value for each point in X.

Parameters:

X (pandas.DataFrame) – Values for which the cumulative distribution will be computed.

Returns:

Cumulative distribution values for points in X.

Return type:

numpy.ndarray

Raises:

NotFittedError – if the model is not fitted.

fit(X)[source]

Fit the model to a table of values from multiple random variables.

Parameters:

X (pandas.DataFrame) – Values of the random variables.

fitted = False

classmethod from_dict(params)[source]

Create a new instance from a parameters dictionary.

Parameters:

params (dict) – Parameters of the distribution, in the same format as the one returned by the to_dict method.

Returns:

Instance of the distribution defined on the parameters.

Return type:

Multivariate

classmethod load(path)[source]

Load a Multivariate instance from a pickle file.

Parameters:

path (str) – Path to the pickle file where the distribution has been serialized.

Returns:

Loaded instance.

Return type:

Multivariate

log_probability_density(X)[source]

Compute the log of the probability density for each point in X.

Parameters:

X (pandas.DataFrame) – Values for which the log probability density will be computed.

Returns:

Log probability density values for points in X.

Return type:

numpy.ndarray

Raises:

NotFittedError – if the model is not fitted.

pdf(X)[source]

Compute the probability density for each point in X.

Parameters:

X (pandas.DataFrame) – Values for which the probability density will be computed.

Returns:

Probability density values for points in X.

Return type:

numpy.ndarray

Raises:

NotFittedError – if the model is not fitted.

probability_density(X)[source]

Compute the probability density for each point in X.

Parameters:

X (pandas.DataFrame) – Values for which the probability density will be computed.

Returns:

Probability density values for points in X.

Return type:

numpy.ndarray

Raises:

NotFittedError – if the model is not fitted.

sample(num_rows=1)[source]

Sample values from this model.

Parameters:

num_rows (int) – Number of rows to sample.

Returns:

Array of shape (num_rows, *) with values randomly sampled from this model distribution.

Return type:

numpy.ndarray

Raises:

NotFittedError – if the model is not fitted.

save(path)[source]

Serialize this multivariate instance using pickle.

Parameters:

path (str) – Path to where this distribution will be serialized.

to_dict()[source]

Return a dict with the parameters to replicate this object.

Returns:

Parameters of this distribution.

Return type:

dict

class climb.tool.impl.data_suite.third_party.copulas.multivariate.Tree(random_seed=None)[source]

Bases: Multivariate

Helper class to instantiate a single tree in the vine model.

fit(index, n_nodes, tau_matrix, previous_tree, edges=None)[source]

Fit this tree object.

Parameters:
  • index (int) – index of the tree.

  • n_nodes (int) – number of nodes in the tree.

  • tau_matrix (numpy.array) – Kendall’s tau matrix of the data, shape (n_nodes, n_nodes).

  • previous_tree (Tree) – tree object of previous level.

fitted = False

classmethod from_dict(tree_dict, previous=None)[source]

Create a new instance from a parameters dictionary.

Parameters:

tree_dict (dict) – Parameters of the Tree, in the same format as the one returned by the to_dict method.

Returns:

Instance of the tree defined on the parameters.

Return type:

Tree

get_adjacent_matrix()[source]

Get adjacency matrix.

Returns:

adjacency matrix

Return type:

numpy.ndarray

get_likelihood(uni_matrix)[source]

Compute likelihood of the tree given a U matrix.

Parameters:

uni_matrix (numpy.array) – univariate matrix to evaluate likelihood on.

Returns:

likelihood of the current tree, next level conditional univariate matrix

Return type:

tuple[float, numpy.array]

get_tau_matrix()[source]

Get tau matrix for adjacent pairs.

Returns:

tau matrix for the current tree

Return type:

numpy.ndarray

prepare_next_tree()[source]

Prepare conditional U matrix for next tree.

to_dict()[source]

Return a dict with the parameters to replicate this Tree.

Returns:

Parameters of this Tree.

Return type:

dict

tree_type = None

class climb.tool.impl.data_suite.third_party.copulas.multivariate.TreeTypes(value)[source]

Bases: Enum

Enumeration of the supported tree types.

CENTER = 0

DIRECT = 1

REGULAR = 2

class climb.tool.impl.data_suite.third_party.copulas.multivariate.VineCopula(*args, **kwargs)[source]

Bases: Multivariate

Vine copula model.

A vine is a graphical representation of one factorization of the \(n\)-variate probability distribution in terms of \(n(n - 1)/2\) bivariate copulas by means of the chain rule.

It consists of a sequence of levels, with as many levels as variables. Each level is a tree (no isolated nodes and no loops), so a level with \(n\) nodes has exactly \(n - 1\) edges.

Each node in tree \(T_1\) is a variable, and each edge is a coupling of two variables constructed with a bivariate copula.

Each node in tree \(T_{k+1}\) is a coupling in \(T_{k}\), expressed by the copula of the variables, while each edge is a coupling between two vertices that must have one variable in common, which becomes a conditioning variable in the bivariate copula. Thus, every level has one node fewer than the previous one. Once all the trees are drawn, the factorization is the product of all the nodes.

Parameters:
  • vine_type (str) – Type of the vine copula: one of ‘center’, ‘direct’ or ‘regular’.

  • random_seed (int) – Random seed to use.

model

Distribution to compute univariates.

Type:

copulas.univariate.Univariate

u_matrix

Univariates.

Type:

numpy.array

n_sample

Number of samples.

Type:

int

n_var

Number of variables.

Type:

int

columns

Names of the variables.

Type:

pandas.Series

tau_mat

Kendall’s tau correlation matrix of the data.

Type:

numpy.array

truncated

Max level used to build the vine.

Type:

int

depth

Vine depth.

Type:

int

trees

List of trees used by this vine.

Type:

list[Tree]

ppfs

Percent point functions (inverse CDFs) from the univariates used by this vine.

Type:

list[callable]

fit(X, *args, **kwargs)

Fit the model to a table of values from multiple random variables.

Parameters:

X (pandas.DataFrame) – Values of the random variables.

classmethod from_dict(vine_dict)[source]

Create a new instance from a parameters dictionary.

Parameters:

vine_dict (dict) – Parameters of the Vine, in the same format as the one returned by the to_dict method.

Returns:

Instance of the Vine defined on the parameters.

Return type:

VineCopula

get_likelihood(uni_matrix)[source]

Compute likelihood of the vine.

sample(*args, **kwargs)

Sample values from this model.

Parameters:

num_rows (int) – Number of rows to sample.

Returns:

Array of shape (num_rows, *) with values randomly sampled from this model distribution.

Return type:

numpy.ndarray

Raises:

NotFittedError – if the model is not fitted.

to_dict()[source]

Return a dict with the parameters to replicate this Vine.

Returns:

Parameters of this Vine.

Return type:

dict

train_vine(tree_type)[source]

Build the vine.

  1. For the construction of the first tree \(T_1\), assign one node to each variable and then couple them by maximizing the measure of association considered. Different vines impose different constraints on this construction, so applying them yields different trees at this level.

  2. Select the copula that best fits the pair of variables coupled by each edge in \(T_1\).

  3. Let \(C_{ij}(u_i, u_j)\) be the copula for a given edge \((u_i, u_j)\) in \(T_1\). Then for every edge in \(T_1\), compute either

    \[{v^1}_{j|i} = \frac{\partial C_{ij}(u_i, u_j)}{\partial u_j}\]

    or similarly \({v^1}_{i|j}\), which are conditional CDFs. When finished with all the edges, construct the new matrix \(v^1\), which has one column fewer than \(u\).

  4. Set \(k = 2\).

  5. Assign one node of \(T_k\) to each edge of \(T_{k-1}\). The structure of \(T_{k-1}\) imposes a set of constraints on which edges of \(T_k\) are realizable. Hence the next step is to get a linked list of the accessible nodes for every node in \(T_k\).

  6. As in step 1, nodes of \(T_k\) are coupled by maximizing the measure of association considered while satisfying the constraints imposed by the kind of vine employed plus the set of constraints imposed by tree \(T_{k-1}\).

  7. Select the copula that best fits each edge created in \(T_k\).

  8. Recompute matrix \(v^k\) as in step 3, but taking \(T_k\) and \(v^{k-1}\) instead of \(T_1\) and \(u\).

  9. Set \(k = k + 1\) and repeat from step 5 until all the trees are constructed.

Parameters:

tree_type (str or TreeTypes) – Type of trees to use.