climb.tool.impl.data_suite.third_party.copulas.univariate package¶

Submodules¶

climb.tool.impl.data_suite.third_party.copulas.univariate.base module¶

class climb.tool.impl.data_suite.third_party.copulas.univariate.base.BoundedType(value)[source]¶

Bases: Enum

An enumeration.

BOUNDED = 2¶

SEMI_BOUNDED = 1¶

UNBOUNDED = 0¶

class climb.tool.impl.data_suite.third_party.copulas.univariate.base.ParametricType(value)[source]¶

Bases: Enum

An enumeration.

NON_PARAMETRIC = 0¶

PARAMETRIC = 1¶
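The integer values shown for BoundedType and ParametricType can be mirrored with the standard-library enum module. The following is a minimal sketch: member names and values are copied from the listing above, while the describe helper is hypothetical and exists only to show how the integers printed elsewhere in this reference map back to member names.

```python
from enum import Enum


class BoundedType(Enum):
    """Mirror of the documented enumeration (values copied from the listing)."""
    UNBOUNDED = 0
    SEMI_BOUNDED = 1
    BOUNDED = 2


class ParametricType(Enum):
    """Mirror of the documented enumeration (values copied from the listing)."""
    NON_PARAMETRIC = 0
    PARAMETRIC = 1


def describe(bounded_value, parametric_value):
    """Hypothetical helper: turn the integers shown in the docs into names."""
    return BoundedType(bounded_value).name, ParametricType(parametric_value).name


# A class listed with BOUNDED = 2 and PARAMETRIC = 1 is bounded and parametric.
print(describe(2, 1))
```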

class climb.tool.impl.data_suite.third_party.copulas.univariate.base.ScipyModel[source]¶

Bases: Univariate, ABC

Wrapper for scipy models.

This class makes the probability_density, cumulative_distribution, percent_point and sample methods delegate to the underlying pdf, cdf, ppf and rvs methods, respectively.

fit, _get_params and _set_params must be implemented by the subclasses.
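The delegation described above might look roughly like the sketch below. It is not the library's implementation: statistics.NormalDist stands in for a scipy model, the class name DelegatingModel and the helper _check_fit are hypothetical, and the methods take plain floats rather than arrays of shape (n, 1).

```python
from statistics import NormalDist, fmean, pstdev


class NotFittedError(Exception):
    """Stand-in for the exception named in this reference."""


class DelegatingModel:
    """Hypothetical wrapper that forwards the long method names to a backend."""

    def __init__(self):
        self._model = None  # the backend object is created by fit()

    def fit(self, values):
        # Assumption: a normal backend estimated from the sample mean and std.
        self._model = NormalDist(fmean(values), pstdev(values))

    def _check_fit(self):
        if self._model is None:
            raise NotFittedError("call fit() first")

    def probability_density(self, x):
        self._check_fit()
        return self._model.pdf(x)

    def cumulative_distribution(self, x):
        self._check_fit()
        return self._model.cdf(x)

    def percent_point(self, u):
        self._check_fit()
        return self._model.inv_cdf(u)

    def sample(self, n_samples=1, seed=None):
        self._check_fit()
        return self._model.samples(n_samples, seed=seed)


model = DelegatingModel()
model.fit([1.0, 2.0, 3.0, 4.0, 5.0])
print(model.cumulative_distribution(3.0))  # CDF at the fitted mean
```

Keeping the public names on the wrapper and the short names on the backend means a subclass only has to decide how the backend is built and parameterized.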

MODEL_CLASS = None¶

cumulative_distribution(X)[source]¶

Compute the cumulative distribution value for each point in X.

Parameters:: X (numpy.ndarray) – Values for which the cumulative distribution will be computed. It must have shape (n, 1).
Returns:: Cumulative distribution values for points in X.
Return type:: numpy.ndarray
Raises:: NotFittedError – if the model is not fitted.

fit(X)[source]¶

Fit the model to a random variable.

Parameters:: X (numpy.ndarray) – Values of the random variable. It must have shape (n, 1).

log_probability_density(X)[source]¶

Compute the log of the probability density for each point in X.

Parameters:: X (numpy.ndarray) – Values for which the log probability density will be computed. It must have shape (n, 1).
Returns:: Log probability density values for points in X.
Return type:: numpy.ndarray
Raises:: NotFittedError – if the model is not fitted.

percent_point(U)[source]¶

Compute the inverse cumulative distribution value for each point in U.

Parameters:: U (numpy.ndarray) – Values for which the inverse cumulative distribution will be computed. It must have shape (n, 1) and values must be in [0,1].
Returns:: Inverse cumulative distribution values for points in U.
Return type:: numpy.ndarray
Raises:: NotFittedError – if the model is not fitted.

probability_density(X)[source]¶

Compute the probability density for each point in X.

Parameters:: X (numpy.ndarray) – Values for which the probability density will be computed. It must have shape (n, 1).
Returns:: Probability density values for points in X.
Return type:: numpy.ndarray
Raises:: NotFittedError – if the model is not fitted.

sample(n_samples=1)[source]¶

Sample values from this model.

Parameters:: n_samples (int) – Number of values to sample.

Returns:: Array of shape (n_samples, 1) with values randomly sampled from this model distribution.
Return type:: numpy.ndarray
Raises:: NotFittedError – if the model is not fitted.

class climb.tool.impl.data_suite.third_party.copulas.univariate.base.Univariate(*args, **kwargs)[source]¶

Bases: object

Univariate Distribution.

Parameters:

candidates (list[str or type or Univariate]) – List of candidates to select the best univariate from. It can be a list of strings representing Univariate FQNs, or a list of Univariate subclasses or a list of instances.
parametric (ParametricType) – If not None, only select subclasses of this type. Ignored if candidates is passed.
bounded (BoundedType) – If not None, only select subclasses of this type. Ignored if candidates is passed.
random_seed (int) – Random seed to use.
selection_sample_size (int) – Size of the subsample to use for candidate selection. If None, all the data is used.
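One way to read the interaction between candidates, parametric and bounded is sketched below. The catalog entries are taken from the BOUNDED and PARAMETRIC attributes listed for each class in this reference; the function filter_candidates and the use of class names as strings are hypothetical simplifications, not the library's own code.

```python
from enum import Enum


class BoundedType(Enum):
    UNBOUNDED = 0
    SEMI_BOUNDED = 1
    BOUNDED = 2


class ParametricType(Enum):
    NON_PARAMETRIC = 0
    PARAMETRIC = 1


B, P = BoundedType, ParametricType

# (bounded, parametric) pairs as listed for each class in this reference.
CATALOG = {
    "BetaUnivariate": (B.BOUNDED, P.PARAMETRIC),
    "GammaUnivariate": (B.SEMI_BOUNDED, P.PARAMETRIC),
    "GaussianUnivariate": (B.UNBOUNDED, P.PARAMETRIC),
    "GaussianKDE": (B.UNBOUNDED, P.NON_PARAMETRIC),
    "LogLaplace": (B.SEMI_BOUNDED, P.PARAMETRIC),
    "StudentTUnivariate": (B.UNBOUNDED, P.PARAMETRIC),
    "TruncatedGaussian": (B.BOUNDED, P.PARAMETRIC),
    "UniformUnivariate": (B.BOUNDED, P.PARAMETRIC),
}


def filter_candidates(candidates=None, parametric=None, bounded=None):
    """Hypothetical filter: explicit candidates win, otherwise filter by type."""
    if candidates is not None:
        return list(candidates)  # parametric and bounded are ignored
    return [
        name
        for name, (is_bounded, is_parametric) in CATALOG.items()
        if (bounded is None or is_bounded is bounded)
        and (parametric is None or is_parametric is parametric)
    ]


print(filter_candidates(bounded=BoundedType.BOUNDED))
```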

BOUNDED = 0¶

PARAMETRIC = 0¶

cdf(X)[source]¶

Compute the cumulative distribution value for each point in X.

Parameters:: X (numpy.ndarray) – Values for which the cumulative distribution will be computed. It must have shape (n, 1).
Returns:: Cumulative distribution values for points in X.
Return type:: numpy.ndarray

check_fit()[source]¶

Check whether this model has already been fit to a random variable.

Raise a NotFittedError if it has not.

Raises:: NotFittedError – if the model is not fitted.
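The guard pattern behind check_fit can be illustrated with a small stand-in class. TinyModel, its mean method and the local NotFittedError are hypothetical; only the fitted flag and the check_fit name come from this reference.

```python
class NotFittedError(Exception):
    """Stand-in for the exception named above."""


class TinyModel:
    """Hypothetical model that refuses to answer before fit() is called."""

    fitted = False  # class-level default, flipped on the instance by fit()

    def fit(self, values):
        self._mean = sum(values) / len(values)
        self.fitted = True

    def check_fit(self):
        if not self.fitted:
            raise NotFittedError("This model is not fitted.")

    def mean(self):
        self.check_fit()  # every query method starts with the same guard
        return self._mean


model = TinyModel()
try:
    model.mean()
except NotFittedError as error:
    print("refused:", error)

model.fit([1.0, 2.0, 3.0])
print(model.mean())
```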

cumulative_distribution(X)[source]¶

Compute the cumulative distribution value for each point in X.

Parameters:: X (numpy.ndarray) – Values for which the cumulative distribution will be computed. It must have shape (n, 1).
Returns:: Cumulative distribution values for points in X.
Return type:: numpy.ndarray
Raises:: NotFittedError – if the model is not fitted.

fit(X)[source]¶

Fit the model to a random variable.

Parameters:: X (numpy.ndarray) – Values of the random variable. It must have shape (n, 1).

fitted = False¶

classmethod from_dict(params)[source]¶

Build a distribution from its params dict.

Parameters:: params (dict) – Dictionary containing the FQN of the distribution and the parameters necessary to rebuild it. The input format is exactly the one produced by the distribution's to_dict method.
Returns:: Distribution instance.
Return type:: Univariate
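Rebuilding an object from a dictionary that carries its fully qualified name (FQN) can be sketched with importlib. The key name "type" and the parameter keys are assumptions made for this illustration, and statistics.NormalDist stands in for a distribution class; the real dictionary layout is whatever to_dict produces.

```python
import importlib
from statistics import NormalDist


def to_dict(dist):
    """Hypothetical serializer: store the class FQN next to its parameters."""
    cls = type(dist)
    return {
        "type": f"{cls.__module__}.{cls.__qualname__}",  # assumed key name
        "mu": dist.mean,
        "sigma": dist.stdev,
    }


def from_dict(params):
    """Resolve the FQN to a class and call it with the remaining parameters."""
    params = dict(params)  # copy so the caller's dict is left untouched
    module_name, _, class_name = params.pop("type").rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**params)


payload = to_dict(NormalDist(1.5, 2.0))
print(payload["type"])
print(from_dict(payload))
```

Storing the FQN lets a single classmethod on the base class rebuild any subclass without a hand-maintained registry.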

classmethod load(path)[source]¶

Load a Univariate instance from a pickle file.

Parameters:: path (str) – Path to the pickle file where the distribution has been serialized.
Returns:: Loaded instance.
Return type:: Univariate

log_probability_density(X)[source]¶

Compute the log of the probability density for each point in X.

It should be overridden with numerically stable variants whenever possible.

Parameters:: X (numpy.ndarray) – Values for which the log probability density will be computed. It must have shape (n, 1).
Returns:: Log probability density values for points in X.
Return type:: numpy.ndarray
Raises:: NotFittedError – if the model is not fitted.
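The advice to override this method with a numerically stable variant can be motivated with a standard normal density. In the sketch below, which uses only the math module and hypothetical function names, taking the log of an already computed density underflows in the far tail, while the analytic log-density stays finite.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def naive_log_pdf(x, mu=0.0, sigma=1.0):
    """log(pdf(x)): the density underflows to 0.0 far from the mean."""
    density = normal_pdf(x, mu, sigma)
    return math.log(density) if density > 0.0 else float("-inf")


def stable_log_pdf(x, mu=0.0, sigma=1.0):
    """Analytic log-density: no exponential is ever evaluated."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


for x in (1.0, 40.0):
    print(x, naive_log_pdf(x), stable_log_pdf(x))
```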

pdf(X)[source]¶

Compute the probability density for each point in X.

Parameters:: X (numpy.ndarray) – Values for which the probability density will be computed. It must have shape (n, 1).
Returns:: Probability density values for points in X.
Return type:: numpy.ndarray

percent_point(U)[source]¶

Compute the inverse cumulative distribution value for each point in U.

Parameters:: U (numpy.ndarray) – Values for which the inverse cumulative distribution will be computed. It must have shape (n, 1) and values must be in [0,1].
Returns:: Inverse cumulative distribution values for points in U.
Return type:: numpy.ndarray
Raises:: NotFittedError – if the model is not fitted.
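The relationship between percent_point and cumulative_distribution is that one inverts the other. A short check of that property, using statistics.NormalDist as a stand-in for a fitted model (the helper round_trip_error is hypothetical):

```python
from statistics import NormalDist


def round_trip_error(dist, u):
    """Distance between u and cdf(inverse_cdf(u)) for a stand-in distribution."""
    x = dist.inv_cdf(u)  # plays the role of percent_point
    return abs(dist.cdf(x) - u)  # plays the role of cumulative_distribution


dist = NormalDist(mu=10.0, sigma=2.0)  # stand-in for a fitted model
for u in (0.05, 0.5, 0.95):
    print(u, dist.inv_cdf(u), round_trip_error(dist, u))
```

For an unbounded distribution the endpoints 0 and 1 map to infinite values, so this stand-in only accepts probabilities strictly between them.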

ppf(U)[source]¶

Compute the inverse cumulative distribution value for each point in U.

Parameters:: U (numpy.ndarray) – Values for which the inverse cumulative distribution will be computed. It must have shape (n, 1) and values must be in [0,1].
Returns:: Inverse cumulative distribution values for points in U.
Return type:: numpy.ndarray

probability_density(X)[source]¶

Compute the probability density for each point in X.

Parameters:: X (numpy.ndarray) – Values for which the probability density will be computed. It must have shape (n, 1).
Returns:: Probability density values for points in X.
Return type:: numpy.ndarray
Raises:: NotFittedError – if the model is not fitted.

sample(n_samples=1)[source]¶

Sample values from this model.

Parameters:: n_samples (int) – Number of values to sample.

Returns:: Array of shape (n_samples, 1) with values randomly sampled from this model distribution.
Return type:: numpy.ndarray
Raises:: NotFittedError – if the model is not fitted.

save(path)[source]¶

Serialize this univariate instance using pickle.

Parameters:: path (str) – Path to where this distribution will be serialized.
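A pickle round trip of the kind that save and load describe can be sketched as follows. The module-level save and load functions here are hypothetical equivalents of the documented methods, and statistics.NormalDist stands in for a distribution instance.

```python
import os
import pickle
import tempfile
from statistics import NormalDist


def save(obj, path):
    """Serialize obj to path with pickle."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)


def load(path):
    """Read back an object written by save()."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dist.pkl")
    original = NormalDist(0.0, 1.0)
    save(original, path)
    restored = load(path)

print(restored == original)
```

Unpickling executes code chosen by whoever wrote the file, so pickle files should only be loaded from trusted sources.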

to_dict()[source]¶

Return the parameters of this model in a dict.

Returns:: Dictionary containing the distribution type and all the parameters that define the distribution.
Return type:: dict
Raises:: NotFittedError – if the model is not fitted.

climb.tool.impl.data_suite.third_party.copulas.univariate.beta module¶

class climb.tool.impl.data_suite.third_party.copulas.univariate.beta.BetaUnivariate[source]¶

Bases: ScipyModel

Wrapper around scipy.stats.beta.

Documentation: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.beta.html

BOUNDED = 2¶

MODEL_CLASS = <scipy.stats._continuous_distns.beta_gen object>¶

PARAMETRIC = 1¶

climb.tool.impl.data_suite.third_party.copulas.univariate.gamma module¶

class climb.tool.impl.data_suite.third_party.copulas.univariate.gamma.GammaUnivariate[source]¶

Bases: ScipyModel

Wrapper around scipy.stats.gamma.

Documentation: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gamma.html

BOUNDED = 1¶

MODEL_CLASS = <scipy.stats._continuous_distns.gamma_gen object>¶

PARAMETRIC = 1¶

climb.tool.impl.data_suite.third_party.copulas.univariate.gaussian module¶

class climb.tool.impl.data_suite.third_party.copulas.univariate.gaussian.GaussianUnivariate[source]¶

Bases: ScipyModel

Gaussian univariate model.

BOUNDED = 0¶

MODEL_CLASS = <scipy.stats._continuous_distns.norm_gen object>¶

PARAMETRIC = 1¶

climb.tool.impl.data_suite.third_party.copulas.univariate.gaussian_kde module¶

class climb.tool.impl.data_suite.third_party.copulas.univariate.gaussian_kde.GaussianKDE(*args, **kwargs)[source]¶

Bases: ScipyModel

A wrapper for the Gaussian kernel density estimation implemented in scipy.stats. scipy's gaussian_kde is slower than the statsmodels implementation but allows more flexibility.

When sample_size is provided, the fit method samples the data, which masks the real observations, and ensures that the number of entries always equals sample_size.

Parameters:: sample_size (int) – Number of entries to sample.
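As background, a Gaussian kernel density estimate is the average of normal densities centred on the observations, and its CDF is the average of the corresponding normal CDFs. The sketch below shows this with the math module only; the function names and the fixed bandwidth are assumptions for illustration (scipy's gaussian_kde selects a bandwidth by Scott's rule unless told otherwise), and nothing here reproduces the wrapper's internals.

```python
import math


def kde_pdf(x, data, bandwidth):
    """Average of Gaussian kernels centred on each observation."""
    scale = len(data) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data) / scale


def kde_cdf(x, data, bandwidth):
    """Average of the normal CDFs of the same kernels."""
    root_two = math.sqrt(2.0)
    total = sum(0.5 * (1.0 + math.erf((x - xi) / (bandwidth * root_two))) for xi in data)
    return total / len(data)


data = [1.0, 2.0, 2.5, 4.0]
bandwidth = 0.5  # hypothetical fixed value chosen for the example

for x in (0.0, 2.0, 5.0):
    print(x, kde_pdf(x, data, bandwidth), kde_cdf(x, data, bandwidth))
```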

BOUNDED = 0¶

MODEL_CLASS¶: alias of gaussian_kde

PARAMETRIC = 0¶

cumulative_distribution(X)[source]¶

Compute the cumulative distribution value for each point in X.

Parameters:: X (numpy.ndarray) – Values for which the cumulative distribution will be computed. It must have shape (n, 1).
Returns:: Cumulative distribution values for points in X.
Return type:: numpy.ndarray
Raises:: NotFittedError – if the model is not fitted.

percent_point(U, method='chandrupatla')[source]¶

Compute the inverse cumulative distribution value for each point in U.

Parameters:

U (numpy.ndarray) – Values for which the inverse cumulative distribution will be computed. It must have shape (n, 1) and values must be in [0,1].
method (str) – Whether to use the chandrupatla or bisect solver.

Returns:

Inverse cumulative distribution values for points in U.

Return type:

numpy.ndarray

Raises:

NotFittedError – if the model is not fitted.
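A kernel density estimate has no closed-form inverse CDF, which is why a root-finding solver is needed here. The bisection idea can be sketched as below; this is a generic illustration with hypothetical names and a standard normal CDF as the target, not the solver code used by this class.

```python
import math


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bisect_percent_point(cdf, u, lower, upper, tol=1e-10, max_iter=200):
    """Find x in [lower, upper] with cdf(x) close to u.

    Assumes cdf is non-decreasing and that u is bracketed by the interval.
    """
    if not cdf(lower) <= u <= cdf(upper):
        raise ValueError("u is not bracketed by [lower, upper]")
    for _ in range(max_iter):
        mid = 0.5 * (lower + upper)
        if cdf(mid) < u:
            lower = mid  # the root lies to the right of mid
        else:
            upper = mid  # the root lies at or to the left of mid
        if upper - lower < tol:
            break
    return 0.5 * (lower + upper)


x = bisect_percent_point(normal_cdf, 0.975, -10.0, 10.0)
print(x)
```

Bisection halves the bracket on every step, so it is slow but cannot diverge as long as the initial bracket is valid.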

probability_density(X)[source]¶

Compute the probability density for each point in X.

Parameters:: X (numpy.ndarray) – Values for which the probability density will be computed. It must have shape (n, 1).
Returns:: Probability density values for points in X.
Return type:: numpy.ndarray
Raises:: NotFittedError – if the model is not fitted.

sample(n_samples=1)[source]¶

Sample values from this model.

Parameters:: n_samples (int) – Number of values to sample.

Returns:: Array of shape (n_samples, 1) with values randomly sampled from this model distribution.
Return type:: numpy.ndarray
Raises:: NotFittedError – if the model is not fitted.

climb.tool.impl.data_suite.third_party.copulas.univariate.log_laplace module¶

class climb.tool.impl.data_suite.third_party.copulas.univariate.log_laplace.LogLaplace[source]¶

Bases: ScipyModel

Wrapper around scipy.stats.loglaplace.

Documentation: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.loglaplace.html

BOUNDED = 1¶

MODEL_CLASS = <scipy.stats._continuous_distns.loglaplace_gen object>¶

PARAMETRIC = 1¶

climb.tool.impl.data_suite.third_party.copulas.univariate.selection module¶

climb.tool.impl.data_suite.third_party.copulas.univariate.selection.select_univariate(X, candidates)[source]¶

Select the best univariate class for this data.

Parameters:

X (pandas.DataFrame) – Data for which the best univariate must be found.
candidates (list[Univariate]) – List of Univariate subclasses (or instances of those) to choose from.

Returns:

Instance of the selected candidate.

Return type:

Univariate
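This reference does not state how "best" is measured. One plausible criterion, assumed here purely for illustration, is the Kolmogorov-Smirnov distance between the empirical CDF of the data and each fitted candidate, with the smallest distance winning. All function names below are hypothetical, and the two candidates are simple standard-library stand-ins.

```python
from statistics import NormalDist, fmean, pstdev


def ks_statistic(data, cdf):
    """Largest gap between the empirical CDF of data and a candidate cdf."""
    ordered = sorted(data)
    n = len(ordered)
    gap = 0.0
    for i, x in enumerate(ordered):
        value = cdf(x)
        # Compare against the empirical CDF just after and just before x.
        gap = max(gap, (i + 1) / n - value, value - i / n)
    return gap


def fit_normal(data):
    """Return the CDF of a normal fitted by mean and standard deviation."""
    return NormalDist(fmean(data), pstdev(data)).cdf


def fit_uniform(data):
    """Return the CDF of a uniform fitted to the observed range."""
    low, high = min(data), max(data)
    return lambda x: min(1.0, max(0.0, (x - low) / (high - low)))


def select_best(data, candidates):
    """Fit every candidate and keep the one with the smallest KS distance."""
    scores = {name: ks_statistic(data, fit(data)) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores


data = [i / 20 for i in range(21)]  # evenly spaced points on [0, 1]
best, scores = select_best(data, {"normal": fit_normal, "uniform": fit_uniform})
print(best, scores)
```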

climb.tool.impl.data_suite.third_party.copulas.univariate.student_t module¶

class climb.tool.impl.data_suite.third_party.copulas.univariate.student_t.StudentTUnivariate[source]¶

Bases: ScipyModel

Wrapper around scipy.stats.t.

Documentation: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.t.html

BOUNDED = 0¶

MODEL_CLASS = <scipy.stats._continuous_distns.t_gen object>¶

PARAMETRIC = 1¶

climb.tool.impl.data_suite.third_party.copulas.univariate.truncated_gaussian module¶

class climb.tool.impl.data_suite.third_party.copulas.univariate.truncated_gaussian.TruncatedGaussian(*args, **kwargs)[source]¶

Bases: ScipyModel

Wrapper around scipy.stats.truncnorm.

Documentation: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.truncnorm.html

BOUNDED = 2¶

MODEL_CLASS = <scipy.stats._continuous_distns.truncnorm_gen object>¶

PARAMETRIC = 1¶

climb.tool.impl.data_suite.third_party.copulas.univariate.uniform module¶

class climb.tool.impl.data_suite.third_party.copulas.univariate.uniform.UniformUnivariate[source]¶

Bases: ScipyModel

Uniform univariate model.

BOUNDED = 2¶

MODEL_CLASS = <scipy.stats._continuous_distns.uniform_gen object>¶

PARAMETRIC = 1¶

Module contents¶

class climb.tool.impl.data_suite.third_party.copulas.univariate.BetaUnivariate[source]¶

Bases: ScipyModel

Wrapper around scipy.stats.beta.

Documentation: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.beta.html

BOUNDED = 2¶

MODEL_CLASS = <scipy.stats._continuous_distns.beta_gen object>¶

PARAMETRIC = 1¶

class climb.tool.impl.data_suite.third_party.copulas.univariate.BoundedType(value)[source]¶

Bases: Enum

An enumeration.

BOUNDED = 2¶

SEMI_BOUNDED = 1¶

UNBOUNDED = 0¶

class climb.tool.impl.data_suite.third_party.copulas.univariate.GammaUnivariate[source]¶

Bases: ScipyModel

Wrapper around scipy.stats.gamma.

Documentation: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gamma.html

BOUNDED = 1¶

MODEL_CLASS = <scipy.stats._continuous_distns.gamma_gen object>¶

PARAMETRIC = 1¶

class climb.tool.impl.data_suite.third_party.copulas.univariate.GaussianKDE(*args, **kwargs)[source]¶

Bases: ScipyModel

A wrapper for the Gaussian kernel density estimation implemented in scipy.stats. scipy's gaussian_kde is slower than the statsmodels implementation but allows more flexibility.

When sample_size is provided, the fit method samples the data, which masks the real observations, and ensures that the number of entries always equals sample_size.

Parameters:: sample_size (int) – Number of entries to sample.

BOUNDED = 0¶

MODEL_CLASS¶: alias of gaussian_kde

PARAMETRIC = 0¶

cumulative_distribution(X)[source]¶

Compute the cumulative distribution value for each point in X.

Parameters:: X (numpy.ndarray) – Values for which the cumulative distribution will be computed. It must have shape (n, 1).
Returns:: Cumulative distribution values for points in X.
Return type:: numpy.ndarray
Raises:: NotFittedError – if the model is not fitted.

percent_point(U, method='chandrupatla')[source]¶

Compute the inverse cumulative distribution value for each point in U.

Parameters:

U (numpy.ndarray) – Values for which the inverse cumulative distribution will be computed. It must have shape (n, 1) and values must be in [0,1].
method (str) – Whether to use the chandrupatla or bisect solver.

Returns:

Inverse cumulative distribution values for points in U.

Return type:

numpy.ndarray

Raises:

NotFittedError – if the model is not fitted.

probability_density(X)[source]¶

Compute the probability density for each point in X.

Parameters:: X (numpy.ndarray) – Values for which the probability density will be computed. It must have shape (n, 1).
Returns:: Probability density values for points in X.
Return type:: numpy.ndarray
Raises:: NotFittedError – if the model is not fitted.

sample(n_samples=1)[source]¶

Sample values from this model.

Parameters:: n_samples (int) – Number of values to sample.

Returns:: Array of shape (n_samples, 1) with values randomly sampled from this model distribution.
Return type:: numpy.ndarray
Raises:: NotFittedError – if the model is not fitted.

class climb.tool.impl.data_suite.third_party.copulas.univariate.GaussianUnivariate[source]¶

Bases: ScipyModel

Gaussian univariate model.

BOUNDED = 0¶

MODEL_CLASS = <scipy.stats._continuous_distns.norm_gen object>¶

PARAMETRIC = 1¶

class climb.tool.impl.data_suite.third_party.copulas.univariate.LogLaplace[source]¶

Bases: ScipyModel

Wrapper around scipy.stats.loglaplace.

Documentation: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.loglaplace.html

BOUNDED = 1¶

MODEL_CLASS = <scipy.stats._continuous_distns.loglaplace_gen object>¶

PARAMETRIC = 1¶

class climb.tool.impl.data_suite.third_party.copulas.univariate.ParametricType(value)[source]¶

Bases: Enum

An enumeration.

NON_PARAMETRIC = 0¶

PARAMETRIC = 1¶

class climb.tool.impl.data_suite.third_party.copulas.univariate.StudentTUnivariate[source]¶

Bases: ScipyModel

Wrapper around scipy.stats.t.

Documentation: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.t.html

BOUNDED = 0¶

MODEL_CLASS = <scipy.stats._continuous_distns.t_gen object>¶

PARAMETRIC = 1¶

class climb.tool.impl.data_suite.third_party.copulas.univariate.TruncatedGaussian(*args, **kwargs)[source]¶

Bases: ScipyModel

Wrapper around scipy.stats.truncnorm.

Documentation: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.truncnorm.html

BOUNDED = 2¶

MODEL_CLASS = <scipy.stats._continuous_distns.truncnorm_gen object>¶

PARAMETRIC = 1¶

class climb.tool.impl.data_suite.third_party.copulas.univariate.UniformUnivariate[source]¶

Bases: ScipyModel

Uniform univariate model.

BOUNDED = 2¶

MODEL_CLASS = <scipy.stats._continuous_distns.uniform_gen object>¶

PARAMETRIC = 1¶

class climb.tool.impl.data_suite.third_party.copulas.univariate.Univariate(*args, **kwargs)[source]¶

Bases: object

Univariate Distribution.

Parameters:

candidates (list[str or type or Univariate]) – List of candidates to select the best univariate from. It can be a list of strings representing Univariate FQNs, or a list of Univariate subclasses or a list of instances.
parametric (ParametricType) – If not None, only select subclasses of this type. Ignored if candidates is passed.
bounded (BoundedType) – If not None, only select subclasses of this type. Ignored if candidates is passed.
random_seed (int) – Random seed to use.
selection_sample_size (int) – Size of the subsample to use for candidate selection. If None, all the data is used.

BOUNDED = 0¶

PARAMETRIC = 0¶

cdf(X)[source]¶

Compute the cumulative distribution value for each point in X.

Parameters:: X (numpy.ndarray) – Values for which the cumulative distribution will be computed. It must have shape (n, 1).
Returns:: Cumulative distribution values for points in X.
Return type:: numpy.ndarray

check_fit()[source]¶

Check whether this model has already been fit to a random variable.

Raise a NotFittedError if it has not.

Raises:: NotFittedError – if the model is not fitted.

cumulative_distribution(X)[source]¶

Compute the cumulative distribution value for each point in X.

Parameters:: X (numpy.ndarray) – Values for which the cumulative distribution will be computed. It must have shape (n, 1).
Returns:: Cumulative distribution values for points in X.
Return type:: numpy.ndarray
Raises:: NotFittedError – if the model is not fitted.

fit(X)[source]¶

Fit the model to a random variable.

Parameters:: X (numpy.ndarray) – Values of the random variable. It must have shape (n, 1).

fitted = False¶

classmethod from_dict(params)[source]¶

Build a distribution from its params dict.

Parameters:: params (dict) – Dictionary containing the FQN of the distribution and the parameters necessary to rebuild it. The input format is exactly the one produced by the distribution's to_dict method.
Returns:: Distribution instance.
Return type:: Univariate

classmethod load(path)[source]¶

Load a Univariate instance from a pickle file.

Parameters:: path (str) – Path to the pickle file where the distribution has been serialized.
Returns:: Loaded instance.
Return type:: Univariate

log_probability_density(X)[source]¶

Compute the log of the probability density for each point in X.

It should be overridden with numerically stable variants whenever possible.

Parameters:: X (numpy.ndarray) – Values for which the log probability density will be computed. It must have shape (n, 1).
Returns:: Log probability density values for points in X.
Return type:: numpy.ndarray
Raises:: NotFittedError – if the model is not fitted.

pdf(X)[source]¶

Compute the probability density for each point in X.

Parameters:: X (numpy.ndarray) – Values for which the probability density will be computed. It must have shape (n, 1).
Returns:: Probability density values for points in X.
Return type:: numpy.ndarray

percent_point(U)[source]¶

Compute the inverse cumulative distribution value for each point in U.

Parameters:: U (numpy.ndarray) – Values for which the inverse cumulative distribution will be computed. It must have shape (n, 1) and values must be in [0,1].
Returns:: Inverse cumulative distribution values for points in U.
Return type:: numpy.ndarray
Raises:: NotFittedError – if the model is not fitted.

ppf(U)[source]¶

Compute the inverse cumulative distribution value for each point in U.

Parameters:: U (numpy.ndarray) – Values for which the inverse cumulative distribution will be computed. It must have shape (n, 1) and values must be in [0,1].
Returns:: Inverse cumulative distribution values for points in U.
Return type:: numpy.ndarray

probability_density(X)[source]¶

Compute the probability density for each point in X.

Parameters:: X (numpy.ndarray) – Values for which the probability density will be computed. It must have shape (n, 1).
Returns:: Probability density values for points in X.
Return type:: numpy.ndarray
Raises:: NotFittedError – if the model is not fitted.

sample(n_samples=1)[source]¶

Sample values from this model.

Parameters:: n_samples (int) – Number of values to sample.

Returns:: Array of shape (n_samples, 1) with values randomly sampled from this model distribution.
Return type:: numpy.ndarray
Raises:: NotFittedError – if the model is not fitted.

save(path)[source]¶

Serialize this univariate instance using pickle.

Parameters:: path (str) – Path to where this distribution will be serialized.

to_dict()[source]¶

Return the parameters of this model in a dict.

Returns:: Dictionary containing the distribution type and all the parameters that define the distribution.
Return type:: dict
Raises:: NotFittedError – if the model is not fitted.