climb.tool.impl package
Subpackages
- climb.tool.impl.data_suite package
- Subpackages
- climb.tool.impl.data_suite.data package
- climb.tool.impl.data_suite.models package
- Submodules
- climb.tool.impl.data_suite.models.base_model module
- climb.tool.impl.data_suite.models.benchmarks module
- climb.tool.impl.data_suite.models.conformal module
- climb.tool.impl.data_suite.models.copula module
- climb.tool.impl.data_suite.models.ensemble module
- climb.tool.impl.data_suite.models.mcd module
- climb.tool.impl.data_suite.models.nn_conformal module
- climb.tool.impl.data_suite.models.representation module
- Module contents
- climb.tool.impl.data_suite.third_party package
- climb.tool.impl.data_suite.utils package
- Submodules
- climb.tool.impl.data_suite.version module
- Module contents
- Subpackages
- climb.tool.impl.smart_testing_helpers namespace
- Submodules
- climb.tool.impl.smart_testing_helpers.SMART module
- SMART
- SMART.Config
- SMART.calculate_accuracy_difference()
- SMART.calculate_outcome_difference()
- SMART.clear_cache()
- SMART.config
- SMART.context
- SMART.context_target
- SMART.extract_hypotheses_and_justifications()
- SMART.find_subgroup_variables()
- SMART.fit()
- SMART.generate_model_report()
- SMART.get_optimal_queries()
- SMART.get_optimal_queries_strings()
- SMART.get_optimal_split_query()
- SMART.hypotheses
- SMART.llm
- SMART.model_config
- SMART.model_post_init()
- SMART.optimal_queries
- SMART.predict()
- SMART.revise_fit()
- SMART.revise_hypotheses()
- SMART.subgroups
- SMART.task
- SMART.verbose
- clean_query_string()
- convert_to_string_condition()
- generate_combinations_for_variable()
- climb.tool.impl.smart_testing_helpers.utils module
- bootstrapping_test_for_accuracy()
- bootstrapping_test_for_accuracy_string()
- calculate_group_statistics()
- calculate_group_statistics_string()
- calculate_lift()
- calculate_lift_outcome()
- calculate_odds_ratio()
- calculate_odds_ratio_acc()
- calculate_weighted_relative_accuracy()
- calculate_weighted_relative_outcomes()
- chi_square_test_for_accuracy()
- compute_differences_metrics_two_datasets()
- mcnemars_test()
- welchs_t_test_for_accuracy()
Submodules
climb.tool.impl.sub_agents module
climb.tool.impl.tool_autoprognosis module
- class climb.tool.impl.tool_autoprognosis.AutoprognosisClassification
Bases:
ToolBase
- property description_for_user: str
A description of what this tool does, for the user. It should make sense in the context: “This tool <description_for_user>.”
- property logs_useful: bool
Return True if the logs of this tool are especially useful for the LLM to understand what has been done.
The engine uses this to decide whether to shorten the logs if needed, for example to stay within a token budget. The decision is at the engine’s discretion; this property only provides a hint.
The user will always be able to see the full logs.
- Returns:
True if the logs of this tool are especially useful for the LLM to understand what has been done.
- Return type:
bool
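The two properties above form a small contract between a tool and the engine. The sketch below illustrates it with a hypothetical `ToolSketch` base class standing in for `ToolBase` (whose full interface is not documented on this page) and a hypothetical `logs_for_llm` truncation policy. It shows how `description_for_user` completes the sentence and how `logs_useful` could be consulted as a hint; it is not how the engine actually decides.

```python
from abc import ABC, abstractmethod
from typing import List


class ToolSketch(ABC):
    """Hypothetical stand-in for ToolBase that models only the two documented properties."""

    @property
    @abstractmethod
    def description_for_user(self) -> str:
        """Must complete the sentence 'This tool <description_for_user>.'"""

    @property
    def logs_useful(self) -> bool:
        # Assumed default for this sketch: logs are not especially useful.
        return False


class ExampleTrainingTool(ToolSketch):
    @property
    def description_for_user(self) -> str:
        return "trains and evaluates classification models on your data"

    @property
    def logs_useful(self) -> bool:
        return True


class ExampleSummaryTool(ToolSketch):
    @property
    def description_for_user(self) -> str:
        return "summarizes a dataset"


def render_description(tool: ToolSketch) -> str:
    return f"This tool {tool.description_for_user}."


def logs_for_llm(tool: ToolSketch, log_lines: List[str], max_lines: int) -> List[str]:
    """Hypothetical engine policy: keep useful logs whole, otherwise keep only the last lines.

    The user-facing log view is unaffected; this only shapes what the LLM sees.
    """
    if tool.logs_useful or len(log_lines) <= max_lines:
        return list(log_lines)
    return list(log_lines[-max_lines:])
```

A real engine may still shorten logs flagged as useful; the hint only biases the choice.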
- class climb.tool.impl.tool_autoprognosis.AutoprognosisClassificationTrainTest
Bases:
ToolBase
- property description_for_user: str
A description of what this tool does, for the user. It should make sense in the context: “This tool <description_for_user>.”
- property logs_useful: bool
Return True if the logs of this tool are especially useful for the LLM to understand what has been done.
The engine uses this to decide whether to shorten the logs if needed, for example to stay within a token budget. The decision is at the engine’s discretion; this property only provides a hint.
The user will always be able to see the full logs.
- Returns:
True if the logs of this tool are especially useful for the LLM to understand what has been done.
- Return type:
bool
- class climb.tool.impl.tool_autoprognosis.AutoprognosisRegression
Bases:
ToolBase
- property description_for_user: str
A description of what this tool does, for the user. It should make sense in the context: “This tool <description_for_user>.”
- property logs_useful: bool
Return True if the logs of this tool are especially useful for the LLM to understand what has been done.
The engine uses this to decide whether to shorten the logs if needed, for example to stay within a token budget. The decision is at the engine’s discretion; this property only provides a hint.
The user will always be able to see the full logs.
- Returns:
True if the logs of this tool are especially useful for the LLM to understand what has been done.
- Return type:
bool
- class climb.tool.impl.tool_autoprognosis.AutoprognosisRegressionTrainTest
Bases:
ToolBase
- property description_for_user: str
A description of what this tool does, for the user. It should make sense in the context: “This tool <description_for_user>.”
- property logs_useful: bool
Return True if the logs of this tool are especially useful for the LLM to understand what has been done.
The engine uses this to decide whether to shorten the logs if needed, for example to stay within a token budget. The decision is at the engine’s discretion; this property only provides a hint.
The user will always be able to see the full logs.
- Returns:
True if the logs of this tool are especially useful for the LLM to understand what has been done.
- Return type:
bool
- class climb.tool.impl.tool_autoprognosis.AutoprognosisSubgroupEvaluation
Bases:
ToolBase
- property description_for_user: str
A description of what this tool does, for the user. It should make sense in the context: “This tool <description_for_user>.”
- property logs_useful: bool
Return True if the logs of this tool are especially useful for the LLM to understand what has been done.
The engine uses this to decide whether to shorten the logs if needed, for example to stay within a token budget. The decision is at the engine’s discretion; this property only provides a hint.
The user will always be able to see the full logs.
- Returns:
True if the logs of this tool are especially useful for the LLM to understand what has been done.
- Return type:
bool
- class climb.tool.impl.tool_autoprognosis.AutoprognosisSurvival
Bases:
ToolBase
- property description_for_user: str
A description of what this tool does, for the user. It should make sense in the context: “This tool <description_for_user>.”
- property logs_useful: bool
Return True if the logs of this tool are especially useful for the LLM to understand what has been done.
The engine uses this to decide whether to shorten the logs if needed, for example to stay within a token budget. The decision is at the engine’s discretion; this property only provides a hint.
The user will always be able to see the full logs.
- Returns:
True if the logs of this tool are especially useful for the LLM to understand what has been done.
- Return type:
bool
- class climb.tool.impl.tool_autoprognosis.AutoprognosisSurvivalTrainTest
Bases:
ToolBase
- property description_for_user: str
A description of what this tool does, for the user. It should make sense in the context: “This tool <description_for_user>.”
- property logs_useful: bool
Return True if the logs of this tool are especially useful for the LLM to understand what has been done.
The engine uses this to decide whether to shorten the logs if needed, for example to stay within a token budget. The decision is at the engine’s discretion; this property only provides a hint.
The user will always be able to see the full logs.
- Returns:
True if the logs of this tool are especially useful for the LLM to understand what has been done.
- Return type:
bool
- class climb.tool.impl.tool_autoprognosis.BasicProgressReport(wd: str, task: Literal['classification', 'regression', 'survival'])
Bases:
Hooks
- climb.tool.impl.tool_autoprognosis.autoprognosis_classification(tc: ToolCommunicator, data_file_path: str, target_variable: str, mode: Literal['linear', 'all'], workspace: str) → None
- climb.tool.impl.tool_autoprognosis.autoprognosis_classification_train_test(tc: ToolCommunicator, training_data_path: str, target_variable: str, test_data_path: str | None, mode: Literal['linear', 'all'], workspace: str) → None
- climb.tool.impl.tool_autoprognosis.autoprognosis_regression(tc: ToolCommunicator, data_file_path: str, target_variable: str, mode: Literal['linear', 'all'], workspace: str) → None
- climb.tool.impl.tool_autoprognosis.autoprognosis_regression_train_test(tc: ToolCommunicator, training_data_path: str, target_variable: str, test_data_path: str | None, mode: Literal['linear', 'all'], workspace: str) → None
- climb.tool.impl.tool_autoprognosis.autoprognosis_subgroup_evaluation(tc: ToolCommunicator, task: Literal['classification', 'regression', 'survival'], data_file_paths: List[str], target_variable: str, model_path: str, workspace: str, time_variable: str | None = None) → None
climb.tool.impl.tool_autoprognosis_explainers module
- class climb.tool.impl.tool_autoprognosis_explainers.AutoprognosisExplainerInvase
Bases:
ToolBase
- class climb.tool.impl.tool_autoprognosis_explainers.AutoprognosisExplainerSymbolicPursuit
Bases:
ToolBase
- climb.tool.impl.tool_autoprognosis_explainers.autoprognosis_explainer_invase(tc: ToolCommunicator, model_file_path: str, data_file_path: str, target_variable: str, workspace: str, feature_names: List[str] | None = None, n_epoch: int = 200, n_folds: int = 1, task_type: str = 'classification', time_variable: str | None = None) → None
- climb.tool.impl.tool_autoprognosis_explainers.autoprognosis_explainer_symbolic_pursuit(tc: ToolCommunicator, model_file_path: str, data_file_path: str, target_variable: str, workspace: str, feature_names: List[str] | None = None, n_epoch: int = 10000, subsample: int = 10, task_type: str = 'classification', prefit: bool = False, time_variable: str | None = None) → None
climb.tool.impl.tool_balance_data module
- class climb.tool.impl.tool_balance_data.BalanceData
Bases:
ToolBase
- climb.tool.impl.tool_balance_data.balance_data(tc: ToolCommunicator, data_file_path: str, balanced_data_file_path: str, target_column: str, method: str, sampling_strategy: str | float | Dict | None, desired_ratio: float, workspace: str) → None
- Parameters:
tc (ToolCommunicator) – The tool communicator object.
data_file_path (str) – The path to the input CSV file.
balanced_data_file_path (str) – The path to the output CSV file with balanced data.
target_column (str) – The name of the target (class label) column.
method (str) – The balancing method to use. Options are ‘over’ for oversampling, ‘under’ for undersampling, ‘smote’ for SMOTE, and ‘combine’ for combining undersampling and SMOTE.
sampling_strategy (str | float | Dict | None) – The sampling strategy to use. Options are:
- ‘minority’ to balance the minority class,
- ‘not minority’ to balance all classes except the minority class,
- ‘not majority’ to balance all classes except the majority class,
- ‘all’ to balance all classes,
- a float specifying the desired ratio of minority to majority samples,
- a dict whose keys are the targeted classes and whose values are the desired number of samples for each targeted class.
workspace (str) – The workspace directory path.
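To make the float form of sampling_strategy concrete (with method ‘over’), here is a plain-Python sketch of random oversampling up to a target minority-to-majority ratio. It is not this tool’s implementation: rows are plain dicts rather than a DataFrame, only binary targets are handled, and the name `random_oversample` is hypothetical.

```python
import math
import random
from collections import Counter
from typing import Any, Dict, List, Sequence


def random_oversample(
    rows: Sequence[Dict[str, Any]], target_column: str, ratio: float, seed: int = 0
) -> List[Dict[str, Any]]:
    """Duplicate random minority-class rows until n_minority / n_majority reaches `ratio`.

    Hypothetical sketch: binary targets only; dicts stand in for DataFrame rows.
    """
    counts = Counter(row[target_column] for row in rows)
    if len(counts) != 2:
        raise ValueError("This sketch only handles a binary target column.")
    (_, n_majority), (minority_label, n_minority) = counts.most_common()
    # Extra minority rows needed to reach the requested minority:majority ratio.
    n_needed = max(0, math.ceil(ratio * n_majority) - n_minority)
    minority_rows = [row for row in rows if row[target_column] == minority_label]
    rng = random.Random(seed)
    extra = [dict(rng.choice(minority_rows)) for _ in range(n_needed)]
    return list(rows) + extra
```

For example, with 8 majority and 2 minority rows and a ratio of 0.5, two minority rows are duplicated so that the result has 4 minority rows against 8 majority rows.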
- climb.tool.impl.tool_balance_data.clean_dataframe(df: DataFrame, unique_threshold: int = 15)
Clean the dataframe by encoding categorical variables, handling missing values, and converting data types.
- Parameters:
df (pd.DataFrame) – The input dataframe to clean.
unique_threshold (int) – Threshold on the number of unique values, used to decide whether a numerical column should be treated as categorical.
- Returns:
df_cleaned (pd.DataFrame) – The cleaned dataframe.
encoders (dict) – Dictionary of LabelEncoders for the categorical columns.
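The cleaning steps above can be sketched in plain Python. This is a hypothetical illustration, not the function’s code: columns are a dict of lists instead of a DataFrame, encoders are plain label-to-integer dicts instead of LabelEncoder objects, and the exact threshold comparison (more than `unique_threshold` distinct values means continuous) and mean imputation are assumptions.

```python
from statistics import mean
from typing import Any, Dict, List, Tuple


def clean_columns(
    columns: Dict[str, List[Any]], unique_threshold: int = 15
) -> Tuple[Dict[str, List[Any]], Dict[str, Dict[str, int]]]:
    """Hypothetical sketch of clean_dataframe's idea on a dict of columns."""
    cleaned: Dict[str, List[Any]] = {}
    encoders: Dict[str, Dict[str, int]] = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        is_numeric = all(isinstance(v, (int, float)) for v in present)
        if is_numeric and len(set(present)) > unique_threshold:
            # Continuous numeric column: keep as float, fill gaps with the column mean.
            fill = float(mean(present)) if present else 0.0
            cleaned[name] = [float(v) if v is not None else fill for v in values]
        else:
            # Text, or numeric with few distinct values: treat as categorical and
            # integer-encode, with missing values as their own "<missing>" level.
            tokens = [str(v) if v is not None else "<missing>" for v in values]
            mapping = {label: code for code, label in enumerate(sorted(set(tokens)))}
            cleaned[name] = [mapping[t] for t in tokens]
            encoders[name] = mapping
    return cleaned, encoders
```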
climb.tool.impl.tool_conformal_prediction module
- class climb.tool.impl.tool_conformal_prediction.ConformalPrediction
Bases:
ToolBase
- class climb.tool.impl.tool_conformal_prediction.ModelWrapper(model, fitted: bool = True, classes_: ndarray | None = None)
Bases:
BaseEstimator
A lightweight wrapper that delegates fit, predict, and predict_proba to the underlying model. It stores the classifier’s classes_ attribute explicitly so that cloning in prefit mode preserves it.
For binary classification, if the underlying model returns a one-column probability array, the wrapper converts it into a two-column array.
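The one-column-to-two-column conversion described above can be sketched with NumPy. The helper name `ensure_two_column_proba` is hypothetical, and it assumes the single column holds the probability of class 1, which is the usual convention but is not stated here.

```python
import numpy as np


def ensure_two_column_proba(proba) -> np.ndarray:
    """Return an (n_samples, 2) array for binary classification.

    Assumption: a 1-D or one-column input holds P(class 1).
    """
    proba = np.asarray(proba, dtype=float)
    if proba.ndim == 1:
        proba = proba.reshape(-1, 1)
    if proba.shape[1] == 1:
        # Column 0 is P(class 0) = 1 - P(class 1); column 1 is P(class 1).
        proba = np.hstack([1.0 - proba, proba])
    return proba
```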
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
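The <component>__<parameter> convention can be illustrated with a toy class. `ToyEstimator` is hypothetical and much simpler than scikit-learn’s BaseEstimator (it keeps parameters in a dict and does no validation); it only shows how the part before the first double underscore selects a sub-object and the rest is forwarded to it. In scikit-learn itself, for example, `pipe.set_params(clf__C=0.5)` sets `C` on the pipeline step named `clf`.

```python
from typing import Any, Dict


class ToyEstimator:
    """Hypothetical stand-in (not scikit-learn) showing `<component>__<parameter>` routing."""

    def __init__(self, **params: Any) -> None:
        self.params: Dict[str, Any] = dict(params)

    def set_params(self, **params: Any) -> "ToyEstimator":
        for key, value in params.items():
            component, sep, sub_key = key.partition("__")
            if sep:
                # Nested parameter: forward the rest of the key to the sub-estimator.
                self.params[component].set_params(**{sub_key: value})
            else:
                self.params[key] = value
        # Returning self mirrors the documented "Returns: self".
        return self
```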
- class climb.tool.impl.tool_conformal_prediction.SurvivalToClassificationWrapper(survival_model, T0, fitted: bool = True)
Bases:
ModelWrapper
A wrapper that converts a survival model (e.g. a RiskEnsemble) into a binary classifier at a chosen time horizon T0. It inherits from ModelWrapper so that prefit functionality and parameter handling are preserved.
It implements:
- predict_survival_function: If the underlying model has this method, the wrapper uses it. Otherwise, it calls the risk ensemble’s predict with eval_time_horizons=[T0] to obtain a risk score, then approximates the survival probability as S(T0) = exp(-risk_score).
- predict_proba: Evaluates the survival probability at T0 and returns a two-column array:
  - Column 0: S(T0), i.e. the probability that the event has NOT occurred by T0.
  - Column 1: 1 - S(T0), i.e. the probability that the event has occurred by T0.
- predict: Returns the argmax of predict_proba.
- classes_: Always returns np.array([0, 1]).
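The fallback path above (risk score, then S(T0) = exp(-risk_score), then two columns, then argmax) can be sketched as two NumPy functions. Both names are hypothetical, and the sketch takes precomputed risk scores instead of calling a real RiskEnsemble. Note that the argmax picks class 1 exactly when S(T0) < 0.5, i.e. when the risk score exceeds ln 2, and that S(T0) only lies in [0, 1] for non-negative risk scores.

```python
import numpy as np


def survival_risk_to_proba(risk_scores) -> np.ndarray:
    """Map risk scores at horizon T0 to columns [S(T0), 1 - S(T0)] with S(T0) = exp(-risk)."""
    risk = np.asarray(risk_scores, dtype=float).reshape(-1)
    survival = np.exp(-risk)
    return np.column_stack([survival, 1.0 - survival])


def predict_event_by_t0(risk_scores) -> np.ndarray:
    """Class 1 means the event is predicted to have occurred by T0 (argmax of the two columns)."""
    return np.argmax(survival_risk_to_proba(risk_scores), axis=1)
```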
- climb.tool.impl.tool_conformal_prediction.conformal_prediction_function(tc: ToolCommunicator, model_file_path: str, train_data_file_path: str, test_data_file_path: str | None, task_type: str, target_column: str, workspace: str, alpha: float = 0.1, time_to_event_column: str | None = None) → None
Apply conformal prediction to a pre-trained model.
- Parameters:
model_file_path (str) – Path to the pre-trained model file.
train_data_file_path (str) – Path to the CSV file with training data.
test_data_file_path (str, optional) – Path to the CSV file with test data. If not provided, the function creates a test split from the training data.
task_type (str) – One of ‘classification’, ‘regression’, or ‘survival’.
target_column (str) – Name of the target variable column.
workspace (str) – Path to the workspace directory.
alpha (float) – Miscoverage level (e.g. 0.1 for 90% coverage). Defaults to 0.1.
time_to_event_column (str, optional) – Name of the time-to-event column (required for survival).
- Returns:
For classification and survival: a DataFrame with a “Predictions_in_conf_interval” column listing the classes included in each prediction set. For regression: a DataFrame with “lower_bound” and “upper_bound” columns.
- Raises:
ValueError – if the train and test data do not share the same feature columns.
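For orientation, here is a minimal NumPy sketch of standard split conformal prediction, which produces the two output shapes described above: class sets for classification and [lower_bound, upper_bound] intervals for regression. It is a textbook technique swapped in for illustration; the function names are hypothetical, and the tool’s actual nonconformity scores and calibration split are not documented here.

```python
import numpy as np


def conformal_quantile(scores, alpha: float) -> float:
    """k-th smallest calibration score with k = ceil((n + 1) * (1 - alpha))."""
    sorted_scores = np.sort(np.asarray(scores, dtype=float))
    n = len(sorted_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: the set must include everything.
        return float("inf")
    return float(sorted_scores[k - 1])


def classification_sets(cal_proba, cal_labels, test_proba, alpha: float = 0.1):
    """Prediction sets from the score 1 - p(true class)."""
    cal_proba = np.asarray(cal_proba, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=int)
    scores = 1.0 - cal_proba[np.arange(len(cal_labels)), cal_labels]
    qhat = conformal_quantile(scores, alpha)
    test_proba = np.asarray(test_proba, dtype=float)
    # Keep every class whose score would not exceed the calibrated threshold.
    return [np.flatnonzero(1.0 - row <= qhat).tolist() for row in test_proba]


def regression_intervals(cal_pred, cal_true, test_pred, alpha: float = 0.1):
    """Symmetric intervals from absolute calibration residuals."""
    residuals = np.abs(np.asarray(cal_true, dtype=float) - np.asarray(cal_pred, dtype=float))
    qhat = conformal_quantile(residuals, alpha)
    test_pred = np.asarray(test_pred, dtype=float)
    return test_pred - qhat, test_pred + qhat
```

Under exchangeability, these sets and intervals cover the true value with probability at least 1 - alpha. A set can be empty, as the second test row in the usage checks shows.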
climb.tool.impl.tool_data_centric module
climb.tool.impl.tool_data_suite module
- class climb.tool.impl.tool_data_suite.DataSuiteInsights
Bases:
ToolBase
- climb.tool.impl.tool_data_suite.data_suite_insights(tc: ToolCommunicator, data_file_path: str, target_column: str, workspace: str) → None
Run the data_suite analysis on the input dataset.
- Parameters:
tc (ToolCommunicator) – The tool communicator object.
data_file_path (str) – The path to the input CSV file.
target_column (str) – The name of the target column.
workspace (str) – The workspace directory path.
climb.tool.impl.tool_descriptive_stats module
- class climb.tool.impl.tool_descriptive_stats.DescriptiveStatistics
Bases:
ToolBase
- climb.tool.impl.tool_descriptive_stats.check_normal_distribution(df: DataFrame, max_rows: int = 5000, p_value_thresh: float = 1e-05, random_state: int = 0, subset_cols: List[str] | None = None) → Tuple[List[str], List[str]]
- climb.tool.impl.tool_descriptive_stats.create_descriptive_statistics_table(tc: ToolCommunicator, data_file_path: str, workspace: str) → None
Create a medical-paper-style descriptive statistics table for a dataset.
Details:
- Categorical variables are summarized by listing their unique values and showing count / total (percentage) for each.
- Numerical variables are summarized as mean ± std if normally distributed, or median (Q1 - Q3) otherwise.
- The user is also shown plots of the data: bar plots for categorical variables, and histograms and box plots for numerical variables.
- Parameters:
tc (ToolCommunicator) – tool communicator object.
data_file_path (str) – path to the data file.
workspace (str) – path to the workspace directory.
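The two summary formats described above can be sketched with the standard library. This is an illustration only: the helper names are hypothetical, the normality decision (made elsewhere, e.g. by check_normal_distribution) is passed in as a flag, and the rounding and quartile method (the `statistics.quantiles` default, which can differ slightly from pandas) are assumptions.

```python
import statistics
from collections import Counter
from typing import Dict, Sequence


def summarize_categorical(values: Sequence[str]) -> Dict[str, str]:
    """Map each level to 'count / total (percentage)'."""
    total = len(values)
    counts = Counter(values)
    return {
        str(level): f"{n} / {total} ({100 * n / total:.1f}%)"
        for level, n in sorted(counts.items(), key=lambda item: str(item[0]))
    }


def summarize_numeric(values: Sequence[float], is_normal: bool) -> str:
    """'mean ± std' for normally distributed data, else 'median (Q1 - Q3)'."""
    if is_normal:
        return f"{statistics.mean(values):.2f} ± {statistics.stdev(values):.2f}"
    q1, _, q3 = statistics.quantiles(values, n=4)
    return f"{statistics.median(values):.2f} ({q1:.2f} - {q3:.2f})"
```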
- climb.tool.impl.tool_descriptive_stats.format_descriptive_statistics_table_for_print(df: DataFrame) → str
- climb.tool.impl.tool_descriptive_stats.plot_and_save_columns(dataframe: DataFrame, categorical_columns: List[str], numeric_columns: List[str], workspace: str) → Tuple[Dict[str, Any], Dict[str, str]]
- climb.tool.impl.tool_descriptive_stats.run_with_time_limit(func, time_limit=10, **kwargs) → Any
climb.tool.impl.tool_exploratory_data_analysis module
- class climb.tool.impl.tool_exploratory_data_analysis.ExploratoryDataAnalysis
Bases:
ToolBase
- climb.tool.impl.tool_exploratory_data_analysis.exploratory_data_analysis(tc: ToolCommunicator, data_file_path: str, target: str | None, workspace: str) → None
Perform exploratory data analysis (EDA) on a CSV file, outputting a detailed textual summary.
Key features:
1. Dataset Overview: Reports the dataset’s dimensions and column data types.
2. Numerical Feature Analysis: Provides statistics (mean, median, etc.), including skewness and kurtosis, to describe the distribution of numerical data.
3. Categorical Variable Analysis: Lists unique-value counts and the top and rare categories, to help assess the distribution of categorical data.
4. Missing Values Analysis: Identifies and counts missing values per column, which is essential for data cleaning.
5. Correlation Analysis: Finds the most correlated and anti-correlated features and creates a correlogram.
6. Outliers Identification: Detects outliers using the interquartile range (IQR), reporting counts and bounds, which is crucial for data quality assessment.
7. Duplicate Records Analysis: Checks and reports the number of duplicate records, which is important for ensuring data integrity.
- Parameters:
tc (ToolCommunicator) – tool communicator object.
data_file_path (str) – path to the data file.
target (str, optional) – target feature name.
workspace (str) – path to the workspace directory.
- Returns:
Detailed EDA report.
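The outlier and duplicate checks in the feature list above can be sketched in plain Python. The helper names are hypothetical; the 1.5 × IQR fence is the common convention and is assumed, not confirmed, to be what the tool uses; and the inclusive quartile method matches linear interpolation (pandas’ default), which is also an assumption about the tool.

```python
import statistics
from typing import Hashable, Sequence, Tuple


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> Tuple[int, float, float]:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] and return (count, lower, upper)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    count = sum(1 for v in values if v < lower or v > upper)
    return count, lower, upper


def count_duplicate_rows(rows: Sequence[Tuple[Hashable, ...]]) -> int:
    """Number of rows that repeat an earlier row (first occurrences are not counted)."""
    seen = set()
    duplicates = 0
    for row in rows:
        if row in seen:
            duplicates += 1
        else:
            seen.add(row)
    return duplicates
```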
climb.tool.impl.tool_feature_extraction_from_text module
- class climb.tool.impl.tool_feature_extraction_from_text.FeatureExtractionFromText
Bases:
ToolBase
- climb.tool.impl.tool_feature_extraction_from_text.feature_extraction_from_text(tc: ToolCommunicator, data_file_path: str, extracted_data_file_path: str, topics_dict: str, workspace: str) → None
Extract specified categorical topics from free-text fields in a pandas DataFrame.
- Parameters:
data_file_path (str) – Path to the input CSV file.
extracted_data_file_path (str) – Path to the output CSV file with extracted features.
topics_dict (str) – A nested dictionary where keys are free-text column names and values are dictionaries mapping topics to their synonyms, e.g.
    topics_dict = {
        "column1": {
            "topic1": ["synonym1", "synonym2"],
            "topic2": ["synonym3", "synonym4"],
        },
        "column2": {
            "topic1": ["synonym1", "synonym2"],
            "topic3": ["synonym5", "synonym6"],
        },
    }
workspace (str) – The path to the workspace directory.
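The topics_dict structure can be illustrated with a simple keyword-matching sketch that produces one 0/1 feature per (column, topic) pair. Several assumptions apply: topics_dict is taken to be a JSON string (the signature types it as str), matching is case-insensitive whole-word search, and the `<column>_<topic>` output naming and the function name `extract_topics` are hypothetical. The tool’s actual extraction method is not documented here.

```python
import json
import re
from typing import Dict, List, Optional


def extract_topics(rows: List[Dict[str, Optional[str]]], topics_json: str) -> List[Dict[str, int]]:
    """Hypothetical sketch: flag a topic when any of its synonyms occurs as a whole word."""
    topics: Dict[str, Dict[str, List[str]]] = json.loads(topics_json)
    out = []
    for row in rows:
        features: Dict[str, int] = {}
        for column, topic_map in topics.items():
            # Missing or null text is treated as empty.
            text = (row.get(column) or "").lower()
            for topic, synonyms in topic_map.items():
                hit = any(re.search(r"\b" + re.escape(s.lower()) + r"\b", text) for s in synonyms)
                features[f"{column}_{topic}"] = int(hit)
        out.append(features)
    return out
```

Whole-word matching avoids flagging "smoker" inside "nonsmoker", but synonyms ending in punctuation (for example "c++") would need a different boundary rule.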
climb.tool.impl.tool_feature_importance module
- class climb.tool.impl.tool_feature_importance.PermutationExplainer
Bases:
ToolBase
- class climb.tool.impl.tool_feature_importance.ShapCompatibleWrapper(model)
Bases:
object
A wrapper class that makes a model compatible with the SHAP explainer.
The register_categorical method registers the categorical columns in the data and encodes them as integer values. The predict method takes a DataFrame as input and maps the integer values back to the original categorical values before making predictions.
Usage:
```python
import shap

from climb.tool.impl.tool_feature_importance import ShapCompatibleWrapper

# Assuming `model` is the original fitted model that has a predict method
# and `X` is the feature DataFrame.
# Prepare a wrapped model and the data compatible with the SHAP explainer:
shap_compatible_model = ShapCompatibleWrapper(model)
X_for_shap = shap_compatible_model.register_categorical(X)

# Then run the SHAP explainer:
explainer = shap.Explainer(shap_compatible_model.predict, X_for_shap, …)
shap_values = explainer(X_for_shap)
```
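The encode/decode round trip behind ShapCompatibleWrapper can be sketched without pandas or SHAP. `CategoricalRoundTrip` is a hypothetical class, not the wrapper’s code. It uses lists of dicts in place of DataFrames and assumes that any string-valued column is categorical. The key idea is that the explainer only ever sees integers, while the wrapped model always receives the original category labels.

```python
from typing import Any, Callable, Dict, List


class CategoricalRoundTrip:
    """Hypothetical sketch of the encode/decode idea behind ShapCompatibleWrapper."""

    def __init__(self, predict: Callable[[List[Dict[str, Any]]], List[Any]]) -> None:
        self._predict = predict
        self.codes: Dict[str, Dict[Any, int]] = {}

    def register_categorical(self, rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        # Treat every string-valued column as categorical; map each level to an integer.
        for column in rows[0]:
            if any(isinstance(r[column], str) for r in rows):
                levels = sorted({r[column] for r in rows}, key=str)
                self.codes[column] = {level: i for i, level in enumerate(levels)}
        return [self._encode(r) for r in rows]

    def _encode(self, row: Dict[str, Any]) -> Dict[str, Any]:
        return {c: self.codes[c][v] if c in self.codes else v for c, v in row.items()}

    def predict(self, encoded_rows: List[Dict[str, Any]]) -> List[Any]:
        # Invert the mappings; int() tolerates float codes produced by perturbations.
        decode = {c: {i: lvl for lvl, i in m.items()} for c, m in self.codes.items()}
        rows = [
            {c: decode[c][int(v)] if c in decode else v for c, v in r.items()}
            for r in encoded_rows
        ]
        return self._predict(rows)
```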
- class climb.tool.impl.tool_feature_importance.ShapExplainer
Bases:
ToolBase
climb.tool.impl.tool_feature_selection module
climb.tool.impl.tool_hardware module
- class climb.tool.impl.tool_hardware.HardwareInfo
Bases:
ToolBase
- climb.tool.impl.tool_hardware.check_user_hardware(tc: ToolCommunicator) → None
Gather information about the user’s CPU, RAM, and GPU (if available).
The report has the following format:
```
CPU Information:
- Physical Cores: <value>
- Total Cores: <value>
- Max Frequency: <value> MHz

RAM Information:
- Total Memory: <value> GB
- Available Memory: <value> GB

GPU Information:
- GPU 1: <model> with <value>MB of memory

PyTorch CUDA Information:
- CUDA is available: <True/False>
- Number of CUDA devices: <value>
```
- Parameters:
tc (ToolCommunicator) – tool communicator object.
climb.tool.impl.tool_imputation module
- class climb.tool.impl.tool_imputation.HyperImputeImputation
Bases:
ToolBase
- class climb.tool.impl.tool_imputation.HyperImputeImputationTrainTest
Bases:
ToolBase
- climb.tool.impl.tool_imputation.hyperimpute_impute(tc: ToolCommunicator, data_file_path: str, imputed_file_path: str, workspace: str, subset: List[str] | None = None) → None
climb.tool.impl.tool_paper module
- class climb.tool.impl.tool_paper.UploadAndSummarizeExamplePaper
Bases:
ToolBase
- property description_for_user: str
A description of what this tool does, for the user. It should make sense in the context: “This tool <description_for_user>.”
- property user_input_requested: List[UserInputRequest]
climb.tool.impl.tool_smart_testing module
- class climb.tool.impl.tool_smart_testing.SmartTesting
Bases:
ToolBase
- climb.tool.impl.tool_smart_testing.smart_testing(tc: ToolCommunicator, data_path: str, model_path: str, context: str, context_target: str, session: Session, additional_kwargs_required: Dict[str, Any], workspace: str)
- Parameters:
tc (ToolCommunicator) – The tool communicator object.
data_path (str) – The path to the input CSV file.
workspace (str) – The workspace directory path.
climb.tool.impl.tool_upload module
- class climb.tool.impl.tool_upload.UploadDataFile
Bases:
ToolBase
- property description_for_user: str
A description of what this tool does, for the user. It should make sense in the context: “This tool <description_for_user>.”
- property user_input_requested: List[UserInputRequest]