humanmodels package

Module contents

class humanmodels.HumanClassifier(logic_expression: Union[str, dict], map_variables_to_features: dict, random_state=None)

Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin

Human-style classification, using a dictionary of rules evaluated as logic expressions (e.g. “x*2 + 4*z > 0”) to assign samples to a class.

Builder for the class.

Parameters
  • logic_expression (str or dictionary {int: string}) – A logic expression as a string, or a dictionary with integer keys and one logic expression string per key.

  • map_variables_to_features (dict) – Dictionary containing the mapping between variables and feature indexes (in datasets).

  • target_class (list of int, optional) – If several logic expressions are specified, a list of target classes must be passed as an argument, so that HumanClassifier can behave as a one-vs-all classifier. The default is None.

Returns

Return type

None.
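
As a rough illustration of the idea described for HumanClassifier, the sketch below evaluates a rule such as "x*2 + 4*z > 0" on each sample, after binding every variable to its feature column through a mapping dictionary. The helper name evaluate_rule and the use of Python's built-in eval are assumptions made for illustration only; they are not taken from the humanmodels source, which may evaluate expressions differently.

```python
# Hypothetical sketch: not the humanmodels implementation.
def evaluate_rule(rule, map_variables_to_features, X):
    """Return one boolean per sample: True where the rule holds."""
    results = []
    for sample in X:
        # Bind each variable name in the rule to the value of its feature column.
        namespace = {
            name: sample[index]
            for name, index in map_variables_to_features.items()
        }
        # Evaluate with builtins disabled, so only the bound variables are visible.
        results.append(bool(eval(rule, {"__builtins__": {}}, namespace)))
    return results


# Three samples with three features; the rule only uses columns 0 and 2.
X = [[1.0, 9.9, 0.5], [-3.0, 9.9, 1.0], [0.0, 9.9, 0.0]]
mapping = {"x": 0, "z": 2}
labels = evaluate_rule("x*2 + 4*z > 0", mapping, X)
```

Only the first sample satisfies the rule here; feature column 1 is ignored because no variable is mapped to it.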

check_parameters()

Check coherence of the parameters.

Returns

Return type

None.

error_function(parameter_values, parameter_names, X, y)

Error function, to be optimized. The parameters in each expression given for each class are replaced by the candidate parameter values, and the predictions of the model are then compared against class labels, obtaining a cost function value depending on the results.
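
The following is a minimal sketch of such an error function, under simplifying assumptions: a single rule, binary labels (1 where the rule holds, 0 otherwise), and the misclassification rate as the cost. The function name classification_error, its extra rule and mapping arguments, and the choice of cost are all hypothetical; the cost actually computed by humanmodels is not specified here.

```python
# Hypothetical sketch: the cost used by humanmodels may differ.
def classification_error(parameter_values, parameter_names, rule, mapping, X, y):
    """Fraction of samples whose predicted label differs from the label in y."""
    # Candidate values for the trainable parameters named in the rule.
    parameters = dict(zip(parameter_names, parameter_values))
    errors = 0
    for sample, label in zip(X, y):
        # Namespace holds both the candidate parameters and the feature values.
        namespace = dict(parameters)
        namespace.update({name: sample[i] for name, i in mapping.items()})
        predicted = 1 if eval(rule, {"__builtins__": {}}, namespace) else 0
        errors += int(predicted != label)
    return errors / len(y)


X = [[1.0], [2.0], [3.0], [4.0]]
y = [0, 0, 1, 1]
rule = "x > p_0"  # p_0 is the parameter to be optimized
good = classification_error([2.5], ["p_0"], rule, {"x": 0}, X, y)
bad = classification_error([0.5], ["p_0"], rule, {"x": 0}, X, y)
```

With the threshold at 2.5 every sample is classified correctly, while a threshold of 0.5 labels all samples as 1 and misclassifies half of them; an optimizer would prefer the first candidate.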

fit(X, y, optimizer: str = 'cma', optimizer_options: Optional[dict] = None, n_jobs: int = 0, verbose: bool = False)

Fits the internal model to the data, using features in X and known values in y.

Parameters
  • X (array, shape(n_samples, n_features)) – Training data

  • y (array, shape(n_samples, 1)) – Training values for the target feature/variable

  • optimizer (string, default="cma") –

    The optimizer that is going to be used. Acceptable values:
    • “cma”: Covariance-Matrix-Adaptation Evolution Strategy, derivative-free optimization.

  • optimizer_options (dict, default=None) – Dictionary of options that can be passed to the optimization algorithm. Shape and type depend on the choice made for ‘optimizer’.

  • n_jobs (int, default=0) – Number of jobs to be executed in parallel, passed to the optimization algorithm. The default, 0, avoids the use of multiprocessing; -1 uses all available CPUs.

  • verbose (bool, default=False) – If True, prints internal output to screen.

Returns

Return type

Self.
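
The n_jobs convention described above can be sketched as a small helper. The name resolve_n_jobs is hypothetical, and the treatment of values other than 0 and -1 (positive values used as-is, other negative values rejected) is an assumption, since only 0 and -1 are documented.

```python
import os


# Hypothetical helper: handling of values other than 0 and -1 is an assumption.
def resolve_n_jobs(n_jobs):
    """Translate n_jobs into a worker count; None means no multiprocessing."""
    if n_jobs == 0:
        return None  # run everything in the current process
    if n_jobs == -1:
        return os.cpu_count() or 1  # all available CPUs
    if n_jobs < -1:
        raise ValueError("n_jobs must be -1, 0, or a positive integer")
    return n_jobs


serial = resolve_n_jobs(0)
all_cpus = resolve_n_jobs(-1)
three = resolve_n_jobs(3)
```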

parameters_to_string(c)

predict(X)

Predict the class labels for each sample in X.

Parameters

X (array, shape(n_samples, n_features)) – Array containing samples, for which class labels are going to be predicted.

Returns

C – Prediction vector, with a class label for each sample.

Return type

array, shape(n_samples)

to_string()

variables_to_string(c)

class humanmodels.HumanRegressor(equation_string: str, map_variables_to_features: Optional[dict] = None, target_variable_string: Optional[str] = None, random_state=None)

Bases: sklearn.base.BaseEstimator, sklearn.base.RegressorMixin

Human-designed regressor, initialized with a sympy-compatible text string describing an equation. It also needs a dictionary mapping the variables named in the equation to the features in X.

Builder for the class.

Parameters
  • equation_string (str) –

    String containing the equation of the model. Examples:
    1. “y = 2*x + 4”

    2. “4*x_0 + 5*x_1 + 6*x_2”

    If a left-hand side variable is NOT provided (as in example #2), the optional target_variable_string parameter must be specified.

  • map_variables_to_features (dict) – Mapping between the variables in the internal symbolic expression representing the model and the names (or integer indexes) of the features.

  • target_variable_string (str, optional) – String containing the name of the target variable. It is not necessary to specify target_variable_string if the left-hand side of the equation has been provided in equation_string. The default is None.

Returns

Return type

None.
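
The two accepted forms of equation_string can be illustrated with a small sketch that separates the target variable from the right-hand side. The helper split_equation is hypothetical and only mirrors the rule stated above (a left-hand side, or else an explicit target variable name); it is not how humanmodels parses equations internally.

```python
# Hypothetical helper, not part of humanmodels.
def split_equation(equation_string, target_variable_string=None):
    """Return (target, right_hand_side) for an equation string."""
    if "=" in equation_string:
        # Form 1: "y = 2*x + 4", the target is the left-hand side.
        left, right = equation_string.split("=", 1)
        return left.strip(), right.strip()
    if target_variable_string is None:
        # Form 2 without a target name is ambiguous, so reject it.
        raise ValueError("No left-hand side found: a target variable name is required.")
    # Form 2: "4*x_0 + 5*x_1 + 6*x_2" plus an explicit target name.
    return target_variable_string, equation_string.strip()


first = split_equation("y = 2*x + 4")
second = split_equation("4*x_0 + 5*x_1 + 6*x_2", target_variable_string="y")
```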

check_parameters()

Checks internal parameters for consistency, before starting data fitting.

Returns

Return type

None.

error_function(parameter_values, X, y, verbose=False)

Error function to be optimized. Inside the error function, the local sympy expression will have its parameters replaced with candidate values.

Returns

mse – Mean squared error of the model’s prediction with the candidate parameters, against true values in ‘y’.

Return type

float
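
A minimal sketch of this computation follows. To stay dependency-free it evaluates the expression with Python's built-in eval as a stand-in for sympy substitution, and it takes the expression, parameter names, and mapping as explicit arguments; the function name mse_for_candidate and this signature are assumptions, not the library's API.

```python
# Hypothetical sketch: plain eval stands in for the sympy expression.
def mse_for_candidate(parameter_values, parameter_names, expression, mapping, X, y):
    """Mean squared error of the expression with candidate parameter values."""
    # Replace the parameters with the candidate values.
    parameters = dict(zip(parameter_names, parameter_values))
    total = 0.0
    for sample, target in zip(X, y):
        namespace = dict(parameters)
        namespace.update({name: sample[i] for name, i in mapping.items()})
        prediction = eval(expression, {"__builtins__": {}}, namespace)
        total += (prediction - target) ** 2
    return total / len(y)


X = [[0.0], [1.0], [2.0]]
y = [4.0, 6.0, 8.0]  # generated by y = 2*x + 4
exact = mse_for_candidate([2.0, 4.0], ["a", "b"], "a*x + b", {"x": 0}, X, y)
off_by_one = mse_for_candidate([2.0, 3.0], ["a", "b"], "a*x + b", {"x": 0}, X, y)
```

The candidate (a=2, b=4) reproduces the data exactly, so its error is zero; shifting the intercept by one makes every prediction miss by one, giving a mean squared error of one.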

fit(X, y, map_variables_to_features: Optional[dict] = None, optimizer_options: Optional[dict] = None, optimizer: str = 'bfgs', n_jobs: int = 0, verbose: bool = False)

Fits the internal model to the data, using features in X and known values in y.

Parameters
  • X (array, shape(n_samples, n_features)) – Training data

  • y (array, shape(n_samples, 1)) – Training values for the target feature/variable

  • map_variables_to_features (dict, default=None) – Dictionary describing the mapping between features (in X) and variables (in the model); it is optional because it has normally been provided already when the class was instantiated.

  • optimizer (string, default="bfgs") –

    The optimizer that is going to be used. Acceptable values:
    • “bfgs”, default: The Broyden-Fletcher-Goldfarb-Shanno algorithm, suitable for functions whose derivative can be computed. Generally faster, but might not always work.

    • “cma”: Covariance-Matrix-Adaptation Evolution Strategy, derivative-free optimization. Much slower, but generally more effective than “bfgs”.

  • optimizer_options (dict, default=None) – Dictionary of options that can be passed to the optimization algorithm. Shape and type depend on the choice made for ‘optimizer’.

  • n_jobs (int, default=0) – Number of jobs to be executed in parallel, passed to the optimization algorithm. The default, 0, avoids the use of multiprocessing; -1 uses all available CPUs.

  • verbose (bool, default=False) – If True, prints internal output to screen.

Returns

Return type

None.

predict(X, map_variables_to_features=None, verbose=False)

Once the model is trained, this function can be used to predict the value of the target feature for samples in X. It will fail if the model has not been trained.

Parameters
  • X (array-like or sparse matrix, shape(n_samples, n_features)) – Samples.

  • map_variables_to_features (dict, optional) – A mapping between variables and features can be specified if, for some reason, a mapping different from the one provided during instantiation is needed for this new array. The default is None, in which case the model uses the previously provided mapping.

Returns

C – Returns predicted values.

Return type

array, shape(n_samples)
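
The fallback between the mapping given at instantiation and the one optionally passed to predict can be sketched with a toy class. TinyRegressor is hypothetical and uses Python's built-in eval in place of the sympy machinery; it only illustrates the documented behavior of the map_variables_to_features argument.

```python
# Hypothetical toy class, not the humanmodels implementation.
class TinyRegressor:
    def __init__(self, expression, parameters, map_variables_to_features):
        self.expression = expression
        self.parameters = dict(parameters)  # already-fitted parameter values
        self.map_variables_to_features = map_variables_to_features

    def predict(self, X, map_variables_to_features=None):
        # Use the override if given, otherwise the mapping from instantiation.
        mapping = (
            self.map_variables_to_features
            if map_variables_to_features is None
            else map_variables_to_features
        )
        predictions = []
        for sample in X:
            namespace = dict(self.parameters)
            namespace.update({name: sample[i] for name, i in mapping.items()})
            predictions.append(eval(self.expression, {"__builtins__": {}}, namespace))
        return predictions


model = TinyRegressor("a*x + b", {"a": 2.0, "b": 4.0}, {"x": 0})
X_new = [[1.0, 10.0], [2.0, 20.0]]
default_predictions = model.predict(X_new)
override_predictions = model.predict(X_new, map_variables_to_features={"x": 1})
```

In the second call the variable x is read from column 1 instead of column 0, which is the situation the optional argument is meant for, for example when a new array stores its features in a different order.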

to_string()

Returns

model_description – A human-readable string describing the model.

Return type

str