Scikit-learn For Feature Engineering

Created: 2023-03-02 15:34
#quicknote

Scikit-learn offers a wide range of ML algorithms and data transformations. Most of them follow the same interface, so it is easy to swap them or implement new ones.

There are three kinds of objects:

  • Estimators;
  • Transformers;
  • Pipelines.

Estimators

A class with fit() and predict() methods. fit() learns the model parameters from the training data and predict() uses them to make predictions on new data. Examples: Lasso, decision trees, SVMs, etc.

class Estimator(object):

	def fit(self, X, y=None):
		"""
		Fit the estimator to the data.
		"""
		return self

	def predict(self, X):
		"""
		Compute the predictions.
		"""
		predictions = ...  # placeholder: computed from the fitted parameters
		return predictions
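To make the fit() / predict() contract concrete, here is a minimal sketch with a real estimator, Lasso. The toy data and the alpha value are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Toy data (assumption, for illustration only): y = 2 * x exactly
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])

model = Lasso(alpha=0.01)

# fit() learns the coefficients and returns the estimator itself
fitted = model.fit(X, y)

# predict() applies the learned coefficients to new data
predictions = model.predict(np.array([[5.0]]))
```

Because fit() returns the estimator, calls can be chained, e.g. Lasso(alpha=0.01).fit(X, y).predict(X).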

Transformers

A class that has fit() and transform() methods. fit() learns parameters from the data and transform() applies them to modify the data. Examples: scalers, feature selectors, encoders, etc.

class Transformer(object):

	def fit(self, X, y=None):
		"""
		Learn the parameters to engineer the features.
		"""
		return self

	def transform(self, X):
		"""
		Transform the input data.
		"""
		X_transformed = ...  # placeholder: X modified using the learned parameters
		return X_transformed
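Since every transformer follows this interface, writing a new one is mostly a matter of filling in the two methods. A rough sketch, assuming a made-up transformer called MeanCenterer (not part of scikit-learn) built on the library's BaseEstimator and TransformerMixin base classes:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class MeanCenterer(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: subtracts the column means learned in fit()."""

    def fit(self, X, y=None):
        # Learn the parameters (column means) from the training data
        self.means_ = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def transform(self, X):
        # Apply the learned means to any data with the same columns
        return np.asarray(X, dtype=float) - self.means_


# Toy data (assumption, for illustration only)
X_train = np.array([[1.0, 10.0], [3.0, 30.0]])

centerer = MeanCenterer()
# fit_transform() is provided by TransformerMixin: fit, then transform
X_centered = centerer.fit_transform(X_train)
```

The means are learned once in fit() and reused in transform(), so new data is centered with the training means rather than its own.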

Transformers can handle:

  • Missing data imputation;
  • Categorical variable encoding;
  • Scaling;
  • Discretization;
  • Variable transformation;
  • Combining features;
  • Extracting features from text.
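A few of the tasks above can be sketched with built-in transformers. This is only an illustration on made-up data; the choice of SimpleImputer, StandardScaler and OneHotEncoder is one option among several that scikit-learn provides.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Missing data imputation: replace NaN with the column mean
X_num = np.array([[1.0], [np.nan], [3.0]])
X_imputed = SimpleImputer(strategy="mean").fit_transform(X_num)

# Scaling: zero mean and unit variance per column
X_scaled = StandardScaler().fit_transform(X_imputed)

# Categorical variable encoding: one binary column per category
X_cat = np.array([["red"], ["blue"], ["red"]])
X_encoded = OneHotEncoder().fit_transform(X_cat).toarray()
```

All three follow the same fit() / transform() pattern, which is what lets them be chained later.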

Pipeline

A class that runs transformers and an estimator in sequence. Every step except the last is a Transformer, and the last step is usually an Estimator.

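class Pipeline(Transformer):

	def __init__(self, steps):
		# steps: list of (name, object) pairs, run in order
		self.steps = steps

	@property
	def named_steps(self):
		"""
		The steps, accessible by name.
		"""
		return dict(self.steps)

	@property
	def _final_estimator(self):
		"""
		The last step, usually an Estimator.
		"""
		return self.steps[-1][1]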

Example:
sklearn_pipeline.png
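As a runnable illustration of the same idea (not necessarily the pipeline shown in the image), here is a small sketch that chains two transformers and a final estimator. The step names and the toy data are made up.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (assumption, for illustration only), with one missing value
X = np.array([[1.0], [2.0], [np.nan], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])

pipe = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="mean")),  # transformer
    ("scaler", StandardScaler()),                 # transformer
    ("lasso", Lasso(alpha=0.01)),                 # final estimator
])

# fit() fits and applies each transformer in turn, then fits the estimator
pipe.fit(X, y)

# predict() transforms X with the fitted steps, then calls the estimator's predict()
predictions = pipe.predict(X)
```

Individual steps stay accessible after fitting, for example pipe.named_steps["lasso"].coef_.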

Tags

#mlops #course #featureengineering