Features

Created: 2023-02-13 11:47
#note

Feature validation pipeline:

  • basic feature validation (missing or erroneous data, data formats);
  • data distribution validation (mean, quartiles, etc.);
  • out-of-distribution validation (values outside the expected quartile range, previously unseen class values);
  • correlation validation (feature vs. target)
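
A minimal sketch of the four checks above, using only the Python standard library. The function names, the 1.5 × IQR fence for out-of-distribution values, and the use of Pearson correlation are assumptions made for illustration; the note does not prescribe specific tests or thresholds.

```python
import math
import statistics


def basic_checks(values):
    """Count missing entries (None/NaN) and entries that are not numeric."""
    missing = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    bad_type = sum(
        1 for v in values
        if v is not None and not isinstance(v, (int, float))
    )
    return {"missing": missing, "bad_type": bad_type}


def distribution_summary(values):
    """Mean and quartiles of an already cleaned numeric feature."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"mean": statistics.fmean(values), "q1": q1, "median": median, "q3": q3}


def out_of_distribution(reference, new_values, k=1.5):
    """New values outside [q1 - k*IQR, q3 + k*IQR] of the reference data.

    The k=1.5 fence is an assumed convention, not taken from the note.
    """
    q1, _, q3 = statistics.quantiles(reference, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in new_values if v < low or v > high]


def new_classes(reference, new_values):
    """Class values that appear in new data but not in the reference data."""
    return set(new_values) - set(reference)


def target_correlation(feature, target):
    """Pearson correlation between a feature and the target (NaN if constant)."""
    mx, my = statistics.fmean(feature), statistics.fmean(target)
    cov = sum((x - mx) * (y - my) for x, y in zip(feature, target))
    sx = math.sqrt(sum((x - mx) ** 2 for x in feature))
    sy = math.sqrt(sum((y - my) ** 2 for y in target))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)
```

In a pipeline, each check would typically run against a reference snapshot of the training data and flag the feature when a result crosses an agreed threshold.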

Managed feature store:

  • centralized store for features;
  • preprocessed and ready for ML;
  • shared across multiple teams;
  • regularly updated with new features;
  • registry for features
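
To make the store and registry roles concrete, here is a toy in-memory sketch. `FeatureStore`, `FeatureInfo`, and all method names are hypothetical and do not correspond to any particular product; a managed store would add persistence, access control, and versioning on top of this.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class FeatureInfo:
    """Registry entry: what the feature is and who owns it."""
    name: str
    owner: str
    description: str
    updated_at: datetime


class FeatureStore:
    """Central place where teams register, write, and read features."""

    def __init__(self):
        self._registry = {}  # feature name -> FeatureInfo
        self._values = {}    # feature name -> {entity id -> value}

    def register(self, name, owner, description):
        self._registry[name] = FeatureInfo(
            name, owner, description, datetime.now(timezone.utc)
        )
        self._values.setdefault(name, {})

    def write(self, name, entity_id, value):
        if name not in self._registry:
            raise KeyError(f"feature {name!r} is not registered")
        self._values[name][entity_id] = value
        self._registry[name].updated_at = datetime.now(timezone.utc)

    def read(self, names, entity_id):
        """Return the requested features for one entity (None if absent)."""
        return {n: self._values[n].get(entity_id) for n in names}

    def list_features(self):
        """Registry view: the names of all available features."""
        return sorted(self._registry)
```

Requiring registration before a write is one possible way to keep the registry in step with what is actually stored.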

Best practices:

  • shared ownership with defined responsibilities;
  • flexible schema for regular additions;
  • loosely coupled datasets, yet linkable;
  • up-to-date registry of available data;
  • flexibility for last-mile post-processing;
  • multiple technologies as needed, at the lowest possible cost
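
One way to read "loosely coupled, yet linkable" together with "last-mile post-processing": datasets are kept separately and joined only through a shared entity key, and each consumer derives its own columns at read time. The sketch below assumes each dataset is a dict keyed by entity id; `link` and `last_mile` are hypothetical helper names.

```python
def link(entity_ids, *datasets):
    """Join independent feature tables on a shared entity key.

    Entities missing from a dataset simply lack that dataset's features.
    """
    rows = {}
    for entity_id in entity_ids:
        row = {}
        for dataset in datasets:
            row.update(dataset.get(entity_id, {}))
        rows[entity_id] = row
    return rows


def last_mile(rows, transforms):
    """Add consumer-specific derived features without modifying the input rows."""
    return {
        entity_id: {**row, **{name: fn(row) for name, fn in transforms.items()}}
        for entity_id, row in rows.items()
    }
```

For example, a team could call `link(ids, orders, profile)` and then `last_mile(rows, {"is_active": lambda r: r["order_count"] > 0})` to get a model-specific flag while the shared datasets stay unchanged.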

Tags

#mlops #feature #pipeline