Technologies

Created: 2023-02-13 11:34
#note

Still an evolving field -> lots of tools.

Recommendations:

  • decouple tools as much as possible from the main pipelines;
  • choose tools that suits the use case (classical, NLP, vision);
  • test before you buy;
  • if pipelines are third-party based, understand road map and support aspects;
  • look for extensibility and programmability

Tools:

  • for Data Engineering:
    • Apache Hadoop;
    • Spark;
    • Kafka;
    • RDBMS, NOSQL and distributed databases (MySQL, MongoDB, HDFS);
    • data versioning tools (DVC, lakeFS, Neptune);
  • Experiment management:
  • AutoML:
    • Kubeflow;
    • Databricks;
  • Model registry and life cycle management:
    • MLflow, Neptune, Weights & Biases;
  • Benchmarking:
    • MLPerf, DAWNBench, MLflow

For LLM-specific observability tooling, see LLM Observability, Langfuse, and MLflow.

Tags

#mlops #ml #technologies