Feature Store

Definition

A centralised platform for storing, managing, and serving the engineered features (input variables) used by machine learning models in both training and real-time inference. Feature stores ensure that the same feature values and transformation logic are used in training and in production (avoiding training-serving skew), enable feature reuse across multiple ML models, reduce duplication of feature engineering effort, and provide a governance layer for tracking feature lineage and ownership. They are a key component of mature MLOps infrastructure and represent a significant technology intangible asset.
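
The core idea can be sketched in a few lines: features are written once and retrieved through a single path for both training-set assembly and online inference. This is a minimal in-memory illustration, not a production design; real feature stores pair an offline store for training with a low-latency online store, and all names here are invented for the example.

```python
from datetime import datetime, timezone

class FeatureStore:
    """Toy in-memory feature store, for illustration only."""

    def __init__(self):
        # {feature_name: {entity_id: (write_timestamp, value)}}
        self._features = {}

    def write(self, feature_name, entity_id, value):
        # Keep the latest value per entity, stamped with write time.
        self._features.setdefault(feature_name, {})[entity_id] = (
            datetime.now(timezone.utc), value
        )

    def read(self, feature_names, entity_id):
        # The same retrieval path serves training and inference,
        # which is what prevents training-serving skew.
        return {
            name: self._features.get(name, {}).get(entity_id, (None, None))[1]
            for name in feature_names
        }

store = FeatureStore()
store.write("avg_order_value_30d", "customer_42", 87.50)
store.write("orders_last_7d", "customer_42", 3)

features = store.read(["avg_order_value_30d", "orders_last_7d"], "customer_42")
print(features)
```

Because both the training pipeline and the serving endpoint call the same `read`, a change to how a feature is computed propagates to both environments at once.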

Complementary Terms

Concepts that frequently appear alongside Feature Store in practice.

Data Lake

A centralised repository that stores large volumes of raw data in its native format — structured, semi-structured, and unstructured — until it is needed for analysis. Unlike data warehouses, which store data in predefined schemas, data lakes use a schema-on-read approach that provides flexibility for diverse analytical workloads including machine learning, real-time analytics, and ad hoc exploration.
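
Schema-on-read can be illustrated with a toy example: raw records land in the lake as-is, and each consumer projects the fields it needs and coerces types only at query time. The records and field names below are invented for illustration.

```python
import json

# Raw records land in the "lake" in native form; nothing is enforced on
# write (schema-on-write would reject the inconsistent rows below).
RAW_RECORDS = [
    '{"user": "a", "amount": "12.5", "country": "UK"}',
    '{"user": "b", "amount": 8}',                      # missing country
    '{"user": "c", "amount": "3.0", "extra": true}',   # unexpected field
]

def read_with_schema(lines):
    # Schema-on-read: the consumer imposes its own schema at query time,
    # tolerating variation in the underlying raw data.
    for line in lines:
        rec = json.loads(line)
        yield {
            "user": rec["user"],
            "amount": float(rec["amount"]),
            "country": rec.get("country", "unknown"),
        }

rows = list(read_with_schema(RAW_RECORDS))
for row in rows:
    print(row)
```

A different consumer could read the same raw records with a different schema, which is the flexibility the definition refers to.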

Model Drift

The degradation in a machine learning model's predictive accuracy over time as the statistical properties of the input data diverge from the training data distribution. Model drift requires ongoing monitoring and periodic retraining to maintain performance, and is a key operational risk in production AI systems.
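
One common way to monitor for drift is to compare the distribution of a live input feature against its training baseline. The sketch below uses the Population Stability Index (PSI), one of several possible drift metrics; the threshold of roughly 0.2 is a widely used rule of thumb, not a standard, and the data is synthetic.

```python
import math
import random

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample and a live
    sample; values above roughly 0.2 are commonly read as drift."""
    lo, hi = min(expected), max(expected)

    def shares(sample):
        counts = [0] * bins
        for x in sample:
            # Bin by the baseline's range; clamp so outliers still count.
            idx = int((x - lo) / (hi - lo) * bins) if hi > lo else 0
            counts[max(0, min(idx, bins - 1))] += 1
        return [max(c / len(sample), 1e-6) for c in counts]  # avoid log(0)

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

random.seed(0)
baseline = [random.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [random.gauss(0.5, 1.0) for _ in range(5000)]  # input mean drifted

print(round(psi(baseline, baseline), 4))  # stable input: near zero
print(round(psi(baseline, shifted), 4))   # shifted input: elevated, flags drift
```

In production this comparison would run on a schedule per feature, with alerts feeding the retraining decision the definition describes.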

MLOps

A set of practices combining machine learning, DevOps, and data engineering to standardise and streamline the end-to-end lifecycle of machine learning models, from development through deployment to monitoring. MLOps encompasses version control for models and data, automated testing, continuous integration and deployment, and model performance monitoring in production.

Data Mesh

A decentralised data architecture paradigm that treats data as a product owned by domain-specific teams rather than centralising all data management in a single platform team. Data mesh is built on four principles: domain ownership, data as a product, self-serve data infrastructure, and federated computational governance.

Data Pipeline

An automated sequence of data processing steps that extracts, transforms, and loads data from source systems into target systems for analysis, reporting, or machine learning model training. Well-architected data pipelines are critical infrastructure assets that enable data-driven decision-making and AI deployment, and their reliability directly impacts downstream business processes.
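
The extract-transform-load sequence can be sketched end to end in a few functions. This is a deliberately small illustration: the CSV source is inline, SQLite stands in for a warehouse, and the table and column names are invented for the example.

```python
import csv
import io
import sqlite3

# Extract: read raw records from a source system (inline CSV here).
RAW = "order_id,amount\n1,10.50\n2,bad\n3,7.25\n"

def extract(source):
    return list(csv.DictReader(io.StringIO(source)))

# Transform: validate and clean; drop rows that fail type parsing.
def transform(rows):
    clean = []
    for row in rows:
        try:
            clean.append((int(row["order_id"]), float(row["amount"])))
        except ValueError:
            continue  # a real pipeline would route this to a dead-letter queue
    return clean

# Load: write into the target system (SQLite stands in for a warehouse).
def load(rows, conn):
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id INT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)
print(conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone())
# → (2, 17.75): the malformed row was dropped during transform
```

The reliability point in the definition shows up even here: the one malformed row is handled explicitly rather than silently corrupting the downstream table.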

API Economy

The ecosystem of business models, partnerships, and revenue streams enabled by application programming interfaces that allow software systems to communicate and share data. APIs enable companies to monetise their data and functionality, create platform ecosystems, and embed services into third-party applications.

Prompt Engineering

The practice of designing and optimising input instructions (prompts) to elicit desired outputs from large language models and other generative AI systems. Effective prompt engineering can significantly improve AI output quality and consistency, and documented prompt libraries are emerging as a form of organisational knowledge capital.
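
A documented prompt library can be as simple as named, versioned templates with the variables each one expects. The template name, wording, and variables below are invented for illustration; the point is that prompts are stored and rendered consistently rather than improvised per call.

```python
from string import Template

# A minimal prompt "library": named, versioned templates.
PROMPTS = {
    "summarise_v2": Template(
        "You are a concise analyst.\n"
        "Summarise the text below in at most $max_sentences sentences, "
        "focusing on $focus.\n\nText:\n$text"
    ),
}

def render(name, **variables):
    # substitute() raises on a missing variable, surfacing a
    # template/variable mismatch early instead of sending a broken prompt.
    return PROMPTS[name].substitute(**variables)

prompt = render(
    "summarise_v2",
    max_sentences=2,
    focus="financial risk",
    text="Quarterly revenue fell 4% while churn rose...",
)
print(prompt)
```

Keeping templates under version control (hence the `_v2` suffix) is what turns prompts into the reusable knowledge capital the definition describes.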

Data Lineage

The documented lifecycle of data as it moves through an organisation's systems, showing its origin, transformations, dependencies, and destinations. Data lineage provides visibility into how data is created, processed, and consumed, enabling organisations to ensure data quality, comply with regulatory requirements (particularly GDPR's right to explanation), debug data pipeline issues, and assess the impact of system changes.
