Amazon SageMaker Feature Store
A fully managed repository for machine learning features
Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, update, retrieve, and share machine learning (ML) features.
Features are the attributes or properties models use during training and inference to make predictions. For example, in an ML application that recommends a music playlist, features could include song ratings, which songs were listened to previously, and how long songs were listened to. The accuracy of an ML model depends on the precise set and composition of its features. Often, these features are used repeatedly by multiple teams training multiple models, and whichever feature set was used to train a model must also be available to make real-time predictions (inference). Keeping a single source of features that is consistent and up to date across these different access patterns is a challenge, because most organizations keep two different feature stores: one for training and one for inference.
Amazon SageMaker Feature Store is a purpose-built repository where you can store and access features so it’s much easier to name, organize, and reuse them across teams. SageMaker Feature Store provides a unified store for features during training and real-time inference without the need to write additional code or create manual processes to keep features consistent. SageMaker Feature Store keeps track of the metadata of stored features (for example, feature name or version number) so that you can query the features for the right attributes in batches or in real time using Amazon Athena, an interactive query service. SageMaker Feature Store also keeps features up to date: as new data is generated during inference, the single repository is updated, so new features are always available for models to use during training and inference.
Key Features
Ingest data from many sources
There are many ways to ingest features into Amazon SageMaker Feature Store. You can use streaming data sources like Amazon Kinesis Data Firehose. You can also create features in data preparation tools such as Amazon SageMaker Data Wrangler, and store them directly into SageMaker Feature Store with just a few clicks.
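Features can also be written programmatically. The sketch below is a minimal illustration, not a complete ingestion pipeline: it converts a plain Python dictionary into the list of FeatureName/ValueAsString pairs that the Feature Store runtime PutRecord API takes. The helper names, the feature names, and the example values are hypothetical, and the `ingest` function assumes boto3 is installed, AWS credentials are configured, and the target feature group already exists.

```python
from datetime import datetime, timezone


def to_feature_record(features):
    """Convert a {name: value} dict into the Record list used by PutRecord.

    Every value is sent as a string; None values are skipped.
    """
    return [
        {"FeatureName": name, "ValueAsString": str(value)}
        for name, value in features.items()
        if value is not None
    ]


def ingest(feature_group_name, features):
    """Write one record to a feature group (needs AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("sagemaker-featurestore-runtime")
    client.put_record(
        FeatureGroupName=feature_group_name,
        Record=to_feature_record(features),
    )


# Hypothetical features for one listener in the playlist example.
example = {
    "listener_id": "u-123",
    "last_song_rating": 4.5,
    "seconds_listened": 187,
    "favorite_genre": None,  # missing value, left out of the record
    "event_time": datetime(2024, 1, 1, tzinfo=timezone.utc).strftime(
        "%Y-%m-%dT%H:%M:%SZ"
    ),
}
record = to_feature_record(example)
```

Only `to_feature_record` runs when the block is executed; calling `ingest("my-feature-group", example)` with a real feature group name would send the record to the service.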
Search and discovery
Amazon SageMaker Feature Store tags and indexes features so they are easily discoverable through a visual interface in SageMaker Studio. Browsing the feature catalog allows teams to understand features better and determine if a feature is useful for a particular model.
Ensure feature consistency
Amazon SageMaker Feature Store helps ensure models make accurate predictions by making the same features available for both training and inference. Training and inference are very different use cases with different storage requirements, and SageMaker Feature Store addresses both. Training runs over a complete data set and often takes hours, while inference has to happen in milliseconds and usually requires only a subset of the data. For example, in a model that predicts the next best song in a playlist, you train the model on thousands of songs, but during inference, SageMaker Feature Store accesses only the last three songs to predict the next song. SageMaker Feature Store allows models to access the same set of features for training runs (which are usually done offline and in batches) and for real-time inference.
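The two access patterns can be pictured with a toy in-memory store. This is only a conceptual sketch in plain Python and does not reflect how SageMaker Feature Store is implemented or its API; the class `ToyFeatureStore` and its methods are made up for illustration. A single write path feeds both a full history for batch training reads and a per-listener view for fast lookups of the last three songs.

```python
from collections import defaultdict


class ToyFeatureStore:
    """In-memory illustration: one write path, two read paths."""

    def __init__(self):
        self._offline = []                # append-only history for training
        self._online = defaultdict(list)  # records per listener for inference

    def put(self, listener_id, event_time, song_id):
        record = {
            "listener_id": listener_id,
            "event_time": event_time,
            "song_id": song_id,
        }
        # The same record lands in both views, so they cannot drift apart.
        self._offline.append(record)
        self._online[listener_id].append(record)

    def training_set(self):
        """Batch read: the full history, ordered by event time."""
        return sorted(self._offline, key=lambda r: r["event_time"])

    def latest(self, listener_id, n=3):
        """Low-latency read: the n most recent songs for one listener."""
        recent = sorted(self._online[listener_id], key=lambda r: r["event_time"])
        return [r["song_id"] for r in recent[-n:]]


store = ToyFeatureStore()
for t, song in enumerate(["a", "b", "c", "d", "e"]):
    store.put("u-123", t, song)

history = store.training_set()  # all five plays, for training
recent = store.latest("u-123")  # the last three plays, for inference
```

Because both reads are served from records written through the same `put` call, the features seen at inference time are by construction a subset of those used for training.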
Feature standardization
It’s common to see different definitions for similar features across a business. For example, “temperature” could be defined in Celsius or Fahrenheit, or “dates” could be represented as day-month-year or month-day-year. Amazon SageMaker Feature Store eliminates confusion across teams by storing feature definitions in a single repository so that it’s clear how each feature is defined. Having features clearly defined makes it easier to reuse features for different applications.
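As a small illustration of why a single set of definitions helps, the sketch below keeps one hypothetical registry of feature definitions and converts incoming temperature readings to the unit the registry declares. The `FeatureDefinition` class, the `REGISTRY` contents, and the `unit` field are assumptions made for this example and are not part of the SageMaker Feature Store API; the type names "Fractional" and "String" are used here simply as labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureDefinition:
    name: str
    dtype: str
    unit: str
    description: str


# One shared place where every team looks up how a feature is defined.
REGISTRY = {
    "temperature": FeatureDefinition(
        "temperature", "Fractional", "celsius", "Ambient temperature at event time"
    ),
    "event_date": FeatureDefinition(
        "event_date", "String", "YYYY-MM-DD", "Calendar date of the event"
    ),
}


def fahrenheit_to_celsius(value_f):
    return (value_f - 32.0) * 5.0 / 9.0


def standardize_temperature(value, unit):
    """Convert a reading to the unit declared in the registry."""
    target = REGISTRY["temperature"].unit
    if unit == target:
        return float(value)
    if unit == "fahrenheit" and target == "celsius":
        return fahrenheit_to_celsius(float(value))
    raise ValueError(f"unsupported unit: {unit}")


boiling_c = standardize_temperature(212, "fahrenheit")
room_c = standardize_temperature(21.5, "celsius")
```

A team that records temperature in Fahrenheit can then write to the shared feature without changing what "temperature" means for everyone else.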
Integrate with Amazon SageMaker Pipelines
Amazon SageMaker Feature Store integrates with Amazon SageMaker Pipelines, so you can build automated machine learning workflows that include feature search, discovery, and reuse.