Posted On: Oct 18, 2021
Amazon SageMaker announces a new set of capabilities that enable interactive, Spark-based data processing from SageMaker Studio Notebooks. Amazon SageMaker Studio is the first fully integrated development environment (IDE) for machine learning (ML). SageMaker Studio provides a single, web-based visual interface where you can perform all the ML development steps required to prepare data and to build, train, and deploy models. With a single click, data scientists and developers can spin up Studio Notebooks to interactively explore datasets and build ML models.
Starting today, data scientists and data engineers can visually browse, discover, and connect to Spark data processing environments running on Amazon EMR, right from their Studio Notebooks, in a few clicks. Once connected, they can interactively query, explore, and visualize data, and run Spark jobs using the built-in SparkMagic notebook environments for Python and Scala.
Analyzing, transforming, and preparing large amounts of data is a foundational step of any data science and ML workflow, and businesses are increasingly using Apache Spark for fast data preparation. SageMaker Studio already offers purpose-built tooling for ML, such as Experiments, Clarify, and Model Monitor. With the newly launched capability, customers can also access purpose-built Spark environments from Studio Notebooks. SageMaker Studio can therefore serve as a unified environment for data science and data engineering workflows, enabling customers to standardize their data workflows on Studio Notebooks.
These new data analytics capabilities in SageMaker Studio are generally available in both the Amazon Web Services China (Beijing) Region, operated by Sinnet, and the Amazon Web Services China (Ningxia) Region, operated by NWCD. To learn more about SageMaker Studio, visit the SageMaker user guide.