
Analytics Services

AWS Data Exchange

What it is:
AWS Data Exchange makes it easy to find, subscribe to, and use third-party datasets in the cloud, such as demographics, weather, or financial data.

Why it matters:

  • Enables external data integration for AI/ML models
  • Automates data subscription, delivery, and updates
  • Helps enhance model accuracy with premium datasets

Typical Use Cases:

  • Enriching ML models with weather or location data
  • Using healthcare or financial datasets from third parties
  • Automating ingestion of licensed datasets into S3 or Redshift
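
The last use case, automated ingestion into S3, might look roughly like the sketch below. It assumes a client that behaves like boto3's `dataexchange` client; the function name and all IDs are hypothetical placeholders, and the client is passed in so the logic can be exercised without AWS credentials.

```python
def export_revision_to_s3(client, data_set_id, revision_id, bucket):
    """Create and start a job that exports one data set revision to S3.

    `client` is assumed to behave like boto3.client("dataexchange").
    The IDs and bucket name are supplied by the caller (placeholders here).
    """
    job = client.create_job(
        Type="EXPORT_REVISIONS_TO_S3",
        Details={
            "ExportRevisionsToS3": {
                "DataSetId": data_set_id,
                "RevisionDestinations": [
                    {"RevisionId": revision_id, "Bucket": bucket}
                ],
            }
        },
    )
    # A job is only created above; it does not run until it is started.
    client.start_job(JobId=job["Id"])
    return job["Id"]
```

In practice this would be called as `export_revision_to_s3(boto3.client("dataexchange"), ...)`, typically from a scheduled task or an event handler that reacts to new revisions.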


Amazon EMR

What it is:
Amazon EMR is a managed cluster platform that runs big data frameworks like Apache Spark, Hive, and Hadoop for data processing and transformation at scale.

Why it matters:

  • Supports large-scale data preprocessing for ML
  • Scales to petabytes of structured or unstructured data
  • Integrates with S3, HDFS, Redshift, and more

Typical Use Cases:

  • Preprocessing datasets for ML models
  • Running Spark ML jobs at scale
  • Performing distributed feature engineering
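
As an illustration of running a Spark job on an existing cluster, the sketch below builds an EMR step that calls `spark-submit` and submits it. It assumes a client that behaves like boto3's `emr` client; the helper names, cluster ID, and script location are hypothetical.

```python
def spark_step(name, script_s3_uri, *script_args):
    """Build an EMR step definition that runs a PySpark script via spark-submit."""
    return {
        "Name": name,
        # Keep the cluster running if this step fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                script_s3_uri,
                *script_args,
            ],
        },
    }


def submit_spark_step(emr_client, cluster_id, step):
    """Add one step to a running cluster and return its step ID.

    `emr_client` is assumed to behave like boto3.client("emr").
    """
    response = emr_client.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]
```

The feature engineering or training logic itself would live in the PySpark script stored in S3; the step only tells the cluster how to launch it.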


AWS Glue

What it is:
AWS Glue is a serverless data integration service that discovers, prepares, and combines data for analytics and ML through ETL (extract, transform, load) pipelines.

Why it matters:

  • Automates data cataloging, cleaning, and transformation
  • Integrates directly with S3, Redshift, and RDS
  • Supports Python- and Spark-based ETL jobs

Typical Use Cases:

  • Cleaning and joining ML training data
  • Building ETL pipelines for AI dashboards
  • Creating feature pipelines for SageMaker models
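
A pipeline like the ones above is usually triggered programmatically. The following is a minimal sketch, assuming a client that behaves like boto3's `glue` client and a Glue job that already exists; the function name, job name, and argument keys are hypothetical.

```python
import time

# Job run states after which polling can stop.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def run_glue_job(glue_client, job_name, arguments, poll_seconds=30):
    """Start a Glue job run and block until it reaches a terminal state.

    `glue_client` is assumed to behave like boto3.client("glue").
    `arguments` maps job parameter names (conventionally prefixed with "--")
    to string values.
    """
    run_id = glue_client.start_job_run(
        JobName=job_name, Arguments=arguments
    )["JobRunId"]

    while True:
        run = glue_client.get_job_run(JobName=job_name, RunId=run_id)
        state = run["JobRun"]["JobRunState"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
```

Polling keeps the sketch simple; an event-driven trigger would avoid holding a process open for long-running jobs.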


AWS Glue DataBrew

What it is:
Glue DataBrew is a visual data preparation tool for users who want to clean and normalize data without writing code.

Why it matters:

  • Enables non-developers to explore and prepare datasets
  • Provides 250+ built-in transformations (e.g., deduplication, joins)
  • Accelerates data prep for ML pipelines and dashboards

Typical Use Cases:

  • Exploring AI/ML datasets visually
  • Removing outliers and fixing null values before model training
  • Generating reusable transformations with no code


AWS Lake Formation

What it is:
Lake Formation helps you build, secure, and manage data lakes on AWS. It simplifies ingesting, cataloging, and securing data from various sources into S3.

Why it matters:

  • Makes it easier to create a centralized data lake for AI
  • Provides fine-grained data access control
  • Integrates with Glue, Athena, Redshift, and SageMaker

Typical Use Cases:

  • Creating data lakes for AI training and analysis
  • Managing data access permissions for teams
  • Curating and tagging ML training datasets
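
Fine-grained access control can be illustrated with a column-level grant. The sketch below assumes a client that behaves like boto3's `lakeformation` client; the function name, principal ARN, and database, table, and column names are hypothetical.

```python
def grant_column_select(lf_client, principal_arn, database, table, columns):
    """Grant SELECT on specific columns of one catalog table to a principal.

    `lf_client` is assumed to behave like boto3.client("lakeformation").
    """
    lf_client.grant_permissions(
        Principal={"DataLakePrincipalIdentifier": principal_arn},
        Resource={
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                # Only these columns become readable; others stay hidden.
                "ColumnNames": list(columns),
            }
        },
        Permissions=["SELECT"],
    )
```

A grant like this lets a data science team read training features while sensitive columns in the same table remain inaccessible to them.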


Amazon OpenSearch Service

What it is:
OpenSearch Service is a managed search and analytics engine that supports full-text search, log analytics, and vector search for AI use cases.

Why it matters:

  • Supports semantic search and RAG (Retrieval-Augmented Generation)
  • Serves as a vector store for Amazon Bedrock Knowledge Bases
  • Includes k-NN vector indexing for similarity search

Typical Use Cases:

  • Powering AI chatbots with semantic search
  • Storing and retrieving embeddings for vector search
  • Building analytics dashboards from log data
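
To make the k-NN vector indexing concrete, the sketch below builds the two JSON bodies involved: one that creates an index with a `knn_vector` field, and one that retrieves the `k` nearest stored embeddings. The helper names, field name, and vector dimension are assumptions; the bodies would be sent with whatever OpenSearch client the application uses.

```python
import json


def knn_index_body(field, dimension):
    """Request body for creating an index with one k-NN vector field."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                field: {"type": "knn_vector", "dimension": dimension}
            }
        },
    }


def knn_query_body(field, vector, k):
    """Request body for finding the k stored vectors nearest to `vector`."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": list(vector), "k": k}}},
    }


if __name__ == "__main__":
    # Tiny 3-dimensional example; real embeddings have far more dimensions.
    print(json.dumps(knn_index_body("embedding", 3), indent=2))
    print(json.dumps(knn_query_body("embedding", [0.1, 0.2, 0.3], k=5), indent=2))
```

In a RAG setup, the query vector would be the embedding of the user's question, and the returned documents would be passed to the model as context.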


Amazon QuickSight

What it is:
QuickSight is AWS’s business intelligence and data visualization tool that helps create dashboards, reports, and charts from various data sources.

Why it matters:

  • Allows near-real-time visualization of AI/ML results
  • Supports embedded dashboards in apps
  • Uses ML-powered insights (e.g., anomaly detection, forecasting)

Typical Use Cases:

  • Visualizing model predictions or performance metrics
  • Creating dashboards for business stakeholders
  • Monitoring usage and accuracy trends for ML solutions


Amazon Redshift

What it is:
Amazon Redshift is a fully managed cloud data warehouse that lets you analyze structured and semi-structured data at scale using SQL.

Why it matters:

  • Provides Redshift ML, which trains models through SageMaker using SQL
  • Runs trained models as SQL functions directly in the warehouse
  • Handles petabyte-scale analytics

Typical Use Cases:

  • Running AI inference directly in SQL queries
  • Building AI-powered dashboards from transactional data
  • Training ML models on aggregated data
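
Running inference in SQL can be sketched with Redshift ML's `CREATE MODEL` statement. The helpers below only render SQL text; the function names, model, table, column, role, and bucket names are all hypothetical, and the plain string formatting is for illustration only (it does no escaping, so it should not be fed untrusted input).

```python
def create_model_sql(model, training_query, target, function, iam_role_arn, s3_bucket):
    """Render a Redshift ML CREATE MODEL statement.

    The model is trained on the rows returned by `training_query` and is
    exposed afterwards as the SQL function `function`.
    """
    return (
        f"CREATE MODEL {model}\n"
        f"FROM ({training_query})\n"
        f"TARGET {target}\n"
        f"FUNCTION {function}\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"SETTINGS (S3_BUCKET '{s3_bucket}');"
    )


def predict_sql(function, feature_columns, table):
    """Render a query that calls the trained model's SQL function per row."""
    columns = ", ".join(feature_columns)
    return f"SELECT {function}({columns}) AS prediction FROM {table};"


if __name__ == "__main__":
    print(create_model_sql(
        "churn_model",
        "SELECT age, plan, churned FROM customers",
        "churned",
        "predict_churn",
        "arn:aws:iam::111122223333:role/example-redshift-ml",
        "example-bucket",
    ))
    print(predict_sql("predict_churn", ["age", "plan"], "new_customers"))
```

Once the model has finished training, the prediction function can be used in any query, which is what allows dashboards to show predictions without a separate inference service.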
