In-scope AWS services and features
The following list contains AWS services and features that are in scope for the exam. This list is non-exhaustive and is subject to change. AWS offerings appear in categories that align with the offerings’ primary functions:
Machine Learning
Amazon Textract (OCR)
What it is:
Amazon Textract extracts printed and handwritten text, tables, and forms from scanned documents using OCR (Optical Character Recognition).
Typical Use Cases:
- Automating form processing (e.g., tax, insurance)
- Digitizing PDFs and scanned documents
- Extracting structured data for analysis
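To make the extraction step concrete, here is a minimal sketch of pulling text lines out of a Textract-style response. The sample dict mimics the shape of Textract's DetectDocumentText output (a list of Blocks with BlockType and Text); the document contents are invented for illustration.

```python
# Sketch: collecting the text of every LINE block from a Textract-style
# response. The sample_response below is illustrative, not a real API result.

def extract_lines(response: dict) -> list:
    """Return the text of each LINE block, in reading order."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block["BlockType"] == "LINE"
    ]

sample_response = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Invoice #1234"},
        {"BlockType": "WORD", "Text": "Invoice"},
        {"BlockType": "LINE", "Text": "Total: $56.00"},
    ]
}

print(extract_lines(sample_response))  # ['Invoice #1234', 'Total: $56.00']
```

In a real pipeline the same loop would run over the response returned by the Textract API, with FORM and TABLE blocks handled similarly.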
Amazon Comprehend
What it is:
Amazon Comprehend is a Natural Language Processing (NLP) service that uses ML to uncover insights from text — like identifying entities, language, sentiment, and key phrases.
Typical Use Cases:
- Analyzing customer feedback
- Tagging documents automatically
- Detecting personally identifiable information (PII)
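Sentiment analysis results come back as a set of per-label confidence scores. A minimal sketch of picking the dominant label from a Comprehend-style SentimentScore map (the score values here are made up):

```python
# Sketch: choosing the dominant sentiment from a Comprehend-style score map.
# The dict mirrors the shape of DetectSentiment's SentimentScore field;
# the numbers are illustrative.

def dominant_sentiment(scores: dict) -> str:
    """Return the label with the highest confidence score."""
    return max(scores, key=scores.get)

scores = {"Positive": 0.91, "Negative": 0.02, "Neutral": 0.05, "Mixed": 0.02}
print(dominant_sentiment(scores))  # Positive
```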
Amazon Transcribe (STT)
What it is:
Amazon Transcribe converts audio into accurate, readable text using ASR (Automatic Speech Recognition). It supports real-time and batch transcription.
Typical Use Cases:
- Meeting transcriptions
- Voice command logging
- Subtitles for audio/video content
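A batch transcription job writes its result to a JSON file; application code then reads the transcript text back out. A sketch of that read step, using a sample payload whose structure is illustrative of Transcribe's output shape:

```python
# Sketch: reading the transcript text out of a Transcribe-style result file.
# The JSON structure below (results.transcripts[].transcript) is illustrative
# of the shape Transcribe writes; the job name and text are invented.
import json

def read_transcript(raw_json: str) -> str:
    doc = json.loads(raw_json)
    transcripts = doc["results"]["transcripts"]
    return " ".join(t["transcript"] for t in transcripts)

sample = json.dumps({
    "jobName": "meeting-2024-01-15",
    "results": {"transcripts": [{"transcript": "Welcome everyone to the standup."}]},
})
print(read_transcript(sample))  # Welcome everyone to the standup.
```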
Amazon Polly (TTS)
What it is:
Amazon Polly converts text into natural-sounding human speech using advanced deep learning technologies. It supports dozens of languages and voice styles.
Typical Use Cases:
- Reading text aloud for accessibility
- Creating voice responses for chatbots
- Generating audio for training content or news
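Polly accepts SSML markup for finer control over pauses and pronunciation. A minimal sketch of wrapping plain sentences in SSML before synthesis; the tags used (`<speak>`, `<break>`) are standard SSML, and the helper name is my own:

```python
# Sketch: building a minimal SSML document for a text-to-speech request.
# Escaping is handled with the stdlib so user text can't break the markup.
from xml.sax.saxutils import escape

def to_ssml(sentences: list, pause_ms: int = 300) -> str:
    """Join sentences with a short pause between them."""
    body = f'<break time="{pause_ms}ms"/>'.join(escape(s) for s in sentences)
    return f"<speak>{body}</speak>"

print(to_ssml(["Hello.", "Welcome to the course."]))
# <speak>Hello.<break time="300ms"/>Welcome to the course.</speak>
```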
Amazon Translate
What it is:
Amazon Translate is a neural machine translation service that allows real-time and batch translation between dozens of languages.
Typical Use Cases:
- Multilingual chat applications
- Document localization
- Translating user-generated content
Amazon Lex (ASR & NLU)
What it is:
Amazon Lex is a service for building conversational interfaces using voice and text — similar to how Alexa works. It combines Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU).
Typical Use Cases:
- Customer support chatbots
- Voice-enabled apps and IVRs
- Automated service desks
Amazon Fraud Detector
What it is:
Amazon Fraud Detector helps detect potentially fraudulent activities in real time using pre-built ML models tailored to fraud detection scenarios.
Typical Use Cases:
- Identifying suspicious online account signups
- Flagging fraudulent payment attempts
- Detecting identity theft in transactions
Amazon Personalize (Recommendation)
What it is:
Amazon Personalize is a real-time recommendation engine that creates personalized user experiences using your own data — no ML experience required.
Typical Use Cases:
- Personalized product recommendations
- Video or music streaming suggestions
- Content ranking based on user behavior
Amazon Rekognition (Computer Vision)
What it is:
Amazon Rekognition is a computer vision service that uses deep learning to analyze images and videos. It can detect objects, scenes, faces, text, and inappropriate content, and also supports facial analysis and facial recognition.
Typical Use Cases:
- Facial recognition for user verification or security
- Content moderation for images and videos
- Detecting objects and scenes in media assets
- Analyzing sentiment or demographics from facial attributes
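Detection results arrive as labels with confidence scores, and applications typically keep only high-confidence hits. A sketch of that filtering step over a Rekognition-style label list (names and scores invented):

```python
# Sketch: applying a confidence threshold to Rekognition-style label
# detections. The list mimics the shape of DetectLabels output.

def confident_labels(labels: list, min_confidence: float = 90.0) -> list:
    """Keep only labels at or above the confidence threshold."""
    return [l["Name"] for l in labels if l["Confidence"] >= min_confidence]

detections = [
    {"Name": "Person", "Confidence": 99.2},
    {"Name": "Bicycle", "Confidence": 93.5},
    {"Name": "Helmet", "Confidence": 71.0},
]
print(confident_labels(detections))  # ['Person', 'Bicycle']
```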
Amazon SageMaker
What it is:
Amazon SageMaker is a comprehensive platform to build, train, tune, and deploy custom machine learning models. It supports everything from data prep to production deployment.
Typical Use Cases:
- Training deep learning models (e.g., NLP, vision)
- Hosting and serving models at scale
- Creating MLOps pipelines
- Canvas
- JumpStart
- Ground Truth
- Data Wrangler
- Clarify
- Feature Store
- Model Monitor
- Model Cards
Amazon SageMaker Canvas is a no-code tool that enables users to build accurate ML models without any ML expertise.
Key Features:
- Drag-and-drop interface with no coding required.
- Access ready-to-use foundation models from Amazon Bedrock and SageMaker JumpStart.
- Build custom ML models using AutoML powered by SageMaker Autopilot.
Typical Use Cases:
- Empower business analysts to create predictive models.
- Rapidly prototype and test ML use cases without engineering help.
- Automate model building and deployment workflows.
Why it matters:
It democratizes ML by making model creation accessible to non-technical users.
Amazon SageMaker JumpStart is a hub for quick access to pretrained models and solutions to accelerate ML adoption.
Key Features:
- Browse, evaluate, and deploy Foundation Models (FMs).
- Customize pretrained models with your own data.
- Perform tasks like text summarization, image generation, and more with minimal setup.
Typical Use Cases:
- Kickstart new ML projects using pretrained models.
- Rapidly test generative AI use cases.
- Deploy production-ready models with minimal effort.
Why it matters:
It accelerates the ML journey by providing reusable assets and templates for fast experimentation and deployment.
Amazon SageMaker Ground Truth is a fully managed data labeling service with human-in-the-loop capabilities.
Key Features:
- Manage data generation, annotation, and model review.
- Use Amazon Augmented AI (A2I) for custom human review workflows.
- Supports both self-service and AWS-managed labeling options.
Typical Use Cases:
- Create high-quality labeled training datasets.
- Improve model accuracy with human feedback.
- Add human validation for sensitive or ambiguous predictions.
Why it matters:
It ensures labeled data is accurate and relevant, improving ML model performance and trustworthiness.
Amazon SageMaker Data Wrangler simplifies data preparation and feature engineering for ML.
Key Features:
- Visual interface for data selection, cleaning, exploration, and processing.
- Reduces weeks of data prep to minutes.
- Scales to handle large tabular and image datasets.
Typical Use Cases:
- Prepare raw data for training.
- Automate feature engineering tasks.
- Quickly visualize and clean datasets.
Why it matters:
It saves time and effort, streamlining the often time-consuming data preparation stage in the ML workflow.
Amazon SageMaker Clarify detects bias, explains model predictions, and provides fairness and explainability tools for your ML models.
Key Features:
- Analyze data for bias during preparation.
- Evaluate foundation models for accuracy, robustness, and toxicity.
- Explain input feature importance during development and inference.
- Integrates with SageMaker Experiments to show feature importance graphs.
Typical Use Cases:
- Detect unintended bias in datasets and models.
- Understand why a model made a certain prediction.
- Support responsible AI by ensuring transparency in ML workflows.
Why it matters:
It strengthens fairness, accountability, and trust in AI systems by making models more explainable and bias-aware.
Amazon SageMaker Feature Store is a fully managed repository for storing, sharing, and managing ML features.
Key Features:
- Centralized store for features and metadata.
- Supports point-in-time queries to retrieve feature values historically.
- Tracks feature lineage and processing workflows.
Typical Use Cases:
- Reuse features across multiple ML projects.
- Reduce repetitive data processing.
- Ensure consistency between training and inference data.
Why it matters:
It improves efficiency and consistency in feature engineering, boosting model performance and reproducibility.
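The "point-in-time query" idea above is worth making concrete: for a given entity and timestamp, return the latest feature value written at or before that time, so training never leaks future data. A toy version over an in-memory history (a real Feature Store does this at scale):

```python
# Sketch: a point-in-time feature lookup over (event_time, value) records.
# Timestamps are ISO-8601 strings, so lexicographic order matches time order.

def point_in_time_value(history: list, as_of: str):
    """Return the latest value recorded at or before `as_of`, or None."""
    eligible = [(t, v) for t, v in history if t <= as_of]
    if not eligible:
        return None
    return max(eligible)[1]  # the latest eligible record wins

history = [
    ("2024-01-01T00:00:00", 0.12),
    ("2024-02-01T00:00:00", 0.34),
    ("2024-03-01T00:00:00", 0.56),
]
print(point_in_time_value(history, "2024-02-15T00:00:00"))  # 0.34
```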
Amazon SageMaker Model Monitor tracks the quality and performance of deployed ML models in production.
Key Features:
- Continuous monitoring with real-time endpoints.
- Schedule monitoring for batch transform jobs.
- Detects issues in data quality, model quality, bias drift, and feature attribution drift.
Typical Use Cases:
- Ensure that deployed models maintain expected performance.
- Detect data drift and model degradation.
- Automate alerts for compliance and model retraining.
Why it matters:
It provides continuous assurance that models perform accurately and fairly after deployment.
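To illustrate what "detecting data drift" means, here is a toy check of the kind Model Monitor automates: compare a live feature sample against the training baseline and flag drift when the mean shifts by more than a few baseline standard deviations. The threshold and data are illustrative only; real monitors use richer statistics per feature.

```python
# Sketch: a toy data-drift check — flag drift when the live mean moves
# more than z_threshold baseline standard deviations from the baseline mean.
import statistics

def mean_shift_drift(baseline: list, live: list, z_threshold: float = 3.0) -> bool:
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    z = abs(statistics.mean(live) - mu) / sigma
    return z > z_threshold

baseline = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8]
drifted = [15.0, 16.2, 14.8, 15.5]
print(mean_shift_drift(baseline, baseline))  # False
print(mean_shift_drift(baseline, drifted))   # True
```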
Amazon SageMaker Model Cards provide a single place to document, catalog, and share critical details about your machine learning models for governance and reporting.
Key Features:
- Capture intended use, risk rating, training details, metrics, evaluation results, and observations.
- Include considerations, recommendations, and custom information.
- Create an immutable record for responsible model deployment and compliance.
- Export model cards to PDF for sharing with stakeholders.
Typical Use Cases:
- Document model purpose, assumptions, and usage constraints.
- Facilitate model review and approval workflows.
- Support governance and audit requirements.
Why it matters:
It helps standardize model documentation, ensuring models are used responsibly and meet compliance and governance standards.
Amazon Q
What it is:
Amazon Q is a generative AI assistant embedded within the AWS ecosystem. It helps developers and IT teams understand AWS services, generate code, and troubleshoot infrastructure using natural language.
Typical Use Cases:
- Explaining AWS concepts and CLI commands
- Generating infrastructure-as-code (e.g., CloudFormation)
- Helping users navigate AWS Console faster
- Amazon Q Business
- Amazon Q Developer
Amazon Q Business allows employees to ask natural language questions and receive accurate answers based on internal company data.
Key Features:
- Connects to data sources such as SharePoint, Confluence, Salesforce, Slack, S3, and more.
- Uses Retrieval-Augmented Generation (RAG) to ground answers in your organization’s documents.
- Maintains enterprise security by respecting identity and access permissions.
Typical Use Cases:
- Ask: “What is our company’s refund policy?” and get a direct answer from internal PDFs or wikis.
- Help HR, finance, and operations teams self-serve without IT intervention.
- Analyze and summarize knowledge spread across internal systems.
Why it matters:
It enables secure, company-specific knowledge access for non-technical employees without needing custom AI development.
Amazon Q Developer is optimized for technical users like developers, DevOps engineers, and data scientists. It enables natural language interaction with AWS services.
Key Features:
- Embedded in the AWS Console, IDEs (as the successor to Amazon CodeWhisperer), and the CLI.
- Generates and explains Infrastructure as Code (CloudFormation, Terraform, CDK).
- Understands your AWS environment and offers context-aware suggestions.
Typical Use Cases:
- Ask: “How do I create an S3 bucket with versioning using CloudFormation?”
- Troubleshoot: “Why is my Lambda function failing with a 502 error?”
- Generate code snippets for APIs, AWS SDK calls, SageMaker notebooks, etc.
Why it matters:
It significantly accelerates development and operations workflows, helping teams build and manage AWS infrastructure more efficiently.
Amazon Kendra (Intelligent Search Engine)
What it is:
Amazon Kendra is an intelligent enterprise search engine with natural language support.
Typical Use Cases:
- Enterprise document search
- FAQ chatbots
Amazon A2I (Augmented AI)
What it is:
Amazon A2I (Augmented AI) helps you build workflows that include human review of ML predictions. It’s especially useful when ML confidence is low or when regulatory compliance requires human checks.
Typical Use Cases:
- Reviewing document processing results (e.g., from Textract)
- Moderating sensitive content flagged by Rekognition
- Validating NLP classification outputs
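The routing decision at the heart of an A2I-style workflow can be sketched in a few lines: predictions below a confidence threshold go to a human review queue, the rest flow straight through. The threshold and records below are illustrative.

```python
# Sketch: routing low-confidence predictions to human review, as an
# A2I workflow would. Threshold and record IDs are invented.

def route(predictions: list, threshold: float = 0.80) -> dict:
    buckets = {"auto": [], "human_review": []}
    for p in predictions:
        key = "auto" if p["confidence"] >= threshold else "human_review"
        buckets[key].append(p["id"])
    return buckets

preds = [
    {"id": "doc-1", "confidence": 0.97},
    {"id": "doc-2", "confidence": 0.55},
    {"id": "doc-3", "confidence": 0.83},
]
print(route(preds))  # {'auto': ['doc-1', 'doc-3'], 'human_review': ['doc-2']}
```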
Amazon Bedrock
What it is:
Amazon Bedrock is a serverless platform that allows you to build and scale generative AI applications using foundation models (FMs) from leading providers (Anthropic, Meta, Cohere, etc.) — all without managing infrastructure.
Typical Use Cases:
- Building chatbots, text summarizers, or content generators
- Retrieval-Augmented Generation (RAG) via Knowledge Bases
- Language translation, classification, and embedding generation
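Calling a foundation model through Bedrock's InvokeModel API comes down to assembling a JSON request body in the model provider's schema. A sketch of building one for a chat-style model; the field names follow the Anthropic Messages format, but treat the exact shape as illustrative since each provider defines its own schema.

```python
# Sketch: assembling a chat request body for a Bedrock InvokeModel call.
# Field names follow the Anthropic Messages format; shape is illustrative.
import json

def build_chat_body(prompt: str, max_tokens: int = 512) -> str:
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return json.dumps(body)

body = build_chat_body("Summarize this support ticket in two sentences.")
print(json.loads(body)["messages"][0]["role"])  # user
```

An application would pass this string as the `body` of the InvokeModel request, along with the chosen model ID.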
Database
Amazon DocumentDB (NoSQL document-oriented database)
Type: NoSQL (document-oriented), MongoDB-compatible.
Use Cases: Content management, catalogs, user profiles, flexible JSON data storage.
AI Context: Useful for storing semi-structured data (JSON) used by AI applications.
Vector DB Suitability: ❌ Not ideal
Amazon DynamoDB (NoSQL key-value store)
Type: Serverless NoSQL (key-value store), providing high availability and low latency.
Use Cases: High-traffic apps, session management, metadata storage.
AI Context: Suitable for real-time AI workloads like metadata management or recommendation engines.
Vector DB Suitability: ❌ Not ideal
Amazon ElastiCache (in-memory data store)
Type: Managed in-memory data store (Redis/Memcached).
Use Cases: Real-time caching, session storage, leaderboards, real-time analytics.
AI Context: Ideal for caching model inference outputs and providing rapid data retrieval.
Vector DB Suitability: ❌ Not ideal
Amazon MemoryDB (in-memory Redis-compatible database)
Type: Durable, in-memory Redis-compatible database.
Use Cases: Real-time transactional workloads requiring high performance and durability.
AI Context: Excellent for real-time AI applications with stringent speed and durability requirements.
Vector DB Suitability: ❌ Not ideal
Amazon Neptune (Managed graph database) ✅ Supports vector search
Type: Managed graph database (supports Gremlin, SPARQL).
Use Cases: Fraud detection, recommendation systems, knowledge graphs, relationship analysis.
AI Context: Powerful for AI scenarios involving connected data, semantic searches, and graph analytics.
Vector DB Suitability: ✅ Possible (Graph-based vector search).
Amazon RDS (Managed relational databases) ✅ Supports vector search
Type: Managed relational databases (MySQL, PostgreSQL, Oracle, SQL Server).
Use Cases: Traditional applications, structured transactional systems (ERP, CRM), structured data analytics.
AI Context: Ideal for structured, relational data usage in AI contexts.
Vector DB Suitability: ✅ Possible (via PostgreSQL with the pgvector extension; recommended for moderate-scale workloads).
Amazon Aurora (Managed relational database) ✅ Supports vector search
Type: High-performance managed relational database compatible with MySQL and PostgreSQL.
Use Cases: Enterprise-grade applications, highly scalable transactional workloads, analytics.
AI Context: Good choice for structured relational data requiring high throughput, performance, and reliability in AI workloads.
Vector DB Suitability: ✅ Possible (using PostgreSQL-compatible Aurora with the pgvector extension); suitable for moderate vector-search scenarios but not optimized for large-scale vector workloads.
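To see what pgvector actually computes: a query like `ORDER BY embedding <=> :query_vector LIMIT 5` ranks rows by cosine distance. The same distance, computed in plain Python on toy vectors:

```python
# Sketch: the cosine distance behind pgvector's `<=>` operator,
# computed brute-force on toy 2-D vectors for illustration.
import math

def cosine_distance(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # 0.0 (same direction)
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 (orthogonal)
```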
Storage
Amazon S3
What it is:
Amazon S3 (Simple Storage Service) is AWS’s object storage service designed to store and retrieve any amount of data from anywhere on the web.
Why it matters:
- It's scalable, cost-effective, and designed for 99.999999999% (11 nines) durability
- Frequently used to store training datasets, model outputs, logs, and documents
- Integrates seamlessly with services like SageMaker, Bedrock, and Lambda
Typical Use Cases:
- Storing datasets for AI/ML model training
- Hosting website files or media assets
- Saving logs and predictions from AI pipelines
- Backup and recovery of application data
Amazon S3 Glacier
What it is:
Amazon S3 Glacier is a low-cost storage service for data archiving and long-term backup. It is designed for data that is infrequently accessed but must be retained securely for years.
Why it matters:
- Ideal for archiving training datasets or compliance logs
- Offers different retrieval speeds (minutes to hours)
- Cost-effective for storing AI/ML data not actively used
Typical Use Cases:
- Archiving large ML datasets not currently in use
- Storing compliance and audit data for AI projects
- Backing up AI-generated reports, logs, and checkpoints
Management and Governance
AWS CloudTrail
What it is:
CloudTrail is a service that records all API calls and actions made in your AWS account, including who made the call, what services were affected, and when.
Why it matters:
- Provides an audit trail for all changes and activities
- Helps you detect suspicious behavior or unauthorized access
- Useful for compliance reporting and forensic analysis
Typical Use Cases:
- Investigating security incidents (e.g., who deleted a resource?)
- Monitoring access to sensitive services (e.g., S3, IAM, SageMaker)
- Setting up alarms on critical changes
Amazon CloudWatch
What it is:
CloudWatch is AWS’s central monitoring service for metrics, logs, and alarms. It collects and tracks data from AWS services and custom sources.
Why it matters:
- Helps you visualize performance (CPU, memory, latency, etc.)
- Allows you to set alarms and get notified when something goes wrong
- Enables automated actions (e.g., restarting instances)
Typical Use Cases:
- Monitoring model performance or resource usage in SageMaker
- Setting alerts on Lambda failures or high error rates
- Creating dashboards for your application’s health
AWS Config
What it is:
AWS Config is a resource compliance and configuration tracking service. It monitors changes to AWS resources and evaluates them against predefined rules.
Why it matters:
- Provides a timeline of resource changes
- Ensures your environment adheres to security and compliance policies
- Supports automatic remediation of non-compliant resources
Typical Use Cases:
- Checking if S3 buckets are publicly accessible
- Tracking IAM policy changes
- Auditing the history of ML model versions or endpoints
AWS Trusted Advisor
What it is:
Trusted Advisor is a service that scans your AWS environment and gives recommendations to help improve performance, security, fault tolerance, and cost optimization.
Why it matters:
- Highlights security vulnerabilities (e.g., open ports, weak IAM policies)
- Identifies unused resources to reduce cost
- Suggests best-practice improvements
Typical Use Cases:
- Checking for over-provisioned EC2/SageMaker instances
- Ensuring MFA is enabled for root accounts
- Finding unused EBS volumes or idle load balancers
AWS Well-Architected Tool
What it is:
This is a self-assessment tool that helps you review and improve your architecture based on the AWS Well-Architected Framework and its six pillars: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and Sustainability.
Why it matters:
- Provides a structured review of your architecture
- Helps you identify risks and improvement areas
- Guides you in building resilient and efficient applications
Typical Use Cases:
- Assessing your ML/AI solution before production
- Aligning your architecture with AWS best practices
- Comparing designs across multiple workloads or teams
Security, Identity, and Compliance
AWS Artifact
What it is:
AWS Artifact is your central hub for AWS compliance reports and certifications, such as SOC, ISO, and PCI. It provides downloadable documents to help with audits and legal assessments.
Why it matters:
- Gives easy access to AWS compliance documentation
- Helps meet regulatory and customer requirements
- Supports internal and external audit processes
Typical Use Cases:
- Sharing SOC 2 reports with auditors
- Collecting evidence for compliance assessments
- Validating AWS compliance for your organization
AWS Audit Manager
What it is:
AWS Audit Manager helps automate the collection of audit evidence by mapping AWS usage data to compliance frameworks such as GDPR, HIPAA, and ISO.
Why it matters:
- Reduces the manual effort in audit preparation
- Continuously tracks compliance posture
- Helps demonstrate control effectiveness
Typical Use Cases:
- Automating SOC 2 evidence collection
- Mapping AWS usage to GDPR controls
- Monitoring compliance for AI/ML pipelines
AWS IAM (Identity and Access Management)
What it is:
IAM is AWS's core access control service, enabling you to create users, groups, roles, and policies to securely manage access to AWS services and resources.
Why it matters:
- Enforces least privilege across your organization
- Provides fine-grained access controls for AI/ML services
- Supports secure role-based delegation
Typical Use Cases:
- Allowing SageMaker to read data from S3
- Creating service roles for Bedrock or Lambda
- Enforcing MFA and managing user permissions
Amazon Inspector
What it is:
Amazon Inspector is an automated vulnerability scanning tool for EC2, container images, and Lambda functions. It continuously checks for known security issues.
Why it matters:
- Helps protect applications from known vulnerabilities
- Automates security checks in DevSecOps pipelines
- Sends real-time findings to Security Hub or CloudWatch
Typical Use Cases:
- Scanning EC2 instances and Lambda functions used for inference
- Securing SageMaker endpoints
- Identifying CVEs in Docker images
AWS KMS (Key Management Service)
What it is:
AWS KMS is a managed service for creating and controlling encryption keys used to secure your data across AWS services.
Why it matters:
- Enables encryption-at-rest and in-transit
- Supports customer-managed key (CMK) creation
- Logs key usage via CloudTrail for auditing
Typical Use Cases:
- Encrypting training datasets in S3
- Managing key rotation for AI/ML environments
- Protecting secrets and database credentials
Amazon Macie
What it is:
Amazon Macie is a data security and privacy service that uses ML to discover, classify, and protect sensitive data such as personally identifiable information (PII) stored in Amazon S3.
Why it matters:
- Identifies sensitive data like names, addresses, and credit card numbers
- Alerts you to publicly accessible or misconfigured S3 buckets
- Helps meet privacy regulations like GDPR and HIPAA
Typical Use Cases:
- Scanning training datasets for sensitive content
- Auditing AI data lakes for PII
- Automatically flagging non-compliant storage configurations
AWS Secrets Manager
What it is:
Secrets Manager helps you store, retrieve, and rotate secrets (e.g., database credentials, API keys, tokens) securely in your applications.
Why it matters:
- Keeps secrets out of code
- Supports automatic rotation of credentials
- Provides fine-grained IAM access to secrets
Typical Use Cases:
- Managing API keys for AI services
- Storing database credentials used in SageMaker pipelines
- Rotating secrets used by Lambda functions or Bedrock-based applications
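Once retrieved, a secret usually arrives as a SecretString containing a JSON document of credentials, which application code then parses. A sketch of that step; the payload below is invented for illustration.

```python
# Sketch: parsing a Secrets Manager SecretString. Secrets are commonly
# stored as JSON key/value pairs; this payload is illustrative only.
import json

def parse_secret(secret_string: str) -> dict:
    return json.loads(secret_string)

secret_string = '{"username": "ml_app", "password": "example-only", "host": "db.internal"}'
creds = parse_secret(secret_string)
print(creds["username"])  # ml_app
```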
Cloud Financial Management
AWS Budgets
What it is:
AWS Budgets is a cost management tool that allows you to set custom budgets and receive alerts when your usage or spending exceeds thresholds.
Why it matters:
- Helps prevent unexpected cost overruns in AI/ML workloads
- Supports budgeting for specific services, linked accounts, or projects
- Allows email or SNS alerts when nearing or exceeding your budget
Typical Use Cases:
- Monitoring SageMaker or Bedrock cost usage
- Creating budgets per department or project
- Alerting finance or DevOps teams when limits are exceeded
AWS Cost Explorer
What it is:
AWS Cost Explorer is a visual tool for analyzing and tracking your AWS spending over time. It lets you explore usage by service, region, account, or tag.
Why it matters:
- Enables granular analysis of cloud costs
- Identifies trends, anomalies, and opportunities for savings
- Helps teams understand the cost impact of AI/ML workloads
Typical Use Cases:
- Visualizing SageMaker and GPU usage cost trends
- Identifying unused resources contributing to waste
- Tagging AI/ML workloads to understand specific project spend
Compute
Amazon EC2 (Elastic Compute Cloud)
What it is:
Amazon EC2 provides resizable virtual machines (instances) in the cloud. You can choose from a wide range of instance types optimized for compute, memory, storage, or GPU-based workloads.
Why it matters:
- Offers full control over compute resources
- Supports AI/ML training and inference using GPU instances
- Scales from small experiments to high-performance distributed training
Typical Use Cases:
- Running deep learning frameworks like TensorFlow or PyTorch on GPU instances
- Hosting custom-trained ML models for inference
- Performing large-scale simulations or model training
- AWS Trainium Instances (Trn1)
- Accelerated Computing P Type Instances
- Accelerated Computing G Type Instances
- Compute Optimized C Type Instances
AWS Trainium instances use a custom-designed machine learning chip engineered for high performance with low power consumption, reducing the carbon footprint of training large-scale models.
Key Features:
- Up to 25% more energy efficient than comparable accelerated computing EC2 instances.
- Specifically designed for optimal performance per watt for deep learning workloads.
- Lowers environmental impact compared to other instance types.
Typical Use Cases:
- Large-scale deep learning training.
- Organizations prioritizing sustainability and energy efficiency in AI workloads.
Why it matters:
They are the most environmentally friendly choice, helping companies meet sustainability goals while training complex models.
Accelerated Computing P type instances are powered by high-end NVIDIA data center GPUs (e.g., the V100 and A100 families) and are optimized for maximum computational throughput.
Key Features:
- Delivers high GPU performance for ML and HPC tasks.
- Not designed with energy efficiency as the primary goal.
- Consumes significant power.
Typical Use Cases:
- Heavy ML model training and inference.
- High-performance computing (HPC) workloads.
Why it matters:
Best when raw GPU power is needed — less suitable for energy-conscious workloads.
Accelerated Computing G type instances use NVIDIA GPUs for graphics-heavy applications like gaming, rendering, and video processing.
Key Features:
- High computational power for visual workloads.
- Not optimized for ML training or energy efficiency.
Typical Use Cases:
- Real-time rendering, video processing, game streaming.
- Graphics-intensive applications.
Why it matters:
Excellent for graphics tasks but not the best choice for minimizing environmental impact.
Compute Optimized C type instances provide high CPU performance for compute-intensive applications.
Key Features:
- Maximizes raw compute power for CPU-bound workloads.
- Not specifically designed for energy efficiency like Trainium.
- Suitable for high-throughput applications.
Typical Use Cases:
- Web servers, gaming backends, scientific modeling.
- Applications needing maximum CPU power.
Why it matters:
Ideal for compute-heavy tasks but less ideal for organizations focused on lowering carbon footprint.
Networking and Content Delivery
Containers
Amazon ECS (Elastic Container Service)
What it is:
Amazon ECS is a fully managed container orchestration service that allows you to run and scale Docker containers on AWS without needing to manage your own servers.
Why it matters:
- Simplifies containerized application deployment
- Supports Fargate for serverless containers (no instance management)
- Integrates with SageMaker, Lambda, and other AI pipelines
Typical Use Cases:
- Running microservices for AI model inference
- Hosting REST APIs that wrap around ML models
- Scaling backend services that preprocess ML input data
Amazon EKS (Elastic Kubernetes Service)
What it is:
Amazon EKS is a fully managed Kubernetes service that lets you run Kubernetes clusters on AWS without manually configuring the control plane.
Why it matters:
- Provides more flexibility and portability than ECS
- Allows you to run AI/ML workloads using K8s-native tools (e.g., Kubeflow, MLflow)
- Scales AI model serving and training pipelines using Kubernetes best practices
Typical Use Cases:
- Running ML pipelines with Kubeflow on Kubernetes
- Managing multi-step model training and deployment workflows
- Hosting AI microservices using containers
Analytics
AWS Data Exchange
What it is:
AWS Data Exchange makes it easy to find, subscribe to, and use third-party datasets in the cloud, such as demographics, weather, or financial data.
Why it matters:
- Enables external data integration for AI/ML models
- Automates data subscription, delivery, and updates
- Helps enhance model accuracy with premium datasets
Typical Use Cases:
- Enriching ML models with weather or location data
- Using healthcare or financial datasets from third parties
- Automating ingestion of licensed datasets into S3 or Redshift
Amazon EMR
What it is:
Amazon EMR is a managed cluster platform that runs big data frameworks like Apache Spark, Hive, and Hadoop for data processing and transformation at scale.
Why it matters:
- Supports large-scale data preprocessing for ML
- Easily processes petabytes of structured or unstructured data
- Integrates with S3, HDFS, Redshift, and more
Typical Use Cases:
- Preprocessing datasets for ML models
- Running Spark ML jobs at scale
- Performing distributed feature engineering
AWS Glue
What it is:
AWS Glue is a serverless data integration service that discovers, prepares, and combines data for analytics and ML, using ETL pipelines.
Why it matters:
- Automates data cataloging, cleaning, and transformation
- Integrates directly with S3, Redshift, and RDS
- Supports Python- and Spark-based ETL jobs
Typical Use Cases:
- Cleaning and joining ML training data
- Building ETL pipelines for AI dashboards
- Creating feature pipelines for SageMaker models
AWS Glue DataBrew
What it is:
Glue DataBrew is a visual data preparation tool for users who want to clean and normalize data without writing code.
Why it matters:
- Enables non-developers to explore and prepare datasets
- Provides 250+ built-in transformations (e.g., deduplication, joins)
- Accelerates data prep for ML pipelines and dashboards
Typical Use Cases:
- Exploring AI/ML datasets visually
- Removing outliers, fixing nulls before model training
- Generating reusable transformations with no code
AWS Lake Formation
What it is:
Lake Formation helps you build, secure, and manage data lakes on AWS. It simplifies ingesting, cataloging, and securing data from various sources into S3.
Why it matters:
- Makes it easier to create a centralized data lake for AI
- Provides fine-grained data access control
- Integrates with Glue, Athena, Redshift, and SageMaker
Typical Use Cases:
- Creating data lakes for AI training and analysis
- Managing data access permissions for teams
- Curating and tagging ML training datasets
Amazon OpenSearch Service
What it is:
OpenSearch Service is a managed search and analytics engine that supports full-text search, log analytics, and vector search for AI use cases.
Why it matters:
- Supports semantic search and RAG (Retrieval-Augmented Generation)
- Integrates with Bedrock Knowledge Bases
- Includes k-NN vector indexing for similarity search
Typical Use Cases:
- Powering AI chatbots with semantic search
- Storing and retrieving embeddings for vector search
- Building analytics dashboards from log data
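The "nearest vectors" question a k-NN index answers can be shown brute-force over toy embeddings; a real OpenSearch k-NN index resolves the same query with approximate search at scale. Document names and vectors below are invented.

```python
# Sketch: brute-force k-nearest-neighbor ranking over toy 2-D embeddings,
# illustrating what an OpenSearch k-NN index computes at scale.
import math

def top_k(query: list, docs: dict, k: int = 2) -> list:
    """Return the k document IDs closest to the query embedding."""
    return sorted(docs, key=lambda d: math.dist(query, docs[d]))[:k]

docs = {
    "refund-policy": [0.9, 0.1],
    "shipping-faq": [0.2, 0.8],
    "returns-howto": [0.85, 0.2],
}
print(top_k([1.0, 0.0], docs))  # ['refund-policy', 'returns-howto']
```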
Amazon QuickSight
What it is:
QuickSight is AWS’s business intelligence and data visualization tool that helps create dashboards, reports, and charts from various data sources.
Why it matters:
- Allows real-time visualization of AI/ML results
- Supports embedded dashboards in apps
- Uses ML-powered insights (e.g., anomaly detection, forecasting)
Typical Use Cases:
- Visualizing model predictions or performance metrics
- Creating dashboards for business stakeholders
- Monitoring usage and accuracy trends for ML solutions
Amazon Redshift
What it is:
Amazon Redshift is a fully managed cloud data warehouse that lets you analyze structured and semi-structured data at scale using SQL.
Why it matters:
- Integrates with SageMaker for in-database ML
- Supports Redshift ML to run models directly in the warehouse
- Handles petabyte-scale analytics
Typical Use Cases:
- Running AI inference directly in SQL queries
- Building AI-powered dashboards from transactional data
- Training ML models on aggregated data