Initial Public Release — AWS IDP Pipeline (v0.1.8)

@yunwoong7 released this 23 Nov 13:49
· 1 commit to main since this release

This is the first public release of the AWS IDP Pipeline, an end-to-end reference implementation for multimodal Intelligent Document Processing (IDP).
It analyzes unstructured data, including documents, images, videos, and audio, using Amazon Bedrock, Bedrock Data Automation (BDA), OpenSearch, Lambda, Step Functions, and LangGraph Agents.

This release contains all foundational features, including identity management, hybrid search, conversational AI, and full serverless infrastructure.


Core Features

1. Document Processing

  • OCR and text extraction via BDA
  • Key information summarization
  • Metadata extraction
  • Document structure and layout analysis
  • Support for large and complex PDFs

2. Video Analysis

  • Scene detection and segmentation
  • Chapter generation
  • Transcript extraction
  • Keyframe-level understanding

3. Image Understanding

  • Object detection
  • Scene classification
  • Text recognition
  • Embedding extraction for vector search

AI-Powered Automation

Bedrock Data Automation (BDA)

  • High-accuracy OCR
  • Efficient document segmentation
  • Optimized processing for large enterprise documents

ReAct + LangGraph Agent Workflow

  • Dynamic tool orchestration based on file type
  • Multi-step iterative reasoning
  • Error correction loops for improved consistency
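
As a rough illustration of tool orchestration by file type, the sketch below maps a file extension to the tools an agent would be offered. The tool names and the extension table are hypothetical; this release does not list the actual tools, and the real workflow runs inside a LangGraph agent rather than a plain lookup.

```python
from pathlib import Path

# Hypothetical tool names; the release does not list the actual tools.
TOOLS_BY_EXTENSION = {
    ".pdf": ["document_analyzer"],
    ".png": ["image_analyzer"],
    ".jpg": ["image_analyzer"],
    ".mp4": ["video_analyzer"],
    ".mp3": ["audio_analyzer"],
}


def select_tools(filename):
    """Return the tool names to offer the agent for a given file."""
    suffix = Path(filename).suffix.lower()
    if suffix not in TOOLS_BY_EXTENSION:
        raise ValueError(f"unsupported file type: {suffix or filename}")
    return TOOLS_BY_EXTENSION[suffix]
```

Calling `select_tools("Report.PDF")` returns `["document_analyzer"]`; an unknown extension raises `ValueError` so the caller can decide how to recover.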

Iterative Reasoning Engine

  • Verification-aware refinement
  • Multi-step decomposition for complex documents

Hybrid Search System

  • Vector search using OpenSearch
  • Keyword search for precision
  • Hybrid semantic + keyword retrieval
  • Re-ranking for optimal relevance
  • Dedicated index management stacks included
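
One common way to combine vector and keyword results is reciprocal rank fusion (RRF). The sketch below is an assumption for illustration only: the release does not state how the pipeline merges or re-ranks results, and the document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (highest first); break ties by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists, e.g. from a vector query and a keyword query.
vector_hits = ["doc-3", "doc-1", "doc-7"]
keyword_hits = ["doc-1", "doc-9", "doc-3"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
# fused == ["doc-1", "doc-3", "doc-9", "doc-7"]
```

Documents that rank well in both lists rise to the top, which is why `doc-1` and `doc-3` lead here. RRF uses only ranks, so it needs no score normalization between the two retrievers.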

Conversational AI Interface

  • MCP server-based chatbot
  • Natural language Q&A over all uploaded content
  • Multi-turn conversation memory
  • Retrieval-augmented responses
  • File-aware context routing

Identity, Authentication, and Access Control

Cognito Authentication

  • Login and logout capabilities
  • Integrated with Next.js frontend

Managed Cognito Login UI

  • Custom Cognito domain
  • AWS-managed branded login page

Role-Based Authorization

  • DynamoDB-backed role storage
  • Automatic assignment of “Admin” role to the initial user at first deployment
  • Role management for additional users
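
The first-user rule can be sketched as follows. A plain dict stands in for the DynamoDB table, and the default role name "User" is hypothetical; the release only confirms that the initial user receives "Admin".

```python
def assign_role(roles, user_id):
    """Assign a role to a new user and return it.

    `roles` maps user IDs to role names; a plain dict stands in for the
    DynamoDB table here. The first user ever registered becomes "Admin";
    later users get a hypothetical default role, "User".
    """
    if user_id in roles:
        return roles[user_id]  # existing users keep their role
    roles[user_id] = "Admin" if not roles else "User"
    return roles[user_id]


roles = {}
first = assign_role(roles, "alice")   # "Admin"
second = assign_role(roles, "bob")    # "User"
```

Against a real table, the check and the write would need to be atomic (for example, a conditional write) so that two simultaneous sign-ups cannot both become "Admin".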

Deployment & Infrastructure

Custom Domain Support

  • ACM certificate stack
  • Route53 hosted zone integration
  • API Gateway domain mapping

Full CDK Infrastructure

Includes stacks for:

  • VPC
  • OpenSearch
  • DynamoDB
  • S3
  • Lambda functions
  • Lambda Layers
  • Step Functions
  • WebSocket API
  • ECR repository
  • ECS backend for inference
  • Cognito User Pool
  • Document Management APIs
  • Next.js Web UI deployment

Development Tooling and Improvements

CDK Nag Integration

  • Enforces AWS security and compliance best practices
  • Updated due to Python runtime changes

Python Runtime Notes

  • Originally built on Python 3.13
  • AWS Lambda default moved to Python 3.14
  • CDK and CDK Nag updated accordingly

Testing and Validation

  • Passed RepoLinter rules
  • Passed git-secrets
  • Passed Code Defender checks
  • Verified removal of AWS account numbers and sensitive configuration files
  • No embedded secrets or credentials

Documentation

The repository includes:

  • Architecture overview
  • Deployment steps
  • Local development instructions
  • Agent flow diagrams
  • API structure and usage
  • Search pipeline and vector index documentation

Summary

Version 0.1.8 is the first public release of the AWS IDP Pipeline. It provides a production-grade, extensible foundation for multimodal unstructured data processing workflows using AWS AI and serverless technologies.
This release is intended as a reference implementation for builders, SAs, and enterprise teams looking to create scalable, AI-powered document analysis systems.