AWS Setup Guide: Step-by-Step

This document provides a step-by-step guide for setting up the "barebone" AWS infrastructure required for your aiHub (Fargate/ECS) and apiGatewayAuthorizer (Lambda/API Gateway) services for both Staging and Production environments. The GitHub Actions workflows will then deploy your application code and configurations into this pre-existing infrastructure.

Assumed AWS Region: eu-west-2 (adjust as necessary).

Part 0: IAM User/Role for GitHub Actions

Your GitHub Actions workflows need permissions to interact with AWS services (ECR, ECS, Lambda, etc.). You have two main options:

  • Option A: IAM User with Access Keys (Simpler for initial setup)
  • Go to IAM > Users > Add users.
  • User name: github-actions-deployer (or similar).
  • Select AWS credential type: Check "Access key - Programmatic access".
  • Permissions: Click "Attach existing policies directly".
    • Search for and attach:
      • AmazonEC2ContainerRegistryFullAccess (to push to ECR)
      • AmazonECS_FullAccess (to update ECS services/task definitions - scope this down in a real production scenario if possible)
      • AWSLambda_FullAccess (to update Lambda functions - scope down if possible)
      • The iam:PassRole permission. There is no AWS managed policy named for this; grant it via a custom policy. It is needed because the workflows register task definitions and deploy Lambdas that reference the ECS task/execution roles and Lambda execution roles. Where possible, scope the Resource to those specific role ARNs rather than "*".
      • A custom policy to allow ecs:DescribeTaskDefinition (this is used by the workflow to get the current task definition).
        {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Action": "ecs:DescribeTaskDefinition",
              "Resource": "*"
            }
          ]
        }
        
    • Note: For true production, create custom IAM policies with least-privilege access rather than using AWS managed FullAccess policies.
  • Tags: Optional.
  • Review and Create user.
  • IMPORTANT: Download the .csv file or copy the Access key ID and Secret access key. You will only see the secret key once.

    • These will be used for GitHub Secrets named AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. These secrets should be configured within your "Production" and "Staging" GitHub Environments (Repository Settings > Environments).
  • Option B: IAM Role with OpenID Connect (OIDC) (More Secure, Recommended for Production)

  • This method avoids long-lived access keys by allowing GitHub Actions to assume an IAM role directly. Setup is more involved. See AWS and GitHub documentation for "Configuring OpenID Connect in Amazon Web Services." If you use OIDC, you won't need the access key secrets mentioned above.

For this guide, the workflows are currently set up assuming Option A (access keys).
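The Option A console steps above can also be sketched with the AWS CLI. This is a hedged sketch, not a definitive script: it assumes the user name github-actions-deployer from this guide and a CLI profile with IAM administrative rights.

```shell
# Create the deployer user (name from this guide; adjust as needed)
aws iam create-user --user-name github-actions-deployer

# Attach the managed policies listed above (scope down for real production)
aws iam attach-user-policy --user-name github-actions-deployer \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess
aws iam attach-user-policy --user-name github-actions-deployer \
  --policy-arn arn:aws:iam::aws:policy/AmazonECS_FullAccess
aws iam attach-user-policy --user-name github-actions-deployer \
  --policy-arn arn:aws:iam::aws:policy/AWSLambda_FullAccess

# Create the access key pair; the SecretAccessKey appears only in this output
aws iam create-access-key --user-name github-actions-deployer
```

The `create-access-key` output contains the values for the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY GitHub Secrets.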


Part A: aiHub Service (AWS Fargate/ECS)

Repeat these steps for Staging and then for Production, adjusting names and configurations accordingly.

A1. IAM Roles for ECS Tasks

  • A1.1. ECS Task Execution Role (Usually exists, verify/create once)
  • Purpose: Allows ECS agent to pull images from ECR and send container logs to CloudWatch.
  • Steps:
    1. IAM > Roles > Create role.
    2. Trusted entity: AWS service > Use case: Elastic Container Service > Use case: Elastic Container Service Task.
    3. Permissions: Attach AmazonECSTaskExecutionRolePolicy.
    4. Role name: ecsTaskExecutionRole (if it doesn't already exist).
  • A1.2. ECS Task Role (Application-specific permissions)
  • Purpose: Grants your aiHub container permissions to interact with other AWS services (e.g., SQS).
  • For Staging:
    1. IAM > Roles > Create role.
    2. Trusted entity: AWS service > Use case: Elastic Container Service > Use case: Elastic Container Service Task.
    3. Permissions: Attach policies needed by aiHub (e.g., AmazonSQSFullAccess - scope down to specific queue ARN in production).
    4. Role name: aiHubStagingTaskRole.
  • For Production:
    1. Repeat, naming it aiHubProductionTaskRole and attaching appropriately scoped policies for production resources.
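The A1 roles can also be created from the CLI. A sketch for the staging task role, assuming the trust policy below (ecs-tasks.amazonaws.com is the service principal ECS tasks use to assume roles):

```shell
# Trust policy letting ECS tasks assume the role
cat > ecs-tasks-trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ecs-tasks.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

aws iam create-role --role-name aiHubStagingTaskRole \
  --assume-role-policy-document file://ecs-tasks-trust.json

# Attach whatever aiHub needs; SQS full access as a starting point
# (scope down to the specific queue ARN for production)
aws iam attach-role-policy --role-name aiHubStagingTaskRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonSQSFullAccess
```

Repeat with aiHubProductionTaskRole and production-scoped policies.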

A2. ECR Repository (Create Once, Shared)

  1. ECR > Repositories > Create repository.
  2. Visibility settings: Private.
  3. Repository name: aihub (as used in workflows).
  4. Other settings: defaults are usually fine. Create.
  • Note the Repository URI (e.g., YOUR_ACCOUNT_ID.dkr.ecr.eu-west-2.amazonaws.com/aihub).

A3. ECS Cluster (Create Once, Shared)

  1. ECS > Clusters > Create cluster.
  2. Cluster template: "Networking only" (for AWS Fargate).
  3. Cluster name: aihub-cluster (as used in the workflows, in A7, and in the Lambda IAM policies below).
  4. Networking: Select your VPC and subnets.
  5. CloudWatch Container Insights: Enable if desired.
  6. Create.
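The A2/A3 console steps have one-line CLI equivalents; a sketch using the names this guide assumes (the cluster name aihub-cluster matches A7 and the IAM policies later in this guide):

```shell
# Private ECR repository named as the workflows expect
aws ecr create-repository --repository-name aihub --region eu-west-2

# Fargate-only cluster; no EC2 capacity needed
aws ecs create-cluster --cluster-name aihub-cluster --region eu-west-2
```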

A4. SQS Queue (Separate for Staging and Production)

  • For Staging:
    1. SQS > Queues > Create queue.
    2. Type: Standard (or FIFO if needed). Name: aihub-staging-queue.
    3. IMPORTANT: Under "Configuration", set Visibility timeout to 6 minutes (360 seconds); it must be longer than the Lambda timeout of 5 minutes.
    4. Create.
    5. Note the Queue URL. This value will be set as a GitHub Variable AIHUB_SQS_QUEUE_URL in your "Staging" GitHub Environment.
  • For Production:
    1. SQS > Queues > Create queue.
    2. Name: aihub-production-queue.
    3. IMPORTANT: Under "Configuration", set Visibility timeout to 6 minutes (360 seconds); it must be longer than the Lambda timeout of 5 minutes.
    4. Create.
    5. Note the Queue URL. This value will be set as a GitHub Variable AIHUB_SQS_QUEUE_URL in your "Production" GitHub Environment.
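A CLI sketch for the staging queue, assuming the names above (repeat with aihub-production-queue for production):

```shell
# Visibility timeout 360s, longer than the 300s Lambda timeout
aws sqs create-queue --queue-name aihub-staging-queue \
  --attributes VisibilityTimeout=360 --region eu-west-2

# Fetch the Queue URL for the AIHUB_SQS_QUEUE_URL GitHub Variable
aws sqs get-queue-url --queue-name aihub-staging-queue --region eu-west-2
```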

A5. ECS Task Definition (Initial Version - Separate for Staging and Production)

  • For Staging:
    1. ECS > Task Definitions > Create new task definition.
    2. Task definition family: aihub-staging-task (this name itself will be the value for a GitHub Variable AIHUB_ECS_TASK_FAMILY in your "Staging" GitHub Environment).
    3. Infrastructure: AWS Fargate, Linux/X86_64. Network mode: awsvpc.
    4. Task size: e.g., CPU 0.25 vCPU, Memory 0.5 GB.
    5. Task roles: Execution role ecsTaskExecutionRole, Task role aiHubStagingTaskRole.
    6. Container - 1:
      • Name: aihub-app-staging (this name itself will be the value for a GitHub Variable AIHUB_ECS_CONTAINER_NAME in your "Staging" GitHub Environment).
      • Image URI: YOUR_ACCOUNT_ID.dkr.ecr.eu-west-2.amazonaws.com/aihub:initial-staging (placeholder).
      • Port mappings: Container port 3001 (or your app's port), TCP.
      • Environment variables: Initially, you can set NODE_ENV=staging, PORT=3001 (value for AIHUB_PORT Variable). Others will be injected by the workflow.
      • Log collection: Enable CloudWatch Logs (e.g., group /ecs/aihub-staging-task).
    7. Create.
  • For Production:
    1. Repeat, using:
      • Family: aihub-production-task (value for AIHUB_ECS_TASK_FAMILY Variable in the "Production" GitHub Environment).
      • Task size: Adjust for production load.
      • Task role: aiHubProductionTaskRole.
      • Container name: aihub-app (or your chosen name; value for AIHUB_ECS_CONTAINER_NAME Variable in the "Production" GitHub Environment).
      • Image URI placeholder: YOUR_ACCOUNT_ID.dkr.ecr.eu-west-2.amazonaws.com/aihub:initial-production.
      • Env vars: NODE_ENV=production, PORT=3001 (value for AIHUB_PORT Variable).
      • Log group: /ecs/aihub-production-task.
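The same initial staging task definition can be registered from the CLI. A sketch only; YOUR_ACCOUNT_ID is a placeholder, and the values mirror the console steps above:

```shell
cat > aihub-staging-taskdef.json <<'EOF'
{
  "family": "aihub-staging-task",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::YOUR_ACCOUNT_ID:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::YOUR_ACCOUNT_ID:role/aiHubStagingTaskRole",
  "containerDefinitions": [
    {
      "name": "aihub-app-staging",
      "image": "YOUR_ACCOUNT_ID.dkr.ecr.eu-west-2.amazonaws.com/aihub:initial-staging",
      "portMappings": [{ "containerPort": 3001, "protocol": "tcp" }],
      "environment": [
        { "name": "NODE_ENV", "value": "staging" },
        { "name": "PORT", "value": "3001" }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/aihub-staging-task",
          "awslogs-region": "eu-west-2",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}
EOF

aws ecs register-task-definition --cli-input-json file://aihub-staging-taskdef.json
```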

A6. Application Load Balancer (ALB) & Target Groups (Optional but Recommended)

  • If you don't have an ALB, create one (EC2 > Load Balancers > Create Application Load Balancer).
  • Target Group for Staging:
    1. EC2 > Target Groups > Create. Name: aihub-staging-tg. Target type: IP addresses. Protocol HTTP, Port 3001. Select your VPC. Health check path (e.g., /health).
  • Target Group for Production:
    1. Create. Name: aihub-production-tg. Similar configuration.
  • ALB Listener Rules (example for path-based routing on the ALB's default DNS):
    1. Select your ALB > Listeners > View/edit rules for the HTTP/HTTPS listener.
    2. Rule 1 (Staging): IF Path is /stage/aihub/* THEN Forward to aihub-staging-tg. (Set priority.)
    3. Rule 2 (Production): IF Path is /prod/aihub/* THEN Forward to aihub-production-tg. (Set priority.)

A7. ECS Service (Separate for Staging and Production)

  • For Staging:
    1. ECS > Clusters > aihub-cluster > Services > Create.
    2. Launch type: FARGATE. Task Definition Family: aihub-staging-task (from your AIHUB_ECS_TASK_FAMILY Variable), Revision: LATEST.
    3. Service name: aihub-service-staging. Desired tasks: 1.
    4. Networking: VPC, Subnets, Security Group (allow inbound on 3001 from the ALB/source).
    5. Load balancing: If using an ALB, select it and the aihub-staging-tg target group.
    6. Create.
  • For Production:
    1. Repeat, using:
      • Cluster: aihub-cluster.
      • Task Definition Family: aihub-production-task (from your AIHUB_ECS_TASK_FAMILY Variable).
      • Service name: aihub-service-production (as per your workflow). Desired tasks: e.g., 2 for HA.
      • Load balancing: Connect to aihub-production-tg.
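A CLI sketch of the staging service creation; the subnet and security group IDs are placeholders, and assignPublicIp=ENABLED assumes public subnets (use DISABLED with private subnets behind a NAT Gateway):

```shell
aws ecs create-service \
  --cluster aihub-cluster \
  --service-name aihub-service-staging \
  --task-definition aihub-staging-task \
  --desired-count 1 \
  --launch-type FARGATE \
  --network-configuration 'awsvpcConfiguration={subnets=[subnet-XXXX],securityGroups=[sg-XXXX],assignPublicIp=ENABLED}' \
  --region eu-west-2
```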

A8. Troubleshooting ECS Deployments

  • Error: Unexpected key '<keyName>' found in params during task definition registration:
  • Cause: This can happen if the task definition JSON retrieved by aws ecs describe-task-definition in the GitHub Actions workflow contains keys that are not valid when registering a new task definition version (e.g., enableFaultInjection, taskDefinitionArn, revision, status, requiresAttributes, registeredAt, registeredBy). These keys are descriptive of an existing revision but shouldn't be part of a new registration payload.
  • Solution 1 (Manual Fix in AWS Console - Recommended for immediate fix):
    1. Go to ECS > Task Definitions in the AWS Console.
    2. Find your task definition family (e.g., aihub-staging-task or aihub-production-task).
    3. Select the latest revision that has the problematic key.
    4. Click "Create new revision".
    5. In the JSON editor (or through the UI), carefully review and remove any unexpected or invalid keys. For example, if enableFaultInjection is present and causing issues, ensure it's removed or correctly placed if it were part of a valid structure (though it's typically not a direct task definition parameter).
    6. Save the new revision. This new, clean revision will become the latest.
    7. Re-run your GitHub Actions workflow. The describe-task-definition command should now fetch this cleaner revision, and subsequent steps should succeed.
  • Solution 2 (Modifying Workflow - More robust for CI/CD):

    • You can add a step in your GitHub Actions workflow (e.g., in deploy-aihub-fargate-staging.yml) after downloading the task definition JSON and before rendering it, to clean out known problematic keys using jq:
    # Example step to add in your GitHub Action workflow
    - name: Clean Task Definition JSON
      run: |
        # Ensure jq is available, or install it: sudo apt-get update && sudo apt-get install -y jq
        # For staging workflow, use task-definition-staging.json
        # For production workflow, use task-definition.json
        jq 'del(.taskDefinitionArn, .revision, .status, .requiresAttributes, .registeredAt, .registeredBy, .enableFaultInjection)' task-definition-staging.json > temp_cleaned.json && mv temp_cleaned.json task-definition-staging.json
    
    • This approach automatically cleans the JSON during each workflow run.
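You can verify the cleanup logic locally without touching AWS. The snippet below fabricates a trimmed, hypothetical sample of what describe-task-definition returns, strips the read-only keys with python3 (the same del-filter the jq step applies), and prints the registerable result:

```shell
# Hypothetical sample of a describe-task-definition response (trimmed)
cat > sample-task-def.json <<'EOF'
{
  "family": "aihub-staging-task",
  "taskDefinitionArn": "arn:aws:ecs:eu-west-2:111111111111:task-definition/aihub-staging-task:7",
  "revision": 7,
  "status": "ACTIVE",
  "containerDefinitions": []
}
EOF

# Remove the descriptive keys that break register-task-definition
python3 - <<'EOF'
import json
with open("sample-task-def.json") as f:
    doc = json.load(f)
for key in ("taskDefinitionArn", "revision", "status", "requiresAttributes",
            "registeredAt", "registeredBy", "enableFaultInjection"):
    doc.pop(key, None)
with open("cleaned-task-def.json", "w") as f:
    json.dump(doc, f, indent=2)
EOF

cat cleaned-task-def.json
```

Only family and containerDefinitions survive, which is what a new registration payload should contain.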

Part A2: aiHubSqsProcessor Lambda Functions

This section covers setting up the Lambda functions that process SQS messages and start Fargate tasks for long-running processing.

A2.1. IAM Roles for SQS Processor Lambda Functions

  • For Staging:
  • IAM > Roles > Create role.
  • Trusted entity: AWS service > Use case: Lambda.
  • Permissions: Attach AWSLambdaBasicExecutionRole and create a custom inline policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["ecs:RunTask", "ecs:DescribeTaskDefinition"],
      "Resource": [
        "arn:aws:ecs:eu-west-2:YOUR_ACCOUNT_ID:cluster/aihub-cluster",
        "arn:aws:ecs:eu-west-2:YOUR_ACCOUNT_ID:task-definition/aihub-staging-task:*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": ["iam:PassRole"],
      "Resource": [
        "arn:aws:iam::YOUR_ACCOUNT_ID:role/ecsTaskExecutionRole",
        "arn:aws:iam::YOUR_ACCOUNT_ID:role/aiHubStagingTaskRole"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "sqs:ReceiveMessage",
        "sqs:DeleteMessage",
        "sqs:GetQueueAttributes"
      ],
      "Resource": "arn:aws:sqs:eu-west-2:YOUR_ACCOUNT_ID:aihub-staging-queue"
    }
  ]
}
  • Role name: aiHubSqsProcessor-staging-role.

  • For Production:
  • Repeat the same process.
  • Role name: aiHubSqsProcessor-production-role.
  • Update resource ARNs to use production resources (task definition: aihub-production-task, task role: aiHubProductionTaskRole, SQS queue: aihub-production-queue).

A2.2. Lambda Functions

  • For Staging:
  • Lambda > Functions > Create function.
  • Function name: aiHubSqsProcessor-staging.
  • Runtime: Node.js 20.x, Architecture: x86_64.
  • Permissions: Use existing role aiHubSqsProcessor-staging-role.
  • Advanced settings: Timeout 5 minutes, Memory 256 MB.
  • Create function.

  • For Production:

  • Repeat with function name: aiHubSqsProcessor-production.
  • Permissions: Use existing role aiHubSqsProcessor-production-role.

A2.3. SQS Triggers

  • For Staging (Cost-optimized):
  • Go to aiHubSqsProcessor-staging function.
  • Configuration > Triggers > Add trigger.
  • Source: SQS, Queue: aihub-staging-queue.
  • Batch size: 10 (process up to 10 messages at once).
  • Maximum batching window: 30 seconds (wait up to 30 seconds to gather messages - optimal for staging where latency is not critical).
  • Enable trigger: Yes, Add.

  • For Production (Balanced):

  • Go to aiHubSqsProcessor-production function.
  • Configuration > Triggers > Add trigger.
  • Source: SQS, Queue: aihub-production-queue.
  • Batch size: 10 (process up to 10 messages at once).
  • Maximum batching window: 10 seconds (balance between cost and responsiveness).
  • Enable trigger: Yes, Add.
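The triggers can also be wired up with the CLI as event source mappings; a sketch using the queue ARNs this guide assumes (YOUR_ACCOUNT_ID is a placeholder):

```shell
# Staging: larger batching window to minimize empty receives
aws lambda create-event-source-mapping \
  --function-name aiHubSqsProcessor-staging \
  --event-source-arn arn:aws:sqs:eu-west-2:YOUR_ACCOUNT_ID:aihub-staging-queue \
  --batch-size 10 \
  --maximum-batching-window-in-seconds 30 \
  --region eu-west-2

# Production: shorter window to balance cost and responsiveness
aws lambda create-event-source-mapping \
  --function-name aiHubSqsProcessor-production \
  --event-source-arn arn:aws:sqs:eu-west-2:YOUR_ACCOUNT_ID:aihub-production-queue \
  --batch-size 10 \
  --maximum-batching-window-in-seconds 10 \
  --region eu-west-2
```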

Why these settings?

| Setting | Staging | Production | Rationale |
| --- | --- | --- | --- |
| Batch size | 10 | 10 | Process multiple messages per invocation, reduce Lambda costs |
| Batching window | 30s | 10s | Staging: minimize costs; Production: balance cost vs latency |
| Empty receives/month | ~2,880 | ~12,960 | Dramatically reduced from ~129,600 each |
| Message latency | Up to 30s | Up to 10s | Acceptable delays for background AI processing |

Impact:

  • Staging: 97.8% reduction in SQS requests (from ~130K to ~3K/month)
  • Production: 90% reduction in SQS requests (from ~130K to ~13K/month)
  • Combined: 94% reduction - well within SQS free tier (1M requests/month)

A2.4. Finding Subnet and Security Group IDs

For the Lambda environment variables, you'll need subnet and security group IDs:

  • Quick method: If you have an existing ECS service:
  • ECS > Clusters > [your-cluster] > Services.
  • Click your existing aiHub service > Networking tab.
  • Copy Subnet IDs and Security group IDs.

  • Manual method:

  • Subnets: VPC > Subnets (need PRIVATE subnets with NAT Gateway OR PUBLIC subnets).
  • Security Groups: EC2 > Security Groups (or copy from existing ECS service).

Example format: Subnet IDs: subnet-12345abc,subnet-67890def, Security Group IDs: sg-98765xyz.


Part A3: S3 Bucket for Large Payload Storage

The SQS processor Lambda and ECS tasks use S3 to handle payloads that exceed AWS's 8192 character limit for ECS container overrides.

A3.1. Create S3 Buckets

  • For Staging:
  • S3 Console > Create bucket.
  • Bucket name: delta-aihub-payloads-staging (must be globally unique, adjust prefix as needed).
  • AWS Region: eu-west-2 (same as ECS/Lambda).
  • Object Ownership: ACLs disabled.
  • Block Public Access: Keep all settings enabled (block all public access).
  • Bucket Versioning: Disable (not needed for temporary payloads).
  • Default encryption: Enable with S3 managed keys (SSE-S3).
  • Advanced settings: Keep defaults.
  • Create bucket.

  • For Production:

  • Repeat with bucket name: delta-aihub-payloads-production.
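A CLI sketch for the staging bucket (repeat with delta-aihub-payloads-production). Note that regions other than us-east-1 require a LocationConstraint, and that new buckets get SSE-S3 default encryption automatically:

```shell
aws s3api create-bucket --bucket delta-aihub-payloads-staging \
  --region eu-west-2 \
  --create-bucket-configuration LocationConstraint=eu-west-2

# Block all public access, matching the console settings above
aws s3api put-public-access-block --bucket delta-aihub-payloads-staging \
  --public-access-block-configuration \
  BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
```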

A3.2. Configure Lifecycle Rules (Auto-cleanup)

For each bucket:

  1. Go to bucket > Management tab > Create lifecycle rule.
  2. Rule name: auto-delete-old-payloads.
  3. Rule scope: Apply to all objects in the bucket.
  4. Lifecycle rule actions: Check "Expire current versions of objects".
  5. Days after object creation: 1 (24 hours).
  6. Create rule.
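The same lifecycle rule, expressed as a CLI sketch (an empty Filter applies the rule to all objects, matching step 3 above):

```shell
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "auto-delete-old-payloads",
      "Status": "Enabled",
      "Filter": {},
      "Expiration": { "Days": 1 }
    }
  ]
}
EOF

aws s3api put-bucket-lifecycle-configuration \
  --bucket delta-aihub-payloads-staging \
  --lifecycle-configuration file://lifecycle.json
```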

A3.3. Update IAM Permissions

  • Lambda Role S3 Permissions (add to aiHubSqsProcessor-staging-role and aiHubSqsProcessor-production-role):
  • IAM > Roles > Find your Lambda role.
  • Add permissions > Create inline policy.
  • Use the following policy (replace [staging|production] with the appropriate environment):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:PutObjectAcl"],
      "Resource": "arn:aws:s3:::delta-aihub-payloads-[staging|production]/*"
    }
  ]
}
  • Name the policy: S3PayloadWrite-[staging|production].

  • ECS Task Role S3 Permissions (add to aiHubStagingTaskRole and aiHubProductionTaskRole):
  • IAM > Roles > Find your ECS task role.
  • Add permissions > Create inline policy.
  • Use the following policy (replace [staging|production] with the appropriate environment):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::delta-aihub-payloads-[staging|production]/*"
    }
  ]
}
  • Name the policy: S3PayloadReadDelete-[staging|production].

A3.4. GitHub Environment Variables

Important: Environment variables are managed through GitHub workflows, not directly in AWS Lambda console.

Add to GitHub repository > Settings > Environments:

  • For Staging Environment:
  • Variable: AIHUB_PAYLOAD_BUCKET = delta-aihub-payloads-staging

  • For Production Environment:

  • Variable: AIHUB_PAYLOAD_BUCKET = delta-aihub-payloads-production

The GitHub workflows (deploy-aihub-sqs-processor-lambda-staging.yml and deploy-aihub-sqs-processor-lambda-production.yml) will automatically set these as Lambda environment variables during deployment.


Part B: apiGatewayAuthorizer Service (Lambda & API Gateway)

Repeat for Staging and Production.

B1. IAM Role for Lambda

  • For Staging:
    1. IAM > Roles > Create role. Trusted entity: Lambda.
    2. Permissions: AWSLambdaBasicExecutionRole.
    3. Role name: apiGatewayAuthorizerStagingLambdaRole.
  • For Production:
    1. Repeat, naming it apiGatewayAuthorizerProductionLambdaRole.

B2. Lambda Function

  • For Staging:
    1. Lambda > Functions > Create. Name: apiGatewayAuthorizer-staging.
    2. Runtime: Node.js 20.x. Architecture: x86_64.
    3. Permissions: Use existing role apiGatewayAuthorizerStagingLambdaRole.
    4. Create. In Configuration > General configuration > Edit: Handler index.handler.
  • For Production:
    1. Repeat. Name: apiGatewayAuthorizer. Role: apiGatewayAuthorizerProductionLambdaRole. Handler index.handler.

B3. API Gateway (HTTP API)

  • For Staging API:
    1. API Gateway > Create API > HTTP API > Build.
    2. API name: my-app-api-staging. Next, Next, Create. Note the Invoke URL.
    3. Authorization > Create and attach:
      • Type: Lambda. Name: lambda-authorizer-staging. Lambda function: apiGatewayAuthorizer-staging. Payload format version: 2.0. Identity source: $request.header.x-api-key. Enable caching if desired. Create.
    4. Routes > Create: Method GET, Path /items (example).
    5. Select GET /items > Attach authorizer > lambda-authorizer-staging.
    6. Select GET /items > Attach integration > Type Mock (for now) or HTTP URI to your staging aiHub ALB endpoint (e.g., http://ALB_DNS/stage/aihub/items).
  • For Production API:
    1. Repeat, creating my-app-api-production.
    2. Authorizer lambda-authorizer-production pointing to the apiGatewayAuthorizer Lambda.
    3. Routes (e.g., GET /items) with this authorizer.
    4. Integrations pointing to production aiHub (e.g., http://ALB_DNS/prod/aihub/items).
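A CLI sketch of the staging API and its Lambda authorizer. The api-id comes from the create-api output, and the authorizer-uri wraps the Lambda ARN in API Gateway's invocation path format; YOUR_ACCOUNT_ID and the region are assumptions from this guide:

```shell
# Create the HTTP API; note the ApiId and ApiEndpoint in the output
aws apigatewayv2 create-api --name my-app-api-staging --protocol-type HTTP

# Attach the Lambda authorizer (replace <api-id> from the previous output)
aws apigatewayv2 create-authorizer \
  --api-id <api-id> \
  --name lambda-authorizer-staging \
  --authorizer-type REQUEST \
  --authorizer-payload-format-version 2.0 \
  --identity-source '$request.header.x-api-key' \
  --authorizer-uri arn:aws:apigateway:eu-west-2:lambda:path/2015-03-31/functions/arn:aws:lambda:eu-west-2:YOUR_ACCOUNT_ID:function:apiGatewayAuthorizer-staging/invocations
```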


Part C: GitHub Variables and Secrets for AWS Workflows

Configure these within your GitHub repository's "Settings" > "Environments". Create an environment for "Production" and another for "Staging". The workflow files use simplified names (e.g., AWS_ACCESS_KEY_ID, AIHUB_PORT, LOKI_URL) which will be resolved from the variables and secrets you define within these GitHub Environments.

For the "Production" GitHub Environment:

  • Environment Secrets:
  • AWS_ACCESS_KEY_ID: Your AWS Access Key ID for production deployments.
  • AWS_SECRET_ACCESS_KEY: Your AWS Secret Access Key for production deployments.
  • VERCEL_AUTHORIZATION_API_KEY: The shared secret key that aiHub (production) uses to authenticate with Vercel (production).
  • LOKI_PASSWORD: Password for your Loki instance (if production Loki uses auth).
  • Environment Variables:
  • AIHUB_ECS_TASK_FAMILY: e.g., aihub-production-task (the family name of your production ECS task definition).
  • AIHUB_ECS_CONTAINER_NAME: e.g., aihub-app (the name of the container in your production ECS task definition).
  • AIHUB_PORT: e.g., 3001 (the port your aiHub container listens on).
  • AIHUB_SQS_QUEUE_URL: The URL of your aihub-production-queue.
  • VERCEL_API_BASE_URL: The base URL of your production Vercel app (that aiHub might call).
  • LOKI_URL: URL for your production Loki instance.
  • LOKI_USER: Username for production Loki (if auth enabled).
  • SEND_DEV_LOGS_TO_LOKI: Typically false for production.
  • AIHUB_SUBNET_IDS: Comma-separated subnet IDs for Fargate tasks (e.g., subnet-12345abc,subnet-67890def).
  • AIHUB_SECURITY_GROUP_IDS: Comma-separated security group IDs for network access (e.g., sg-98765xyz).

For the "Staging" GitHub Environment:

  • Environment Secrets:
  • AWS_ACCESS_KEY_ID: Your AWS Access Key ID for staging deployments (can be same as prod if permissions allow, or a different one).
  • AWS_SECRET_ACCESS_KEY: Your AWS Secret Access Key for staging deployments.
  • VERCEL_AUTHORIZATION_API_KEY: The shared secret key that aiHub (staging) uses to authenticate with Vercel (staging).
  • LOKI_PASSWORD: Password for your staging Loki instance (if staging Loki uses auth).
  • Environment Variables:
  • AIHUB_ECS_TASK_FAMILY: e.g., aihub-staging-task.
  • AIHUB_ECS_CONTAINER_NAME: e.g., aihub-app-staging.
  • AIHUB_PORT: e.g., 3001.
  • AIHUB_SQS_QUEUE_URL: The URL of your aihub-staging-queue.
  • VERCEL_API_BASE_URL: The base URL of your staging Vercel app.
  • LOKI_URL: URL for your staging Loki instance.
  • LOKI_USER: Username for staging Loki.
  • SEND_DEV_LOGS_TO_LOKI: e.g., true or false.
  • AIHUB_SUBNET_IDS: Comma-separated subnet IDs for Fargate tasks (e.g., subnet-12345abc,subnet-67890def).
  • AIHUB_SECURITY_GROUP_IDS: Comma-separated security group IDs for network access (e.g., sg-98765xyz).

This detailed guide should help you establish the foundational AWS infrastructure. The GitHub Actions will then take over deploying your application code and specific environment configurations into these structures by referencing these GitHub Environment-scoped variables and secrets.


Part D: SQS Worker Mode (Direct Processing)

The aiHub service supports two modes for consuming SQS messages:

Processing Modes

| Mode | SQS Consumer | Env Var | Use Case |
| --- | --- | --- | --- |
| Direct | SQS Worker in the always-on ECS service | SQS_WORKER_ENABLED=true | Active productions — fast processing (~2 min/doc) |
| Queued | Lambda → Fargate single-task | SQS_WORKER_ENABLED=false | Off-season — cost-efficient (~7 min/doc) |

In direct mode, the always-on ECS service polls SQS directly and processes tasks in-memory using the same processTask() function used by Fargate single-task mode. This eliminates the ~90-second cold start per execution group.

In queued mode, the existing Lambda → Fargate pipeline handles SQS messages as before.

IMPORTANT: The Lambda SQS trigger must be disabled when the Worker is active, and re-enabled when switching back to queued mode. Both consumers cannot coexist on the same queue — SQS delivers each message to only one consumer, and the Lambda's event-source mapping reacts faster than the Worker's long-polling cycle, causing a race condition where the Lambda wins and spawns unnecessary Fargate tasks with ~90-second cold starts.

Switching Modes

Switching to Direct Mode (On-Season)

  1. Update ECS Task Definition:
    1. Go to ECS > Task Definitions in the AWS Console.
    2. Select the task definition (e.g., aihub-staging-task).
    3. Click Create new revision.
    4. Set SQS_WORKER_ENABLED to true.
    5. Update CPU/Memory to on-season values (0.5 vCPU / 1 GB).
    6. Save the new revision.
    7. Go to ECS > Clusters > Services and update the service to use the new revision.
    8. Check Force new deployment and update.

  2. Disable the Lambda SQS trigger:
    • Via AWS Console:
      1. Go to Lambda > Functions > select your aiHubSqsProcessor function (e.g., aiHubSqsProcessor-staging).
      2. Go to Configuration tab > Triggers (left sidebar).
      3. Click the SQS trigger > Disable.
    • Via CLI:

    # Find the trigger UUID
    aws lambda list-event-source-mappings --function-name aiHubSqsProcessor-staging --region eu-west-2
    
    # Disable it
    aws lambda update-event-source-mapping --uuid <trigger-uuid> --no-enabled --region eu-west-2
    

Switching to Queued Mode (Off-Season)

  1. Re-enable the Lambda SQS trigger:
    • Via AWS Console:
      1. Go to Lambda > Functions > select your aiHubSqsProcessor function.
      2. Go to Configuration tab > Triggers.
      3. Click the SQS trigger > Enable.
    • Via CLI:

    aws lambda update-event-source-mapping --uuid <trigger-uuid> --enabled --region eu-west-2
    

  2. Update ECS Task Definition:
    1. Go to ECS > Task Definitions > select the task definition.
    2. Click Create new revision.
    3. Set SQS_WORKER_ENABLED to false.
    4. Update CPU/Memory to off-season values (0.25 vCPU / 0.5 GB).
    5. Save the new revision.
    6. Go to ECS > Clusters > Services and update the service.
    7. Check Force new deployment and update.

The service will perform a rolling restart (~30 seconds) with the new configuration.

| Mode | CPU | Memory | Notes |
| --- | --- | --- | --- |
| Direct (on-season) | 0.5 vCPU | 1 GB | Needs resources for AI processing |
| Queued (off-season) | 0.25 vCPU | 0.5 GB | Only handles HTTP endpoints (chat, Sana) |

When switching modes, also update the task definition's CPU/memory allocation.
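After registering the new task definition revision (with SQS_WORKER_ENABLED flipped and CPU/memory adjusted), the service rollout from the steps above can be triggered from the CLI; a sketch for staging, using the names this guide assumes:

```shell
# Point the service at the latest revision and force a rolling restart
aws ecs update-service \
  --cluster aihub-cluster \
  --service aihub-service-staging \
  --task-definition aihub-staging-task \
  --force-new-deployment \
  --region eu-west-2
```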

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| SQS_WORKER_ENABLED | false | Set to true to enable the SQS Worker |
| SQS_WORKER_MAX_CONCURRENT | 3 | Maximum concurrent tasks the Worker will process |

Local Development

For local development, SQS_WORKER_ENABLED=true is set in .env.local. The Worker polls LocalStack SQS and processes tasks directly, which fixes the issue where LocalStack SQS messages previously had no consumer.