This article covers connecting Unstructured to Amazon S3 Vectors. For information about connecting Unstructured to Amazon S3 without support for Amazon S3 Vectors, see S3.
If you’re new to Unstructured, read this note first.
Before you can create a destination connector, you must first sign in to your Unstructured account. After you sign in, the Unstructured user interface (UI) appears, which you use to get your Unstructured API key.
  1. After you sign in to your Unstructured Let’s Go, Pay-As-You-Go, or Business account, click API Keys on the sidebar.
    For a Business account, before you click API Keys, make sure you have selected the organizational workspace you want to create an API key for. Each API key works with one and only one organizational workspace. Learn more.
  2. Click Generate API Key.
  3. Follow the on-screen instructions to finish generating the key.
  4. Click the Copy icon next to your new key to add the key to your system’s clipboard. If you lose this key, simply return and click the Copy icon again.
After you create the destination connector, add it along with a source connector to a workflow. Then run the workflow as a job. To learn how, try out the notebook Dropbox-To-Pinecone Connector API Quickstart for Unstructured, or watch the two 4-minute video tutorials for the Unstructured Python SDK.
You can also create destination connectors with the Unstructured user interface (UI). Learn how.
If you need help, email Unstructured Support at support@unstructured.io.
You are now ready to start creating a destination connector! Keep reading to learn how.
Send processed data from Unstructured to Amazon S3 Vectors. The requirements are as follows.
  • An Amazon S3 Vectors bucket.
  • The AWS Region (such as us-east-1) of the target S3 Vectors bucket. Learn how to get the Region of an existing S3 Vectors bucket.
  • An index for the target S3 Vectors bucket. When creating an index, be sure to specify these settings:
    • Vector index name can be any allowed name pattern.
    • For Dimension, only specify a number that is supported by Unstructured’s available embedding models.
    • For Distance metric, only specify Cosine.
    • For Metadata configuration under Additional settings, Unstructured recommends that you specify the following 10 keys for Non-filterable metadata:
      • text
      • link_urls
      • link_texts
      • coordinates-points
      • coordinates-system
      • data_source-url
      • data_source-record_locator
      • data_source-date_created
      • data_source-date_modified
      • data_source-date_processed
    • There are no Unstructured-specific requirements for Encryption or Tags.
    Learn more about these index settings.
  • For the target index, the number of dimensions that are generated. Learn how to get the index’s number of dimensions.
  • The AWS access key ID and the AWS secret access key for the target AWS IAM principal (such as an IAM user or group) that has the appropriate access to the S3 Vectors bucket.
    • If you use identity-based policies to control access, the target IAM principal must have at minimum the following access permissions. Replace the following placeholders:
      • Replace <region-short-id> with the AWS Region short ID of the target S3 Vectors bucket.
      • Replace <account-id> with the AWS account ID of the target S3 Vectors bucket.
      • Replace <bucket-name> with the name of the target S3 Vectors bucket.
      • Replace <index-name> with the name of the target index.
      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Sid": "AccountBucketListing",
                  "Effect": "Allow",
                  "Action": [
                      "s3vectors:ListVectorBuckets"
                  ],
                  "Resource": "*"
              },
              {
                  "Sid": "AllowBucketAccess",
                  "Effect": "Allow",
                  "Action": [
                      "s3vectors:GetVectorBucket",
                      "s3vectors:ListIndexes"
                  ],
                  "Resource": "arn:aws:s3vectors:<region-short-id>:<account-id>:bucket/<bucket-name>"
              },
              {
                  "Sid": "AllowIndexAccess",
                  "Effect": "Allow",
                  "Action": [
                      "s3vectors:ListIndexes",
                      "s3vectors:GetIndex",
                      "s3vectors:ListVectors",
                      "s3vectors:QueryVectors",
                      "s3vectors:PutVectors",
                      "s3vectors:GetVectors",
                      "s3vectors:DeleteVectors"
                  ],
                  "Resource": "arn:aws:s3vectors:<region-short-id>:<account-id>:bucket/<bucket-name>/index/<index-name>"
              }
          ]
      }
      
      Learn more about these S3 Vectors access permissions.
    • Learn how to attach an access policy to an IAM user, group, or role.
    • Learn how to create and manage AWS access key IDs and their related AWS secret access keys for IAM users.
    • Learn how to switch from an IAM user to a role for temporary access.
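The index settings listed above can be sketched in Python as a request for the AWS SDK. This is a minimal sketch, not an official procedure: the helper names `build_create_index_request` and `create_index` are hypothetical, and the parameter names (`vectorBucketName`, `indexName`, `dataType`, `dimension`, `distanceMetric`, `metadataConfiguration`) assume the `create_index` operation of the boto3 `s3vectors` client, so check them against the current boto3 reference before use. The dimension value is whatever your chosen embedding model produces.

```python
from typing import Any, Dict

# The 10 non-filterable metadata keys recommended in the requirements above.
NON_FILTERABLE_KEYS = [
    "text",
    "link_urls",
    "link_texts",
    "coordinates-points",
    "coordinates-system",
    "data_source-url",
    "data_source-record_locator",
    "data_source-date_created",
    "data_source-date_modified",
    "data_source-date_processed",
]


def build_create_index_request(
    bucket_name: str, index_name: str, dimension: int
) -> Dict[str, Any]:
    """Build keyword arguments for an index-creation call (hypothetical helper)."""
    if dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    return {
        "vectorBucketName": bucket_name,
        "indexName": index_name,
        "dataType": "float32",  # assumption: 32-bit float vectors
        "dimension": dimension,
        "distanceMetric": "cosine",  # the only metric the requirements allow
        "metadataConfiguration": {
            "nonFilterableMetadataKeys": list(NON_FILTERABLE_KEYS)
        },
    }


def create_index(region: str, bucket_name: str, index_name: str, dimension: int):
    """Send the request with boto3 (assumed API; requires AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("s3vectors", region_name=region)
    return client.create_index(
        **build_create_index_request(bucket_name, index_name, dimension)
    )
```

Keeping the request builder separate from the network call makes it possible to inspect or log the settings before anything is created in AWS.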

Create the destination connector

To create an S3 Vectors destination connector, see the following examples.
import os

from unstructured_client import UnstructuredClient
from unstructured_client.models.operations import CreateDestinationRequest
from unstructured_client.models.shared import CreateDestinationConnector

with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
    response = client.destinations.create_destination(
        request=CreateDestinationRequest(
            create_destination_connector=CreateDestinationConnector(
                name="<name>",
                type="s3_vectors",
                config={
                    "region": "<region>",
                    "access_config": {
                        "key": "<key>",
                        "secret": "<secret>",
                        "token": "<token>"
                    },
                    "ambient_credentials": "true|false",
                    "vector_bucket_name": "<vector-bucket-name>",
                    "index_name": "<index-name>",
                    "key_prefix": "<key-prefix>",
                    "batch_size": <batch-size>
                }
            )
        )
    )

    print(response.destination_connector_information)
Replace the preceding placeholders as follows:
  • <name> (required): A unique name for this connector.
  • <region> (required): The AWS Region (such as us-east-1) of the target Amazon S3 Vectors bucket.
  • <key> (required): The AWS access key ID for the target AWS IAM principal that has the appropriate access to the target bucket.
  • <secret> (required): The AWS secret access key for the corresponding AWS access key ID.
  • <vector-bucket-name> (required): The name of the target bucket.
  • <index-name> (required): The name of the target index in the bucket.
  • <batch-size>: The maximum number of vectors to include in a single batch. The maximum is 500. If not specified, the default is 100.
  • <key-prefix>: A string to prepend to each vector key. Prepending a string can be useful for distinguishing between different datasets in the same bucket. Learn more about vector keys. If not specified, no string is prepended.
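The placeholder rules above can be checked before the connector is created. The following is a minimal sketch under stated assumptions: `build_s3_vectors_config` is a hypothetical helper, not part of the Unstructured SDK, and the lower bound of 1 for the batch size is an assumption (the text above states only the maximum of 500 and the default of 100). The `token` and `ambient_credentials` fields from the example are left out because their values are not described in the placeholder list.

```python
from typing import Any, Dict, Optional

MAX_BATCH_SIZE = 500      # maximum stated in the placeholder list
DEFAULT_BATCH_SIZE = 100  # default stated in the placeholder list


def build_s3_vectors_config(
    region: str,
    key: str,
    secret: str,
    vector_bucket_name: str,
    index_name: str,
    batch_size: int = DEFAULT_BATCH_SIZE,
    key_prefix: Optional[str] = None,
) -> Dict[str, Any]:
    """Return a config dict for the connector (hypothetical helper)."""
    required = {
        "region": region,
        "key": key,
        "secret": secret,
        "vector_bucket_name": vector_bucket_name,
        "index_name": index_name,
    }
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ValueError(f"missing required values: {', '.join(missing)}")

    # Assumption: a batch must hold at least one vector.
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError(f"batch_size must be between 1 and {MAX_BATCH_SIZE}")

    config: Dict[str, Any] = {
        "region": region,
        "access_config": {"key": key, "secret": secret},
        "vector_bucket_name": vector_bucket_name,
        "index_name": index_name,
        "batch_size": batch_size,
    }
    # Only include the prefix when one is given, so the default
    # (no prefix) is left to the service.
    if key_prefix:
        config["key_prefix"] = key_prefix
    return config
```

The returned dictionary has the same shape as the `config` argument in the example above, so it could be passed there in place of the literal with placeholders.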