Amazon Athena reads Parquet files directly from S3 and bills per terabyte scanned, so it’s a fast way to explore the data the Amazon S3 v2.0 connector writes—no warehouse to provision, no ingestion pipeline to maintain. By the end of this guide you’ll have a database namedDocumentation Index
Fetch the complete documentation index at: https://docs.suprsend.com/llms.txt
Use this file to discover all available pages before exploring further.
suprsend_db with three tables (ss_requests, ss_workflow_executions, ss_messages), and you’ll have run your first query.
Prerequisites
You’ll need:- The S3 v2.0 connector running and writing Parquet files to your bucket.
- Access to the AWS account that owns the bucket, with permission to use Athena and read the bucket.
- An S3 location for Athena to store query results (a separate prefix or a different bucket).
- Path layout:
per_type. The DDL in this guide creates a separate external table for each data point, with each one pointing at its own S3 prefix (<bucket>/workflow_executions/,.../requests/,.../messages/). That folder structure exists only withper_type. - Compression: any codec except
lz4. Athena can’t reliably readlz4-encoded Parquet—this is a known Athena limitation, not something specific to SuprSend.snappyis the default compression which works well with Athena.
1. Set the Athena query result location
Open the Athena console in the same region as your S3 bucket. The first time you use Athena in a region you have to set a result location:- Go to Settings → Manage.
- Set Location of query result to a path you control, for example
s3://YOUR_BUCKET_NAME/athena-results/. - Save.
2. Create the database
In the query editor, run:suprsend_db. If you pick a different name, update the CREATE EXTERNAL TABLE statements below to match.
3. Create the external tables
Run each statement to register one external table per data point. They use Athena partition projection onyear/month/day/hour, so new hourly partitions are picked up automatically—you don’t need MSCK REPAIR TABLE or a Glue Crawler.
Replace YOUR_BUCKET_NAME in LOCATION and storage.location.template with your bucket. If your connector writes under a path prefix (for example staging/), include it before the data-point folder—s3://YOUR_BUCKET_NAME/staging/workflow_executions/.
- Workflow executions
- Requests
- Messages
projection.year.range is set to 2022,2030. Widen the upper bound if you’ll be querying data beyond 2030.4. Run your first query
You’re ready to query. The cardinal rule with Athena is to always filter on the partition columns (year, month, day, hour) where you can—Athena bills per TB scanned, so partition pruning is the main way to keep costs predictable.
idempotency_key, which is shared across Requests, Workflow Executions, and Messages:
UNNEST(requests.executions).exec_id = workflow_executions.execution_id), see the linking columns reference.
Points to note
Working with string-typed columns
Working with string-typed columns
Some columns are stored as
string in Parquet even though they hold JSON, booleans, integers, or timestamps. Cast them explicitly:- JSON (
payload,response,metadata,properties,webhook_data,execution_failure_reason) →json_extract(column, '$.field') - Integer (
vendor_fallback_level,wait_time_in_seconds) →CAST(column AS BIGINT) - Boolean (
vendor_fallback_applicable,is_smart) → compare against'true'/'false' - Timestamp (
success_achieved_at) →from_iso8601_timestamp(column)ordate_parse(column, '<format>')
Data freshness
Data freshness
The connector rewrites hourly Parquet files when upstream data changes. The next Athena query picks up the new state automatically — no
MSCK REPAIR TABLE or Glue Crawler refresh needed.Changing the connector's S3 path
Changing the connector's S3 path
If you point the connector at a new bucket or prefix, existing tables stop returning new data. Drop and recreate each table with the updated
LOCATION and storage.location.template.