With batch exports, data can be exported to an S3 bucket.
Creating the batch export
- Navigate to the exports page in your PostHog instance (Quick links if you use PostHog Cloud US or PostHog Cloud EU).
- Click "Create export workflow".
- Select S3 as the batch export type.
- Fill in the necessary configuration details.
- Finalize the creation by clicking on "Create".
- Done! The batch export will schedule its first run on the start of the next period.
S3 configuration
Configuring a batch export targeting S3 requires the following S3-specific configuration values:
- Bucket name: The name of the S3 bucket where the data is to be exported.
- Region: The AWS region where the bucket is located.
- Key prefix: A key prefix to use for each S3 object created. This key can include template variables
- Compression: Select a compression method (like gzip) to use for exported files or no compression.
- Encryption: Select a server-side encryption method (
AES256
oraws:kms
) for AWS to encrypt data at rest. - AWS Access Key ID: An AWS access key ID with access to the S3 bucket.
- AWS Secret Access Key: An AWS secret access key with access to the S3 bucket.
- AWS KMS Key ID: The AWS KMS Key ID to use for server-side encryption. Only required when selecting
aws:kms
encryption. - Events to exclude: A list of events to omit from the exported data.
S3 key prefix template variables
The key prefix provided for data exporting can include template variables which are formatted at runtime. All template variables are defined between curly brackets (for example {day}
). This allows you partition files in your S3 bucket, such as by date.
Template variables include:
- Date and time variables:
year
.month
.day
.hour
.minute
.second
.
- Name of the table exported:
table_name
.
- Batch export data bounds:
data_interval_start
.data_interval_end
.
So, as an example, setting {year}-{month}-{day}_{table_name}/
as a key prefix, will produce files prefixed with keys like 2023-07-28_events/
.
S3 file formats
PostHog batch exports for S3 export data in JSON lines format. We intend to add support for other common formats like CSV and Parquet. You can follow the roadmap for updates on this and other features.