AWS S3
AWS S3 is an object storage service offering industry-leading scalability, data availability, security, and performance. It is read-only and supports below file formats:
The S3 Wrapper allows you to read data of below formats from S3 within your Postgres database.
- CSV - with or without header line
- JSON Lines
- Parquet
The S3 Wrapper also supports below compression algorithms:
- gzip
- bzip2
- xz
- zlib
Note for CSV and JSONL files: currently all columns in S3 files must be defined in the foreign table and their types must be text
type.
Note for Parquet files: the whole Parquet file will be loaded into local memory if it is compressed, so keep the file size as small as possible.
Supported Data Types For Parquet File#
The S3 Wrapper uses Parquet file data types from arrow_array::types, below are their mappings to Postgres data types.
Postgres Type | Parquet Type |
---|---|
boolean | BooleanType |
char | Int8Type |
smallint | Int16Type |
real | Float32Type |
integer | Int32Type |
double precision | Float64Type |
bigint | Int64Type |
numeric | Float64Type |
text | ByteArrayType |
date | Date64Type |
timestamp | TimestampNanosecondType |
Preparation#
Before you get started, make sure the wrappers
extension is installed on your database:
_10create extension if not exists wrappers;
and then create the foreign data wrapper:
_10create foreign data wrapper s3_wrapper_10 handler s3_fdw_handler_10 validator s3_fdw_validator;
Secure your credentials (optional)#
By default, Postgres stores FDW credentials inide pg_catalog.pg_foreign_server
in plain text. Anyone with access to this table will be able to view these credentials. Wrappers is designed to work with Vault, which provides an additional level of security for storing credentials. We recommend using Vault to store your credentials.
_14-- Save your AWS credential in Vault and retrieve the `key_id`_14insert into vault.secrets (name, secret)_14values (_14 'vault_access_key_id',_14 '<access key id>'_14)_14returning key_id;_14_14insert into vault.secrets (name, secret)_14values (_14 'vault_secret_access_key',_14 '<secret access key>'_14)_14returning key_id;
Connecting to S3#
We need to provide Postgres with the credentials to connect to S3, and any additional options. We can do this using the create server
command:
_10create server s3_server_10 foreign data wrapper s3_wrapper_10 options (_10 vault_access_key_id '<your vault_access_key_id from above>',_10 vault_secret_access_key '<your vault_secret_access_key from above>',_10 aws_region 'us-east-1'_10 );
Creating Foreign Tables#
The S3 Wrapper supports data reads from S3.
Integration | Select | Insert | Update | Delete | Truncate |
---|---|---|---|---|---|
S3 | ✅ | ❌ | ❌ | ❌ | ❌ |
For example:
_13create foreign table s3_table_csv (_13 name text,_13 sex text,_13 age text,_13 height text,_13 weight text_13)_13 server s3_server_13 options (_13 uri 's3://bucket/s3_table.csv',_13 format 'csv',_13 has_header 'true'_13 );
One file in S3 corresponds a foreign table in Postgres. For CSV and JSONL file, all columns must be present in the foreign table and type must be text
. You can do custom transformations, like type conversion, by creating a view on top of the foreign table or using a subquery.
For Parquet file, no need to define all columns in the foreign table but column names must match between Parquet file and its foreign table.
Foreign table options#
The full list of foreign table options are below:
uri
- S3 URI, required. For example,s3://bucket/s3_table.csv
format
- File format, required.csv
,jsonl
, orparquet
has_header
- If the CSV file has header, optional.true
orfalse
, default isfalse
compress
- Compression algorithm, optional. One ofgzip
,bzip2
,xz
,zlib
, default is no compression
Examples#
Some examples on how to use S3 foreign tables.
Basic example#
This will create some "foreign table" inside your Postgres database can read data from S3:
_75-- CSV file, no compression_75create foreign table s3_table_csv (_75 name text,_75 sex text,_75 age text,_75 height text,_75 weight text_75)_75 server s3_server_75 options (_75 uri 's3://bucket/s3_table.csv',_75 format 'csv',_75 has_header 'true'_75 );_75_75-- JSON line file, no compression_75create foreign table s3_table_jsonl (_75 name text,_75 sex text,_75 age text,_75 height text,_75 weight text_75)_75 server s3_server_75 options (_75 uri 's3://bucket/s3_table.jsonl',_75 format 'jsonl'_75 );_75_75-- GZIP compressed CSV file_75create foreign table s3_table_csv_gzip (_75 name text,_75 sex text,_75 age text,_75 height text,_75 weight text_75)_75 server s3_server_75 options (_75 uri 's3://bucket/s3_table.csv.gz',_75 format 'csv',_75 has_header 'true',_75 compress 'gzip'_75 );_75_75-- Parquet file, no compression_75create foreign table s3_table_parquet (_75 id integer,_75 bool_col boolean,_75 bigint_col bigint,_75 float_col real,_75 date_string_col text,_75 timestamp_col timestamp_75)_75 server s3_server_75 options (_75 uri 's3://bucket/s3_table.parquet',_75 format 'parquet'_75 );_75_75-- GZIP compressed Parquet file_75create foreign table s3_table_parquet_gz (_75 id integer,_75 bool_col boolean,_75 bigint_col bigint,_75 float_col real,_75 date_string_col text,_75 timestamp_col timestamp_75)_75 server s3_server_75 options (_75 uri 's3://bucket/s3_table.parquet.gz',_75 format 'parquet',_75 compress 'gzip'_75 );