AWS Pricing Data Case Study

Rob Galanakis on April 6, 2023


There are at least a few projects that bridge the AWS API with SQL. Some of them are really cool, and outside the scope of WebhookDB, like working with your infrastructure using SQL.

In our case, we'll see how easy it is to query the AWS Pricing Service with WebhookDB, including pulling the data down for some interesting analytics.

Customers in our Beta program can replicate AWS pricing data to their own dedicated database using:

$ webhookdb integrations create aws_pricing_v1

(If you want to join our beta program, email hello@webhookdb.com)

It'll take a few minutes to sync — there are 4 to 5 million separate rates in AWS! After it's finished, you can check out the data:

$ psql `webhookdb db connection`

to connect to your database. Let's run the following commands:

adb1a13b53668915fcb0b> \d

to see available tables. Find your aws_pricing_v1_<abc> table. We can see its schema has some useful denormalizations:

adb1a13b53668915fcb0b> \d aws_pricing_v1_9158
+-------------------------+--------------------------+
| Column                  | Type                     |
|-------------------------+--------------------------|
| pk                      | bigint                   |
| rate_code               | text                     |
| product_sku             | text                     |
| product_family          | text                     |
| product_attributes      | jsonb                    |
| product_group           | text                     |
| product_location        | text                     |
| product_region          | text                     |
| product_operation       | text                     |
| product_usagetype       | text                     |
| publication_date        | timestamp with time zone |
| service_code            | text                     |
| version                 | text                     |
| term_type               | text                     |
| term_code               | text                     |
| offer_term_code         | text                     |
| effective_date          | timestamp with time zone |
| applies_to              | text[]                   |
| begin_range             | numeric                  |
| description             | text                     |
| end_range               | numeric                  |
| unit                    | text                     |
| term_attributes         | jsonb                    |
| price_per_unit_raw      | jsonb                    |
| price_per_unit_amount   | numeric                  |
| price_per_unit_currency | text                     |
| data                    | jsonb                    |
+-------------------------+--------------------------+

Notice how, for example, price_per_unit_amount is a proper numeric column, extracted from the raw price_per_unit_raw JSON.

This makes it easy to do something like find the most expensive EC2 BoxUsage in us-west-2:

adb1a13b53668915fcb0b> select * from aws_pricing_v1_9158
  where product_usagetype like '%BoxUsage%' and product_region = 'us-west-2'
  order by price_per_unit_amount desc limit 1
-[ RECORD 1 ]-------------------------
pk                      | 12184777
rate_code               | RM2R35DGV8VBDDHC.MZU6U2429S.2TG2D8R56U
product_sku             | RM2R35DGV8VBDDHC
product_family          | Compute Instance
product_attributes      | {"ecu": "NA", "vcpu": "448", "memory": "18432 GiB", "storage": "EBS only", "tenancy": "Shared", "location": "US West (Oregon)", "gpuMemory": "NA", "operation": "RunInstances:0102", "usagetype": "USW2-BoxUsage:u-18tb1.1>
product_group           | <null>
product_location        | US West (Oregon)
product_region          | us-west-2
product_operation       | RunInstances:0102
product_usagetype       | USW2-BoxUsage:u-18tb1.112xlarge
publication_date        | 2023-03-28 23:05:02+00
service_code            | AmazonEC2
version                 | 20230328230502
term_type               | Reserved
term_code               | RM2R35DGV8VBDDHC.MZU6U2429S
offer_term_code         | MZU6U2429S
effective_date          | 2022-12-01 00:00:00+00
applies_to              | []
begin_range             | <null>
description             | Upfront Fee
end_range               | <null>
unit                    | Quantity
term_attributes         | {"OfferingClass": "convertible", "PurchaseOption": "All Upfront", "LeaseContractLength": "3yr"}
price_per_unit_raw      | {"USD": "6462648"}
price_per_unit_amount   | 6462648
price_per_unit_currency | USD
data                    | {}
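
Because the rates live in ordinary Postgres columns, you can also aggregate them directly rather than just looking rows up. Here's a rough sketch (using the same example table name, and assuming on-demand terms are stored with term_type = 'OnDemand', as in the AWS price list) that averages on-demand BoxUsage rates by region:

adb1a13b53668915fcb0b> select product_region, avg(price_per_unit_amount) as avg_hourly_usd
  from aws_pricing_v1_9158
  where service_code = 'AmazonEC2'
    and product_usagetype like '%BoxUsage%'
    and term_type = 'OnDemand'
    and price_per_unit_currency = 'USD'
  group by product_region
  order by avg_hourly_usd desc;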

Any time you want to re-sync, you can run the backfill command:

$ webhookdb backfill aws_pricing_v1

This will pull new pricing data into your database. Note that the historical data won't disappear, so you can even track changes over time. If you want a fresh database, you can run webhookdb integrations create aws_pricing_v1 again (don't forget to use webhookdb integrations delete to clean up unused integrations, since each table is a few gigabytes of data).
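
Since each row keeps its version and publication_date, one way to sketch a change-over-time query is to follow a single rate across syncs (here using the rate_code from the record shown above):

adb1a13b53668915fcb0b> select version, publication_date, price_per_unit_amount
  from aws_pricing_v1_9158
  where rate_code = 'RM2R35DGV8VBDDHC.MZU6U2429S.2TG2D8R56U'
  order by publication_date;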

One more fun thing you can do is download the whole table for use in something like DuckDB, which is a pretty amazing piece of technology.

$ duckdb
D -- See https://duckdb.org/docs/guides/import/query_postgres.html
D INSTALL postgres;
D LOAD postgres;
D CALL postgres_attach('<your connection string>');
D -- Did the attach work? Use your own table name.
D SELECT product_usagetype FROM aws_pricing_v1_9158 LIMIT 1;
┌───────────────────┐
│ product_usagetype │
│      varchar      │
├───────────────────┤
│ USE1-EnrolledUser │
└───────────────────┘
D -- Download the entire database locally as a Parquet file.
D COPY (SELECT * FROM aws_pricing_v1_9158) TO 'whdb_awspricing.parquet' (FORMAT PARQUET);
D -- And turn it into a DuckDB database
D CREATE TABLE awspricing AS SELECT * FROM 'whdb_awspricing.parquet';
D SELECT count(1) FROM awspricing;
┌──────────┐
│ count(1) │
│  int64   │
├──────────┤
│  4695834 │
└──────────┘
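
Once the data is local, you can run big aggregations without touching Postgres at all. As a quick sketch over the awspricing table created above (the columns come straight from the schema we saw earlier):

D SELECT service_code, count(*) AS rates, max(price_per_unit_amount) AS max_rate
    FROM awspricing
    GROUP BY service_code
    ORDER BY rates DESC
    LIMIT 10;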

That's all there is to it. Go wild!

If you want to bring this ease of access to any API on the planet, give WebhookDB a try or get in touch.
