Recommended approach to importing and parsing GA4 BigQuery event data in SQL Server

gudnpqoy asked on 2023-04-04 in SQL Server

We are using Python to connect to BigQuery and download daily events_YYYYMMDD.csv files. Those files contain comma-separated columns, and some of the columns contain JSON data (some of which contains further nested JSON).

Each of those daily files contains approximately 1 million records.

Using the Google GA4 Events Schema, we are building the GA4-related tables in SQL Server:

  • event
  • user
  • device
  • geo
  • app_info
  • traffic_source
  • stream and platform
  • ecommerce
  • items

What would be the recommended manner of parsing these .csv files (which can contain multiple levels of nested JSON) into the appropriate SQL Server tables listed above?

We started to create an SSIS package to import the .csv files into a staging table, followed by a script to parse the rows. However, upon doing so, we realized we would need a fairly complicated query containing multiple CROSS APPLY statements, and we were concerned about performance.
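For reference, a minimal T-SQL sketch of the CROSS APPLY / OPENJSON pattern we were considering (SQL Server 2016+). The staging table and column names (Staging_GA4Events, event_params_json) are hypothetical, and the JSON shape follows the GA4 export's event_params array of key/value structs:

    -- Hypothetical staging table: one row per GA4 event, raw JSON kept as NVARCHAR(MAX)
    -- CREATE TABLE dbo.Staging_GA4Events (
    --     event_date        DATE,
    --     event_name        NVARCHAR(100),
    --     event_params_json NVARCHAR(MAX)   -- the GA4 event_params array as exported
    -- );

    SELECT
        s.event_date,
        s.event_name,
        p.[key]                                  AS param_key,
        JSON_VALUE(p.[value], '$.string_value')  AS string_value,
        JSON_VALUE(p.[value], '$.int_value')     AS int_value,
        JSON_VALUE(p.[value], '$.double_value')  AS double_value
    FROM dbo.Staging_GA4Events AS s
    CROSS APPLY OPENJSON(s.event_params_json)
        WITH (
            [key]   NVARCHAR(200) '$.key',
            [value] NVARCHAR(MAX) '$.value' AS JSON   -- keep the nested value struct as JSON for this second-level parse
        ) AS p;

Each additional nested structure (items, user_properties, etc.) needs its own CROSS APPLY, which is what drove the performance concern.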


wnavrhmk1#

The best practice is to flatten your BigQuery GA4 export before exporting it. You can find various approaches to flattening on the web, but the main idea is to get rid of the JSON in your event parameter / custom dimension columns. The typical solution is to create a distinct column for each event parameter or custom dimension.
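As an illustration, a minimal BigQuery Standard SQL sketch of that flattening, pivoting a couple of example event parameters (page_location, ga_session_id) into plain columns before export. The dataset name analytics_123456 and the chosen parameters are placeholders; you would add one subquery per parameter you want as a column:

    -- BigQuery Standard SQL: pivot selected event_params into flat columns for yesterday's export
    SELECT
      event_date,
      event_timestamp,
      event_name,
      user_pseudo_id,
      (SELECT ep.value.string_value
         FROM UNNEST(event_params) AS ep
        WHERE ep.key = 'page_location')  AS page_location,
      (SELECT ep.value.int_value
         FROM UNNEST(event_params) AS ep
        WHERE ep.key = 'ga_session_id')  AS ga_session_id
    FROM `analytics_123456.events_*`
    WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY));

Exporting the result of a query like this, rather than the raw events table, means the CSV lands in SQL Server with no JSON left to parse.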

After that, people typically move the data into their preferred data store as is. Any transformations in the ETL at this point are company-specific, and that ETL logic normally grows over the years as it is refined to improve the data.
