Data Warehouse Sync design philosophy

Melody Tsukahara

July 11, 2024 10:43

Available for	Roles	Super Admin
	Permissions	• Manage Data Warehouse Sync integration
	Packages	Advanced Analytics add-on Also requires that your organization has its own Snowflake instance

In order to utilize DWS to its fullest extent, you will need to become familiar with the structure of the schema.

Lever's Data Warehouse Sync (DWS) was built with three key concepts in mind:

The ability to fully control and customize the way data is presented to internal stakeholders
Support for single and multi-instance Lever customers
Efficiently handle small and large data sets

To achieve these goals, the feature was built with dimensional modeling in mind utilizing key concepts outlined by The Kimball Group. A key element of dimensional modeling is a separation of facts from dimensions. The facts are stored in tables whose names end with "_events". The goal of reporting is the ability to track the changes to pieces of data and then grouping or visualizing that data so that it becomes easy to draw insights. These "_events" tables contain that historical data.

The "_events" tables are fairly wide. They contain many columns with IDs that can be used to join the "_events" table with dimension tables. These dimension tables help provide context for the events. They reveal what other objects are associated with the event.

Additionally, there are reference tables, whose names end with "_enum". These are tables that list the possible values for a given field. Not all fields require reference tables.

For example, the ""offer_events" table contains a row for each action that was taken on each offer. Suppose a report requires filtering these events by the department associated with the offer. Looking at the schema documentation, we can see that the "offer_events" table contains a key for job_department_id. The schema also shows us that this ID is used to link to the ""job_departments" table. You can tell that this ID is a shared key between the two tables because the column is named the same. By joining these two tables on their respective job_department_id columns, each row in the resulting table includes the department for each offer event! We can use the other IDs in the "offer_events" table to make similar joins to understand things like who took the action, what fields were changed, or how the status changed. The offer_status_id column is an example of a field that links to a reference table. This is because there is a defined list of possible statuses that an offer can be in.

Compared to the fact tables (the tables ending with "_events") the dimension tables contain relatively few columns. This is because the database schema is designed using third normal form, or 3NF. This schema design is useful for avoiding data anomalies, ensures referential integrity, and simplifies data management.

To learn more about 3NF, check out the Wikipedia article on third normal form. It provides a great example of a table that fails to meet 3NF requirements, as well as an example of how to split that information into two tables that meet 3NF requirements.

This design philosophy can produce an intimidating number of tables. However, because of 3NF, it is very intuitive how tables should be joined together to produce a view that will help gather powerful insights into your recruiting processes.