Minnu's Blog on Informatica & Data warehouse concepts


Monday, June 16, 2008

Unity Dimension : An answer for multiple dimensions sharing the same role

PreFace or Business Requirement:
Having seen exclusively the scenarios for different manufacturers in the past, You may be unaware of the significant role logistics plays in the order fulfillment cycle for food makers. The company tracks a complex set of movements among various entities. Several complex scenarios involving products and multiproduct display packages affect subcontractors, distribution centers, and customers: recalls, intercustomer transfers, and so on.
Operationally, the business creates orders to effect and track these different movements: transfer work, sales, and return orders. But this aspect served only to muddle the problem. To support logistics, the model had to consider every movement regardless of the document that initiated it. Clearly, a single fact table had to record all inventory movements, yet with all the different possible origins and destinations, implementation was going to be challenging.

The enterprise model needed to treat customers, vendors, plants, and storage locations (distribution centers and warehouses) as distinct dimensions. These dimensions obviously represented significant business entities and were needed for other subject areas, such as revenue, cost, profitability, and production. The inventory movement fact needed to record the origin and destination of the movement.
There are two obvious solutions but not in Performance Perspective.
The first is to have eight foreign keys (FK): four for origin of customer, vendor, plant, and location; and four for each possible destination. The appropriate keys would be populated and the remaining six would be null. This solution fits into a dimensional modeling framework, but a query wanting to see all movements would challenge even a SQL savant.

The second obvious solution is to store two keys paired with a type code indicating to which dimension the key should be joined. Unfortunately, this method violates just about every rule we hold sacred in relational design and messes up the dimensional model as well. And looking at all movements joined with the proper dimension would take a SELECT UNION query with 16 separate SELECT clauses.
The obvious solutions are not solutions! A unity dimension is.
A Variant of Variant

A unity dimension is a spin on the variant dimension or you may call it as variant of an variant. Think of Kimball's example of an Account dimension in a banking data warehouse (explained in snowflake post). Such an application contains many types of accounts with widely varying attributes. Rather than a single dimension table with all the possible attributes for every type of account, create a primary dimension table with a common set of base attributes and multiple variant dimension tables containing unique attributes for each type of account.

Each variant table contains rows for certain account types. However, the variant and primary rows use the same surrogate key. This setup lets you store a single foreign key in the fact table that you can join to the primary dimension table, the variant table, or both, depending on query requirements.

The variant dimension addresses the problem of having a common entity with differing attributes. A unity dimension is the opposite, involving different entities that share a common role.
Unity Technique

A unity dimension involves the same technique as a variant, but its component dimensions are full. In my situation, the component dimensions were the existing Customer, Vendor, Plant, and Storage_Location dimensions. The unity dimension, Shipping_Points, contained basic information, such as name and address common to all four dimensions, and contained an equal number of rows as all four combined.
And, as in a variant dimension, all tables share the same surrogate key. This is important because it allows great flexibility in how you can use the dimensions. In essence, the primary keys of the component dimensions are foreign keys to the unity dimension.

While the backroom extract-load process is loading fact records, the natural key in the fact table record must be checked against the corresponding component dimension to obtain the proper surrogate key. The process is fairly complex because the load process must identify which version of the dimension record a natural key represents. However, since it's safe to assume that the source system already "
However, since it's safe to assume that the source system already "understands" that issue, you can usually identify the correct version of the dimension from the transaction context. Where ambiguity exists, the load process may be able to determine the version using multiple lookups and assign the correct surrogate keys to the fact record, saving the user work at query time.
In the final model, the Shipment_ Facts table contains two foreign keys to the Shipping_Points dimension, one for the point of origin and the other for the destination. (See Figure Below.) A generic query, such as an analysis of shipping volumes between any two points, joins to the Shipping_Points dimension to get descriptive information about the sending and receiving locations.
FIGURE : The unity dimension improves performance and simplifies queries.

Image Hosted by ImageShack.us

And, because the unity and its component dimensions share the surrogate key, it's easy to apply context to the queries. A query to analyze shipments between vendors and customers is a simple matter of joining the point of origin key with the vendor dimension, and the destination key to the customer dimension. The result is a structure that can handle both generic and context queries without snowflakes or complex SQL.
Maintaining the unity dimension is easy. It is simply a matter of performing the same update to the unity dimension as you would to any of the component dimensions. Whenever you add a new row to a component dimension, add a row to the unity dimension using the same key. If you update a component dimension row, apply the same changes to the corresponding unity dimension row.

However, for this solution to work well, you must share a common surrogate key sequence among the component dimensions. Doing so will ensure the key values are mutually exclusive. If you have the luxury of building a new warehouse, You are recommend to use one surrogate key sequence for most, if not all, of your dimensions.
Because a 32-bit key would allow for more than four billion unique values, a typical application shouldn't run out of keys for at least a few hundred years. (If you are truly gun-shy about a problem like the Y2K bug or you work for a telecom company or government entity, you could always use a 64-bit key.) If you have an existing warehouse and use independent key sequences for each dimension table, You are strongly suggested to consider reassigning keys before implementing a unity dimension.
You can retrofit a unity dimension into an existing warehouse without reassigning keys, but queries become messy. One approach is to define the unity dimension with its own surrogate key and store the component dimension's surrogate key and a type code, identifying the component dimension, as a composite natural key to the unity dimension. The fact table will contain foreign keys to the unity dimension as before. If you wish to query using a component dimension, you will be forced to join through the unity dimension to the appropriate component dimension, filtering on the type code. The result is a snowflake query that can become quite complex and performs poorly.