The question that is always asked with storing data with Slowly Changing Dimensions, is to what extent you want to keep the historical data. I would like to explain a situation of hierarchical data that is stored with the SCD method and how this affects the data.
As an example where I first got in touch with this subject, I’ll take the hierarchical relationship between employees and managers.
The employees’ data is stored as SCD Type 2, and this table also includes a column “Manager”. This column has a foreign key relationship to the surrogate key of the same table. So what does this do to your data?
To keep the example simple, I’ll take two employees (John and Jane), and Bill is their manager. This results in the following structure in the table:
Let’s say Bill moves to another location as of October 1st 2010. Since I have the column “Location” also set to a type 2 change, you might expect the following changes and additions to the table:
The changes for Bill seems correct, an extra line is added with a new surrogate key (line 4) and his original line has gotten an end date. Now I still know that Bill has been working in New York and then in Boston.
Nothing special than a simple SCD type 2 change until so far. But the above situation, is that really what should happen, isn’t there more that should be done? I probably wouldn’t have posted this if it was just that 🙂
Look at John and Jane. They have Bill as their manager because they both have the surrogate key of Bill in the column “ManagerKey”. The active record for Bill has now been changed to key 4, so Jane and Joe have to “change manager” as well. I have put this as a quote, because they don’t really change managers, but the link to their manager’s record in the employees table has changed.
Therefore, the end result of Bill’s change adds the following lines to the table for his subordinates:
In this example it doesn’t seem like a big change, but this entails that every single change in a manager’s data, the subordinates all the way down will get a new record in the table as well. In my case, there is a new CEO in the organization that led to these changes for every one of the 2500 employees in the company!
We’re not talking about a lot of data here, but in other hierarchical structures than employee data, this could lead to a massive data load that is not even necessary. Therefore you should ask yourself if you really need that information so much in detail. If you don’t need this, there are two options that look like the most straight-forward options:
- Change the column “ManagerKey” to an SCD Type 1 column. If this is used in the above example, the manager’s key in John’s and Jane’s profile will be overwritten from 3 to 4. A drawback of this option is that you lose the history of who has been whose manager.
- If you need to keep the history, you can use a foreign key relationship not on the surrogate key, but on the business key. In the above example, John and Jane would have the value 145 in the column “ManagerKey”, and nothing changes with their information and no new record for them are added.
It is advised to make this choice early on in your design phase, since going back from one to another could be a very time consuming job!