I am assigned with supplying a listing of metadata needs our data warehouse designers may need.

This isn't the company metadata (nice explanations etc), but instead data needed for change management (also called impact assesment), data lineage etc.

I have seen this short article Meta Meta Data Data - Ralph Kimball but as I am not the very first person to get this done I am tossing it towards the SO community.

The particular real question is this: What metadata do datawarehouse designers require to create, develop and manage alternation in ETL programs?

PS: I am attempting to keep the solution platform agnostic however for context it is really an Oracle database with PL/SQL and Datastage.

Inside my place of work, we've home-brew ETL. I'm able to help you raise a brow :). The minimal meta-data we've describes the next. Subscription particulars, Audit, Data-mapping, Run-order.

The subscription particulars again fall under two groups, vendor from whom the information was bought and teams/programs utilizing it. The ftp/http particulars, access qualifications will also be saved. Fortunately i was requested to possess absolutely zero SPs, Major exception "identity machines".

Audit particulars involve, date of information, last modified time, user who went it, failure/success count.

Data-mapping table describes the tables and column names which contain the data. We once had one more composite-key descriptor table. However we made the decision to eliminate that. The performance loss was paid out by asking data table proprietors to produce proper partitioning strategy.

Run_order table is yet another table we've which determines when the user can run (Y/N) and also the order by which runs can happen.

The meta-information is also saved by having an history (according to date). Therefore if anyone decides run an aged/historic subscription. The run would proceed.

Uses of the aforementioned: We are able to prioritize the information loads according to need for subscription. We are able to monitor failures in a generic level (wild birds-eye view). We are able to write generic code that may create dynamic sql queries (no hard-coding). Our load and extract processes have to make use of the data-mapping table, so no user can pull off stale information.

This appeared to operate to date within our experience.