Happy Monday everyone! For those of you exploring metadata-driven architectures (which I think is quite a lot of you!)... here are some ideas for you:
As a quick recap: the metadata-driven data pipeline is a technique commonly used in data engineering. Rather than explicitly declaring the source and destination for a Copy Data activity (for example), we design our pipelines so that the Source and Destination can be passed in dynamically. This means we can store the details of the Source/Destination connections in another location, which is read at execution time. This brings a lot of benefits: scalability, maintainability, and many more.
However, the point of this post is to start a discussion about how and where you can store such metadata. The two most common places you see metadata stored (in a Microsoft environment) are:
- In structured tables (like the Data Warehouse)
- In a JSON File (perhaps in your Lakehouse Files area).
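For concreteness, a metadata file for a single copy operation might look something like this (the field names and connection IDs here are purely illustrative, not a fixed schema):

```json
{
  "source": {
    "type": "SqlServer",
    "connectionId": "src-sql-conn",
    "schema": "dbo",
    "table": "Customers"
  },
  "destination": {
    "type": "LakehouseTable",
    "connectionId": "lh-conn",
    "table": "customers_raw"
  }
}
```

The same structure works equally well as a row in a warehouse config table, with one column per field.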
However, I'd like to throw in a third option for discussion: storing your metadata in a Notebook (and passing it into your pipeline using mssparkutils.notebook.exit()).
Pros of this approach:
- Makes your configuration trackable in version control (which is not possible with the previous two methods)
Cons:
- Maybe more difficult to read, if you have quite a few Key/Value pairs
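To sketch the Notebook option: the config lives as a plain dict in the notebook, gets serialised to JSON (notebook.exit() can only return a string), and is handed back to the pipeline. All names below are illustrative assumptions, not a fixed schema:

```python
import json

# Connection metadata for one copy operation, defined directly in the
# notebook cell -- this is the part that version control now tracks.
pipeline_config = {
    "source": {
        "type": "SqlServer",
        "connectionId": "src-sql-conn",
        "schema": "dbo",
        "table": "Customers",
    },
    "destination": {
        "type": "LakehouseTable",
        "connectionId": "lh-conn",
        "table": "customers_raw",
    },
}

# Serialise to a JSON string, since notebook.exit() returns a string
config_json = json.dumps(pipeline_config)

# In a Fabric/Synapse notebook you would then hand this back to the
# calling pipeline (commented out here so the sketch runs anywhere):
# mssparkutils.notebook.exit(config_json)
```

On the pipeline side, the Notebook activity's exit value can then be parsed and fed into the Copy Data activity's dynamic Source/Destination parameters.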
Thoughts? Where are you storing your metadata at the moment?