Jul 29 (edited) in Technical
Where's the best place to store your metadata?
Happy Monday everyone! For those of you exploring metadata-driven architectures (which I think is quite a lot of you!)... here are some ideas for you:
As a quick recap: the metadata-driven data pipeline is a technique commonly used in data engineering. Rather than explicitly declaring the source and the destination for a Copy Data Activity (for example), we instead design our pipelines so that the Source and Destination can be passed in dynamically. This means we can store details of the Source/Destination connections in another location, which is read at execution time. This brings benefits such as scalability and maintainability: adding a new source means adding a metadata entry rather than building another pipeline.
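To make the recap concrete, here is a minimal Python sketch of the idea (not Fabric pipeline code). The function name `run_copy_jobs`, the `source`/`destination` keys, and the table names are all made up for illustration; the `copy` callable stands in for whatever actually moves the data.

```python
from typing import Callable, Dict, List


def run_copy_jobs(metadata: List[Dict[str, str]],
                  copy: Callable[[str, str], None]) -> int:
    """Run one generic copy step per metadata entry and return the job count."""
    for job in metadata:
        # Source and destination come from the metadata, not from the code.
        copy(job["source"], job["destination"])
    return len(metadata)


# Hypothetical metadata; in practice this is read from wherever you store it.
metadata = [
    {"source": "crm.accounts", "destination": "lakehouse.accounts"},
    {"source": "erp.invoices", "destination": "lakehouse.invoices"},
]

# A stand-in copy function that only records what it was asked to do.
copied = []
job_count = run_copy_jobs(metadata, lambda src, dst: copied.append((src, dst)))
```

Adding a third source here means adding a third dictionary to `metadata`; `run_copy_jobs` itself does not change.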
The point of this post, though, is to start a discussion about how and where you can store such metadata. The two most common ways you see metadata stored (in a Microsoft environment) are:
  1. In structured tables (like the Data Warehouse)
  2. In a JSON File (perhaps in your Lakehouse Files area).
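As a rough sketch of the JSON file option, the snippet below writes a small metadata file and reads it back. The file name `pipeline_metadata.json` and the `copy_jobs` key are assumptions for illustration, and a temporary folder stands in for the Lakehouse Files area so the example runs anywhere.

```python
import json
import tempfile
from pathlib import Path


def load_metadata(path):
    """Read pipeline metadata from a JSON file and return it as Python objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical metadata layout; the keys are not from any Fabric schema.
sample = {
    "copy_jobs": [
        {"source": "crm.accounts", "destination": "lakehouse.accounts"},
    ]
}

# A temporary directory plays the role of the Lakehouse Files area here.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "pipeline_metadata.json"
    config_path.write_text(json.dumps(sample, indent=2), encoding="utf-8")
    loaded = load_metadata(config_path)
```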
I'd also like to throw in a third option for discussion: storing your metadata in a Notebook (and passing it into your pipeline using mssparkutils.notebook.exit()).
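A minimal sketch of what such a Notebook could contain, assuming the metadata is a plain Python list that gets serialized to a JSON string before being handed back. `COPY_JOBS`, its keys, and `build_exit_value` are hypothetical names. The `mssparkutils` call is left as a comment because that object only exists inside the Notebook runtime.

```python
import json

# Hypothetical copy jobs; keys and table names are illustrative only.
COPY_JOBS = [
    {"source_table": "sales.orders", "destination_table": "bronze_orders"},
    {"source_table": "sales.customers", "destination_table": "bronze_customers"},
]


def build_exit_value(jobs):
    """Serialize the metadata to a JSON string, since the exit value is passed as a string."""
    return json.dumps(jobs)


exit_value = build_exit_value(COPY_JOBS)

# In a Fabric Notebook, the last cell would return the string to the pipeline:
# mssparkutils.notebook.exit(exit_value)
```

On the pipeline side, the exit value would then need to be parsed back into an array before it can be looped over. The exact expression depends on your Notebook activity, so check it against the activity's output.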
Pros of this approach:
  • makes your configuration trackable by version control (which is not possible with the previous two methods)
Cons:
  • maybe more difficult to read, if you have quite a few Key/Value pairs
Thoughts? Where are you storing your metadata at the moment?
Will Needham