Data Vault User Group: Data Modeling (Virtual Meeting)
Anna A
12:43
Hello Veronika and Christian
Anna A
12:47
🚀🚀
Anna A
12:58
Excited to be learning from the two of you today
Veronika Durgin
13:34
Hey Anna!
Siva Yannapu
14:07
Hello
Siva Yannapu
14:23
What is Data Vault?
Elsa Mayer
14:26
Christian's LinkedIn: https://www.linkedin.com/in/christian-kaul/recent-activity/posts/
Elsa Mayer
14:46
Veronika's LinkedIn: https://www.linkedin.com/in/vdurgin/
Siva Yannapu
15:38
How is it different from data warehouse/data lake?
Anna A
16:40
Audio disappeared, not sure if it’s just me though
Metin Demircioglu
16:52
I can hear
David Doyle
16:55
@Siva - https://datavaultalliance.com/about/what-is-datavault/
Elsa Mayer
17:05
I’m not experiencing audio issues, Anna, is anyone else?
Todd Pinniger
17:13
Audio is good for me
Rajesh Polamarasetty
17:17
I can hear too
Luis Arteaga
17:23
Audio is good
Jerome Louvel
17:24
good for me too
Elsa Mayer
17:25
🙌
Anna A
17:29
Back in, all good. False alarm
Siva Yannapu
18:27
Thank you @David Doyle for sending the link
Veronika Durgin
22:28
Siva, I put together a list of various Data Vault resources if you want to start exploring https://medium.com/snowflake/the-door-is-open-data-vault-resources-35096d36a56
Veronika Durgin
22:45
Here are some of the resources that Christian put together as well https://modelyourreality.substack.com/p/a-data-modeling-library-for-2022?s=w
Siva Yannapu
24:38
Thank you @Veronika Durgin
Luis Arteaga
27:16
Question: Hi Christian, greetings from Hamburg/Germany. How do you handle transactional data from source systems that only contain quantitative information and timestamps? Intuitively, I would have thought of a link structure, but the source system does not provide explicit information about possible business keys. So no "natural" hubs to identify. Would you create artificial hubs containing fixed values just to maintain the link structure?
Hennie de Nooijer
31:27
Is the presentation available somewhere or will it be sent later?
Luis Arteaga
32:24
Question: Would you create hubs that would not open up a link with any other hub in the foreseeable future?
Elsa Mayer
32:34
@Hennie the recording will be shared!
Hennie de Nooijer
32:56
Great!
Aravind Narayanan
34:45
Would you advise having measure-related information in the link table, or should it always be in a satellite connected to the link?
Rajiv Gupta
39:00
I have a quick question: say I have a dimensional model of around 5k tables. If I migrate to a Data Vault 2.0 model, can I say it will be roughly 5k * 2 (1 hub + 1 satellite per base table in the source) = 10k tables post-migration?
Rajiv Gupta
39:28
Or what should be the standard assumption?
Peep Küngas
39:52
Any recommendations for modelling PII (Personally Identifiable Information) data?
Anandapadmanaban V
40:33
Will all ID columns in a source table convert to columns in a link table?
Rajiv Gupta
41:42
What would be the best recommendation when migrating from dimensional modeling to a Data Vault model, from a data migration perspective?
Abdul Hameed Syed
42:50
@Christian, does the Data Vault 2.0 modeling technique provide any performance benefits over traditional techniques such as dimensional modeling on the Snowflake platform?
Anandapadmanaban V
43:40
Can we have _ID columns in a satellite which can be joined to other hubs or satellites directly instead of through links?
Christopher Siegfried
44:42
When dealing with hubs that are distinct and have their own distinct business processes (and hence links), but ALSO share considerable overlap in other business processes, do you create a supergroup hub for the shared business processes or do you create a set of links for each distinct hub?
Christopher Siegfried
45:41
Or some other way entirely.
David Doyle
46:43
@Anandapadmanaban - the link construct is intended to represent the relationships between hubs. If the concern is around joins/performance, point-in-time tables can be used to speed up queries and reduce the number of joins
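To illustrate the point about links representing relationships between hubs, here is a minimal sketch of how a link row might reference two hubs by hash key rather than carrying _ID columns in a satellite. Everything in it is an assumption for illustration: the customer/product hubs, the column names, the MD5 hash, the trim/uppercase normalization, and the "||" delimiter are hypothetical conventions, not something stated in the session.

```python
import hashlib


def hash_key(*business_keys: str) -> str:
    """Hash one or more business keys into a single hex key.

    Assumed convention: trim, uppercase, join with "||", then MD5.
    """
    normalized = "||".join(key.strip().upper() for key in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def build_link_row(customer_id: str, product_id: str) -> dict:
    """Build a row for a hypothetical customer-product link.

    The link holds only its own key plus the keys of the hubs it connects;
    descriptive attributes would live in satellites.
    """
    return {
        "link_customer_product_hk": hash_key(customer_id, product_id),
        "hub_customer_hk": hash_key(customer_id),
        "hub_product_hk": hash_key(product_id),
    }


row = build_link_row(" c1 ", "P9")
print(sorted(row))
```

Because the hub keys in the link row are computed the same way as the keys in the hubs themselves, joins go hub to link to hub; a point-in-time table would then precompute the satellite lookups on top of this structure.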
Luis Arteaga
47:49
I can also recommend the book from John Giles "The Elephant in the fridge". Great book about Data Vault.
Elsa Mayer
48:25
^Veronika recommends this one too!
Elsa Mayer
49:23
Highly encourage anyone who hasn’t checked out the blog post she shared earlier to do so. So many good resources: https://medium.com/snowflake/the-door-is-open-data-vault-resources-35096d36a56
Anandapadmanaban V
49:27
OK, thanks for the clarification. But there could be a case where the unit of work for a link leads to more than 10 columns in the link table. Is this normal?
Luis Arteaga
50:14
Could you also recommend data vault user groups :), please?
Andrew Flower
50:24
Will there be a meet up for this group at Summit?
Gabor Gollnhofer
52:02
A very good intro book to DV modeling is John Giles' book, The Elephant in the Fridge - https://technicspub.com/fridge/ It explains pretty well how to start DV modeling
Elsa Mayer
53:05
@Andrew there won't be a user group meeting but you should come to the Data Heroes Hub and meet Veronika! Here's a panel she's speaking on: https://www.snowflake.com/summit/agenda/
Elsa Mayer
53:57
@Luis there are additional user group recommendations in Veronika's blog post! https://medium.com/snowflake/the-door-is-open-data-vault-resources-35096d36a56
Andrew Flower
54:05
Thank you. Will do that.
Luis Arteaga
54:31
Thank you, Elsa. It is a great blog post indeed. Thanks Veronika!
James Daily
55:19
@Emanuel The Same As Link helps solve the issue of technical debt. Remember that the data vault needs to capture the source as it existed for audit purposes. It goes against the data vault methodology to apply 'soft' (business) rules to the data when ingesting it raw. It also sounded like you have both business keys and technical surrogate keys in your sources. Whenever possible, the business keys are a first choice and the technical keys can become business keys when they are used by the consumers of the source application. When there are business key collisions between sources (the same technical key values represent different real world keys), there is a technique for modeling hubs popularized by Nols Ebersohn (https://blog.certussolutions.com/blog/data-vault-and-enterprise-unique-keys) that can help.
Priyank Chinthapally
59:02
@Rajiv: 1 dim = 1 hub + 1 or more sats, 1 fact = 1 link + 0 or more sats. Migration of dimensional models is a different situation and there are many ways to handle it. It truly depends on other factors (planning, time/resource constraints, etc). The ideal way would be to reload all source data into DV models and then create dimensional models (if required + recommended) to ease the transition as well as to be able to support/scale for the future. Interested to hear others' thoughts on this.
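The rule of thumb above can be turned into a rough count. This is a minimal sketch only: the function name, its parameters, and the split of a 5k-table model into 3,000 dimensions and 2,000 facts are hypothetical, since the chat does not say how the tables divide.

```python
def estimate_vault_tables(dims: int, facts: int,
                          sats_per_hub: int = 1, sats_per_link: int = 0) -> int:
    """Rough table count from the rule of thumb in the chat:
    each dimension -> 1 hub + N satellites (N >= 1),
    each fact      -> 1 link + M satellites (M >= 0).
    """
    if sats_per_hub < 1:
        raise ValueError("the rule of thumb assumes at least one satellite per hub")
    if sats_per_link < 0:
        raise ValueError("satellites per link cannot be negative")
    hub_side = dims * (1 + sats_per_hub)
    link_side = facts * (1 + sats_per_link)
    return hub_side + link_side


# Hypothetical split of a 5k-table dimensional model (not given in the chat).
print(estimate_vault_tables(dims=3000, facts=2000))        # 8000
print(estimate_vault_tables(3000, 2000, sats_per_hub=2,
                            sats_per_link=1))              # 13000
```

With the minimum satellite counts the estimate lands below the 10k guess, and it grows quickly once hubs carry several satellites, so the total depends more on how satellites are split than on the number of source tables.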
Christian Kaul
01:00:26
gdpr/pii: https://medium.com/@christian_j_kaul/data-modeling-in-times-of-gdpr-f29813c25465
Rajiv Gupta
01:00:51
Yes, I have also just started exploring DV 2.0 and had an initial discussion with Veronika. It seems much more is required... sometimes it's really not required to change the DM, but those recommendation details are what I am missing...
James Daily
01:02:25
Clarification, are the ID columns technical surrogate keys, or do those IDs represent business keys if you perform lookups from other tables?
Elsa Mayer
01:03:22
Will anyone here in addition to Veronika and Andrew be at Snowflake Summit in June?
Priyank Chinthapally
01:03:38
🤚
Todd Pinniger
01:03:39
Elsa, Veronika, Christian, thanks for the session today. Have to run.
Elsa Mayer
01:03:46
Thanks, Todd!
Rajiv Gupta
01:03:53
@Veronika @Christian, do we have any blog or doc that gives some standard recommendations on migration strategies? It's not always a greenfield project, so what would your recommendation be?
David Teplow
01:04:47
@Elsa - I will be at Snowflake Summit.
Elsa Mayer
01:04:54
GREAT discussion here today, folks. Any topic requests for the next meeting?
Jason Jones
01:04:57
Thanks, folks...first time "caller" to the group...I'll be at the summit...have a great rest of your day
Rajiv Gupta
01:05:24
@Elsa, Migration strategies for DV 2.0
Elsa Mayer
01:05:25
See you there @Jason, @David, @Priyank
Priyank Chinthapally
01:05:27
It would be great if slides could be shared (not just the recording). Thank you all for this insightful session.
Scott Jameson
01:05:29
I get confused in the conversation between the RDV and BDV. As we are trying to do a more agile approach, sometimes we don't know the full business model before we have to start putting data into the RDV
Ralph Arguelles
01:05:34
thanks
Hennie de Nooijer
01:05:41
thank you!
Matthew Florian
01:05:46
Nice job Christian. You and I will be talking soon.
James Daily
01:07:55
@Christopher Generally supertyping and subtyping are anti-patterns. As you describe your use case, it sounds like you may benefit from business vault structures for query optimization.
Vladimir Osin
01:08:47
Thanks Elsa, Christian and Veronika. The presentation and resource lists are really useful.
Christian Kaul
01:10:31
slide deck downloadable here: https://www.obaysch.net/downloads/
Christian Kaul
01:11:54
my linkedin page: https://www.linkedin.com/in/christian-kaul/
Christopher Siegfried
01:12:35
Gotta go. Thanks Christian and Veronika!
Emanuel Oliveira
01:14:30
@James Daily I'm more concerned about the performance impact of using SALs when creating BV and/or information marts
Aravind Narayanan
01:16:48
We need a Slack channel
Christian Kaul
01:17:12
this is the Mike Magalsky presentation Veronika was referring to: https://www.youtube.com/watch?v=GrTT1-V7E68
Elsa Mayer
01:18:05
Slack Workspace: https://join.slack.com/t/snowflakecommunity/shared_invite/zt-m2rfuzp3-dPX5fC40Um~LVRVgUTKZ0g
Elsa Mayer
01:18:19
#data-vault
Luis Arteaga
01:18:26
Great! Thanks
Gabor Gollnhofer
01:19:32
Thanks Christian for the presentation & Veronika for the event!