
Data Engineering User Group: How Snowpipe & Data Sharing Enabled Scalable Analytics at DoorDash
Elsa Mayer
01:02
Thanks for joining early!
Elsa Mayer
01:10
We'll get started in about 10 minutes
Vijaya Kumar G
02:03
thank you
Elsa Mayer
07:25
Where is everyone joining from?
Vijaya Kumar G
07:38
from India
Kranthi Kumar
07:54
Hi Elsa, I'm from India and I work for a US-based company
Elsa Mayer
07:58
Amazing!
Elsa Mayer
08:04
What time is it in India right now?
Vijaya Kumar G
08:15
I'm working as a Snowflake developer
Vijaya Kumar G
08:28
10:26 PM
Elsa Mayer
08:47
Thanks for tuning in 🙂
Joseph Sgambelluri
09:10
US, Pennsylvania
Joe Tobey
09:44
US, Pennsylvania
Libby Theoharis
10:13
US, Kansas
Arun Kolal
10:17
Berkeley Heights, NJ, USA
Donald Bachman
12:03
Oklahoma
Monalisa Patil
12:10
Texas
Deepak Murthy
12:14
San Francisco
Yesha Raval
12:16
New Jersey
Mark Schmidt
12:19
Hi all. Checking in from Sacramento.
Krista Martocci
12:22
Brooklyn, NY!
Selma Catakovic
12:24
Switzerland
Scott Wimpelberg
12:30
Nashville, TN
Nick Zitzer
12:36
St. Paul, MN
Debasis
12:37
North Goa, India
Mai Phuong Nguyen
12:38
Cincinnati, OH
Dan Piston
12:38
Syracuse, NY
David Pan
12:45
Seattle
Sushma Jumledar
12:51
Jacksonville, FL
Joel Mousseau
12:52
Chicago Area
Michael Williams
12:53
Bridgeton, NJ
Chaitanya Pasapuleti
12:55
Des Moines, Iowa
Siarhei Bohdan
12:59
Alanya, Turkey
Srini Karumudi
13:05
Dallas, USA
James Hollowell
13:12
Colorado
Jerimaih Dickey
13:57
Please send me the recording at the end, I will have to drop a little early.
Vijaya Kumar G
22:47
Will a recording of the session be provided?
Elsa Mayer
23:57
For those asking about the recording, Data Engineering chapter members will get an email notification when it's posted to the event page! https://usergroups.snowflake.com/data-engineering/
Elsa Mayer
25:27
Data Cloud World Tour: https://www.snowflake.com/data-cloud-world-tour/
Elsa Mayer
25:53
Snowflake BUILD: https://www.snowflake.com/build/
Hitesh Yadav
27:07
Question: What is the frequency of the files generated by Flink?
Deepak Murthy
27:51
Question: Since this is events data, why didn't you use Kafka and the Kafka Connector to Snowpipe into Snowflake?
sandip sandhu
29:27
Any idea on scale and cost for the event publishing pipeline, to help compare with other stacks?
Shashidhar (Dhar) Manasani
30:23
A technical question on Snowpipe - does Snowpipe create any SNS & SQS services behind the scenes to recognize that a new file has arrived? Or is there a different technology used for this?
manish shrestha
30:55
Where does Snowflake reside? Is it also housed in AWS?
Kranthi Kumar
31:16
@Shashidhar we can use SQS at the bucket level
Jeejesh Rajan
31:21
Was the Kafka Snowflake Sink connector considered for loading into Snowflake, vs. Snowpipe from S3?
Daniel Myers
31:21
Snowflake runs across AWS, GCP, and Azure
Anup Kesari
31:56
What is a Schema Registry?
Vinay C
32:27
How is schema evolution handled in CDC?
Shashidhar (Dhar) Manasani
32:31
@Kranthi, I think we don't need to create a separate service for it
Kranthi Kumar
33:05
@Shashidhar if we need to trigger the pipe we create on Snowflake, we need it in that case
manish shrestha
33:33
No, in this particular use case I was wondering if Snowflake also resided in AWS
Dharani Bandaru
33:51
@Shashidhar we don't need a separate service; we need to configure SQS notifications for Snowflake on the AWS S3 bucket.
Lucas Messias
34:18
What data latency did you achieve with this pipeline, from the moment the data is generated and collected by Kafka until it is transformed and ready to be used by the team?
Kranthi Kumar
34:34
@manish Snowpipe will not reside in AWS, it will be in Snowflake only; but using the notification channel (ARN) we need to integrate the bucket with our pipe
Deepak Murthy
35:38
Question: Regarding data share, how can we copy a data share, since cloning is not available?
Hitesh Yadav
35:54
Question 2: Given the file frequency is near-real-time or a mini-batch (30s to 1 minute) for each Kafka topic, what cost are you seeing for Snowpipe, considering it is 0.06 credits per 1,000 files?
Rajesh Veera
36:46
Are the multiple Snowflake accounts in the same AWS region?
Kranthi Kumar
36:59
@Deepak cloning and data sharing are completely different
Shashidhar (Dhar) Manasani
37:18
@Kranthi & @Dharani, I think the pipeline is a continuous ingestion process; I'm trying to understand what technologies are used behind the scenes so the pipeline knows there is a new file waiting to be processed. We don't need to trigger Snowpipe externally.
Joseph Sgambelluri
38:04
Is it continuously polling then?
Kranthi Kumar
39:08
@Shashidhar we need to create an SQS event notification in the properties of our S3 bucket, with the event type set to 'all object create events'; then whenever an object is dumped into S3 it will trigger our pipe and load the data into the target
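A minimal sketch of the auto-ingest setup Kranthi describes, with hypothetical database, table, and stage names (it assumes an external stage over the S3 bucket already exists):

-- Create a pipe that auto-ingests from the stage on S3 event notifications.
CREATE OR REPLACE PIPE my_db.raw.events_pipe
  AUTO_INGEST = TRUE
  AS COPY INTO my_db.raw.events
     FROM @my_db.raw.events_stage
     FILE_FORMAT = (TYPE = 'JSON');

-- SHOW PIPES exposes the notification_channel column (an SQS queue ARN);
-- point the bucket's "All object create events" notification at that ARN.
SHOW PIPES LIKE 'events_pipe';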
Anup Kesari
39:41
Can data sharing be done across AWS regions?
jiteshwar Anjale
39:43
How is the sensitive data handled? Do you explicitly encrypt sensitive data before loading it into Snowflake?
Elsa Mayer
39:55
Please make sure you're on mute while Akshat is presenting! Thank you 🙂
SRIKANTH KANTIPUDI
40:08
With external data sharing, how does DoorDash handle encryption (of certain PII data elements if any)? How are external vendors able to decrypt data?
Anup Kesari
40:09
How do you maintain the metadata?
Dharani Bandaru
40:32
@Shashidhar https://docs.snowflake.com/en/user-guide/data-load-snowpipe-auto-s3.html - we need to configure policies between Snowflake and S3 and set the execution status of Snowpipe to true so that it starts running. Whenever there is a file in S3, it sends a notification to Snowpipe; Snowpipe then adds the file to a queue for processing.
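As a follow-on to the doc Dharani links, a hedged sketch (hypothetical pipe name) of inspecting and resuming a pipe's execution state:

-- Confirm the pipe is RUNNING and inspect pending file counts.
SELECT SYSTEM$PIPE_STATUS('my_db.raw.events_pipe');

-- Resume a paused pipe so queued notifications get processed.
ALTER PIPE my_db.raw.events_pipe SET PIPE_EXECUTION_PAUSED = FALSE;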
Deepak Murthy
40:42
@kranthi, we are getting data from some producers as a data share, but we need to copy this data for our compliance needs. We cannot copy the data table by table.
Arun Kolal
40:50
Question -- Does DoorDash have a use case to encrypt PII info amongst external shares, where you have to encrypt with different keys for different customers?
Lee Nguyen
41:00
Thanks for the overview!
Adam L
41:33
thanks Akshat, that was a great overview
Vijaya Kumar G
41:57
Thanks Akshat for giving a nice presentation
Kranthi Kumar
41:58
Wonderful presentation Akshat Nair
Lucas Messias
42:11
Thanks for sharing!
noolu Setti
42:41
Within an organization, can data be shared between 2 accounts where one is on AWS and the other is on Azure?
Anand Palaniappan
42:52
With Flink doing streaming & some transformations, what are we using Airflow for? Is it to do further transformations in Snowflake? Can we do the transformations before loading into Snowflake and use Snowflake only for data publishing?
Christina Long
42:54
How have you implemented alerting/monitoring for Snowpipe?
Debasis
43:05
Thanks Akshat, Great presentation.
Patrick Berens
45:00
In our experience: keeping your Kafka partitions per topic small helps a ton with cost when using Snowpipe. It technically creates a file per partition per topic, so I moved all our data into a single partition for most topics when writing into Snowflake using the connector. This helped reduce our costs.
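One way to observe the per-file cost effect Patrick describes is Snowflake's PIPE_USAGE_HISTORY function; a minimal sketch (pipe names are whatever exists in your account):

-- Credits, file counts, and bytes loaded per pipe over the last 7 days.
SELECT pipe_name,
       SUM(credits_used)   AS credits,
       SUM(files_inserted) AS files,
       SUM(bytes_inserted) AS bytes
  FROM TABLE(INFORMATION_SCHEMA.PIPE_USAGE_HISTORY(
         DATE_RANGE_START => DATEADD('day', -7, CURRENT_TIMESTAMP())))
 GROUP BY pipe_name;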
Kranthi Kumar
45:34
@Deepak we create a separate database for the data share, which can load all the data rather than table by table
Hitesh Yadav
45:47
Thanks @Patrick
SRIKANTH KANTIPUDI
45:56
With external data sharing, how does DoorDash handle encryption (of certain PII data elements if any)? How are external vendors able to decrypt data?
Elsa Mayer
46:20
Please raise your hand if you'd like to ask your question live!
Elsa Mayer
46:29
Thank you 🙂
Shashidhar (Dhar) Manasani
46:30
I am not finding the raise hand button
Parker Hatt
46:52
you can share it from Azure to AWS
Joseph Sgambelluri
46:54
@Shashidhar It should be under "Reactions"
Jason Dy-Johnson
47:28
How do you handle schema changes in your S3 datalake files that are ingested through Snowpipe?
Hitesh Yadav
47:29
It's feasible
James Tobin
47:35
It's feasible
Eric Christensen
47:45
It's feasible. I'm doing it
Kranthi Kumar
47:58
@Deepak use this command: CREATE DATABASE db_name_temp FROM SHARE account_name.share_name
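Building on that command, one hedged approach to Deepak's compliance copy (all names hypothetical): mount the share, then materialize a local copy with CTAS. It is still one statement per table, though the statements can be scripted:

-- Mount the inbound share as a read-only database.
CREATE DATABASE share_db_temp FROM SHARE provider_account.share_name;

-- Materialize a fully owned local copy for compliance retention.
CREATE DATABASE local_copy;
CREATE TABLE local_copy.public.orders AS
  SELECT * FROM share_db_temp.public.orders;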
Joe Tobey
48:14
Use replication across regions and then share from that region
Anand Palaniappan
48:19
1. If a single table has all the vendors' data (say 100 vendors), how are we sharing for a specific vendor? Do we need to create a view and share that view?
2. With Flink doing streaming & some transformations, what are we using Airflow for? Is it to do further transformations in Snowflake? Can we do the transformations before loading into Snowflake and use Snowflake only for data publishing?
Mahantesh Hiremath
48:55
Yes, it's possible. We did it (Azure to AWS)
Joe Tobey
49:12
The replicated data needs to be copied before it can be shared. (This is what I have experienced.)
James Hollowell
49:30
https://docs.snowflake.com/en/user-guide/data-share-replication.html
noolu Setti
49:37
Mail ID
James Hollowell
49:44
https://docs.snowflake.com/en/user-guide/data-share-replication.html#replicating-shares-across-regions-and-cloud-platforms
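Per the docs James links, the rough flow is to replicate the database into the target region or cloud and share from there; a sketch with hypothetical account and database names:

-- On the source account: allow replication to the target account.
ALTER DATABASE shared_db ENABLE REPLICATION TO ACCOUNTS myorg.target_account;

-- On the target account: create and refresh a local replica.
CREATE DATABASE shared_db AS REPLICA OF myorg.source_account.shared_db;
ALTER DATABASE shared_db REFRESH;

-- Then build the share from the data in this region, as the linked docs
-- describe (details vary; see Joe Tobey's note above about copying first).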
noolu Setti
49:47
to send questions?
Elsa Mayer
50:59
For questions we don't get to, I recommend asking in the community! https://community.snowflake.com/s/forum
Mehul S
51:17
@Elsa - will we get a link to view the recording?
Elsa Mayer
52:18
Yes! I recommend joining the Data Engineering chapter to get an email notification when it's posted: https://usergroups.snowflake.com/data-engineering/
Patrick Berens
53:02
What was the name of the modeling tool they use with Airflow?
Hitesh Yadav
53:22
Airflow is a scheduling tool
Patrick Berens
53:50
Yes, the modeling tool they use? I couldn't understand the name
Hemanth Janyavula
54:06
What's the ETL tool being used?
Hitesh Yadav
54:07
Oh, I did not understand the question. I don't remember.
Patrick Berens
54:19
NP thanks
Chaitanya Pasapuleti
54:27
When Snowpipe starts supporting MERGE, is it only for upserts? If so, do you have a scenario for how you handle deletes?
Tejaswini Ravi
55:15
So are you doing ETL instead of ELT? Why?
Elsa Mayer
55:33
Quickstarts site: https://quickstarts.snowflake.com
Hitesh Yadav
55:34
Merge can handle deletes - https://docs.snowflake.com/en/sql-reference/sql/merge.html#matchedclause-for-updates-or-deletes
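A minimal sketch of the pattern that doc describes, assuming a CDC staging table with an op flag (all names hypothetical):

MERGE INTO target t
USING cdc_stage s
  ON t.id = s.id
WHEN MATCHED AND s.op = 'DELETE' THEN DELETE
WHEN MATCHED THEN UPDATE SET t.val = s.val
WHEN NOT MATCHED AND s.op <> 'DELETE' THEN
  INSERT (id, val) VALUES (s.id, s.val);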
noolu Setti
01:00:18
Elsa Mayer, regarding my question, Daniel Myers said to drop a mail regarding cross-cloud data sharing across regions. To which mail should I send it?
Elsa Mayer
01:01:48
Please email me! elsa.mayer@snowflake.com. Thanks 🙂
Biraja Mohanty
01:02:39
Will there be a Snowpipe for each table, or do we have one Snowpipe for multiple tables?
noolu Setti
01:03:56
One Snowpipe is enough for all tables; link it with the internal SQS of Snowflake. Please create S3 triggers at the main folder level.
Joseph Sgambelluri
01:05:50
Need to drop. These quickstarts look like just what I need to learn more about Snowflake
Elsa Mayer
01:06:06
Love to hear it, Joseph!
manish shrestha
01:06:26
Ditto. QuickStart is what I was looking for
Deepak Murthy
01:06:41
Could you share the link?
noolu Setti
01:06:56
Can you share the GitHub link?
Kristina Larabee
01:07:31
https://quickstarts.snowflake.com/ and https://github.com/Snowflake-Labs/sfquickstarts
Elsa Mayer
01:07:59
Thanks @Kristina!!!
Hemanth Janyavula
01:09:10
Can we have the recording?
Deepak Murthy
01:09:47
Thank you
Elsa Mayer
01:09:48
@Hemanth I recommend joining the Data Engineering chapter to get an email notification when the recording is posted: https://usergroups.snowflake.com/data-engineering/
Gultekin Keskin
01:10:37
For JinjaSql: https://towardsdatascience.com/advanced-sql-templates-in-python-with-jinjasql-b996eadd761d
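For context, JinjaSql renders parameterized SQL templates along these lines; a hedged sketch (hypothetical template, bound from Python at render time):

-- {{ ... }} placeholders become bind parameters; | sqlsafe marks trusted fragments.
SELECT user_id, event_type, event_ts
FROM {{ table_name | sqlsafe }}
WHERE event_ts >= {{ start_date }}
{% if region %}
  AND region = {{ region }}
{% endif %}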
Arun Kolal
01:12:40
Have to drop. Thanks for a very informative session.
noolu Setti
01:13:34
Can you share here
noolu Setti
01:13:38
your LinkedIn name?
Lee Nguyen
01:14:10
Thanks everyone!
Rick Beebe
01:14:28
Thank you!!!
Dan Piston
01:14:33
Thanks!
Anusha Kandalai
01:14:34
Thank you
Padma Jayaprabhu
01:14:36
Thank you
Kendra Holmoe
01:14:38
thank you