PowerBI.tips

Gen 1 vs. Gen2 Dataflows – Ep. 361

October 9, 2024 By Mike Carlo, Tommy Puglia

Mike and Tommy compare Gen 1 vs Gen 2 dataflows, why the Gen 2 experience still falls short in key areas, and when Gen 1 remains the practical choice for Power BI and Fabric teams. They also call out what improvements would make Gen 2 a true replacement.

News & Announcements

Main Discussion

Gen 2 dataflows are positioned as the modern path forward, but the team keeps coming back to the same question: do they actually make day-to-day work easier yet? In this episode, Mike and Tommy walk through what still feels missing, what workflows remain smoother in Gen 1, and what would need to change for Gen 2 to be the default recommendation.

Looking Forward

If you’re evaluating dataflows for a new project, focus on reliability and the developer experience first, then work backwards into the “modern” choice. As Gen 2 matures, keep an eye on parity gaps closing, but don’t force a migration if it slows your team down.

Episode Transcript

0:43 Good morning, and welcome back to the Explicit Measures podcast with Tommy, Seth, and Mike. Hello everyone, and good morning. Good morning, gentlemen, and happy Tuesday. Well, jumping into things, our main topic for today will be an interesting one. I think we’re going to have some differences of opinion here; we’ll see where this lands. We’re going to talk through the conversation around dataflows Gen 1 versus Gen 2: what are we seeing, what are the gaps, what’s happening

1:14 here, and what are we seeing for adoption? How are we using it, or not using it, and what do we need to see for it to be more widely adopted? We’re going to unpack that discussion, but before we get into it, as always, we have a couple of announcements. Tommy, you brought us a couple of links. There is now officially a full Microsoft Fabric Conference recap; Jason Himmelstein has done a great job of summarizing all the details that came out. Really good

1:45 article, a good summary of all the main key items. Anything that stuck out to you, Tommy, that’s important? I thought you would have said NFC support, now in the Power BI mobile apps. I don’t know if you saw that. I did see it; I didn’t see a demo of it by chance, so I’m not really sure. It also wasn’t in the main announcements area; they have the keynote kickoff, and they didn’t talk about it there, so it was more of a hidden feature

2:16 there. I maybe could see some use cases for this one, but it feels a lot more like manufacturing, so a niche thing. I think this is also very similar to what you were doing with QR codes and Power BI: a neat feature, it works, but I don’t see a lot of people really jumping in and diving into leveraging QR codes with their reports. There are use cases for it; I just don’t have any clients that I work with that need it. Any other things that stood out to you guys if you read through the list? I certainly haven’t gotten into

2:47 it yet, but the one that stands out very large to me is the Databricks Unity Catalog tables becoming available in Microsoft Fabric. I do have some feedback on that one, so they definitely did read the notes on that one for sure, especially the limitations. It’s definitely a very good experience. The idea here is that Unity Catalog exists in Databricks, and now, with one button press, there’s a new item: you can go get that Databricks Unity Catalog

3:17 bolted directly in. It’s like a new item type, and then you can just pick the tables you want from wherever they may live, and it makes a Databricks item with a SQL analytics endpoint and a default semantic model on top of those tables, which is great. There are just a lot of limitations on what tables you can use. If you’re doing any streaming, or if you’re using materialized views, you need to make sure you really check out the details, because you’ll find that some of them work and some of them don’t. You really just need a standard

3:48 Delta table to pull from; you can’t really pull from much else. I think that’s okay, though, as long as there’s a working, clean hand-off point at announcement. I think that’s doable, and in general, for customers that are deeply engaged in big data activities, Databricks is probably the thing they’re using, so paths into

4:18 Fabric, I think, are the way you start to explore opportunities. So it’s cool. One thing you’ll also note, and I want to point this out: in the documentation they start talking about permissions and governance of the Unity Catalog. There are going to be two permission systems. You have the Unity Catalog permissioning, and you’re going to have Fabric permissioning based on those tables. Today, the way the connector works, it’s not pushing the user’s credentials down into the Unity Catalog to see which tables you can or cannot see; it’s a

4:49 delegated permission. Just like a lot of the other data sources, though? Yes and no. If you use SAP, that pushes user credentials through, and there were some lakehouse things that do credential pass-through. So I think the expectation of the community is: we like the idea, but we really want to start seeing credential pass-through go back to Unity Catalog, because that’s the message. The message is that Unity Catalog is your single point of governance for catalogs, and for who has access to what. Yep. There’s a lot that goes into that

5:20 configuration. Yes, and I think there’s a lot more coming, so if they can get that worked out, I think it’ll be really impactful and powerful. Right now, the way I see it, it’s really helpful; just be aware of some of the limitations. You’re so needy, Mike. I’m so needy! Well, this is the challenge, I think, of getting things out the door quickly, because we need the features to be at parity, but we get the feature, and it’s most of the feature, and there’s still refinement

5:50 coming later on. It just feels like that’s the pattern right now: we get a lot of initial releases to get things out the door, and then we’re quickly waiting for, okay, we’re really close, can we get a couple more things to refine and make this a bit smoother for us? We wanted to talk about this last week, but I think one of the bigger things that went under the radar was the announcement of service principal support for the Fabric APIs, which is crazy that this wasn’t out before. Well, again, this is one of

6:21 those cases: here’s a whole bunch of APIs, and you could only use user credentials with them, and now we’re getting all the service principal stuff. They’re not all covered, but a lot more have been covered recently. So I really like this; it’s a super big feature, especially if you use a lot of functions as well. The first time I ever looked at the Fabric documentation for the APIs, I was expecting a nice “try it out” experience in the service; it was all in C# or C++. It was insane. It’s like, hey,

6:52 if you want to create a connection to Fabric, here’s the giant C# code you need. It’s like, where’s this coming from? So it sounds like an opportunity to me, Tommy, for you to broaden your language skill set. Not even close. There are a couple of other things. Tommy, I think you pointed out another article. So again, check out the announcements; the link is in the description of this video, and it’s also in the chat.
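As an aside for anyone following along in code: the service principal flow discussed above is the standard Microsoft Entra ID client-credentials flow against the Fabric REST API. The sketch below only builds the requests (it makes no network calls), and the tenant, client, and workspace IDs are placeholders, not real values:

```python
from urllib.parse import urlencode

# Scope for the Fabric REST API under the client-credentials flow.
FABRIC_SCOPE = "https://api.fabric.microsoft.com/.default"

def build_token_request(tenant_id: str, client_id: str, client_secret: str):
    """Return (url, form-encoded body) for acquiring a service principal token
    from Microsoft Entra ID."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": FABRIC_SCOPE,
    })
    return url, body

def build_list_items_request(token: str, workspace_id: str):
    """Return (url, headers) for listing items in a workspace via the Fabric REST API."""
    url = f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/items"
    headers = {"Authorization": f"Bearer {token}"}
    return url, headers
```

From there you would POST the token request body, pull `access_token` out of the JSON response, and send the second request with the bearer header; no C# required.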

7:22 If you want to go check out the full announcements, they’re all coming from the Fabric conference. A lot of really interesting things, really good announcements; again, a lot of these are improving our experience with Fabric, and I’m really enjoying the direction they’re moving. Tommy, you had another article here around updates to the Direct Lake documentation. Microsoft is putting a lot of work into their documentation in terms of the wording. For example, for Delta Lake and Parquet, they’re going to adopt that in

7:54 terms of the format point of view, they’re going to accelerate some more things with OneLake in the documentation, and they’re basically trying to enhance it: okay, we’re dealing with not just OneLake, we’re dealing with Delta tables and Parquet, that’s going to be our primary gold standard, and we’re going to talk more about Direct Lake. So, some more enhancements, and that’s really good to see, because again, if this is the heartbeat

8:24 of Fabric, this is incredibly important. I would agree. And I’m also seeing a lot of discussion; do you guys follow Mim at all, on Twitter or LinkedIn or something? Connected on LinkedIn, I think, yeah. So Mim has been doing this really interesting thing. There are a lot of these open source formats: there’s Delta, there’s Iceberg, there’s Hudi I believe as well, maybe some other formats that are out there. Mim has been really touting, I think it was on Twitter, where he’s making

8:54 these posts, but he was saying it doesn’t really matter what format your data is in: you can bring them all to OneLake, and OneLake basically handles them, making a metadata layer on top of them so you can read them and understand what’s inside the file system. It was really interesting to me to see people in the community, and he’s now with Microsoft, going in and deep diving on all the different supported formats that you can go get. I think this is a great play: leave your data where it’s at. You’re using Snowflake? Great, you can connect to

9:25 Iceberg. If you’re using Databricks, great, you can use Delta tables. It lets you keep the data where it exists, and the teams that are strong in those tools can continue to reside and build in those tools. Anyway, I thought that was a really good point, and I think Microsoft is putting a lot of investment around Direct Lake. I had some great conversations; I actually had lunch with Christian Wade one of the days we were there, and we were just hashing ideas back and forth, and he was telling me all the great things that are coming for Direct Lake. So Microsoft is very

9:56 from direct lake so Microsoft is very motivated to continue investing in

9:57 motivated to continue investing in direct Lake and I’m very excited to have the direction of of that with what they’re building so very pleased about direct Lake items another good article out there for that one and then one thing I’ll just point out here Tommy I was going over to your GitHub recently and so this is actually a request Tommy this is a request from one so I was talking with an individual over in Stockholm at the conference and he’s like where’ Tommy’s page go I can’t find any Tommy’s GitHub pages so Tommy I think you have you’re you’re rebuilding some things

10:27 in GitHub; some stuff has happened. Yeah, something happened. GitHub has yet to let me know exactly what, but they actually suspended my account, and I have had a ticket in for two months. Actually, I have two tickets in, with a follow-up; I’ve gone through all the correct avenues and doors, and I’ve yet to hear back. So I’m recreating my GitHub; it’s going to be a new account. I don’t know what you could do on GitHub for that. Yeah, maybe it was

10:57 a Python thing, maybe it was a Fabric notebook, maybe there’s too much AI on your account. Tommy, you should hound them daily, because if you don’t know what it was, and there’s something in your GitHub repo, you might just do it again. Yeah, you might load up the thing that got your account suspended into the new one, and that one would get suspended too. After all this work! So yeah, I’m trying.

11:27 I think I have a lot of my repos on a local computer, so I’m going to try to transfer them, at least the ones that I created that I have. They’re not trademarked or anything; they’re just things. That’s probably why people are asking. It’s very sad losing some of the gists, but yeah, if you go to the account handle Puglia BI, that’s the new one. Okay. A little barren right now. All right, good, but we’ll get it launched, we’ll get going. Okay, excellent. All right, well then, Puglia BI will be the new one, so for

11:58 those of you who have been following Tommy, you won’t be able to find him at the old Puglia Thomas handle on GitHub anymore; it’ll now be Puglia BI, which is awesome. It now matches your brand a bit more, so you can push toward the brand. Excellent. All right, I have one more last item here, and I can’t find where it’s noted; I’m trying to find the September update. In the September update on the Microsoft blog, there is “introducing the Fabric metrics layer.” I don’t know if you guys caught this one.

12:30 If you search that article for the word “goals”: what about existing metrics and scorecards? You’ll be familiar with metric scorecards. I’m reading this from the article right here: they’re not going anywhere; scorecard metrics will now be renamed to the original name of goals. Goals? No, no! We had so much pain going through the renaming of goals to metrics, and here we are. So the metric hub will be

13:02 this idea of creating a semantic model where you can pick various items and make a perspective of it, and you can give that to people to go build on top of. Now what they’re doing is taking the thing called metrics and renaming it back to goals, which is what they originally named it. Funny to me; it just makes me laugh. So we’re waffling on naming schemes here, but at least we’re going to have goals, and now the metric hub, and the metric hub will be this new thing that’s going to be out fairly soon. I think the announcement said it’s supposed to be rolling out; I believe

13:32 it’s in preview right now. And the metrics layer is what we talked about on the YouTube channel quite a while ago with Carly; Carly gave us a really good demo of what this was and how it’s going to be used. There’s been a lot of refinement and features, and it’s getting out the door now, finally, so we’re very excited to see this come from Christian Wade’s team. This is very much needed, and exciting to see happen. Just be aware: your metrics will now turn back into goals. It says it’s in public preview, and it was in the third part of the paragraph, but

14:04 have you seen a place to enable it? I have not seen it yet, so I have to go check out the admin tenant settings. It sounds like it’s out in preview and supposed to be rolling out to each of the tenants. I haven’t seen anything yet, so I’m actually very excited to get my hands on it, and I hope to have some feedback on that one pretty soon. I think it’s actually going to be a candidate for another exercise here pretty soon. Yes, Andrew, I think you’re also interested in seeing

14:34 the new metrics hub. Anyway, cool stuff there. I just wanted to point that out; I thought it was really funny that we’re now waffling on names. That’s a good find, Mike. Well, I read that and I’m like, wait a minute, we were arguing about goals and metrics before. There was a lot of argument, and I’m glad our argument came out, and I’m glad we just landed on goals; it’s going to just be goals now. So anyway, it is what it is. Oh my. All right, enough of the announcements; let’s just talk about moving over. What’s our main topic for today? Our main topic for today is

15:05 jumping in: let’s talk about why we would want to use or migrate to dataflows Gen 2, or stay in dataflows Gen 1. What are our thoughts here? We’re going to unpack this; it’s a previous conversation we’ve had on other podcasts around what is lacking, maybe, in dataflows Gen 2, why I would not want to move my dataflows Gen 1 over to Gen 2, and whether there are any new features we’re getting in Gen 2 that we’re excited about, that maybe would push us more toward

15:37 moving out of dataflows Gen 1. Tommy, you gave us another nice article on this one. There’s actually a feature overview of the differences between both tools, for getting from dataflows Gen 1 to dataflows Gen 2. Microsoft gives us a migration pattern here, and it gives you a bunch of checkboxes and check marks of what we can and cannot do in the two. I think this is going to be the article we start from; we’re going to go through this article together. Anything, Tommy, you want

16:08 to add? I think let’s start with why we’re actually having the conversation, and probably also the purpose of the Gen 1 dataflow. This topic for conversation actually came around because we’ve talked about this multiple times, not necessarily in one particular episode, but during our announcements: the frustrations, or what we’ve realized are the limitations, of dataflows Gen 2, and our excitement for dataflows Gen 2. And we

16:38 keep touching this, we keep touching this, but I think it’s time to actually have the real conversation, the heart-to-heart: okay, we may feel that some things are lacking in Gen 2, but man, Gen 1 is still out there and available. So let’s probably start at the beginning, with dataflows Gen 1. Before Fabric, before everything, what did data

17:09 flows Gen 1 solve? Good question. I’ll throw out a couple of key points that I think were the solution there. Dataflows Gen 1 was really our Power Query solution in the cloud. It was our move from doing it in Desktop: instead of making those common tables there, I could easily produce those tables directly in a dataflow Gen 1. And the next part is, it would save the data. A couple of other features I really liked about Gen 1, and we’ll find these are very similar to Gen 2: as I ran my dataflow Gen 1, it

17:39 would connect to a SQL database or some backend server, then save a copy of that data down to a storage account inside Power BI, and then anytime I want to access the data from the dataflow Gen 1, I’m not going back to the SQL Server anymore; I’m just refreshing from the storage account that the dataflow Gen 1 stored the information into. So it’s basically removing load from the backend system. Imagine you have a dimension table, dimension products or product master: if you want to reload

18:10 that data in 7, 8, 9, 10 models, or refresh a dataset every hour, no problem. You could have the dataflow run one time at night, it would load all the new data, and then throughout the day, as your semantic models refresh, they’re not always going back to the SQL Server causing additional load. So if you’re trying to really get load out of production, or get out of the operational systems, you can take that additional data-access load off of that machine and give it to the dataflow instead. So those are the key features. Anything else I missed, Seth, Tommy, that you really liked about dataflows Gen 1?
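The load-shedding idea described here can be illustrated with a toy sketch: one nightly refresh copies the source into storage, and every model read afterwards hits the copy, not the source. The `SourceDatabase` and `Dataflow` classes below are invented stand-ins for illustration, not any real Power BI API:

```python
class SourceDatabase:
    """Stand-in for an operational SQL source; counts how often it is queried."""
    def __init__(self, rows):
        self.rows = rows
        self.queries = 0

    def fetch(self):
        self.queries += 1
        return list(self.rows)

class Dataflow:
    """Toy Gen 1-style dataflow: a nightly refresh copies the source into
    storage, and every downstream model reads the stored copy instead."""
    def __init__(self, source):
        self.source = source
        self.storage = None

    def refresh(self):                 # run once per night
        self.storage = self.source.fetch()

    def read(self):                    # called by each semantic model refresh
        return self.storage

source = SourceDatabase(rows=[("SKU-1", "Widget"), ("SKU-2", "Gadget")])
flow = Dataflow(source)
flow.refresh()

# Ten model refreshes during the day: the source is still only hit once.
tables = [flow.read() for _ in range(10)]
```

The point of the sketch is the query count: ten downstream reads, one hit on the operational system.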

18:42 No, I think you’re hitting it pretty much on the head in terms of what the game changer was for Gen 1 dataflows when they came out: we had reference tables, master tables, just trying to get a source of truth that we could reuse, but in a business-user-friendly way with Power Query. Man, before dataflows Gen 1 we had Power Query in Excel that we would load, just because Power Query was, at the

19:12 time, the best way to do it. And finally, the number of solutions we were able to solve, both from having one source of truth for multiple Power BI users, or for myself: I’m always going to reuse the customer data, I’m always going to reuse our reference for our sales quota. That was great. But to do multiple transformations over that as well, and then I just have a static table, I don’t have to load the whole thing. It

19:42 really changed things from a governance standpoint and also in how we looked at Power BI, because before it was like, oh, the Power Query file, the query that we use, it’s in this SharePoint folder if you want to get the data.

19:56 So that, by far. And yeah, we loved Gen 1 incremental refresh too. It had some amazing features out there that I know, for myself, I was dependent on. Yeah, the great thing about it is centralizing the processing up into a shared ecosystem, as opposed to, like Tommy’s pointing out, the many different local solutions that people

20:27 still had to come up with. Think about loading, in the example I gave earlier about the product tables, a master product table that you’re loading over and over again into many different models. Every time you do that, it requires some level of compute from the semantic model to go back to that source system, and you’re almost overspending a little bit, right? You think about it like, wow, if I refresh this model three times per day, I’m always going back to that source system. So when I’m spending money on my SQL, either Azure SQL or on-prem, I’m

20:57 making that machine give me the data, right? So that’s one expense there, but then you’re also potentially running semantic model refreshes that run longer. Another thing that was a challenge: back in the early days of Power BI, we didn’t really have the ability to create incremental refreshes on different tables at different speeds. There are certain tables that need to be refreshed very frequently, and there could be other tables that refresh

21:27 slowly. Again, back to my product master information: if I’m looking at transactional data, I may want to refresh that multiple times per day, but if I’m looking at a product master, that may be something that actually refreshes much slower. So dataflows give me the ability to have two different dataflows at two different speeds; I could make separate tables refresh at a different cadence, basically, which is again another time saver. I do like that point about it almost putting guardrails on some of the production systems, because especially in

21:58 self-service scenarios, where you have people creating connections to SQL Servers, et cetera, Power BI, I’ve always described as a bull in a china shop when it comes to data access, because depending on how many different queries are in the Power Query, it’s just like, boom, there go the SQL Server resources; it just sucks things up. So especially when

22:28 you throw in ways to limit the datasets that you’re accessing, through incremental refresh even, it helps those systems maintain operability, for sure, while you still have access to the data in this other store. And one thing I really want to point out here that I think is a big differentiator, one that is going to create more friction when you go between Gen 1 and Gen 2: when we think about incremental refresh, the incremental

23:00 refreshing inside dataflows Gen 1, I think, was very well done. You had the ability to create an incremental refresh policy that would do a drop-and-replace, or an update-old-partitions mentality, and I really liked the dataflows Gen 1 incremental refresh structure. It was built super solid, I understood it, and it was easy to teach people. I use the analogy of a

23:32 snake: I have a year’s worth of data, and I’m always going to refresh the last seven days, so I’m going to add one more day to the pipeline, and then delete and replace the last seven days. Because sometimes transactional systems don’t get all the final records for a couple of days, as things get synced and the transactions come in, sometimes you want to replace those recent things. So the ability for me to drop and replace individual days of data using dataflows Gen 1 was extremely helpful for me, and I really liked that feature.
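The “snake” described above is a rolling-window, drop-and-replace partition scheme. Here is a minimal toy sketch of that idea, not the actual dataflow refresh engine; partition storage is modeled as a plain dict keyed by day:

```python
from datetime import date, timedelta

def incremental_refresh(partitions, source, refresh_date, replace_days=7):
    """Toy version of the Gen 1 'snake': keep one partition per day, and on
    each run drop-and-replace the trailing `replace_days` days (including the
    new day) so late-arriving transactions get picked up. `partitions` maps
    date -> rows already stored; `source` maps date -> the latest rows."""
    window = [refresh_date - timedelta(days=i) for i in range(replace_days)]
    for day in window:
        partitions.pop(day, None)              # drop the stale partition
        if day in source:
            partitions[day] = source[day]      # reload the latest data
    return partitions

# Example: a year of history, with the last week re-replaced on each run.
today = date(2024, 10, 9)
source = {today - timedelta(days=i): [f"txn-{i}"] for i in range(365)}
partitions = {}
for i in range(364, -1, -1):                   # simulate daily runs up to today
    incremental_refresh(partitions, source, today - timedelta(days=i))
```

After the simulated year of daily runs, every day holds exactly one partition, and only the trailing week was ever rewritten on a given run; older partitions were left untouched.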

24:02 When we start talking about dataflows Gen 2, you’ll see why I’m pointing this out specifically. Anything else for dataflows Gen 1 that you liked? Crickets. There are obviously the frustrations, but honestly, they also came out with the ability, with Gen 1, and Mike, you actually made a note of this initially, where Power Query

24:32 in dataflows Gen 1 was updating with features before Desktop, and I thought that was really interesting too, where it’s like, oh, I actually have more features available to me in the service than I do in Desktop. So that was also a major help. But no, I think that’s a pretty good use case for where Gen 1 plays. Cloud first, man. Cloud first. Once

25:02 something goes to the cloud, you’ll always get the most recent stuff, or the newer stuff, in the cloud first. That’s true, agreed. So let me go through a couple of gripes that I had with dataflows Gen 1. As good as what we’re touting here, there were a couple of weaknesses, I think, that were a bit challenging inside dataflows Gen 1, and I think maybe Microsoft is going to try to address them with dataflows Gen 2. One thing with dataflows Gen 1 was this whole idea of downstream or dependent streams. Do you remember that feature in data

25:32 flows Gen 1 where I would load raw data using a dataflow Gen 1 and then do some transformations on it, so it would know, okay, this dataflow has a downstream dependency on a secondary dataflow? Load raw data, go to the tables, connect to it, grab a window of information, maybe with incremental refresh, and then, in dataflows Gen 1, you could set another dataflow behind it that had a dependency. So if you kicked off the upstream dataflow, it would know there’s a dependency downstream and then do the

26:02 downstream activities. That was a pretty cool feature, but it didn’t always work very smoothly for me, and it was hard to determine, when something was starting, how many tables downstream were working. If I had two upstream tables or dataflows that were triggering at about the same time, there seemed to be some blocking effect: once you kicked off one initial table refresh, it blocked everything else downstream, so you potentially had sometimes random errors. Imagine you

26:32 have a three-tier dataflow: dataflow one is loading raw data, dataflow two is shaping, transforming, and cleaning up the data, and then maybe you have a third dataflow that’s merging a couple of dataflows together. Anywhere in that path, whenever you refreshed the most fundamental one, the base of the pipeline, every single dataflow downstream was basically blocked; it said you couldn’t change the data. To me that was a little bit weird, why it was doing that, and it was not

27:03 very intuitive as to what was occurring. Anyway, that was one of the challenges. Another: who can edit the dataflow? I remember with dataflows Gen 1 I was constantly taking over the dataflow all the time. Tommy would build one, and then I’d need to make a change, and you can’t even see this dataflow, you don’t have access to it. I’m like, why? I’m not going to change it, I’m just looking at it; I just wanted to observe what was in the code. So to me, that was a little bit of a weird experience.
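For readers picturing the three-tier chain above, the ordering the linked-dataflow feature implied is essentially a topological sort over dataflow dependencies. This sketch shows that ordering only; it deliberately does not model the opaque blocking behavior being criticized, and the dataflow names are illustrative:

```python
def refresh_order(dependencies):
    """Return a valid refresh order for a chain of dataflows, where
    `dependencies` maps each dataflow to the upstream dataflows it reads
    from. A plain depth-first topological sort (assumes no cycles)."""
    order, seen = [], set()

    def visit(flow):
        if flow in seen:
            return
        seen.add(flow)
        for upstream in dependencies.get(flow, []):
            visit(upstream)          # refresh upstreams before this flow
        order.append(flow)

    for flow in dependencies:
        visit(flow)
    return order

# Three-tier example from the discussion: raw -> shaped -> merged.
deps = {
    "merged": ["shaped", "raw"],
    "shaped": ["raw"],
    "raw": [],
}
```

Calling `refresh_order(deps)` always places "raw" before "shaped" and "shaped" before "merged", which is the behavior one would have wanted the engine to surface, instead of an unexplained lock.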

27:34 And I think the last thing that was weird was the whole publishing experience. Every time you made a dataflow Gen 1, every table you made was a physical table that was written to storage. If you turned a table inactive, I don’t think it would write anything, but if it was active, every table became a physical table, and I don’t necessarily need all of that. Sometimes I just want the dataflow to build the logic; I don’t need every single table in the dataflow to produce an actual output table. But that was the assumption with dataflows Gen 1:

28:05 every dataflow Gen 1 table got materialized. So I think it was maybe what I would call a bit heavy-handed; it was always processing all the data, all the time, and I don’t know if I need that. So, any other gripes that I missed that you guys have? You hit the ownership one on the head. And then also, and I think, Seth, you actually brought this up in one of our very earliest episodes: as much as I love

28:35 dataflows Gen 1, and it can overcome a lot of challenges, it’s not meant to be a data engineering or data processing center, because once you ran the query, there was nothing really there in terms of storage: if you broke the query, if something happened, that would break everyone’s report, and there was no backup, there was no version control here. So it worked great,

29:07 the whims of knowing what you're doing. And yeah, Mike, to your point, the ownership side of the collaboration was nonexistent; it was very focused on a single person building that item. All right, we're about halfway through here, so maybe it would be a good idea to transition to the next part of this. Thinking about the decision guide for Fabric dataflows, let's talk about Dataflows Gen 2. Where is it different? Tommy, give me some bullet points around where you

29:38 think Dataflows Gen 2 pops out. Your audio is going very slow here, so something's wrong on your audio. I'll go over to Seth.

29:52 Seth, what do you see as some of the different talking points for Dataflows Gen 2? Yeah, well, what are some of the larger differences in going to Gen 2? What's interesting is... are you still with me, Mike? I'm still with you. Okay. It feels like a big platform shift as

30:24 opposed to an upgrade to a new version of something, right? And there's probably a reason for that, but typically in a V2 you get everything from V1, and you don't in Gen 2, right? Correct. I think in some respects that's a bit jarring, and it's why we're having the conversation around whether there are still more applicable reasons to stick with Gen 1 versus Gen 2. Yeah, so I know

30:54 I froze, and I don't know if anyone mentioned this, but I'm going to mention the number one reason Dataflows Gen 2 is a game changer. We already talked about this: I can push the data somewhere. I can actually push my Power Query output to a lakehouse, to destinations, which is absolutely a game changer. So I can actually have storage in my own organization's data estate, whether it's in Fabric or it could be

31:24 any SQL database. The options they give you: Azure SQL Database, so if you have an Azure SQL database you can pick things out of Fabric and push them back into that SQL Server; a lakehouse, where you can store the data, and this is on a table-by-table basis, so you can have 10 tables in a Dataflows Gen 2 and only write down three or four of them, whatever you think needs to come out of that dataflow; you can also write data to Kusto; and you can write to Synapse Analytics, or SQL DW.
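That table-by-table destination idea can be pictured as a per-query flag; here is a tiny illustrative sketch in plain Python, where the query names and the `enable_destination` flag are invented for illustration and are not the actual Dataflows Gen 2 settings model:

```python
# Sketch of the Gen 2 idea: a dataflow defines many queries, but only
# some are marked to be written to a destination. All names below are
# hypothetical, purely for illustration.
queries = {
    "raw_orders":    {"enable_destination": False},  # staging only
    "raw_customers": {"enable_destination": False},  # staging only
    "dim_customer":  {"enable_destination": True},   # lands in the lakehouse
    "fact_sales":    {"enable_destination": True},   # lands in the lakehouse
}

# Only the flagged queries get materialized as physical tables;
# everything else stays as transformation logic.
materialized = sorted(n for n, q in queries.items() if q["enable_destination"])
print(materialized)  # → ['dim_customer', 'fact_sales']
```

This is the contrast with Gen 1, where every active query produced a physical table whether you wanted it or not.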

31:55 If I look at those features, the one I use all the time, by far, is the lakehouse. If I'm using Dataflows Gen 2, it's going to a lakehouse. But I also feel like something happened in the transition between Gen 1 and Gen 2. Dataflows Gen 1 can be used in regular Power BI; Dataflows Gen 2 is now exclusively a Fabric item. You can only use Dataflows Gen 2 inside Fabric, which I think makes sense, because you can write data to the lakehouse and that's a Fabric item as well. But now you're

32:27 forced to transition over to that new world. If you look at the feature comparison, incremental refresh is announced for Dataflows Gen 2 and is in preview right now, I believe. But my argument would be that I really don't like the Dataflows Gen 2 incremental refresh. It's not cutting it for me, because, if you read the documentation (I'll try to find the article and put it in the chat window as well), Dataflows Gen 2 does not allow you to

32:57 write into a lakehouse with incremental refresh, and incremental refresh in Dataflows Gen 2 requires that you have a created date and an updated date on a record. In certain data systems you're not going to have the updated date if you don't have a system that's tracking the updates, or you can't trust it. I've worked with a lot of organizations where it's there, but they don't trust that the development team is actually accurately updating that record. Then you have to load the

33:27 table entirely, every single time, so the incremental feature is not as relevant, and I really want that drop-and-replace mentality we had with Dataflows Gen 1. I feel we had a regression, and a pretty large regression at that, in the incremental refresh area with Dataflows Gen 2, and that's one of my major complaints about Gen 2: it's not as easy. Do you feel... so, Gen 1, and I wrote this down, or I thought I did.
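The incremental refresh constraint described above boils down to filtering on a trustworthy updated-date column against a watermark; a minimal stdlib-only sketch of that logic, with all data hypothetical:

```python
from datetime import datetime

# Hypothetical records: incremental refresh only works if each row
# carries a trustworthy created/updated timestamp.
records = [
    {"id": 1, "updated": datetime(2024, 9, 1)},
    {"id": 2, "updated": datetime(2024, 9, 20)},
    {"id": 3, "updated": datetime(2024, 10, 5)},
]

last_refresh = datetime(2024, 9, 15)  # watermark from the previous run

# Only rows modified since the watermark get reloaded...
changed = [r for r in records if r["updated"] > last_refresh]

# ...everything else is assumed unchanged. If the updated column can't
# be trusted, that assumption breaks, and a full reload (the Gen 1
# "drop and replace" pattern) is the only safe option.
print([r["id"] for r in changed])  # → [2, 3]
```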

33:57 I did write it down, right at the beginning: what was the goal of Dataflows Gen 1? Tommy said putting Power Query in the cloud. And who uses Power Query? The business, all the business users, 100%. So are these two different tools for two different audiences, with the same name but different generations? That's a great point. I want to argue no, it's not.

34:31 It's still designed, I think, for the business users. What I'm finding is... let me step back for a second. Dataflows Gen 2 is data engineering. It's data engineering for the business user: a nice, easy-to-use interface, I have very common transformations I'm going to be doing, it's click buttons and things happen, awesome, and it gives you a nice list of steps. This is great. Any normal data engineering tool gives you a very similar experience, like Talend or other

35:01 data engineering tools; they're going to show you something similar, all diagram-based, with the sequence of steps you run to produce the data. So I really think Dataflows Gen 2 is still designed for the business. However, we're now merging into more of a data engineering realm, and I think Data Factory and data engineering have gotten their hands on a little bit of what's happening in dataflows. So yes, it's still the same experience, it looks the same, but it's doing a lot of things under the hood, and I think the

35:31 jump from a business user in Gen 1 to a business user in Gen 2 is actually minimal. However, there are certain patterns you used to use that you can no longer do, and that's my biggest gripe: you can't bring me a new Dataflows Gen 2 and omit what I would call some of the most impactful features of Gen 1. Yeah, but isn't there a tradeoff, right? Even from what Tommy's saying, if one of the major differences between Gen 1 and Gen 2 is that I can push my data somewhere, I have a

36:03 lot more control, totally, about each table. To your point, I can put one in this location, one over there, etc., so there's a lot more scalability. Are those trade-offs big enough, or are you still stuck in the realm of it being inconsistent with the way you'd want it to work? Well, let me give you another reason why I don't use

36:34 Dataflows Gen 2. It's not that Dataflows Gen 2 isn't a good tool; it really is a good tool. I think what has happened is that I've also gotten a lot of other tools that are very competitive. I really like writing code in notebooks; notebooks make a lot of sense to me. I actually know a lot of SQL, I've been doing a lot of SQL work over the years, so for me, writing some SQL and doing some data engineering inside a notebook is actually a richer experience. And if I compare the compute of running

37:06 a Dataflows Gen 2 job against what a notebook can do from a consumption standpoint, notebooks blow Gen 2 out of the water. It's just so much cheaper. Is that really a valid comparison? Honestly, I don't think it is, because I'm a data engineer; I'm doing more data engineering things. And I would agree with you, Seth: the business user is going to use Dataflows Gen 2 because that's what they're comfortable with. But for me, who knows a little bit more and has done some deeper dives, if I'm starting with Fabric

37:36 and I've got an F2, F4, or F8, I don't have a lot of capacity lying around, and if I start running into thresholds like "ah, I'm out of capacity," what I'm going to start recommending to new users is: go learn notebooks. Notebooks do all the things a dataflow can do and more, and if you're trying to integrate Dataflows Gen 2 with a pipeline, I can't pass parameters into a dataflow right now. Notebooks can do all of that, just without the user interface you can follow along with. Yes and no. I will also

38:07 give notebooks a really solid argument: notebooks have Data Wrangler, which is basically the Python version of Power Query for M. It writes functions for you; it does most of your common data transformations, and I would argue the Data Wrangler experience is actually a bit richer than what you get in Dataflows Gen 2, with a better UI to work with. So I'm with you, Seth; I understand I'm making a very big... you're poking at a good point.
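To give a feel for the kind of pandas function Data Wrangler produces from UI clicks, here is a hand-written sketch in that style; it is not actual Data Wrangler output, and the column names are invented:

```python
import pandas as pd

# A function in the shape Data Wrangler generates: one named step
# per UI action, applied in sequence. Hand-written for illustration.
def clean_data(df: pd.DataFrame) -> pd.DataFrame:
    # Drop rows missing an amount
    df = df.dropna(subset=["amount"])
    # Rename a column to something friendlier
    df = df.rename(columns={"cust": "customer"})
    # Keep only positive amounts
    df = df[df["amount"] > 0]
    return df

raw = pd.DataFrame({"cust": ["a", "b", "c"], "amount": [10, None, -5]})
cleaned = clean_data(raw)
print(list(cleaned["customer"]))  # → ['a']
```

The point of the tool is that each of those steps comes from a button click, with the generated code left behind for you to inspect and reuse.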

38:37 It really is. It's a big jump to say "use this user interface" and then "go write a notebook with all code in it." To me that's a big leap for business users, and I agree with you. But I do think once users in the business start trying to get through that step, they find the notebook experience is much richer: I can do many more things with it, and it's easier to understand the logic and the code being built. I do see a little bit of resistance at the beginning, but longer term they start finding a lot of value in it. So I

39:09 know, I know, Tommy, you keep wanting to jump in here, but I want to lean on one more thing. Do you think a user coming in and using Dataflows Gen 2 would run into the same kind of, I'm not going to call them gripes, but things you don't like? Or does it require the context of using both platforms? Good question. I think if you didn't know what Dataflows Gen 1 did, you'd be totally happy with Dataflows Gen 2. I think the fact that I had

39:39 prior knowledge and had done so much work in Dataflows Gen 1 meant I was expecting the same level of capabilities when I

39:49 went to Gen 2. So if you're a business user using Gen 1, just be aware there are some feature gaps. Dataflows Gen 2 can do everything you need to; it might cost you a little bit more in CUs, but there are just some missing features for me, and those are enough to say I need to look at other tools inside the Fabric ecosystem that get the job done at lower cost with more flexibility. Yeah, and that's

40:20 why naming, once again, is interesting to me. This is the new generation, Gen 2, but it doesn't feel like it derives from Gen 1. That was my point: you're calling it the same thing, but it fundamentally operates in a different way and has different capabilities than Gen 1. Agreed. So that's a little bit of why I

40:50 think there's some friction in making the assumptions that you are, right? Like, yes, Dataflows Gen 1 does this, so if you're calling it Dataflows Gen 2 it should also do that, or have something that makes my life just as easy as it was in Gen 1. I don't know if that's the conflict in and of itself, rather than saying "here's a new thing" and calling it something else. Flows? Nope, not that. Goals? Nope,

41:27 Metrics. Those are some of the big things that I look at here. So, I think Dataflows Gen 2 is a great tool, but if Microsoft really wanted it to be competitive, there are just a couple of main missing features. Using pipeline parameters and pushing them into a dataflow would be immensely helpful; that would also resolve my incremental refresh issue if I needed it to. Frankly, incremental refresh only writes to SQL databases; it doesn't write to a lakehouse, which is also a miss. Why on Earth? The lakehouse is the thing I use all the time. Everyone uses lakehouses now; all

41:57 the things in Fabric talk to a lakehouse. The fact that I can't incrementally refresh into a lakehouse is absurd to me. This is a basic feature that should have been there way earlier, so that one really harps on me. Dataflows Gen 2 is also more expensive than, sorry, more expensive than notebooks. What costs so much more in making a Dataflows Gen 2 run? What's so expensive around that? Can we do something to make it more efficient? It's supposed to be the new shiny

42:27 Gen 2 dataflow on the block here. Let's bring the cost down, let's make it more efficient. Haven't we learned something from building Dataflows Gen 1 that could make it faster or more efficient? How much... we're talking anecdotally here; did anybody do any testing where you can say, hey, Dataflows Gen 2 costs this versus a notebook run? Okay, so, percentage-wise, I'm throwing generic numbers out here. I've seen a couple of blogs, I'd have to go dig them up, but I have seen anywhere between 30% and 50% more than running a notebook. Okay, so same job:

42:58 taking a table with lots of records, picking it up, doing some transformations to it, and putting it back down: anywhere between 30% and 50% more cost on Dataflows Gen 2 versus just a notebook. Yeah, and that has been enough that I've actually heard teams, and seen posts, where a data team is saying: look, we know it's not effective for us to use Dataflows Gen 2; we're spending the time to teach our entire team notebooks so we don't have to do any Dataflows Gen 2 and can only use notebooks. But I think

43:28 that is a valid thing. 50% is a little ouch, but that's what user interfaces and other tooling that help you get somewhere faster are designed to do, and if it costs a little bit more, then cool, as long as there's another option for me to not incur that cost. But yes, that requires time and effort, and that was my only point in this whole equating of notebooks to Dataflows Gen 2:

43:58 there's a learning curve, and you're investing time in learning those platforms and making things much more streamlined, because you're mainly code-focused as opposed to UI-focused. Yep. Mike, I want to go back to something you were talking about, about people switching to Python notebooks. I don't think the overlap, of both the type of person and especially their skill, is enough for someone to say

44:29 that makes a lot of sense. Because again, take notebooks, which I completely agree with you on, by the way: it is a wonderful experience, actually, how fast it's been, how easy it is to connect and grab data, and yeah, from a cost and speed point of view. However, a user, and especially the business team, because all this time dataflows has been meant for someone who is maybe not developer-based:

45:01 "Well, I know how to change types on a datetime, I can easily remove and add columns." If someone like that goes in saying "I'm going to switch to notebooks because that's more cost-effective and faster," you're immediately pushing them into a lot of frustration: wait, what's the difference between a Spark DataFrame and a pandas DataFrame, and why do they behave differently with data types? I just want to change the data type of

45:31 such-and-such; where are all these commands? If they have Copilot in notebooks, if they have an F64, that's a huge help, but hopefully they know where to start, because otherwise it's "I don't even know what data I'm looking at." That's part of the beauty of Power Query: you connect the data, add a column, and you can see a preview of that column immediately, that live cache. And yes, you can do that in notebooks

46:02 with head or print, but it's still: okay, I know this is necessary, I need to see those values. It's much more difficult with that open canvas that is a notebook. Either it's going to get very complicated or messy, or there's going to be a ton of frustration. The notebook experience is for someone who knows, or is going to know, notebooks, not someone who's just transitioning from Power Query. I get what you're saying.
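The head/print preview Tommy mentions looks roughly like this in a pandas notebook; the frame below is a made-up stand-in for a lakehouse table:

```python
import pandas as pd

# Tiny illustrative frame; in a Fabric notebook this would come from a
# lakehouse table rather than a literal.
df = pd.DataFrame(
    {"order_date": ["2024-01-05", "2024-02-11"], "amount": [120, 80]}
)

# Power Query's "change type" button becomes an explicit conversion:
df["order_date"] = pd.to_datetime(df["order_date"])

# And the live column preview becomes an explicit call:
print(df.head())
print(df.dtypes)
```

This is the gap being described: the preview and the type change both still exist, but the user has to know which function to reach for instead of clicking a button and watching the grid update.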

46:34 But dataflows is supposed to be the tool for "I don't know Jupyter notebooks and I haven't learned Python yet." Agreed, but I can still push data to our lakehouse, and it does that for a good majority of use cases. I agree, it's solid and it's there. My opinion here is that you have now thrown dataflows into a world of other data engineering tools, and now there's an option, a choice. Before, we had no choice, right? Dataflows Gen 1 was just Power BI; there was no choice.

47:04 It was dataflows and nothing else, because that was the closest thing we could get to lakehouses. To me the game changer has not been dataflows; the game changer for me has been that we're now getting semantic models with Direct Lake, with Delta Lake tables in them. That was the moment: when Microsoft made that move, it opened up the world for a ton of other data engineering tools. That opening up of the standard for how the data is stored and prepared for the semantic model, that rocked my world. I was like, oh my gosh, Seth, we've

47:35 been talking about this for years. I kept saying, why do I need a SQL Server stood up just to load data into Power BI? I just want to be able to write a Delta table and read it right into my semantic models. Two years before they even announced it I was saying: this just makes sense to me, why isn't this stitched together? And here we are, now we have it. I'm like, wow. To me that's the linchpin: we don't have to store things in CSV files, it's more efficient, it's Parquet tables. That shifted the entire ecosystem and allowed Microsoft to bring Spark, and Spark SQL, and native execution

48:07 engines, and all these other tools like Azure Data Factory, things that did not previously exist inside the Power BI ecosystem. They've taken all this great Azure stuff and brought it to Power BI, and I'm like, great, now I have anything that's data engineering at my full disposal. I think you bring up good points, Mike, but where my mind goes is: if you're creating ecosystems that work with third parties, especially those

48:38 that have been out a while, then your tooling is going to be compared against them. That's true. And maybe that's where some of the frustration comes from in going from Dataflows Gen 1 to 2: the product should be evolving and keeping the best of what it is, not starting over every time you rename it. And I'm not saying it's starting over, right, and maybe this is the platform shift to get to a place where you can play with all of these backend systems and you can scale it and

49:09 build it. But you do bring up a good point: there's a much bigger ocean you're jumping into when these systems can interact with each other. That's a good thing, but it's also a bad thing in some respects, because the product itself can now be compared against those other tools, as opposed to just living in a self-contained ecosystem. And you don't want to be there, right? Actually, the more

49:40 these platforms integrate, I think the better off it is for Microsoft in the

49:45 long run, because the Fabric vision has something I haven't seen in other tools yet. It's the breadth of everything Microsoft has been doing behind the scenes for many, many years, but with the self-service component to it, right? And if we talk about, and I even mentioned this last time, some of the learning I'm doing around data mesh: it is this idea that you're bringing more business people into this framework where you're able to easily discover the

50:16 information that you want, faster. Yes, and Fabric is developing that ecosystem. I think the criticisms that come along the way are justified, but they'll also change, right? In six months we'll probably have another "Dataflows Gen 1 versus Gen 2, don't ever use Gen 1 again" conversation. But that's the evolution of a lot of these conversations. I think, even if I look

50:48 back at our 360 previous episodes, if we went back to episode one... well, I nailed episode one, though. Be honest, I think episode one was dialed, much to Tommy's chagrin; Tommy hated my comment, but episode one was "we are going to be in the cloud, it will be only in the cloud," and here we are doing more and more with all things Fabric, only in the cloud, with Power BI. But you're right, Seth, I agree

51:18 with you. If we look back at a lot of our episodes, we would probably have changed opinions, at least on the tool evolving. The tool evolves, the feature sets improve, naming gets corrected. Yes, naming gets corrected, brought back to what it was originally. [Laughter] Let me preface a couple of things: I think if the Dataflows Gen 2 team had done just one or two key feature changes early on in the process, I would have been 100% on board, even if it cost a little

51:49 bit more to run. I would have been "yep, do it, just use Dataflows Gen 2, I'm happy with it." The feature that really made the difference for me, that started me exploring other data engineering tools, was this: we have a dataflow, and you can make a parameter in a dataflow, it's literally called a parameter. The fact that I can't go to a pipeline and inject my parameter into the dataflow before it runs, a customer, a start or end date of

52:19 something, the fact that I can't programmatically adjust that, is just a major miss to me. That's why pipelines exist. Pipelines exist to integrate with all of these things; they set the orchestration, they set the pattern of how you load data. I use pipelines all the time to move data around, it's a staple, so the fact that it didn't integrate that one part so tightly with the actual pipeline piece... And I just want to call this out: Alex Powers did a great blog article back in October of 2023, I believe, so

52:49 it's been out for a while. Alex Powers, itsnotaboutthecell.com, had a great example: you can actually use a SQL analytics endpoint or a SQL data warehouse to update a date, and then consume that date inside the parameters of the pipeline, so you can do an incremental refresh from a pipeline. But if you look at his blog post, there are a lot of steps, and it's not just a dataflow and a pipeline; it's a dataflow, a warehouse, and a pipeline. All these things have to be bolted together to make it work.
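The general shape of that watermark workaround (read a date from a warehouse table, use it to scope the load, advance it after a successful run) can be sketched with sqlite3 standing in for the SQL endpoint; this is the generic pattern, not Alex Powers' exact steps, and the table and column names are invented:

```python
import sqlite3
from datetime import datetime

# Stand-in for the warehouse that stores the watermark; in the pattern
# described above this would be a Fabric warehouse or SQL analytics
# endpoint, not sqlite.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE watermark (table_name TEXT PRIMARY KEY, last_loaded TEXT)"
)
conn.execute("INSERT INTO watermark VALUES ('fact_sales', '2024-09-15T00:00:00')")

# Step 1: the pipeline reads the current watermark...
(last_loaded,) = conn.execute(
    "SELECT last_loaded FROM watermark WHERE table_name = 'fact_sales'"
).fetchone()

# Step 2: ...uses it to scope the load (shown here as just a value
# that would be passed on to the load step)...
print("loading rows changed since", last_loaded)

# Step 3: ...and advances the watermark after a successful run.
conn.execute(
    "UPDATE watermark SET last_loaded = ? WHERE table_name = 'fact_sales'",
    (datetime(2024, 10, 9).isoformat(),),
)
(new_mark,) = conn.execute(
    "SELECT last_loaded FROM watermark WHERE table_name = 'fact_sales'"
).fetchone()
print(new_mark)  # → 2024-10-09T00:00:00
```

The complaint in the episode is exactly that this requires three items bolted together, where a pipeline parameter passed straight into the dataflow would collapse it to one.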

53:21 Again, to me that was a major miss. The incremental refresh part was another major miss, in my opinion; the fact that it only writes to SQL really limits my ability to use it. And the only other one I haven't talked about yet: Microsoft is really touting the idea of bringing everything to git. Git integration, I can't tell you, since I've been using it, it has saved my butt a couple of times, with people making changes to things they shouldn't have been changing, and I've been able to observe what happened. It doesn't require you

53:51 to be technically knowledgeable about git; it just makes it easier to track your changes over time. And Dataflows Gen 2 does not integrate with git integration yet. It's going to, though: one of the big announcements at the Fabric conference was that by the end of 2024 we'll have git integration for Dataflows Gen 2, which is incredibly needed. Again, there's probably a lot of technical debt I'm washing over, like "oh, it's so easy, just fix it and make it work," and they're probably like "shut up, Michael, this is so hard," so I'm probably overestimating how easy it is.

54:22 But those three things really push me away. I can't use it yet; I'm not recommending it until those gaps get solved, and once those gaps are solved, I'm really going to be excited about using it again. Other thoughts? Any other things we should cover? We're getting close on time; we should probably do final thoughts on this one. I think for me... go ahead. Do you have a question for me? No, but you did say my name. I did, because it

54:53 looked like you were going to say something, but I can walk over you like I usually do. Go ahead; I'll just sit here, that's fine. I think what's interesting to me in the whole Dataflows Gen 1 versus Gen 2 conversation, and the pipelines you're building, is: what is the purpose for them, and where? Self-service versus enterprise. What are you trying to use it for? Can you use it for the enterprise, or are the data loads too large? But the other thing

55:24 in here is, as I think about an ecosystem, we haven't leveraged dataflows heavily because we have always built things in the enterprise space. As more of the self-service in the organization grows, I think there are definitely opportunities there. But the other thing is just the sprawl, right? To me, I don't want to be using six different tools, or have people just spin up their own things, because it's so much harder to manage. So there may have been

55:56 use cases for us to use the dataflows platform, but we just haven't gotten there. And to your point, Mike, there are those few things, like incremental refresh and parameters, things I have in ADF that I'm not going to give up when moving into different platforms. But I think that's how the adoption patterns could and would change: you seem to be going almost business-led, with a tool that could graduate to enterprise

56:26 level, right? But we're just missing some of the key features in there right now, and you would need those in order to reach a wider audience of usage. Yes, and I think a lot of your points are valid. Jumping into notebooks is probably a little advanced for a lot of the users, but at the same time, what I do like is that there are many, many options even within the Microsoft ecosystem alone, and the more time and effort you put into

56:57 learning what those are and their capabilities, the better. And I like that we're talking about notebooks a lot, because I use them as well, and they become so extremely powerful in your arsenal for data movement, manipulation, reporting, anything. It is really wild. So do you guys... Tommy, go ahead, it looks like you're going to say something before I say something

57:31 else. No? If you're going to say something, say something, Tommy. Hold on, go ahead; we'll just wait here patiently until you get your headphones figured out. All right, well, Tommy's not going to talk, so I will go ahead and say something else. The only other thing I can think of: do you guys like the new color for the icon? It's purple versus green. Why green? More money? I don't

58:02 know why there are two icons in different colors: it's a purple icon for Dataflows Gen 1 and a green icon for Dataflows Gen 2. I guess it's green because it's Fabric; they've Fabric-ized it, so that's why it's the green icon versus the purple icon. Anyhow, with that being said, I think that's about a wrap for our Dataflows Gen 1 versus Gen 2 comparison. I hope you found some of these points valuable, and I hope we've encouraged you to run some experiments with notebooks; maybe that would be relevant for you

58:32 as well. Also, don't worry, I think the features are coming, so if you're not using Dataflows Gen 2 right now and you're looking at moving over to Fabric, hang tight. The features we're looking for will eventually get there; the team has already committed to having git integration for Dataflows Gen 2 by the end of the year, so that's another big win for us, and I'm excited to see where the rest of this is going to go. With that, we really appreciate your ears for this hour of time. Hope this was helpful and a good conversation.

59:02 I hope you're using this in your organization already and that it helps you evaluate whether or not you want to migrate from Dataflows Gen 1 over to Dataflows Gen 2. With that being said, Tommy, where else can you find the podcast? You can find us on Apple Podcasts, Spotify, wherever you get your podcasts; make sure to leave a rating, it helps us out a ton. If you have a question, idea, or topic you want us to talk about in a future episode, head over to powerbi.tips/empodcast and leave your name and a great question. And finally, join us live every Tuesday and

59:32 Thursday at 7:30 a.m. Central, and join the conversation on all of PowerBI.tips' social media channels. Excellent. And with that, thank you all very much, and we'll see you next time. [Music]

Thank You

Want to catch us live? Join every Tuesday and Thursday at 7:30 AM Central on YouTube and LinkedIn.

Got a question? Head to powerbi.tips/empodcast and submit your topic ideas.

Listen on Spotify, Apple Podcasts, or wherever you get your podcasts.
