PowerBI.tips

Revisiting Dataflows Gen 2 – Ep. 490

January 2, 2026 By Mike Carlo, Tommy Puglia

Mike and Tommy revisit Dataflows Gen 2 after giving it a hard time for months — and the verdict? It’s actually good now. New parallelization, modern query evaluation, and smarter pricing make it a real option again. Plus, the PBIR format admin setting and creative image visual tricks.

News & Announcements

PBIR Format Admin Setting

  • Power BI PBIR Format Admin Setting — Koen Verbeeck (SQL Kover) highlights a new admin setting that automatically converts and stores reports using the Power BI Enhanced Report Format (PBIR). The setting is on by default and makes Git integration, report exports, multi-user editing, and property extraction significantly easier.

Mike and Tommy break down the two sides: Desktop (turn on PBIR in preview features per user) and Service (the admin setting handles everything tenant-wide — new files auto-create in PBIR, existing files convert when opened and saved). For organizations with thousands of reports, this is huge — no need to manually reconvert everything.
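One reason PBIR helps with Git integration and property extraction is that a report decomposes into a folder of small JSON files instead of one binary PBIX. As a rough illustration, here is a Python sketch that lists the pages of a PBIR report definition; it assumes the published layout where each page lives in its own folder under `definition/pages/` with a `page.json` metadata file (verify the exact layout against current Microsoft docs for your version of the format).

```python
import json
from pathlib import Path

def list_pbir_pages(report_dir: str) -> list[str]:
    """Return display names of pages in a PBIR report definition.

    Assumes the PBIR layout where each page lives in its own folder
    under definition/pages/, with its metadata in page.json.
    """
    pages_root = Path(report_dir) / "definition" / "pages"
    names = []
    for page_json in sorted(pages_root.glob("*/page.json")):
        with open(page_json, encoding="utf-8") as f:
            meta = json.load(f)
        # Fall back to the folder name if displayName is absent
        names.append(meta.get("displayName", page_json.parent.name))
    return names
```

Because each page and visual is its own text file, Git diffs show exactly which visual changed, and multi-user edits no longer collide on a single binary blob.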

Mike’s take: wait until GA for risk-averse orgs, but once it’s out of preview, turn it on immediately. The combination of PBIR + workspace Git sync is a massive governance win. Tommy adds that he’s now editing reports directly in the service almost every day — PBIR makes that workflow seamless since edits no longer lock files.

Updated Image Visual

  • Cool Report Design Tips with the Updated Image Visual — Kerry Kolosko demonstrates that the November 2025 update now lets you add images via upload, data, or URL (previously PNG/JPEG upload only, SVGs required a button hack). This opens up dynamic image scenarios: swap company logos via a single DAX measure, use Base64/SVG/URL as image sources, and even animate GIFs with on/off switches using the new hover and selection states.

Mike notes why these URL-based features take so long: Microsoft has to protect against code injection through URLs, not just images. Every new input vector requires security hardening on their end.
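One of the image sources Kerry's post mentions is Base64. A common way to prepare that is to encode the raw image bytes into a data URI, which a DAX measure (with its data category set to Image URL) can then return. This is a minimal Python sketch of that encoding step; the function name and the PNG default are illustrative, not from the article.

```python
import base64

def image_to_data_uri(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a Base64 data URI.

    The resulting string can be pasted into (or returned from) a
    measure whose data category is set to Image URL.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Unlike a hosted URL, a data URI keeps the image inside the model itself, which avoids the public-hosting caveat discussed in the episode, at the cost of model size.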

Main Discussion: Revisiting Dataflows Gen 2

The History: Why They Abandoned It

Tommy’s experience was brutal: a single Dataflow Gen 2 completely tanked his F2 capacity — couldn’t even access the tenant. Even on F4, advanced transformations (especially grouping operations) caused massive slowdowns. The verdict at the time: Dataflows Gen 2 was slower AND more expensive than Gen 1, and notebooks crushed both on cost and performance.

Their recommendation to customers became: build your first dataflow to understand the data, then immediately migrate to notebooks. Tommy credits the original Dataflows Gen 2 failure for forcing him to learn notebooks properly.

What Changed: FabCon Europe Announcements

Microsoft acknowledged the complaints and delivered major improvements announced at FabCon Europe:

New Scale Options (under Options → Scale in the dataflow):

  • Partitioned Compute — parallelizes query execution across partitions
  • Modern Query Evaluation Service — substantially faster dataflow runtimes
  • Concurrency — controls number of concurrent evaluations
  • Fast Copy — existed before, still available

Smarter Pricing (2-Tier Model):

  • First 10 minutes per query: 12 CU (25% reduction)
  • Beyond 10 minutes per query: 1.5 CU (90% reduction)
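The two-tier rates above can be turned into a quick estimate of CU-seconds per query. The sketch below assumes the rates exactly as stated in the episode (12 CU for the first 10 minutes, 1.5 CU after), and that consumption is metered as rate × duration in seconds; check the official Fabric pricing docs before relying on these numbers.

```python
def dataflow_query_cu_seconds(duration_minutes: float) -> float:
    """Estimate CU-seconds for one Dataflow Gen 2 query under the
    two-tier model described in the episode: the first 10 minutes
    are metered at 12 CU, everything beyond that at 1.5 CU."""
    TIER1_RATE, TIER2_RATE, TIER1_CAP_MIN = 12.0, 1.5, 10.0
    tier1_min = min(duration_minutes, TIER1_CAP_MIN)
    tier2_min = max(duration_minutes - TIER1_CAP_MIN, 0.0)
    return (tier1_min * TIER1_RATE + tier2_min * TIER2_RATE) * 60.0
```

For example, a 30-minute query meters 9,000 CU-seconds under this model; assuming the previous flat rate was 16 CU (implied by the 25% figure), the same query would have cost 28,800 CU-seconds, so long-running queries see the biggest savings.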

New Data Destinations:

  • Lakehouse Files (CSV) — write directly to CSV for Spark/Python processing
  • Azure Data Lake Storage Gen2 (Preview)
  • Snowflake (Sneak Peek)
  • SharePoint (GA)
  • Database schema support improvements

⚠️ Important: Old Dataflows Gen 2 don’t get these new scale options automatically. You must create a brand new Dataflow Gen 2 to see all four settings (fast copy, partition compute, query evaluation, concurrency). Workaround: export your old dataflow as a template, create a new one, import the template.

The Verdict: Actually Good Now

Both Mike and Tommy report substantial improvements. Tommy’s experience has been “very positive” — a complete reversal from months ago. Mike sees speed and parallelization working well.

Their updated recommendation:

  • Still follow bronze/silver/gold patterns — first dataflow copies raw data, second transforms
  • Notebooks still reign supreme for cost optimization and high-volume data
  • But Dataflows Gen 2 is now a legitimate option for the 60-70th percentile of transformation workloads
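The bronze/silver split they recommend can be sketched conceptually: the first step only lands raw data, the second applies transformations. Plain Python stands in here for two Dataflow Gen 2 queries; the column names and the filter are hypothetical illustrations of the pattern, not anything from the episode.

```python
def bronze_copy(raw_rows: list[dict]) -> list[dict]:
    """First 'dataflow': land the raw data unchanged (bronze).
    Copy only -- no joins, no grouping, no derived columns."""
    return [dict(row) for row in raw_rows]

def silver_transform(bronze_rows: list[dict]) -> list[dict]:
    """Second 'dataflow': pick up the landed rows and transform (silver).
    Illustration: keep completed orders and derive a line total."""
    return [
        {**row, "total": row["qty"] * row["unit_price"]}
        for row in bronze_rows
        if row["status"] == "complete"
    ]
```

Keeping the copy step transformation-free means the expensive reshaping runs against data already in the lakehouse rather than hammering the source, which is the same reasoning Mike applies to notebooks later in the episode.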

The Great Debate: When Do You Graduate to Notebooks?

This sparked a passionate back-and-forth:

Tommy’s position: Dataflows Gen 2 is the managed self-service tool for Fabric. Business teams without data engineering backgrounds should start here — and may never need to leave. The handoff and training advantages of Power Query’s visual interface are significant. Not everyone needs to learn Python.

Mike’s position: At some point, capacity pressure will force the conversation. When you’re on an F4 and need to decide between doubling your spend to F8 or optimizing existing workloads, that’s when notebooks become the answer. The learning curve isn’t as steep as people think — data wrangler gives you a Power Query-like UI that generates Python code, and Copilot in Edge handles the rest.

Where they agree:

  • Dataflows Gen 2 is no longer a stepping stone you’re forced to abandon — it’s a viable long-term tool for business teams
  • Notebooks are still the most cost-efficient and performant option for data engineering
  • The decision point is capacity-driven (Mike) or criticality-driven (Tommy)
  • Fabric is a Swiss Army knife — use the right tool for the job, and be aware that optimization options exist when you need them
  • V-Order sorting (a notebook-only optimization) matters for gold-layer tables feeding Direct Lake semantic models
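In a Fabric Spark notebook, V-Order is controlled per session, which is what makes Mike's "skip it for bronze/silver, enable it for gold" advice actionable. This is a configuration sketch only: the session flag shown (`spark.sql.parquet.vorder.enabled`) is the one historically documented for Fabric runtimes, but defaults and setting names have shifted between runtime versions, so verify against the current docs; `df_bronze`, `df_gold`, and the paths are placeholders.

```python
# Fabric Spark notebook cells (configuration sketch; setting names and
# defaults vary by runtime -- check current Fabric documentation).

# Bronze/silver: skip V-Order, since these tables are not read by
# Direct Lake semantic models and the extra sort costs write time.
spark.conf.set("spark.sql.parquet.vorder.enabled", "false")
df_bronze.write.format("delta").mode("overwrite").save(bronze_path)

# Gold: enable V-Order so Direct Lake reads the table efficiently.
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")
df_gold.write.format("delta").mode("overwrite").save(gold_path)
```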

Mike’s “Min-Maxing” Framework

Borrowing from Alex Powers: organizations should be “min-maxing” their Fabric solutions — minimizing cost while maximizing value. If you’re not periodically reviewing capacity metrics and asking “can we do this cheaper?”, you’re leaving money on the table. Dataflows may be the first place to look when capacity gets tight.

Episode Transcript

Full verbatim transcript:

0:00 Heat. Heat. [music] Good morning everyone and welcome back to the Explicit Measures podcast with

0:32 Tommy and Mike. Hello and welcome back to the show. Tommy, how you doing today? Oh dude, I’m doing great. How you doing? Doing well. Just clipping along. We’re just trying to stay warm here up in this snowy northern area that we are in. We’ve been getting tons of snow. December has been extremely cold for us. We’re down to the single digits now this week. Oh yeah, we are at zero degrees. We had negative 2. Dude, just the weather. We’re not even gonna have white snow for Christmas. Is it all gonna melt before we get there? At least here in Chicago, we’re going to get 45 degrees on Thursday and rain. Well, first it’s going to rain when it’s like 10, which I don’t know how that

1:05 Happens. Then it’s going to be 45 degrees. Then it’s going to be Christmas. That’ll be interesting. Frustrating. Well, that being said, let’s jump into our main topic today. We’re going to be revisiting Dataflows Gen 2, this new experience. What was announced recently at Microsoft’s Fabric conferences was a lot of acceleration that’s been happening. So there are two new things you can use here. Fast copy has been around for quite a while, but there are some other scaling options you can turn on in your Dataflows Gen 2. One is allow the use of partitioned compute,

1:38 Which should parallelize more of your compute running, and then also working on query evaluations, so allow the use of the modern query engine, which again should also speed things up as well. So those two things are available to you. Actually, there’s also another one called concurrency, the number of concurrent evaluations you’re allowing it to run. So the idea here is making your Dataflows Gen 2 go much faster, process much quicker than what has previously been used in Dataflows Gen 2. So that being said, that’ll be our main topic here. We want to give Dataflows Gen 2

2:10 A bit more love with these new features, unpack these features a little bit and describe our experience with using them and turning them back on after we abandoned Dataflows Gen 2. We didn’t really love it that much. We gave it a lot of heat for a while there. [snorts] All right, that being said, Tommy, you got some news for us? Yeah, so there are a few things from SQL Kover, which is Koen Verbeeck, and it’s simply talking about the PBIR format admin setting. We have mentioned before Rui Romano had a blog article on November 17th that said that

2:42 The PBIR, how do we say that now? We have PBIX, PBIP. How do I say it? I just call it PBIX and then PBIR. I don’t really say the word of it. I feel like you said it before. Well, PBIR, I do say it, but I feel like now with all the different formats, it seems better just to call it out. Yeah, the PBIR, PowerBI enhanced report format, is, as Rui said, going to be the default way reports are going to be created. They’re

3:15 Going to be converted to PBIR when edited and saved. And what SQL Kover is talking about is there’s an admin setting, which is really nice, that automatically converts and stores reports using the PowerBI enhanced metadata format. So this is something if you enable on PowerBI desktop, where you have the PBIP save option, which is still there, and then you can store the semantic model using TMDL, and there’s another option underneath that to store reports using the enhanced metadata format. That’s in the desktop as a preview

3:48 Feature. That’s an option. Well, in the admin setting, you can basically in a sense override that, where you say, hey, any report that’s using the enhanced metadata format, we can store it in the PBIR format. So that goes across your tenant, which is crazy. Yeah. So, think of it this way. I like to unpack this feature in two ways. There’s this idea that the PowerBI desktop application is now using this PBIR format already. So that’s something you have to turn on manually in desktop to start saving

4:22 Things as PBIR. That’s on a per user basis. Every user who runs desktop would need to turn this setting on in the preview features to be able to use this new format, which is fine. It works okay. But imagine your organization: you’ve already got thousands of reports already up on powerbi.com. Oh yeah. But what about all those reports? How are you going to convert them? There’s not this one action of taking an existing report and just automagically converting it for you. You have to open and edit the report, or

4:57 Open in edit mode and then save it again, or close it out, which is weird. Well, I think this is because the conversion is happening inside the renderer of the PBIR format. So inside desktop, when you have the definition of the file, you open the file up, it renders the file and then it saves it back down as parts, right? Whether it’s the PBIX original format or the new PBIR format, it doesn’t really matter which one you’re using. Desktop is somehow going to have to read everything that was there, convert it all, and save it into the new format. So

5:31 Because of that viewer being responsible for the read, the edit translation, and then saving back down, that’s my impression of what’s happening here in this admin setting. So the admin setting is basically allowing you as an organization to turn it on: everyone who’s creating new files in the service, they will automatically be created in the PBIR format, and if someone opens an existing file, it will be converted and stored and saved again in the new PBIR format as well. So there’s

6:04 Two parts to this. There’s the desktop version and the service version. The admin setting is dealing more with the service version. So Mike, what’s your take on this being now something in the admin setting, where this is going to be the default way? For the first time in 10 years, or 10-plus years, we have a new file extension. We’ve had other file extensions appear before. We’ve had the PBIDS with data source settings. We’ve had a PBIT show up eventually. So this is a pretty substantial change, I think, to many organizations. I think one

6:36 That will help you manage and make it easier for you to govern the deployment of your reports. So, I think this is a very worthwhile feature here. For organizations who are a bit more risk averse, I probably would wait until this is out of preview to really start turning on the admin settings. I don’t think you’re going to have to worry too much about this gap of, okay, we have a whole bunch of files. Do I need to go back and replace all of them? Do I need to open every single one of my files up, replace

7:08 It, and then put it back down as the new format? I don’t think you’re going to need to do that. I think everything will work just fine without having to do that. I would recommend once this gets out of preview to seriously evaluate and consider turning your PBIR formats on. I think it’s going to be the future of the way things are moving in PowerBI. Anyways, I’m hoping in the next couple quarters, maybe Q1 of 26, maybe Q2 of 26, this feature goes full-on GA and we don’t have to worry about it anymore. So, we’ll see what happens there. [clears throat] Microsoft has not given

7:42 Any timelines around this one. I’m just being hopeful and optimistic that we’ll start seeing this thing go GA sooner than later. I think also, from the professional developer side, Tommy, like you and I, ones [snorts] who like to actually manipulate files, use API calls in the service, do automation of things, I think the PBIR format gives us a lot of extra capabilities. There’s a whole bunch more things we can do. So I think if you’re really serious about PowerBI, this is a format feature you’re going to want to use, a thousand percent. And I think so too. It’s interesting too when we talk about the

8:18 Business users and if this is going to be the default way. But the nice thing is, if they’re in the service, they don’t really have to worry about this, right? It’s totally seamless to you. You don’t care about it. You don’t even know it’s there. Exactly. Exactly. So that’s really interesting. So there are two very low-hanging fruits I see with this. One is just turn on the PBIR format. That seems like a very simple option to set when people are building reports. The second thing here that I think is going to be really interesting, Tommy, is this whole GitHub or Azure DevOps

8:52 Synchronizing between the workspace and those items. While you don’t need to know all about Git repo development and branching and everything else, all you really need to care about is this workspace is synchronized to something in Git, DevOps or GitHub. I [snorts] think that right there in itself is a major win. And then for professionals, or the BI central team, if there are any issues or conflicts or problems, just having that feature turned on I think protects us a lot against major issues

9:25 Or problems or breaking changes that someone may be doing, right? Intriguing. Yeah, so again, I don’t think that the CI/CD, the synchronizing of workspaces with Git, is a big lift. It’s not hard to understand, right? You have items in the workspace. They’re either committed or they’re uncommitted. Simple, right? Make changes, build things in the service. It’s either on or off. When something goes awry, having the PBIR format on and being able to see the changes of those files will be very helpful for us as admins or governors of the

10:00 Workspace, whatever that level of admin is, for helping to organize things out. This will allow us to resolve, I think, some of these last-change-wins situations. So if there are two people editing the same file and something gets messed up, at least you have a track record of what was happening to the file, and so you have the opportunity to not lose work. No. And it’s funny too, because we also have the version history in the service, right? So that’s a whole other thing that you can do without PBIR. So I think this is one of those things, maybe

10:33 From a managed self-service side too. Actually, I’m going to take that back. I think this is the seamless experience, right, where with the PBIR, not necessarily for the business users, because now, to your point, I never have to worry if I need to make a quick change in a report, and Mike, I’ve been doing quite a bit in the service sometimes, editing the report without going, oh well, I’m going to have to download this at some point. So I like that. That’s definitely, in my mind, a big win, just being able to have it a bit easier to make a quick edit, save it, and it’s good to go. That’s a

11:06 Major win for me. [snorts] How often now are you editing the reports in the service as, in a sense, a start point, rather than going, okay, let me get the Git file, let me download, open up desktop? Almost all the time now. I’m using it fairly frequently. Every day I’m editing, manipulating, changing files directly inside the service. So I have no problems with editing files in the service now. The fact that when I make edits it doesn’t lock the file in the service is a win for me. So that’s one area that I feel is just much better for me in

11:39 That space. I would completely agree with that. And I think the hardest thing is not all the features are available in both desktop and the service, which is frustrating, but we’re getting there. We’re getting there. I do like it. All right, moving on from that article. You have another article out there, Tommy, as well. Kerry, our favorite visual designer, is coming out with some interesting ideas here. Here’s another article that’s also in the description. Both these articles, if you want to watch them or read more about them, they’re both in the description of this video. So check

12:12 Out the description, it’ll be there. Yeah. So, this is Kerry Kolosko, and she’s talking about the November 2025 updates. Previously, when we selected an image, it only opened up to drop in a PNG or JPEG. If we wanted to add an SVG, we needed to use a button to show that image. But now in the new update, we can add an image via upload, data, or URL, which is, again, Mike, I think this definitely falls into our about-time or how-was-this-not-here-before bucket, but still, it’s a great

12:46 Introduction, where we can actually use the image from a URL. We can head over and we can actually edit that image, and you can use some tools as well, like, hey, if you want to have a font, for example, or text that looks really fancy, we can make that an image as an SVG or a URL and simply just paste that in as an image, which is great. So now we can use, in a sense, almost like temporary URLs for a lot of different use cases.

13:18 Yeah. I just want to be very clear here what Kerry’s using in this technique: she’s taking images and pushing them to SVGs in some web application. So I think there’s a tool that she’s using. There is an HTML link. It’s cfont img.com. So it’s basically an upload-your-image-here. You can set a background. You can have no background transparency, and then you just save it as some image, and then the image becomes a URL that you can pass to whatever application you need. So to be clear here, this is not custom fonts.

13:51 This is not anything else there. This is just creating an image, storing it on a website, and then accessing that image with a URL or an item there that’s publicly available. Again, the idea or the concept here is this is a publicly available URL. Anyone can get this URL. So, if you have images or things that are secure, you wouldn’t want to store those there on that public website. But it’s a very good distinction. Well, I think the reason why Microsoft doesn’t let you just randomly put URLs into PowerBI desktop is they need a lot of protection around what that URL

14:24 Is doing. Like there’s this injection, prompt injection, or it would be like malicious code injection, right? You can store a lot of code on just a random URL in addition to just being an image. So Microsoft has to put some protections in place that say, look, we’re going to let you use this URL image, but we’ve got to make sure that you’re not injecting code that’s going to do bad things to their systems. So the reason I think a lot of these use-a-URL features should exist but don’t is because Microsoft themselves is saying, look, we don’t want to open

14:58 Ourselves up to more vulnerability and let that code potentially come in there. So I think that’s why these features take longer to build, because you’ve got to do a lot more protections on the Microsoft side to accept things like this. But it’s nice. You can go use the image item. You can just take the image source and enter the URL, which seems to be a pretty easy fix here. So anyways, neat feature there. Love it. All right. All right. Those are our two news items. Let’s move over to our main topic here. So I just want to take a note here and really revisit

15:30 This Dataflows Gen 2. I know Tommy and I had talked a lot about Dataflows Gen 2 in the past, and we gave it a really hard time. Tommy, you had done a couple projects where you were loading some data in, and in loading that data you found that with a couple dataflows, it basically pushed over your F2 Fabric SKU fairly quickly. Yeah. And just using F2, and even at one time F4, a single dataflow completely slowed down my entire tenant, where I couldn’t even at one point get into the tenant because the capacity was

16:05 Loading, and the transformations were, in a sense, yeah, they were probably a little more advanced, but you want to test the limit of what it can do. It wasn’t anything, in a sense, too crazy, and we were following more or less the best practices: load the data in as a dataflow, store it actually in the lakehouse, rather than trying to do all the transformations from the source. Yes. But certain things where you’re iterating over, especially when using the grouping, it just completely slowed it down, and dude, it was a complete deal breaker

16:37 In terms of actually being able to use it, and it was a productivity killer for you, because, Tommy, I think what your mentality was: okay, this Dataflows Gen 2 is just too expensive to run, it’s consuming too many CUs just to get stuff accomplished, and you immediately had to move over to, okay, let’s not use that anymore, let’s immediately go over to notebooks. And then things were able to run, and you were able to process the data you needed to, but you had to use notebooks now. So a thousand percent, our story has traditionally been, if you want to have more efficient data pipelines, it was use notebooks, not Dataflows

17:12 Gen 2, which, in looking backwards on this one, most of your users that are coming to Fabric, I think, are going to be powerbi.com purists. That’s what they used, that’s what they built, right? So if you look at the audience here, we’ve got, I don’t know how many millions of people Microsoft has announced recently, but like March of 2025, Microsoft was saying like 30 million monthly active users on top of PowerBI, and I’m sure it’s grown since then, but with that number of users you’re going to see

17:45 The word Fabric all over the place. There are going to be people and organizations wanting to explore: what is this new Fabric thing? Well, the first thing you’re going to do is, oh, cool, I see. Dataflows. I know those things. I know you. It’s like Buddy the Elf at Christmas: dataflows, I know him. Right. Yeah. It’s like, I know this. I don’t know Python. What is this warehouse? But I can do it. I can do Power Query. And I think there’s a hesitation, to your point, Tommy, to learn Python, or thinking that’s too complicated. I just need something that lets me click it and I can get done what I need to build. So, I

18:18 Think a lot of users from powerbi.com come over thinking they’re just going to use Dataflows Gen 2, and it works, but it was pushing their capacities over. It was causing them to go to higher levels. And so, when you did some comparisons on those initial runs, you had Dataflows Gen 1, you had Dataflows Gen 2, and then you had, like, Dataflows Gen 2 with CI/CD turned on or something like that. That was basically Dataflows Gen 2 normal. So, when you compare the two, and then you compare it against notebooks, and you say, “Look, I’m going to copy the same

18:51 Amount of data from the same source and do it three different ways,” the Dataflows Gen 2 was like the most expensive one. And it was, yeah, and the slowest. And so you’re looking at it going, why on earth is Dataflows Gen 2 slower than Dataflows Gen 1? And I normally don’t want to use notebooks, but if I need to use my capacity and optimize what it’s doing spend-wise, I’m going to have to learn notebooks. And so I clearly remember telling many customers, like, okay, build your one dataflow, understand how it works. Once you have your data

19:23 Engineered, let’s rebuild that into a notebook. And I actually had a number of customers migrate off of Dataflows Gen 2 directly into notebooks to make it easier for them to consume things. I want to thank Dataflows Gen 2, the first iteration, because it forced me to focus on notebooks, and at the time I really transitioned everything. I’m like, all right, I am forced into self-learning this, because Dataflows Gen 2 was not an option at the time, just because even in simple things, trying to optimize it or trying to scale back all the things I was doing, it just didn’t make

19:56 Sense to do. So now, okay, so that was the way the world was before. Now we’re taking another revisit on this one. So Tommy, let’s revisit now again. So the announcements were made. Microsoft said, “Hey, look, we recognize that Dataflows Gen 2 gets a lot of heat from a lot of customers, that it’s just not efficient.” And so Microsoft said, we’re going to rethink the engine that runs dataflows. And they rebuilt it, basically, or not from scratch, but they rethought it, I guess, would be the way to articulate that. And then they came up with a number of these really big

20:28 Improvements. So one of them was being able to leverage the dataflow to parallelize the queries and start optimizing it a bit more. So let me come back to your comment here, Tommy. I know you’re still using dataflows to some degree. What’s been your new impression of the newer ones? And have you turned on the accelerators that have been out there in Dataflows Gen 2? Yeah, so we definitely turned it on. Honestly, like I said, we had to go back and create, or recreate, some of those dataflows just to try it again, because the other ones, like I said, were just turned off. And again,

21:01 Some of the big updates that happened, I think they really started around FabCon Europe, where they were talking about the smarter pricing, the more value, where we’re actually talking about allowing Dataflows Gen 2 to run significantly faster with that modern query evaluation service, which substantially drives faster dataflow runtimes, and also to parallelize those query runs. And basically what that allows you to do is run queries through partitioning, and these really enhance the design time, and

21:34 Honestly, Mike, that with some of the expanded output options, because some of those output options were a little slow. We could only really push to a table or SQL database, but now I can even push the destination for the output to Fabric lakehouse files. I can put it to Data Lake Storage Gen2, Snowflake, and even SharePoint as well, which can become very helpful. So all these different things, what they allowed us to do was really go through and say, you know what, Gen 2 deserves a

22:09 Revisit. And Mike, my experience has been positive. It’s been very positive. Mine as well. Yeah, I have seen substantial improvements on existing Dataflows Gen 2 that we’re running in comparison to other dataflows, like a Dataflow Gen 1 thing. So already I’ve seen good improvements. I like the speed here. The parallelization works well if you have more complex queries. I still think the pattern exists, Tommy. I still think I would follow the pattern of, if you’re going to use a Dataflow Gen 2, the first thing you should do is just copy the data.

22:41 Don’t build a bunch of transformations in that first dataflow. So almost treat the Dataflows Gen 2 like you would a notebook, or bronze, silver, gold experiences. I don’t think I’ve changed that mentality. I’m still sticking with it. Right? One dataflow to copy the raw data, a second dataflow to pick up that raw data, transform it, join it, stick it together, and then push the data back down to the lakehouse as well. And this is a good point too: even with the improvements, does that mean that your whole process should be centered around a Dataflow Gen 2? It’s like, hey, we had the notebook, but oh,

23:13 These features, now we can move everything back, those major transformations. I still wouldn’t say, okay, now all those transformations we’re going to do in Dataflows Gen 2. Let’s be honest here, Mike, and let’s be frank: at the end of the day, a notebook using Spark and Python is still going to be more optimal. It is right now, in data engineering in general, one of the most optimal ways to transform, manipulate, and move your data around, and this is not just in Fabric, Mike, this is really in general.

23:49 Yeah. Agree or disagree with that? No, I will agree, and I want to bring a parallel between the amount of code you write and the efficiency you get out of the running of the job, right? So, okay, I agree with your statement. Wholeheartedly agree. There’s, like, dataflows: it’s graphical in nature. You pay a bit of a premium for using that tool because it’s easy. You click things and it just works. Great. That’s super helpful. But then if you go a little bit

24:21 Further with that and say, look, instead of using the dataflows to transform data, if you go to notebooks and write your own Python code, build functions, write that, and also, Tommy, to some degree you can even tune a bit more of the Python experience, or the notebook experience. You can pick what size cluster you want. You can have a big cluster, you can have a smaller cluster. You can decide if you’re going to use a Python notebook or a Spark notebook. Python notebooks are a single machine. Spark notebooks have multiple machines, so they’re going to cost more because you’re spinning up more compute to do

24:55 The job. But Spark has a lot of optimizations. It’s very good at reading and writing from the lakehouse and using delta tables. One thing that I don’t think I see anywhere, Tommy, and this is one maybe that I’ll just point out here: when you’re making a Dataflow Gen 2, the assumption is, when the Dataflow Gen 2 is creating tables in the lakehouse, it’s automatically V-Ordering them as it brings the data down, because there’s no setting. So, if you think data engineering, right, let’s just conceptually pull back just a

25:28 Moment here. Like if you’re doing bronze, silver, and gold for data engineering things, you want that bronze and silver layer. You want to you want to think about who needs to access the data. Where’s the data going to go? And the reason that this is important, and I’ve done a Santosh from the Microsoft Spark team has done a couple videos with me around optimizing the Spark engine and working with Lakehouse and Delta Tables. There’s there’s some tricks you can do to make things go faster and be more efficient. One of the tricks you can do is you can

26:00 turn V-Order sorting on or off in Spark. I don't see that setting anywhere inside data flows. I think it's on by default. And so for your bronze and silver tables, if you're not planning on surfacing those tables directly inside Power BI semantic models, there's really no reason to have bronze and silver with V-Order turned on. When you get to gold, it makes sense to V-Order, especially if you're going to use Direct Lake in semantic models, because then the table's already formatted,

26:32 sorted, ready to go, and it's going to make the Power BI semantic model run much more efficiently. So the recommendation is V-Order sorting, which is a notebook experience. You can turn that on and off when you write tables down to the lakehouse. That's a gold-level feature. And so that's where I look at this and go, there are probably some further optimization flags or switches that I would like to see come out of Dataflows Gen 2, and that might be one of them, right? Give me some more indication. Let me know what this table is doing when I write it

27:04 down to the lakehouse? Does it have a V-Order sort or not? Yeah, and I think with data flows, though, Mike, again, most people who are doing data flows, would you agree or disagree with this, they don't know what V-Order sort is. I would agree. Okay, but that's my point, though: if we're talking cost optimization, right, it's a concept, and you could just make the general rule that says, look, I'm using data flows for bronze and silver. Yes, you may not know what it is, but

27:37 There’s no I can’t I can’t adjust it. It’s not. So, so the whole purpose of like turning it on or off would be like save yourself some money. So, to larger tables, it doesn’t sort things for you. That’s that’s really the only purpose there is how I would see it. All right. So, I think we’re on the same point where notebooks still re supreme more or less in terms of anything you’re going to do. Okay. Yeah. From a from even from a C usage standpoint, like notebooks are still going to be cheaper. Yeah. And honestly, you can do the transformations that you want to do and the speed that you want to do and the cost you’re going to do it. notebooks are going to really at this point re

28:09 Supreme. However, data flows however now showing for the first time not just for minimal based transformations but for I would say not necessarily the high-end but let’s say the 60 70 percentile of transformations you’re going to want to do maybe and that’s maybe a little higher up but that data flows can work and can be more or less efficient. Again saying that not saying the most efficient but pretty efficient. your cost is going to be okay. I think for you and I still and I think for me as an

28:42 Individual where I’m landing is if I’m doing something for myself and or for enterprise company that already has engineering notebooks intact I’m focusing on notebooks. However, Mike, if I’m going to an organization that is just beginning to adopt a fabric and they have PowerBI, but they don’t have a, an extensive data engineering background, I’m probably setting up data flows first because eventually I think about two things, Mike. I think about

29:14 the ease of use for training, and the handoff. Those are the biggest things, because yeah, I could create a notebook for them right off the bat, but then, if something changes or needs updates, either I'm the one managing that forever, or at some point they're going to want to own it. So data flows actually become, for me, when I'm dealing with clients who are new to Fabric and without that Snowflake or Databricks background, the starting point, rather than using notebooks. And that's a

29:48 Difference again if I’m doing something for my work or my own organization I’m I’m using data flows or notebooks excuse me but for a lot of organizations just starting off. Well, we can now start with the data flows. I think it really matters on like what is your team’s comfort and audience here, right? So, to your point, Tommy, I think a lot of the companies you’re working with or the companies that we work with in general, they start with they have PowerBI and it’s it’s more of the question of how do I start leveraging fabric to make my data systems more efficient, right? That’s

30:21 usually the conversation we're walking into. And because of that, the immediate answer here is, well, yeah, we should definitely start with Dataflows Gen 2, because you already understand how the data flow experience works. That's the easy, low-hanging fruit. Now, I'm also very mindful of assessing their skills and making sure that we review that too. So there is an opportunity for us to say, okay, yes, you're doing the Dataflows Gen 2 thing. Totally happy with that. That works. But at some point you're also going to look at it going, well,

30:52 Maybe there’s a couple team members that should be learning Python notebooks, should be learning how to write things in Spark, right? So, I’m not closing the door fully to Spark, but I’m also assessing where the team is at. And, at some point, you have these conversations where a couple data flows have been made, you’re getting some tables loading in, you understand the process. I think from a discoverability standpoint like when you’re building something for the first time it’s very helpful to have the experience of the data flow because you have the steps you have the preview table you’ve got the graphical interface of showing you like

31:24 the transformations, the merging, the joining of tables. I think all of that is very fundamental when you're exploring the data for the first time, loading it, transforming it, getting it right. Once you understand that and it works, then come back and optimize. Then we say, okay, we know what it does. It's much easier to take that experience over to a notebook and say, okay, I'm going to load this table. I now know I need to promote headers, remove these rows, remove these columns, add this, group by that, merge this, join that. You can

31:58 literally see the steps as they go, and now you're not worrying about the process and trying to learn Python at the same time. Does that make sense, what I'm describing there? Yeah. No, 100%. But I still think you run into the thing, even when you get to Python, for that relatively new organization. It's a conceptual change too, not just in the language, but in the workflow, with the different cells. And it's not just creating that code in a notebook. I think for a lot of people, it's also the test and verify, right? Yes, I know you can get the output, but the great

32:33 thing about data flows and Power Query is the really intuitive user interface, and after every step I get those results. It's really quick, too, if I have a step that I don't like or that doesn't work, to go back. With notebooks, it's like, okay, which cell is that? And you might have to scroll up a bit. You have to spin up the instance. You have to make sure that some of that code isn't already pushing to a lakehouse, because I've made mistakes, Mike, where I had some cells without headers, where you're like, okay, let me just

33:07 run this, and, oh no, that actually overwrote a file. Because notebooks are, in a sense, a blank page where, if I want my second cell to push back to a lakehouse or push as a table, I can write whatever I want. And what if you don't understand what you're looking at from the code point of view, especially if you have a lot of lines in a cell? If you've never done those best practices, it's hard to test and verify, because you can set it up however you want. You can have a single cell that more or less does everything,

33:42 or you can have many cells that each do one or two steps. The data flows in that Power Query interface are almost like the guard rails at a bowling alley, where it's sometimes really hard to mess up, especially with the steps, because I can go through all the transformations and be completely wrong, but not until I publish it does it actually get pushed and output to that data destination. That's a big difference, where even if I create these steps, well,

34:15 Everything I’m looking at is still a past or preview version. , and so you really have those guardrails up. So I agree with you to an extent, but you’re still dealing with, again, it’s not just the language you have to learn with Python. It’s okay, what’s the, in in a sense the organization of your notebook that you never had to worry about with , power query data flows. Yes. Again, I this I argue this is this comes with the territory, right? So my

34:50 argument here is, we realize there's more effort to go to notebooks, but I'd also argue, I've never had anyone need to spend weeks and weeks learning how to use a notebook for it to work. I think the learning curve is very minimal. A day, maybe two at the max, to go from what you were doing in data flows to just using the notebook. It's not that difficult. Apologies if I didn't hear this part. Bro, you're saying, for someone with some notebook experience or someone with no Python notebook experience, you're

35:22 saying a day or two, or a week? Well, I lean heavily on Copilot, right? Copilot in your browser, very straightforward. And again, when we're talking about people that are ready to go, you have to assess the skills of the people we're talking about here, right, Tommy? I'm not saying just blindly accept this, right? What I'm arguing here, to some degree, is that you get so much assistance from Copilot. Again, I use Edge a lot, and with Edge comes Copilot, right on the side of Edge. I

35:55 can ask any question around Python: how would I write a function that does XYZ? Even now, in notebooks there are a lot of helper functions. You can click on a table, you can say load table to notebook, and the code will be automatically generated for you. So there are a lot of things there that are already ready out of the box. [snorts] I'd also point to when I show users the Data Wrangler experience. How do we teach people code? We like teaching them by using a user interface and then showing them the output of that user interface. So I really like using Data Wrangler in notebooks. Most of what you can do in

36:29 Power Query exists inside Data Wrangler as well. So here's a notebook. Right-click this table. Here's how to load it as a data frame. Conceptually, we're there. Once you have the data frame, you can go over to Data Wrangler, and now it feels similar to what you're doing in Power Query. You can see a preview table. You can edit things. You can click on all these different transformations. Most of the standard ones that you need are already inside Data Wrangler. And then what happens is you hit complete, and that code comes back over to the

37:01 Notebook and it’s all commented has single lines. So I think I think the barrier to entry is not super great. Again, to your point, Tommy, are they going to be comfortable in two days? Probably not. But will they know how to use it? Will they understand like it’s not that far of a jump from what you’re doing in data flows to now notebooks? I I do think it’s it’s an easy stepping stone to where people go. And once I’ve taught them, they love it. The whole notebook experience. You can do a lot of different things. It doesn’t feel extremely difficult. And

37:34 you use notebooks and pipelines, no problem. Executing that, not an issue. So I feel like there are a lot of things that are easy to use. This is where I think Dataflows Gen 2 has been catching up, right, Tommy? One of the other challenges we've had is parameters in the pipeline. Yes. Right. So first and foremost, our architecture is the same in this regard, right, Tommy? We're always running pipelines to orchestrate the movement of data across different things. Call the original bronze table. Load it down. That's a copy job activity from the

38:06 pipeline, right? Then, if you have that data in bronze, how do you manipulate that data using a data flow? I think the pipelines are extremely powerful from an orchestration standpoint. Would you agree there, Tommy? Yeah. Oh, completely, Mike. Pipelines have become essential. I feel like you have to have a pipeline now, even if you're using the notebook, because yes, you can schedule the notebook itself, but to your point on the organization side, the UI, and also if you're going to do any freaking handoff or work with a team, how do you not use a pipeline?
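Earlier, Mike described translating dataflow steps (promote headers, remove rows, group by) into a notebook, with Data Wrangler generating commented, step-per-line pandas code. A minimal sketch of what that generated style looks like; the table and column names here are hypothetical, not actual Data Wrangler output:

```python
import pandas as pd

# Illustrative only: Data Wrangler emits pandas code in roughly this shape, one
# commented step per transformation picked in the UI. Names are hypothetical.
raw = pd.DataFrame([
    ["region", "amount"],        # header row trapped in the data
    ["East", "100"],
    ["West", "250"],
    ["East", "50"],
])

def clean_data(df: pd.DataFrame) -> pd.DataFrame:
    # Promote headers: first row becomes the column names
    df = df[1:].set_axis(df.iloc[0], axis=1)
    # Change column type to int64 for column: 'amount'
    df = df.astype({"amount": "int64"})
    # Filter rows based on column: 'amount' (keep values >= 100)
    df = df[df["amount"] >= 100]
    # Group by column: 'region' and aggregate sum of 'amount'
    return df.groupby("region", as_index=False)["amount"].sum()

summary = clean_data(raw)
print(summary)
```

The point of the function-per-step shape is exactly what the conversation describes: each Power Query step maps to one commented line, so someone coming from data flows can read the notebook the same way they read the applied-steps pane.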

38:40 So, okay, and you can do data flows there too. But let me propose something to you, because I'm still not 100% in the boat with you on a one- or two-day, or one-week, turnaround for someone who's newer to notebooks or Python. But I can see what you're saying. Let me propose something else in terms of where the data flows fit in, where, rather than being a stepping stone to always get to Python, even if we do already have data engineers using notebooks, is Dataflows Gen 2 now the

39:14 default? If we had managed self-service before with Power BI, is Dataflows Gen 2 now the equivalent of that? Where we already have our updated, clean tables from the enterprise point of view, and that's going to come through our notebooks and our pipelines, the most efficient way. But a lot of teams, we know this, need to take, let's say, all the marketing touch points, all the emails that were sent out. Well, we can have that in a lakehouse, and they may need to do transformations to filter, or to create a specialized table of that for a

39:48 specialized purpose. Because, again, remember, in Fabric I'm not just creating a table to reuse in Power BI. A marketing team especially needs different tables to feed audiences, campaigns, and other systems, which Fabric is now a great solution for. It's like, well, we just need to see everyone who opened in the last 30 days, and I need to feed this to Mailchimp or Oracle's email system, whatever the case may be. Well,

40:23 now they already have, in a sense, the cleaned version of that marketing table. They just use data flows. The data team doesn't actually have to do much. They just have to give them access to the lakehouse. And that's where data flows may shine the most, I think. It's the managed self-service of Fabric. Is that what I'm trying to say? Well, yes. It's a progression. Do I only stay in Dataflows Gen 2 forever? [snorts] Probably not,

40:57 right? For the business teams, I don't see why I wouldn't push notebooks to business teams. There are people in business teams building full databases inside Excel. To me, that's the skill set I want, for those people to write notebooks and build complex data transformations. People understand stuff. Anyone in finance, dude, they're building some incredible things inside Excel that probably shouldn't be built there. So, I'd argue it depends on the teams and the skills and knowledge of that team. Let me give you one other example here that

41:30 I’ve been unpacking with other people. I’ve been working with a company or talking with a company. Their leadership has traditionally been hiring a bunch of .NET developers to build net things inside their systems, right? So, there a lot of engineers right out of school, a lot of developers to build custom applications that the company uses. They’ve been finding finding an immense amount of value in Power Apps like a structured system. So what they’re able to do now is they’re able to build more things faster using Power Apps because now you don’t have to

42:04 write every little line of code. It just has a lot of structure and infrastructure supported for you. You can just create what you need to create and use it. So that's becoming a thing, and, I'll come back to the point here, I promise, the point is they were hiring extremely technical people, but the tooling has changed, and now they're able to hire new students out of college that can do Power Apps designer-level things. They're still computer science people, but now you're giving

42:36 them a brand new medium, and they're hungry, they're willing to learn, and they're building these new systems, which is helping them produce faster and build more things. And what's happening now is that the skills of that team are shifting. So over time they're going to hire fewer super-technical .NET developers, and more people in this Power Apps app-building space. The same thing is going to happen for the business, right? We were hiring lots of business users that were Excel-only. Six months, a year down the

43:10 road from now, people are going to move on. You're going to look for new people, right? And if you're aligning on technology, like, hey, our team is aligning on Microsoft Fabric experiences, I'm not going to handcuff my team and only let them use Dataflows Gen 2. I would encourage them, and again, Tommy, this maybe goes back to the business rules here: what's efficient and what's not, right? If you've got some Dataflows Gen 2 that are just working, they're coming along, no problem, at some point you're going to need to go back as a business and say, do we need to

43:42 optimize anything, right? So I think there are these project phases, where phase one is just get it working. It may not be the most efficient thing, but just get it running. Yeah. And I feel like, to me, that's a lot of what data flows are: get it running, get the data, and very quickly get the project moving, so data is appearing to the customers. Phase two of this is starting to optimize. Okay, so we have one monolithic data flow. Maybe that needs a breakdown into two data flows that are a bit smaller. Maybe we're using Dataflows Gen 1. Maybe we need to upgrade to Gen 2 and use some of these optimization techniques under the options. So there's an opportunity here

44:17 for users to come in and take a second pass on the architecture. And I know, Tommy, in building things for companies and systems, you do too. The first pass is usually okay, but not as efficient as your second pass. On your second pass you get much better requirements on what you really need to build. What was the thing we talked about, the second-system problem? Yes. You remember that? Yeah. So that's another thing, because with the first pass, yeah, 100%, a lot of times that happens.
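One concrete second-pass optimization ties back to the V-Order discussion earlier: in a notebook you can choose, per layer, whether to V-Order on write, which data flows don't expose. A minimal Fabric Spark sketch; the flag and option names follow Microsoft's Fabric documentation at the time of writing, so treat them as assumptions and verify against the current docs before relying on them:

```python
# Sketch: per-layer V-Order control when writing Delta tables from a Fabric
# Spark notebook. Assumes a Fabric notebook where `spark` (SparkSession) and a
# default lakehouse already exist; table paths are hypothetical.

# Session level: skip V-Order for bronze/silver, which Power BI never reads directly.
spark.conf.set("spark.sql.parquet.vorder.enabled", "false")

df = spark.read.format("delta").load("Tables/bronze_sales")
df.write.format("delta").mode("overwrite").save("Tables/silver_sales")

# Write level: re-enable V-Order only for the gold table a Direct Lake
# semantic model will consume.
(df.write.format("delta")
   .mode("overwrite")
   .option("parquet.vorder.enabled", "true")
   .save("Tables/gold_sales"))
```

This is exactly the kind of switch Mike wanted surfaced in Dataflows Gen 2: skipping the V-Order pass on intermediate layers saves compute on write, while gold keeps the Direct Lake-friendly layout.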

44:50 But again that’s thinking for I think to me my my disagreement or the only place I would differ with what you said is everything is right that you said except for when it comes to more of the business teams and I know we’re going to I think forever we’ve been disagreeing about this since fabric came out and I don’t think that’s going to change in terms of the ramp up time or the skill up for people to start using notebooks like there’s still a point here where for where like where data flows actually sits where I don’t think

45:25 it necessarily has to be just, okay, data flows are always going to be a stepping stone. I think there are a lot of teams who work with data and want to use Fabric, so how can we actually get them started, but also get things pushed to a database? There were a lot of things I was doing in the beginning without notebooks that I was finding pretty awesome. And I think that has to be considered: data flows can be a great, I'll say, medium- to long-term use case

45:59 for self-service, for teams who just need to get started, who may not have an analyst. Maybe they have some technical folks, and I agree with you, Mike: if we can push them up the maturity curve, if we can push them to a notebook, that's the more efficient way. But a lot of teams have that one operations person in each of their departments who handles all the technical things. And there are just a lot of moving parts, and Fabric is meant for that too. It's not just for the people coming from Databricks, or you and I. There are a lot of people who are never going to get to the point of,

46:33 what, I need to learn Python, and I need to do these transformations, that's where my career is going. They're more at the point where they're doing a lot with the business, but they also have that crossover with some of the technology. What I'm trying to say is, learning Python is probably not in their best interest given all the other things they have to do. And I've seen this. I've worked in IT, worked with departments, like, why are you guys still doing this? It's like, well,

47:06 I manage all of our systems, and I'm not an expert in any of them, but we have to just get them up and running. Eventually, Mike, I would agree with you, because that's where we came in as the BI team, where it's like, okay, this is now at a point where you're going to need some professional help. But that's not always what I'm thinking about, especially if I want to give people access and start getting their data into Fabric, too. But I'm going to argue, Tommy, here: why would you ever not? So

47:42 There’s going to come a point in time when you’re going to need to bump up to the next larger capacity. Mhm. So when is like I understand you’re saying it’s easy, you can get in, you can use it and and now with the efficiency gains that we get, right, it’s going to be on par with what you see other places, right? So it’s it’s going to be on par with developing notebooks or speeds the compute will be more comparable than what you had previously data flows gen one or the the earlier versions of data flows gen 2.

48:15 Okay, fine. Right. But my argument here, Tommy, is: when do you migrate to this next space? What are the decision points? I'm not saying you would always do it. I'm not trying to always push people to notebooks. You may be fine with the data flows you've got. It's okay. But my gut tells me at some point there's going to be an evaluation, a transition. Someone's going to say, "We need to reduce costs on something here."

48:49 Or you bump up against a limit: you're on an F4 and you need to bump up to an F8. And leadership's like, "Do we really need to go to that level? Do we really need to get to that height? Are there other architecture decisions we can make to keep things low?" And so, to borrow a term from Alex Powers here, I think organizations tend to do this thing called min-maxing. Have you heard of this term before, Tommy? Min-maxing. So min-maxing a solution is the idea that I want to spend the minimum amount of money possible but get the maximum amount of value or compute out of the

49:22 solution, right? Min-maxing: minimizing the cost, maximizing the value. If you're not doing this as a business, what are you doing? What value are you adding, other than just giving data out to people? There's always this concept that you should be min-maxing things. You should be looking for the optimal way, and your first couple of projects maybe don't do a great job of this, and that's fine. But if your value is really high on things, and you're finding an immense amount of value from the processes or systems or tables that you're generating or producing,

49:55 You’re going to want to at least think about look at the capacity monitoring so you’re not running out of things because When you run into those end of capacity experiences, that’s when reports stop working. That’s when people start complaining. That’s when leadership comes back in and says, “Hey, what are we doing here? What can we do to optimize more of this?” I think that’s a conversation you’ll eventually run into. So, let me just pause right there. What are your thoughts on like what is the thought on Tommy? What do you think in le of what you said about not pushing the team to go learn notebooks and Python which I agree not

50:27 everyone should learn, and I agree with you there, but what's the decision point for you now, looking at this? When do you do it? Yeah, and I see what you're saying, because there's the other side of the coin, Mike, where you can get too far in the weeds and you're like, okay, we're going to have to now backtrack and reverse engineer all these things. Yes. I think the biggest thing is the data

51:00 That’s being the data that’s being used and I think the the the amount of usage, right? Because it’s maybe we take the same and I’m talking in it, since so do so many words out loud here. Think about what we would do with managed self-service or self-service at an organization. We would provide these semantic models and have the those workspaces where they had their own playgrounds for teams but we weren’t really doing too much of the monitoring. It’s like hey are they doing the right DAX and the right report design but as soon as we started seeing that there was a threshold of the

51:36 usage of the reports that were being built by the self-service users, and in the amount of reports being built, that was when the discussion, that evaluation, tended to happen, in terms of the upskill and also leveling up what they already had. Because clearly, you have a need not just for the reports, and you have a lot of people who are creating, but also utilizing, your reports. Well, if this gets any larger, we want to make sure that

52:08 That’s verifiable, it’s trustworthy and it meets our standards. So there was always that threshold that we looked at. We’re like, “Wow, the whole department is looking at these self-service reports and they’re looking at them every week. should that now get in a sense upgraded or in to the next level and from a either promoted or from a cert certified point of view or just managed by BI because that becomes a lot on the plate of people who are just doing self-service because again people in self-service if are not that’s not their only job usually is just to do

52:43 data. So we always had that discussion point, that inflection point, of there's a lot of data flowing through here. To me, I would look at it the same way, at least right now, where, all right, we've given people their workspace and access to some of those lakehouses, but they have 18 data flows going on right now, and they're pushing to all these different lakehouses every day. Okay, that's where a discussion happens: what are you guys doing with this? Where are you pushing that data to? Is it something that's critical for the

53:17 Business? That’s to me where a discussion begins to happen where it’s like we now need to talk about getting this into a more optimal fashion. If it’s something that’s just either for a project like like hey we’re doing this campaign for x y and z or this is something for the next six months we’re testing out we’re testing this new product or system out that’s one thing but when it’s critical for that department or for the business to function that’s usually for me when it’s like we probably should talk about the an optimal flow of this.

53:51 Yeah. But again, for me, the decision point is when you're bumping up against the end of a Fabric capacity. That's when you start reflecting on what we're doing, or at least should. And I do agree, there are times when you just need to bump up to the higher capacity. But with any F SKU, upgrading means doubling the price of the SKU. You can't go from an F2 to an F3. You can't go from an F4 to an F5 or an F6, right? It's F4 to an F8. So there's a pretty big price jump between what you

54:25 were doing and what the next level is. So unless your business is really committed, like, look, we're not willing to come back and optimize things, we're going to push you up to the next highest level, that seems a bit aggressive, or a little egregious, in my opinion. And so, to me, the decision point around a lot of these optimization things comes up as: hey, we're in this place. We've built some things. Dataflows Gen 2 is working fine. When you're bumping up against that threshold, or reports are starting to fail, the first thing people do is go to the admin portal. Oh no,

54:57 What’s happening? Stuff starts getting rejected. You Google some messages, error messages. Okay, we have a problem. You go get the fabric capacity metrics. Oh, look, we’re spiking here. What’s causing the spikes and it could just be like a conflict. It could just be like two processes ran at the same time. No problem. It could be a user building something in analyzing Excel that was very egregious to run against your query. We don’t know like there could be a number of multitude of things that are happening in the fabric capacity that’s causing the degragation of performance but when the if you need to go spend

55:31 double the price on Fabric, or go back and look at what you have, the first place I'll look is, let's go see how many data flows you've got running. I'm going to look at my capacity and say, where is the most CU usage coming from? What is that experience? What can I do to make that less of a load on my capacity? Does that make sense? All I'm trying to point out is that, to me, that's the decision point. That's when we start looking at optimizing. So yours is purely capacity-based, then. I think so. I think it's, why? What other criteria

56:05 would you use? I already said: the amount of data flows being used, and whether it's critical to the business. But why does that push you to go to notebooks? Because if it's critical to the business, we want to make sure it's not going to break. That's something BI is probably going to start owning, and if it eventually breaks, that's going to hurt the business. The normal data flows, again, that's something that more or less the marketing team owns, but if it's... Yeah. Yeah. But hold on, let me unpack

56:37 your logic here. Yeah. Sure. Sure. If you have a lot of it and you need it to be reliable, and now you're saying we need to move away from that to notebooks, you're telling me that you don't have confidence in Dataflows Gen 2 being reliable. I feel like it's a counterargument, because the fact that you've already built all these data flows for marketing, or they have built their own data flows, and it's still working and it's mission-critical, has no bearing on whether or not I should optimize it. Now, if it's working today, what will cause

57:11 It to not work tomorrow? And I think the idea here is: if it’s working for them, fine, let them keep doing it. I’m not going to push the business to migrate its stuff over to notebooks just because of reliability concerns, because I think Dataflows Gen 2 is reliable. It’s going to work. All right. So you’re telling me then the only reason to use dataflows is as a temporary stepping stone. No, I don’t think it’s a temporary stepping stone. I think it is a

57:44 Comfort zone for people coming from Power BI. They’re going to build things there. I think as you become more well-rounded as a data engineer, you’re going to learn new techniques, new tools, and in any case, the more code you write — the more code that’s running against bare metal — the more cost-efficient it will be to run. That’s just a correlation. Microsoft has also said: you’re getting a nice, pretty UI inside dataflows, so you’re going to pay a bit of a premium for that. That’s the recognized trade-off there.

58:16 So I don’t have any problems with Dataflows Gen 2. I think they work really well. I think they’re very reliable. They run most of the powerbi.com space right now, today. But when you go to Fabric, the whole reason you go to Fabric is you have more options. There’s more variety in what you can do. You can build different patterns. You can go more hyperscale. So with Dataflows Gen 2, if it’s working in your business department, yeah, leave it alone. It’s fine. When you get to high volumes of data, that’s where I feel like other tools do a

58:50 Better job — more refined pieces, right? So the thing I described earlier: when you get into the optimizing phase, that’s when I think you really want to look at other tools, because you’re trying to keep cost down and still process more data. Okay, so I think where we’re going to land here is: hopefully we see dataflows continue to get optimizations. And I think the biggest thing — regardless, where we land right here is: dataflows have potential, man. And they

59:23 Have potential for the business, which is something we have not been able to say really since we’ve had Gen 2. Yes. And Mike, let’s go back to the initial Dataflows Gen 1, when we thought about where that actually had a part to play. Dataflows Gen 1, at least when it came to the Power BI teams, was essential — especially when you think about data. They didn’t have access to the data engineers, or to Databricks, or to the source systems, but we

59:55 Needed to make some major updates. I know of reports and systems that were mission-critical on Gen 1 dataflows because of the amount of transformations needed to get it right. From a timing point of view, things were time-sensitive — how we got data in to be able to combine things — and they were mission-critical. And again, yes, we didn’t have notebooks, we didn’t have lakehouses. So I still think there’s a part to play for Dataflows Gen 2, especially for the business, but I think we know notebooks are going to be the king.

1:00:31 They’re going to reign supreme, especially when they’re dealing with pipelines and the normal workflow of things. But if we’re going to have a Fabric platform for the business, and just for the business, there has to be at least a consideration of where dataflows could actually play in that. But I agree with you too: at the end of the day, you should always be looking — especially if you are the data or admin team — at putting notebooks at the front, like where we’re eventually going to

1:01:03 Go. Again, I’m not going to force you to go straight to notebooks. I want you to be comfortable in what tools you’re using. All I’m trying to make the point of here is: when you get into Fabric, the landscape opens up; you have more opportunities. There’s more variety of tools at your disposal. Notebooks just seem to be a really good option. I really like it. I think it’s a great step on the maturity scale of data engineering — getting further down the path, right? I think Dataflows Gen 1 is a good starting point. A lot of people are already using it today in existing Power BI environments. Dataflows Gen 2: much better performance now. We’re getting much more improved

1:01:37 Experiences with it. And you still get the same nice, pretty UI that you had previously. So that I love. The fact that we now have this experience — it’s a growth thing for data engineering teams, right? That’s a growth pattern. And now I also feel like notebooks and functions and UDFs, all there available to us in Fabric, give us another layer of really rich, usable tools at our disposal. So I think of Fabric as a Swiss Army knife, right? All we’re doing is getting more multi-tools on the

1:02:10 Swiss Army knife, and we can pick and choose which tools are best for our solutions. I know this becomes overwhelming for people, because now you’re like, well, what should I use? Do I use a data warehouse? Do I use a lakehouse? Should I just keep using my dataflows? Should I be using lakehouses and now notebooks? There are all these different combinations of patterns you can use. And this, I think, to Microsoft’s credit, is part of the brilliance of what they’ve done with Fabric: any tool, any user — it works for all of them, right? So I really think it goes back to: build something

1:02:42 That works for your business. Use what you have today, but be aware that there are other optimization patterns you can use to build things more efficiently. That’s something you have at your disposal. And without installing any extra hardware or software or anything else, it just works. That’s the part that really excites me here: I can start in Dataflows Gen 1, migrate to 2, and then migrate into notebooks with relative ease. And the data structure itself doesn’t change; I can keep it the same throughout that process. All right, I know we beat this topic up

1:03:15 Quite a lot. I like Dataflows Gen 2, and I’m way happier now to start using Dataflows Gen 2. One thing I will call out here, because there have been some improvements with the performance of Dataflows Gen 2: if you want to take advantage of these new performance options that Tommy and I are talking about, you have to build a brand-new Dataflow Gen 2 inside the workspace. So I just want to call this out real quick. On the home ribbon, in the options button on the page, there’s

1:03:50 Going to be a setting called Scale. So, under the dataflow, you have the setting called Scale. If you have a Dataflow Gen 2 that was made in the very early days of Gen 2 dataflows, you’ll see Allow Fast Copy, and that’s about it. In the newest version — if you create a brand-new dataflow today — you’ll see all the options: fast copy, partition compute, query evaluation, and concurrency. From my awareness, there is no way to upgrade an old Dataflow Gen 2 to a new one. The way to handle this is: go into your old Dataflow Gen 2,

1:04:25 Export the template of it, create a brand-new Dataflow Gen 2, and import the template. So you can keep all your old Dataflow Gen 2 stuff around, but you’re going to have an older version that was built, like, in January, February, March of this year, and now the newer version, which I think was announced later this year, Tommy — like October and newer, right? Now you can create a new dataflow, too. So just be aware of that: if you want these additional improvements or options, you will need to create a brand-new Dataflow Gen 2 inside Fabric to leverage these new

1:04:58 Options. I think it’s worthwhile. I definitely think you want to test it out, and let us know in the comments. Actually, if you have dataflows that are running, if you are updating yourself to use these new dataflows, check it out. Let us know: are you seeing real performance gains? Is it getting more efficient for you? I’d like to know. We’re finding good value from it; we’d like to know if you are as well. Okay. All that being said, thank you all so much for listening to this really long episode. Again, we keep going long on all these things. This is one of those where we’re like, “Hey, we’re just going to cut it short today.” We thought it was going to be a short

1:05:30 Episode, and here we go talking the full hour all the way through. Anyway, it was the nuanced pieces, right? It’s in the details. It’s in the weeds here. That’s where the conversation really happens. Well, thank you very much. Yes, thank you very much for listening. We appreciate you all as our listenership. We thank you very much for participating, and we hope you find this episode very valuable. If you want these episodes as soon as they’re released, make sure you go over to the PowerBI.tips YouTube channel and subscribe. When you subscribe there, every episode will be released as we record them, so you’ll actually have earlier access to some of the

1:06:01 Episodes. So we’d highly recommend that if you like these episodes and you want content as soon as it’s released. Tommy, where else can you find the podcast? Dude, you can find us on Apple, Spotify, wherever you get your podcasts. Make sure to subscribe and leave a rating; it helps us out a ton. And honestly, let us know about the dataflows, too. Do you have a question, idea, or topic that you want us to talk about in a future episode? Well, head over to powerbi.tips/mpodcast. Leave your name and a great question. And finally, join us live every Tuesday and Thursday, 7:30 a.m. Central, on all of the PowerBI.tips social media channels.

1:06:35 Thank you all so much. We appreciate your listenership, and we’ll see you next time. [music]

Thank You

Have you revisited Dataflows Gen 2 with the new performance options? We’d love to hear if you’re seeing the same improvements — drop us a line!

Want to catch us live? Join every Tuesday and Thursday at 7:30 AM Central on YouTube and LinkedIn.

Got a question? Head to powerbi.tips/empodcast and submit your topic ideas.

Listen on Spotify, Apple Podcasts, or wherever you get your podcasts.
