Looking at the Warehouse Roadmap – Ep. 454
Where is Fabric Data Warehouse headed? Mike and Tommy review the public roadmap, discuss what features are most anticipated, and share their take on how warehouse competes with and complements lakehouse within Fabric.
News & Announcements
- Mastering Declarative Data Transformations with Materialized Lake Views — Materialized lake views bring declarative transformation patterns to OneLake, enabling efficient incremental processing.
- Notebook UDF Integration with Native Support for Pandas DataFrames and Series via Apache Arrow — Better Python UDF support in Fabric notebooks with Arrow-based pandas integration.
Main Discussion: The Warehouse Roadmap
Where to Find It
The public roadmap lives at roadmap.fabric.microsoft.com — filtered for the Data Warehouse workload.
Key Roadmap Themes
Mike and Tommy identify the big themes:
- Performance — Continued investment in query performance and concurrency
- T-SQL compatibility — Closing gaps with SQL Server and Azure SQL
- Developer experience — SSMS support, better tooling, familiar workflows
- Integration — Tighter connections with other Fabric workloads
Warehouse vs. Lakehouse
The perennial question: when do you use warehouse vs. lakehouse?
- Warehouse — Best for SQL-first teams, structured data, familiar T-SQL patterns
- Lakehouse — Best for Spark-first teams, unstructured/semi-structured data, notebook workflows
- Both — Many organizations use both; the key is understanding which data lands where
What’s Missing
Mike and Tommy note areas where the warehouse still needs investment:
- Some T-SQL features still not supported
- Performance tuning options are limited compared to dedicated SQL engines
- Cross-database queries could be smoother
- Monitoring and diagnostics need more depth
Looking Forward
Fabric Data Warehouse is maturing rapidly. The public roadmap shows Microsoft’s commitment to making it a first-class SQL analytics experience. For organizations choosing between warehouse and lakehouse, the answer is increasingly “use both for what each does best.”
Episode Transcript
Full verbatim transcript — click any timestamp to jump to that moment:
0:00 Good Morning and welcome back. Oh, I got Brad here already. Brad, hello. Welcome.
0:33 We’re just going to jump in, I guess. Probably at the beginning. Michael’s having trouble with figuring out the internet of things, so we’re going to roll with Tony. Welcome. Yeah, I know. I love it. See that impression? Yeah. Well, if you couldn’t tell by our main title today, we’re talking about data warehouses and Brad will be our expert from Microsoft here talking with us. Brad, thank you for a fast and quick introduction onto this one. So, of course, sounds like we’re seems like you’re hanging out for the news. So, we’re just
1:05 Hey, I'm here. I'm here for it. Excellent. All right, Tommy, before we get into our main topic, we do a little news. There's a lot of articles always coming out, and we want to talk about things that are interesting, that Tommy and I are enjoying right now. Tommy, kick us off with your first one here. We got some... Oh, man. Materialized lake views. Is that what we're starting with? All right. What's a materialized lake view? So a materialized lake view is something that they've talked about already in the past, but it's really a powerful way for us to streamline and automate our data within the lakehouse architecture. Some
1:39 Of the key steps is you can mirror the SQL data, create the shortcuts, build the materialized lake view. Why would you do it? Well, it reduces manual coding. It enhances the data quality and the governance across the layers, and also can give us a little faster insights into our dashboards. And it makes it a lot more scalable from orchestration and performance and security. So there's a ton here. This is obviously one of those: I'm not just building a lakehouse. We're diving in. We're doing our
2:11 Butterfly strokes right into the lake. So if you are using lakehouses at a very high production level, this is something you have to look at. The one thing I'm going to pull out here is one of the features that I'm looking at and going, "Ah, I'm glad they did this." In the documentation that they give, and I'll make sure I put the link here in the chat window for those who want to watch and follow along on this one as well, step number five in this list of items, again, I'm getting very specific here, is in this materialized view, they're talking about
2:45 Setting up your schedule. What's the refresh speed on which you need these things to be refreshed? I think this is amazing. I've often said, you think about data warehousing or getting data out of the SQL databases. It's not always transactional in nature, and sometimes I'm fine with a load in the middle of the night and that's all I need for the day. Mirror what I want to mirror, drop it into the lake. I don't want to run compute every hour throughout the day and waste a lot of CUs that I may not be using. I need the data when I need it, but I don't necessarily need it every single second. So, I like that part of this. I think
3:18 That's going to be very helpful there as well. And I'm interested to see more of where we're going with materialized lake views. I think these are very useful. I do feel like, and again, since Brad's here, there's a bit of a mind shift happening. When I come in from the SQL world, I think a lot about staging tables, and then getting rid of records I don't need, and being very tuned on the storage side of things. I'm going to be very careful about not over-storing things, because I had a server on-prem and we can't fill it up too fast, right? It's a certain
3:50 Size. On the cloud side, it feels like that mindset has shifted slightly. Like I'm favoring materialized things more than I am storage things, because it's faster for me not to have to recompute it, or it's cheaper. Basically, it's cheaper for me to compute it once, save it, and then just use that over and over again. Brad, what are your thoughts on this too? Is this a mind shift difference that we're seeing a bit more with cloud? Yeah, like I said, storage is so cheap and it's just there. It's commodity in the cloud. So if we can make things much faster by just
4:21 Pre-calculating, pre-storing, then obviously that's a quick win. I think the thing that's interesting about the materialized lake views that I like is the fact that you can use it between PySpark and Spark SQL over there. Yes. So we can't use it against T-SQL. I don't think so. At least at this point anyway. I haven't seen that. But being able to cross languages like that over on the Python side, being that you can
4:54 Support multiple languages in a notebook, I think, is super smart and a really great way to do that. So that helps bring some people together, and back to the stuff we've talked about over the last two weeks: give everybody the ability to use their skill set. So I think that's fun. Awesome. Michael in the comments asks, is this still on a separate lakehouse? You can't have bronze and silver on the second one; it wasn't quite clear to me from the blog. I'm going to actually refer Michael... this is a brand new feature out here, just recently released. I'm also going to put the link to the actual
5:26 Documents for materialized lake views. It's in preview. So what you see here may change slightly by the end, but check out this other link, Michael; there's another, more detailed version of what's going on here. And whenever Tommy and I read documentation from Microsoft, the first thing we should go to is the limitations. See what's in scope and what's out of scope. That's usually a good place to frame your mind around things. So maybe what you're asking for, Michael, is directly there. I would also check out the limitations there.
5:58 Maybe what you're talking about is inside those limitations for now. We'll see. Likely those are going to be removed later on. Okay. Awesome. First topic. Second topic here, Tommy. You got Fabric notebook UDFs. Okay, let's be clear. This sounds like user data functions and not user-defined functions. That's a DAX thing. User data functions is now inside notebooks. Is that what I hear? Yeah. I love... By the way, thanks Microsoft. Oh, we actually have something for Microsoft we can say thank you to. Thank you for having three things called UDFs.
6:31 SQL UDFs, user... So, no. So, this is user data functions. The Python thing. I don't know why you guys are confused about this. It's pretty straightforward. So, yeah, it's very obvious to me. Very. Yeah, I don't think we can make this any more clear. By the way, we have a fourth UDF. No. User data functions, the one where you create Python and just create functions. Is it going to be with translytical? Well, part of the preview is, if I'm again
7:06 Could be mistaken here, but this allows you to use UDFs directly in a Python notebook. So, there's native pandas support. Yes, you can use it in notebooks. I can use it with the notebook utilities package and autocomplete. So, a lot of features here, but Mike, we've been talking about user data functions a lot. We've talked about notebooks a lot. Seems like a no-brainer to me. Yep. Yet for whatever reason, I'm having some hesitation on the process here. So, what's your take? Oh, this is a good question.
It's nice to be able to use anything from anywhere from a notebook. This is one characteristic where I feel like notebooks have been just rocking right now. Notebooks can pretty much do anything you want anywhere in Fabric, which is again very nice, but notebooks are very heavily code-centric. So, you have to be comfortable running notebooks and Python and things like that. So, I understand there's a bit of a trade-off here. I'm also thinking about how you could build custom user data functions to do a specific thing. Now Tommy, we've talked about this a couple times on the podcast. There's a hierarchy of efficiency on moving data, right? At the
8:12 Lowest end, the least efficient way to move data is Dataflows Gen2. Sadly it just is, but that's the way the CUs roll out. Copy jobs are up there somewhere near Dataflows Gen2. It's expensive, but it does make the UI easier. So, if you have the extra CU compute, go ahead and use it. No problem. Then I get to notebooks. Notebooks seem a bit more efficient, right? There's a little more code, but it runs at fewer CUs. UDFs seem to be like the most efficient thing
8:44 That I've seen so far in Fabric. They run quickly and they seem to be cheap compared to everything else that runs. But you're writing really a lot of code and you're doing a lot more work inside the functions themselves. That's just my rough testing. So who's to say you don't test something in a Fabric notebook, figure out, hey, that might be interesting, let's go use it in a UDF, and you build the UDF off of what you've learned in the notebook. And then in the notebook when you want to call it, you just call the function, and maybe that runs a bit better as an API and does some extra things. Or now that the
9:17 UDF is available, you can use that in translytical task flows. Analytical task flows? Translytical task flows. So it's translytical. You can now run those same functions from a report. So now you can do something in a notebook and a report, and that uses the same function. I think this is good practice around designing code. My only hesitation is it's now one more place we're putting code. It's just another place you've got to know is there. Yeah. But Mike, I'm going to push back a little. Is it a good place to put code? Because one, you
9:49 Need... this is working with Apache Arrow. It doesn't say this works without Apache Arrow. Two, this is not really pushing data, especially with user data functions, at least from the translytical point of view. I will always quote my favorite character from any movie, Ian Malcolm, Jurassic Park. He says they didn't ask themselves: just because they could doesn't mean they should. And I think, to me, I look at this: cool, but if I already have an IDE with VS Code, if I already have a place I can build a user data function, right, why
10:22 Do I want this in a notebook, unless I am doing exactly what's in the example here? If that's my scenario, like, wow, they wrote that for me. For me, at least right now, I'm hesitating to say why I would add it to an additional place. Well, I think about the example... I can't say for all examples, right? But there's going to be common things you're going to want to use in multiple places. And I'm also thinking there's action-based things, right? So UDFs can do anything. It's just a function that can run other API calls. You can do IoT stuff, make it go get data. UDFs also have
10:56 The ability of capturing data and putting it somewhere else. I think about the idea that you need the surface area of a user data function to be as large as possible, because then it gives the creativity back to the developers, right? Whatever they want to build, you can build the most with what you have there. So, I'm pro more options. My only hesitation here is just governance. So, I think it's a win. I would probably disagree with you on that one, Tommy. I think it's a good place. Brad, what are your thoughts on this one? I was going to say, I guess my question is, what is it that you're writing? What data is it that you're writing that makes you say that the user
11:29 Data function is more efficient than a Python notebook or these other routes? Like, I agree with your hierarchy there with the three pieces that you laid out. But outside of translytical scenarios, I don't necessarily see functions as being the thing that I'm going to go to to move data around. I see it as, like, I've had customers I've talked with and we set up a function so they could pass some stuff in to do an auditing framework, and that was the most interesting random use case for it
12:02 That I've seen so far. There's a lot of stuff, like Michael was mentioning in the chat there, where folks are using it to say, "Hey, I want to have this common framework to be able to do XYZ to my data," and that thing. But given the compute size that runs behind these, I wonder at what point the scale is going to become large enough that it's no longer as efficient as some of the other methods that you mentioned. So for translytical stuff, where it's like small read, small write,
12:36 That stuff, I think you're probably right, but I would hesitate if you get beyond very small data sets. But to your point, you've got to test it and see. Let me maybe expand some of my idea, because I agree with the challenge there, and I think it's actually a good, fair challenge as well. I'm just going to give you some of my experience of what I've seen when I run things just in Azure, right? So when I spin up a Python notebook or a Spark notebook, I'm getting a driver and a worker always, so Spark is probably the one
13:08 That's maybe a bit more expensive. Now if you just do a pure Python notebook where I have a single-node cluster and it's running one node, maybe that's a bit more competitive price-wise, but whenever I've run Azure Functions, they've been dirt cheap. Now again, I'm using an analogy of what I understand from the Azure world and I'm bringing it to Fabric, right? So if user data functions are Azure Functions brought to Fabric, right, anytime I've run functions in the past, I could run a ton of them and it's cost me almost nothing to run them. So to your point, Brad, there might be a
13:41 Tipping point here of, look, those functions need to be a certain size of memory, right? If I'm going to load hundreds of millions of rows using a function, I might not be able to run that. What might be a good use case for the function is, if I need to run something that's doing a regular task over and over again, the functions are efficient, fast, and super cheap to run. There's my opinion. Again, I don't know the back end. This is me speculating based on the pricing that I've gotten from Azure: functions are next to nothing for Microsoft to run. Everything else is
14:14 Coming with a bit more heft. It's a bigger machine. It's bigger computers. It's bigger stuff, because the UI costs things, right? It costs more money to be able to spin up a UI, and there's also, from Microsoft's standpoint, a lot of development that goes into building the UIs. That's one of the reasons why I think Dataflows costs the way it does: there's a pretty sizable team behind it to build the UI, to make it easy to use, to maintain it. So in order to keep that going, the price you pay for it is a little bit higher. That's just what it feels like the
14:46 Pricing is doing. Again, this is me as an outsider looking at Microsoft, not being in Microsoft. I don't know. This is just speculation, my opinion. Brad, you just got to see a really rare moment. It's when Mike's brain opens up and you see the mind of an engineer. Like, I don't know if you noticed you did this, but you went straight into code in your talking. You went, "Well, if this is this, then this." Yes. Completely created an if statement. It's a glorious moment when that happens. I should have started with a switch. Probably a bit faster if I just did a switch. But I love it. No, and I think,
15:19 Mike, I do want to explore it. Really, honestly, to your point, right now it's like, okay, but I think a big thing is always: let's see the examples, let's test it out, let's see what scenarios this makes sense in. So definitely not opposed at all. It's like, okay, this is a scenario. Do I have that one? Maybe not. But with a lot of these things it's like, okay, let's see the different use cases, let's see when I can implement this. Maybe it's going to be outside of translytical. But no, I think this is something we'll be following up
15:52 And one thing I think is an unblock here for me when I look at this, right? So when we're in a notebook, pandas is by far... I'll argue this all day long: pandas is not the most efficient way to use and manipulate data. It just isn't, right? If you look at Mim and what he does, he's using DuckDB and all these other things; there's way more efficient ways of shaping data and moving data around. Pandas is not the most efficient step to run. Again, we're getting very nitpicky here, right? If you're writing notebooks and you're using pandas, it's probably fine, it's probably okay, but just note that there's other ways of shaping and
16:23 Transforming data, and if you use the Spark data frames, it's a bit faster, more efficient, right? So one of the things that is interesting here in this article that they call out is that the UDFs now natively support pandas, right? So that's the Apache Arrow experience. That was probably an unblock for you, right? It would be very confusing if I had a Spark data frame or a pandas data frame and I'm trying to send that over to a UDF and it's like, I can't, I don't know what you're talking about, I just don't understand the format. So I think there's something here also where
16:55 There's... even when you're working in a notebook or in data structures, there's a framework that you're building the data around, and to just be able to lift that data structure across all platforms is going to be helpful, right? And again, I'm going to pick on you again, Brad, here, so sorry you're here for the news. When I'm talking through notebooks or reading things to and from the warehouse, I'm not sending pandas data frames to the data warehouse, right? That's not a thing at this point, correct?
17:26 So we do have... in Spark you can write a data frame to the warehouse. There is the Synapse SQL connector, the same thing we had in the Synapse days; we brought that over earlier this year. But to your point, you can't send a pandas data frame. It's got to be a Spark data frame in order to do that. But it's fairly easy to convert a pandas data frame into a Spark data frame and vice versa. But I think because user data functions are running the
17:59 Python 3.11 environment, their runtime, I think naturally we had to give some way inside of Python purely to be able to do this thing. So you're probably not going to be doing that with native data frames like Spark data frames, because they don't exist in there. Exactly. Right. Again, this is my point of, you've got to figure it out some. And Brad, back to your point around pandas to warehouse, I also agree with you. That makes sense, right? But there's also this little bit of a hiccup. This is where I wish the notebook team would do better, right?
18:31 Let's say I did have a pandas data frame and I'm trying to write it back over to the warehouse. Maybe Microsoft should just be like, "Okay, notebook team, you understand what's going on here. Let's just switch it for you automatically and go from there." I know there's implications with this, but yeah, having it go anywhere I want is helpful. But I think this is where the edge of my Spark knowledge comes into play here. With a pandas data frame, there is a to_sql function that you can use on there to write it to
19:05 SQL. I think the issue there is that it doesn't allow identity pass-through, like impersonation or anything like that. You have to pass it a username, password, and a connection string to go along with that. And so I don't know that pandas natively will allow us to do SPN or managed workspace identity or something like that to be able to go through. But yeah, interesting idea for sure on that front. Okay, good. All right, we beat that one to death.
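The to_sql path discussed here can be sketched as follows. To keep the sketch runnable, sqlite3 stands in for the warehouse connection, which is exactly the limitation Brad notes: pandas wants an explicit DBAPI connection or connection string rather than identity pass-through. The table name and data are made up for illustration.

```python
# Minimal sketch of the pandas to_sql path. sqlite3 stands in for the
# warehouse -- a real warehouse write would need a connection string and
# credentials, per the identity pass-through limitation discussed above.
import sqlite3
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3],
                   "city": ["Chicago", "Naperville", "Evanston"]})

conn = sqlite3.connect(":memory:")
# Write the DataFrame out as a table...
df.to_sql("dim_city", conn, index=False, if_exists="replace")
# ...and read it back to confirm the round trip.
back = pd.read_sql("SELECT COUNT(*) AS n FROM dim_city", conn)
print(int(back["n"][0]))  # 3
conn.close()
```

sqlite3 is the one DBAPI connection pandas accepts directly; anything else goes through SQLAlchemy, which is where the username/password/connection-string requirement comes in.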
19:37 Now, Tommy, let's go to the final topic here. So, final topic. Tommy and I are crawling out of the basements. We're coming out of hibernation. Tommy, what's going on? We've got an event here in Chicago coming up. My goodness, we should have started with this, but we are having our next Chicago Power BI and Microsoft Fabric user group downtown Chicago. It's a crash course on Fabric. If you are like, what, this is cool, you guys talk about it, but I want to see it in action, I want to see it demoed, I want to ask you, could you do this, and do it in real time
20:10 With us in person? Well, guess what? You can. October 7th... Octo... October 2nd, we are hosting at the Microsoft office. It's on Randolph Street downtown. And we're trying something new, Mike, because we've been hearing from a lot of people that 6 PM is always a little late. So, let's get you out of work. We're doing this at 3:00 PM. We're having... skipping work. Skipping work. And skip work for work. It will be for work, but we still have food, of course. So, we want to see you there. If you're in the Chicago area, anywhere around
20:44 Downtown, take the L. We can see you there. But it's a crash course on Fabric. Ask us questions. It'll be very interactive. Awesome. More information and details in the meetup, which is also in the chat window and in the description here as well. So, if you want to join us, make sure you sign up for the meetup. There is a slight payment fee, just so we know you're... we want to weed out people that are not going to show up. So there's just a little, like, it's five bucks. $3. $3. $3. It's the cheapest thing that Meetup will let you use, I guess. So it's like a $3 fee just
21:17 To show up. That helps a little bit. It offsets some of the food costs, but food will be provided. Tony, what do you typically do? Is it subs, pizza? Like, what are we doing? Yeah, we usually do the Jimmy John's. I've tried the deep dish, but what happens? Everyone just falls asleep. So Jimmy John's it is. Jimmy John's. Yeah. All right. Awesome. So that's great. Tommy, we're probably at time. Let's jump into our main topic. What do you say? I think that's perfect, Mike. Brad, we are excited to talk about the future. We've talked a lot about things of the past and where we're
21:50 At right now, but Microsoft has been doing a lot with the roadmap, especially focused around the data warehouse. So I'm looking at this plan here, and the link is in the chat as well. It will also be in the video description and the podcast. And I'm looking at this, and there's quite a bit planned. Let's take this from the assumption that I'm dumb, which would be true, but let's take a look at this. I'm looking at all this and I don't know where to start, and I don't know what I want to be or what I should
22:21 Be excited for. So I think let's just kick it off with the roadmap, with what Microsoft is planning, from all the conversations we've had over the past three episodes. Where has Microsoft's objective been around what you want to work on, and for whom? Yeah. So when we first planned on doing this back in July, and then we had to shift things out a little bit, I was like, man, we're going to have so much to talk about in the roadmap episode. And then we ended up
22:53 Pushing out, and now we're at the end of August. Like, man, a ton of the stuff that I was planning on talking about isn't there anymore. Like, we already released it. The good news is we got to talk about it over the last three episodes. So that's been good. So when I look at the roadmap, I break this down into four categories. I've got a couple things for each one of these. The first one being architecture, the second one being the no-code folks or citizen developers. Then you've got the code-first people, or maybe not... well, I'll just tell you, the last one is pro
23:26 Code. So low code, no code... or low code, no code, pro code. There we go. And then architecture. That feels like it should be a t-shirt. I haven't heard that. No code. There we go. I'll see what we can get for FabCon next year. But if we start on the architecture side of things, I think there's two things. I think only one of them is actually on the official roadmap page. The other one, I know we've talked about it. I don't know if it's actually on the roadmap or not, but I think we did
23:58 Announce this at maybe FabCon. It should be coming here pretty soon. But the first one, real quick, is CMK for the data warehouse, the customer-managed key. So if you're somebody who's in the administration world, or you've worked with SQL Server for any period of time, transparent data encryption and bring-your-own-key and all that stuff comes up constantly. It's a compliance thing for a lot of organizations, and as you all have probably seen, there's been a big security push from the Microsoft side over the last year or so.
24:33 Things like workspace-level private link and all that stuff rolling out now. So this is just one more thing that checks the boxes on that, so that if you're an architect you don't have to worry about that piece of compliance anymore. So we'll be respecting the customer-managed key for the OneLake side of things. So that's not really necessarily a warehouse-specific feature. It actually technically falls on the platform side, but, to be honest, this is one of the challenges with
25:06 Fabric: we release a feature, and then the question is, all right, well, which workloads does it apply to? Workspace-level private link is a good example of that. It's like, hey, awesome, we got workspace-level private link now, but which workloads actually use workspace-level private link is the question. So the good news with this one is it'll be picked up by the warehouse. And I'm going to jump on your comment here around CMK for everything. So customer-managed keys in general, right? I don't want to be in the business of managing
25:37 Infrastructure at all anymore. Me, I'm lazy, or efficient, whatever you want to call me. I just want Microsoft to own those things. Yeah, thanks. You're generous. I just want the checkboxes, right? I just want to be like, okay, security says best practices, and security is being led by Microsoft. And honestly, I like the idea that Microsoft's security team is going to be bigger than my security team any day, right? So, they're going to pay more to have a security surface area that's legit. They're going to keep up on all
26:09 The TLS protocols and standards, and which ones are up to date and which ones are not. So, I would like to think that Microsoft is going to have much deeper pockets in the security area than I will. So, I'll just follow suit with what they do, and as long as Microsoft recommends these are best practices according to standards, roll it out, give me the checkbox, and I can go use the feature that I need to. So, someone from security compliance comes in and says, "Hey, Mike, are you doing private link?" Like, "Yep, it's here. The checkbox is there. We can add it to our VPN. We can lock people out. Let's prove
26:42 It. Watch." And it does it. So, I like this. I don't want to be in the business of infrastructure management anymore. It seems like we're beyond those days at this point. So, I need to plug in here, because to me there's a bigger story in what you're talking about, Brad. We were talking about this internally right before we got on, and this is a conversation Mike and I have had since Fabric came out, about being a Fabric expert. Well, does that exist? Because all the things we're talking about have existed in very technical terms before Fabric: the data
27:14 Engineering, the security side. And it sounds like the features, especially with CMKs and also that low code, no code, pro code split, are alleviating a lot of those processes and a lot of those technical widgets that an expert would have to handle in each of those, making it a lot easier for people to turn on and turn off, almost to allow that Fabric expert to exist. Because, again, can you find someone who has 20 years' experience in data engineering, 20 years' experience in data science, 20 years' experience in semantic models?
27:47 Good point. And to me, I'm hearing all this and I'm thinking, oh, we're just alleviating a lot of those under-the-rock, so to speak, necessary configurations, to allow someone to know not the basics, but the main features. And is that at all a conversation around what you're doing with the CMK and also that low code, no code, pro code? Yeah, obviously we want that to be as easy as possible for all of this stuff. Like, ideal world, we land a feature and it
28:21 just gets picked up by everything. And to your point, that's the only possible way that we're going to get people to be Fabric experts, as opposed to Fabric security experts or data warehouse experts or whatever it happens to be. There's just no way around that because of the size and scope of what's inside of Fabric — it is literally like six different products, as we were chatting about earlier this morning. And so I think we have to continue, and we need you all to continue to push us, quite frankly, on
28:56 that. To say, "Hey, you guys landed this feature, but there are all these caveats now that we have to take into account." Because at the end of the day, we can make it easy, so it just applies to everything and it just works. We can make it a little more difficult, in that you have to go feature by feature and apply it. Or we can make it really difficult, put it at the platform level, and then you have to go figure out on your own which one of those things it happens to be
29:28 applying to. We can help with that last one with documentation a little bit, but people aren't going to go into the portal and say, "Oh, I'm going to go put my customer-managed key in here — and oh, by the way, there's a link at the bottom that says go to this doc page to figure out which of these things it actually applies to." We can't do that stuff and make people successful, I don't think. That's where I say we need you folks and others in the community to continue to push us on those usability things, so that we can have true Fabric experts out there. So, I'm going to put the
30:00 analogy here to food, Mike — wait, who has to talk about food? Let's be real, right? That's how I make sense of things. You can purchase the cake, you can buy the cake mix, or you can get the flour and start from scratch. And those are the layers we're going through here. I prefer to buy the cake. I'll go to Costco, I'll get my sheet cake, we're good to go. Just keep spitting out that SQL data warehouse sheet cake and I'm good. That's sheet cake — S-H-E-E-T, like a sheet of cake. I'm not swearing on the podcast. Is that a sheet cake for one, or is that
30:32 a sheet cake for the family? Or for me? Just for me. Yeah, I just freeze part of it and then slowly eat on it all the time. Mine is family size, but it's for me. That's how I am with pizza. What's the largest pizza? That's for me. Oh yeah, everybody else eats the other size. Man cannot live on bread alone, but Tommy could definitely live on a lot of pizza. I can live on bread alone. It's like the Joey special — two pizzas. Exactly. Thank you. All right, I know you're
31:04 So, the other thing — I want to throw one more in while you're talking creature comforts here, around things you're making easier for us. And I think this fits your last comment there, Brad, which is: we're trying to remove some barriers, so there are fewer cliffs to fall off. When you're going down these experiences, it's not "okay, I got close enough to the edge of this surface area and now I have nothing to help me." One that I'm very excited about is refreshing the SQL analytics endpoint using an API — albeit much more pro-developer-level stuff, but the fact that this exists… I've had some
31:37 challenges where I'm making lakehouse table changes, and the definition of the Delta table is changing, but the warehouse wasn't picking up on it quickly enough. There's some timer, and I needed it to be a bit more responsive. When I'm used to using notebooks, I write the table, and if I read it in the very next cell, it grabs the latest definition and pulls it in. Now, I don't know all the technical pieces behind this, but the fact that there's a programmatic trigger where I can just say, "okay, look, I've added some data — now, warehouse, refresh yourself, go get the
32:11 latest definitions of all the Delta files," either by table or — I'm guessing this is by table or by the whole lakehouse schema. This is going to be helpful, I think, because it reduces some friction for me when I'm going between notebooks and warehouses. So that one I really like. Yeah, that one's really interesting, because it solves a problem that is unfortunate we end up having to surface: there is, like you said, that little bit of delay. It runs on a schedule every couple of minutes to refresh the metadata on the SQL analytics endpoint.
32:45 And this isn't a perfect solution. I'll be honest, the API is not the ideal way to do this for all scenarios. Think of a mirrored database, for instance, that has data changes coming in, or a schema change, and you want the data as soon as it lands — as quickly as possible, basically. We still have to do the sync over to the analytics endpoint for those tables as well, and you're not going to go run this API every minute just to ensure everything's up to date. So there are some places where it
33:17 doesn't necessarily serve the end-user purpose, where we honestly have to do better — and we're working on some things to make that experience better. But exactly what you were saying there: for the pro-code person, this solves the biggest problem we had with the endpoint. People were coming in and doing the exact thing you said — hey, I'm loading data into bronze, I run a notebook to turn it into silver tables, then I go run my T-SQL procedure to move it into the warehouse or read it into something, and it gives you an error message. Or I do a delete, or a compaction, or
33:51 an optimize, and then it says, "hey, your file doesn't exist," and you're like, "well, I know my file doesn't exist — I just ran VACUUM and OPTIMIZE on the table." Yes, now we can do this. Today, the way the API works is you just run it and it will go refresh everything. If you look at the official docs, that is the only thing that is supported — and technically, that is the only thing that is supported. However, there is a currently undocumented way that you can
34:26 go refresh individual tables, because oftentimes we don't care about refreshing everything — we care about a couple of tables, or the one table. Yes — especially in the pro-code area, where I know I'm operating on this thing and there's that one table that's the emergency, we've got to fix it, or I'm trying to get downstream things rolling. I just need it to be updated when I'm done loading it. Yeah. So that will be changing — that will be coming as a fully documented piece of functionality at some point in the future.
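As a rough sketch of the call being described — the route below follows the shape of the public Fabric REST API's metadata-refresh operation as it shipped in preview, but the exact path, the `preview` flag, and the request body are assumptions here; verify against the current docs before relying on it:

```python
# Hedged sketch: trigger a SQL analytics endpoint metadata refresh from code,
# e.g. right after a notebook finishes writing Delta tables to the lakehouse.
# The URL shape and empty body are assumptions based on the preview API;
# per-table refresh was undocumented at the time of the episode.
import json
import urllib.request

FABRIC_API = "https://api.fabric.microsoft.com/v1"

def refresh_metadata_url(workspace_id: str, sql_endpoint_id: str) -> str:
    """Build the refresh-metadata URL for a SQL analytics endpoint."""
    return (f"{FABRIC_API}/workspaces/{workspace_id}"
            f"/sqlEndpoints/{sql_endpoint_id}/refreshMetadata?preview=true")

def refresh_sql_endpoint(workspace_id: str, sql_endpoint_id: str, token: str):
    """POST the refresh request. It is a long-running operation, so expect
    a 202 plus an operation you may need to poll."""
    req = urllib.request.Request(
        refresh_metadata_url(workspace_id, sql_endpoint_id),
        data=json.dumps({}).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    return urllib.request.urlopen(req)  # real network call; needs a valid token
```

In a Fabric notebook you could instead lean on Semantic Link Labs, which (as mentioned below) already wraps this call.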
34:58 So be on the lookout for that. You can use it today — I would just say take a loose dependency on it if you want to. But if you go look at Semantic Link Labs, this does exist in there already. So if you need a code sample and you can't use Semantic Link Labs — I literally had this conversation with somebody last week; they said, "we aren't supposed to import any libraries other than this handful here," so they loved it but couldn't use Semantic Link Labs — they just went out, found how it was being
35:30 called over there, and called it a day. I think Mark Pryce-Maher from the CAT team has a Python notebook out on his GitHub account that'll show you how to do that as well. So, super easy, and it'll be 100% officially supported in the hopefully very near future. Awesome. And that's the stuff that's really nice — this is what I expect roadmaps to be doing: taking these real-world friction points, refining them a bit, and giving us more. Okay, sorry, I'm gushing about that feature. We'll move on. Brad, your next feature — or
36:02 Tommy, did you have something to add first? Yeah — no, on the other side of the coin, Mike, there are the things where you go, "oh, this should have been out when warehouse came out." To me, I'm really happy to see this. These are the ones I hate to talk about. I knew we were going to do this. Do you want me to skip it? I can skip it — I'm just joking. I should have put guardrails around this conversation when I agreed to do it. You should have! To be honest, Brad, we've been working with Tommy for years now just to have guardrails and
36:35 limits on things, but Tommy — it's Tommy. He's jumping the rails here. So this is great. Me and Mike are very nice people. We don't curse, we don't do any of that stuff, but we're going to push the boundaries. All right, well, what do you got? Which one is it? The audit logs. It's going to be one of a handful — it's going to be the SQL audit logs. Oh, really? That was not where I thought you were going to go with this. Interesting. For me, just from my own background, the compliance side and tracking everything has always been such an integral part. And again, I'm seeing this become more and more part of
37:10 teams' processes. Again, you sold me — congratulations, after four weeks of data warehousing, I'm all in. But it's very hard to do that without being able to backtrack, especially in a team environment. So yeah, walk us through this and what's going on. Yeah. So if you come from the SQL Server world specifically, there's always the audit log that you can go out and get information from. It's immutable, so people can't go out and change it. I did a project a
37:43 long time ago, when I first joined Microsoft — I worked with the department of elections for one of the states up in the northeast, and they said, "nobody is allowed to have access to this database in order to change data," because it was all about campaign contributions, and you're not allowed to submit after a certain period of time, and all that. So we put all these things in place, but there's always got to be somebody in there who has the ability to do all this stuff — you always have to have a DBA. At some point you have to trust someone, but we all know that trust can only go
38:16 so far, and you have to have a way to keep that person accountable. That's where SQL audit logs come into play: yes, I can lock both of you out, but if I can then go update the data, we need to be able to audit that. And we don't guarantee that everything is going to get captured inside of Query Insights, for example — or maybe I can schedule things in such a way as to get around having them logged by Query Insights, because I know there's some operation that doesn't get logged. But the audit logs aren't one of those things that you
38:48 can really mess with. They are there; like I said, they're immutable. You can't change them. They capture all this information — all these different events — and we store them in OneLake. So the data is out there; it's got your three copies of data because it's backed by ADLS and all that, so it's redundant. And this is a really great thing, like you're saying, Tom, just for compliance reasons — being able to see exactly what's happening in case the FBI comes in and wants to do some investigation, with the data sitting out there in OneLake like we were saying
39:21 there. You can just use the same system functions you're used to in SQL Server to go read those in — it's an .xel file, I think, that's the extension on those — and take a look at them. So it's a nice little piece of functionality, because on the other side of that, in the SQL Server world, and even Power BI and things like that, you can write some of the events that occur out to Log Analytics. But I don't know if you guys know this — even with Log Analytics, we don't guarantee 100% of events are always captured with those
39:53 methods. So this is a way to have a somewhat better guarantee that things are going to be captured for all those events. A nice addition — I think it's going GA here pretty soon. Excellent. This is hitting the triple nines, right? Like 99.99% — the higher level of confidence — and Log Analytics is like 99.98, just a touch under. Don't quote me on those numbers; those are not real numbers. So for those of
40:25 you listening: just making these up, FYI. All right, Brad, let's go to the next feature. What's another one you're thinking will be a big impact for the community here? So this one — I don't know how big of an impact, but I think it's an interesting conversation about architecture that we can have here — and that's configurable retention. This is the one I said I didn't think was actually on the roadmap out there, but we've talked about it a little bit. Is it on there? Okay, yeah — number three on the list, Q3 2025. The explanation's not long, but it's there. So
40:56 the explanation's trash, let's call it what it is. It says "configurable retention rate, 1 to 30 days" — literally, "ability to configure retention rate between 1 and 30." The description is the same as the title. So, okay, help me out here, Brad — I don't really know what's going on. This is a great conversation, because no one knows. So this really just gives you the ability to configure retention between 1 and 30 days. Oh, I get it now! [Laughter] So we got the sheet cake — it's time to
41:28 party. Brad, that was good. That was very good. The genesis behind this was actually — okay, so when you go out and make a change to a table, let's say I delete a record: in order to do time travel and point-in-time restore and all that, we store that particular version of the record for 30 days. And then we've got a garbage-collection process that comes through and says, if this isn't referenced — if it's a record you deleted, for instance — then we'll go out and get rid of
41:59 that file. Obviously, if it's a record that's still in the table, we have to keep it, but we get rid of the changes. And so what happens is you actually get billed for that storage for the 30-day time period, because we have to keep the data around. And we had these situations where people came to us and said, "Look, I don't care about the history on these tables — give me the ability… or maybe I don't care about it for that long. Maybe I only really care about it for four or five days, or a week, in case something goes wrong, or I realize it a
42:31 couple of days later and can go back and recover the data." Conversely, we also had people say, "I want to be able to keep it for years." And we were like, well, Delta doesn't quite perform well if you start ending up with 10 years' worth of history in there — there's a tipping point. So that's where we ended up with configurable retention within that window. We said, look, we're going to give you the ability to say, "I don't need to keep it quite so long; I'm going to optimize my cost a little bit." Where this gets into the architecture discussion is that the configurable retention is
43:05 going to be at the warehouse level. And if you think about loading data in, we always have these transient tables that we land data into with a COPY INTO, and then we do a MERGE into my final dimensional model or something like that. So you end up in this interesting paradigm where I've got some data inside my warehouse that I don't care about any history for — I would drop it all the way down to one day if I could — but then I've got all this dimensional-model stuff where maybe I do want to keep 30 days' worth of history. And so we
43:38 have to balance those two out, at least for the time being, until we get enough feedback to say, "no, let me go define this at an individual table level." But then we get into our previous conversation of: do we do it at the individual table level and give the pro-dev people that ability, at the risk of confusing some of the people on the low-code/no-code side who maybe wouldn't know how — or know to go look — at the individual table level? So then we get into an interesting discussion about what the feature should actually look like. But again, from an architecture
44:10 perspective, that does have some interesting implications, because it is at the warehouse level. That's the genesis of where that feature came from. And this is my lack of knowledge a little bit, but help me understand: does the warehouse have the concept of schemas inside the warehouse? Yes. Good question. Yep. Okay. And the reason I ask — again, I'm thinking architecturally here — is that I think there's a good break point. For me personally, the warehouse level seems a bit heavy-handed; that's a little overly done. But the
44:43 fact is I can talk between schemas in a warehouse, right? If I have two schemas, I can easily specify the schema and pull tables from it. I feel like there's a potential decent compromise here — and I'm not trying to write your feature for you; you write it how you want — but the schema level at least gives me a bit of a boundary within one warehouse. Because if not, if let's say I did want two retentions, I'm just going to spin up two warehouses and jump between them: land it in one, pick it up in the other. Again, that just seems a little bit
45:15 more heavy-handed than I think it should be. But the schema level — that seems reasonable. And the table level might be a bit too heavy-handed on the other side of things. So the schema level, for me, would be a reasonable balance: look, you get some more control; we're not heavy-handing you the whole warehouse; we're giving you schema-level retention changes. And again, I'm probably asking for something that's even harder to do than table by table — but no, that feels right to me.
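The compromise being floated here can be sketched as a toy resolution rule — none of these knobs exist in Fabric as described in the episode (the roadmap item ships retention at the warehouse level, 1 to 30 days); this just models how a hypothetical schema-level override could layer on top of the warehouse default:

```python
# Purely illustrative: resolve a table's retention by falling back from a
# hypothetical schema-level override to the warehouse-level default.
# The schema names and override dict are invented for this sketch.

WAREHOUSE_DEFAULT_DAYS = 30   # the roadmap item's upper bound

SCHEMA_OVERRIDES = {
    "staging": 1,   # transient COPY INTO landing tables: history not needed
    "dbo": 30,      # dimensional model: keep the full time-travel window
}

def retention_days(schema: str) -> int:
    """Schema override if present, else warehouse default; clamped to the
    1-30 day window the roadmap describes."""
    days = SCHEMA_OVERRIDES.get(schema, WAREHOUSE_DEFAULT_DAYS)
    return max(1, min(30, days))
```

So `retention_days("staging")` gives 1 while an unlisted schema falls back to 30 — the transient-load tables stop accruing 30 days of billed history without splitting the model across two warehouses.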
45:47 Not saying that I have made that exact same suggestion or anything… but I'll just say that is a brilliant compromise you've come up with there. I love that. That would be perfect. All right, so we're going to edit this clip, make a short out of it, and promote the snot out of it. The first "brilliant" Michael's ever gotten. That could be a game changer right there. Don't use that — we use that word sparingly. Right, sparingly. We sparingly use "game changer." Awesome. Cool. Tommy,
46:20 your reaction to retention periods? No, I think this would be one of those things we talked about with Copilot that I would love to see, where it's like, "hey, I'm dumb — what would be the best configuration for me?" Especially with the explanation detailed in the user interface for users getting into this, because this is something that has a lot of benefit and can have a significant impact. But to a user getting into this — again, the expert will know what to do,
46:52 but someone going in dumb and looking at this goes, "I know there's some value here, but do I do 10 days? Do I do 30 days? For the size of my data, what would be the recommendation?" I think this is something I'd like to see from a Copilot point of view, and it's something we know is important. Yeah, for sure. Brad, I think I can guess, by the way, the one that you wanted me to say should have been there in the
47:24 warehouse. Can I see if I'm right? So — yeah, go ahead. Is it… He's going off the guardrails again. I'll stop, I'll stop. Go ahead. No, no, I'll leave it till the end. No, no — it's probably on my list. Go for it. Let's see… VARCHAR(MAX). Oh, no. I can't. People should never use VARCHAR(MAX). What are you doing? Well, I was going to tease about that one too, but I wasn't going to say it. I was like, "okay, fine, give it to me." Like, really? Okay. No, but it does fall in the same family — it's in that same family of things. What was your next guess? I'm curious
47:57 now. What? I'm just going to stop at this point. He's just throwing darts now — darts on a board at this point. So I'm putting myself on mute. Continue, coach. So, VARCHAR(MAX) — VARCHAR(MAX) actually goes in the same bucket as MERGE. There it is. MERGE, IDENTITY, VARCHAR(MAX) — they all go into the same SQL parity bucket that we have. And I think those three things together will help us get to a more code-first person, like,
48:30 I'm not somebody who doesn't know how to write SQL and is going to do everything through Dataflows, where I've got the GUI and can do all this stuff. I know enough that I want to be able to write a MERGE statement, or I need an IDENTITY column — I know these terms. So VARCHAR(MAX), again, is just another one of those. I tease — there are some valid cases where you might want VARCHAR(MAX) — but of course you should use it sparingly. It's also a bit of a guardrail: when you don't have it, it forces people to think about how big
49:03 the data set that's actually going to be in there is. Because in the SQL Server world — and that is the world the data warehouse engine runs off of — that stuff does impact the query plans: the size of the data, the size of the columns that you're using. So that's where I say you can go define VARCHAR(MAX) on everything, but it's going to lead to poor performance, and that's my concern as we bring it back in. But again, I understand there are obviously use cases where we have to have that
49:35 stuff in there. Yeah — it must be difficult as a PM to have people requesting things that are going to hurt them, right? You're like, "I understand why you want it, but really, can we just have a better conversation?" This is a bad analogy, but it's like telling your kid, "Don't do that, you're going to burn yourself," and they say, "okay," and do it anyway. I just told you we shouldn't be doing this, and yet you hurt yourself. Like, "don't run around the pool, you're
50:08 going to fall" — and then people run around the pool and they fall, and I'm like, see, I told you: the performance is not as good if you use VARCHAR(MAX) on everything. I get it. That was the fun thing about being on the solution-architecture side before coming to the product team: you had the ability to stop people and say, "all right, let me push back on you a little more — you have that ability over here." But at some point we have to just understand what people need to do, and we don't have as much influence on what people actually build. So yeah, it's very much
50:41 a case of "I understand you need it; I'd rather you not do it, but we have to do some of these things at the same time." Let's maybe pick up one more before we wrap — I think we're getting close on time. Brad, you talked about this one before we got on the call: were you talking about data clustering? Was that one of the ones on your list? Yeah — there are two things that fall into that pro-dev category: custom SQL pools and data clustering. I'll give you the very quick version of custom SQL pools, just because I think it's interesting, and then we can talk
51:12 about data clustering. Let's do it. So, workload management is one of the biggest feedback items we get from folks who have to manage these systems on a day-to-day basis. With custom SQL pools — what we do today is you basically get two sets of compute: a read set and a write set, or "select" and "non-select," as we call them in the docs, I think. What custom SQL pools allow you to do is say, hey, we're not going to have just these two distinct sets of compute — now you can carve it up into any number of pools of compute. That way, when
51:46 Mike connects, he gets two of my 10 nodes. When Tommy connects, he gets three of my 10 nodes. When I connect, I get the default, which is the rest of it, but I've got to share it with everybody else running inside the organization. So it allows you to configure some of those guardrails a little better, so you don't have one person going in, running something, and taking everything away from the environment. That'll be something really cool coming here pretty soon. I like that a lot. Data clustering — yeah, go ahead, sorry. I was going to say, Mike, that goes along with a lot of what we've talked about from that
52:19 governance angle. We want to give teams access to these products, but we also don't want to let everyone fall over. Right. Exactly. If I want to give marketing — hey, here's a workspace with all your data, but careful, because you don't know what you're doing — what a great solution. Yeah. If there's one place where I can say "it depends," it feels a lot like: what does the architecture design look like, and how many Fabric SKUs do you use? Sometimes you may want to buy one big one and throw it across many different workspaces, but in other
52:51 cases you're like, well, maybe that doesn't make sense — maybe I do need to divide my large SKU into "here's one for dev/test, here's one for prod," because I want physical compute isolation in certain environments. So I feel like this is one of those areas that fits the story of: we have to think about what the architecture means for our organization, and in which places we are not tolerant of anything falling over, right? There are going to be places where we'll say, for sure, this
53:23 workspace — or this group of workspaces — is so important to our business that we can't let it fall over, and that's going to change the design slightly. We're not going to buy one large Fabric SKU and share compute; we're going to specify it per team. And this also speaks to your point, Tommy, around each team's knowledge set. Some teams are great at Dataflows Gen2, other teams are great at Python, and other teams are going to be in the warehouse writing SQL. Each of those teams would be using a different level of compute. And you may need to say, "okay,
53:56 marketing team, here's your compute — go have fun. Use it until it's gone, because that's what you're willing to pay for." Because I can't have them coming into my ecosystem and overusing the data-engineering space, where we can't get you data because you're running dumb things somewhere else in the system. So this is a very "it depends" question, it feels like, to me. Yeah. So I think what I'm going to do is take the last three minutes of this and send it to the PM who's actually in charge of this feature, because I think there's a little bit of a
54:30 follow-on request that we always get, which is exactly what you guys are describing. It's twofold. One: I need to limit a particular group or user from taking all of my data warehouse capacity — we've only got so many compute nodes behind the warehouse on a workspace, and I want to be able to limit them. But the other part — and you took it a step further in the discussion — is: now I also want to do that at the capacity level. I want to limit it so that my warehouse itself doesn't take over my entire capacity,
55:03 and be able to do that. So this particular piece of functionality is going to solve the first one, but it stops short of fully satisfying that second piece. Now, we can help mitigate it: you could say, "I'm going to create one SQL pool with only a very small number of nodes, so that it chips away at my capacity slower." But it's never going to put a guardrail on to say it can't chip away at my entire capacity over time. And that's the piece we hear from a lot of folks when we talk to them about this:
55:35 they say, "I want to go that one step further. I want to make sure I protect my capacity all up, not just the experience on my warehouse." I think you hit the nail on the head on that particular feature, because it is a capacity-wide concern we're talking about here. I don't have the dials to ratio out what's important, or even just make a prioritized list, right? Hey, my reports and semantic models come first; if those
56:07 things start running out of resources and slowing down, slow down the loading of everything else behind the scenes. Again, the workloads need to communicate with each other — what am I doing, how am I doing on it — it feeds itself, right? So until I get those controls governed at the capacity level — and I don't know when that's coming, if ever — I'm going to be forced, as an architect, to think strategically about carving out the right pieces at the right levels. And again, maybe that's temporary for now.
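The carving-up Mike described for custom SQL pools — two of my 10 nodes for one user, three for another, the remainder as a shared default — can be sketched as a toy allocation check. The names and shapes here are invented for illustration; this is not the actual custom SQL pools interface:

```python
# Toy sketch of the custom-SQL-pools idea: carve a warehouse's fixed set of
# compute nodes into named pools, with the leftover acting as the shared
# default pool everyone else lands in. Illustrative only.

TOTAL_NODES = 10  # the example node count from the discussion

def carve_pools(total: int, pools: dict[str, int]) -> dict[str, int]:
    """Validate explicit pool sizes against the node budget and derive
    the shared default pool from whatever is left over."""
    reserved = sum(pools.values())
    if reserved > total:
        raise ValueError(f"pools reserve {reserved} of only {total} nodes")
    return {**pools, "default": total - reserved}

layout = carve_pools(TOTAL_NODES, {"mike": 2, "tommy": 3})
# layout -> {"mike": 2, "tommy": 3, "default": 5}
```

Note the episode's caveat still applies to a scheme like this: it guards the warehouse's own nodes, but it says nothing about how fast those pools chip away at the overall Fabric capacity.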
56:40 It'll probably get smarter and better over time. The voice behind that story will get louder and louder, and eventually Microsoft will come back and say, "we have a solution to help you with that." I'll give you just one pushback on that, and it's the challenge with saying, "hey, once I get to a certain point, I want to slow down these other workloads so that I don't hit my full capacity." In the database world especially, you value two things: you value uptime, and you value
57:12 predictability. So if I come in and my query runs in five minutes today, and then tomorrow the system happened to be busy — outside the realm of my SQL Server instance — and now my query is all of a sudden running in 10 minutes instead of five, and I don't really know why, because my focus is the SQL world… that makes DBAs very unhappy, and it has downstream user effects as well. So I don't disagree with what you're saying, necessarily, but I would just maybe say
57:45 we've got to figure out the right implications of all these things so that we don't end up impacting people — because ultimately, all this stuff has a business impact at the end of the day. Yeah, agreed. But also, pushing your capacity to 500% and being locked out of it for 48 hours, or five days, or whatever — you can't have that either. It's a balance. Yeah. Well, I ran a test with Dataflows: a single dataflow on an F2 capacity, doing something that would refresh in 20 minutes in Power BI, killed the
58:17 capacity, and I had no control over that. But from what we're talking about, at least from the warehouse point of view, Brad, I'm going to give you a sound bite that you'll be able to feed to your manager or whoever you need to. Oh boy. So Brad... warning, warning. Yeah: "I completely agree with the previous statements," and insert product or feature here. But honestly... yeah, go ahead. I was going to say, I actually like this discussion. I think this is very relevant here, especially one
58:50 thing I'm going to poke at a little bit, and again, Brad, feel free to bounce this question off and say we'll talk about it later if you need to, because this might be a bit visionary in nature. Spark has the ability to let us pull compute out of capacity. I'm going to call it Spark Autoscale, because that's really what the feature is, right? I can take the pricing for Spark, move it out of my Fabric CU usage, and just pay the price of the Spark running in Azure. And this is where, with Santosh and the
59:22 other team, we've talked a lot about this on the podcast. It's like, hey, we want hyperscale. I want 30 nodes or 100 nodes; I'm going to do a big data-processing job, I need a lot of compute, and it needs to live outside. There's no amount of Fabric you can buy that's going to get you that compute. So to your point, Brad, I really want to double down on mission-critical consistency. I agree: I would be angry too if today it ran for five minutes and tomorrow it ran for ten and I didn't know why. Now, if there's a message from the system that says, "Look, we're
59:54 throttling; we're pulling back some things; be aware it's going to be a bit slower," fine. But SQL folks and DBAs are used to saying: I have purchased a compute size, I know how much I have, it's consistent, and I'm expecting regular results. Whether I'm running tests or whatever, I expect it to run the same every single time. So I think there's a need here, in the same way that Spark offloads its compute back to Azure, to see the same concept specifically for SQL databases, maybe even the data warehouse. I'm still using the on-
60:26 demand, per-core, single-node or multi-node MPP architecture, but I want to be able to say, look, I want autoscale billing for SQL and data warehouses as well: just pull it out of Fabric and put it back in Azure. Microsoft, you're still getting your money, right? The way I see it, this is a money grab anyway: when I use the machine, I'm paying you fairly for the extra usage of that machine. And then we get the best of both worlds, where I'm not trying to overshoot my Fabric capacity for something that's too
60:58 big that I don't need, I'm paying for the actual compute I'm using from the SQL data warehouse or the SQL database, and now it's reliable. And whenever we start talking about bringing transactional workloads into Fabric, this is the thing that makes me very cautious right now, because that sucker can't go down. I can't have a production transactional SQL or data warehouse system running where it's not doing fast transactions. It's a must-have capability. So, I'm just going
61:32 to leave that one there. You may take that sound bite and play it back to a different PM. What do you think, Brad? Can we even talk about this, or is it just, shut up, Michael, we'll talk in a year? I will say it is a discussion that is constantly ongoing. I think the Spark team blazed that trail and pushed through some of the internal politics; I don't know everything they had to go through, but that obviously went against the full capacity model
62:06 that we have. And so I would fully expect other workloads to start piggybacking off of that, given there's precedent for it in the platform now. So, right, this is our call to action. Arun, Amir, this is the message to you through our podcast. To be frankly honest, we don't talk daily; we're friends, but we don't talk. But this is it, Arun, Amir: can you please allow Brad and his team to develop this amazing
62:38 architecture that will be Fabric SQL Autoscale, and let us pull that out of the Fabric capacity? I think that would be a major win, or at least it addresses a problem I see and am very cautious about, which is why I'm not putting transactional loads inside Fabric at this point. Brad, a better answer would have been: but Mike, haven't you heard about varchar(max) being available? That's always how you push these off. I'll remember that for the next time this comes up. Tommy knows this because I do this with
63:10 him a lot whenever he's talking to me on the podcast. We have to redirect a lot. Brilliant ideas. I love it, though, because this is the good part of what we do. Most of my conversations in those situations start with, "Tommy, let me rephrase your question so I understand it," and I just change the question entirely into whatever I want to answer. I like that. I'm going to have to try that one; I haven't yet done the "let me make sure I understand the question" and then asked a completely different one. So, Mike, I think I hear you saying, when you're talking about capacity, let me rephrase your question: you're asking about varchar(max), right? We're on the same page. Okay, good.
63:46 I think we're good. I think we're good. Oh my goodness, I don't think I've laughed this hard on an episode in quite a while. And who thought it would have been on data warehousing, no less? Yeah, as long as it's not laughing at the data warehouse's expense, that's all that matters. No, no, we're good. I think this is just us being goofy people and big data nerds here as well. So, awesome. Love that. And with that, I did want to give you an opportunity: you mentioned data clustering very briefly, but Brad, did you want to make a quick note on that before we call it?
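The mechanism Brad describes next, colocating rows on disk so that point lookups can skip whole files, can be sketched as a toy min/max pruning model. Everything in this snippet (the file names, key ranges, and the pruning rule itself) is an invented illustration, not Fabric's actual layout or algorithm:

```python
# Toy illustration of partition elimination: each file keeps min/max
# statistics for the clustering key, and a point lookup only reads files
# whose range could contain the value. File names and ranges are invented
# for the sketch; none of this reflects Fabric's real implementation.

def files_to_scan(file_stats, key_value):
    """Return the files whose [min, max] key range can contain key_value."""
    return [name for name, (lo, hi) in file_stats.items()
            if lo <= key_value <= hi]

# Clustered on a client key: each file holds a narrow range of clients.
clustered = {"file_0": (1, 2), "file_1": (3, 4), "file_2": (5, 6)}

# Unclustered: every file spans all clients, so nothing can be skipped.
unclustered = {"file_0": (1, 6), "file_1": (1, 6), "file_2": (1, 6)}

print(files_to_scan(clustered, 4))    # -> ['file_1'] (one file scanned)
print(files_to_scan(unclustered, 4))  # -> all three files must be scanned
```

The ingestion-time cost discussed below is exactly the work of keeping rows sorted into those narrow per-file ranges as new data arrives.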
64:17 I'll give you the very quick elevator pitch. So, clustering is one of those cool things. We've talked a bunch over the last couple episodes, and by the way, I really appreciate you guys having me on; this has been a lot of fun, so we'll have to come back and do this again in the future, and then I'll let you get back to your normally scheduled content. But we talked before about how things like data distribution are things we want to abstract away from people, yet we find these small needs
64:50 here and there where we need to give people a bit more control over what's going on in the system. So what data clustering is going to allow us to do is say: we're still not going to make you define a distribution key or anything like that on the table when you create it. We want to keep table creation very accessible, so everybody can just create tables and be successful. But when you have those big data sets, or those specific workload patterns where you're doing point selects on a table, or maybe you're looking at a specific
65:21 range of data, we don't want to have to scan billions of records in order to find it. We want partition elimination, if you will, to borrow a term from the SQL Server days, if you're familiar with it. And so what you're going to do here is essentially define a distribution key, or a clustering key as it were in this case, and you can give it a handful of columns; I think up to seven columns or something like that. And it's going to then, through the magic of some algorithms,
65:53 colocate as much data as it can on disk into files that are near each other. That way, when I do those point lookups, I don't have to scan the entire range; I only have to scan this little range over here. So it's going to help customers from a couple of perspectives. One, it'll hopefully help with performance on some of those queries, but also with the cost of those queries, because I'm now reading less data. There is an ingestion cost, because we have to move the data around on the back end and rejigger the files and all that
66:26 stuff to make sure the data is colocated. But once you get to that point, all your read operations downstream benefit from it. So that's really what it's built for: a read-time optimization for some specific query patterns. And we've seen some really good success with customers already using it. I've seen a couple of customers use it for multi-tenant environments, where they say, "Look, I've got five or six different clients' worth of data inside the same table, but I always look at one particular client at a time to surface it back to them." So
66:58 cluster on the client key, and then all my queries can scan just a subset of the data, save some money downstream, and hopefully get better performance at the end of it. So this I super enjoy. Again, I like this idea: you, Microsoft, are so smart; there are geniuses all about. Seriously, who's going to spend more money on people who know SQL than Microsoft will? This is their bread and butter; they're going to have the people who understand all the details here. And I really love the fact that you're taking the step ahead and saying, look, pruning, partitioning, the best practices, just
67:31 make it easy for me. I don't have to figure it out; you're going to figure it out automatically. And here you're giving me some more control, even providing suggestions, maybe with Copilot: hey, we're going to do a thing here, click this okay button and we'll help you get it sorted. You guys should handle it. I like this; this makes a ton of sense to me. I think of a world in the future where you just run queries a bunch and then we come in and say, "Hey, this table looks like it has a pretty common query pattern. Maybe you want to do some data clustering here and improve that." Just like in
68:02 SQL DB in Fabric: hey, you run these queries, we're going to go ahead and create this index for you. Maybe we'll start moving the data around on the back end automatically. So I like that a lot, and that makes me happy as a user. To your point, there used to be a barrier where you had to be pro-code all the time, and now we're bringing that barrier down a little lower for people who are technical but don't really want to live in that world. And just one more: I don't see any LLMs or MCP servers showing up on the roadmap yet, but I have to imagine
68:35 there are at least internal conversations already about where an MCP server fits in this space and what that looks like for all these other things. It seems like there's a whole new world opening up to us in general, and it will be interesting to see how it all unfolds. All right, I have something real quick. This is in honor of data warehouses, since you guys were talking about an endless stream and all the things coming. Brad, I'm going to dedicate this to you, because honestly we appreciate having you on. Thanks, ChatGPT. So here's where this is going: come with me and you'll see a
69:10 world built on data foundation. Rows and streams, endless dreams, inside the Fabric creation. We'll begin with a spin on a query without limitation. Oh, that's good. What you find blows your mind in a warehouse of imagination. So, I figured we're doing a lot of things, we're opening up this world; it's pretty cool with the warehouse. We'll edit that part out. But this is good, it's okay. Tommy, don't quit your day job. I'm muting myself for the next three days, so I'm out. Brad, thank you so much for your time
69:42 today. This was absolutely super fun. I've laughed so hard through these last couple episodes; it's been enjoyable. Thank you for picking up on our really weird, dry, techy humor and just jumping in like part of the team. Like I said, I really appreciate you guys having me on these last few episodes. It's been a lot of fun, and I really enjoyed coming and talking about this. And if we can make one more warehouse fan out there, then it's all worth it. We've done it. I think you've actually made a couple, more than just one. You've turned our eyes more towards this, and I'm going to be
70:13 much more pro-warehouse in the future here. Thank you all so much. We appreciate you watching the podcast, and I hope you were laughing along. This is a bit longer of an episode, so thank you for hanging in there with us. For those of you on your bike or out running, good job, you're making it to the end; push a little bit harder until the finish. We do know a lot of people exercise to the show, because we've gotten feedback like, "This is great. I'm so bored when I walk, and you guys fill the time." So that's great; we really appreciate everyone in the community here. If you like this episode, please comment down below. It
70:46 helps the algorithm out. Just let us know what part of the episode you liked. Are there any features you see on the roadmap? The links are in the description; please go check them out and let us know in the comments: what do you think will be the most impactful in the near future? Brad will build a data warehouse pulling in all the comments so he can see them and feed them right to the PMs directly. We know this will be a thing; we can do it. Just kidding, that's totally a joke. And then finally, please like and subscribe to the channel. We really appreciate your viewership
71:17 here. Also, by the way, we've been announcing something new over these last couple episodes: we have a membership tier. If you would like to become a member, you can sign up down in the description below. And if you're annoyed with ads on top of your videos, as a member we supply this video to you with no ads in it at all; you can listen to it clean through on YouTube, and you can also listen on your other platforms as well. Tommy, where else can you find the podcast? You can find us on Apple, Spotify, or wherever you get your podcasts. Make sure to subscribe and leave a rating; it
71:49 really helps us out a ton. Share it with someone; put it on your LinkedIn. We do this for free. Do you have a question, idea, or topic that you want us to talk about on a future episode? Head over to powerbi.tips/empodcast and leave your name and a great question. And finally, join us live every Tuesday and Thursday at 7:30 a.m. Central, and join the conversation on all of PowerBI.tips' social media channels. Yeah, Brad, you flustered Tommy. We've landed hard, Tommy. It's just a bumpy ride on the
72:21 way in here. We've only done this for like 450 episodes; we still can't quite get it. It is what it is. I'm out. It happens. I'm good. I need a nap. Long weekend. We'll talk to you later. Thank you, Brad. Thank you, everyone. Appreciate you so much.
Thank You
Want to catch us live? Join every Tuesday and Thursday at 7:30 AM Central on YouTube and LinkedIn.
Got a question? Head to powerbi.tips/empodcast and submit your topic ideas.
Listen on Spotify, Apple Podcasts, or wherever you get your podcasts.
