Fabric Decision Guide – Ep. 224
Choosing between a Lakehouse and a Warehouse in Microsoft Fabric isn’t a religious war—it’s a workload decision. The right answer depends on how people will query the data, what level of SQL compatibility you need, and how your team will manage change over time.
In Ep. 224, Mike, Tommy, and Seth use Microsoft’s Fabric Decision Guide as the backbone for a practical conversation: how to pick a default pattern, where teams get tripped up in the early days, and how to keep architecture decisions from becoming a weekly debate.
News & Announcements
- Fabric Decision Guide (Warehouse vs. Lakehouse) — A concise way to align on which Fabric experience to start with (and what tradeoffs you’re accepting).
- Power BI Theme Generator — Fresh icons and theme uploads for keeping report styling consistent across teams and projects.
- Explicit Measures Podcast — Subscribe and browse the full episode backlog.
Main Discussion
Topic: Fabric decision-making (lakehouse vs. warehouse)
The Decision Guide is useful because it forces you to answer the questions teams usually dodge: are you optimizing for SQL-first consumption, or for open exploration and data engineering workflows? Once you pick the default, you can design guardrails (naming, ownership, and promotion paths) that make the platform feel predictable.
A big theme in this episode is consistency. It’s okay if your org prefers “warehouse-first” or “lakehouse-first”—what hurts is when every project reinvents the rulebook, producing a pile of one-off implementations that nobody can operate reliably.
Key takeaways:
- Start by choosing a default: “warehouse-first” for SQL-first analysis, or “lakehouse-first” for engineering + exploration—then standardize around it.
- Treat ‘decision guide’ outputs as guardrails, not gospel—validate against your team’s actual tools (T-SQL, notebooks, dataflows, semantic models).
- Separate raw vs. curated data early (even if it’s just folder/layer conventions) so experimentation doesn’t leak into production consumption.
- Document the gold sources: which Fabric assets are authoritative, and how downstream semantic models should reference them.
- Align on governance basics: ownership, naming, security groups, and a cleanup cadence that prevents sprawl.
- Optimize for operability: the best architecture is the one your org can run repeatedly without heroics.
- Expect evolution—pick a pattern you can refine over time instead of freezing teams while you wait for the platform to ‘settle.’
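The "choose a default" takeaway can even be captured as a tiny decision helper. A minimal sketch, where the three inputs are deliberate simplifications of the guide's questions (our assumptions for illustration, not Microsoft's actual criteria):

```python
def pick_default(sql_first: bool, needs_tsql_writes: bool, spark_workloads: bool) -> str:
    """Suggest a default Fabric pattern from three simplified signals.

    Illustrative only: the real Fabric Decision Guide weighs more factors
    (team skills, tooling, security model) than these three flags.
    """
    # Heavy T-SQL write workloads with no Spark lean toward the warehouse.
    if needs_tsql_writes and not spark_workloads:
        return "warehouse-first"
    # Spark/notebook engineering or exploration-first teams lean toward the lakehouse.
    if spark_workloads or not sql_first:
        return "lakehouse-first"
    # SQL-first consumption without engineering workloads defaults to the warehouse.
    return "warehouse-first"
```

Once the default is picked, the point from the episode stands: standardize around it rather than re-deciding per project.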
Looking Forward
Pick your default Fabric pattern (warehouse-first or lakehouse-first), write the rules down, and apply them to the next project so teams can move fast without architectural whiplash.
Episode Transcript
0:31 good morning everyone welcome back to the explicit measures podcast with Tommy the one the only Seth and Mike in the morning and only Mike good morning for those of you who are watching live on YouTube Michael has got a new hat on it is a good day today oh yeah there has been an announcement from Microsoft there’s a the Microsoft I guess it was the June update just came out apparently is that right Tommy June yeah so we talked about
1:03 a little on Tuesday where it was out in the morning but we didn’t know all the features the blog wasn’t out yet it was just like desktop was released the blog was incoming and it wasn’t there yet exactly and now we have the entire blog post with all the feature updates it’s pretty amazing so thankfully there’s more on object updates because that’s going to be forever fixing an issue and getting that a little more streamlined but there is a brand new visual this month yay there is brand new official so there
1:36 is a new card visual which frankly this one’s been long coming I think this is a great idea it allows you to make multiple cards all at once if this is just a taste of what’s to come yes I like it I like it the nice thing that it outlines is how many objects and things you’d have to build yes in order to make it
2:06 happen and why they’re making the change and then some of the inbuilt properties and features that just auto work so yeah I already have some customers I’m thinking of that today like right now I’m like this could replace like tons of things or objects on a page it all becomes one single and that’s the point it’s now one visual it renders one visual it gets all the data at once renders the visual it completely is one object doesn’t fire a bunch of queries so I think a great idea I like the performance improvement that we’re going
2:36 to get from this so the one I’m most excited about and the reason why I’m wearing this hat that you can see online or not online we now have a pbip hat a pbip hat and for the first time ever usually I wear these things and people have no clue what it is it’s like OBX right those little stickers you put in the back of your cars no it’s these stickers like well there’s a sticker in someone’s car like it’s like what is it half marathon they do like there’s a little sticker that would be in someone’s car with a half marathon I don’t know what
3:07 the number is for half marathon 14.5 or something like that some crazy amount of miles people run you’re not an endurance runner I’m not surprised shocker shocker shocker I’m not an endurance runner my little mini legs and my little squatty body is not going to be able to run marathons I could but I just choose not to not your thing not my thing but that being said though the power bi desktop developer mode is out and if I had a little sound bite I would put on
3:37 a little yay like this has been long coming very much cheering for this feature to come out so right out of the gate which we haven’t talked about is it a pbip file or is it a pbip it’s got a pbip technically you don’t need the pbip you actually just need what is it the pbir
4:08 you can actually delete the pbip because that’s just a project yeah that’s the project file it’s the container isn’t it no it’s not it’s just a pointer it’s a series of pointers that point to the things yep so contrary to pbix they’re isolated so you don’t actually need all you need is the definition.pbir Tommy’s reading the actual blog on it yeah so the definition.pbir will also open up the report
4:39 not just the pbip so don’t try to downgrade the value of pbip Tommy I love it too but you don’t need it it’s not true you need it what’s interesting okay so for those of our listeners who weren’t listening to us but a couple episodes ago we’re talking about things which everyone should be listening to all the episodes I’m just saying the new I I
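For readers following along, the pointer chain being debated here (a pbip project file points at a report folder, and the report’s definition.pbir points back at the dataset folder) can be sketched in a few lines. The JSON field names below are simplified stand-ins for illustration, not Microsoft’s official schema:

```python
import json
import tempfile
from pathlib import Path

def read_pointer_chain(pbip_path: Path) -> dict:
    """Walk pbip -> report folder -> definition.pbir -> dataset folder.

    Field names ("artifacts", "datasetReference", "byPath") are simplified
    assumptions modeled on the layout discussed in the episode.
    """
    project = json.loads(pbip_path.read_text())
    report_dir = pbip_path.parent / project["artifacts"][0]["report"]["path"]
    pbir = json.loads((report_dir / "definition.pbir").read_text())
    dataset_dir = (report_dir / pbir["datasetReference"]["byPath"]["path"]).resolve()
    return {"report": report_dir.name, "dataset": dataset_dir.name}

# Build a throwaway example project so the chain can actually be walked.
root = Path(tempfile.mkdtemp())
(root / "Sales.Report").mkdir()
(root / "Sales.Dataset").mkdir()
(root / "Sales.Report" / "definition.pbir").write_text(
    json.dumps({"datasetReference": {"byPath": {"path": "../Sales.Dataset"}}})
)
(root / "Sales.pbip").write_text(
    json.dumps({"artifacts": [{"report": {"path": "Sales.Report"}}]})
)

print(read_pointer_chain(root / "Sales.pbip"))
```

This is why the hosts can say the pbip itself is expendable: it is only the outermost pointer, while definition.pbir carries the link that actually reunites report and model.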
5:09 find this interesting and Tommy I’m surprised you didn’t jump onto this but like it’s the power bi desktop developer mode yeah it’s like hardcore mode for a long time yeah give us the hardcore mode excluding Max since 2015 give me an actual button this is hardcore exactly yeah exactly yeah we don’t even have it integrated with our system yet we have to use two other applications I’ve stopped writing I’ve stopped building reports in desktop I
5:39 just now write it all straight with code I just write the code that I need now yep I don’t even use desktop I know you use Dax generator my friend and that is desktop so this new mode gives us the ability to basically have source control CI/CD which is continuous integration continuous yes delivery yep looks like text editor support programmatic generation editing of artifact definitions yada yada we just hit the gold mine as it
6:09 relates to actual report development so one of the this file it’s local right and then you can hook up a workspace in fabric to be at the end of a branch right so as you’re connecting git or vs code and creating the project you can essentially have multiple people working on the same file you can validate code before it gets deployed to certain locations and this is the big
6:39 win right gents anything else you would add to that I think this is just in desktop mode you’re really focusing on like a report and data model but if you think about this and extend this idea or concept into a workspace where we have multiple models multiple reports and basically pointers to the various objects there this is basically what you’re going to get when you use the git integration inside a workspace a very very similar
7:11 formatting of things but yeah I’m very much thrilled with this one you can just open this stuff up in vs code check it into a git repo it’s very helpful yet there’s two things right now that I was a little disappointed with one no TMDL it’s still all Json right now oh it’s going it’s coming but it’s not I was hoping that the timing of it would be when they released it they are like a very large boat I I understand
7:42 it just came out in preview six months ago it’s like they need at least three and a half years to integrate it into the new thing demo it because it would make perfect sense because what we talked about when the TMDL announcement came was it’s really hard to edit Json files I don’t think you’re getting a lot of people who are going to go into the code right now as it is and actually in a sense really in production or in the true situation be just editing a Json file the other thing I was a little I’m a
8:13 little disappointed with is there’s one folder so if I have all of my reports and data sets in a workspace I can’t organize it within the repo so they all have to live basically on the same level just like in a normal workspace so you can’t in your repo have like subfolders so here is the question that came up in my mind today but well hold on before we go over to your question hang on to your question Tom Seth I want to answer
8:43 Tommy’s question I believe Tommy there is no requirement for that I think you could modify the way the pbip file is working to automatically add subfolders as needed you’re talking about GitHub in the workspace or are you talking about power bi desktop how it’s actually going to get pushed into the workspace so basically you can have the subfolders but it won’t recognize it in oh as
9:13 you’re saying so you can reorganize it but when you publish it the workspace won’t be happy yeah so that project name the files and folders that represent the report the yeah let me see if I can find the actual text on it so multiple reports and data sets at the same folder yes so you basically have to save every pbip which creates two folders at the same level per workspace if you have anything in a nested folder
9:43 it won’t actually get pushed to the workspace here’s the question my question relates to this conversation okay what strikes me is is the fabric workspace just an endpoint for a branch that I can deploy to or is it where I have to store everything and the reason I ask that question is like do I really need fabric to use this you need fabric to use it why no it works it works in anything
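The flat-folder limitation just described (every item folder has to sit at the top level of the synced location, or it won’t make it into the workspace) is the kind of rule a team could guard with a small pre-commit check. A sketch, assuming item folders follow the ".Report"/".Dataset" naming discussed above (an assumption, not a verified constraint of the tooling):

```python
import tempfile
from pathlib import Path

# Hypothetical suffixes for Fabric item folders, based on the pbip
# layout described in the episode.
ITEM_SUFFIXES = (".Report", ".Dataset")

def find_nested_items(repo_root: Path) -> list[str]:
    """Return item folders that are NOT directly under the repo root.

    Anything returned here would, per the limitation discussed,
    fail to sync into the workspace.
    """
    nested = []
    for p in repo_root.rglob("*"):
        if p.is_dir() and p.name.endswith(ITEM_SUFFIXES) and p.parent != repo_root:
            nested.append(str(p.relative_to(repo_root)))
    return nested

# Throwaway example: one top-level item, one (problematic) nested item.
root = Path(tempfile.mkdtemp())
(root / "Sales.Report").mkdir()
(root / "finance" / "Budget.Report").mkdir(parents=True)

print(find_nested_items(root))  # the nested Budget.Report is the offender
```

Wiring a check like this into CI would catch the reorganization mistake the hosts warn about before a publish fails.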
10:15 premium right now everything that these oh but hold on hold on if this is a pbip and all I have to do is connect like create a visual studio code git repository that recognizes what that file is what is preventing me from
10:31 ingesting that file modifying it and writing it back to a shared folder location call it local yeah there’s like a compiler basically taking the pbip and just compiling it down you could but it’s not going to be a personal workspace notice the document the wording right you’re right there would well there would be no ability to like verify it in the service I get it notice the wording anytime they say git there’s always a preceding word
11:02 they’re calling it fabric git well okay yes but to your point though Seth this thing should be very standard to compile and I’m looking at what’s coming out from pbi-tools I’d have to imagine there’s going to be other tool engines like tabular editor and pbi-tools that will let you run as code basically right check in these files because you can’t check in the file my understanding right now is if you’re
11:33 using this desktop developer mode you can’t check in those files to powerbi.com right now you can connect your powerbi.com environment to git itself but there is I have to double check the features in this one but you should be able to link your workspace to GitHub I’m not sure how you check in desktop files and can you just literally lift the entire files section into an entire workspace I’m not sure that works from your desktop does that make sense what I’m trying to
12:03 say like you have hardcore desktop mode on your desktop you save the things as a pbip project the project doesn’t necessarily get lifted directly into the service I think you still I get it you might be able to like programmatically use a tool like tabular editor or pbi-tools to compile it make a pbix and then using the apis then publish the pbix in the service it’ll be interesting to see where this goes it looks like you’re gonna actually you
12:34 can actually create the git integration in non-fabric I just don’t know what that does okay so bear with me just because it peeved me that all of a sudden I was like wait what I have to use this with fabric don’t we already have a connection mode between SharePoint and power bi that just picks up the changes of a file yes correct you have that can’t I point a git project to deploy changes into that
13:06 SharePoint file location thereby just automatically updating yep you can do that so then I could have a test workspace that would be tied to a particular SharePoint location we’ve hacked the system in five minutes and they’re gonna now shut it down I did like the blog post directly from
13:37 Christian Wade was going out and talking about the Zoe was talking about the section of their video that they did for Microsoft build they did go through and they gave you a very clear definition let’s take a look at what’s inside the root folder and they start talking about the data set the report what is the gitignore file for what is the project name.pbip file for like it shows you more of what’s inside the file so it gives you a bit more of a structure around what you’re doing to build this
14:08 this does eliminate the need for having if you have multiple thin reports and one data model this does eliminate the need to be able to use PBI split tools right so this allows you to aptly change between multiple thin reports in the same project because if you think about like a project right if you think about a data set and three or four or five thin reports you’re using the same data set in all cases of the reports so what you want to be doing is you want
14:38 to be pointing to okay I’m going to work on the sales report with this data model you can just open it up boom it works and then you can change the pbip file or make a copy of the pbip file and open up a different pbip file which would then reopen the model with a different report so I could use stocks and the same data model so it will make it I think easier to switch between different thin reports as you build them the challenge becomes in this case you don’t
15:08 have a separate desktop window for the model and one for the report it’s still just all one desktop open file does that make sense there’s gonna need to be some diagrams on this one we need some more developer mode I can’t test it out on a non fabric workspace because I was just like oh maybe I can use one of my existing workspaces and just push everything to another at like an Azure git repo
15:40 but my entire tenant’s fabric right now so I still think this is still fabric only no matter what you do well let me step back here for a second the pbip is a way of desktop decomposing a file a pbix file into multiple little smaller files I don’t think this has the same connection to what a git integration looks like in a workspace you don’t use the same thing
when you look because when you connect a workspace to git and then synchronize your files the structure is different yeah so I’m not sure if you’re supposed to use desktop developer mode directly synced with a GitHub repo inside a workspace so I think right now the pattern is with pbip you can check in the code the idea is you check out the code open it up in desktop do your development check in the code and then you publish from desktop back into the service I don’t
16:43 think right now there’s a seamless integration between the pbip files folder structure and the service GitHub integration that’s true because I don’t see anything talking about oh just synchronize everything that way but they do have git integration at the workspace level and I do know I’ve tested you can use the git integration with a fabric workspace so if you have a workspace in Fabric and you have a power bi Pro workspace or sorry power bi premium workspace it
17:13 could be any one of the flavors premium premium per user premium Azure embedded P1 all the other premiums seem to work with the git integration they will allow you to synchronize your workspace with a git repo the only one I found that does not work is a pro workspace so any workspace that is made for just Pro users the git integration will literally fail and say no you can’t do this this is not a feature of pro
so sweet so Kevin’s confirming here so Kevin says hey I did this for a client yesterday that does not use fabric they did the git integration on a premium per user so right now premium per user does allow you to do git yeah and then Donald you’re confirming what I said here when you’re using the pbip project right save pbip as a pbix and then publish the pbix or just open the pbip and publish directly from desktop which gets you to that next level
18:15 this is Seth you said this yesterday or maybe we were talking about this offline we were talking about fabric has just come into the power bi ecosystem and just thrown a ton of features at business users all at once just a ton like this whole git thing no business user understands what git is they’re not thinking this way right this is true this is probably why they call it power bi desktop developer mode but what does that mean for a
18:46 workspace that’s integrated with Git like I suppose your hardcore developers would only be the ones driving those workspaces across the ecosystem okay it makes sense but I’m just saying like when these features are still visible to normal business users they’re going to see these things called git and they’re like I don’t even know what that is they’re gonna go code oh my gosh what is this repository no no yeah how about publish that sounds good exactly right oh Greg
19:17 you’re fine once again power bi Pro means undeserving features actual professionals need to use yes oh it’s funny anyways happy to have it yeah I think this is a great add I’m looking forward to playing with it I will note that if you’re going to use the power bi developer mode you need to turn it on in your preview features so there’s a setting inside your preview features you need to turn it on when you turn on the developer mode you can now save your power bi files as a pbip folder structure and then it will save
19:49 your project so and you also have to so both things we talked about as openers the visual you have to turn on as well it’s a preview feature the visuals aren’t on by default it’s not well it is so the new visual you just download desktop that preview feature comes on by default but you could turn it off if you don’t like it which it’d be crazy not to yeah I’m surprised we didn’t talk about that more that was such a bombshell obviously we’re all the nerdy tech guys so we
20:19 had to talk about git but this card and because this is like the first core visual I think that they’re really releasing and that’s what I’ve been seeing that this is the first of many enhancements that they’re doing part of what I believe is Miguel Myers right yeah Miguel Myers yeah so this is all part of the core visual PBI core visuals yeah so as cool as git is and I just tested this on a non-fabric workspace
20:49 if you connect an existing workspace to git that you didn’t do anything in power bi desktop it does extract the full each data set and report into two folders no
21:03 pbip file see that’s no difference there so but as a pbir so what is the pbir then so the pbip is a full like a project based thing correct pbir can also open up power bi desktop that’s just the report right connected to but the pbip is really not what we’re used to where it’s like they’re connected so you don’t need a pbip interesting I’ll have to think about that more I
21:33 don’t I haven’t played with the pbir files oh stupid now I need to make another hat I was trying to save you from that that was my only pushback on pbip [Laughter] don’t do it oh man now we need another hat so all the pbips there’s a good question I want to talk about okay came up in the chat yeah let’s talk about it from Alex he asks has it been too many features all at once I don’t think you get away from that
22:04 from the standpoint that if you’re relaunching how you do analytics right like there’s just a lot to consume which is obviously why we’re talking about a lot of changes but I do find it an interesting question because it depends on what is the purpose of the question is it too much for us as technologists no like and I think the feedback in the chat is like Brandon bring it on we’ll figure it out we’ll sort out this new ecosystem
we’ll see how it pans and plays and how we integrate into our systems but if the intent is that you just threw a whole bunch of features or we now have a singular analytics package that business users are supposed to leverage is it too much I don’t know I don’t think honestly in this scenario I don’t think you can get away from all the feature releasing that’s occurring right now because there’s a huge fundamental shift that just happened and
23:07 I think that main shift in technology stack has enabled so many more use cases now granted this is something that they’ve probably been working on for what years right someone behind the scenes has been trying to figure out how to move things over to Delta tables for sure months like so ease of use experience yeah so I think my guess here is like typically when we see features being built if you think about Microsoft’s development cycles there’s probably a handful of things that are quick wins right hey and
23:37 then there’s these longer pull movement items that are just more massive in nature to just reshift or retool or redesign a lot of the core code and to get to those major milestones it’s one a longer development cycle and you need more people on it and it felt like the beginning before build was like a lot of skinny features were coming out right like this is just we’re just fixing bugs making things a bit more solid and then build comes and like we get all these new features and everything changes and our whole ecosystem is different now but
24:08 that’s because they’ve been building this for like a year two years or however long it’s been tell me what are your thoughts but at the same time like yes we’ve been so used to major updates being in a sense the same journey or the same path of what we normally do like oh well there’s drill through or look what they’re doing for the enterprise this makes sense it’s been on the same journey of I’m a business analyst or I’m a report developer and I’m just expanding that what they’re doing now with Git which I
24:39 would probably say the majority of power bi developers they’re probably not in git or where they wouldn’t have a reason to right you may naturally find yourself learning about it with the different repos and like Powershell and Python packages because there’s some code but by no means up until now have you ever needed to know what git was to be successful at Power bi and then everything now with fabric this is not just
25:09 the growth story or the growing up story this is literally like shaking all the dice or all the marbles of the skills and what path they’re going to be and just shaking it around right now and it is dizzying so it’s not just oh look at this new feature in power bi desktop and another way to share it’s literally expanded and blurred the lines of other personas other professionals and where do they stand I think that’s the hardest part here
25:39 I find it good I was gonna say I like Alex’s second comment there talking about organizations are slow moving right this is a lot of new stuff all at once and thinking about these new features that we’re introducing is going to take maybe a year maybe months or years of time for organizations to really wrap their head around what just occurred and be able to integrate it into their actual workflows because we’re in the mix we
26:09 understand the problems we’ve already been professional developers but now a new user showing up to power bi for the first time or organizations who already have a wider range of skills around power bi today yeah they need these features they’re very relevant but now there’s a whole layer of education that needs to be applied into the power bi ecosystem that we’ve never really had to deal with before we’ve been building reports working and modeling and that’s about it but now we’ve got these two other new personas around data engineering and data scientists pipelines this whole data warehouse and lake house management
26:40 system what does all that do and so now organizations need to figure out where does this new ecosystem fit inside their existing infrastructure none of the challenges these problems are solving are new I would say right none of the challenges organizations have around version control are different it is the same stuff that every organization has struggled with for everything they’ve ever built right we’re now just incorporating a lot more of the stuff closer and easier to use for the power bi user
anyway you say that but for the developer yes my concern is that when we look at the tooling prefabric what was the ecosystem power bi yeah right it’s the business interaction and the virality of a dynamic tool that is allowing business people to get an insane amount of value
27:42 out of data from connecting data from multiple systems doing their own thing doing their own transformations doing their own reports which creates problems for bi and enterprise whatever but it solves a ton of problems like I think my problem is this the day before build it was power bi and the day after everything was just renamed fabric right yeah like where was the transition for business users who tried to go find
28:13 power bi community and it’s fabric now and now everything’s Fabric and like already I’ve had conversations where people are like where’s all the power bi stuff like where did it go because that was my world and it’s fantastic that like hey not only are we taking that world that is just part of this amazing ecosystem that we’re now enabling you to use that’s the other part I would have loved to see more of which is what we’re talking about which is that transition
from hey business user or people who are going to make this a viral thing within an organization and drive this fabric ecosystem yeah instead of just power bi like here’s all the other options this is how your world just exploded into like a million possibilities yes and scaling and solving your problems yes and I would have loved to have seen more of that come out with the announcement of new tech and I think Microsoft repetitively
29:14 does this poorly from a marketing standpoint of like yeah big transitions of this and bringing people along for the ride because if you think about subscriptions and subscription service like you just ostracized all of the value of power bi in that community and like how excited people were because you just rebranded the whole thing and you didn’t bring those people along for the ride that was my concern although I wanted to lean into the
29:44 question a little bit we’re not going to change the podcast to fabric.tips right so like power bi always so it’s out but I guess when you think about organizations and their adoption of new technologies it’s always way slower yes always so you just drew a hard line and like I think there’s a lot of confusion in the market going like wait wait wait where’s this thing we probably just adopted or rolled out into our ecosystem
30:14 and now it’s fabric like what are we supposed to do so there’s that and then like over the long term does that mean you’re not going to get the same adoption or upswing in fabric becoming the new thing within organizations no but like the lead time on that stuff is usually longer so my two points because now you have yeah preview features on top of a production tool yep right that is one of the most popular reporting and data tools out there I
30:45 reporting and data tools out there I agree now part of preview but not preview but not part of it but we’re going to rename it and then and then we’re going to tell you where it’s going which is like co-pilot all these new features which aren’t part of the preview aren’t part of the tool yet but we’re telling you about them and we’re demoing them it’s like okay I’m excited I’m confused and where did my tool go all that’s exactly that’s a really good point yeah yes exactly right it’s I’m yes agreed and I think this is what I think this is the nature of Alex’s question is right there’s so
31:16 of Alex’s question is right there’s so much going on where do we focus our attention now what is what is going to drive the largest impact in my organization now we got to figure that out right so in the question of is it coming out too fast I think all of those changes at the same time have created a lot of lot of disruption in in confusion in the
31:36 But we'll forge ahead. Speaking of documentation and how you want to use Fabric, we can finally get to our main topic. It's a fitting transition, because we're not going to pick Fabric apart even more — as of two months ago we had just the Power BI datamart, and now the warehouse and the lakehouse are rolled into the Fabric ecosystem as well.

32:06 So let's jump into today's topic: another article, which we'll put in the chat window — the Microsoft Fabric decision guide, data warehouse or lakehouse. Which is funny, because the title only says warehouse or lakehouse, yet the article actually covers the Power BI datamart as well. So we're not deciding on the datamart; we're only deciding between the lakehouse and the data warehouse.

32:38 Right out of the gate I'm a little confused by this article. I'll put the link in the chat window; it's also in the description below the video. At a high level, the article outlines the three tools — a data warehouse, a lakehouse, and a Power BI datamart — and walks through a series of features each of them has or doesn't have: what's the data volume, what type of data can you use, how is the data organized,

33:09 what write operations can you use (T-SQL, Spark, or dataflows), and so on. Then there are features in here I had no clue about — I had to do some Googling this morning on things like multi-table transactions. It covers security, then how you query across different items inside the lake, and then — like our last article — it walks through different scenarios and personas. We'll see how

33:39 far we get through all of this; we had a really long intro and we're only now starting the article halfway in. So let's start with reactions: what do you think about this new documentation around how to decide what to use? Seth, you go first — I've been talking a lot already.
34:09 What's interesting to me is that, in the context of Fabric, these are just interfaces into the same storage location, which is Delta parquet — we're storing data the same way. To your point, it is interesting that data warehouses support something lakehouses don't: multi-table transactions. Somebody in the chat can correct me if I'm wrong, but I think that relates to the ability of a warehouse to commit a

34:39 particular process atomically. If I have a chain of sequential steps that all have to succeed in order for a final fact or dimension table to be created, and any one of those steps fails, the warehouse will unwind the whole thing — as opposed to a lakehouse, where transactions and commits are happening on individual tables all the time, right away.
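The "unwind" behavior described here can be sketched outside of Fabric. Below is a minimal Python illustration using sqlite3 as a stand-in engine — the table names and the failing step are invented for the example. The point is only the semantics: one transaction spans both tables, so a failure anywhere in the chain rolls back every write, which is what the warehouse's multi-table transactions provide and a lakehouse's per-table Delta commits do not.

```python
import sqlite3

# Illustrative only: sqlite3 stands in for the warehouse engine to show the
# *concept* of a multi-table transaction -- either every table change commits,
# or a failure unwinds all of them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE fact_sales (customer_id INTEGER, amount REAL)")

def load_batch(conn, customer, sales, fail=False):
    try:
        with conn:  # one transaction spanning both tables
            conn.execute("INSERT INTO dim_customer VALUES (?, ?)", customer)
            conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", sales)
            if fail:
                raise RuntimeError("downstream step failed")
    except RuntimeError:
        pass  # the whole batch was rolled back, both tables untouched

load_batch(conn, (1, "Contoso"), [(1, 250.0)])             # commits both tables
load_batch(conn, (2, "Fabrikam"), [(2, 99.0)], fail=True)  # unwinds both

print(conn.execute("SELECT COUNT(*) FROM dim_customer").fetchone()[0])  # 1
print(conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0])    # 1
```

In a lakehouse, by contrast, each Delta table commits independently, so the dimension insert could land even though the fact load failed — that is the gap being discussed.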
35:11 I think the guide is good, because traditional BI uses data warehousing, whereas a lot of newer architectures deal with unstructured data — and to me that's the big differentiator here. If you have flat tables coming from transactional systems, a data warehouse makes sense: it's all SQL, so why complicate things with a lakehouse unless you need additional kinds of

35:42 data transformation or machine learning on top of datasets that you just can't handle in the warehouse itself? So having both, and moving back and forth between them — which has been demoed really well — makes the most sense to me, because both of those tools sit in the enterprise space. Tommy's the datamart guy, so I'll let him cover that side — the business-person aspect of storing data and sorting things out. But yeah, this guide makes

36:12 more sense to me as a progression of where to put things: easy, the datamart; intermediate, the data warehouse; harder or more complex, the lakehouse — because the lakehouse supports multiple languages, multiple use cases, and unstructured data in that ecosystem.

36:44 Do you think this one makes more sense than the data ingestion guide? Yeah — maybe that's just because I'm familiar with it all, or maybe because we're automatically eliminating one of the options right off the bat. A few things: I was really trying to dive into the roles here, and the biggest thing that makes this easier is that the roles are more defined. I'd agree the scenarios feel a lot more realistic in this one than they did

37:14 in the prior article, which felt a bit more contrived — it used the same wording and the same personas without really explaining where each one fits. "I've done things in pipelines, so I'm a data engineer"? Not necessarily. This one is a lot more defined. And let's talk about the elephant in the room, the datamart — I know they're really trying to push the product here.
37:46 They've added it into the Fabric documentation — well, you have to, because it's part of Fabric. But look at the feature richness of the three items: the data warehouse, the lakehouse, and the Power BI datamart. If you go down line by line, the warehouse and the lakehouse have almost exactly the same feature set — there are very few differences — and I'd even debate some of their claims. The primary developer skill set for a data

38:17 warehouse, yes, is SQL. For a lakehouse they say it's Spark, but I don't think they emphasize the Spark SQL interface enough. Seth, you're a SQL developer — when you started looking at Spark, was it a really hard transition to go from a straight SQL developer to a Spark SQL developer? No — conceptually, the biggest difference is just understanding the back end of what

38:48 you're interfacing with. That's the gotcha part. But if you know SQL — and that's the other part of the lakehouse — you absolutely can do all of the things you'd need to using Spark SQL, with only slight differences. So when I look down this list, I'm going: okay, there's really no downside to just moving over to a lakehouse, with maybe a slight skill-set difference.
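The "slight differences" claim is easy to make concrete. Below is a toy Python illustration — emphatically not a real migration tool, and the query is invented — of the kind of mechanical swaps a T-SQL developer hits in Spark SQL: `TOP n` becomes a trailing `LIMIT n`, `GETDATE()` becomes `current_timestamp()`, and `ISNULL` maps onto `coalesce`. The bulk of a query typically survives untouched.

```python
import re

# A toy sketch (not a real translator) of the "slight differences" between
# T-SQL and Spark SQL: most of the query survives untouched, and a few idioms
# need a mechanical swap.
TSQL_TO_SPARK = [
    (r"SELECT\s+TOP\s+\d+\s", "SELECT "),       # TOP n -> trailing LIMIT n
    (r"\bGETDATE\(\)", "current_timestamp()"),  # current time function
    (r"\bISNULL\(", "coalesce("),               # null-replacement function
]

def to_spark_sql(tsql: str) -> str:
    top = re.search(r"TOP\s+(\d+)", tsql, re.IGNORECASE)
    out = tsql
    for pattern, repl in TSQL_TO_SPARK:
        out = re.sub(pattern, repl, out, flags=re.IGNORECASE)
    if top:  # TOP n moves to the end of the query as LIMIT n
        out = out.rstrip().rstrip(";") + f" LIMIT {top.group(1)}"
    return out

print(to_spark_sql("SELECT TOP 10 id, ISNULL(name, 'n/a') FROM dim_customer;"))
# -> SELECT id, coalesce(name, 'n/a') FROM dim_customer LIMIT 10
```

A real migration has more corner cases than this, of course — the sketch is only meant to show that the dialect gap is largely mechanical rather than conceptual.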
39:18 The skill set you'd keep in the data warehouse is anything around stored procedures — that's the only thing you'd get there that I don't see in the lakehouse. And then there's this odd feature, multi-table transactions, that they're really pushing here for whatever reason. There must be something to it, but I've never used it or even really heard of it — is that going to be a major deciding factor between my data warehouse and my lakehouse? I'm not sure. The other part I'd look at is cost. In my experience, a data warehouse

39:48 is always more expensive than a lakehouse build, and for the amount of data you're pushing through — even though it says "unlimited" — I think you'll pay more per gigabyte in a warehouse than in a lakehouse. So let's talk about that, because we're obviously talking about Azure Synapse as the underpinning of the warehouse, and you've implemented more warehouse solutions than I have.

40:19 When I look at a lakehouse, I pay next to nothing for storage and a whole lot for compute — which is variable based on how well you design your system, and which can be tweaked, performance-tuned, fine-tuned, and all that. I will say getting there is a challenge. And in Fabric, do you even have that fine-grained tooling, or an understanding of when you're

40:49 doing things way off the rails? That opens up another issue: our first lakehouse solutions were extremely expensive, and it's only after refinement that you understand the ecosystem well enough to not spend so much. How do you curb that in a software-as-a-service? That would be a challenge, yes — in both processing and storage. But to go into your point a bit more: it wasn't

41:20 all workloads that were expensive, it was certain workloads — the ones doing very chatty things between the engine and the lake, or needing a lot of data. Instead of doing a SELECT * over everything, you had to pick the least amount of data and be smarter about only handling the data that actually changed, rather than reading an entire table and writing it all back down again. So my thought is: how you design these systems matters.
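The "only touch what changed" point can be sketched in a few lines, again with sqlite3 standing in for the engine (the table and column names are invented). Rather than reading the whole table and writing it all back — the chatty SELECT * pattern described as expensive above — the load merges just the handful of rows that actually changed, the same shape as a Delta MERGE in a lakehouse or warehouse.

```python
import sqlite3

# Sketch: incremental upsert vs full-table rewrite. sqlite3 is a stand-in;
# the cost argument is about how much data the engine has to move.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO target VALUES (?, ?)",
    [(i, 100.0) for i in range(1, 1001)],  # 1,000 existing rows
)

changed = [(7, 250.0), (42, 99.0), (1001, 10.0)]  # 2 updates + 1 new row

# Upsert: only these 3 rows are written; the other 998 are never read or
# rewritten, unlike a SELECT * -> transform -> overwrite pattern.
conn.executemany("INSERT OR REPLACE INTO target VALUES (?, ?)", changed)
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM target").fetchone()[0])          # 1001
print(conn.execute("SELECT amount FROM target WHERE id = 7").fetchone())  # (250.0,)
```

The design lesson from the conversation is exactly this shape: identify the changed slice, merge it, and leave the rest of the table alone.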
41:50 And the monitoring of which workloads run in the system will become increasingly important with a shared service. It's a platform as a service now, and across my organization I need to identify where things are being used inappropriately —

42:08 and I don't know that we're going to get that reporting right away. One of my open questions: depending on whether you're using a data warehouse or a lakehouse, is the compute behind it different — the CPU, the capacity — or does it just figure itself out automatically? If I think about Databricks, there's a whole slew of cluster types you can pick depending on your workload; they're wide and varied, and selecting the right one is already a cost savings.

42:40 Is Fabric auto-selecting the cluster for me? Or are we all just getting the equivalent of a general-purpose D-series, paying a little more because the platform can't tell what workload we're running? That's a great question, and I don't think there's an easy answer. From what I gather right now, in the settings of a workspace, under data science and data engineering, there's an entire section where you can pick what Spark

43:10 compute you want. So you do select your Spark compute, and you can create a pool to use. And this is where I'm not quite sure I understand what's going on: in Databricks, a single notebook can run on a very specific cluster — you say "this notebook runs on this cluster." In Fabric, it sounds like the entire workspace gets a type of cluster, so you can't really pick per notebook. Within one workspace

43:40 you may have different loading or data engineering activities that each call for a different type of cluster, down to the level of a single notebook. It looks like Microsoft has tried to make cluster selection and sizing very simple: you basically pick small, medium, or large for the different Spark pools, and that's what you get.
44:12 I feel like they've removed some of the developer-level controls from the Spark engine, which will make it harder for us to manage — you get what you get and hope it optimizes itself. And this is where the further digging into workspaces comes in. The confusing part for me was: "hey, Power BI users, you're used to workspaces — that's what Fabric uses," which in my head automatically equates workspaces with reports. In this ecosystem maybe that's

44:44 not the case. Maybe I have engineering workspaces; maybe I have a data science and ML workspace; and the reporting workspaces hold only reporting. The artifacts we create should all land in OneLake anyway, so it's just a matter of opening visibility, or permissions, to the other workspaces doing the reporting or putting the data models together. Yeah — this hierarchy of how you segment teams and

45:14 workloads in this ecosystem breaks the original concept of workspaces. And that's part of the problem with all of this, and part of the initial question from the chat about too many updates: it's not just the updates — we've added so many more entities, not just tools but objects and organization, at the same level of hierarchy

45:44 we were used to. All of this is going to live in a workspace. Domains are more about security — they don't really group anything; they govern what someone can see. But it's not just security, because to Mike's point, if you're going to provision a very large cluster in a workspace, that's not for business users — that's locking down a resource, and the expenses it

46:14 can incur, for certain workloads. My thought being: if I look at an entire workspace and 90% of my workloads are small quick wins — easy loading, small tables, no big deal — but there's that 10% of workloads that read a massive table, write millions of records, do a heavy data engineering load, or run a data science project, then the cluster needs to be designed for that use case.
46:46 So now I have two kinds of workloads in the same workspace that may not align with how I want the cluster to run. I almost feel like, inside the workspace, I should be able to define, say, three different cluster types, then attach each notebook to a cluster on a per-notebook basis, execute, and move on — or at least designate clusters for those different elements. I don't know. It's going to be

47:16 interesting to see how people actually solve this, because there are probably a million different ways to think about optimizing, tuning, and ripping cost out of this stuff. But at the same time, the structure of those things should stay in line with the familiar places for folks. The nice thing about Fabric in general is that by combining all of this together — as Greg outlined in the chat — these are

47:46 doors into the same storage mechanism; it's all going to the same place. Correct. It's the familiarity for people who know data warehousing and SQL that you're bringing along for the ride. I think the significant difference is more for the datamart and data warehouse people, in the underlying storage; for lakehouse people, this is exactly what we've been doing the whole time — there is no change. It's just bringing these other

48:17 platforms into an ecosystem where it no longer matters. There's your path for how you like to manage, manipulate, and create curated datasets; the interaction point for every reporting user is still going to be the Delta parquet tables, with permissions on the artifacts that come out of these ecosystems — the things that are the value to the company. So it's almost like: I don't care which way

48:47 you want to do all the data stuff — here it all is together, different ways to use the features and functionality of toolsets that existed previously. The end output, though, is what's uniquely different: it's all in OneLake as Delta parquet, so it's just a matter of who has visibility into the object and when to use it — which is game-changing. Yeah, I agree with that.
49:20 Well, I wanted to ask: if you had your pick of these three options based on what Microsoft describes in the article — first choice, second choice, third choice — and you're a developer today looking at this world, not knowing which one to pick, how would you choose? Set aside where you're comfortable right now; let's assume that whatever technology you pick is the one you'll learn — you'll spend the time, figure it out, and make it work.

49:50 So how would you rank data warehouse, lakehouse, and datamart — first, second, and third — for the vast majority of users and companies? Just general design: you're coming into a company and recommending something, skills aside. I would argue that many of the lakehouse

50:20 design decisions and concepts are very transferable for a SQL developer — they'd feel comfortable there. They'd need to learn some things, but not so much that they'd be totally lost. But learning aside, with the technology stack the way it is right now, which of these would you choose first, second, and third? Tommy would definitely pick the datamart first — always — you can do everything in the datamart, absolutely. Honestly, it's probably going to be the

50:51 standard answer, but datamarts don't fit in this story. They fit, I think as intended, as a querying tool; we've not seen the updates, and it doesn't do much of anything that a lakehouse or a warehouse can't. So honestly, shove it aside. I'd probably go with the lakehouse as number one, and the focus is on the lakehouse now because of the Microsoft integration — that's what works best

51:22 with Fabric if I'm buying all-in, so to speak — putting all the birds... whatever the saying is: two birds with one stone, all the eggs in one basket... all the eggs at the farm? I've got birds. Birds and eggs in a basket, yeah. Okay — if I'm going to put all my eggs in the Fabric basket, then I'm obviously going with the lakehouse. Tommy, what's your second choice?
51:53 I'm still choosing between the lakehouse and the warehouse. If I'm only doing a few parts of the integration with Fabric — not the whole kit and caboodle — I'm probably still going with the warehouse. It's what I know, it's what I'm familiar with. The lakehouse plays best end-to-end inside Fabric, but we don't yet know how well it plays when there are other

52:24 systems and integrations outside of Fabric; right now everything is that end-to-end solution. So if I'm not going all-in, I'd probably still stay with the warehouse.

52:36 Seth, where do you fit? So Tommy, you're basically saying lakehouse, then data warehouse, then datamart — that's your order of precedence? Before I answer: there has to be some difference in the CPU, or the cores, or how they're processing data, because — and Greg calls us out every single time we talk about data sizes — the specific difference

53:06 the documentation draws between the warehouse and the lakehouse is data size. So if I'm going to hit challenges in the data warehouse even though everything is stored in Delta parquet, I'm a little confused by that. But that aside: I think the vast majority of companies are going to start with a data warehouse. I'll preface this — I'm a lakehouse guy; everything we do is in a lakehouse. But I also have massive volumes of data, and I love that ecosystem. Do I think the majority of companies need to go there?
53:36 No. In fact, I'd invert the ranking: data warehouse first and lakehouse second, because the vast majority of companies don't have data that large, and realistically it's about shaping data to get the maximum use out of it in reporting tools and putting business logic behind it. If I'm already working with flattened tables in SQL Server, why would I need a lakehouse — unless I'm going to incorporate

54:07 data science and ML? And even that can be its own separate thing in the Fabric ecosystem — just spool it up — because it's not like I have to move data into the lakehouse the way other tooling makes me set it up, structure it, and keep pushing data around between systems. If the data is flat, most users are absolutely familiar with SQL interfaces; that's probably

54:37 where the vast majority of companies land. Price-agnostic — I'm not talking about pricing right now — the toolsets and capabilities that hit the widest audience are absolutely SQL and those sorts of user interfaces. The lakehouse changes the paradigm a bit: you have to get familiar with Jupyter notebooks, you have to understand clusters and how they operate against workloads; it's

55:07 different in the objects you can create and the transformations within them — it's just more to learn. And if there's no need from a data-size perspective, and you're not doing advanced data science, then a lot of this ecosystem removes some of the benefits we would have argued were part of a lakehouse, because of how everything is stored as Delta parquet in OneLake. So based on the table Microsoft provided, I'd be in Tommy's camp: lakehouse, data

55:38 warehouse, then datamart. What diagram? The table Microsoft provides in the article. From the features they list there, I'd definitely pick lakehouse over data warehouse. The only advantage I can see for the warehouse is object-level security — tables, views, stored procedures — if your company already feels it needs that. And I'd also argue, in my opinion, that most companies need to get away from stored procedures. It's just an
56:08 old thing. You can use them if you're comfortable with them and you've built a lot of them — fine — but I think the world is moving away from those elements. And the fact that the warehouse is still writing Delta tables down to the lake, and can read from those Delta tables, gives me one more reason to move over to the lakehouse. What's the justification behind that, though? What's the stored-procedure

56:39 equivalent? I just don't like them — I don't like building them. They work, but I'd rather have a pipeline execute things the way a stored procedure would; pipeline activities do that orchestration with, I'd argue, a higher degree of capability. When I look at a stored procedure, it does a specific set of things to a piece of data, but only local to that SQL Server.

57:09 Yes, you could call a stored procedure to run some other process, or notify someone outside that SQL Server's ecosystem, but when I really think about what I want capability-wise, that's pipelines. Sure — but in this ecosystem that argument doesn't fully apply, because if I have access to the underlying dataset — a Delta parquet file — then whatever gets created in the lakehouse, I can manipulate through the warehouse, including through a stored procedure. That's why I say it changes the dichotomy: we're not locked into a single SQL Server anymore.
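The "pipelines over stored procedures" argument boils down to orchestration: chain named activities, stop the chain on failure, and let every step work against the same shared storage. Here's a hedged Python sketch of that shape — the activity names and the shared-state dict are invented for illustration; a real Fabric pipeline would chain notebook, copy, and script activities rather than Python callables.

```python
from typing import Callable

# Sketch of the orchestration argument: what a stored procedure chains
# together inside one SQL engine, a pipeline chains across the whole
# ecosystem -- each step reading and writing the same shared storage.
def run_pipeline(steps: list[tuple[str, Callable[[dict], None]]]) -> dict:
    """Run named activities in order against shared state; stop on failure."""
    state: dict = {"log": []}
    for name, activity in steps:
        try:
            activity(state)
            state["log"].append((name, "Succeeded"))
        except Exception as exc:
            state["log"].append((name, f"Failed: {exc}"))
            break  # downstream activities never run, like a failed proc step
    return state

# Hypothetical activities standing in for a notebook, a SQL script, a refresh.
state = run_pipeline([
    ("ingest_notebook", lambda s: s.update(rows=3)),
    ("transform_sql", lambda s: s.update(rows=s["rows"] * 2)),
    ("refresh_model", lambda s: None),
])
print(state["rows"], [name for name, status in state["log"]])
# -> 6 ['ingest_notebook', 'transform_sql', 'refresh_model']
```

The counterpoint in the conversation still stands, though: because both engines see the same Delta parquet, the step bodies could just as well be stored procedures — the disagreement is about which orchestration layer to prefer, not about what the storage allows.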
57:39 You're talking about an ecosystem where the storage and the artifacts are the same regardless of which door you open — and that's why it changes for me. I can see that. I'm not saying you're wrong; I'm just saying I wouldn't choose that direction. Well, we're just going to disagree on this one, because I really feel the lakehouse adds more flexibility and capability long term, and I think it's also more effective. The fact that Microsoft is already pushing many of its tools toward this Delta Lake

58:09 set of elements — and I think this is cost-agnostic. I know, I know — and I was also making a note here about being learning-agnostic: if we just said "we're going to learn it and move forward with it," I think the future of the technology is going to sit around the lakehouse. There will be more features coming to the lakehouse that make it easier for users. The linchpin, the really big deciding factor for me, is the lakehouse itself:

58:39 the more they abstract it away from the business user, and the easier they make it for a citizen developer to use and execute on top of, the easier it becomes to adopt a lakehouse. So I think the citizen developer — someone who maybe started with datamarts, working with a dataflow, moving some tables around, trying to make a table — will clear the barrier to entry and feel more comfortable inside a lakehouse.

59:09 Again, talking about that grow-up story: what does the graduation story look like inside an organization? If you're comfortable in dataflows, you'll be more comfortable in the lakehouse architecture and structures than you would be in SQL Server. But that's ETL, not the lakehouse — you're talking about pipeline activities and Dataflow Gen2. And doesn't the data warehouse already incorporate pipeline activities with stored procedures? Running a stored procedure would be the equivalent of — I suppose — orchestrating data. You would

59:39 have notebooks in the lakehouse that you'd execute through pipelines. Correct — that's what I'm thinking. To me it splits here: stored procedures make less sense to me than creating a notebook that modifies data. I agree with you — like I said, I promise, I like the lakehouse and I'd recommend the lakehouse — but I think the vast majority of people are going to be much more comfortable in data warehouse land, because it's very SQL-specific. However —
60:10 however if you’re going to say it’s going to cost me 2x than the lake house then then yeah go to the lake house yeah exactly exactly the cost makes a big decision on that part yeah I would agree with that one all right with that we’ve burned through a perfectly good hour we’ve been arguing a lot a heated discussion around this I lot a heated discussion around this again like always we didn’t even mean again like always we didn’t even get to the scenarios or the different users here Susan Rob and Ash as a citizen developer a data engineer and a professional developer maybe we’ll save that conversation for a future statement there so there so I feel like I’ve hit a chord with some of the SQL developers in our crowd
60:40 around stored procedures. I just don't like using them; I want to move on to other stuff, so sorry, it's my personal preference. Excellent. Well, thank you very much for listening to the podcast. I hope you found some value in this one, and I hope it helps you think a little more about the different integrations between the data warehouse and the lakehouse. I think we're in the same boat as everyone else: we're trying to figure out how Fabric will impact our normal workflows,
61:12 what this will mean for us moving forward, and how Fabric will influence or impact what we build. Just some great thoughts today. We appreciate all the comments in the chat; thank you so much for contributing to the conversation. Our only ask is this: please share the podcast with somebody else if you found it valuable. And if it made you really confused about why stored procedures, or why not the lakehouse, feel free to share it with other people anyway.
61:42 Let them be confused too, along with you, and we can all figure this out together. And yeah, exactly, Mark says his head hurts even more. You're welcome, Mark; I'm happy to give you an early-morning headache before you start your day. Tommy, where else can you find the podcast? You can find the podcast anywhere it's available: Google, Apple, Spotify. Make sure to check us out. And if you want us to talk about a particular thing in Fabric, go to powerbi.tips slash the podcast and leave us a message. We do look at those.
62:13 And yes, I actually have a report on that, just to see; I'm hoping to get the numbers up a little more. But anyway, surprise me. Then I can actually put in an automation system, because I haven't needed one yet. Just make sure you stick it in the lakehouse and don't use a stored procedure, and we'll be happy. [Laughter]
62:43 Maybe you can do it, Tommy, and then tell us how bad it is, and we can figure out later what we don't like about it. Does it do anything? It doesn't do anything. We'll see you next time. Thank you very much; we'll catch you later.
Thank You
Thanks for listening to the Explicit Measures Podcast. If you found this episode helpful, share it with a teammate and subscribe so you don’t miss the next one.
