Load all the data in OneLake – Ep. 250
OneLake is pitched as the “OneDrive for data”—a single logical place to store analytics data across your organization. In Episode 250, the crew debates what that actually means in practice: how many lakehouses you should create, how much data is too much, and where security and ownership need to be nailed down before you start throwing everything into the same lake.
News & Announcements
- Heads up: Power BI OneDrive and SharePoint report viewing will be on by default starting in October — Microsoft is switching the OneDrive/SharePoint Power BI file viewer preview to on by default for tenants in early October 2023. If your org isn’t ready for that experience, there’s a specific admin setting and deadline to opt out beforehand—and you can still disable it after rollout.
- microsoft/Fabric-Readiness — A handy GitHub repo of reusable Fabric slide decks (with speaker notes) intended for user groups, conferences, and internal enablement. If you’re trying to get a team aligned on Fabric concepts, this is a solid starter kit.
- Dataset Refresh History Enhancements — The Refresh History page now exposes more detail per refresh (including successful runs), making it easier to see retries, step-level timing, and what actually caused long refresh durations. It’s a practical improvement for troubleshooting refresh reliability and performance.
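The crew's main use case for this is spotting silent retries that double a refresh's wall-clock time. As a rough sketch of how you might summarize that history yourself: the payload shape below (including the `refreshAttempts` field) is modeled on the Power BI REST API's refresh-history response as I understand it and should be treated as an assumption, and the sample payload is invented for illustration.

```python
from datetime import datetime

def summarize_refreshes(payload):
    """Summarize a refresh-history payload (shape assumed from the
    Power BI REST API's GET .../datasets/{id}/refreshes response)."""
    def minutes(start, end):
        fmt = "%Y-%m-%dT%H:%M:%S"
        return (datetime.strptime(end[:19], fmt)
                - datetime.strptime(start[:19], fmt)).total_seconds() / 60

    out = []
    for r in payload.get("value", []):
        attempts = r.get("refreshAttempts", [])
        out.append({
            "status": r["status"],
            "refreshType": r.get("refreshType"),
            "durationMin": round(minutes(r["startTime"], r["endTime"]), 1),
            # Retries show up as extra attempts; assume at least one.
            "attempts": len(attempts) or 1,
        })
    return out

# Illustrative payload only -- not captured from a real tenant.
sample = {
    "value": [
        {
            "status": "Completed",
            "refreshType": "Scheduled",
            "startTime": "2023-09-01T06:00:00Z",
            "endTime": "2023-09-01T08:06:00Z",
            "refreshAttempts": [
                {"attemptId": 1, "type": "Data"},
                {"attemptId": 2, "type": "Data"},  # first attempt failed, was retried
            ],
        }
    ]
}

print(summarize_refreshes(sample))
```

A two-hour "Completed" refresh that actually needed two attempts surfaces here as `attempts: 2`, which is exactly the kind of signal that used to be buried.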
Main Discussion
The core debate is scope: should OneLake be treated like a single, giant lakehouse, or is OneLake the umbrella and lakehouses are the real curation units? The team uses familiar analogies (SQL Server vs. databases, or a skyscraper vs. rooms) to reason about where data should live—and why governance and security need to lead the architecture, not follow it.
Key takeaways:
- OneLake is best thought of as the organization-wide container (the umbrella), while Fabric lakehouses are the domain/project buckets where curated tables and files live.
- Don’t confuse “one place” with “everyone can see everything”: the more you centralize storage, the more important role-based access and clear ownership become.
- If you’re coming from Databricks/Snowflake, expect a terminology shift—under the covers you’re still dealing with familiar storage concepts (Delta/Parquet) even if Fabric wraps them in new artifacts.
- You’ll likely want multiple lakehouses (by domain, environment, or delivery boundary), but you still need a plan for shared objects like “dim customer” so every team doesn’t reinvent it.
- Shortcuts/direct linking can reduce duplicate copies of data, but reuse only works when the “source of truth” tables have an accountable owner and stable definitions.
- A medallion-style pipeline (bronze → silver → gold) still applies—what changes is where you land it and how you expose it (lakehouse tables feeding semantic models/reports).
- Treat semantic models (datasets) as first-class artifacts: think through whether they should bind to one lakehouse or span several, and how you’ll promote them through Dev/Test/Prod.
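The shared-table and medallion points above can be made concrete with a minimal bronze → silver step for a shared "dim customer". This is plain Python rather than the PySpark/Delta you would typically use in a Fabric notebook, and all table and column names are invented for illustration; the point is the shape of the transform — normalize keys, dedupe, keep one accountable version per customer.

```python
def to_silver(bronze_rows):
    """Conform raw customer rows: normalize keys and keep only the
    latest record per customer_id (last-write-wins on updated_at)."""
    latest = {}
    for row in bronze_rows:
        key = str(row["customer_id"]).strip()
        cleaned = {
            "customer_id": key,
            "name": row["name"].strip().title(),
            "updated_at": row["updated_at"],
        }
        # One owned definition per key: later updates replace earlier ones.
        if key not in latest or cleaned["updated_at"] > latest[key]["updated_at"]:
            latest[key] = cleaned
    return sorted(latest.values(), key=lambda r: r["customer_id"])

# Raw landing-zone rows with inconsistent keys and casing (illustrative).
bronze = [
    {"customer_id": " 42", "name": "acme corp", "updated_at": "2023-08-01"},
    {"customer_id": "42", "name": "ACME Corp", "updated_at": "2023-09-01"},
    {"customer_id": "7", "name": "contoso", "updated_at": "2023-09-01"},
]

print(to_silver(bronze))
```

In a real lakehouse the silver output would be written as a Delta table that other teams reach via shortcuts instead of re-deriving their own copy — which only works if one team owns this transform.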
Looking Forward
Before you scale out Fabric, decide on your OneLake operating model—lakehouse boundaries, shared table ownership, and access patterns—so consolidation gives you reuse instead of chaos.
Episode Transcript
0:02 [Music] foreign good morning everyone welcome back to the explicit measures podcast with Tommy
0:33 the explicit measures podcast with Tommy Seth and Mike hello everyone hello gentlemen I don’t know what if I woke up early this morning or a little bit there’s a there’s a spring in my step I don’t know there’s just things going on I just feel a bit more Punchy today than normal so normal so we were already teasing Seth before the the podcast already so it’s it’s gonna be a fun day I think yeah we’re gonna have some yes we’re gonna have some words personally Punk about this conversation today but we’ll get to that
1:03 conversation today but we’ll get to that Tommy loves Lakes he likes swimming in them he likes boating in them fishing in them ever since I went to Wisconsin for a summer vacation is that not a pro is that how you say it well it’s close it’s there’s no e in the beginning of it but that’s okay Wisconsin and that that’s gonna that’s I told you I’m punching on you better be on point today I’m Gonna Get You Who asked you so who asked you oh goodness so let’s go through some
1:33 oh goodness so let’s go through some openers here there’s a lot of articles that are have appeared recently I think there’s some things we definitely want to talk about we’ve we’ve been doing a lot of things in general we’ve just been very busy so having some more live sessions here recently we’re kind live sessions here recently we’re getting back into the swing of things of getting back into the swing of things one thing I am very I’m not I’m not gonna say I’m nervous about this but I definitely want everyone to be aware this is the same level alert alert this is the same type of alerting that was that we were giving
2:03 that was that we were giving out when we were talking about fabric being introduced because there was some weird settings in the admin portal that were by default off but will be turned on at a certain point in time so be aware there is a post from the Microsoft team team the the post is literally titled heads up up RBI power bi OneDrive and SharePoint will be report viewing will be on by default starting in October so let’s quickly talk about the
2:34 so let’s quickly talk about the feature and then we’ll talk about you need to be paying attention to this and you need to have a policy or you need to figure this out for your organization do you want this on or do you not so the feature is I think if I understand this correctly go to SharePoint without going to powerbi.com be able to view a Power BI report inside SharePoint you can you can go to report click on the report file just the way you would view it you go to a SharePoint right now with Excel you can click on the Excel document the Excel document opens up in the browser and you’re now reading or
3:05 the browser and you’re now reading or viewing that Excel sheet I think this is the same thing you’re going to be able to see the report I don’t know if it’s see or edit the report but you’re going to see the report in the browser inside SharePoint thoughts gentlemen so the feature is cool and it is it’s just really viewing the desktop file but in a consumable in the consumable appearance in the UI and any report that’s available in OneDrive or SharePoint can be neat it’s actually not a
3:35 can be neat it’s actually not a terrible experience but it’s probably something you really want to plan out from an organization point of view yeah it’s definitely something you don’t want just to turn on by default let me give you what I think will happen right this is what I think will happen please I think what will happen and this is what I don’t understand yet I’ve got it figured out but if you want to track what’s being done in power bi if you want to track who’s opening what reports if you want to track what’s being used or not being used from your power bi ecosystem
4:06 used from your power bi ecosystem you would probably not want to turn this on you would probably want to leave this feature off and not let people View files from SharePoint yes they can store them there that’s totally fine just make them files and and push the teams to push these files into powerbi.com and do proper sharing right my my suspicion that’s going to happen here is everyone’s gonna be like oh this is amazing I don’t even need powerbi.com now and so now people will just show up even though they have a pro license they’ll put reports together in a SharePoint location and the team will
4:37 SharePoint location and the team will work or use the reports from the SharePoint location again this it will happen it’s not a question of if it will happen and people will not publish to power bi. com anymore and there’s gonna be this whole other workflow around how do we manage the loading of these Excel files into my power bi file from SharePoint and all this other not I think you’re going to get a lot more questions here in the near future there listen there are certain use cases that I just don’t understand and hey I it’s it takes
5:07 understand and hey I it’s it takes somebody to explain them to me and I can’t wait for somebody to do that for me for this feature right I can understand like I can hear a lot of users who are not familiar with or use power bi power bi normally requesting this feature I want to see these reports every like if if this is this is the pbix file I want to see it and that’s not the point of the pbix file right like we’ve gone through years of
5:38 right like we’ve gone through years of refining the experience of how you share content with consumers I think this is going to confuse people because if you start sharing these files around nobody’s gonna be you’re not refreshing them this isn’t live data this isn’t automated in any way like why why would we start taking the files themselves as the sharing mechanism I I just don’t like I don’t get it yes like it’s already confusing people like I can consume in an app I consume consume in
6:08 consume in an app I consume consume in teams I can embed it in SharePoint why am I embedding in SharePoint if I can read the file in SharePoint like you read the file in SharePoint like like all of this is disconnected know like all of this is disconnected and what this screams but initially this seems like to me is a whole bunch of people who use power bi wrong telling telling Microsoft they need this feature unless there’s and like I said I’m not the brightest bulb in the Box all the time I don’t know everything I
6:38 all the time I don’t know everything I could be completely wrong and there’s a great use case I just we’ve spent so much time curating this that this one just seems yeah we even spent an episode here and I’ve been trying to argue with myself on the use cases how’s that going not well so not well did you win did you win the argument isn’t the question here I never win those arguments still well the one use case like well maybe it’s a review process for the bi team if it’s in sharepoints like well no because I would want to look at the Dax exactly
7:09 would want to look at the Dax exactly right the model okay well maybe rather than that it it’s a process to go through before you actually publish it to an app it’s like well no that’s what a member that’s what a member role is where they can post a workspace but it’s not in the app yet so they to your point said yeah there was someone who said devote time to this but from any workflow or process it’s a oh neat feature but not much else and and Mike
7:40 feature but not much else and and Mike it just leads to more confusion than and then Enlightenment Enlightenment I will will say this there will be value brought from this feature it will be helpful to go to a SharePoint page and be able to just look at each of the files what I would prefer if it were my If This Were My feature I would prefer a static image of what that report looks like when it was there I don’t want to interact with it I just need to see what it is so I can know which report I’m trying to open and edit and then move forward with that so all
8:11 and then move forward with that so all this to say is it’s going to be an interesting feature I will say this if this were my tenant If This Were my team I would turn this off by default and I would only roll it out slowly after I had some substantial test time with this feature making sure that I had a fair understanding of how this would impact my environment and I would I would talk with my leaders and my center of excellence and say do we really want people using power bi assets outside of powerbi. com and for sharing things you can totally do that today you can
8:42 can totally do that today you can totally have power bi files you can push them up to SharePoint everyone could download them from a team and put them on their desktop and use Power BI Desktop to build stuff I totally understand that but I think it’s defeating the purpose of the the Enterprise solution around power bi in general and I would want to educate and or train my teams to be more equipped to be comfortable inside apps and powerbi.com to actually use the content so it’s more of the Synergy of PowerPoint power bi oh shut up I hate that I’m not even no Tommy my favorites
9:13 that I’m not even no Tommy my favorites Synergy it’s not PowerPoint for data it is data for data I’m going to reach out to Lauren on this to get get Lawrence the the author of the blog post we gotta I’ve gotta I’ve gotta understand I got questions I’ve got questions let’s let’s drift gears a little bit more a couple more items here I want to go through as we have an intro here one thing I will point out here Microsoft has just released the Microsoft fabric Readiness git repo so that’ll be I’ll put that in the link here as well so there’s a fabric Readiness repo that has been produced by
9:43 Readiness repo that has been produced by the fabric team and this is a really good repository it looks there’s about seven presentations and it’s talking about that the whole purpose is a collection of presentation Decks that are for user groups online presentations in-person conferences and customer conversations so this is like the sales deck that you’re going to use or go through and understand what are the talking points that Microsoft is using when it comes to fabric I think this is a pretty strong
10:13 a pretty strong solution here I think a lot of people are still asking particularly me I have tons of questions what is fabric how does it work where what are the different people that need to be involved with what fabric is doing so I think this is a great way of reading through these materials understanding what’s going on inside these these presentations that Microsoft is articulating things like it’s a good one using using data science and fabric using data engineering and fabric how do you use real-time analytics in fabric what does data integration look like with data
10:43 data integration look like with data Factory inside fabric just in basic information around fabric for power bi users what I’m finding is fabric is bringing again this is what I said in the past fabric is bringing very much the data engineering world into the business user Arena so I think from a business user we’re getting a lot more capability a lot more Enterprise pieces with fabric and I think I think when we talk about these things like the the way you’re describing them we should we should say
11:13 describing them we should we should say wrapped in fabric wrapped in fact data science wrapped in prep power data science to the next level of analogies I like it that’s good great good wrapped fine so mine’s good this is what I’m contributing yes exactly the plug plug one minor thing here I would also like to point out as well as far as updates there’s been a good number of updates I think this one will be helpful to people I just want to point it out the Microsoft team has now actually rui
11:43 the Microsoft team has now actually Rui Rui is on fire whatever Rui’s building at Microsoft I love I love all his things that he’s producing he’s doing the pbip format he’s now data set refresh history enhancements his blog about that one Rui is just killing it so really keep doing what you’re doing Microsoft pay Rui double and have him do more he’s just doing such a good job so he’s he’s not a sponsor not a sponsor I mean he’s not a sponsor not a sponsor so there’s the automatic refresh tracking has now been improved inside the service so you’re going to be able
12:13 the service so you’re going to be able to see to see when you refresh a data set by default the data set will try retry itself multiple times so you you’ve run into these occasions where I’m trying to load some data it’s taken two hours and then finally the data pipe fails well you don’t know what happened in those two hours what was occurring during that period of time so now there’s more details and there is it’s telling you about the automatic tracking of refreshes it’s giving you when it started and when it stopped it’s giving you a status on each of those refreshes it’s telling you what type of
12:45 refreshes it’s telling you what type of refresh was occurring like a data or query cache so there’s a lot of really interesting new features coming out of this it tells you more about the duration of information so I really like this new data research enhancement I think this is going to be helpful for debugging no 100 yeah like if you guys have spent any time in triaging refreshing issues you so this is this is great because it consolidates probably like
13:15 because it consolidates probably like depending hundreds of rows of just log file stuff you could get at in the execution logs within like the Gateway but if you don’t have one you didn’t have any access to this yeah right so this is really nice just because across the board now you have a little bit more insight in the retry attempts and like where things are happening if something’s delayed which is buried now in a one one-time thing right like for instance if something completed and
13:46 for instance if something completed and it took an hour longer I think this breaks out the detail where potentially you could have had a couple failures first right while it was trying and re retrying but it didn’t kill the initial connection but it showed that it took a lot longer yeah as opposed to just succeeding right because there’s a lot of questions in here that once in a while you’re just like why did it take this long compared to like these other these other attempts and what’s error code zero zero zero what so well what yeah and one thing I love here
14:17 well what yeah and one thing I love here one thing I’m hoping one I love that this is actually going to be part of like the rest API so this is actually an information they’re not just bringing this into the UI this is actually going to be incorporated which is huge to be able to see the refresh history and get store that the one thing I’m really hoping here is they also store this or somehow store also store this in data flows because I believe this is just for data sets so and they’re getting better with the data flow refresh history especially with Gen 2 but all this
14:48 with Gen 2 but all this information is so integral to say okay what should one what should I optimize but two what’s going to be causing this problem there’s nothing worse than waiting 10 minutes for 15 minutes to see something refreshed just to see that error at the end without any real good clue on what’s causing this so one thing I’m hoping is that this is part of data flows one thing I love it’s going to be part of the rest API I would agree I think it’s going to be really good here so anyways they’re definitely getting better about messaging back to the users I think they’re improving the right areas
15:19 they’re improving the right areas there’s definitely a lot more things that are also coming out there’s the the fabric Blog has other things as well so those are the the key things we get get to and talk about so with that we’re done with introductions there’s a lot of moving parts today these days let’s check it into our main topic today Tommy frame us out here so we’re talking about loading data how much data what is one Lake let’s talk more about that frame us out the the conversation for today Tommy absolutely gentlemen now being 250 episodes in by the way congratulations for that
15:51 the way congratulations for that and also Microsoft fabric now being around since May we I think it has come to a point of conversation and understanding well how much data actually should exist in a one Lake we’ve talked a lot about this in a lot of our training too has been on the framework of it’s a data set but should it be more should it be treated as multiple data sets how much information how much tables how much artifacts should actually exist in one in one Lake
16:21 should actually exist in one in one Lake how many data sets should a one like support support can we make that decision yet based on what the information that we know and has our perception changed this is a great question and I like where this is going I still feel that Microsoft has mislabeled what one lake is versus lake houses I still I still disagree with their language here and when I think about or overlay what I understand about like other systems snowflake data bricks other
16:51 systems snowflake data bricks other other systems that are building this Lake house-like architecture I think of one Lake as the lake house it is it is the lake house and then the lake houses are the equivalent of like a a database that would live like on a SQL Server so a SQL Server is a machine it would host many different SQL databases each database is a collection of tables and so if I if I relate what people know about SQL and how SQL servers have been developed in the past right the the one lake is like
17:23 right the the one lake is like all servers in my company the server is acting like almost like the the one like object and then all the databases are collections of tables those collections of tables are the equivalent of a a lake house and power bi. com so I think I think Microsoft changed the language on what these things mean because when you start talking about the security of everything inside one Lake Lake I don’t think we have a full picture as
17:54 I don’t think we have a full picture as to how this will be impactful or how good this will be until we get to more security or governance around what data is being produced across the organization within the one like ecosystem what makes you equate one lake houses with one like like because that can have multiple lake houses in one Lake yeah I think of so if I think about the structure of like a lake house here the reasoning behind this is under what from I gather the one Lake
18:25 what from I gather the one Lake experience is a collection of storage accounts that are just housing information right if you need to localize localize a some information into one like in your region or your country you can do that and that data can be stored inside a particular region you can access it you can read it so if you think of the one like experience as a service service of a collection of storage accounts that all house tables in my mind yes that’s that’s a little
18:55 in my mind yes that’s that’s a little bit broader of a definition than what I would say I use in like data bricks today but in databricks today I build a single blob storage it has almost infinite storage space in there and I just make Collections and each collection is where I can put my files or tables or whatever I need so I don’t like to when I build things for companies we’re not building a series of storage accounts we’re just putting everything in a single storage account which is a lake house for a company company so I I think of I think of the one Lake
19:26 so I I think of I think of the one Lake as this idea of all data that the company owns everything that’s being stored in the cloud is living in that one area yeah but that’s a different like and that’s why I I guess I’m I’m I have the questions I do around your statement saying like you think they mislabeled it or lake house because it it there the the descriptions I have is there’s one one like there’s one one like for the organization and I think there the pitch is it’s one drive for data right yes correct exactly
19:57 data right yes correct exactly and even the experience of it right feels very much like OneDrive and it reduces all of the same setup and like Jeff Kaplan did a video with really good video cube right where if you think about how we had to manually set up network drives to like do sharing of files before OneDrive right a big pain in the butt we’re doing like the it’s very that concept of what we’re doing now with Cloud sources and one like being the
20:28 Cloud sources and one like being the central location to remove all of what are now multiple storage containers within an organization they all serve different purposes one lake is just the storage storage right so I to equate that with the lake house or the one part that you’re building in Analytics I guess it doesn’t make sense because you could have multiple lake houses we can have multiple artifacts it’s the storage engine for all of the data within the organization yes not
20:58 within the organization yes not necessarily like the artifacts that we start to build within right one like that can sit in one like that we can interact with these Delta tables Etc I guess maybe I’m relating more to this concept of it’s not necessarily the technology or where the files or the the number of storage accounts that are associated I I guess I think of when I think of lake houses when I would talk to or communicate to a client or or talking to someone I would communicate you have one lake house I would not communicate to a client you would need to have
21:28 to a client you would need to have multiple lake houses you for certain could have multiple storage accounts that are serving and storing things in there but there but I think I think they got the messaging right I just think the name in my terminology or like where my frame of reference is coming from sure I guess I guess there there may be one lake house you may create a universal place where you have all of all of the data or an organization totally but that’s all that’s all that would be in there is curated information that belongs in the lake house the one Lake
22:00 belongs in the lake house the one Lake contains much more than that right I think I would agree with holistic container for everything within the organization like like think about it this way how many organizations do of right now even in analytics their their production Etc how many lakes do they have stood up in in what how many ADLs Gen 2 Storage Lake City they have set up but this is where e this is where I’m saying I would this is where I would disagree this is
22:30 this is where I would disagree this is where I disagree with the terminology a little bit I think if you think about the lake house like ADLs Den shoe that’s the technology piece and that’s fine you could have 10 15 of those stood up you could have 10 15 of those stood up one for Dev one for test one for know one for Dev one for test one for prod that’s fine when I think about lake house architectures I’m I think I’m communicating the lake house architecture at least how I understand it is more of a concept A conceptual piece that is the lake house is house is is the equivalent to a one Lake it’s it’s the sum of all data that’s being
23:01 it’s the sum of all data that’s being produced anywhere in cloud and but one lake is providing is a management layer to actually say who has access to what pieces of data across what storage accounts let’s think about this in the realm of Microsoft fabric too because remember anytime you create a lake house in Microsoft fabric you’re also getting the SQL Warehouse you’re getting the SQL endpoint and this is my point interconnected but this is this is my point point you’re right you’re right but my point is the lake house as Microsoft called it
23:31 is the lake house as Microsoft called it inside fabric is the word lake house it is a single object that has tables and files I’m saying my definition of lake house is broader than that my definition of lake house would be equivalent to what Microsoft is calling one Lake that’s what I would communicate to a client or something like that let’s have the comments it’s confusing in the realm of Microsoft fabric where every lake house is binded to a warehouse binded to a SQL endpoint that can have a data set to it compared to just being storage and
24:01 to it compared to just being storage and I think yeah this is part of the confusion too but like that’s what it’s not boundary it’s not bound anytime you create a lake house you got to see the modeling yeah you by default so this is so this is where this but let me step back then because lake houses is what you describe because exactly so what what it is Microsoft’s lake house is a bound asset of storage and compute all bundled together and when you create quote unquote the lake house which which
24:32 quote unquote the lake house which which would basically be a container for tables and files as well basically basically what that means for the storage component with that you by default get this SQL serverless machine that goes right along with that and Microsoft automatically builds a schema for you that is the the lake house or the the data set or sorry what’s the other one called It’s called The Warehouse the warehouse It’s it you get a lake a lake house it’s basically this you get the storage account the storage and you get to compute automatically the
25:03 and you get to compute automatically the compute comes in the form of two objects the warehouse and the data set those are automatically produced for you when you create create the storage account so to me like I get it and it makes a lot of sense and it it really is borrowing the language of lake houses to the organization I’m just saying Microsoft had to change the language here because it is this this picture of the world is now much larger yes and to add another fun layer of complexity here if you’re even dreaming
25:34 complexity here if you’re even dreaming of doing direct Lake you need a lake house house if you want to have that refresh capability you need to be using lake house correct because that’s the combined that’s the combined asset of storage and compute in one artifact okay so so where where I can understand your point of view view if if I’m thinking holistically yeah the foundations for which I can store data
foundations for which I can store data for the organization any type of data for all users is one Lake it is the one drive for my organization it’s everything data yeah right if you’re saying that the experiences some of the experiences in fabric in fabric interact with or build tables of information which we know are Delta parquet right like that’s the
26:36 Delta parquet right like that’s the storage within one Lake it doesn’t change the fact that the structure of all of this file storage is one Lake it doesn’t change the fact that when we build artifacts however we engage with them it’s still Delta parquet yes what you’re describing to me is the interaction by which fabric allows you to interact and create those files but it’s still just the Delta parquet file you can still interact with that file on that framework called
27:06 that file on that framework called OneDrive like one Lake yes correct with other tools as well so you’re describing your interactions of like fabric as being the thing that is lake house one Lake thing and I don’t agree with that I think these fundamentally these are different things this is the structure by which these artifacts are being built and the permissioning across this isn’t relegated to just what you’re interacting with it’s it’s across those objects based on what a person would
27:36 have permissions to. So perhaps let me frame the problem I'm having, where this agenda item really came from; maybe that will help frame it a little. Well, hang on before we go on, Tommy, we'll get to that. I agree with you, Seth, but I'll also say my fundamental understanding of what a lakehouse is came from Databricks, and that's probably where most of my heartburn comes from. Databricks already started this lakehouse language, and a lakehouse from a Databricks
28:06 perspective was all-encompassing: all storage things, all compute things, everything in Parquet. All of the artifacts in your entire organization live in this combined storage-and-compute layer called the lakehouse. So my fundamental understanding comes from an organizational concept, and what I'm trying to do now is take what I know of a lakehouse and overlay it onto the language of what Power BI is doing with
28:37 Fabric, where again Microsoft wants to be unique, a special snowflake, and design their own language around OneLake. What you would construct as a lakehouse within Databricks is a curation of ETL patterns, building out tables of structured lakehouse-type data for the organization. It's the same thing in Fabric; OneLake just extends it and says you
29:08 don't have to have multiple different storage locations. Just throw all your crap into OneLake, and everybody who has access to it has access to it, just like we do with OneDrive file sharing. Throw your PBIX files out there, who cares? And I'm saying what you just described in the latter half of that is a lakehouse. No, it's not the same thing. Where are your PBIX files in Databricks? That's not part of the lakehouse. The lakehouse
29:38 is purely storage. Then we're saying the same thing; you're just pushing on the points that I say are confusing. Microsoft has added confusion here, because what Microsoft has built, or is building, is a lakehouse: the whole thing, OneLake, OneSecurity, all the things in Fabric. That, to me, is the lakehouse architecture. Yeah, I don't agree with your naming, but let's move on. Okay, we're going to agree to disagree. So let
30:10 me share my problem. No, no, moving on. Here's the fundamental thing this topic, this question, arose from: every demo, every training, every slide, the user stories, the documentation, all revolve around the lakehouse, to the SQL endpoint, to a Power BI dataset. The problem I'm having, or struggling with, is that it's a single lakehouse to a single dataset. Look, I have not seen one demo,
30:40 not one demo or one iota of an example, of one Fabric lakehouse supporting 25 different tables to support 10 different datasets. That has not been a use case. Mike, you and I know this too from our demos: so far one lakehouse is supporting one dataset, and that dataset is supporting a lot of artifacts. So I don't know right now. I'm sure it can handle it, but is that the intended use case? How much data for the Power BI reporting, for
31:11 that final endpoint? And that's the fundamental difference I see here with Fabric lakehouses: the intended end use case. Yes, you can use it for storage, you can use it for those other purposes, but the intended end result for the lakehouse, for putting data and tables in there, is a Power BI dataset, a Power BI model. Not anymore with Fabric; I'm going to disagree with you here too. Man, I'm on fire! So fine, but show me a demo of even
31:43 Microsoft showing a difference. They have it; they do data science on top, all within Fabric. Yeah, the idea, in my opinion, is that they've taken Synapse and Data Factory and shoved them into Power BI. What happened, well, these are the same services, is that Microsoft said, whoops, Synapse didn't do well, or we're not getting the right amount of usage there, so what we need to do is bring those
32:13 things closer to where Power BI is, because Power BI is the shining star in all of this. When Power BI came out, Seth and I were talking about it a long time ago; we were like, anything that touches Power BI becomes shiny. And if you think about it, even when I talk about topics around Power BI and Fabric, everyone loves seeing visuals and reports. It's so graphically in your face. If you can make something attached to a report,
32:44 people love that and gravitate towards it as a topic. As soon as you talk about data engineering, tables, lakehouses, people are like, meh, not my thing, someone else is going to take it. When you engage people on visuals and report design, people are very excited; this whole other mundane area around data engineering is not as flashy. So what I believe Microsoft is trying to do now is say, okay, you can use a lot of these
33:15 lakehouse artifacts directly with Power BI. That is one goal, but they've also added this whole new data engineering role and this new data science role into the space. Yes, and they really want you to build models, and I think with notebooks you now have a whole new workload of building and training models, pushing those out to Azure ML or Azure ML Studio, and trying to get a model up and running in production that can predict things in real time. It's trying to be the everything system. All of that I get,
33:47 but that still goes to this fundamental question: how much data should exist in a lakehouse? Even to the point of how many tables should be in a lakehouse. Should it support a single dataset, or support 20 datasets, plus the data engineering and data science side of it? How much storage are we looking at that a single lakehouse in Microsoft Fabric should provide? So I'm going to answer this question in a bad way: I'm pretty sure OneLake should contain everything.
34:18 Okay. If you're loading data from a dataflow and you're building a raw form of that table, you're just trying to load it in, fine, that should be there. If you're trying to load things from Excel files, great, put those files in OneLake. So yeah, I think the design of OneLake is: bring all the information, files and/or groomed tables of data, bring everything you need into the OneLake experience. Now, from my prior knowledge, I would say
34:49 OneLake slash lakehouse, and this is where I think Microsoft changes the terms. The lakehouse, in terms of Fabric, is, to your point Tommy, a specific bucket. Imagine my lakehouse as a house with many rooms; the lakehouse Microsoft describes is a room in that entire lakehouse. I'm with you. So to this point, the question becomes, okay, I've got a lot of data loaded to my, again, maybe a better analogy: let's think of our
35:19 OneLake as a skyscraper. The skyscraper has many rooms, different offices, different functions, different teams; you might even lease different rooms to different companies. So you have HR and finance and sales. That is the building, and everything data-driven gets put into the building. Now, what needs to be in which room, what goes in it, how do we decorate it? So yes, this is
35:49 the question, I think, and this is where I'm struggling right now: trying to figure out what the level of effort is, what we pull into this, and whether I need a pipeline of lakehouses. This is what really bothers me, because when I think about it, I maybe have a bronze lakehouse, a silver lakehouse, and a gold lakehouse, depending on which users I want to access that room. If it's gold, it's groomed, it's ready to go.
36:20 I now have three lakehouses, potentially with different quality of data in each of those spaces, and now the sharing mechanism for each of those things is different. But to your point, Tommy, I get a compute layer, I get a warehouse, and I get a dataset for all three of those spaces. Maybe I need that, maybe I don't. Why would I need three different lakehouses? Can I just manage permissions on a layer? Well, today you cannot.
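The permission constraint being debated here can be sketched outside of Fabric entirely. A minimal, purely illustrative model (role and lakehouse names are invented) of why per-lakehouse grants push you toward a bronze/silver/gold split: if the lakehouse is the grain of access control, the only way to hide bronze data from report consumers is to put bronze in its own lakehouse.

```python
# Hypothetical sketch: access is granted per lakehouse, not per table,
# so data-quality tiers end up separated into distinct lakehouses.
GRANTS = {
    "data_engineer": {"lh_bronze", "lh_silver", "lh_gold"},
    "analyst":       {"lh_silver", "lh_gold"},
    "report_viewer": {"lh_gold"},
}

def can_read(role: str, lakehouse: str, table: str) -> bool:
    # `table` is ignored on purpose: the grain of security is the
    # lakehouse, which is exactly the limitation being discussed.
    return lakehouse in GRANTS.get(role, set())
```

Under this model, `can_read("report_viewer", "lh_bronze", "raw_events")` is false no matter which table is asked for, while the same viewer can read anything in the gold lakehouse.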
36:51 You can't manage permissions down to the table or row-level detail in a lakehouse today. Eventually we should be able to, hopefully with OneSecurity; that's what I'm reading into it a little bit. I would say, then, that that's a precursor for any engagement here. I would agree. And what you're discussing is this explosion, right? I would argue you'd have one lakehouse, and your raw, current, and curated
37:24 layers, or gold, whatever you want to call them, are going to branch out into different business units for different things. You can start to segment things out, or have different areas own different parts of that lakehouse, can you not? Not yet. Yeah, so Spencer's point would be: maybe in the future you'll get table-level read/write access on the individual objects. So this is where I'm struggling: if I build a single lakehouse with bronze, silver, and gold all in the one lakehouse, I think that's
37:56 going to become a lot of tables and going to become very confusing, and what you'll do is compensate with a table naming scheme to be able to sift through the tables: okay, I have a customer_current_bronze, a customer_current_silver, a customer_current_gold. The naming of these tables is going to get so intense that it's going to be unwieldy to manage and govern if you throw everything into one lakehouse, and this is where I think the analogy Microsoft is putting together falls apart.
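The naming compensation Mike describes can be illustrated with a toy parser. This assumes an invented `<entity>_<history>_<layer>` convention, not anything Fabric prescribes; the point is that once every tier lives in one lakehouse, table names start carrying structure that governance tooling then has to decode.

```python
def parse_table_name(name: str) -> dict:
    """Split a name like 'customer_current_gold' into its encoded parts.

    Assumes an invented <entity>_<current|historical>_<bronze|silver|gold>
    convention -- the kind of scheme you'd be forced into if bronze,
    silver, and gold all share one lakehouse."""
    entity, history, layer = name.rsplit("_", 2)
    if layer not in {"bronze", "silver", "gold"}:
        raise ValueError(f"unrecognized layer in table name: {name!r}")
    return {"entity": entity, "history": history, "layer": layer}

parse_table_name("sales_order_historical_silver")
# → {'entity': 'sales_order', 'history': 'historical', 'layer': 'silver'}
```

Multiple lakehouses (or, eventually, finer-grained security) would make the layer and history flags properties of where a table lives, instead of strings smuggled into its name.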
38:28 I need multiple lakehouses to segment what data people should have access to. So you're saying, would you equate a lakehouse with each business unit, then, or with a particular set of data sources? Yes, what I'm trying to do is build the hierarchy: where lakehouses would fall within an organization. This is what I'm working through, especially if what this affords me, or what I should be doing as someone who's building data structures, is relying on the expertise
38:59 of business units. This helps me have ownership be where it belongs, or at least have an SME from the business unit engaged heavily in the ownership of data from a particular source. Well, how do we build that ownership? Is that part of something that goes into the full lakehouse, where they own that work stream? Maybe. Maybe that's prior to the constructed final forms of things.
39:30 Actually, they probably wouldn't own the final, well, maybe they would own both the input and the final result, but you'd have to segment it out. I don't know, and I like that. But think about the road Microsoft brings you down when you build a lakehouse: build a lakehouse, ingest with a pipeline or a dataflow, push to a table, and now, oh, here's the SQL endpoint, here's your default dataset for that lakehouse. This is where I really
40:01 disagree with, or diverge from, that road, because I don't think it's one lakehouse to one dataset, in terms of a fully built dataset. But there's no reason why you couldn't. I think that's a lot of effort; how much are you refreshing, then? Downstream you have to refresh all these dataflows to get this lakehouse. I'm going to go with a single dataset here, Tommy. I'm still on one. Well, so the question would be:
40:32 can you take a dataset and mix different tables from different lakehouses? Nice. Say that again. Okay, so imagine a dataset, and you load... Potato, potato. Okay, but that's the problem; this is what I'm thinking through. Finish; I missed yours. Sorry, I'm going to go back.
40:52 Tommy's freaking out right now; his mind just got blown. So we have these lakehouses, potentially multiple lakehouses, inside a Fabric workspace. Now we've added this whole context of a workspace with two lakehouses inside. Maybe, for whatever reason, I have an HR lakehouse and a sales lakehouse; there are two different lakehouses there. Do I create a dataset? The default one is always linked to the origin lakehouse, so by default you're going to get a dataset that's always describing one entire lakehouse.
41:24 Yeah, and you can adjust that. What if I made my own dataset, and then I pluck some tables from lakehouse number one and some tables from lakehouse number two, so now I have a single dataset that's using Direct Lake to both sources, where I'm joining the data together? I see no reason why this wouldn't work, or why technically this wouldn't... you mean shortcuts to those? Not even shortcuts. I'm talking about literally having a dataset with two tables in it that you can then build relationships on in the dataset, but now I'm pulling from
41:54 two different lakehouses, with dependencies. The connection method becomes important at that point: it's simply a connection to a SQL endpoint; it's no longer a Direct Lake connection to a lakehouse once you go to that. I don't want to get into the connection method here, but I'm just saying this seems like something in the realm of possibility. If the technology exists to have a Direct Lake connection to a single lakehouse, why would it be so hard for Microsoft to figure out how to pull another Direct Lake connection from a different lakehouse? We're not there yet.
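Mechanically, the "pluck tables from two lakehouses into one dataset" idea is just a join across two storage locations. Stripped of Fabric and connection modes entirely, here is a pure-Python sketch of the shape of that composite model; all table names and values are invented.

```python
# Two 'lakehouses' modeled as plain dicts of tables (rows as dicts).
sales_lh = {
    "fact_sales": [
        {"customer_id": 1, "amount": 250.0},
        {"customer_id": 2, "amount": 90.0},
    ]
}
hr_lh = {
    "dim_customer": [
        {"customer_id": 1, "name": "Contoso"},
        {"customer_id": 2, "name": "Fabrikam"},
    ]
}

def join_across(fact_rows, dim_rows, key):
    """Build the composite result a dataset would need: fact rows from
    one source enriched with dimension rows from another."""
    lookup = {row[key]: row for row in dim_rows}
    return [{**fact, **lookup[fact[key]]} for fact in fact_rows]

composite = join_across(
    sales_lh["fact_sales"], hr_lh["dim_customer"], "customer_id")
```

The relational mechanics are trivial; as the conversation notes, the real question is what connection mode (Direct Lake vs. a plain SQL endpoint) the engine falls back to when the sources differ.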
42:24 we’re not going to get there we’re at that point yet and I think like it talking okay talking about the realm of a single lake house because we’re not there yet to your to your scenario Mike is simply just connecting to the SQL warehouses endpoints and it’s basically a bunch of databases at that at that point this is not really Encompass I love SAS point to about the permissions which is a whole other layer here yes talk a whole other layer here on how much should be in a lake house yes but thinking about how much then what does a lake house a
42:55 much then what does a lake house a single lake house support I think I think a lake house will support support so I’m gonna step I’m gonna answer your question with another question and a statement statement questions okay maybe something like that maybe it’s a question I don’t know what I think needs to occur here is there needs to be a strategy around this and so where I I’ve been doing lake house development for the last five years or so so I’m I feel like I’ve got some strategies that seem to work well and when you think about what other teams or other companies are doing
43:26 in this lakehouse-type architecture, and let's not talk about the technology, I'm not talking about Fabric for a moment, they're building bronze, silver, and gold. Bronze is that raw layer of data: you're streaming information in, you're getting an ever-appending table of data. The silver layer is all the transforms: you're deleting any duplicate records, you're simplifying the data, maybe you're building a table that has historical records, maybe a table that only has the most recent value of a single record: the current and historical records.
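The silver-layer grunt work just described, deduplicating and keeping only the most recent value per record, can be sketched in a few lines. This is a pure-Python stand-in with invented columns; in a Fabric notebook this would more likely be PySpark over Delta tables.

```python
def to_current(rows, key, ts):
    """Silver-layer style grooming: keep only the latest row per
    business key, producing the 'current' table from an ever-appending
    bronze feed (exact duplicates collapse as a side effect)."""
    latest = {}
    for row in rows:
        k = row[key]
        if k not in latest or row[ts] > latest[k][ts]:
            latest[k] = row
    return sorted(latest.values(), key=lambda r: r[key])

bronze = [
    {"customer_id": 1, "status": "new",    "loaded_at": "2023-08-01"},
    {"customer_id": 1, "status": "active", "loaded_at": "2023-08-15"},
    {"customer_id": 2, "status": "new",    "loaded_at": "2023-08-10"},
    {"customer_id": 2, "status": "new",    "loaded_at": "2023-08-10"},  # dup
]
silver_current = to_current(bronze, "customer_id", "loaded_at")
```

The historical table would simply keep every row; the current table is the one a gold-layer star schema would typically join against.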
43:57 The grunt of your engineering work is in the silver layer. At the end of this, you join tables together, you make materialized views, you build a final table that looks very close to a star schema. I don't want to do engineering inside Power Query anymore, because I have the power of the lakehouse; I want to build all the dims and facts as needed, and my Power Query steps should literally be two steps:
44:27 Source and change data types. That's literally all that should be happening on the Power BI side, from the dataset standpoint. So that's my architecture, and I think, depending on what quality of data you care about, that's what should be in the lakehouse. If you have Power BI reports pulling from the lakehouse, you should be thinking about the tables that live in that final lakehouse element. Again, whether you make this one big lakehouse and you just
44:57 name the tables a certain way, bronze/silver/gold tables, I don't care, it doesn't matter to me. But regardless, at the end of the day, the data pipeline has to exist, and that final lakehouse becomes the dimensions and the facts that support the dataset. The final objects you're creating become the artifacts that you share, whether that's through Power BI datasets or not. I could agree with that: it's the shareable artifact. It is confusing, though, because it's all part of the lakehouse. Sure, right.
45:29 And maybe this is the part of the conversation where we're rolling a lot of different layers of objects we're creating into this conversation, whereas what's really relevant at the end of the day is the final product. Yes, and that is a set of data sources or datasets that make sense to the business and are usable. And when I say datasets, I don't mean models per se,
46:01 probably models, but it could also be views or compiled tables that provide row-level data that business areas want access to, consolidated from many different sources within a lakehouse. And I think that's the power of pulling all of this information together: simplifying the
46:33 access, and the ability for data people and teams to create queries or build datasets very easily. That's the power of something like Power BI on a very front-end level, but it's not enterprise, right? It's not repeatable. Yes, and I think that's the most fun part about it. I like that definition. This ecosystem, in
47:04 how Power BI can be leveraged in an organization, is that it can be used as a discovery tool, whereby you are pulling together sources of data, doing discovery, figuring out how APIs work, connecting to all of these systems that organizations have, and learning what you would need to build the actual engineering pipelines. Because at the end of the day, you probably built a sliver of it for a report problem or some objective
47:35 within the organization, and you give a very fast result to the business, which is valuable. You can then take those learnings and build them into lakehouse processes, which are slower, they're going to be slower, but the next time the business comes around and says, hey, we built this thing, now I want to see this, it's not a three-week exercise; it's a two-day exercise. And you can give them the data in many different forms. Or if they said, hey, we
48:06 did this thing, what I'd love to see is how we would connect that with this and this and this, and if you have these processes in place and you've consolidated your data in a lakehouse, you can say, okay, give me 10 minutes and I'll do that query for you. Yes, exactly. That's the value of why we compile all this stuff together. And so this is where I'm leaning, or maybe a better way to say it, this is where I want to lean: there's almost a framework of department-level lakehouses which could support the data science, and can support creating views
48:37 from the warehouse that's created, and support all the tables for a single department. That end result, that final lakehouse, to me makes the most sense: it could support multiple datasets with multiple different schemas and relationships from a modeling point of view, it can support any of the data science there because it's already curated, and it can support the views, to your point, from the warehouse, where I don't have to create multiple lakehouses per one or two datasets. Yeah, I guess the question I would have is how
49:09 you delineate building those objects. That matters less to me than the framework we have within Fabric that allows us to interact with the objects we need, when we need to. Right, so say we're a business unit and you're going to go build your own lakehouse. I don't care. What I would care about is, if I use the customer dimension that I made, whether I could access your data and pull it into information I have. I don't
49:40 care how you built it. If you want to own the object and you did that in a different lakehouse, I don't think that matters in this ecosystem. I could make an argument that that doesn't equate to its own lakehouse; I'd just give you a schema within mine. And Tommy, I think you're making a good point here around department-level goals and their ways of shaping data for their department and what they need. But while you guys are talking, I'm running down the thought that
50:10 Fabric is the repeatable data engineering for the enterprise. That is what it is, across all business users and IT; it serves all purposes. But one area that I'm struggling with, that I'm challenged by in your statement: I feel like the idea, the concept, of a single department building their own lakehouse, I'm very much into that, I think it makes sense. But I do think we need to
50:40 step back as a company, look at the data we have for the entire company, and say: there are artifacts, dimensions for customers, dimensions for products, things that are cross-company, where every department should be looking at the same stuff. And to me, this is where, in the same
51:06 way we have datasets in a separate deployment pipeline, dev/test/prod datasets, feeding a workspace that is dev/test/prod for reports, in the same respect, having that delineation between the datasets and the reports, all Fabric does for me is add another layer of control, another layer of lakehouses, where we could say as a company, holistically: here is dim_customer. Okay, department, build your own lakehouse, but here's a shortcut to these other tables
51:37 that are groomed holistically for the entire company. And there's a team, and this is where maybe the central BI team, with representatives from different groups, says: who owns the data that's going to be in this dim_customer table? Does that all come out of Salesforce? Are we making up our own thing? Are we trying to load stuff from crappy Excel? What's going to give here? This gives us the opportunity to really step back and say: what is the data that drives our company?
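The pattern being proposed here, department lakehouses plus shortcuts to centrally groomed tables, can be sketched as a read-time resolution step: a shortcut stores a pointer rather than a copy, so every department reads the same dim_customer. All names and values below are invented for illustration.

```python
# The central lakehouse holds the conformed dimension exactly once.
STORES = {
    "central": {"dim_customer": [{"customer_id": 1, "name": "Contoso"}]},
}

# A department lakehouse owns its own facts, plus shortcuts -- pointers,
# not copies -- to centrally governed tables.
finance_lh = {
    "tables": {"fact_invoices": [{"customer_id": 1, "amount": 1200.0}]},
    "shortcuts": {"dim_customer": ("central", "dim_customer")},
}

def read_table(lakehouse, name):
    """Resolve a table locally, or follow the shortcut to its owner."""
    if name in lakehouse["tables"]:
        return lakehouse["tables"][name]
    store, target = lakehouse["shortcuts"][name]
    return STORES[store][target]  # resolved at read time, never duplicated
```

Because resolution happens at read time, a fix to the central dim_customer is immediately what every department sees; no copies drift out of date.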
52:07 And on the projects I've been on recently, there's a lot of value. If we're talking about the commercial team, the operations team, revenue, the finance team, those teams have a lot of the same data sources, and when you talk across all of them you have a more holistic view of your company. Yes, you may go ahead and build your own lakehouse for what you need, but there are parts of this that should be pulled from a central place. Yeah, and so to me, this is why I'm so excited for where Fabric is going. It's still rough around the edges; I'm still not recommending
52:37 anyone go use it today, day one. It still needs some refinement, and I don't think anyone should use it until we start talking about something more like OneSecurity on top of all this. Once that gets there, I think I'll feel much more comfortable about permissioning, because as of right now you can only permission access, or not, to an entire lakehouse, a lakehouse in itself. And I think this is why I said this is where I want to lean, to your point: having those master tables, which are essential, centrally. You've got to have them, even a date table, for crying out loud, having that
53:08 somewhere in a central master lakehouse, your Sister Bay of lakehouses, so to speak; a Wisconsin reference there for some of you. But no, I'm agreeing with you. I want to lean that way, but I also don't think the architecture, the technology, or even the UI can really support that right now. I agree it's not there yet; that's what I said, it's in preview, it's still getting things figured out. From what I see, if I can read the
53:39 tea leaves a little bit, it feels like Microsoft has the right design and vision for where this is going. And I saw a tweet from Amir, who's leading the helm here; he said another massive update, I think it was for August: get ready, there are going to be tons more massive updates from now until forever on top of this Fabric thing. So I really do think Microsoft is very much pushing this one. And you'll notice, actually, Donald just put a note here: SQL
54:09 actually Donald just put a note here SQL bits 2024 just got announced so we’ll throw a quick you heard it here first people SQL bits 2024 is out the 19th to the 23rd of March in fan bro I don’t know people people from from across the pond they’re gonna be like yeah what is that yeah Google it immediately Googling we’re slaughtering in it so but there’s also so with that
54:40 so but there’s also so with that conference there’s also another one that Microsoft’s putting on in December there’s an announcement from the December conference for data and AI so It’s when you look at the speakers mean it’s when you look at the speakers it’s all about Fabric and it’s all speakers are on the fabric pace so Microsoft is super doubled linked down on you enter you have your operational system and then now you have fabric so literally that’s the only solution you’re going to need for anything in your company will be go build your you your company will be go build your dynamic side the Enterprise side is
55:10 know dynamic side the Enterprise side is a very compelling story it’s getting better and if all these pieces start pulling together right it it makes sense in in many many ways yes I think this opens doors and we we talked about this I think in our last in the last podcast right where there’s a definite possibility even with all the AI introductions and well and you throw that into the conversation right in fact
55:40 every element of this is simplifying the ways for more and more people to get access to these systems and be a participant. You're bringing very complex technologies and systems and lowering the bar of entry, which opens the doors for a lot more ownership by subject-matter experts within businesses. It's a big deal. So, to answer this question in one statement here at the end, I'll give my view on it: how much data do you load
56:11 into OneLake? Everything. Everything gets landed into OneLake. My caveat is that you need a plan for how you're going to bring everything in, what the method is by which you're going to find data stewards for that information, and how you'll find a proper method to communicate and distribute that content back out. So to your point, Tommy, our lakehouse is the skyscraper with many rooms, with lots of data filling all those rooms. It's going to have
56:43 everything in it. Great: if I have the right level of permissions, I can go to any room in that building and look at any data anywhere in our organization. That's exactly what we want. In addition, it lowers the barrier: we're not having to worry about building the building or the infrastructure; all of that is now managed by Microsoft, and we have tools at our disposal that are all software as a service. This is the way the market is going, and I find the speed of innovation has very much increased when you start using tools like this. The challenge I think we're going to face is that we need to have a plan, because in the same way we're
57:14 going to have report proliferation, it's Jevons paradox for data now. We just talked about Jevons paradox around the report-building experience and the BI that goes along with it; the same effect is going to happen with creating tables of lakehouse data. The organization will be able, at a much lower cost in time and effort, to build a ton more tables inside the lakehouse. How do we find which are the best pieces and sources of data, and how do we elevate
57:44 certain portions of that data above others, so that it becomes certified and regularly reused? That's going to be the challenge. The technology has gotten better; we need better policies, and we need to educate our people to make sure we can keep up with it. We're not going to be able to keep up if we don't educate. That's my point. Seth, any final thoughts? My final nine points. I'd like your final nine points. There we go.
58:16 foundational things that we've been doing still apply. What's changing is how we bring those to light, or make the best use of them, within the organization. And there's the people aspect of this. A hundred percent: we can bring in more people with frameworks like this to take ownership of data, and that changes how we do things or build things within the organization. So exactly how we architect the certified data objects and

58:46 reports can change completely and be much more efficient, and that's what I'm pretty excited about. Tommy, anything? Yeah, I think the last thing I really foresee with where we're going, with the user enablement and empowerment: I don't want to say there are going to be different words than just data engineering, but what Fabric's really going to open up? Oh boy. Yeah, oh boy. We're really opening up something, and I still don't think we see the potential here. I still

59:18 don't think we see the final impact this is going to have on businesses, from your BI, to your point Seth, all the way down through the business. It's good we're planning for it, but we should continue building the knowledge: understanding what it can hold, and talking about what the final output is, what that final outcome is that we're looking for here. I would totally agree with all of that. With that, thank you very much for
59:49 listening and hanging out here with the podcast. Thanks for letting me argue this thing out with Seth and share some of our ideas here. This is healthy for me; it helps me really refine these ideas and what we're thinking through. I hope you are also exploring what this Fabric thing means for you. Regardless, at the end of the day, this is going to change how you build and work with your data; it will be fundamentally different after you start really using Fabric and getting engaged with it, and I'm very excited for it. Let me be clear: I'm very pro-Fabric. I

60:20 think this is the way Microsoft needs to go to remain competitive. I'm just trying to make sure I understand where the best place is to spend my effort, what I should learn, and how we govern it. Thank you for watching; we appreciate your listenership. Please like and subscribe if you don't mind, and we would love it if you shared this with somebody else; we think it's really good content. If you liked it and you hung out for the whole episode (I think some of you have), please share it somewhere and let somebody else know that you enjoyed the content. And maybe you didn't learn anything, but maybe you got a couple of kicks out of Mike being frustrated with the lakehouse.

60:50 With that, Tommy, where else can people find the podcast? You can find us on Apple and Spotify, or wherever you get your podcasts. Make sure to subscribe and leave a rating; it helps us out a ton. If you have a question, an idea, or a topic that you want us to talk about in a future episode, head over to the powerbi.tips podcast page. And finally, join us live every Tuesday and Thursday at 7:30 Central, and join the conversation on all of the PowerBI.tips social media channels. Awesome. Thank you all so much, and we'll see you next time.
Thank You
Want to catch us live? Join every Tuesday and Thursday at 7:30 AM Central on YouTube and LinkedIn.
Got a question? Head to powerbi.tips/empodcast and submit your topic ideas.
Listen on Spotify, Apple Podcasts, or wherever you get your podcasts.
