PowerBI.tips

Overfed and Sick Golden Datasets - Ep. 520 - Power BI tips

April 17, 2026 | By Mike Carlo and Tommy Puglia

In Episode 520 of Explicit Measures, Mike Carlo and Tommy Puglia unpack the latest Power BI and Microsoft Fabric topics from the show. You’ll get a quick read on the episode’s biggest ideas, why they matter, and where to dig deeper in the full conversation.

News & Announcements

  • No linked announcements were available in the episode description for this post.

Main Discussion

This episode covers the major themes, opinions, and practical lessons Mike and Tommy surfaced during the conversation. The transcript below captures the full verbatim discussion if you want the exact phrasing and context.

  • Mike and Tommy react to the episode’s biggest Power BI and Fabric developments and explain what stood out to them.
  • They connect product announcements to day-to-day practitioner decisions instead of treating the news as abstract roadmap chatter.
  • The conversation highlights where teams can move quickly, where they should slow down, and what tradeoffs deserve attention.
  • They share candid perspective from real project work, which gives the discussion more practical value than a headline recap alone.
  • The episode mixes tactical advice, opinionated takes, and a few forward-looking predictions about what listeners should watch next.

Looking Forward

If this episode’s topics affect your current Power BI or Fabric plans, use the transcript and linked resources to identify one concrete change you can test with your team this week.

Episode Transcript

0:02 Lighting up the sky. Dance to the day, laugh in the mix. Fabric and AI, get your feels. Explicit measures. Drop the beat now. Feel the crowd. Explicit measures. Good morning and welcome back to the Explicit Measures podcast with Tommy and Mike. Good morning. How you doing, Tommy? I look at the intro now and, what, I think we were

0:32 so young. So young. Oh, the earlier photos. Yeah. To be honest, Tommy, I've looked at some of the earlier episodes or thumbnails that we've done. So we have, for those who are watching the podcast, the thumbnails that we put on YouTube; we took pictures of ourselves in the beginning, of expressions and how we looked, and in the earlier photos we look way different. We look a lot younger. That was five years ago. Six years ago. I like how we traded facial hair. We did. We've traded. I keep getting

1:03 comments about bring back the beard. Bring back the beard. I like the beard. I do too. The wife is original Carlo for me. That is the OG Carlo. That is the refined Carlo. The vintage Carlo. Oh goodness. Oh wow. Let me get into what our main topic is for today, and then we'll come to some news and we'll banter here for a bit at the beginning. All right. So, the topic for today is a mailbag that's quite long. There's a lot of

1:34 items here to unpack. Okay, but really the idea here is overfed and sick golden data models. What happens when they are used for a number of years and they just get big? They get bloated. There's a lot of things in them. What happens now? So, we start having this problem of just keep piling things on. They keep adding to it, keep adding to it, and over years it grows into its own beast. Someone knows how to build it, but it may not be documented very well.

2:04 So, we're going to unpack that topic today. What happens when your data set grows into this monstrosity that you have to manage? Okay, before we do that, let's go through some news items. Tommy, what did you find for us? Yeah, so I think we got three pretty good ones here. We're going to start off with a blog that just came out April 9th, so it was last week: associated identities for items. What is that? So, Fabric is introducing associated identities for items, which allow lakehouses and eventstreams to run using a

2:34 service principal, a managed identity, or another Entra identity rather than depending on an item owner's user account. So this simply removes the dependency on an item's owner if they're not there or they don't have access to it, so other people can run it or those identities can run it, which is pretty cool. So let me unpack what this means. Again, this is meta at this point.

3:08 Right now, let's talk about people's mental model of what identities look like inside Fabric. What does that look like to you? And I think this informs some more of that, and let me see how you read this article, Tommy, if this makes sense to you as well. Right now, when you talk about an identity, usually we're talking about a person, right? So a person is someone who's logged in to powerbi.com; you go to the workspace and then you add users to view, contribute, member, or admin the

3:38 workspace. Those are identities that we have. In addition to that, we have some programmatic identities. We have an app registration, which is a registration or an identity that lives inside Entra ID. It doesn't really have a username and password, but it does have a GUID and a token, which is like the username and password; it's an identity that you can say, look, this identity has access to an item. So think of it more like a service

4:08 principal. In addition to this, we just got the workspace identity. So now workspaces have their own ability to have an identity. And I think this is used when I have a workspace that has lakehouses in it and I have another workspace that has semantic models in it, right? So the workspace that has the semantic models needs to have permissions to read, Direct Lake, to the lakehouse which is in a different workspace. So now you have to start

4:40 administering, okay, what items in a different workspace can talk to items in your lakehouse or data engineering workspace. Okay. So right now we're at three different identities: people, service principals, and then workspace identities. Tommy, you're telling me this one is another identity, and now it's per item. An identity is appearing, right? So, you don't have to necessarily create an identity per item, but you can

5:10 associate an item with a certain identity. Got it, so if I have four lakehouses, I have four different identities? I was going to say, because that's what my initial impression was, like every... Oh, no, no, no. This is an association piece where you can associate some identity to, I guess, the two programmatic ones, right? App registrations or workspace identities. Those are already identities that exist, and I'm going to associate something with that identity. So a lakehouse

5:40 doesn't have an identity, but it can borrow or associate to some other programmatic identity. Right? So I think a good example is, say you and I are working at the same company and I created a lakehouse. I owned the lakehouse, managed the flow of it. I went on a vacation in Sicily and never came back. And you're like, "Hey, this lakehouse is not refreshing." Rather than you not being able to access it, you have the identity that's associated with it to help manage it or then give access to someone else. But

6:10 basically act on behalf of the user, in so many words. The part here that I think I'm really most excited about is that identity updates can now be done via API, the fact that things are now starting to be turned on to more API-based things. Programmatically I can go fix stuff, especially when things break or, to your point Tommy, you leave. Traditionally we'd have to go into the UI to adjust anything, but now that you can update things with the API,

6:41 I think that's going to be a big win. Well, Fabric was sorely lacking APIs initially, when I felt it was even more important than it was with PowerBI to have an API call, which it did not have; it was very limited. We're finally seeing that an admin can manage Fabric thoroughly through the API at this point, which is, I think, essential. I need to learn more about this one. I won't go deeper into the article here, but for those who are

7:11 interested in using these identities and understanding what they mean, the article is attached in the description as well. So, this will be in the description of the video here, but also in the chat window. There's also a link at the bottom of this article that goes to the actual learn.microsoft information, the update item identity API. It's currently out in beta right now, so it's very, very early. I would not rely upon this; this is more of just experiment with it for now. But the learn document is also here in the chat

7:42 window as well if you want to go learn more about this. I'm not seeing really clear examples of why I would want to use it; they're just explaining how it works. Did they say beta or did they say preview? So they say preview in the article you sent, Tommy, and then when I click on the learn document link it says managed identities associated with Fabric items, beta. So beta. So that's different than preview. I don't know. I think it is. I would

8:13 argue a beta is a preview. I would argue that too, except they never use beta anymore. I've never seen that before; this is the first time I've seen a beta release of something. Okay. I guess. Anyways, so this is out there now, go check it out. I just wish there was a little bit more of a clear example, right? Hey, here's when you would want to use this. I get the idea that you want to have identities attached to things as opposed to users attached to things.
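If you want to poke at the new item-identity capability from code, the general Fabric REST pattern looks like the sketch below. The API is still in beta, so the route and payload here are assumptions to verify against the Learn article they mention; only the token scope and the overall request flow are the standard Fabric API pattern.

```python
# Hedged sketch only: the item-identity API discussed above is in beta, so
# treat the route and body as placeholders to check against the Learn doc.
import requests
from azure.identity import DefaultAzureCredential

# Standard Fabric REST auth: a token for the Fabric API scope.
token = DefaultAzureCredential().get_token(
    "https://api.fabric.microsoft.com/.default"
).token

workspace_id = "<workspace-guid>"              # placeholder
item_id = "<lakehouse-or-eventstream-guid>"    # placeholder

# Hypothetical call: associate the item with a service principal instead of
# relying on the original owner's user account.
resp = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
    f"/items/{item_id}/updateIdentity",        # assumed route while in beta
    headers={"Authorization": f"Bearer {token}"},
    json={"identity": {"type": "ServicePrincipal",
                       "applicationId": "<app-registration-guid>"}},
)
resp.raise_for_status()
```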

8:44 Maybe this is the idea of the lakehouse when you create it. So maybe, again, I don't understand yet, but let me unpack this one, Tommy. When you create a lakehouse today, it has your name on it. You can't change it; you can't just switch that easily. Like change of ownership on Dataflows Gen1. Gen1, correct. You can't just go in and claim it again and switch it out. This is one that's a bit more permanent. So, in this situation, maybe this API is actually doing that. Maybe it's going to allow you to switch out those identities

9:14 programmatically. So, it says that they're starting with lakehouses and eventstreams; those are the two that they're starting with. So, that would make sense. So, now I don't have to go make a help desk ticket to switch out the owner of a lakehouse. Interesting. Okay. Good topic on that one. All right. What else do you have for us, Tommy? Another news article. This is a fun one. I'm curious if you're going to like this one. Shortcut transformations, so a little more shortcut updates here. Turn files into delta tables without a pipeline.

9:46 What? So we're finally enabling teams to convert files, CSV, Parquet, JSON, referenced in OneLake shortcuts directly into delta tables without any code, without using a pipeline. Fabric handles the ingestion, the schema, the synchronization, and the incremental updates automatically, dramatically simplifying data engineering. So the ETL that usually requires pipelines or orchestration or flow time, we're immediately eliminating that. With shortcut

10:17 transformations, that overhead of transforming data is continually put in place and synced, and really there's no architecture. It's simply: you have a shortcut to a file, make it a delta table. This is one of those easy buttons. To me, I immediately put this in our label of too easy. Is it too easy or is this great? Okay. I don't think you can make things too easy. The only thing I would say about too easy is if we don't have enough controls. There's

10:48 not enough knobs and buttons for us. Do we have enough control? Like, you can always make it easy, but is there enough control, right? Doing the simple thing versus I actually can manage it. Yes. I really like this pattern, Tommy. It seems silly to bring a bunch of JSON files or things into the lakehouse and not have them consistent, like, we know what's going on. One of the things I think about with this one, this is what I'd like to test, is the syncing

11:18 capabilities, right? So what does that mean? Is the file updating? Do I put files in a folder, and you point at a folder and load everything there? What do we mean by this? If I update the file, is that when it syncs? Is this an always-on thing where it's always looking at the file and trying to see, did it change? So, I don't want an active pipeline to do this. I don't want an active action. I want a passive action, right? I want the

11:47 file... Yeah. I want the file to update and then the shortcut just figures itself out. Well, no. So, it even talks about nested folder structures, and it's always in sync with the source data. So, if you have a folder, even if you have nested folders or subfolders in there, it's going to look for those changes and still automatically update. Mike, I look at this and I go back to something I said a year ago: that I think we're going to see more and more of the evolution of data engineering architecture, like the

12:17 Medallion approach. Yeah, I'm not saying it's going away, but it's shifting. It is shifting. Yeah. Because the reason you had the medallion architecture before was out of necessity. You had to load files initially. You had to get that into a stored place to make a table from that. Well, this, at least this preview, beta, is eliminating that at this point. So, we're seeing more and more a direct way to get my tables into a lakehouse.

12:49 This is... I don't know how to really articulate this, Tommy. There are now so many ways to get files into a system, into a delta table, and then once it's in a delta table, are there additional transforms? Do we have to get it to... I've seen some people complaining about the fact that Dataflows Gen1 is disappearing, and one of the major complaints is: I don't want all the bloat that comes with all these new things in Fabric. I don't want to have to go teach all my teams how to do all the things inside Fabric.

13:21 It's very nice that we have many different options to do something, but how do I know how to load data anymore, Tommy? Is it a copy job? Is it a copy activity in a pipeline? Do I use a notebook? I like that we have a lot of options, but now you have to be somewhat knowledgeable around all the different solutions for which way I'm going to bring information in.
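As a point of reference for the notebook option Mike lists, a minimal Fabric-notebook sketch for landing raw files as a Delta table looks like this; the folder path and table name are made up, and `spark` is the session Fabric already provides in a notebook.

```python
# Minimal sketch of the "use a notebook" option: read raw CSVs from the
# lakehouse Files area and write them out as a Delta table. Paths and the
# table name are hypothetical; `spark` comes pre-defined in a Fabric notebook.
df = (
    spark.read
    .option("header", True)
    .option("inferSchema", True)
    .csv("Files/landing/sales/*.csv")   # assumed drop folder under Files/
)

(
    df.write
    .format("delta")
    .mode("overwrite")                  # or "append" for incremental loads
    .saveAsTable("sales_raw")           # managed Delta table in the lakehouse
)
```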

13:52 And it's so new, there's not enough market data or people in the industry or MVPs testing each one of these. Someone's got to sit down and start doing benchmarking. I'm waiting for Mim to show up and be like, "Okay, here's how this works and here's how this performs against this, this, and this, and do I want to use Delta format or do I use DuckDB for something?" Like, he'll come out and give me a whole bunch of weird analysis on things, but I feel like I need a little bit more assistance on, okay, there's a lot of things to choose from here. And the reason I say this is because at some point I'm just going to want to talk to an agent and say, "Here you go." Like, so when I look at Alex

14:22 Powers's CLI tool that he built, which was basically: you talk about your business problem. Here's the business problem, here's what I want to build, and it says, "Oh, here's what I think you want to create." And it then goes out and builds all the pieces for you and basically creates the items inside Fabric for you. That's what we need to some degree. Here's my input. Here's the schema. Figure it out. Okay, agent, you go decide the most efficient way to run this. So, I think there's more to the story

14:53 though, at least the first part that you said around we need to benchmark all this; I think that's absolutely true. However, Mike, let's say all things were equal, maybe there's an insignificant cost in time for each of these options. Users, even though cost may be higher for one or less for another, are not necessarily always going to go with the least-cost option, because, to your point, how many different avenues or approaches or toolings do we have right now to get

15:24 data into a delta table? More than five. And that's being very conservative. Right off the top of my head, probably seven-plus at this point if I thought of all the different approaches I can take in the Fabric interface: I can create a pipeline, I can do copy jobs, I can use this, I can use a dataflow, I can use a notebook. Yep. Okay. So with all that being said, I go back to a conversation we had a while ago. It

15:54 was one of our episodes around choice. Yes. And, yeah, so the mental thing: I think a lot of users are going to struggle with this, because there is not necessarily one better than the other in all cases, and especially for a team here, and I think that's going to be a point of struggle. If you go into an environment and you see a company who is using this new approach, are you going to switch them to a notebook because that's a better approach? But this

16:24 works. So I think there's a lot more to the story than just benchmarking in terms of the preferred approach. Yeah, I agree with you there, Tommy. For me, one thing that resonates with me is lakehouses and storage accounts; it's OneLake, basically. Mhm. It is a core tenet of Fabric. Sure. So, whatever we're doing, in whatever technology, however things are being handled, to me the main message here is

16:57 OneLake is essential and is what we will be using for everything. And so whether it's this easy loading system here, Microsoft's just making it really easy to load anything you want into OneLake. And so as I think about data and agentic spaces, I'm actually doing a lot more research and spending more time thinking about where do AI and agents live? How do we get them to work on stuff? At the end of the day, everything is landing in the lakehouse, in OneLake.

17:28 That is so essential. So, again, how we get it there I'm a little less worried about. There are going to be options. Great. Love it. No, I'm a professional in this space, so I'm always going to be looking at building the solution, but building it once and optimizing it immediately as I build it, right? So you get run time on these different loading methods if you try them all and build different loading patterns with them. At some point in time you just figure out what is efficient. So I see almost all software

17:59 projects going through a pattern: phase one is just get it working, prove the concept works, does it add value? Phase two seems to be come back and optimize and performance tune. I don't know how many customers are on that second step yet, but I think a lot of times we'll come in, we'll build something, make sure it works, get the value out of it. But we do need to have concern or thinking about what the optimization step looks like, how do we make it better? And I think, as Alex Powers has mentioned before, customers are

18:29 starting to min-max their Fabric subscription, right? Minimize the costs, but maximize the value out of it. And I think that's fair, that's true. It's what you want to do. Oh, no, I think that's the evolution for most technology. So, you're telling me, Mike, you walk into a company and five different departments are all using five different ways to get data into the lakehouse, one's using notebooks, one's using dataflows, one's using pipelines, and that's totally okay? If you're just getting things off the

19:00 ground, yes, because the whole point is: are you getting value out of the process? Right? If I'm using a Dataflow Gen2, yeah, it's probably the more expensive route to run. But does it get value? Is that pipeline generating revenue for you? Is it saving you money? Is it becoming deeply integrated into your processes, like how you do your business? If it works, it works, right? It gets the job done. There's always a trade-off, though, of price, a balance of is that

19:31 process making enough value versus the cost it takes to run it. And so later on, once it works, and I'll argue this for software as well, once the software is working it's much easier to go back and say, okay, let's review what it's doing; we now have the spec, we now have the diagram, the instructions on what it needs to do to add value. I think there's a lot of time that we spend as analysts, Tommy, where we're just focusing on what does the customer need

20:02 to get the value, right? Yeah. But I'm struggling with this one, or I haven't made a clear decision yet on why does it matter if the business is going to... Why does it matter? It doesn't matter. It does not matter what process people use as long as they get the data in. It doesn't matter right this moment. So, you do a lot of these absolute things. Well, you make this a very absolute thing. If the

20:33 team that you're working with doesn't like notebooks and doesn't know how to use them, I'm not going to go force them to use notebooks, even though it's the most efficient way to go. So I don't think you need... that's not what I'm saying. I'm not saying there's only one clear-cut way. What I'm saying is you do have to decide at some point on some approaches that people are going to use, not whatever is best for that particular user, because at some point, Mike, when something breaks, someone has to go in. What I'm trying to say is the approaches that people do take,

21:04 I'm not saying they necessarily have to be approved, but if, oh, I'm using an API to get data in, that's how my team's doing it, well, you're the only person who can manage that on your team, because it's that head-knowledge idea. I'm trying to make an argument here that, yes, there are a ton of options. But if you are dealing with an organization, as a consultant I'm not going to make all my jobs APIs and just hand that off to a company. I'm going to do something that's going to be able to be handed off,

21:35 a skill that can be transferred easily, right? And I don't think I care. I think you want to let the business... So this is speaking to, I understand what you're saying, Tommy. Yeah. And I understand you're trying to get people to understand what they build, how they build it, make sure it's sustainable. I think that's what you're speaking to. 100%. Where I'm going with this one is Matthew Roche's pyramid, from enterprise data warehousing down to personal uses, personal reporting, right? There's going to be a lot of

22:06 I would agree with you, Tommy, when we're talking about the centralized BI team, the EDW, the centralized team, that I agree with, right? Having the top of the pyramid locked down and saying these are the patterns we like to use and everything will conform to them. You can set standards for those, and those become your items for how you manage and distribute data through the central BI team. Right? That has a different level of standards that you can apply because it is the

22:37 largest audience of people that are using it. As you go further down the pyramid, you're getting smaller groups of users using the data. While very impactful, and I'm not talking about the impactfulness, as you go further down the pyramid you have more flexibility: ah, it broke, it's not as big, the audience isn't as large, right? Those tolerances are not controlled by a central managed team. It doesn't need as much rigor on those

23:07 personalized or team or department-based reporting. So I think in there I'm willing to give more liberty to pick what works for you. But I would agree with you, Tommy, if someone wants to delegate or relinquish: hey, Tommy, I'm a business unit, we have this really great process, I've made a Dataflow Gen2, it works for me, but it has a lot of steps in it and it's just

23:33 slow, but I want to hand that back over to you, Tommy, and say, look, I need you, Tommy, to own this as part of the central BI team. I want you to make this part of the standard process. And at that point in time, we need to do the data transition handshake between, okay, me, the business user that made something that worked, and Tommy, who goes, okay, great, I see what you did, I understand what you wrote, I'm going to reinterpret that and we're going to make something else that's a bit more performant, something you can maintain and you could

24:04 actually put your stamp of approval on, because we're going to go through the data certification process. Right? To me, that's when you're allowed to apply those, what you're saying is, rules, standardization elements, at that upper echelon, where it's now higher end. It's on the EDW side of things. Does that make sense? Yeah. I think what we're speaking to here is something that we're going to see exist, but in the same fashion that we have managed

24:34 self-service around PowerBI, I think we're going to be seeing managed self-service around Fabric. Managed self-service empowers teams and users who are not in the BI team to be able to create and distribute PowerBI content. I think we need to see the same thing around Fabric too. And I think you're speaking to that, where we have the situation where we want to allow people to be able to create things best suited for them. However, that does not mean it's a wild west of however and whatever

25:05 they want to create. There are still guidelines that need to be around that. And I think, to your point, are we allowing dataflows for every team? Is this going to be something for people who have gone through something, or are we going to prefer or nudge people toward the copy job? Because, again, that can be managed. Okay. Other than monitoring what they build, we can't really turn it off. Yeah. This is the difference. So I'm with you, Tommy. And maybe this is what we want, right? Maybe

25:35 shortcut transformations is what the team would need. I don't know; maybe it is a pipeline, maybe it is just a notebook. But I don't have the ability to turn off a particular workload or feature set for a team that gets a Fabric workspace. And this is the challenge right now, because it's wide open and the business is going to pick what they pick, and if they're coming from PowerBI they're already going to know Dataflows Gen1 or Power Query, so they're going to be familiar there. So yeah, an interesting question comes up here in the chat, Tommy; I want to bring this up as well.

26:05 I know we're getting close to where we need to get to the real topic here. We're talking a lot about this, which is good. It's a good one; we've hit a nerve. Johnny Winter asks, "Do you think it's okay to not source control your content?" And I think my answer is, and I think, Tommy, you're going to align with me on this one, Tommy and I love source control. It's got to be Git, Azure DevOps or GitHub, something like that, as much as possible. And I think one of the challenges has been that

26:35 we don't necessarily communicate or have the training in place on business teams to do Git integration. It just doesn't exist. They're used to you working in Excel, building what they want, saving it to SharePoint, and we're done; and barely could we get that done initially. So I believe there's a learning experience that needs to happen; we need to educate more about this one. But I firmly believe, if you ever have a restricted who-can-create-workspaces

27:06 environment in your PowerBI tenant, which I recommend, I think that makes sense: have a process where you can request a workspace. You give it out to them; you can automatically, as the central team, dole it out: okay, this workspace is already enabled with Git. Now the training part has to become: users can create things in the workspace, but they never check anything in or commit anything. So there's one thing, I don't know if we can API our way into committing things, Tommy.
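For what it's worth, the Fabric REST API does expose a Git commit operation on a workspace, so a scheduled job could sweep uncommitted items. A rough sketch follows; the payload details are an assumption to double-check against the current Git integration docs.

```python
# Rough sketch: trigger a commit of uncommitted workspace items via the Fabric
# Git integration REST API. Verify the payload shape against current docs.
import requests
from azure.identity import DefaultAzureCredential

token = DefaultAzureCredential().get_token(
    "https://api.fabric.microsoft.com/.default"
).token
workspace_id = "<workspace-guid>"  # placeholder

resp = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/git/commitToGit",
    headers={"Authorization": f"Bearer {token}"},
    json={"mode": "All", "comment": "Scheduled commit of uncommitted items"},
)
resp.raise_for_status()
```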

27:36 I don't know if that's something we could actually do for backing up, there's nothing for that yet, but at least Git is turned on, and we train the teams: look, we see that there are things in your workspace that are uncommitted, just go click this button and commit them. As much as I agree with you, and as strongly as we both feel about the importance of source control, I am deeply troubled by what you said, why turn on Git for a team even if I've done a training, and the reason why, Mike, is I know this, I feel, better

28:06 than anyone: it is freaking easy to break something in Git, to do something wrong in Git. And if I'm just giving that to a team who's doing copy jobs or how... I don't think so; I disagree with that statement. Yes, it used to be easy to break it with the UI in Fabric, but I think it's gotten much more resilient. I'm not having nearly as many problems synchronizing or committing things or breaking things. So... oh, I'm not even talking about Fabric and Git, I'm talking about source control in general. Oh, that. If you don't know what you're doing, there are some concepts there that need to

28:36 be... Yes, agreed. Again, training, education, but I'm not expecting anyone to go... The expectation here is you're never leaving the portal. When I say turn on Git... Okay, I got you. I got you. I'm not going to push everyone down the pro-code space. However, even now, Johnny is asking another question: where do you draw the line between low code and pro code? I also agree with that question; where do we draw the line? Where is this going to move towards with agents? I think you're going to go more towards

29:06 a low-code experience, right? Already agents make everything low code for me. I talk to an agent, it's in English, and I say, "Here's what I want to do," and it just figures it out. It writes the code for me. It does the things. So, I think we're going to continue seeing the code things that we used to do move more and more towards the agents to write and generate the code. That's where we're going. It's going to happen. So, how do

29:36 we as developers prepare ourselves for lots more changes? And the agents are going to be able to do not just one code check-in, but multiple changes. Hey agent, and I'm just interpreting a possible scenario here, hey agent, go change this notebook to add a new column in this table. It reads the notebook. It finds where the transformation is. Adds a new transformation. Boom. Done. I've updated the notebook. Okay, agent, now run the

30:06 notebook. Make the new column. Go update the semantic model; make sure it has the new column, and go add this new column to this visual on this report. We're not far away from the end system of being able to talk to an agent and have it weave a change across different areas, from notebooks to lakehouses to semantic models into reports. We're already seeing a number of tools proliferate to do this. I'm going to take that a slight step further, where

30:36 we're going to be, and I think what's required for this to happen, where I can really do Fabric managed self-service around source control, is the trigger-based things, not even chat-based. You're talking about talking to the agent. You and I were talking about something I was working on where I change a status and an agent runs and basically does a ton of things, where I'm just changing the status on a page, right? And I think, for me, I want to see a point where the marketing team, or the team who's not pro code,

31:06 is making updates, saying everything's ready to go, and allowing an agent off of that status to be able to do the complex things around Git: do the commit, make sure there are no conflicts, or notify the right users, because I think we need to see that. I cannot expect a business user to be an expert or competent in Git. Maybe some people will be, but I can't expect that of everyone. Yeah, it's going to be a progression, but again, I think with the advent

31:37 of agents... right, that's what I'm saying. Yeah, there is some complexity there with Git and Git integration, but then we can just get our heads around conceptually why it's there and what it's doing. Well, our agents already know how to run Git. My agents already have the GitHub MCP server attached to them, right? But then a business user doesn't have to worry about, correct, I need to do this commit or notify. Just say, "Okay, it's ready to go." And that

32:07 allows a trigger to happen that handles that. And I think, again, as long as we're working on the concepts of what agents know how to do and know how to build things, then we've got all this worked out together, where we now have this really tight integration between, okay, yeah, I don't really know how to use the command line to run a git command as a user, right? However, my agent does. My agent knows how to work with this. And when I ask my agent to do something, most agents now are creating

32:38 branches automatically off of the code and making changes. And I even think it's better than that. It's almost like a draft; they don't even make a real branch, it's like a draft branch tree. Yeah. And it's making something where you have to go, "Okay, I've made a draft of what I think you want me to build. Go review it." So, I think we're going to do a lot more reviewing of things. And to your point, Tommy, I think this is where agents are going to be really effective. And as we see more of them appear, they're going to have more capability to create and build and

33:09 manipulate. And what everyone is very cautious about is: I don't want it to break stuff. It needs to be building things in a vacuum, and we need a human in the loop, or a second review, or a review from a different agent that says whether this is going to break or not. And testing is going to become much more important here. So, all this to say, Tommy, I think we should probably drop this topic and get to the main one here. I really, really like this shortcut transformations feature. I think this is going to be really useful. Very excited

33:40 to see where it goes. How does this fit with everything else? Love it; good feature here. Looking forward to getting in and testing some of it. So, anyways, that's all I want to say about that one. Let's get to the main topic today. Today's a mailbag. Tommy, over to you. Main topic: overfed and sick golden data sets. And it's a doozy, so it's gonna take a little... All right, buckle up, baby. Buckle up. In my current job, we have a fat golden data set of about 100 tables, over 400 measures, and 100

34:11 calculated columns. Ship it. It's good. Just ship that sucker. We're good to go. Sorry. Yeah, even using lookup statements, oh my gosh, my boss comes from an Excel background. It has a data model resembling a sophisticated spaghetti dish. Oh, with lots of two-way filters, many-to-manys, one-to-ones, and other beauties that would make Marco Russo faint. It even has a lot of legacy auto date

34:41 tables where turning them off could potentially disrupt who knows what of the 100 reports the data set serves. This is hilarious. Angry top management executive. This is great. My boss insists we use this for all reporting via service-connected thin reports. Okay. So the obedient PowerBI developers take turns to download it, make changes, upload it, and pray that it works. Everybody waits and bites nails

35:13 as deadlines approach. Oh boy. Man, they must have a strong... never mind.

35:21 This situation developed well before my time, as I've only had six months in this position, coming from a very traumatic past of multiple dispersed data sets and a boss constantly grilling us when critical reports showed, for instance, different Q2 sales. Oh boy, I feel you. Since then, we have championed the idea of a single source of truth, probably to the extreme. The solution my boss decided upon was that we have a single data set, a single data model, and all thin

35:51 reports being served out of it. That caused us, the obedient PowerBI developers, to constantly load all we needed, even for a single and sporadic report. So the new table we added, the lazy calculated table, and the kitchen sink were there to stay forever. I am not exaggerating: the situation has turned unbearable. Even using Tabular Editor 3, dealing with the Becky Sign Ignora, the little fat lady as we call it, is impossible.

36:23 Anything makes us wait four minutes until the cursor stops spinning; we often forget what we were doing. Recently, we had a meeting with my boss, and I think we were probably too vocal about the situation, to say it politely. He did what any decent leader would do to any complaining employee: well, show me how to fix it. Put up some proof of concept. Yep. And show me. That's... I get it. Yep. You got to do it. I told him about putting together enterprise architecture such as reserd predicates and what Tony Pulia...

36:53 Tony, you got a brother? Oh, I guess I got a brother. Another MVP out there. That sounds promising, along with using data marts. Sounds promising with its data modeling capability. But how? Thanks in advance for any comment about this probably typical situation. I am sure many of your listeners would benefit from hearing your thoughts about a fat golden data set that could lose weight and be agile and healthy again. Juan Aguero. Okay. One, hilarious. I thought this

37:25 was really... Yeah, that was great. Great personality on the comment here. So, absolutely love this one. Okay, let's unpack a couple of things here. Oh, there's a lot to unpack on this one. One, we've all seen it before. We've seen the big bloated data set, many tables. It also feels to me a little bit, again, like this has been built over time. This is, I would think, the quintessential Matthew Roche mantra here, Tommy: transform as far upstream as possible

37:57 and as far downstream as necessary. I think this makes a lot of sense. It also feels to me like a lot of these one-to-one relationships are something where you have junk dimensions, potentially; you should consolidate those. What I want to maybe talk about more, Tommy, is what does this transition plan look like? How do we get from where you are to what should be, right? That's what I'm looking for here. So one thing I'll just quickly call out at the end of this note here, this mailbag, is the data marts piece of this.

38:29 I know you said data marts, they're going away, don't use them, so if it's a PowerBI data mart we have problems. So, first and foremost: this also feels a lot to me like we are purely in a Pro or Premium Per User workspace, is how I read this, just because of the word data marts. And, queen of ponds, I did coin that; I

38:59 had a session about that. Who knows when. So I think it was "make a data pond and not a data swamp" or something like that. I think it was well before Fabric; I think it was a user group we did when COVID hit. Oh yes, yes. Yeah. Okay. But here, okay, let's talk idealistic things. What would you do, Tommy, in this situation? Where would you tackle this to start? Yeah. So if I had to show a proof of concept, show how we can fix this...

39:30 Yep. You cannot tackle this by trying to accomplish everything at once. You can't put together an entire project plan for every report that exists out there. So you have to start at a basis, and to me that is usage. This is the number one place I would start. Yeah, that was good, Tommy, because that's where I was going to go as well, and I was going to talk about that a little bit too. Okay, keep going. I love this. So, because honestly, you are at a point right now where this is not a migration,

40:00 it is a revolution, where you are starting over to an extent. But where do you start? You have to start with what is most used, what reports are most used. Yeah. Because those are the most critical items. The data model is not critical; it's what people are using off of that data model. Mhm. So, we have to tackle the top five or... and I bet you, because it's always true, that 80/20 rule: 80% of your usage is coming from 20% of your

40:31 reports. Take the top 20% of your reports that are getting 80% of your usage. And you have to then detail out what they are using and begin to document: what are they using from the data model? Are they using other report measures? What are they using from the data set? I guarantee you're going to find overlap. There's always going to be overlap in what relationships they're using. Then begin to start documenting what those data models are, what those semantic models are, for those 20 reports. Can we

41:02 create a report for just those, or maybe three to four? But let me back up, because I'm going too far ahead. The number one place I start is you have to look at the usage of reports. What is most critically valued and relied on, based on what people are consuming? That dictates everything. So I'll stop there. Yeah. So let's unpack that, because I agree with you, Tommy, on this one. I think usage is good. I think what you describe about usage is on the reports themselves, just looking at how many times each report is viewed.
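The 80/20 cut Tommy describes is easy to compute once you have view counts per report, whether from the usage metrics report or the activity log. A small sketch, assuming those counts were exported to a CSV with ReportName and Views columns (the file and column names are hypothetical):

```python
# Sketch of the 80/20 usage cut, assuming report view counts were exported to
# a CSV with ReportName and Views columns (file and column names are made up).
import pandas as pd

usage = pd.read_csv("report_usage.csv").sort_values("Views", ascending=False)
usage["cum_share"] = usage["Views"].cumsum() / usage["Views"].sum()

# Reports that together account for ~80% of all views: the list to document first.
top_reports = usage[usage["cum_share"] <= 0.80]
print(top_reports[["ReportName", "Views"]])
```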

41:33 We're really good at building lots of things. We're really good at updating them and reading from them, but we're really bad about deleting and deprecating and getting rid of things that are no longer used. So to this effect, right, if this is a model that we've given out to a larger audience of people to build on top of, how many reports are we talking about here? If there are 100 reports attached to this model, we need to have some visibility into that. So one is the report

42:05 metrics. The other one I want to point out here is the temperature on the semantic model. So if you use Semantic Link Labs, or the XMLA endpoint directly, there is a property inside the semantic model that tells you the temperature of every column, and what you can do is grab that at periodic times throughout the day. And so, let me finish this thought and I'll come back to an idea, what you can do is actually map out the temperature

42:36 by column across the entire semantic model, and obviously there will be parts of the model that are going to be hotter than others, right? And those are areas we want to focus our attention on. This doesn't necessarily directly correlate to reports per se, but every time a report runs and every time people click on buttons, those queries get sent back to the model, and then the columns those queries are running on top of get some heat or temperature. And as the temperature rises

43:06 on a single column, it's more important. So having a read across an entire day or throughout the week, let's just imagine every hour you just go grab the temperature. What is the temperature? Because what happens in these columns is the model naturally over time starts decreasing the temperature on a column as it ages and has not been accessed, right? That's important to us. So I think this is how Microsoft knows what data sets they can evict.
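One way to grab that temperature reading on a schedule is semantic-link in a notebook, querying the storage-engine column-segment stats. A hedged sketch: the INFO function mirrors the column-segments DMV, but the exact column names it returns are worth verifying against your own model.

```python
# Hedged sketch: poll per-column "temperature" from the model's column-segment
# stats via semantic-link, the way the hourly read is described above.
import sempy.fabric as fabric

dataset = "Golden Dataset"  # hypothetical semantic model name

segments = fabric.evaluate_dax(
    dataset,
    "EVALUATE INFO.STORAGETABLECOLUMNSEGMENTS()",
)

# Column names below (TABLE_ID, COLUMN_ID, TEMPERATURE) are assumptions based
# on the underlying DMV; adjust to whatever your model actually returns.
segments.columns = [c.strip("[]") for c in segments.columns]
hot = (
    segments.groupby(["TABLE_ID", "COLUMN_ID"])["TEMPERATURE"]
    .max()
    .sort_values(ascending=False)
    .head(25)
)
print(hot)  # schedule hourly and append to a Delta table to build the history
```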

43:36 This is how they know what information about the model they can drop or evict from memory. This is an optimization thing that Microsoft's already doing on their back end, but we can also leverage it to get information. So that will help you with two things, right? The reports tell you what's valuable from the user side. The temperature tells you how the model is being used, and it also includes anything around Analyze in Excel, because that's one of the things we don't get a lot of visibility around: how many times are we running Analyze in Excel against the tables. I wouldn't be terribly concerned about that, because, Mike, when

44:06 you look at the whole outline, the map of how you would actually tackle this from beginning to end, and I know it's obviously hard to say what the technical side is, would you agree or disagree that this is not a transition, that you are in a sense having to start fresh at this point? Well, I would probably... well, okay, let me go into another concept that I think would be relevant to talk about.

44:36 I also think that stepping into this and looking at the model, right? If you went page by page, let's just take... Mhm. This is where agents, I think, make a lot of sense, right? Let's take all the thin reports down and turn them all into PBIXs. Let's look at the model and see what the model's doing. Let's just say we gave an agent the task of: okay, for each report that we have, map out which tables are being used on which page of every report. Likely, and this is generally what I

45:07 think happens here when you get into very big models that serve a wide variety of reports and you're not doing domain-based semantic modeling, what happens is you get one or two, maybe three, fact tables per page, right? So those fact tables are helping you provide data. Summary pages and overview pages by far touch the most fact tables, but typically you're looking at some data and then you're drilling in or drilling through to information that's in one fact table or two. And so I feel like when I'm

45:38 building reports, I can map out this fact table, these dimensions; this fact table, this data. And I feel like for me that's very common. Would you agree? Let me just stop right there. Would you agree, Tommy? Is that how you generally map out a report? Yeah. Normally. Now, I think we are a little of an outlier here with this 100 tables, 400 measures, though. Well, hold on, I don't want to react to that. Yeah, just react. I'm just saying normally, in the general sense, yes, I agree with you.

46:08 Okay. So, if you're building a report, typically you're pulling measures or dimensions from one fact table, building some stuff on a page, and then if you have another page, that's usually where I start switching to a different fact table and start making other things. The dimensions may be common, but I have different fact table information. Again, rough approximation; this is just a guideline, not a rule. When you go back and look at this now, the reason I say this is because now you can distill these hundreds of

46:39 reports and figure out, okay, of all these hundreds of reports, which tables are used most often on pages in the report, and then we can start saying, okay, does this make sense to be domain-based? And the reason, what I'm trying to get back to, is, you see, I'm trying to build a Venn diagram based on all of the reports and put each report into a portion of a Venn diagram that says, okay, these 10 pages all pull from the sales table of

47:08 data that we have, let's put those pages in this bucket, right? These five pages use the HR data or the operations data; they're separate. So what I start seeing is domains of report pages appearing. Often there will be times where you have two pages together where you're maybe merging two fact tables together, but maybe that should be a separate report, right?
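That per-page table inventory doesn't strictly need an agent: if the thin reports are saved in the PBIP/PBIR folder format, the pages' visual JSON can be scanned directly. A rough sketch follows; the folder layout and the "Entity" property are assumptions about that format, so treat it as a starting point rather than a finished tool.

```python
# Rough sketch of the per-page table mapping discussed above, assuming the thin
# reports are saved as PBIP/PBIR folders so each visual is JSON on disk. The
# folder layout and the "Entity" property name are assumptions about the format.
import re
from pathlib import Path
from collections import defaultdict

report_dir = Path("Sales.Report")          # hypothetical PBIP report folder
tables_by_page = defaultdict(set)

for visual_file in report_dir.rglob("*.json"):
    text = visual_file.read_text(encoding="utf-8")
    page = visual_file.parent.parent.name  # assumed pages/<page>/visuals/<visual>
    # Field references usually carry the source table as an "Entity" value.
    for table in re.findall(r'"Entity"\s*:\s*"([^"]+)"', text):
        tables_by_page[page].add(table)

for page, tables in sorted(tables_by_page.items()):
    print(page, "->", sorted(tables))
```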

47:39 So if we're talking about distribution apps, workspace apps or organizational apps, you can have a report that's using multiple data models as a summary or an overview, and then you have these domain-based pieces that are separate. So all this to say: I need to organize this model into domains. That's what I'm trying to accomplish. Let me pause there. Well, let me ask you too, because again, in the general sense, in the normal sense, I completely agree with you, but I want to make the argument here more, unless you already agree with

48:09 me, that the fact here is you really are starting over; you need to start looking at this fresh, as if it's a new build, rather than trying to migrate. Because, Mike, you cannot, with the complex relationships, all those calculated columns, how many did we talk about, over a hundred calculated columns? You're not going to reverse engineer that. You cannot look at the semantic model here and then try to separate that out. It is going to be rendered, either you need a lot harder

48:40 things with you, so to speak. You can try an agent, but that is not the path that you go, to me. I see what you're saying, but the way I'm looking at domains is I'm looking at the reports. I will look at what tables they're using, but you have to start as if we were starting from scratch, because you're not going to take this semantic model and then just break it down. See, this is where I maybe take a slightly different spin on this, Tommy. This model's working for

49:11 somebody. It's not... They're talking about biting their nails every time they touch something. Not the developers. It's not working for them, but it's working for the consumers. This wouldn't exist if consumers weren't able to make stuff on top of it or have reports that they're using. Again, this goes back to your earlier point: which reports are being used? Now, if this model is junk and no one's using it, you'll immediately see that inside the report usage, right? Mike, every report is using this model.

49:43 Well, this is... But which ones? We have hundreds of them, maybe. Which of the reports are there? And when we go to those reports, what are people actually doing with them? Okay, yes, we have one big semantic model. Okay, great. But what are they doing with it? If I go into these reports that are highly used and I'm seeing tables all over the place, which I will argue is probably what's happening here, here's a bunch of models, here's a bunch of information, and I'm going to go into these high-use reports and I'm going to see just lots of tables. It's not going to be graphical.

50:13 It's not going to be structured in a way; it's just going to be a bunch of columns there. And if we went into the process, I would argue what we're going to see is: I go to this report, I get to my thing, I go to this table and I export it, and then I go do something else with Excel somewhere else. This is what I've seen in these big monolithic models. Instead of building a semantic model that addresses a business need, we're building the data warehouse as the semantic model. Oh yeah, this is exactly what's going on. And so, what you're trying... So let

50:45 me step back here again. Let's talk really big picture. This one monolithic model is trying to describe your entire business inside a single representative semantic model. Right? If we step back and say, look, let's assume that this is right. Let's assume this is exactly how things should be built. If you step back and said, let's just remove the concept of actually having data attached to the tables: the relationships are valid. The tables,

51:15 wherever they came from, are still valid, right? The measures that you've already built help someone; they're there for a reason. So I would say there's information in this that is already valuable to you. Delete the data portion; just talk about the semantics of it. And this is where I'm going to go a little bit off the rails here, where I think things should go. And I want to put this out on the internet to see if someone builds this, because I think this is a neat idea. Let's take this entire semantic model and push it back into an ontology.

51:46 What if we came back and said, "Here are all the relationships, built as items in a graph. Here are all the tables that exist. Here are all the measures that we have." That's what we're trying to build: literally a semantic model without any data in it. And what business users want is, "I need information from this table and this table and this table." And what we really want is an auto-generated model from the bigger schema that

52:17 decides, if I pick this table and I want this measure and I want this dimension, the smarts of the service know how to just auto-build you a semantic model. And the reason I say this is really powerful now is, think about how useful this could be, Tommy, for Pro workspaces. Oh, sure. At the end of the day, I don't care how many models we have; a Pro workspace can have unlimited models, and Microsoft's expected to manage it. Okay, so as long as the tables are pulling

52:49 from the same source, and as long as the measures are coming from a single source, the schema, right, the ontology of my business, of the tables, the measures, and the relationships, that needs to be locked down. That needs to be the single point of information. When we get into the auto-building of semantic models, those are just snippets, or windows, or subsets of the bigger organizational model.
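
If someone did want to prototype this "semantic model without data" idea, the schema already sits in the model definition. Here is a minimal sketch, assuming you can export the model as a model.bim (TMSL) file via Tabular Editor or a PBIP save; it pulls out just the tables, measures, and relationships as a data-free description you could push into a graph or ontology store:

```python
# Hypothetical sketch: strip a semantic model down to its "ontology" --
# tables, measures, and relationships, no data -- from a model.bim file.
# Property names follow the TMSL schema; adjust if your export differs.
import json

with open("model.bim", encoding="utf-8") as f:
    model = json.load(f)["model"]

schema = {
    "tables": [
        {
            "name": t["name"],
            "columns": [c["name"] for c in t.get("columns", [])],
            "measures": [m["name"] for m in t.get("measures", [])],
        }
        for t in model.get("tables", [])
    ],
    "relationships": [
        {
            "from": f'{r["fromTable"]}[{r["fromColumn"]}]',
            "to": f'{r["toTable"]}[{r["toColumn"]}]',
        }
        for r in model.get("relationships", [])
    ],
}

# A data-free description of the model, usable as the seed for
# auto-assembling smaller, domain-scoped models.
print(json.dumps(schema, indent=2))
```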

53:20 Now, that's conceptually what I think is happening here when I look at this model, or if I did look at this model: bidirectional relationships, a lot of one-to-ones. Already I'm seeing the structure of the tables needs to shift to make that larger semantic model become simplified. So again, I would take this as a reference point, Tommy. First thing I would do is get rid of any calculated columns. If there are any calculated columns, we're going to figure out where to push them upstream.
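
The same model.bim file can drive a quick audit of the red flags called out here: calculated columns, bidirectional filters, and one-to-one relationships. This is a sketch under the assumption that the export uses standard TMSL property names; omitted cardinality properties fall back to defaults, so spot-check the results:

```python
# Hypothetical sketch: audit a model.bim for calculated columns,
# bi-directional cross-filtering, and one-to-one relationships.
import json

with open("model.bim", encoding="utf-8") as f:
    model = json.load(f)["model"]

for table in model.get("tables", []):
    for col in table.get("columns", []):
        if col.get("type") == "calculated":
            print(f"Calculated column: {table['name']}[{col['name']}]")

for rel in model.get("relationships", []):
    if rel.get("crossFilteringBehavior") == "bothDirections":
        print(f"Bi-directional filter: {rel['fromTable']} <-> {rel['toTable']}")
    if rel.get("fromCardinality") == "one" and rel.get("toCardinality", "one") == "one":
        print(f"One-to-one: {rel['fromTable']} = {rel['toTable']}")
```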

53:50 But... go ahead. Your turn. But again, this is my argument here. You're not going to take this semantic model and try to reverse engineer and fix it, whether you're keen or not. I'm taking a different approach from you. And let me point out, the first thing you said about having that ontology thing: sounds great, doesn't exist yet. I'm just telling you. Yeah, maybe we're getting there, but for one, it's not helping right now. Not helping. I would agree, but I think you need to start conceptually

54:21 with what works, and then we'll work our way into a solution that's actually feasible, right? Yes. So I think, for the argument, let's go back to the usage as step one here, if I could. To your point, what I would do, if it's available to me, is take those top 20% of reports, make them PBIP, and, to your point, go through VS Code to see what is being used from the model, and document this so we can have an

54:51 outline of where the heat map is. Look at my semantic model: a heat map of what's being used, what's relied on, what the dependencies are. There's a lot of surface area of data points we need to get together for this to be useful. I need to understand, number one, fact one, before we start: where's the heat map of my semantic model in my top reports?
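
One rough way to build that heat map, once the top reports are saved as PBIP and opened in VS Code, is to scan the report definition files for field references. The folder name and the "queryRef" property below are assumptions about the report JSON layout, so treat the counts as approximate and spot-check them against the real files:

```python
# Hypothetical sketch: walk a PBIP report folder and count field
# references, to rough out which model objects the top reports touch.
import re
from collections import Counter
from pathlib import Path

usage = Counter()
query_ref = re.compile(r'"queryRef"\s*:\s*"([^"]+)"')

for path in Path("MyReport.Report").rglob("*.json"):  # assumed folder name
    text = path.read_text(encoding="utf-8")
    for ref in query_ref.findall(text):  # e.g. "Sales.Total Sales" or "DimCustomer.Region"
        usage[ref] += 1

for field, count in usage.most_common(30):
    print(f"{count:4d}  {field}")
```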

55:21 So phase one, I am going to start building new, fresh semantic models, or maybe it's a single semantic model, right? I'm not going to try to break out those calculated columns. It's like I'm building a new city. Totally agree. I'm not arguing with these points with you. I'm just saying these are things I think we would still do regardless of what recommendations we provide anyway. We would still just do this. But we're saying, where do you start? And I think that's such a hard thing to do here. I think that's where we start, and, to your point Tommy as well, we have to go from this monolithic thing to domain-based stuff. Now, the other thing I'm going to point out here

55:51 too: a lot of the recommendations, Tommy, that you and I are providing here are all around Fabric, having access to Fabric. This whole story changes, I think, when we get to plain Power BI, because Power BI has Dataflows Gen1, but we really don't have a data store. So the thing that Fabric gets us is, if we're going to take a monolithic model and break it into smaller pieces that we can then do domain-based

56:22 modeling on top of, you really do need a storage layer. I'm going to take those Power Query transforms, put them somewhere else, and start building tables that I can use in multiple places. And you're saying Juan has Fabric here? I'm saying it's the opposite. This is only a Power BI shop. The recommendations we're talking about here are leaning more towards a Fabric environment, because when we say multiple domains, let me say

56:52 it this way: in a Power BI Pro and Premium Per User environment, you don't get the ability to create a table independent of a semantic model anymore, right? Dataflows Gen1 was the only way for you to load a table and have that table exist outside of the semantic model, but have the semantic model connect and load that data when it needs to. The thing I'm struggling with here is I

57:22 don't want a lot of the management overhead of multiple tables. When does the model load itself? Is all the data in sync across these different models? Even just talking about dim customer, right? Let's say I have a dimension customer table and it's in three different models. I want the ability, when I update the dim customer table, for the dim customer table in each of the models to be immediately updated, right? That just needs to be automatic. I don't want to deal with a lot of sequencing. I don't want to waste a lot of compute time around

57:53 having to reload all three of those models. And so when we're talking domain-based modeling, I'm already pushing people more towards a Fabric-type design. There's not a good way of doing that in Pro and Premium Per User. Does that make sense? I would completely agree. And yeah, I think both our recommendations are that you can solve a lot faster if you go into a Fabric space here, for what you're trying to do, what Juan's trying to do. If Fabric is part of the future

58:23 solution, I begin to integrate that. If it's not, if they're still dealing with just Premium or PPU, you're not going to like me saying this, but I would still look at Dataflows Gen1. It does not have a retirement date yet. And the odds are, to your point, Mike, you're going to have to reuse these tables in multiple semantic models in the new build. There is no way around that. And what's critical, the reason

58:53 why the boss wanted this single model was because there was a single source of truth. If I don't have Fabric and we're not getting Fabric for the foreseeable future, I have to rely on something that all my semantic models can depend on, right? There's no way around that. I cannot build an individual dim customer table with its own Power Query in every semantic model, because they will not talk to each other. I don't want to manage that, and that's the point, right? So the point was, the

59:23 business logic of generating that one table on that one model now lives in the model, in the M code, that is, the dim customer M code, right? And to your point, Tommy, it does not make sense for us to continue loading the things in there. I will agree though, Tommy, that Dataflows Gen1 is eventually being phased out. I don't know the speed of it; it has not been announced.

59:53 So, I would not recommend anyone build anything net new on something that will be phasing out at some point. It could be a band-aid before we get to Fabric, but this model is so big, Tommy, I would be concerned we would do it and then have a fire drill in, like, two years when they're actually trying to get rid of Dataflows Gen1. And I would not want to build on something that's going to have problems. So immediately I'm thinking, okay, if you're in a Pro or Premium Per User workspace, let's look at just the landscape of where Microsoft is building. We have

60:25 to look at where features are coming from. There are some features coming on the report side, some coming in the visual layer a little bit, but Microsoft is investing heavy amounts of money in the Fabric space. And I'm firmly convinced that if we're at this scale, at this size of model, we need to seriously look at pulling the data engineering out of this and at least moving the data engineering into Fabric. You can still do an import model. You can still do that in a Pro or Premium

60:55 Per User workspace. Those still work. It's a bit more effort and coordination, but you can load from it and move away from Dataflows Gen1. You need a place where you can have consistent, solid table structure. Period. We were just talking in the early intro: OneLake is the place; everything's going to live on top of OneLake. It's going to make it so much easier. Odds are this new feature of shortcut transformations might even help this; it might actually make it easier for these users to use this massive model.
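
As a concrete example of what "moving the table creation over to Fabric" can look like, here is a minimal sketch of a Fabric notebook cell that turns the dim customer transform into one shared Lakehouse Delta table. The source path and column names are made up for illustration, and `spark` is the session a Fabric notebook provides:

```python
# Hypothetical sketch (Fabric notebook): build the shared dim_customer
# table once in the Lakehouse, instead of repeating the same Power Query
# transform inside every semantic model.
from pyspark.sql import functions as F

raw = spark.read.format("csv").option("header", "true").load("Files/raw/customers.csv")

dim_customer = (
    raw.select("CustomerID", "CustomerName", "Region", "Segment")
       .dropDuplicates(["CustomerID"])
       .withColumn("LoadedAt", F.current_timestamp())
)

# One write, many consumers: import models, Direct Lake models, and other
# notebooks can all read the same Delta table from OneLake.
dim_customer.write.format("delta").mode("overwrite").saveAsTable("dim_customer")
```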

61:26 So already I'm saying, look, the table creation has to start moving over to Fabric. We need the rich tools that are over there to simplify that part. Right. One thing I want to make a note of as part of this phase: so yes, Juan may begin to map out what those semantic models look like, but you still have another critical part of this. And I know we're already near time, but there's another critical part of this about moving over those reports to those

61:56 semantic models, and there is some disruption there. So I want to emphasize how important communication with the business is at this point, because up to this point you may be able to do everything we've talked about here in a vacuum, with just the BI team. Yep. But if you do not have communication: hey, announcing in the next two months these reports are moving over to a different model to make it easier for you, you can ask better questions, yada yada yada. However, we're just letting

62:26 them know this is part of a process the BI team is doing. You don't need to give the business all the, you know, juicy details of what the model looks like, but there does need to be communication that something's going to happen to these reports. They are going to slightly adjust. It shouldn't be bumpy, but eventually you're going to have to take that number one report and move it over to that cleaned model, right? Yes. I cannot do that without communicating to the team: this is a

62:57 month from now, two months from now, next week by the way, we're doing this transition, so please let us know if you see anything different. You have to do that or it's not going to be successful. There are too many unknowns to know exactly how to solve this right away. I agree with you, Tommy. You're going to need to go look at the models and figure out what's in these reports. Again, I'm going to go back to: if this is a monolithic model like this, I'm going to make a big prediction on this one. Odds are a lot of your report pages are tables. So as long as you can

63:27 reproduce the table with the new model, that's your check that the requirement is met and corrected. I will also note here too, one thing that we found when we do this work with our team: when you have monolithic models like this and you're rebuilding them, you likely find data errors and data mistakes. And so when you vet the new model, the old model gave a number of X; you fix it and now the new number is Y. And you have to accept the changes, the fact

63:59 that you're like, hey, look, these numbers physically won't match. I'm going to build some testing to show you that it's better now. But when you go from old model to new model, there's always this misconception that everything I build in the new model should exactly match what's in the old model, right? And I think when we shape data correctly and fix things, we're actually going to find problems in the data

64:30 loading, such that we now get different numbers. And so at some point we're going to need buy-in that rigor was applied to this vetting; testing has to be very important and part of this process. But you're going to have to trust the new information at some point. At some point you're going to say, this is the new way. These are the new numbers. They're slightly different, but we fixed problems and that's why they're different. And I think this is exactly where the communication is so important.
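
The vetting step benefits from something more repeatable than eyeballing two report pages. Here is a minimal sketch, assuming you export the same measure by month from the old and the new model (for example via DAX queries) into two CSVs with hypothetical `Month` and `TotalSales` columns; it flags every difference so each one can be explained up front:

```python
# Hypothetical sketch: reconcile old-model vs new-model totals so that
# every mismatch gets a documented reason before users see it.
import csv

def load(path):
    with open(path, newline="", encoding="utf-8") as f:
        return {row["Month"]: float(row["TotalSales"]) for row in csv.DictReader(f)}

old, new = load("old_model_totals.csv"), load("new_model_totals.csv")

for month in sorted(set(old) | set(new)):
    o, n = old.get(month), new.get(month)
    if o is None or n is None or abs(o - n) > 0.01:
        print(f"{month}: old={o} new={n}")  # each difference needs an explanation
```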

65:00 Let's say you decide to attack that Q2 sales semantic model first, and all the people who rely on that. Yeah. I hopefully have a center of excellence set up and a BI team where the people we communicate to, I'm letting them know: a month from now we are cleaning up and optimizing the semantic model for Q2 sales, and as part of that effort we're also working to ensure the accuracy of the data. You have to communicate that. There is no way in goodness green earth that you're going to be able to make this change, even if it is right, and

65:31 people are going to trust you, if you don't have that communication. Yeah. I want to put this in the chat window here too. Johnny was talking about an experience that he had. Johnny's been very vocal here. Johnny, thank you very much for the really interesting, engaging chat that you have going on here. I'm really enjoying this one. He goes, "You've done this before. You made the change. You fixed the business logic that was wrong, and the business came back and said, 'These numbers are not right,' and made you change the number back to the

66:02 wrong number, the way it was originally counted." Again, I don't know the details of your story here. Likely it was tied to someone's bonus and the number was smaller, something along those lines, right? But this is a change management problem that happens at the user level. This is not a technical issue at this point. We're talking purely about people and process that has to be adhered to. And also, I think there's, again for me, an understanding to this. Hey, let's say you're coming into

66:33 one of these situations. You're going to rework a model. One of the very first things you should bring up to the business team that's saying, look, we want to simplify this: here's a risk that we need to address. A risk is the information. We're likely going to find problems. We're likely going to find issues. We're going to fix those issues. We will likely get different data coming back after we fix the problem. We will vet it. We will make sure that it's right. We will prove to you that this is the new right number and make you aware. I think you need to

67:03 very clearly call this out at the very beginning of the project, so that people understand this is going to happen, and that way, when they do see the changes and you explain it, it's like, see, I told you reworking the model is going to slightly change the numbers because we fixed the problems. And I think you really hang on this note of: we're fixing an existing problem. The error was already there; we just accepted it. And we have to help people through that emotional change of, oh okay, we're fixing it, it's

67:36 getting better, this is good for us, we want to move forward with the new good thing. And how do you get that buy-in? To me, there are three things that you tell the business that allow them to get buy-in and be willing to accept this, because most people, if you just say, yeah, we're changing the model, the numbers are going to be different, they're going to go, then don't do it, that sounds idiotic, well, I don't trust you. You tell them three things: what we're trying to do is going to solve three things for you. Your reports are going to be faster. When you need your data, you're going to get it quicker,

68:07 especially your Q2 sales that need to be updated on a 15-minute basis. Your reports are going to be accurate. No matter what business logic changes, we're going to make sure that everything's been vetted correctly, not built up over years and years of bloat. And finally, the real selling point to me. Yeah. And these are things you have to communicate. Yes. The final point is: we can be flexible with you. Whatever changes you want, or however you want to extend the report or the model, we can do that in near real time. Yes. Right now we are

68:38 constrained by this. Yes. There comes a point where working on the existing model is so fragile that the only way to move forward is to rework it, right? So yes, agree with that 100%. And I think these outline points, Tommy, these are great talking points that you want to have clear alignment on before you touch it at all, right? Before you even get to the place of, hey, we're not even touching the model. We're not going to make any changes until we agree upon these main bullet points. Without agreement on these, we can't move forward. And so, we need that rough

69:10 handshake agreement, saying, "Okay, this is a project we're going to take on. We all agree upon these things. We acknowledge this is going to happen. We still agree. Yeah, let's move forward." So, this is really good. Love this discussion; it was great. All right, so we're very much over time. Really good topic. We had a good intro and a good topic. Lots of things happening today. So, I'm really excited about this one as well. With that being said, thank you all so much for listening to the podcast. Sorry we went

69:40 over a little bit late. We have a new member joining us today, so I just want to call out Brad. Thanks for joining and becoming part of the community around the Explicit Measures podcast. We're really happy you're here. We hope you get a lot of value from this. That being said, Tommy, where else can you find the podcast? You can find us on Apple, Spotify, or wherever you get your podcasts. Make sure to subscribe and leave a rating; it helps us out a ton. Do you have a question, idea, or topic that you want us to talk about in a future episode? Head over to powerbi.tips/empodcast. Leave your name and a great question. And finally, join us live every Tuesday

70:10 and Thursday, 7:30 a.m. Central, and join the conversation on all PowerBI.tips social media channels. Thank you all so much, and we'll see you next time. Explicit measures, pump it up, be lighting up the sky. Dance to the day to laugh in the mix. Fabric and AI get your fix. Explicit measures. Drop the beat now. Feel the crowd. Explicit measures.

Thank You

Want to catch us live? Join every Tuesday and Thursday at 7:30 AM Central on YouTube and LinkedIn.

Got a question? Head to powerbi.tips/empodcast and submit your topic ideas.

Listen on Spotify, Apple Podcasts, or wherever you get your podcasts.
