PowerBI.tips

What is Semantic Link? – Ep. 278

This episode is a practical tour of Semantic Link in Microsoft Fabric: what it is, where it fits in the ecosystem, and why a Python-friendly connection to the semantic layer changes what “good governance” can look like.

The discussion also includes a quick rundown of December 2023 Power BI Desktop updates—especially the quality-of-life tweaks that make day-to-day report building less painful.

News & Announcements

Main Discussion

Semantic Link is essentially a Python-first bridge into the semantic model layer. Instead of treating a dataset as something you can only interact with through visuals and DAX, you can access model metadata and query results in a way that fits modern engineering workflows (scripts, notebooks, CI-style checks, scheduled runs).
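To make that concrete, here is a minimal, plain-Python sketch of what a scheduled metadata check could look like. In a Fabric notebook the measure metadata would come from Semantic Link's listing functions; here it is hard-coded, and every measure name and dictionary field is invented for illustration.

```python
# Minimal sketch of a CI-style guardrail over semantic model metadata.
# In Fabric, the list below would come from Semantic Link (the sempy
# package); here it is hard-coded, and all names are hypothetical.

def check_measures(measures):
    """Return a list of human-readable failures for measure metadata."""
    failures = []
    for m in measures:
        if not m.get("description"):
            failures.append(f"Measure '{m['name']}' has no description")
        if not m.get("format_string"):
            failures.append(f"Measure '{m['name']}' has no format string")
    return failures

model_measures = [
    {"name": "Total Sales", "description": "Sum of sales", "format_string": "$#,0"},
    {"name": "Margin %", "description": "", "format_string": None},
]

problems = check_measures(model_measures)
for p in problems:
    print(p)
```

A check like this can run on a schedule and fail loudly before a bad model ships, which is exactly the workflow shift the episode describes.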

The conversation frames Semantic Link less as a novelty and more as a missing connector: once you can programmatically interrogate the semantic model, it becomes realistic to build guardrails that catch issues before a refresh silently breaks trust.
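As one small illustration of "programmatically interrogating" a model, a guardrail script might generate DAX queries to spot-check measures. This sketch only builds the query text; actually executing it against a model is Semantic Link's job and is not shown here. The table, column, and measure names are hypothetical.

```python
# Build a DAX query that summarizes a measure by a grouping column — the
# kind of query a scheduled guardrail could run against a semantic model.
# Only the query text is constructed; execution (via Semantic Link in a
# Fabric notebook) is out of scope for this sketch.

def summarize_query(table, group_col, measure):
    """Return a DAX EVALUATE statement for measure-by-column totals."""
    return (
        "EVALUATE\n"
        f"SUMMARIZECOLUMNS('{table}'[{group_col}], "
        f'"{measure}", [{measure}])'
    )

q = summarize_query("Sales", "Region", "Total Units")
print(q)
```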

Key takeaways:

  • Semantic Link makes semantic models automatable: Python access enables repeatable checks, not just manual report testing.
  • Data validation belongs close to the semantic layer: that’s where business meaning lives, so it’s a high-leverage place to detect drift.
  • Great Expectations is a solid pattern for enforcing rules like nullability, accepted value ranges, uniqueness, and row-count shifts.
  • Start with the high-impact surfaces: validate the columns and measures that feed executive dashboards and widely shared reports first.
  • Expectations should be explicit and versionable: write them down, keep them in source control, and make failures actionable.
  • Small Desktop improvements add up: December’s release includes UI and workflow tweaks that reduce everyday friction for builders.
  • Paginated reports remain underrated: quality-of-life updates (like better search/sort) keep making them more viable for operational outputs.
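A rough stdlib-only sketch of the rule types in the list above. Great Expectations itself expresses these as declarative, versionable expectations; this plain-Python version just illustrates the idea, and the data is invented.

```python
# Plain-Python illustration of the validation rules listed above:
# nullability, accepted ranges, uniqueness, and row-count shifts.

rows = [
    {"region": "East", "ratio": 0.42},
    {"region": "West", "ratio": 0.57},
    {"region": "North", "ratio": 0.91},
]

def validate(rows, expected_min_rows=1):
    failures = []
    # Row-count shift: fail if the table shrank below a floor.
    if len(rows) < expected_min_rows:
        failures.append(f"row count {len(rows)} below {expected_min_rows}")
    regions = [r["region"] for r in rows]
    # Nullability: the key column must never be empty.
    if any(v in (None, "") for v in regions):
        failures.append("region contains nulls")
    # Uniqueness: the key column must not repeat.
    if len(set(regions)) != len(regions):
        failures.append("region values are not unique")
    # Accepted range: ratios must stay within [0, 1].
    if any(not (0.0 <= r["ratio"] <= 1.0) for r in rows):
        failures.append("ratio outside [0, 1]")
    return failures

print(validate(rows))  # → [] when every expectation passes
```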

Looking Forward

Semantic Link is a strong signal that Fabric’s semantic layer is becoming something you can test, validate, and automate—not just something you publish and hope stays correct.
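The episode's running example — a computed ratio validated against an accepted range, in the spirit of Great Expectations' between-values checks — reduces to something like the following. The numbers and column names are made up.

```python
# Recreate the pattern discussed in the episode: compute a ratio column
# with a divide-by-zero guard (like DAX's DIVIDE), then assert an
# accepted range on it — the kind of check Great Expectations automates.

def safe_divide(n, d):
    """Mimic DAX DIVIDE: return None instead of raising on zero."""
    return n / d if d else None

table = [
    {"time": "2023-12-01", "units": 40, "total": 100},
    {"time": "2023-12-02", "units": 55, "total": 110},
]

for row in table:
    row["ratio"] = safe_divide(row["units"], row["total"])

out_of_range = [
    r for r in table
    if r["ratio"] is None or not (0.0 <= r["ratio"] <= 1.0)
]
print("validation passed:", not out_of_range)
```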

Episode Transcript

0:28 good morning everyone and welcome back to the Explicit Measures podcast with Tommy, Seth and Mike hello everybody it was the night before Christmas four days before Christmas and what we're doing is still talking about Power BI or Fabric hey Fabric conversations have exploded in so many different directions all the time so today's main topic here is there's an article from Microsoft that I'll put here in the chat window this is a blog announcement from the Fabric blog talking about semantic link data

0:59 validation using a feature from Python which is called Great Expectations, a library that lets you look at the quality of the data in your tables so I'll put the article link here for later in the discussion that will be coming up shortly so before we do that let's do some news some news happened this week which we didn't talk about on Tuesday but we're going to talk about it now we have a release for Power BI Desktop it was a little bit

1:30 delayed I believe it was published a little bit later than normal typically I think the blog post was published on the 12th and the Desktop didn't actually get released till the 18th so there's a little miscommunication there or something broke at the last minute and they're like wait a minute whoops so they try to coordinate all this together it must be difficult to coordinate all of this stuff at the same time and having everything published around the world at the same time 'cause you don't want to have the Desktop published before the blog is out no one knows what's going

2:01 blog is out no no one knows what’s going on and anyways let’s talk about the new features that are coming out for you you complain about our If We complain about our our deploying to one place can you imagine oh yeah can you imagine a release coming out of Microsoft how many different touch points in different areas man Kudos all the regions that the the yeah you’ve got to have your stuff in time to have this and this and this group pick it up and then that and every and then like it gets the log of dependencies I cannot fathom well

2:33 people are out I know this now Microsoft like every other company takes off around Thanksgiving and through December so those are the months where people leave they're out of the office this means all the work needed or anything that was being worked on to get released for January doesn't get done so this is why January doesn't have any releases and there's not going to be a release for Power BI Desktop in January there won't be any releases until February never has been never will be give them a little break I'm

3:03 will be give little break I’m okay with that take one too excellent so what what features stood out to you anything that stood out to you in this release I have one but I’ll let you guys go first Tommy what do you think what’s what’s what is picking up on your radar here or something interesting to look at well Mike it’s it’s so easy and they geared this exactly towards you it’s new storytelling and PowerPoint suggested content we know

3:33 PowerPoint suggested content we know yeah I think even the blog says how much Mike Carlo Mike Carlo loves suggested how much they love PowerPoint so just kind just the new title should be the title is storytelling in PowerPoint for Mike Carlo Mike Carlo suggested yes M suggested content by Mike Carlo don’t put it in put it in PowerPoint so the gist of this is when you add powerbi the add in the slide it’s actually going to scan the title of the slide so the content

4:03 title of the slide so the content outside of just the object of the powerbi widget and say hey what are you trying to do here maybe there’s a report here that’s recommended it’s neat I I’ll admit I haven’t played around too much with it I’m still adding the link because I still have a very direct visual report that I’m using but I think this is a good place this is going I continue to love to see the integration with PowerPoint powerbi we have I don’t remember which episode but if you want to learn more we

4:34 episode but if you want to learn more we actually did a whole episode one of my favorite episodes to date just around the PowerPoint feature or the powerbi feature within PowerPoint so Kudos it’s okay just okay Seth any any features that stood out to you that’s it it’s more of a quality of life one cleaning cleaning up certain certain aspects aspects like the The Styling on common bar charts is like hey having more

5:04 options to visually represent data always a big fan of that I didn't get to plug in at all and look at the alerting on Power BI reports from Data Activator but looks like they're plugging in some of the Fabric components which is cool but hey if you're really talking about excitement paginated man the ability to search and sort in paginated reports dude the little quality of life stuff like that goes a long way so I

5:34 think honestly I'm going to land there for top adds I think paginated reports is an underrated feature in general I think people are building Power BI reports but in reality they actually want paginated reports and actually now that I'm looking at and working with the new explore your data feature and you now see that on top of your reports now you can actually go right into explore the data I really like this

6:06 explore the data I I really like this ability to create these tables and figure it out from a table and see just just let me focus in on the data model and just a single table let me let me look there only yeah off the data off the semantic model off the semantic model yeah we need to solve something right now to my left P paginated to my right paginated which one is it I just heard two different how do we it’s paginated paginated that say

6:36 paginated paginated that say paginated which one’s right this is vital it’s PX Mike m Mike Mike originally is from not Midwest right so his his pronunciation slightly different I don’t I don’t know if you wanted Tomy it’s interesting bag or bag it’s interesting you’re you’re leaning into the proper anunciation of certain words here because if you want to go we

7:10 here because if you want to go we could lean we could lean into here I think we might want to skip this I think I know where’s going on this one I’ll meet you on your you see that you’re seeing that already I saw that already I saw that already let’s talk about let’s move specific let move on here all right let’s move on to so another one here that is been a a big mess and I think it’s getting a little bit better here is the on object interactions has been really painful to work with just in general I’ve turned it on I’ve just I’ve just figured this is

7:42 on I’ve just I’ve just figured this is the way we’re going to be but there is some new default setups for on object interaction and for the people who are classically trained in powerbi they’re letting you leave the filter leave the paines open and one thing that really annoyed me was there was a I think there’s a combination of a bug and or the functionality of them if you closed one of those windows the little icon on the far right hand side would just disappear because it was an X it was just it was an X to like remove the

8:13 pane of the window and everyone's going to turn on all the icons and they're just going to use that icon list as just a menu like that's just going to be what's happening well you would click the X it would just disappear and you'd have to open up the menu click the thing put it back up I mean just the whole UI experience was like this is totally wrong I don't understand what's going on here I want the pane to be open I want it to go away when I want it to be open and closed it's weird anyways they fixed all that so now you can shrink it as opposed to

8:45 deleting the icon and removing it from the list and you can by default make sure those panes are always expanded because that's what I do every single time I work on a visual so like I drop even if I don't put any data in the visual I'll put the visual down and then I go to the semantic model pane the Data pane I guess it's still called the Data pane I don't even know they keep renaming things every day but then you look through there and find the columns you want and then I tick the columns that I need and then boom the visual shows up

9:15 need and then boom the visual shows up like that’s that’s what you do I don’t understand why we had to make it so extra complicated to get visuals on the page yeah I’ll I’ll admit Mike to your point there’s been a lot of times I’ve been close to throwing my mouse across the room I’ve lost so much real estate with the visual choosing visual is larger the space between visuals is larger it’s it’s been frustrating and I’m really really happy to see that they are making updates based on feedback by the community here yes and

9:45 in the Desktop like when Desktop prompts for the new version there's a little menu that pops up and it says I think the wording is good but it could be better on the menu that pops up you have two options keep the current setup the way it lives right now or use a more classic pane setup and I would argue it was keep the new stupid setup or use the correct version of how Desktop should work so why don't they ask you to write these things

10:16 don’t they ask you to write these things I know it would be so much better on these dialogue boxes hey guys yeah guys we’re gonna build we’re going to invest a lot of money in building this new thing let’s call it the stupid setup like like we won’t we don’t we don’t act want anybody to use it Mike Mike what do you think I think we should call it the St setup so nobody ever uses No it

10:34 St setup so nobody ever uses No it should be power stupid setup it should have the word power in it and it will say their name no it’ll say hey dum dum so yes I’m very happy that they’re fixing this one so it’s definitely going the right direction Microsoft has been listening to a lot of this one I believe Adam Saxton and G Cube did a video on this one and there is an overwhelming response from the community like this is not helping I can’t get it this is not working for me it’s all broken and I remember I think an Adam actually responded in like a message either in LinkedIn or or

11:04 Twitter like hey we've heard your feedback we know this is really a mess we're going to really work hard to figure out how to fix it I'm like okay good we're really focusing on listening to what people are saying about this feature it's okay but it could be a lot better so I definitely feel like this is a move in the right direction I also think that it is a perfect example of why putting things out in preview first is a good idea a very good move because it doesn't happen

11:35 often where you get a preview feature and then there are really large changes to it like it typically goes in incremental steps and like oh okay add this in here and whatnot but yeah just resoundingly I don't know of anybody that was just instantly thrilled with on-object and I'm glad they took the feedback they probably also took their metrics on how many people were actually using it and the

12:06 people were actually using it and the frustration and they’re adapting it and that’s that’s a good thing and I’m glad that they’re plugged into the community way they are like you’re saying because ultimately it’ll come out to be hopefully a much better experience yep the the other one I’ll I’ll point out here that is interesting to me I’m not sure where this fits in and this is more of a nuanced and developer side of things inside the developer area of the release they’re talking about there’s a new API that allows custom visuals to obtain an Azure active

12:36 visuals to obtain an Azure active directory access token through single sign on this is going to help you facilitate secure efficient user contextual operations and the API will be controlled by a global admin setting what where is this one what is this I know right it’s in the developer section and it says it’s the feature is a powerbi custom visual authentication API for custom visuals I don’t know if this is like a licensing mechanism that they’re trying to apply here like hey

13:07 they’re trying to apply here like hey this visual is authorized to be used by these people therefore we can authorize you inside a custom visual now I don’t know what this is really here for but it seems interesting and this is a very different and for those who have built custom visuals they are very sandboxed you can’t talk to anything it like if you want a certified visual Microsoft really closes you down to a very tight controlled area so this will be interesting to see what this means and how custom developers will use this

13:38 to help build custom visuals that's interesting I haven't been keeping up with custom visuals but that would seem to imply a company yes a UPN yes right the user principal name like that's Azure Active Directory would be able to be linked to it in some way for an organization so I'm not sure what that means that makes a lot of sense because we know even with third

14:09 party plugins and all these things there are a lot of companies that have very hard restrictions around yes what you can and can't upload or use yes and that gets proliferated everywhere in a Power BI report one and two some of the licensing on the custom visuals is tied to how widespread your use is or how many users can actually use it I wonder if they're going to plug that in there huh yeah I think there's something else coming on it looks like there's some development around that area so that's

14:39 also something very interesting one thing I don't see on this list that I do think is very important to note here as well is they didn't really mention in this list and this is maybe in the Fabric blog maybe it didn't come in the Power BI blog but Power BI Desktop now gets a new M connector for Delta Lake that is a new connector that exists now so Power BI Desktop can now read Parquet files directly without the need of Spark or SQL serverless I've confirmed this it's awesome and I think

15:10 that feature alone is a major game changer for this December update what is it using then your local machine it's using the so I have to explain this a couple ways here so when you talk to Delta tables you need some compute to talk to it yeah traditionally it has been Spark or SQL serverless those were the only two that we were aware of yeah then Microsoft added the ability for dataflows to read and write Delta tables so then dataflows got this feature now what you're seeing is the VertiPaq engine itself the Desktop VertiPaq

15:41 engine compute that's on your desktop machine it can now read and write or it can read it doesn't write it just reads Delta tables so to me this is just adding another compute engine that Microsoft is building the feature into to just read these Delta tables which I think is huge I think this is massive I've already connected it to things that I've used in Databricks and in Unity Catalog when you make Delta tables in a blob storage account you can connect anything you want I don't require a SQL

16:11 or Databricks endpoint to read the data and it appears based on Chris Webb's blog this new function allows what they're going to call partition pruning so if you partition the data the same way you want to incrementally load it in your data models you no longer need to go spend money on another compute engine just to read the data to load it into Power BI you can just point Power BI at it it will read the metadata files it will then grab only the partitions it needs during incremental

16:41 refresh basically filtered by dates right so partition by dates and then you can load the data in and only the partitions you want so when you're connecting this locally it's probably running off your local machine correct and it will run off it runs off the capacity it still runs off the capacity but now I need one capacity to run stuff as opposed to two or more right so because at this point we don't care right like prior we would say the premium

17:14 capacity was Analysis Services right it was the semantic model engine yes and now they've just lumped it all together correct so who knows what other things the capacity itself will spool up the cluster spool up whatever all the things for the different processes yes but this feature also opens so but right now you're talking about this is a Fabric-only thing yeah right this very feature opens up the ability for you to do this in Pro and Premium Per

17:44 user you now don't need Fabric to read Delta tables from other places in your organization so to me this is another decision point here I have to figure out what this is doing how this is changing things because this might let me stay on Premium Per User for longer without having to go over to Fabric because I can still read the Delta tables in their raw form with partition pruning this is a big change I think here I don't think people understand how important this is well you keep

18:15 digging Mike and you build a solution and I'll get it yeah I'll steal it yes it's not stealing it's just using I've been doing I'm probably going to need to do a blog about this one 'cause it's definitely very technical how to get everything to work so that it pulls all the data together but it is very slick I've been able to make shortcuts into other things inside Databricks and then again shortcuts in OneLake is a different feature all in itself but from Desktop I've been able to

18:45 from desktop I’ve been able to successfully connect to other things outside of the fabric ecosystem and make them work so that that is a very interesting and a very intriguing pattern to be able to be using Mike I will keep my predictions to myself here but the only thing I’ll say is you are right here this this is one of those underlying features it doesn’t seem like a big wow game changing the lights are spotlighted onto this and it’s like in the but what this can do and I think what this

19:15 is going to evolve into is going to be an integral part of our workflow in a few years yep and there's going to be training around this exactly you need your process are you using this right that's going to be the question yeah and I think you're right Tommy I think this is an inflection point I can't remember the world before I didn't have access to Delta tables once we turned on Delta tables the entire infrastructure what I do how I build stuff is now greatly

19:46 changing right and just by adding this one little feature this also means if you're building things with Azure Synapse you don't need Azure Synapse anymore this feature alone has killed Azure Synapse in my mind from a pattern of using a pipeline loading some raw data manipulating it into tables and then getting the tables into Power BI you don't need Synapse now you can do a lot of this now directly with Azure Data Factory which in my opinion is a more robust tool

20:16 has more features than the Synapse pipelines and Azure Data Factory has a lot more features than the Fabric pipelines just because Fabric's are newer so anyways it's very interesting to see how this is shaping and coming together and we're seeing that in our article today speaking of giving Synapse its retirement so yeah let's do that let's okay this is a good transition send Synapse up north so to speak we're going to send Synapse by the way of the

20:48 gonna go send synaps by the way of the PX no we’re not getting rid of p is it PBX or is it PX is it EAS or is an i PX I’ll stay with passionated that’s what everyone says when they see my hat I have a hat that has PBX on it and they’re like what is that PX I’m like actually that’s a PBX that’s okay you don’t know but if if you don’t I know you don’t know so one

21:10 you don’t I know you don’t know so one of those things let’s jump into our article for today so the article for today is the semantic link data validation using great expectations so Tommy I think you might want to just start off with just a little bit of let’s talk about what is SE antic link let’s start there first and figure out what that’s doing and then we should talk about this package and again this is not a this is funny to me a little bit some in some ways Microsoft is

21:40 claiming these packages that are open source like Great Expectations is a Python package which is made by the broader community in Python and Microsoft is like hey look at this you can use Great Expectations on top of all your other stuff so anyways with that being said Tommy give us an overview of what Semantic Link is so Semantic Link is a feature that actually allows us to connect to semantic models also known and I love this in the documentation as the diamond layer so

22:10 the medallion approach I don't know if you caught that I was like oh diamond layer like what can this be well I'm only going to build in platinum how about that now like it's going to be beyond diamond when will platinum come out yeah I got very excited about that so I'm waiting for obsidian yeah or I was saying I don't even have a cobblestone layer or like jaded moss I don't have these layers yet so I guess this medallion is getting a little wider here but yeah

22:40 like diamond layer it's the Minecraft medallion layer it's bronze silver gold diamond obsidian right but Semantic Link allows us to connect to those semantic models Power BI datasets to Synapse data science features and simply allows you to query in data science applications specifically in Fabric but it can be outside of Fabric the semantic model the tables the measures and allows you to do that in a Jupyter notebook anywhere notebooks are found to me

23:12 this is one of my if you were to say the top 10 Fabric features that came out this year or possibly the top five game-changing ones this is one of them because this allows us with a Python package you said Semantic Link is also a Python package to connect to all the parts of your model through a Jupyter notebook or through Python and actually in a sense shape it there so it's already created Great Expectations now is the other side of the coin and it's specifically for data

23:44 scientists and data engineers ensuring that to your point earlier data meets specific quality standards before it can be used and we can do a lot with it ensuring data quality now I do think Tommy to your point here this is definitely a game changer 100% I will also echo this is a notebook Spark-level experience you're writing Python but you're writing Python in notebooks inside Fabric so right

24:14 away you're already at a level of this is like super code-y you have to understand what's going on there but what I find is particularly around tools that Microsoft develops right they have tools like dataflows which are very powerful in their own right but it's a lot of UI it's a lot of clicking it makes it easy for average business users to learn data engineering and start doing data engineering things to me yes when you write straight code the

24:44 potential or the possibility of doing things is way higher you have way more opportunity here in this stuff so cool however this is very technical in nature so I will say that so be aware if you're not aware of Semantic Link the article that we put down below is actually really good it does a really good job explaining what this is but something I would caution you around is when you're running these notebooks you're using this thing called a pip install which is a command where you're installing something on the cluster

25:15 while it runs so not wrong not saying you shouldn't do it but whatever version of Spark Microsoft is running Semantic Link and Great Expectations these are packages that are outside of the Microsoft ecosystem and you have to install them on your clusters so they can run so there's actually a whole bunch of other things you can do as well where you can pip install things or have packages available before you turn on the cluster and you can set up Spark settings and all kinds of other cool stuff too so this is

25:45 very cool I really like it but it can get really technical very quick I just want to put like a disclaimer on this topic if you have someone who's already has this guess what Mike we have a whole other podcast we do just around Python now with Power BI but totally you can actually set up environments in your workspace so you can have it pre-installed too but no and this is that's a good point Mike because again we've always been focused on Power BI and data modeling DAX and this one lane of business intelligence and when you download Desktop you get it all like it's like

26:15 one thing one installer like I get one file like boom you have everything you need there with the exception of any external tools you want to go add right so traditionally everything is very simple but this is just a more technical piece right I find it interesting well let me back up just a sec just talking about this because if you're one of those people who has always just been in the BI space and your DAX your data model your Power Query and should you expand to Python we

26:46 have an episode around that in terms of learning that skill please check it out so I just want to reference that in terms of kind of like who are you listening to this just because you may not know this doesn't mean you can't correct a few things here with focusing on this Great Expectations side of things because I believe we've talked about the semantic model a lot before we weigh in on that one there sure I'd be curious Seth your

27:16 opinions on what you've seen or have you had a chance to play with Semantic Link I'm just curious what use cases do you observe potentially around being able to directly hit the data model or the XMLA endpoint using a Semantic Link type thing it's a good question I think I'm confused especially as it relates to the data quality conversation and I would ask

27:46 the audience or you to enlighten me right like the way where I'm having a hard time peeling myself out is there's the data quality confirmation aspect of like okay I have a model I want to ensure that as data is connected to it on a daily basis it's checking all the balances et cetera but like this article does it semantic link model does it like it seems

28:18 we're mixing pre-model work conversations in with what should be the final product and that's what I don't get this is why I wanted your opinion on this whole thing so let me step back just for some Semantic Link things another couple comments here before we move on to I think where you're going with this Seth which I agree with one area here is I think Semantic Link allows you to interrogate the model by using evaluate statements basically writing a DAX statement that runs against the model

28:49 one use case I could see here is this enables you to use the DMVs of the XMLA endpoint so if you want to document all the measures that are in your model if you want to document the size or anything that's related in the dynamic management views the DMVs I think that's what that means you get data out so for one you can actually extract all the relationships inside your model and there's actually another article from Microsoft talking about using Semantic Link in Microsoft bringing BI

29:21 and data science together. In that article they're grabbing information from the model and producing what they call, it's almost like a diagram, finding the dependencies of information. You can literally diagram out the dependencies inside your model, what's building what, and that can come out of the model. Anyway, I just wanted to make that point first. Let's go back to your comment, Seth, around data quality and where this makes sense.
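
That model-documentation idea can be sketched in Python. Inside a Fabric notebook the metadata would come from the SemPy library; outside Fabric those calls won't connect, so this sketch stubs the measure metadata with pandas. The dataset name, table names, and measure expressions below are made-up stand-ins, not from the episode or the article:

```python
import pandas as pd

# Inside a Fabric notebook this frame would come from SemPy, e.g.
#   import sempy.fabric as fabric
#   measures = fabric.list_measures("Sales Model")  # dataset name is hypothetical
# Outside Fabric, stub the frame so the documentation step itself is runnable:
measures = pd.DataFrame({
    "Measure Name": ["Total Units", "Total Sales", "Units Ratio"],
    "Measure Expression": [
        "SUM(Fact[Units])",
        "SUM(Fact[Sales])",
        "DIVIDE([Total Units], [Total Sales])",
    ],
})

# One simple documentation pass: which measures reference other measures?
refs_measures = measures["Measure Expression"].str.contains(r"\[Total ", regex=True)
print(measures.loc[refs_measures, "Measure Name"].tolist())
```

With the real metadata in hand, the same pattern extends to relationships and dependency diagrams the hosts describe.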

29:51 My understanding, and again I haven't dug in too far with Great Expectations, is that in the example Microsoft provides, they use Semantic Link to access a table of data. I'm going to grab time and something else, a total units ratio; it produces a divide statement and basically returns a table of information: here's time, here are all my ratios. Then they're able to use Great Expectations.

30:21 Inside Great Expectations you're able to say, for this column we expect the data to be a number between this minimum value and this maximum value. So it sounds to me like this is a data test, a test of data that you can apply to a process. Where do you use this? If this is happening in dev, test, maybe in production workloads, or happening before you publish things, I think this is a great portion of that.
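
The range expectation described here looks roughly like the sketch below. The Great Expectations call is shown as a comment (classic pandas API; the library isn't required to run the sketch), and the table values are illustrative, not from the Microsoft example:

```python
import pandas as pd

# The kind of table the episode describes: a time column plus a computed ratio.
df = pd.DataFrame({
    "Time": ["2023-01", "2023-02", "2023-03"],
    "TotalUnitsRatio": [0.42, 0.55, 1.75],
})

# With Great Expectations' classic pandas API this would be roughly:
#   import great_expectations as ge
#   ge.from_pandas(df).expect_column_values_to_be_between(
#       "TotalUnitsRatio", min_value=0, max_value=1)
# A dependency-free equivalent of that single expectation:
in_range = df["TotalUnitsRatio"].between(0, 1)
failures = df.loc[~in_range]

print(len(failures), failures["Time"].tolist())
```

A failing row like the one above is exactly the kind of signal you'd want before publishing.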

30:51 So I think it's more of a data quality checking thing, is how I see it right now. Okay, that is the keyword, right? I have a test model, a version that I'm going to apply changes to, hypothetically. So I guess my point, though, is: even if I have a test model, are you running the same volumes through your test models that you are through your production

31:23 one? This, I think, will be the question. Because I like your point, which is: okay, I'm in a test realm, I'm making changes, I'm doing data validation. But as far as quality is concerned, I'm going to ensure

31:44 that whatever my change is from a data perspective is valid, even though a lot of what we're doing is validating code changes. Correct, you're validating data, that's fine. Yep, this is DataOps to some degree, John Kerski's baby here. I'm not familiar, nor do I in most cases push the same volumes or the same exact data sets through my model. So this is why I have data quality pipelines that are part of the production workload, because that's where

32:16 you're validating your data. The expectation thing looks super cool, yes, but I'm not putting that on my final product; I'm doing that before I produce my final product. I think yes and no. Yes for me, maybe no for you. There's a different use case here, besides exactly what's in the document itself, which I can identify, that I wish

32:46 I had years ago. What has been the biggest issue, that we've tried to find solutions around, for Power BI data sets back in the day? By the way, if you're referring to a data set five years ago, is that still a semantic link? I don't know, another conversation for another day. It's always been a semantic link, Tommy, it's always been that way. Input of data, data quality checks: we tried multiple ways to create data quality dashboards, yes, around

33:16 user input in CRM systems, how people are inputting data. Are people inputting the right information? Where are we missing blank values? That's been a struggle to do with an already created model in Power BI: not just the tables, but my already created model and my relationships. Where are the dims missing in my fact, or the other way around? So this still matches here. Obviously it's not the same exact workflow you both are referring to, but this still checks out in terms of something that I know organizations are looking for.
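
The "where are the dims missing in my fact" check is easy to express once the model tables are in data frames. The tables and key names below are illustrative assumptions:

```python
import pandas as pd

# Blank keys and orphaned fact rows: fact rows whose key has no matching
# dimension row. In Fabric these frames would come from the semantic model;
# here they are stubbed for illustration.
fact = pd.DataFrame({"CustomerKey": [1, 2, 3, None]})
dim_customer = pd.DataFrame({"CustomerKey": [1, 2]})

blanks = int(fact["CustomerKey"].isna().sum())
orphans = fact.loc[
    fact["CustomerKey"].notna()
    & ~fact["CustomerKey"].isin(dim_customer["CustomerKey"]),
    "CustomerKey",
]

print(blanks, orphans.tolist())
```

One blank and one orphaned key is exactly the "are people inputting the data the way they should be" signal described here.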

33:47 Where are our blanks in our dims? Do we have all the keys? Are people inputting the data the way they should be? This can allow organizations to check, because I think there are enterprise workflows around data quality and there are midsize workflows around data quality. What are those items going on in our business, I don't want to say quick fixes, where we know people are not inputting or updating statuses the right way? That's

34:17 been, I don't want to say impossible, but nearly impossible to do with an already created model, because the endpoint was the model. And now, with Semantic Link, I can access that semantic model in a data frame. I can use Great Expectations or other libraries to do so. That, to me, is where the strong value is. So I'm not saying yes or no to your point, Seth, but this is a wider scope of data quality

34:47 than what's outlined here. So I love your point, Tommy, and I would 100% agree with this in a production environment. Let me paint some pictures, because I think what Seth is struggling with is a slightly different use case, around where we're testing data quality, versus what you're speaking to, Tommy, which is something slightly different. I believe there are two places we come from here. If we think about working only in production, Tommy, what you're saying

35:19 sounds 100% true. I have made semantic models in the past, they have been published, and I don't have any way of really vetting the data other than going into a report and trying to make all the tables I care about inside that report. Let's think about it this way: in different organizations you'll have dev, test, and production. There are three things you need to version when you're talking about three different environments, when it comes to the data warehousing side of things.

35:50 The three things are: a version of the infrastructure code, which is handled by deployment pipelines; I can literally move the artifacts between environments, like move a lakehouse from dev to test. Then there's the second layer, the engineering code: the pipelines, the notebooks, anything that's creating tables of data. That's the engineering code used to generate the data. And then there's the version of the data itself. Okay, so here's where things get tricky. If you have dev, test, and prod, and the

36:22 source of data is always prod, so the source I'm pulling data from is production, and I'm building a dev version of a model and report, then moving over to test while still pointing to prod, just pulling down information there. Not uncommon, it happens. But I would say that as larger organizations increase in size, what you find is the dev environment points to the dev server. The reason, and this is where it gets cost effective, is I don't

36:53 want to take a hard copy of all data from production and move all production data into the development server just to solve a small problem. 100%. And in many organizations those are completely different data sets. Yeah, they're scrubbed, they're clean. Those environments are not just versions of your report; the entire data set is

37:23 completely different. And so this is the challenge in what we're talking about, because the volumes we're playing with nowadays in your production system are costly. You're not going to run those same volumes in a test environment unless you absolutely have to. Agreed. So with that, let's keep going down this path. If your dev, test, and prod environments all point to a version

37:53 of prod, or the actual thing that is prod, then yes, Great Expectations can work all the way through: you can test your data quality right before you get to production. However, that's usually not the case as data sets grow in size, and this is where things become much more challenging. I would argue the patterns should be the same. With Semantic Link, as far as doing data quality things, there's no reason why you can't deploy a change into test and have

38:24 a version of a pattern around how the data is being built. Let me give you a clear example. I'm going to take February of 2022, and I'm going to make sure that in all my environments I have at least a representative copy of data from February 2022. That way, when I make changes to measures, I should be able to test the data and run Semantic Link and Great Expectations against that data.
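
That representative-slice idea amounts to a regression test for measure logic. A minimal sketch, with made-up categories and pinned values standing in for the reviewed February 2022 output:

```python
import pandas as pd

# Pin a representative slice (the "February 2022" idea) and compare the
# grouped output against known-good numbers whenever measure logic changes.
slice_feb_2022 = pd.DataFrame({
    "Category": ["A", "A", "B", "B", "B"],
    "Units": [10, 15, 7, 3, 5],
})

expected = {"A": 25, "B": 15}  # the pinned, reviewed output for this slice
actual = slice_feb_2022.groupby("Category")["Units"].sum().to_dict()

assert actual == expected, f"measure regression: {actual} != {expected}"
print("representative-slice check passed")
```

If a measure change shifts the grouped totals, the assertion fails in test rather than in front of the business.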

38:55 If I group this data by these categories, I should have this output of information. That is where Great Expectations comes in for checking the data. You can also use Great Expectations inside your production environment, where you're going to have data drift. People will enter data incorrectly into your source systems and it will make its way into your reporting. So you can also apply expectations when you refresh your data set: right behind the refresh, you can run Great Expectations again to verify that the numbers make sense.
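
The post-refresh drift check described here can be as simple as comparing headline statistics against the previous snapshot. The numbers and the 5% tolerance below are illustrative assumptions:

```python
# Post-refresh drift check: flag drastic shifts in headline stats before
# the business finds out. Snapshots would normally be persisted per refresh.
previous = {"row_count": 10_000, "total_sales": 1_250_000.0}
current = {"row_count": 9_000, "total_sales": 1_125_000.0}

def drifted(prev, curr, tolerance=0.05):
    """Return the metrics whose relative change exceeds the tolerance."""
    return [k for k in prev if abs(curr[k] - prev[k]) / prev[k] > tolerance]

print(drifted(previous, current))
```

Here both metrics dropped 10%, so both would be flagged for the BI team to review.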

39:26 Did we miss a lot of information? Did something drastically change? That way the BI team knows this stuff is happening before the business finds out. So in this case, let me ask a clarifying question from what you're reading and looking at: can Great Expectations, and I think I'm reading that it can, read measures? Yes, you can.

39:58 Semantic Link is allowing you to run DAX evaluate statements against the model, and you can get out of it everything you need. That becomes a data frame, and then you can run the checks. This entire conversation, for me, I just wasn't cataloging that. That's one of the biggest black holes anybody can have: we can have as many data quality points checking values going into our models as we want, but one of the biggest pain points is, did something change or shift?
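
The pipeline just described, a DAX evaluate statement becoming a data frame that checks can run against, looks roughly like this. The SemPy call is shown as a comment because it only runs inside a Fabric notebook, and the dataset, table, and measure names are hypothetical:

```python
import pandas as pd

# In Fabric, SemPy turns a DAX EVALUATE query into a data frame, roughly:
#   import sempy.fabric as fabric
#   df = fabric.evaluate_dax(
#       "Sales Model",  # dataset name is hypothetical
#       'EVALUATE SUMMARIZECOLUMNS(Dim[Month], "Ratio", [Units Ratio])')
# Stub of the kind of frame such a query returns, so the checks below run:
df = pd.DataFrame({"Month": ["Jan", "Feb"], "Ratio": [0.90, 0.40]})

# Once the measure output is a data frame, plain assertions (or any
# validation library) catch a shift in the measure's behavior:
assert df["Ratio"].notna().all()
assert df["Ratio"].between(0, 1).all()
print("measure output passed checks")
```

This is the "quality check the final product" loop: the checks run against measure output, not just the tables feeding the model.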

40:30 Did somebody introduce a new value, and it no longer works within the measures? Yes. So if this is saying, hey, we now have a way to quality check the final product all the time for our end customers, and get out ahead of changes, because we know a ton of development goes into the model, that is in that black hole space. Nobody's doing

41:01 validation on that, except the customer whose visual all of a sudden isn't working. So now this is starting to make a lot more sense to me, because if this is plugging into the entirety of the semantic model, instead of just the tables that make up the semantic model, then yeah, I see a lot of value here. And I think another example would be finance departments. Finance departments are going to produce a model, they're going to spit out a bunch of information, but that thing needs to be checked and tested six different ways from Sunday.

41:32 It just has to be tested a ton. All the numbers have to match up, you can't lose numbers, and I think Great Expectations is a good way of running tests against your data set. So you could run a model for finance where the entire model in test is grabbing everything you need before you roll over to production. Mike, a big point: you mentioned that you can run DAX, but really the other big part of Semantic Link is that you don't have to. And

42:02 we can put this in the parking lot, but this goes into the data science side: I can create a data frame off of measures, columns, and tables already in the semantic model. I only reference data science because it's mentioned three times in the first two paragraphs. Yeah, I think they're pushing the data science story way too hard at this point.

42:19 I don't see a lot of data scientists clamoring to get into Fabric at this point. I'm ready to roll there. We can put that into the parking lot, but especially given that they're trying to push it again, I'm ready to roll with that and talk about it. Maybe the fact that I see less data science is a good thing to be capable of doing. One thing I will point out: you're not doing

42:49 data science on SQL Servers. So if you're an organization that has a whole bunch of SQL Servers lying around, and that's where your data lives, you're not ready for data science, period, end of story. You're going to need some tooling, it doesn't matter what it is, that at least enables you to do predictive, data science type things on top of your data. Spark is likely that candidate, that compute engine you need, and the notebooks expose this to you. So for one, Microsoft is at least giving you the

43:20 capability to do data science stuff inside Fabric. What I'd like to see more of is the Azure ML Studio experience coming towards the Power BI experience, so it feels better. It gets better, but if you're doing Databricks, or if you're doing Fabric, those are two tools that unlock corporate machine learning. If you're a Snowflake user, Snowflake has no plans on their roadmap to build anything about data science. There is no

43:51 ML, there's nothing in the roadmap that's going to build that experience for you. So if you have that tool, you bought the wrong tool; it's not going to help you get to the next step. I don't want to say I'm arguing with you here, but the only thing about Azure Machine Learning, which is a great platform: I think they're pushing data science because the Semantic Link library is really, finally, the bridge between what a lot of data scientists are clamoring

44:22 to do with a cleaned-up model and what's available in Power BI. That's never been possible before, and please tell me a better way that could be done. Because right now, if I can create a data frame, selectively choose the tables in a refined model, that is my diamond layer apparently, and I can now connect to that, the Power BI model is now just part of the road. It's not the end stop, it's not the final destination. That's huge, man.

44:54 Destination that’s huge man like that this is incredible for data scientists where 80% of their time time they’re cleaning cleaning data yeah not disagree with you in any way there I’m not sure data ready for reporting is really in the right format for what data scientists need my understanding is data scientists typically will like wider flattened out tables data Sciences don’t typically prefer star schemas of data so while this is cool so to me it’s more of like a this is a nice to have feature but I

45:25 don't think this is an "oh wow, we need to have this off the shelf for our data scientists." I'm not sure I'm there yet. I'd have to see some more use cases, and maybe talk more directly to some data scientists, because in my opinion many data scientists that I work with are very comfortable working only in Python with whatever they need. They're going to go to the rawest form of data, look at that information, and from there build what they want. They're not going to want a super cleaned model, like a

45:57 star schema or what we've built in these data models, because now the data scientist needs to understand the relationships of the data in the semantic model in order to pull stuff out so they can use it. I just feel like the semantic model is designed for reporting. There are aspects of data science that are not going to be there that they would want, and they're going to go back to a silver layer of data and start there, is my feeling. You're right about the flat table, the denormalized;

46:28 that's what they want to work in, and they want to work in a single data frame. The only point I'll make is that's exactly what Semantic Link allows: I can pull the semantic model and, in a sense, denormalize it, with measures, choose the columns, and make a single data frame. I agree, but you're also adding in all the business logic to get you there. Yeah, and if you're talking pure data scientist, sometimes as a data scientist you want to peel back those business rules and say: I don't need the business rules, just show me the data.
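
The "denormalize into a single data frame" step is the standard star-schema flattening move. A minimal sketch with illustrative table, column, and measure names:

```python
import pandas as pd

# Flatten a star schema into the single wide frame data scientists prefer:
# join the dimension onto the fact and reapply a measure-style calculation.
fact = pd.DataFrame({
    "ProductKey": [1, 1, 2],
    "Units": [5, 3, 4],
    "Sales": [50.0, 30.0, 48.0],
})
dim_product = pd.DataFrame({"ProductKey": [1, 2], "Product": ["Widget", "Gadget"]})

wide = fact.merge(dim_product, on="ProductKey", how="left")
wide["AvgPrice"] = wide["Sales"] / wide["Units"]  # the "measure" as a column

print(wide.columns.tolist())
```

Note the trade-off the hosts are debating: the measure logic (here `AvgPrice`) rides along in the flattened frame, which is either the point or the problem depending on whether you want the business rules.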

46:59 There are likely insights coming out of that data that aren't shaped by those predesigned business rules; just straighten out the data. So I'm not going to have a strong opinion on either one of them. Okay. I'm not sure I see the use case as strongly as you do yet. Okay. Maybe I can be convinced. I'm sure we're going to keep seeing more blogs around this, but it's interesting how much they push the data scientist side in the beginning of this

47:29 article. One of the quotes is that Semantic Link can align and grow collaboration between data scientists and business analysts. So that's a huge push, at least in what Microsoft's trying to do. How much are we BI developers? It's the bread and butter; you play a lot of different roles. Yeah. One of the things that is, I guess, concerning, or doesn't make sense to me, is the third paragraph, where there's such a

48:00 heavy emphasis on the data scientist playing the role of validating, cleaning, and transforming raw data. Data scientists shouldn't be spending tons of their time on data quality. It's interesting. Yes, that's the wrong role for that; you want the data engineer doing it. Right, even from a cost perspective: I want you working in models, I want you out of ETL as fast as possible. Yes, but I

48:31 guess the other concerning part I have with this is: in projects that I've worked on with data scientists, there are two main parts that required a lot of emphasis for end customers. The first of those is dumbing down the output to something that people understand, out of all the statistical algorithms and models and all the things you're running

49:02 data through to produce an outcome. So in some respects, if you're doing an enhancement on the exact same data sets in a model, I could see that this would be valuable. But the second part, and while I'm excited about the data quality aspects, it raises the question of whether data scientists would even want to use this: the rigor and hardening around understanding and

49:34 ensuring that the data outputs are exactly what they would expect, especially as you're productionizing something. I've never thought of measures like that. Maybe it should be that way, but you are taking a model which is designed for reporting and misusing it as

50:05 a structured data set that goes into a model. That's my point. I feel like we're taking something that has been designed for a very specific purpose, and there will probably be some advantages to pulling from it. But again, if you're a data scientist on this call, or in the chat here, let us know: does this make sense to you? Do you do data sciency things? Would this be a good place to start? Now, to your point, Seth: if I'm a business user or a data

50:35 engineer already, and I'm looking to explore some predictive things, I would be more apt to use this, because I already understand the data, I understand the model, I understand the data engineering that got me here. Some lightweight predictive things would probably make sense here. But I'm not going to say a full-fledged data scientist is going to love this. The other part of this, though: let's say you start leveraging this. Aren't you putting this back into the realm of

51:09 potentially not being able to support the business as quickly as you would need to? What do you mean by that? I don't know if I follow. I'm designing a model that supports a bunch of business reports. Yes. As the business evolves and people want new things, that model needs to change, and you break this thing that the data science guy made. I'm modifying that model for business needs, because it's a model designed for reporting. Oh,

51:39 interesting. I would be terrified to deploy a change if I knew one of our models was being used this way as a source of information. How much more testing would I need to do to validate that whatever I push out for the business doesn't absolutely screw something up? Which is why Great Expectations comes full

52:09 circle. So, 100% right, exactly your point, Seth, 100% agree with you. I think the Great Expectations portion of the, synaptic, synaptic, semantic, I don't know why I keep putting extra letters in there, I have to look at the word to say it correctly. But I think you're right, Seth, 100%: anytime you publish that model, the more people you throw at it, the more people are going to expect it to be built a certain way, which is

52:39 going to slow down your ability to make a change. If I'm changing the logic on a measure, that could impact a lot of downstream things, and you've got to be very clear what that means, and when you do. This is where DataOps comes in.

52:55 So I think the use case for Great Expectations is much better than the data science experience. I think that's just a much more robust story: we're going to produce tests on top of this data in these different environments, we're going to review them, and it will tell us when we're out of scope on those data criteria. So, a few fun stats. I think the misunderstanding here, between us and the article, is that the article has 12 paragraphs and data science is

53:26 mentioned 10 times, and the majority of that is all in the beginning. I think it needs to be "data science (diet)". This is not Machine Learning Studio; this is a data scientist utilizing Python packages. A lot of what I hear from data scientists, around the predictive side and the correlation side, can all be done with Python packages. It does not rely on Machine Learning Studio. I'm not saying that's the best solution, but I think that's what Microsoft's referring to here when it comes to

53:57 data science: can we do some quick predictive work? We know Power BI is not the best from a predictive point of view. There are DAX measures to do that, and a lot of workarounds, but predictive or correlation work was never Power BI's intention. What was used for that? Custom visuals made in R, or Python; that was the only way to do forecasting. Yeah, and in Python. So this has always been something that's not been in Power BI, the basic data science.

54:29 But even thinking about the data flow for a moment: I do data engineering, I go dev, test, prod, I make a model, the model's in production. A data scientist shows up and uses that model to do some predictive things. Where do you put the output of that data scientist's work, and how do you incorporate it back into what you're doing? You don't want to round-robin. It doesn't seem efficient to me to take data out of a data set, or semantic model, have the data scientist work on it, and then, what does he do with it? Where does

54:59 he put it? Do you build another report and another semantic model that does the machine learning, the predictive pieces? Or do you write it back down to a source system so it can get picked back up again and put back into the model? To me, even thinking about the flow of data, it doesn't make sense. I would rather say: go to the tables that are made in my lakehouse, do the work there, produce your predictive things, and then all of my normal reporting and all the predictive things can get joined together and then turned

55:29 into the model. Because if you're going to do predictions, you're going to want to put them right with your regular business data anyway. It makes no sense to bring all that information to another semantic model, or another place where you're doing predictions. I think you're going to want that in your main models. And the argument here, I'm just not following, Tommy. One, all three paragraphs describe that this isn't just ad hoc analysis. Two, the harder play is, as a data scientist, we

56:01 want to plug into the semantic model because all the business logic is there. And I would argue that goes into our conversation where we want to push that back as far as possible. I don't like building tons of business logic only in my semantic model, for this very reason: it's not repeatable, I can't reuse it. Yes. So for the artifacts that I'm building, related to enterprise efforts, and this is not a business person's model, in the enterprise realm,

56:33 all my business logic is in the facts and dimensions I'm building already. Those are artifacts you can plug into without going to the model. Right. So I get it, I now understand and I appreciate the data quality aspects of this, yes, but I'm still struggling with using Semantic Link as a source for data science work in general, because of what we're talking about. Yeah, I think my

57:05 final thought here is: the heart of this, to me, is the concept of what Microsoft is foreseeing for the semantic model. If you had a Sankey diagram for Power BI classic, 100% of it was data set, semantic model, to report. And now we're seeing these other arrows, these other bridges or links to other avenues, like the data science part. That's where they're pushing. Are they there yet? No, but the

57:37 semantic model now has other avenues; it's no longer the end destination. So the heart of this, to me, is the concept of what a semantic model was in Power BI classic, and I love that term by the way, Mike, versus Power BI modern, and what the semantic link, or the semantic model, is going to be. That's really my final thought: we're seeing this evolution. I will always be Power BI classic. Seth, any final thoughts from you? I think ultimately it'll be really interesting to see where this goes. Two things: if

58:10 Semantic Link and the artifacts we can plug into now include measures, and we can stabilize on those and use them outside the realm of what we would normally have done with just reporting, that's interesting and cool. I still have some questions about the quality, and making sure those stay static. The thing I'm most concerned about, in aspects of this where, to Tommy's point, we're now going to Sankey out and break out a semantic model for its uses, is: are we

58:40 repeating history? If you keep throwing things onto it and build these monolithic, giant things, like warehouses, you're going to slow things down for the business, and the business needs to change, adapt, and implement things. This is what we've been building models for: wider adoption and usage throughout an organization. Maybe there are certain use cases where it's the model that everybody uses, and that's where you would plug this in, but that's maybe 5%. The challenge would be:

59:11 I wouldn't want to keep building on something that I can't change, or that would require months of testing before we can get back to the business. That would be my only concern. Yeah. I'm going to say I think this whole article around Semantic Link is very good; I think this is very helpful for us. Again, I have not resonated with the data scientist experience that Microsoft has been providing through Fabric yet. I don't see the value of it currently. I think data scientists have their own tools that

59:42 they're happy using, and they'll continue to use those tools until Fabric catches up. Now, all this to say, Microsoft will probably do a better job in the future. I feel like everything we talk about with data scientists in the Fabric realm is "data scientist light." For business users, or data engineers who are just starting to move into data scientist realms, this makes sense, but I don't think you're going to throw a full data scientist at this and have them be extremely happy with what's going on here. So I do think the value of

60:12 Semantic Link is data quality, looking at DMVs, and getting data and diagnostics out of your models. Don't underestimate the power of the DMVs. You can go look at the heat of each column, by how much it's used. Wouldn't you like to know which columns in your data model are used most often over time? Wouldn't that make sense? This DMV

60:44 This DMV area allows you to extract that information, and you could trend your model over time, column by column, looking at which columns are used the most. This could give you a really interesting heat map of which columns are most important. I believe some people have already blogged about it; I can't put my finger on the blog right now, but that alone makes this feature worth investigating, to understand exactly how your model is being used. I wish this was a classroom, just seeing you two in the back, like, "I have a question." Oh boy.
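Editor's note: the column-usage idea discussed above can be sketched in a few lines of Python. The sempy call, dataset name, DMV, and column names below are illustrative assumptions, not something shown in the episode; only the small aggregation helper is runnable anywhere.

```python
# Sketch: ranking columns by usage "heat", as described in the episode.
# The sempy/DMV portion (commented out) assumes a Fabric notebook session;
# dataset and DMV names are hypothetical examples.
import pandas as pd

# In a Microsoft Fabric notebook, raw per-column rows could come from a DMV
# via Semantic Link (assumption; adapt to your model):
#   import sempy.fabric as fabric
#   raw = fabric.evaluate_dax(
#       "Sales Model",  # hypothetical dataset name
#       "SELECT TABLE_ID, COLUMN_ID, USED_SIZE "
#       "FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS",
#   )

def rank_columns(usage: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per-column usage rows into a descending 'heat' ranking."""
    return (
        usage.groupby("column", as_index=False)["hits"]
        .sum()
        .sort_values("hits", ascending=False)
        .reset_index(drop=True)
    )

# Local stand-in for what a DMV or query-log extract might look like:
sample = pd.DataFrame(
    {"column": ["Sales[Amount]", "Date[Month]", "Sales[Amount]"],
     "hits": [5, 2, 3]}
)
print(rank_columns(sample))  # Sales[Amount] ranks first with 8 total hits
```

Snapshotting a ranking like this on a schedule is what makes the "trend your model over time" idea concrete: each run appends a dated copy, and the trend falls out of comparing snapshots.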

61:15 Man, I would pay to be in that class. Well, I think you would be: "Mike and Seth are talking again, put your hand down." "They just know it all, shut up already." "I have a problem with your formula." Yes, exactly. Anyways, we're allowed to have opinions; luckily no one listens to the podcast, so we're all good there. Thank you so much for listening to the podcast. For those of you who stuck around for the whole thing, we really appreciate your listenership,

61:45 so thank you so much for listening. We really do ask: if you found some value from this, if you learned a little bit more about Semantic Link and where it might fit inside the Power BI ecosystem, we'd really appreciate you sharing it with somebody else. Let somebody know on social media, or at your holiday party go ahead and tell them you found this amazing data podcast. It's so good, we'd love it if you would share it and let people know you found some value from it. With that, Tommy, where else can you find the podcast?

62:15 Yeah, and this is our last podcast before Christmas, so Merry Christmas! What better gift to give than a free podcast, a free semantic link. You can find us on Apple, Spotify, or wherever you get your podcasts; make sure to subscribe and leave a rating, it's a great Christmas gift for us and helps us out a ton. Do you have a question, idea, or topic you want us to talk about on a future episode? Head over to the PowerBI.tips podcast page, leave your name and a great question. Finally, join us live every Tuesday and Thursday morning, Central time, and join the conversation on all of

62:45 the PowerBI.tips social media channels. This is our 104th gift to you: we've been doing so many podcasts, twice a week for the whole year, so we hope you've enjoyed our free gift of random knowledge that doesn't really help you in your daily business; mind-numbing chatter, but we hope you have enjoyed that as well. Thank you all so much. There will be two more episodes coming up; they will be pre-recorded, so just an FYI about that.

63:15 They're coming out and they'll still be here, so there are a couple more episodes coming right at the end of this year. We appreciate you all as listeners, and we look forward to starting 2024 with you when we see you again. Thank you all very much; we'll see you next time.

Thank You

Thanks for listening! If you enjoyed this episode, consider subscribing and sharing it with a teammate—and send us your topic ideas for future shows.
