Abnormal Data Documentation – Ep. 312
Abnormal Data Documentation is the focus of this week’s Explicit Measures episode. Here’s what was covered and a full transcript for reference.
News & Announcements
- PowerBI.tips Podcast — Subscribe and listen to the Explicit Measures podcast episodes and related content.
- Power BI Theme Generator — Create and download Power BI report themes using the PowerBI.tips theme generator.
Main Discussion
This episode focuses on abnormal data and the often-ignored job of documenting it. The team discusses why anomalies happen, how they get misinterpreted in reports, and what teams can do to make data issues visible (without derailing trust).
Key points from the conversation:
- Anomalies are inevitable: the goal isn’t perfection—it’s fast detection and clear communication.
- Documentation as a product feature: notes, known-issue logs, and ‘data health’ indicators reduce support load and confusion.
- Operational patterns: who owns investigating issues, how to escalate, and how to close the loop when data is corrected.
- Reporting implications: how to avoid dashboards that silently lie when upstream data changes unexpectedly.
- Build trust: transparency about data quality tends to increase trust, not reduce it.
Looking Forward
What are you trying in Fabric/Power BI this week? Share your wins (and your war stories).
Episode Transcript
0:31 Good morning, and welcome back to the Explicit Measures podcast with Tommy, Seth, and Mike. Hello everyone, and good morning. Good morning, Mike, how are you? I'm doing well. It's been a busy April. I said it last episode, or a couple episodes ago: man, it is just going fast. So are there more conferences? March was all conferences, right, so is this everything just the follow-up then? I think what happens when you spend time at conferences is everything else piles up, so a lot of work stuff piles up, and so this has been a lot of work travel,
1:02 for work things and other situations. In general I would just say it's been busy. I'll take it, I'm good with the busy, it keeps things interesting. It just means a little mix of recorded, pre-recorded, and live, but hey, we're back. For those joining us on YouTube this morning, we are live, so join the conversation, as they say. Chat will be open; if you want to leave some comments or you have some thoughts, we'd love to hear
1:32 from you in the chat window. Any major intros? I don't know if this is a major intro from my perspective, but I feel like since the Fabric conference occurred and a lot of announcements were made there, there are a lot of blog posts: things were announced, and refinements on those announcements are coming out now, and it's a lot to keep up with. I feel like there are a lot of posts coming out; every couple of days there's something new about
2:02 another post, another feature, some more refinements on different things. So I'm very excited about this. This feels to me like the very early days of Power BI, when we were constantly getting quick updates, little updates here and there, improving features. See our previous article, literally talking about just that. Yeah, it's never been more apparent than now, a few things with Copilot I'm seeing, but it's hard. I think the biggest thing
2:32 is, one, to just stay on top of it. But what's our favorite thing to do when something new comes out? Get our hands on it and try it out, then talk about it. Oh yeah, well, we talk about it, then try it out. What about when there are 25 things you have to try out? Oh, I know, that's my point: I'm busy with work and all these new things are coming out, and I want to try all of them, I want to figure it all out. And I've appreciated some very good blogs that have recently been published. I saw one on LinkedIn that I threw
3:02 down at the community jam. Someone was doing some load testing, a little bit, I guess, between, hey, what happens if I run a pipeline versus run something in a Spark notebook? What does the cost look like to me in capacity units between the two solutions? Because you can do the same thing. This is what we've been talking about: we need some people to come out and figure this stuff out. I've also been seeing people communicating about how you make sure you understand how much
3:32 a job costs, because you can back your way into the cost to run that unit. You can do the math on how many CU seconds you're able to acquire, divide out how many CUs you're consuming on a particular feature, and dial that thing in. So anyway, very interesting there as well; I thought that was really fun.
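As a rough sketch of that back-of-the-napkin capacity math (the SKU size, monthly price, and job consumption below are made-up numbers for illustration; pull your real figures from your invoice and the Capacity Metrics app):

```python
# Hypothetical figures for illustration only -- substitute your own.
capacity_cus = 64                 # e.g., an F64 capacity provides 64 CUs
seconds_per_month = 30 * 24 * 3600
monthly_cost_usd = 8_000.00       # assumed monthly price for that capacity

# Total CU-seconds "acquired" per month, and the cost of each one.
cu_seconds_per_month = capacity_cus * seconds_per_month
cost_per_cu_second = monthly_cost_usd / cu_seconds_per_month

# CU-seconds one job consumed (read this from the Capacity Metrics app).
job_cu_seconds = 120_000

print(f"Cost of that run: ${job_cu_seconds * cost_per_cu_second:.2f}")
```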
4:03 But not to be lost in all the excitement of the new stuff: I'm telling you, you guys have got to open up those on-premises gateway blogs, like the update release. There's a golden nugget in there. The latest on-premises data gateway now supports refreshes longer than one hour. Talking about big impacts: hey, today, go download the latest. That's one you're going to get a lot of legs out
4:35 of, right out of the gate, for longer-running refreshes. If you're still using or leveraging the on-premises data gateway, that has always been a bit of a struggle-bus point, and that is now out in the latest gateway, so go check that out. Awesome, I'll definitely read up on that one. That's a huge callout, because for anyone who has on-prem data, that's the thing to figure out, how to make that stuff work, so that's very important for
5:05 people to know. And yeah, on some hands I agree, like, that's great, but if your data refresh is taking that long, I'm also thinking to myself: is there anything I can do to tune that? I know the option is there, and it's good to have, I agree, but the question I have is, will this potentially cause abuse on the system? I don't know, we'll see. Interesting.
5:36 don’t know we’ll see interesting on somewhat of the same subject I’ve actually had a few conversations with people in the community and some people just getting into powerbi and there’s a change happening where a lot of people are starting with solution fabric rather than what I think we normally do on power query or we normally expect to do on like powerbi desktop like oh well there’s this approach fabric called this Medallion approach and people this is people coming from Excel I’m talking
6:07 people coming from Excel I’m talking about I’m like this is a change because you think the transition is always what try this on powerbi because this is the natural transition was the conversation I had with some like oh you could do this in powerbi too I was like yeah Power query yep it’s it’s it’s super cool having all the tools bolted together so I think that’s very a good feature here all right enough of our intros our main topic today is going to be around really you really sorry
6:37 to be around really you really sorry there’s one more there’s oh one more there yeah well there’s a flip sorry so they’re they’re they’re changing the default behavior of co-pilot and power so that that was another blog I saw that one I didn’t quite understand it maybe I didn’t read it thoroughly enough they enabling by default as opposed to so you’re enabling co-pilot that’s an admin level like yep I’m going to let my people do that they’re flip-flopping it so I think what they’re pointing out is there were some there were a couple things that were probably
7:07 were a couple things that were probably holding their this back from them on by default and they’ve addressed those so they are going to switch that on May 20th so admins you can disable it again like if you’re not going to do widespread or you don’t have co-pilot enabled it’s not really much of an effect but they are that that switch is coming May 20th so it’s worth worth noting and that’s the only other thing I wanted to say sorry that’s a really good call out because anything that’s turning on by default anything
7:38 that’s turning on by default anything turning on by default is something we should be aware of I would say that that feels like something we should be to yeah paying attention to all right anything any other announcements I should have asked for that first before I go on to the main topic all right I think we’re good all right with that being said let’s jump in and Seth I Kick this one over to you so today we’re the main topic is talking about anomaly detection
8:08 I literally caught you as you were taking a sip of coffee, sorry about that. But the main topic today is data anomalies: what happens when something adjusts inside your data or the data model, you have to make some changes, you're redesigning the data that comes out to the reports, and how do you notify users within your organization that something's changing? What does that look like when rolling something like that out? How do you notify them? Do you just let them figure it out? What are the options we have here? This is going to be pretty much a data
8:39 culture conversation, I think, as we go into the topic for today. This comes from a mailbag, and Seth, if you wouldn't mind. It does come from a mailbag, and a huge shout-out to David from Australia, because we got a name and a place, and I was really tempted to try to do an Australian accent, but I'm not going to, because that would just be a train wreck for everybody, and that would probably be,
9:09 like, clipped, yeah. Oh yeah, I've got to think about those kinds of things. Yes. David asks: how best to document your dashboard for data changes, significant events, and other abnormalities? Example: last July we changed our segmentation model but did not apply it retrospectively; or, COVID impacts can
9:33 cause a spike in the data. If these things are not documented, the user can reach the wrong conclusions, have decreased confidence, or, perhaps worse, not have confidence in our output at all. As a secondary question: if you can identify an impact due to an abnormal event, should you try to normalize your data model even if you can't do it with much accuracy? Ooh. Okay, I'll just say, for one, I've seen
10:04 ooh okay I’ll just say for one I’ve seen that before and that go over well I don’t know I will say this Co has impacted a mean I will say this Co has impacted a lot of businesses in weird ways some have gotten extreme usage or spikes in in consumption for whatever their product or whatever their service is others have had a large larger dip than normal so I’ve seen some organizations actually starting to carry
10:34 organizations actually starting to carry more data typically you would do like a two or threee carry of data some companies want to compare their pre-o numbers to their after covid numbers and so they’re they’re carrying As Time Clips on here they’re carrying more and more data to where were we before the covid experience happens and and you covid experience happens and and are we far enough away now that we know are we far enough away now that we don’t need to carry that data anymore how is this going to look so there’s I think there’s a lot of considerations here here yeah not that I want to revisit those covid years but there’s a really good
11:05 covid years but there’s a really good point here if we if you were to go back then as the from an analy role I don’t know if I’ve been ever busier than the few months right after the pandemic immediately hit from honestly a a pure data point of view from a data culture point of view I was in house and everyone wanted to see the numbers in a completely different way immediately goes I always go back to like I’ll call it a seism but nothing is some things are not important until they’re the most important thing in the
11:35 they’re the most important thing in the world and that’s exactly what happen like well we need to see it we need to shift everything this way and see it this way because we’re such an significant event caused everyone not just wanting to see the numbers but see it in a different way and I think that was very very common with a lot of teams we had to come up with a different model for everything that we had the different metrics the way that they wanted to see it and it was so hyperfocused that’s the problem
12:05 hyperfocused that’s the problem with outliers right or at least significant events you don’t know they’re going to happen and how do you make that change because if everything you’ve been reporting we’re doing sales by some well interesting all of a sudden you have the significant event or outlier that occurs and it’s hard to just peer communicate that and it’s really hard to do that in the middle of one just trying to from a technical point of view but then how do you let everyone know oh on this day
12:35 we've made this significant change because of whatever the abnormality is? Interesting. Yeah, I think where this question throws me for a loop, and we can talk through the different aspects, is that it threw me into looking at the types of anomalies we have in data. The one I think you're speaking to is the point anomaly: there's a point in time, there's some action, like a spike in the data. And even in the
13:08 business that I've experienced, a lot of those anomalies are like something happened in a day, right? So it's very easy for me to identify or highlight a point and potentially help the end user visualize why that point spike may have happened. In this case, if it's a day anomaly and we had a dip, and typically it'd be a dip, because a major part of our normal
13:39 ingestion process failed, there is no recovering of that live data. It's gone; you can't retrospectively just fix the dip. So do you show the dip, or do you just normalize it? And in those cases, for the business, we can't normalize that, we can't just say, oh, that never happened, because there's a lot of underlying detail data that goes into the report as well, and you're going to miss that. So if we're tracking that volume by day and we have
14:10 that big dip, the thought, which we still haven't completely implemented, but in my mind, would be: I could have a one-off table that the business owns, where the key is the date and there's a short description of what happened: well, there was a major failure, something something. That could be a card that goes alongside the line chart, or something like that, where I'm just adding that anomaly data to a visual in real time once in a while. Anyway, that was the original thought.
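A minimal sketch of that business-owned, date-keyed annotation table, here in pandas with made-up table and column names, joined to the daily metric so the note can surface next to the chart:

```python
import pandas as pd

# Daily volume as it lands in the model (note the dip on the 2nd).
daily = pd.DataFrame({
    "date": pd.to_datetime(["2024-03-01", "2024-03-02", "2024-03-03"]),
    "volume": [10_250, 1_430, 9_980],
})

# One-off annotation table the business owns: key = date, plus a description.
annotations = pd.DataFrame({
    "date": pd.to_datetime(["2024-03-02"]),
    "note": ["Ingestion job failed; live data for this day is unrecoverable"],
})

# Left join: every day keeps its value, known events carry their note.
print(daily.merge(annotations, on="date", how="left"))
```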
14:40 Well, I think we're speaking, let me know if you guys agree or not, about really two different types of abnormalities or outliers. There are pure data ones: oh, something changed, when did that happen, this large spike that you don't notice until retrospectively. And there's the external one, something like COVID, where everyone is going to assume the data is going to change, but, more importantly, it affects everyone in how they want to view it, even if it is
15:10 the same information, because everyone's aware that something's going to change and it's a macro or external factor. Right, yeah, but from an anomaly detection perspective, a data point in time is what I'm talking about. You're saying, hey, there is something that happened externally, COVID, we're going to adjust something, I'm assuming
15:42 something's going to change; that's what he's talking about. And those throw me into the other part of what types of anomalies there are, because you can have contextual anomalies: is this something that sets itself apart from normal business? Like, we're tracking swimwear sales, and for some reason all of a sudden we see an increase in swimwear sales in December; like, what the hell, why would that be happening? And
16:15 there are anomalies in data that may not just be a single point; maybe it's a few days in a row or something like that. And then the other part of that conversation, I guess, is whether it's a collective anomaly, where there are groups of data points that exhibit different behaviors than we're used to. To me that's almost what you'd be talking about with COVID-like impacts to the business: retrospectively we're seeing that there's a lift
16:46 in sales, or a huge dip, and we need to show in our data sets that the impact to the business was due to COVID or some other outlying effect. I think that's what he's driving into here, one of these anomaly-type effects, and how do we work with the business or convey that in the reporting? That's not a single-point thing; with COVID there are ramifications to all the data in there.
17:18 I see where you're going with this, Tommy. I think you're trying to classify certain kinds of anomalies, and maybe they impact the business differently, but at the end of the day it's something that is not expected. And I feel like when anomalies occur, there's a flurry of additional need for self-service discovery work. You have your normal reporting, things are just clipping along, production goes down, data stops flowing, something happens, and then I think the question
17:48 really turns into: why did that happen? I want to dig deeper into the information to figure out what is causing this, and I think people start planning solutions, like, how do we fix this, is this something that's fixable, can we do something else? So to me, I don't think I would quantify the anomaly; I would just say they happen, and
18:18 then my reaction is: what is the right way to deal with it? Because I do know, from collecting some data in my engineering days, there are normal statistics that take out some of the noise in the data. One thing I felt like I was doing a lot when reporting out to leadership: things can bounce around, anomalies can happen in your data, but is it worth noting? Is it worth alerting people to take action on? Like, is
18:49 this bad enough that we need to act? Is this a one-time deal, like, okay, great, we had a dip in sales, or is it something that's going to necessitate action because we won't meet our budget or our goals? And I think when that occurs, a lot of the time, and to your point, Tommy, COVID's occurring,
19:08 it's a large event, it's affecting lots of people, many businesses have to pivot and figure out a different way of doing things. So everyone's trying to, okay, we all need to self-serve on data very quickly now, because everyone's got to figure out what's going on, and how do we work out a strategy to address this and figure out the anomaly? So, on the classification of anomalies, one thing that comes to mind here, and this may or may not fit the topic, but one
19:38 thing I find interesting, and it feels like it corresponds closely, is when you're looking at the capacity metrics app in Power BI and smoothing is turned on, so the reporting is trying to smooth the capacity usage for your environment. One thing I'm thinking through here, from a real Fabric example, is when you have those occurrences of heavy usage on a notebook or something inside Fabric, you get those spikes, but you
20:08 don't actually see the spike in usage; you see it smoothed out over time. You can identify when smoothing is occurring, but if you're trying to address the behavior in Fabric behind the spiked event, you need to know what it is and when it occurred, so you can go back and debug and say: oh, this event occurred at this time with really high usage. So yes, it's fine to show smoothing in the capacity metrics app, but you also need to see it without the smoothing, so you can go back and
20:38 retroactively say: oh wow, that pipeline was very abusive to the system, we probably shouldn't run it at that time because everything else was burning at the same time, or whatever the thing was. You need to actually know what's occurring so you can back into the solution. Anyway, that's just my thought about a place where I could put some anomaly thoughts around something I'm very familiar with.
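This is not how Fabric computes its smoothing, but a toy rolling average shows why you want both lenses: the smoothed series hides exactly when the burst happened.

```python
import pandas as pd

# Hourly CU usage with one spiky notebook run (made-up numbers).
usage = pd.Series(
    [5, 6, 5, 60, 5, 6, 5],
    index=pd.date_range("2024-03-01", periods=7, freq="h"),
)

# Smoothed view for billing-style reporting; raw view for debugging.
smoothed = usage.rolling(window=4, min_periods=1).mean()
print(pd.DataFrame({"raw": usage, "smoothed": smoothed}))
```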
21:08 David’s question here because we could obviously dive into from the analyst role documenting and identifying certain spikes and data that either were going to be known or expected but this is very a different way of looking at it where it’s how best actually document those changes or those significant events again whether like we’re with an organization so we’re going to expect something to change or we identified something that occurred that we were prev unaware of so and I’m looking like
21:39 prev unaware of so and I’m looking like what documentation do we do I feel like almost like a change log of hey we’ve had these significant events that have occurred these are either are the reasons why so users are aware if you see this is really focused I would imagine on like a sales rep looking at their quota if something changed well what like hey the the numbers change right or some why the significant event how do we actually then from a I think a data culture point of view communicate that and make that part of
22:10 communicate that and make that part of the organization not just identify it and smooth it out well I think I think you’re making a good point here Tommy and I think where I’m where my mind is going I think where I’m part of this topic is and to your point the earlier part of the question right what do we do to communicate what’s what’s occurring how are we getting consensus around what occurred right so some things like a production failure yeah inside the system clear right something failed a machine fell over data’s not
22:42 failed a machine fell over data’s not flowing a refresh didn’t work right that’s a very clear expectation I think those an novali when we see data dip because of those things yeah you you may have lost some things because the system fell apart right so you can communicate what the what the issue was you can clearly communicate okay we’re going to take actions and try and fix this we’re going to build more resilience we’re going to go fix the code what whatever the thing is right there’s there’s some action that usually spits out of those things but other things where you don’t know what happened
23:12 don’t know what happened you’re working with a customer and the sales dip sales dip unexpectedly what what changed did something in the market did did some other company run a sale and you weren’t did you not catch that right that could have caused your clients data to dip and sales to go down but someone else’s sales went up do you even have visibility to see what was going on there so like there’s a there’s again this is where I feel like there’s potentially a lot of retro retro retroactive Discovery work self-service I would say that needs to occur in order
23:43 I would say that needs to occur in order to come up with a reason to why that data did what it did and so to I think that maybe the question here is what do you do when you find it and how do you communicate that back to people in the reports I think that’s I think that’s a bigger question I I don’t I don’t really know if I know the answer or how I would approach that other than the fact if you so let me say it this way I’ve seen other companies do anomaly detection so let me give you another example that I’ve I’ve encountered Google analytics Google all the time it’s streaming data
24:15 Google all the time it’s streaming data it’s 247 it’s always on now I don’t care about real-time data but I do care about aggregated historical data often I’ll see inside the charts that Google has produced there’s a note hey on this day something went down or something was incorrectly counted or whatever this point was and so they actually put marks or vertical lines on the the the chart and note something that’s occurring on that that time can you do that in powerbi
24:46 time can you do that in powerbi maybe there’s it’s a lot more work to add that to your data and your charts so would I spend a lot of time in my standard charts to build a whole bunch of of additional features in them to be able to capture if you have a lot of anomalies maybe it makes sense to spend the time to do that but maybe it’s just something as simple as in the information button you add a little note there around anomaly data detected on this this day or maybe you put a little card on
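As a rough analogue of those Google Analytics chart notes, a dated marker on a chart looks like this in a plain matplotlib sketch (the event date and label are invented):

```python
import pandas as pd
import matplotlib.pyplot as plt

daily = pd.Series(
    [100, 98, 55, 97, 101],
    index=pd.date_range("2024-03-01", periods=5),
)

fig, ax = plt.subplots()
ax.plot(daily.index, daily.values)

# Vertical line plus a note marking the known event on that date.
event = pd.Timestamp("2024-03-03")
ax.axvline(event, linestyle="--", color="red")
ax.annotate("Ingestion outage", xy=(event, 55), xytext=(event, 80),
            arrowprops={"arrowstyle": "->"})
plt.show()
```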
25:16 Or maybe you put a little card on the page, with a tooltip that describes: hey, we found some anomalous data, we put that here inside the report. You want to make it easy for people to see what was occurring. I'm going to go out on a limb here and say maybe you supply a link to a blog post or a page in your COE site; here's a little note, click this link, it takes you somewhere else, and there you can describe what occurred, what happened, and how to document more of those things. I
25:46 do think you don't ignore it; I don't think you do that. I think you bring up a good point around that. It raises the question of the occurrence of something like this: is it just a one-off decision point? Then to me that could be as easy as communication with the business, saying: hey, you're seeing this anomaly in the report because of this. It might be just an email. It could be
26:16 completely outside the realm of the report. Correct, because it's not like you're going to make this decision all on your own, or just, hey, we're going to make a decision and then communicate it in the report. Correct, there should be a dialogue: do we want this in there? And if the answer is yes, I would almost argue that it goes on a page you should have anyway, one that describes the intent of the report and the business logic applied. Right in there could be: in early '23 we removed or
26:46 smoothed out the data because we completely changed our segmentation model as an organization, and so on. And everybody knows that. Because that was part of his question: he didn't modify any of the historical data retrospectively, so there's going to be a shift. If you're looking at '24 onward, yeah, not a big deal, everybody's on the same path, everything looks right. But if they're doing year over year, like, I'm going to look at the last
27:16 six months of '23, sales are up, are down, why all of a sudden is there this big spike or this major dip? Then you have that already in the report, and you're communicating that those types of changes are also part of the report documentation, so I really like that. If it's not that, though, if it's not a one-off and we have to start developing, I guess I lead into the other part of what you were talking about, which is: what is the
27:48 view or intent of the report? Is it that very short-term detail, I need to see every tick, or is it more smoothed, because the report output requires decisions based on long-term analysis and we're looking month over month or whatever? And all of a sudden that spike or dip causes an anomalous effect when we're looking at a month. That is a point at which I think decisions have to
28:19 be made around what you want to do with the data. What I have seen, in terms of effect, is that if this is consistent, like, yes, we're dealing with anomalies all the time, I've seen solutions where there are processes in place for
28:40 the business to identify those spikes and have them removed from the models, saying: yep, I understand what that spike is, and I choose for it not to affect or be part of the data coming through in this report. There the business owns that anomaly, because those are happening consistently and they know the reasons, because there are subject matter experts. But that also requires systems that allow them to
29:12 not only see the raw data, the data that's going into the model, but also the impact, and to retrospectively have an opportunity to remove it from the stuff they're making decisions on in the report itself. I think it's important to identify here, Mike, when you brought up Google Analytics, and I'm thinking about their automated insights, or in Power BI, hey, we found an interesting trend, sure, I think that's very different from, let's say, dealing with a certified
29:43 semantic model. In this case we're looking at things from the organization's point of view, not from the analyst's point of view on an outlier where we noticed an event. I think it's important that this goes into a business conversation and data governance, because you're probably going to have to identify some thresholds around some of your main metrics, not just that an interesting trend occurred. Hey, if sales spikes 10%, then that's something we're going to put in our documentation or look into.
30:15 What are the thresholds, both for an increase and a decrease, either over time or on a single day, especially for your semantic models? Something that we're, in a sense, tracking, rather than just an interesting trend, because from a layperson's point of view it's like, okay, I can look into my automated insights, but that's what we'd call merely interesting, not necessarily, hey, we merged with a company, or we had to change our systems and we made a note of it. That's where I'm going with this.
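A minimal sketch of that threshold idea, flagging day-over-day moves beyond an agreed cutoff; the 10% figure is just the example from the conversation, not a recommendation:

```python
import pandas as pd

THRESHOLD = 0.10  # agreed with the business, per metric

sales = pd.Series(
    [100.0, 103.0, 98.0, 130.0, 101.0],
    index=pd.date_range("2024-03-01", periods=5),
)

# Flag any day whose change versus the prior day exceeds the threshold.
moves = sales.pct_change()
for day, change in moves[moves.abs() > THRESHOLD].items():
    print(f"{day.date()}: {change:+.0%} day over day -- log it and assign an owner")
```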
30:46 of it where I’m going with yeah I’ll stop there because I have another thought I think you have another yeah I want to just interject here on this first thought and then I think your’s probably more thoughts are going to come a you’re on a roll I I like what you’re saying there and I think you you mentioned like data governance and what comes to mind here is and this is this is where I think planning and proper data governance in your organization does really matter in this situation because if data is coming out SE your example earlier something
31:17 out SE your example earlier something went down we have a stream of data coming into our system we stopped collecting data for a period of time someone needs to own that right and so I think being able to understand the pipeline of how the data gets into your system and where different roles on different teams are responsible and what I have found some teams don’t want to be responsible for data that they should be in charge of without clear leadership expectations of these are the boundaries of what team owns what data you can you
31:49 of what team owns what data you can you should push on that team a little bit to either do the analysis to fix it the other thing in in the comment here was do you normalize the data like typically the numbers would look like this we’ll have two lenses of the data now one where it’s normalized and one where it’s not normalized right so in the example of we’re changing the categories over the question would be is in my opinion who’s the owner of the categories of that data was was there was there approval made to get that to happen if that to me that sounds
32:20 happen if that to me that sounds like a large effort that is a big change occurring that will impact the data someone made a conscious decision to say 2023 and older eh we’re not going to change the categories 2024 and newer we’re going to change the categories so in in that decision point there should have been another conversation around well I’m sure there was this will impact our data you don’t just do the change and all of a sudden like oh it’s going to influence all our reports you know it’s coming so in that in that
32:50 know it’s coming so in that in that situation I think there’s actually a a change order management exercise that’s coming along with this someone saying we’re going to change the categories we’re going to own this responsibility FYI this is coming out and whether it’s a training webinar a little short video you record to the team maybe you make a blog post about it and you communicate it but somehow the owner of the change of that category should be communicating to the organization this is coming so my feeling here is yes this is a governance but this is purely a data culture part
33:21 but this is purely a data culture part of this and does your company have it doesn’t have to have it right now but like when you do something like this when you learn from this and what works in your company culture maybe you need to make a policy around what that looks like in the future spend an extra a little bit of time thinking about what we changed how we communicated it and did it go well and then maybe there’s a policy around that youan maybe you have to ask for more stakeholders to figure out what that’s looking like or maybe this was a top- down decision that was just made great
33:52 no problem, communicate it. You can't just do these things in a vacuum and have them occur. One other thought before I jump off that topic and let you go back, Tommy, to your other question. The other moment in my mind here is when you talked about smoothing out the data, or going back and restating the data. I don't know how hard this is going to be, and there's a naivety, I guess, about what I'm going to say here, but I feel like it'd be nice to be
34:22 able to see the data unaltered and altered, and if you're altering the data, being very clear about what section of data was altered would be helpful. Again, I realize that could potentially cause a lot of extra report rebuilding; there are a lot of things there. So, to our other points earlier: this change is coming, this is going to change the data, we're going to see things that are different; how are we going to handle the downstream impact of what that's going to do?
34:52 And then, which teams? For example, if we're making the category change, that's maybe a product management team's decision; great, they can decide whatever they want to align it to their business objectives. Hopefully something's coming down from the top that says why we're doing this, and then every team downstream of that data needs to compensate and adjust, because if you change the data and you have other people building reports in a self-service way, their data will be different and they need to know what's going on. Anyway, those were just my other thoughts around those things.
35:23 There are a ton of good thoughts there, and you're on a roll, by the way; you segue perfectly into, I think, the other part of this. Yes, obviously, from the people and policy point of view, you can't really get a lot done without this, but I'm thinking about it in these terms: I don't think it's sufficient just to have a reference page in Power BI for this situation and say this is where we have
35:53 a change log. Especially to the point I love about where it was going: the ownership. Those major changes have to be accountable and accessible, again, to everyone, not just "this changed, we're looking into it" where you have to scroll. I'm thinking, honestly, if you've ever gone to the Microsoft admin center, to the message center, it's like: here are all the major things coming up, things we're looking into, things we've changed. Yes, and even
36:23 something like, if you have it set up for, and again I'm assuming here we're looking at a certified semantic model, sure, we have those policies, you make a lot of assumptions about that, yes. We're not talking about the random Google Analytics report here, with all the effort that would need to be involved; you just can't. But this is trust in data, this is trusting your data. Things like this can erode trust in teams, because you've modified the data away
36:54 from what it really was, and they potentially needed to see the real data. But also, I think David is aware of that in what he's outlining, and doesn't want that to happen. I agree. So I'm honestly envisioning, come with me on this journey if you will, following you on a journey of any kind is scary, I'm on board, something in Confluence: hey, if you want to see updates to the semantic model, call them major changes, click on this link, and there it has the status
37:25 of anything: hey, we changed the systems, we looked into it, this occurred; there's a major change with our categories, and the marketing team is looking into it. Having the statuses where someone could easily see: we changed some of the logic on this date, here's the effect of the change; or something happened externally, like I said, a major outlier, and BI is looking into it. And having that change log actually stored somewhere too, because it's not enough just to have the documentation or the log.
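A sketch of what one entry in that change log might carry; the schema below is invented for illustration, whether the log ends up living in Confluence, SharePoint, or a small table of its own:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DataChangeEntry:
    effective_date: date
    model: str      # which semantic model is affected
    change: str     # what happened
    status: str     # e.g. "investigating", "resolved", "by design"
    owner: str      # who is accountable for it
    impact: str     # what report consumers will actually see

log = [
    DataChangeEntry(
        effective_date=date(2023, 7, 1),
        model="Sales",
        change="Segmentation model replaced; not applied retrospectively",
        status="by design",
        owner="Marketing analytics",
        impact="Year-over-year segment comparisons break across July 2023",
    ),
]

# Searchable: pull every logged change that touches a given model.
for e in (x for x in log if x.model == "Sales"):
    print(f"{e.effective_date} [{e.status}] {e.change} -- owner: {e.owner}")
```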
37:56 I think, to the point, it also needs to be accessible to everybody. So if I'm looking into something as a normal consumer, or, I think, especially from a leadership point of view, it's
38:12 like: man, I'm looking at some of these numbers; I can easily go to this and say, oh good, my team's looking into this, or this is why the categories are off, this is what potentially occurred. And again, this is a lot of setup, I'm not saying it's easy, but this is something that could be really cool, especially for certified models, and also from that trust point of view. Oh, okay, let's talk about that. So that wasn't too far a walk. No, it
38:45 wasn't, and I like the point there at the end. I continually go back to this whole area of: is it centrally important or is it self-service important? Because I do think there are different levels of, let's call it what it is, a certain level of spend of dollars on people's time to do certain things. If we're saying things are certified, we want to retain trust, we want to have a process, we need to have people engaged. So to me a lot of the break point
39:17 on what this looks like is: is this stuff certified? Have we made a decision that this is really important data that we know we need to have a lot of eyes on? I think the effort you spend to curate and educate about that data is different than for a single team and a single department, where I think you can be a little looser about the requirements around how you document things. I think the ability for you to scale a
39:47 community of practice, for either the central team, for central data sets, or a business unit team in their business unit, scales really well, honestly. You can have one person in your team doing some data stuff and making little notes, and then, instead of emailing everyone, you could do that, and that's a way of doing it, but I think it gets buried in emails and people lose track of things. I would rather put a post on a SharePoint page, document what happened, write a little article, and then share an email: hey, this changed,
40:17 here's some information, click here, learn more about it, and then people can interact with that information. But I think it's really relevant to have that public, well, not public, but internal to your company, that common thing you're going to use. I think that just sets up really good practices. That's how I started PowerBI.tips; that's what we started with initially. I was trying to teach people how to use Power BI in very basic ways, and I thought: well, if I'm teaching
40:47 people in my company how to get some Excel files in, do some transforms, build this visual a certain way, and I'm trying to figure it out, I guarantee you someone else is trying to figure it out too. That's kind of why I started blogging, putting down posts and writing things, because: look, I'm learning these things, everyone can learn along with me, here are some examples. And I think that really grew, it grew with interest, and that's how this stuff works, because now I can write one article and hit thousands of people with that single article, as opposed to doing a one-to-one piece of communication. It scales
41:18 what you're able to do. One quick thing on that, and then I know Seth is clamoring, but my mind is popping with ideas right now about how this could actually be set up in real software. For any of these changes it's good to have the communication, but what I think is really important here is that the data or the information needs to be structured and searchable, what have you: statuses, ownership, the change, etc., all of it searchable. I could
41:48 set this up in Confluence: hey, here are all of our certified models; look at this log, the status updates, the owner, and the resolution. And then you can send the emails and communication out. Because we're shifting a little into change management, like on infrastructure: we're tracking what updates we made or what those statuses are; hey, we're going to change from AWS to Azure in two months, and if
42:20 that's part of this log, we're expecting something to happen, so the statuses are there. But it has to be, again, to me, accessible, so if I want to go back to a certain date or a certain significant event, I can easily locate it, and it's simple, especially if you have the right tooling, and I think you do need some tooling here. What are your thoughts, Seth? Anything
42:50 that sticks out to you? That, gentlemen, was a wild ride, a wild ride I did not see coming. And the reason I say that is not to be poking, but, well, poking: how do we take a question about a report and end up in AWS-versus-Azure change management at a corporate level? Slow clap, very good. It shows just where our conversations can
43:22 go, and I love them. I'm not saying the points you guys are making aren't relevant; however, I didn't pick up any of the corporate culture, change management, COE-like stuff from this question. I picked it up at just the report level. Now, do I want to kill that part of the conversation? No. Should we dial it back from change management and AWS-versus-Azure? Yeah, I think so. I guess from my perspective, the one other point that sticks
43:52 out, and some of the comments being made in chat say this too, is that I don't know enough about the types of reports he's building, the segmentation. A lot of the anomaly detection and feedback I had, related to how business people could impact those anomalies, was mostly based in forecasting, where we're talking about different parts of the organization working in tandem to
44:25 analyze and reprocess a bunch of data against different types of models, and that's where these ideas, how do we handle these anomalies, how do we smooth them, how do we project, really take effect in reporting, in the visualizations. So another thing in here, and I don't know if the data construct he has is applicable, because he doesn't retroactively
44:55 change things, he's saying he's not retroactively going in and modifying the historical path, so you may not be able to do this, but is there a point where you could show the different streams? Like: hey, we made a change, or there's an anomaly in the data that is affecting our business in this way, and this is now how we are segmenting out this data; if we hadn't done that, we'll show you this other line. So, as in the forecasting realm, you're comparing
45:26 different models or different outputs. I don't know if that's an option here in what he's asking specifically, but you could show multiple tiers of data depending on what the business need is. If they want to see the impact of a given thing, or if they don't want to see it, that could be a filter they pick and choose. So it's just another option, I guess, depending on the data sets I've seen, as far as manipulating or still giving the
45:56 business the opportunity to see the output they would expect. Now, on the flip side, rather than saying, hey, these are multiple different models you're comparing in a forecast, there's also potentially an opportunity to say: hey, there are anomalies in our data set, and obviously, whether it's a documentation point in a report, if this is a single report, or if it is
46:27 a link back to a Confluence page, because we're a solid COE and this is just one report of many we're developing on this model, and that's where we want to standardize our documentation process, it's the same effect: the business understands where they can go look for the detail about what we're modifying in the data, and I think that's the most important thing. Now, does that mean you have a different set of filtering, just a
46:57 toggle that says: yes, I want the data smoothed out, I don't want those anomalies, and my output reflects that; or do I want the grain, do I want the anomalies, because I'm going to go suss out and figure out the differences in the outputs of the report? I think those are also a couple of ideas you could use. Probably a lot more data engineering on the back end, probably a bigger model, but potentially maybe not, right? If I can flag my data as anomalous versus
47:28 not, that's a pretty easy toggle to include or exclude.
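That flag-and-toggle idea is cheap to sketch: mark anomalous rows upstream, then let the consumer pick which lens they want (the column names here are invented):

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range("2024-03-01", periods=5),
    "volume": [10_250, 1_430, 9_980, 10_400, 10_120],
    "is_anomaly": [False, True, False, False, False],  # flagged upstream
})

raw_view = df                          # every tick, anomalies included
clean_view = df[~df["is_anomaly"]]     # anomalous rows excluded

print(raw_view["volume"].mean(), clean_view["volume"].mean())
```

In Power BI terms this could be as simple as a slicer over that flag column.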
47:45 I like that point there, Seth. What I wrote down in my comments here was that the predictive side of data, I think, has many more implications for whether or not you smooth it out and fix it than some of the reporting aspects do. If we're thinking about just the report, I'm thinking about the line chart, and I've got information on that single line chart: I think you don't want that same spike to reoccur every year in the month of March, because that's probably not what the data would normally have looked like. So you go back historically and say: okay, what has our data looked like historically during the month of March, had this anomaly not happened? And,
48:17 potentially, to your point, Seth, if we're talking about smoothing out the data and fixing it for that particular need: if our systems went down, potentially the data didn't stop occurring, we just couldn't see it, so we have to make some assumptions, like: we're going to just fix that and move on, we are going to normalize the data for that period of time when things went down. So I think it really depends on what your use case is here. And now that we're talking Fabric, you have the ability to consider, well,
48:48 again, I'll also note that not everyone's in Fabric, so I get that, but given that we now have Fabric at our disposal, there is the possibility for business users to be doing data-science-like things, and they're going to need to be able to clean and make sure the data is good in order to produce better predictions. Part of better predictions is better data supporting the prediction. This is one of those large activities that needs to occur: how do we normalize the data for that period
49:18 of time so that we get rid of some of that noise and it can give us a better predictive model moving forward? And again, a prediction is an estimate, a guess; I potentially don't need all the anomalies in there. Maybe I do; it depends on what prediction you're doing. If you're doing failure prediction on certain things, trying to predict when things will fail, then you need the anomalies; that's what you're looking for, the edge cases that produce the failure. But if you're trying to do forecasting, you're probably not.
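One simple form of that normalization, assuming a known outage window: treat the window as missing and interpolate across it, so a downstream forecast isn't trained on an artificial dip.

```python
import numpy as np
import pandas as pd

volume = pd.Series(
    [10_250.0, 10_310.0, np.nan, np.nan, 10_190.0, 10_280.0],  # two-day outage
    index=pd.date_range("2024-03-01", periods=6),
)

# Fill the outage window from its neighbors; keep the raw series too,
# since failure-prediction work would want the anomaly left in.
normalized = volume.interpolate(method="time")
print(normalized)
```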
49:48 So you have to have someone who knows the data domain and what you're trying to do with the data, because that changes things a little bit. Yeah, it totally does. And the one thing we haven't talked about, which is potentially the easiest solution: does your data typically have anomalies? If so, there are visualizations you can include that will highlight what those anomalies are and just auto-filter them out; like, I see those anomalies, they don't make any sense to my business, I'm going to
50:19 just click them off and no longer see them, instead of the obfuscated approach we're talking about. Yeah, either way, Adrian points out in the chat, I think, one of the most important reasons why this is a relevant conversation and why it needs to be, in some way, shape, or form, clearly agreed, communicated, documented, whatever, with the business. The challenge, he says, is that in a few months
50:49 everybody forgot what caused the spike. Yep. Such information needs to be easily accessible and persisted over time. If I can stress one thing from experience: never, ever assume, no matter how big the problem was or how big the impact to reporting is, that even if the business and you are aware of it, you're going to retain that information in the future. Because, yes, to his point, a couple of months later someone's going to ask that question; you will have
51:20 forgotten, obviously they have forgotten, or a new user didn't know and there's no reference point, and guess who has to go track it down again? You. You are the tracker of anomalies in the data, and that's why, whatever solution you have, in-report, in-visual, in documentation, in central documentation, yes, put it down somewhere. Put down the smoothing, or why you did something, so
51:52 that there's a central reference point for you and anyone else to go back to, because that's how you remain true to the original build of the report, what it was supposed to do in the first place, and to the data within it that is shifting or not shifting as time goes on. I dig it; I love that point as well. I think, if anything, I'm
52:22 pushing more towards: document what you're doing, put it in a central location, and communicate it out. You have to incorporate what your business and data culture look like, and I think being more transparent with these things, not trying to hide stuff, always seems to produce better trust in the data. Be transparent, let people react to it, because you may have changed the data, or modified something in a way that the business didn't need, or they want it back, or they need to see it differently;
52:53 that change might facilitate additional conversations that will change how you're doing it. So having that documentation centrally, I think, is really important, and it helps a lot with building trust in your organization. All right, we're about at time here, let's do final thoughts. Seth, was that your final thought there on the comment? Yeah, that was my final thought. All right, Tommy, any final thoughts? You're going to move to AWS now? It was an example, a major change to the
53:24 system, maybe you've seen it. I think the biggest thing for me, really coming out of this conversation, is that you could go incredibly granular into the fine details of any abnormalities, but I'm going to say: hey, if this is something that's important to the org or team, let's talk about setup first, rather than tracking any and every outlier. Where are our thresholds? Who are the people involved, and what is their ownership? And again, this is probably huge from
53:55 a certified model point of view, but you have to have that, or you just have a lot of data points and a lot more noise from outliers. So to me it's: let's get focused on what we're tracking and who's going to own up to it, and then you have the setup, and you actually have a pretty clear direction on being aware of anything that's going to occur or has occurred. I like that. I think I'll say, on
54:25 this one, my takeaway is that you should document what's going on. I wouldn't make changes without letting people know. I feel like, depending on the scope and size of the team using this information, whether you're in a department or in central BI, you should really weigh where that information should live. But regardless, a simple little blog post: if you're taking the effort to change the data or normalize it, or whatever the decision of the business has been, I think it's important that you at least write it
54:56 down, document it, put it in a public place, boom, even if it's simple: a couple of screenshots, a problem statement, who owned the issue, and the resolution. Something simple like that, just enough to get it down on paper, will really help. And I do think you have to weigh the balance of what you're going to use it for, what that decision point is. If you're doing predictive things, yeah, you probably need to normalize it; if you're not, and there are other things you're trying to do with it, then maybe you don't; maybe just make a note about it and move on. I
55:26 will say that today the Power BI tool doesn't give a great amount of support for making ad hoc comments on various data points inside a chart. You have commenting, you can do certain things with that, but it's not super rich; I can't just make comments on a page and tag them against something. So maybe that's a feature we could eventually ask for in Power BI.
55:56 But regardless, I think it's going to be: it depends on the importance of that data and where it comes from. Anyway, really good conversation today. Thank you very much for the question from the community; we really appreciate those. You come up with amazing topics, and these are real problems people really struggle with; this is the real-world example here. So with that, we thank you very much for lending us your ear for an hour and having us banter about some Power BI and data things. We appreciate your time. Our only ask is: if you found some value in this one, if you really liked this episode, please share it with somebody else.
56:26 Let us know your thoughts: have you done smoothing on data? Have you done some normalization, and how did that work for you? Do you like the idea of having a central repo to document what changes on data sets? Let us know in the comments and share it out on social media with your thoughts. Tommy, where else can you find the podcast? You can find us on Apple, Spotify, or wherever you get your podcasts; make sure to subscribe and leave a rating, it helps us out a ton. Do you have a question, an idea, or a topic that you want us to talk about in a future episode like today? Well, head over to
56:58 PowerBI.tips; we try to keep it to one question, for our sake. And finally, join us live every Tuesday and Thursday, a.m. Central, and join us on all of the PowerBI.tips social media channels. Awesome, thank you so much, and we'll see you next time.
Thank You
Thanks for listening! If you enjoyed the episode, please subscribe and leave a review—it helps others find the show.
