PowerBI.tips

Tag: Data Modeling

All Columns have Data Types in DAX, Or do they?
To say this another way. Not all columns have datatypes in DAX, specifically speaking to using dynamic ranking with an “Other” category. Let me explain.

The Tabular model is comprised of tables with typed columns. This is a truth learned in one’s Tabular infancy. Whether the table is imported, calculated, or Directly Queried does not matter; all tables have columns with explicitly defined types.

DAX is the language of the Tabular model. It has less type discipline than tables in the Tabular model. Measures can have a Variant data type. Implicit type conversions abound. Nor are DAX expressions limited to returning scalar values. DAX is a query language. An expression may return a table. When defining a table in DAX, projected columns have no explicit type. Each row’s value is calculated independently and its type depends on its value. This is the phenomenon we seek to explore in this post. If this seems unsurprising, you can probably stop here.

A practical scenario

Here is the scenario where I discovered this. I was helping a colleague with a dynamic ranking measure and aggregation. The requirement read:

Show the top for production plants as ranks 1 through 4. Group all others ranked 5 or lower together as “Other”.

We have three tables:
1. a fact table 'Fact'
2. a dimension table 'Dim', related to 'Fact' with the relationship 'Fact'[DimKey] <-N:1- 'Dim'[DimKey]
3. a disconnected table, 'Rank Selector', with the values: “1”, “2”, “3”, “4”, and “Other”.
We aggregate 'Fact'[Value]. Here is a Power BI file with sample data and code.

Our first try

We started with a measure [Agg] as our base measure.
```
Agg = SUM ( 'Fact'[Value] )
```
[_Rank (variant)] is a helper ranking measure, returning a rank 1-4 or “Other”.
```
_Rank (variant) =
VAR rawRank = RANKX ( ALL ( 'Dim' ), [Agg],, DESC )
// note this value is either a number 1-4 OR the string, "Other"
VAR groupedRank = IF ( rawRank <= 4, rawRank, "Other" )
RETURN
    groupedRank
```
Below is a screengrab of these two measures evaluated against our 'Dim'[Plant]. As you can see, [Agg] returns a value for each plant. The top four plants have ranks of 1, 2, 3, and 4. The other plants all have the same rank, “Other”. This is all as expected.

'Dim'[Plant] with [Agg] and [_Rank (variant)] measures, working largely as expected. Each dim value has a value for [Agg] and a rank of 1-4 or the string, “Other”.

The issue reared its head when we attempted to use these ranks to define a new visual based on 'Rank Selector'[Rank]. We defined a measure to show [Agg] based on the plant’s rank (1-4 or “Other”), rather than the plant name. Note that 'Rank Selector'[Rank] is a column in a data model table, with Data type: Text. Our expectation is that we will see a table like the image below.

A table visualization showing six rows with labels: “1”, “2”, “3”, “4”, “Other”, and “Total”; and showing an aggregate alongside each.

Wherein our heroes encounter a challenge

[Ranked Agg (variant)] is the display measure to show aggregations against 'Rank Selector'[Rank]. We wrote the variant version you see below first, not realizing the perils that laid ahead. (Correct implementations follow later.)
```
Ranked Agg (variant) = 
VAR rankContext = VALUES ( 'Rank Selector'[Rank] )
VAR ranked =
    ADDCOLUMNS (
        ALL ( 'Dim' ),
        "@rank", [_Rank (variant)],
        "@val", [Agg]
    )
VAR currentRank = FILTER ( ranked, [@rank] IN rankContext )
VAR result = SUMX ( currentRank, [@val] )
RETURN
    result
```
When we visualize this measure against 'Rank Selector'[Rank], we get the result below.

A table visualization showing only the 'Rank Selector'[Rank] value “Other” and “Total”. Specifically, we do not see any value for ranks “1”, “2”, “3”, or “4”.

A table visualization showing only the 'Rank Selector'[Rank] value “Other” and “Total”. Specifically, we do not see any value for ranks “1”, “2”, “3”, or “4”.

What is going on here? We see data only for the Rank Selector'[Rank] label “Other” but none of the numbers. Let us walk through the measure step-by-step to understand this result.
```
VAR rankContext = VALUES ( 'Rank Selector'[Rank] )
```
rankContext captures whatever the current filter context is for 'Rank Selector'[Rank]. This is one or a collection of the values: “1”, “2”, “3”, “4”, and “Other”. For a detail row of a table visualization, the context contains exactly one of those values. For a total row (assuming no other slicers or filters), the context contains all of those values.
```
VAR ranked =
    ADDCOLUMNS (
        ALL ( 'Dim' ),
        "@rank", [_Rank (variant)],
        "@val", [Agg]
    )
```
ranked is a table with all rows and columns of the data model table 'Dim' and two projected columns, [@rank] and [@val]. For any row, i.e., for any specific plant, [@rank] is one of 1-4 or “Other”. Many rows may have [@rank] = "Other". We do not expect ties, so there are exactly four rows with numeric ranks, one each for the ranks 1-4. (This table looks like the first table visualization screenshot above.)

The critical part of both the ranking logic and the phenomenon we are exploring is in currentRank. This is a subset of the table, ranked, holding only the row or rows which have a [@rank] value that exists in the filter context captured in rankContext (again, one or a collection of the values “1”, “2”, “3”, “4”, and “Other”).

Note, that we see data only for the Rank Selector'[Rank] label “Other”. As you recall, our [_Rank (variant)] seemed to work correctly above – it definitely returned ranks 1-4, not just “Other”. 'Rank Selector'[Rank] has the correct values. We checked that our logic in [Ranked Agg (variant)] was correct. We verified in DAX Studio that ranked held the correct table. We even got to the point that we checked whether VALUES was behaving as we expected in rankContext. (I will note that if you find yourself verifying the behavior of an expression as basic as VALUES ( 'Rank Selector'[Rank] ), then you, too, may find yourself questioning what has brought you to this point.)
```
VAR currentRank = FILTER ( ranked, [@rank] IN rankContext )
```
We continued checking and troubleshooting and identified that currentRank had binary state: either it would be an empty table or it would be a table holding only rows with [@rank] values of “Other”. It would never hold a row with a rank value of 1, 2, 3, or 4. It seemed that the predicate in our FILTER would never find 1 to be equal to 1, or 2 equal to 2, and so on.

How could basic logic be so broken just in this measure? There was much gnashing of teeth. Several head-shaped indents were beaten into the nearest wall.

DAX, the language, has a more permissive type system than the Tabular model

You may have observed some apparent inconsistency in my quoting or not-quoting the rank values above. In fact, I have been scrupulous to always quote values when referring to 'Rank Selector'[Rank], the model column with type Text, and to not-quote the rank values returned from [_Rank (variant)]. The column 'Rank Selector'[Rank] has exclusively string values. The measure [_Rank (variant)] sometimes returns a numerically typed value in the range 1-4 and sometimes returns the string “Other”.

In DAX, 1 = 1 is an equality test that returns TRUE. Similarly, 1 IN {1, 2, 3, 4} returns TRUE, because 1 = 1 and 1 exists in the table there. In DAX, 1 = "1" is an equality test that throws a type error. Numbers cannot be tested for equality with strings. They are different types. Therefore, a number can never be equal to a string. Thus, 1 IN {"1", "2", "3", "4", "Other"} also throws a type error.

The lightbulb moment

In [Ranked Agg (variant)], currentRank has a column [@rank] with type Variant. Sometimes, [@rank] has a value in the range 1-4, type Whole Number. Sometimes it has the value “Other”, type Text. When we evaluate the predicate [@rank] IN rankContext, there are exactly two possible results. Either we ask for an impossible membership test, “Is this numerically typed rank value in the set of Text values?”, or we ask whether the string “Other” is in that set. The first cannot succeed. The second only succeeds for the “Other” rank.

The Fix

The fix, then, is straightforward. We must always return a value of type Text in our ranking measure. Correct measures are below with comments highlighting the modifications.
```
_Rank (typed) = 
VAR rawRank = RANKX ( ALL ( 'Dim' ), [Agg],, DESC )
VAR groupedRank =
    // with an explicit typecast, our measure now *always* returns a string-typed value
    IF (
        rawRank <= 4,
        FORMAT ( rawRank, "0" ), // force string type
        "Other"
    )
RETURN
    groupedRank

Ranked Agg (typed) = 
VAR rankContext = VALUES ( 'Rank Selector'[Rank] )
VAR ranked =
    ADDCOLUMNS (
        ALL ( 'Dim' ),
        "@rank", [_Rank (typed)], // the only difference here is using our typed rank helper measure
        "@val", [Agg]
    )
VAR currentRank = FILTER ( ranked, [@rank] IN rankContext )
VAR result = SUMX ( currentRank, [@val] )
RETURN
    result
```
The measures above give us the results we expected all along, as you can see in the table visualization below.

A table visualization showing 6 rows with labels: “1”, “2”, “3”, “4”, “Other”, and “Total”; and showing the correct values of [Ranked Agg (typed)] alongside.

Below is a screengrab of all the measures in visualizations, so you can see the differences.

A screenshot with all three of the table visualizations shown prior in this post: the 'Dim'[Plant] table with [Agg], [_Rank (variant)], and [_Rank (typed)]; the incorrect behavior with [Ranked Agg (variant)]; and the correct behavior with [Ranked Agg (typed)].

Summary

Tabular model tables have columns of explicit type. DAX expressions have implicit types. Columns defined in DAX have Variant type. Each row has an implicit type based on its value. If such a column is a Calculated Column added to a data model table, then an explicit type must be assigned and all values in that column will be cast to that type at load time. When such a column is used as an intermediate step in some calculation, each row may have a different type. If your operations depend on types, you must account for this explicitly. This is especially relevant when operating on data model and DAX tables together.

Insights from the community

When talking about this with some peers, Maxim Zelensky (blog, Twitter) observed that there is no type indication in Power BI Desktop. Nevertheless, you can identify this sort of thing in DAX Studio. The result of a query such as that below will right-align Variant-typed values and left-align Text-typed values. This query references the included Power BI file and its results are shown in a screenshot below. You can observe similar alignment in the table visual shown above. In general, when visualized, numeric data is presented with right-alignment and textual data left-aligned. This practice predates Power BI; it is a general rule that Power BI and DAX Studio wisely adopted.
```
EVALUATE
ADDCOLUMNS (
    VALUES ( 'Dim'[Plant] ),
    "@text", [_Rank (typed)],
    "@variant", [_Rank (variant)]
)
```
A screenshot showing the results of the above query, with [@text] left-aligned and [@variant] right-aligned.

Using the sample code and PBIX

Here is a link to download the PBIX that was used for examples above. This has four visuals to demonstrate the phenomenon discussed in this blog post. The first visual is a table with 'Dim'[Plant], [Agg], [_Rank (variant)], and [_Rank (typed)]. Of note in the first visual is left-vs-right alignment of the two [_Rank ...] measures. The second visual is a table with 'Rank Selector'[Rank] and [Ranked Agg (variant)], showing the incorrect results of the measure depending on a Variant-typed intermediate column, with a value only for rank “Other”. The third visual contrasts with the second, using [Ranked Agg (typed)] to show the correct behavior of the dynamic ranking and grouping logic, with a value for all ranks and a correct total. Lest one protest that the phenomenon above may be an artifact of using VARs, the last visual, on the bottom, shows an example that uses no VARs and references no other measures. The behavior of this last visual is identical to the incorrect [Ranked Agg (variant)] version.
December 7, 2021
Customize the Default Dataset
In the October 2020 release of Power BI desktop you have the ability to load a dataset from the splash page. For this tutorial we dig in on how Microsoft enables a default dataset. Additionally we show you how to customize the default dataset for your needs.

Quite often I need to prototype a visual, or work on some sample data to design a report. The very first step is always loading some sample data. Now that Power BI desktop comes with a default dataset, we leverage this feature to speed up our development process.

Watch the YouTube Video

Additional Thoughts

In this video I explain that the dataset does not auto load with datatypes enabled. This was due to my default setting within Power BI desktop. If you’d like you can make Power BI Desktop auto detect your datatypes for you.

This setting can be changed by the following steps:
- Click on the File button
- In the drop down menu, Click on the Options and Settings
- In the menu on the right Click the button labeled Options
- Under the Global section in the Options menu Select the item labeled Data Load
- Change the Type Detection for loading data
If you like the content from PowerBI.Tips please follow us on all the social outlets. Stay up to date on all the latest features and free tutorials. Subscribe to our YouTube Channel. Or follow us on the social channels, Twitter and LinkedIn where we will post all the announcements for new tutorials and content.

Introducing our PowerBI.tips SWAG store. Check out all the fun PowerBI.tips clothing and products:

Check out the new Merch!

Hasta La Vista Data

Go Ahead Make My Data

PBIX Hat
October 22, 2020
Tabular Editor Rocks!
Tabular Editor is an incredible Tool that enables users to manipulate a Tabular model at lighting speeds. Daniel Otykier is the creator of the Tabular Editor program. We are producing a video series to enable users of Power BI and data modelers for Analysis Services to learn more about this program.

Trust me this will be the best 4 hours of learning you will do this year. Learn about Tabular Editor and how it can save you HOURS of time. It will be worth it.

In this Series

This series will be broken down in to four topics.
- Introduction
- Using Scripting to remove repetitive tasks
- Best Practice Analyzer to ensure your model is top notch
- Incorporating DevOps with Tabular Models
Introduction

Daniel walks us through the basics of the program. We get a little bit of history on how the tool was developed and why you need to know about it.

Using Scripting

This by far is where the tool shines. Scripting allows users to automate common tasks for, creating measures, adding descriptions, building common calculations, hiding measure and more.

Best Practice Analyzer

Model running slow? This feature of Tabular Editor will allow you to quickly review your entire model and find areas for improvement.

DevOps

Once you have a data model how do you control versions? Or, can multiple people work on the same data model? Yes, to both of these questions. This video discusses manage data models with a DevOps pipeline.

All Videos

What all the videos in the series from Start to finish.

More about Daniel
- Follow Daniel on Twitter
- Connect with him on LinkedIn
- Download Tabular Editor
- Learn More about Tabular Editor on the Wiki
- Submit your idea for Tabular Editor
Slides for Tutorials

Click this link to download all the power point slides for this tutorial series.

If you like the content from PowerBI.Tips please follow us on all the social outlets. Stay up to date on all the latest features and free tutorials. Subscribe to our YouTube Channel. Or follow us on the social channels, Twitter and LinkedIn where we will post all the announcements for new tutorials and content.

Introducing our PowerBI.tips SWAG store. Check out all the fun PowerBI.tips clothing and products:

Check out the new Merch!

Hasta La Vista Data

Go Ahead Make My Data

PBIX Hat
April 2, 2020
Temple of DAX – 3D

To celebrate joining the Power BI Cat team next week, I thought I would update a DAX game I built last November (2018) which was a DAX-based maze game in Power BI that allowed you to navigate a character through a series of user-created maps in a top-down view.

The details for that game are in this blog posting. http://radacad.com/fun-with-dax-a-maze-ing-dax.

The update I wanted to make was to see what is involved in creating a 3D, first-person version of the same maze. The following is how I got on.

The PBIX File for the Temple of DAX – 3D can be downloaded using this link and here is a link to the publish to web version.

Like the 2D version, the report imports a CSV file which it uses to generate the map. I pretty much used exactly the same code to import the CSV file and pivot the coordinates to a format for suitable for processing in DAX.

All you need to create/edit your own custom maps is to open and edit a CSV file using your preferred editor and place an X character where you would like a wall to be and an I character for the insight (or exit). Excel is great for this particular task as you can size and align the columns nicely to easily see where walls will be.

The following image shows the data1.csv file opened in MS Excel, with all columns set to the same width of 15-pixels. Once modifications are complete, save the file as CSV (not xlsx) in the folder the PBIX file uses to import from.

Once again, I’d like to thank Margarida Prozil for providing a custom control to manage the navigation. This is an updated version of the D-PAD called a 3D-Pad. You can grab a copy from her GitHub repo if you are keen to work with this control.

https://github.com/mprozil/dPad-3D/tree/master/dist

The control has four arrows. The Up/Down arrows move you back and forward through the map, while the left/right arrows spin you 45 degrees. This is different behaviour to the D-Pad in the 2D version of the maze.

I’d also like to thank Mike Carlo at PowerBI.Tips for providing the wonderful background image and the overall visual design of the game. It was Mikes idea for the evil laugh on the intro page. This started to drive me a bit nutty as it would play everytime I saved the file.

The Game Logic

Like the 2D game, this version uses SVG as the method to draw the graphics to the screen. Each time an arrow is clicked, a new view needs to be generated based on the interaction, so a calculated measure takes into account the new position or perspective and generates a fresh SVG to be displayed using the Image control by CloudScope.

The SVG image displayed in the Image Control is a nested set of mini SVG polygons that draw the outline of various shapes into an outer SVG file. The performance is pretty good and I have still yet to hit any limitations on text size to store the final SVG set.

The 3D world

The first challenge was to figure out how to convert coordinates in a table in the data model into an image that looks like you are walking through a maze.

The map I use in the uploaded PBIX file is 26 steps and 36 steps long, so it isn’t big. If a step has been designated as a wall, it effectively has 4 walls (panels) around its perimeter, and when a series of X values in the CSV run next to each other, a longer wall effect is created.

Each of the panels for a step in the board has 5 points. I initially created these panels as 100 x 100 walls, but I found it added the extra point at the top to help me work out which way was up when I was first playing with the 3D to 2D projection code.

The Calculated table called 3D Worlds takes the basic X/Y coordinates from the CSV file and generates a 3D world of X/Y/Z points for every panel on every all in the maze.

The image below shows all X/Y/Z coordinates for 1 step in the CSV map. The first column shows there are four panels. Think of these as like north, east, south and a west facing wall. Each wall as five points. These five points define each corner of the wall in terms of a 3D world and the X, Y and Z columns for each row specify exactly where each point should be in a three-dimensional plane.

3D to 2D projection

The next challenge is to take the 3D coordinates and convert this information into an image that represents what you should see based on your location in the map, along with the direction you are looking. There was no way I was going to figure the maths for this out from scratch, and after a few internet searches and a bit of reading, I found this article to have all the information I needed.

https://en.wikipedia.org/wiki/3D_projection

The key information is in the section on Perspective Projection, including the algorithm I ported into the DAX measure in the game.

The basic principle is to define a 2D plane (or screen) called a viewport that sits a specific distance from the eyeball. Then an imaginary line is calculated between the eyeball and every X/Y/Z point from the 3D World table. If any of these lines pass through the viewport, they can be plotted onto the viewport at a specific 2D x/Y coordinate.

The code I used from the Wikipedia article is the following:

The Main Map calculated measure contains all the DAX code to convert the 3D data points to 2D coordinates using SVG.

The section of code in the Map Map calculation that matches first the algorithm from the Wikipedia article is here :

I’m not going to pretend to understand this any more than it reminds me of math classes from when I was 15, working out SIN, COSINE equations. I swore then I would never use them for real, so I guess I owe my old maths teacher an apology.

The good news is I didn’t need to understand the code in the end. Once I’d added it to the DAX calculated measure and started generating SVG polygons, it’s quickly looked the way I wanted.

I probably spent less than 2 hours getting this aspect of the game to work. There was a little fine-tuning while I worked out what the various elements were, but this was much easier than I anticipated.

A nested set of CONCATENATEX functions in the Main Map calculated column, loop through every object defined in the 3D World table and converts points to 2D versions. A series of filters are applied to stop plotting any graphics that fall outside the 2D viewport, such as walls that are behind you (taking into account the direction you are looking).

The objects are drawn from furthest to nearest to make sure far away objects do not appear if they are covered by a nearer object. This also means portions of further away objects will appear as expected if they can be partially seen.

Other filters are applied to stop panels of walls being plotted when they simply cannot be seen.

Mini Map

A separate Mini Map calculated measure generates a non-3D version of the map and gets added to a separate Image Viewer custom control to show your position in the map. A small red triangle shows your position and direction and a small white square shows the exit. In this case, it’s inside the letter E.

3D Dpad

Margarida Prozil supplied me with an updated 3D control that sits on the top of three separate columns in the database. X and Y (rows and columns) control the square you currently occupy, while the third column is V (for view perspective). X and Y represent how many squares wide, or long the map is. V represents the angle in steps of 45 degrees and there are 8 possible values (0, 45, 90, 135, 180, 225, 270 and 315). The example map is 26 squares wide, 36 squares long so combined with the 8 views, mean the control can be set to 7,488 possible values. A custom control can only manage about 30,000 points, so take this into account for larger custom maps.

Summary

I’m pretty happy with the 3D effect and can see it can be used with other X/Y/Z based 3D coordinates plotted to a 2D view plane. When I was debugging this, I had slicers for height on the screen so I could use a slider to create an effect of flying up in the air – and the maze still rendered as expected. Power BI seems to cope with the workload pretty well, despite not really optimised for this kind of work. In future versions, I may add more objects and detail to push the engine harder and see where breaking points are.

There probably aren’t too many business use-cases in Power BI for this type of report aside from educational. Feel free to have a look through the PBIX file where you may pick up some useful ideas, tips and tricks in the DAX.

There is a small bug at the start of the game when the bookmark drops you into the maze and you need to click the up arrow a few times to get going. Once you are on your way, it’s pretty good. I’ll try to get that resolved in the next few days.

January 21, 2019
Adding Data Types Within Query Editor
If you have spent any time working in Power BI, your very first step is to, wait for it… Get Data. Using Get Data will start loading your data into the Query Editor for Extracting, Transforming and Loading (ETL). When you start out in Power BI it is likely that you don’t spend much time in the Query Editor. However, the longer you use Power BI desktop, and the more you learn, you find that the Query Editor is highly under-rated. There are so many amazing transformations that you can perform on your data. After some time getting comfortable you’ll be building larger queries with many, many, steps. Eventually, it may look something like this:

Multiple Query Transformations

Perhaps your queries are already long, or may be even longer. Wouldn’t it be nice to shorten the number of steps? It would make it easier to read. In this tutorial we are going to talk through how we can combine several steps when you create a new column. This is achieved by modifying the M scripts or equations slightly when new columns are created.

While doing this won’t cut down every query in half, but it will remove a couple of additional steps per query. This makes your queries easier to read and maintain. Also, using this best practice, will save you headaches in the future. At some point you will run into a data type error. This is seen when you try to join multiple tables on columns with different data types, or when you need a measure to create a SUM but the column data type is still text.

Let’s get to the tutorial.

Open up your Power BI Desktop program and on the Home ribbon click Enter Data. Using the dialog box for entering data enter the following table of data:

Sales

100

120

94

20

80

Once you’ve entered your data the Create Table screen should look like the following. Be sure to name your table, in this case I named my data table Data…. yea, feeling a lack of creativity today. Next, click Edit to modify the query before loading the data into Power BI.

Create Table

This will open the query editor. Click on the Add Column ribbon, then select Custom Column. The Custom Column dialog box will open.

Note: When you bring in the created table that the Sales column already has the data transformed into a whole number. Also note in the right under Applied steps we have two steps, one for the source and one for Changed Type. This is because not every M equation (M language is the language used to perform the ETL in the query editor) can handle data types.

Add Custom Column

In the Custom Column dialog box enter the following, the column name, the equation below. Click OK to add the column.

Insert Custom Column

Note: It is good practice to name the new column something meaningful. This helps when you are working in the query editor. When you come back to your query months later and wondered what you were creating, the column names will help! Trust me I learned this lesson the hard way…

Great, now we have our new column. Notice the image in front of our column named Increase Sales. This means Power BI thinks that the data type of this column could be Text or a Number. Let’s change it. Click on the ABC123 icon and select Whole Number. Now the column data type has changed to numbers only.

Change Column Type to Whole Number

If we glance at the Query Setting under the Applied Steps, we now have 4 steps. Two were added, one for the added column and the second for the data type of the column. This is not what we want. Instead we would like the column to be added with the appropriate data type right from the beginning.

Let’s remove the very last step labeled Changed Type1. To do this we will click on the little X next to the step. This will remove the step. While highlighting the Added Custom step click in the formula bar and modify the equation to include the following statement in RED. Press the Enter to execute the new formula.
```
= Table.AddColumn(#"Changed Type", "Increase Sales", each [Sales] * 1.1, Int64.Type)
```
Note: if you don’t see the formula bar it can be toggled on or off in the View ribbon in the check box titled Formula Bar.

The query editor should now look like the following:

Desired Data Type

Without adding an extra step on the Query Settings, we have changed the data type. I know this might seem trivial, but when you are creating large queries, they can get difficult to read. For me, I find this technique quite useful, and it doesn’t only support whole numbers. This technique also supports the following data types:

Data Type Syntax

Whole Number Int64.Type

Decimal Number Number.Type

Dates Date.Type

Text Text.Type

Thanks for following along. If you liked this tutorial, please share it with someone else who might find this valuable. Be sure to follow me in LinkedIn an Twitter for posts about new tutorials and great content from PowerBI.Tips
January 31, 2018
Get Your OData Here
For those of you who are following my tutorials, you are most likely to understand that each tutorial comes with it’s own data source. This has been intentional so we can both walk through the example and learn by doing. Whenever, I am learning new things I find it’s helpful to do two things. First, make examples and demos of what I am trying to learn. Simply reading about a tool such as PowerBI, while engaging, it isn’t how I learn. I learn by getting my hands dirty, opening up PowerBI and creating something. You have to get utterly frustrated and stumped before that light bulb pops on and things become clear. Secondly, after I’ve taken the time to learn a new feature or method I find it extremely helpful to teach someone else what I’ve learned. By doing so, I am able share in the enthusiasm of others who also are learning. Teaching others also makes you remove all the cluttered ideas away from the essential lesson.

This being said, I consume a lot of data. Always finding data sources is difficult. Often I find myself making up my own data sets which takes additional time. At least, it used to take me forever to find data. This past week I was digging around for more data sources and I stumbled upon this gem. It is the data source has been used by many a bloggers for examples. It is Adventure Works sample business database. For this tutorial we will walk through how to load the Adventure Works dataset directly in to PowerBI.

Lets begin by opening Power BI desktop. Once Power BI desktop is open on the Home ribbon click the Get Data button. The Get Data window will open and under the Other section we find an OData Feed. Select the OData Feed and then click Connect to proceed.

Select OData Feed

The OData Feed window will open. Enter in the following HTML address into the URL field.
```
http://services.odata.org/AdventureWorksV3/AdventureWorks.svc
```
Note: In this example we are only using the basic OData Feed connection. Power BI does allow users to build more complex OData feeds which may require multiple parameters. This functionality is included by toggling the OData Feed screen from Basic to Advanced. This would be ideal if you had an OData Feed such as http://myOdataFeed.com/{custom parameter}/OData.

Click OK to proceed

Enter OData Feed

Power Bi will now go to the web address and read the OData Feed. What is returned is a list of tables. The Navigator window shows us previews of each table. Select the CompanySales and TerritorySalesDrillDown by checking the boxes next to each table. Click Load to proceed.

Select Tables

Ta Da!! that was easy. We were able to quickly load the two tables from the Adventure Works database. We can confirm this by looking at all the loaded fields contained in the Fields pane.

Loaded Fields

Add a couple of visuals with the following fields to pretty up our page.

Add Clustered Column Chart

Add Stacked Bar Chart

Pro Tip: notice how the two charts we added have the same color scheme but are different dimensions. The Clustered Column Chart has Years for the Colors, but the Stacked Bar Chart is using product categories. Change the colors on the Stacked Bar Chart by clicking on the Paint Roller on the Visualizations pane. Select the section called Data Colors and modify the colors for each product category.

Change Stacked Bar Chart Colors

Note: In order to sort from the largest sales amount to the smallest. Click the Ellipsis (small three dots) on the corner of the visual and select Sort by Sales. This will make the visual sort in order of sales from the highest amount to the smallest amount.

Similarly you can change the colors for the Clustered Column Chart to make the visual more appealing.

Change Clustered Column Chart Colors

Add Waterfall Chart

Again using the Ellipsis of the visual select Sort by Total to sort the number of sales by each employee from the largest sales amount, Linda to the smallest sales amount Lynn. Also, note that this chart is being built off the second table that we loaded from the OData Feed labeled TerritorySalesDrilldown.

Now, I believe we can say we are officially done. We have loaded data from our OData Feed and created three visuals upon that data. Thanks again for following along.

If you liked this tutorial please be sure to share. Also if you want to know more about me you can visit my About page and follow me on twitter or Linkedin for updates.
February 21, 2017
Load Multiple Excel (xls or xlsx) Files
Previously we’ve done a tutorial on loading multiple text files within one query. This is nice, however we will also need to import multiple Excel files. First, to understand the procedure of querying multiple excel files you have to understand the basics between the CSV (comma separated values) file and an excel (.xls or .xlsx) files. In a CSV file you have only one data set. The beginning of the file starts with values and separates each file with a “,” a carriage return starts a new row of data. This is an easy and efficient way to store millions of rows of data. By contrast the excel file is way more complicated. Excel files can have multiple sheets of tables of data. Think of this as a stack of CSV type files. For example if you have an excel workbook with three sheets of data, Sheet 1, Sheet 2, Sheet 3. You can think of those three sheets as grid of data, similar to the CSV file. The multiple sheet aspects of an excel file makes the data ingestion into PowerBI a little bit more complicated. To add to the complication, when you loading data from either multiple sheets, or selecting a specific out of many sheets of data. For illustration purposes imagine working with two excel files with three sheets each, 2 x 3 = 6, a total of 6 sheets of data, or what I will call “pages” of data. This is why it is more complex to load excel files than CSV files.

Note: If you want to learn how to load multiple CSV files visit this tutorial.

Not only do you have to figure out what data you want to ingest on the page you must all tell PowerBI which sheets do you want to look at, and from which excel file. If that was to many words think of loading the following data sample:

Workbook 1 – Year 2000 Olympic Medals
- Sheet 1
  - Olympic Medals Table
    
    Rank
    
    Country
    
    Gold
    
    Silver
    
    Bronze
    
    Total
- Sheet 2
- Sheet 3
WorkBook 2 – Year 2004 Olympic Medals
- Sheet 1
  - Olympic Medals Table
    
    Rank
    
    Country
    
    Gold
    
    Silver
    
    Bronze
    
    Total
- Sheet 2
- Sheet 3
The data structure for both workbook 1 and 2 are similar but the names of the files are different and there can be multiple pages.

To resolve this we will have to write a M language function that will load each file as a function. This will be done in later in the tutorial.

Here is the data source information for Olympic medals won by each country from 2000 to 2012, download here. Inside the Medal Count zip file are four xlsx files, extract them to your desktop. Move the files into a folder on your desktop labeled Medals.

Medals Folder

Now, open up PowerBI, We will begin shaping our data to load all the excel files. On the Home ribbon click on the Get Data button. Select Folder on the right side and click Connect.

Get Folder Data

Next select the folder path that you want acquire the files from, Click OK to continue.

Load Folder Screen

Next we are presented with the loaded files within our selected folder. Click Edit at the bottom of the screen to proceed. The Query Editor window will now open. Select the first two columns labeled Content, and Name. With those two columns selected right click on the header and select Remove Other Columns. This will remove all the useless data associated with the files.

Remove Other Columns

Click the Add Column ribbon and press the Add Custom Column on the left side of the ribbon.

Add Custom Column

Name the new column ExcelFileLoad and enter the following equation.

Excel.Workbook Equation

Note: Once you type “Excel.Workbook(” you can click on the column labeled Content on the right side of the screen to have the name automatically added. This is useful when you have many many columns to choose from or if there naming of those columns becomes complex. This way you won’t type in the column name incorrectly.

Click OK to proceed. Notice we now have a new column called ExcelFileLoad. Next click the Expand button (the one with the arrows) located at the right of our newly added column. Click OK to proceed.

Expand Column Button

Now we have a new column labeled ExcelFileLoad.Data, which is the data contained in our excel files. Now click in the Grey Area next to the word labeled Table. This will open up the file and reveal the information present in the file. Notice that we can see the headers and the data in our file. Row 1 contains the headers of each column. Rows after row 1 contains the medal data.

View Data of File

Next select the columns labeled Name and ExcelFileLoad.Data and right click on the column header, then select Remove Other Columns

Remove Other Columns Again

On the Add Column ribbon click Add Custom Column again. Name the column PromoteHeaders and enter the following formula. Click OK to proceed.

Promote Headers Step

Clicking again on the grey area in our newly created column reveals our tables with promoted headers.

View of Data with Promoted Headers

Next click the Expand Button, un-check the Use original column name as prefix and click the OK button to proceed.

Expand Data

Remove the following columns, ExcelFileLoad.Data, Rank, and Total, bu right selecting the columns and right clicking on the header and selecting Remove Columns. Now we want to parse out the year name from the Name column. To do this click on Name Column. Then click the Transform ribbon and click the Extract button, then select First Characters from the drop down menu.

Extract First Characters

In the Extract First Characters menu enter the number 4 and click OK to proceed.

Extract First 4 Characters

Change the following columns to whole numbers: Name, Gold, Silver, Bronze. Do this on the Transform ribbon in the Data Type drop down.

Change Data Types

We are now ready to load all the data. Rename the Query to Medals, click the Home ribbon and select Close & Apply.

Name Query

And there you have it. We have successfully loaded four excel files into one query.

Bonus: for added flare add the following measure.

Total Medal Count = sum(Medals[Gold]) + sum(Medals[Silver]) + sum(Medals[Bronze])

Now you can add the following Visualizations.

Bar Chart Visual

Stacked Bar Chart

Map Visual
August 10, 2016
Using Variables for File Locations
This week I encountered an issue when working with multiple queries in my data model. Here is the source files in case you want to follow along.

Here’s what happened. I had a PBIX file that had four queries in it, one file for the summer the Olympic metal count for the following years, 2000, 2004, 2008, and 2012.

Olympic Metal Count

After a bit of working I figured that my desktop screen was going to get to cluttered if I continued to collect Olympic metal data. Thus, I moved my excel files which were my source data into a folder called Olympic Medals.

File Move

By doing this I broke all the links for all four files. This was discovered when I tried to refresh my queries and noticed that all the queries failed. Power BI gave me a nice little message notifying me that there was a data source error.
```
DataSource.Error: Could not fine the file:
```
Missing File Error

To fix this I had to open the query editor and change each file’s location to the new folder that I just made. Seeing that this is not an efficient use of my time, I decided to spend more time to figure out a way to make a variable that would be my file location for all my queries.

Lets begin by making a new blank query by clicking on the bottom half of the New Source button on the Home ribbon. Then click the item labeled Blank Query.

Start Blank Query

With the new query open type in the file location where you will obtain all your working files. For me my file location was on my desktop, thus the file location is listed below. Rename the new query to Folder.

Folder Query

Note: Since we are working on building a file structure for Power BI to load the excel files you will want to be extra careful to add a “\” back slash at the end of the file location.

Next on the query for Medals 2000, we click the Source under the applied steps window on the right. This will expose the code in the formula bar at the top of the window.

Select the Source Applied Step

Note: If you don’t see the formula bar as I have illustrated in the image above, you can turn this feature on by click the View ribbon and checking the box next to the words Formula Bar. This will expose the formula bar so you can edit the source step.

This is where the magic happens. We can now insert our new blank query into this step. Our current file contents looks like the following:
```
= Excel.Workbook(  File.Contents(  "C:\Users\Mike\Desktop\Olympic Medals\2000 Medals.xlsx") ,  null ,  true )
```
Now remove the first part of the file location and make the equation match the following:
```
= Excel.Workbook(  File.Contents(   Folder  &   "2000 Medals.xlsx") ,   null , true )
```
Not only does this shorten our equation, it now uses the folder location we identified earlier and then we can pick up the file name 2000 Medals.xlsx. This makes is very easy to add additional queries with the same steps. Also, if you move your files to a new folder location, you only have to change the Folder query to reflect the new file location. To test this make a new folder on your desktop called New Folder. Move all the Olympic medal files to the new folder. Now in Power BI Desktop press the Refresh on the Home ribbon. This should result in the Data.Source.Error that we saw earlier. To fix this click the Edit Queries on the Home ribbon, select the Folder query and change the file directory to the new folder that you made on your desktop. It should look similar to the following:

New Folder Image

Once you’ve modified the Folder query, click Close & Apply on the Home ribbon and all your queries will now reload. Success!!

New Queries Loaded

Hope this tutorial helps and solves some of the problems when moving data files and storing information for Power BI desktop. Please Share if you like the tutorials. Thanks.
August 4, 2016
Loading Data From Folder

Let me setup a scenario for you. You get a data file from an automated system, it has the same number of columns but the data changes for every new file. Being the data savvy person that you are you’ve spent some time working in excel to make a template where you can copy your new data into and then automatically all your equations and graphs magically work. You pat your self on the back and happily send out your fantastic report to everyone you know. Then tomorrow when the data comes to you again you repeat the same process over again. Still enamored by your awesome report, you send it out again knowing you have saved your self so much time not having to do the analysis or creation of your reports over and over again. Now, fast forward 3 months. That stupid report shows up again, and now you have to lug all that data from file to file and begrudgingly you sent out your report. Thus, is the store of the analyst. You love data, but you hate it as well. Well in this tutorial I’ll show you how to remove some of the pain of that continual data loading process by loading new data from a folder.

My previous post (found here) talks about loading data from a folder. In this tutorial we will add some logic to this method that will look at a folder but only load the most recently added item from that folder.

Data for this tutorial is located this link Monthly Data Zip File. This data in the ZIP file is a monthly data sample from Feb 2016 to April of 2016.

Download the zip file mentioned above and extract the Monthly Data folder down to your desktop. Open up PowerBI Desktop and click on the Get Data button and select All on the left side. Click on the item labeled Folder and click Connect to continue.

Get Data from Folder

Select the newly unzipped Monthly Data folder that should be on your desktop. Click OK to continue. Upon opening that folder location you will be presented with the multiple files. Click Edit to edit the query.

Edit Query for Folder Load

Now you are in the Query Editor. This is where the fancy query editing will work to our advantage. We could load all the data into one large query. However, depending on the size of your data sets or how you want to report your data this may not always be desirable. Instead you may only want data from April, then May when the new data is sent next month.

Thus, our first step to start pairing down the data will be to first filter the files in sequential order. In this case because I have named the files with a Year-Month-Day format I can sort the files according to their names.

Note: When using PowerBI desktop it is a good practice to name the files beginning with a YYYY-MM-DD file name. This makes it really easy when sorting and ingesting information into PowerBI. I have used other columns of information such as Date Accessed or Date Created before but have gotten inconsistent results as these dates can change depending on when a file was moved or copied from one place to another.

Click the drop down next to Name and sort the files in Sort Descending.

Name in Sort Descending

This places the files with the most recent file at the top of the list.

File List in Descending Order

Next click on the Keep Rows button on the Home ribbon, select Keep Top Rows.

Keep Top Rows

Enter the number 1 when the popup appears. Click OK to continue.

Keep Top Rows Menu

Now you’ll notice you have only one file selected which is our latest file from April. Click the Load File button found in the Content column.

Load File Button

We have completed the activities in the Query Editor and can now load the data. Click Close & Apply found on the Home ribbon. All our April data has loaded. by making a simple table we can now see all the data that was just loaded.

Loaded Data from April

Now we will remove some data from our desktop folder labeled monthly data. Open the folder on the desktop labeled Monthly Data and delete the filed labeled 2016-04-01 April. You should now have a folder labeled Monthly Data with only two files in it, one for Feb and one for March.

Two Files Left

Return back to Power BI Desktop and click the Refresh button on the Home ribbon. Notice now how all our data has changed. We are now looking at the March data because it is the most recent file in our folder based on the file name.

March Data Load

To verify this we open the query editor (Click the Edit Queries on the Home ribbon). Click Refresh Preview on the Home ribbon and finally select the Applied Step called Kept First Rows. This will reveal the month of March as our data source.

Month of March Loaded

Now, every time you add a new file to our folder and refresh PowerBI the latest file (based on the naming convention we talked about earlier) will always be loaded.

Note: This method works great when your data source is coming from an automated system. The file format must always be the same for this to work reliability. If the file naming convention changes, or the number of columns or location of those columns changes then the query will most likely fail.

Good luck and thanks for following along.

June 16, 2016
Query Editor – Editing M Code

In this tutorial we’ll learn how to copy and paste queries to and from the Query Editor. When your working in Power BI Desktop often you will need to share and model the data before it can be applied to the visual. In my experience you’ll need to add a calculated column or break out a date such as 1/5/2016 into the the Year 2016 or Month 01, components to properly display a visual.

We will start off with from a prior example where we build a shaded region map. The tutorial to create this Power BI Desktop file is located here.

If you want to cheat and download the final PBIX file you can download and open the zipped file here: Regional Filled Map Example

This file was made in Power BI Desktop April 2016 version, 2.34.4372.322, download the latest version from Microsoft Here.

Open the zip file that you downloaded and extract the file inside labeled Regional Filled Map Example. Open the file. Once you’ve opened the file on page 1 of the you see a map of the united states that looks similar to the following.

Opened File with Map

Now we well enter the query editor. Click on the Edit Queries on the Home ribbon. You opened the Query Editor. In this window we shape and model the data so we can properly visualize it on the pages. Couple of things to notice here. Every time you press a button on the ribbon, the query editor generates an Applied Step. Each step writes a line of M code which transforms the data as it is loaded into the computer’s memory. In this case we have (7) seven steps starting at Source and ending with Changed Type1.

Query Editor Revealing Applied Steps

We want to expose the code that is begin generated at every step behind the scenes. Click on the View ribbon and then click on the button called Advanced Editor.

Query Editor – Advanced Editor

Opening this window reveals the M language code that is generating each Applied Step we saw earlier.

Note: some of the steps we saw earlier such as Filtered Rows had a space in it. In the query editor any applied step had a space in the name gets the added #”” around the applied step name. Thus, in the query editor Filter Rows would be #”Filtered Rows”. The hashtag and the quotations define the complete variable. If you changed the name of the applied step to FilteredRows, with no space. In the Advanced Editor you’d only see the step labeled as FilterRows, no hastag or quotations needed.

Now that the M language is revealed you can made modifications to the code. In cases where you want to make a function you would do so in the Advanced Editor. For our example today select all the code and copy it to the clipboard using the keyboard shortcut CTRL+C. Click Done to close the window.

Now lets copy that code into a brand new query. Click the Home ribbon, then click New Source, scroll all the way to the bottom of the list and select Blank Query. Click Connect to start a blank query.

Get Data – Blank Query

A new Query will now open up. Click the View ribbon, then click Advanced Editor. A blank editor window will open.

Blank Query

Paste the code we copied earlier into this window. Now the new Query1 should look like the following:

Paste Code in to Advance Editor

Click Done and the new query will now load. It is that simple, now we have two identical queries.

May 19, 2016