Tag: ggplot2

  • Santa Loves Power BI and R

    Santa Loves Power BI and R

    This past week I was talking with the big guy up north, jolly old fella, and the discussion came up about his toy production levels.  Santa was complaining about how hard it was to measure the performance of all his elves.  Naturally I started babbling about how much I enjoy Power BI and that I use it on all kinds of sources of data, google analytics, excel sheets, sharepoint, and SQL data warehouses just to name a few.  Now by this point most people would have wandered off looking for another conversation, but I must have struck a chord with Santa.  He jumped right in the conversation and told me how he had just moved all his local data centers into Azure and more specifically SQL data warehouses.  It was saving him loads of money in addition it has freed up all his I.T. elves to move to more important tasks, building the NES Classic for Nintendo, they are way behind in production.  To make a long story longer, I was able to convince Santa to give me a small sample of data so I could show him how to use R to visualize his data in PowerBI.  Here is what I came up with:

    Santa Production Levels
    Santa Production Levels

    Needless to say he was very pleased.  I explained the chart to Santa, each bar represents the average production volume for each elf.  Then the whiskers at the end of the bar represent the +1 and -1 standard deviation away from that mean.  It essentially tells you how consistent each elf is able to produce products and what is the average production rate.  For example, Buddy the Elf can produce an average 148 items in a day, he has a daily variance of 10 items.  Charlie can produce on average more items but has a wider daily variance.  Snowflake has the lowest average production level but is one of the more consistent producers.  Santa gave me a big smile and said “nice job.”

    Let’s walk through how I did this.

    Open up PowerBI Desktop, Click the Get Data button on the Home ribbon and select Blank Query.  Click Connect to open the Query Editor.  Click Advanced Editor on the View ribbon.  While in the Advanced Editor paste the following code into the editor window, click Done to complete the data load.

    Note: If you need some more help loading the data follow this tutorial about loading data using the Advanced Query Editor.  This tutorial teaches you how to copy and paste M code into the Advanced Editor.

    let
        Source = Excel.Workbook(Web.Contents("https://powerbitips03.blob.core.windows.net/blobpowerbitips03/wp-content/uploads/2016/12/Santa-Production.xlsx"), null, true),
        Production_Table = Source{[Item="Production",Kind="Table"]}[Data],
        #"Changed Type" = Table.TransformColumnTypes(Production_Table,{{"Elf", type text}, {"Toy", type text}, {"Prodution Volume", Int64.Type}})
    in
        #"Changed Type"

    Before you exit the query editor Rename the query to Production. It should look similar to the following:

    Production Query
    Production Query

    Click Close & Apply on the home ribbon.

    Add the following measures by click on the New Measure button on the Home ribbon.

    Avg = AVERAGE(Production[Prodution Volume])

    The Avg measure will determine the height of each bar in the bar chart.

    StdDev = STDEV.P('Production'[Prodution Volume])

    The StdDev will calculate the standard deviation for each elf.

    Ymax = [Avg]+ [StdDev]

    The Ymax calculation adds the Avg measure value to the standard deviation for production.  This produces the upper arm of the whisker.

    Ymin = [Avg]-[StdDev]

    The Ymin calculation is subtracts the standard deviation from the Avg measure value.  This produces the lower arm of the whisker.

    Once you have completed making all the measures you should have a Production table with the following fields:

    Added Measures
    Added Measures

    Add the table visual by click on the Table visual in the Visualizations pane.  Add the Fields which are shown below.  Your table should look identical to this:

    Production Table
    Production Table

    Next, add the R Visual from the visualization Pane.  When you click on this you will get a message stating “Enable Script Visuals” click Enable to proceed.

    Note: If you have not installed R or enabled the preview features of R in Power BI you should follow this tutorial which helps you get everything set up.  For this particular visual we are using ggplot2.  This is a package for R and should be installed in the R environment.  You can follow this tutorial on how to install ggplot2.

    Add the following fields into the R visual:

    Add Fields to R Visual
    Add Fields to R Visual

    Next in the R Script Editor add the following code to generate the R Script.

    library (ggplot2) # Load ggplot to run visuals

    # Set up graph
    ggplot(dataset, aes(x = Elf, y = Avg) ) +

    # Insert the bar chart using acutal values passed to visual
    # Stat = “identity” does not count items uses actual values
    # set up transparency to 70% with Alpha
    geom_bar( stat = “identity”, aes( alpha= 0.7, fill = Elf ) ) +
    # draw the error bars, use pass Ymin & Ymax from PBI
    geom_errorbar(aes(width = .5, colour = Elf , ymin = Ymin, ymax = Ymax)) +

    # Change the Labels
    labs(x = “Elf Name”, y = “Production Vol.” ) +

    # Make the theme simple and remove gridlines
    # Change the font size to 14
    theme_classic( base_size = 18 ) +

    # Remove the legend
    theme( legend.position = “none”) +

    # Change elements of the Axis, Angle, horizontal & Vertical position
    theme( axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.3),
    axis.text = element_text(colour = “black”),
    axis.ticks = element_line(colour = “black”),
    axis.title = element_text(colour = “black”),
    plot.background = element_rect(colour = NA) )

    Note: This code uses the R package ggplot2.  It will error out if you don’t have ggplot2 installed. 

    Click the run icon to execute the R script.

    Add R Script
    Add R Script & Run Script

    When the script runs you will have a beautiful production chart.

    R Chart
    R Chart

    Thanks for following along.  Like always be sure to share if you liked this content.  I am always, looking for feedback and possible topics so make sure you leave a comment below.

    If you want to download a similar example already completed you can download this example from the R Script Showcase (don’t forget to give me a thumbs up).

    Merry Christmas!

  • Map with Data Labels in R

    Map with Data Labels in R

    Mapping is one of the better features of PowerBI.  It is one of the more distinguishing feature differences between Excel and PowerBI.  You can produce a map inside an excel document using Bing maps, however, the experience has always felt a little like an after-thought.  Mapping within PowerBI has a planned, and thoughtful integration.  While the mapping functionalities within PowerBI Desktop are far improved when compared to excel, there are still some limitations to the mapping visuals.  This past week I encountered such an example.  We wanted to draw a map of the United States, add state name labels and some dimensional property like year over year percent change.

    I started with the standard map visual, but this didn’t work because there is no ability to shade each state individually.  This just looked like a bubbled mess.

    Globe Map Visual
    Globe Map Visual

    Next, I tried the Filled Map visual.  While this mapping visual provides the colored states it lacks the ability to add data labels onto the map.  Clicking on the map would filter down to the selected state, which could show a numerical value.  Alternatively, you can place your mouse over a state and the resulting tag will show the details of the state (hovering example provided below).

    Filled Map Visual
    Filled Map Visual

    Still this did not quite meet my visual requirements.  I finally decided to build the visual in R which provided the correct amount of flexibility. See below for final result.  You can download the pbix file from the Microsoft R Script Showcase.

    R Map Visual
    R Map Visual

    In this visual, each state is shaded with a gradient color scale.  The states with the lowest sales are grey and the states with higher sales numbers transition to dark blue.  The darker the blue the more sales the state saw.  Each state has an applied label.  The color of the label denotes the percent change in sales.  If the color is green then the sales this year were higher than last year, red means that the state sales were lower this year.  The state name is listed in the label as well as the calculation for the year over year percent change.

    Alright, let’s start the tutorial.

    First, before we open PowerBI we need to load the appropriate packages for R.  For this visual you will need to load both the maps and the ggplot2 packages from Microsoft R Open.

    Open the R console and use the following code to install maps.

    install.packages('maps')
    Install Maps Package
    Install Maps Package

    Repeat this process for installing ggplot2.

    install.packages('ggplot2')

    After installing the R packages we are ready to work in PowerBI Desktop.  First, we need to load our sample data.  Open up PowerBI Desktop and start a blank query.  On the View ribbon in the query editor open the Advanced Editor and enter the following M code.

    Note: If you need some more help loading the data follow this tutorial about loading data using the Advanced Query Editor.  This tutorial teaches you how to copy and paste M code into the Advanced Editor.

    let
      Source = Excel.Workbook(Web.Contents("https://powerbitips03.blob.core.windows.net/blobpowerbitips03/wp-content/uploads/2016/10/State-Data.xlsx"), null, true),
      StateData_Table = Source{[Item="StateData",Kind="Table"]}[Data],
      #"Changed Type" = Table.TransformColumnTypes(StateData_Table,{{"StateName", type text}, {"Abb", type text}, {"TY Sales", Int64.Type}, {"state", type text}, {"Latitude", type number}, {"Longitude", type number}, {"LY Sales", Int64.Type}, {"Chng", type number}}),
      #"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"TY Sales", "Sales"}})
    in
      #"Renamed Columns"

    After pasting the code into the Advanced Editor click Done to load the data.  While in the Query Editor, rename the query to be StateData, then click Close & Apply on the Home ribbon.

    Load Mapping Data
    Load Mapping Data

    We still need to prepare the data further by adding two calculated columns.  Click the bottom half of the New Measure button on the Home ribbon and select New Column.

    Add New Column
    Add New Column

    Enter the following code into the formula bar that appears after clicking New Column.

    Change = StateData[Abb] & " " & ROUND(100*StateData[Chng],0) & "%"
    Change Column Measure
    Change Column Measure

    Again, click on the New Column button found on the Home ribbon and add the code for a color column.

    Color = if(StateData[Chng] > 0 , "Dark Green", "Dark Red")
    Color Column Measure
    Color Column Measure

    The Fields list should now look like the following.

    Fields List
    Fields List

    Add the R visual with the following fields.

    R Visual Fields
    R Visual Fields

    Add the following R script into the R Script Editor.

    # Load the ggplot2 and maps packages
     library(ggplot2)
     library(maps)
    
    # Load the mapping data into a dataframe called states_map
     states_map <- map_data("state")
    
    # Start ggplot2 by sending it the dataset and setting the map_id variable to state
     ggplot(dataset, aes(map_id = state)) +
    
    # Add the map layer, define the map as our data frame defined earlier
     # as states_map, and define the fill for those states as the Sales data
     geom_map(map = states_map, aes(fill=Sales)) +
    
    # Add the data for the labels
     # the aes defines the x and y cordinates for longitude and latitude
     # colour = white defines the text color of the labels
     # fill = dataset$Color defines the label color according to the column labeled Color
     # label = dataset$Change defines the text wording of the label
     # size = 3 defines the size of the label text
     geom_label( aes(x=Longitude, y=Latitude), 
      colour="white", 
      fill=dataset$Color, 
      label=dataset$Change, size=3
      ) +
    
    # define the x and y limits for the map
     expand_limits(x = states_map$long, y = states_map$lat) +
    
    # define the color gradient for the state images
     scale_fill_gradient( low = "dark grey", high = "#115a9e") +
    
    # remove all x and y axis labels
     labs(x=NULL, y=NULL) +
    
    # remove all grid lines
     theme_classic() +
    
    # remove other elements of the graph
     theme(
      panel.border = element_blank(),
      panel.background = element_blank(),
      axis.ticks = element_blank(),
      axis.text = element_blank()
      )

    After adding the R script press the execute button to reveal the map.

    Paste R Script
    Paste R Script
    Final Map Product
    Final Map Product

    Notice how we have data included for Alaska and Hawaii but those states are not drawn.  We want to remove the Alaska and Hawaii data points.  Add the StateName field to the Page Level Filters and then click Select All.  Now, un-check the boxes next to Alaska and Hawaii.  The data is now clean and the map correctly displays only the continental United States.

    Page Level Filters
    Page Level Filters

    Here is the filtered final map product.

    Filtered Final Map
    Filtered Final Map

    Thanks for following along.  I hope you enjoyed this tutorial.  Please share if you liked this content.  See you next week.

     

  • HexBin Plot using R

    HexBin Plot using R

    Continuing on the theme with R this month, this week tutorial will be to design a hexagonal bin plot.  At first you may say what in the world is a hexagonal bin plot.  I’m glad you asked, behold a sweet honey comb of data:

    Hexbin Plot
    Hexagonal Bin Plot

    The hexagonal bin plot looks just like a honey comb with different shading.   In this plot we have a number of data points with are graphed in two dimensions (Dimension 1, x-axis and Dimension 2, y-axis).  Each hexagon square represents a collection of points.  Now, if we plot only the points on the same graph we have the following.

    Scatter Plot
    Scatter Plot

    In the scatter plot, it’s difficult to see the concentration of points and if there is any correlation between the first dimension and the second dimension.  By comparison, the hex bin plot counts all the points and plots a heat map.  And, if you ask me the hexagonal bin plot just looks better visually.  To bring this all together, if we overlay the scatter plot on top of the hexagonal bin plot you can see that the higher concentration of dots are in the shaded areas with darker red.

    Plot Overlay
    Plot Overlay

    Cool, now lets build some visuals.  Lets begin.  Tutorial <- Hexagonal Bin Plot   (sorry had to interject a bit of R humor here, ignore if you don’t like code humor)

    The very first step will be to open the R console and to install a new library called HexBin.  Run the following code in the Mircosoft RGui.

    install.packages("hexbin")

    This will load the correct library for use within PowerBI.

    Install hexbin
    Install hexbin

    Start by opening up PowerBI.  Click on the Get Data button on the home ribbon, then select Blank Query.  In the Query editor click on the View ribbon and click on the Advanced Editor.  Enter the following query into the Advanced Editor:

    let
     Source = Csv.Document(Web.Contents("https://powerbitips03.blob.core.windows.net/blobpowerbitips03/wp-content/uploads/2016/09/Hexabin-Data.csv"),[Delimiter=",", Columns=3, Encoding=1252, QuoteStyle=QuoteStyle.None]),
     #"Promoted Headers" = Table.PromoteHeaders(Source),
     #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"SampleID", Int64.Type}, {"Xvalues", type number}, {"Yvalues", type number}})
    in
     #"Changed Type"

    This query loads a csv file of data into PowerBI.

    Note:  For more information on how to open and copy and paste M language into the Advanced Editor you can follow this tutorial, which will walk you though the steps.

    After the clicking Done in the Advanced Editor the data will load.  Next rename the query to Hexabin Data and then on the Home ribbon click Close & Apply.

    Save Query
    Save Query

    Next click on the R visual in the Visualizations bar on the right side of the screen.  There will likely be a pop up warning you about enabling R Scripts.  Click Enable to activate the R script editor.  With the R script visual selected on the page add the following columns to the Values field selector.

    R Visual Fields
    R Visual Fields

    Notice that the R visual is blank at this time.  Next add the following R code in the R script editor window.  This will tell PowerBI Desktop to load the ggplot2 library and define all the parameters for the plot.  I’ve added comments to the code using # symbols.

    library(ggplot2) #load ggplot2 package
    
    # define the data inputs to ggplot
     # set data for x and y values x=, and y=
     # set the min and max for both the x and y axis, xmin=, xmax=, ymin= and ymax=
     ggplot(dataset, aes(x=Xvalues,y=Yvalues, xmin=40, xmax=90, ymin=10, ymax=30)) +
    
    # define the color of the outline of the hexagons with color=c()
     # using c(#"809FFF") allows for the usage of hexadecimal color codes
     stat_binhex(bins=15, color=c("#D7DADB")) +
    
    # set the graph theme to classic, provides a white background and no grid lines
     # Change font size to 18 by using base_size = 18
     theme_classic(base_size=18) +
    
    # Apply lables to the graph for x and y
     labs(x = "Dimension 1", y = "Dimension 2")+
    
    # change the gradient fill to range from grey to Red
     scale_fill_gradient(low = "grey", high = "red")

    Click the run button and the code will execute revealing our new plot.

    R Script Code
    R Script Code

    One area of the code that is interesting to change is the section talking about the number of bins.  In the code pasted above the code states there are 15 bins.

    stat_binhex(bins=15, color=c("#D7DADB")) +

    Try increasing this number and decreasing this number to see what happens with the plot.

    Five Bins
    Five Bins
    stat_binhex(bins=5, color=c("#D7DADB")) +
    Thirty Bins
    Thirty Bins
    stat_binhex(bins=30, color=c("#D7DADB")) +

    Well that is it.  Thanks for reading through another tutorial.  I hope you had fun.

    Want to see more R checkout the Microsoft R Script Showcase.  If you want to download the PBIX file used to create this visual you can download the file here.

    If you want to learn more about R and the different visuals you can build within R check out this great book which helped me learn plotting with R.

  • Digging Deeper with R Visuals for PowerBI

    Digging Deeper with R Visuals for PowerBI

    Back by popular demand, we have another great tutorial on using R visuals.  There are a number of amazing visuals that have been supplied with the PowerBI desktop tool.  However, there are some limitations.  For example you can’t merge a scatter plot with a bar chart or with a area chart.  In some cases it may be applicable to display one graph with multiple plot types.  Now, to be fair Power BI desktop does supply you with a bar chart and line chart, Kudos Microsoft, #Winning…. but, I want more.

    This brings me to the need to learn R Visuals in PowerBI.  I’ve been interested in learning R and working on understanding how to leverage the drawing capabilities of R inside PowerBI.  Microsoft recently deployed the R Script Showcase, which has excellent examples of R scripts.  I took it upon myself to start learning.  Here is what I came up with.

    R Plot in PowerBI Desktop
    R Plot in PowerBI Desktop

    This is an area plot in the background, a bar chart as a middle layer and dots for each bar.  The use case for this type of plot would be to plot sales by item number,  sales are in the dark blue bars, and the price is shown as the light blue dots.  The area behind the bars represent a running total of all sales for all items.  Thus, when you reach item number 10, the area represents 100% of all sales for all items listed.

    If you want to download my R visual script included in the sample pbix file you can do so here.

    Great, lets start the tutorial.

    First you will need to make sure you have installed R on your computer.  To see how to do this you can follow my earlier post about installing R from Microsoft Open R project.  Once you’ve installed R open up the R console and enter the following code to install the ggplot2 package.

    install.packages("ggplot2")
    Install ggplot2 Code
    Install ggplot2 Code

    Once complete you can close the R console and enter PowerBI Desktop.  First, we will acquire some data to work with.  Click on the Home ribbon and then  select Enter Data.  You will be presented with the Create Table dialog box.  Copy and paste the following table of information into the dialog box.

    Item Sales Price Customer
    1 100 20 Customer A
    2 75 25 Customer A
    3 20 30 Customer A
    4 18 15 Customer A
    5 34 26 Customer A
    6 12 23 Customer A
    7 20 22 Customer A
    8 15 19 Customer A
    9 10 17 Customer A
    10 8 26 Customer A
    1 120 21 Customer B
    2 80 24 Customer B
    3 62 33 Customer B
    4 10 15 Customer B
    5 12 26 Customer B
    6 60 24 Customer B
    7 20 23 Customer B
    8 10 20 Customer B
    9 8 16 Customer B
    10 7 20 Customer B

    Rename your table to be titled Data Sample.

    datatable
    Data Sample Table

    Click Load to bring in the data into PowerBI.

    Next, we will need to create a cumulative calculated column measure using DAX.  On the home ribbon click the New Measure button and enter the following DAX expression.

    Cumulative = CALCULATE(  sum('Data Sample'[Sales] ) ,   FILTERS(  'Data Sample'[Customer] ) ,  FILTER( all( 'Data Sample' )  ,  'Data Sample'[Item] <= MAX( 'Data Sample'[Item] ) ) )

    This creates column value that adds all the sales of the items below the selected row.  For example if I’m calculating the cumulative total for item three, the sum() will add every item that is three and lower.

    Now, add the R visual by clicking on the R icon in the Visualizations window.

    Note: There will be an approval window that will require you to enable the R script visuals.  Click Enable to proceed.

    Enable R Visuals
    Enable R Visuals

    While selecting the R visual add the following columns to the Values field in the Visualization window.

    Add Column Data
    Add Column Data

    Note: After you add the columns to the Values the R visual renders a blank image.  Additionally, there is automatic comments entered into the R Script Editor (the # sign is a designation that denotes a text phrase).

    Next, enter the following R code into the script editor.

    library(ggplot2)   # include this package to use Graphing functions below
    
    ggplot(dataset, aes(xmin=1, x=Item)) +    # Initialize ggplot function, define the x axis with Item data
     geom_ribbon(fill=c("#D7DDE2"),           # Set the color of the Area Plot,
     aes( ymin=0, ymax=Cumulative )) +        # Define the Y-Axis data
     geom_bar(fill=c("#21406D") ,             # Define the color of the Bars
     stat = "identity" ,      # Define the Statatics property of the bars - This is a required field
     width=.6 ,               # Change the bar width to 60% - 1 would be full bar width
     aes( x=Item, y=Sales )) +          # Define the X and Y axis for bars
     geom_point( color=c("#809FFF"),    # Define the color of the dots
     size=4,                  # Define the dot size
     aes( x=Item, y=Price )) +          # Define the X and Y axis values
     theme_classic(base_size=18) +      # Remove unwanted items from plot area such as grid lines and X and Y axis lines, Change font size to 18
     theme( axis.title.x = element_text(colour = "dark grey"),     # Define the X axis text color
     axis.title.y = element_text(colour = "dark grey")) +          # Define the Y axis text color
     labs( x="Item Number", y="Sales")                             # Define the labels of the X and Y Axis

    Press the execute R Script button which is located on the right side of the R Script Editor bar.

    Execute R Script Editor Button
    Execute R Script Editor Button

    The R Script will execute and the plot will be generated.

    R Plot Generation
    R Plot Generation

     

    Great, we have completed a R visual.  So what, why is this such a big deal.  Well, it is because the R Script will execute every time a filter is applied or changed.  Lets see it in action.

    Add a slicer with the Customer column.

    Add Customer Slicer
    Add Customer Slicer

    Notice when you select the different customers, either A or B the R script Visual will change to reflect the selected customer.

    Customer B Selected
    Customer B Selected

    Now you can write the R script code once and use the filtering that is native in PowerBI to quickly change the data frame supporting the R Visuals.

    As always, thanks for following along.  Don’t forget to share if you liked this tutorial.

    Want to learn more about PowerBI and Using DAX.  Check out this great book from Rob Collie talking the power of DAX.  The book covers topics applicable for both PowerBI and Power Pivot inside excel.  I’ve personally read it and Rob has a great way of interjecting some fun humor while teaching you the essentials of DAX.