Loading Data From Folder

Let me setup a scenario for you.  You get a data file from an automated system, it has the same number of columns but the data changes for every new file.  Being the data savvy person that you are you’ve spent some time working in excel to make a template where you can copy your new data into and then automatically all your equations and graphs magically work.  You pat your self on the back and happily send out your fantastic report to everyone you know.  Then tomorrow when the data comes to you again you repeat the same process over again.  Still enamored by your awesome report, you send it out again knowing you have saved your self so much time not having to do the analysis or creation of your reports over and over again.  Now, fast forward 3 months.  That stupid report shows up again, and now you have to lug all that data from file to file and begrudgingly you sent out your report.  Thus, is the store of the analyst.  You love data, but you hate it as well.  Well in this tutorial I’ll show you how to remove some of the pain of that continual data loading process by loading new data from a folder.

My previous post (found here) talks about loading data from a folder.  In this tutorial we will add some logic to this method that will look at a folder but only load the most recently added item from that folder.

Data for this tutorial is located this link Monthly Data Zip File.  This data in the ZIP file is a monthly data sample from Feb 2016 to April of 2016.

Download the zip file mentioned above and extract the Monthly Data folder down to your desktop.  Open up PowerBI Desktop and click on the Get Data button and select All on the left side.  Click on the item labeled Folder and click Connect to continue.

Get Data from Folder
Get Data from Folder

 

Select the newly unzipped Monthly Data folder that should be on your desktop.  Click OK to continue. Upon opening that folder location you will be presented with the multiple files.  Click Edit to edit the query.

Edit Query for Folder Load
Edit Query for Folder Load

Now you are in the Query Editor.  This is where the fancy query editing will work to our advantage.  We could load all the data into one large query.  However, depending on the size of your data sets or how you want to report your data this may not always be desirable.  Instead you may only want data from April, then May when the new data is sent next month.

Thus, our first step to start pairing down the data will be to first filter the files in sequential order.  In this case because I have named the files with a Year-Month-Day format I can sort the files according to their names.

Note:  When using PowerBI desktop it is a good practice to name the files  beginning with a YYYY-MM-DD file name.  This makes it really easy when sorting and ingesting information into PowerBI.  I have used other columns of information such as Date Accessed or Date Created before but have gotten inconsistent results as these dates can change depending on when a file was moved or copied from one place to another.

Click the drop down next to Name and sort the files in Sort Descending.

Name in Descending Sort
Name in Sort Descending

This places the files with the most recent file at the top of the list.

File List in Descending Order
File List in Descending Order

Next click on the Keep Rows button on the Home ribbon, select Keep Top Rows.

Keep Top Rows
Keep Top Rows

Enter the number when the popup appears.  Click OK to continue.

Keep Top Rows Menu
Keep Top Rows Menu

Now you’ll notice you have only one file selected which is our latest file from April.  Click the Load File button found in the Content column.

Load File Button
Load File Button

We have completed the activities in the Query Editor and can now load the data.  Click Close & Apply found on the Home ribbon.  All our April data has loaded.  by making a simple table we can now see all the data that was just loaded.

Loaded Data from April
Loaded Data from April

Now we will remove some data from our desktop folder labeled monthly data.  Open the folder on the desktop labeled Monthly Data and delete the filed labeled 2016-04-01 April.  You should now have a folder labeled Monthly Data with only two files in it, one for Feb and one for March.

Two Files Left
Two Files Left

Return back to Power BI Desktop and click the Refresh button on the Home ribbon.  Notice now how all our data has changed.  We are now looking at the March data because it is the most recent file in our folder based on the file name.

March Data Load
March Data Load

To verify this we open the query editor (Click the Edit Queries on the Home ribbon).  Click Refresh Preview on the Home ribbon and finally select the Applied Step called Kept First Rows.  This will reveal the month of March as our data source.

Month of March Loaded
Month of March Loaded

Now, every time you add a new file to our folder and refresh PowerBI the latest file (based on the naming convention we talked about earlier) will always be loaded.

Note: This method works great when your data source is coming from an automated system.  The file format must always be the same for this to work reliability.  If the file naming convention changes, or the number of columns or location of those columns changes then the query will most likely fail.

Good luck and thanks for following along.

It's only fair to share...
Share on Facebook0Share on Google+0Tweet about this on TwitterShare on LinkedIn0Pin on Pinterest0Share on Tumblr0

9 Comments

  1. Again, Great article. I really enjoy learning from your examples. This shows how we always have the most recent folder but what if we wanted the accumulation of the three months Feb March and April and NOW we have May and we want to add that file to the folder so now we have trends showing all four months ad so on as we get more monthly files to add to this folder.

  2. Is it possible to have a slicer for the folder. For example i have daily and monthly reports for different projects lets say P1, P2 etc. And inside i have mulitple folders with error report, productivity report, timesheet etc. I want to have a slicer selection for the folders? Is this possible?

    • In a simple term yes you could have a slicer for the different folders. However, in order to implement this you would need to load all folders. All the data would load for all folders into the report. I think this possible, but would get complex in the query editor which would require writing functions to load all the data in each folder. Instead you may want to use a parameter with in the query editor to select the folder you want to load. This way you can edit the query pick your folder, refresh, and the data will load from the appropriate folder. I did something similar to a parameter in this tutorial. This would be the easiest implementation of loading different folders.

  3. Hi, great article. In Windows 8 and 10, we could add more columns to a folder, eg. for picture folder, we can have the camera model, date taken, etc columns added.
    I would like to include this additional columns into the get data from folder option, but it seems to fixed to the out-of-box columns only. Can you advise if we can configure the system to include these additional columns?

    • Albert, great question, as far as I understand PowerBI today, the only folder columns you are able to retrieve are the default ones that PowerBI detects. I am not aware of any configuration that can be changed which would give you additional columns during the file load.

  4. I found a typo:
    We are NOT looking at the March data because it is now the most recent file in our folder.
    Should be:
    We are NOW looking at the March data because it is now the most recent file in our folder.

  5. Great Job.
    I wonder if there is a way to tell Power BI: ‘Hey, i do want to load all of my monthly files; however, refresh only the files that have been modified’.

    This would improve my refresh latency from >40 min to less than 2 minutes.

    Any ideas?

    Thank you all!

    • I am not aware of such a function. However, If I were going to solve this kind of problem here is how I would try to do this. I would make two queries one that loaded all files with a changed date greater than one day. Then a separate query that loads files which have been changed within the same day. Then create a third query to join both queries together. This might speed things up, but what I don’t know is if you can request to refresh only one query on load time. You’d have to test that out.

Leave a Reply

Your email address will not be published.


*


Comment moderation is enabled. Your comment may take some time to appear.