Previously we’ve done a tutorial on loading multiple text files within one query. This is nice, however we will also need to import multiple Excel files. First, to understand the procedure of querying multiple excel files you have to understand the basics between the CSV (comma separated values) file and an excel (.xls or .xlsx) files. In a CSV file you have only one data set. The beginning of the file starts with values and separates each file with a “,” a carriage return starts a new row of data. This is an easy and efficient way to store millions of rows of data. By contrast the excel file is way more complicated. Excel files can have multiple sheets of tables of data. Think of this as a stack of CSV type files. For example if you have an excel workbook with three sheets of data, Sheet 1, Sheet 2, Sheet 3. You can think of those three sheets as grid of data, similar to the CSV file. The multiple sheet aspects of an excel file makes the data ingestion into PowerBI a little bit more complicated. To add to the complication, when you loading data from either multiple sheets, or selecting a specific out of many sheets of data. For illustration purposes imagine working with two excel files with three sheets each, 2 x 3 = 6, a total of 6 sheets of data, or what I will call “pages” of data. This is why it is more complex to load excel files than CSV files.
Note: If you want to learn how to load multiple CSV files visit this tutorial.
Not only do you have to figure out what data you want to ingest on the page you must all tell PowerBI which sheets do you want to look at, and from which excel file. If that was to many words think of loading the following data sample:
Workbook 1 – Year 2000 Olympic Medals
- Sheet 1
- Olympic Medals Table
- Olympic Medals Table
- Sheet 2
- Sheet 3
WorkBook 2 – Year 2004 Olympic Medals
- Sheet 1
- Olympic Medals Table
- Olympic Medals Table
- Sheet 2
- Sheet 3
The data structure for both workbook 1 and 2 are similar but the names of the files are different and there can be multiple pages.
To resolve this we will have to write a M language function that will load each file as a function. This will be done in later in the tutorial.
Here is the data source information for Olympic medals won by each country from 2000 to 2012, download here. Inside the Medal Count zip file are four xlsx files, extract them to your desktop. Move the files into a folder on your desktop labeled Medals.
Now, open up PowerBI, We will begin shaping our data to load all the excel files. On the Home ribbon click on the Get Data button. Select Folder on the right side and click Connect.
Next select the folder path that you want acquire the files from, Click OK to continue.
Next we are presented with the loaded files within our selected folder. Click Edit at the bottom of the screen to proceed. The Query Editor window will now open. Select the first two columns labeled Content, and Name. With those two columns selected right click on the header and select Remove Other Columns. This will remove all the useless data associated with the files.
Click the Add Column ribbon and press the Add Custom Column on the left side of the ribbon.
Name the new column ExcelFileLoad and enter the following equation.
Note: Once you type “Excel.Workbook(” you can click on the column labeled Content on the right side of the screen to have the name automatically added. This is useful when you have many many columns to choose from or if there naming of those columns becomes complex. This way you won’t type in the column name incorrectly.
Click OK to proceed. Notice we now have a new column called ExcelFileLoad. Next click the Expand button (the one with the arrows) located at the right of our newly added column. Click OK to proceed.
Now we have a new column labeled ExcelFileLoad.Data, which is the data contained in our excel files. Now click in the Grey Area next to the word labeled Table. This will open up the file and reveal the information present in the file. Notice that we can see the headers and the data in our file. Row 1 contains the headers of each column. Rows after row 1 contains the medal data.
Next select the columns labeled Name and ExcelFileLoad.Data and right click on the column header, then select Remove Other Columns
On the Add Column ribbon click Add Custom Column again. Name the column PromoteHeaders and enter the following formula. Click OK to proceed.
Clicking again on the grey area in our newly created column reveals our tables with promoted headers.
Next click the Expand Button, un-check the Use original column name as prefix and click the OK button to proceed.
Remove the following columns, ExcelFileLoad.Data, Rank, and Total, bu right selecting the columns and right clicking on the header and selecting Remove Columns. Now we want to parse out the year name from the Name column. To do this click on Name Column. Then click the Transform ribbon and click the Extract button, then select First Characters from the drop down menu.
In the Extract First Characters menu enter the number 4 and click OK to proceed.
Change the following columns to whole numbers: Name, Gold, Silver, Bronze. Do this on the Transform ribbon in the Data Type drop down.
We are now ready to load all the data. Rename the Query to Medals, click the Home ribbon and select Close & Apply.
And there you have it. We have successfully loaded four excel files into one query.
Bonus: for added flare add the following measure.
Total Medal Count = sum(Medals[Gold]) + sum(Medals[Silver]) + sum(Medals[Bronze])
Now you can add the following Visualizations.
Is there a way to do this from a sharepoint folder?
Thanks for the question. There is a way to load multiple excel files from SharePoint.
To use share point as a data location you will want to Get Data on the Home ribbon. In the Get Data window you will select SharePoint Folder. This will open up a new window prompting you for the SharePoint site. Your site address should be something like the following https://mycustomsharepointsite.com/sites/mysite (there maybe more in the address bar when you visit the home page of your SharePoint site but this will be all you need). Edit the query then using filters under the file name you can pair down the file list to just the data that you need.
I hope that helps. I have SharePoint sites at work that I connect to and use as data sources. However, I don’t have a non-work SharePoint site and thus I can’t do a specific example of this. Sorry about that.
This is really helpful. I have a small doubt.
I can see that, you have kept the file names. But even the file names can be eliminated if we choose to. Will there be any problem if we load files with different names.
You shouldn’t have any issue when loading new files names. The query editor acts like a stepped procedure. It will process one step at a time. Thus, removing the file names in a later step has no effect.
I have excels with different contents and fields in each file. Is it possible to load it in a simple way (Load all files in the folder) rather than loading each file one by one.
Great question! In this example we are loading multiple files that have the exact same data structure inside. The query editor is really good at loading data that is similar in column structure… but, it is not good at loading many files of different data structures, or multiple columns. The query editor is intended to load one type of data (multiple excel, csv files, SQL query… etc.) After the file(s) have been loaded the Query Editor allows for data transformation to prepare the data for visualizations. Each data source should be unique with different data transformation steps.
If I read your question correctly, you’d like to load a bunch of files of all different data structures, such as one file with Sales by month, Another with a different data such as product categories. I would not recommend loading all of the information as one table. Rather, I would suggest to make things easier you can load all the files in the folder using Get Data, from the home ribbon, select Folder, click Connect to load the files from a folder.
With all the files loaded right click on the newly created table found on the left side nav bar in the Queries bar, and select Reference. This will use the folder location as a source. Then you can select each file and begin shaping / transforming the data for each file. Then when you refresh the queries as long as your folder has all the files you transformed it will automatically refresh your data.
Hope that helps,
After adding a new file to the folder, the refresh is loading al the files. So this takes a long time for the refresh.
Is it possible to only refresh with the new file in the folder and not al the files.
Thanks in advance for the help
Sadly, at this time there is no way to refresh only the updated files. When you hit refresh it refreshes everything. However, if you are sharing your data via the powerbi.com service you can schedule refreshes if your files are stored in an SharePoint site (can’t be an on-premise version of sharepoint, must be cloud hosted).
This is very interesting. I have a question about a scenario where I want to use POWERBI for data analysis across different sheets. I have multiple folders ( by project names) on a shared server on company network. Each folder will have a specific excel sheet but would be a different name for each project ( name goes with project number which is unique). However, I can make a sub folder inside each project folder with a unique name such that all the project folders (there are 100s of these), will have same name folder and each folder will have only one excel. inside each excel there will be specific set of data , arranged in a specific way (Rows and columns, with same headers etc) and with different values, of course. I want Power BI dashboard to automatically capture data from all these excelsheets, and create the dashboard, capturing those unique values. when i create a new prohect folder on that specific drive and put a new sheet inside the subfolder, it should automatically capture data from the new sheet and add to the dashboard. Is it possible? what’s the optimized way to fulfill this requirement?
I would try to make a common name for each file to start with, something like “Dashboard-project-date.xlsx” Then you can point to a folder as illustrated in this post: https://powerbi.tips/2016/06/loading-data-from-folder/
However, you might have to write some complex functions that search though the various sub folders. This isn’t ideal, and it would be prone to errors. An alternative that I have used historically was to put all the Excel files into a common folder, this would be for all projects. Then in each project folder, create a shortcut that would open the actual file which would be stored in the “collective” folder. This works well, since you can have multiple links to the same “source of truth” file, and all the master files live in one folder in one location which is easier to deal with when loading data in Power BI.
I am not using Excel (XLSX) files, but CSV. However, each file has a data year embedded in the header, so every time I try to combine the files, I get an error. The data is the same, but the headers are different. How can I create a single header for all CSV files in the folder? As an extension of that, I need a custom column that pulls the file name, because I need it to identify the year of each file. Any suggestions?
I think you are trying to accomplish the following:
In order for this to work properly you will need to make sure each file has the same headers. The query editor will not be able to load multiple files with different headers.
To include each file name with each file load, you would start with Get Data -> then select folder -> Select the folder where the CSV files are stored -> Click Edit the query -> Filter the Extension down to csv files -> In the Content Column, Click the Double Down arrows button (Combine Files) -> A window prompt will come up click OK -> then all your csv files will be loaded with the Source.Name ( the File locations )
Hi! Carlo, thanx for all u doin,
I found a trick to promote headers when adding the custom colonm without calling the ‘Table.Promoteheaders’ function. Just use the formula like below:
Awesome, Thanks for the suggestion!! I just checked it out. Works like a champ!
Hi Mike I have slightly different question…I have a Folder full of Files for each day that contains a list of open order as of that day. So my Folder contains file names that are “Open Orders – 2017-09-08.csv” and “Open Orders – 2017-09-09.csv”, etc. I want to look down that list and only bring in the file that is the same day but for the prior year. So on 9/8/2018, I want to pull in the “Open Orders – 2017-09-08” file in order to see if our current open orders are ahead or behind last year at this time.
There are just too many data sets to bring them all in at once and then filter them out, afterwards so I’d like to filter them out pre-upload.
I would recommend make one data set that loads all the files from the folder. Then Parse the names of those files so that you can filter down to the one file that you need for 2018. In a separate table do the same for 2017, With only 365 days in a year, it should be fairly easy to load the names of the files for 2 years and then filter down to only the file name that you need.
Once you have narrowed down the files, you can do one of the following:
1. invoke a custom function that loads and parses the data (this way you can use the same function for both files, one for 2018, and one for 2017)
2. Follow this procedure to load the file: https://powerbi.tips/2016/08/load-multiple-excel-xlsx-files/ (you can customize this to suit any variations between formats in files from 2017 to 2018)
Once you have loaded the data from each file, then I would merge the two files together into one data table. Thus, providing a complete dataset ready for measure creation and report development.
Very Nice Explanation Mike. Thanks to you.