Many thanks go to both Ferry Bouwman and Rui Romano for the inspiration and for the work they have already done on use cases for the Scanner API, which this solution builds on. Ferry created the initial Microsoft Flow to automate the request and pull of data, and design ideas were taken from Rui Romano’s existing Report. Please give them a shoutout because this would not be possible without them!
Recently the Power BI team announced a major advancement in the ability of Power BI admins to extract tenant-level metadata, including information such as tables, columns, measures, and DAX expressions for datasets in the Power BI Service. This feature is a huge step, and any Power BI Champion should look closely at its use cases and at integrating this solution into their catalog.
Let’s start with the what and the why of using the Scanner API as a Power BI Admin.
What is the Power BI Scanner API?
The Power BI Scanner API allows organizations to request and collect the metadata of a tenant’s Power BI schema and catalog. Using the Power BI REST API, users can trigger a scan and, more importantly, extract nearly all of a dataset’s information and schema. The Scanner API returns tenant metadata such as:
- Datasets & Workspaces
- Data Sources
- Dataflows
- Dataset
- Tables
- Columns
- Measures, including the actual DAX expressions
- Table M Expressions
Why Use the Power BI Scanner API
The ability of a Power BI Admin or champion to consume and understand the datasets and information in their tenant is vital from both a Governance and an Adoption perspective. Firstly, the Scanner API enables admins to discover and easily understand the workspaces, the measures used, and which datasets are active in their tenant. Rather than relying on various methods of manually entering datasets into a system, pulling this information in automatically positions admins to better enforce and manage the organization of datasets.
Governance
Along with dataset information, the newly updated Scanner API pulls in dataset metadata, which opens up more ways to govern and unify the business logic used in datasets. A primary use case is to ensure that datasets and their tables are using the proper logic (columns, data sources, merges) by easily viewing the M code behind any table in a dataset. In the same fashion, champions can now ensure that datasets are 1) using Explicit Measures in their reports, and 2) using the correct formulas for those measures which are universal to the company (think Net New Members in multiple reports, ensuring that the correct relationship for date and Member ID is being used).
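To make the second check concrete, here is a minimal sketch of how measures with the same name but different DAX could be flagged once the metadata has been flattened into rows. The row shape (`dataset`, `measure`, `expression`) and the sample values are assumptions for illustration, not the layout the template uses.

```python
from collections import defaultdict


def find_inconsistent_measures(rows):
    """Return {measure: {expression: [datasets]}} for measures whose DAX differs."""
    by_name = defaultdict(lambda: defaultdict(list))
    for row in rows:
        # Collapse runs of whitespace so formatting alone is not flagged.
        normalized = " ".join(row["expression"].split())
        by_name[row["measure"]][normalized].append(row["dataset"])
    return {
        name: dict(variants)
        for name, variants in by_name.items()
        if len(variants) > 1
    }


# Hypothetical rows, as they might look after flattening a scan result.
sample_rows = [
    {"dataset": "Sales", "measure": "Net New Members",
     "expression": "DISTINCTCOUNT ( Members[Member ID] )"},
    {"dataset": "Marketing", "measure": "Net New Members",
     "expression": "COUNTROWS ( Members )"},
    {"dataset": "Finance", "measure": "Total Revenue",
     "expression": "SUM ( Sales[Amount] )"},
    {"dataset": "Sales", "measure": "Total Revenue",
     "expression": "SUM (  Sales[Amount] )"},
]

inconsistent = find_inconsistent_measures(sample_rows)
for measure, variants in inconsistent.items():
    print(measure, "has", len(variants), "different definitions")
```

Only whitespace is normalized here, so two formulas that are logically equivalent but written differently would still be reported for a human to review.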
Adoption
There are many workarounds in the community for making data discoverable to users. Unfortunately, many of these require manual input and do not synchronize with one’s active data. Using the Scanner API, admins can create automated solutions that surface the active datasets for users to discover, and these solutions can further be integrated with other platforms to include custom fields.
One idea is creating a Data Lexicon for an organization, which includes a company’s report catalog and terminology. A Data Lexicon should include helpful content for consumers, such as a report’s purpose, intended audience, and refresh schedule. Using the Scanner API, anytime a dataset is added to the tenant, report authors can easily integrate these custom fields with active datasets.
Understanding the Goal
This article is not going to cover the intricate details of the API requests and parameters. Rather, the TL;DR version of the required calls / steps of the API are:
- Call the Scanner API to trigger a Scan
- This call must include a body of what Workspaces to be scanned
- If more than 100 workspaces, then loop through the request (limit per call is 100 workspaces)
- Wait until a Scan is completed (depending on how many scans)
- Get the Scan Result and collect the array as JSON
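The batching and waiting steps above can be sketched as follows. This is not the Flow's own logic, just a minimal illustration: the `{"workspaces": [...]}` body shape and the `"Succeeded"` status value reflect my understanding of the public API and should be checked against the documentation, and `get_status` is a hypothetical callable you would supply to perform the actual status request.

```python
import time

MAX_WORKSPACES_PER_CALL = 100  # per-call limit stated in the steps above


def batch_workspace_ids(workspace_ids, batch_size=MAX_WORKSPACES_PER_CALL):
    """Split workspace IDs into request bodies of at most batch_size IDs each."""
    return [
        {"workspaces": workspace_ids[start:start + batch_size]}
        for start in range(0, len(workspace_ids), batch_size)
    ]


def wait_for_scan(get_status, scan_id, poll_seconds=5, max_polls=60,
                  sleep=time.sleep):
    """Poll get_status(scan_id) until it reports "Succeeded".

    get_status is a hypothetical callable that performs the status request
    and returns the status string. Raises TimeoutError if the scan does not
    finish within max_polls attempts.
    """
    for _ in range(max_polls):
        if get_status(scan_id) == "Succeeded":
            return True
        sleep(poll_seconds)
    raise TimeoutError(f"Scan {scan_id} did not finish after {max_polls} polls")


# Example: 250 made-up workspace IDs become three request bodies.
bodies = batch_workspace_ids([f"ws-{i}" for i in range(250)])
print([len(body["workspaces"]) for body in bodies])
```

Passing the status call and the sleep function in as arguments keeps the loop easy to test without touching the network.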
The goal here is then to try to accomplish the following:
- Create an easy-to-use process to automate the API workflow
- Store the scan results as a JSON file in SharePoint
- Transform the metadata into a structured model (Tables, relationships, etc)
- Use the structured tables in multiple products (Data Flows, Power BI, Power Apps)
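As a rough picture of what "transforming the metadata into a structured model" means, the sketch below flattens a scan result into separate row lists for tables, columns, and measures. The nested key names (`workspaces`, `datasets`, `tables`, `columns`, `measures`, `source`, `expression`) are my assumption of the scan result layout and the sample document is invented; the template itself does this work in Power Query.

```python
def flatten_scan_result(scan_result):
    """Turn a nested scan result into flat row lists keyed by entity type."""
    tables, columns, measures = [], [], []
    for ws in scan_result.get("workspaces", []):
        for ds in ws.get("datasets", []):
            for tbl in ds.get("tables", []):
                key = {"workspace": ws.get("name"), "dataset": ds.get("name"),
                       "table": tbl.get("name")}
                sources = tbl.get("source", [])
                # Keep the first M expression if the table has one.
                m_query = sources[0].get("expression") if sources else None
                tables.append({**key, "m_expression": m_query})
                for col in tbl.get("columns", []):
                    columns.append({**key, "column": col.get("name"),
                                    "data_type": col.get("dataType")})
                for msr in tbl.get("measures", []):
                    measures.append({**key, "measure": msr.get("name"),
                                     "expression": msr.get("expression")})
    return {"tables": tables, "columns": columns, "measures": measures}


# Invented sample in the assumed shape.
sample_scan = {
    "workspaces": [{
        "id": "ws-1", "name": "Finance",
        "datasets": [{
            "id": "ds-1", "name": "Sales",
            "tables": [{
                "name": "Orders",
                "source": [{"expression": 'let Source = Sql.Database("server", "db") in Source'}],
                "columns": [{"name": "Amount", "dataType": "Double"}],
                "measures": [{"name": "Total Revenue",
                              "expression": "SUM ( Orders[Amount] )"}],
            }],
        }],
    }]
}

model = flatten_scan_result(sample_scan)
print(len(model["tables"]), len(model["columns"]), len(model["measures"]))
```

Each row carries its workspace, dataset, and table names, which is what lets the flat tables be related to each other later.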
Building the Solution
The majority of credit needs to go to Ferry Bouwman who initially created a viable solution that can easily be integrated into a report. He created a GitHub repo that included a Power Automate flow that truly covers the entire process of automating the API call.
The following builds on Ferry’s solution and includes the new metadata schema that is now available. There is more that I want to accomplish with this solution, but the steps below are enough to get the Scanner API running and a template connected to the data.
Pre-Requisites Before Use
Before starting, you must have already completed the following in order to use the Scanner API at all. Please see the documentation for each to set up:
- Enable Service Principal Authentication for Read-Only Admin APIs
- Create an Azure AD app
- Ensure there are no Power BI admin-consent required permissions on the application
- Create a client secret / key and copy the value for later use
- Create a new Security Group in AD, add the app to the group
- Enable Allow Service Principals to use read-only Power BI admin APIs in the Power BI Tenant
The Solution Bundle
The solution includes the following to implement:
- A Power Automate Flow that handles the entire API request and call
- A Scheduled Refresh Flow that runs daily and triggers the Flow above
- A Power BI Template report to connect to the metadata results
Download the Solution on GitHub.
Installing & Using
Import the API Scanner Flow
The first step is to import the Flow pbitips_ScannerAPI into your tenant. Once you do this, there are a few variables and actions to update before running.
- tenant: The tenant of your Active Directory
- clientId: The Client ID of your registered App
- clientSecret: The Client Secret value of your registered App
- SharePoint Library: The SharePoint library where you want to save the files
- NOTE: Remember this location as it will be used in Power Query
- Folder Location: The folder location to save all returned scans
- NOTE: Remember this location as it will be used in Power Query
- Folder Location Trigger: A different folder with a different name, to trigger the refresh run.
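For context on why the tenant, clientId, and clientSecret values are needed: a service principal typically exchanges them for an access token using the OAuth 2.0 client credentials grant. The sketch below only builds that request and does not send it. The v2.0 token endpoint and the Power BI `.default` scope are what I would expect to use here, but the Flow may authenticate differently, so treat this as an assumption rather than a description of the Flow.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed scope for the Power BI service when using the v2.0 endpoint.
POWER_BI_SCOPE = "https://analysis.windows.net/powerbi/api/.default"


def build_token_request(tenant, client_id, client_secret):
    """Build (but do not send) a client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token"
    form = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": POWER_BI_SCOPE,
    }).encode("utf-8")
    return Request(
        url,
        data=form,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Placeholder values only; never hard-code a real secret.
request = build_token_request("contoso.onmicrosoft.com", "my-client-id", "my-secret")
print(request.get_method(), request.full_url)
```

Sending the request (for example with `urllib.request.urlopen`) would return a JSON body containing the access token to pass as a Bearer token on the admin API calls.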
Set up the Automation Flows
Next, we want to set up the automation of the Flow so that it triggers on a daily basis, or even on a manual basis.
Import the Flow PBI_Scanner_Refresh into Power Automate. Once imported, you will need to grab parts of the initial Flow’s HTTP trigger and add them to the variables in the PBI_Scanner_Refresh Flow:
- Initialize string URI-first-part: The first part of the HTTP Request Received, everything from the start up to modifiedsince/.
- Initialize string URI-last-part: The parameters. Simply copy from the ? part of the URL to the end
- Initialize string modifiedSince: write all
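If it helps to see how the three variables relate, here is a small sketch that splits a trigger URL at `modifiedsince/` and at the `?`, then reassembles it with `all` in between. The example URL is entirely made up, and the assumption that the Flow simply concatenates the three strings is my reading of the variable names, not something confirmed from the Flow definition.

```python
def split_trigger_url(trigger_url, marker="modifiedsince/"):
    """Return (first_part, last_part) of an HTTP trigger URL.

    first_part runs from the start through the marker; last_part runs from
    the "?" that begins the query string to the end.
    """
    marker_end = trigger_url.index(marker) + len(marker)
    query_start = trigger_url.index("?", marker_end)
    return trigger_url[:marker_end], trigger_url[query_start:]


def build_trigger_url(first_part, last_part, modified_since="all"):
    """Reassemble the URL with the modifiedSince value in the path."""
    return f"{first_part}{modified_since}{last_part}"


# Hypothetical trigger URL for illustration only.
example_url = (
    "https://example.invalid/workflows/abc123/triggers/manual/paths/invoke/"
    "modifiedsince/{modifiedSince}?api-version=2016-06-01&sig=REDACTED"
)

first_part, last_part = split_trigger_url(example_url)
print(build_trigger_url(first_part, last_part))
```

`str.index` raises a `ValueError` if the marker or the `?` is missing, which is a useful early warning that the wrong URL was copied.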
Additionally, the Power BI Template also includes a visual to trigger the Flow from within the Report. To set this up, you can copy the variables and the HTTP action that calls the other flow (the one containing all of the Power BI API logic) into a flow that uses When a Power BI Button was clicked as the trigger.
Run the Flow: Ensure It is successful & saves the files
Run the flow manually. Note that the first time you ever call the Scanner API, it will return a subset of the metadata. The more that you run it (daily) the more complete metadata will be returned.
Once you can confirm that 3 files have been saved to the folder specified above (a MetaData_, a WorkspaceArrary_, and RequestStatus_ json file), you know the Flow works.
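If you would rather check this from a script than by eye, a minimal sketch like the following reports which of the three expected files are missing from a folder listing. It works on a plain list of file names (how you obtain that list from SharePoint is left out), the prefixes are copied as written above, and the sample names with date suffixes are invented.

```python
# Prefixes exactly as the Flow names them in this article.
EXPECTED_PREFIXES = ("MetaData_", "WorkspaceArrary_", "RequestStatus_")


def missing_scan_files(file_names, prefixes=EXPECTED_PREFIXES):
    """Return the prefixes that have no matching .json file in file_names."""
    return [
        prefix
        for prefix in prefixes
        if not any(
            name.startswith(prefix) and name.lower().endswith(".json")
            for name in file_names
        )
    ]


# Invented folder listing: the RequestStatus_ file has not been saved yet.
listing = ["MetaData_2021-06-01.json", "WorkspaceArrary_2021-06-01.json"]
print(missing_scan_files(listing))
```

An empty list means all three files are present.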
Once you have verified that the flow runs and saves the files to the correct folder, you are ready to start using the Power BI Report.
Connect to the Data – Power BI Template
Using the Scanner Tenant Metadata Power BI Template file, open it and it will prompt you to input two parameters.
- SharePoint Folder: The SharePoint Document Library url specified in the variable from the Flow
- FolderFiter: The deepest subfolder in which the files live (for example, if the files live in PBI_Admin/MetaData/MetaFiles/, then enter “MetaFiles”)
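To illustrate what "deepest subfolder" means for the second parameter, this tiny sketch derives it from the folder path used in the example above. The helper name is made up for illustration.

```python
from pathlib import PurePosixPath


def deepest_subfolder(folder_path):
    """Return the last folder name in a slash-separated path."""
    return PurePosixPath(folder_path).name


folder_filter = deepest_subfolder("PBI_Admin/MetaData/MetaFiles/")
print(folder_filter)
```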
Once you enter the parameters, click on load, and wait for the magic to happen!
Using the Scanner API Report
The Report contains tons of information across the entire organization’s content in Power BI, from Datasets all the way to the DAX expressions per table and report. The template and report are meant to be a starting point for authors to further build out additional functionality to meet their needs. Much thanks to Rui Romano’s Scanner Template as well:
Summary Page
The Template starts with the Summary Page, providing a high level overview of Workspaces and Datasets active in the tenant. Included in the high level overview is the created date of a particular dataset, the number of tables, data sources, and users who have access to it.
Selecting a dataset will highlight a drill through button to navigate to a detailed dataset page.
Dataset Drill through Details
The drill through page for a dataset provides vital information such as the tables, fields, and even the query and DAX expressions within a dataset. Along with this, an information panel of the ID, storage mode, and even users is available here.
Selecting a table will display the M query in its entirety. Expanding the Measures & Calculated Columns displays the DAX expressions beneath it. Along with this, the list of data sources by type is available.
Datasets Page
The Datasets page is an overview showing the number of entities (columns, calculated columns, and measures) within a dataset, including which data sources are being used. Tracking datasets by created time is a helpful feature that allows admins to monitor the creation of new datasets over time.
Tables Summary Page
The Tables page allows admins to monitor which tables are being used throughout the tenant’s datasets. This is a powerful feature, allowing admins to spot tables that may be used across multiple datasets.
Data Sources Page
Looking at the metadata in another way, admins can monitor the types of data sources used throughout the tenant, including information such as the data source type (SharePoint, SQL, etc) and even the source itself. Selecting a data source will display which datasets it is included in.
Users Page
The Users page uses the new getArtifactUsers=true parameter, which can be appended to the Scanner API request, to pull which users have access to the various datasets. Again, the ability to select and display is a powerful feature for Admins.
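For reference, here is a minimal sketch of how the scan request URL might look with that parameter appended. The endpoint and the other query parameter names are taken from my understanding of the public Power BI REST API documentation and should be verified there; the imported Flow already builds this request for you.

```python
from urllib.parse import urlencode

# Assumed admin endpoint that triggers a scan.
GET_INFO_URL = "https://api.powerbi.com/v1.0/myorg/admin/workspaces/getInfo"


def build_get_info_url(include_users=True):
    """Build the scan request URL with the metadata options switched on."""
    params = {
        "lineage": "true",
        "datasourceDetails": "true",
        "datasetSchema": "true",
        "datasetExpressions": "true",
    }
    if include_users:
        # Adds user access information to the scan result.
        params["getArtifactUsers"] = "true"
    return f"{GET_INFO_URL}?{urlencode(params)}"


scan_url = build_get_info_url()
print(scan_url)
```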
Details Page
Recognizing that some users need to see the metadata displayed as a list, the Details page provides all of the underlying information about each artifact in the tenant, such as the IDs used in Power BI, types, and who last configured an entity.
Conclusion
The ability for a Power BI champion to have full visibility into the organization’s Power BI content has been and will remain a vital piece of the adoption and governance of Power BI. The amount of information available to act on will allow admins to readily understand the activity happening in their tenant at all times.
You can find the full solution here:
This template is just a starting point. The community should be able to take this and expand on it, so please provide your suggestions to the GitHub Repo here:
Again, many thanks to Ferry Bouwman and Rui Romano for building the foundation.