Power Query API Considerations

Computer API Image
API

This article is the second part in a series on API calls. It will look at some best practices and considerations when using API calls in Power Query. While it does not serve as a definitive list, it is an important start to consider performance when using such queries.

The articles in the first part of the series can be read here:

Historical Stock Price Function in Power Query

Average Household Income Function in Power Query

Performance Considerations

Use Batch Queries where possible

Batch queries refer to sending information in “batches”. Making an API call requires information to be sent to an external source, that information returned, then the information parsed and then loaded. Imagine we are doing this for thousands of different dimensions. Consequently, you will have the exact same process to be repeated this many times. Instead, check in the API documentation if batch queries are available. As a result, you may be able to send many of the dimensions in the same API call. This will drastically reduce the amount of times this process happens.

Use Data Factory

Data Factory is an Azure service that offers no/low code solution to Extract, Transform and Load (ETL) or ELT (Extract Load, Transform) processes. These are called “pipelines”. Pipelines are repeatable processes that allow you to copy and move data from one source to another (read the documentation here).

Let’s say you’re trying to load a high level of stock data for thousands of stocks. By doing this in Power Query may put big stress on your gateway. Your gateway may get overloaded and cannot handle sending so many complex API calls. It may be a better idea to load the data into a separate table (such as Azure Synapse). Then your power BI report can read this file directly. Read this documentation for a good overview of this architecture.

Shifting this to the Azure cloud can leverage Data Factory’s auto-scalability and ability to handle large volumes of data. This results in a more reliable and robust process.

Consider the most efficient design

Always consider the way that will send the least number of queries. For example, if you are using historical data that doesn’t change, think if you need to refresh this data every day. In addition, try and avoid sending the same information multiple times. Do queries off unique lists.

Only return the correct data

If you load data into Power Query from a source such as SQL Server and then remove columns, a process called Query Folding will take place. Essentially, the data isn’t even loaded into Power Query – it edits the SQL query to not include these columns.

This can only be done on certain sources. Custom APIs will not do this. Therefore, make sure you send the correct queries. Don’t return extra bits of data that is not needed and make sure you only return columns you will use.

Review

While this is by no way a definitive list, it should serve as a starting point to acknowledge performance considerations. Pay attention to how many queries are sent out and try to limit duplication. Remember, Power Query is a powerful tool, but make sure you are using the right tool for the job. Very large and complex operations can be improved with the help of other tools, such as Data Factory.

2 Comments

  1. Hi Steve, thanks a lot for such incredible articles. In your previous article you said, you will cover “large lists of stock symbols”. I have a large list of stock smybols that I cant invoke one by one. Is there a way to do this automatically? Thanks

    • There are services that offer batch processing – i.e. it returns many stocks at once (e.g. Alpha). You can search for “Stock API” and probably find some, but often will have a price attached. Here are my tips:
      – Look for apis that offer “Batch” processing. Check the limit of stocks sent per batch.
      – Also, check the cost for requests and calculate how much you queries will be
      – check rate limits – this means how many queries you can send. For example, a service may be free, but you can only send 5 requests every hour

      Otherwise, using python could be an answer, but obviously this has a steep learning curve!

Comments are closed.