The use of the Power Query / Get & Transform tools in Excel


1. Introduction

1.1 Not just data analysis

The tools that were formerly part of the Power Query Add-in and now, in Excel 2016, form the Get & Transform group of the Data Ribbon tab, have the potential to change the way many spreadsheet models are constructed and, in so doing, to avoid some existing sources of risk and error and introduce new ones.

Although the tools have an obvious role in data acquisition for data analysis and business intelligence, they are also capable of being used to replace an extensive set of current grid and formula-based spreadsheet techniques.

Whether or not we choose to use these new tools and approaches, with Get & Transform now being an integral part of Excel, it is inevitable that many users will adopt them. Consequently, an understanding of the way in which the tools work is likely to be important in ensuring that research continues to reflect how spreadsheets, and Excel spreadsheets in particular, are used in practice.

The Power Query tools might also provide an additional method of checking some types of spreadsheet by allowing calculations to be performed in a different way, with a comparison highlighting discrepancies.

The situation is complicated by the current state of development of the tools in question. Unlike many of the Excel functions and formula constructions that we have grown used to working with over a long period, Power Query is just a few years old and continues to evolve quite rapidly. Changes are frequent and many of the possible techniques are yet to be subjected to extensive exploration and testing in practice. Consequently, at this stage, we are likely to be asking more questions than providing definitive answers.

As an introduction to these possibilities, the presentation will cover the use of the Power Query tools to replace some 'standard' Excel techniques.

1.2 Power Query in practice

Power Query works in a very different way to 'traditional' Excel. Rather than using the Excel grid and formulae, it uses commands, usually entered via the user interface, to create a series of steps which process the source data into an output table of data. Each step usually uses the output of the previous step as its starting point. Refreshing the query processes all of the steps in turn. If you click on a step, the query preview displays the result up to that step. Clicking on the 'gear' icon for a step allows you to edit the step:


Behind the scenes, the interface is creating code, much like recording a macro. You can see the code (known as 'M' code) by clicking the Advanced Editor command in the Query group of the Home Ribbon tab:

Power Query includes a very wide range of commands to process and transform the data. Custom columns can be added with the results calculated using Power Query functions. Although the way these functions work is similar to Excel functions, the function names are different and, in particular, they are case sensitive. If a query name is entered using incorrect case, the formula will return an error:


A key difference between a formula-based approach and the use of Power Query tools is the need to refresh queries. Just as for PivotTables, recalculation is not automatic and, although queries can be set up to refresh at defined time intervals, users will need to adapt to a situation where recalculation is periodic or manual, rather than based on the recalculation chain.
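
The step-by-step model and the need to refresh can be sketched in plain Python. This is only an analogy for the behaviour described above, not M code and not how Excel implements queries; the `Query` class, its method names and the sample rows are all hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Table = List[Row]
Step = Callable[[Table], Table]


class Query:
    """A toy stand-in for a query: a source plus an ordered list of steps."""

    def __init__(self, source: Callable[[], Table], steps: List[Step]):
        self.source = source      # callable returning the current source rows
        self.steps = steps        # each step takes the previous step's output
        self.output: Table = []   # last loaded result; stale until refreshed

    def preview(self, upto: int) -> Table:
        """Result after the first `upto` steps, like clicking on a step."""
        table = self.source()
        for step in self.steps[:upto]:
            table = step(table)
        return table

    def refresh(self) -> Table:
        """Re-run every step in turn and store the result."""
        self.output = self.preview(len(self.steps))
        return self.output


# Hypothetical source data and steps, for illustration only.
source_rows: Table = [{"Code": 100, "Value": 50.0}, {"Code": 200, "Value": -5.0}]

query = Query(
    source=lambda: [dict(r) for r in source_rows],
    steps=[
        lambda t: [r for r in t if r["Value"] > 0],              # filter rows
        lambda t: [{**r, "Double": r["Value"] * 2} for r in t],  # add a column
    ],
)

query.refresh()
source_rows.append({"Code": 300, "Value": 7.0})  # the source changes...
stale = list(query.output)                        # ...but the output does not
fresh = query.refresh()                           # until the query is refreshed
```

Here `stale` still reflects the source as it was at the last refresh, which is the behaviour that users accustomed to automatic recalculation will need to allow for.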

2. Append Tables

2.1 Problem

The task here is to turn a variable number of Excel Tables, each containing a variable number of rows, into a single table making it possible to easily base calculations on the consolidated data.

2.2 Excel content as data source

As well as an extensive range of external data sources, Power Query can use Excel workbook contents as a data source. This can be data held in an Excel Table, a named range or just the used cells in the worksheets in Excel workbooks. In this example we will just demonstrate the technique using some Excel Tables in a single worksheet:

Step 1 is to make each of our Tables the source of a separate query. We can do this by clicking in each Table and using the From Table command:


We can then 'Close & Load' our data to a range of different outputs. In our case we just want to create a connection to use in a future step:

Having repeated this for the other three Tables, we then use New Query, Combine Queries, Append:


You will notice that the help text says 'Append two queries from this workbook'. In fact, the April 2016 update has introduced an option to append multiple queries in one go:

Where our Table column headings are consistent this will create a single, consolidated Table which can be loaded to an Excel Table as shown here:
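
As a rough illustration of the append logic (not the Excel output itself), the sketch below stacks any number of tables, each with any number of rows. It is plain Python rather than M, and the table names and values are hypothetical. Filling gaps with `None` when headings differ is a choice made for this sketch, to show why consistent column headings matter.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[object]]
Table = List[Row]


def append_tables(tables: List[Table]) -> Table:
    """Stack any number of tables into one, row after row."""
    # Collect column names in first-seen order across all tables.
    columns: List[str] = []
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)
    # Emit every row with the full column set; gaps become None.
    return [{name: row.get(name) for name in columns}
            for table in tables for row in table]


# Hypothetical Tables with differing row counts.
north = [{"Code": 100, "Value": 10.0}, {"Code": 110, "Value": 20.0}]
south = [{"Code": 100, "Value": 5.0}]
east = [{"Code": 120, "Amount": 7.5}]  # inconsistent heading: 'Amount'

consolidated = append_tables([north, south])
mismatched = append_tables([north, east])
```

With consistent headings, `consolidated` is a single three-row table. In `mismatched`, the 'Amount' heading produces an extra column and leaves a gap in 'Value', so a total of the 'Value' column would silently miss that row.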


When we change or add to data in any of our Tables we can manually refresh our queries, or we can set our 'Appended' query to refresh every so many minutes using the Data Ribbon tab, Connections option:


3. Lookup and reference functions

3.1 Problem

VLOOKUP().

Enough said.

3.2 Merge Queries

We have just seen the Append option of the Combine Queries command, but there is also a Merge option that allows us to establish database-type relationships between queries. This allows us to replace the use of many different kinds of lookup operations that would normally require the use of formulae in multiple cells with a single join operation. Merge allows for an extensive range of Join Kinds:

This example is very contrived in order to compare speed of operation. We have a list of over 1 million IDs with corresponding values. We are using the exact-match form of VLOOKUP() to return the values for 10,000 IDs. The relationship between the time taken to recalculate and the position of our match value in our base table is close to linear.
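
The near-linear relationship is what we would expect if each exact-match lookup scans the base table from the top until it finds its ID. The sketch below contrasts that with resolving all of the IDs in one pass over the table. It is plain Python on scaled-down, hypothetical data, and the dictionary index is only an illustration of a single join operation, not a claim about how Excel or Power Query work internally.

```python
from typing import Dict, List, Optional, Tuple


def exact_lookup(key: int, table: List[Tuple[int, float]]) -> Optional[float]:
    """Scan from the top until the key is found.

    The number of rows inspected grows with the position of the match.
    """
    for candidate, value in table:
        if candidate == key:
            return value
    return None


def join_lookup(keys: List[int],
                table: List[Tuple[int, float]]) -> List[Optional[float]]:
    """Index the table once, then resolve every key against the index."""
    index: Dict[int, float] = {}
    for candidate, value in table:
        index.setdefault(candidate, value)  # keep the first match, like the scan
    return [index.get(k) for k in keys]


# Hypothetical, scaled-down data (the text uses over 1 million IDs
# and 10,000 lookups).
base = [(i, i * 1.5) for i in range(20_000)]
wanted = list(range(19_900, 19_950))  # matches near the bottom of the table

per_formula = [exact_lookup(k, base) for k in wanted]  # one scan per ID
single_join = join_lookup(wanted, base)                # one pass in total
```

Both approaches return the same values; the difference is that the scan repeats its work for every ID, while the join reads the base table once.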


Using Combine, Merge replaces the 10,000 individual VLOOKUP() formulae with a single join operation and can significantly improve the speed of calculation:

3.3 Approximate lookup

This is a bit more controversial. It is certainly possible to use the M language to create an equivalent of the approximate lookup [1], but an alternative approach could be to create a Full Outer join between the data table and the lookup table and then to create a custom column that uses the lookup table code where there is no match with the data table code. In this example we want to report on our individual balances by category. A simplified coding chart allocates codes to categories:

[1] http://www.excelguru.ca/blog/2015/01/28/creating-a-vlookup-function-in-power-query/#comment-267328


We can use our two Tables as the source for two separate 'Connection Only' queries and then Merge them using the Code columns as the join. By using a Full Outer join, any codes that are in the Coding Chart Table but not in the Balances Table will also be included. This is vital because, to perform the approximate lookup we are going to sort by code and then use Fill Down. If a code in the coding chart isn't matched and we only use the codes in the Balances Table for the Fill Down, balances with codes between the missing code and the next matching code will be allocated to the wrong category.

4. Case Study

4.1 Introduction

This is one of the techniques used in a case study that uses the From File, From Folder data source to consolidate lists of balances held in a set of Excel workbooks stored in a particular folder. The idea is to allow a workbook containing a simple list of codes and values, in one or more sheets whose names start with 'Data-', to be created in, or moved to, a particular folder, and for the balances to then be automatically incorporated into a consolidated set of financial statements with no further manual intervention:
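
The selection logic of the case study can be sketched as follows. This is plain Python, not the From Folder query itself: the folder is modelled as a hypothetical dictionary of workbooks, each a dictionary of sheets, and the file names, sheet names, values and added 'Source' and 'Sheet' columns are all assumptions made for illustration.

```python
from typing import Dict, List

Row = Dict[str, object]
Workbook = Dict[str, List[Row]]  # sheet name -> rows of that sheet


def consolidate_folder(folder: Dict[str, Workbook],
                       prefix: str = "Data-") -> List[Row]:
    """Stack the rows of every sheet whose name starts with `prefix`."""
    combined: List[Row] = []
    for file_name in sorted(folder):
        for sheet_name, rows in folder[file_name].items():
            if sheet_name.startswith(prefix):  # other sheets are ignored
                for row in rows:
                    combined.append(
                        {"Source": file_name, "Sheet": sheet_name, **row})
    return combined


# Hypothetical folder contents.
folder: Dict[str, Workbook] = {
    "branch_a.xlsx": {
        "Data-Jan": [{"Code": 270, "Value": 100.0}],
        "Notes": [{"Code": 0, "Value": 999.0}],
    },
    "branch_b.xlsx": {
        "Data-Jan": [{"Code": 290, "Value": 25.0}],
        "Data-Feb": [{"Code": 300, "Value": 60.0}],
    },
}

before = consolidate_folder(folder)

# A new workbook is moved into the folder; re-running picks it up.
folder["branch_c.xlsx"] = {"Data-Jan": [{"Code": 275, "Value": 40.0}]}
after = consolidate_folder(folder)
```

The point of the design is that nothing in the consolidation needs to change when a workbook is added: the next run simply finds one more file with qualifying sheets.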


In our case study, our coding chart includes the range 280-319 as Telephone and communications:

However, none of our balance values have been allocated to code 280 so, without the full outer join, the 280 row would not be included in our table and our Fill Down operation would fill all codes from 270 to 319 with the Advertising category:


Note: I wouldn't claim to have exhaustively tested this approximate lookup approach so would be grateful for confirmation, or otherwise, that it contains no devastating logical or practical flaw.
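
To make the logic easier to check, here is a minimal Python sketch of the merge, sort and Fill Down sequence. It is an analogy rather than the M code of the case study. The codes 270 and 280 and the two category names come from the example above; the balance codes and values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def categorise(balances: List[Tuple[int, float]],
               chart: List[Tuple[int, str]],
               full_outer: bool = True) -> List[Dict[str, object]]:
    """Approximate lookup via merge on Code, sort by Code, then Fill Down."""
    chart_by_code = dict(chart)
    # Merge: each balance picks up a category only on an exact code match.
    rows: List[Dict[str, object]] = [
        {"Code": code, "Value": value, "Category": chart_by_code.get(code)}
        for code, value in balances
    ]
    if full_outer:
        # Full Outer join: keep chart codes that no balance uses, taking the
        # code from the chart side (the 'custom column' step in the text).
        used = {code for code, _ in balances}
        rows += [{"Code": code, "Value": None, "Category": category}
                 for code, category in chart if code not in used]
    rows.sort(key=lambda r: r["Code"])
    # Fill Down: an empty category inherits the last one seen above it.
    last: Optional[object] = None
    for row in rows:
        if row["Category"] is None:
            row["Category"] = last
        else:
            last = row["Category"]
    # Chart-only rows carry no balance, so drop them from the result.
    return [r for r in rows if r["Value"] is not None]


chart = [(270, "Advertising"), (280, "Telephone and communications")]
balances = [(270, 100.0), (275, 40.0), (290, 25.0), (300, 60.0)]  # none at 280

with_full_outer = categorise(balances, chart, full_outer=True)
without_full_outer = categorise(balances, chart, full_outer=False)
```

With the full outer join, codes 290 and 300 are filled from the 280 row and land in Telephone and communications. Without it, the 280 row is absent and every balance from 270 onwards is filled with Advertising, which is the misallocation described above.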

5. Grouping

5.1 Problem

Multiple complex conditional sum expressions.

5.2 Group By

Power Query allows the grouping of query records by multiple fields and then the choice of aggregate operation for multiple columns. Using our case study example again, we have grouped our balances by our Coding Chart categories and summed the Value column:


This gives us a simple table that we can use with a straightforward SUMIF() or SUMIFS() function to populate a summary report:
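
The grouping step and the lookup against its result can be sketched in plain Python. This is an analogy, not the Group By command or the SUMIF() function themselves; it groups by a single field only, and the balances and the 'Rent' category are hypothetical.

```python
from typing import Dict, List, Tuple


def group_sum(rows: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Group rows by category and sum the values, one output row per group."""
    totals: Dict[str, float] = {}
    for category, value in rows:
        totals[category] = totals.get(category, 0.0) + value
    return list(totals.items())  # groups appear in first-seen order


def sumif(table: List[Tuple[str, float]], criterion: str) -> float:
    """Sum the values of rows whose category equals the criterion."""
    return sum(value for category, value in table if category == criterion)


# Hypothetical categorised balances.
categorised = [
    ("Advertising", 100.0),
    ("Advertising", 40.0),
    ("Telephone and communications", 25.0),
    ("Telephone and communications", 60.0),
]

grouped = group_sum(categorised)

# Each report line needs only one simple conditional sum on the grouped table.
report = {name: sumif(grouped, name)
          for name in ("Advertising", "Telephone and communications", "Rent")}
```

Because the grouped table holds one row per category, each report line needs a single condition, and a category with no balances simply returns zero.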