data blending tableau что это

Blend on Summary Data

Data blending is a method for combining data. Data blending works by supplementing the data in the primary data source with the data in the secondary data source.

When one of the data sources that you’re working with is large, you can reduce query time by using data blending to combine data in the primary data source with summary data from the secondary (large data source).

To blend on summary data, you can do the following:

Step 1:В Create a relationship between the primary and secondary data sources

Click Download Workbook in the upper-right corner.

In the Relationships dialog box, click Add again.

Step 2:В Bring summary data from the secondary data source into the primary data source

In the calculation dialog box, do the following:

This formula pulls in just the sales data from the large secondary data source.

In the calculation dialog box, do the following:

In the name text box, enter Total Sales (North and South).

In the formula text box, type the following:

IF ISNULL (SUM([Sales Total])) THEN 0 ELSE SUM([Sales Total]) END

Step 3:В Compare primary data source data to summary data of secondary data source

From the Data pane, drag Order Date to the Columns shelf and Prod 1 to Rows shelf.

Drag Measure Names to the Rows shelf, and then to the Filters shelf.

From the Data pane, drag Measure Values to Text.

The final result is a view that summarizes primary data source and secondary data source data on the Order Year and product category.

Источник

Blend Your Data

Data blending is a method for combining data from multiple sources. Data blending brings in additional information from a secondary data source and displays it with data from the primary data source directly in the view.

There are several ways to combine data, each with their own strengths and weaknesses.

Relationships are the default method and can be used in most instances, including across tables with different levels of detail. Relationships are flexible and are adaptable to the structure of the analysis on a sheet by sheet basis. However, relationships cannot be formed between tables from data sources published to Tableau Server or Tableau Online.

Joins combine tables by adding more columns of data across similar row structures. This can cause data loss or duplication if tables are at different levels of detail, and joined data sources must be fixed before analysis can begin.

Blends, unlike relationships or joins, never truly combine the data. Instead, blends query each data source independently, the results are aggregated to the appropriate level, then the results are presented visually together in the view. Because of this, blends can handle different levels of detail and working with published data sources. Blends are also established individually on every sheet and can never be published, because there is no true “blended data source”, simply blended results from multiple data sources in a visualisation.

Data blending is particularly useful when the blend relationship – linking fields – need to vary on a sheet-by-sheet basis, or when combining published data sources.

Important: Prior to version 2020.2, data blending was often the best way to handle data sources at different levels of detail. These can now be combined with relationships. Relationships have fewer technical limitations than data blending and are the recommended way of combining data when possible. Blending is only encouraged when it is the best method for your data or relationships are not available.

Steps for blending data

Data blending is performed on a sheet-by-sheet basis and is established when a field from a second data source is used in the view. To create a blend in a workbook already connected to at least two data sources, bring a field from one data source to the sheet – it becomes the primary data source. Switch to the other data source and use a field on the same sheet – it becomes a secondary data source. An orange linking icon will appear in the data pane, indicating which field(s) are being used to blend the data sources.

Note:В images in this topic have not been updated to reflect the most current UI. The Data pane no longer shows Dimensions and Measures as labels.

Tip: Adding another connection to the first data source enables relationships and joins on the Data Source page. Blending requires two or more distinct data sources, listed independently in the Data pane.

Understand primary and secondary data sources

Data blending requires a primary data source and at least one secondary data source. The first data source used in the view becomes the primary data source and defines the view. This can restrict values from the secondary data source – only values that have corresponding matches in the primary data source appear in the view. This is comparable to a left join.

For example, if the primary data source has a Month field that only contains April, May and June, any view built around months will only display April, May and June, even if the secondary data source has values for twelve months. If the desired analysis involves all twelve months, try switching which data source is primary by rebuilding the sheet and using the other data source first.

The following examples below use the same data sources linked on the same field, and the viz is built the same way both times. The difference between the results is due to which data source is designated as the primary.

Work across blended data sources

Due to the nature of a data blend, there are some things to keep in mind when working across blended data sources.

Performing calculations with fields from more than one data source can be slightly different than an ordinary calculation. A calculation must be created in one data source; this is indicated at the top of the calculation editor.

In addition to handling calculations slightly differently, there are some limitations on secondary data sources. You may not be able to sort by a field from a secondary data source, and action filters may not work as expected with blended data. For more information, see Other data blending issues.

Define blend relationships for blending

In order for Tableau to know how to combine the data from multiple sources, there must be a common dimension or dimensions between the data sources. This common dimension is called the linking field. Active linking fields are identified in the Data pane for the secondary data source with an active link icon () and potential linking fields are identified with a broken link icon ().

For example, in a blend of transactional and quota data, a geographic field might be the desired the linking field so you can analyse a region’s quota and performance towards that quota.

Note: In order for the blend to function, the linking fields must also share values or members. Tableau builds the view of blended data based on the shared values. For example, if Colour is the linking field in both data sources, Tableau will match data on «Purple» from the primary and «Purple» from the secondary. But «Lt. Blue» will not map correctly to «Light Blue», so one of them would need to be re-aliased. Like renaming fields to help Tableau identify linking fields, you can edit aliases for members in those fields. For more information, see Create Aliases to Rename Members in the View.

Establish a link

If the linking field in the primary and secondary data sources have the same name, Tableau automatically creates the relationship. When a primary data source has been established (that is, a field is in use in the view) and the secondary data source is selected in the Data pane, any fields with the same name between the two data sources will display a link icon ( or ) in the secondary data source. If the related field from the primary data source is used in the view, the link becomes active automatically.

If there are no link icons on the secondary data source, you may need to help Tableau establish the link in one of two ways:В

If common dimensions don’t have the same name (such as «Region» and «Sales Region»), renaming one will let Tableau identify them as common dimensions and establish the link.

Alternatively, you can manually define a relationship between the fields in the primary and secondary data sources. See below for more information on creating manual link relationship

There can be as many active or potential linking fields as necessary. Click the broken link icon () in the data pane to make the relationship active.

If your common dimensions do not share the same name, you can map the relationship between them manually.

In the Blend Relationships dialog box, verify that the primary data source is selected from the Primary data source drop-down list.

Читайте также:  что делает арбидол в организме человека

In the Add/Edit Field Mapping dialog box, do the following:

Select a field from the primary data source.

Select a field from the secondary data source to establish the linking field or the blend relationship between the data sources even though the fields do not have the same name.

In this example, a mapping between Segment and Cust Segment is created.

Tip:В For dates, the relationship can be specified precisely. Expand the date field and select the desired aspect of the date, such as exact date, month, year, etc.

Multiple links

As with relationships or joins, there are times when the links between the data sources are defined by more than one field. For example, if regional sales quotas are monthly, a blend between transactional sales data and quota data needs to be established on both region and month for the correct data to be brought together in the view. Multiple links can be active at the same time.

When data is blended based on multiple fields, values are included in the view only where the combination of data from those fields matches across both data sets. Let’s look at an example to understand this.

We have two tables, one for birds that were actually seen by birdwatchers, and one for birds that were reported as seen.

and

If we set up a blended view with the fields Birdwatchers and Number of birds from the primary data source (Birds seen)В and bring in the field Number of reports from the secondary data source (Birds reported), Tableau automatically blends on Birdwatcher.

Note:В images in this topic have not been updated to reflect the most current UI. The Data pane no longer shows Dimensions and Measures as labels.

We see that birdwatcher A saw three birds and made two reports, B saw four birds and made one report, and C saw 8 birds and made two reports.

But there’s another possible linking field, Species seen. Why not blend on this, too? Will it make a difference?В

It makes quite a dramatic difference. Now we see that there is only one report each for birdwatchers A and C, and B has a null. What’s going on?В

It turns out that these birdwatchers aren’t very honest. When they only reported their sightings based on a journal entry (blue bars in the image above), the species they reported seeing didn’t match what they actually saw (note the nulls in the second column, from the Birds seen secondary data source). When they backed up the report with a photograph (orange bars), they were honest (both columns of Species seen match). Because three reports did not match on species, those rows of data were dropped when the Birdwatcher and Species seen fields were used as a linking field. The view only shows data where values for both linking fields match.

Be careful when linking on multiple fields. Although it can be very easy to click the icon and establish an active link, over-linking or linking on undesired fields can have a serious impact on the analysis.

Differences between joins and data blending

Data blending simulates a traditional left join. The main difference between the two is when the aggregation is performed. A join combines the data and then aggregates. A blend aggregates and then combines the data.

Left join

When you use a left join to combine data, a query is sent to the database where the join is performed. A left join returns all rows from the left table and any corresponding rows from the right table. The results of the join are then sent back to Tableau and aggregated for display in the visualisation.

A left join takes all rows from the left table. The common columns are User IDВ and Patron ID; where there is corresponding information from the right table, that data is returned. Otherwise, there is a null.

Suppose you have the same tables, but flip the order. This new left join produces different results. Again, a left join takes all the data from the new left table, but essentially ignores a row from the right table. The row of data for User not included because there is no row for Patron in the left table.

Data blending

When you use data blending to combine data, a query is sent to the database for each data source that is used on the sheet. The results of the queries are sent back to Tableau as aggregated data and presented together in the visualisation.

Note:В Aggregating measures is straightforward – we can take the sum, average, maximum or other aggregation of a number with ease. Measure values are aggregated based on how the field is aggregated in the view. However, all fields from a secondary data source must be aggregated. How does that work for dimensions?В Dimension values are aggregated using the ATTR aggregate function, which returns a single value for all rows in the secondary data source. If there are multiple values contained in those rows, an asterisk (*) is shown. This can be interpreted as «there are multiple values in the secondary data source for this mark in the view».

The view uses all values from the primary data source (functioning as the left table) and the corresponding rows from the secondary data source (the right table) based on the linking field(s).

Suppose you have the following tables. If the linking fields are User IDВ and Patron ID, not all values can be a part of the resulting table because of the following:

A row in the left table does not have a corresponding row match in the right table, as indicated by the null value in the results.

There are multiple corresponding values in the rows in the right table, as indicated by the asterisk (*) in the results.

When measures are involved, they are also aggregated, as seen below:

Important: an asterisk (*) in a view with blended data indicates multiple values. This can be resolved by ensuring there is only one matching value in the secondary data source for each mark in the primary data source, potentially by swapping the primary and secondary data sources. For more information, see Troubleshoot Data Blending.

Источник

Tableau Data Blending – the Ultimate Guide

Data blending is a way to combine data in Tableau. Blending provides a quick and simple way to bring information from multiple data sources into a view. It can be misunderstood, but, when using a data blend correctly, it is an efficient way to merge data sources in Tableau.

Many Tableau developers find data blending frustrating. It has a few limitations which aren’t always clear, however, once data blending is understood, it is a valuable part of the Tableau toolkit. Some of the data blending limitations are covered below.

This guide to Tableau data blending covers:

Data blending can be very useful, but can also be problematic. When used well it provides a simple way to add additional data to a dashboard. When used incorrectly it can bring down a Tableau Server.

With data extracts, there’s no alternative but to blend data sources. It isn’t possible to join a Tableau data extract to another data source, therefore, when using extracts, data blending is the only option.

I use data blending in some articles on other topics. For example, I used it in the post showing how to create panel charts in Tableau, where the abbreviated State Name was blended in from a different data source.

Also, in the post demonstrating how to do date scaffolding in Tableau, one of the techniques showed to how to scaffold data using a data blend.

It’s also useful when creating filled maps using a shape (kml) file. In this post showing coronavirus in England by local authority, the primary data source is the kml file, and the coronavirus numbers are blended in from a separate data source.

Is a data blend the same as a join?

A very common question about data blending, is whether a blend is the same as a join? Data blending is hard to define. It’s a type of left outer join…but it’s not a proper join. Blending is like a flexible left join, best used to join in measures from another data source – dimensions often give an asterisk instead of an expected value.

A key difference between a data blend and a join is the order it occurs. With a join, the data is joined and then aggregated.

With data blending, the data is aggregated and then joined.

If you have a spare hour, Jonathan Drummey, one of the foremost experts in Tableau, presented this video on how a data blend is and isn’t like a left join.

As with a database join, relationships should be exist between some fields, connecting the data sources. Similar to joining TableA.Field ON TableB.Field, for example.

However, the connections between the fields aren’t enforced in the workbook. There’s a chain symbol showing whether the connection is active in the worksheet. If it’s orange then it’s connected; if grey it’s not active.

Читайте также:  что делать если в патенте ошибка в фамилии

The chain links indicate whether a relationship is active in the worksheet

Also, standard joins happen within a data source. This is a big difference to blending.

Data blending happens within the worksheet, not the data source.

The connecting fields are activated within each worksheet – so the blended data sources can join on different fields in different worksheets within the same Tableau workbook.

Standard joins occur at the record level, while data blending happens after the numbers have aggregated. This is a key difference between a join and a blend. A join joins the records then aggregates; a blend aggregates then joins.

Do the new Tableau Relationships make a data blend obsolete?

The simple answer is “no”. Tableau Relationships don’t make data blending obsolete.

When using published data extracts, which is normal in enterprise environments (it helps create a secure single source of the truth), blending is the only option to bring in additional data. (Apart from altering the published data source, of course.)

Therefore, data blending still has a prominent place in the Tableau ecosystem, and is important for developers to understand.

However, Tableau relationships have made blending more complicated in certain circumstances. If the secondary data source contains relationships, this doesn’t work in a data blend. If the pills in the workbook are red, without any apparent explanation, check the secondary source uses relationships, in which case the blend will fail.

Note, a Tableau data source using relationships still works as a primary data source; it only fails as a secondary source.

Recommendations for Tableau data blending

From a high level, as data blending can provide an output that may not be fully expected, try and follow these recommendations:

How to set up a data blend

Firstly, the Tableau workbook has to connect to at least two data sources.

To set up the relationships, on the menu bar, Data – Edit Blend Relationships.

Edit Blend Relationships from the Data option on the menu bar

Tableau will guess if there are related fields, this will be those in the Automatic relationship setting.

Set the blend relationships

When blending data sources in Tableau, it may be necessary to alter the “Automatic” relationships. To set the blend relationships manually, click Custom and amend them.

When data blending with dates, often it will be necessary to change the default relationships.. With dates, the relationship should be set up at the correct “level” in the date hierarchy. The correct date part should join with the corresponding date part.

Defining the relationships doesn’t enforce the relationship. The “joins” are activated within the worksheet by clicking the chain link symbols in the secondary source.

Click the chain link to activate/deactivate the relationship in the worksheet

Within the same workbook it’s possible to have the same data sources connected by different join fields in different worksheets.

An orange chain link means that specific join is active within that worksheet. The broken chain shows that join isn’t active within that worksheet.

To switch the connection on or off, click the chain symbol.

Tableau data blending with calculated field

There are rare situations where the requirement is to blend on a calculation, on a measure. An example, perhaps sales progress to a target impacts the bonus calculation – i.e. 85-95% of target may equal Bronze, 95-105% may be Silver, etc. This calculation vs target may depend on the data in the view, so needs calculating within Tableau.

There are times where blending in the results is the best option, especially when the outcome depends on the result of the calculation.

It’s not possible to blend data on a measure, BUT it is possible to convert a measure to a dimension.

Make the measure calculation a FIXED calculation, and it can convert to a dimension…and then be available as a blending field. To convert to a dimension, drag the field from the Measure section to the Dimension section, or right click on the field and ‘Convert to Dimension’.

Key points about data blending

An important point about data blending, the relationships should always blend at the least-granular level. For example, if you have a file containing annual targets, blend on the year. Even if the annual targets are broken down daily, when reporting at an annual level, blend on the year, not the day.

Avoid blending on especially with large data sources. This can bring down your Tableau Server.

This is one of the key differences between a standard join. With a join, the aim is to join at the most granular level. With a blend it’s the opposite; blend at the least granular level.

If possible, avoid blending a large data set with a large data set, it’s likely the dashboard performance will be bad; potentially unusable.

In addition, the joins are case sensitive. Therefore, if blending text fields, it can be a good idea to convert them to upper case first using the UPPER function.

Also, a data blend is always at the worksheet level. The blend occurs only within that specific worksheet, it isn’t shared across data sources and dashboards. This means a data blend can’t be published as a data source for others to use. The blend applies to that worksheet only.

Primary and Secondary sources in data blending

When blending in Tableau, there is always a primary source and a secondary data source. It is important to understand which is the primary source, it can impact your view.

The primary source has a blue check, secondary sources have an orange check mark

To reiterate, which data source is primary and which is secondary does matter. As it’s a type of left join, all fields will be included from the primary and related from the secondary.

Fields from a secondary data source are indicated by an orange check mark

In a worksheet, the primary source is defined by the field that is first placed on to a worksheet. The data source of the first field becomes the primary data source, and other data sources will be secondary. Even if the worksheet is cleared, if a field was on that worksheet, the primary data source remains set.

To change the primary data source on a worksheet, create a new worksheet and start again.

Data blending example

To better explain Tableau data blending, we can use a simple example. Download this example from Tableau Public and experiment yourself.

Using the sample superstore data source and a dummy file containing a budget for every month/year for every country, we can report sales vs budget. Note the budget numbers are generated using a random number, so probably won’t make sense!

Comparing actuals vs a budget is a common ask, and works very well in a bullet chart. Create a bullet chart, then we can experiment with different joins, filters and formulas to see what works and what doesn’t.

Firstly, pull in the superstore file and the budget file as separate data sources. Next, set up the relationships.

Tableau will assume the relationships, but these aren’t always correct. Especially when blending dates, set up the relationships at the right level.

In this example, the orders are daily and the budget is monthly. Therefore the relationship should join the date parts month to month and year to year. For completeness, also join the MY date parts.

Remember, setting up these relationships doesn’t enforce them throughout the dashboard. Blend relationships are only activated within each individual worksheet.

So, we have the bullet chart, with the sales data being the primary source and blending in the budget numbers data from the secondary source.

To show the monthly sales numbers vs budget, the blend needs activating only between the date fields.

Notice the relationship is active only on the Month field

Activating the blend between country and region doesn’t break anything, but also doesn’t add anything – therefore, for this view, the geographic blends are not required.

Switch the dimensions to show Region sales vs budget instead of Year and Month. Now, for this to display the correct budget number, the join between Region OR the join between Country (which rolls up to Region) have to be activated.

With this join, the budget numbers are aggregated up to the Region level, ignoring anything to do with dates. The date component doesn’t need joining for this view. However, if the user is able to filter by date, that relationship would also need activating.

For this chart blending at the Region level is appropriate

If the Country is being compared against budget, the join needs to be activated against the Country. Activating against the Region only would return the budget at the region level, against each country – which isn’t what we want.

The image below shows the consequence of only joining the Region. The budget shown against each Country is actually the budget of the Region because of the join.

Читайте также:  какой иконе ставить свечку перед операцией

The active blend relationship is at the wrong level for this view

The level of the join fields is very important for 2 reasons:

Care is needed when defining the correct relationship for the view. Take an example where there is a budget for a Country (secondary source), but no sales have happened, meaning the country isn’t in the primary data set. If reporting at the Region level, while activating the relationship at the Country level, this budget would not appear in the budget numbers because that country isn’t in the primary data source.

However, activating the relationship at the Region level would show the correct budget for the region (assuming the country is in a region that’s in the primary data set – e.g. the country has made some sales).

This same problem would also exist for a standard join. The workaround in Tableau is to create a scaffold data source, so all possible combinations of required data are forced to exist.

Limitations of Tableau data blending

The following situations are commonly seen when data blending. There are often difficulties caused by the following:

Not all calculations work with a data blend

Some of the standard Tableau formula calculations don’t work with data from a secondary source.

They sometimes give no error message, until dragged into the view, when the pill turns red. Other times the error message can read “Cannot blend the secondary data source because one or more fields use an unsupported aggregation”.

Aggregations not supported by a data blend cause this error. The unsupported aggregations include:

Also, Level Of Detail (LOD) calculations can cause errors with data blending.

For example, if trying to FIX a value from the secondary source within a calculated field in the primary source, it will error. The error message will be:

“All fields must be aggregate or constant when using table calculation functions or fields from multiple data sources”

For example, creating this type of calculated field in the primary data source will error:

LOD calculations written in the primary data source using fields from the secondary source cause an error

As an alternative, build the FIXED calculation into the secondary source as a calculated field. Then reference that calculated field from the secondary source in the primary source:

Write the FIXED calculation in the secondary source and reference the calculation in the primary source

However, to work, it requires the connected blending fields to appear in the view. This could be the Detail shelf if you don’t want it visible, but without it, the calculation will error.

You can also consider using table calculations, instead of LOD.

The Tableau data blending asterisk

Very common when blending data in Tableau, the asterisk! This happens when bringing in a non-aggregate value – such as a dimension – from the secondary source and there are multiple values per joining field. Tableau doesn’t know which value to show, therefore, the asterisk displays.

The Country is from the secondary source

For example, if blending data sources on a Region, which is a higher level than Country, and using the Country from the secondary source. If there is only one country in a region, it will return that country name. If there is more than one country in a region in the secondary source data, Tableau can’t aggregate multiple members of a dimension, so will return the asterisk.

In the above example there are multiple countries per region, therefore, Tableau displays the asterisk.

The options to get rid of the asterisk of a Tableau data blend are:

Note this isn’t a limitation of a data blend, but the asterisk can confuse those not understanding it. Arguably the asterisk is even a good thing. A similar situation with a standard database join would create duplication in the records, meaning the totals inflate, which is not good.

Filtering with a data blend

How and what a filter does with a data blend depends on 2 things:

Filtering from the primary data source

It’s preferable to filter from the primary data source. These filters function like “normal” filters. For example, these filters can show “Relevant Values” and can apply to all of the worksheets on a dashboard.

To filter both data sources with a data blend, join the fields. Earlier in this article, I wrote always join at a high level as it gives better performance. Joining on the Region is the example. This is great, until you want to filter on Country or Date. If joining on Region, filtering on Country or Date won’t filter the secondary source as they are not connected on these fields.

Filtering from the secondary data source

It’s possible to filter from the secondary data source, however, it’s not recommended if avoidable. It has limitations. This filters only the data from the secondary source AND the primary source if there are connected fields.

Watch out for unexpected filtering of the primary source.

For example, if the primary and secondary are connected on Region and there is a filter on Country from the secondary source, this will filter the secondary source on the country (as expected), but it will filter the primary source on the connected field, the Region. Clearly, this behaviour is not wanted; it looks to the user they are filtering Country, but they are inadvertently filtering the Region of the Country in the Primary source.

There are other limitations. For example:

If there are items in the secondary source but not in the primary – for example, a country could have a budget but not yet made any sales – if the join includes the Country field, this budget won’t be included. Not joining the Country field would mean the budget is included in the aggregate budget.

To summarise, try and avoid filtering from the secondary source, unless it’s a data source filter.

Use a data source filter on the secondary source

I do recommend using a data source filter on the secondary source, should the data source contain data not needed. For example, perhaps there are certain years or certain business lines that are irrelevant for the dashboard. In which case, remove them at the data source level by using a data source filter.

Removing redundant data reduces the size of the data source and therefore reduce the number of queries…speeding up Tableau dashboard performance.

“View Data” with a data blend

When clicking the View Data symbol from the tooltip, the Summary data contains all fields in the view, including those fields from the secondary source.

All fields appear in the Summary – if they appear in the worksheet twice, they appear twice in the View Data: Summary

However, switch to Full Data and it no longer includes the data from the secondary source. Instead, it only shows data from the primary data source.

Dashboard performance when data blending

When blending larger data sources with, this can have a negative impact on performance. That statement will sometimes be an understatement – if blending large data sources at a granular level, the performance impact can be huge.

Blending data at a granular level using large data sources can even bring down a Tableau Server. Or if not bringing down the server, it can degrade performance for ALL other users of dashboards on the server.

Therefore, be careful with joins and data sources. The impact of getting these wrong can be large without you, as the developer, having any idea of the poor performance all dashboard users are experiencing.

As already mentioned in this article, join the data sources at the highest possible level.

Use Data Blending to create a fixed value without a LOD

A final thing to mention. Sometimes there’s a need to have a constant value, but that constant may sometimes need to change. For example, a constant could be sales from last year, but on 1st Jan each year, “last year” changes. Plus sales figures can alter retroactively for various business reasons.

If this slowly changing constant comes from a data source, it’s possible to blend it into a worksheet. To make sure the value doesn’t change, apart from when the underlying data changes, pull the value from a secondary data source, but make sure there are no active connections between the primary and secondary data sources. No active connections will mean the number becomes a constant.

This can be an alternative to a FIXED calculation, but it will perform better then FIXED with a large data set, meaning the dashboard is quicker.

Used right, data blending is powerful

Data blending is useful to have in your Tableau armoury. I have to confess, it’s not my favourite thing; it’s fraught with complications and I often find the behaviour frustrating. However, sometimes it’s the best way to achieve a desired outcome.

Источник

Сказочный портал