
Finding good data sets

To familiarize yourself with Tableau Desktop (or to create samples or proof-of-concept content), it's a good idea to find a data set that interests you. When you have real questions that you want to answer with data, the steps of analysis become easier and more meaningful.

The reality of data sets

There are two inevitable facts about trying to find a data set that is not official, business-sanctioned data.

You won't find what you're looking for.

  • Try not to have too specific an idea of what you want.
  • Stay flexible and open-minded about what can be used for a particular project.
  • Sometimes the data you want is behind a paywall. You have to decide whether it is worth paying for.

You need to clean up the data.

What makes a good data set?

A good data set is one that works for your purpose. As long as that need is met, it's a good data set. However, a few considerations can help you weed out data sets that are unlikely to serve your purpose. In general, look for data sets that meet the following conditions:

  1. Contains the elements you need
  2. Has disaggregated data
  3. Has at least a few dimensions and a few metrics
  4. Has good metadata or a data dictionary
  5. Is usable (not in a proprietary format, too messy, or too unwieldy)

What makes Superstore great?

Superstore is one of the sample data sources included with Tableau Desktop. Why is it such a good data set?

  • Necessary elements: Superstore has dates, geographic data, fields with a hierarchical relationship (Category, Sub-Category, Product), measures that are positive and negative (Profit), and so on. There are very few chart types that you can't create with Superstore alone, and very few features that you can't demonstrate with it.
  • Disaggregated: Each row is one item in a transaction. These items can be rolled up to the order level (using the Order ID) or by any dimension (e.g. date, customer, region).
  • Dimensions and measures: Superstore has multiple dimensions that let you "slice and dice" by things like category or city. There are also multiple measures and dates, which opens up the possibilities for chart types and calculations.
  • Metadata: Superstore has well-named fields and values. You don't have to look up what the values mean.
  • Small and clean: Superstore is only a few megabytes in size, so it takes up very little space in the Tableau installer. It's also very clean data, with only the correct values in each field and a good data structure.

1. A good data set contains the elements that you need for your purposes

When looking for a data set to create a particular visualization or to showcase certain functionality, make sure that it has the field types you need. Maps, for example, are great visualizations but require geographic data. Basic demonstrations often need to drill down into dates, so the data needs at least one date field (and it must be more granular than yearly to show drill-down detail). Not every data set needs all of these elements, so know what you need for your purpose and don't waste time on data sets that are missing key elements.

Common analysis elements:

  • Dates
  • Geographic data
  • Hierarchical data
  • "Interesting" measures: significant differences in magnitude, or positive and negative values

Some features or types of visualization may require specific properties of the data, for example:

  • Clustering
  • Forecasting
  • Trend lines
  • User filters
  • Spatial calculations
  • Certain calculations
  • Bullet charts
  • Control charts

2. A good data set is disaggregated (raw) data

If the data is too aggregated, there isn't much you can do to analyze it. For example, if you want to explore Google search trends for "pumpkin spice" but only have annual data, you get nothing more than a very general overview. Ideally, you would have daily data, so that you could see the steep surge when Starbucks starts offering #PSL.

What is considered disaggregated can vary depending on the analysis. Note that some data sets will never be more granular than a certain level for reasons of privacy and practicality. For example, you are unlikely to find a data set of individual malaria case reports, so monthly totals by region could be granular enough.
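The effect of over-aggregation can be sketched in a few lines of Python. The search counts below are made-up values for illustration only, and the function name `roll_up_by_year` is a hypothetical helper, not part of any Tableau feature. The sketch shows that a spike visible in daily rows disappears once the rows are rolled up to yearly totals.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily search counts (made-up numbers, for illustration only).
daily_searches = [
    (date(2023, 8, 22), 120),
    (date(2023, 8, 23), 130),
    (date(2023, 8, 24), 950),   # the surge is visible at daily granularity
    (date(2023, 8, 25), 900),
    (date(2024, 8, 21), 140),
    (date(2024, 8, 22), 1010),
]


def roll_up_by_year(rows):
    """Aggregate (date, count) rows into one total per year."""
    totals = defaultdict(int)
    for day, count in rows:
        totals[day.year] += count
    return dict(totals)


yearly = roll_up_by_year(daily_searches)

# With daily rows we can still locate the busiest day...
spike_day = max(daily_searches, key=lambda row: row[1])[0]

# ...but the yearly totals no longer say anything about when it happened.
print(yearly)
print(spike_day)
```

Rolling up is always possible from detailed rows, but the reverse is not: the yearly totals cannot be split back into days.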

Aggregation and granularity

Understanding the concept of aggregation and granularity is critical for many reasons. It affects things like finding useful datasets, creating the visualization you want, combining data correctly, and using LOD expressions. Aggregation and granularity are opposite ends of a spectrum.

Aggregation refers to how the data is combined, such as summing all searches for "pumpkin spice" or averaging all temperature readings around Seattle on a given day.

  • By default, measures are aggregated in Tableau. The default aggregation is SUM. You can change the aggregation to things like Average, Median, Count (Distinct), Minimum, and so on.
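As a rough illustration of what these aggregations compute, the following Python sketch applies several of them to the same set of values. The temperature readings are invented, and the dictionary keys simply borrow the Tableau function names SUM, AVG, MEDIAN, COUNTD, and MIN as labels; the code itself is plain Python, not Tableau.

```python
from statistics import mean, median

# Hypothetical temperature readings (°C) from stations around one city on one day.
readings = [11.0, 12.5, 12.5, 14.0, 15.0]

# The same rows give very different answers depending on the aggregation chosen.
aggregations = {
    "SUM": sum(readings),
    "AVG": mean(readings),
    "MEDIAN": median(readings),
    "COUNTD": len(set(readings)),   # count of distinct values
    "MIN": min(readings),
}

for name, value in aggregations.items():
    print(name, value)
```

For temperatures, the default SUM is rarely meaningful, which is a good reminder to check the aggregation before reading a chart.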

Granularity refers to how detailed the data is. What does a row (also known as a record) in the data set represent? A person with malaria? A province's total number of malaria cases for the month? That's the granularity. Knowing the granularity of data is critical to working with level of detail (LOD) expressions.

For more information, see the free training video Aggregation and Granularity or the help topic Data Aggregation in Tableau.

3. A good data set has dimensions and measures

Many types of visualization require both dimensions and measures.

  • If you only have dimensions, you are mostly limited to counting, calculating percentages, or using the Number of Records field.
  • If you only have measures, you cannot break down the values by any criterion. You can either fully disaggregate the data or work with the overall SUM, AVG, and so on.

That is not to say that a data set made up of only dimensions cannot be useful. Demographic data, for example, is dimension-heavy, and many analyses of demographics are based on counts or percentages. However, for a richer analytical data set, you need at least a few dimensions and a few measures.

Dimensions and measures, discrete and continuous

Notice in the image above that the Numeric Dimension has no aggregation on the Marks card, unlike Continuous Measure and Discrete Measure.

Dimensions and measures

Fields are divided into dimensions and measures in the Data pane. In Tableau, dimensions come into the view as themselves, whereas measures are automatically aggregated. The default aggregation for a measure is SUM.

  • Dimensions are qualitative, that is, they are described, not measured.
    • Dimensions are often things like city or country, eye color, category, team name, etc.
    • Dimensions are usually discrete.
  • Measures are quantitative, that is, they can be measured and recorded numerically.
    • Measures are often things like sales, amount, number of clicks, etc.
    • Measures are usually continuous.

If you can do math on a field, it should be a measure. If you're not sure whether a field should be a measure or a dimension, consider whether you can do meaningful calculations with its values. Is there any meaning in AVG(RowID), in the sum of two social security numbers, or in dividing a zip code by 10? No. These are dimensions that happen to be written as numbers. Consider how many countries have alphanumeric postal codes: they're just labels, even though in the US they happen to be purely numeric. Tableau recognizes many field names that indicate a numeric field is actually an ID or zip code and tries to make those fields dimensions. Tableau isn't perfect, however. Use the "Can I do math on it?" test to decide whether a numeric field should be a measure or a dimension, and rearrange the Data pane if necessary.
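The reasoning behind this test can be written down as a small heuristic. The Python sketch below is only an illustration: the function `suggest_role` and the list of name hints are hypothetical and do not reproduce how Tableau actually detects IDs or zip codes.

```python
# Hypothetical name hints; this is NOT Tableau's actual detection logic.
LABEL_HINTS = ("id", "zip", "postal", "code", "ssn")


def suggest_role(field_name, sample_values):
    """Suggest 'dimension' or 'measure' for a field (illustrative heuristic only)."""
    words = field_name.lower().replace("_", " ").split()
    all_numeric = all(
        isinstance(value, (int, float)) and not isinstance(value, bool)
        for value in sample_values
    )
    if not all_numeric:
        return "dimension"   # text labels cannot be summed or averaged
    if any(hint in words for hint in LABEL_HINTS):
        return "dimension"   # numeric-looking labels: math on them is meaningless
    return "measure"         # numbers on which math makes sense


print(suggest_role("Row ID", [1, 2, 3]))
print(suggest_role("Zip Code", [98101, 10001]))
print(suggest_role("Sales", [19.99, 250.0]))
```

A name-based rule like this will misclassify some fields, which is why the final decision is best made by asking whether math on the values means anything.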

Note: Although you can do math on dates (for example, with a DATEDIFF calculation), dates are categorized as dimensions by convention.

Discrete and continuous

Discrete and continuous fields, while somewhat aligned with the concepts of dimensions and measures, are not identical to them.

  • Discrete fields contain distinct values. They form headers or labels in the view, and their pills are blue.
  • Continuous fields "form an unbroken whole". They form an axis in the view, and their pills are green.

A date field makes it easy to understand discrete and continuous fields. Dates can be discrete OR continuous.

  • To look at the average temperature in August over a decade or a century, "August" is used as a discrete, qualitative date part.
  • To look at the general trend in reported malaria cases since 1960, a single, unbroken axis is required, which means that the date is used as a continuous, quantitative value.
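The two uses of a date can be contrasted in a short Python sketch. The readings are invented values and the variable names are hypothetical; the point is only that a discrete date part groups rows from different years together, while a continuous date keeps every point in order on one timeline.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical temperature readings (°C); the values are made up.
readings = [
    (date(2021, 7, 15), 24.0),
    (date(2021, 8, 15), 26.0),
    (date(2022, 7, 15), 25.0),
    (date(2022, 8, 15), 28.0),
]

# Discrete date part: every August lands in the same bucket, whatever the year.
by_month_part = defaultdict(list)
for day, temp in readings:
    by_month_part[day.month].append(temp)
august_avg = mean(by_month_part[8])

# Continuous date value: each point keeps its own position on one unbroken axis.
timeline = sorted((day.replace(day=1), temp) for day, temp in readings)

print(august_avg)
print(timeline)
```

The discrete version answers "what is August like?", while the continuous version answers "how did things change over time?".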

For more information, see the free training video "Understanding Pill Types" or the help topic Dimensions and Measures, Blue and Green.

Fields created by Tableau

Tableau automatically creates three fields regardless of the data set:

  • Measure Names (a dimension)
  • Measure Values (a measure)
  • Number of Records (a measure)

If the data set contains geographic fields, Tableau also creates the fields Latitude (generated) and Longitude (generated).

Measure Names and Measure Values are two extremely useful fields. For more information, see the free training video or the help topic Measure Values and Measure Names.

Number of Records is a field that basically assigns a "1" to every row in the data set. This guarantees that you have at least one measure in your data set and can help with some analysis. You need to understand the granularity of your data (what a row represents) in order to define what the number of records means.

If every row is a day, the number of records equals the number of days.

If every row is a month, the number of records equals the number of months.
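A minimal Python sketch can make the link between granularity and the number of records concrete. The data is hypothetical (the same 90 days of sales stored at two granularities), and `number_of_records` is an illustrative helper that mimics assigning 1 to every row and summing.

```python
# Hypothetical data: the same 90 days of sales at two granularities.
daily_rows = [{"day": d, "sales": 10} for d in range(1, 91)]    # one row per day
monthly_rows = [{"month": m, "sales": 300} for m in (1, 2, 3)]  # one row per month


def number_of_records(rows):
    """Mimic a Number of Records field: assign 1 to every row, then sum."""
    return sum(1 for _ in rows)


print(number_of_records(daily_rows))     # counts days
print(number_of_records(monthly_rows))   # counts months

# The totals agree even though the record counts differ.
print(sum(r["sales"] for r in daily_rows), sum(r["sales"] for r in monthly_rows))
```

The count only has meaning once you know what one row stands for, which is why the granularity question comes first.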

4. A good data set has metadata or a data dictionary

A data set is only useful if you know what the data stands for. There are few things more frustrating when hunting for good data than opening a file that looks like this:

What does a source of 4 or 12 mean? And what information is there in the fields OTU0 – OTU4?

A good data set has well-labeled fields and members, or a data dictionary so that you can re-alias the data yourself. Think of Superstore: it is immediately obvious what the fields and their values stand for, for example "Category" and its members "Technology", "Furniture" and "Office Supplies". For the microbiome data set in the picture above there is a data dictionary that explains each source (4 is stool and 12 is stomach) and the taxonomy of each operational taxonomic unit (OTU3 is a bacterium of the genus Parabacteroides).

Data dictionaries can also be referred to as metadata, indicators, variable definitions, glossaries, or any number of other things. Ultimately, a data dictionary contains information about column names and elements in a column. This information can be fed into the data source or visualization in various ways. For example, you can do the following:

  • You can rename the columns so that they are easier to understand (this can be done in the data set itself or in Tableau).
  • You can re-alias the members of a field (this can be done in the data set itself or in Tableau).
  • You can create calculations to add data dictionary information.
  • You can comment on the field in Tableau (comments are not displayed in published visualizations, only in the authoring environment).
  • You can use the data dictionary as another data source and combine the two data sources.
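The last option, combining the data with its dictionary, amounts to a lookup. The Python sketch below uses the two source codes mentioned in this article (4 is stool, 12 is stomach); the sample rows, the code 7, and the helper `add_labels` are hypothetical and only serve the illustration.

```python
# Data dictionary entries taken from the text: source 4 is stool, 12 is stomach.
source_dictionary = {4: "Stool", 12: "Stomach"}

# Hypothetical rows with coded values, as they might appear in the raw file.
samples = [
    {"sample": "A", "source": 4},
    {"sample": "B", "source": 12},
    {"sample": "C", "source": 7},   # a code that is missing from the dictionary
]


def add_labels(rows, dictionary, key="source"):
    """Return copies of the rows with a readable label next to the original code."""
    labeled = []
    for row in rows:
        label = dictionary.get(row[key], "Unknown")
        labeled.append({**row, f"{key}_label": label})
    return labeled


labeled_samples = add_labels(samples, source_dictionary)
for row in labeled_samples:
    print(row)
```

Keeping the original code alongside the label means nothing is lost, and codes missing from the dictionary show up as "Unknown" instead of silently disappearing.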

Losing a data dictionary can render a data set useless. If you bookmark a data set, bookmark the data dictionary as well. If you download one, download both and keep them in the same location.

5. A good data set is one that you can use

As long as you can understand the data set and it has the information you need, even a small data set can be used for analysis. Smaller data sets are also easy to store, share, and publish, and are likely to perform well.

Even if you can find the "perfect" dataset for your needs, it's not really perfect if the effort involved in cleaning it up is unrealistic. It is important to understand when to move away from a data set that is too disorganized.

For example, this data set comes from a Wikipedia article on relative letter frequencies. Initially it had 84 rows and 16 columns (pivoted to 1,245 rows and 3 columns). The Excel file is only 16 KB in size. But with a few groups, sets, calculations and other manipulations, it enables robust analysis and interesting visualizations.
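The pivot mentioned above, from a wide table to three columns, can be sketched as follows. The rows are a tiny hypothetical excerpt with illustrative frequencies (not the Wikipedia figures), the column names Language and Frequency are assumed, and skipping empty cells is a choice made for this sketch.

```python
# Hypothetical wide table: one row per letter, one column per language.
# The frequencies are illustrative values, not the Wikipedia figures.
wide_rows = [
    {"Letter": "a", "English": 8.2, "French": 7.6, "German": 6.5},
    {"Letter": "b", "English": 1.5, "French": 0.9, "German": 1.9},
    {"Letter": "ß", "English": None, "French": None, "German": 0.3},
]


def pivot_to_long(rows, id_column):
    """Turn each non-empty (row, column) cell into its own three-field row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column or value is None:
                continue  # skip the identifier itself and empty cells
            long_rows.append(
                {id_column: row[id_column], "Language": column, "Frequency": value}
            )
    return long_rows


long_rows = pivot_to_long(wide_rows, "Letter")
print(len(long_rows))
print(long_rows[0])
```

The long form has one dimension for the letter, one for the language, and a single measure, which is usually much easier to analyze than one column per language.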


Re-aliasing your data

Once you've found a good data set, you'll often need to re-alias it. Re-aliasing data can be useful either to create fake data for samples or proofs of concept, or to make the data more readable.

Renaming a field changes how that field is displayed in Tableau, such as renaming Sales to Pipeline Sales or State to Province.

Re-aliasing changes how the members of a field are displayed, for example re-aliasing the values in the "Country" field so that CHN becomes China and RUS becomes Russia.

  • The values in a discrete dimension field are called members. Only members can be re-aliased. Consider a measure field for temperature: a value of 12 °C cannot be changed without changing the data itself. If the member "CHN" in a country field is re-aliased to "China", the information is the same and is simply labeled differently.

Renaming and re-aliasing mean almost the same thing. The Tableau convention is that fields are renamed and members are re-aliased. For more information, see Organize and Customize Fields in the Data Pane and Create Aliases to Rename Members in the View.

Note: Renaming or re-aliasing only changes the appearance in Tableau Desktop. The changes are not written back to the underlying data.

Re-aliasing to create fake data

Re-aliasing existing data sets is a great way to make samples or proof-of-concept content more compelling.

  1. Using a simple data set (Superstore, for example), create whatever you want (a specific type of chart showing specific features, etc.).
  2. Rename the relevant fields, change tooltips, and otherwise alter the text to mask what the data actually represents.

Important: Do this only if it is clear that the information is fake. Be careful not to give the impression that the data is real and can be used for analysis. For example, use silly names or meaningless field names like colors or animals.

Re-aliasing to make the data easier to use

Storing data as numeric values rather than string values is more efficient, but numeric encoding can make the data difficult to understand. For small data sets, using readable values is unlikely to have any performance impact, so it is better to make the data easy to understand.

A disadvantage of re-aliasing is that you can no longer access the numeric values (which makes certain operations difficult, such as sorting, assigning color gradients, etc.). If necessary, duplicate the field and re-alias the copy. Alternatively, a calculation in Tableau can be a great way to keep the original information while making it easier to understand.
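The advice to keep the numeric field and alias only a copy can be illustrated in Python. The field name `severity`, its codes, and the alias table are hypothetical; the sketch shows why the numeric original is still needed for sorting.

```python
# Hypothetical alias table for a numerically coded field.
SEVERITY_ALIASES = {0: "Low", 1: "Moderate", 2: "High"}

rows = [{"severity": 2}, {"severity": 0}, {"severity": 1}]

# Keep the numeric field and add an aliased copy next to it.
for row in rows:
    row["severity_label"] = SEVERITY_ALIASES[row["severity"]]

# Sorting on the numeric field gives the meaningful order...
by_code = [r["severity_label"] for r in sorted(rows, key=lambda r: r["severity"])]

# ...while sorting on the label alone falls back to alphabetical order.
by_label = sorted(r["severity_label"] for r in rows)

print(by_code)
print(by_label)
```

With both fields available, the label can be shown to readers while the code continues to drive sorting and color gradients.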

Re-aliasing with the CASE function

Calculations can be very powerful for re-aliasing. In essence, CASE functions allow you to specify, "If this field is A, then X should be returned. If the value is B, Y should be returned".

In this case, the CASE function examines the F-scale in a tornado data set and provides the written description associated with each numeric value:

CASE [F-scale]
WHEN "0" THEN "Some damage to chimneys; branches broken off trees; shallow-rooted trees pushed over; sign boards damaged."
WHEN "1" THEN "The lower limit is the beginning of hurricane wind speed; peels surface off roofs; mobile homes pushed off foundations or overturned; moving autos pushed off the roads ..."
WHEN "2" THEN "Roofs torn off frame houses; mobile homes demolished; boxcars overturned; large trees snapped or uprooted; highrise windows broken and blown in; light-object missiles generated."
WHEN "3" THEN "Roofs and some walls torn off well-constructed houses; trains overturned; most trees in forest uprooted; heavy cars lifted off the ground and thrown."
WHEN "4" THEN "Well-constructed houses leveled; structures with weak foundations blown away some distance; cars thrown and large missiles generated."
WHEN "5" THEN "Strong frame houses lifted off foundations and carried considerable distances to disintegrate; ... trees debarked; steel reinforced concrete structures badly damaged."
END

We can now use the original F-scale field (0–5) or the field with the description of the damage based on the F-scale in the visualization.

Tips for finding data sets

Note: Make sure that you can answer the question "What does a row (or record) represent?" If you cannot articulate this, you may not understand the data well enough to use it, or it may be poorly structured for analysis.

  • Track where the data is coming from.
  • Keep the data dictionary information with the data itself.
  • Avoid stale data if you want to keep the content up-to-date. Look for:
    • updatable data (stocks, weather, regularly published reports, etc.)
    • timeless data (the average mass of different animals will not change from year to year)
    • data that you can future-proof by artificially shifting it to historical or future dates
  • Just search for what you're looking for on Google and you might be surprised.
  • Don't be afraid to give up on a data set if it's too much work to prepare.

Places to search for data

Where can you look for data? Data sets can be found practically anywhere. Below are a few places to get started. Note that the reality of data sets applies to these sites: you are unlikely to find exactly what you have in mind, and you will most likely need to do some cleanup to prepare the data for analysis.

Disclaimer: These links to external websites are kept as accurate, current and relevant as possible. However, Tableau cannot guarantee the accuracy or timeliness of the content on these third party websites. Listing a website here is not an endorsement of any content or organization. If you have any questions about the content, please contact the external website directly.

Tableau Public: Tableau Public is an amazing resource for Tableau-ready data sets. Find workbooks on a topic that interests you, browse for inspiration, and then download the workbook to access the data. Or take a look at the curated sample data.

Wikipedia tables: Get data from Wikipedia tables by copying and pasting them into a spreadsheet, by copying and pasting them directly into Tableau, or by using Google Sheets and the IMPORTHTML function to create a Google sheet of the data.

Google Dataset Search: "A search engine to unite the fragmented world of online data sets."

Data is Plural: Subscribe to a weekly newsletter of data sets or search the archive.

Makeover Monday: "Work with us on a specific dataset every Monday, create better, more effective visualizations, and help us make information more accessible." You can see what other people have done with the same data set to kick-start your analysis or for inspiration. Use #makeovermonday on Twitter to participate.

Other sites