Analyse-it blog The latest news on new features and software releases.

30-Jul-2008 The Analyse-it "Dataset" concept

For new and occasional Analyse-it users, datasets can sometimes seem confusing. Today we’ll explain why we devised the 'dataset' concept, a concept now copied by some other Excel add-ins.

Why can’t I select the cells containing the data to analysed?

We introduced the dataset concept so Analyse-it could automatically pick-up the data and variables from your Excel worksheet. As we found with Astute, the Analysis Toolpak, and other Excel add-ins, forcing you to select cells containing the data to be analysed can be problematic:

  • Selecting cells can be tedious.
    When you need to run a few statistical tests you have to select the same cells, again and again. When there are lots of cells to select you have to wait while Excel auto-scrolls the off-screen cells into view so you can select them.
  • Selecting cells can be error-prone.
    It’s easy to select the wrong cells, too many cells, or too few cells. You have to carefully select cells, otherwise the statistics calculated won't be valid.
  • Selecting a subset of the data often means re-organising and sorting the data.
    Most Excel add-ins require the data be in a rectangular contiguous range of cells. You might have to filter then sort it, or maybe even re-arrange it manually, to bring data together into a selectable rectangular block of cells.
  • Adding or removing cases and observations isn't easy.
    You have to run the statistical test again and re-select the cells again, now to include new data you’ve added, or exclude data you’ve removed.

Statistics software is supposed to simplify what’s already a complex and error-prone subject area. Forcing you to select ranges of cells just seemed, to us, to introduce more potential for errors.

What are the benefits of the Analyse-it way?

In Analyse-it we tried to solve all these problems. We wanted the software to do more of the work, eliminating the need to select or re-organise your data. You should be able to:

  • Select variables by name, not by selecting cells.
    Statistics and charts can then refer to variables by name, instead of cryptic meaning-less cell references. Which is clearer? “1-way ANOVA of Yield by Fertiliser”, or “1-way ANOVA of Sheet1!$A$2:$A$350, Sheet1!$C$2:C$350”?
  • Add, remove, or correct observations or cases in the dataset, and then easily re-run the analysis. 
    The software should automatically recognise when data is added, removed, or changed. You should be able to update any statistics and charts to reflect the new data, without running the test and selecting data again.
  • Slice and dice the data, to filter and limit analysis to a subset of data.
    Excel’s built-in AutoFilter provides a quick and easy way to filter data, providing a simple way to analyse a subset of the dataset without you having to re-arrange data on the worksheet.

How does Analyse-it recognise the data for analysis?

These requirements mean Analyse-it has to know exactly how your data is arranged on an Excel worksheet -- which cells contain data for analysis, and which cells contains the variable names.

This is when the real problem with spreadsheets (or benefit, if you like) becomes apparent. Spreadsheets enforce no structure to the data you place in the cells. You can use cells on a worksheet how you like, arranging data anywhere on the sheet, horizontally or vertically, with or without labels for each row/column.

To keep things simple, we chose to support a few layouts commonly used to organise data in Excel. We call these “datasets” and all are immediately recognisable. Over time we've found most users tend to naturally organise their data in the way Analyse-it expects it – experienced Excel users more so. For more information datasets, and how to arrange your data for Analyse-it, see the online help.

Comments are now closed.