In clearly titling this blog post, we’ve probably already revealed the answer, but... Can you spot the difference between the two rows of values in the Excel spreadsheet shown below?
Sorry, it’s a trick question, because (visually) there is no difference. The difference is how the values are stored by Microsoft Excel. The value 57 in the cell on second row is actually stored as a text string, not a number.
When does Excel store numbers as text?
When you type a value into a cell, Excel looks at what you’ve typed and decides whether it’s a valid number. If it is, the value is stored as a number, and if not it’s stored as text (a string of characters).
Considering this, how is it possible for Excel to store a value that looks like a number, as text? There are a few ways. Most common is when you copy-paste data from another application, and the application providing the data fools Excel into believing the values should be stored as text. Similarly, if you import data from a database field that contained numbers stored as text, the numbers will be imported as text. Finally, you can force Excel to store a number as text by prefixing it with an apostrophe (‘).
The side-effects of numbers stored as text
So what difference does it make? Venturing into computer science briefly, computers represent and store numbers and text values very differently – numbers are stored in a compact binary representation, and text strings are stored as a string of individual characters. The problem is that mathematical operators and functions can only be applied to values stored as numbers. Text strings, even those that look like numbers, cannot be operated on mathematically.
If you use Excel’s SUM function on the two rows, you can see the problem:
The SUM of the second row doesn’t match the first, because SUM works only on numeric values. The cell containing the text string 57 on the second row is ignored by SUM.
If you weren’t aware of this issue before, you might be surprised. Regardless of how Excel stores a value, if it looks like a number you might expect SUM and other worksheet functions to treat it as such!? Because Excel doesn't, this can lead to very subtle and difficult to spot errors.
Microsoft’s attempt to mitigate the problem
Microsoft recognises this problem and have tried to mitigate it since Excel 2002. Excel 2002, 2003 and 2007 now show a green triangular indicator in the top-left of any cell that contains a number stored as text. If you click the cell to activate it, a small pop-up menu appears so you can convert the cell content to a number. See below:
For more information, see Convert numbers stored as text to numbers.
Analyse-it’s solution to the problem
We identified the problem shortly after releasing Analyse-it in 1997. In all versions of Analyse-it since, when Analyse-it reads your data from the Excel worksheet it treats any numbers stored as text as numbers, so they are included in the analysis.
We feel this is a better approach than Excel’s current solution, but the different approaches can lead to confusion. Recently a customer was validating Analyse-it and was surprised to find the Excel function CORREL gave a different answer to that shown by Analyse-it’s Pearson correlation. You can probably guess why. Some numbers on his worksheet were actually stored as text, and just like Excel’s SUM function, the CORREL function ignored the text values. That meant the correlation coefficient was wrong, and didn't match Analyse-it. Once we converted the numbers stored as text to actual numbers, Excel's CORREL function calculated the correlation coefficient properly. It then matched Analyse-it.