How Excel Stores and Displays Data
For all the great things about Excel, one of the most confusing things is how it stores and displays data.
Every cell has a number format, which can be selected from the dropdown in the Home ribbon tab. However, it is common practice to leave it on the default option General, as this automatically adapts to the value.
There are 11 predefined number formats, but don’t be fooled by the name. Broadly speaking, they can be split into two groups: numbers and text. All of them apart from General (which is a mixture of both) and Text are of a numeric kind. Forget about the output differences of the currency, date, time and percentage formats; these are all stored as numbers.
If you change a cell from General to something else, only the visible output will change — not its storage type. This isn’t a problem if a cell is changed from one numeric type to another. Issues arise though when numbers are stored as text, or vice-versa.
Contrast this to Google Sheets, where changing the data format of a cell has an instant impact based on what you’ve chosen.
Just because a cell looks blank, it doesn’t mean it is. I know from experience how easy it is to be tripped up by this. You end up spending hours trying to figure out what has gone wrong when a formula is not producing the expected result. A lot of this comes down to an overreliance on Excel’s visible output. It is too easy to assume what you see is what you get, but this isn’t always the case.
Here, I explore how Excel handles blank cells that are actually blank, ‘blank’ cells that are not blank, empty strings, zero values, custom-formatted cells, text-based cells, number-based cells—amongst many others.
Confused? It’ll all make sense by the end!
Example Workbook
Follow along by downloading the Excel workbook:
🔗 How-Excel-Stores-and-Displays-Data.xlsm
A Microsoft 365 version of Excel is required for full functionality.
Part 1
Part 1 consists of three tables that demonstrate how you can check for blank cells, empty strings and zeros. Each one has been put through seven tests to see whether TRUE or FALSE is returned.
The tests are:
- Blank — an unused cell with nothing inside.
- Empty String — a cell containing an empty string (
=""
). - Invisible Character—a cell containing a single character space. Excel tends to add a
'
when you confirm a value like this to indicate text; this behaviour is inconsistent though. In any case, it will not appear in the output. - Apostrophe—a cell containing an apostrophe (
'
), which represents a text-based entry. - Custom Formatted (
;;;
)—a cell containing a numeric zero value with the custom formatting;;;
applied. Before the first semi-colon represents the format for positive values, the second—negative, the third for zeros, and after the third—for text. When these are all left blank, the output is hidden. - 0 (Number)—a cell containing a numeric zero value (0).
- 0 (String)—a cell containing a string with a zero inside (
="0"
).
Table P1.1—Check for Blanks
Syntax
=ISBLANK(cell reference)
Example
=ISBLANK(I11)
The ISBLANK function checks each row in the Value column to see if it’s blank. If it is, TRUE is returned—and if not — FALSE occurs.
The ISBLANK function does what it implies, as only the Blank row returns TRUE.
Table P1.2—Check for Empty Strings
Example
=I21=""
As expected, the Empty String row produces TRUE, however, so does Blank and Apostrophe. This proves that Excel does not distinguish between an empty string, a blank cell and an apostrophe.
Table P1.3—Check for Zeros
Example
=I31=0
Not surprisingly, 0 (Number) returns TRUE, but so does Blank and Custom Formatted (;;;). This is because zero values are treated the same way as blank cells. Moreover, do not be fooled by the absence of a visible zero for Custom Formatted (;;;).
Part 2
This part focuses on counting values to see how formulas react.
Table P2.1 contains seven columns:
- Blank — empty cells.
- Empty String — cells containing
=""
. - Invisible Character — cells containing a single space.
- Apostrophe — cells containing an apostrophe.
- Number — numeric values, e.g. 1, 2, 3 etc.
- Text — text values, e.g. A, B, C etc.
- Custom Formatted Number — same as Number, except each cell has the custom formatting
;;;
applied to it. - Custom Formatted Text — same as Text, except the values are not visible because each cell has the custom formatting
;;;
applied to it.
Table P2.2 lists 15 formulas tested on these columns.
Methods 1–10 feature COUNTIF formula variations, with the criteria
argument making use of comparison operators and symbols.
What do these all mean though? Firstly, you need to know what ASCII is.
ASCII is a 7-bit encoding set. There are 256 characters in total, and it’s mostly composed of the alphabet, numbers, and symbols you’ll find on your keyboard — directly and indirectly:
- 0–31 are unprintable control characters.
- 32–127 are printable characters.
- 128–255 are extended codes that vary depending on the character set your computer uses. Windows-1252 is the most common one on Windows.
Use the CHAR function to display a character by stating its ASCII decimal number. Conversely, the CODE function will output the number of a given character.
For example, =CHAR(42)
displays an asterisk and =CODE("*")
returns 42.
Although the ASCII codes are the same for all except the unprintable characters, the way Excel sorts them is not.
In the Part 2 Extras worksheet, there are three tables:
- Table P2E.1—the left half contains the original ASCII code list sorted by decimal number. The right shows how Excel converts these when the CODE and CHAR functions are used.
- Table P2E.2—Excel character list sorted by code number.
- Table P2E.3— Excel character list sorted by character.
- Table P2E.4—Examples of comparison checks on the characters.
Focussing on Table P2E.1, there are several observations:
- The ASCII decimal numbers start from zero, whereas Excel’s character values begin from one. Therefore a #VALUE! error is produced in the first row of the Code 2 and Character columns.
- #VALUE! errors also occur in the Code column for the decimal numbers not used. These are 129, 141, 143, 144 and 157. However, 127 and 160 also do despite representing a symbol.
- The CODE function only considers the first character. As the first 31 ASCII symbols have at least two, Excel returns the code number of the letter instead (Code column). That is why the unprintable character must be given—not the reference name.
Table P2E.3 sorts the values (ascending order) in Table P2E.2 according to the Character column.
What is interesting is how dissimilar the ordering is compared to the original ASCII list. Code 173, which is a soft hyphen, is the first character, followed by the unprintable characters, symbols, numbers, and finally letters.
One key difference with the Excel characters is all variations of each alphabetic letter are grouped together. For example, ‘A’ precedes ‘a’, despite the former having a character number of 65 compared to the latter’s 97.
With that out of the way, let’s swiftly go through each method:
Method 1
=COUNTIF(range,"<<")
The less than symbol (<) is the 89th value in Table P2E.3, and anything before is counted. Therefore, empty strings and invisible strings are included, but numbers and letters aren’t.
Method 2
=COUNTIF(range,">>")
Interestingly, despite numbers and letters appearing after the greater than symbol (>), only text is counted.
Method 3
=COUNTIF(range,"<>")
A less than and greater than symbol combined is a not equal to operator. As a reserved operator, Excel’s usual sorting rules do not apply.
Out of all the methods, this is the least stringent, as it only ignored blanks, but was able to count empty strings, invisible characters, text and numbers.
Method 4
=COUNTIF(range,"><")
Perhaps you could call this one ‘angry eyes’. It is the opposite of Method 3, as it only counts text.
Method 5
=COUNTIF(range," ")
This criterion was a bit more predictable as only the Invisible Character test counted the values.
Method 6
=COUNTIF(range,"")
A zero-character string not only counts the Empty String values, but it does so for the Blank ones, too.
Method 7
=COUNTIF(range,"*")
An asterisk represents a wildcard character, and on its own will count empty and invisible characters, as well as text. However, numbers are excluded.
Method 8
=COUNTIF(range,"*?")
This is a slight variation of Method 7. The question mark symbol (?), which represents a single character, means that only strings of at least one character are counted.
Method 9
=COUNTIF(range,"<*")
Any character before the asterisk is recognised, and this includes spaces, so empty strings and invisible characters are covered.
Method 10
=COUNTIF(range,">*")
The key difference compared to Method 9 is text is counted, but spaces aren’t.
Method 11
=COUNT(range)
The COUNT function is only meant to count numbers, and indeed it does, as only the Number column resulted in a count.
Method 12
=COUNTA(range)
COUNTA counts each cell that is not empty. Every test apart from Blank yielded a count.
Method 13
=COUNTBLANK(range)
Curiously, the COUNTBLANK function not only counts blank cells; it counts empty strings, too.
Method 14
=SUM(--ISBLANK(range))
As an alternative to Method 13, if you want to count blank cells only, nest the ISBLANK function inside SUM. ISBLANK ignores empty strings, unlike COUNTBLANK.
Method 15
=SUMPRODUCT(--(LEN(range)>0))
For counting a range of text and numeric values that are at least one character, use a combination of the SUMPRODUCT and LEN functions.
LEN will check to see if each cell in the range returns more than zero characters. An array of TRUE and FALSE values is returned, although the --
converts these into ones and zeros. This allows SUMPRODUCT to multiply each index by itself—and as anything multiplied by zero results in zero—only the ones will be summed.
The examples in Table P2E.4 show that although order number (# > #) is the determining factor, there are exceptions.
Take, for example, ‘a’ and ‘A’. Despite the former having a higher order and code number, these are considered equal. This makes some sense, as I already knew that text was not case-sensitive. I’m still unsure though about the secret order system that puts lowercase and uppercase letters on par with each other.
Part 3
Table P3.1 and Table P3.2 contain a set of values along with two data checks.
In Table P3.1 they are stored as numbers, so the ISNUMBER function returns TRUE for each. Oppositely, Table P3.2 stores them as text.
Another example is shown in the next two tables, with error values instead.
If you change the number format of a first column cell in Table P3.2 or Table P3.4, the storage type will only change in name. To properly change it, you have to edit the cell and press Return to reconfirm the value.
If you have many cells, doing this for each would get very tedious. That’s where the Text to Columns utility comes in handy. It’s found in the Data tab of the ribbon.
Text to Columns is generally used to split a column’s data into multiple ones based on a delimiter. However, it can also be used to reset a range’s values.
After selecting the cells and opening Text to Columns, you can click Finish straightaway. This will reset the cells to General format.
If you want to choose a specific format to convert the values to, Step 3 allows you to do this.
Final Words
I’ve covered quite a bit here, but I hope it’s raised awareness of how Excel is treating your values.
I do wish it was more stringent though. For instance, why is it that the COUNTBLANK function doesn’t only count blank cells? Why is the word ‘blank’ misused? And why do I need to use a combination of SUM and ISBLANK to count truly blank cells?
Why is it when I change the format of a cell from a numeric type to text, it doesn’t update until I reconfirm the value?
Why does Excel add an apostrophe in text-based cells and other times not?
There are still a lot of unanswered whys, and I know there are millions of users who are blissfully unaware of these nuances. So was I until I conducted this investigation.
I don’t expect you to remember every possible thing you’ve learnt here, but if are having trouble in the future, it might just be worth referring to my workbook again.