Five Times Excel Led to Disaster
Excel is so ubiquitous it can give many the false impression they are more adept with it than they actually are. Let’s face it: everyone has it on their computer — whether at home or the workplace. But how many users can say they are aware of all its features and capabilities?
Most of the time, the mistakes people make have nothing to do with not possessing guru-esque Excel knowledge. They usually stem from being unaware of the nuances and intricacies that make up the application.
For all the benefits spreadsheets provide, they can be a double-edged sword. On one hand, they give you freedom and flexibility. On the other, they can be mishandled and error-ridden.
Be warned: sloppy and amateurish mistakes have consequences. Sometimes those consequences are worth millions, billions, and even human lives.
I’m going to go through five occasions when Excel was misused in the workplace. I’ve also included a verdict for each with my own post-mortem!
5. Tickets Oversold at Olympics (2011)
Four synchronised swimming sessions at the 2012 London Olympics had 10,000 extra tickets sold for them. Human error was to blame, as an employee made a single keystroke error in a cell, typing 20,000 instead of 10,000.
The problem was only spotted when the London Organising Committee of the Olympic and Paralympic Games (LOCOG) reconciled the number of tickets sold against the seating plan.
Ticket booking agents had to spend Christmas contacting thousands of customers to alert them of the gaffe. The upside for these people though — is they were offered the chance to exchange their tickets for other events they had applied for unsuccessfully.
Verdict
We all make typos — usually, our spell checkers will let us know we’ve made them so they’ll be corrected. On other occasions they’ll be missed.
This was not words, however.
The curious thing about this case is how directly a numeric single-stroke typing error led to such a monumental cock-up.
Where was the data validation? If any thought had gone into the workbook, the master would have set in-cell maximum and minimum limits. There could also have been a formula that compared the inputted ticket quantity to the maximum capacity of the venue.
Furthermore, it appears the seating plan was not entwined with the worksheet throughout the process. It should have been an integrated part of the ticketing system, rather than be sat on the side until the end — when the error became apparent.
Sources
4. Extra Contracts Found in Purchase Agreement (2008)
179 contracts were unintentionally included in a Barclays Capital purchase agreement. This was part of its acquisition of assets owned by Lehman Brothers’, another financial services provider — who were filing for bankruptcy.
The bulk of the blame was attributed to a junior associate, who was unaware of hidden rows in the worksheet. These were marked with an ’N’ to indicate they were not part of the deal. However, after reformatting, deleting a few columns, globally resizing the rows and converting the workbook into a PDF document, it was posted on a website with the rows visible.
Barclays later made the affected parties aware of the error and posted a revised contract list.
Verdict
It’s easy to emphasise the fact the person who made the error was inexperienced, however, it wasn’t entirely their fault.
Question marks should be raised about whether they had been given sufficient training and instructions. Why were they, seemingly, not told about the ’N’ designations?
If the employee resized all the rows at once, as reported, this would have unhidden them. Checking whether the number of contracts matched up with what was expected would have been prudent.
It would also have been better to display the final list of contracts in a new worksheet or workbook — cleaned, polished, and with nothing hidden. If something is going to be published, it’s often a good idea to do this, as you can have your messy data in one place and a formal copy in another.
Sources
Finextra
The Register
AccountingWEB
3. Scientists Forced to Rename Human Genes
27 human genes were renamed by the HUGO Gene Nomenclature Committee (HGNC) due to Excel mistaking them for dates.
As part of a standardised and consistent naming system, an alphanumeric code is assigned to each gene — called a symbol.
However, because of the way Excel reads certain gene names such as MARCH1 and SEPT1, they were being converted to dates. This also affected their formatting.
Excel handles them like this:
- MARCH1 becomes 01/03/2001 and displays as Mar-01
- SEPT1 becomes 01/03/2001 and displays as Sep-01
Last year, MARCH1 and SEPT1 were renamed MARCHF1 and SEPTIN1, respectively. Additionally, gene names featuring common words have also been changed so they don’t get autocorrected by spellcheckers. For example, CARS has become CARS1 and WARS is now WARS1.
Verdict
It’s debatable how much of the blame can be attributed to Excel. I’ve long thought Excel’s default settings should stop assuming what data has been inputted in a cell. Problems of this ilk also include things like telephone numbers, where a leading zero can be knocked off unknowingly.
But a portion of the blame must be put on the shoulders of the people who were dealing with the data. Errors like this slip through the net because assumptions are made.
Surprisingly, Excel has still not budged on this issue. Type in any of these gene names and they will still convert to dates, providing the default data type of General hasn’t been changed. There is no way to change this behaviour. The only solution is to set the data type of the cells to Text, but this should be done before the data is planted there, otherwise it won’t work.
Scientists have effectively been forced to change gene names. According to a study done by Genome Biology, roughly one-fifth of genetics papers that used Excel contain erroneous names.
Sources:
BBC News
BioMed Central
The Verge
2. Professors Caught Out by Student (2010)
Just because you’re a professor at one of the leading universities in the world, it doesn’t mean you’re immune from making mistakes.
So that proved to be true, as University of Massachusetts student, Thomas Herndon, spotted a gaping mishap in a spreadsheet he obtained as part of his economics research.
“My heart sank. I thought I had likely made a gross error. Because I’m a student the odds were I’d made the mistake, not the well-known Harvard professors.”
— Thomas Herndon (student)
The Harvard professors, Carmen Reinhart and Kenneth Rogoff, were responsible for writing an academic paper titled ‘Growth in a Time of Debt’. It looked at the relationship between government debt and economic growth.
Their conclusion was: economic growth slows significantly if the debt of the country exceeds 90% of its gross domestic product (GDP).
Herndon acknowledged there’s a correlation between high debt and slow growth, however, not to the extent that Reinhart and Rogoff had claimed.
The supplementary workbook, which contained a table of countries with various metrics, had several issues:
- five of 20 countries (Australia, Austria, Belgium, Canada and Denmark) were not part of the calculation
- the methodology was questionable, as the New Zealand and United Kingdom weightings were put on par with each other, despite the obvious size difference in economies
- data was incomplete for some countries
Prominent figures and organisations were dependent on this paper. Paul Ryan, the 2012 Republican vice-presidential nominee, used it in his 2013 proposed budget. The Washington Post formed an economic consensus from it, and all over the world countries relied on it to justify austerity measures.
Verdict
Forgetting to update cell ranges is a common problem. Even today, many people still use pseudo-tables in their worksheets, which don’t provide any additional protection if there’s a change. This includes when new rows or columns are added, as well as data being shifted from one place to another.
It’s not clear what version of Excel the professors were using, but if it was the 2007 or 2010 edition, they would have benefited from using official tables. The advantage of these is they use structured references, so named columns are pointed to instead of cell ranges. This is a lot more reliable as formulas don’t need to be manually updated when there’s a change.
For example, based on what I see in the opening screenshot, instead of calculating the average using =AVERAGE(LL30:L44)
, housing the data in a table accompanied by a formula like =AVERAGE(GDP_Table[90 or Above])
would have been wiser.
Sources
BBC News
The Independent
KeyCuts
1. 16,000 Coronaviruses Cases Missed (2020)
A problem with the test result system meant that 48,000 contacts had not been traced. Many people were roaming about, oblivious to the fact they might have already been exposed to the virus when they should have been self-isolating.
Public Health England (PHE) was responsible for collating the test result data that was produced by commercial firms who were in charge of carrying out the swab tests.
They were storing the results in comma-separated-value (CSV) files. A system was set up in Excel, containing templates for this data to be housed. Each test result was made up of several rows, which meant a single template was limited to around 1,400 cases.
The crux of the problem was the Excel files were being saved in the old .xls format, instead of the newer .xlsx. The former only allows for 65,536 rows, whereas the latter’s limit is 1,048,576. This meant that any data that was beyond row 65,536 was cut off.
To circumvent the issue, PHE decided to start splitting the test result data into smaller batches, which meant an increase in Excel templates. This was only intended to be a makeshift solution though, as the ‘legacy system’ was said to be replaced soon after. Details of its replacement are currently unknown though.
Verdict
Relying on Excel as a national ‘database’ for recording coronavirus cases was an ill-thought idea from the start.
In the upper echelons of government and PHE, it was described as an ‘IT error’ and a ‘technical glitch’ that occurred because of its ‘legacy system’.
Enough of the euphemisms.
I call it gross incompetence.
Advanced automated solutions just didn’t seem to be a priority. During the early days of the pandemic when testing was limited, they weren’t even using computers. A phone along with pen and paper were used to perform some work manually.
Experts in data management, relational databases and programming should have been hired, to advise and steer the ship in the right direction. They would have instantly known a database was essential.
A bespoke Python application combined with an SQL database might have been suggested. This would have been a much more robust and scalable solution — even if it were pricier. Why it was never suggested in a meeting remains a mystery. Or maybe it was — they just wanted to save a bit of money.
“Excel was always meant for people mucking around with a bunch of data for their small company to see what it looked like.
“And then when you need to do something more serious, you build something bespoke that works — there’s dozens of other things you could do.
“But you wouldn’t use XLS. Nobody would start with that.”
— Professor Jon Crowcroft (University of Cambridge)
I remember posting about this on LinkedIn, baffled and confused about what I had read. Where was the foresight? It doesn’t take a rocket scientist to figure out that Excel probably isn’t the best tool to use for a national system. It was never going to be a sustainable solution for recording cases that were proliferating across the country. As they escalated, so did the rows.
Large volumes of data are not handled very well by Excel. Those of us who’ve attempted to work with humungous datasets will know that to avoid crashing it, the true usable row limit is far lower than the number that exist.
If they had used the modern .xlsx extension, the extra cases would not have been cut off, but this doesn’t detract from the fact they were using a stop-gap system.
Relational databases give structure, organisation and efficiency. A large and growing amount of data demands these things, otherwise you run into all kinds of issues.
Given my LinkedIn feed is perennially swamped with pro-Excel enthusiasts, inevitably I was going to get a defiant reaction to this story. Many leapt to its defence, arguing that it wasn’t Excel’s fault, but the people who used it.
Power Query was touted as a tool that should have been considered. The big plus is millions of data rows are allowed, so the CSV files could have been imported whilst bypassing the restrictions worksheets have.
Adding to the Data Model makes it possible to create a relational set of tables that function like a database. It can be used with Power Pivot for reporting purposes. This allows for more sophisticated pivot tables compared to the standard ones. Sure, Excel could have been used this way for analysis, but we’re talking about storage here.
I’ll finish by saying none of this should come as a surprise. After all, the NHS was still using Windows XP as late as 2019.
Sources
BBC News
The Guardian
Digital Health
Final Thoughts
It’s interesting how all five of these case studies have something in common: the effects of the mistakes were considerably greater than the magnitude of the mistakes themselves.
They were all avoidable if proper care had been taken. Spreadsheet errors occur all the time, although the vast majority don’t lead to the consequences these did, so you just don’t hear about them. But when there are financial implications or lives at risk, it becomes a serious matter. This is why procedures and protocols are necessary, as well as regular auditing.
Companies should be training their employees to a level appropriate for their job. They also need to consider creating workbook templates that have data validation built-in, so user input is restricted.
Awareness is paramount. Anyone dealing with an unfamiliar Excel workbook needs to consider the following:
- How many rows and columns are there?
- Check for hidden rows and columns. If so, why are they hidden?
- When pasting data, ensure it appears the same as when it was copied.
- Are all manual inputs and calculations correct?
- Ideally use Excel tables, for the benefit of structured references.
- What file format is the workbook in?
Hopefully, this has given you food for thought about your own spreadsheet practices, whether you are an employer or employee.
I do plan to write more about nuanced Excel issues in up-and-coming articles, so stay tuned!