Solving spreadsheet problems through the power of a database
If you need to make a list of anything, it’s tempting to see Excel as the default repository: after all, it’s only a small list of items for yourself or a few close colleagues.
Perhaps you need something more sophisticated – formulae for some calculations, or macro programming to automate the collection and processing of data. No problem: just type “=” to start writing a formula and Excel will be your guide.
Unfortunately, the ease with which you can start work in Excel or a rival spreadsheet program is also one of its problems. What starts as a small project in Excel grows and grows, until you’re left with a behemoth – at which point you could also be facing speed and stability issues, or even a development problem you just can’t solve.
Here, we examine the issues you can often come up against with spreadsheets, how you could possibly tackle them in Excel, and when you’d be better off taking the plunge and switching to a database instead.
1. Multi-user editing
When Excel systems grow organically, you quickly run into the problem that only one user can open a workbook at any one time. The second person to try to open the file is told it’s already open and that they can cancel, wait or view a read-only version. Excel’s promise to let you know when the other person closes the workbook is rather hollow, since it doesn’t check the status very often, and indeed it might never enlighten you. Even if it does, someone else might nip in and open the file before you.
There are three ways around this: you can use Excel Online, the cut-down, web-based version of Excel; you can turn on the Shared Workbooks feature; or you can split the data into several workbooks so that a different person can use each workbook without you all treading on each other’s toes.
2. Shared workbooks
Excel Online allows multiple editors by default, but it’s missing so much functionality that it isn’t really a contender for anything but the simplest tasks. Although its Shared Workbooks feature looks like it should do the job, it’s loaded with restrictions. You can’t create a table or delete a block of cells if the workbook is shared, for example.
There are workarounds for some restrictions – for others it’s a matter of changing the structure of the workbook, rather than using a workbook that’s already been set up – but they can get in the way. As a result, it can be impossible to use a shared workbook in the same way you might an ordinary, single-user workbook.
“When Excel systems grow organically, you run into the problem that only one user can open a workbook at any one time”
Changes in shared workbooks are synchronised between users each time the workbook is saved; this can be on a timed schedule, forcing a save every five minutes, for example. However, the overhead of regular saving and tracking every user’s changes is quite large: workbooks can easily balloon in size and put a strain on your network, slowing down other systems.
Shared workbooks are also fragile and prone to corruption. Microsoft is aware of the problem, but doesn’t seem to be doing much about the issue. It looks like it’s hoping Excel Online’s multi-authoring method will take over from the older shared workbook technology, but this won’t be a realistic proposition until the company removes all the restrictions and extends the multi-authoring technology to the full Excel desktop application, as it has with Word, PowerPoint and OneNote.
3. Linked workbooks
Splitting your data across multiple workbooks can provide a workaround to the problem of multi-user editing. But it’s likely these workbooks will need to have links between them so that values entered in one can be used in another. Links between workbooks are also useful for keeping logically separate data in separate files, rather than just separate sheets in one workbook.
Annoyingly, these links are another source of frustration and instability. They can be absolute, including the full path to the source workbook, or relative, including only the difference between the source and destination paths. Although this sounds sensible, Excel employs arcane rules to decide when to use each type of link and when to change them.
The rules are governed by a variety of options, some of which aren’t at all obvious, and by whether the workbooks were saved, and where they were saved before the links were inserted. The links can also change when you save the workbook or open and use Save As to make a copy, rather than copying the file using the File Explorer. The upshot of all this confusion and uncertainty is that the links between workbooks can break easily, and recovering from broken links can be a time-consuming process, during which no-one can actually use the files affected.
Linked data is only updated when the files are opened, unless you specifically click Data | Connections | Edit Links | Update Values. Because of this, if your links aren’t between two workbooks, but cover three or more, you have to open all the workbooks in the correct order to ensure the updated data flows in order, from the first to the second to the third. If you changed a value in the first workbook and then opened the third, it wouldn’t see any changes because the second workbook hadn’t updated its values.
This chaining of data is logical, but it increases the likelihood that data is either incorrect or that you’ll try to open a workbook that someone else is already editing.
Of course, you can try to avoid linked workbooks altogether, but there’s a chance you’ll end up entering the same data into more than one workbook – and with that comes the danger of typing it in slightly differently each time.
4. Data validation
Errors can creep into data in any computer system: people mistype words or transpose digits in numbers with monotonous regularity. If your data isn’t checked as it’s entered, you’re going to have a problem.
By default, Excel accepts whatever the user types. It is possible to set up validation on look-up lists, but these can be difficult to maintain, particularly if the same field is used in more than one place. If users have to enter document ID numbers or customer reference numbers without any checks, it’s easy to tie the wrong records together without realising it. The data integrity of the system is then fatally compromised, and any analysis of the data is suspect.
You may already be suffering the effect of this problem without realising the root cause. Consider a situation where you have a list of invoices in Excel where the user has typed the name of the customer slightly differently on each invoice. You’ve got invoices to “Jones Ltd”, “Jones Limited”, “Jonse Ltd” and “joness”. You may be aware that these are all referring to the same company, but Excel doesn’t. Any analysis of the invoice data, such as a pivot table by customer by month, will give multiple results where there should only be one.
5. Navigation issues
Large workbooks are difficult to navigate. The row of sheet tabs across the bottom of the window is a terrible mechanism for finding your way around if there are more than a handful of tabs; when there are more tabs than can be displayed across the screen, it becomes even more difficult to find what you’re looking for. You can right-click on the arrows to the left of the sheet names to bring up the Activate Sheet dialog, but even that only lists the first 20 sheets before you have to scroll the list – and there’s no way to sort, group or search for the sheet you want.
6. Security issues
You can add security to Excel workbooks, but it’s rife with problems. Protection is geared much more towards protecting the structure of the workbook, rather than the data. You can try to lock some sheets and cells to stop users changing the structure and formulae, but if they can see the data then they can usually change any or all of it (unless you do some clever macro programming).
7. Speed issues
Excel isn’t the fastest application in the world, and its programming language, VBA, is sluggish compared to more professional programming languages such as C#. This all stems from the intended use and flexible nature of Excel. It is, after all, a spreadsheet engine. Yes, it can be pressed into service to manage lists of data, but that doesn’t mean that it’s the best choice for this kind of work. Indeed, there are other applications better suited to such tasks – because they were designed to do them.
Introducing the idea of the database:
“You only need to enter a customer’s data once to create a record, then you can use it on as many invoices as you need”
If you’re hitting any of the problems outlined above, don’t ignore them: there’s a professional answer to storing “structured data”, and it’s our old friend the database. It doesn’t have to be scary or expensive, and it should give you the opportunity to think logically about your data, how it links together and how you interact with it.
Take heed, though: if you’re moving from a spreadsheet solution to a database, don’t slavishly duplicate the spreadsheet design: take the opportunity to make it better.
There are general-purpose database applications available, with which you can construct a bespoke solution. Alternatively, you may find that a specialist database application – one that has already been designed for the purpose you require – is cheaper, faster to implement and a better fit.
For instance, if you have a list of customers and details of all your interactions with these customers, that could be considered to be a customer relationship management (CRM) system. Despite its fancy name, a CRM system is simply a specialised database. Similarly, accounts packages such as QuickBooks and Sage are specialist databases. If you can’t find a prebuilt application that suits your particular needs, you can probably build one yourself or get one built for you by your IT department or a consultant.
The most common database type is a relational database. This stores its data in tables, which consist of rows and columns of data. Each row holds the data for a separate item – for example, a particular customer – and each column describes a different attribute of the item, such as the customer’s name or credit limit.
The tables have relationships defined between them so that, say, an invoice carries the customer ID. This means you can easily find all the invoices for a particular customer or, from a particular invoice, retrieve the customer’s phone number. You only need to enter the customer’s data once to create the customer record, and you can then use it on as many invoices as you need without having to type it in again. To create a database, you have to define these tables and relationships and then define the layout of the screens you want to use to list and edit this data.
There are dozens of database applications out there. Some are easy to use and do the whole job, allowing you to define the tables, data-entry screens and reports; others are more fully featured in specific areas but require other tools to do the complete job.
For instance, a program may be very powerful when it comes to defining the tables and relationships, and even have powerful analysis and reporting features, but completely lack any tools for defining data-entry screens. Microsoft SQL Server is the obvious example here. As with other large database systems, SQL Server takes care of the back-end and expects you to use another tool, such as Visual Studio, to develop the front-end.