The Five Safes

Effective decision-making for data and risk management

The Five Safes is a framework to help make decisions about the effective use of confidential or sensitive data. The Five Safes model also places statistical disclosure control (SDC) in its proper context, as part of a system approach to data security.

Although mainly aimed at public sector managers of statistical/research data, the messages here should be relevant to all data owners, data scientists, and data users.

What is the 'Five Safes' framework?

The Five Safes breaks down the decisions surrounding data access and use into five related but separate dimensions:
Safe projects: Is this use of the data appropriate, lawful, ethical and sensible?
Safe people: Can the users be trusted to use it in an appropriate manner?
Safe data: Does the data itself contain sufficient information to allow confidentiality to be breached?
Safe settings: Does the access facility limit unauthorised use or mistakes?
Safe outputs: Is the confidentiality maintained for the outputs of the management regime?

A data management solution considers all of these, but may decide to make some elements 'safer' than others. The point is that, overall, the design protects confidentiality through a mix of statistical and non-statistical solutions.

Another way to see this is to think about control: which elements can you control, what do you want to control, and what is beyond your reach? For example, making data freely available on the web implies that you have no control over why it is used, by whom, where, and to what end; therefore, the only thing you can do is control the detail in the data itself. In contrast, consider allowing users under licence to view data over a secure link; in this case, you have much more control and so you can share data with a higher level of detail. Both of these are safe use, which some authors call 'functional anonymisation'. They are both allowable under most legislation, but they involve different costs, benefits, and compromises with the user.
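The two examples above can be sketched as a small data structure. This is only an illustration: the class name AccessDesign, the true/false treatment of control, and the specific settings chosen for each design are assumptions made for this sketch, not part of the Five Safes itself, which treats the dimensions as matters of degree rather than switches.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AccessDesign:
    """Hypothetical model: which of the five dimensions the data owner controls."""
    projects: bool
    people: bool
    data: bool
    settings: bool
    outputs: bool

    def controlled(self) -> list:
        # Dimensions the data owner can use to manage risk.
        return [f.name for f in fields(self) if getattr(self, f.name)]

    def uncontrolled(self) -> list:
        # Dimensions that are beyond the data owner's reach in this design.
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Open publication on the web: only the detail in the data can be controlled.
open_web = AccessDesign(
    projects=False, people=False, data=True, settings=False, outputs=False
)

# Licensed viewing over a secure link. Treating outputs as uncontrolled here
# is an assumption for illustration only.
licensed_link = AccessDesign(
    projects=True, people=True, data=True, settings=True, outputs=False
)

print(open_web.controlled())
print(licensed_link.uncontrolled())
```

The fewer dimensions a design controls, the more of the protection has to come from the remaining ones; in the open web case that means reducing the detail in the data.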

The term 'safe' can be confusing. These are dimensions for looking at the problem, not limits that need to be met. For example, if you were sure that the data would only be accessed in a very controlled environment by very trustworthy users, then the data could be very detailed, but describing this as 'unsafe' is likely to be a presentational disaster. Hence, some organisations use different terms: the Australian government's 'five data sharing principles' is a much more sensible name. However, as 'five safes' is a very search-friendly term, we stick with it.

The Five Safes in context, and this website

The Five Safes is a practical aid for decision making, but it is there to clarify discussion. A normative framework is the EDRU (evidence-based, default-open, risk-managed, user-centred) model, which suggests useful attitudes to adopt. Together the two provide a modern approach to confidential data management focusing on delivery, principles and attitudes. Hence, this website covers the full range of concepts embodied in the modern approach. It is, however, worth noting that the Five Safes can be used without needing the other EDRU elements.

Other concepts associated with Five Safes thinking include 'principles-based output statistical disclosure control', 'active researcher management', 'safe statistics', and the 'data access spectrum'. These can be employed independently of any Five Safes plan (and often are), but the ethos underlying them is very similar to Five Safes thinking, and the Five Safes provides a handy way of explaining how different elements fit together. Hence, this website brings together research on a lot of related issues so that a holistic picture of risk management can be taken.

There is a large literature on aspects of project management (ethics, law and consent) and IT environments. We do not provide a comprehensive guide to these here, but instead focus on showing how these elements can fit together, and how asking certain questions can lead to substantially different answers. There is, at present, relatively little literature on the user dimension.

There is a very large academic literature on 'statistical disclosure control' (SDC): preserving confidentiality by reducing the risk either in the data (input SDC) or in the outputs (output SDC). Input SDC has been a well-established field for many years, and this site provides only links to a few key references. However, it does pull together the literature on how to employ input SDC effectively, in particular by moving away from hypothetical worst-case planning. Output SDC, in contrast, is a very small research field, with the exception of simple tabular outputs. This site brings together all the relevant concepts.
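As a rough illustration of output SDC for a simple tabular output, the sketch below applies a minimum cell count to a frequency table. The function name suppress_small_cells, the example categories, and the threshold of 10 are all assumptions chosen for this example; they are not rules taken from this site, and real regimes typically need further checks (for instance, whether a suppressed cell can be recovered from published totals).

```python
from typing import Dict, Optional


def suppress_small_cells(
    counts: Dict[str, int], threshold: int = 10
) -> Dict[str, Optional[int]]:
    """Replace any cell count below the threshold with None (suppressed).

    Hypothetical helper: the threshold value is an assumption, not a standard.
    """
    return {
        category: (count if count >= threshold else None)
        for category, count in counts.items()
    }


# Made-up frequency table of respondents by region.
raw_table = {"North": 25, "South": 3, "East": 10}
safe_table = suppress_small_cells(raw_table)
print(safe_table)
```

A fixed threshold like this is only one possible rule; it is shown here because it is easy to state and check.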

User input

This site is designed to be a useful reference, and so we are grateful for suggestions to improve clarity, add links, or provide more or clearer context.
