Secure Deletion: Towards Tailor-Made Privacy in Database Systems

Alexander Grebhahn¹, Martin Schäler², Veit Köppen²

¹Brandenburg University of Applied Sciences
P.O. Box 2132, 14737 Brandenburg, Germany

²Department of Technical and Business Information Systems
Otto-von-Guericke University Magdeburg
P.O. Box 4120, 39016 Magdeburg, Germany
Abstract: To ensure a secure data life cycle, sensitive data must be deleted in a forensically secure way. Current common database systems do not provide secure deletion at all. Academic demonstrators exist that address some aspects of secure deletion, but each is limited to its particular deletion approach.

We argue that, due to different data sensitivity levels (possibly even at attribute level) and differences in policies (e.g., when and how a data item has to be deleted), a standardized, user-defined way to enforce forensically secure data deletion is necessary. Our literature analysis reveals that most approaches are based on overwriting the data. Thus, in this paper, we examine how user-defined overwriting procedures can be integrated via existing default interfaces to allow a customizable deletion process with minimal integration overhead. In particular, we propose an extension of SQL and a page propagation strategy that allow the integration of a user-defined deletion procedure.
1 Introduction

The amount of data stored in computer-aided systems increases continuously. This entails challenges with respect to the privacy of sensitive data. Laws and guidelines exist for handling private information, such as the HIPAA [Con96] or Directive 2006/24/EC of the European Parliament [Eur06]. Forensically secure data deletion is an important privacy challenge. For instance, according to the mentioned EU directive, data may only be stored for a fixed period of time; afterwards, the data have to be depersonalized or removed. Consequently, to ensure a secure data life cycle, data have to be deleted in a forensically secure way [DW10].
Additionally, storing sensitive data requires a system that supports access control, additional security mechanisms (e.g., public key infrastructures), and recovery components. Because database systems meet these requirements, they are a good choice for storing sensitive data. That is why we focus on database systems in this paper.
When using a database system, new challenges for secure deletion arise. These challenges are due to copies of the original data, for instance, in backups or in rollback segments used for transaction management, or due to implementation details of database operations. To quantify the impact of these challenges and to enable forensic investigations in database systems, several examinations of existing systems have been performed (e.g., [Fow08]), revealing that there is currently no secure deletion at all. Consequently, several approaches have been proposed in the literature; they are mainly based on overwriting the data in a predefined way that is fixed to this single approach and enforced for all data in the system (e.g., [SML07]).
However, there are use cases where it is not useful to delete all data stored in a database in a forensically secure way, where alternative overwriting patterns should be applied, or where specific data must be deleted after different retention periods. Typical reasons are different sensitivities of data, performance issues, new research results, or changing guidelines for secure deletion. As a result, it is necessary to be able to enforce user-defined, fine-grained data-privacy constraints that state the way of forensically secure deletion.

In this paper, we examine how to integrate such user-defined, fine-grained data-privacy constraints based on existing standard interfaces. Moreover, we want to provide a solution that ideally introduces no, or only limited, performance overhead.
The remainder of this paper is structured as follows: In Section 2, we present background information on deleting data from a database system. In addition, we give an overview of the different data artifacts that remain in the database system. In Section 3, we survey related work addressing forensically secure deletion from a hard disk and present results and limitations of existing prototypes. In Section 4, we present SQL extensions that enable tailor-made data privacy within a database system. Additionally, we present requirements on the propagation strategy used in a database system for forensically deleting data, and we discuss possibilities to improve the performance of this strategy. Finally, we conclude our paper and present future work.
2 Background

In the following, we list known challenges for forensically secure deletion from a database system to give an overview of the complexity of this topic. Major challenges are hidden direct copies of data items and the fact that some parts of the data are used (possibly in aggregated form) to create specific data structures within the database system, such as indexes that improve the performance of the overall system. Finally, derived from the presented challenges, we list challenges that are not in the scope of this paper.
2.1 Deletion of tuples from the database itself
By database, we mean the physical representation of the data on persistent storage (e.g., a hard disk) in the format of the database system. This is the obvious copy we have to remove.
However, in databases supporting SQL, there are several operations that, in fact, require deletion of a data item. Examples of such operations are DELETE, DROP TABLE, TRUNCATE TABLE, VACUUM, and UPDATE. In the following, we unify our explanations for DELETE, DROP TABLE, and TRUNCATE TABLE, since the last two operations enforce a deletion of all tuples equivalent to a DELETE FROM table command.
Pages in databases. Before we discuss implementation details of the individual operations, we explain pages. In databases, a page is an abstraction of a physical segment of the persistent storage and the elementary unit that can be read or written. Consequently, whenever the system needs to access a single tuple, the whole page is read. Hence, all our explanations are based on this elementary unit. In Figure 1.(a), we give an example of such a page. In our example, tuples are inserted from bottom to top.
DB Slacks: Deleting a tuple. In databases, deleting a tuple usually means setting a delete bit, as in SQL Server [Fow08], Oracle 10g [Lit07a], MySQL 5.1.32 using the InnoDB storage engine [FHMW10, SML07], or PostgreSQL [Gre12]. In Figure 1.(b), as an example, we delete Tuple 4 from Figure 1.(a), causing no change to the original tuple, but only a modification of the delete bit. Hence, with simple analysis of the database and basic knowledge of database internals, we can reconstruct the deleted data item [SML07, Gre12, FHMW10, Lit07a]. Following Stahlberg et al., we refer to these remains of a tuple as database (DB) Slack in the sequel.
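To make this effect tangible, the following Python sketch of our own uses SQLite, a system not examined in the works cited above; its secure_delete pragma toggles between the default behavior (freed cell bytes remain, comparable to the delete bit described above) and zeroing out freed content. The helper name and the payload string are ours, for illustration only.

```python
import os
import sqlite3
import tempfile

def deleted_row_bytes_remain(secure: bool) -> bool:
    """Delete a row and check whether its bytes survive in the raw file."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    con = sqlite3.connect(path)
    con.execute("PRAGMA secure_delete = " + ("ON" if secure else "OFF"))
    con.execute("CREATE TABLE t (payload TEXT)")
    con.execute("INSERT INTO t VALUES ('TOP-SECRET-PAYLOAD')")
    con.commit()
    # The WHERE clause avoids SQLite's whole-table truncate optimization.
    con.execute("DELETE FROM t WHERE payload = 'TOP-SECRET-PAYLOAD'")
    con.commit()
    con.close()
    with open(path, "rb") as f:      # inspect the raw database file
        raw = f.read()
    os.remove(path)
    return b"TOP-SECRET-PAYLOAD" in raw

print(deleted_row_bytes_remain(secure=False))  # DB-Slack-like remains
print(deleted_row_bytes_remain(secure=True))   # freed bytes zeroed out
```

With secure_delete off, the deleted payload is still readable from the raw file with a simple byte search, exactly the kind of analysis the forensic studies above describe.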
DB Slacks: Updating a tuple. When updating a tuple, two different modifications of the page the item is located on can occur [SML07]. First, the old version may be overwritten by its newer version (Figure 1.(c)). Second, the new version of the data item may be stored at another physical storage segment (Figure 1.(d)); this new location is not necessarily on the same page. As a result of the second alternative, the old version of the data item still exists after the update. Even with the first behavior, information of the old version may remain on disk if the new version of the data item needs less space than the old one. Many major database systems, such as PostgreSQL¹ and MySQL using the InnoDB storage engine, implement the second variant because they use a multi-version concurrency control protocol [SML07].
FS Slacks: The VACUUM command. The existence of a large amount of DB Slack consumes unnecessary disk space and reduces the overall performance of database systems, because deleted data items reside on pages that still have to be read and written. To overcome this challenge, there are operations like VACUUM (e.g., in PostgreSQL) that minimize the space used by the database system. When performing this operation, first, active tuples are reorganized to minimize the overall number of needed pages; second, the disk space of pages that are no longer needed is returned to the file system. Following Stahlberg et al. [SML07], we call these pages and the respective tuples file system (FS) Slack. To sum up, to delete a tuple in a forensically secure way, the database system either has to prevent or to be aware of all copies of a tuple that exist as DB Slack or FS Slack. If such copies exist, they have to be deleted, too.
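A small SQLite-based sketch (our own illustration, not one of the cited prototypes) shows both sides of this trade-off: VACUUM removes the DB Slack left by a plain DELETE, but only by rewriting the database file, so the discarded old content becomes FS Slack at the file-system level.

```python
import os
import sqlite3
import tempfile

fd, path = tempfile.mkstemp(suffix=".db")
os.close(fd)
con = sqlite3.connect(path)
con.execute("PRAGMA secure_delete = OFF")       # plain delete-bit behavior
con.execute("CREATE TABLE t (payload TEXT)")
con.execute("INSERT INTO t VALUES ('SENSITIVE-RECORD')")
con.commit()
con.execute("DELETE FROM t WHERE payload = 'SENSITIVE-RECORD'")
con.commit()

with open(path, "rb") as f:                     # DB Slack: bytes remain
    before = b"SENSITIVE-RECORD" in f.read()

con.execute("VACUUM")                           # copies only live tuples
with open(path, "rb") as f:
    after = b"SENSITIVE-RECORD" in f.read()
con.close()
os.remove(path)
print(before, after)
```

The deleted record is present in the file before VACUUM and absent afterwards; whether the freed blocks are recoverable from the file system is then outside the database's control.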
2.2 Hidden copies in database systems
Besides the database itself, the metadata of a database system and the log file have to be considered.

In general, metadata can be divided into two groups [Gre12]. The first group consists of data-independent information that is not relevant for secure deletion; examples are table schemata or integrity constraints. The second group consists of data-dependent information that is relevant for secure deletion; examples are histograms and indexes.
Histograms. In contrast to the database itself, metadata do not necessarily contain a whole copy of a data item. For example, a histogram stores information on only one column of the data items. In general, a histogram only allows one to retrieve the range in which the value of a deleted tuple was located. Nevertheless, in some database systems (e.g., SQL Server [Fow08]), the values of existing tuples are used to set the borders between histogram buckets. Thus, when deleting such a tuple, it is necessary to also modify the border of the buckets.

¹http://www.postgresql.org/docs/9.1/static/storage-vm.html
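The following toy example (our own, with hypothetical salary values) illustrates the problem for an equi-depth histogram whose bucket borders are taken verbatim from stored values: deleting the tuple alone does not remove its value from the metadata.

```python
def equi_depth_borders(values, buckets=3):
    """Take every (n/buckets)-th sorted value as an upper bucket border."""
    vs = sorted(values)
    step = len(vs) // buckets
    return [vs[i * step - 1] for i in range(1, buckets + 1)]

salaries = [1800, 2100, 3300, 5200, 5900, 7400]
borders = equi_depth_borders(salaries)  # [2100, 5200, 7400]
salaries.remove(5200)                   # "delete" the tuple ...
print(5200 in borders)                  # ... the histogram still leaks it
```

Until the histogram is rebuilt or its border adjusted, the deleted value 5200 remains readable from the statistics.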
Indexes. Moreover, index structures have to be considered, too. One of the best-known index structures is the B+-Tree. In a B+-Tree, values of tuples are used as decision criteria for searching in the tree. Hence, when deleting these values from the database, it is necessary to modify the structure of the tree as well. In the worst case, this reconstruction can require building the tree completely anew.
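As a minimal sketch (a plain dictionary standing in for a real B+-Tree, with hypothetical contents), the point is simply that an index holds copies of the indexed values as search keys, so removing the tuple alone leaves a recoverable copy:

```python
# Hypothetical table and index contents, for illustration only.
table = {1: ("alice", "secret-diagnosis")}        # row id -> tuple
index_on_diagnosis = {"secret-diagnosis": [1]}    # value  -> row ids

del table[1]                                      # tuple deleted from table
print("secret-diagnosis" in index_on_diagnosis)   # key copy still present
```

A forensically secure delete must therefore also remove the key from the index, which in a B+-Tree can trigger the structural modifications described above.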
Materialization of intermediate results. Some database systems store additional data-dependent information, for example, intermediate results stored to speed up query processing. A database system storing such information is MonetDB [IKNG09]. From this information, conclusions about the values of deleted data items can be drawn. As a result, it is necessary to consider this information when forensically deleting data items.
Database log. Finally, database logs have to be considered. Here, it is necessary to differentiate whether a logical or a physical log protocol is used. With a physical log protocol, a BEFORE image of the unmodified data item is stored for every modification. In contrast, a logical protocol stores functions that describe the modifications to be executed. Note that such functions may modify more than one item [BHG87]. As a result, different solutions have to be used for the two protocols.
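The difference can be sketched in a few lines of Python (our own simplification; names and values are hypothetical): the physical log keeps a verbatim BEFORE image that must itself be deleted securely, whereas the logical log records only the operation.

```python
row = {"id": 7, "salary": 3000}

# Physical logging: store a BEFORE image (a full copy) of the tuple.
physical_log = [("BEFORE", dict(row))]

# Logical logging: store the operation that describes the modification.
logical_log = ["UPDATE t SET salary = salary * 1.1 WHERE id = 7"]

row["salary"] = int(row["salary"] * 1.1)  # apply the update

# The old value is recoverable from the physical log, but not from
# the logical entry alone.
print(physical_log[0][1]["salary"])
```

For physical logs, the BEFORE images are additional copies to be tracked and overwritten; for logical logs, one has to reason about which items each logged function touched.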
2.3 Challenges beyond the scope of this paper
Besides the challenges we introduced so far, there are several effects that raise the difficulty of forensically secure deletion. However, these challenges are, for instance, due to operating system behavior, and thus it is hardly possible to address them by means of a database system. Although these challenges are important, they are beyond the scope of this paper:
• Swapping of main memory to persistent storage.
• Copying a page to a different location on the persistent storage device, as on SSDs [WGSS11].
• Treatment of copies in backups, for instance, stored on magnetic tapes.
• Reconstruction of the data through the use of microscopes, as in [WKSR08].

3 Analysis of related work

In this paper, we do not want to provide a new approach for deleting data in a specific database, but to abstract from current solutions in order to provide a more general approach.
Furthermore, we want to introduce a general solution that can be integrated into existing database systems via modification of default interfaces. Therefore, we analyze existing approaches to identify similarities that allow such a generalized solution, as well as to point out shortcomings of the state of the art that motivate the necessity of our new approach. We first present related work on forensically secure deletion from a hard disk in general.
Forensic secure deletion from hard disk. Deleting data in a database system means removing the data traces from persistent memory (e.g., an HDD). Thus, we have to consider related work on secure deletion from disk as well. There are approaches that physically destroy the storage medium, as well as modifications that also affect neighboring storage cells whose content shall not be removed; these approaches are infeasible for our purpose. Moreover, cryptographic approaches rely on keeping the encryption key secret, and there may be a successful attack that breaks the encryption in the near future; hence, we do not consider these approaches either. Finally, there has been a lot of research on how to overwrite data in a way that it cannot be restored (e.g., [Gut96, WKSR08, DW10]). Since this approach allows secure deletion without destroying the storage medium, and since it can be implemented by means of the database system, as shown by Stahlberg et al. [SML07] and demonstrated in our own work [Gre12], we consider it the appropriate technique for forensically secure deletion. However, there are different results on how to overwrite data, including the number of overwrites (PASSES) and the bit sequences (PATTERN) that have to be applied [DW10].
For instance, Gutmann suggests performing 35 PASSES using predefined and random PATTERN [Gut96]. In contrast, Wright et al. propose that a single overwriting pass with random data is adequate for secure deletion [WKSR08].
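A minimal sketch of such pattern-based overwriting (our own, with PASSES and PATTERN as explicit parameters; real tools must additionally cope with device-level remapping, which software alone cannot control):

```python
import os
import tempfile

def overwrite(path, offset, length, passes=3,
              patterns=(b"\x00", b"\xff", b"\xaa")):
    """Overwrite a byte range with `passes` rounds of pattern bytes."""
    with open(path, "r+b") as f:
        for p in range(passes):
            f.seek(offset)
            f.write(patterns[p % len(patterns)] * length)
            f.flush()
            os.fsync(f.fileno())  # force each pass out to the device

fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"public|SECRET|public")
overwrite(path, offset=7, length=6)   # wipe only the sensitive span
with open(path, "rb") as f:
    data = f.read()
os.remove(path)
print(b"SECRET" in data)              # the plaintext is gone
```

Overwriting a selected byte range rather than the whole file is exactly what a database-internal implementation needs: it can target the slots of deleted tuples on a page without touching the surrounding live data.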
Database-specific forensic analysis. Within the last years, a lot of work has been done to analyze which information from a database system can be used for forensic examinations.