Towards Privacy-Preserving Data Mining in Law Enforcement
| Author | Stijn Vanderlooy, Joop Verbeek, and Jaap van den Herik |
| Position | Maastricht ICT Competence Centre - Institute for Knowledge and Agent Technology Maastricht University, Maastricht, The Netherlands |
| Pages | 202-210 |
Recent advances in information and communication technology have made data easy to use and cheap to store and exchange. Databases across the world contain large amounts of data of various types, such as digitized text documents, video and audio files, and financial transactions. The so-called data explosion is also apparent in the domain of law enforcement. After new knowledge has been extracted from data, an information position can be established by which a more effective and efficient execution of law enforcement is possible than before. The new form of law enforcement guided by means of data analysis is known as intelligence-led policing (Cope, 2004; Tilley, 2005).
It is non-trivial to extract knowledge from data. Yet, many believe that data mining will enable us to obtain new knowledge in a reasonable amount of time, so that law enforcement can be adequately executed on a tactical, strategic, and operational level (Yen & Popp, 2005). Obviously, data mining provides a variety of tools designed to analyze the available data automatically, i.e., by computer. However, we argue that a serious problem arises when data mining is applied in law enforcement: the output of data mining is not reliable in the sense that mistakes are made. These mistakes may have serious consequences, e.g., violations of privacy when personal data are involved. Clearly, the legal consequences of mistakes are severe and therefore not affordable. To make the problem even worse, it is unknown how many mistakes will be made before the data mining is actually applied. Since the output of data mining cannot be shown to be reliable, there is no ground for making legally correct decisions (Groothuis & Svensson, 2000). It follows that it is important to consider the reliability of the data-mining output in legislation.
The remainder of this paper is organized as follows. In Section 2 we discuss the necessity of knowledge in law enforcement and we introduce two types of investigation that we use throughout the paper. In Section 3 we focus on data mining and we comment on the problem of inevitable mistakes. In Section 4 we review a recently introduced approach for data mining, called the ROC isometrics approach, that is proven to produce reliable outputs in the sense that we can set the number of mistakes before the data mining is actually applied (Vanderlooy et al., 2006). In Section 5 we use a juridical framework to discuss the implications of the approach. Consequently, we provide several recommendations for upcoming legislation that tries to reach a balance between new possibilities of automatic data analysis and the protection of civilians' privacy. In Section 6 we give our conclusions.
Law-enforcement agencies store large amounts of data that need to be analyzed in order to find previously unknown and relevant knowledge. This knowledge is used to establish and maintain an information position. However, obvious legal questions arise concerning (1) which data may be analyzed, (2) in which situations, and (3) for which goals.
In Subsection 2.1 we show that knowledge is a necessity for law enforcement in order to fulfil its tasks appropriately. In Subsection 2.2 we distinguish two types of investigation by means of data analysis. We provide a systematic comparison of the investigation types, relate them to different law-enforcement tasks, and focus on some legal questions that will naturally arise.
The main task of law enforcement is to secure legal order and to provide assistance to civilians in need (Shavell, 2003; Newburn, 2005). Securing legal order implies two subtasks. First, public safety should be established and maintained to reduce the growing unsafe feelings of civilians. Second, crimes should be tracked down and the offenders should subsequently be prosecuted. Executing the second subtask significantly contributes to executing the first. For example, terrorism aims at causing fear and disruption, and therefore unravelling upcoming terrorist plots strongly increases public safety.
One of the most important activities of law-enforcement agencies is the investigation of suspicions and clues about persons and events. Investigation can result in the prevention of crime in two distinct ways. First, the suspicions and/or clues are correct and the law-enforcement agency responds sufficiently fast to stop the crime before it is committed. Second, the investigation does not lead to an early termination of the specific crime, but it results in new knowledge that is used to manage activities more efficiently and effectively than was previously possible. The latter is called proactive investigation and assumes that sufficient data are available to be analyzed (Thibault et al., 2006). Analysis has to be done automatically due to the large amounts of data. The obtained knowledge is necessary to establish and maintain a good information position. This is the motivation behind intelligence-led policing: a good information position enables law enforcement to prevent crimes and reduce risks of potential dangers (Cope, 2004; Tilley, 2005).
Knowledge extraction by means of automatic data analysis can be performed (1) in various ways, (2) using various types of data, and (3) for various investigation goals. With respect to the investigation goals, we distinguish between investigation aimed to obtain knowledge for solving specific cases and investigation aimed to obtain any knowledge that leads to new investigations or contributes to the execution of current specific investigations. We call the former goal-oriented investigation and the latter global-oriented investigation. Table 1 summarizes the differences between both types of investigation.
[TABLE NOT INCLUDED]
A goal-oriented investigation takes place when there are strong suspicions or clues concerning a specific event, person, or small groups of events and/or persons. Hence, this type of investigation has a temporary character. It aims at improving a specific information position and it is clearly most useful on the operational level of law enforcement. An example is the investigation of a murder case. The available data to be analyzed naturally consist of relevant facts for the specific investigation and therefore disproportional privacy violations are not likely to occur. This is in contrast to the global-oriented investigation where such privacy violations are likely to occur. The global-oriented investigation has a permanent character since it aims at improving and maintaining a general information position. The investigation is not oriented towards a specific case or person, and therefore there are no strong suspicions and clues that can be verified. Consequently, the available data to be analyzed contain facts about persons and events that are not related to known crimes. For example, assume for a moment that the global-oriented investigation resulted in knowledge about an organized network of human smugglers. Persons involved in such networks operate across the borders of nations and take advantage of economic corruptibility and conflicts in certain regions. Almost no reports about human smuggling are filed with law-enforcement agencies due to the secrecy of human smuggling and the intimidation of victims (Dutch Upper Chamber, 2007c). So, specific data are not available and knowledge about human smuggling networks should therefore be extracted by analyzing a wider range of data. Care is nonetheless necessary since the legal question arises whether it is allowed to analyze the data for general law-enforcement purposes.
This is a difficult but important question to answer since without global-oriented investigation it is virtually impossible to establish and maintain a good information position. Finally, we note that the results of a global-oriented investigation can lead to a goal-oriented investigation.
It follows that it is challenging to protect civilians' right to privacy, in particular if a global-oriented investigation takes place. Hence, legislation should find the ideal balance between (1) an effective automatic data analysis and (2) the protection of the privacy of civilians. In the next section, we focus on data mining to analyze data and emphasize that the reliability of the data-mining output is most important to consider in legislation.
The collection of data and the extraction of new knowledge from data is a significant economic and political activity. The...