Data Mining in the Context of Legality, Privacy, and Ethics
- July 29, 2023
- Posted by: RSIS
- Categories: Computer Science and Engineering, IJRSI
Amos Okomayin¹, Tosin Ige², Abosede Kolade³
¹Department of Computer Science, Middlesex University, London, United Kingdom
²Department of Computer Science, University of Texas at El Paso, Texas, USA
³Department of Marketing and Bus., Texas A&M University, Commerce, Texas, USA
DOI: https://doi.org/10.51244/IJRSI.2023.10702
Received: 06 June 2023; Revised: 25 June 2023; Accepted: 01 July 2023; Published: 30 July 2023
Abstract: Data mining poses a significant threat to ethics, privacy, and legality, especially because it makes it difficult for an individual (or, in the case of a company, a consumer) to control the accessibility and usage of his or her data. Individuals should be able to control how their data in the data warehouse are accessed and used, and an enabling environment should enforce legality, privacy, and ethicality on data scientists and data engineers during the data mining process. This paper reviews issues of legality, privacy, and ethicality in data mining, reviews the processes of data mining, and proposes a solution to current ethical and privacy issues in data mining. It introduces a new method of data mining that does not infringe on the privacy of the individuals or consumers whose data are being used. The sole aim of this paper is to propose a new method of mining data that restricts data scientists within the constraints of legality, privacy, and ethicality.
I. Introduction
In an ethical sense, database security is closely related to privacy, as it inhibits the unauthorized dissemination of personal data, thus enhancing, albeit indirectly, an individual’s capacity to regulate access to their data. When data can be viewed from many angles and at different levels of abstraction, the goals of protecting data security and guarding against the invasion of privacy are threatened (Ige & Adewale 2022a). It is important to study when knowledge discovery may lead to an invasion of privacy, and what security measures can be developed to prevent the disclosure of sensitive information (Chen et al. 1996). The development of data warehouses has increased the importance of database security. Before data warehouses, data were typically held in separate databases to which access was controlled and limited to people with a specific functional role. Data warehouses bring together data from multiple sources, so more complex factors must be considered when establishing security measures. In terms of database security, two forms of mining operation need to be considered:
1. Those operating as authorized applications by an individual or organization that owns and has full access to the data.
2. Those operating as unauthorized applications by an individual or organization that has access to the data only insofar as it has been permitted for other, allowable purposes.
Note that an individual need not be external to the organization that owns the data for the second case to occur. Conventional database security protects data via user authorization techniques (O’Leary 1991), making no distinction between the degrees of sensitivity present in the database (Mills 1997). A more sophisticated model, Multilevel Security (MLS), extends conventional security measures by classifying data according to its confidentiality (Elmasri & Navathe 2004, Ige & Adewale 2022b).
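To make the contrast between conventional authorization and MLS concrete, the following Python sketch attaches a confidentiality label to each record and applies a "no read up" rule (a user may read only records classified at or below their clearance), in the style of the Bell-LaPadula simple security property. The three levels, the `Record` and `User` types, the sample rows, and the `conventional_read` and `mls_read` functions are hypothetical illustrations, not part of any cited system or of the authors' method.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical confidentiality levels, ordered from least to most sensitive."""
    PUBLIC = 0
    CONFIDENTIAL = 1
    SECRET = 2


@dataclass(frozen=True)
class Record:
    customer_id: int
    attribute: str
    value: str
    classification: Level  # MLS label attached to each record


@dataclass(frozen=True)
class User:
    name: str
    authorized: bool  # conventional check: may this user query the table at all?
    clearance: Level  # MLS check: highest level this user may read


def conventional_read(user: User, table: list[Record]) -> list[Record]:
    """Conventional model: an authorized user sees every row, regardless of sensitivity."""
    return list(table) if user.authorized else []


def mls_read(user: User, table: list[Record]) -> list[Record]:
    """MLS 'no read up' rule: only rows classified at or below the user's clearance."""
    if not user.authorized:
        return []
    return [r for r in table if r.classification <= user.clearance]


# Hypothetical sample data for one customer.
TABLE = [
    Record(1, "city", "London", Level.PUBLIC),
    Record(1, "purchase_history", "books; groceries", Level.CONFIDENTIAL),
    Record(1, "health_note", "asthma", Level.SECRET),
]

analyst = User("analyst", authorized=True, clearance=Level.CONFIDENTIAL)
print([r.attribute for r in conventional_read(analyst, TABLE)])
# ['city', 'purchase_history', 'health_note']
print([r.attribute for r in mls_read(analyst, TABLE)])
# ['city', 'purchase_history']
```

Under the conventional model the authorized analyst receives the SECRET health note along with everything else; under the MLS rule the same request silently omits it, which is the distinction between degrees of sensitivity that conventional authorization does not make.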
This chapter analyses the current model for addressing privacy and ethical issues in data mining, the implementation of that model, and its limitations, which together call for a new model that addresses the problems of legality, ethics, and privacy and provides a well-secured environment for knowledge discovery in databases.
1.1 Limitations of the Existing Model
Firstly, existing data mining techniques involve exporting database data to a file in a format such as SQL, XML, or JSON; the file is then mined by querying it with a programming or query language such as Python, R, or SQL. The disadvantage of this approach is that data in such an exported file are static: they are a snapshot taken at export time, not a dynamic, real-time view of the database.
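The export-then-mine workflow and its weakness can be sketched in Python with the standard-library `sqlite3` and `json` modules. This is a minimal illustration under assumed names: the `purchases` table, its columns, and the helper functions are hypothetical, and summing purchase amounts stands in for a real mining task. A row inserted after the export is visible to the live database but not to the exported file.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def export_table(conn: sqlite3.Connection, table: str, path: Path) -> None:
    """Export step: dump every row of `table` to a static JSON file.

    `table` is interpolated into the SQL, so it must be a trusted name.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    rows = [dict(zip(columns, row)) for row in cur.fetchall()]
    path.write_text(json.dumps(rows))


def mined_total_spend(path: Path) -> float:
    """Toy 'mining' query run against the exported file, not the live database."""
    return sum(row["amount"] for row in json.loads(path.read_text()))


def live_total_spend(conn: sqlite3.Connection) -> float:
    """The same query run directly against the current state of the database."""
    return conn.execute("SELECT SUM(amount) FROM purchases").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO purchases VALUES (?, ?)", [(1, 20.0), (2, 35.5)])
conn.commit()

with tempfile.TemporaryDirectory() as tmp:
    export_path = Path(tmp) / "purchases.json"
    export_table(conn, "purchases", export_path)

    # The database keeps changing after the export has been taken.
    conn.execute("INSERT INTO purchases VALUES (3, 100.0)")
    conn.commit()

    snapshot_total = mined_total_spend(export_path)
    live_total = live_total_spend(conn)

print(snapshot_total)  # 55.5  (stale: misses the row added after the export)
print(live_total)      # 155.5 (current state of the database)
```

Querying the live connection instead of the file removes the staleness, but it also means the mining process touches the production data directly, which is where the access-control concerns discussed above come back into play.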
Let us assume that the database administrator exported a database for data mining on Sunday, November 1, 2020 at 12:00 GMT. Data mining tasks such as classification, association, clustering, regression, analysis, and prediction can take several hours to days to complete. By the time they finish, the records they were based on will no longer be up to date, because our predictions will have been derived from the file as it stood when it was exported, on Sunday, November 1, 2020 at 12:00 GMT; changes made to the database after that moment are not reflected.
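The arithmetic of this example can be made explicit with Python's `datetime` module. The export time comes from the text; the two-hour delay before mining starts, the 30-hour mining run, and the record timestamps are hypothetical values chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Export moment from the example: Sunday, November 1, 2020 at 12:00 GMT (UTC+0).
EXPORT_TIME = datetime(2020, 11, 1, 12, 0, tzinfo=timezone.utc)


def staleness(export_time: datetime, mining_start: datetime,
              mining_duration: timedelta) -> timedelta:
    """Age of the exported data at the moment the mining results become available."""
    return (mining_start + mining_duration) - export_time


def unseen_records(record_times: list[datetime], export_time: datetime) -> list[datetime]:
    """Records written after the export; the mining results cannot reflect them."""
    return [t for t in record_times if t > export_time]


# Hypothetical run: mining starts two hours after the export and takes 30 hours.
mining_start = EXPORT_TIME + timedelta(hours=2)
lag = staleness(EXPORT_TIME, mining_start, timedelta(hours=30))
print(lag)  # 1 day, 8:00:00

# Hypothetical timestamps of records written to the live database.
records = [
    datetime(2020, 11, 1, 9, 30, tzinfo=timezone.utc),
    datetime(2020, 11, 1, 15, 45, tzinfo=timezone.utc),
    datetime(2020, 11, 2, 8, 10, tzinfo=timezone.utc),
]
print(len(unseen_records(records, EXPORT_TIME)))  # 2
```

In this hypothetical run, the results describe a database that is already 32 hours old when they are delivered, and two of the three records written that weekend never reach the model.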