• Title/Summary/Keyword: Data De-Identification

A Study on the de-identification of Personal Information of Hotel Users (호텔 이용 고객의 개인정보 비식별화 방안에 관한 연구)

  • Kim, Taekyung
    • Journal of Korea Society of Digital Industry and Information Management / v.12 no.4 / pp.51-58 / 2016
  • In the hotel and tourism sector, a wide range of research draws on big data. Big data is generated continuously by the digital devices around us, and every digital process and social media exchange adds to it. In this paper, we analyze de-identification methods for big data so that the personal information of hotel guests can be used. By analyzing such data, hotels can offer differentiated and diverse services to guests, improve service quality, and support their marketing. A hotel that wants to use guest information must first de-identify the private data. Methods for de-identifying personal information include pseudonymisation, aggregation, data reduction, data suppression, and data masking. A comparison of these methods identifies pseudonymisation as the most suitable for analyzing hotel guest information. Among the pseudonymisation methods examined, t-closeness was judged to be a secure and efficient way to de-identify personal information in hotels.
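
Three of the methods listed in this abstract (pseudonymisation, data masking, and data suppression) can be illustrated on a single guest record. The following is a minimal sketch, not the paper's procedure: the field names, the sample record, and the key are hypothetical, and HMAC-SHA256 is only one possible way to derive a pseudonym.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be stored apart from the data.
SECRET_KEY = b"example-key-not-for-production"


def pseudonymise(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256), truncated to 16 hex chars."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def mask(value: str, visible: int = 4) -> str:
    """Data masking: hide all but the last `visible` characters."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def deidentify_guest(record: dict) -> dict:
    """Apply pseudonymisation, masking, and suppression to one guest record."""
    return {
        "guest_id": pseudonymise(record["name"]),  # pseudonymisation
        "phone": mask(record["phone"]),            # data masking
        "nights": record["nights"],                # non-identifying field kept as-is
        # The passport number is not copied over at all: data suppression.
    }


# Hypothetical guest record, for illustration only.
guest = {"name": "Hong Gildong", "phone": "01012345678",
         "passport": "M12345678", "nights": 3}
result = deidentify_guest(guest)
print(result["phone"])  # *******5678
```

Because the pseudonym is a keyed hash, the same guest always maps to the same `guest_id`, so stays can still be linked for analysis without exposing the name.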

A Study on De-Identification of Metering Data for Smart Grid Personal Security in Cloud Environment

  • Lee, Donghyeok;Park, Namje
    • Journal of Multimedia Information System / v.4 no.4 / pp.263-270 / 2017
  • The smart grid environment faces various security threats because information and communication technology is grafted onto the existing power grid. In particular, smart metering data reveals information such as users' life patterns and the devices in use, which can lead to serious infringement of personal information. A de-identification algorithm suited to metering data is therefore needed, and this paper proposes one. The proposed method de-identifies time information and numerical information separately so that pattern information cannot be inferred from the data. It also has the advantage that queries such as direct range searches and aggregation can be run in a database while the data remains de-identified, which preserves its availability for statistical processing.

De-identification Techniques for Big Data and Issues (빅데이타 비식별화 기술과 이슈)

  • Woo, SungHee
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.05a / pp.750-753 / 2017
  • Recently, the processing and use of big data, generated by the spread of smartphones, SNS, and the Internet of Things, has been emerging as a new growth engine for the ICT field. To use such big data, however, personal information must first be de-identified. De-identification removes identifying information from a data set so that individual records cannot be linked to specific individuals. It can reduce the privacy risk associated with collecting, processing, archiving, distributing, or publishing information, and thus attempts to balance the conflicting goals of using and sharing personal information and protecting privacy. De-identified information has been re-identified in some cases, which has made it controversial as a means of protecting personal information, yet the number of cases in which personal information in big data is de-identified and processed keeps increasing. Many de-identification guidelines have also been introduced, and methods for de-identifying personal information have been proposed. In this study, we describe the big data de-identification process and its follow-up management, and then compare and analyze de-identification methods. Finally, we present personal information protection issues and solutions.

Research on the development of automated tools to de-identify personal information of data for AI learning - Based on video data - (인공지능 학습용 데이터의 개인정보 비식별화 자동화 도구 개발 연구 - 영상데이터기반 -)

  • Hyunju Lee;Seungyeob Lee;Byunghoon Jeon
    • Journal of Platform Technology / v.11 no.3 / pp.56-67 / 2023
  • Regulation of the de-identification of personal information, a long-standing wish of the data-based industry, was revised and made specific in August 2020. This laid the foundation for putting data, called the crude oil[2] of the fourth industrial era, to use in industry. However, some people are concerned about infringement of the basic rights of the data subject[3]. Accordingly, we conducted a development study on the Batch De-Identification Tool, an automated tool for de-identifying personal information. First, we developed an image labeling tool to label human faces (eyes, nose, mouth) and car license plates at various resolutions in order to build training data. Second, we trained an object recognition model and ran the object recognition module to de-identify personal information. The automated de-identification tool developed in this research shows that privacy violations can be proactively eliminated through online services. These results suggest ways for data-based industries to maximize the value of data while balancing privacy and utilization.

Analysis of k Value from k-anonymity Model Based on Re-identification Time (재식별 시간에 기반한 k-익명성 프라이버시 모델에서의 k값에 대한 연구)

  • Kim, Chaewoon;Oh, Junhyoung;Lee, Kyungho
    • The Journal of Bigdata / v.5 no.2 / pp.43-52 / 2020
  • With the development of data technology, the storing and sharing of data has increased, resulting in privacy invasion. De-identification technology was introduced to solve this problem, but it has been shown many times that individuals can be identified from de-identified data. Even if complete safety is unattainable, sufficient de-identification is necessary. Current laws and regulations, however, do not specify quantitatively how much de-identification should be performed. In this paper, we propose an appropriate de-identification criterion that considers the time required for re-identification. We focus on the k-anonymity model among the various privacy models and analyze how the time taken to re-identify data changes with the k value, using a re-identification method based on linkability. From this analysis, we determine which k value is appropriate. If a generalized model can be developed from the results of this paper, it can be used to define the appropriate level of de-identification in various laws and regulations.
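
For reference, the k value in k-anonymity is the size of the smallest group of records that share the same quasi-identifier values. The following minimal sketch computes it before and after generalizing one attribute; the table, the column names, and the 10-year age bands are hypothetical, and the sketch does not model the re-identification time that the paper measures.

```python
from collections import Counter


def k_of(records, quasi_identifiers):
    """Return the k for which the table is k-anonymous:
    the size of the smallest equivalence class over the quasi-identifiers."""
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(classes.values()) if classes else 0


def generalise_age(record, width=10):
    """Generalization: replace an exact age with a band such as '20-29'."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Hypothetical table; 'age' and 'zip' are treated as quasi-identifiers.
rows = [
    {"age": 23, "zip": "130", "diagnosis": "flu"},
    {"age": 27, "zip": "130", "diagnosis": "cold"},
    {"age": 31, "zip": "130", "diagnosis": "flu"},
    {"age": 38, "zip": "130", "diagnosis": "asthma"},
]

k_raw = k_of(rows, ["age", "zip"])                               # each exact age is unique
k_gen = k_of([generalise_age(r) for r in rows], ["age", "zip"])  # two records per age band
print(k_raw, k_gen)  # 1 2
```

A larger k means each record hides among more look-alikes, at the cost of coarser data, which is the trade-off a quantitative criterion has to settle.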

Re-defining Named Entity Type for Personal Information De-identification and A Generation method of Training Data (개인정보 비식별화를 위한 개체명 유형 재정의와 학습데이터 생성 방법)

  • Choi, Jae-hoon;Cho, Sang-hyun;Kim, Min-ho;Kwon, Hyuk-chul
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.05a / pp.206-208 / 2022
  • As the big data industry has grown significantly in recent years, interest in privacy violations caused by personal information leakage has increased. There have been attempts to automate de-identification through named entity recognition in natural language processing. In this paper, named entity recognition data is constructed semi-automatically by identifying sentences in Korean Wikipedia that contain information subject to de-identification. Compared with general named entity recognition data, this can reduce the cost of learning about information that is not subject to de-identification. It also has the advantage of minimizing the additional rule-based and statistical systems needed to classify de-identification information in the output. The named entity recognition data proposed in this paper is classified into twelve categories, which include de-identification information such as medical records and family relationships. In the experiment using the generated dataset, KoELECTRA showed a performance of 0.87796 and RoBERTa 0.88.
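
The idea of tagging identifying spans in text and replacing them with a category label can be sketched as follows. This is an illustration under stated assumptions, not the paper's approach: two hypothetical regular expressions stand in for the trained named entity recognition model, and the labels `EMAIL` and `PHONE` are not taken from the paper's twelve categories.

```python
import re

# Hypothetical patterns standing in for a trained NER model.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{2,3}-\d{3,4}-\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace every matched span with its category label in square brackets."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


sample = "Contact Kim at kim@example.com or 010-1234-5678."
redacted = redact(sample)
print(redacted)  # Contact Kim at [EMAIL] or [PHONE].
```

The name "Kim" passes through untouched, which shows the limit of fixed patterns: names, medical details, and family relationships have no regular surface form, so recognizing them calls for a learned model.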

Data Quality Measurement on a De-identified Data Set Based on Statistical Modeling (통계모형의 정확도에 기반한 비식별화 데이터의 품질 측정)

  • Chun, Heuiju;Yi, Hyun Jee;Yeon, Kyupil;Kim, Dongrae
    • The Journal of the Korea Contents Association / v.19 no.5 / pp.553-561 / 2019
  • This study examines a method for measuring the quality, in the sense of statistical usefulness, of de-identified data in terms of the prediction accuracy of statistical models. In the era of the 4th industrial revolution, effective use of big data is essential to innovation through information and communication technology, but personal information issues constrain the active use of big data. To address this problem, de-identification guidelines have been established, and the use of various de-identification methods has made actual re-identification of personal information very unlikely. On the other hand, strong de-identification can have the side effect of degrading the usefulness of the data. We study the statistical usefulness of data de-identified with the KLT model, a representative de-identification method, through a case study of how de-identification degrades the accuracy of statistical prediction. We also propose a new measure of the usefulness of de-identified data that quantifies how much data must be added to the de-identified data to restore the accuracy of the predictive model.

A Study on De-Identification Methods to Create a Basis for Safety Report Text Mining Analysis (항공안전 보고 데이터 텍스트 분석 기반 조성을 위한 비식별 처리 기술 적용 연구)

  • Hwang, Do-bin;Kim, Young-gon;Sim, Yeong-min
    • Journal of the Korean Society for Aviation and Aeronautics / v.29 no.4 / pp.160-165 / 2021
  • Identifying and analyzing potential aviation safety hazards requires analysis of aviation safety report data first. Therefore, in consideration of the provisions of the Aviation Safety Act and the recommendations of ICAO Doc 9859 SMM, 4th Edition, this study identifies the scope of de-identification targets, covering personal information in the report data and sensitive information about the reporter, and suggests a method for applying de-identification processing technology to personal and sensitive information, including unstructured text data.

De-identification Policy Comparison and Activation Plan for Big Data Industry (비식별화 정책 비교 및 빅데이터 산업 활성화 방안)

  • Lee, So-Jin;Jin, Chae-Eun;Jeon, Min-Ji;Lee, Jo-Eun;Kim, Su-Jeong;Lee, Sang-Hyun
    • The Journal of the Convergence on Culture Technology / v.2 no.4 / pp.71-76 / 2016
  • This study compares the de-identification policies of the US, the UK, Japan, China, and Korea to suggest a future direction for de-identification regulations and a way to vitalize the big data industry. Efficient use of de-identification technology and adequacy evaluation standards lets industry use personal information to develop services and technology without violating the right to privacy and while staying within the restrictions of the Personal Information Protection Act. As a side effect, re-identification may occur, in which individuals are re-identified from a collection of de-identified data. From a business perspective, some regulations need to be relaxed so that big data can be used; from an information security perspective, security needs to be strengthened and regulations refined.

De-identifying Unstructured Medical Text and Attribute-based Utility Measurement (의료 비정형 텍스트 비식별화 및 속성기반 유용도 측정 기법)

  • Ro, Gun;Chun, Jonghoon
    • The Journal of Society for e-Business Studies / v.24 no.1 / pp.121-137 / 2019
  • De-identification removes personal information from a data set so that the remaining information cannot be traced to a specific individual. It thereby lowers the risk of exposing personal information in the course of collecting, processing, storing, and distributing information. Although there have been many studies on de-identification algorithms, protection models, and related topics, most are limited to structured data, and relatively few consider the de-identification of unstructured data. In the medical field especially, where unstructured text is used frequently, many people simply remove all personally identifiable information to lower the exposure risk, accepting that data utility is lowered accordingly. This study proposes a new method that performs de-identification by applying the k-anonymity protection model to unstructured text in the medical field, where de-identification is mandatory because privacy protection is more critical than in other fields. It also proposes a new utility metric so that people can grasp the utility of a de-identified data set intuitively. If the results of this research are applied to the various industrial fields that use unstructured text, we expect the utility of unstructured text containing personal information to increase.