Abstract
The purpose of a data leakage prevention system is to protect corporate information assets. The system monitors the packet exchanges between internal systems and the Internet, filters packets according to the data security policy defined by each company, or discretionarily deletes important data included in packets in order to prevent leakage of corporate information. However, the problem arises that the system may monitor employees' personal information, thus allowing their privacy to be violated. Therefore, it is necessary to find not only a solution for detecting leakage of significant information, but also a way to minimize the leakage of internal users' personal information. In this paper, we propose two models for representing the level of personal information disclosure during data leakage detection. One model measures only the disclosure frequencies of keywords that are defined as personal data. These frequencies are used to indicate the privacy violation level. The other model represents the context of privacy violation using a private data matrix. Each row of the matrix represents the disclosure counts for personal data keywords in a given time period, and each column represents the disclosure count of a certain keyword during the entire observation interval. Using the suggested matrix model, we can represent an abstracted context of the privacy violation situation. Experiments on the privacy violation situation to demonstrate the usability of the suggested models are also presented.