• Title/Summary/Keyword: Time Mining (시간 마이닝)

A study on NLP Text Preprocessing for digital forensic investigation (디지털 포렌식 조사를 위한 NLP의 텍스트 전처리 연구)

  • Lee, Sung-won;Kim, Dohyun
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.05a / pp.189-191 / 2022
  • In modern society, messenger services are essential for communicating with others, and criminals are no exception. In representative cases such as Burning Sun Gate (2018) and NthRoom (2019), analysis of messenger data provided the smoking gun that helped solve the crimes. Messenger text analytics is therefore critical for resolving crimes in the modern environment. In addition, analyzing messenger data takes a great deal of time in the digital forensic investigation process, so text-mining research needs to become more effective to keep pace with the current situation. In this paper, we study various natural language processing (NLP) preprocessing methods suited to the characteristics of instant messages, so that NLP analysis of instant-messenger data can proceed effectively.
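The abstract does not list the specific preprocessing methods studied. As a hedged illustration only, the sketch below shows a few steps commonly applied to chat messages before NLP analysis: Unicode normalization, replacing URLs with a placeholder token, and collapsing elongated character runs such as "ㅋㅋㅋㅋ" or "!!!!". The function name, the `<URL>` token, and the `max_repeat` parameter are hypothetical choices, not taken from the paper.

```python
import re
import unicodedata

# Hypothetical normalization rules for chat text (not from the paper)
URL_RE = re.compile(r"https?://\S+")
SPACE_RE = re.compile(r"\s+")


def preprocess_message(text: str, max_repeat: int = 2) -> str:
    """Normalize one instant message for downstream NLP analysis."""
    # NFC composes decomposed Hangul jamo sequences into precomposed syllables
    text = unicodedata.normalize("NFC", text)
    # Replace links with a placeholder token so they do not inflate the vocabulary
    text = URL_RE.sub(" <URL> ", text)
    # Collapse runs longer than max_repeat (e.g. "ㅋㅋㅋㅋㅋ", "!!!!!") to max_repeat chars
    run = re.compile(r"(.)\1{%d,}" % max_repeat)
    text = run.sub(lambda m: m.group(1) * max_repeat, text)
    # Collapse all whitespace to single spaces and trim the ends
    return SPACE_RE.sub(" ", text).strip()


print(preprocess_message("ㅋㅋㅋㅋㅋ 진짜?!!!! https://example.com/a"))  # -> ㅋㅋ 진짜?!! <URL>
```

Normalizing elongations to a fixed length keeps the emphasis signal while preventing every spelling variant from becoming a separate token.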

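Automatic Recommendation on (IP)TV Program schedules in a personalized way using sequential pattern mining: a schedule recommendation system based on pattern extraction that considers quantitative program viewing-time information and on extraction of personal preference information (순차 패턴 마이닝 기법을 이용한 개인 맞춤형 (IP)TV 프로그램 스케줄 자동 추천 -프로그램 시청 시간의 정량적 정보를 고려한 패턴 추출 및 개인 선호도 정보 추출을 통한 스케줄 추천 시스템-)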

  • Pyo, Shin-Jee;Kim, Eun-Hui;Kim, Mun-Churl
    • Proceedings of the HCI Society of Korea Conference (한국HCI학회:학술대회논문집) / 2009.02a / pp.105-110 / 2009
  • The conventional TV viewing environment provided a limited number of channels and contents, so users accessed content by manually changing TV channels and manually selecting TV programs. However, with the advent of IPTV and the wide variety of contents and channels available on users' terminals, an excessive number of TV contents has become available, leading to a completely different TV viewing environment. In this environment, users must make considerable effort to choose their preferred TV channels or programs, which is cumbersome. Therefore, in this paper, we propose a TV content schedule recommendation method that infers users' TV viewing patterns from their viewing history using sequential pattern mining, thereby increasing users' access to the many available TV programs, whether or not they are already aware of them.
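A minimal sketch of the underlying idea, assuming each day's viewing history is an ordered list of (program, minutes watched) pairs. Views shorter than a threshold are discarded as channel surfing (a hypothetical way of using the quantitative viewing-time information the Korean subtitle mentions), and ordered program pairs are counted once per day. This is a simplified length-2 sequential pattern count, not the authors' algorithm or a full miner such as PrefixSpan; `min_minutes` and `min_support` are illustrative parameters.

```python
from collections import Counter
from itertools import combinations


def frequent_viewing_pairs(histories, min_minutes=10, min_support=2):
    """Ordered program pairs (a watched before b) that occur on >= min_support days.

    histories: one list per day of (program, minutes_watched) tuples in viewing order.
    """
    support = Counter()
    for day in histories:
        # Hypothetical filter: very short views are treated as channel surfing
        programs = [p for p, minutes in day if minutes >= min_minutes]
        # combinations() keeps list order, so (a, b) means a was watched before b;
        # the set makes each pair count at most once per day
        support.update({(a, b) for a, b in combinations(programs, 2) if a != b})
    return {pair: n for pair, n in support.items() if n >= min_support}


history = [
    [("news", 30), ("ad", 2), ("drama", 60)],
    [("news", 25), ("drama", 50), ("movie", 5)],
    [("drama", 40), ("news", 20)],
]
print(frequent_viewing_pairs(history))  # -> {('news', 'drama'): 2}
```

A recommender could then suggest "drama" after a user starts watching "news", since that ordering recurs across days.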

Fast K-Means Clustering Algorithm using Prediction Data (예측 데이터를 이용한 빠른 K-Means 알고리즘)

  • Jee, Tae-Chang;Lee, Hyun-Jin;Lee, Yill-Byung
    • The Journal of the Korea Contents Association / v.9 no.1 / pp.106-114 / 2009
  • In this paper, we propose a fast method for the K-Means clustering algorithm. Its main characteristic is that it speeds up the algorithm by using precalculated information about which data are likely to change clusters. At each stage, when distances to the cluster centres are calculated to assign each datum to its nearest prototype, overall computation time is reduced by selecting only those data whose cluster is likely to change. The distance information produced by the K-Means algorithm itself is used to predict which input data may change clusters, which reduces calculation time and makes the algorithm less sensitive to the number of dimensions. The proposed method was compared with the original K-Means method (Lloyd's) and the improved method KMHybrid. We show that the proposed method is significantly faster than both Lloyd's and KMHybrid on large datasets, that is, data with many points, many dimensions, and a large number of clusters.
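The abstract does not give the authors' prediction rule, so the sketch below swaps in a well-known related idea: Hamerly-style distance bounds. Each point keeps an upper bound on the distance to its assigned centre and a lower bound on the distance to every other centre; after the centres move, the triangle inequality loosens both bounds by the centre shifts, and exact distances are recomputed only for points whose bounds no longer rule out a change. Because skipped points provably keep their assignment, the result matches plain Lloyd iterations from the same starting centres. It assumes NumPy and at least two clusters; the function name is hypothetical.

```python
import numpy as np


def kmeans_with_skipping(X, centers, n_iter=20):
    """Lloyd's k-means that recomputes distances only for points that might switch.

    Returns (labels, centers, number_of_point_recomputations). Needs k >= 2.
    """
    X = np.asarray(X, dtype=float)
    centers = np.array(centers, dtype=float)
    k = len(centers)
    # Initial exact assignment with nearest and second-nearest distances
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    nearest = np.sort(d, axis=1)
    upper, lower = nearest[:, 0].copy(), nearest[:, 1].copy()
    recomputed = 0
    for _ in range(n_iter):
        # Move each centre to the mean of its members (empty clusters stay put)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        shift = np.linalg.norm(new_centers - centers, axis=1)
        centers = new_centers
        if not shift.any():
            break  # converged: no centre moved
        # Triangle inequality: the assigned centre is at most `shift` farther away,
        # and every other centre is at most max(shift) closer
        upper += shift[labels]
        lower -= shift.max()
        at_risk = np.flatnonzero(upper > lower)
        if at_risk.size:
            # Only these points might change cluster, so only they get exact distances
            d = np.linalg.norm(X[at_risk][:, None, :] - centers[None, :, :], axis=2)
            labels[at_risk] = d.argmin(axis=1)
            nearest = np.sort(d, axis=1)
            upper[at_risk], lower[at_risk] = nearest[:, 0], nearest[:, 1]
            recomputed += at_risk.size
    return labels, centers, recomputed
```

The `recomputed` counter makes the saving visible: on well-separated clusters it stays at zero, while plain Lloyd recomputes every point-to-centre distance in every iteration.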

Location Generalization of Moving Objects for the Extraction of Significant Patterns (의미 패턴 추출을 위한 이동 객체의 위치 일반화)

  • Lee, Yon-Sik;Ko, Hyun
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.1 / pp.451-458 / 2011
  • To provide optimal location-based services, such as optimal moving-path search or scheduling-pattern prediction, it is essential to extract significant moving patterns that take into account the temporal and spatial properties of the location history of moving objects. In this paper, to extract significant moving patterns, we propose a location generalization method that translates the location attributes of moving objects into spatial-scope information based on an $R^*$-tree, both to pattern the continuous changes in the objects' locations more efficiently and to index the 2-dimensional spatial scopes. Using the generalized spatial data, the proposed method generates moving sequences that satisfy the constraints on the time interval between spatial scopes, and extracts significant moving patterns from them. The method can therefore serve as an efficient basis for temporal pattern mining and for analyzing the moving transitions of objects in order to provide optimal location-based services.
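A hedged sketch of the generalize-then-sequence step. It assumes a fixed square grid as the spatial scope, standing in for the $R^*$-tree-based scopes the paper uses, and interprets the time-interval constraint as a maximum gap between consecutive observations; `cell_size`, `max_gap`, and both function names are hypothetical.

```python
def to_scope(x, y, cell_size):
    """Generalize a raw location to a spatial-scope id (here: a grid cell index)."""
    return (int(x // cell_size), int(y // cell_size))


def moving_sequences(track, cell_size=100.0, max_gap=600.0):
    """Split a time-ordered track [(t, x, y), ...] into sequences of spatial scopes.

    Repeated observations inside the same scope are merged, and a new sequence is
    started whenever two consecutive observations are more than max_gap apart.
    """
    sequences, current, last_t = [], [], None
    for t, x, y in track:
        scope = to_scope(x, y, cell_size)
        if current and t - last_t > max_gap:
            sequences.append(current)  # time-interval constraint violated
            current = []
        if not current or current[-1] != scope:
            current.append(scope)
        last_t = t
    if current:
        sequences.append(current)
    return sequences


track = [(0, 10, 10), (60, 20, 20), (120, 150, 20), (180, 250, 30), (2000, 260, 40), (2060, 30, 30)]
print(moving_sequences(track))  # -> [[(0, 0), (1, 0), (2, 0)], [(2, 0), (0, 0)]]
```

Working on scope ids instead of raw coordinates is what lets many slightly different trajectories collapse into the same sequence, so that frequent patterns can emerge.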

A Technique for Detecting Companion Groups from Trajectory Data Streams (궤적 데이터 스트림에서 동반 그룹 탐색 기법)

  • Kang, Suhyun;Lee, Ki Yong
    • KIPS Transactions on Software and Data Engineering / v.8 no.12 / pp.473-482 / 2019
  • There have already been studies analyzing the trajectories of objects from data streams of moving objects. Among them are studies that discover groups of objects moving together, called companion groups. Most of these studies use existing clustering techniques to find groups of objects that are close to each other. However, such clustering-based methods often fail to find the right companion groups, because the number of clusters cannot be predicted in advance and the shape or size of clusters is hard to control. In this study, we propose a new method that discovers companion groups based on a user-specified distance. Rather than applying existing clustering techniques, the proposed method periodically determines the groups of objects close to each other, using a technique that efficiently finds groups of objects lying within the user-specified distance. Furthermore, unlike existing methods that return only companion groups and their trajectories, the proposed method also returns the times at which each group appears and disappears. Through various experiments, we show that the proposed method detects companion groups correctly and very efficiently.
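A minimal sketch of distance-based companion-group detection, assuming each snapshot maps object ids to (x, y) positions. Here a "group" is a connected component of the graph linking objects within the user-specified distance `eps` (the paper's exact group definition may differ), found with union-find rather than clustering; a simple O(n^2) pairwise check stands in for the paper's efficient search. A group is reported with its appearance and last-seen times if its exact membership persists for at least `min_duration` snapshots. All names and parameters are illustrative.

```python
import math
from itertools import combinations


def groups_at(snapshot, eps, min_size=2):
    """Connected groups of objects linked by distance <= eps in one snapshot {id: (x, y)}."""
    parent = {obj: obj for obj in snapshot}

    def find(obj):
        while parent[obj] != obj:
            parent[obj] = parent[parent[obj]]  # path halving
            obj = parent[obj]
        return obj

    for a, b in combinations(snapshot, 2):
        if math.dist(snapshot[a], snapshot[b]) <= eps:
            parent[find(a)] = find(b)
    members = {}
    for obj in snapshot:
        members.setdefault(find(obj), set()).add(obj)
    return {frozenset(m) for m in members.values() if len(m) >= min_size}


def companion_groups(stream, eps, min_duration=2):
    """Return (members, appear_time, last_seen_time) for groups lasting >= min_duration snapshots."""
    active = {}  # group -> (appear_time, last_seen_time, snapshots_seen)
    found = []

    def close(group):
        start, last, seen = active.pop(group)
        if seen >= min_duration:
            found.append((group, start, last))

    for t, snapshot in stream:
        current = groups_at(snapshot, eps)
        for group in [g for g in active if g not in current]:
            close(group)  # the group disappeared in this snapshot
        for group in current:
            start, _, seen = active.get(group, (t, t, 0))
            active[group] = (start, t, seen + 1)
    for group in list(active):
        close(group)  # stream ended while the group was still together
    return found
```

Because grouping depends only on `eps`, no cluster count has to be guessed in advance, which is the limitation of clustering-based approaches that the abstract points out.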

Iceberg Query Evaluation Technique Using a Cuboid Prefix Tree (큐보이드 전위트리를 이용한 빙산질의 처리)

  • Han, Sang-Gil;Yang, Woo-Sock;Lee, Won-Suk
    • Journal of KIISE:Databases / v.36 no.3 / pp.226-234 / 2009
  • A data stream is a massive, unbounded sequence of data elements generated continuously at a rapid rate. Because of these characteristics, it is impossible to store all the data elements of a data stream, so a new synopsis structure is needed to store summary information about the stream. For this purpose, this paper proposes a cuboid prefix tree that can be employed effectively to evaluate iceberg queries over data streams. A cuboid prefix tree stores only those itemsets that consist of the grouping attributes used in GROUP BY queries. In addition, a cuboid prefix tree can compute multiple iceberg queries simultaneously by sharing their common sub-expressions. A series of experiments verifies that a cuboid prefix tree evaluates iceberg queries over an unbounded data stream while efficiently reducing memory usage and processing time.
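A hedged sketch of how a prefix tree can answer several GROUP BY count queries with shared prefixes. It assumes each query is a list of grouping attributes, sorted into one global attribute order so that, for example, GROUP BY region and GROUP BY region, product share the `region` nodes; each shared node is incremented once per tuple. The class layout, attribute names, and the pruning rule (a node's count bounds its descendants' counts) are assumptions for illustration, not the paper's exact structure.

```python
class _Node:
    __slots__ = ("count", "children")

    def __init__(self):
        self.count = 0
        self.children = {}  # (attribute, value) -> _Node


class CuboidPrefixTree:
    def __init__(self, queries, attribute_order):
        self.rank = {a: i for i, a in enumerate(attribute_order)}
        self.queries = [self._canon(q) for q in queries]
        self.root = _Node()

    def _canon(self, attrs):
        # A fixed global attribute order lets GROUP BY lists share prefixes
        return tuple(sorted(attrs, key=self.rank.__getitem__))

    def insert(self, row):
        """Add one stream tuple (a dict) to every registered GROUP BY path."""
        touched = set()
        for attrs in self.queries:
            node = self.root
            for a in attrs:
                node = node.children.setdefault((a, row[a]), _Node())
                if node not in touched:  # a shared node counts each tuple once
                    touched.add(node)
                    node.count += 1

    def iceberg(self, group_by, min_count):
        """Groups of a registered GROUP BY (or a prefix of one) with count >= min_count."""
        attrs = self._canon(group_by)
        result = {}

        def walk(node, depth, key):
            if depth == len(attrs):
                result[key] = node.count
                return
            for (a, v), child in node.children.items():
                # A child's count bounds all its descendants' counts, so prune early
                if a == attrs[depth] and child.count >= min_count:
                    walk(child, depth + 1, key + (v,))

        walk(self.root, 0, ())
        return result
```

Counting only the grouping-attribute paths that some query needs, rather than every itemset, is what keeps the synopsis small enough for an unbounded stream.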

A Technique to Detect Change-Coupled Files Using the Similarity of Change Types and Commit Time (변경 유형의 유사도 및 커밋 시간을 이용한 파일 변경 결합도)

  • Kim, Jung Il;Lee, Eun Joo
    • KIPS Transactions on Software and Data Engineering / v.3 no.2 / pp.65-72 / 2014
  • Change coupling is a measure of how strongly two entities are related through their changes. When two source files have frequently been changed together, they are regarded as change-coupled and will probably be changed together in the near future. In previous studies, the change coupling between two files is defined by the number of times they were changed together, that is, by their common commit times. However, this frequency-based technique is limited by 'tangled changes', which happen frequently in development environments with version control systems. A tangled change occurs when several code hunks are changed in the same commit even though they are unrelated to each other. In this paper, the change types of the code hunks are used to define change coupling, in addition to the common commit times of the target files. First, a frequency vector of change types is defined from the extracted change types; then the similarity of the change patterns is calculated using the cosine similarity measure. We conducted case studies on the open source projects Eclipse JDT and CDT. The results show the applicability of the proposed method compared with previous studies.
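A minimal sketch of the two ingredients the abstract names, assuming change records of the form (commit id, file, change type), one per code hunk. It returns the number of common commits together with the cosine similarity of the two files' change-type frequency vectors within those commits; how the paper combines the two values is not stated, so no combined score is computed. The change-type labels and file names are hypothetical.

```python
import math
from collections import Counter


def change_type_vector(changes, file, commits):
    """Frequency of each change type for `file` within the given commits."""
    return Counter(ct for c, f, ct in changes if f == file and c in commits)


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def change_coupling(changes, file_a, file_b):
    """Return (number of common commits, cosine similarity of change types in them)."""
    changes = list(changes)
    common = {c for c, f, _ in changes if f == file_a} & {c for c, f, _ in changes if f == file_b}
    va = change_type_vector(changes, file_a, common)
    vb = change_type_vector(changes, file_b, common)
    return len(common), cosine(va, vb)


changes = [
    ("c1", "A.java", "METHOD_ADDED"), ("c1", "B.java", "METHOD_ADDED"),
    ("c2", "A.java", "CONDITION_CHANGED"), ("c2", "B.java", "CONDITION_CHANGED"),
    ("c3", "A.java", "COMMENT_CHANGED"), ("c3", "C.java", "IMPORT_ADDED"),
]
print(change_coupling(changes, "A.java", "C.java"))  # -> (1, 0.0)
```

In the example, A.java and C.java share one commit, but their hunks have unrelated change types: this is the tangled-change situation where a frequency-only measure would overstate the coupling.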

The Stream of Uncertainty in Scientific Knowledge using Topic Modeling (토픽 모델링 기반 과학적 지식의 불확실성의 흐름에 관한 연구)

  • Heo, Go Eun
    • Journal of the Korean Society for Information Management / v.36 no.1 / pp.191-213 / 2019
  • Scientific knowledge is obtained through research, in which researchers deal with the uncertainty of science and establish the certainty of scientific knowledge. In other words, uncertainty is an essential step on the way to scientific knowledge. Existing studies have predominantly examined hedging through linguistic approaches or, in computational linguistics, have manually constructed corpora of uncertainty words. As a result, they have only been able to identify characteristics of uncertainty in a particular research field based on simple frequencies. Therefore, in this study, we examine how patterns of scientific knowledge based on uncertainty words change over time in the biomedical literature, where biomedical claims in sentences play an important role. For this purpose, biomedical propositions are analyzed based on the semantic predications provided by UMLS, and DMR (Dirichlet-multinomial regression) topic modeling, a method useful for identifying patterns in disciplines, is applied to understand trends in entity-based topics with uncertainty. The results confirm that, as research develops over time, uncertainty in scientific knowledge follows a decreasing pattern.

In-memory Compression Scheme Based on Incremental Frequent Patterns for Graph Streams (그래프 스트림 처리를 위한 점진적 빈발 패턴 기반 인-메모리 압축 기법)

  • Lee, Hyeon-Byeong;Shin, Bo-Kyoung;Bok, Kyoung-Soo;Yoo, Jae-Soo
    • The Journal of the Korea Contents Association / v.22 no.1 / pp.35-46 / 2022
  • Recently, with the development of network technologies and the active use of IoT and social network service applications, large amounts of graph stream data are being generated. In this paper, we propose an incremental frequent-pattern-based compression scheme for graph streams, which applies graph mining to existing compression techniques (which have focused on compression rate and runtime) so as to take the stream graph environment into account. Because the proposed scheme keeps only the latest reference patterns, it increases storage utilization and improves query processing time. To show the superiority of the proposed scheme, various performance evaluations are performed in terms of compression rate and processing time against an existing method. The proposed scheme is faster than the existing similar scheme when the amount of duplicated data is large.
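The abstract does not describe the pattern format, so the sketch below is a loose, hypothetical illustration of incremental frequent-pattern compression. Within each batch of edges, a neighbor set shared by at least `min_support` source vertices is registered as a pattern, later batches reuse registered patterns, and only the `max_patterns` most recently used patterns are kept (one possible reading of "keeps only the latest reference patterns"). Duplicate edges within a batch are merged, and an encoded batch is decodable only against the pattern table as it stood when that batch was compressed. The class and parameter names are assumptions.

```python
from collections import Counter, OrderedDict


class FrequentPatternCompressor:
    def __init__(self, min_support=2, max_patterns=64):
        self.min_support = min_support
        self.max_patterns = max_patterns
        self.table = OrderedDict()  # frozenset of neighbors -> pattern id, in LRU order
        self.next_id = 0

    def compress(self, edges):
        """Encode a batch of (src, dst) edges as ('P', src, pid) or ('E', src, dst) items."""
        adj = {}
        for src, dst in edges:
            adj.setdefault(src, set()).add(dst)
        neighbor_sets = {src: frozenset(dsts) for src, dsts in adj.items()}
        # Incrementally register neighbor sets that repeat within this batch
        support = Counter(ns for ns in neighbor_sets.values() if len(ns) > 1)
        for ns, count in support.items():
            if count >= self.min_support and ns not in self.table:
                self.table[ns] = self.next_id
                self.next_id += 1
                if len(self.table) > self.max_patterns:
                    self.table.popitem(last=False)  # evict the least recently used
        encoded = []
        for src, ns in neighbor_sets.items():
            if ns in self.table:
                self.table.move_to_end(ns)  # mark pattern as recently used
                encoded.append(("P", src, self.table[ns]))
            else:
                encoded.extend(("E", src, dst) for dst in sorted(ns))
        return encoded

    def decompress(self, encoded):
        """Rebuild the edge set using the current pattern table."""
        by_id = {pid: ns for ns, pid in self.table.items()}
        edges = set()
        for kind, src, x in encoded:
            if kind == "P":
                edges.update((src, dst) for dst in by_id[x])
            else:
                edges.add((src, x))
        return edges
```

The more often the same neighbor set repeats, the more edges collapse into a single pattern reference, which is consistent with the abstract's observation that the gain grows with the amount of duplicated data.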

A Method for Optimal Moving Pattern Mining using Frequency of Moving Sequence (이동 시퀀스의 빈발도를 이용한 최적 이동 패턴 탐사 기법)

  • Lee, Yon-Sik;Ko, Hyun
    • The KIPS Transactions:PartD / v.16D no.1 / pp.113-122 / 2009
  • Traditional pattern mining methods only probe unspecified moving patterns that seem to satisfy users' requests among diverse patterns within limited scopes of time and space, so they are not applicable to mining optimal moving patterns that involve complex time and space constraints, such as 1) searching for the optimal path between two specific points and 2) scheduling a path within a specified time. Therefore, in this paper, we illustrate some problems in mining optimal moving patterns with complex time and space constraints from a vast set of historical data of numerous moving objects, and suggest a new moving pattern mining method that can be used to search for optimal moving paths as a location-based service. The proposed method determines the optimal path (the most frequently used path) between two specific points using pattern frequencies retrieved from the historical data of moving objects, and it carries out pattern mining efficiently by applying space generalization at the minimum level to the moving objects' location attributes, considering the topological relationship between an object's location and a spatial scope. The efficiency of the algorithm was tested by comparing its processing time with that of Dijkstra's algorithm and the $A^*$ algorithm, which are generally used to search for optimal paths. Although the results varied with the heuristic weight of the $A^*$ algorithm, the proposed method proved more efficient than both.
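A minimal sketch of the "most frequently used path" criterion, assuming historical trajectories have already been generalized to sequences of spatial scopes. For each trajectory that visits the origin and later the destination, the sub-path between them is counted, and the most frequent one wins (ties go to the shorter path, an illustrative choice). It does not reproduce the paper's space generalization or its timing comparison with Dijkstra's algorithm and $A^*$; the function name is hypothetical.

```python
from collections import Counter


def most_frequent_path(trajectories, origin, destination):
    """Most frequently travelled origin -> destination sub-path and its count."""
    counts = Counter()
    for traj in trajectories:
        if origin not in traj:
            continue
        i = traj.index(origin)
        if destination in traj[i + 1:]:
            j = traj.index(destination, i + 1)  # first arrival after leaving the origin
            counts[tuple(traj[i:j + 1])] += 1
    if not counts:
        return None, 0
    # Highest frequency first; among equally frequent paths prefer the shorter one
    path = max(counts, key=lambda p: (counts[p], -len(p)))
    return path, counts[path]


trajectories = [["A", "B", "D"], ["A", "C", "D"], ["X", "A", "B", "D", "E"], ["A", "B", "C", "D"], ["D", "A"]]
print(most_frequent_path(trajectories, "A", "D"))  # -> (('A', 'B', 'D'), 2)
```

Unlike Dijkstra's algorithm or $A^*$, this needs no edge weights or heuristic: the "cost" is implicit in how often past objects actually chose each route.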