Search | Korea Science

An Update-Efficient, Disk-Based Inverted Index Structure for Keyword Search on Data Streams (데이터 스트림에 대한 키워드 검색을 위한, 효율적인 갱신이 가능한 디스크 기반 역색인 구조)

Park, Eun Ju;Lee, Ki Yong
- KIPS Transactions on Software and Data Engineering
- v.5 no.4
- pp.171-180
- 2016
As social networking services such as Twitter become increasingly popular, data streams have become widespread. To search the data accumulated from data streams efficiently, an index structure is essential. In this paper, we propose an update-efficient, disk-based inverted index structure for keyword search on data streams. When new data arrive on the data stream, the index must be updated to incorporate them. The traditional inverted index is very inefficient to update in terms of disk I/O, because all index data stored on disk must be read and written back each time the index is updated. To solve this problem, we divide the whole inverted index into a sequence of inverted indices of exponentially increasing size. New data are first inserted into the smallest index, and the small indices are later merged into the larger ones, which yields a small amortized update cost per new data item. Furthermore, when indices stored on disk are merged, we minimize the disk I/O cost of the merge operation, which reduces the update cost even further. Through various experiments, we compare the update efficiency of the proposed index structure with that of the previous one and show its advantage in update cost.
https://doi.org/10.3745/KTSDE.2016.5.4.171
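The idea of splitting one inverted index into sub-indices of exponentially increasing size can be illustrated with a small in-memory sketch. This is not the authors' disk-based structure: the class name `LeveledInvertedIndex`, the `base_size` parameter, and the binary-counter merge policy are assumptions chosen for illustration, and the paper's disk I/O optimizations for merging are not modeled.

```python
from collections import defaultdict


def merge_postings(older, newer):
    """Merge two term -> doc-id-list maps into one, keeping postings sorted."""
    merged = defaultdict(list)
    for index in (older, newer):
        for term, postings in index.items():
            merged[term].extend(postings)
    for postings in merged.values():
        postings.sort()
    return dict(merged)


class LeveledInvertedIndex:
    """Hypothetical sketch: level i holds an index over base_size * 2**i documents."""

    def __init__(self, base_size=2):
        self.base_size = base_size
        self.buffer = {}    # newest documents: term -> doc ids
        self.buffered = 0   # number of documents currently in the buffer
        self.levels = []    # levels[i] is None or a term -> doc ids map
        self.merges = 0     # number of level merges performed so far

    def add(self, doc_id, text):
        """Index one document; only the small buffer is touched on most calls."""
        for term in set(text.lower().split()):
            self.buffer.setdefault(term, []).append(doc_id)
        self.buffered += 1
        if self.buffered == self.base_size:
            self._flush()

    def _flush(self):
        """Carry the full buffer upward, merging with every occupied level."""
        carry, self.buffer, self.buffered = self.buffer, {}, 0
        i = 0
        while i < len(self.levels) and self.levels[i] is not None:
            carry = merge_postings(self.levels[i], carry)
            self.levels[i] = None
            self.merges += 1
            i += 1
        if i == len(self.levels):
            self.levels.append(None)
        self.levels[i] = carry

    def search(self, term):
        """Look the term up in the buffer and in every occupied level."""
        term = term.lower()
        hits = list(self.buffer.get(term, []))
        for level in self.levels:
            if level is not None:
                hits.extend(level.get(term, []))
        return sorted(hits)
```

In this sketch a document is rewritten only when its level is merged into the next one, so each document takes part in at most a logarithmic number of merges. A query has to consult every occupied level, which is the price paid for cheaper updates.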

UHD Video Transcoding System in Cloud Computing Environment (클라우드 기반 UHD 영상 트랜스코딩 시스템)

Moon, Hee-Cheol;Kim, Yong-Hwan;Kim, Dong-Hyeok
- Proceedings of the Korean Society of Broadcast Engineers Conference
- 2014.11a
- pp.203-205
- 2014
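UHD video content delivers more vivid, higher-quality images than FHD video, but the amount of video data is at least four times larger in the case of 4K UHD. If such ultra-large UHD video is coded using existing parallel/distributed processing, the sheer data volume of UHD leads to a heavy computational load. UHD video therefore requires a new distributed processing technique that can handle ultra-large data quickly, rather than the existing distributed processing approach. This paper proposes a cloud-based UHD video transcoding system that can transcode UHD content quickly. The proposed system consists of three components: a packet analyzer, a distributed transcoder, and a stream merger. The packet analyzer analyzes the input video, separates the audio and video streams, and splits the video stream into video packets so that they can be processed in a distributed manner. The distributed transcoder uses the cloud environment to decode and encode the split video packets in a distributed manner. The stream merger combines the transcoded video stream with the audio stream obtained by the packet analyzer. We implemented a cloud-based video transcoding system using the proposed approach, and the implemented system can transcode large UHD video at high speed.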
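The three-stage design described in this abstract (analyze and split, transcode in parallel, merge) can be outlined as a toy pipeline. Everything here is an assumption for illustration: the function names are hypothetical, a local thread pool stands in for cloud workers, and `transcode_chunk` is a placeholder byte transformation rather than a real decode and encode step.

```python
from concurrent.futures import ThreadPoolExecutor


def split_packets(video, chunk_size):
    """Packet-analyzer stand-in: cut the video stream into fixed-size chunks."""
    return [video[i:i + chunk_size] for i in range(0, len(video), chunk_size)]


def transcode_chunk(chunk):
    """Placeholder for decode + encode: inverts every byte of the chunk."""
    return bytes(b ^ 0xFF for b in chunk)


def transcode(video, audio, chunk_size=4, workers=3):
    """Split the video, transcode the chunks in parallel, then reattach the audio."""
    chunks = split_packets(video, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, so chunks stay in sequence.
        done = list(pool.map(transcode_chunk, chunks))
    # Stream-merger stand-in: concatenate the video and pass the audio through.
    return b"".join(done), audio
```

A real system would have to split on boundaries where the video can be decoded independently; fixed-size byte chunks are used here only to keep the sketch short.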

Efficient Processing of Multidimensional Vessel USN Stream Data using Clustering Hash Table (클러스터링 해쉬 테이블을 이용한 다차원 선박 USN 스트림 데이터의 효율적인 처리)

Song, Byoung-Ho;Oh, Il-Whan;Lee, Seong-Ro
- Journal of the Institute of Electronics Engineers of Korea SP
- v.47 no.6
- pp.137-145
- 2010
A digital vessel must manage the digital data from its various sensors accurately and efficiently. In a sensor network, however, it is difficult to transmit and analyze the entire stream data because of limited network, power, and processor resources. It is therefore appropriate to classify the continuous stream data first and then apply alternative stream data processing. In this paper, we propose an efficient processing method that deploys several sensors (temperature, humidity, lighting, voice), processes queries based on a sliding window over the input stream, pre-clusters the data using a multiple Support Vector Machine (SVM) algorithm, and manages the summarized information in a hash table. Storing and searching through the hash table improves processing performance, and because memory usage is reduced, the hash table can be maintained in memory. We evaluated the accuracy rate and processing performance of the proposed method using 35,912 data sets and obtained efficient results.
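The combination of a sliding window with a hash table of summarized information can be sketched as follows. This is a simplified illustration, not the paper's method: the class `SlidingWindowSummary` is hypothetical, the cluster label of each reading is assumed to come from an upstream classifier (the SVM step is not implemented), and the summary keeps only a count and a running total per label.

```python
from collections import deque


class SlidingWindowSummary:
    """Keep per-label count and sum for the most recent window_size readings."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()   # (label, value) pairs, oldest first
        self.summary = {}       # label -> [count, total]; a dict is a hash table

    def add(self, label, value):
        """Insert a labeled reading and evict the oldest one if the window is full."""
        self.window.append((label, value))
        entry = self.summary.setdefault(label, [0, 0.0])
        entry[0] += 1
        entry[1] += value
        if len(self.window) > self.window_size:
            old_label, old_value = self.window.popleft()
            old = self.summary[old_label]
            old[0] -= 1
            old[1] -= old_value
            if old[0] == 0:
                del self.summary[old_label]

    def mean(self, label):
        """Average value of the readings with this label inside the window."""
        count, total = self.summary[label]
        return total / count
```

Queries such as `mean("temperature")` then read only the small summary table instead of rescanning the raw readings in the window.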

Processing of Sensor Data Stream for OSGi Frameworks (OSGi를 위한 실시간 센서 데이터스트림 처리 방법)

Cha, Ji-Yun;Byun, Yung-Cheol;Lee, Dong-Cheal
- Journal of the Korea Institute of Information and Communication Engineering
- v.13 no.5
- pp.1014-1021
- 2009
In a home network environment, where many technologies coexist, including heterogeneous hardware platforms, networking technologies and protocols, and middleware systems, OSGi provides a platform for deploying and sharing services managed in hardware and guarantees compatibility among applications. However, home networks using OSGi consider only simple control and processing of event data, and real-time processing of the data streams generated by sensors has not been considered sufficiently. Research is therefore needed that allows users to develop OSGi applications effectively using the various kinds of sensors that generate data streams in an OSGi-based home network environment. In this paper, we propose an effective method of processing the various types of real-time data streams supplied to OSGi applications, including filtering, grouping, and counting.
https://doi.org/10.6109/JKIICE.2009.13.5.1014
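The three stream operations named in this abstract (filtering, grouping, counting) can be shown with a few generic helpers. This is a language-neutral sketch in Python, not OSGi code: the function names and the dictionary layout of a reading (`sensor`, `type`, `value`) are assumptions, and the helpers work on a finite batch of readings rather than an unbounded stream.

```python
from collections import Counter, defaultdict


def filter_readings(readings, predicate):
    """Lazily yield only the readings that satisfy the predicate."""
    for reading in readings:
        if predicate(reading):
            yield reading


def group_by(readings, key):
    """Collect readings into lists that share the same value for `key`."""
    groups = defaultdict(list)
    for reading in readings:
        groups[reading[key]].append(reading)
    return dict(groups)


def count_by(readings, key):
    """Count how many readings carry each value of `key`."""
    return Counter(reading[key] for reading in readings)
```

For example, `count_by(filter_readings(batch, lambda r: r["value"] > 30), "sensor")` counts high readings per sensor for one batch. On a live stream the same helpers would be applied per window.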

Discovering Frequent Itemsets Reflected User Characteristics Using Weighted Batch based on Data Stream (스트림 데이터 환경에서 배치 가중치를 이용하여 사용자 특성을 반영한 빈발항목 집합 탐사)

Seo, Bok-Il;Kim, Jae-In;Hwang, Bu-Hyun
- The Journal of the Korea Contents Association
- v.11 no.1
- pp.56-64
- 2011
It is difficult to discover frequent itemsets over the whole of a data stream, since a data stream is infinite and continuous. A specialized data mining method that reflects the properties of the data and the requirements of users is therefore required. In this paper, we propose FIMWB, a method for discovering frequent itemsets that reflects the property that recent events are more important than old events. The data stream is split into batches according to a given time interval, and our method assigns a weight to each batch, which reflects the user's interest in recent events. FP-Digraph then discovers the frequent itemsets using the result of FIMWB. Experimental results show that FIMWB reduces the generation of useless items and that the FP-Digraph method is better suited to real-time environments than a tree-based method (FP-Tree).
https://doi.org/10.5392/JKCA.2011.11.1.056
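Weighting batches so that recent transactions count for more can be illustrated with a brute-force sketch. It does not reproduce FIMWB or FP-Digraph: the geometric `decay` weighting, the normalization by total weighted transaction count, and the function name are all assumptions, and candidate itemsets are simply enumerated up to `max_size` items.

```python
from collections import defaultdict
from itertools import combinations


def weighted_frequent_itemsets(batches, min_support, decay=0.5, max_size=2):
    """Return itemsets whose batch-weighted support reaches min_support.

    batches: list of batches, oldest first; each batch is a list of item sets.
    The newest batch has weight 1, and each older batch is scaled by `decay`.
    """
    newest = len(batches) - 1
    weighted_counts = defaultdict(float)
    total_weight = 0.0
    for position, batch in enumerate(batches):
        weight = decay ** (newest - position)
        total_weight += weight * len(batch)
        for transaction in batch:
            items = sorted(transaction)
            for size in range(1, max_size + 1):
                for itemset in combinations(items, size):
                    weighted_counts[itemset] += weight
    if total_weight == 0:
        return {}
    return {
        itemset: count / total_weight
        for itemset, count in weighted_counts.items()
        if count / total_weight >= min_support
    }
```

With `decay=0.5`, an itemset seen once in the newest batch contributes as much as one seen twice in the previous batch, so items that have stopped appearing fall below the threshold quickly.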

Data Stream Allocation for Fair Performance in Multiuser MIMO Systems (다중 사용자 MIMO 환경에서 균등한 성능을 보장하는 데이터 스트림 할당 기법)

Lim, Dong-Ho;Choi, Kwon-Hue
- The Journal of Korean Institute of Communications and Information Sciences
- v.34 no.12A
- pp.1006-1013
- 2009
This paper proposes a data stream allocation technique for fair capacity performance in multiuser multiple-input multiple-output (MIMO) systems using the block diagonalization (BD) algorithm. Conventional studies have focused on maximizing sum capacity. As a result, capacity differs greatly among users, since user capacity is distributed unfairly according to each user's channel environment. In addition, a user with a poor channel has very small capacity, since the base station allocates power using the water-filling technique. Moreover, most studies have limited themselves to obtaining additional gain while using the same number of data streams for all users. In this paper, we propose a technique that maximizes sum capacity under a fair performance constraint by allocating data streams according to each user's channel environment. Computer simulations show that the proposed algorithm achieves greater gains in sum capacity and transmit power than conventional equal allocation.
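The unfairness attributed to water-filling can be seen in a short sketch of the textbook water-filling rule, in which channel i receives power max(0, level - 1/g_i) and the water level is chosen so that the powers sum to the budget. This shows only the conventional baseline, not the proposed allocation technique; the function name is hypothetical and the gains are assumed to be channel gain-to-noise ratios.

```python
def water_filling(gains, total_power):
    """Textbook water-filling over parallel channels with the given gains."""
    if not gains or total_power <= 0:
        raise ValueError("need at least one channel and a positive power budget")
    floors = sorted(1.0 / g for g in gains)  # 1/gain, best channel first
    active = len(floors)
    while True:
        # Water level if only the `active` best channels receive power.
        level = (total_power + sum(floors[:active])) / active
        if level > floors[active - 1]:
            break
        active -= 1  # the weakest active channel would get no power; drop it
    return [max(0.0, level - 1.0 / g) for g in gains]
```

For gains `[2.0, 1.0, 0.1]` and a power budget of 1, the allocation is 0.75, 0.25, and 0: the weakest channel receives nothing, which is the kind of capacity gap among users that the abstract describes.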

Discovering Temporal Relation Considering the Weight of Events in Multidimensional Stream Data Environment (다차원 스트림 데이터 환경에서 이벤트 가중치를 고려한 시간 관계 탐사)

Kim, Jae-In;Kim, Dae-In;Song, Myung-Jin;Han, Dae-Young;Hwang, Bu-Hyun
- The Journal of the Korea Contents Association
- v.10 no.2
- pp.99-110
- 2010
An event is a flow that has a time attribute, such as a patient's symptom. In a multiple stream data environment, stream data collected by sensors can be summarized as interval events, each of which has a time interval between a start time point and an end time point. Most temporal mining techniques have considered only frequent events. However, these approaches may ignore an infrequent event even if it is important. In this paper, we propose a new temporal data mining method that can find association rules for significant temporal relations based on interval events in a multidimensional stream data environment. Our method considers the weight of events and the stream data at the sensing time point of abnormal events, so it can discover association rules on significant temporal relations regardless of how frequently the events occur. Experimental analysis shows that our method provides more useful knowledge than conventional methods.
https://doi.org/10.5392/JKCA.2010.10.2.099

Processing Sliding Window Multi-Joins using a Graph-Based Method over Data Streams (데이터 스트림에서 그래프 기반 기법을 이용한 슬라이딩 윈도우 다중 조인 처리)

Zhang, Liang;Ge, Jun-Wei;Kim, Gyoung-Bae;Lee, Soon-Jo;Bae, Hae-Young;You, Byeong-Seob
- Journal of Korea Spatial Information System Society
- v.9 no.2
- pp.25-34
- 2007
Existing approaches that select an order for joining three or more data streams have always used simple heuristics. Because they consider only one factor, either join selectivity or arrival rate, these methods lead to poor performance and inefficiency in some applications. This paper proposes a graph-based sliding window multi-join algorithm with an optimal join sequence. In this method, a sliding window join graph is first set up, in which a vertex represents a join operator and an edge indicates the join relationship among sliding windows; the vertex weight and the edge weight represent the cost of the join and the reciprocity of join operators, respectively. The optimal join order is then found in the graph using an improved MVP algorithm. The final result is produced by executing the join plan with the nested loop join procedure. The advantages of our algorithm are demonstrated by a performance comparison with existing join algorithms.

Ontology based Preprocessing Scheme for Mining Data Streams from Sensor Networks (센서 네트워크의 데이터 스트림 마이닝을 위한 온톨로지 기반의 전처리 기법)

Jung, Jason J.
- Journal of Intelligence and Information Systems
- v.15 no.3
- pp.67-80
- 2009
With a number of sensors and sensor networks, we can collect environmental information from a given sensor space. To discover more useful information and knowledge, we want to apply data mining methodologies to the sensor data streams from such sensor spaces. In this paper, we present a novel data preprocessing scheme to improve the performance of data mining algorithms. In particular, ontologies are applied to represent the meaning of the sensor data. To evaluate the proposed method, we collected sensor streams for about 30 days and used them in simulations to compare it with other approaches.

Mining of Frequent Structures over Streaming XML Data (스트리밍 XML 데이터의 빈발 구조 마이닝)

Hwang, Jeong-Hee
- The KIPS Transactions:PartD
- v.15D no.1
- pp.23-30
- 2008
Internet technology and XML are fundamental to research on context awareness in ubiquitous environments. XML data in the form of continuous streams are common in network applications on the internet, and there is also research on query processing for streaming XML data. As a basis for efficient querying, we propose both a labeled ordered tree model for representing XML and a mining method for extracting frequent structures from streaming XML data. That is, continuously arriving XML data are modeled as a stream tree called XFP_tree, and we extract the exact frequent structures from the XFP_tree of the current window in order to mine recent data. The proposed method can serve as a basis for query processing and indexing of XML stream data.
https://doi.org/10.3745/KIPSTD.2008.15-D.1.23
