• Title/Summary/Keyword: Geographical Name Denoising (지명 노이즈제거)

Event Detection System Based on Twitter Applied Geographical Name Denoising (지명 노이즈제거 기법을 적용한 트위터 기반 이벤트 탐지 시스템)

  • Woo, Seungmin; Hwang, Byung-Yeon
    • Proceedings of the Korea Information Processing Society Conference / 2015.10a / pp.1095-1097 / 2015
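  • This paper proposes a machine-learning approach to geographical name denoising for Twitter-based event detection. Event detection systems have treated individual Twitter users as sensors in order to detect events that occur at specific place names. However, words that are homographs of place names are also detected, which lowers event detection accuracy. To address this, we first apply removal filtering using a purpose-built noise database and then use machine learning to decide whether a word is actually a place name. In experiments, the proposed prediction technique demonstrated the need for denoising with a reliability of 89.6%.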

A Method for Detecting Event-location based on Example in Tweet (트위터에서의 사례 기반 이벤트 지명 검출 기법)

  • Ha, HyunSoo; Hwang, Byung-Yeon
    • Proceedings of the Korea Information Processing Society Conference / 2015.10a / pp.1119-1121 / 2015
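  • This paper proposes a method for improving place-name detection accuracy in a system that detects events from tweet content. As cases of personal information leakage through SNS increase, users have become reluctant to disclose their location, so the tweet text itself must be analyzed to detect the area where an event occurred. However, typos, abbreviations, and homographs make accurate place-name detection difficult. To improve accuracy, this paper proposes two place-name detection techniques: a place-name denoising technique that removes noise arising in place-name words, and a place-name confirmation technique that confirms place-name words using landmarks. In experiments, accuracy improved from 49% for the existing system to 56% with the denoising technique and to 73% with the confirmation technique.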

Geographical Name Denoising by Machine Learning of Event Detection Based on Twitter (트위터 기반 이벤트 탐지에서의 기계학습을 통한 지명 노이즈제거)

  • Woo, Seungmin; Hwang, Byung-Yeon
    • KIPS Transactions on Software and Data Engineering / v.4 no.10 / pp.447-454 / 2015
  • This paper proposes geographical name denoising through machine learning for Twitter-based event detection. Recently, the growing number of smartphone users has driven growth in SNS use. In particular, Twitter's short messages (up to 140 characters) and its follow service allow it to convey and spread information quickly. These characteristics, together with its mobile-optimized design, give Twitter a fast information-conveying speed, so it can play a role in reporting disasters and events. Related research has used individual Twitter users as sensors to detect events that occur in the real world. That research used geographical names as keywords, exploiting the fact that an event occurs in a specific place. However, it did not remove noise from words that are homographs of geographical names, which became a significant factor lowering the accuracy of event detection. In this paper, we apply denoising through two methods: removal and prediction. First, after a filtering step that uses a purpose-built noise database, we determine whether a geographical name is present using Naive Bayes classification. Finally, we obtained the machine learning probability values from the experimental data. With the prediction technique proposed in this paper, the need for denoising was demonstrated with a reliability of 89.6%.
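
The two-stage idea in this abstract (dictionary-based removal followed by Naive Bayes prediction) can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the romanized token `sinsa`, the toy training tweets, the `noise_db` contents, and the small hand-written multinomial Naive Bayes are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def passes_noise_filter(tokens, candidate, noise_db):
    """Stage 1 (removal): reject the candidate if a known non-place collocate co-occurs."""
    return not (noise_db.get(candidate, set()) & set(tokens))


class NaiveBayes:
    """Tiny multinomial Naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.word_counts = {}
        self.doc_counts = Counter()
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.doc_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                if tok in self.vocab:  # skip words never seen in training
                    score += math.log((counts[tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def is_place_mention(tokens, candidate, noise_db, model):
    """Stage 1 filter, then Stage 2 (prediction) with the classifier."""
    if not passes_noise_filter(tokens, candidate, noise_db):
        return False
    return model.predict(tokens) == "place"


# Hypothetical toy data: "sinsa" stands in for a word that is both a place name and a common noun.
train_docs = [
    ["fire", "near", "sinsa", "station"],
    ["traffic", "accident", "at", "sinsa", "crossing"],
    ["he", "is", "a", "real", "sinsa"],
    ["sinsa", "manners", "matter"],
]
train_labels = ["place", "place", "word", "word"]
noise_db = {"sinsa": {"suit"}}  # hypothetical noise dictionary entry

model = NaiveBayes().fit(train_docs, train_labels)
```

With this toy data, `is_place_mention(["fire", "at", "sinsa", "station"], "sinsa", noise_db, model)` keeps the mention, while a tweet containing the noise collocate `suit` is dropped before the classifier is consulted.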

Keyword Filtering about Disaster and the Method of Detecting Area in Detecting Real-Time Event Using Twitter (트위터를 활용한 실시간 이벤트 탐지에서의 재난 키워드 필터링과 지명 검출 기법)

  • Ha, Hyunsoo; Hwang, Byung-Yeon
    • KIPS Transactions on Software and Data Engineering / v.5 no.7 / pp.345-350 / 2016
  • This research proposes disaster keyword filtering and area detection methods for a real-time event detection system that analyzes the content of tweets. The spread of smart mobile devices has led to the rapid spread of SNS, and a variety of SNS-based research is now under way. Among SNS platforms, Twitter spreads information quickly because its messages are short, limited to 140 characters. Tweets written by Twitter users can therefore act as sensors, and research has built on this property to detect events that have occurred. However, as reports of private information leakage have increased, people have become reluctant to disclose their location. In addition, accuracy problems arise when analyzing tweet content that does not follow spelling rules. Additional keyword filtering and area detection methods were therefore needed in the real-time event detection process to improve accuracy. This research proposes a disaster keyword filtering method and two area detection methods. The first is an area noise removal method, which removes noise that occurs in place-name words. The second is an area determination method, which confirms place-name words by using landmarks. Applying disaster keyword filtering together with the two area detection methods improved accuracy: from 49% to 78% with the area noise removal method and from 49% to 89% with the area determination method.
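
The landmark-based area determination mentioned above can be pictured with a rough sketch. The abstract does not describe the actual procedure, so the rule used here (a candidate place name is confirmed only when a landmark associated with it also appears in the tweet), the `LANDMARKS` table, and the function name `confirm_area` are all assumptions made for illustration.

```python
# Hypothetical landmark gazetteer: area name -> landmarks assumed to lie in that area.
LANDMARKS = {
    "gangnam": {"coex", "gangnam station"},
    "jamsil": {"lotte world"},
}


def confirm_area(text, candidates, landmarks):
    """Keep only candidate areas that are backed by a co-occurring landmark."""
    lowered = text.lower()
    return [
        area
        for area in candidates
        if any(landmark in lowered for landmark in landmarks.get(area, ()))
    ]
```

For example, `confirm_area("Smoke rising near COEX in Gangnam", ["gangnam", "jamsil"], LANDMARKS)` keeps only `gangnam`, since no landmark listed for `jamsil` appears in the text. Plain substring matching is used to keep the sketch short; a real system would need proper tokenization.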