• Title/Summary/Keyword: 지도 군집화 (supervised clustering)

Search Results: 592 (processing time: 0.028 seconds)

Question and Answering System through Search Result Summarization of Q&A Documents (Q&A 문서의 검색 결과 요약을 활용한 질의응답 시스템)

  • Yoo, Dong Hyun;Lee, Hyun Ah
    • KIPS Transactions on Software and Data Engineering, v.3 no.4, pp.149-154, 2014
  • Users of a user-participation question-answering community such as Knowledge-iN must pick out relevant answers themselves from many search results. If refined answers were provided automatically, the usability of such communities would improve. This paper divides questions in Q&A documents into four types (word, list, graph, and text) and proposes a summarization method for each type based on document statistics. Summarized answers for the word, list, and text types are obtained by clustering questions and scoring words by their frequency, their proximity, and the confidence of the answers in which they appear. Answers for the graph type are presented by extracting user opinions from the answers.
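The abstract names three signals for scoring candidate answer words: frequency, proximity, and answer confidence. Below is a minimal sketch of one way to combine them additively. It assumes the answers are already grouped under one question cluster, measures proximity in tokens to the nearest question term, and gives each answer a hypothetical confidence weight in [0, 1]. The function name and weighting formula are illustrative only; the paper's exact scoring function is not given here.

```python
from collections import defaultdict

def score_words(answers, question_terms, stopwords=frozenset(), window=3):
    """Rank candidate answer words for one question cluster (illustrative scoring).

    answers: list of (tokens, confidence) pairs; confidence in [0, 1] is a
    hypothetical per-answer weight (e.g. a best-answer flag or vote share).
    """
    qset = set(question_terms)
    scores = defaultdict(float)
    for tokens, confidence in answers:
        q_positions = [i for i, t in enumerate(tokens) if t in qset]
        for i, token in enumerate(tokens):
            if token in qset or token in stopwords:
                continue
            # Proximity bonus: 1/d for the nearest question term within `window` tokens.
            bonus = 0.0
            if q_positions:
                d = min(abs(i - j) for j in q_positions)
                if d <= window:
                    bonus = 1.0 / d
            # Frequency: every occurrence adds to the score, scaled by answer confidence.
            scores[token] += confidence * (1.0 + bonus)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

answers = [
    (["capital", "of", "france", "is", "paris"], 1.0),
    (["paris", "is", "the", "capital"], 0.8),
    (["lyon", "maybe"], 0.2),
]
ranked = score_words(answers, ["capital", "france"], stopwords={"of", "is", "the"})
print(ranked[0][0])  # paris
```

The additive form is one simple choice: a word that appears often in high-confidence answers near the question terms rises to the top. A multiplicative or learned combination would be equally plausible.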

Similarity Measure and Clustering Technique for XML Documents by a Parent-Child Matrix (부모-자식 행렬을 사용한 XML 문서 유사도 측정과 군집 기법)

  • Lee, Yun-Gu;Kim, Woosaeng
    • Journal of the Korea Institute of Information and Communication Engineering, v.19 no.7, pp.1599-1607, 2015
  • Researchers have recently been developing efficient techniques for accessing, querying, and managing XML documents, which are widely used on the Internet. In this paper, we propose a parent-child matrix for clustering XML documents efficiently. A parent-child matrix captures both the content and the structural features of an XML document. Each cell of a parent-child matrix holds either the value of a node in the XML tree or the value of one of its child nodes, where a parent-child relationship exists in the tree. The similarity between two XML documents can then be measured as the similarity between their corresponding parent-child matrices. Experiments show that the proposed method performs well.
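Here is a sketch of the parent-child matrix idea. It stores the matrix sparsely as a Counter keyed by (parent tag, column) pairs and compares two documents by cosine similarity. The column layout is an assumption: child tags capture structure, and '#'-prefixed text tokens capture content. The choice of cosine similarity is also an assumption, because the abstract does not specify the exact cell values or the similarity function. Function names are hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter

def parent_child_matrix(xml_text):
    """Sparse parent-child matrix as a Counter keyed by (row, column).

    Assumed layout: rows are parent element tags; columns are either a child
    element tag (structure) or a '#'-prefixed text token of the parent (content).
    """
    root = ET.fromstring(xml_text)
    matrix = Counter()
    for parent in root.iter():
        for child in parent:
            matrix[(parent.tag, child.tag)] += 1
        if parent.text:
            for token in parent.text.split():
                matrix[(parent.tag, "#" + token.lower())] += 1
    return matrix

def cosine_similarity(a, b):
    """Cosine similarity between two sparse matrices flattened into vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

m1 = parent_child_matrix("<book><title>XML basics</title><author>Kim</author></book>")
m2 = parent_child_matrix("<book><title>XML tips</title><author>Lee</author></book>")
m3 = parent_child_matrix("<car><model>X5</model></car>")
print(round(cosine_similarity(m1, m2), 3))  # 0.6
print(cosine_similarity(m1, m3))            # 0.0
```

The pairwise similarities from `cosine_similarity` could then feed any standard clustering algorithm.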

Twitter Sentiment Analysis for the Recent Trend Extracted from the Newspaper Article (신문기사로부터 추출한 최근동향에 대한 트위터 감성분석)

  • Lee, Gyoung Ho;Lee, Kong Joo
    • KIPS Transactions on Software and Data Engineering, v.2 no.10, pp.731-738, 2013
  • We analyze public opinion through sentiment analysis of tweets collected using recent topic keywords extracted from newspaper articles. Newspaper articles collected within a certain period are clustered with the K-means algorithm, and topic keywords for each cluster are extracted using term frequency. A sentiment analyzer trained with a machine learning method classifies the tweets according to their polarity. We assume that tweets collected with these topic keywords deal with the same topics as the newspaper articles if the tweets and the articles are produced around the same time, and we attempt to verify the validity of this assumption.
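The article-clustering step can be sketched as K-means over raw term-frequency vectors. The most frequent terms in each cluster then serve as its topic keywords. The sketch assumes whitespace tokenization, squared Euclidean distance, and seeding with the first k articles for reproducibility. The paper's preprocessing and distance choices may differ, and the sentiment classifier is not shown.

```python
from collections import Counter

def kmeans(vectors, k, iters=20):
    """Plain K-means (squared Euclidean distance), seeded with the first k vectors.
    Real use would prefer random restarts or k-means++ seeding."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centroids[c])))
            for v in vectors
        ]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def topic_keywords(articles, k, top_n=3):
    """Cluster articles by term frequency, then take each cluster's most frequent terms."""
    docs = [Counter(a.lower().split()) for a in articles]
    vocab = sorted(set().union(*docs))
    vectors = [[d[t] for t in vocab] for d in docs]
    labels = kmeans(vectors, k)
    keywords = {}
    for c in range(k):
        tf = Counter()
        for d, lab in zip(docs, labels):
            if lab == c:
                tf.update(d)
        keywords[c] = [t for t, _ in tf.most_common(top_n)]
    return labels, keywords

articles = [
    "election vote candidate poll",
    "vote election poll result",
    "football match goal league",
    "league goal football score",
]
labels, keywords = topic_keywords(articles, k=2)
print(keywords[labels[2]])  # ['football', 'goal', 'league']
```

The resulting keywords would then serve as search queries for collecting tweets from the same time window.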

Quantitative Annotation of Edges, in Bayesian Networks with Condition-Specific Data (베이지안 망 연결 구조에 대한 데이터 군집별 기여도의 정량화 방법에 대한 연구)

  • Jung, Sung-Won;Lee, Do-Heon;Lee, Kwang-H.
    • Journal of the Korean Institute of Intelligent Systems, v.17 no.3, pp.316-321, 2007
  • We propose a quantitative annotation method for edges in Bayesian networks using given sets of condition-specific data. Bayesian network models have been widely used in various fields to infer probabilistic dependency relationships between entities in target systems. Beyond identifying dependency relationships, annotating the edges of a Bayesian network is necessary to analyze the meaning of the learned network. We assume that the training data are composed of several condition-specific data sets. The contribution of each condition-specific data set to each edge of the learned Bayesian network is measured by the likelihood ratio between the network structures with and without that edge. Whereas previous annotation approaches give only qualitative annotations, the proposed method provides quantitative annotations for learned Bayesian network structures.
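The per-data-set contribution can be illustrated for a single edge X -> Y with discrete variables. First, estimate P(Y | parents) from the pooled data, once with X among Y's parents and once without. Then compare the log-likelihood each condition-specific data set receives under the two structures. Only Y's local term is compared, because removing this edge leaves the other families' likelihood terms unchanged. Estimating parameters from the pooled data and using add-one smoothing are assumptions of this sketch, not details confirmed by the abstract. The function names are hypothetical.

```python
import math
from collections import Counter

def cond_log_likelihood(train, test, child, parents):
    """Sum over `test` rows of log P(child | parents), with P estimated from `train`
    by maximum likelihood plus Laplace (add-one) smoothing (an assumption)."""
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in train)
    marginal = Counter(tuple(r[p] for p in parents) for r in train)
    k = len({r[child] for r in train})  # number of observed child states
    total = 0.0
    for r in test:
        pa = tuple(r[p] for p in parents)
        total += math.log((joint[(pa, r[child])] + 1) / (marginal[pa] + k))
    return total

def edge_contributions(datasets, child, parents, edge_parent):
    """Log-likelihood ratio, per condition-specific data set, between the local
    structure with the edge edge_parent -> child and the same structure without it."""
    pooled = [r for rows in datasets.values() for r in rows]
    without = [p for p in parents if p != edge_parent]
    return {
        name: cond_log_likelihood(pooled, rows, child, parents)
        - cond_log_likelihood(pooled, rows, child, without)
        for name, rows in datasets.items()
    }

datasets = {
    "A": [{"X": 0, "Y": 0}, {"X": 1, "Y": 1}] * 5,                  # Y copies X
    "B": [{"X": x, "Y": y} for x in (0, 1) for y in (0, 1)] * 2,    # Y independent of X
}
print(edge_contributions(datasets, "Y", ["X"], "X"))  # A positive, B negative
```

A positive score means the data set is better explained when the edge is present, and a negative score means the opposite. The scores could be normalized across data sets if relative contributions are wanted.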

Personalized Document Summarization Using NMF and Clustering (군집과 비음수 행렬 분해를 이용한 개인화된 문서 요약)

  • Park, Sun
    • Journal of Advanced Navigation Technology, v.13 no.1, pp.151-155, 2009
  • We propose a new method that uses non-negative matrix factorization (NMF) and clustering to extract sentences for personalized document summarization. The proposed method clusters the retrieved documents to extract sentences that reflect the topics and subtopics of the documents. In addition, by using the inherent semantic features that NMF derives from the documents, it can extract query-relevant sentences that reflect the user's interests. Experimental results show that the proposed method outperforms other methods based on similarity measures or NMF.

Development of an AI-based Early Warning System for Water Meter Freeze-Burst Detection Using AI Models (AI기반 물공급 시스템내 동파위험 조기경보를 위한 AI모델 개발 연구)

  • So Ryung Lee;Hyeon June Jang;Jin Wook Lee;Sung Hoon Kim
    • Proceedings of the Korea Water Resources Association Conference, 2023.05a, p.511, 2023
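  • Because of climate change, freeze-bursting of water meters caused by low winter temperatures continues to worsen. It causes social problems such as meter replacement costs, leaks, secondary damage when leaked water freezes, and water supply interruptions. As a structural countermeasure, individual households can install freeze-resistant meters, but this entails considerable cost. As a non-structural countermeasure, the Korea Meteorological Administration's freeze-burst map notification service can be used to respond in advance. However, the KMA data center on air temperature, so this service alone cannot make use of the many additional variables needed to predict meter freeze-bursts. Recently, the government and the public sector carried out a pilot project that installed IoT temperature sensors inside more than 110 water meter boxes in 22 regions to monitor conditions within the boxes. Nationwide prediction and diagnosis of meter conditions would require additional sensors, but further installation has been slow because of issues such as the cost of IoT sensors. To help prevent winter freeze-bursts, this study built virtual sensors based on the real temperature sensors. Combining real and virtual sensors in a hybrid approach, we then constructed a nationwide freeze-burst risk map according to freeze-burst risk criteria. To develop the virtual sensors, we trained machine learning models using latitude and longitude, altitude, sunny or shaded exposure, presence of insulation, and weather information (air temperature, precipitation, wind speed, and humidity) as independent variables. The temperature measured by the real sensors served as the dependent variable. To build accurate models that reflect regional characteristics, we clustered the sites with the K-means method using variables such as location and insulation, and applied three machine learning regression models to each cluster. An examination of the optimal number of clusters indicated that four was appropriate. The clusters showed patterns similar to regional divisions, and the Gradient Boosting regression model was found to be suitable for every cluster. Based on the developed model, we defined four levels (good, caution, danger, and very dangerous) so that the model can be used in practice for a condition-based freeze-burst prediction alarm service. The algorithm developed in this study has been incorporated into the National Waterworks Information System, where it is currently being tested, and further validation is planned. This work is expected to yield direct and indirect benefits such as freeze-burst prevention, damage minimization, and water conservation.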
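The study clusters sites with K-means and then fits a separate regression model per cluster. The sketch below shows that cluster-then-regress structure under heavy simplifications. A one-variable least-squares line stands in for the Gradient Boosting regressor the study selected. The site features are raw latitude, longitude, and an insulation flag with no scaling. The temperature thresholds for the four alert levels are hypothetical, since the abstract does not state them. All names and the example data are invented for illustration.

```python
from statistics import mean

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50):
    """Plain K-means seeded with the first k points (an assumption for reproducibility)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels, centroids

def fit_line(xs, ys):
    """Least-squares line y = a + b*x (a stand-in for the study's Gradient Boosting model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b

def train(sites, k):
    """sites: dicts with 'features' (lat, lon, insulated) and 'obs' [(air_temp, box_temp), ...]."""
    labels, centroids = kmeans([s["features"] for s in sites], k)
    models = {}
    for c in range(k):
        pairs = [o for s, lab in zip(sites, labels) if lab == c for o in s["obs"]]
        if pairs:
            models[c] = fit_line([a for a, _ in pairs], [b for _, b in pairs])
    return centroids, models

def virtual_sensor(centroids, models, features, air_temp):
    """Estimate the meter-box temperature at a site without a physical sensor."""
    c = min(models, key=lambda i: sq_dist(features, centroids[i]))
    a, b = models[c]
    return a + b * air_temp

# Hypothetical thresholds (deg C) for the four alert levels named in the study.
LEVELS = [(0.0, "good"), (-3.0, "caution"), (-6.0, "danger")]

def risk_level(box_temp):
    for threshold, name in LEVELS:
        if box_temp >= threshold:
            return name
    return "very dangerous"

sites = [
    {"features": [37.5, 127.0, 0], "obs": [(-10, -9), (0, 1), (5, 6)]},
    {"features": [35.1, 129.0, 1], "obs": [(-10, -2), (0, 3), (4, 5)]},
    {"features": [37.6, 127.1, 0], "obs": [(-5, -4), (2, 3)]},
    {"features": [35.2, 129.1, 1], "obs": [(-6, 0), (2, 4)]},
]
centroids, models = train(sites, k=2)
t = virtual_sensor(centroids, models, [37.55, 127.05, 0], air_temp=-10)
print(round(t, 1), risk_level(t))  # -9.0 very dangerous
```

In practice the features would need scaling before clustering. Without it, latitude and longitude dominate the binary insulation flag in the distance computation.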

Segmentation of Target Objects Based on Feature Clustering in Stereoscopic Images (입체영상에서 특징의 군집화를 통한 대상객체 분할)

  • Jang, Seok-Woo;Choi, Hyun-Jun;Huh, Moon-Haeng
    • Journal of the Korea Academia-Industrial cooperation Society, v.13 no.10, pp.4807-4813, 2012
  • Because existing methods for segmenting target objects in images mainly use two-dimensional features, they are constrained by the lack of three-dimensional information. In this paper, we propose a new method that accurately segments target objects in stereoscopic images by clustering 2D and 3D features. The method first applies a stereo matching technique to the left and right images to estimate depth features, which represent the distance between the camera and an object. It then removes background regions and detects foreground regions, namely the target objects, by effectively clustering depth and color features. To verify its performance, we applied the approach to various stereoscopic images and found that it detects target objects more accurately than existing 2D methods.
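The foreground/background step can be sketched as two-cluster K-means over joint (depth, r, g, b) features. The cluster with the smaller mean depth is labeled as the target object. The sketch assumes a depth map has already been produced by stereo matching and that colors are scaled to [0, 1]. The `depth_weight` parameter, which balances depth against color, is a hypothetical knob rather than something the abstract specifies.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def segment_foreground(pixels, depth_weight=1.0, iters=20):
    """Two-cluster K-means over joint depth + colour features.

    pixels: list of (depth, r, g, b); depth is assumed to come from a prior
    stereo-matching step, colours are assumed scaled to [0, 1].
    Returns a boolean mask: True for pixels in the nearer (foreground) cluster.
    """
    feats = [(depth_weight * d, r, g, b) for d, r, g, b in pixels]
    # Seed one centroid at the nearest pixel and one at the farthest pixel.
    near = min(range(len(pixels)), key=lambda i: pixels[i][0])
    far = max(range(len(pixels)), key=lambda i: pixels[i][0])
    centroids = [feats[near], feats[far]]
    labels = [0] * len(feats)
    for _ in range(iters):
        labels = [0 if sq_dist(f, centroids[0]) <= sq_dist(f, centroids[1]) else 1 for f in feats]
        for c in (0, 1):
            members = [f for f, lab in zip(feats, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    # The foreground cluster is the one with the smaller mean (weighted) depth.
    fg = 0 if centroids[0][0] <= centroids[1][0] else 1
    return [lab == fg for lab in labels]

pixels = [(1.0, 0.9, 0.1, 0.1)] * 3 + [(5.0, 0.2, 0.2, 0.8)] * 5
print(segment_foreground(pixels))  # [True, True, True, False, False, False, False, False]
```

Seeding at the nearest and farthest pixels keeps the result deterministic and biases the two clusters toward a near/far split. A real image would also benefit from spatial smoothing of the mask.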

Coarse-to-fine Classifier Ensemble Selection using Clustering and Genetic Algorithms (군집화와 유전 알고리즘을 이용한 거친-섬세한 분류기 앙상블 선택)

  • Kim, Young-Won;Oh, Il-Seok
    • Journal of KIISE:Software and Applications, v.34 no.9, pp.857-868, 2007
  • A good classifier ensemble should have high complementarity among its classifiers to achieve a high recognition rate, and it should be small to be efficient. This paper proposes a classifier ensemble selection algorithm with coarse-to-fine stages. For the algorithm to succeed, the original classifier pool must be sufficiently diverse, so this paper builds a large classifier pool by combining several different classification algorithms with many feature subsets. The coarse selection stage aims to reduce the size of the classifier pool with little sacrifice in recognition performance. The fine selection stage finds a near-optimal ensemble using genetic algorithms, and a hybrid genetic algorithm with improved search capability is also proposed. The experiments use internationally known handwritten numeral databases, and the results show that the proposed algorithm is superior to conventional ones.
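Below is a compact sketch of the two stages, with some plainly labeled substitutions. The coarse stage uses a simple leader-clustering rule. A classifier is dropped if it disagrees with a more accurate, already-kept classifier on at most `threshold` of the validation samples. The fine stage is a plain elitist genetic algorithm with one-point crossover and bit-flip mutation, not the paper's hybrid GA. Predictions are assumed to be precomputed on a validation set. The small size penalty in the fitness function and all function names are hypothetical.

```python
import random
from collections import Counter

def vote_accuracy(preds, subset, labels):
    """Majority-vote accuracy of the classifiers in `subset` (ties: first vote encountered wins)."""
    if not subset:
        return 0.0
    correct = 0
    for i, y in enumerate(labels):
        votes = Counter(preds[c][i] for c in subset)
        if votes.most_common(1)[0][0] == y:
            correct += 1
    return correct / len(labels)

def disagreement(p, q):
    return sum(a != b for a, b in zip(p, q)) / len(p)

def coarse_select(preds, labels, threshold=0.1):
    """Leader clustering: a classifier within `threshold` disagreement of an already
    kept (more accurate) one falls into that group and is dropped."""
    order = sorted(range(len(preds)), key=lambda c: vote_accuracy(preds, [c], labels), reverse=True)
    kept = []
    for c in order:
        if all(disagreement(preds[c], preds[k]) > threshold for k in kept):
            kept.append(c)
    return kept

def fine_select(preds, labels, pool, pop_size=20, gens=30, p_mut=0.1, seed=0):
    """Simple elitist GA over bit masks of `pool` (not the paper's hybrid GA)."""
    rng = random.Random(seed)
    n = len(pool)

    def fitness(mask):
        subset = [pool[i] for i in range(n) if mask[i]]
        # Hypothetical small size penalty to favour compact ensembles.
        return vote_accuracy(preds, subset, labels) - 0.01 * len(subset)

    population = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(gens):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            children.append([(not g) if rng.random() < p_mut else g for g in a[:cut] + b[cut:]])
        population = elite + children
    best = max(population, key=fitness)
    return [pool[i] for i in range(n) if best[i]]

labels = [0, 1, 0, 1, 0, 1]
preds = [[0, 1, 0, 1, 0, 1], [0, 1, 0, 1, 0, 1], [1, 0, 1, 0, 1, 0], [0, 1, 0, 1, 1, 0]]
pool = coarse_select(preds, labels)   # drops classifier 1, a duplicate of classifier 0
print(pool, fine_select(preds, labels, pool))
```

The coarse stage shrinks the search space the GA has to explore, which is the point of the coarse-to-fine split.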

Development of IoT Service Classification Method based on Service Operation Characteristic (세부 동작 기반 사물인터넷 서비스 분류 기법 개발)

  • Jo, Jeong hoon;Lee, HwaMin;Lee, Dae won
    • Journal of Internet Computing and Services, v.19 no.2, pp.17-26, 2018
  • With the recent emergence and convergence of Internet services, unified Internet of Things (IoT) service platforms have been researched. Currently, each IoT service is built as an independent system according to its provider's purpose, so similar services cannot exchange information or reuse modules. In this paper, we propose an operation-based classification algorithm for various services in order to provide a unified Internet platform environment. In our implementation, we classify and cluster more than 100 commercial IoT services, and on this basis we evaluate the performance of the proposed algorithm against the K-means algorithm. To prevent the services from being grouped into a single cluster because of the small number of sample groups, we re-cluster them using the K-means algorithm. In future work, we will expand the existing service sample groups and run the current classification system on Apache Spark for faster processing of larger data sets.
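The abstract does not describe the operation-based classification algorithm itself. As a hypothetical illustration of the general idea, each service can be represented by the set of detailed operations it performs, with invented operation names. Services are then grouped greedily by Jaccard similarity to each group's first member. The 0.5 threshold and all names are assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two operation sets (1.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 1.0

def group_services(services, threshold=0.5):
    """services: dict name -> set of operations. Each service joins the first group
    whose representative (its first member) is similar enough, else starts a new group."""
    groups = []  # list of (representative_ops, [member names])
    for name, ops in services.items():
        for rep, members in groups:
            if jaccard(ops, rep) >= threshold:
                members.append(name)
                break
        else:
            groups.append((ops, [name]))
    return [members for _, members in groups]

services = {
    "smart_lock": {"sense_door", "actuate_lock", "notify"},
    "door_sensor": {"sense_door", "notify"},
    "smart_plug": {"actuate_power", "measure_power"},
    "energy_meter": {"measure_power", "actuate_power", "report"},
}
print(group_services(services))  # [['smart_lock', 'door_sensor'], ['smart_plug', 'energy_meter']]
```

Greedy grouping depends on input order. A K-means pass over binary operation vectors, as the abstract mentions for re-clustering, is one way to refine such an initial grouping.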

Word Separation in Handwritten Legal Amounts on Bank Check by Measuring Gap Distance Between Connected Components (연결 성분 간 간격 측정에 의한 필기체 수표 금액 문장에서의 단어 추출)

  • Kim, In-Cheol
    • Journal of the Korean Institute of Intelligent Systems, v.14 no.1, pp.57-62, 2004
  • We propose an efficient method for separating words in handwritten legal amounts on bank checks based on the spatial gaps between connected components. Previous gap measures all suffer from an inherent tendency to underestimate or overestimate gaps, which degrades separation performance. To alleviate this problem, we develop a modified version of each distance measure. We also propose a 4-class clustering-based method that integrates three different types of distance measures to compensate effectively for the errors of each measure, which is expected to further improve word separation. In a series of word separation experiments, the modified distance measures improved the word separation rate by 2-3% or more over their original counterparts. In addition, the proposed combining method based on 4-class clustering achieved further improvement by reducing both the individual errors and the errors common to two of the three distance measures.
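Here is a simplified illustration of gap-based word separation. It uses a single measure, the horizontal gap between consecutive bounding boxes, and splits the gaps into two classes (intra-word and inter-word) with one-dimensional 2-means. This is not the paper's method, which modifies several distance measures and combines three of them through 4-class clustering. Components are assumed to be given as (x_min, x_max) extents sorted left to right, and the function names are hypothetical.

```python
def bbox_gaps(components):
    """components: (x_min, x_max) extents of connected components, sorted left to right.
    Gap = horizontal distance between consecutive bounding boxes (0 if they overlap)."""
    return [max(0, nxt[0] - cur[1]) for cur, nxt in zip(components, components[1:])]

def split_words(components, iters=20):
    """Split components into words by 2-means clustering of the gap sizes."""
    if not components:
        return []
    gaps = bbox_gaps(components)
    if not gaps:
        return [components]
    small_c, large_c = min(gaps), max(gaps)  # initial centroids
    for _ in range(iters):
        small = [g for g in gaps if abs(g - small_c) <= abs(g - large_c)]
        large = [g for g in gaps if abs(g - small_c) > abs(g - large_c)]
        if small:
            small_c = sum(small) / len(small)
        if large:
            large_c = sum(large) / len(large)
    words, current = [], [components[0]]
    for comp, gap in zip(components[1:], gaps):
        # A gap closer to the "large" centroid marks a word boundary.
        if large_c > small_c and abs(gap - large_c) < abs(gap - small_c):
            words.append(current)
            current = [comp]
        else:
            current.append(comp)
    words.append(current)
    return words

comps = [(0, 10), (12, 20), (22, 30), (50, 60), (62, 70), (95, 105)]
print(len(split_words(comps)))  # 3
```

A single bounding-box measure is exactly the kind that under- or over-estimates gaps for slanted or overlapping strokes. That weakness motivates the paper's combination of several measures.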