• Title/Summary/Keyword: Post Clustering

Search Result 72, Processing Time 0.027 seconds

Unsupervised Document Clustering for Constructing User Profile of Web Agent (웹 에이전트 사용자 특성모델 구축을 위한 비감독 문서 분류)

  • 오재준;박영택
    • Proceedings of the Korean Information Science Society Conference / 1998.10c / pp.105-107 / 1998
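  • This study aims to improve the method for constructing the user profile, which can be regarded as the most essential component of a web agent. To extract a user profile automatically through inductive machine learning, it is very important to automatically classify documents by the fields in which the user is interested. Until now, people have classified documents manually according to whether they find them interesting, but when the volume of documents grows exponentially, the amount that can be processed this way is inevitably limited. Moreover, if manual document classification is applied to a web agent as is, the burden of having users classify every document themselves will diminish the usefulness of the web agent. This study therefore presents a concrete method that obtains more concise and accurate document classification results by applying an unsupervised document classification algorithm and then post-processing the classification information it produces.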

Unsupervised Document Clustering for Constructing User Profile of Web Agent (웹 에이전트 사용자 특성모델 구축을 위한 비감독 문서 분류)

  • 오재준;박영택
    • Journal of Intelligence and Information Systems / v.4 no.2 / pp.61-83 / 1998
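  • This study aims to improve the method for constructing the user profile, which can be regarded as the most essential component of a web agent. To extract a user profile automatically through inductive machine learning, it is very important to automatically classify documents by the fields in which the user is interested. Until now, people have classified documents manually according to whether they find them interesting, but when the volume of documents grows exponentially, the amount that can be processed this way is inevitably limited. Moreover, if manual document classification is applied to a web agent as is, the burden of having users classify every document themselves will diminish the usefulness of the web agent. This study therefore presents a concrete method that obtains more concise and accurate document classification results by applying an unsupervised document classification algorithm and then post-processing the classification information it produces.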

Analysis of Land Uses in the Nakdong River Floodplain Using RapidEye Imagery and LiDAR DEM (RapidEye 영상과 LiDAR DEM을 이용한 낙동강 범람원 내 토지 이용 현황 분석)

  • Choung, Yun-Jae
    • Journal of the Korean Association of Geographic Information Studies / v.17 no.4 / pp.189-199 / 2014
  • A floodplain is a flat plain between levees and rivers. This paper proposes a methodology for analyzing land uses in the Nakdong River floodplain using RapidEye imagery and a given LiDAR (Light Detection And Ranging) DEM (Digital Elevation Model). First, the levee boundaries are generated from the LiDAR DEM, and the floodplain area is extracted from the given RapidEye imagery. The land uses in the floodplain are then identified in the extracted RapidEye imagery by ISODATA (Iterative Self-Organizing Data Analysis Technique) clustering. The overall accuracy of the land uses identified by ISODATA clustering is 91%. The identified land uses are analyzed by counting the number of pixels that make up each land cover cluster. The results show that, within the identified floodplain, river occupies 46% of the area, bare soil 36%, marsh 11%, and grass 7%.
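
The pixel-counting step described in this abstract can be illustrated with a minimal sketch. It assumes the clustering output is already available as a grid of class labels; the function name `land_cover_shares` and the tiny label raster are hypothetical and do not come from the paper.

```python
from collections import Counter


def land_cover_shares(labels):
    """Return the percentage of pixels belonging to each land cover class.

    `labels` is a 2-D grid (list of rows) of class labels, one per pixel.
    """
    counts = Counter(label for row in labels for label in row)
    total = sum(counts.values())
    return {cls: 100.0 * n / total for cls, n in counts.items()}


# Hypothetical 2 x 5 label raster, used only for illustration.
raster = [
    ["river", "river", "soil", "soil", "marsh"],
    ["river", "river", "soil", "grass", "river"],
]
shares = land_cover_shares(raster)
```

Because every pixel covers the same ground area, the share of pixels in a class equals its share of the floodplain area.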

Density-Based Estimation of POI Boundaries Using Geo-Tagged Tweets (공간 태그된 트윗을 사용한 밀도 기반 관심지점 경계선 추정)

  • Shin, Won-Yong;Vu, Dung D.
    • The Journal of Korean Institute of Communications and Information Sciences / v.42 no.2 / pp.453-459 / 2017
  • Users tend to check in and post their statuses in location-based social networks (LBSNs) to indicate that their interests are related to a point-of-interest (POI). While previous studies on discovering areas-of-interest (AOIs) mostly applied density-based clustering methods to collections of geo-tagged photos from LBSNs, we focus on estimating a POI boundary, which corresponds to only one cluster containing its POI center. Using geo-tagged tweets recorded from Twitter users, this paper introduces a low-complexity, two-phase, density-based method that estimates a POI boundary by finding a suitable radius reachable from the POI center. We estimate the boundary of the POI as the convex hull of the geo-tags selected through our two-phase density-based estimation, where each phase uses a different radius increment. The method is shown to outperform the conventional density-based clustering method in terms of computational complexity.
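
The final step of this abstract, taking the convex hull of the selected geo-tags, can be sketched as follows. This is only an illustration under stated assumptions: the radius is given as a fixed input rather than found by the paper's two-phase search, coordinates are treated as planar, and the names `convex_hull` and `poi_boundary` are hypothetical. The hull is computed with the standard monotone chain algorithm.

```python
import math


def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Monotone chain convex hull, returned in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def poi_boundary(center, geotags, radius):
    """Hull of the geo-tags within `radius` of the POI center (assumed rule)."""
    selected = [
        p for p in geotags
        if math.hypot(p[0] - center[0], p[1] - center[1]) <= radius
    ]
    return convex_hull(selected)


# Hypothetical geo-tags; (5, 5) lies far from the POI center and is excluded.
tags = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (5, 5)]
boundary = poi_boundary((0.5, 0.5), tags, radius=1.0)
```

Real geo-tags are latitude and longitude pairs, so a practical version would project them to a planar coordinate system or use a geodesic distance before applying the radius test.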

The syllable recovery rule-based system and the application of a morphological analysis method for the post-processing of a continuous speech recognition (연속음성인식 후처리를 위한 음절 복원 rule-based 시스템과 형태소분석기법의 적용)

  • 박미성;김미진;김계성;최재혁;이상조
    • Journal of the Korean Institute of Telematics and Electronics C / v.36C no.3 / pp.47-56 / 1999
  • Various phonological alterations occur in continuously spoken Korean. These alterations are one of the major reasons why Korean speech recognition is difficult. This paper presents a rule-based system that converts a character string produced by speech recognition into a text-based character string. The recovery results are morphologically analyzed, and only correct text strings are generated. Recovery is performed according to four kinds of rules: a syllable-boundary final-consonant/initial-consonant recovery rule, a vowel-process recovery rule, a last-syllable final-consonant recovery rule, and a monosyllable process rule. We use x-clustering information for efficient recovery and postfix-syllable frequency information to restrict the recovery candidates passed to the morphological analyzer. Because the system is rule-based, it does not require a large pronunciation dictionary or phoneme dictionary, and it has the advantage that an existing text-based morphological analyzer can be used.

A Fast Way for Alignment Marker Detection and Position Calibration (Alignment Marker 고속 인식 및 위치 보정 방법)

  • Moon, Chang Bae;Kim, HyunSoo;Kim, HyunYong;Lee, Dongwon;Kim, Tae-Hoon;Chung, Hae;Kim, Byeong Man
    • KIPS Transactions on Software and Data Engineering / v.5 no.1 / pp.35-42 / 2016
  • Marker alignment is a core machine vision technology that is frequently used at the pre- and post-production stages. This paper proposes a method that detects the angle and position of a product at high speed by using a unique pattern in the marker stamped on the product, and then calibrates them. To determine the angle and position of a marker, marker candidates are extracted using a variation of the integral histogram, and clustering is then applied to reduce the candidates. The experimental results showed an improvement of about 5 s 719 ms in processing time and better precision in detecting the rotation angle of a product.

Fast k-NN based Malware Analysis in a Massive Malware Environment

  • Hwang, Jun-ho;Kwak, Jin;Lee, Tae-jin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.12 / pp.6145-6158 / 2019
  • Responding to the large volume of indiscriminately distributed malicious code, as well as to intelligent APT attacks, is a challenge for the current security industry. As a result, studies using machine learning algorithms are being conducted for proactive prevention rather than post-processing. The k-NN algorithm is widely used because it is intuitive and suitable for handling malicious code as unstructured data. In addition, in the malicious code analysis domain, the k-NN algorithm makes it easy to classify malicious code based on previously analyzed malicious code. For example, it is possible to classify malicious code families or analyze malicious code variants through similarity analysis with existing malicious code. However, the main disadvantage of the k-NN algorithm is that the search time increases as the training data grows. We propose a fast k-NN algorithm that addresses this computation speed problem while retaining the benefits of the k-NN algorithm. In the test environment, the algorithm required an average of only 19.71 similarity comparisons for 6.25 million malicious code samples. Given the way the algorithm works, the Fast k-NN algorithm can also be used to search any data that can be vectorized, not only malware and SSDEEP. In the future, wherever a k-NN approach is needed, if the central node can be selected effectively for clustering large amounts of data in various environments, it is expected that a sophisticated machine-learning-based system can be designed.
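
For context, the baseline that this abstract seeks to speed up can be sketched as a brute-force k-NN classifier. This is not the authors' Fast k-NN; it is a plain majority-vote k-NN over Euclidean distance, and the function name `knn_classify`, the feature vectors, and the family labels are all hypothetical.

```python
import math
from collections import Counter


def knn_classify(query, samples, k=3):
    """Label `query` by majority vote among its k nearest labeled samples.

    `samples` is a list of (vector, label) pairs. Every call compares the
    query against all samples, so the cost grows with the data set size.
    """
    nearest = sorted(samples, key=lambda s: math.dist(query, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors for previously analyzed samples.
known = [
    ((0.0, 0.1), "family_a"),
    ((0.2, 0.0), "family_a"),
    ((0.1, 0.2), "family_a"),
    ((5.0, 5.1), "family_b"),
    ((5.2, 4.9), "family_b"),
]
label = knn_classify((0.1, 0.1), known, k=3)
```

The full scan inside `knn_classify` is the search-time cost the abstract refers to: each query is compared against every stored sample, which is what a reduced average comparison count avoids.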

Hair Classification and Region Segmentation by Location Distribution and Graph Cutting (위치 분포 및 그래프 절단에 의한 모발 분류와 영역 분할)

  • Kim, Yong-Gil;Moon, Kyung-Il
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.22 no.3 / pp.1-8 / 2022
  • Recently, Google MediaPipe presented a novel approach to neural network-based hair segmentation from a single camera input, designed specifically for real-time mobile applications. Although the neural network used for hair segmentation is relatively small, it produces a high-quality hair segmentation mask that is well suited to AR effects such as realistic hair recoloring. However, it produces undesirable segmentation results for some hair styles and when the image contains noise or holes. In this study, the energy function of the test image is constructed from the estimated prior distribution of hair location and the hair color likelihood function. It is then optimized with the graph cuts algorithm to obtain an initial hair region. Finally, a clustering algorithm and image post-processing techniques are applied to the initial hair region so that the final hair region can be segmented precisely. The proposed method is applied to the MediaPipe hair segmentation pipeline.

A Study on the Reading Program Improvement Plan of a Public Library Based on the Reading Culture Promotion Policy (독서문화진흥 정책에 기반한 공공도서관의 독서 프로그램 개선 방안 연구)

  • Miah Cho;Seung-Jin Kwak
    • Journal of the Korean Society for Library and Information Science / v.57 no.3 / pp.191-210 / 2023
  • The purpose of this study is to draw implications from domestic and international best-practice case studies of library programs, and to suggest ways to improve a public library's reading programs through an analysis based on the 3rd Reading Culture Promotion Basic Plan, in line with the changing role of future libraries. To this end, prior studies were first analyzed from various angles to derive clustering criteria for library programs. Based on these criteria, the programs of various domestic and foreign libraries were analyzed. Then, based on the clustering criteria and the 13 key tasks under the 4 strategies of the 3rd Reading Culture Promotion Basic Plan, the status of a specific public library's reading programs was analyzed. From this, a development plan was presented that expands participatory reading promotion programs in consideration of user demand in the era of the 4th Industrial Revolution and, in response to the post-COVID-19 era, considers non-face-to-face and non-contact library services in addition to face-to-face services. It is expected that this analysis and its application will ultimately extend beyond the individual library and help make public library services in Korea into library programs closely connected to the lives of users.

Ecoregional Characteristics of Korea for Application on Forest Landscape Restoration in North Korea (북한 산림경관복원 적용을 위한 한반도 생태지역 특성)

  • Yu, Jaeshim;Kim, Kyoungmin
    • Journal of the Korean Society of Environmental Restoration Technology / v.18 no.6 / pp.61-71 / 2015
  • The objectives of this study are to construct an ecoregion map and to extract ecological factors from each ecoregion for application to FLR (Forest Landscape Restoration) in North Korea. An ecological map was constructed by PCA (Principal Component Analysis) and MGC (Multivariate Geographical Clustering). An ANOVA test verified the differences among ecoregions, and post-hoc pairwise comparisons were performed to determine similarities between them. Factor analysis was conducted to extract ecoregional characteristics. Ecoregions were distributed into clusters reflecting south-north and east-west differences in their ecological factors. About 12% of the land area in North Korea shared similar ecological factors with South Korea, but the remaining 88% was found to be ecologically different. The ANOVA test showed a p-value of 0.000, indicating significant differences between the regions. Post-hoc pairwise comparisons indicated statistically significant similarities in annual mean temperature between ecoregions D and G, in precipitation seasonality between ecoregions H and O, and in precipitation of the warmest quarter between ecoregions K and O. Because ecoregions A and N showed the same soil water content, the density of forest cover in southern ecoregion A was assumed to be similar to that in northern ecoregion N of the Korean Peninsula. Based on the results of this study, quantitative and spatially based planning is necessary when South Korea aids forest restoration projects in North Korea. In addition, it is recommended that South and North Korea share Forest Landscape Restoration methodologies with each other.