• Title/Summary/Keyword: cluster representative term

Search Result 16, Processing Time 0.029 seconds

Automatic Generation of the Local Level Knowledge Structure of a Single Document Using Clustering Methods (클러스터링 기법을 이용한 개별문서의 지식구조 자동 생성에 관한 연구)

  • Han, Seung-Hee;Chung, Young-Mee
    • Journal of the Korean Society for Information Management / v.21 no.3 / pp.251-267 / 2004
  • The purpose of this study is to generate the local-level knowledge structure of a single document, similar to the back-of-the-book indexes and tables of contents of printed material, through term clustering and cluster representative term selection. It also aims to analyze the functionalities of the knowledge structure and to confirm the applicability of these methods in user-friendly information services. In the term clustering experiment, Ward's method outperformed the fuzzy K-means clustering method. In the cluster representative term selection experiment, using the term with the highest passage frequency as the representative yielded the best performance. Finally, the results of user task-based functionality tests show that the automatically generated knowledge structure functions similarly to the local-level knowledge structure presented in printed material.
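
The representative-term rule reported in the abstract above (take the term with the highest passage frequency in each cluster) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the toy passages are hypothetical, and the term clusters are assumed to have been produced already (by Ward's method in the study).

```python
def passage_frequency(term, passages):
    """Count the passages in which the term occurs at least once."""
    return sum(1 for passage in passages if term in passage)


def representative_term(cluster, passages):
    """Return the cluster term with the highest passage frequency.

    Ties are broken by the order of the terms in the cluster.
    """
    return max(cluster, key=lambda term: passage_frequency(term, passages))


# Made-up passages (token lists) and term clusters, for illustration only.
passages = [
    ["index", "term", "cluster"],
    ["cluster", "ward", "method"],
    ["cluster", "term", "weight"],
    ["index", "book"],
]
clusters = [["term", "cluster", "weight"], ["index", "book"]]

representatives = [representative_term(c, passages) for c in clusters]
print(representatives)
```

Passage frequency counts a term once per passage, so a term that is spread across many passages beats one that is repeated heavily inside a single passage.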

Designing Hierarchical User Interface Model for Browsing the Knowledge Structure of a Single Document Using MDS (MDS를 이용한 개별문서의 계층적 지식구조 브라우징 인터페이스 설계)

  • Han, Seung-Hee;Lee, Jae-Yun
    • Journal of Information Management / v.35 no.3 / pp.125-138 / 2004
  • The purpose of this study is to propose hierarchical user interfaces for browsing the knowledge structure of a single document. To generate the hierarchical knowledge structure, hierarchical term clustering and cluster representative term selection were performed on a single thesis in the information science field, and the result was used to design interfaces that browse a single document hierarchically using multidimensional scaling (MDS). The interfaces can be applied to the development of user-friendly information retrieval systems.

Statistical Analysis for Ozone Long-term Trend Stations in Seoul, Korea (통계적 기법을 적용한 서울의 오존 장기변동 대표측정소 선정)

  • Shin, Hyejung;Park, Jihoon;Son, Jungseok;Rho, Soona;Hong, Youdeong
    • Journal of Environmental Impact Assessment / v.24 no.2 / pp.111-118 / 2015
  • This study was conducted to establish a statistical method for determining the air quality monitoring station that best represents the long-term ozone trend of Seoul. Hourly ozone concentrations from 2002 to 2011 were analyzed. The KZ-filter, a correlation matrix, cluster analysis, and the Kriging method were applied to select the representative station. The correlation matrix analysis found that the long-term trends of ozone concentrations measured at Sinjung, Sadang, and Bun-dong were highly correlated. The cluster analysis found that these three stations belonged to the same cluster. The Kriging analysis also showed that these three stations were highly correlated with other stations in spatial distribution. Considering these results and the fact that Sinjung station had the highest correlation coefficient, Sinjung station was the most suitable representative station for understanding the long-term ozone trend of Seoul. This result could be applied to understanding the long-term trends of other pollutants. It can also be used to assess the appropriateness of the spatial distribution of national air quality monitoring stations.
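
For readers unfamiliar with the KZ-filter named in the abstract above, the sketch below shows the usual definition: a centered moving average of a given window length applied repeatedly. The window length, the number of iterations, and the edge handling (averaging only the available points) are assumptions made here for illustration; the abstract does not state the settings used in the study, and the series is made up.

```python
def moving_average(series, window):
    """Centered moving average; near the edges only available points are averaged."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        segment = series[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def kz_filter(series, window, iterations):
    """Kolmogorov-Zurbenko filter: apply the moving average `iterations` times."""
    result = list(series)
    for _ in range(iterations):
        result = moving_average(result, window)
    return result


# Made-up series with one short spike; repeated smoothing spreads and damps it.
series = [1.0, 1.0, 1.0, 7.0, 1.0, 1.0, 1.0]
trend = kz_filter(series, window=3, iterations=2)
print(trend)
```

Each extra iteration suppresses short-period fluctuations further, which is why the filter is commonly used to separate a long-term component from hourly or daily noise.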

TEMPORAL CLASSIFICATION METHOD FOR FORECASTING LOAD PATTERNS FROM AMR DATA

  • Lee, Heon-Gyu;Shin, Jin-Ho;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / 2007.10a / pp.594-597 / 2007
  • This paper presents a novel mid- and long-term power load prediction method using temporal pattern mining from AMR (Automatic Meter Reading) data. Because power load patterns vary over time and differ greatly by hour, day, week, and so on, traditional data mining alone yields uninformative results. Research on data mining for analyzing electric load patterns has focused on cluster analysis and classification methods. However, despite the usefulness of rules that include a temporal dimension and the fact that AMR data has temporal attributes, these methods were limited to static pattern extraction and did not consider temporal attributes. Therefore, we propose a new classification method for predicting power load patterns. The main tasks are a clustering method and a temporal classification method. Cluster analysis is used to create load pattern classes and the representative load profile for each class. Next, the classification method uses the representative load profiles to build a classifier that can assign different load patterns to the existing classes. The proposed classification method is calendar-based temporal mining, which discovers electric load patterns at multiple time granularities. Lastly, we show that the proposed method, applied to AMR data, discovered more interesting patterns.
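
The two-stage idea in the abstract above (derive a representative load profile per cluster, then assign new load patterns to the existing classes) can be illustrated with a plain nearest-profile rule. This is a deliberately simplified stand-in, not the calendar-based temporal mining method the paper proposes: the representative is assumed to be the element-wise mean, the distance is assumed to be Euclidean, and the class names and readings are invented.

```python
import math


def mean_profile(profiles):
    """Element-wise mean of equally long load profiles."""
    return [sum(values) / len(values) for values in zip(*profiles)]


def nearest_class(profile, representatives):
    """Return the class whose representative profile is closest (Euclidean)."""
    return min(
        representatives,
        key=lambda label: math.dist(profile, representatives[label]),
    )


# Made-up clusters of three-interval load profiles.
clusters = {
    "daytime_peak": [[1.0, 3.0, 1.0], [1.0, 5.0, 1.0]],
    "evening_peak": [[1.0, 1.0, 4.0], [1.0, 1.0, 6.0]],
}
representatives = {
    label: mean_profile(members) for label, members in clusters.items()
}
predicted = nearest_class([1.0, 2.0, 5.5], representatives)
print(predicted)
```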

Drought Classification Method for Jeju Island using Standard Precipitation Index (표준강수지수를 활용한 제주도 가뭄의 공간적 분류 방법 연구)

  • Park, Jae-Kyu;Lee, Jun-ho;Yang, Sung-Kee;Kim, Min-Chul;Yang, Se-Chang
    • Journal of Environmental Science International / v.25 no.11 / pp.1511-1519 / 2016
  • Jeju Island relies on subterranean water for over 98% of its water resources, so continued study of drought under climate change is necessary. In this study, the representative standardized precipitation index (SPI) is classified by various criteria, and the spatial characteristics and applicability of drought in Jeju Island are evaluated from the results. When SPI (SPI 3, 6, 9, 12) was calculated for 4 weather stations, SPI 12 was found to be relatively simple compared to SPI 6. It was also verified that SPI fluctuated more for short-term data, and that long-term data was relatively more useful for judging extreme drought. Cluster analysis was performed using the K-means technique on two variables extracted by factor analysis; the clustering terminated after seven iterations, and two clusters were ultimately formed.

Hydrometeorological Characteristics and The Spatial Distribution of Agricultural Droughts (농업가뭄의 수문기상학적 특성 및 공간적 분포에 관한 연구)

  • Jang, Jung seok
    • Journal of The Korean Society of Agricultural Engineers / v.61 no.2 / pp.105-115 / 2019
  • For 159 administrative areas, the SPI (Standardized Precipitation Index), ARDI (Agricultural Reservoir Drought Index), and ARDIs (Agricultural Reservoir Drought Index Simulated) were developed and applied to analyze the characteristics of agricultural drought indices and agricultural droughts. To identify the hydrometeorological characteristics of agricultural droughts, SPI, ARDI, and ARDIs were calculated nationwide, and their applicability was compared and examined. SPI and ARDI differed significantly in the timing and depth of drought, both spatially and temporally. ARDI and ARDIs showed similar tendencies of change, and ARDIs was considered more representative of agricultural drought characteristics. The results of this study suggest that agricultural drought is a problem to be solved in the medium and long term rather than the short term because of its various forms of development, the complexity of its development, and the difficulty of forecasting it. Therefore, it is concluded that a preliminary and systematic approach that considers meteorological, hydrological, and hydrometeorological characteristics is needed rather than a fragmentary approach, and that an agricultural drought index is needed to quantitatively evaluate agricultural drought.

Improvement on Density-Independent Clustering Method (밀도에 무관한 클러스터링 기법의 개선)

  • Kim, Seong-Hoon;Heo, Gyeongyong
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.5 / pp.967-973 / 2017
  • Clustering is one of the best-known unsupervised learning methods; it groups data into homogeneous clusters. Clustering has been used in various applications, and Fuzzy C-Means (FCM) is one of the representative methods. In FCM, however, cluster centers tend to lean toward high-density areas because the Euclidean distance measure makes high-density clusters contribute more to the clustering result. A density-independent clustering method was proposed previously, in which cluster centers were kept from lying close to each other, relieving the center deviation problem. The density-independent clustering method has the limitation that it is difficult to specify the position of the cluster centers. This paper proposes an enhanced density-independent clustering method with an additional term that places cluster centers around dense regions. The proposed method converges closer to the real centers than FCM and density-independent clustering, as verified by experimental results.
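
As background for the abstract above, the following sketch shows the two alternating updates of standard Fuzzy C-Means for one-dimensional points. It covers only the baseline FCM that the paper sets out to improve; the density-independent variant and its additional term are not described in the abstract and are not reproduced here. The fuzzifier `m = 2.0`, the function names, and the data are assumptions for illustration.

```python
def fcm_memberships(points, centers, m=2.0):
    """Standard FCM membership update for 1-D points."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # A point that coincides with a center belongs fully to it.
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(row)
            row = [r / total for r in row]
        else:
            row = [
                1.0 / sum((d_i / d_j) ** power for d_j in dists)
                for d_i in dists
            ]
        memberships.append(row)
    return memberships


def fcm_centers(points, memberships, n_clusters, m=2.0):
    """Standard FCM center update: membership^m weighted mean of the points."""
    centers = []
    for i in range(n_clusters):
        weights = [row[i] ** m for row in memberships]
        centers.append(
            sum(w * x for w, x in zip(weights, points)) / sum(weights)
        )
    return centers


# Made-up data: a dense group near 0 and a single distant point.
points = [0.0, 0.1, 0.2, 0.3, 0.4, 5.0]
centers = [0.0, 5.0]
for _ in range(10):
    memberships = fcm_memberships(points, centers)
    centers = fcm_centers(points, memberships, len(centers))
print(centers)
```

Because every point contributes to every center in proportion to its membership raised to `m`, a region with many points pulls centers toward itself, which is the drift the abstract refers to.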

A Text Summarization Model Based on Sentence Clustering (문장 클러스터링에 기반한 자동요약 모형)

  • 정영미;최상희
    • Journal of the Korean Society for Information Management / v.18 no.3 / pp.159-178 / 2001
  • This paper presents an automatic text summarization model that selects representative sentences from sentence clusters to create a summary. Summary generation experiments were performed on two sets of test documents after learning the optimum environment from a training set. The centroid clustering method turned out to be the most effective for clustering sentences, and sentence weight was found to be more effective than the similarity between the sentence and cluster centroid vectors for selecting a representative sentence from each cluster. The experimental results also show that inverse sentence weight, title word weight for terms, and location weight for sentences are effective in improving summarization performance.

Development Strategy of Seosan-Daesan Port using AHP Analysis (AHP를 이용한 서산 대산항의 발전전략에 관한 연구)

  • Yun, Kyong-Jun;Ahn, Seung-Bum;Lee, Hyang-sook
    • Journal of Korea Port Economic Association / v.34 no.4 / pp.39-52 / 2018
  • The Seosan-Daesan Port is a representative trade port in Chungnam, and has the sixth largest total cargo throughput and the third largest oil cargo throughput in Korea. However, research on this port's development is lacking relative to that for Busan Port, Incheon Port, and Gwangyang Port, and no study exists that suggests the direction of the development strategy for Seosan-Daesan Port. This study discusses the future role of Seosan-Daesan Port in preparation for a rapidly changing future and the development strategy that should be established. Using the AHP, a development strategy is provided for Seosan-Daesan Port from short/mid-term and long-term viewpoints for three aspects: operation activation, infrastructure construction, and policy support. Operation activation is chosen as the most significant factor from a short/mid-term viewpoint, whereas infrastructure construction is recognized as important from a long-term viewpoint. Specifically, from a short/mid-term viewpoint, sustainable container cargo attraction, multipurpose dock construction, management pier construction, and opening of international passenger ferry lines are important factors while from the long-term viewpoint, hinterland construction, petrochemical industry cluster construction, automobile industry cluster construction, and management improvement system are important. Establishing action plans for each strategy and a cooperative network for sharing goals and strengthening cooperation is necessary.

Hierarchical Overlapping Clustering to Detect Complex Concepts (중복을 허용한 계층적 클러스터링에 의한 복합 개념 탐지 방법)

  • Hong, Su-Jeong;Choi, Joong-Min
    • Journal of Intelligence and Information Systems / v.17 no.1 / pp.111-125 / 2011
  • Clustering is a process of grouping similar or relevant documents into a cluster and assigning a meaningful concept to the cluster. Clustering thereby facilitates fast and accurate search for relevant documents by narrowing the search range to the documents belonging to related clusters. Effective clustering requires techniques for identifying similar documents and grouping them into a cluster, and for discovering the concept that is most relevant to the cluster. One problem that often appears in this context is the detection of a complex concept that overlaps with several simple concepts at the same hierarchical level. Previous clustering methods were unable to identify and represent a complex concept that belongs to several different clusters at the same level in the concept hierarchy, and could not validate the semantic hierarchical relationship between a complex concept and each of the simple concepts. To solve these problems, this paper proposes a new clustering method that identifies and represents complex concepts efficiently. We developed the Hierarchical Overlapping Clustering (HOC) algorithm, which modifies the traditional Agglomerative Hierarchical Clustering algorithm to allow overlapping clusters at the same level in the concept hierarchy. The HOC algorithm represents the clustering result not as a tree but as a lattice in order to detect complex concepts. We developed a system that employs the HOC algorithm for complex concept detection. This system operates in three phases: 1) preprocessing of documents, 2) clustering using the HOC algorithm, and 3) validation of the semantic hierarchical relationships among the concepts in the lattice obtained from clustering. The preprocessing phase represents the documents as x-y coordinate values in a 2-dimensional space by considering the weights of the terms appearing in the documents. First, index terms are extracted through stopword removal and stemming. Then, each index term is assigned a TF-IDF weight, and the x-y coordinate value of each document is determined by combining the TF-IDF values of its terms. The clustering phase uses the HOC algorithm, in which the similarity between documents is calculated using the Euclidean distance. Initially, a cluster is generated for each document by grouping the documents that are closest to it. Then, the distance between every pair of clusters is measured, and the closest clusters are grouped into a new cluster. This process is repeated until the root cluster is generated. In the validation phase, a feature selection method is applied to check whether the cluster concepts built by the HOC algorithm have meaningful hierarchical relationships. Feature selection is a method of extracting key features from a document by identifying important and representative terms in the document and assigning weight values to them. To select key features correctly, a method is needed to determine how each term contributes to the class of the document. Among the several methods that achieve this goal, this paper adopts the $\chi^2$ statistic, which measures the degree of dependency of a term t on a class c and represents the relationship between t and c as a numerical value. To demonstrate the effectiveness of the HOC algorithm, a series of performance evaluations was carried out using the well-known Reuters-21578 news collection. The results showed that the HOC algorithm contributes greatly to detecting and producing complex concepts by generating the concept hierarchy in a lattice structure.
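
The chi-square statistic mentioned in the abstract above is commonly computed from a 2x2 contingency table of document counts for a term t and a class c. The sketch below uses that textbook form; whether the paper uses exactly this variant is an assumption, and the function names, class labels, and toy documents are hypothetical.

```python
def term_class_counts(docs, labels, term, cls):
    """Build the 2x2 table for a term and a class.

    a: in class, has term      b: not in class, has term
    c: in class, lacks term    d: not in class, lacks term
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == cls
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square(a, b, c, d):
    """Chi-square statistic N(ad - cb)^2 / ((a+c)(b+d)(a+b)(c+d))."""
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # degenerate table: the statistic is undefined, report 0
    return n * (a * d - c * b) ** 2 / denominator


# Made-up documents (token sets) with made-up class labels.
docs = [{"oil", "price"}, {"oil", "export"}, {"wheat", "crop"}, {"wheat", "price"}]
labels = ["crude", "crude", "grain", "grain"]

oil_score = chi_square(*term_class_counts(docs, labels, "oil", "crude"))
price_score = chi_square(*term_class_counts(docs, labels, "price", "crude"))
print(oil_score, price_score)
```

A term that appears only in documents of one class scores high, while a term spread evenly across classes scores zero, so the score can rank candidate terms for a cluster concept.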