• Title/Summary/Keyword: SOM (Self-Organizing Maps)

Sparse Document Data Clustering Using Factor Score and Self Organizing Maps (인자점수와 자기조직화지도를 이용한 희소한 문서데이터의 군집화)

  • Jun, Sung-Hae
    • Journal of the Korean Institute of Intelligent Systems / v.22 no.2 / pp.205-211 / 2012
  • Retrieved documents must be transformed into a data structure suitable for the clustering algorithms of statistics and machine learning. A popular structure for document clustering is the document-term matrix, which records the frequency of each term in each document. This matrix is sparse because most of its entries are 0, and the sparsity degrades clustering performance. To address this, the research uses factor scores obtained by factor analysis: the document-term matrix is transformed into a document-factor score matrix, which then serves as the input data for document clustering. To compare clustering performance between the document-term matrix and the document-factor score matrix, both types of matrix are clustered with a self-organizing map (SOM).
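
The sparsity problem described above can be illustrated with a minimal sketch. The toy documents, the whitespace tokenizer, and the function names `document_term_matrix` and `sparsity` are assumptions made for illustration; the factor-score transformation itself is not shown.

```python
from collections import Counter

def document_term_matrix(docs):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[term] for term in vocab])
    return vocab, matrix

def sparsity(matrix):
    """Fraction of matrix cells that are zero."""
    cells = [value for row in matrix for value in row]
    return sum(1 for value in cells if value == 0) / len(cells)

# Hypothetical toy corpus, not taken from the paper.
docs = [
    "self organizing map clustering",
    "factor analysis factor score",
    "flood map rainfall",
]
vocab, dtm = document_term_matrix(docs)
print(len(vocab), round(sparsity(dtm), 3))
```

Even with three short documents, well over half of the cells are zero; real corpora with thousands of terms are far sparser, which is the motivation for replacing term columns with a small number of factor scores.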

A Document Ranking Method by Document Clustering Using Bayesian SOM and Bootstrap (베이지안 SOM과 붓스트랩을 이용한 문서 군집화에 의한 문서 순위조정)

  • Choe, Jun-Hyeok;Jeon, Seong-Hae;Lee, Jeong-Hyeon
    • The Transactions of the Korea Information Processing Society / v.7 no.7 / pp.2108-2115 / 2000
  • Conventional Boolean retrieval systems based on the vector space model can return results quickly, but they cannot accurately reflect the user's retrieval intent, including semantic information. Consequently, the retrieval results often differ greatly from what users expected, which forces users to spend considerable time finding the desired documents among those retrieved. In this paper, we design a Bayesian SOM (Self-Organizing feature Maps) that combines Bayesian statistical methods with the Kohonen network, a form of unsupervised learning, and use it to classify documents in real time according to their semantic similarity to the user query. If fewer than 30 documents are available for clustering, making statistical characteristics difficult to observe, the number of documents must be increased to at least 50. In addition, to give a high rank to the documents that are semantically most similar to the user query within the generalized clusters, we compute similarity using the Kohonen centroid of each document class and adjust the secondary rank according to that similarity.

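
The re-ranking idea in this abstract, ordering documents by their similarity to a cluster centroid, can be sketched as follows. The abstract does not name the similarity measure, so cosine similarity is an assumption here, as are the function names and the toy vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_by_similarity(reference, documents):
    """Return document ids ordered from most to least similar to `reference`."""
    scored = [(cosine(reference, vec), doc_id) for doc_id, vec in documents.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]

# Hypothetical centroid and document vectors, for illustration only.
centroid = [1.0, 1.0, 0.0]
documents = {
    "d1": [1.0, 0.9, 0.0],
    "d2": [0.0, 0.1, 1.0],
    "d3": [1.0, 0.0, 0.0],
}
ranking = rank_by_similarity(centroid, documents)
print(ranking)
```

In a setting like the one described, `reference` would be the centroid of the cluster matched to the query, and the resulting order would serve as the adjusted secondary rank.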

Estimation of Inundation Area by Linking of Rainfall-Duration-Flooding Quantity Relationship Curve with Self-Organizing Map (강우량-지속시간-침수량 관계곡선과 자기조직화 지도의 연계를 통한 범람범위 추정)

  • Kim, Hyun Il;Keum, Ho Jun;Han, Kun Yeun
    • KSCE Journal of Civil and Environmental Engineering Research / v.38 no.6 / pp.839-850 / 2018
  • Flood damage in urban areas due to torrential rain is increasing with urbanization. For this reason, accurate and rapid flood forecasting and expected inundation maps are needed. Predicting the extent of flooding for a given rainfall is an important part of preparing for floods in advance. Recently, government agencies have been trying to provide expected inundation maps to the public. However, methods for quantifying the extent of inundation caused by a particular rainfall scenario, and for predicting flood extent in real time, are lacking. Real-time prediction of flood extent based on rainfall-runoff-inundation analysis is therefore needed. One- and two-dimensional models are linked to analyze the drainage network, manhole overflow, and inundation propagation under given rainfall conditions. By applying various rainfall scenarios that consider rainfall duration, distribution, and return period, the inundation volume and depth can be estimated and stored in a database. The Rainfall-Duration-Flooding Quantity (RDF) relationship curve, based on the hydraulic analysis results, and the Self-Organizing Map (SOM), which performs unsupervised learning, are applied to predict the flooded area for a particular rainfall condition. The validity of the proposed methodology was examined by comparing the predicted flood map with the results of the 2-dimensional hydraulic model. Based on the results, the methodology is judged to be useful for providing flood maps for medium-sized rainfall or frequency scenarios that have not been simulated. Furthermore, by establishing the RDF curve, which captures the rainfall-outflow-flood relationship, and the database of expected inundation maps, it can provide fundamental data for flood forecasting.
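
For readers unfamiliar with how a SOM learns without labels, the following is a minimal sketch of the standard Kohonen update on a one-dimensional map. It is not the paper's model: the map size, the Gaussian neighborhood, the linear decay schedule, and all function names are assumptions chosen for illustration.

```python
import math

def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to x (Euclidean distance)."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))

def som_update(weights, x, learning_rate, sigma):
    """Apply one SOM update in place on a 1-D map and return the BMU index."""
    bmu = best_matching_unit(weights, x)
    for i, w in enumerate(weights):
        # Gaussian neighborhood on the map grid: 1 at the BMU, smaller farther away.
        h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
        weights[i] = [wj + learning_rate * h * (xj - wj) for wj, xj in zip(w, x)]
    return bmu

def train(weights, data, epochs=50, lr0=0.5, sigma0=1.5):
    """Repeat the update over the data with linearly decaying rate and radius."""
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs  # goes from 1 down to 1/epochs
        for x in data:
            som_update(weights, x, lr0 * decay, max(sigma0 * decay, 0.1))
    return weights

# Hypothetical normalized 2-D inputs; three map units initialized on the diagonal.
units = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
data = [[0.1, 0.1], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
train(units, data)
print(best_matching_unit(units, [0.15, 0.1]))
```

Once trained, a new input is mapped to its best-matching unit; in an application like the one described, each unit could be associated with stored results for similar scenarios.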

Input Pattern Vector Extraction and Pattern Recognition of Taste using fMRI (fMRI를 이용한 맛의 입력패턴벡터 추출 및 패턴인식)

  • Lee, Sun-Yeob;Lee, Yong-Gu;Kim, Dong-Ki
    • Journal of Radiological Science and Technology / v.30 no.4 / pp.419-426 / 2007
  • In this paper, input pattern vectors are extracted and a learning algorithm is designed to recognize taste (bitter, sweet, sour, and salty) pattern vectors. The signal intensities of the tastes are used to compose the input pattern vectors. The SOM (Self-Organizing Maps) algorithm is used to learn the initial reference vectors for taste pattern recognition, and the out-star learning algorithm is used to determine the class of the output neurons of the subclass layer. The weights between the input layer and the subclass layer are learned in two stages: the SOM algorithm determines the initial reference vectors, and the LVQ (Learning Vector Quantization) algorithm then refines them. The pattern vectors are classified into subclasses by the neurons in the subclass layer, and the weights between the subclass layer and the output layer are learned so that each subclass is assigned to the class that contains it. To evaluate classification of the pattern vectors, the proposed algorithm is simulated alongside the conventional LVQ, and the proposed learning method is confirmed to classify more successfully than the conventional LVQ.

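
The conventional LVQ mentioned in this abstract as the baseline is commonly presented as the LVQ1 rule, sketched below. This is the textbook rule, not the paper's proposed SOM and out-star combination; the function name, the toy reference vectors, and the learning rate are assumptions.

```python
import math

def lvq1_step(prototypes, labels, x, y, learning_rate):
    """One LVQ1 update in place; returns the index of the winning reference vector.

    The nearest reference vector moves toward x if its class label matches y,
    and away from x otherwise.
    """
    winner = min(range(len(prototypes)), key=lambda i: math.dist(prototypes[i], x))
    sign = 1.0 if labels[winner] == y else -1.0
    prototypes[winner] = [
        w + sign * learning_rate * (xj - w)
        for w, xj in zip(prototypes[winner], x)
    ]
    return winner

# Hypothetical 2-D reference vectors for two taste classes.
prototypes = [[0.0, 0.0], [1.0, 1.0]]
labels = ["sweet", "bitter"]

lvq1_step(prototypes, labels, [0.2, 0.2], "sweet", 0.5)  # correct class: attract
lvq1_step(prototypes, labels, [0.9, 0.9], "sweet", 0.5)  # wrong class: repel
print(prototypes)
```

Starting LVQ from reference vectors that a SOM has already placed near the data, as the abstract describes, is intended to give this supervised refinement a better starting point than random initialization.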

Comparison of clustering methods of microarray gene expression data (마이크로어레이 유전자 발현 자료에 대한 군집 방법 비교)

  • Lim, Jin-Soo;Lim, Dong-Hoon
    • Journal of the Korean Data and Information Science Society / v.23 no.1 / pp.39-51 / 2012
  • Cluster analysis has proven to be a useful tool for investigating the association structure among genes and samples in a microarray data set. We applied several cluster validation measures to evaluate the performance of clustering algorithms for analyzing microarray gene expression data, including hierarchical clustering, K-means, PAM, SOM, and model-based clustering. The available validation measures fall into three general categories: internal, stability, and biological. The performance of the clustering algorithms is evaluated using simulated data and SRBCT microarray data. For the simulated data, nearly every method performed well, recovering the same number of clusters as there are classes in the original data. For the SRBCT data, the best choice for the number of clusters is less clear than for the simulated data. PAM, SOM, and the model-based method showed results similar to those for the simulated data under the Silhouette width (an internal measure), as did PAM and the model-based method under the biological measure, while model-based clustering had the best value of the stability measure.
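
The Silhouette width used here as an internal validation measure can be computed directly from pairwise distances. The sketch below follows the standard definition s(i) = (b - a) / max(a, b), where a is the mean distance to the point's own cluster and b is the smallest mean distance to any other cluster; the function name and toy data are assumptions.

```python
import math

def silhouette_widths(points, labels):
    """Return the silhouette width of each point under the given cluster labels."""
    widths = []
    for i, p in enumerate(points):
        # Group distances from point i to every other point by cluster label.
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], [])
        if not own or not dists:
            widths.append(0.0)  # singleton cluster, or only one cluster present
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in dists.values())
        denom = max(a, b)
        widths.append((b - a) / denom if denom else 0.0)
    return widths

# Hypothetical 1-D data with two well-separated clusters.
points = [[0.0], [1.0], [10.0], [11.0]]
labels = [0, 0, 1, 1]
widths = silhouette_widths(points, labels)
print(sum(widths) / len(widths))
```

Values near 1 indicate compact, well-separated clusters; comparing the average width across candidate numbers of clusters is one common way to choose that number.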

Characteristics of Trend and Pattern for Water Quality Monitoring Networks Data using Seasonal-Kendall, SOM and RDA on the Mulgeum in the Nakdong River (경향성 및 패턴 분석을 이용한 낙동강 물금지역의 수질 특성)

  • Ahn, Jung-Min;Lee, In-Jung;Jung, Kang-Young;Kim, Jueon;Lee, Kwonchul;Cheon, Seuk;Lyu, Siwan
    • Journal of Environmental Science International / v.25 no.3 / pp.361-371 / 2016
  • The Ministry of Environment has been operating a water quality monitoring network to obtain basic data for water environment policies and to comprehensively understand the water quality status of public water bodies such as rivers and lakes. Because the observed water quality data fluctuate seasonally, it is important to analyze them with statistical methods. In particular, monthly water quality data should be analyzed with their periodicity taken into account, since they change periodically with the seasons. In this study, trend, SOM, and RDA analyses were performed for the Mulgeum station using water quality data for temperature, BOD, COD, pH, SS, T-N, T-P, Chl-a, and coliform bacteria observed in the Nakdong River from 1989 to 2013. The trend, SOM, and RDA results indicate that water quality at the Mulgeum station has improved, but caution is required to ensure a safe water supply because concentrations were highest in early spring (months 1 to 3).
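
The seasonal Kendall approach named in the title handles seasonality by comparing observations only within the same season (for example, January with January) across years. The sketch below computes only the test statistic S, under that standard definition; the variance and significance calculation are omitted, and the function names and values are assumptions.

```python
def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)

def mann_kendall_s(series):
    """Mann-Kendall S: sum of sign(x_j - x_i) over all pairs with i < j."""
    n = len(series)
    return sum(sign(series[j] - series[i]) for i in range(n) for j in range(i + 1, n))

def seasonal_kendall_s(values_by_season):
    """Seasonal Kendall S: Mann-Kendall S computed per season, then summed."""
    return sum(mann_kendall_s(series) for series in values_by_season.values())

# Hypothetical yearly values per month, in chronological order.
values_by_season = {
    "Jan": [3.0, 2.5, 2.0],
    "Jul": [1.0, 1.2, 0.9],
}
print(seasonal_kendall_s(values_by_season))
```

A negative S points to a decreasing trend and a positive S to an increasing one; a full test would compare S against its variance to judge significance.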

Identification of the Marker-Genes for Dioxin (2,3,7,8-tetrachlorodibenzo-p-dioxin)-Induced Immune Dysfunction by Using the High-Density Oligonucleotide Microarray

  • Kim, Jeong-Ah;Lee, Eun-Ju;Chung, In Hye;Kim, Hyung-Lae
    • Genomics & Informatics / v.2 no.2 / pp.75-80 / 2004
  • In a variety of animal species, perinatal exposure of experimental animals to 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) leads to immune dysfunction that is more severe and persistent than that caused by adult exposure. We report here the changes in gene expression and the identification of marker genes representing dioxin exposure. Transcript expression was analyzed with an 11 K oligonucleotide microarray in bone marrow cells of male C57BL/6J mice after an intraperitoneal injection of $1\,\mu\mathrm{g}$ TCDD/kg body weight at various time points: gestational day 6.5 (G6.5), 13.5 (G13.5), and 18.5 (G18.5), and postnatal weeks 3 (P3W) and 6 (P6W). The self-organizing map (SOM) clusters representing each specific dioxin exposure were identified as follows: G6.5D (C14), G13.5D (C0, C5, C10, C18), G18.5D (C7), P3W (C2, C21), and P6W (C4, C15, C20). The candidate marker genes were restricted to transcripts whose expression changed consistently by more than $\pm$2-fold in three experiments. This yielded 85 candidate genes, which were involved in cell physiology and cell functions such as cell proliferation and immune function. We identified the following biomarker genes for dioxin exposure: smc-like 2 from SOM C14 for exposure at G6.5D; focal adhesion kinase and 6 other genes from C0, and protein tyrosine phosphatase 4a2 and 3 other genes from C5, for G13.5D; platelet factor 4 from C7 for G18.5D; and fos from C2 for P3W.
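
The candidate filter described above, a change of more than 2-fold in the same direction across three experiments, can be sketched as follows. The input format (treated/control expression ratios), the function name, and the gene labels are assumptions made for illustration and do not come from the paper.

```python
def consistent_fold_change(ratios_by_gene, threshold=2.0):
    """Return genes whose ratio exceeds `threshold`-fold, up or down, in every experiment.

    Each value is a list of treated/control expression ratios, one per experiment.
    A gene is kept only if all ratios are above `threshold` (up-regulated) or
    all are below 1/threshold (down-regulated).
    """
    selected = []
    for gene, ratios in ratios_by_gene.items():
        up = all(r > threshold for r in ratios)
        down = all(r < 1.0 / threshold for r in ratios)
        if up or down:
            selected.append(gene)
    return selected

# Hypothetical ratios from three experiments per gene.
ratios_by_gene = {
    "geneA": [2.5, 3.0, 2.2],   # consistently up
    "geneB": [0.4, 0.3, 0.45],  # consistently down
    "geneC": [2.5, 1.2, 3.0],   # fails in one experiment
    "geneD": [2.5, 0.4, 2.6],   # inconsistent direction
}
candidates = consistent_fold_change(ratios_by_gene)
print(candidates)
```

Requiring agreement across all replicates trades sensitivity for reliability, which suits a search for marker genes.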