• Title/Summary/Keyword: word cluster

An Analysis of the Word-Final Cluster of the Syllable Structure (음절구조의 어말 자음군에 관한 분석)

  • Oh, Kwan-Young
    • English Language & Literature Teaching
    • /
    • v.10 no.2
    • /
    • pp.67-87
    • /
    • 2004
  • The purpose of this paper is to show how the coda of a syllable and word-final clusters are represented in English syllable structure. Previous theories of the syllable assume that there is only one segment in the coda position. Theories that license only one segment in the coda make it difficult to syllabify a word-final cluster appropriately when it contains more than two segments. I considered three approaches: the previous syllable structure (Selkirk, 1982; Borowsky, 1989), sonority sequencing (Giegerich, 1992; Roca, 1999), and feature analysis (Goldsmith, 1990). However, none of these methods gives a satisfactory explanation of word-final clusters. Finally, I suggest a modified syllable representation as an alternative, placing two different appendixes under the Phonological Word, which forms a constituent above the syllable node. This makes it possible to explain the formerly problematic word-final clusters, including morphological information such as an inflectional suffix, in the structure.

Selection of Cluster Topic Words in Hierarchical Clustering using K-Means Algorithm

  • Lee Shin Won;Yi Sang Seon;An Dong Un;Chung Sung Jong
    • Proceedings of the IEEK Conference
    • /
    • 2004.08c
    • /
    • pp.885-889
    • /
    • 2004
  • Fast, high-quality document clustering algorithms play an important role in data exploration by organizing large amounts of information into a small number of meaningful clusters. Hierarchical clustering improves retrieval performance and makes the results easier for users to understand. To improve clustering, we implemented a hierarchical structure with variety and readability by carefully selecting cluster topic words and deciding the number of clusters dynamically. Selecting topic words is important because the hierarchical clustering structure summarizes the search results. We chose nouns as cluster topic words. The quality of the topic words increased by 33% with the following approach. For each cluster, only nouns were extracted as topic words for the top-level cluster, and topic words already used were not reused for the child clusters.

Word Cluster-based Mobile Application Categorization (단어 군집 기반 모바일 애플리케이션 범주화)

  • Heo, Jeongman;Park, So-Young
    • Journal of the Korea Society of Computer and Information
    • /
    • v.19 no.3
    • /
    • pp.17-24
    • /
    • 2014
  • In this paper, we propose a mobile application categorization method using word cluster information. Because mobile application descriptions can be short, the proposed method uses word cluster seeds, in addition to the words in the description, as categorization features. To handle the fragmented categories of mobile applications, the proposed method generates word clusters by applying the K-means clustering algorithm to the frequency of word occurrence per category. Since a mobile application description can include paragraphs unrelated to categorization, such as installation specifications, the proposed method uses only those word clusters that are useful for categorization. Experiments show that the proposed method improves recall (by 5.65%) by using the word cluster information.
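
The clustering step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the words, the three categories, and the frequency values are made up for illustration, and using the first k rows as initial centroids is a simplification chosen only to keep the run deterministic.

```python
from math import dist


def kmeans(points, k, iterations=20):
    """Plain Lloyd's K-means; the first k points are the initial centroids."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical toy data: each word maps to its relative frequency in three
# invented categories (game, finance, health).
word_freq = {
    "level":   [0.90, 0.05, 0.05],
    "bank":    [0.05, 0.90, 0.05],
    "workout": [0.05, 0.05, 0.90],
    "score":   [0.80, 0.10, 0.10],
    "loan":    [0.10, 0.85, 0.05],
    "diet":    [0.10, 0.10, 0.80],
}
words = list(word_freq)
labels = kmeans([word_freq[w] for w in words], k=3)
clusters = dict(zip(words, labels))
print(clusters)
```

Words with similar per-category frequency profiles end up in the same cluster, and the cluster label could then serve as an extra categorization feature alongside the word itself.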

The characteristics of eye-movement in Korean sentence reading: cluster length, word frequency, and landing position effects (우리 문장 읽기에서 안구 운동의 특성: 어절 길이, 단어 빈도 및 착지점 관련 효과)

  • Koh, Sung-Ryong;Yoon, Nak-Yeong
    • Korean Journal of Cognitive Science
    • /
    • v.18 no.4
    • /
    • pp.325-350
    • /
    • 2007
  • This study investigated global and local characteristics of eye movement while 16 college students read 48 easy Korean sentences. It was found that readers fixated for about 225 ms on a word cluster (eojeol), made forward saccades of about 3.6 characters to the next word, skipped short, high-frequency words about 25% of the time during first-pass reading, and regressed backward about 19% of the time. There were also individual differences in readers' patterns of fixation and saccade. In addition, the effects of word cluster length and word frequency and the effects related to landing position were examined. The eyes landed on the center of a word cluster more frequently than on its boundaries. When the eyes landed at the boundaries, they refixated the word cluster more frequently. Word clusters with high-frequency words were read faster than those with low-frequency words.

Enhancing Text Document Clustering Using Non-negative Matrix Factorization and WordNet

  • Kim, Chul-Won;Park, Sun
    • Journal of information and communication convergence engineering
    • /
    • v.11 no.4
    • /
    • pp.241-246
    • /
    • 2013
  • A classic document clustering technique may incorrectly classify documents into different clusters when documents that should belong to the same cluster do not have any shared terms. Recently, to overcome this problem, internal and external knowledge-based approaches have been used for text document clustering. However, the clustering results of these approaches are influenced by the inherent structure and the topical composition of the documents. Further, the organization of knowledge into an ontology is expensive. In this paper, we propose a new enhanced text document clustering method using non-negative matrix factorization (NMF) and WordNet. The semantic terms extracted as cluster labels by NMF can represent the inherent structure of a document cluster well. The proposed method can also improve the quality of document clustering that uses cluster labels and term weights based on term mutual information of WordNet. The experimental results demonstrate that the proposed method achieves better performance than the other text clustering methods.

A Performance Analysis Based on Hadoop Application's Characteristics in Cloud Computing (클라우드 컴퓨팅에서 Hadoop 애플리케이션 특성에 따른 성능 분석)

  • Keum, Tae-Hoon;Lee, Won-Joo;Jeon, Chang-Ho
    • Journal of the Korea Society of Computer and Information
    • /
    • v.15 no.5
    • /
    • pp.49-56
    • /
    • 2010
  • In this paper, we implement a Hadoop-based cluster for cloud computing and evaluate its performance according to application characteristics by running the RandomTextWriter, WordCount, and PI applications. RandomTextWriter creates a given amount of random words and stores them in HDFS (Hadoop Distributed File System). WordCount reads an input file and determines the frequency of a given word per block unit. The PI application estimates the value of pi using the Monte Carlo method. In the simulation, we investigate the effect of data block size and the number of replications on application execution time. The simulation confirmed that the execution time of RandomTextWriter was proportional to the number of replications, whereas the execution times of WordCount and PI were not affected by the number of replications. Moreover, the execution time of WordCount was best when the block size was 64 to 256 MB. These results show that the performance of a cloud computing system can be enhanced by a scheduling scheme that considers application characteristics.
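
For readers unfamiliar with the PI workload, the Monte Carlo estimate it computes can be sketched in a few lines. This is a single-process illustration, not the Hadoop example job itself; the function name `estimate_pi` and the fixed seed are assumptions made here for reproducibility.

```python
import random


def estimate_pi(num_samples: int, seed: int = 0) -> float:
    """Estimate pi from the share of random points inside the unit quarter circle."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        # The point lies inside the quarter circle when x^2 + y^2 <= 1.
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * inside / num_samples


print(estimate_pi(100_000))
```

The work is almost entirely computation with negligible file output, which fits the abstract's finding that the number of replications did not affect PI's execution time.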

Use of Word Clustering to Improve Emotion Recognition from Short Text

  • Yuan, Shuai;Huang, Huan;Wu, Linjing
    • Journal of Computing Science and Engineering
    • /
    • v.10 no.4
    • /
    • pp.103-110
    • /
    • 2016
  • Emotion recognition is an important component of affective computing, and is significant in the implementation of natural and friendly human-computer interaction. An effective approach to recognizing emotion from text is based on a machine learning technique, which deals with emotion recognition as a classification problem. However, in emotion recognition, the texts involved are usually very short, leaving a very large, sparse feature space, which decreases the performance of emotion classification. This paper proposes to resolve the problem of feature sparseness, and largely improve the emotion recognition performance from short texts by doing the following: representing short texts with word cluster features, offering a novel word clustering algorithm, and using a new feature weighting scheme. Emotion classification experiments were performed with different features and weighting schemes on a publicly available dataset. The experimental results suggest that the word cluster features and the proposed weighting scheme can partly resolve problems with feature sparseness and emotion recognition performance.

The Effect of Online Word-of-mouth on Fashion Involvement and Internet Purchase Behavior (온라인 패션 구전에 따른 패션제품 관여와 인터넷 구매행동)

  • Song, So-Jin;Hwang, Jin-Sook
    • Journal of the Korean Society of Clothing and Textiles
    • /
    • v.31 no.3 s.162
    • /
    • pp.410-419
    • /
    • 2007
  • The purposes of this study were to segment consumers by online word of mouth and to find differences among the segmented groups in fashion involvement, perceived internet risk, and internet purchase behavior. The subjects were female consumers who were members of online cafes in Korea. The data were collected during October 2004. Respondents returned the questionnaires over the internet, and 480 questionnaires were used in the final data analysis. The statistical analyses used were factor analysis, cluster analysis, the t-test, and the $\chi^2$ test. The results showed that word-of-mouth communication on the internet (e-WOM) is composed of two factors: word-of-mouth transmission and word-of-mouth acceptance. Cluster analysis on these two factors classified respondents into two word-of-mouth communication groups: a WOM group and a non-WOM group. The t-test showed that the two groups differed significantly in fashion involvement, perceived internet risk, and internet purchase behavior. For example, the WOM group was more uncertain of its clothing choices, put more weight on the internal factors of clothing selection, and purchased internet fashion products more frequently. Internet fashion businesses need to implement appropriate marketing strategies based on the results of this study.

Korean Named Entity Recognition and Classification using Word Embedding Features (Word Embedding 자질을 이용한 한국어 개체명 인식 및 분류)

  • Choi, Yunsu;Cha, Jeongwon
    • Journal of KIISE
    • /
    • v.43 no.6
    • /
    • pp.678-685
    • /
    • 2016
  • Named Entity Recognition and Classification (NERC) is the task of recognizing and classifying named entities such as person names, locations, and organizations. Various studies have been carried out on Korean NERC, but they have some problems, for example lacking some features compared with English NERC. In this paper, we propose a method that uses word embeddings as features for Korean NERC. We generate word vectors using a Continuous Bag-of-Words (CBOW) model from a POS-tagged corpus, and word cluster symbols by applying the K-means algorithm to the word vectors. We use the word vector and word cluster symbol as word embedding features in Conditional Random Fields (CRFs). In the experiments, performance improved by 1.17%, 0.61%, and 1.19% for the TV, Sports, and IT domains, respectively, over the baseline system. By showing better performance than other NERC systems, we demonstrate the effectiveness and efficiency of the proposed method.

The Analysis of Knowledge Structure using Co-word Method in Quality Management Field (동시단어분석을 이용한 품질경영분야 지식구조 분석)

  • Park, Man-Hee
    • Journal of Korean Society for Quality Management
    • /
    • v.44 no.2
    • /
    • pp.389-408
    • /
    • 2016
  • Purpose: This study was designed to analyze changes in knowledge structures and trends in research topics in the quality management field. Methods: The network structure and knowledge structure of the words were visualized in map form using co-word analysis, cluster analysis, and a strategic diagram. Results: The results of this study are summarized as follows. First, the word network derived from the co-occurrence matrix had 106 nodes and 5,314 links, and its density was 0.95. The average betweenness centrality of the word network was 2.37. In addition, the average closeness centrality and average eigenvector centrality of the word network were both 0.01. Second, by applying optimal cluster-decision criteria and the K-means algorithm to the word co-occurrence matrix, the 106 words were grouped into seven clusters: standard & efficiency, product design, reliability, control chart, quality model, 6 sigma, and service quality. Conclusion: According to the strategic diagram analysis over time, the traditional research topics of the quality management field in the third quadrant (reliability, 6 sigma, and control chart) were found to have declined in research importance. Research topics related to product design and customer satisfaction were found to be important throughout the analysis periods. The research topic related to management innovation was in an emerging state, and the scope of research topics related to process models was extended to topics involving system performance. The research topic related to service quality, located in the first quadrant, was identified as the key research topic.
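
The reported density can be checked against the node and link counts. The sketch below assumes the word network is an undirected simple graph (no self-loops, no duplicate links), which the abstract does not state explicitly; the function name `network_density` is chosen here for illustration.

```python
def network_density(num_nodes: int, num_links: int) -> float:
    """Density of an undirected simple graph: actual links / possible links."""
    possible_links = num_nodes * (num_nodes - 1) / 2
    return num_links / possible_links


# Figures taken from the abstract: 106 nodes and 5,314 links.
density = network_density(106, 5314)
print(round(density, 2))
```

Under that assumption, 106 nodes allow 5,565 possible links, so 5,314 links give a density that rounds to 0.95, in line with the reported value.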