Selection of Cluster Topic Words in Hierarchical Clustering using K-Means Algorithm

  • Lee Shin Won;Yi Sang Seon;An Dong Un;Chung Sung Jong
    • Proceedings of the IEEK Conference / 2004.08c / pp.885-889 / 2004
  • Fast, high-quality document clustering algorithms play an important role in data exploration by organizing large amounts of information into a small number of meaningful clusters. Hierarchical clustering improves retrieval performance and makes results easier for users to understand. To improve clustering, we implemented a varied and readable hierarchical structure by carefully selecting cluster topic words and deciding the number of clusters dynamically. Selecting topic words is important because the hierarchical cluster structure summarizes the search results. We chose nouns as cluster topic words. The quality of topic words increased by $33\%$ with the following approach: only nouns are extracted as the topic words of top-level clusters, and topic words that have already been used are not reused for child clusters.


Compression Effects of Number of Syllables on Korean Vowel

  • Yun, Il-Sung
    • Speech Sciences / v.9 no.1 / pp.173-184 / 2002
  • The question of Korean rhythmic type is still a controversial issue (syllable-timed; stress-timed; word-timed). As a step toward solving the question, an experiment was carried out to examine compression effects in Korean. There has been a general belief that an increase in the number of following or preceding syllables causes compression of a vowel (or syllable) in many languages, and a marked anticipatory compression effect can be especially indicative of stress timing. The purpose of this research, therefore, was to obtain evidence to determine whether or not Korean is stress-timed. The durations of the target vowel /a/ of the monosyllabic word /pap/ were measured at both the word and sentence levels. In general, marked anticipatory and backward compression effects on the target vowel were observed across one-, two- and three-syllable words in citation form, whereas the effects were neither marked nor consistent at the sentence level. These results led us to claim that Korean is not stress-timed.


The Study on Possibility of Applying Word-Level Word Embedding Model of Literature Related to NOS -Focus on Qualitative Performance Evaluation- (과학의 본성 관련 문헌들의 단어수준 워드임베딩 모델 적용 가능성 탐색 -정성적 성능 평가를 중심으로-)

  • Kim, Hyunguk
    • Journal of Science Education / v.46 no.1 / pp.17-29 / 2022
  • The purpose of this study is to examine qualitatively how efficiently and reasonably a computer can learn themes related to the Nature of Science (NOS). To this end, a corpus was constructed from NOS-related literature (920 abstracts), and the optimal settings for Word2Vec (CBOW and Skip-gram) were identified. The word-level embeddings were then compared across the four dimensions of NOS (Inquiry, Thinking, Knowledge, and STS). Based on previous studies and a preliminary performance evaluation, the CBOW model was set to 200 dimensions, 5 threads, a minimum frequency of 10, 100 iterations, and a context range of 1. The Skip-gram model was set to 200 dimensions, 5 threads, a minimum frequency of 10, 200 iterations, and a context range of 3. When the models were applied to the four dimensions of NOS, Skip-gram performed better in the Inquiry dimension in terms of the types of words with high similarity. In the Thinking and Knowledge dimensions, the two models showed no difference in embedding performance, but the high-similarity words for each model shared the name of the other domain, so additional models appear to be needed for proper learning. In the STS dimension, the embedding performance was also insufficient to capture comprehensive STS elements, and words related to problem solving were listed excessively. By having a computer learn NOS-related themes, this study is expected to offer overall implications for models usable in science education and for the use of artificial intelligence.
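
The two reported configurations can be written down side by side as keyword arguments. This is only a sketch: the abstract does not name the toolkit used, so the parameter names below are an assumption borrowed from gensim 4.x `Word2Vec` (`sg`, `vector_size`, `workers`, `min_count`, `epochs`, `window`), and `differing_keys` is a hypothetical helper added for illustration.

```python
# Hyperparameters reported in the abstract, expressed as keyword arguments.
# Assumption: names follow gensim 4.x Word2Vec; the study's toolkit is not stated.
CBOW_PARAMS = {
    "sg": 0,             # 0 selects CBOW
    "vector_size": 200,  # embedding dimension
    "workers": 5,        # number of threads
    "min_count": 10,     # minimum word frequency
    "epochs": 100,       # number of repetitions
    "window": 1,         # context range
}

# Skip-gram differs only in the training mode, repetitions, and context range.
SKIPGRAM_PARAMS = {**CBOW_PARAMS, "sg": 1, "epochs": 200, "window": 3}


def differing_keys(a, b):
    """Return the sorted keys whose values differ between two parameter sets."""
    return sorted(k for k in a if a[k] != b[k])
```

If gensim is installed, either dictionary could be passed as `Word2Vec(sentences, **CBOW_PARAMS)`, where `sentences` is a tokenized corpus supplied by the reader.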

Sub-word Based Offline Handwritten Farsi Word Recognition Using Recurrent Neural Network

  • Ghadikolaie, Mohammad Fazel Younessy;Kabir, Ehsanolah;Razzazi, Farbod
    • ETRI Journal / v.38 no.4 / pp.703-713 / 2016
  • In this paper, we present a segmentation-based method for offline Farsi handwritten word recognition. Although most segmentation-based systems suffer from segmentation errors within the first stages of recognition, using the inherent features of the Farsi writing script, we have segmented the words into sub-words. Instead of using a single complex classifier with many (N) output classes, we have created N simple recurrent neural network classifiers, each having only true/false outputs with the ability to recognize sub-words. Through the extraction of the number of sub-words in each word, and labeling the position of each sub-word (beginning/middle/end), many of the sub-word classifiers can be pruned, and a few remaining sub-word classifiers can be evaluated during the sub-word recognition stage. The candidate sub-words are then joined together and the closest word from the lexicon is chosen. The proposed method was evaluated using the Iranshahr database, which consists of 17,000 samples of Iranian handwritten city names. The results show the high recognition accuracy of the proposed method.

Feature Extraction of Web Document using Association Word Mining (연관 단어 마이닝을 사용한 웹문서의 특징 추출)

  • 고수정;최준혁;이정현
    • Journal of KIISE:Databases / v.30 no.4 / pp.351-361 / 2003
  • Previous studies that extract document features through word association suffer from the need to update profiles periodically, to handle noun phrases, and to calculate probabilities for indices. We propose a more effective feature extraction method based on association word mining. Using the Apriori algorithm, this method represents a document feature not as single words but as association-word vectors. The association words that Apriori extracts from a document depend on the confidence, the support, and the number of words composing each association. This paper proposes an effective method for determining these three values. Because the method does not use a profile, no profile needs to be updated, and noun phrases are generated automatically from the confidence and support in the Apriori algorithm without calculating index probabilities. We apply the proposed method to document classification with a Naive Bayes classifier and compare it with information gain and TF-IDF. We also compare it with document classification methods that use index association and word association based on a probability model.
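
To make the roles of support and confidence concrete, here is a minimal sketch of Apriori-style mining restricted to two-word associations. It is not the authors' implementation: the function name `association_words`, the default thresholds, and the choice to treat each document as a set of words are assumptions made for illustration.

```python
from itertools import combinations


def association_words(docs, min_support=0.5, min_confidence=0.6):
    """Return sorted (antecedent, consequent, support, confidence) tuples.

    docs: list of sets of words, one set per document.
    Only 2-word associations are mined here; full Apriori keeps extending
    frequent itemsets level by level to longer word combinations.
    """
    n = len(docs)

    # Level 1: count single words and keep the frequent ones.
    word_count = {}
    for doc in docs:
        for w in doc:
            word_count[w] = word_count.get(w, 0) + 1
    frequent = {w for w, c in word_count.items() if c / n >= min_support}

    # Level 2: candidate pairs use only frequent words (Apriori pruning).
    pair_count = {}
    for doc in docs:
        for pair in combinations(sorted(doc & frequent), 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1

    rules = []
    for (a, b), c in pair_count.items():
        support = c / n
        if support < min_support:
            continue
        # Confidence of x -> y is count(x and y) / count(x).
        for x, y in ((a, b), (b, a)):
            confidence = c / word_count[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)
```

Raising `min_support` or `min_confidence` shrinks the set of association words, which is why the abstract treats the choice of these values as the central tuning problem.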

A Design of Japanese Analyzer for Japanese to Korean Translation System (일반 번역시스탬을 위한 일본어 해석기 설계)

  • 강석훈;최병욱
    • Journal of the Korean Institute of Telematics and Electronics B / v.32B no.1 / pp.136-146 / 1995
  • In this paper, we design a Japanese morphological analyzer for a Japanese-to-Korean machine translation system. The analyzer reconstructs the Japanese input sentence into word phrases that include grammatical and dictionary information. We propose an algorithm that separates morphemes and then connects them by reference to the corresponding Korean word phrases. We also define a connector to control Japanese word phrases; it controls the start and end points of each word phrase in a Japanese sentence, which is written without spaces. The proposed analyzer uses an analysis dictionary to perform more efficient analysis than the existing analyzer and reduces the number of dictionary searches. Because the proposed analyzer processes each word phrase in consideration of the corresponding Korean word phrase, it can generate more accurate Korean expressions than the existing one, which places great importance on generating the entire sentence structure.


Comparison Research of Non-Target Sentence Rejection on Phoneme-Based Recognition Networks (음소기반 인식 네트워크에서의 비인식 대상 문장 거부 기능의 비교 연구)

  • Kim, Hyung-Tai;Ha, Jin-Young
    • MALSORI / no.59 / pp.27-51 / 2006
  • To improve reliability, speech recognition systems need a rejection function as well as a decoding function. Many research efforts have addressed out-of-vocabulary word rejection, but little attention has been paid to non-target sentence rejection. Recently, pronunciation approaches using speech recognition have increased the need for non-target sentence rejection to provide more accurate and robust results. In this paper, we propose a filler model method and a word/phoneme detection ratio method to implement a non-target sentence rejection system. We evaluated the filler model method with word-level, phoneme-level, and sentence-level filler models. We also performed a similar experiment with the word-level and phoneme-level detection ratio methods. For the performance evaluation, the minimized average of the false acceptance rate (FAR) and false rejection rate (FRR) was used to compare the effectiveness of each method according to the number of words in the given sentences. The experimental results show that the word-level methods outperform the others, and that the word-level filler model gives slightly better results than the word detection ratio method.
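
The evaluation metric can be illustrated with a short sketch. It assumes each sentence receives a single confidence score and is accepted when the score reaches a threshold; the function name `min_average_error` and this scoring setup are assumptions, since the abstract does not describe how the average was minimized.

```python
def min_average_error(target_scores, nontarget_scores):
    """Return (best_threshold, min_avg), where avg = (FAR + FRR) / 2.

    A sentence is accepted when its score >= threshold.
    FAR: fraction of non-target sentences that are accepted.
    FRR: fraction of target sentences that are rejected.
    """
    best = None
    candidates = sorted(set(target_scores) | set(nontarget_scores))
    # Add one threshold above every score so "reject everything" is considered.
    candidates.append(candidates[-1] + 1.0)
    for t in candidates:
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        frr = sum(s < t for s in target_scores) / len(target_scores)
        avg = (far + frr) / 2
        # Keep the first threshold that reaches the lowest average.
        if best is None or avg < best[1]:
            best = (t, avg)
    return best
```

Sweeping only the observed scores is sufficient because FAR and FRR change only when the threshold crosses one of them.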


Effective Demand Lifting through Pre-Launch Movie Marketing Activities

  • Song, Tae Ho;Yoo, Shijin;Lee, Janghyuk
    • Asia Marketing Journal / v.18 no.3 / pp.1-18 / 2016
  • The purpose of this paper is to examine empirically how to balance advertising expenditure before and after launch with regard to the direction of word of mouth in the motion picture industry. A vector autoregression model is applied to assess the dynamic impact of advertising and word of mouth on sales. Empirical data on advertising, word of mouth, and sales (the number of entries) for 83 movies are used in the analysis. The results show that for a movie with more positive word of mouth in the pre- and post-launch periods, it is worthwhile to spend the advertising budget in the pre-launch period only and to save it in the post-launch period. For movies with less positive word of mouth before and after launch, however, it is worthwhile to save the advertising budget in the pre-launch period and to concentrate spending in the post-launch period instead. Managers who handle products and services facing shortened lifecycles, such as games, eBooks, and digital music content, need to check the quality of pre-launch word of mouth when making advertising budget decisions, and should spend more of the budget in the post- (pre-) launch period if pre-launch word of mouth is negative (positive).

A Linear-Time Algorithm to Find the First Overlap in a Binary Word

  • Park, Thomas H.
    • Proceedings of the IEEK Conference / 2000.06c / pp.165-168 / 2000
  • First, we give a linear-time algorithm to find the first overlap in an arbitrary binary word. Second, we implement the algorithm in the C language and show that the number of comparisons in this algorithm is less than $31n$, where $n \geq 3$ is the length of the input word.
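
For readers unfamiliar with the term, an overlap is a factor of the form $axaxa$, where $a$ is a single letter and $x$ is a possibly empty word. The brute-force checker below is a reference sketch only: it is not the linear-time algorithm from the paper, it runs in cubic time in the worst case, and it assumes that "first overlap" means the overlap that ends earliest (the abstract does not define this).

```python
def first_overlap(word):
    """Return (start, length) of the overlap that ends earliest, or None.

    An overlap is a factor a x a x a, with a a single letter and x a
    (possibly empty) word; equivalently, a factor of length 2p + 1 that
    has period p >= 1. Brute force, NOT the paper's linear-time method.
    """
    n = len(word)
    for end in range(2, n):               # inclusive end index of the factor
        for p in range(1, end // 2 + 1):  # candidate period, shortest first
            start = end - 2 * p
            # The factor word[start:end + 1] has period p if every letter
            # equals the letter p positions later.
            if all(word[j] == word[j + p] for j in range(start, start + p + 1)):
                return start, 2 * p + 1
    return None
```

A checker like this is mainly useful for testing a faster implementation against small inputs, for example prefixes of the Thue-Morse word, which is known to be overlap-free.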


A Study on Jamokai(子母蓋) (子母蓋의 硏究)

  • 김진구
    • The Research Journal of the Costume Culture / v.5 no.1 / pp.11-18 / 1997
  • The purpose of this research was to identify and trace the origin and meaning of jamokai (子母蓋). Comparative linguistic analysis was used in this study. The findings are summarized as follows: 1. There were a number of phonetic values for jamokai (子母蓋) in Chinese. 2. The term jamokai was derived from Hebrew. 3. The meaning of jamokai in Koryo originated from a Hebrew word meaning woman's veil. 4. The word jamokai was related to Persian, Arabic, and Indian. 5. It is considered that tsi-ma-kai (치마개) was transliterated as tsi-məu-kai (子母蓋) in Chinese by the author of Keirim Yusa (鷄林類事). 6. The word jamokai (子母蓋) of Koryo was not related to Chinese.
