
Investigating Opinion Mining Performance by Combining Feature Selection Methods with Word Embedding and BOW (Bag-of-Words) (속성선택방법과 워드임베딩 및 BOW (Bag-of-Words)를 결합한 오피니언 마이닝 성과에 관한 연구)

  • Eo, Kyun Sun;Lee, Kun Chang
    • Journal of Digital Convergence / v.17 no.2 / pp.163-170 / 2019
  • Over the past decade, the growth of the Web has caused an explosive increase in data. Feature selection is an important step in extracting valuable information from large amounts of data. This study proposes a novel opinion mining model that combines feature selection (FS) methods with Word2vec word embeddings and BOW (bag-of-words) features. The FS methods adopted are CFS (correlation-based FS) and IG (information gain). To select an optimal FS method, a number of classifiers were evaluated: LR (logistic regression), NN (neural network), NBN (naive Bayesian network), RF (random forest), RS (random subspace), and ST (stacking). Empirical results on the electronics and kitchen datasets showed that the LR and ST classifiers combined with IG applied to BOW features yield the best opinion mining performance. Results on the laptop and restaurant datasets revealed that the RF classifier using IG applied to Word2vec features performs best.
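
As a rough illustration of IG-based feature selection on BOW features, the sketch below scores each vocabulary term by the information gain of the binary feature "term occurs in the document". The toy reviews, labels, and function names are hypothetical and are not taken from the paper, which does not describe its implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG of the binary BOW feature 'term occurs in document' w.r.t. the class."""
    with_term = [y for d, y in zip(docs, labels) if term in d.split()]
    without_term = [y for d, y in zip(docs, labels) if term not in d.split()]
    n = len(labels)
    # Entropy remaining after splitting the documents on the term's presence.
    conditional = sum(
        len(part) / n * entropy(part) for part in (with_term, without_term) if part
    )
    return entropy(labels) - conditional


# Hypothetical toy data, for illustration only.
docs = ["great sound quality", "terrible battery", "great screen", "terrible sound"]
labels = ["pos", "neg", "pos", "neg"]

vocab = sorted({w for d in docs for w in d.split()})
ranked = sorted(vocab, key=lambda w: information_gain(docs, labels, w), reverse=True)
print(ranked)
```

Terms that split the classes cleanly ("great", "terrible") rank first, while a term that appears equally in both classes ("sound") scores zero and would be dropped.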

A Study on Korean Speech Analysis using Walsh Transform (Walsh변환을 이용한 한국어 숫자음 음성분석에 관한 연구)

  • 김계현;김준현
    • The Transactions of the Korean Institute of Electrical Engineers / v.37 no.4 / pp.251-256 / 1988
  • This work describes a speech analysis of Korean numbers ('1'-'10') spoken by several speakers, using the Fast Walsh Transform (FWHT) method. FWHT involves only addition and subtraction operations, and is therefore faster and needs less memory than FFT (Fast Fourier Transform) or LPC (Linear Predictive Coding) analysis. Our investigation indicates that the FWHT method can find speaker-independent features (features that give the same cue for a given word regardless of the speaker). In this experiment, 70% of the utterances of the same word (Korean number '2') spoken by several speakers showed similar patterns.
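
To make the "only addition and subtraction" point concrete, here is a minimal sketch of an in-place fast Walsh-Hadamard transform. It uses the unnormalized, natural (Hadamard) ordering, which is an assumption: the abstract does not say which ordering or normalization the authors used.

```python
def fwht(signal):
    """Unnormalized fast Walsh-Hadamard transform (natural/Hadamard ordering).

    The input length must be a power of two. Only additions and
    subtractions are performed, in n * log2(n) butterfly steps.
    """
    a = list(signal)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y  # butterfly: sum and difference
        h *= 2
    return a


frame = [1, 0, 1, 0, 0, 1, 1, 0]  # hypothetical 8-sample frame
spectrum = fwht(frame)
print(spectrum)
```

Applying the transform twice returns the input scaled by its length, so the inverse is the same routine followed by a division by n.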


LDA Topic Modeling and Recommendation of Similar Patent Document Using Word2vec (LDA 토픽 모델링과 Word2vec을 활용한 유사 특허문서 추천연구)

  • Apgil Lee;Keunho Choi;Gunwoo Kim
    • Information Systems Review / v.22 no.1 / pp.17-31 / 2020
  • With the start of the fourth industrial revolution era, technologies from various fields are merging, and new types of technologies and products are being developed. In addition, the importance of registering intellectual property rights and patents to gain market dominance is increasing overseas as well as domestically. Accordingly, the number of patents to be processed per examiner is increasing every year, and so are the time and cost of prior art research. A number of studies have therefore been carried out to reduce the examination time and cost for patent-pending technology. This paper proposes a method that, when a plurality of patent priority claims are filed, calculates the degree of similarity among patent documents of the same priority claim and provides the results to the examiner and the patent applicant. To this end, we preprocessed the existing unstructured patent documents, used Word2vec to obtain the similarity between patent documents, and proposed a recommendation model that recommends similar patent documents in descending order of score. This allows the examiner to promptly refer to the examination history of patent documents judged to be similar at the time of examination, thereby reducing the workload, and enables efficient search in the applicant's prior art research. We expect the proposed model to contribute greatly in these respects.
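
The "similarity score, then descending-order recommendation" step can be sketched as follows. This assumes each document is represented by the mean of its word vectors and compared by cosine similarity; the abstract does not state how document vectors are built, and the two-dimensional vectors and patent IDs below are hypothetical stand-ins for trained Word2vec embeddings.

```python
import math


def mean_vector(tokens, word_vectors):
    """Average the vectors of the tokens that have an embedding."""
    dim = len(next(iter(word_vectors.values())))
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(query_tokens, corpus, word_vectors):
    """Return (doc_id, score) pairs sorted by descending similarity."""
    q = mean_vector(query_tokens, word_vectors)
    scored = [
        (doc_id, cosine(q, mean_vector(tokens, word_vectors)))
        for doc_id, tokens in corpus.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings and documents, for illustration only.
word_vectors = {"battery": [1.0, 0.0], "electrode": [0.9, 0.1], "antenna": [0.0, 1.0]}
corpus = {"P1": ["battery", "electrode"], "P2": ["antenna"]}
ranking = recommend(["battery"], corpus, word_vectors)
print(ranking)
```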

Font Change Blindness Triggered by the Text Difficulty in Moving Window Technique (움직이는 창 기법에서의 덩이글 난이도에 따른 글꼴 변화맹)

  • Seong-Jun Bak;Joo-Seok Hyun
    • Korean Journal of Cognitive Science / v.34 no.4 / pp.259-275 / 2023
  • The aim of this study was to investigate font change blindness as a function of text difficulty in the "Moving Window Task" originally introduced by McConkie and Rayner (1975). During reading with the moving window applied, target words were presented in a font style different from that of the surrounding text. When a participant's gaze reached the position of a target word, the font of the target word was changed to match the text font. Before the change, the target word was in sans-serif when the text font was serif, and in serif when the text font was sans-serif. After completing the reading task, more than half of the participants (62.5%) reported not detecting the font change. Eye movements at the target word positions showed that when the content of the text was difficult to understand, the number of regressions increased, gaze duration lengthened, and saccade length shortened. Specifically, the increase in the number of regressions was evident only when the text font was serif, that is, when the font of the target word shifted from sans-serif to serif. These results suggest that sensory interference unrelated to content understanding is not easily detected during reading, but that the possibility of detection increases when comprehension of the content becomes challenging. Furthermore, the results imply that this exceptional possibility of detection may be higher when the text font is serif than when it is sans-serif.

A NEW UPPER BOUND FOR SINGLE ERROR-CORRECTING CODES

  • Kim, Jun-Kyo
    • Bulletin of the Korean Mathematical Society / v.38 no.4 / pp.797-801 / 2001
  • The purpose of this paper is to give an upper bound for A[n,4], the maximum number of codewords in a binary code of word length n with minimum distance 4 between codewords. We have improved the upper bound for A[12k+11,4]. In this correspondence we prove $A[23,4] \leq 173716$.
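
For context, the classical sphere-packing (Hamming) bound gives a weaker baseline against which the stated result can be compared. The sketch below uses the standard identity A[n,4] = A[n-1,3] and is not the paper's method; the function names are illustrative.

```python
from math import comb


def sphere_packing_bound(n, d):
    """Hamming bound on the size of a binary code of length n, distance d."""
    t = (d - 1) // 2  # number of errors the code can correct
    return 2**n // sum(comb(n, i) for i in range(t + 1))


def baseline_bound_d4(n):
    """Baseline for A[n,4] via A[n,4] = A[n-1,3] and the Hamming bound."""
    return sphere_packing_bound(n - 1, 3)


baseline = baseline_bound_d4(23)
print(baseline)
```

For n = 23 this evaluates to floor(2^22 / 23), which is larger than the bound of 173716 stated in the abstract.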


Improving on Matrix Factorization for Recommendation Systems by Using a Character-Level Convolutional Neural Network (문자 수준 컨볼루션 뉴럴 네트워크를 이용한 추천시스템에서의 행렬 분해법 개선)

  • Son, Donghee;Shim, Kyuseok
    • KIISE Transactions on Computing Practices / v.24 no.2 / pp.93-98 / 2018
  • Recommendation systems provide items of interest to users in order to maximize a company's profit. Matrix factorization, which operates on an incomplete user-item rating matrix, is frequently used in recommendation systems. However, as the number of items and users increases, it becomes difficult to make accurate recommendations because of data sparsity. To overcome this drawback, the use of text data related to items was recently suggested for matrix factorization algorithms. Among such algorithms, a word-level convolutional neural network was shown to be effective for extracting word-level features from the text data. However, a word-level convolutional neural network has a large number of parameters to learn. We therefore propose a matrix factorization algorithm that uses a character-level convolutional neural network to extract character-level features from the text data. We also conducted a performance study with real-life datasets to show the effectiveness of the proposed matrix factorization algorithm.
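
The plain matrix factorization baseline that this work builds on can be sketched as stochastic gradient descent over the observed entries of the rating matrix. This covers only the baseline, not the character-level CNN extension; the hyperparameters and toy ratings are hypothetical.

```python
import math
import random


def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=200, seed=0):
    """Learn user factors P and item factors Q from (user, item, rating) triples."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            # Prediction error on one observed rating.
            err = r - sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step with L2 regularization on both factors.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(ratings, P, Q):
    """Root-mean-square error over the observed ratings."""
    sq = [(r - sum(a * b for a, b in zip(P[u], Q[i]))) ** 2 for u, i, r in ratings]
    return math.sqrt(sum(sq) / len(sq))


# Hypothetical sparse ratings: (user index, item index, rating).
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2)]
P, Q = factorize(ratings, n_users=3, n_items=3)
print(rmse(ratings, P, Q))
```

Only observed entries contribute to the updates, which is why sparsity hurts: users or items with few ratings get poorly constrained factors, and side information such as item text is one way to compensate.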

Design of A Reed-Solomon Code Decoder for Compact Disc Player using Microprogramming Method (마이크로프로그래밍 방식을 이용한 CDP용 Reed-Solomon 부호의 복호기 설계)

  • 김태용;김재균
    • The Journal of Korean Institute of Communications and Information Sciences / v.18 no.10 / pp.1495-1507 / 1993
  • This paper presents an implementation of an RS (Reed-Solomon) code decoder for a CDP (Compact Disc Player) using a microprogramming method. In this decoding strategy, equations composed of Newton's identities are used to compute the coefficients of the error locator polynomial and to check the number of erasures in C2 (outer code). In C2 decoding, the values of the erasures are computed from the syndromes and the results of C1 (inner code) decoding. We improved the error-correction capability by correcting up to 4 erasures. The decoder contains an arithmetic logic unit over GF(2^8) for error correction and a decoding controller with a programming ROM and microinstructions. The microinstructions implement the decoding algorithm for the RS code. As a result, the decoder can easily be modified for upgrades or other applications by changing only the programming ROM. The decoder is implemented by logic-level modeling in Verilog HDL. Each microinstruction has 14 bits (= 1 word), and the size of the programming ROM is 360 words. The maximum number of clock cycles for decoding both C1 and C2 is 424.
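
The arithmetic that such a GF(2^8) unit performs can be sketched in software as follows. The field polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D) is an assumption, since the abstract does not name it, and the function names are illustrative rather than taken from the paper's Verilog design.

```python
PRIM_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1 (assumed field polynomial)


def gf_add(a, b):
    """Addition (and subtraction) in GF(2^8) is bitwise XOR."""
    return a ^ b


def gf_mul(a, b):
    """Shift-and-add multiplication in GF(2^8), reduced modulo PRIM_POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:  # degree reached 8: reduce by the field polynomial
            a ^= PRIM_POLY
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse via a^254, since a^255 = 1 for nonzero a."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


print(gf_mul(0x02, 0x80), gf_inv(0x02))
```

Syndrome computation and erasure-value evaluation in a Reed-Solomon decoder reduce to sequences of these add, multiply, and invert operations, which is what makes a small microprogrammed ALU sufficient.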


Analysis of Content and Structure of Library Week Slogans (도서관 주간 표어의 내용 및 구조 분석)

  • Lim, Seong-Kwan
    • Journal of Korean Library and Information Science Society / v.51 no.3 / pp.53-80 / 2020
  • Library Week was established in 1964 and has continued without interruption for 56 years. It has grown into one of the largest and most important activities in the library field today. The Korean Library Association publicly selects slogans for Library Week publicity and its poster advertising campaign. This study analyzed the content of these slogans to determine their overall effect on the public and to suggest more effective and focused slogans as part of a branding strategy. To this end, the contents of 116 official Library Week slogans were analyzed according to the linguistic techniques and key words suggested by Young-Jun Park (2001). As a result, 103 slogans (88.79%) were composed only of 'Korean characters'. The most frequent slogan type was 'sentences', with 46 slogans (39.66%). Furthermore, the most frequent word in the slogans was 'library', with 111 occurrences (96.52%). Therefore, it can be said that most Library Week slogans are of the 'sentence type', written in 'Korean characters' and containing the word 'library'.

Dynamic recomposition of document category using user intention tree (사용자 의도 트리를 사용한 동적 카테고리 재구성)

  • Kim, Hyo-Lae;Jang, Young-Cheol;Lee, Chang-Hoon
    • The KIPS Transactions:PartB / v.8B no.6 / pp.657-668 / 2001
  • It is difficult to classify web documents according to the user's exact intention, because existing document classification systems rely on word frequency counts for a single keyword. To remedy this defect, we use a keyword, a query, and domain knowledge. As in explanation-based learning, the query is first analyzed with knowledge-based information, and structured user intention information is then extracted. We use this intention tree as user information and constraints during conventional word-frequency-based document classification. Thus, we can classify web documents in closer accordance with the user's intention. In classifying documents, structured user intention information helps retain documents and information that would be lost in a system using only single-keyword information. Our hybrid approach, which integrates user intention information with existing statistical and probabilistic methods, is more efficient at deciding the direction and range of document categories than the existing word-frequency approach.


A Study for Development of a Korean Pain Measurement Tool(II). A Study for Testing Ranks of Words in each Subclass of a Korean Pain Measurement Tool (동통 평가도구 개발을 위한 연구 -한국 통증 어휘별 강도 순위의 유의도 및 신뢰도 검사-)

  • 이은옥;송미순
    • Journal of Korean Academy of Nursing / v.13 no.3 / pp.106-118 / 1983
  • The main purpose of this study is to systematically classify words indicating pain in terms of their ranks in each subclass. This study is part of the development of a Korean Pain Measurement Tool. It did not explore each word's dimension, such as sensory or affective. Eighty-three Korean words tentatively classified into 19 subclasses in a previous study were used. Three to six words were included in each subclass, and the words were placed in random order so that each subject could indicate their rank by degree of pain. One hundred and fifty nursing students and one hundred clinical nurses were asked to indicate the rank of each word. One hundred and sixteen students and eighty-three nurses completed the ratings used for analysis. The data were collected from June 1983 to July 1983. The ordinal-scale data were analyzed by Friedman ANOVA to test for significant differences between rank means. The pain words showed significant rank mean differences in all 19 subclasses. Some of the words were removed, replaced by other words, or rearranged in rank. The subclasses from which words were removed were 1) simple stimulating pain, 2) punctate pressure, 3) peripheral nerve pain, 4) radiation pain, 5) punishment-related pain, and 6) suffering-related pain. The subclasses in which words were replaced or rearranged were 1) incisive pressure, 2) constrictive pressure, 3) dull pain, 4) tract pain, 5) digestion-related pain, and 6) fear-related pain. Four subclasses (traction pressure, thermal, cavity pressure, and fatigue-related pain) showed significant differences among rank means within each subclass and no visible overlap of ranks among means. Further research is needed using a higher-level measurement of the pain degree of each word and a more sophisticated analysis of the pain degrees. Three pain words that would be related to chemical stimulation were newly explored and included as a new subclass. Through this study, the total number of subclasses increases from 19 to 20, and the total number of Korean words in the scale decreases from 83 to 80.
