• Title/Summary/Keyword: domain of words

Text Mining and Sentiment Analysis for Predicting Box Office Success

  • Kim, Yoosin;Kang, Mingon;Jeong, Seung Ryul
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.8 / pp.4090-4102 / 2018
  • With the emergence of online communication, text mining and sentiment analysis have frequently been applied to the analysis of electronic word-of-mouth. This study aims to develop a domain-specific sentiment lexicon for predicting box office success in the Korean film market and to validate the feasibility of the lexicon. Natural language processing, a machine learning algorithm, and a lexicon-based sentiment classification method are employed. To create a movie-domain sentiment lexicon, 233,631 reviews of 147 movies with popularity ratings were collected using an XML crawling package in R. We achieved 81.69% accuracy in sentiment classification with the Korean sentiment dictionary, which includes 706 negative words and 617 positive words. The results showed a stronger positive relationship between box office success and consumers' sentiment, as well as a significant positive effect in the linear regression of the prediction model. In addition, they reveal that emotion in user-generated content can be a more accurate clue for predicting business success.
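The lexicon-based classification step mentioned in this abstract can be illustrated with a minimal sketch. It assumes reviews are already tokenized and simply compares counts of positive and negative lexicon hits; the English toy lexicon and the function names are hypothetical stand-ins, not the authors' Korean dictionary or their actual scoring rule.

```python
# Hypothetical toy lexicon; the study's dictionary is Korean (706 negative, 617 positive words).
POSITIVE = {"great", "moving", "fun"}
NEGATIVE = {"boring", "awful", "slow"}


def classify_review(tokens, positive_words, negative_words):
    """Label a tokenized review by comparing counts of lexicon hits."""
    pos_hits = sum(1 for t in tokens if t in positive_words)
    neg_hits = sum(1 for t in tokens if t in negative_words)
    if pos_hits > neg_hits:
        return "positive"
    if neg_hits > pos_hits:
        return "negative"
    return "neutral"


def positive_ratio(reviews, positive_words, negative_words):
    """Share of positive reviews among those that are not neutral."""
    labels = [classify_review(r, positive_words, negative_words) for r in reviews]
    decided = [label for label in labels if label != "neutral"]
    return decided.count("positive") / len(decided) if decided else 0.0
```

A per-movie ratio like the one returned by `positive_ratio` is the kind of sentiment feature that could then be fed into a regression against box office figures.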

Speech Feature Extraction for Isolated Word in Frequency Domain (주파수 영역에서의 고립단어에 대한 음성 특징 추출)

  • 조영훈;박은명;강홍석;박원배
    • Proceedings of the IEEK Conference / 2000.06d / pp.81-84 / 2000
  • This paper proposes a new technique for extracting features from the speech signal of an isolated word by analysis in the frequency domain. The technique can be applied efficiently to a limited speech domain. To extract the features of the speech signal, the number of peaks is counted and the frequency value of each peak is used. The difference between the maximum peak and the second peak is then also considered in order to distinguish the meanings of the words in the limited domain. By applying this process hierarchically, the features of the speech signal can be extracted more quickly.
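The peak-based features described in this abstract can be sketched roughly as follows. The sketch assumes that "peaks" are local maxima of the magnitude spectrum above a threshold and that the "difference" is taken between the magnitudes of the two largest peaks; the abstract does not specify either point, and the function names and the `min_height` parameter are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0 .. n/2 - 1 (adequate for short frames)."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2)
    ]


def peak_features(samples, sample_rate, min_height):
    """Count spectral peaks, list their frequencies, and compare the top two peaks."""
    spectrum = magnitude_spectrum(samples)
    n = len(samples)
    peaks = [
        (spectrum[k], k * sample_rate / n)
        for k in range(1, len(spectrum) - 1)
        if spectrum[k] > spectrum[k - 1]
        and spectrum[k] > spectrum[k + 1]
        and spectrum[k] >= min_height
    ]
    peaks.sort(reverse=True)  # largest magnitude first
    # Assumption: the difference is measured in magnitude, not in frequency.
    top_gap = peaks[0][0] - peaks[1][0] if len(peaks) >= 2 else None
    return {
        "peak_count": len(peaks),
        "peak_frequencies": sorted(freq for _, freq in peaks),
        "top_gap": top_gap,
    }
```

In a hierarchical scheme, `peak_count` could serve as a cheap first-level test, with the peak frequencies and `top_gap` consulted only for words that remain ambiguous.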

A Time-Domain Parameter Extraction Method for Speech Recognition using the Local Peak-to-Peak Interval Information (국소 극대-극소점 간의 간격정보를 이용한 시간영역에서의 음성인식을 위한 파라미터 추출 방법)

  • 임재열;김형일;안수길
    • Journal of the Korean Institute of Telematics and Electronics B / v.31B no.2 / pp.28-34 / 1994
  • In this paper, a new time-domain parameter extraction method for speech recognition is proposed. The suggested method is based on the fact that the local peak-to-peak interval, i.e., the interval between the maxima and minima of the speech waveform, is closely related to the frequency components of the speech signal. The parameterization is achieved by a kind of filter bank technique in the time domain. To test the proposed parameter extraction method, an isolated word recognizer based on Vector Quantization and a Hidden Markov Model was constructed. As test material, 22 words spoken by ten males were used, and a recognition rate of 92.9% was obtained. This result leads to the conclusion that the new parameter extraction method can be used in speech recognition systems. Since the proposed method operates in the time domain, real-time parameter extraction can be implemented on a personal computer equipped only with an A/D converter, without any DSP board.
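The relation between peak-to-peak intervals and frequency content can be sketched as follows. The sketch assumes that each interval between adjacent extrema spans half a period, so an interval of d samples suggests a frequency of about sample_rate / (2 * d), and it counts intervals per frequency band as a rough time-domain analogue of a filter bank. The band edges and function names are hypothetical, not the paper's parameterization.

```python
def local_extrema(samples):
    """Indices where the waveform changes direction (local maxima and minima)."""
    indices = []
    for i in range(1, len(samples) - 1):
        rising_in = samples[i] - samples[i - 1]
        rising_out = samples[i + 1] - samples[i]
        if rising_in * rising_out < 0:
            indices.append(i)
    return indices


def interval_band_counts(samples, sample_rate, band_edges_hz):
    """Count peak-to-peak intervals falling into each frequency band."""
    extrema = local_extrema(samples)
    counts = [0] * (len(band_edges_hz) - 1)
    for start, end in zip(extrema, extrema[1:]):
        # Adjacent extrema are half a period apart.
        freq = sample_rate / (2 * (end - start))
        for band in range(len(counts)):
            if band_edges_hz[band] <= freq < band_edges_hz[band + 1]:
                counts[band] += 1
                break
    return counts
```

Only comparisons and subtractions on neighbouring samples are needed, which is consistent with the abstract's point that the method can run without dedicated DSP hardware.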

The function of language and its limitations in the Modern theater (현대 연극에 나타난 언어의 위기 및 그 한계)

  • Yang, Gi-Chan
    • Lingua Humanitatis / v.8 / pp.79-93 / 2006
  • The modern play is undergoing a change that differentiates it from the plays of the past. Narration through language, specifically through words spoken on stage as a means of communication, is being replaced by images and a minimal use of words. Narration that once depended on spoken words now depends more on the images conjured on stage. This movement also reflects the development of the stage and its craft in the domain of theater, and it holds especially true in today's avant-garde theaters. Avant-garde theater, in trying to duplicate reality, does not confine itself to the oratorical rhetoric seen in the traditional plays of the past but expresses itself by mimicking reality as closely as possible.

The Set Expansion System Using the Mutual Importance Measurement Method to Automatically Build up Named Entity Domain Dictionaries (영역별 개체명 사전 자동 구축을 위한 상호 중요도 계산 기법 기반의 집합 확장 시스템)

  • Bae, Sang-Joon;Ko, Young-Joong
    • Korean Journal of Cognitive Science / v.19 no.4 / pp.443-458 / 2008
  • Since Web pages now contain a great deal of information, the Web has become an important resource for information extraction. In this paper, we propose a set expansion system that can automatically extract named entities from the Web. The proposed method consists of three steps. First, Web pages that are likely to include many named entities of a domain are collected using several seed words of the domain. Then pattern rules are extracted using the seed words and the collected Web pages, and named entity candidates are selected by applying the extracted pattern rules to the Web pages. To distinguish real named entities, we develop a new mutual importance measurement method that estimates the importance of named entity candidates. We conducted experiments on 3 domains for Korean and 8 domains for English. The proposed method obtained 78.72% MAP in Korean and 96.48% MAP in English. In particular, the performance on the English domains is better than the results of Google Sets.
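The pattern-rule steps in this abstract can be sketched in a few lines. The sketch assumes that a pattern rule is simply the fixed-width text to the left and right of a seed occurrence, and it ranks candidates by how many distinct rules extract them. That ranking is a simple stand-in chosen for illustration; it is not the mutual importance measure, which the abstract does not define. All function names are hypothetical.

```python
import re
from collections import defaultdict


def extract_patterns(pages, seeds, width=4):
    """Collect (left context, right context) pairs around each seed occurrence."""
    patterns = set()
    for page in pages:
        for seed in seeds:
            for match in re.finditer(re.escape(seed), page):
                left = page[max(0, match.start() - width):match.start()]
                right = page[match.end():match.end() + width]
                if len(left) == width and len(right) == width:
                    patterns.add((left, right))
    return patterns


def expand_set(pages, seeds, width=4):
    """Apply the learned patterns and rank new candidates by rule support."""
    support = defaultdict(set)
    for left, right in extract_patterns(pages, seeds, width):
        rule = re.compile(re.escape(left) + r"(\w+)" + re.escape(right))
        for page in pages:
            for candidate in rule.findall(page):
                if candidate not in seeds:
                    support[candidate].add((left, right))
    # Stand-in score: number of distinct rules that extracted the candidate.
    return sorted(support, key=lambda c: (-len(support[c]), c))
```

For instance, with the single seed `"Seoul"` and a page containing an HTML list of city names, the list-item markup around the seed becomes a rule that also matches the other cities.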

Articulatory Manifestation of Prosodic Strengthening in English /i/ and /I/

  • Kim, Sa-Hyang;Cho, Tae-Hong
    • Phonetics and Speech Sciences / v.3 no.4 / pp.13-21 / 2011
  • The present study investigated the effects of two different sources of prosodic strengthening, i.e., boundary and accent, on the articulation of the English high front vowels /i/ and /I/. The vowels were investigated in vowel-initial ('eat' vs. 'it'), /h/-initial ('heat' vs. 'hit') and /p/-initial words ('Pete' vs. 'pit'), which were placed in varying prosodic conditions. Using an Electromagnetic Articulograph (EMA), the tongue dorsum positions in the x and y dimensions, the lip opening and the jaw opening (lowering) were measured. With respect to boundary-induced strengthening, the results showed that /i/ and /I/ in vowel-initial words ('eat' - 'it') are produced with a higher tongue position in domain-initial than in domain-medial positions. The fact that only the vowels in the vowel-initial condition showed the domain-initial strengthening (DIS) effect suggests that the DIS effect is localized mainly to the initial position (the locality account). As for accent-induced strengthening, vowels were produced with a more fronted tongue position and a larger lip opening in accented than in unaccented positions. This suggests that the presence of accent increases the overall sonority of the vowels in various prosodic contexts, and primarily enhances the frontedness of the high front vowels. Taken together, the results indicate that the two types of prosodic strengthening are articulatorily realized differently, supporting the view that they are encoded separately in the speech planning process. The present study also showed a distinction between the two high front vowels in tongue position (in both the frontedness and the height dimensions), while the jaw did not seem to contribute robustly to the distinction, suggesting that the tongue contributes more to distinguishing the two vowels than the jaw does.

Construction of English-Korean Automatic Translation System for Patent Documents Based on Domain Customizing Method (도메인 특화 방법에 의한 영한 특허 자동 번역 시스템의 구축)

  • Choi, Sung-Kwon;Kwon, Oh-Woog;Lee, Ki-Young;Roh, Yoon-Hyung;Park, Sang-Kyu
    • Journal of KIISE:Software and Applications / v.34 no.2 / pp.95-103 / 2007
  • This paper describes an English-to-Korean automatic translation system for patent documents, constructed by customizing a general-domain system to a specific domain. The customization method consists of the following steps: 1) studying the linguistic characteristics of patent documents, 2) extracting unknown words from a large collection of patent documents and constructing their terminology, 3) customizing the target-language words of existing terms, 4) extracting and constructing translation patterns peculiar to patent documents, 5) customizing the existing translation engine modules according to the linguistic study of patent document characteristics, and 6) evaluating the automatic translation results. The English-to-Korean patent machine translation system implemented through these customization steps shows a translation accuracy of 81.03% and is still improving.

Deep Learning-based Target Masking Scheme for Understanding Meaning of Newly Coined Words

  • Nam, Gun-Min;Kim, Namgyu
    • Journal of the Korea Society of Computer and Information / v.26 no.10 / pp.157-165 / 2021
  • Recently, studies that use deep learning to analyze large amounts of text have been actively conducted. In particular, pre-trained language models, which apply what has been learned from a large amount of text to the analysis of text in a specific domain, are attracting attention. Among the various pre-trained language models, those based on BERT (Bidirectional Encoder Representations from Transformers) are the most widely used. Research is also under way to improve analysis performance through further pre-training using BERT's MLM (Masked Language Model). However, the traditional MLM has difficulty clearly understanding the meaning of sentences that contain new words, such as newly coined words. Therefore, in this study, we propose NTM (Newly coined words Target Masking), which performs masking only on new words. Applying the proposed methodology to about 700,000 movie reviews from portal 'N' confirmed that NTM achieved higher accuracy in sentiment analysis than the existing random masking.
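The contrast between random masking and target masking in this abstract can be shown with a small sketch. It assumes whitespace-level tokens and a known set of newly coined words; the function names, the `[MASK]` string handling, and the example tokens are hypothetical, and a real MLM pipeline would operate on subword tokens from a tokenizer.

```python
import random

MASK = "[MASK]"


def random_masking(tokens, mask_rate, rng):
    """Conventional MLM-style masking: each token is masked with probability mask_rate."""
    return [MASK if rng.random() < mask_rate else token for token in tokens]


def target_masking(tokens, target_words):
    """Target masking: mask only the tokens that appear in the set of new words."""
    return [MASK if token in target_words else token for token in tokens]
```

With `target_masking(["this", "movie", "is", "lit"], {"lit"})`, only the last token is replaced, so further pre-training concentrates on predicting the new word from its context.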

A study of Traditional Korean Medicine(TKM) term's Normalization for Enlarged Reference terminology model (참조용어(Reference Terminology) 모델 확장을 위한 한의학용어 정형화(Normalization) 연구)

  • Jeon, Byoung-Uk;Hong, Seong-Cheon
    • Journal of the Korean Institute of Oriental Medical Informatics / v.15 no.2 / pp.1-6 / 2009
  • The discipline of terminology is based on its own theoretical principles and consists primarily of the following aspects: analysing the concepts and concept structures used in a field or domain of activity; identifying the terms assigned to the concepts; in the case of bilingual or multilingual terminology, establishing correspondences between terms in the various languages; and creating new terms as required. The properties of a word are syntax, morphology, and orthography. Syntax is how words are put together. Morphology consists of inflection, derivation, and compounding. Orthography is spelling. In addition, TKM (Traditional Korean Medicine) terms have two important elements: visual characters and phonetic notation. Visual characters involve spelling, sort words, stop words, etc. For example, in a case of sort words, '다한', '한다', '多汗', and '汗多' are treated as the same term. Phonetic notation involves palatalization, the initial law, etc. For example, in a case of palatalization, '수족랭', '수족냉', '手足冷', and '手足冷' are treated as the same term. Therefore, term normalization is a method for enlarging the reference terminology, and for this reason the normalization of TKM terms is necessary.
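The idea of treating such variants as one term can be sketched with a normalization key. The sketch assumes two small lookup tables (Hanja to Hangul readings and sound-variant syllables) and ignores character order by sorting; both tables and the function name are hypothetical toys covering only the abstract's examples, and order-insensitive matching would conflate distinct terms in a real vocabulary.

```python
# Hypothetical toy tables covering only the examples in the abstract.
HANJA_READING = {"多": "다", "汗": "한", "手": "수", "足": "족", "冷": "냉"}
SOUND_VARIANTS = {"랭": "냉"}


def normalization_key(term):
    """Map spelling, script, and word-order variants of a term to one key."""
    hangul = [HANJA_READING.get(ch, ch) for ch in term]
    unified = [SOUND_VARIANTS.get(ch, ch) for ch in hangul]
    # Sorting makes the key independent of character order ('다한' vs. '한다').
    return "".join(sorted(unified))
```

Terms that share a key can then be linked to a single concept in the reference terminology.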

The trends of Nursing Research in the Journal of Korean Academy of Adult Nursing (최근 3년간 성인간호학회지 게재 논문의 내용과 경향 분석 (2004-2006년))

  • Park, Yeon-Hwan;Lee, Young-Whee;Kim, Ok-Soo;Cho, Myung-Ok
    • Korean Journal of Adult Nursing / v.20 no.1 / pp.176-186 / 2008
  • Purpose: The purpose of this study was to analyze the articles published in the Journal of Korean Academy of Adult Nursing from 2004 through 2006. Methods: Two hundred and ten articles were analyzed using descriptive statistics, focusing on research methodology and key words. Results: The proportion of quantitative research was 88.1%, while the proportion of qualitative research was 5.2%. The majority of the qualitative research designs were surveys (67.1%). Seventy-four percent of the studies had verbal consent and 8% had written consent from the participants. Eight percent of the studies provided a conceptual framework. The prevailing data collection settings were hospitals (50.5%) and the community (37.1%). For data analysis, 95% used parametric analysis methods: descriptive statistics (26.2%), chi-square test (18.3%), t-test (18%) and ANOVA (17.4%). Key words were categorized into four nursing domains: human, health, nursing, and environment. The most frequently used domain was health. Conclusion: The number of articles published in the Journal of Korean Academy of Adult Nursing has increased, and their quality has improved compared with articles published before 2000. Varied research methodologies and data analysis methods were utilized.
