• Title/Summary/Keyword: 어휘사용빈도 (vocabulary usage frequency)


Addressing Low-Resource Problems in Statistical Machine Translation of Manual Signals in Sign Language (말뭉치 자원 희소성에 따른 통계적 수지 신호 번역 문제의 해결)

  • Park, Hancheol;Kim, Jung-Ho;Park, Jong C.
    • Journal of KIISE / v.44 no.2 / pp.163-170 / 2017
  • Despite the growing number of studies on spoken-to-sign language translation, the low-resource problems of sign language corpora have rarely been addressed. As a first step towards translating from spoken to sign language, we addressed the problems that arise from resource scarcity when translating spoken language into manual signals using statistical machine translation techniques. More specifically, we proposed three preprocessing methods: 1) paraphrase generation, which increases the size of the corpora; 2) lemmatization, which increases the frequency of each word in the corpora and the translatability of new input words in spoken language; and 3) elimination of function words that are not glossed into manual signals, which matches up the corresponding constituents of the bilingual sentence pairs. In our experiments, we used different types of English-American Sign Language parallel corpora. The experimental results showed that each method, and the combination of the methods, improved the quality of manual signal translation regardless of the type of corpus.
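The second and third preprocessing steps described above (lemmatization and function-word removal) can be illustrated with a minimal Python sketch. The lemma table `LEMMAS`, the list `FUNCTION_WORDS`, and the function `preprocess` are hypothetical stand-ins invented for this illustration; they are not the resources or code used in the paper, and paraphrase generation is not covered.

```python
# Toy lemma table and function-word list. Both are hypothetical stand-ins,
# not the resources used in the paper.
LEMMAS = {"went": "go", "going": "go", "books": "book", "is": "be", "was": "be"}
FUNCTION_WORDS = {"a", "an", "the", "to", "of", "be"}


def preprocess(sentence):
    """Lowercase, lemmatize by table lookup, then drop function words."""
    tokens = sentence.lower().split()
    lemmas = [LEMMAS.get(token, token) for token in tokens]
    return [token for token in lemmas if token not in FUNCTION_WORDS]


print(preprocess("She was going to the library"))  # ['she', 'go', 'library']
```

Mapping inflected forms to one lemma raises the count of each remaining word type, and dropping unglossed function words leaves a source sentence whose tokens correspond more directly to the sign glosses.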

Design and Implementation of Ontology-Based Natural Language Search System (온톨로지 기반의 자연어 검색 시스템 설계 및 구현)

  • Kang, Rae-Goo;Lim, Dong-Il;Jung, Chai-Yeoung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2007.10a / pp.875-878 / 2007
  • Until now, product information searches have relied largely on keyword-based search, which mainly uses word frequency or vocabulary information. Keyword-based search places an additional burden on users, who must manually sift through the displayed results again because they include files entirely unrelated to the query. Ontologies have emerged to resolve this problem. In this paper, we built a product search system using an ontology and tested how accurately it searches by classification. For the test, about 40,000 product records from discount store A, which operates online and offline discount stores, were built into a database, and the search system, including its user interface, was developed and tested using JSP and PowerBuilder 9.0. The test results showed that the search method using a domain ontology for products, as presented and designed in this paper, was superior to the existing keyword-based search method.


Design and Implementation of Search System Using Domain Ontology (도메인 온톨로지를 이용한 검색 시스템 설계 및 구현)

  • Kang, Rae-Goo;Jung, Chai-Yeoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.11 no.7 / pp.1318-1324 / 2007
  • The traveling salesman problem (TSP) is the problem of finding the shortest of the many possible routes that start from one of N given cities, visit every city exactly once, and return to the starting city. As the number of cities to visit increases, the amount of computation grows exponentially. TSP is classified as an NP-hard problem, and genetic algorithms are a representative approach to it. To obtain better results for TSP, various operators have been developed and studied. This paper proposes new methods of population initialization and sequential transformation, and demonstrates the performance improvement by comparing them with existing methods.
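To make the problem setup concrete, here is a minimal Python sketch of the two ingredients a genetic algorithm for TSP starts from: a closed-tour length function and a population of candidate tours. The initialization shown is the plain random-permutation baseline, not the new initialization method the paper proposes (the abstract does not specify it). The coordinates and the names `tour_length` and `random_population` are assumptions made for this illustration.

```python
import math
import random


def tour_length(tour, coords):
    """Total length of a closed tour that returns to the starting city."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def random_population(n_cities, size, rng):
    """Baseline initialization: each individual is a random permutation of the cities."""
    return [rng.sample(range(n_cities), n_cities) for _ in range(size)]


coords = [(0, 0), (0, 1), (1, 1), (1, 0)]  # hypothetical cities on a unit square
rng = random.Random(0)  # fixed seed so the run is repeatable
population = random_population(len(coords), size=6, rng=rng)
best = min(population, key=lambda tour: tour_length(tour, coords))
print(best, tour_length(best, coords))
```

Each individual visits every city exactly once by construction, so selection, crossover, and mutation operators only need to preserve the permutation property.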

A Study on Data Cleansing Techniques for Word Cloud Analysis of Text Data (텍스트 데이터 워드클라우드 분석을 위한 데이터 정제기법에 관한 연구)

  • Lee, Won-Jo
    • The Journal of the Convergence on Culture Technology / v.7 no.4 / pp.745-750 / 2021
  • In big data visualization analysis of unstructured text data, the raw data is usually large, and analysis techniques cannot be applied without first cleansing it. Therefore, unnecessary data is removed from the collected raw data through a first, heuristic cleansing process, and stopwords are removed through a second, machine cleansing process. The frequency of each vocabulary item is then calculated and visualized using the word cloud technique, key issues are extracted and turned into information, and the results are analyzed. In this study, we propose a new stopword cleansing technique that uses an external stopword set (DB) with a Python word cloud, and we identify the problems and effectiveness of this technique through a practical case analysis. Based on this verification, we present the practical utility of word cloud analysis using the proposed cleansing technique.
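The machine cleansing and frequency-counting steps described above can be sketched in Python with the standard library alone. The stopword entries, the sample sentence, and the helper names `load_stopwords` and `word_frequencies` are hypothetical and chosen for illustration; the paper's actual stopword DB and code are not reproduced here.

```python
import re
from collections import Counter


def load_stopwords(lines):
    """Build a stopword set from an iterable of lines (for example, an open text file)."""
    return {line.strip().lower() for line in lines if line.strip()}


def word_frequencies(text, stopwords):
    """Tokenize on word characters, drop stopwords, and count what remains."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(token for token in tokens if token not in stopwords)


# Hypothetical external stopword list; in practice it might be read from a file or DB.
external_stopwords = load_stopwords(["the", "of", "and", "is", "a"])
sample = "The frequency of the vocabulary is the basis of a word cloud and the cloud shows frequency"
freqs = word_frequencies(sample, external_stopwords)
print(freqs.most_common(2))  # [('frequency', 2), ('cloud', 2)]
```

A frequency table like `freqs` is the usual input to a word cloud renderer; the third-party `wordcloud` package, for instance, accepts one through `WordCloud.generate_from_frequencies`.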

Analysis of Language Message Expression in Beauty Magazine's Cosmetic Ads : Focusing on "Hyang-jang", AMOREPACIFIC's from 1958 to 2018 (화장품광고에 나타난 언어메시지 표현분석 : 1958년~2018년의 아모레퍼시픽 뷰티매거진<향장>을 중심으로)

  • Choi, Eun-Sob
    • Journal of Korea Entertainment Industry Association / v.13 no.7 / pp.99-118 / 2019
  • This study analyzed the language messages in 718 advertisements in "Hyang-jang", AMOREPACIFIC's beauty magazine, published from 1958 to 2018, by product category and era in terms of purchase information, persuasive expression, and word type, and confirmed the following. First, the number of advertisements was largest in the 1980s and 1990s, and by product category the most numerous were skincare, makeup, and men's products. Second, headlines and body copy differed in persuasive expression. "Focused on image-making" was mainly used in headlines, where "situational image" was generally dominant; "user image" was more common before the 1990s, while "brand image" has become more common in recent times. "Informal" was mostly applied in body copy, where "general information" and "differentiated information" were used the most. It is therefore important to understand what kind of information should embody the image each brand has established, rather than simply dividing appeal methods into "rational appeal" and "emotional appeal." Lastly, when word type was examined for brand names and headlines, foreign words were most dominant in brand names and Chinese characters in headlines. Notably, native-language brand names were temporarily common in the 1970s and 1980s, which can be interpreted as a result of the government policy promoting native-language brands at the time.
In addition, foreign words were frequently used for cosmetics and Chinese characters for men's products. This may be because colors and seasons in cosmetic products were expressed in foreign words in most cases. On the other hand, the tendency of men's product consumers to pursue prestige or confidence through Chinese characters was actively reflected in the language messages.

Comparative Analysis of Medical Terminology Among Korea, China, and Japan in the Field of Cardiopulmonary Bypass (한.중.일 의학용어 비교 분석 - 심폐바이패스 영역를 중심으로 -)

  • Kim, Won-Gon
    • Journal of Chest Surgery / v.40 no.3 s.272 / pp.159-167 / 2007
  • Background: Vocabulary originating from Chinese characters constitutes an important common factor in the medical terminologies used in three East Asian countries: Korea, China, and Japan. This study was performed to comparatively analyze the medical terminologies of these three countries in the field of cardiopulmonary bypass (CPB) and thereby facilitate further understanding among the three medical societies. Material and Method: A total of 129 English terms (85 core and 44 related) in the field of CPB were selected and translated into each country's official terminology, with help from Seoul National University Hospital (Korea), Tokyo Michi Memorial Hospital (Japan), and Yanbian Welfare Hospital and Harbin Children Hospital (China). Dictionaries and CPB textbooks were also consulted. In addition to the official terminology used in each country, the frequency of use of English terms in clinical settings was also analyzed. Result and Conclusion: Among the 129 terms, 28 (21.7%) were identical across the three countries, based on the Chinese characters. 86 terms were identical between only two countries, mostly between Korea and Japan. As a result, the identity rate in CPB terminology between Korea and Japan was 86.8%, whereas the rates between Korea and China and between Japan and China were both 24.8%. The frequency of use of English terms in clinical practice was much higher in Korea and Japan than in China. Despite some inherent limitations of the analysis, this study can serve as a meaningful foundation for facilitating mutual understanding among the medical societies of these three East Asian countries.

Semantic Network Analysis of Presidential Debates in 2007 Election in Korea (제17대 대통령 후보 합동 토론 언어네트워크 분석 - 북한 관련 이슈를 중심으로)

  • Park, Sung-Hee
    • Korean journal of communication and information / v.45 / pp.220-254 / 2009
  • Presidential TV debates serve as an important instrument for general viewers to evaluate the candidates’ character, examine their policies, and finally make the important political decision of casting a ballot. Every word the candidates utter over the course of the election campaign carries a certain significance by delivering their ideas and by creating clashes with their opponents. This study focuses on the conceptual venue in which these clashes take place, termed ‘stasis’ by ancient rhetoricians, and examines the word choices made by each candidate and the manner in which they form stasis, call for evidence, educate the public, and finally create a legitimate form of political argumentation. The study applied computer-based content analysis using the KrKwic and UCINET software to analyze semantic networks among the candidates. The results showed that the three major candidates, namely Lee Myung Bak, Jung Dong Young, and Lee Hoi Chang, displayed distinct patterns in their use of language, selecting words that were often neglected by their opponents. The absence of stasis and the lack of a shared language significantly undermined the effects of the debates. Central questions on issues regarding North Korea failed to meet basic requirements, and the respondents failed to engage in an effective argumentation process.


Topiramate for the Treatment of Binge Eating Disorder or Bulimia Nervosa : A Systemic Review of Human Clinical Studies and Case Reports (Topiramate의 신경성 폭식증 치료효과: 국내외 보고된 임상연구결과 및 치험사례 중심으로)

  • Lee, Yu-Jeung;Bang, Joon-Seok
    • Korean Journal of Clinical Pharmacy / v.17 no.1 / pp.6-12 / 2007
  • The clinical investigations above suggest that topiramate may be an effective agent in the treatment of binge eating disorder (BED) or bulimia nervosa (BN) by reducing binging/purging episodes, improving health-related quality of life (HRQOL), and decreasing weight. The case report and case series also support these findings. However, the above studies and cases have several limitations. All had relatively small sample sizes, and two of them were only 10-week studies. The optimal duration of treatment with topiramate in patients with BED or BN is unknown. As most clinicians treat patients with BED or BN for 6 to 12 months and then reassess, a period of at least 6 months is needed to show its efficacy. One of the studies included only women in the patient group. In the case series, all patients had severe comorbid mood disorders, such as major depression and bipolar disorder, in addition to BN. Therefore, notwithstanding its clinical usefulness, additional research is needed to define the role and benefits of topiramate in the treatment of BED or BN more thoroughly.


Variables affecting Korean word recognition: focusing on syllable shape (한글 단어 재인에 영향을 미치는 변인: 음절 형태를 중심으로)

  • Min, Suyoung;Lee, Chang H.
    • Korean Journal of Cognitive Science / v.29 no.4 / pp.193-220 / 2018
  • Recent studies have demonstrated that word frequency, word length, neighborhood, and word shape may play a role in visual word recognition. Shape information may affect word processing differently in Korean because the Korean writing system works differently from that of English. The purpose of this study was to apply the Gestalt principle of continuity to the Korean alphabetic script (hangul), to investigate the processing unit of hangul, and to verify whether syllable shape affects word recognition in hangul. In experiment 1, three-syllable words were used and two variables were manipulated: 1) syllable type (horizontal syllable shape, e.g., "가"; vertical syllable shape, e.g., "고") and 2) presentation direction (horizontal, vertical). Whereas "가" meets the criteria of the continuity principle, "고" does not. Lexical decision times showed significantly better performance for the horizontal syllable shape type than for the vertical syllable shape type, regardless of presentation direction. In experiment 2, syllable type (horizontal syllable shape, vertical syllable shape) and the visual relationship between prime and target (identical, similar, different) were manipulated using masked priming. Performance differed significantly across the visual relationships between prime and target, and thus the effect of syllable shape was verified.

Effects of Association and Imagery on Word Recognition (단어재인에 미치는 연상과 심상성의 영향)

  • Kim, Min-Jung;Lee, Seung-Bok;Jung, Bum-Suk
    • Korean Journal of Cognitive Science / v.20 no.3 / pp.243-274 / 2009
  • Association, word frequency, and imagery have been considered the main factors that affect word recognition. The present study aimed to examine the imagery effect and its interaction with the association effect while controlling for the frequency effect. To explain the imagery effect, we compared two theories (dual-coding theory and the context-availability model). A lexical decision task using a priming paradigm was administered. The duration of the prime words was set to 20 ms, 50 ms, and 450 ms in experiments 1, 2, and 3, respectively. The association and imagery of the prime words were manipulated as the main factors in each of the three experiments. In experiment 1, the prime duration (20 ms) was not expected to activate the semantic context enough to affect word recognition. As a result, only the imagery effect was statistically significant. In experiment 2, the prime duration was 50 ms, which we expected to activate the semantic context without perceptual awareness. The results showed both association and imagery effects, and the interaction between the two effects was also significant. In experiment 3, to activate the semantic context with perceptual awareness, the prime words were presented for 450 ms. Only the association effect was statistically significant in this condition. The results of the three experiments suggest that imagery exerts its influence at the early stages of word recognition, while the association effect appears later. These results imply that the two theories do not contradict each other: dual-coding theory concerns the imagery effect, which affects the early stage of word recognition, and the context-availability model concerns the semantic context effect, which affects a later stage.
To explain the word recognition process more completely, an integrated model needs to be developed that considers not only the three main effects but also the stages that extend along the time course of the process.
