• Title/Summary/Keyword: measure phrases

An Automatic Summarization of Call-For-Paper Documents Using a 2-Phase hidden Markov Model (2단계 은닉 마코프 모델을 이용한 논문 모집 공고의 자동 요약)

  • Kim, Jeong-Hyun;Park, Seong-Bae;Lee, Sang-Jo;Park, Se-Young
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.2 / pp.243-250 / 2008
  • This paper proposes a system that extracts necessary information from call-for-paper (CFP) documents using a hidden Markov model (HMM). Even though a CFP does not follow a strict form, most CFPs present their information in a relatively fixed sequence. A hidden Markov model, which is well suited to processing sequential data, is therefore adopted to analyze CFPs. However, when CFPs are modeled intuitively with a hidden Markov model, the boundaries of the information are not recognized accurately. To solve this problem, this paper proposes a two-phase hidden Markov model. In the first phase, the P-HMM (Phrase hidden Markov model), which models a document as a sequence of phrases, recognizes CFP documents locally. Then, the D-HMM (Document hidden Markov model) captures the overall structure and information flow of the document. Experiments on 400 CFP documents gathered from the Web yield an F-score of 0.49, an improvement of 0.15 in F-measure over the intuitively modeled HMM.
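As a rough illustration of how an HMM assigns a field label to each unit of a document, the sketch below runs Viterbi decoding over a toy model. It is a single plain HMM, not the two-phase P-HMM/D-HMM of the paper; the state names, observation symbols, and probabilities are all invented for the example.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for the given observations."""
    # Each column maps state -> (best log-probability so far, previous state).
    first = observations[0]
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][first]), None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            prev, score = max(
                ((p, best[-1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        best.append(column)
    # Backtrack from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))

# Toy model (hypothetical): two CFP fields emitting coarse token types.
states = ["TITLE", "DATE"]
start_p = {"TITLE": 0.8, "DATE": 0.2}
trans_p = {"TITLE": {"TITLE": 0.7, "DATE": 0.3},
           "DATE": {"TITLE": 0.1, "DATE": 0.9}}
emit_p = {"TITLE": {"word": 0.9, "number": 0.1},
          "DATE": {"word": 0.2, "number": 0.8}}

labels = viterbi(["word", "word", "number", "number"],
                 states, start_p, trans_p, emit_p)
print(labels)
```

The boundary between fields falls where the decoded label changes, which is the quantity the abstract reports as hard to recover with a single intuitively built model.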

Analyzing Errors in Bilingual Multi-word Lexicons Automatically Constructed through a Pivot Language

  • Seo, Hyeong-Won;Kim, Jae-Hoon
    • Journal of Advanced Marine Engineering and Technology / v.39 no.2 / pp.172-178 / 2015
  • Constructing a bilingual multi-word lexicon faces many difficulties, such as the absence of a commonly accepted gold-standard dataset. Moreover, there is no universally agreed definition of what a multi-word unit is. With these problems in mind, this paper evaluates and analyzes the context vector approach, a novel alignment method for constructing bilingual lexicons from parallel corpora, by comparing it with a general method. The approach builds context vectors for both source and target single-word units from two parallel corpora. To adapt the approach to multi-word units, we first identify all multi-word candidates (noun phrases in this work) and then concatenate each of them into a single-word unit. As a result, the context vector approach can be applied to multi-word units. In our experiments, the context vector approach outperformed the other approach. The contribution of the paper is an analysis of the various types of errors in the experimental results. In future work, we will study a similarity measure that covers not only a multi-word unit itself but also its constituents.

Distinguishing Referential Expression 'Geot' Using Decision Tree (결정 트리를 이용한 지시 표현 '것'의 구별)

  • Jo, Eun-Kyoung;Kim, Hark-Soo;Seo, Jung-Yun
    • Journal of KIISE:Software and Applications / v.34 no.9 / pp.880-888 / 2007
  • The referential expression 'geot' occurs often in Korean dialogues. However, previous research on reference resolution has not dealt with it properly, since it is not by itself a referential expression like pronouns and definite noun phrases, and it has never been discriminated from non-referring 'geot'. To resolve this problem, we establish a feature set based on the linguistic properties of 'geot' and the discourse properties of its text, and propose a method that distinguishes referential 'geot' from non-referring 'geot' using a decision tree. In the experiment, our system achieved F-measures of 92.3% for non-referring 'geot' and 82.2% for referential 'geot', with a total classification performance of 89.27%, and outperformed a classification system based on pattern rules.

A Study of Translation Conformity on Korean Version of a Balance Evaluation Systems Test (한국어판 Balance Evaluation Systems Test의 번역 적합성 연구)

  • Jeon, Yong-jin;Kim, Gyoung-mo
    • Physical Therapy Korea / v.25 no.1 / pp.53-61 / 2018
  • Background: The process of language translation, adaptation, and cross-cultural validation of tools for use in multiple countries requires the adoption of well-established, comprehensive, and rigorous methodological approaches. Back translation, which is the most recommended method, permits the detection of errors in the translation and the identification of words or phrases that cannot be accurately or literally translated. Objects: The aim of this study was to verify the content validity of a Korean version of the Balance Evaluation Systems Test (BESTest) by using a back-translation method. Methods: This research was conducted in six steps: 1) translation of the BESTest into Korean, 2) evaluation of the translation conformity of the Korean-translated BESTest, 3) evaluation of the degree of translation comprehension, 4) back translation of the Korean BESTest, 5) evaluation of the technical and conceptual equivalence, and 6) completion of the Korean version of the BESTest by the translation verification committee. Results: In this study, the Korean version of the BESTest achieved a rating of more than 3 (moderate) for translation comprehension, and the technical equivalence and conceptual equivalence of the back translation were evaluated as 3 (moderate) or more. Conclusion: The Korean version of the BESTest has proven content validity and is an appropriate tool to measure balance function.

A Social Network Analysis of Research Topics in Korean Nursing Science (한국 간호학 연구주제의 사회 연결망 분석)

  • Lee, Soo-Kyoung;Jeong, Senator;Kim, Hong-Gee;Yom, Young-Hee
    • Journal of Korean Academy of Nursing / v.41 no.5 / pp.623-632 / 2011
  • Purpose: This study was done to explore the knowledge structure of Korean Nursing Science. Methods: The main variables were keywords from the research papers that were presented in the Journal of Korean Academy of Nursing and the journals of the seven branches of the Korean Academy of Nursing. English titles and abstracts of the papers (n=5,936) published from 1995 through 2009 were included. Noun phrases were extracted from the corpora using an in-house program (BiKE Text Analyzer), their co-occurrence networks were generated via a cosine similarity measure, and the networks were then analyzed and visualized using Pajek, a Social Network Analysis program. Results: Using the hub and authority measures, the most important research topics in Korean Nursing Science were identified. Newly emerging topics in three-year period units were observed as research trends. Conclusion: This study provides a systematic overview of the knowledge structure of Korean Nursing Science. The Social Network Analysis used in this study will be useful for identifying the knowledge structure in Nursing Science.
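The step of turning keyword co-occurrence into network edges via cosine similarity can be sketched as follows. This is only a minimal illustration under assumed inputs: each document is already reduced to a list of noun phrases, the sample phrases and the 0.6 edge threshold are invented, and the tools named in the abstract (BiKE Text Analyzer, Pajek) are not reproduced.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def term_vectors(docs):
    """Map each term to a vector indexed by the documents it occurs in."""
    vectors = {}
    for index, terms in enumerate(docs):
        for term in set(terms):
            vectors.setdefault(term, Counter())[index] += 1
    return vectors

def similarity_edges(docs, threshold):
    """Return {(term_a, term_b): similarity} for pairs at or above threshold."""
    vectors = term_vectors(docs)
    terms = sorted(vectors)
    edges = {}
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            score = cosine(vectors[a], vectors[b])
            if score >= threshold:
                edges[(a, b)] = score
    return edges

# Hypothetical keyword lists, one per paper.
docs = [["pain", "elderly"],
        ["pain", "elderly", "stress"],
        ["stress", "nurse"]]
edges = similarity_edges(docs, threshold=0.6)
for pair, score in sorted(edges.items()):
    print(pair, round(score, 3))
```

The resulting edge list is the kind of input a network analysis tool would take for computing hub and authority scores.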

A Study on Extracting Ideas from Documents and Webpages in the Field of Idea Mining (아이디어 마이닝 분야에서 문헌과 웹페이지의 아이디어 발췌에 대한 연구)

  • Lee, Tae-Young
    • Journal of the Korean Society for Information Management / v.29 no.1 / pp.25-43 / 2012
  • Ideas and quasi-ideas useful for human creation were extracted from documents and webpages with extraction methods used in idea mining, opinion mining, and topic signal mining. The extraction methods comprised (1) decisive cue phrases, (2) cue figures and sounds, (3) contextual signals, and (4) discourse segmentation. They were tested on idea samples such as thoughts, plans, opinions, writings, figures, sounds, and formulas. When the efficiency of the four methods was judged by the F-measure, which combines recall and precision, methods (1), (3), and (4) received largely positive evaluations. In particular, the decisive cue phrase method was effective for finding ideas, and the contextual signal method was effective for detecting quasi-ideas.
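Since several abstracts in this list report results as an F-measure, a short sketch of the standard definition may help. The counts in the example are made up and do not come from any of the papers.

```python
def f_measure(true_positives, false_positives, false_negatives, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    predicted = true_positives + false_positives
    relevant = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / relevant if relevant else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical run: 8 correct extractions, 2 spurious, 8 missed.
# Precision is 8/10 = 0.8 and recall is 8/16 = 0.5.
score = f_measure(true_positives=8, false_positives=2, false_negatives=8)
print(round(score, 4))
```

Because the harmonic mean is pulled toward the lower of the two values, a method with high precision but low recall still scores modestly.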

A Study on Phone Call Big Data Analytics (전화통화 빅데이터 분석에 관한 연구)

  • Kim, Jeongrae;Jeong, Chanki
    • Journal of Information Technology and Architecture / v.10 no.3 / pp.387-397 / 2013
  • This paper proposes an approach to big data analytics for phone call data. The analytical model for phone call data is composed of the PVPF (Parallel Variable-length Phrase Finding) algorithm, which identifies verbal phrases of natural language, and a word count algorithm, which measures the usage frequency of keywords. In the proposed model, we identify words using the PVPF algorithm and measure the usage frequency of the identified words using the word count algorithm in MapReduce. The results can be interpreted from various viewpoints. We design and implement the model on HDFS (Hadoop Distributed File System) and verify the proposed approach through a case study of phone call data, extracting useful results through analysis of keyword correlation and usage frequency.
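The word count stage follows the usual map, shuffle, reduce pattern, which can be sketched in plain Python as below. This is an in-memory illustration only: it does not use Hadoop or HDFS, it splits on whitespace rather than applying the PVPF algorithm (which the abstract does not specify), and the sample transcript lines are invented.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated word."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the counts collected for each word."""
    return {word: sum(values) for word, values in groups.items()}

def word_count(lines):
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle(pairs))

# Hypothetical call transcript lines.
counts = word_count(["refund please", "Refund order"])
print(counts)
```

In a real MapReduce job the map and reduce functions run on separate workers and the shuffle is done by the framework; only the two function bodies carry over.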

Knowledge Structure of the Korean Journal of Occupational Health Nursing through Network Analysis (네트워크분석을 통한 직업건강간호학회지 논문의 지식구조 분석)

  • Kwon, Sun Young;Park, Eun Jung
    • Korean Journal of Occupational Health Nursing / v.24 no.2 / pp.76-85 / 2015
  • Purpose: The purpose of this study was to identify the knowledge structure of the Korean Journal of Occupational Health Nursing from 1991 to 2014. Methods: A total of 400 articles published between 1991 and 2014 were collected, and 1,369 keywords in the form of noun phrases were extracted from the articles and standardized for analysis. A co-occurrence matrix was generated via a cosine similarity measure, and the network was then analyzed and visualized using PFNet. NodeXL was also applied to visualize intellectual interchanges among keywords. Results: According to the content analysis and the cluster analysis of author keywords from the Korean Journal of Occupational Health Nursing articles, the 7 most important research topics of the journal were 'Workers & Work-related Health Problem', 'Recognition & Preventive Health Behaviors', 'Health Promotion & Quality of Life', 'Occupational Health Nursing & Management', 'Clinical Nursing Environment', 'Caregivers and Social Support', and 'Job Satisfaction, Stress & Performance'. Newly emerging topics in 4-year period units were observed as research trends. Conclusion: Through this study, the knowledge structure of the Korean Journal of Occupational Health Nursing was identified. The network analysis in this study will be useful for identifying the knowledge structure as well as for finding a general view and current research trends. Furthermore, the results of this study could be used to seek research directions for the Korean Journal of Occupational Health Nursing.

Intelligent Spam-mail Filtering Based on Textual Information and Hyperlinks (텍스트정보와 하이퍼링크에 기반한 지능형 스팸 메일 필터링)

  • Kang, Sin-Jae;Kim, Jong-Wan
    • Journal of the Korean Institute of Intelligent Systems / v.14 no.7 / pp.895-901 / 2004
  • This paper describes a two-phase intelligent method for filtering spam mail based on textual information and hyperlinks. Since the body of a spam mail has little text information, it provides insufficient hints to distinguish spam mails from legitimate mails. To resolve this problem, we follow hyperlinks contained in the email body, fetch the contents of the remote webpages, and extract hints (i.e., features) from the original email body and the fetched webpages. We divide hints into two kinds of information: definite information (sender's information and definite spam keyword lists) and less definite textual information (words or phrases, and particular features of the email). In filtering spam mails, definite information is used first, and then less definite textual information is applied. In our experiment, the method of fetching web pages improved the F-measure by 9.4% over the method using only the original email header and body.
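The ordering of the two phases can be sketched as a short decision function. Everything concrete here is an assumption made for illustration: the abstract does not say that a definite match alone decides the label, `score_text` is a hypothetical stand-in for whatever trained model scores the less definite textual features, and hyperlink fetching is left out entirely.

```python
def classify(sender, body, blocked_senders, spam_keywords, score_text,
             threshold=0.5):
    """Two-phase check: definite evidence first, then a textual score."""
    # Phase 1: definite information (sender list, definite spam keywords).
    if sender in blocked_senders:
        return "spam"
    lowered = body.lower()
    if any(keyword in lowered for keyword in spam_keywords):
        return "spam"
    # Phase 2: less definite textual information, scored by a model.
    return "spam" if score_text(body) >= threshold else "legitimate"

def toy_score(body):
    """Hypothetical scorer: fraction of words written in all capitals."""
    words = body.split()
    return sum(word.isupper() for word in words) / len(words) if words else 0.0

blocked = {"promo@example.com"}
keywords = {"free prize"}

first = classify("promo@example.com", "hello", blocked, keywords, toy_score)
second = classify("a@example.com", "Claim your FREE PRIZE", blocked, keywords, toy_score)
third = classify("a@example.com", "BUY NOW today", blocked, keywords, toy_score)
fourth = classify("a@example.com", "Meeting moved to Friday", blocked, keywords, toy_score)
print(first, second, third, fourth)
```

In the setting the abstract describes, the text passed to the second phase would also include content fetched from linked webpages, which is what gives that phase enough material to work with.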

Construction of an Efficient Pre-analyzed Dictionary for Korean Morphological Analysis (한국어 형태소 분석을 위한 효율적 기분석 사전의 구성 방법)

  • Kwak, Sujeong;Kim, Bogyum;Lee, Jae Sung
    • KIPS Transactions on Software and Data Engineering / v.2 no.12 / pp.881-888 / 2013
  • A pre-analyzed dictionary is used to increase the speed and accuracy of morphological analyzers and to reduce over-generation. However, if the dictionary includes 'insufficiently analyzed word-phrases', whose entries do not include all possible analyses of the word-phrase, analysis accuracy may decrease. In this paper, we measure how accuracy changes with word-phrase frequency and with corpus size, using the Sejong corpus. The performance of the integrated system (SMA with the pre-analyzed dictionary) is highest when the sufficient-analysis rate of the pre-analyzed dictionary is more than 99.82%. The pre-analyzed dictionary is also constructed from word-phrases with a frequency of more than 32 (64) when the corpus size is 1,600,000 (6,300,000) word-phrases.
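One way to read the frequency criterion is as a cutoff applied while collecting analyses from a tagged corpus, sketched below. The input format (pairs of word-phrase and analysis string), the function name, and the placeholder data are assumptions; the abstract does not describe the actual construction procedure in this detail.

```python
from collections import Counter, defaultdict

def build_pre_analyzed_dictionary(tagged_corpus, frequency_cutoff):
    """Keep word-phrases seen more than frequency_cutoff times,
    each mapped to every distinct analysis observed for it."""
    frequency = Counter(word_phrase for word_phrase, _ in tagged_corpus)
    analyses = defaultdict(set)
    for word_phrase, analysis in tagged_corpus:
        analyses[word_phrase].add(analysis)
    return {
        word_phrase: sorted(analyses[word_phrase])
        for word_phrase, count in frequency.items()
        if count > frequency_cutoff
    }

# Placeholder corpus: (word-phrase, analysis) pairs with dummy labels.
corpus = [("wp1", "A"), ("wp1", "B"), ("wp1", "A"), ("wp2", "C")]
dictionary = build_pre_analyzed_dictionary(corpus, frequency_cutoff=1)
print(dictionary)
```

The intuition behind a cutoff is that a word-phrase seen many times is more likely to have had all of its possible analyses observed, so high-frequency entries are less likely to be insufficiently analyzed.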