Search | Korea Science

Similarity checking between XML tags through expanding synonym vector (유사어 벡터 확장을 통한 XML태그의 유사성 검사)

Lee, Jung-Won;Lee, Hye-Soo;Lee, Ki-Ho
- Journal of KIISE:Software and Applications / v.29 no.9 / pp.676-683 / 2002
The success of XML (eXtensible Markup Language) rests primarily on its flexibility: anyone can define the structure of XML documents to represent information in whatever form they desire. This flexibility, however, means that XML documents cannot automatically be given an underlying semantics. Different tag sets, different names for elements or attributes, and different document structures in general hinder the precise classification and clustering of XML documents. In this paper, we design and implement a system that checks semantic similarity between XML tags. First, the system extracts the underlying semantics of tags and expands the synonym set of each tag using the WordNet thesaurus and a user-defined word library that supports the abbreviations and compound words found in XML tags. Second, to account for the relative importance of XML tags within XML documents, we extend the conventional vector space model, the most widely used document model in the information retrieval field. Using this method, we were able to check the similarity between XML tags whose meanings are represented by different tag names.
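The two steps in this abstract (synonym expansion, then a weighted vector space comparison) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system: the `SYNONYMS` table is a hypothetical stand-in for WordNet plus the user-defined library, and `split_tag`, `expand`, and the weight values are names and numbers invented for the example.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for WordNet plus a user-defined library of
# abbreviations; a real system would query an actual thesaurus.
SYNONYMS = {
    "author": {"writer", "creator"},
    "writer": {"author", "creator"},
    "auth": {"author"},  # abbreviation entry
}

def split_tag(tag):
    """Split a compound tag such as 'authorName' or 'auth_name' into lowercase words."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", tag)
    return [p.lower() for p in parts]

def expand(tag, weight=1.0, synonym_weight=0.5):
    """Build a term vector for a tag: its own words plus down-weighted synonyms.

    `weight` models the relative importance of the tag in its document
    (an assumed parameter; the abstract does not say how it is computed).
    """
    vec = Counter()
    for word in split_tag(tag):
        vec[word] += weight
        for syn in SYNONYMS.get(word, ()):
            vec[syn] += weight * synonym_weight
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

if __name__ == "__main__":
    print(cosine(expand("author"), expand("writer")))
    print(cosine(expand("author"), expand("price")))
```

With the toy table, `author` and `writer` share expanded terms and score well above zero, while `author` and `price` share none and score zero.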

A Semi-Automatic Semantic Mark Tagging System for Building Dialogue Corpus (대화 말뭉치 구축을 위한 반자동 의미표지 태깅 시스템)

Park, Junhyeok;Lee, Songwook;Lim, Yoonseob;Choi, Jongsuk
- KIPS Transactions on Software and Data Engineering / v.8 no.5 / pp.213-222 / 2019
Determining the meaning of a keyword in a speech dialogue system is an important technology for the future implementation of intelligent speech dialogue interfaces. After keywords are extracted from a user's utterance to grasp its intention, the intention of the utterance is determined using the semantic marks of the keywords. One keyword can have several semantic marks, and we regard the task of attaching the semantic mark that matches the user's intention to each keyword as a word sense disambiguation problem. In this study, about 23% of all keywords in the corpus are manually tagged to build a semantic mark dictionary, a synonym dictionary, and a context vector dictionary, and the remaining 77% are then tagged automatically. The semantic mark of a keyword is determined by calculating context vector similarity against the context vector dictionary. For an unregistered keyword, the semantic mark of the most similar keyword is attached using the synonym dictionary. We compare the performance of the system with a manually constructed training set and with a semi-automatically expanded training set, selecting 3 high-frequency and 3 low-frequency keywords from the corpus. In experiments, we obtained an accuracy of 54.4% with the manually constructed training set and 50.0% with the semi-automatically expanded training set.
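The tagging step described in this abstract can be sketched roughly as below. All data and names here are hypothetical: `CONTEXT_VECTORS`, `SYNONYMS`, the mark labels, and `tag_keyword` are illustrative assumptions, and the abstract does not specify the similarity function, so cosine similarity is assumed.

```python
import math
from collections import Counter

# Hypothetical context vector dictionary: (keyword, semantic mark) -> context word counts.
# In the abstract, such dictionaries are built from the manually tagged part of the corpus.
CONTEXT_VECTORS = {
    ("play", "MUSIC"): Counter({"song": 3, "music": 2, "volume": 1}),
    ("play", "GAME"): Counter({"chess": 2, "game": 3, "score": 1}),
}

# Hypothetical synonym dictionary mapping unregistered keywords to registered ones.
SYNONYMS = {"start": "play"}

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def tag_keyword(keyword, context_words):
    """Return the semantic mark whose stored context vector best matches the context."""
    registered = {k for k, _ in CONTEXT_VECTORS}
    if keyword not in registered:
        # Fall back to a registered synonym for unregistered keywords.
        keyword = SYNONYMS.get(keyword, keyword)
    candidates = {mark: vec for (k, mark), vec in CONTEXT_VECTORS.items() if k == keyword}
    if not candidates:
        return None  # no information available for this keyword
    ctx = Counter(context_words)
    return max(candidates, key=lambda mark: cosine(ctx, candidates[mark]))

if __name__ == "__main__":
    print(tag_keyword("play", ["turn", "up", "the", "song", "volume"]))
    print(tag_keyword("start", ["a", "game", "of", "chess"]))
```

The sketch treats an unregistered keyword by substituting a registered synonym and then disambiguating by context; the paper may combine the two dictionaries differently.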
https://doi.org/10.3745/KTSDE.2019.8.5.213

A Checklist of North Korea Plant and Current Status of Genetic Resources Held by Domestic and International Arboreta (북한식물 목록과 국내·외 수목원의 북한식물 유전자원 보유 현황)

Young-Min Choi;Seungju Jo;Hyeonji Lee;Jung-Won Yoon
- Korean Journal of Plant Resources / v.37 no.2 / pp.171-202 / 2024
If the plant genetic resources and information-sharing systems held by arboretums worldwide are used effectively, a conservation system for plant diversity in the currently inaccessible North Korean region could be established. This study reviewed the scientific names of plants native to North Korea but not to South Korea and assessed the status of genetic resources held in domestic and international arboretums. To compile a list of North Korean plants and the status of their genetic resources, updated checklists of vascular plants of the Korean Peninsula and online plant information databases were consulted for synonyms, distribution ranges, and other related information. A total of 486 taxa (449 species, 13 subspecies, 21 varieties, 1 forma, and 2 hybrids) from 236 genera and 64 families, representing 12.34% of the total native flora of the Korean Peninsula, were included in the North Korea plant list, and the presence of rare, endemic, and northern lineage species was confirmed. We found that 384 taxa from 190 genera and 53 families of North Korean plants are held as genetic resources in 333 arboretums and plant research institutions across 46 countries on 5 continents. This study is expected to contribute to the construction and application of a species list for plants native to the Korean Peninsula.
https://doi.org/10.7732/kjpr.2024.37.2.171

Dynamic Expansion of Semantic Dictionary for Topic Extraction in Automatic Summarization (자동요약의 주제어 추출을 위한 의미사전의 동적 확장)

Choo, Kyo-Nam;Woo, Yo-Seob
- Journal of IKEEE / v.13 no.2 / pp.241-247 / 2009
This paper proposes methods for expanding a semantic dictionary that take Korean semantic features into account. The methods are intended for extracting practical topic words in automatic summarization. The first method constructs a synonym dictionary to improve the performance of semantic-marker analysis. The second extracts probabilistic information from the subcategorization dictionary to resolve syntactic and semantic ambiguity. The third predicts the subcategorization patterns of unregistered predicates in order to resolve affix-derived predicates.

Design and Implementation of a Subjective-type Evaluation System Using Natural Language Processing Technique (유의어 사전을 이용한 주관식 문제 채점 시스템 설계 및 구현)

Park, HeeJung;Kang, WonSeog
- The Journal of Korean Association of Computer Education / v.6 no.3 / pp.207-216 / 2003
Instructors generally use objective-type evaluation for grading. Subjective-type evaluation has the merit that it can assess higher-level cognitive ability, but it suffers from problems with the objectivity and reliability of grading. This paper proposes a model for grading subjective-type evaluations, and designs and implements an evaluation system that uses a synonym thesaurus. The system can process diverse and wide-ranging subjective-type questions and is easy for beginners to use. It can also reduce the time and effort needed for evaluation and make the evaluation more objective. The system achieves a 73% success rate. We expect this system to serve as a basis for research on subjective-type evaluation.

Re-evaluation of green tide-forming species in the Yellow Sea

Kang, Eun Ju;Kim, Ju-Hyoung;Kim, Keunyong;Choi, Han-Gu;Kim, Kwang Young
- ALGAE / v.29 no.4 / pp.267-277 / 2014
Green tides occur every year in the Yellow Sea (YS), and numerous investigations into various aspects of the phenomenon are under way. We identified bloom-forming species collected from diverse locations in the YS using morphological traits and the chloroplast gene for the large subunit of ribulose-1,5-bisphosphate carboxylase (rbcL). Morphological and rbcL sequence data analyses characterized the blooming species on both sides of the YS as belonging to the Ulva linza-procera-prolifera (LPP) complex clade, or U. prolifera of earlier reports. However, U. procera within the LPP complex must be regarded as a synonym of U. linza. Moreover, U. prolifera in free-floating samples collected from the Qingdao coast in 2009 was clearly in a distinct clade from that of the blooming species. Therefore, U. linza is the main green tide alga in the YS and has the procera-morphology. The green drift mats in the southeastern part of the YS (southwest sea of Korea) consisted predominantly of U. linza and rarely of U. compressa or U. prolifera.
https://doi.org/10.4490/algae.2014.29.4.267

A Semantic Representation Based-on Term Co-occurrence Network and Graph Kernel

Noh, Tae-Gil;Park, Seong-Bae;Lee, Sang-Jo
- International Journal of Fuzzy Logic and Intelligent Systems / v.11 no.4 / pp.238-246 / 2011
This paper proposes a new semantic representation and an associated similarity measure. The representation expresses the textual context observed around a given term as a network in which nodes are terms and edge weights are the numbers of co-occurrences between the connected terms. To compare terms represented as networks, a graph kernel is adopted as the similarity measure. The proposed representation has two notable merits compared with previous semantic representations. First, it handles polysemous words better than a vector representation. The network of a polysemous term is regarded as a combination of sub-networks that represent its senses, and the appropriate sub-network is identified from context before being compared by the kernel. Second, the representation allows not only words but also senses or contexts to be represented directly from the corresponding sets of terms. The validity of the representation and its similarity measure is evaluated on two tasks: a synonym test and unsupervised word sense disambiguation. The method performed well and was competitive with state-of-the-art unsupervised methods.
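A rough sketch of the representation described in this abstract is given below. The abstract does not name the specific graph kernel, so the `edge_kernel` function here is an assumed, very simple substitute (a normalized dot product over shared weighted edges); `cooccurrence_network` and the toy contexts are likewise hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def cooccurrence_network(contexts):
    """Build a term co-occurrence network for one target term.

    `contexts` is a list of token lists observed around the target term.
    The result maps each unordered term pair (an edge) to the number of
    contexts in which both terms occur together.
    """
    edges = Counter()
    for tokens in contexts:
        for a, b in combinations(sorted(set(tokens)), 2):
            edges[(a, b)] += 1
    return edges

def edge_kernel(g1, g2):
    """Normalized dot product over shared weighted edges.

    An assumed stand-in for the graph kernel in the abstract,
    chosen only because it is short and easy to verify.
    """
    dot = sum(g1[e] * g2[e] for e in g1.keys() & g2.keys())
    norm = math.sqrt(sum(w * w for w in g1.values()) * sum(w * w for w in g2.values()))
    return dot / norm if norm else 0.0

if __name__ == "__main__":
    car = cooccurrence_network([["engine", "road", "drive"], ["road", "drive"]])
    auto = cooccurrence_network([["engine", "drive", "road"]])
    fruit = cooccurrence_network([["sweet", "peel", "juice"]])
    print(edge_kernel(car, auto), edge_kernel(car, fruit))
```

Terms whose contexts share many co-occurring pairs score close to 1, and terms with disjoint contexts score 0. Handling polysemy as the abstract describes would additionally require selecting a sub-network by context before applying the kernel, which this sketch omits.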
https://doi.org/10.5391/IJFIS.2011.11.4.238

Vocabulary Coverage Improvement for Embedded Continuous Speech Recognition Using Knowledgebase (지식베이스를 이용한 임베디드용 연속음성인식의 어휘 적용률 개선)

Kim, Kwang-Ho;Lim, Min-Kyu;Kim, Ji-Hwan
- MALSORI / v.68 / pp.115-126 / 2008
In this paper, we propose a vocabulary coverage improvement method for embedded continuous speech recognition (CSR) using a knowledge base. A vocabulary in CSR is normally derived from a word frequency list, so vocabulary coverage depends on the corpus. In previous research, we presented an improved way of generating vocabularies using a part-of-speech (POS) tagged corpus. We analyzed all words paired with 101 of the 152 POS tags and decided on a set of words that must be included in vocabularies of any size. However, for the other 51 POS tags (e.g., nouns and verbs), the inclusion of words paired with those tags is still based on word frequencies counted on a corpus. In this paper, we propose a corpus-independent word inclusion method for noun-, verb-, and named entity (NE)-related POS tags using a knowledge base. For noun-related POS tags, we generate synonym groups and analyze their relative importance using Google search. We then categorize verbs by lemma and analyze the relative importance of each lemma using pre-analyzed verb statistics. We determine the inclusion order of NEs through Google search. The proposed method shows better coverage on the test short message service (SMS) text corpus.

Automatic Extraction of Alternative Words using Parallel Corpus (병렬말뭉치를 이용한 대체어 자동 추출 방법)

Baik, Jong-Bum;Lee, Soo-Won
- Journal of KIISE:Computing Practices and Letters / v.16 no.12 / pp.1254-1258 / 2010
In information retrieval, different surface forms of the same object can degrade system performance. In this paper, we propose a method that extracts alternative words by using, as features of each word, its translation words extracted from a parallel corpus of Korean/English title pairs from patent information. We also propose an association word filtering method that removes association words from the alternative word list. Evaluation results show that the proposed method outperforms other alternative word extraction methods.

Interface Mapping and Generation Methods for Intuitive User Interface and Consistency Provision (사용자 인터페이스의 직관적인 인식 및 일관성 부여를 위한 인터페이스 매핑 및 생성 기법)

Yoon, Hyo-Seok;Woo, Woon-Tack
- Proceedings of the HCI Society of Korea Conference (한국HCI학회:학술대회논문집) / 2009.02a / pp.135-139 / 2009
In this paper, we present INCUI, a user interface based on a natural view of the physical user interfaces of target devices and services in a pervasive computing environment. We introduce the concept of an Intuitively Natural and Consistent User Interface (INCUI), which consists of an image of a physical user interface and a descriptive XML file. We then elaborate on how an INCUI template can be used to map user interface components consistently, both structurally and visually. We describe the INCUI mapping process and a novel architecture for selecting a mapping method based on domain size and the types of the source and target INCUI. In particular, we developed and applied an extended LCS-based algorithm that uses prefixes, postfixes, and synonyms for similarity calculation.
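The similarity calculation mentioned at the end of this abstract might look something like the sketch below. The abstract gives no details, so several things are assumptions: LCS is taken to mean longest common subsequence, the scoring formula and `affix_bonus` value are invented for illustration, and the `SYNONYMS` pairs and the `label_similarity` name are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two strings (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, bj in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ch == bj else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

# Hypothetical synonym pairs for user interface component labels.
SYNONYMS = {frozenset({"volume", "sound"}), frozenset({"power", "on/off"})}

def label_similarity(a, b, affix_bonus=0.1):
    """Score two component labels in [0, 1].

    Identical labels and known synonyms score 1.0. Otherwise the score is
    the LCS length normalized by the mean label length, plus a small bonus
    when one label is a prefix or postfix of the other.
    """
    a, b = a.lower(), b.lower()
    if a == b or frozenset({a, b}) in SYNONYMS:
        return 1.0
    if not a or not b:
        return 0.0
    score = 2 * lcs_length(a, b) / (len(a) + len(b))
    if a.startswith(b) or b.startswith(a) or a.endswith(b) or b.endswith(a):
        score += affix_bonus
    return min(score, 1.0)

if __name__ == "__main__":
    print(label_similarity("Volume", "sound"))
    print(label_similarity("vol", "volume"))
    print(label_similarity("vol", "power"))
```

In this sketch an abbreviated label such as `vol` scores higher against `volume` than against an unrelated label, and synonym pairs match even when they share few characters.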

Search Result 216, Processing Time 0.027 seconds
