• Title/Summary/Keyword: dictionary construction

A Web-Based Multimedia Dictionary System Supporting Media Synchronization (미디어 동기화를 지원하는 웹기반 멀티미디어 전자사전 시스템)

  • Choi, Yong-Jun;Hwang, Do-Sam
    • Journal of Korea Multimedia Society, v.7 no.8, pp.1145-1161, 2004
  • The purpose of this research is to establish a method for constructing a multimedia electronic dictionary system by integrating the media data available from linguistic resources on the Internet. With this method, existing text-oriented electronic dictionary systems can be developed into multimedia lexical systems more efficiently and effectively. The proposed method integrates the media data of linguistic resources on the Internet in a web browser: the browser carries out all the work related to integrating media data, so no dedicated server system is needed. The resulting system integrates text, image, and voice sources, and can also produce moving pictures. Each medium is associated with the meaning of its data, so that data integration and movement can be specified through these associations. SMIL documents are generated by analyzing the meaning of each data unit and are executed in a web browser. The system also saves storage space by sharing media data distributed across the Internet, and makes the data easier to update.

A Spelling Error Correction Model in Korean Using a Correction Dictionary and a Newspaper Corpus (교정사전과 신문기사 말뭉치를 이용한 한국어 철자 오류 교정 모델)

  • Lee, Se-Hee;Kim, Hark-Soo
    • The KIPS Transactions: Part B, v.16B no.5, pp.427-434, 2009
  • With the rapid evolution of the Internet and mobile environments, text containing spelling errors such as newly coined words and abbreviations is widely used. These spelling errors make it difficult to develop NLP (natural language processing) applications because they decrease the readability of texts. To resolve this problem, we propose a spelling error correction model that uses a spelling error correction dictionary and a newspaper corpus. One advantage of the proposed model is that the cost of data construction is low, because it uses an easily obtained newspaper corpus as its training corpus. Another advantage is that it needs no additional external modules such as a morphological analyzer or a word-spacing error correction system, because it uses a simple string matching method based on a correction dictionary. In experiments with a newspaper corpus and a short message corpus collected from real mobile phones, the proposed model showed good performance across the evaluation measures (a mis-correction rate of 7.3%, an F1-measure of 97.3%, and a false positive rate of 1.1%).
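
The dictionary-based string matching step mentioned in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `CORRECTIONS` entries, the `correct` function, and the whitespace tokenization are hypothetical, and the sketch leaves out the newspaper-corpus component of the paper's model entirely.

```python
# Hypothetical correction dictionary: nonstandard form -> standard form.
# The entries are invented for illustration and are not taken from the paper.
CORRECTIONS = {
    "u": "you",
    "gr8": "great",
    "thx": "thanks",
    "2moro": "tomorrow",
}


def correct(text, corrections):
    """Replace every whitespace-delimited token found in the dictionary.

    Tokens that are not dictionary keys are passed through unchanged.
    Whitespace tokenization is a simplification for English examples.
    """
    return " ".join(corrections.get(token, token) for token in text.split())


print(correct("thx c u 2moro", CORRECTIONS))  # prints "thanks c you tomorrow"
```

A plain lookup like this needs no morphological analyzer, which matches the advantage the abstract claims, but it cannot tell when a dictionary key is actually a correct word in context.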

Study on the social issue sentiment classification using text mining (텍스트마이닝을 이용한 사회 이슈 찬반 분류에 관한 연구)

  • Kang, Sun-A;Kim, Yoo Sin;Choi, Sang Hyun
    • Journal of the Korean Data and Information Science Society, v.26 no.5, pp.1167-1173, 2015
  • The development of information and communication technologies such as SNS, blogs, and bulletin boards has provided a variety of places where people can express their thoughts and comments, which has allowed Big Data to grow; many people now share their opinions on social issues on SNS such as Twitter. In this study, we pre-build a sentiment dictionary for social issues and conduct sentiment analysis with the structured dictionary in order to gather opinions on social issues posted on Twitter. The data used are tweets containing "bikini" or "nakkomsu". The analysis yields a precision of 61% and an F1-score of 74%. This study is expected to suggest a standard for dictionary construction that allows positive and negative opinions on specific social issues to be classified.
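
As a rough illustration of dictionary-based pro/con classification and of the precision and F1 measures reported in this abstract, consider the sketch below. The word lists, the `classify` rule, and the label names are hypothetical; the abstract does not describe how the study's dictionary is structured or scored.

```python
# Hypothetical sentiment dictionary; the study's actual entries are not given.
POSITIVE = {"support", "agree", "good"}
NEGATIVE = {"oppose", "disagree", "bad"}


def classify(tweet):
    """Label a tweet by counting positive and negative dictionary hits."""
    tokens = tweet.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "pro"
    if score < 0:
        return "con"
    return "neutral"


def precision_recall_f1(gold, predicted, label="pro"):
    """Compute precision, recall, and F1 for one label."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, `classify("I agree and support this")` returns `"pro"` under these assumed word lists.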

Construction and application of Korean Semantic-Network based on Korean Dictionary (사전을 기반으로 한 한국어 의미망 구축과 활용)

  • 최호섭;옥철영;장문수;장명길
    • Proceedings of the Korean Information Science Society Conference, 2002.04b, pp.448-450, 2002
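  • Knowledge bases such as thesauri, semantic networks, and ontologies serve as important language resources in many fields related to natural language processing. However, because they are built differently for each specific field, such as information retrieval or machine translation, these knowledge bases have not been very effective for practical Korean language processing. This paper points out methodological problems in constructing thesauri and semantic networks for Korean, and explores a construction method and comprehensive applications for the semantic network needed in corpus-centered text processing. Various dictionaries were used as the knowledge underlying the semantic network. To evaluate the applicability of the network under construction, we examine how it is used in ETRI's semantics-based information retrieval and in word sense disambiguation (WSD), one of the major problems in language processing. By building a semantic network as one approach to handling language resources, we aim to provide a basic yet important lexical database for effective language processing and, at the same time, to suggest a direction for constructing language resources.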

Construction of Vietnamese SentiWordNet by using Vietnamese Dictionary (베트남어 사전을 사용한 베트남어 SentiWordNet 구축)

  • Vu, Xuan-Son;Park, Seong-Bae
    • Proceedings of the Korea Information Processing Society Conference, 2014.04a, pp.745-748, 2014
  • SentiWordNet is an important lexical resource supporting sentiment analysis in opinion mining applications. In this paper, we propose a novel approach to constructing a Vietnamese SentiWordNet (VSWN). SentiWordNet is typically generated from WordNet, with each synset assigned numerical scores that indicate its opinion polarities. Many previous studies obtained these scores by applying a machine learning method to WordNet. However, no Vietnamese WordNet was available at the time of writing. Therefore, we propose a method to construct VSWN from a Vietnamese dictionary rather than from WordNet. We show the effectiveness of the proposed method by automatically generating a VSWN with 39,561 synsets. The method is experimentally tested on 266 synsets with respect to positivity and negativity. It attains results competitive with the English SentiWordNet, with differences of 0.066 and 0.052 for the positivity and negativity sets, respectively.

Cost Effective Image Classification Using Distributions of Multiple Features

  • Sivasankaravel, Vanitha Sivagami
    • KSII Transactions on Internet and Information Systems (TIIS), v.16 no.7, pp.2154-2168, 2022
  • Our work addresses the issues associated with the use of semantic features in the Bag of Words model, which requires construction of a dictionary. Extracting the relevant features and clustering them into a codebook or dictionary is computationally intensive and requires a large storage area. Hence we propose to use a simple distribution of multiple shape-based features, a mixture of gradients, radius, and slope angles, which requires very little computation and storage but can serve as an equivalent image representation. Experiments on the PASCAL VOC 2007 dataset show accuracy close to that of the Bag of Words model using a Self Organizing Map for clustering, together with a very significant computational gain.

Construction of Local Data Dictionary in the Field of Nuclear Medicine

  • Hwang, Kyung-Hoon;Lee, Haejun
    • Proceedings of the Korea Information Processing Society Conference, 2010.11a, pp.465-465, 2010
  • A controlled medical vocabulary is a vital component of medical information management because it enables computers to use information meaningfully and different institutions to share medical data. There are currently many standard medical vocabularies (SNOMED-CT, ICD-10, UMLS, GALEN, MED, etc.), but none is universally accepted as an optimal controlled medical vocabulary for medical information systems. Moreover, it is difficult to establish a well-designed local data dictionary consisting of controlled medical vocabularies for an individual hospital information system (HIS). One major reason is that local terminology with poor content has been used in hospitals. Thus, as a trial, a local controlled vocabulary referencing system is being constructed in a limited medical field, nuclear medicine. We selected practical nuclear medicine terms from interpretation reports and electronic medical records, removed ambiguity and redundancy, and mapped the selected terms to standard medical vocabularies. Relationships and a hierarchical structure between terms are being built with reference to standard medical vocabularies. Further studies may be warranted.

Analyzing the Effect of Characteristics of Dictionary on the Accuracy of Document Classifiers (용어 사전의 특성이 문서 분류 정확도에 미치는 영향 연구)

  • Jung, Haegang;Kim, Namgyu
    • Management & Information Systems Review, v.37 no.4, pp.41-62, 2018
  • As the volume of unstructured data increases through various social media, Internet news articles, and blogs, the importance of text analysis and the number of related studies are increasing. Since text analysis is mostly performed on a specific domain or topic, the importance of constructing and applying a domain-specific dictionary has increased. The quality of a dictionary has a direct impact on the results of unstructured data analysis, and it matters all the more because the dictionary presents a perspective of the analysis. In the literature, most studies on text analysis have emphasized the importance of dictionaries for obtaining clean, high-quality results. Unfortunately, the effects of dictionaries have not been rigorously verified, even though the dictionary is known as the most essential factor in text analysis. In this paper, we generate three dictionaries in different ways from 39,800 news articles, and analyze and verify the effect of each dictionary on the accuracy of document classification by defining the concept of Intrinsic Rate. The three methods are: 1) a batch construction method, which builds a dictionary based on the frequency of terms across all documents; 2) a method that extracts terms by category and integrates them; 3) a method that extracts features for each category and integrates them. We compared the accuracy of three artificial neural network-based document classifiers to evaluate the quality of the dictionaries. In the experiment, accuracy tended to increase when the Intrinsic Rate was high, and we found that the accuracy of document classification can potentially be improved by increasing the Intrinsic Rate of the dictionary.
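
The first two construction methods listed in this abstract, batch frequency-based construction and per-category extraction followed by integration, might look roughly like the sketch below. The function names, the top-k cutoff, and the whitespace tokenization are assumptions. The abstract does not define how Intrinsic Rate is computed, so it is not modeled here.

```python
from collections import Counter


def batch_dictionary(docs, size):
    """Method 1 (sketch): keep the `size` most frequent terms over all documents."""
    counts = Counter(token for doc in docs for token in doc.split())
    return {term for term, _ in counts.most_common(size)}


def per_category_dictionary(docs_by_category, size_per_category):
    """Method 2 (sketch): extract top terms per category, then merge them."""
    merged = set()
    for docs in docs_by_category.values():
        merged |= batch_dictionary(docs, size_per_category)
    return merged


# Toy corpus, invented for illustration: one large category and one small one.
corpus = {
    "sports": ["goal goal goal goal match match match", "team"],
    "economy": ["market market rate"],
}
all_docs = [doc for docs in corpus.values() for doc in docs]

print(sorted(batch_dictionary(all_docs, 2)))        # prints ['goal', 'match']
print(sorted(per_category_dictionary(corpus, 1)))   # prints ['goal', 'market']
```

In this toy corpus the batch dictionary is dominated by the larger category, while the per-category dictionary keeps a term from each category. That is one plausible reason the choice of construction method could affect classifier accuracy.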

Application of Data Dictionary to BIM for Small and Medium Project (중소규모 사업용 BIM을 위한 데이터 사전의 활용)

  • Lee, Hwan Woo;Lee, Kyung Sub;Kim, Kwang Yang
    • Journal of the Computational Structural Engineering Institute of Korea, v.26 no.6, pp.431-438, 2013
  • Construction information must be systematized over the whole life cycle of facilities to improve the productivity of the construction industry. BIM (Building Information Modeling), a technology for managing information based on a 3D information model, has been actively suggested as one alternative. However, BIM is currently concentrated on large projects, while BIM-based small and medium projects receive little attention. In small and medium projects, information loss is more serious than in large projects, yet it is hard to introduce BIM to small and medium companies because they lack investment resources. This study was performed to set up a BIM-based information management system that reflects the characteristics of small and medium projects without excessive investment. The study defines pseudo BIM as BIM for small and medium projects and proposes its concept. The PLIB of ISO and the construction information classification system of MOLIT in Korea are used to construct a data dictionary for pseudo BIM. A pilot test is performed to verify the effectiveness of pseudo BIM.

Advanced CBS (Cost Breakdown Structure) Code Search Technology Applying NLP (Natural Language Processing) of Artificial Intelligence (인공지능 자연어 처리 기법을 이용한 개선된 내역코드 탐색방법)

  • Kim, HanDo;Nam, JeongYong
    • KSCE Journal of Civil and Environmental Engineering Research, v.44 no.5, pp.719-731, 2024
  • For efficient construction management, linking BIM with schedule and cost is essential, but the application of 5D BIM is limited by the difficulty of breaking down thousands of WBS and CBS items. To solve this problem, a standardized WBS-CBS set is configured in advance. When a new construction project arises, each CBS item in the BOQ is automatically linked to a WBS by finding the most similar text among the standard CBS items (Public Procurement Service standard construction codes) in the already linked set. Artificial intelligence natural language processing techniques were used to compare the text similarity of CBS items more efficiently. First, we created a civil term dictionary (CTD) that organizes the words used in civil projects and assigns them numerical values, tokenized the text of all CBS items into words defined in the dictionary, converted them into TF-IDF vectors, and compared them by cosine similarity. Additionally, the search success rate increased to nearly 70% by considering the hierarchical structure of the CBS and changing keywords. The threshold value for judging similarity was 0.62 (1: perfect match, 0: no match).
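
The pipeline described in this abstract (tokenize against a term dictionary, build TF-IDF vectors, compare by cosine similarity against a threshold) can be sketched as follows. Only the 0.62 threshold comes from the abstract. The vocabulary, the sample item texts, the smoothed IDF formula, and all function names are assumptions, and the CBS hierarchy handling is not modeled.

```python
import math
from collections import Counter


def tokenize(text, vocabulary):
    """Keep only the words that appear in the term dictionary."""
    return [t for t in text.lower().split() if t in vocabulary]


def tfidf_vectors(token_lists):
    """Build sparse TF-IDF vectors (dicts) using a smoothed IDF (an assumption)."""
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}
    return [{t: c * idf[t] for t, c in Counter(tokens).items()}
            for tokens in token_lists]


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(query, standard_items, vocabulary, threshold=0.62):
    """Return (item, score) for the most similar standard item, or (None, score)
    when the best score falls below the threshold. Assumes a non-empty list."""
    token_lists = [tokenize(s, vocabulary) for s in standard_items]
    token_lists.append(tokenize(query, vocabulary))
    vectors = tfidf_vectors(token_lists)
    scores = [cosine(vectors[-1], v) for v in vectors[:-1]]
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] >= threshold:
        return standard_items[best], scores[best]
    return None, scores[best]


# Hypothetical term dictionary and standard items, invented for illustration.
VOCABULARY = {"concrete", "pouring", "rebar", "placement", "excavation", "soil"}
STANDARD = ["concrete pouring", "rebar placement", "soil excavation"]
```

With these assumed inputs, `best_match("pouring concrete slab", STANDARD, VOCABULARY)` selects `"concrete pouring"`, because "slab" is outside the dictionary and the remaining tokens coincide, while a query with no dictionary words returns `None`.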