• Title/Summary/Keyword: data dictionary


A Mobile Dictionary based on a Prefetching Method (선인출 기반의 모바일 사전)

  • Hong, Soon-Jung;Moon, Yang-Sae;Kim, Hea-Suk;Kim, Jin-Ho;Chung, Young-Jun
    • Journal of KIISE:Software and Applications / v.35 no.3 / pp.197-206 / 2008
  • In the mobile Internet environment, frequent communication between a mobile device and a content server is required for searching or downloading learning materials. In this paper, we propose an efficient prefetching technique that reduces network cost and improves communication efficiency in a mobile dictionary. Our prefetching-based approach is as follows. First, we propose an overall framework for the prefetching-based mobile dictionary. Second, we present a systematic way of determining the amount of data to prefetch under both packet-based and flat-rate billing. Third, focusing on an English-Korean mobile dictionary for middle and high school students, we propose an intuitive method of determining which words to prefetch in advance. Fourth, based on these methods, we propose an efficient prefetching algorithm. Fifth, through experiments, we show the superiority of our prefetching-based method. The major contributions are as follows. First, to the best of our knowledge, this is the first attempt to exploit prefetching techniques in mobile applications. Second, we propose a systematic way of applying prefetching techniques to a mobile dictionary. Third, using prefetching techniques, we improve the overall performance of a network-based mobile dictionary. Experimental results show that, compared with the traditional on-demand approach, our prefetching-based approach improves average performance by 9.8% to 33.2%. These results indicate that our framework can be widely used not only in mobile dictionaries but also in other mobile Internet applications that can benefit from prefetching.
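
As an illustration of the idea, below is a minimal Python sketch of prefetch-on-miss for a dictionary client; it is a sketch under stated assumptions, not the paper's algorithm. The fetch_range server call, the toy word list, and the alphabetical-adjacency heuristic are assumptions introduced here, and prefetch_count stands in for the amount the paper derives per billing model.

```python
# Minimal sketch of a prefetching mobile-dictionary client (illustrative,
# not the paper's algorithm). fetch_range() stands in for a hypothetical
# server API that returns entries in alphabetical order.

class PrefetchingDictionary:
    def __init__(self, fetch_range, prefetch_count):
        self.fetch_range = fetch_range        # (word, n) -> list of (word, definition)
        self.prefetch_count = prefetch_count  # tuned per billing model in the paper
        self.cache = {}

    def lookup(self, word):
        if word in self.cache:                # cache hit: no network round trip
            return self.cache[word]
        # Cache miss: fetch the requested entry plus the next k alphabetically
        # adjacent entries in one request, assuming a learner is likely to
        # look up nearby words soon.
        for w, meaning in self.fetch_range(word, self.prefetch_count + 1):
            self.cache[w] = meaning
        return self.cache[word]

# Toy in-memory "server" for demonstration.
WORDS = {"apple": "사과", "apply": "적용하다", "appoint": "임명하다", "banana": "바나나"}
SORTED = sorted(WORDS)

def fetch_range(word, n):
    idx = SORTED.index(word)
    return [(w, WORDS[w]) for w in SORTED[idx:idx + n]]

d = PrefetchingDictionary(fetch_range, prefetch_count=2)
print(d.lookup("apple"))  # one network fetch caches "apple", "apply", "appoint"
print(d.lookup("apply"))  # served from cache
```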

A Study on Unstructured text data Post-processing Methodology using Stopword Thesaurus (불용어 시소러스를 이용한 비정형 텍스트 데이터 후처리 방법론에 관한 연구)

  • Won-Jo Lee
    • The Journal of the Convergence on Culture Technology / v.9 no.6 / pp.935-940 / 2023
  • Most text data collected through web scraping for artificial intelligence and big data analysis is large and unstructured, so a refinement process is required before analysis. The data becomes structured, analyzable data through a heuristic pre-processing step and a machine-driven post-processing step. In this study, the post-processing step uses a Korean dictionary and a stopword dictionary to extract the vocabulary for the frequency analysis underlying word cloud analysis. To efficiently remove stopwords that the existing "stopword dictionary" method fails to remove, we propose a methodology that applies a "user-defined stopword thesaurus." Through a case analysis using R's word cloud technique, we compare the proposed refinement method against the existing one, examine its pros and cons, and demonstrate the effectiveness of applying the proposed methodology in practice.
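
A minimal Python sketch of the post-processing idea follows (the paper's case study uses R's word cloud; the token list, stopword dictionary, and thesaurus entries here are illustrative). The user-defined thesaurus maps variant forms to a canonical stopword, so variants missed by the flat stopword dictionary are still removed before frequency counting.

```python
# Minimal sketch of stopword removal with a user-defined stopword thesaurus
# (an illustration of the idea, not the paper's R implementation).
from collections import Counter

stopwords = {"the", "and", "of"}        # base stopword dictionary
stopword_thesaurus = {                  # user-defined variants -> canonical stopword
    "&": "and",
    "n'": "and",
}

def frequencies_for_wordcloud(tokens):
    counts = Counter()
    for tok in tokens:
        canonical = stopword_thesaurus.get(tok, tok)   # normalize via the thesaurus
        if canonical.lower() in stopwords:             # drop stopwords, including variants
            continue
        counts[tok.lower()] += 1
    return counts

print(frequencies_for_wordcloud(["The", "data", "&", "dictionary", "of", "data"]))
# Counter({'data': 2, 'dictionary': 1})
```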

A Computerization Plan for Division Operations MIS (사단업무 MIS를 위한 전산화 방안)

  • Bae Tae-Cheol
    • Journal of the Military Operations Research Society of Korea / v.10 no.2 / pp.15-32 / 1984
  • This thesis investigates the general characteristics of MIS and provides a concrete data base for computerizing division operations, based on the analysis and design of personnel business. For the system design, data flow diagrams (DFD), the data dictionary (DD), and the system dictionary (SD) were analyzed; for the analysis of the current status of division business, the required information, the relationships among business functions, and the flow of operations were examined. On this basis, computerization alternatives corresponding to the chosen information system were decided, and the priority of the operations to be developed was determined. Division operations have received little study from this point of view so far, so this paper should be very helpful for the computerization of division MIS.
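
For context, a data dictionary (DD) entry records the definition and usage of each data element appearing in a DFD. The sketch below shows typical fields; the schema and the sample values are illustrative, not the thesis's actual design.

```python
# Illustrative sketch of a data-dictionary (DD) entry as used alongside a
# data flow diagram (DFD); fields and values are typical, not the thesis's schema.
from dataclasses import dataclass, field

@dataclass
class DataDictionaryEntry:
    name: str                   # data element name as it appears in the DFD
    definition: str             # meaning of the element
    data_type: str              # e.g. "CHAR(10)", "DATE"
    source: str                 # process or external entity that produces it
    used_by: list = field(default_factory=list)   # processes that consume it

personnel_record = DataDictionaryEntry(
    name="SERVICE_NUMBER",
    definition="Unique identifier of a soldier in division personnel records",
    data_type="CHAR(10)",
    source="Personnel Office",
    used_by=["Assignment Processing", "Promotion Review"],
)
```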


Project Management Methodology using Managing Data Dictionary (데이터 사전 관리를 통한 프로젝트 관리 기법)

  • Lee, Byoung-Yup;Park, Yong-Hoon;Yoo, Jae-Soo
    • The Journal of the Korea Contents Association / v.9 no.3 / pp.72-80 / 2009
  • With the development of IT technologies, the IT environment is bringing great change to every part of life and is rapidly displacing the business and performance-management systems of industry. Software development using a project management tool is therefore all the more important for constructing consistent and reliable systems. In this paper, we propose the design and implementation of a project management tool that supports the data standardization of a project.
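
A minimal sketch of what data standardization through a managed data dictionary can look like; the dictionary contents, alias table, and check below are assumptions for illustration, not the tool the paper implements.

```python
# Illustrative sketch: project column names are validated against a managed
# data dictionary so every team uses the same standard term for a concept.

standard_terms = {                     # managed dictionary: concept -> standard term
    "customer identifier": "CUST_ID",
    "order date": "ORD_DT",
}
aliases = {"CUSTOMER_NO": "CUST_ID", "ORDER_DATE": "ORD_DT"}   # known non-standard names

def check_columns(columns):
    """Report columns that deviate from the standard dictionary, with suggestions."""
    violations = []
    for col in columns:
        if col not in standard_terms.values():
            violations.append((col, aliases.get(col)))
    return violations

print(check_columns(["CUST_ID", "ORDER_DATE"]))
# [('ORDER_DATE', 'ORD_DT')]
```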

Cloud Storage Security Deduplication Scheme Based on Dynamic Bloom Filter

  • Yan, Xi-ai;Shi, Wei-qi;Tian, Hua
    • Journal of Information Processing Systems / v.15 no.6 / pp.1265-1276 / 2019
  • Data deduplication is a common method to improve cloud storage efficiency and save network communication bandwidth, but it also brings a series of problems such as privacy disclosure and dictionary attacks. This paper proposes a secure deduplication scheme for cloud storage based on the Bloom filter and dynamically extends the standard Bloom filter. A public dynamic Bloom filter array (PDBFA) is constructed, which improves the efficiency of ownership proof, enables fast detection of duplicate data blocks, and reduces the false positive rate of the system. In addition, during file encryption and upload, the convergent key is encrypted twice, which can effectively prevent brute-force dictionary attacks. The experimental results show that the PDBFA scheme has low computational overhead and a low false positive rate.
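
For illustration, here is a minimal Python sketch of Bloom-filter-based duplicate detection over data blocks. It uses a standard, fixed-size Bloom filter; the paper's PDBFA extends this structure dynamically, and the double encryption of the convergent key is omitted.

```python
# Minimal sketch of duplicate-block detection with a standard Bloom filter
# (not the paper's dynamically extended PDBFA).
import hashlib

class BloomFilter:
    def __init__(self, m_bits=1 << 20, k_hashes=4):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray(m_bits // 8)

    def _positions(self, data: bytes):
        # Derive k independent bit positions by salting SHA-256 with the index.
        for i in range(self.k):
            h = hashlib.sha256(i.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, data: bytes):
        for p in self._positions(data):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, data: bytes) -> bool:
        # False means "definitely new"; True means "possibly a duplicate"
        # (false positives occur at a rate set by m and k).
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(data))

bf = BloomFilter()
bf.add(b"block-1")
print(bf.might_contain(b"block-1"), bf.might_contain(b"block-2"))  # True False (w.h.p.)
```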

Construction of Local Data Dictionary in the Field of Nuclear Medicine

  • Hwang, Kyung-Hoon;Lee, Haejun
    • Proceedings of the Korea Information Processing Society Conference / 2010.11a / pp.465-465 / 2010
  • A controlled medical vocabulary is a vital component of medical information management because it enables computers to use information meaningfully and different institutions to share medical data. There are currently many standard medical vocabularies, such as SNOMED-CT, ICD-10, UMLS, GALEN, and MED, but none is universally accepted as an optimal controlled vocabulary for medical information systems. Moreover, it is difficult to establish a well-designed local data dictionary of controlled medical vocabularies for an individual hospital information system (HIS), one major reason being that local terminology with poor content has been used in hospitals. Thus, as a trial, a local controlled-vocabulary referencing system is being constructed in a limited medical field: nuclear medicine. We selected practical nuclear medicine terms from interpretation reports and electronic medical records, removed ambiguity and redundancy, and mapped the selected terms to standard medical vocabularies. Relationships and a hierarchy between terms are being built with reference to the standard vocabularies. Further studies may be warranted.
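
A minimal sketch of the term-mapping step; the local terms and concept codes below are invented for illustration, whereas a real mapping would target codes in SNOMED-CT, UMLS, or a similar standard vocabulary.

```python
# Illustrative mapping of local nuclear-medicine terms to standard concepts
# (codes and names are hypothetical placeholders, not real SNOMED-CT codes).

term_map = {
    "bone scan":            ("NM-0001", "Whole-body bone scintigraphy"),
    "whole body bone scan": ("NM-0001", "Whole-body bone scintigraphy"),  # redundant variant
    "myocardial spect":     ("NM-0002", "Myocardial perfusion SPECT"),
}

def map_local_term(term):
    """Resolve a local report term to its standard concept, ignoring case."""
    return term_map.get(term.strip().lower())

print(map_local_term("Bone Scan"))  # ('NM-0001', 'Whole-body bone scintigraphy')
```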

A VLSI Design and Implementation of a Single-Chip Encoder/Decoder with Dictionary Search Processor(DISP) using LZSS Algorithm and Entropy Coding (LZSS 알고리즘과 엔트로피 부호를 이용한 사전탐색처리장치를 갖는 부호기/복호기 단일-칩의 VLSI 설계 및 구현)

  • Kim, Jong-Seop;Jo, Sang-Bok
    • Journal of the Institute of Electronics Engineers of Korea SD / v.38 no.2 / pp.103-113 / 2001
  • This paper describes the design and implementation of a single-chip encoder/decoder using the LZSS algorithm and entropy coding in 0.6 μm CMOS technology. The dictionary storage for the dictionary search processor (DISP) uses a 2K × 8-bit on-chip memory at a 50 MHz clock speed. The chip compresses byte-oriented input data at a rate of one byte per clock cycle, except that one out of every 33 cycles is used to update the string window of the dictionary. As a result, by applying entropy coding to the LZSS codeword output, the average compression ratio is 46%, an improvement of 7% over LZSS alone.
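
For readers unfamiliar with the algorithm, below is a minimal pure-Python sketch of LZSS encoding. The paper's design is a hardware encoder with a 2K-byte on-chip dictionary and entropy coding of the output tokens; both are omitted here, and the window and match parameters are illustrative.

```python
# Minimal sketch of LZSS encoding: emit (offset, length) references into a
# sliding-window dictionary when a long enough match is found, else literals.

def lzss_encode(data: bytes, window=2048, min_match=3, max_match=18):
    out, i = [], 0
    while i < len(data):
        best_len, best_off = 0, 0
        start = max(0, i - window)
        # Search the sliding-window "dictionary" for the longest match.
        for j in range(start, i):
            length = 0
            while (length < max_match and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_match:
            out.append(("match", best_off, best_len))   # dictionary reference
            i += best_len
        else:
            out.append(("literal", data[i]))            # raw byte
            i += 1
    return out

print(lzss_encode(b"abcabcabcx"))
# [('literal', 97), ('literal', 98), ('literal', 99), ('match', 3, 6), ('literal', 120)]
```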


Extracting Korean-English Parallel Sentences from Wikipedia (위키피디아로부터 한국어-영어 병렬 문장 추출)

  • Kim, Sung-Hyun;Yang, Seon;Ko, Youngjoong
    • Journal of KIISE:Software and Applications / v.41 no.8 / pp.580-585 / 2014
  • This paper conducts a variety of experiments on extracting Korean-English parallel sentences from Wikipedia data, drawing on methods previously proposed for other languages. We use two approaches. The first uses translation probabilities extracted from existing resources such as the Sejong parallel corpus; the second uses dictionaries, including a Wiki dictionary built from Wikipedia titles together with MRDs (machine-readable dictionaries). Experimental results show that the system using Wikipedia data significantly improves on one using only the existing resources, achieving an F1-score of 57.6%. We additionally conduct experiments using a topic model; although its performance is relatively lower, an F1-score of 51.6%, it appears worthy of further study.
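
A minimal sketch of the dictionary-based scoring idea (the bilingual entries and the acceptance threshold are illustrative; the paper additionally uses translation probabilities learned from the Sejong parallel corpus): a candidate pair is scored by the fraction of Korean words whose dictionary translation appears in the English sentence.

```python
# Illustrative dictionary-overlap score for a Korean-English sentence pair
# (toy bilingual dictionary; real entries would come from the Wiki dictionary
# and MRDs described in the paper).

bilingual_dict = {"사전": {"dictionary"}, "데이터": {"data"}, "검색": {"search", "retrieval"}}

def pair_score(ko_tokens, en_tokens):
    en = {t.lower() for t in en_tokens}
    translatable = [t for t in ko_tokens if t in bilingual_dict]
    if not translatable:
        return 0.0
    matched = sum(1 for t in translatable if bilingual_dict[t] & en)
    return matched / len(translatable)

score = pair_score(["데이터", "사전", "검색"], ["data", "dictionary", "lookup"])
print(score)  # ~0.67: accept the pair if the score exceeds a tuned threshold
```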

Korean Word Sense Disambiguation using Dictionary and Corpus (사전과 말뭉치를 이용한 한국어 단어 중의성 해소)

  • Jeong, Hanjo;Park, Byeonghwa
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.1-13 / 2015
  • As opinion mining in big data applications has been highlighted, much research on unstructured data has been carried out. Social media on the Internet generate unstructured or semi-structured data every second, mostly in the natural languages we use in daily life. Many words in human languages have multiple meanings or senses, which makes it very difficult for computers to extract useful information from these datasets. Traditional web search engines are usually based on keyword search, which can return incorrect results that are far from users' intentions. Although much progress has been made over the years in enhancing search engines, there is still much to improve. Word sense disambiguation can play a very important role in natural language processing and is considered one of the most difficult problems in this area. Major approaches to word sense disambiguation can be classified as knowledge-based, supervised corpus-based, and unsupervised corpus-based.

    This paper presents a method that automatically generates a corpus for word sense disambiguation by taking advantage of the examples in existing dictionaries, avoiding expensive sense-tagging processes. It evaluates the method with the Naïve Bayes model, a supervised learning algorithm, using the Korean standard unabridged dictionary and the Sejong Corpus. The Korean standard unabridged dictionary has approximately 57,000 sentences; the Sejong Corpus has about 790,000 sentences tagged with both part-of-speech and senses. For the experiment, the dictionary and the corpus were tested both combined and separately, using cross validation. Only nouns, the target subjects in word sense disambiguation, were selected: 93,522 word senses among 265,655 nouns, with 56,914 sentences from related proverbs and examples additionally combined into the corpus. The Sejong Corpus was easily merged with the dictionary because it is tagged with the sense indices defined by the Korean standard unabridged dictionary.

    Sense vectors were formed after the merged corpus was created. The terms used in creating the sense vectors were added to the named-entity dictionary of a Korean morphological analyzer. Using the extended named-entity dictionary, term vectors were extracted from the input sentences. Given an extracted term vector and the sense-vector model built during pre-processing, the sense-tagged terms were determined by vector-space-model-based word sense disambiguation.

    The experiments show better precision and recall with the merged corpus, demonstrating the effectiveness of merging the examples in the Korean standard unabridged dictionary with the Sejong Corpus. This suggests the method can practically enhance the performance of Internet search engines and help capture sentence meaning more accurately in natural language processing tasks such as search, opinion mining, and text mining. The Naïve Bayes classifier used in this study is a supervised learning algorithm based on Bayes' theorem and assumes that all senses are independent. Even though this assumption is not realistic and ignores correlations between attributes, the Naïve Bayes classifier is widely used because of its simplicity, and in practice it is known to be very effective in applications such as text classification and medical diagnosis. However, further research is needed to consider all possible combinations, and/or partial combinations, of the senses in a sentence. The effectiveness of word sense disambiguation may also be improved if rhetorical structures or morphological dependencies between words are analyzed through syntactic analysis.
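
A minimal sketch of the vector-space disambiguation step described above; the sense vectors and context terms are toy values, not drawn from the actual corpora. Each sense's vector is built from dictionary example sentences, and an input sentence is assigned the sense whose vector is most cosine-similar to the sentence's term vector.

```python
# Illustrative vector-space word sense disambiguation for the ambiguous
# Korean noun "배" (ship / pear); counts are toy values.
import math

def cosine(u, v):
    dot = sum(u.get(t, 0) * w for t, w in v.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Sense vectors built from dictionary example sentences (toy counts).
sense_vectors = {
    "ship": {"항구": 3, "바다": 2},   # harbor, sea
    "pear": {"과일": 3, "달다": 1},   # fruit, sweet
}

def disambiguate(context_terms):
    term_vector = {t: 1 for t in context_terms}
    return max(sense_vectors, key=lambda s: cosine(sense_vectors[s], term_vector))

print(disambiguate(["바다", "항구"]))  # 'ship'
```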

Analyzing the Effect of Characteristics of Dictionary on the Accuracy of Document Classifiers (용어 사전의 특성이 문서 분류 정확도에 미치는 영향 연구)

  • Jung, Haegang;Kim, Namgyu
    • Management & Information Systems Review / v.37 no.4 / pp.41-62 / 2018
  • As the volume of unstructured data grows through social media, Internet news articles, and blogs, text analysis and studies of it are becoming increasingly important. Since text analysis is mostly performed on a specific domain or topic, constructing and applying a domain-specific dictionary has likewise grown in importance. The quality of the dictionary has a direct impact on the results of unstructured data analysis, and matters all the more because it embodies the perspective of the analysis. In the literature, most studies on text analysis have emphasized the importance of dictionaries for acquiring clean, high-quality results; unfortunately, however, the effect of dictionaries has not been rigorously verified, even though they are known to be an essential factor in text analysis. In this paper, we generate three dictionaries in different ways from 39,800 news articles and, defining the concept of the Intrinsic Rate, analyze and verify the effect of each dictionary on the accuracy of document classification: 1) a batch construction method that builds a dictionary from term frequencies over the entire document collection; 2) a method that extracts terms by category and integrates them; 3) a method that extracts features for each category and integrates them. We compared the accuracy of three artificial neural network-based document classifiers to evaluate the quality of the dictionaries. The experiments show that accuracy tends to increase when the Intrinsic Rate is high, suggesting that document classification accuracy can be improved by increasing the intrinsic rate of the dictionary.
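
A minimal sketch of the first two dictionary-construction strategies (illustrative; the Intrinsic Rate metric and the neural-network classifiers are not reproduced here):

```python
# Illustrative dictionary construction: batch (whole-collection frequency)
# vs. per-category extraction followed by a union of the extracted terms.
from collections import Counter

def batch_dictionary(docs, size):
    """Method 1: top terms by frequency over the entire collection."""
    counts = Counter(t for doc in docs for t in doc)
    return {t for t, _ in counts.most_common(size)}

def per_category_dictionary(docs_by_category, size_per_cat):
    """Method 2: top terms per category, merged into one dictionary."""
    terms = set()
    for docs in docs_by_category.values():
        counts = Counter(t for doc in docs for t in doc)
        terms |= {t for t, _ in counts.most_common(size_per_cat)}
    return terms
```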