• Title/Summary/Keyword: Data dictionary

Constructing the Dictionary of Flue using unstructured data (비정형 데이터를 활용한 감기 판단 사전 구축)

  • Kim, KangMin;Nam, KiHun
    • Proceedings of the Korea Information Processing Society Conference / 2015.10a / pp.1187-1190 / 2015
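  • Recently, there have been increasing attempts to turn the latent value of unstructured data into useful data. Twitter in particular reflects users' states and events well, so a tweet can be regarded as an event of a single user. This paper focuses on events occurring on Twitter and aims to track the event of catching a cold within Twitter. Tracking requires judging individual tweets; to that end, we propose a method of building a cold-judgment dictionary from keywords, based on statistical dictionary construction, one of the existing sentiment-dictionary approaches.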
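
A keyword-based statistical judgment dictionary of the kind this abstract describes might be sketched as follows. The abstract does not state which statistic the authors use, so the smoothed log-odds score, the function names `build_dictionary` and `judge`, and the whitespace tokenizer are all assumptions made for illustration.

```python
import math
from collections import Counter


def build_dictionary(labeled_tweets, smoothing=1.0):
    """Score each word by smoothed log-odds of appearing in 'cold' tweets.

    labeled_tweets: list of (text, is_cold) pairs. The scoring rule is an
    illustrative assumption, not the statistic used in the paper.
    """
    pos_counts, neg_counts = Counter(), Counter()
    n_pos = n_neg = 0
    for text, is_cold in labeled_tweets:
        words = set(text.lower().split())  # count each word once per tweet
        if is_cold:
            pos_counts.update(words)
            n_pos += 1
        else:
            neg_counts.update(words)
            n_neg += 1

    scores = {}
    for word in set(pos_counts) | set(neg_counts):
        p_pos = (pos_counts[word] + smoothing) / (n_pos + 2 * smoothing)
        p_neg = (neg_counts[word] + smoothing) / (n_neg + 2 * smoothing)
        scores[word] = math.log(p_pos / p_neg)
    return scores


def judge(text, scores):
    """Return True when the summed word scores lean toward 'cold'."""
    total = sum(scores.get(w, 0.0) for w in set(text.lower().split()))
    return total > 0
```

Words that occur equally often in both classes score zero under this rule, so an ambiguous keyword on its own does not decide the judgment.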

Evaluation of different attacks on Knowledge Based Authentication technique

  • Vijeet Meshram
    • International Journal of Computer Science & Network Security / v.23 no.4 / pp.111-115 / 2023
  • Knowledge Based Authentication is the most well-known technique for user authentication in a computer security framework. Most frameworks use a simple PIN (Personal Identification Number) or password as a data authenticator. Because password-based authenticators are typically software based, they are prone to various attacks and weaknesses arising from both humans and software. Some of these attacks are discussed in this paper.

A Study on the Standardized Classification Scheme of the Various Railway Information Systems

  • Choi, Yong-Ho;An, Tae-Ki;Kim, Hyoung-Geun
    • Journal of the Korea Society of Computer and Information / v.23 no.1 / pp.85-90 / 2018
  • Demand for new information services has grown with the recent spread of mobile internet, and the government is promoting private use of public data under its Government 3.0 initiative. In line with this policy, many public sectors provide public data, but the railway sector lags behind other public sectors. Among national railway corporations, urban railways are operated by 14 corporations nationwide, such as Seoul Metro, and high-speed railways are operated by Korea Railroad Corporation and Supreme Railways. Standardizing and integrating data is very difficult because of the competing interests of these corporations. This paper describes a way to standardize and integrate rail passenger information collected through a research project.

Named entity recognition using transfer learning and small human- and meta-pseudo-labeled datasets

  • Kyoungman Bae;Joon-Ho Lim
    • ETRI Journal / v.46 no.1 / pp.59-70 / 2024
  • We introduce a high-performance named entity recognition (NER) model for written and spoken language. To overcome challenges related to labeled data scarcity and domain shifts, we use transfer learning to leverage our previously developed KorBERT as the base model. We also adopt a meta-pseudo-label method using a teacher/student framework with labeled and unlabeled data. Our model presents two modifications. First, the student model is updated with an average loss from both human- and pseudo-labeled data. Second, the influence of noisy pseudo-labeled data is mitigated by considering feedback scores and updating the teacher model only when below a threshold (0.0005). We achieve the target NER performance in the spoken language domain and improve that in the written language domain by proposing a straightforward rollback method that reverts to the best model based on scarce human-labeled data. Further improvement is achieved by adjusting the label vector weights in the named entity dictionary.
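
The two modifications and the rollback step mentioned in this abstract can be sketched in a framework-agnostic way. This is only an illustration under assumptions: the abstract gives the threshold value (0.0005) but not how the feedback score is computed, and the names `student_loss`, `should_update_teacher`, and `RollbackTracker` are hypothetical.

```python
from dataclasses import dataclass, field

FEEDBACK_THRESHOLD = 0.0005  # threshold value quoted in the abstract


def student_loss(human_loss: float, pseudo_loss: float) -> float:
    """Average the losses on human-labeled and pseudo-labeled batches."""
    return (human_loss + pseudo_loss) / 2.0


def should_update_teacher(feedback_score: float,
                          threshold: float = FEEDBACK_THRESHOLD) -> bool:
    """Update the teacher only when the feedback score is below the threshold.

    How the feedback score is derived is not given in the abstract;
    here it is treated as an opaque float (assumption).
    """
    return feedback_score < threshold


@dataclass
class RollbackTracker:
    """Keep the best model state seen so far and revert to it on regressions."""
    best_score: float = float("-inf")
    best_state: dict = field(default_factory=dict)

    def step(self, state: dict, dev_score: float) -> dict:
        # dev_score is assumed to be measured on the scarce human-labeled data.
        if dev_score > self.best_score:
            self.best_score = dev_score
            self.best_state = dict(state)
            return state
        return dict(self.best_state)  # roll back to the best state
```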

CCIC: A Climate Change Information Center on the Internet (인터넷을 이용한 기후변화 정보시스템 개발)

  • 강병도;남인길;백희정
    • Journal of Korea Society of Industrial Information Systems / v.4 no.3 / pp.15-20 / 1999
  • This paper presents a climate change information system that provides data and information about climate change. The system shows observed meteorological data, climate change research institutes, and research programs. Based on analysis of the meteorological data, it also provides users with climate change information using graphic and multimedia data. The terminology retrieval and dictionary facility for climate change can be useful to users who are interested in the topic.

A Basic Study on Improvement and Computerization of Nursing Record (간호기록의 개선과 전산화를 위한 기초연구)

  • 지성애;최경숙;박경숙;정용기
    • Journal of Korean Academy of Nursing / v.29 no.1 / pp.21-33 / 1999
  • This study was designed to develop a basic plan for the computerization of nursing records. The subjects were 7 nursing record forms, 58 charts, 23 nurses, 2 nurse managers, a nurse and computer specialist, 16 master's course students, and 3 professors. Data were collected through questionnaires, observation, and interviews, and were analyzed for problems, plans for improvement, and needs for computerization. Based on these results, a basic plan that integrates the needs for nursing record computerization is recommended. The basic plan is as follows: 1. To illustrate the data flow path of nursing records and a data dictionary that show the nurse's work and recording process. 2. To establish a system that uses multitasking and a graphical user interface. 3. To establish hardware and software that embody integrated management of the computer-based system through structured walkthroughs. 4. To choose an effective database management system and to achieve Log as the record unit.

Dual Dictionary Learning for Cell Segmentation in Bright-field Microscopy Images (명시야 현미경 영상에서의 세포 분할을 위한 이중 사전 학습 기법)

  • Lee, Gyuhyun;Quan, Tran Minh;Jeong, Won-Ki
    • Journal of the Korea Computer Graphics Society / v.22 no.3 / pp.21-29 / 2016
  • Cell segmentation is an important but time-consuming and laborious task in biological image analysis. An automated, robust, and fast method is required to overcome this burden. Meeting these needs is challenging, however, because of varied cell shapes, varying intensity, and incomplete boundaries. Precise cell segmentation enables pathological diagnosis of tissue samples. A vast body of literature exists on cell segmentation in microscopy images [1]. The majority of existing work is based on input images and predefined feature models only - for example, using a deformable model to extract edge boundaries in the image. Only a handful of recent methods employ data-driven approaches, such as supervised learning. In this paper, we propose a novel data-driven cell segmentation algorithm for bright-field microscopy images. The proposed method minimizes an energy function defined by two dictionaries - one for input images and the other for their manual segmentation results - and a common sparse code, and aims to find the pixel-level classification by deploying the learned dictionaries on new images. In contrast to deformable models, it does not require prior knowledge of the objects. We also employ convolutional sparse coding and the Alternating Direction Method of Multipliers (ADMM) for fast dictionary learning and energy minimization. Unlike an existing method [1], our method trains both dictionaries concurrently and is implemented on the GPU for faster performance.
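
The abstract does not give the energy function itself. A generic coupled convolutional sparse coding objective of the kind it describes, with an image dictionary, a segmentation dictionary, and a shared sparse code, could be written as below. The symbols $x$ (input image), $y$ (manual segmentation), $d^x_k$ and $d^y_k$ (the two sets of filters), $z_k$ (shared sparse maps), and the weights $\beta$ and $\lambda$ are assumptions for illustration and may differ from the paper's formulation.

```latex
\min_{\{d^x_k\},\,\{d^y_k\},\,\{z_k\}}
  \frac{1}{2}\Bigl\| x - \sum_{k} d^x_k * z_k \Bigr\|_2^2
  + \frac{\beta}{2}\Bigl\| y - \sum_{k} d^y_k * z_k \Bigr\|_2^2
  + \lambda \sum_{k} \| z_k \|_1
\quad \text{s.t.} \quad \| d^x_k \|_2 \le 1,\ \| d^y_k \|_2 \le 1 \ \ \forall k
```

Under this sketch, a new image would be segmented by solving for $z_k$ using only the first (image) term and the sparsity penalty, then reconstructing $\sum_k d^y_k * z_k$ as the pixel-level label estimate.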

A Study on the Factors Influencing Semantic Relation in Building a Structured Glossary (구조적 학술용어사전 데이터베이스 구축에 있어서 용어의 의미관계 형성에 영향을 미치는 요인에 관한 연구)

  • Kwon, Sun-Young
    • Journal of the Korean Society for Library and Information Science / v.48 no.2 / pp.353-378 / 2014
  • The purpose of this study is to identify the factors that affect the formation of semantic relations among terms, and what those factors affect, in order to build the database schema of a terminology dictionary based on structural definitions. We collected 826,905 keywords from 88,874 social science articles and 985,580 keywords from 125,046 humanities articles published in KCI journals from 2007 to 2011. From the collected data, we analyzed subject complexity, structural holes, term frequency, occurrence patterns, and the relationship between the number of nodes and the number of patterns derived from the semantic relations of linked terms in the established 'STNet' system. The results from the analyzed data and network patterns are summarized as follows. Betweenness centrality, term frequency, and effective size affect the number of semantic relation nodes. Among these factors, betweenness centrality had the strongest effect, followed by effective size, and term frequency had the weakest. Betweenness centrality, term frequency, and effective size also affect the number of semantic relation types; here term frequency has the strongest effect. Therefore, when building a terminology dictionary, betweenness centrality, term frequency, effective size, and subject complexity should be considered in selecting terms. These factors can be expected to improve the quality of the terminology dictionary.

WellnessWordNet: A Word Net for Unconstrained Subjective Well-Being Monitoring Based on Unstructured Data and Contextual Polarity (웰니스워드넷: 비정형데이터와 상황적 긍부정성에 기반하여 주관적 웰빙 상태를 무구속적으로 모니터링하기 위한 워드넷 개발)

  • Song, Yeongeun;Nam, Suhyun;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.1-21 / 2016
  • IT-based subjective well-being (SWB) services, a main part of wellness IT, should measure the SWB state of individuals in an unrestrained, cost-effective manner. The dictionaries for sentiment analysis available in the market may be useful for this purpose, but obtaining proper sentiment values using only words from the sentiment lexicon is impossible; therefore, a new dictionary including wellness vocabulary is needed. The existing sentiment dictionaries link only a single sentiment value to a single sentiment word, although sentiment values may vary depending on personal traits. In this study, we develop an extended version of the SenticNet sentiment dictionary dubbed WellnessWordNet. SenticNet is considered the best and most expressive among the already existing sentiment dictionaries. Using the information provided by SenticNet, we created a database including the wellness states (estimated values) of stress, depression, and anger to develop the WellnessWordNet system. The accuracy of the system was validated through actual tests with live subjects. This study is unique and unprecedented in that i) an extended sentiment dictionary, WellnessWordNet, is developed; ii) values for wellness state language are offered; and iii) different sentiment values, namely contextual polarity, for people of the same gender or age group are suggested.

Symbolizing Numbers to Improve Neural Machine Translation (숫자 기호화를 통한 신경기계번역 성능 향상)

  • Kang, Cheongwoong;Ro, Youngheon;Kim, Jisu;Choi, Heeyoul
    • Journal of Digital Contents Society / v.19 no.6 / pp.1161-1167 / 2018
  • Advances in machine learning have enabled machines to perform delicate tasks that previously only humans could do, and many companies have introduced machine-learning-based translators. Existing translators perform well but have problems translating numbers. They often mistranslate numbers when the input sentence includes a large number. Furthermore, the output sentence structure changes completely even if only one number in the input sentence changes. In this paper, we first optimized a neural machine translation model architecture that uses a bidirectional RNN, LSTM, and the attention mechanism through data cleansing and changing the dictionary size. We then implemented a number-processing algorithm specialized in number translation and applied it to the model to solve the problems above. The paper presents the data cleansing method, an optimal dictionary size, and the number-processing algorithm, as well as experimental results for translation performance based on the BLEU score.
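
One simple way to symbolize numbers before translation is to swap each number for an indexed placeholder and restore it afterwards. The abstract does not describe the authors' number-processing algorithm, so the placeholder format `<NUMi>`, the regular expression, and the function names below are hypothetical.

```python
import re

# Matches integers and numbers with "," or "." separators, e.g. 3, 1,250,000, 3.14
NUM_PATTERN = re.compile(r"\d+(?:[.,]\d+)*")


def symbolize_numbers(sentence):
    """Replace each number with an indexed placeholder token.

    Returns the rewritten sentence and a mapping from placeholder to the
    original number string. The <NUMi> format is an illustrative assumption.
    """
    mapping = {}

    def replace(match):
        token = f"<NUM{len(mapping)}>"
        mapping[token] = match.group(0)
        return token

    return NUM_PATTERN.sub(replace, sentence), mapping


def restore_numbers(sentence, mapping):
    """Put the original numbers back in place of the placeholder tokens."""
    for token, number in mapping.items():
        sentence = sentence.replace(token, number)
    return sentence
```

With this scheme the translation model sees a small, fixed set of number tokens regardless of the actual values, which keeps rare large numbers out of the vocabulary.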