• Title/Summary/Keyword: data dictionary

Symbolizing Numbers to Improve Neural Machine Translation (숫자 기호화를 통한 신경기계번역 성능 향상)

  • Kang, Cheongwoong;Ro, Youngheon;Kim, Jisu;Choi, Heeyoul
    • Journal of Digital Contents Society / v.19 no.6 / pp.1161-1167 / 2018
  • The development of machine learning has enabled machines to perform delicate tasks that only humans could do, and thus many companies have introduced machine learning based translators. Existing translators perform well overall but have problems translating numbers. They often mistranslate numbers when the input sentence includes a large number. Furthermore, the structure of the output sentence changes completely even if only one number in the input sentence changes. In this paper, we first optimized a neural machine translation model architecture that uses a bidirectional RNN, LSTM, and the attention mechanism through data cleansing and changing the dictionary size. Then, we implemented a number-processing algorithm specialized in number translation and applied it to the neural machine translation model to solve the problems above. The paper includes the data cleansing method, an optimal dictionary size, and the number-processing algorithm, as well as experimental results for translation performance based on the BLEU score.
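
The abstract does not describe the number-processing algorithm in detail. As a rough illustration of the general idea of symbolizing numbers, the sketch below replaces each number with an indexed placeholder before translation and restores it afterwards. The placeholder format `<NUMk>`, the regular expression, and the function names are assumptions made for this example, not the authors' method.

```python
import re

# Assumption: numbers are runs of digits with optional "," or "." groups.
NUMBER_PATTERN = re.compile(r"\d+(?:[.,]\d+)*")


def symbolize_numbers(sentence):
    """Replace each number with a hypothetical <NUMk> symbol.

    Returns the symbolized sentence and a symbol -> original number mapping.
    """
    mapping = {}

    def _replace(match):
        symbol = f"<NUM{len(mapping)}>"
        mapping[symbol] = match.group(0)
        return symbol

    return NUMBER_PATTERN.sub(_replace, sentence), mapping


def restore_numbers(translated, mapping):
    """Put the original numbers back into the translated sentence."""
    for symbol, number in mapping.items():
        translated = translated.replace(symbol, number)
    return translated


source = "I paid 1,250,000 won for 3 tickets."
symbolized, mapping = symbolize_numbers(source)
# A translation model would see `symbolized`; here the output is hand-written.
translated = "티켓 <NUM1>장에 <NUM0>원을 냈다."
restored = restore_numbers(translated, mapping)
```

Because the model only ever sees the placeholder symbols, changing a number in the input leaves the symbolized sentence unchanged, which is one plausible way to keep the output structure stable.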

A Study on Flexible Attribute Tree and Partial Result Matrix for Content-based Retrieval and Browsing of Video Data (비디오 데이터의 내용 기반 검색과 브라우징을 위한 유동 속성 트리 및 부분 결과 행렬의 이용 방법 연구)

  • 성인용;이원석
    • Journal of Korea Multimedia Society / v.3 no.1 / pp.1-13 / 2000
  • Various types of information can be mixed in a continuous video stream without any clear boundary, the meaning of a video scene can be interpreted at multiple levels of abstraction, and its description can vary among users. Therefore, for content-based retrieval of video data, it is important that a user can describe a scene flexibly while the descriptions given by different users are maintained consistently. This paper proposes an effective way to represent the different types of video information in conventional database models such as the relational and object-oriented models. Flexibly defined attributes and their values are organized as tree-structured dictionaries, while the description of video data is stored in a fixed database schema. We also introduce several browsing methods to assist a user. The dictionary browser simplifies both the annotation process and the querying process, while the result browser helps a user analyze the results of a query in terms of various combinations of query conditions.

A Study on Preservation Metadata Elements for Research Information (연구정보를 위한 보존 메타데이터 요소 개발에 관한 연구: 경제·인문사회연구회 연구관리시스템을 중심으로)

  • Kim, Pan-Jun
    • Journal of the Korean Society for Information Management / v.27 no.4 / pp.169-191 / 2010
  • This study aimed to develop preservation metadata elements and their applications for research information, which is now considered a valuable digital resource. Specifically, the developed preservation metadata is intended to provide a basis for the research information of the government-funded research institutes in the economic and social science fields, which are major knowledge producers for national policy. To ensure the interoperability of research information across various departments and organizations, this study compared the elements from CERIF (a European standard) with those from the PREMIS Data Dictionary, which is based on the OAIS reference model (ISO 14721). Based on this comparative analysis, the study developed complementary preservation metadata elements that draw on the characteristics of the two standards. Consequently, the study suggests new preservation metadata elements and applications that are compatible with both systems and can be implemented in practice.

A Morpheme Analyzer based on Transformer using Morpheme Tokens and User Dictionary (사용자 사전과 형태소 토큰을 사용한 트랜스포머 기반 형태소 분석기)

  • DongHyun Kim;Do-Guk Kim;ChulHui Kim;MyungSun Shin;Young-Duk Seo
    • Smart Media Journal / v.12 no.9 / pp.19-27 / 2023
  • Since morphemes are the smallest unit of meaning in Korean, an accurate morpheme analyzer is necessary to improve the performance of Korean language models. However, most existing analyzers produce morpheme analysis results by learning from word-unit tokens as input. Because Korean words consist of postpositions and affixes attached to a root, the meaning of words with the same root tends to change depending on the postpositions or affixes. Therefore, learning morphemes from word-unit tokens can lead to misclassification of postpositions or affixes. In this paper, we use morpheme-level tokens to capture the inherent meaning of Korean sentences and propose a morpheme analyzer based on a sequence generation method using Transformer. In addition, a user dictionary is constructed from corpus data to solve the out-of-vocabulary problem. In the experiment, the morphemes and morpheme tags output by each morpheme analyzer were compared with the correct answer data, and the results showed that the morpheme analyzer presented in this paper performed better than the existing morpheme analyzers.

Experimental Estimation of Data Flow Diagram for Man/Month Prediction Model Derivation (공수 예측 모델 요도를 위한 자료 흐름도의 실험적 평가)

  • Kim, Myeong-Ok;Baek, Cheong-Ho;Yang, Hae-Sul
    • The Transactions of the Korea Information Processing Society / v.2 no.1 / pp.34-44 / 1995
  • One of the most important problems faced by software developers and users is predicting the size of a programming system and its development effort. This article defines the characteristics of a structured specification, which consists of a Data Flow Diagram (DFD), a Data Dictionary, and a Mini Specification, and applies the quantitative estimation factors of the structured specification to program code metrics. Moreover, we conducted a quantitative estimation experiment on the DFD, which is a component of the structured specification. As a result, we propose a man/month prediction model for the downstream phases based on the products of the upstream analysis phase.

Design and Implementation of Recommendation Sites Based on Web Data using Morphological Analysis (형태소 분석을 활용한 웹 데이터 기반의 여행지 추천 사이트의 설계 및 구현)

  • Yoon, Kyung Seob;Lim, Dong Wook
    • Proceedings of the Korean Society of Computer Information Conference / 2018.07a / pp.311-314 / 2018
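  • As interest in travel grows every year, demand from users looking for information about travel destinations has increased. Existing travel information sites recommend destinations based on the number of likes from site members, so when a site has few users it is impossible to confirm whether a destination is actually popular, which lowers the reliability of the recommendations. The system proposed in this paper collects travel-related data scattered across the web, analyzes how often each destination is actually mentioned on websites, and recommends destinations by mention count. It aims to support more reliable destination recommendations that do not depend on the number of site users.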
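
The mention-count ranking described above can be sketched as follows. This is a minimal illustration, not the system from the paper: it counts plain substring occurrences instead of using morphological analysis, and the function name, sample documents, and destination list are all hypothetical.

```python
from collections import Counter


def rank_destinations(documents, destinations, top_n=3):
    """Rank destinations by how often they are mentioned in the documents.

    Simplification: plain substring counting stands in for the
    morphological analysis used in the paper.
    """
    counts = Counter()
    for text in documents:
        for name in destinations:
            counts[name] += text.count(name)
    # Keep only destinations that were mentioned at least once.
    return [name for name, count in counts.most_common(top_n) if count > 0]


# Hypothetical collected web texts and candidate destinations.
documents = [
    "Jeju is great. Jeju beaches!",
    "Busan and Jeju trip",
    "Busan food tour",
]
destinations = ["Jeju", "Busan", "Gyeongju"]
ranking = rank_destinations(documents, destinations)
```

Ranking by mentions across collected pages, rather than by likes on a single site, is what makes the result independent of that site's user count.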

A Study on Comparison of Open Application Programming Interface of Securities Companies Supporting Python

  • Ryu, Gui Yeol
    • International journal of advanced smart convergence / v.10 no.1 / pp.97-104 / 2021
  • Securities and investment services hold the most data per company on average and use the most data. Investors increasingly want to invest using their own analysis methods, so securities and investment companies provide stock data to investors through open APIs. The data received through an open API is in text format, and Python is effective and convenient for requesting and receiving text data. Of the 22 major securities and investment companies in Korea that we investigated, only 6 provide an open API, and only Daishin Securities Co. officially supports Python. We compare how to receive stock data through an open API using Python, along with the Python programming features involved. The open APIs studied are those of Daishin Securities Co. (CYBOS Plus) and eBest Investment & Securities Co. (xingAPI). Comparing the two APIs for receiving current stock data, we find that the two main differences are the login method and the method of sending and receiving data. As for the login method, CYBOS Plus holds the login information, whereas xingAPI does not. As for sending and receiving data, CYBOS Plus does so by calling a request method and a reply method, whereas xingAPI sends and receives data in the form of events. As a result, xingAPI requires more code than CYBOS Plus. We also find that CYBOS Plus can run loop statements over lists, tuples, and dictionaries, and that it supports the basic commands provided by Python.

Design and Implementation of Feature Catalogue Builder based on the S-100 Standard (S-100 표준 기반 피처 카탈로그 제작지원 시스템의 설계 및 구현)

  • Park, Daewon;Kwon, Hyuk-Chul;Park, Suhyun
    • KIPS Transactions on Software and Data Engineering / v.2 no.8 / pp.571-578 / 2013
  • The IHO S-100 is a standard for a universal hydrographic data model that supports information services which integrate various maritime data and provide proper information for the safety of vessels. The S-100 is used to develop S-10x product specifications, which are standards giving guidelines for the creation and delivery of specific maritime data sets. The product specification for feature-based data such as ENC (Electronic Navigational Chart) data includes a feature catalogue that describes the characteristics of the features in that data. The feature catalogue is developed by domain experts with knowledge of the data in the target domain. However, it is not feasible to develop a feature catalogue that conforms to the XML schema manually. The need for technology to build feature catalogues has been discussed at the IHO TSMAD committee meeting. Therefore, we present a feature catalogue builder, a GUI (Graphic User Interface) system that supports domain experts in building feature catalogues in XML. The feature catalogue builder is developed to connect with the FCD (Feature Concept Dictionary) register in the IHO (International Hydrographic Organization) GI (Geographic Information) registry. It also helps domain experts select proper feature items based on the relationships between register items.

A Data Processing System on the Transportable Meteorological Radar (이동식 기상 레이더 자료 시스템 개발)

  • 이채욱;오신범
    • Journal of Korea Society of Industrial Information Systems / v.5 no.3 / pp.44-50 / 2000
  • This paper presents an effective data processing system for a transportable meteorological radar (DWSR-200x). A transportable meteorological radar is useful because it can be moved to a target area for a special purpose. To use this radar effectively, data should be transmitted between the radar system and a data center located at a distance. From the raw data we can analyze the properties of the atmosphere, as well as store the data and display it in the form users require. In this paper, we use a wireless LAN to carry the data between the radar system and the information center, and we develop a display program for the transportable radar that works with the transmitted data. It provides meteorologists with a real-time echo searching function and a dictionary facility that uses graphic and multimedia data.

Implementation of Korean TTS System based on Natural Language Processing (자연어 처리 기반 한국어 TTS 시스템 구현)

  • Kim Byeongchang;Lee Gary Geunbae
    • MALSORI / no.46 / pp.51-64 / 2003
  • In order to produce high quality synthesized speech, it is very important to obtain accurate grapheme-to-phoneme conversion and a prosody model from texts using natural language processing. Robust preprocessing for non-Korean characters is also required. In this paper, we analyzed Korean texts using a morphological analyzer, a part-of-speech tagger, and a syntactic chunker. We present a new grapheme-to-phoneme conversion method for Korean that uses a hybrid of a phonetic pattern dictionary and CCV (consonant vowel) LTS (letter to sound) rules, for unlimited vocabulary Korean TTS. We constructed a prosody model using a probabilistic method and a decision tree-based method. The probabilistic method alone usually suffers from performance degradation due to inherent data sparseness problems, so we adopted tree-based error correction to overcome these training data limitations.
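
As a rough illustration of a hybrid dictionary-plus-rules conversion, the sketch below tries a longest match in a pattern dictionary first and falls back to a per-character rule otherwise. It is not the method from the paper: the two dictionary entries are toy examples of well-known Korean sound changes, and `letter_to_sound` is a hypothetical pass-through stand-in for real LTS rules.

```python
# Toy pattern dictionary (assumption): spelling -> pronunciation in Hangul.
PATTERN_DICT = {
    "같이": "가치",  # palatalization
    "국물": "궁물",  # nasalization
}


def letter_to_sound(char):
    """Hypothetical stand-in for LTS rules: pass the character through."""
    return char


def to_pronunciation(text, pattern_dict=PATTERN_DICT):
    """Convert text using dictionary patterns first, then fallback rules."""
    max_len = max(map(len, pattern_dict), default=0)
    output = []
    i = 0
    while i < len(text):
        # Try the longest dictionary pattern that starts at position i.
        for length in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in pattern_dict:
                output.append(pattern_dict[chunk])
                i += length
                break
        else:
            # No pattern matched: fall back to the per-character rule.
            output.append(letter_to_sound(text[i]))
            i += 1
    return "".join(output)


pronounced = to_pronunciation("국물 같이")
```

Checking the dictionary before the rules lets listed exceptions override the general rules, while the fallback still handles words the dictionary has never seen.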
