• Title/Summary/Keyword: Standard Korean Dictionary

Search Result 79

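A Comparative Study of Korean and Chinese Character Forms 2: Focusing on the 900 Basic Hanja for High School Use in Classical Chinese Education (한중한자자형비교연구(韓中漢字字形比較硏究)2 - 한문(漢文) 교육용(敎育用) 기초한자(基礎漢字) 고등학교용(高等學校用) 900자(字)를 중심(中心)으로)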

  • Gang, Hye-Geun
    • 중국학논총
    • /
    • no.62
    • /
    • pp.1-25
    • /
    • 2019
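  • The author compared the "Basic Hanja for Classical Chinese Education, 900 Characters for High School Use" (漢文敎育用基礎漢字高等學校用900字) designated by the Korean Ministry of Education with China's standard character forms (规范汉字). The results are as follows: (1) 424 characters (about 47%) have identical forms (marked "=" next to the character in the appendix "900 Characters for High School Use"); (2) 86 characters (about 10%) have similar forms (marked "Δ" in the appendix); (3) 389 characters (about 43%) have different forms (marked "×" in the appendix). Similar forms are not identical forms, so they should also be regarded as different; the two groups together total 475 characters (about 53%). The main sources of the differences between Korean and Chinese character forms are not only simplified characters and the new character forms adopted for inherited (traditional) characters, but also the standard forms selected from among variant characters, which likewise differ from the forms commonly used in Korea.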
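
The percentages above are simple shares of the 900-character list. A minimal Python sketch of the tally, assuming the appendix marks have been transcribed into a list; the `tally_forms` helper is hypothetical, and the counts are the ones reported in the abstract (they sum to 899, and the shares are taken over the full 900-character list, as the abstract does).

```python
from collections import Counter

def tally_forms(marks, total):
    """Map each appendix mark ('=', 'Δ', '×') to (count, rounded percent of total)."""
    counts = Counter(marks)
    return {mark: (n, round(100 * n / total)) for mark, n in counts.items()}

# Rebuild the mark list from the reported counts.
marks = ["="] * 424 + ["Δ"] * 86 + ["×"] * 389
shares = tally_forms(marks, total=900)
print(shares)  # {'=': (424, 47), 'Δ': (86, 10), '×': (389, 43)}

# Similar forms are treated as different, so 'Δ' and '×' are combined.
differing = shares["Δ"][0] + shares["×"][0]
print(differing, round(100 * differing / 900))  # 475 53
```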

A Study on the Features of the <Classification-Search Term Dictionary>, the Library Classification Scheme in North Korea (북한 문헌분류표 <분류-검색어사전>의 특징 분석)

  • Jae-Hwang Choi
    • Journal of Korean Library and Information Science Society
    • /
    • v.53 no.4
    • /
    • pp.123-142
    • /
    • 2022
  • In 2000, North Korea developed and published the two-volume <Classification-Search Term Dictionary>, which is currently used throughout the country. The purpose of this study is to examine how North Korea's classification schemes have developed since liberation and to understand the contents, composition, and principles of the <Classification-Search Term Dictionary> published in 2000 and revised in 2014. Until now, all studies of North Korean classification schemes have examined the <Book Classification Scheme> published in North Korea in 1964, and the schemes that followed it have not been discussed. The first volume of the <Classification-Search Term Dictionary> is arranged as 'classification symbols - search terms', and the second volume as 'search terms - classification symbols'. Volume 1 is based on the <Books and Bibliography Classification Scheme (1996)> and has a total of 41 main classes in five categories: 1 main class (11/19) for 'revolutionary ideas and theories', 8 main classes (20~27) for 'natural sciences', 19 main classes (30~69) for 'engineering technology and applied sciences', 12 main classes (70~85) for 'social sciences', and 1 main class (90) for 'total sciences'. Volume 2 resembles a subject-heading list. This study is the first to introduce North Korea's <Classification-Search Term Dictionary> in South Korea, and it is expected to serve as a starting point for future studies on establishing a standard classification scheme for a unified Korea.
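
Volume 1 maps classification symbols to search terms and Volume 2 maps search terms back to symbols, which is the relationship between a forward index and its inverse. A minimal Python sketch of deriving the second arrangement from the first; the `invert` helper and the symbols and terms below are hypothetical placeholders, not entries from the actual dictionary.

```python
from collections import defaultdict

def invert(class_to_terms):
    """Turn a symbol -> terms mapping (Volume 1 style) into a
    term -> sorted symbols mapping (Volume 2 style)."""
    term_to_classes = defaultdict(set)
    for symbol, terms in class_to_terms.items():
        for term in terms:
            term_to_classes[term].add(symbol)
    # Sort terms alphabetically and each term's symbols, like a printed index.
    return {term: sorted(symbols) for term, symbols in sorted(term_to_classes.items())}

# Hypothetical Volume 1 fragment; symbols and terms are illustrative only.
volume1 = {
    "20": ["mathematics", "statistics"],
    "30": ["machine design"],
    "70": ["economics", "statistics"],
}
volume2 = invert(volume1)
print(volume2["statistics"])  # ['20', '70']
```

A term shared by several main classes ends up with several symbols, which is what lets a search-term volume route one query to every relevant class.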

Color-related Query Processing for Intelligent E-Commerce Search (지능형 검색엔진을 위한 색상 질의 처리 방안)

  • Hong, Jung A;Koo, Kyo Jung;Cha, Ji Won;Seo, Ah Jeong;Yeo, Un Yeong;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.109-125
    • /
    • 2019
  • As interest in intelligent search engines increases, various studies have been conducted to extract and use product-related features intelligently. In particular, when users search for goods in e-commerce search engines, a product's 'color' is an important descriptive feature. It is therefore necessary to handle synonyms of color terms in order to return accurate results for users' color-related queries. Previous studies have suggested dictionary-based approaches to process synonyms for color features; however, a dictionary-based approach cannot handle color terms in user queries that are not registered in the dictionary. To overcome this limitation of conventional methods, this research proposes a model that extracts RGB values from an internet search engine in real time and outputs similar color names based on the designated color information. First, for basic color search, a color term dictionary was constructed that includes color names and the R, G, B values of each color, drawn from the Korean color standard digital palette program and the Wikipedia color list. The dictionary was made more comprehensive by adding 138 color names transliterated from English color names into Korean loanwords, together with their RGB values. As a result, the final color dictionary includes a total of 671 color names and their RGB values. The proposed method starts from the specific color the user searched for and checks whether that color is in the built-in color dictionary. If it is, the RGB values of the color in the dictionary are used as the reference values for the searched color. If it is not, the top-5 Google image search results for the searched color are crawled and average RGB values are extracted from a central area of each image.
To extract the RGB values from images, several methods were tried, because simply averaging the RGB values of the central area of an image has limitations. Clustering the RGB values in a certain area of each image and using the average of the densest cluster as the reference values performed best. The reference RGB values of the searched color are then compared with the RGB values of every color in the color dictionary constructed earlier, and a candidate list is created from the colors within ±50 of the reference on each of the R, G, and B values. Finally, the candidates are ranked by their Euclidean distance to the reference RGB values of the searched color, and up to five colors with the highest similarity become the final result. To evaluate the usefulness of the proposed method, we performed an experiment in which 300 color names and their corresponding RGB values were collected through questionnaires. These were used to compare the RGB values obtained by four different methods, including the proposed method. The average Euclidean distance in CIE-Lab for our method was about 13.85, which is relatively low compared to 3088 for the case using the synonym dictionary only and 30.38 for the case using the dictionary with the Korean synonym website WordNet. Without the clustering step, the proposed method showed an average Euclidean distance of 13.88, which implies that the DBSCAN clustering in the proposed method can reduce the Euclidean distance. This research suggests a new RGB-based color synonym processing method that combines the dictionary method with real-time synonym processing for new color names. This method overcomes the limitation of the dictionary-based approach, which is the conventional synonym processing method.
This research can contribute to improving the intelligence of e-commerce search systems, especially their color search feature.
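
The matching stage described above (dictionary lookup, a ±50 window on each of R, G and B, then ranking by Euclidean distance and keeping up to five colors) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the dictionary entries are hypothetical examples, the Google image crawling and DBSCAN clustering steps are not reproduced, and a caller-supplied `fallback` function (hypothetical) stands in for them.

```python
import math

# Hypothetical fragment of a color dictionary: name -> (R, G, B).
COLOR_DICT = {
    "red": (255, 0, 0),
    "crimson": (220, 20, 60),
    "scarlet": (255, 36, 0),
    "maroon": (128, 0, 0),
    "pink": (255, 192, 203),
}

def reference_rgb(query, fallback):
    """Dictionary RGB for a registered term; otherwise the caller-supplied fallback
    (a stand-in for the image crawling and clustering step)."""
    if query in COLOR_DICT:
        return COLOR_DICT[query]
    return fallback(query)

def similar_colors(query, fallback, window=50, top_k=5):
    ref = reference_rgb(query, fallback)
    # Keep dictionary colors within +/-window of the reference on every channel.
    candidates = [
        (name, rgb) for name, rgb in COLOR_DICT.items()
        if all(abs(c - r) <= window for c, r in zip(rgb, ref))
    ]
    # Rank by Euclidean distance in RGB space and keep at most top_k names.
    candidates.sort(key=lambda item: math.dist(item[1], ref))
    return [name for name, _ in candidates[:top_k]]

# "blood red" is unregistered, so a hypothetical fallback supplies its RGB.
print(similar_colors("blood red", fallback=lambda q: (230, 10, 20)))
# ['red', 'scarlet', 'crimson']
```

The per-channel window is a cheap pre-filter; the Euclidean distance then orders only the colors that survive it.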

Korean Compound Noun Decomposition and Semantic Tagging System using User-Word Intelligent Network (U-WIN을 이용한 한국어 복합명사 분해 및 의미태깅 시스템)

  • Lee, Yong-Hoon;Ock, Cheol-Young;Lee, Eung-Bong
    • The KIPS Transactions:PartB
    • /
    • v.19B no.1
    • /
    • pp.63-76
    • /
    • 2012
  • We propose a Korean compound noun semantic tagging system that uses statistical compound noun decomposition and semantic relation information extracted from a lexical semantic network (U-WIN) and from dictionary definitions. The system consists of three phases: compound noun decomposition, semantic constraint, and semantic tagging. In the compound noun decomposition phase, the best candidates are selected using noun location frequencies extracted from the Sejong corpus; this phase also re-decomposes nouns for the semantic constraint phase and restores foreign nouns. The semantic constraint phase finds possible semantic combinations using origin information from the dictionary and a Naive Bayes classifier, in order to decrease the computation time and increase the accuracy of semantic tagging. The semantic tagging phase calculates the semantic similarity between the decomposed nouns and decides their semantic tags. We constructed an experimental data set of 40,717 semantically tagged compound nouns of more than three characters from the Standard Korean Language Dictionary. In the experiments, the accuracy of compound noun decomposition was 99.26%, and the accuracy of semantic tagging was 95.38%.
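
The decomposition phase chooses among candidate splits of a compound noun using noun location frequencies. A minimal Python sketch, assuming that a noun's location frequency counts how often it occurs as the first, a middle, or the last part of compounds, and scoring a split by the product of P(noun | position). The frequency table, the scoring rule, and the `segmentations` and `best_split` helpers are illustrative assumptions, not the paper's model or actual Sejong corpus statistics.

```python
# Hypothetical location frequencies: how often each noun appears as the first,
# a middle, or the last part of compound nouns.
LOCATION_FREQ = {
    "한국": {"first": 300, "middle": 20, "last": 10},
    "한국어": {"first": 120, "middle": 5, "last": 30},
    "어": {"first": 1, "middle": 2, "last": 5},
    "사전": {"first": 15, "middle": 10, "last": 200},
    "어사전": {"first": 0, "middle": 0, "last": 1},
}
TOTALS = {
    pos: sum(freqs[pos] for freqs in LOCATION_FREQ.values())
    for pos in ("first", "middle", "last")
}

def segmentations(word):
    """Yield every split of `word` into nouns found in LOCATION_FREQ."""
    if not word:
        yield []
        return
    for end in range(1, len(word) + 1):
        head = word[:end]
        if head in LOCATION_FREQ:
            for rest in segmentations(word[end:]):
                yield [head] + rest

def score(parts):
    """Product of P(noun | position) over the parts of a candidate split."""
    prob = 1.0
    for i, noun in enumerate(parts):
        pos = "first" if i == 0 else "last" if i == len(parts) - 1 else "middle"
        prob *= LOCATION_FREQ[noun][pos] / TOTALS[pos]
    return prob

def best_split(word):
    """Highest-scoring split into two or more nouns, or the word itself."""
    candidates = [parts for parts in segmentations(word) if len(parts) >= 2]
    if not candidates:
        return [word]
    return max(candidates, key=score)

print(best_split("한국어사전"))  # ['한국어', '사전']
```

Here the three candidate splits are 한국+어+사전, 한국+어사전 and 한국어+사전; the last wins because 한국어 is a frequent first part and 사전 a frequent last part.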

Data Compression for High-Speed Data Transmission (고속전송을 위한 V.42bis 데이터 압축 기법의 개선)

  • Cho, Sung-Ryul;Choi, Hyuk;Kim, Tae-Young;Kim, Tae-Jeong
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.23 no.7
    • /
    • pp.1817-1823
    • /
    • 1998
  • V.42bis, a type of LZW (Lempel-Ziv-Welch) code, is well known as the international standard for asynchronous data compression. In this paper, we analyze several undesirable phenomena that arise when V.42bis is applied to high-speed data transmission, and we propose a modified technique to overcome them. The proposed technique determines the proper dictionary size, one of the important factors affecting the compression ratio, and improves the method of dictionary generation to achieve a higher compression ratio. Furthermore, we analyze the problem of excessive mode changes and partially solve it by adjusting the threshold for mode change. As a result, the compression ratio varies less over time. This improvement contributes to easier and better design and control of the buffer in high-speed data transmission.
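
V.42bis builds on LZW: the encoder grows a dictionary of strings seen so far and emits dictionary indices, so the dictionary size bounds how much repetition it can exploit. A minimal Python sketch of plain byte-oriented LZW with a capped dictionary; it is not the V.42bis protocol itself (no transparent/compressed mode switching, codeword-size negotiation, or the paper's modified dictionary generation), and the `max_size` cap is a simplification in which a full dictionary just stops growing, whereas V.42bis manages a full dictionary differently.

```python
def lzw_encode(data: bytes, max_size: int = 4096) -> list[int]:
    """LZW over bytes; the dictionary starts with all 256 single bytes
    and stops growing once it holds max_size entries."""
    dictionary = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in dictionary:
            current = candidate  # keep extending the longest known match
        else:
            codes.append(dictionary[current])
            if len(dictionary) < max_size:
                dictionary[candidate] = len(dictionary)  # learn a new string
            current = bytes([value])
    if current:
        codes.append(dictionary[current])
    return codes

def lzw_decode(codes: list[int], max_size: int = 4096) -> bytes:
    """Invert lzw_encode by rebuilding the same dictionary one step behind."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:  # code created by the encoder in this very step
            entry = previous + previous[:1]
        out.append(entry)
        if len(dictionary) < max_size:
            dictionary[len(dictionary)] = previous + entry[:1]
        previous = entry
    return b"".join(out)

text = b"TOBEORNOTTOBEORTOBEORNOT" * 4
for size in (256, 300, 4096):
    codes = lzw_encode(text, max_size=size)
    assert lzw_decode(codes, max_size=size) == text
    print(size, len(text), len(codes))  # dictionary cap, input bytes, codes emitted
```

With a 256-entry cap nothing is learned and every byte becomes one code; a larger dictionary emits fewer codes, but each code then needs more bits, which is the trade-off behind choosing a "proper" dictionary size.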

A Study of N-Insertion Preferences in Korean (선호도 조사를 통한 ㄴ첨가 현상의 실현 양상 연구)

  • Kook, Kyung-A;Kim, Ju-Won;Lee, Ho-Young
    • MALSORI
    • /
    • no.53
    • /
    • pp.37-60
    • /
    • 2005
  • Since n-insertion is not an obligatory process in Korean, it is necessary to investigate what factors influence n-insertion preferences and whether these preferences have changed over time. To answer these questions, an n-insertion preference test using a questionnaire was conducted. For this test, 183 words were selected, and 167 subjects participated. The results show that n-insertion preferences were influenced by the speakers' age, the number and structure of syllables, word class, phonetic environment, and familiarity. It is suggested that these results be incorporated into the Principles of Standard Pronunciation and the Grand Dictionary of Standard Korean.

KNU Korean Sentiment Lexicon: Bi-LSTM-based Method for Building a Korean Sentiment Lexicon (Bi-LSTM 기반의 한국어 감성사전 구축 방안)

  • Park, Sang-Min;Na, Chul-Won;Choi, Min-Seong;Lee, Da-Hee;On, Byung-Won
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.4
    • /
    • pp.219-240
    • /
    • 2018
  • Sentiment analysis, one of the text mining techniques, is a method for extracting subjective content embedded in text documents. Recently, sentiment analysis methods have been widely used in many fields. For example, data-driven surveys are based on analyzing the subjectivity of text posted by users, and market research analyzes users' review posts to quantify users' opinions of a target product. The basic method of sentiment analysis is to use a sentiment dictionary (or lexicon), a list of sentiment words with positive, neutral, or negative meanings. In general, the meaning of many sentiment words is likely to differ across domains. For example, the sentiment word 'sad' has a negative meaning in many fields, but not in the movie domain. To perform accurate sentiment analysis, we need to build a sentiment dictionary for the given domain. However, building such a lexicon is time-consuming, and without a general-purpose sentiment lexicon many sentiment words are left out. To address this problem, several studies have constructed sentiment lexicons suited to specific domains based on 'OPEN HANGUL' and 'SentiWordNet', which are general-purpose sentiment lexicons. However, OPEN HANGUL is no longer in service, and SentiWordNet does not work well because of language differences that arise when Korean words are converted into English words. These problems restrict the use of such general-purpose sentiment lexicons as seed data for building a sentiment lexicon for a specific domain. In this article, we construct the 'KNU Korean Sentiment Lexicon (KNU-KSL)', a new general-purpose Korean sentiment dictionary that is more advanced than existing general-purpose lexicons.
The proposed dictionary, a list of domain-independent sentiment words such as 'thank you', 'worthy', and 'impressed', is designed for quickly constructing the sentiment dictionary of a target domain. In particular, its sentiment vocabulary is built by analyzing the glosses in the Standard Korean Language Dictionary (SKLD) as follows. First, we propose a sentiment classification model based on Bidirectional Long Short-Term Memory (Bi-LSTM). Second, the proposed deep learning model automatically classifies each gloss as having either a positive or a negative meaning. Third, positive words and phrases are extracted from the glosses classified as positive, and negative words and phrases from the glosses classified as negative. Our experimental results show that the average accuracy of the proposed sentiment classification model is up to 89.45%. In addition, the sentiment dictionary is further extended using various external sources, including SentiWordNet, SenticNet, Emotional Verbs, and Sentiment Lexicon 0603. Furthermore, we add sentiment information about frequently used coined words and emoticons, which are used mainly on the Web. The KNU-KSL contains a total of 14,843 sentiment entries, each of which is a 1-gram, a 2-gram, a phrase, or a sentence pattern. Unlike existing sentiment dictionaries, it is composed of words that are not affected by particular domains. The recent trend in sentiment analysis is to use deep learning techniques without sentiment dictionaries, and the importance of developing sentiment dictionaries has gradually declined. However, a recent study shows that the words in a sentiment dictionary can be used as features of deep learning models, yielding more accurate sentiment analysis (Teng, Z., 2016). This result indicates that a sentiment dictionary can be used not only for sentiment analysis but also as a source of features that improve the accuracy of deep learning models.
The proposed dictionary can be used as base data for constructing the sentiment lexicon of a particular domain and as features of deep learning models. It is also useful for automatically and quickly building large training sets for deep learning models.
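
A general-purpose lexicon whose entries are 1-grams, 2-grams and phrases is typically applied by matching entries against text and summing their polarity scores. A minimal Python sketch of that lookup, assuming a toy lexicon with made-up English entries and integer scores; the entries, scores, longest-match rule, and `polarity` helper are illustrative assumptions, not KNU-KSL's actual contents or file format, and the Bi-LSTM gloss classifier is not reproduced here.

```python
import re

# Hypothetical lexicon entries (space-separated n-grams) with polarity scores.
LEXICON = {
    "thank you": 2,
    "worthy": 1,
    "not worthy": -1,
    "impressed": 2,
    "sad": -1,
}
MAX_N = max(len(entry.split()) for entry in LEXICON)

def polarity(text):
    """Sum lexicon scores, taking the longest matching n-gram at each position."""
    tokens = re.findall(r"\w+", text.lower())
    total, matched, i = 0, [], 0
    while i < len(tokens):
        for n in range(min(MAX_N, len(tokens) - i), 0, -1):
            gram = " ".join(tokens[i:i + n])
            if gram in LEXICON:
                total += LEXICON[gram]
                matched.append(gram)
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return total, matched

print(polarity("Thank you, the film was not worthy"))
# (1, ['thank you', 'not worthy'])
```

Preferring the longest match is what lets a multi-word entry such as "not worthy" override its single-word component "worthy".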

An e-Catalog to Support e-Machining of ETO Mold Parts (주문형 금형 부품의 디지털 제조를 지원하는 전자 카달로그)

  • Mun D.H.;Cho J.M.;Kim B.C.;Jang K.S.;Han S.H.;Ryu B.W.
    • Korean Journal of Computational Design and Engineering
    • /
    • v.10 no.3
    • /
    • pp.188-198
    • /
    • 2005
  • There are two types of mold parts, ready-made standard parts and ETO (Engineered-to-Order) parts, and the latter are of increasing importance to manufacturers. However, ETO parts require more engineering support and communication than ready-made standard parts. Existing e-Catalog modules provide product classification structures that allow customers to select products based on their needs, and the trade proceeds from the provided specification. Machine parts and mold parts, however, follow different purchasing patterns: customers do not purchase the ready-made standard parts exactly as offered by an e-Catalog, but usually (1) add their own options to the provided specifications or (2) change specification items such as length. To support such trades, a new e-Catalog system is proposed. In addition to the parts classification structure, the proposed system is based on the product design process and the specification selection process.
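
The two purchasing patterns described above, adding options to a provided specification and changing items such as length, can be captured by a catalog entry whose parameters are editable within allowed ranges. A minimal Python sketch of such a data model; the `Parameter` and `PartSpec` classes, the part, its range, and the option name are hypothetical and do not come from the paper's system.

```python
from dataclasses import dataclass, field

@dataclass
class Parameter:
    """An editable specification item with its allowed range."""
    value: float
    minimum: float
    maximum: float

@dataclass
class PartSpec:
    """A catalog specification that an ETO order may modify."""
    name: str
    parameters: dict[str, Parameter]
    options: list[str] = field(default_factory=list)

    def change(self, key: str, value: float) -> None:
        """Pattern (2): change an item such as length, staying within its range."""
        param = self.parameters[key]
        if not param.minimum <= value <= param.maximum:
            raise ValueError(f"{key}={value} is outside [{param.minimum}, {param.maximum}]")
        param.value = value

    def add_option(self, option: str) -> None:
        """Pattern (1): add a customer-specific option to the specification."""
        if option not in self.options:
            self.options.append(option)

# Hypothetical ejector pin ordered with a custom length and an extra option.
pin = PartSpec("ejector pin", {"length": Parameter(100.0, 50.0, 300.0)})
pin.change("length", 152.5)
pin.add_option("nitrided surface")
print(pin.parameters["length"].value, pin.options)  # 152.5 ['nitrided surface']
```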

Reflections on the Study of National Language in Korea (국어학 연구의 성격과 태도에 대한 반성)

  • 임용기
    • Lingua Humanitatis
    • /
    • v.5
    • /
    • pp.55-74
    • /
    • 2003
  • The issues concerning the nature of, and the attitude toward, the study of a national language may vary from country to country, depending on national or ethnic characteristics, and the problem domains and the methodologies for dealing with them may vary accordingly. Ever since the Korean language was given a writing system in 1443, fulfilling King Sejong's long-cherished wish, investigations have constantly been made into the real nature of the language itself in pursuit of a better method of representing the spoken language in written form. This is how the study of the Korean language began to take shape. Among such investigations are Hunmin-jeong-eum (the Korean script, 1446), compiled by Jiphyon-jon, the royal office of scholarly research; Doongguk-jeonghun-yokhun (the orthodox script of Korean, 1448); Hongmu-jeonghun-yeokhun (interlinear gloss for the Chinese script of the Ming Dynasty, 1455); An Orthodox Approach to Written Korean (1909) by the Institute of the National Script; Re Standardized Spelling System (1933) by the Chosun Language Society; An Authorized Dictionary of Standard Korean (1936); How to Write Borrowed Words (1940); and A Grand Dictionary of Korea (1947-57). Chu Shi-Gyung's Phonetics of the Korean Script (1908), Korean Grammar (1910), and Sound Patterns of Korean (1914) were all written in this vein, as was Choi Hyun-Bae's Uri-mal-bon (the rudiments of Korean grammar, 1929/1937). All these achievements in the study of the Korean language are the end products of the constant endeavor to solve issues related to the spoken and written forms of the Korean language, and this is how the uniqueness and autonomy of language study in Korea have been established. It should be borne in mind, however, that in seeking solutions to the problems inherent in the Korean language, Korean scholars have not ignored the results of linguistic studies in foreign countries. On the contrary, they have been very active in accommodating such results.
While they have set up their problem domains on the basis of the Korean language, they have been progressively open-minded in looking for solutions to the problems at hand.

Selection of Korean General Vocabulary for Machine Readable Dictionaries (자연언어처리용 전자사전을 위한 한국어 기본어휘 선정)

  • 배희숙;이주호;시정곤;최기선
    • Language and Information
    • /
    • v.7 no.1
    • /
    • pp.41-54
    • /
    • 2003
  • According to Jeong Ho-seong (1999), Koreans use on average only 20% of the 508,771 entries in the Standard Korean Unabridged Dictionary. To build an MRD (machine-readable dictionary) for natural language processing, it is necessary to select Korean lexical units that are used frequently and can be considered basic words. In this study, this selection is done semi-automatically using the KAIST large corpus. Of about 220,000 morphemes extracted from the 40,000,000-eojeol corpus, 50,637 morphemes (54,797 senses) were selected. In addition, the coverage of these morphemes is examined on two sub-corpora of different styles. The total coverage is 91.21% in the formal style and 93.24% in the informal style, and the coverage of the 6,130 first-degree morphemes is 73.64% and 81.45%, respectively.
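
Coverage here is the share of running tokens in a text that belong to the selected vocabulary. A minimal Python sketch over pre-segmented morphemes, assuming the text has already been morphologically analyzed; the `coverage` helper, the toy vocabulary, and the token list are hypothetical, and a real evaluation would count the morphemes (or senses) produced by a Korean morphological analyzer over each sub-corpus.

```python
def coverage(tokens, vocabulary):
    """Percentage of running tokens (not distinct types) found in `vocabulary`."""
    if not tokens:
        return 0.0
    covered = sum(1 for token in tokens if token in vocabulary)
    return 100 * covered / len(tokens)

# Hypothetical morpheme-segmented sentence and basic vocabulary.
tokens = ["나", "는", "사전", "을", "보", "았", "다", "신조어"]
basic_vocabulary = {"나", "는", "사전", "을", "보", "았", "다"}
print(f"{coverage(tokens, basic_vocabulary):.2f}%")  # 87.50%
```

Counting tokens rather than types is what makes a modest list of frequent morphemes cover most of a running text, as the reported figures for the first-degree morphemes show.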
