• Title/Summary/Keyword: Korean nouns

Analysis of the Directives and Wh-words in the Directives of Elementary Korean Textbooks (초등 국어교과서 지시문과 의문사 분석)

  • Lee, Suhyang
    • The Journal of the Korea Contents Association / v.22 no.3 / pp.134-140 / 2022
  • The purpose of this study was to investigate the directives, and the Wh-words within them, in 2nd, 4th and 6th grade elementary Korean textbooks. All directives were entered into Microsoft Office Excel, and those containing Wh-words were separated out. The analysis program Natmal was used to analyze the directives and Wh-words, and criteria from previous studies were also applied. The results showed that nouns and verbs were frequent in the directives, which consisted of sentences averaging 6.9 Eojeol. There were 11 types of Wh-words in total, and 'Mueot (what)', 'Eotteon (which)' and 'Eotteohge (how)' appeared most frequently in all grades. For question types, both grades had more inferential questions than literal information questions. These results are expected to serve as basic data for language interventions with school-aged children who have language disorders.

Korean Word Sense Disambiguation using Dictionary and Corpus (사전과 말뭉치를 이용한 한국어 단어 중의성 해소)

  • Jeong, Hanjo;Park, Byeonghwa
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.1-13 / 2015
  • As opinion mining in big data applications has gained attention, a great deal of research on unstructured data has been conducted. Social media on the Internet generate unstructured or semi-structured data every second, often written in the natural languages we use in daily life. Many words in human languages have multiple meanings or senses. As a result, it is very difficult for computers to extract useful information from these datasets. Traditional web search engines are usually based on keyword search, which can return results far from users' intentions. Even though much progress has been made in recent years in improving search engines so that they give users appropriate results, there is still considerable room for improvement. Word sense disambiguation can play a very important role in natural language processing and is considered one of the most difficult problems in this area. Major approaches to word sense disambiguation can be classified as knowledge-based, supervised corpus-based, and unsupervised corpus-based approaches. This paper presents a method that automatically generates a corpus for word sense disambiguation by taking advantage of examples in existing dictionaries, avoiding expensive sense-tagging processes. It evaluates the effectiveness of the method with a Naïve Bayes model, a supervised learning algorithm, using the Korean standard unabridged dictionary and the Sejong Corpus. The Korean standard unabridged dictionary has approximately 57,000 sentences. The Sejong Corpus has about 790,000 sentences tagged with both part-of-speech and senses. For the experiments, the Korean standard unabridged dictionary and the Sejong Corpus were evaluated both in combination and as separate entities using cross-validation. Only nouns, the target of word sense disambiguation, were selected.
93,522 word senses among 265,655 nouns and 56,914 sentences from related proverbs and examples were additionally combined in the corpus. The Sejong Corpus was easily merged with the Korean standard unabridged dictionary because the Sejong Corpus was tagged with the sense indices defined by that dictionary. Sense vectors were formed after the merged corpus was created. Terms used in creating sense vectors were added to the named entity dictionary of a Korean morphological analyzer. Using the extended named entity dictionary, term vectors were extracted from the input sentences. Given the extracted term vectors and the sense vector model built during the pre-processing stage, the sense-tagged terms were determined by word sense disambiguation based on the vector space model. In addition, this study shows the effectiveness of the corpus merged from the examples in the Korean standard unabridged dictionary and the Sejong Corpus. The experiments show that precision and recall are better with the merged corpus. The study suggests that the method can practically enhance the performance of Internet search engines and help derive more accurate sentence meanings in natural language processing tasks pertinent to search engines, opinion mining, and text mining. The Naïve Bayes classifier used in this study is a supervised learning algorithm based on Bayes' theorem. It assumes that all senses are independent. Even though this assumption is not realistic and ignores the correlation between attributes, the Naïve Bayes classifier is widely used because of its simplicity, and in practice it is known to be very effective in many applications such as text classification and medical diagnosis. However, further research needs to be carried out to consider all possible combinations and/or partial combinations of all senses in a sentence.
Also, the effectiveness of word sense disambiguation may be improved if rhetorical structures or morphological dependencies between words are analyzed through syntactic analysis.
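
The supervised step described in this abstract can be illustrated with a minimal Naïve Bayes sketch. This is not the authors' implementation: the romanized noun "bae", its sense labels, and the toy training examples are hypothetical stand-ins for dictionary example sentences, and add-one smoothing is an assumption made here to handle unseen context words.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Count senses and context words from (sense, context_words) pairs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sense, words in examples:
        sense_counts[sense] += 1
        for w in words:
            word_counts[sense][w] += 1
            vocab.add(w)
    return sense_counts, word_counts, vocab

def disambiguate(context, model):
    """Return the sense with the highest Naive Bayes log-score for the context."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense, n in sense_counts.items():
        score = math.log(n / total)  # log prior of the sense
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in context:
            # Add-one smoothing (an assumption of this sketch)
            score += math.log((word_counts[sense][w] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

# Hypothetical toy data: two senses of the noun "bae" (pear, ship)
EXAMPLES = [
    ("bae/pear", ["sweet", "fruit", "eat"]),
    ("bae/pear", ["fruit", "tree", "ripe"]),
    ("bae/ship", ["sail", "sea", "harbor"]),
    ("bae/ship", ["sea", "captain", "sail"]),
]
MODEL = train(EXAMPLES)
```

Calling `disambiguate(["ripe", "fruit"], MODEL)` scores each sense against the context words. In the setting the abstract describes, the training pairs would come from sense-indexed dictionary examples rather than hand-written lists.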

Exploring the Alternative to Discrepant Terms in Earth Science I·II Textbooks (지구과학 I·II 교과서에 수록된 불일치 용어의 대안 탐색)

  • Choe, Seung-Urn;Ham, Dong-Cheol;Yu, Hee-Won
    • Journal of the Korean Earth Science Society / v.31 no.7 / pp.813-826 / 2010
  • The purpose of this study is to investigate discrepant Earth Science terms in high school curricula and to explore alternatives to those terms. We defined discrepant terms as different terms with the same meaning in Earth Science textbooks. Discrepant terms were compared with terms in references and previous studies, and the preferences of 284 teachers and students were surveyed. The results are as follows: a number of discrepant terms were found in references as well as in high school textbooks. Participants preferred terms that were more understandable, had been learned previously, and conformed to loanword orthography. For discrepant terms caused by different notations of proper nouns or by different references and background knowledge, alternatives could be found by applying the rules of loanword orthography or by consulting journal publications. In conclusion, confusion may be reduced by using common terms that are both based on accepted theory and convey their meaning easily.

Aspects of Language Use in Newspaper Articles: A Corpus Linguistic Perspective (신문 기사의 언어 사용 양상: 코퍼스언어학적 접근)

  • Song, Kyung-Hwa;Kang, Beom-Mo
    • Korean Journal of Cognitive Science / v.17 no.4 / pp.255-269 / 2006
  • The purpose of this study is to analyze newspaper articles from a corpus-linguistic point of view. We used a large corpus of newspaper articles built by the 21st Century Sejong Project and counted occurrences of certain expressions. A newspaper article is divided into the headline, the lead and the body. We tried to work out how to measure the characteristics of indication and compression that are typical of headlines. Then, we focused on the differences between the headline and the lead. Finally, we analyzed the sentence structure and measured the relative frequency of common nouns in the body. This study verifies the existing stylistic theories of newspapers and shows new aspects of language use in newspaper articles. Texts like newspaper articles are the results of human language processing, and they in turn affect the development of the cognitive ability of language.

Two Statistical Models for Automatic Word Spacing of Korean Sentences (한글 문장의 자동 띄어쓰기를 위한 두 가지 통계적 모델)

  • 이도길;이상주;임희석;임해창
    • Journal of KIISE:Software and Applications / v.30 no.3_4 / pp.358-371 / 2003
  • Automatic word spacing is the process of deciding the correct boundaries between words in a sentence that contains spacing errors. It is very important for increasing readability and communicating the accurate meaning of a text to the reader. Previous statistical approaches to automatic word spacing do not consider the previous spacing state, and thus cannot avoid estimating inaccurate probabilities. In this paper, we propose two statistical word spacing models that solve this problem. The proposed models are based on the observation that automatic word spacing can be regarded as a classification problem, like POS tagging. By generalizing hidden Markov models, the models can consider a broader context and estimate more accurate probabilities. We evaluated the proposed models under a wide range of experimental conditions in order to compare them with the current state of the art, and we also provide a detailed error analysis of our models. The experimental results show that the proposed models achieve a syllable-unit accuracy of 98.33% and an Eojeol-unit precision of 93.06% under an evaluation method that takes compound nouns into account.
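
The idea of treating word spacing as a tagging problem that depends on the previous spacing state can be sketched with a plain first-order HMM and Viterbi decoding. This is only an illustration, not the two generalized models the paper proposes: the tag set (1 = a space follows the syllable, 0 = no space), the add-one smoothing, and the two-sentence training corpus are all assumptions made for the example.

```python
import math
from collections import Counter

def to_tagged(sentence):
    """Turn a correctly spaced sentence into (syllable, tag) pairs; tag 1 = space follows."""
    pairs = []
    for word in sentence.split():
        for i, ch in enumerate(word):
            pairs.append((ch, 1 if i == len(word) - 1 else 0))
    return pairs

def train(sentences):
    """Collect transition and emission counts for the spacing tags."""
    trans, emit, tag_total = Counter(), Counter(), Counter()
    vocab = set()
    for s in sentences:
        prev = 1  # the sentence start is treated like "after a space"
        for ch, tag in to_tagged(s):
            trans[(prev, tag)] += 1
            emit[(tag, ch)] += 1
            tag_total[tag] += 1
            vocab.add(ch)
            prev = tag
    return trans, emit, tag_total, vocab

def space(text, model):
    """Re-space text with Viterbi decoding over the two spacing tags."""
    trans, emit, tag_total, vocab = model
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return ""
    v = len(vocab) + 1  # one extra slot for unseen syllables

    def log_t(p, t):  # smoothed transition: previous spacing state -> current
        return math.log((trans[(p, t)] + 1) / (trans[(p, 0)] + trans[(p, 1)] + 2))

    def log_e(t, c):  # smoothed emission: tag -> syllable
        return math.log((emit[(t, c)] + 1) / (tag_total[t] + v))

    score = {t: log_t(1, t) + log_e(t, chars[0]) for t in (0, 1)}
    back = []
    for c in chars[1:]:
        new, ptr = {}, {}
        for t in (0, 1):
            p = max((0, 1), key=lambda q: score[q] + log_t(q, t))
            new[t] = score[p] + log_t(p, t) + log_e(t, c)
            ptr[t] = p
        score = new
        back.append(ptr)
    tag = max((0, 1), key=lambda t: score[t])
    tags = [tag]
    for ptr in reversed(back):  # follow back-pointers to recover the tag path
        tag = ptr[tag]
        tags.append(tag)
    tags.reverse()
    return "".join(c + (" " if t == 1 else "") for c, t in zip(chars, tags)).strip()

# Hypothetical two-sentence training corpus
MODEL = train(["나는 학교에 간다", "나는 집에 간다"])
```

Calling `space("나는학교에간다", MODEL)` removes any existing spaces and then re-inserts them along the best tag path. A realistic model would be trained on a large corpus and, as the abstract notes, would use a broader context than the single previous tag.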

Analysis of Values through the Establishment of a Concept of Eco-friendly Design - Focusing on an Analysis of the Contents of Previous Studies - (친환경 디자인의 개념정립에 따른 가치 분석 - 선행연구의 내용분석을 중심으로 -)

  • Ha, Seung-Yeon;Park, Jae-Ok
    • Journal of the Korean Society of Costume / v.59 no.9 / pp.146-162 / 2009
  • In current product and fashion design, 'eco-friendliness' is affecting all sectors of industry and culture, both practically and conceptually. Therefore, this study seeks to examine the specific values in the concept of eco-friendly design. The materials studied are scholarly journal articles, confined to those published from 1990, when naturalism and ecology trends began to appear in product and fashion design, to the time of the search in February 2009. The study used 'Naturalism', 'Green', 'Environment-friendly', 'Eco', 'Sustainable', 'Well-being' and 'Lohas' as search keywords. Content analysis was performed, and the unit of analysis was the adjectives, nouns, and phrases related to the keywords in the concept of eco-friendly design. The study found that the concept of eco-friendly design contains personal, environmental, economic, and social values. Consequently, it is not enough to consider only the effect on the environment. Understanding the personal, environmental, economic, and social values from the customers' viewpoint, finding the optimal design factors, and reflecting them in the development of products and fashion are necessary to pave the way for advanced eco-friendly design. The results of this paper should help future product and fashion development for eco-friendly brands.

HMM-based Korean Named Entity Recognition (HMM에 기반한 한국어 개체명 인식)

  • Hwang, Yi-Gyu;Yun, Bo-Hyun
    • The KIPS Transactions:PartB / v.10B no.2 / pp.229-236 / 2003
  • Named entity recognition is a process indispensable to question answering and information extraction systems. This paper presents an HMM-based named entity (NE) recognition method that uses the construction principles of compound words. In Korean, many named entities can be decomposed into more than one word. Moreover, there are contextual relationships among the nouns in an NE, and between an NE and its surrounding words. In this paper, we classify words as a word that is an NE in itself, a word in an NE, and/or a word adjacent to an NE, and train an HMM based on NE-related word types and parts of speech. The proposed named entity recognition (NER) system uses a trigram HMM to handle the variable length of NEs. However, the trigram HMM has a serious data sparseness problem. To solve this problem, we use multi-level back-offs. Experimental results show that our NER system achieves an F-measure of 87.6% on economic articles.
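
The abstract does not specify how its multi-level back-offs are defined, so the following is only a generic sketch of the idea: when a trigram has not been seen, fall back to the bigram, and then to the unigram. The scoring scheme is a simple fixed-penalty back-off ("stupid backoff" style, which yields scores rather than normalized probabilities), and the tag names, the penalty `alpha`, and the toy sequences are all hypothetical.

```python
from collections import Counter

def train(tag_sequences):
    """Count unigrams, bigrams, and trigrams over tag sequences."""
    uni, bi, tri = Counter(), Counter(), Counter()
    for seq in tag_sequences:
        padded = ["<s>", "<s>"] + list(seq)
        for i in range(2, len(padded)):
            a, b, c = padded[i - 2], padded[i - 1], padded[i]
            uni[c] += 1
            bi[(b, c)] += 1
            tri[(a, b, c)] += 1
    return uni, bi, tri

def backoff_score(a, b, c, model, alpha=0.4):
    """Score tag c after context (a, b), backing off trigram -> bigram -> unigram."""
    uni, bi, tri = model
    if tri[(a, b, c)] > 0:
        context = sum(n for (x, y, _), n in tri.items() if (x, y) == (a, b))
        return tri[(a, b, c)] / context
    if bi[(b, c)] > 0:
        context = sum(n for (y, _), n in bi.items() if y == b)
        return alpha * bi[(b, c)] / context
    # Last level: add-one smoothed unigram estimate (an assumption of this sketch)
    total = sum(uni.values())
    return alpha * alpha * (uni[c] + 1) / (total + len(uni) + 1)

# Hypothetical NE-related tag sequences
MODEL = train([["B-NE", "I-NE", "O"], ["O", "B-NE", "O"]])
```

Here `backoff_score("O", "O", "B-NE", MODEL)` has no matching trigram and falls back to the bigram level with the penalty applied. The paper's HMM conditions on NE-related word types and parts of speech, so its back-off levels may differ from the plain n-gram orders used here.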

Functional Expansion of Morphological Analyzer Based on Longest Phrase Matching For Efficient Korean Parsing (효율적인 한국어 파싱을 위한 최장일치 기반의 형태소 분석기 기능 확장)

  • Lee, Hyeon-yoeng;Lee, Jong-seok;Kang, Byeong-do;Yang, Seung-weon
    • Journal of Digital Contents Society / v.17 no.3 / pp.203-210 / 2016
  • Korean freely allows the omission of sentence elements and flexible modification scope, so handling these in the morphological analyzer is better than handling them in the parser. In this paper, we propose methods for extending the functions of the morphological analyzer to ease the burden of parsing. The approach is a longest phrase matching method. When a series of morphemes forms a single syntactic category through the processing of unknown words, compound verbs, compound nouns, numbers, and symbols, our method combines them into one syntactic unit. It then assigns semantic features to them and treats them as a syntactic unit. The proposed morphological analysis method removes unnecessary morphological ambiguities and reduces the number of morphological analysis results, and so improves the accuracy of the tagger and parser. In our experiments, the method reduced parse trees by 73.4% and parsing time by 52.4% on average.

Study on the Development of Guidelines for Thesaurus Construction at University Archives: Case Study of Myongji University Archives Center (대학기록관 시소러스 구축 지침의 개발 연구 - 명지대학교 대학사료실의 사례를 중심으로 -)

  • Rieh, Hae-Young;Lee, Mi-Yeong;Lee, Eun-Yeong;Lee, Hyeok-Jun;Lee, Hyeon-Jeong;Choe, Yeong-Sil;Park, Mi-Ja
    • Journal of Korean Society of Archives and Records Management / v.8 no.1 / pp.189-210 / 2008
  • This paper describes some of the issues we faced, and the solutions we considered, in the process of developing guidelines for thesaurus construction. There were many proper names and proper nouns among the terms considered in the process. The thesaurus needed to include the function of an authority file. Preferred terms were selected based on what the university's official records would use. The scope of proper names for inclusion was the people who held official positions in the university and the people who were the subject of the materials. However, when the system allows combined retrieval across the creator and donor fields, inclusion of too many names was considered unnecessary.

Related Documents Classification System by Similarity between Documents (문서 유사도를 통한 관련 문서 분류 시스템 연구)

  • Jeong, Jisoo;Jee, Minkyu;Go, Myunghyun;Kim, Hakdong;Lim, Heonyeong;Lee, Yurim;Kim, Wonil
    • Journal of Broadcast Engineering / v.24 no.1 / pp.77-86 / 2019
  • This paper proposes using machine learning to analyze previously collected documents and to classify documents based on them. Data is collected using keywords associated with a specific domain, and non-conceptual elements such as special characters are removed. Each word of the collected documents is then tagged using a Korean morphological analyzer, identifying nouns, verbs, and sentences. The documents are embedded using a Doc2Vec model, which converts documents into vectors. The similarity between documents is measured with the embedding model, and a document classifier is trained using a machine learning algorithm. Among the classification models compared, the support vector machine performed best, with an F1-score of 0.83.
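
Once documents have been converted into vectors, their similarity can be measured directly on those vectors. The sketch below uses cosine similarity, which is a common choice but is an assumption here, since the abstract does not name the measure. The three-dimensional vectors and document names are hypothetical placeholders for the output of an embedding model such as Doc2Vec.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def most_similar(query_vec, doc_vecs):
    """Return the name of the stored document whose vector is closest to the query."""
    return max(doc_vecs, key=lambda name: cosine_similarity(query_vec, doc_vecs[name]))

# Hypothetical document vectors; in practice these would come from the embedding model
DOC_VECS = {
    "broadcast_policy": [0.9, 0.1, 0.0],
    "sports_report": [0.0, 0.2, 0.9],
}
```

A new document's vector can be passed to `most_similar` to find its nearest stored document. The same vectors could also be used as features for a classifier such as the support vector machine mentioned in the abstract.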