• Title/Summary/Keyword: morpheme frequency

Affixation effects on word-final coda deletion in spontaneous Seoul Korean speech

  • Kim, Jungsun
    • Phonetics and Speech Sciences / v.8 no.4 / pp.9-14 / 2016
  • This study investigated the patterns of coda deletion in spontaneous Seoul Korean speech. More specifically, the current study focused on three factors in promoting coda deletion, namely, word position, consonant type, and morpheme type. The results revealed that, first, coda deletion frequently occurred when affixes were attached to the ends of words, rather than in affixes in word-internal positions or in roots. Second, alveolar consonants [n] and [l] in the coda positions of high-frequency affixes [nɨn] and [lɨl] were most likely to be deleted. Additionally, regarding affix reduction in the word-final position, all subjects seemed to depend on this articulatory strategy to a similar degree. In sum, the current study found that affixes without primary semantic content in spontaneous speech tend to undergo the process of reduction, favoring the occurrence of specific pronunciation variants.

Keyword Analysis Based Document Compression System

  • Cao, Kerang;Lee, Jongwon;Jung, Hoekyung
    • Journal of information and communication convergence engineering / v.16 no.1 / pp.48-51 / 2018
  • Traditional document analysis was word-centered and was implemented using a morpheme analyzer. Such systems can classify the words used in a document but cannot help the user understand or analyze it. To solve this problem, a system needs to extract the most valuable paragraphs, those that help the user understand the document. In this paper, we propose a system that extracts paragraphs from a normalized XML document. The user enters the filename of the XML document to be analyzed, and the system searches the document for keywords and displays the keywords it finds. When the user selects and enters a keyword, the system extracts the paragraphs that contain it. After extraction, the system preserves the paragraph order and checks for duplication; if duplicates exist, it deletes the duplicated paragraphs. Finally, the system reports to the user the frequency and weight of each keyword together with the sorted paragraphs.

An Automatic Korean Lexical Acquisition System (한국어 어휘자동획득 시스템)

  • Lim, Heui-Seok
    • Journal of the Korea Academia-Industrial cooperation Society / v.8 no.5 / pp.1087-1091 / 2007
  • This paper proposes an automatic Korean lexical acquisition system that reflects the characteristics of human language acquisition. The proposed system automatically builds two kinds of lexicon, a full-form lexicon and a decomposition lexicon, using a Korean corpus as its input. In experiments on the Korean Sejeong corpus of 10 million Eojeols, the system acquired 2,097 full-form Eojeols and 3,488 morphemes. The accumulated frequency of the acquired full-form Eojeols covers 38.63% of the input corpus, and the accuracy of morpheme acquisition is 99.87%.

Recommendation System using Associative Web Document Classification by Word Frequency and α-Cut (단어 빈도와 α-cut에 의한 연관 웹문서 분류를 이용한 추천 시스템)

  • Jung, Kyung-Yong;Ha, Won-Shik
    • The Journal of the Korea Contents Association / v.8 no.1 / pp.282-289 / 2008
  • Although there have been some technological developments in improving collaborative filtering, they have yet to fully reflect the actual relations among items. In this paper, we propose a recommendation system using associative web document classification by word frequency and α-cut to address the shortcomings of collaborative filtering. The proposed method extracts words from web documents through morpheme analysis and accumulates term-frequency weights. It generates association rules with the Apriori algorithm and applies the term-frequency weights to their confidence. It then calculates the similarity among the words using hypergraph partitioning. Lastly, it classifies related web documents using the α-cut and calculates similarity using adjusted cosine similarity. The results show that the proposed method significantly outperforms the existing methods.
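
The last two steps named in this abstract, the α-cut and adjusted cosine similarity, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the dictionary layouts, and the sample values are assumptions made for illustration. The α-cut here is the standard fuzzy-set operation (keep elements whose membership degree is at least α), and the adjusted cosine is the usual item-based form that subtracts each user's mean rating.

```python
import math


def alpha_cut(memberships, alpha):
    """Return the keys whose membership degree is at least alpha.

    memberships: {document_id: degree in [0, 1]} (hypothetical layout).
    """
    return {key for key, degree in memberships.items() if degree >= alpha}


def adjusted_cosine(ratings, item_a, item_b):
    """Adjusted cosine similarity between two items.

    ratings: {user: {item: rating}} (hypothetical layout).
    Only users who rated both items contribute; each user's mean
    rating is subtracted before the cosine is taken.
    """
    num = den_a = den_b = 0.0
    for user_ratings in ratings.values():
        if item_a in user_ratings and item_b in user_ratings:
            mean = sum(user_ratings.values()) / len(user_ratings)
            dev_a = user_ratings[item_a] - mean
            dev_b = user_ratings[item_b] - mean
            num += dev_a * dev_b
            den_a += dev_a * dev_a
            den_b += dev_b * dev_b
    if den_a == 0 or den_b == 0:
        return 0.0  # similarity is undefined; 0.0 is a convention chosen here
    return num / math.sqrt(den_a * den_b)


# Illustrative data only.
related = alpha_cut({"d1": 0.9, "d2": 0.4, "d3": 0.6}, alpha=0.6)
sample = {
    "u1": {"a": 5, "b": 4, "c": 1},
    "u2": {"a": 1, "b": 2, "c": 5},
}
similarity = adjusted_cosine(sample, "a", "b")
```

Raising α makes the classification stricter, since fewer documents reach the required membership degree.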

On the base inflectional forms of Korean old vernacular letters (언간에 나타나는 어기활용형에 대한 고찰)

  • Lee, Hyun-Ju
    • (The)Study of the Eastern Classic / no.56 / pp.297-329 / 2014
  • This paper examines the base inflectional forms found in old Korean vernacular letters and explains why they appear so frequently. In these letters, the suffix 'ha-' and the ending of the 'Base+ha-' adjective derivation are omitted with extraordinary frequency. I call the resulting forms base inflectional forms. I consider them in terms of function, morphological construction, and syntactic construction. Whenever Joseon-era people wrote a letter under time pressure, they needed to reduce the effort of using the brush. Therefore, base inflectional forms appear with extraordinary frequency compared with other documents. In the 'X ha-' word formation of old Korean vernacular letters, 'ha-' is a formal morpheme without substantial meaning, so 'X' is retained while 'ha-' and the ending can be freely omitted. Base inflectional forms arise from deliberate language use for a particular purpose, but they do not appear under all conditions, only in certain circumstances. I examined the constructions in which base inflectional forms occur. In the 'X ha-' word formation, 'X' is always a predicative base, and the ending involved in base inflectional forms has a purely grammatical function.

A Recognition Method for Korean Spatial Background in Historical Novels (한국어 역사 소설에서 공간적 배경 인식 기법)

  • Kim, Seo-Hee;Kim, Seung-Hoon
    • Journal of Information Technology Services / v.15 no.1 / pp.245-253 / 2016
  • Along with characters and events, the background is one of the most important elements of a novel; it refers to the time, place, and situation in which the characters appear. Of these, the spatial background helps convey the topic of a novel, so it can help readers choose a novel they want to read. In this paper, we target Korean historical novels. In English text, spatial background can be recognized easily because English distinguishes upper and lower case and uses words that accompany spatial information, such as Bank, University, and City. In Korean text, however, spatial background is difficult to recognize because there is little such orthographic information. Previous studies used machine learning or dictionaries and rules to recognize spatial information in texts such as news and text messages. In this paper, we build nation dictionaries that draw on information such as 'Korean history' and 'Google maps.' We also propose a method for recognizing spatial background based on patterns of postpositions in Korean sentences, in contrast to previous work. We identify the postpositions used with spatial background, exploiting this characteristic of Korean. We further propose a method based on the results of morpheme analysis and frequency in the novel text to raise the accuracy of spatial background recognition. The recognized spatial background can help readers grasp the atmosphere of a novel and understand the events of the scenes in which the characters appear.

A Convergence Study for Development of Psychological Language Analysis Program: Comparison of Existing Programs and Trend Analysis of Related Literature (심리학적 언어분석 프로그램 개발을 위한 융합연구: 기존 프로그램의 비교와 관련 문헌의 동향 분석)

  • Kim, Youngjun;Choi, Wonil;Kim, Tae Hoon
    • Journal of the Korea Convergence Society / v.12 no.11 / pp.1-18 / 2021
  • While content word-based frequency analysis has obvious limitations in handling intentional deception or irony, KLIWC has evolved toward functional word analysis and KrKwic has evolved as a way to visualize co-occurrence frequencies. However, after more than 10 years of development, several issues still need improvement. Therefore, we set out to develop a new psychological language analysis program by analyzing KLIWC and KrKwic. First, the two programs were analyzed. In particular, the morpheme classifications of KLIWC and of Korean morpheme analyzers were compared to enhance functional word analysis, and the psychological dictionary was analyzed to strengthen the psychological analysis. The analysis showed that the Hannanum part-of-speech analyzer was the most finely subdivided overall, but KLIWC was more finely subdivided for personal pronouns and KKMA for endings, suggesting the integrated use of multiple part-of-speech analyzers to strengthen functional word analysis. Second, the research trends of studies that analyzed texts with these programs were examined. The two programs were used in various academic fields, including Interdisciplinary Studies. In particular, KrKwic was widely used for the analysis of papers and reports, and KLIWC was widely used for comparative studies of writers' thoughts, emotions, and personality. Based on these results, the necessity and direction of development of a new psychological language analysis program were suggested.

XML Document Keyword Weight Analysis based Paragraph Extraction Model (XML 문서 키워드 가중치 분석 기반 문단 추출 모델)

  • Lee, Jongwon;Kang, Inshik;Jung, Hoekyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.11 / pp.2133-2138 / 2017
  • The analysis of existing XML documents and other documents has been centered on words. Such analysis can be implemented using a morpheme analyzer, but while it can classify the many words in a document, it cannot capture the document's core content. For a user to understand a document efficiently, the paragraphs containing a main word must be extracted and presented to the user. The proposed system retrieves keywords in the normalized XML document. Then, the system extracts the paragraphs containing the keyword the user entered for searching and displays them to the user. In addition, the frequency and weight of the keyword used in the search are reported to the user, the order of the extracted paragraphs is preserved, and redundant paragraphs are eliminated so that the user can understand the document. The proposed system can minimize the time and effort required to understand a document by allowing the user to understand it without reading the whole document.
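
The pipeline described in this abstract (find the paragraphs containing a keyword, keep their order, drop duplicates, report keyword frequency and weight) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' system: the `p` tag name, the function name `extract_paragraphs`, and the definition of weight as keyword occurrences divided by total word count are all hypothetical, since the abstract does not specify them. Whitespace tokenization also stands in for the morpheme analysis a Korean-language system would need.

```python
import xml.etree.ElementTree as ET


def extract_paragraphs(xml_text, keyword, tag="p"):
    """Return (paragraphs, frequency, weight) for a keyword.

    paragraphs: texts of <tag> elements containing the keyword,
                in document order, with exact duplicates removed.
    frequency:  keyword occurrences over all paragraphs (duplicates included).
    weight:     frequency / total word count (an assumed definition).
    """
    root = ET.fromstring(xml_text)
    target = keyword.lower()
    seen = set()
    paragraphs = []
    frequency = 0
    total_words = 0
    for elem in root.iter(tag):
        # Collapse whitespace so formatting differences do not hide duplicates.
        text = " ".join("".join(elem.itertext()).split())
        words = text.lower().split()
        total_words += len(words)
        count = sum(1 for w in words if w.strip(".,;:!?\"'()") == target)
        frequency += count
        if count and text not in seen:
            seen.add(text)
            paragraphs.append(text)
    weight = frequency / total_words if total_words else 0.0
    return paragraphs, frequency, weight


# Illustrative document only.
doc = (
    "<doc>"
    "<p>Morpheme analysis splits words.</p>"
    "<p>Nothing relevant here.</p>"
    "<p>Morpheme analysis splits words.</p>"
    "<p>A morpheme is a unit.</p>"
    "</doc>"
)
paragraphs, frequency, weight = extract_paragraphs(doc, "morpheme")
```

Counting the keyword in duplicated paragraphs before removing them is a design choice of this sketch; counting only the retained paragraphs would be an equally plausible reading of the abstract.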

Design and Implementation of Minutes Summary System Based on Word Frequency and Similarity Analysis (단어 빈도와 유사도 분석 기반의 회의록 요약 시스템 설계 및 구현)

  • Heo, Kanhgo;Yang, Jinwoo;Kim, Donghyun;Bok, Kyoungsoo;Yoo, Jaesoo
    • The Journal of the Korea Contents Association / v.19 no.10 / pp.620-629 / 2019
  • An automated minutes summary system is required to objectively summarize and classify the contents of debates or discussions held for decision making. This paper designs and implements a minutes summary system using the word2vec model to complement existing minutes summary systems. The proposed system is further implemented with the word2vec model to remove index words during morpheme analysis and to extract representative sentences with common opinions from documents. The proposed system automatically classifies documents collected during the meeting process and extracts sentences that represent the agenda from among various opinions. The conference host can quickly identify and manage all the agendas discussed at the meeting through the proposed system. The proposed system analyzes the various agendas of large-scale debates or discussions and summarizes sentences that can serve as representative opinions to support fast and accurate decision making.

Analysis of interest in non-face-to-face medical counseling of modern people in the medical industry (의료 산업에 있어 현대인의 비대면 의학 상담에 대한 관심도 분석 기법)

  • Kang, Yooseong;Park, Jong Hoon;Oh, Hayoung;Lee, Se Uk
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.11 / pp.1571-1576 / 2022
  • This study aims to analyze the interest of modern people in non-face-to-face medical counseling in the medical industry. Big data was collected from two social platforms: 지식인, a platform where users can receive medical counseling from experts, and YouTube. A data set was built from each platform with a total of eight search terms: the top five keywords of telephone counseling, "internal medicine", "general medicine", "department of neurology", "department of mental health", and "pediatrics", plus "specialist", "medical counseling", and "health information". Afterwards, preprocessing steps such as morpheme classification, disease extraction, and normalization were performed on the crawled data. Data was visualized with word clouds, broken line graphs, quarterly graphs, and bar graphs by disease frequency based on word frequency. An emotional classification model was constructed only for YouTube data, and the performance of GRU and BERT-based models was compared.