• Title/Summary/Keyword: 한국어 언어모델


Implementation of the Automatic Segmentation and Labeling System (자동 음성분할 및 레이블링 시스템의 구현)

  • Sung, Jong-Mo;Kim, Hyung-Soon
    • The Journal of the Acoustical Society of Korea / v.16 no.5 / pp.50-59 / 1997
  • In this paper, we implement an automatic speech segmentation and labeling system that marks phone boundaries automatically for constructing a Korean speech database. We specify and implement the system based on conventional speech segmentation and labeling techniques, and also develop a graphical user interface (GUI) in the Hangul Motif™ environment so that users can examine the automatically aligned boundaries and refine them easily. The system is applied to speech sampled at 16 kHz, and the labeling unit set consists of 46 phoneme-like units (PLUs) and silence. The system accepts both phonetic and orthographic transcriptions as linguistic input. Hidden Markov models (HMMs) are employed for pattern matching. Each phoneme model is trained on a manually segmented database of 445 phonetically balanced words (PBW). To evaluate the system, we test it on a separate database of sentence-type speech. In our experiment, 74.7% of phoneme boundaries are within 20 ms of the true boundary and 92.8% are within 40 ms.

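The boundary-accuracy figures quoted in the abstract above (share of boundaries within 20 ms and within 40 ms of the true boundary) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `boundary_accuracy`, the one-to-one pairing of reference and predicted boundaries, the inclusive tolerance, and the sample values are all assumptions made for illustration.

```python
def boundary_accuracy(reference_ms, predicted_ms, tolerance_ms):
    """Fraction of boundaries whose predicted time lies within
    tolerance_ms of the manually labeled (reference) time.

    Assumes both lists describe the same phone sequence, so the i-th
    predicted boundary corresponds to the i-th reference boundary.
    """
    if len(reference_ms) != len(predicted_ms):
        raise ValueError("reference and predicted boundaries must align one-to-one")
    if not reference_ms:
        return 0.0
    hits = sum(
        1
        for ref, pred in zip(reference_ms, predicted_ms)
        if abs(ref - pred) <= tolerance_ms  # inclusive tolerance (assumption)
    )
    return hits / len(reference_ms)


# Hypothetical boundary times in milliseconds, not data from the paper.
reference = [120, 250, 410, 600]
predicted = [130, 275, 405, 650]

acc_20 = boundary_accuracy(reference, predicted, 20)  # errors 10, 25, 5, 50
acc_40 = boundary_accuracy(reference, predicted, 40)
print(f"within 20 ms: {acc_20:.1%}, within 40 ms: {acc_40:.1%}")
```

Reporting the same metric at two tolerances, as the abstract does, shows both how often the alignment is tight and how often it is at least roughly correct.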

A Comparative Study of Second Language Acquisition Models: Focusing on Vowel Acquisition by Chinese Learners of Korean (중국인 학습자의 한국어 모음 습득에 대한 제2언어 습득 모델 비교 연구)

  • Kim, Jooyeon
    • Phonetics and Speech Sciences / v.6 no.4 / pp.27-36 / 2014
  • This study provides a longitudinal examination of Chinese learners' acquisition of Korean vowels. Specifically, I examined the Korean monophthongs /i, e, ɨ, ʌ, a, u, o/ produced by Chinese learners at 1 month and at 12 months, and tested empirically how the learners acquire Korean vowels in relation to their mother tongue, using the Perceptual Assimilation Model (henceforth PAM) of Best (Best, 1993; 1994; Best & Tyler, 2007) and the Speech Learning Model (henceforth SLM) of Flege (Flege, 1987; Bohn & Flege, 1992; Flege, 1995). Most of the results are explained similarly by the PAM and the SLM; the only discrepancy between the two models lies in the 'similar' category of sounds between the learners' native language and the target language. Specifically, the acquisition pattern of Korean /u/ and /o/ is well accounted for by the PAM but not by the SLM. The SLM does not explain why the Chinese learners had difficulty acquiring the Korean vowel /u/, because under the SLM the vowel /u/ in Chinese (the native language) is matched to either /u/ or /o/ in Korean (the target language); that is, it allows only a one-to-one matching between the native language and the target language. In contrast, the PAM accounts well for the learners' difficulty with the Korean vowel /u/, in that the Chinese vowel /u/ is matched to the Korean vowel pair /o, u/ rather than to the single vowel /o/ or /u/.

3D Graphic Nursery Contents Developed by Mobile AR Technology (모바일 기반 증강현실 기술을 활용한 3D전래동화 콘텐츠 연구)

  • Park, Young-sook;Park, Dea-woo
    • Journal of the Korea Institute of Information and Communication Engineering / v.20 no.11 / pp.2125-2130 / 2016
  • In this paper, we examine the merits of 3D graphic nursery content developed with mobile AR technology. AR technology is currently attracting attention because of its potential to become core content of the future ICT industry. We applied the AR nursery content to children's education in Korean, Chinese, and English, with a selectable subtitle language. For 3D content production, each original fairy tale was adapted and translated into 6 to 8 scenes. Native speakers recorded the dubbing in standard pronunciation, and the sound effects were edited separately to fit each scene. After the scenario was written and the 3D models, interactions, sound effects, and content metadata were created, a project was set up in the Unity 3D game engine and the content was implemented as scripts. The result offers fun and informative traditional fairy tales with rich content that incorporates ICT, supports advanced technology-based education, and gives children opportunities to encounter software in daily life.

An Approach to the Reorganization of University Libraries in the 21st Century

  • 홍현진;이병목
    • Journal of Korean Library and Information Science Society / v.29 / pp.443-464 / 1998
  • At the turn of the 21st century, university libraries face a rapidly changing environment: the adoption of information technology, changes in the nature of library work, and shifting user needs. This study analyzes the current organizational structure of university libraries in Korea and, drawing on various organizational theories and on changes in the information environment, proposes a conceptual organizational model for revitalizing library organizations. For nearly ten years, because of legal constraints and environmental limits inside and outside the organization, Korean university libraries have changed little apart from a few developments such as the introduction of computerized systems, the appointment of deputy library directors, and attempts to merge libraries with computer centers. The typical Korean university library is organized into acquisitions, technical services, and circulation and reference services. If acquisitions is regarded as part of technical services, 95 of the 114 university libraries studied (82.5%) were organized in the traditional form of technical services and public services divisions. As a conceptual model for the 21st-century university library that can overcome the problems of the traditional library organization, this study proposes four divisions as the basic conceptual components: a service division, a service support division, a technical support division, and an integration and coordination division. However, no ideal organizational structure fits every library's services and work processes, and the reorganization process varies widely with a library's type, purpose, and work processes. Library reorganization will therefore be a continual process that follows environmental change, and the success of a library organization depends on the capacity of individuals and of the organization to adapt to that change.


GenAI(Generative Artificial Intelligence) Technology Trend Analysis Using Bigkinds: ChatGPT Emergence and Startup Impact Assessment (빅카인즈를 활용한 GenAI(생성형 인공지능) 기술 동향 분석: ChatGPT 등장과 스타트업 영향 평가)

  • Lee, Hyun Ju;Sung, Chang Soo;Jeon, Byung Hoon
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.18 no.4 / pp.65-76 / 2023
  • In the field of technology entrepreneurship and startups, the development of artificial intelligence (AI) has emerged as a key topic for business model innovation. As a result, venture firms are making various AI-centered efforts to secure competitiveness (Kim & Geum, 2023). The purpose of this study is to analyze the relationship between the development of GenAI technology and the startup ecosystem by analyzing domestic news articles to identify trends in the technology startup field. Using BIG Kinds, this study examined changes in GenAI-related coverage, major issues, and trends in Korean news articles from 1990 to August 10, 2023, focusing on the periods before and after the emergence of ChatGPT, and visualized the relationships through network analysis and keyword visualization. The results showed that mentions of GenAI in the articles increased gradually from 2017 to 2023. In particular, OpenAI's ChatGPT service based on GPT-3.5 was highlighted as a major issue, indicating the popularization of language model-based GenAI technologies such as OpenAI's DALL-E, Google's MusicLM, and VoyagerX's Vrew. This demonstrates the usefulness of GenAI in various fields, and since the launch of ChatGPT, Korean companies have been actively developing Korean language models. Startups such as Ritten Technologies are also using GenAI to expand their scope in the technology startup field. This study confirms the connection between GenAI technology and startup entrepreneurship, which suggests that GenAI can support the construction of innovative business strategies and that this connection will continue to shape the development of GenAI technology and the growth of the startup ecosystem. Further research is needed to explore international trends, the use of various analysis methods, and the possibility of applying GenAI in the real world. Such efforts are expected to contribute to the development of GenAI technology and the growth of the startup ecosystem.


Audience Cognitive Reconstruction of the Extended Meaning of Complex Mechanism Text : For Communication Education using Story Media Expressions (복합기제 텍스트의 확장 의미에 대한 수용자의 인지적 재구성 : 서사적 미디어 표현을 활용한 의사소통 교육을 위해)

  • Lim, Ji-Won
    • Journal of Korea Entertainment Industry Association / v.15 no.7 / pp.137-143 / 2021
  • This discussion is a qualitative study of the possibility of linking communication education for college students with literacy education for Korean language educators, based on a theory of the cognitive interpretation of meaning in media texts that contain complex mechanisms. The implicit meaning of media content expressions used as an interactive communication strategy is interpreted in multiple ways, depending on the individual learner's cognitive environment. If so, how is the general meaning intended by the content creator actually received? This question is the starting point of the discussion. To address it, I drew on the experimental pragmatic methodology of cognitive aesthetics and applied the relevance model of cognitive linguistics to connect learners' creative cognitive environments with the presented content and to find contrasts between them. As a result, it was possible to establish a basic framework in which learners express their subjectivity and creative thinking and connect their cognitive environment with the presented content themselves. In particular, active and positive learners used direct descriptive expressions to build a new cognitive environment, for example by suggesting a third alternative that demonstrated the ability to question produced media texts and the validity of the meaning implied in the text. Because media text containing complex mechanisms is an indirect and persuasive form of communication that occurs readily through various media in modern society, a universal communication principle of reliable conversation between media text creators and audiences should exist.

KOMUChat: Korean Online Community Dialogue Dataset for AI Learning (KOMUChat : 인공지능 학습을 위한 온라인 커뮤니티 대화 데이터셋 연구)

  • YongSang Yoo;MinHwa Jung;SeungMin Lee;Min Song
    • Journal of Intelligence and Information Systems / v.29 no.2 / pp.219-240 / 2023
  • Conversational AI that users can interact with satisfactorily is a long-standing research topic. Developing conversational AI requires training data that reflects real conversations between people, but current Korean datasets either are not in question-answer format or use honorifics, which makes it difficult for users to feel a sense of closeness. In this paper, we propose a conversation dataset (KOMUChat) consisting of 30,767 question-answer sentence pairs collected from online communities. The question-answer pairs were collected from the post titles and first comments of love and relationship counseling boards used by men and women. In addition, we removed abusive records through automatic and manual cleansing to build a high-quality dataset. To verify the validity of KOMUChat, we compared and analyzed the results of a generative language model trained on KOMUChat and on a benchmark dataset. The results showed that our dataset outperformed the benchmark dataset in terms of answer appropriateness, user satisfaction, and fulfillment of conversational AI goals. The dataset is the largest open-source single-turn text dataset presented so far, and it is significant in providing a friendlier Korean dataset that reflects the text styles of online communities.
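The collection step described in the abstract above (pairing each post title with its first comment, then dropping abusive records) can be sketched as follows. This is only an illustration under stated assumptions: the function `build_qa_pairs`, the post dictionary layout, and the simple substring blocklist are hypothetical and stand in for the authors' automatic and manual cleansing, which the abstract does not detail.

```python
def build_qa_pairs(posts, blocked_terms):
    """Pair each post title (question) with its first comment (answer).

    posts: list of dicts with a "title" string and a "comments" list
           (hypothetical layout, not the KOMUChat schema).
    blocked_terms: lowercase terms that mark a record as abusive; a simple
           stand-in for the paper's automatic and manual cleansing.
    """
    pairs = []
    for post in posts:
        title = post.get("title", "").strip()
        comments = post.get("comments", [])
        if not title or not comments:
            continue  # no question or no answer: skip
        answer = comments[0].strip()  # only the first comment is used
        if not answer:
            continue
        text = f"{title} {answer}".lower()
        if any(term in text for term in blocked_terms):
            continue  # drop records flagged by the blocklist
        pairs.append((title, answer))
    return pairs


# Hypothetical sample posts for illustration only.
posts = [
    {"title": "How do I apologize after a fight?",
     "comments": ["Be honest and say sorry first.", "Wait a day."]},
    {"title": "No replies yet", "comments": []},
    {"title": "Is this normal?", "comments": ["That is a badword thing to ask."]},
]

pairs = build_qa_pairs(posts, blocked_terms={"badword"})
print(pairs)
```

Using only the title and the first comment keeps every record single-turn, which matches the dataset's description as single-turn question-answer data.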

A Study of 'Emotion Trigger' by Text Mining Techniques (텍스트 마이닝을 이용한 감정 유발 요인 'Emotion Trigger'에 관한 연구)

  • An, Juyoung;Bae, Junghwan;Han, Namgi;Song, Min
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.69-92 / 2015
  • The explosion of social media data has led researchers to apply text-mining techniques to analyze large-scale social media data more rigorously. Although social media text analysis algorithms have improved, previous approaches still have limitations. In sentiment analysis of social media written in Korean, there are two typical approaches. One is the linguistic approach using machine learning, which is the most common; some studies have added grammatical factors to the feature sets used to train classification models. The other applies semantic analysis to sentiment analysis, but it has mainly been applied to English texts. To overcome these limitations, this study applies the Word2Vec algorithm, an extension of neural network algorithms, to handle the more extensive semantic features that were underestimated in existing sentiment analysis. The results of the Word2Vec algorithm are compared with the results of co-occurrence analysis to identify the differences between the two approaches. The results show that Word2Vec extracts about three times as many related words expressing emotion about the keyword as co-occurrence analysis does. The difference stems from Word2Vec's vectorization of semantic features. It is therefore reasonable to say that the Word2Vec algorithm can catch hidden related words that traditional analysis has not found. In addition, part-of-speech (POS) tagging for Korean is used to detect adjectives as "emotional words". The emotional words extracted from the text are then converted into word vectors by the Word2Vec algorithm to find related words. Among these related words, nouns are selected because each may have a causal relationship with the "emotional word" in the sentence. The process of extracting these trigger factors of emotional words is named "Emotion Trigger" in this study. As a case study, the datasets were collected by searching with three keywords (professor, prosecutor, and doctor), chosen because they carry rich public emotion and opinion. Preliminary data collection was conducted to select secondary keywords for data gathering. The secondary keywords used to gather the data for the actual analysis are as follows: Professor (sexual assault, misappropriation of research money, recruitment irregularities, polifessor), Doctor (Shin Hae-chul Sky Hospital, drinking and plastic surgery, rebate), and Prosecutor (lewd behavior, sponsor). The size of the text data is about 100,000 (Professor: 25,720; Doctor: 35,110; Prosecutor: 43,225), and the data were gathered from news, blogs, and Twitter to reflect various levels of public emotion in the text analysis. Gephi (http://gephi.github.io) was used for visualization, and all programs used for text processing and analysis were written in Java. The contributions of this study are as follows. First, different approaches to sentiment analysis are integrated to overcome the limitations of existing approaches. Second, finding Emotion Triggers can detect hidden connections to public emotion that existing methods cannot detect. Finally, the approach used in this study could be generalized regardless of the type of text data. A limitation of this study is that it is hard to say that the words extracted by Emotion Trigger processing have a significant causal relationship with the emotional word in a sentence. Future work will clarify the causal relationship between emotional words and the words extracted by Emotion Trigger by comparing them with manually tagged relationships. Furthermore, the text data used in Emotion Trigger are from Twitter, so the data have a number of distinct features that we did not deal with in this study. These features will be considered in further study.
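The co-occurrence analysis that the abstract above uses as a point of comparison can be sketched in a few lines. This is a minimal illustration, not the study's Java implementation: it covers only the co-occurrence baseline (the Word2Vec step is not reproduced here), and the function `cooccurring_nouns`, the `"ADJ"`/`"NOUN"` tag names, and the pre-tagged English sample sentences are assumptions standing in for the output of a Korean POS tagger.

```python
from collections import Counter


def cooccurring_nouns(tagged_sentences, emotion_word):
    """Count nouns that appear in the same sentence as an emotional word.

    tagged_sentences: list of sentences, each a list of (word, tag) pairs
        produced by some POS tagger; tags "ADJ" and "NOUN" are hypothetical.
    emotion_word: an adjective treated as the "emotional word".
    Each noun is counted at most once per sentence.
    """
    counts = Counter()
    for sentence in tagged_sentences:
        adjectives = {word for word, tag in sentence if tag == "ADJ"}
        if emotion_word not in adjectives:
            continue  # this sentence does not contain the emotional word
        counts.update({word for word, tag in sentence if tag == "NOUN"})
    return counts


# Hypothetical pre-tagged sentences for illustration only.
sentences = [
    [("professor", "NOUN"), ("angry", "ADJ"), ("scandal", "NOUN")],
    [("scandal", "NOUN"), ("angry", "ADJ")],
    [("doctor", "NOUN"), ("kind", "ADJ")],
]

triggers = cooccurring_nouns(sentences, "angry")
print(triggers.most_common())
```

A counter like this only finds nouns that literally share a sentence with the emotional word, which is the kind of limitation the abstract says word vectors help to overcome.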