• Title/Summary/Keyword: 핵심단어 학습 (keyword learning)

A Single-End-Point DTW Algorithm for Keyword Spotting (핵심어 검출을 위한 단일 끝점 DTW알고리즘)

  • 최용선;오상훈;이수영
    • Journal of the Institute of Electronics Engineers of Korea SP / v.41 no.3 / pp.209-219 / 2004
  • To implement real-time hardware for keyword spotting, we propose a Single-End-Point DTW (SEP-DTW) algorithm that is simple and computationally inexpensive. The SEP-DTW algorithm needs only a single end point, which enables efficient applications, and it requires a small amount of computation because the global search area is divided into successive local search areas. We also adopt new local constraints and a new distance measure to improve the performance of the SEP-DTW algorithm. In addition, we normalize the feature vectors so that they have the same variance in each frequency bin and each frame has the same energy level. To construct several reference patterns for each keyword, we apply a clustering algorithm to all training patterns and take the mean vector of each cluster as a reference pattern. To detect a keyword in an input speech stream, we measure the distances between the reference patterns and the input pattern and decide whether the distances are smaller than a predefined threshold. Isolated speech recognition and keyword spotting experiments verify that the proposed algorithm performs better than other methods.
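
The template-matching decision described in this abstract can be illustrated with a minimal sketch. It uses classic full-sequence DTW with a Euclidean frame distance, not the paper's SEP-DTW, whose local constraints and distance measure are not given here. The function names `dtw_distance` and `detect_keyword`, the length normalization, and the threshold value are assumptions made for illustration.

```python
import math


def dtw_distance(ref, inp):
    """Classic DTW distance between two sequences of feature vectors.

    Assumption: Euclidean frame distance and the basic three-way local
    constraint; the paper's own constraints and measure are not shown here.
    """
    n, m = len(ref), len(inp)
    inf = float("inf")
    # cost[i][j] = best accumulated distance aligning ref[:i] with inp[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(ref[i - 1], inp[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Normalize by total length so patterns of different duration are comparable.
    return cost[n][m] / (n + m)


def detect_keyword(references, inp, threshold):
    """Return (detected, best_distance) over all reference patterns."""
    best = min(dtw_distance(ref, inp) for ref in references)
    return best < threshold, best


# Toy one-dimensional "features"; real inputs would be spectral frames.
reference = [(0.0,), (1.0,), (2.0,)]
stretched = [(0.0,), (1.0,), (1.0,), (2.0,)]
print(detect_keyword([reference], stretched, threshold=0.1))
```

In this sketch a time-stretched copy of the reference aligns with zero cost, which is the property that makes DTW suitable for matching utterances spoken at different speeds.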

A Keyphrase Extraction Model for Each Conference or Journal (학술대회 및 저널별 기술 핵심구 추출 모델)

  • Jeong, Hyun Ji;Jang, Gwangseon;Kim, Tae Hyun;Sin, Donggu
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.10a / pp.81-83 / 2022
  • Understanding research trends is necessary for selecting research topics and exploring related work. Most researchers search for representative keywords of the domains or technologies they are interested in to understand research trends. However, some conferences in the artificial intelligence and data mining fields now publish hundreds to thousands of papers each year, which makes it difficult for researchers to follow the trends in their domains of interest. In this paper, we propose an automatic technology keyphrase extraction method that helps researchers understand the research trends of each conference or journal. Keyphrase extraction, which extracts important terms or phrases from a text, is a fundamental technology for natural language processing tasks such as summarization and search. Previous keyphrase extraction methods based on pretrained language models extract keyphrases from long texts, so their performance degrades on short texts such as paper titles. We propose a technology keyphrase extraction model that is robust on short texts and considers the importance of each word.

Understand the Current Status of Teaching and Learning Informatization and Develop Indicators in the 4th Industrial Revolution (4차산업혁명 시대를 대비한 대학의 교수학습 정보화 현황 파악 및 지표 개발)

  • Kim, Sang-Woo;Lee, Myung-Suk
    • Journal of Digital Convergence / v.18 no.4 / pp.67-74 / 2020
  • This study developed teaching and learning informatization indicators that provide a basis for utilizing or disseminating the beneficial teaching and learning informatization environments promoted by each university. The research method analyzed the various informatization indicators developed by KERIS from 2002 to 2015, together with recent developments such as Edutech, future education reports, and teaching and learning field reports, and reflected them in the indicator development. The third version of the indicators, divided into Input, Process, and Output stages, was completed by reflecting expert opinions on the first and second versions. As a result, the following core words were derived: building the university's teaching-learning informatization infrastructure, sharing of educational resources, open development and sharing, joint purchase of resources, information safety systems and literacy education, grasping the current status, and resource utilization. In future work, a questionnaire will be prepared, its questions will be refined through a pilot test, and it will be used to grasp the current status of teaching and learning informatization across universities.

A Study on Core Values Appeared in Missions and Visions of School Library Standards (학교도서관 기준의 사명과 비전에 나타난 핵심 가치에 대한 연구)

  • Song, Gi-Ho
    • Journal of Korean Library and Information Science Society / v.40 no.3 / pp.225-247 / 2009
  • The values of an organization generate belief and encourage the participation of community members. Accordingly, it is necessary to find new core values for the school library in addition to access and assistance, which are the library's traditional values. This study analyzed the central keywords in the mission and vision statements of international school library standards and derived core values in five fields: education, collaboration, access, cultural awareness and sensitivity, and democratic citizenship. An analysis of the missions and visions in the school library standards of America, Australia, the UK, and Canada suggests that life-long learning skills, physical access, and social responsibility are common core values. The recently revised American and UK school library standards place particular emphasis on technology skills, multiple literacies, and an integrated information literacy curriculum. Ultimately, these core values can serve as directions and basic materials for establishing the missions and visions of school libraries in Korea.

Automatic Meeting Summary System using Enhanced TextRank Algorithm (향상된 TextRank 알고리즘을 이용한 자동 회의록 생성 시스템)

  • Bae, Young-Jun;Jang, Ho-Taek;Hong, Tae-Won;Lee, Hae-Yeoun
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.11 no.5 / pp.467-474 / 2018
  • Organizing and documenting the contents of meetings and discussions is very important in many tasks. In the past, however, people had to organize the contents manually. In this paper, we describe the development of a system that generates meeting minutes automatically using the TextRank algorithm. The proposed system records all the utterances of the speakers in real time and calculates similarity based on the appearance frequency of the sentences. Then, to create the meeting minutes, it extracts important words or phrases through an unsupervised learning algorithm that finds the relations between the sentences in the document data. In particular, we improved performance by introducing a keyword weighting technique into the TextRank algorithm, which adapts the PageRank algorithm to words and sentences.
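
As a rough illustration of the idea in this abstract, the sketch below ranks sentences with a TextRank-style power iteration and lets a keyword weight raise the similarity of sentences that share important words. It is not the authors' system: the overlap similarity follows the commonly used TextRank formulation, and the `keyword_weight` dictionary, the whitespace tokenization, and the function names are assumptions for illustration.

```python
import math


def similarity(a, b, keyword_weight):
    """Weighted word-overlap similarity between two tokenized sentences."""
    shared = set(a) & set(b)
    if not shared or len(a) < 2 or len(b) < 2:
        return 0.0
    # Hypothetical weighting: words listed in keyword_weight count more than 1.0.
    overlap = sum(keyword_weight.get(word, 1.0) for word in shared)
    return overlap / (math.log(len(a)) + math.log(len(b)))


def textrank(sentences, keyword_weight=None, damping=0.85, iterations=50):
    """Score sentences by PageRank over a sentence-similarity graph."""
    keyword_weight = keyword_weight or {}
    tokens = [s.lower().split() for s in sentences]
    n = len(tokens)
    weights = [
        [similarity(tokens[i], tokens[j], keyword_weight) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


utterances = [
    "the budget meeting starts monday",
    "the budget review is monday",
    "lunch was good today",
]
print(textrank(utterances, keyword_weight={"budget": 2.0}))
```

Sentences with the highest scores would be the candidates for the minutes; the unrelated third utterance receives only the baseline score because it shares no words with the others.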

Optimizing Language Models through Dataset-Specific Post-Training: A Focus on Financial Sentiment Analysis (데이터 세트별 Post-Training을 통한 언어 모델 최적화 연구: 금융 감성 분석을 중심으로)

  • Hui Do Jung;Jae Heon Kim;Beakcheol Jang
    • Journal of Internet Computing and Services / v.25 no.1 / pp.57-67 / 2024
  • This research investigates training methods that allow large language models to accurately identify sentiment and comprehend information about increases and decreases in the financial domain. The main goal is to identify suitable datasets that enable these models to understand expressions related to financial increases and decreases. For this purpose, we selected sentences from the Wall Street Journal that included relevant financial terms, as well as sentences generated by GPT-3.5-turbo-1106, for post-training. We assessed the impact of these datasets on language model performance using Financial PhraseBank, a benchmark dataset for financial sentiment analysis. Our findings demonstrate that post-trained FinBERT, a model specialized in finance, outperformed the similarly post-trained BERT, a general-domain model. Moreover, post-training with actual financial news proved more effective than using generated sentences, though in scenarios requiring higher generalization, models trained on generated sentences performed better. This suggests that aligning the model's domain with the target domain and choosing the right dataset are crucial for enhancing a language model's understanding and sentiment prediction accuracy. These results offer a methodology for optimizing language model performance in financial sentiment analysis tasks and suggest future research directions for more nuanced language understanding and sentiment analysis in finance. This research provides insights not only for the financial sector but also for language model training across various domains.

Bootstrapping for Semantic Role Assignment of Korean Case Marker (부트스트래핑 알고리즘을 이용한 한국어 격조사의 의미역 결정)

  • Kim Byoung-Soo;Lee Yong-Hun;Na Seung-Hoon;Kim Jun-Gi;Lee Jong-Hyeok
    • Proceedings of the Korean Information Science Society Conference / 2006.06b / pp.4-6 / 2006
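  • This paper addresses the problem, in natural language processing, of mapping the grammatical relations between a sentence's predicate and its noun arguments onto semantic relations, that is, determining the role each argument plays with respect to the predicate. Together with word sense disambiguation, semantic role assignment is one of the core problems in the semantic analysis of natural language and one that must be solved. In this study, we build a predicate case frame dictionary from the Sejong Electronic Dictionary, a linguistically useful resource, and determine semantic roles by case frame selection. We then use a bootstrapping algorithm that learns iteratively by applying probability information about the determined semantic roles to a probabilistic model. In experiments, performance improved by about 10% over the baseline model.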

Recognition of Korean Implicit Citation Sentences Using Machine Learning with Lexical Features (어휘 자질 기반 기계 학습을 사용한 한국어 암묵 인용문 인식)

  • Kang, In-Su
    • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.8 / pp.5565-5570 / 2015
  • Implicit citation sentence recognition aims to locate, in the full text of articles, citation sentences that lack explicit citation markers. State-of-the-art approaches exploit word n-grams, clue words, researchers' surnames, mentions of previous methods, and distance to the nearest explicit citation sentence, reaching over 50% performance. However, most previous work has been conducted on English. For Korean, a rule-based method using positive/negative clue patterns was reported to attain a performance of 42%, which leaves room for improvement. This study attempted to learn to recognize implicit citation sentences in the full text of Korean literature using Korean lexical features. Different lexical feature units, namely Eojeol, morpheme, and Eumjeol, were evaluated to determine suitable lexical features for Korean implicit citation sentence recognition. In addition, lexical features were combined with position features representing backward/forward proximity to explicit citation sentences, improving performance to over 50%.
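
To make two of the lexical units named in this abstract concrete, the sketch below extracts Eojeol (space-delimited word unit) and Eumjeol (syllable) n-grams from a Korean sentence. It is an illustration only, not the paper's feature extractor: the function names and n-gram sizes are assumptions, the input is assumed to use precomposed Hangul syllables (one character per Eumjeol), and morpheme features are left out because they require a morphological analyzer.

```python
def eojeol_ngrams(sentence, n=1):
    """N-grams over Eojeol units, i.e. whitespace-delimited tokens."""
    units = sentence.split()
    return [" ".join(units[i:i + n]) for i in range(len(units) - n + 1)]


def eumjeol_ngrams(sentence, n=2):
    """Character n-grams inside each Eojeol; one character is one Eumjeol.

    Assumption: text is in precomposed (NFC) form, so each Hangul syllable
    is a single code point.
    """
    grams = []
    for eojeol in sentence.split():
        grams.extend(eojeol[i:i + n] for i in range(len(eojeol) - n + 1))
    return grams


sentence = "이 방법은 성능이 높다"
print(eojeol_ngrams(sentence))      # Eojeol unigrams
print(eumjeol_ngrams(sentence, 2))  # Eumjeol bigrams
```

Feature lists like these could then be fed to any standard classifier as bag-of-features input, optionally alongside position features such as the distance to the nearest explicit citation sentence.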

Exploring the Objectives and Contents of Global Citizenship Education in the NSFCS 3.0: Focusing on the View of the 'World' and the Keywords (미국 국가 기준 가정과교육과정에 포함된 세계시민교육 관련 목표와 내용 탐색: '세계'관점과 핵심어를 중심으로)

  • Heo, Young-Sun;Kim, Nam-Eun;Chae, Jung Hyun
    • Journal of Korean Home Economics Education Association / v.33 no.3 / pp.107-127 / 2021
  • The purpose of this study is to examine the relationship between the content areas and competencies of the U.S. Family & Consumer Sciences National Standards (NSFCS 3.0) and UNESCO Global Citizenship Education (GCED). For this purpose, the global perspective, content areas, and competencies in NSFCS 3.0 and the keywords related to the three areas of UNESCO GCED were analyzed. Specifically, the content standards and competencies containing the words 'world' or 'global' were extracted, and their relationship to the GCED topics and keywords was analyzed. The results of the study are as follows. First, NSFCS 3.0 described a direct correlation between individuals and the world by recognizing individuals as global citizens in 14 areas, all except 'interpersonal relations' and 'parenting', specifically using the keyword 'world' in content standards and competencies. Second, in the content standards and competencies of NSFCS 3.0, the keywords related to the topics of the GCED areas were presented evenly in three areas of FCS: dietary habits, family life, and human development. The social and emotional areas were not presented in clothing, housing, and consumer life. On the other hand, the behavioral area, which is emphasized most in the GCED, is presented in all the FCS content areas. From this, it is apparent that the learning field for GCED may be considered the area of life pursued by the home economics curriculum. The results of this study provide a foundation for understanding the relationship between NSFCS 3.0 and the GCED, and implications for how to implement the content of the GCED in the next revision of the national home economics curriculum of Korea.

Korean Semantic Role Labeling Using Case Frame Dictionary and Subcategorization (격틀 사전과 하위 범주 정보를 이용한 한국어 의미역 결정)

  • Kim, Wan-Su;Ock, Cheol-Young
    • Journal of KIISE / v.43 no.12 / pp.1376-1384 / 2016
  • To process sentences as human beings do, computers require the capability to analyze and process all possible forms of human expression, and linguistic information processing forms the initial basis for this. When analyzing a sentence syntactically, it is necessary to divide the sentence into components, find the obligatory arguments centered on predicates, identify the sentence core, and understand the semantic relations between arguments and predicates. In this study, we applied a case frame dictionary based on The Korean Standard Dictionary of The National Institute of the Korean Language; in addition, we used a CRF model with features built from the subcategorization of predicates in the Korean Lexical Semantic Network (UWordMap) for semantic role labeling. Semantic roles were tagged automatically by the CRF model, which used information on words, predicates, the case frame dictionary, and hypernyms of words as features. This method demonstrated higher performance than the existing method, with an accuracy of 83.13% compared to 81.2%.