• Title/Summary/Keyword: automatic keyword extraction (자동 키워드추출)

Search Result 108, Processing Time 0.023 seconds

A Technique to Link Bug and Commit Report based on Commit History (커밋 히스토리에 기반한 버그 및 커밋 연결 기법)

  • Chae, Youngjae;Lee, Eunjoo
    • KIISE Transactions on Computing Practices / v.22 no.5 / pp.235-239 / 2016
  • A 'commit-bug link', the link between commit history and bug reports, is used for software maintenance and defect prediction in bug tracking systems. Previous studies detect these links automatically based on text similarity, time interval, and keywords. Existing approaches depend on the quality of the commit history and can therefore miss links. In this paper, we propose a technique that links commits and bug reports using not only the messages in the commit history but also the similarity of files in the commit history coupled with bug reports. The experimental results demonstrate the applicability of the suggested approach.
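
The abstract above does not give its scoring formula. As a minimal sketch of the general idea, the following combines message similarity with file overlap, assuming Jaccard similarity for both signals and equal weights. The function names, the dictionary keys, and the notion that a bug report carries a set of files from earlier linked commits are all hypothetical.

```python
import re


def tokens(text):
    """Return the set of lowercase alphanumeric tokens in a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_score(commit, bug, w_text=0.5, w_files=0.5):
    """Score a candidate commit-bug link (hypothetical weighting).

    commit: {"message": str, "files": list of paths changed by the commit}
    bug:    {"summary": str, "files": list of paths assumed to be already
             associated with the bug report through earlier linked commits}
    """
    text_sim = jaccard(tokens(commit["message"]), tokens(bug["summary"]))
    file_sim = jaccard(set(commit["files"]), set(bug["files"]))
    return w_text * text_sim + w_files * file_sim


commit = {"message": "Fix null pointer in parser",
          "files": ["src/parser.c", "src/lexer.c"]}
unrelated = {"message": "Update README wording", "files": ["README.md"]}
bug = {"summary": "Null pointer crash in parser", "files": ["src/parser.c"]}

related_score = link_score(commit, bug)
unrelated_score = link_score(unrelated, bug)
```

A commit whose message is terse can still score above zero here if it touches files associated with the bug, which is the kind of link a message-only approach could miss.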

ICLAL: In-Context Learning-Based Audio-Language Multi-Modal Deep Learning Models (ICLAL: 인 컨텍스트 러닝 기반 오디오-언어 멀티 모달 딥러닝 모델)

  • Jun Yeong Park;Jinyoung Yeo;Go-Eun Lee;Chang Hwan Choi;Sang-Il Choi
    • Proceedings of the Korea Information Processing Society Conference / 2023.11a / pp.514-517 / 2023
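  • This study addresses a multimodal deep learning model for applying in-context learning to audio-language tasks. The goal is to develop a multimodal deep learning model that, during training, learns representations through which audio and text can communicate, and that can perform a variety of audio-text tasks. The model has a structure in which an audio encoder is connected to a language encoder, and the language model is an autoregressive large language model with 6.7B or 30B parameters. The audio encoder is an audio feature extraction model pretrained with self-supervised learning. Because the language model is relatively large, training uses the frozen method: the language model's parameters are fixed and only the audio encoder's parameters are updated. The training tasks are automatic speech recognition and abstractive summarization. After training, the model was tested on a question answering task. The results suggest that additional training is needed to generate correct answer sentences, but the model pretrained on speech recognition generated grammatically correct sentences that use keywords similar to the answer.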

A Study of Relationship Derivation Technique using object extraction Technique (개체추출기법을 이용한 관계성 도출기법)

  • Kim, Jong-hee;Lee, Eun-seok;Kim, Jeong-su;Park, Jong-kook;Kim, Jong-bae
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2014.05a / pp.309-311 / 2014
  • Despite increasing demands for big data application based on the analysis of scattered unstructured data, few relevant studies have been reported. Accordingly, the present study suggests a technique enabling a sentence-based semantic analysis by extracting objects from collected web information and automatically analyzing the relationships between such objects with collective intelligence and language processing technology. To be specific, collected information is stored in DBMS in a structured form, and then morpheme and feature information is analyzed. Obtained morphemes are classified into objects of interest, marginal objects and objects of non-interest. Then, with an inter-object attribute recognition technique, the relationships between objects are analyzed in terms of the degree, scope and nature of such relationships. As a result, the analysis of relevance between the information was based on certain keywords and used an inter-object relationship extraction technique that can determine positivity and negativity. Also, the present study suggested a method to design a system fit for real-time large-capacity processing and applicable to high value-added services.

Boolean Query Formulation From Korean Natural Language Queries using Syntactic Analysis (구문분석에 기반한 한글 자연어 질의로부터의 불리언 질의 생성)

  • Park, Mi-Hwa;Won, Hyeong-Seok;Lee, Geun-Bae
    • Journal of KIISE:Software and Applications / v.26 no.10 / pp.1219-1229 / 1999
  • There is considerable evidence that trained users can achieve good search effectiveness through a boolean query, because a structured boolean query containing operators such as AND, OR, and NOT can represent the user's information need more accurately. However, it is not easy for ordinary users to construct a boolean query using appropriate boolean operators. In this paper, we propose a boolean query formulation method that automatically transforms a user's natural language query into an extended boolean query, for both effectiveness and user convenience. First, the user's natural language query is syntactically analyzed using a KCCG (Korean Combinatory Categorial Grammar) parser, and the resulting syntactic trees are structurally simplified using a tree-simplifying mechanism in order to capture the logical relationships between keywords. Next, in a simplified tree, plausible noun phrases are identified and added to the same tree as additional keywords, and the keywords are assigned weights. Finally, the simplified syntactic tree is automatically converted into a boolean query using mapping rules and linguistic heuristics. We also propose an N-BEST average method that generates and runs a boolean query for each of the top N syntactic trees, to compensate for the effect of a single incorrect top syntactic tree. In experiments using the KTSET2.0 information retrieval test collection, the proposed method outperformed a traditional vector space model by 23% and, surprisingly, manually constructed boolean queries by 8%.
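
The final conversion step described above can be illustrated with a small sketch. It assumes the simplified tree is already available as nested tuples of the form (operator, child, ...) with keyword strings as leaves. This representation and the function name are hypothetical; the paper's mapping rules, keyword weights, and KCCG parser are not reproduced.

```python
def to_boolean(node):
    """Render a simplified tree as a boolean query string."""
    if isinstance(node, str):
        # Leaf: a keyword or noun phrase; quote multi-word phrases.
        return f'"{node}"' if " " in node else node
    op, *children = node
    if op == "NOT":
        # Unary operator: negate its single child.
        return f"NOT {to_boolean(children[0])}"
    # AND / OR: join the rendered children with the operator.
    return "(" + f" {op} ".join(to_boolean(c) for c in children) + ")"


tree = ("AND",
        "information retrieval",
        ("OR", "boolean", "query"),
        ("NOT", "image"))
query = to_boolean(tree)
```

Under the N-BEST average idea, a function like this would be applied to each of the top N trees, giving N queries whose retrieval results are then combined.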

User Preference based Intelligent Program Guide (사용자 선호도 기반 지능형 프로그램 가이드)

  • 류지웅;김문철;남제호;강경옥;김진웅
    • Journal of Broadcast Engineering / v.7 no.2 / pp.153-167 / 2002
  • With the advent of digital broadcasting, a large number of program channels have become available at user terminals such as set-top boxes and PCs. Channel navigation and searching become more difficult on the TV terminal side with a conventional device such as a TV remote controller. The MPEG-7 MDS (Multimedia Description Scheme) and TV Anytime standards specify how to describe user preferences for the genre, channel, actor/actress, keyword, etc. of TV programs, and how to describe usage history for the user's program consumption behaviors and preferences. However, they do not describe how to use them. In this paper, we describe an IPG (Intelligent Program Guider) system that provides TV program and channel information based on user preferences and suggests easy access to the TV programs that the user wants. The IPG monitors the user's behavior when interacting with programs and automatically updates the user's preferences accordingly. The IPG uses the user preference description schemes specified in both the MPEG-7 MDS and TV Anytime metadata specifications.

Intelligent Web Crawler for Supporting Big Data Analysis Services (빅데이터 분석 서비스 지원을 위한 지능형 웹 크롤러)

  • Seo, Dongmin;Jung, Hanmin
    • The Journal of the Korea Contents Association / v.13 no.12 / pp.575-584 / 2013
  • The data types used for big-data analysis vary widely and include news, blogs, SNS, papers, patents, and sensed data. In particular, the use of web documents that offer reliable data in real time is gradually increasing. Web crawlers that collect web documents automatically have grown in importance because big data is used in many different fields and web data grows exponentially every year. However, existing web crawlers cannot collect all the web documents in a web site, because they follow only the URLs included in web documents they have already collected. In addition, existing web crawlers may collect web documents that other web crawlers have already collected, because information about the documents collected by each crawler is not managed efficiently across crawlers. Therefore, this paper proposes a distributed web crawler. To resolve the problems of existing web crawlers, the proposed crawler collects web documents through the RSS feed of each web site and the Google search API. The crawler provides fast crawling performance through a client-server model based on RMI and NIO that minimizes network traffic. Furthermore, the crawler extracts the core content of a web document by comparing keyword similarity across the tags in the document. Finally, to verify the superiority of our web crawler, we compare it with existing web crawlers in various experiments.

Automatic Electronic Medical Record Generation System using Speech Recognition and Natural Language Processing Deep Learning (음성인식과 자연어 처리 딥러닝을 통한 전자의무기록자동 생성 시스템)

  • Hyeon-kon Son;Gi-hwan Ryu
    • The Journal of the Convergence on Culture Technology / v.9 no.3 / pp.731-736 / 2023
  • Recently, the medical field has been applying mandatory Electronic Medical Records (EMRs) and Electronic Health Records (EHRs) systems that computerize and manage medical records, and distributing them throughout the entire medical industry to utilize patients' past medical records for additional medical procedures. However, the conversations between medical professionals and patients that occur during general medical consultations and counseling sessions are not separately recorded or stored, so additional important patient information cannot be efficiently utilized. Therefore, we propose an electronic medical record system that uses speech recognition and natural language processing deep learning to store conversations between medical professionals and patients in text form, automatically extracts and summarizes important medical consultation information, and generates electronic medical records. The system acquires text information through the recognition process of medical professionals and patients' medical consultation content. The acquired text is then divided into multiple sentences, and the importance of multiple keywords included in the generated sentences is calculated. Based on the calculated importance, the system ranks multiple sentences and summarizes them to create the final electronic medical record data. The proposed system's performance is verified to be excellent through quantitative analysis.
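
The summarization steps in the abstract above (split the text into sentences, compute keyword importance, rank sentences, summarize) can be sketched as follows. The abstract does not say how importance is computed, so this sketch assumes plain term frequency and scores each sentence by the average frequency of its words. The function name and the sample text are hypothetical, and speech recognition is out of scope here.

```python
import re
from collections import Counter


def summarize(text, top_k=2):
    """Return the top_k highest-scoring sentences in their original order."""
    # Split into sentences at whitespace that follows ., ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    # Keyword importance: term frequency over the whole text (an assumption).
    freq = Counter(w for ws in words for w in ws)
    # Sentence score: mean importance of the words it contains.
    scores = [sum(freq[w] for w in ws) / len(ws) if ws else 0.0 for ws in words]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:top_k])]


text = ("The patient reports chest pain. "
        "Chest pain started yesterday. "
        "The weather is nice.")
summary = summarize(text, top_k=2)
```

A practical system would likely filter stop words and use a stronger importance measure; term frequency is used here only to keep the sketch self-contained.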

A Study on Organizing the Web Using Facet Analysis (패싯 분석을 이용한 웹 자원의 조직)

  • Yoo, Yeong-Jun
    • Journal of the Korean BIBLIA Society for library and Information Science / v.15 no.1 / pp.23-41 / 2004
  • In indexing and organizing Web resources, there have been two basic methods: automatic indexing by extracting key words and library classification schemes or subject directories of search engines. But, both methods have failed to satisfy the user's information needs, due to the lack of standard criteria and the irrationality of its structural system. In this paper I have examined the limits of library classification scheme's structures and the problems related to the nature of Web resources such as specificity and exhaustivity. I have also attempted to explain the logicality of Web resources organization by facet analysis and its strengths and limitations. In so doing, I have proposed three specific methods in using facet analysis: firstly, indexing system by facet analysis; secondly, the alternative transformation of the enumerative classification scheme into facet classification scheme; and finally, the facet model of subject directory of domestic search engine. After examining the three methods, my study concludes that a controlled vocabulary by facet analysis can be employed as a useful method in organizing Web resources.

Automatic Determination of Usenet News Groups from User Profile (사용자 프로파일에 기초한 유즈넷 뉴스그룹 자동 결정 방법)

  • Kim, Jong-Wan;Cho, Kyu-Cheol;Kim, Hee-Jae;Kim, Byeong-Man
    • Journal of the Korean Institute of Intelligent Systems / v.14 no.2 / pp.142-149 / 2004
  • It is important to retrieve information that exactly matches the user's needs from the large volume of Usenet news and to filter the desired information quickly. Unlike email, Usenet requires users to register the news groups they are interested in beforehand in order to receive news. However, it is not easy for a novice to decide which news groups are relevant to his or her interests. In this work, we present a service that uses a Kohonen network to classify the news groups a user prefers among the many available. We first extract candidate terms from example documents and then, through fuzzy inference, choose from them a number of representative keywords to be used in the Kohonen network. From observation of the training patterns, we found a sparsity problem: many keywords in the training patterns are empty. We therefore propose a new method that trains the neural network after removing unnecessary dimensions using the statistical coefficient of determination. Experimental results show that the proposed method is superior to the method using every dimension in terms of cluster overlap, which is defined using within-cluster distance and between-cluster distance.

Medicine Ontology Building based on Semantic Relation and Its Application (의미관계 정보를 이용한 약품 온톨로지의 구축과 활용)

  • Lim Soo-Yeon;Park Seong-Bae;Lee Sang-Jo
    • Journal of KIISE:Software and Applications / v.32 no.5 / pp.428-437 / 2005
  • An ontology consists of a set of concepts, with their definitions, that represents the characteristics of a given domain and the relationships between its elements. To reduce the time and cost of building an ontology, this paper proposes a semiautomatic method for building a domain ontology from the results of text analysis. To do this, we propose a terminology processing method and use the extracted concepts and the semantic relations between them to build the ontology. The pharmacy field is selected as the experimental domain, and the resulting ontology is applied to document retrieval. To show the usefulness of the hierarchical relations in the ontology for document retrieval, we compared a typical keyword-based retrieval method with an ontology-based retrieval method, which uses related information in the ontology for related feedback. The latter improves precision and recall by 4.97% and 0.78%, respectively.