• Title/Summary/Keyword: 주제 기반 (topic-based)

Search Result 1,747, Processing Time 0.028 seconds

A Wikipedia-based Query Expansion Method for In-depth Blog Distillation (주제를 깊이 있게 다루는 블로그 피드 검색을 위한 위키피디아 기반 질의 확장 방법)

  • Song, Woo-Sang;Lee, Ye-Ha;Lee, Jong-Hyeok;Yang, Gi-Joo
    • Journal of KIISE:Computing Practices and Letters / v.16 no.11 / pp.1121-1125 / 2010
  • This paper proposes a Wikipedia-based feedback method for in-depth blog distillation, whose goal is to find blogs that offer in-depth thoughts or analysis on a given query. The proposed method uses Wikipedia articles that are relevant to the query. The TREC Blogs08 collection, a large-scale blog corpus, and an English Wikipedia dump were used for the experiments. The proposed method significantly improved retrieval performance, including MAP, over the conventional post-based feedback method.

Link-Based Clustering in Blogosphere (블로그 공간에서의 링크 기반 클러스터링 방안)

  • Song, Suk-Soon;Yoon, Seok-Ho;Kim, Sang-Wook
    • Proceedings of the Korea Information Processing Society Conference / 2009.04a / pp.372-374 / 2009
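  • In this paper, we cluster the bloggers and posts that exist in the blogosphere using link-based clustering. First, among existing link-based clustering methods, we select LinkClus as the most suitable for clustering bloggers and posts. To apply LinkClus to the blogosphere, we map bloggers and posts each to a type, and the actions between bloggers and posts to links. For accurate clustering, the clustering target is the folder, which represents a single topic, rather than the blogger, who may be interested in several topics. We also exclude from the clustering process bloggers and posts with very few links, since they raise the likelihood of noise. Through experiments, we verify whether the clusters produced by the proposed method are also similar in content.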
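The preprocessing step mentioned in this abstract, leaving out nodes with very few links before clustering, might look roughly like the sketch below. It is an illustration under stated assumptions only: the function name `filter_sparse_nodes`, the `min_links` threshold, the (folder, post) edge-list format, and the choice to repeat the filter until nothing changes are all hypothetical and not taken from the paper. LinkClus itself is not implemented here.

```python
from collections import Counter

def filter_sparse_nodes(edges, min_links=2):
    """Drop folders and posts that have fewer than min_links links.

    edges: iterable of (folder, post) pairs, one per action
           (a hypothetical representation, not the paper's).
    The filter repeats until stable, because removing one sparse node
    can leave its neighbours below the threshold (an assumption).
    """
    edges = list(edges)
    while True:
        degree = Counter()
        for folder, post in edges:
            # Key by node type so a folder and a post may share an id.
            degree[("folder", folder)] += 1
            degree[("post", post)] += 1
        kept = [
            (f, p) for f, p in edges
            if degree[("folder", f)] >= min_links
            and degree[("post", p)] >= min_links
        ]
        if len(kept) == len(edges):
            return kept
        edges = kept
```

The surviving edge list could then be handed to whichever link-based clustering routine is in use.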

Keyword Based Conversation Generation using Large Language Model (Large Language Model을 활용한 키워드 기반 대화 생성)

  • Juhwan Lee;Tak-Sung Heo;Jisu Kim;Minsu Jeong;Kyounguk Lee;Kyungsun Kim
    • Annual Conference on Human and Language Technology / 2023.10a / pp.19-24 / 2023
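  • The importance of data is increasingly emphasized in natural language processing, and data augmentation in particular is drawing considerable attention as a way to overcome data scarcity in low-resource domains. This study proposes a keyword-based data augmentation method that uses a large language model (LLM). Specifically, an LLM specialized for Korean is used to generate conversations on a specific topic from given keywords, and this is shown to improve the performance of a model that classifies conversation topics. The results demonstrate the value of LLM-based data augmentation and suggest a way to apply it even when resources are scarce.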

A Term Weight Mensuration based on Popularity for Search Query Expansion (검색 질의 확장을 위한 인기도 기반 단어 가중치 측정)

  • Lee, Jung-Hun;Cheon, Suh-Hyun
    • Journal of KIISE:Software and Applications / v.37 no.8 / pp.620-628 / 2010
  • With Internet use pervasive in everyday life, people can now retrieve a great deal of information through the web. However, the exponential growth of information on the web has limited the performance of online search engines, which return piles of unwanted information. As a result, web users now need more time and effort than in the past to find the information they need. This paper suggests a query expansion method to bring wanted information to web users quickly. In experiments where the search subject did not change, Popularity based Term Weight Mensuration performed better than TF-IDF and Simple Popularity Term Weight Mensuration. When the subject changed during a search, the performance of Popularity based Term Weight Mensuration changed less than that of the other methods.

Keyword Extraction from News Corpus using Modified TF-IDF (TF-IDF의 변형을 이용한 전자뉴스에서의 키워드 추출 기법)

  • Lee, Sung-Jick;Kim, Han-Joon
    • The Journal of Society for e-Business Studies / v.14 no.4 / pp.59-73 / 2009
  • Keyword extraction is an essential technique for text mining applications such as information retrieval, text categorization, summarization, and topic detection. A set of keywords extracted from large-scale electronic document data serves as significant features for text mining algorithms and helps improve the performance of document browsing, topic detection, and automated text classification. This paper presents a keyword extraction technique that can be used to detect topics for each news domain from a large document collection of internet news portal sites. We use six variants of the traditional TF-IDF weighting model. On top of the TF-IDF model, we propose a word filtering technique called 'cross-domain comparison filtering'. To demonstrate the effectiveness of our method, we analyze the usefulness of keywords extracted from Korean news articles and present the changes in the keywords of each news domain over time.
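For readers unfamiliar with the baseline weighting this abstract builds on, a minimal sketch of plain TF-IDF keyword ranking follows. It assumes one common textbook form (raw term frequency times log(N / df)). The paper's six variants and its 'cross-domain comparison filtering' are not specified in the abstract and are not reproduced here. The function name `tfidf_keywords` and the `top_k` parameter are hypothetical.

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring terms for each document.

    docs: list of token lists (tokenization is assumed to be done already).
    Score = raw term frequency * log(N / document frequency).
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    results = []
    for tokens in docs:
        tf = Counter(tokens)
        scores = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
        # Sort by descending score; break ties alphabetically for stability.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results
```

A term that appears in every document gets a score of zero under this form, which is why very common words drop out of the keyword lists.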

An Analysis of the Architectural and Urban Space through Practical Issues of Landscape Urbanism (랜드스케이프 어바니즘의 실천적 주제에 의한 건축도시 공간 분석)

  • Kim, Min-Kyung
    • Journal of Korean Association for Spatial Structures / v.9 no.3 / pp.83-92 / 2009
  • This study explores the potential of experimental areas of architecture, which is gradually becoming urbanized, through the practical issues of landscape urbanism. It finds that the practical issues of landscape urbanism, such as process in time, potential territoriality of surface, practical methodology, and socio-cultural imagination, have already come into the architectural area beyond urbanism. This is presented as characteristics of detailed issues through relational density on the matrix of landscape urbanism, traces of general landscape concepts, and architectural landscape concepts. As a result, these practical issues and detailed architectural strategies constitute important spatial characteristics of contemporary architecture.

A Research on Utilization of KDC Based on Literary Warrant (문헌적 근거에 기반한 한국십진분류법(KDC) 활용현황에 대한 연구)

  • Kim, Sungwon
    • Journal of the Korean Society for Library and Information Science / v.55 no.2 / pp.25-50 / 2021
  • A general-purpose classification scheme encompasses all subject areas. While the scheme as a whole is constructed by library studies experts, the structure and preparation of each subject area's classification should draw on that specific subject. For the whole system to be a practical and useful classification scheme, not just a simple collection of each subject area's scheme, it is necessary to set a rule for properly distributing the number of classification items and the collections assigned to them. The rule that sets the distribution of items based on the amount of document collections is called 'literary warrant'. This study examines the actual status of the assignment of classification items to information resources under the Korean Decimal Classification, and then suggests a way to improve these practices.

A Study on the Theme Selection and Prototype Production for the LX Information Map Service (LX의 정보지도 서비스를 위한 주제선정 및 시범제작)

  • Jeong, Dong-Hoon;Bae, Sang-Keun;Lee, Seong-Gyu
    • Journal of Cadastre & Land InformatiX / v.45 no.1 / pp.123-135 / 2015
  • To satisfy consumers' high expectations across the variety of subject areas they are interested in, information can be provided in the form of a map based on analyzed information. With its name change in 2015, LX intends to play a role in building an information infrastructure that can support government policy as an intermediary between the government and the private sector. This study therefore proposes a plan to provide personalized information to consumers by combining a variety of time-series data (internal or external to LX) based on public information and by analyzing rapidly changing land status spatially and temporally. For this purpose, prior research on thematic map making and domestic and foreign thematic map services were reviewed, and the reasons why LX makes information maps were presented. In addition, themes in three fields were selected and subdivided according to the level of data processing or analysis, and production and expression methods were proposed.

An Analysis of News Report Characteristics on Archives & Records Management for the Press in Korea: Based on 1999~2018 News Big Data (뉴스 빅데이터를 이용한 우리나라 언론의 기록관리 분야 보도 특성 분석: 1999~2018 뉴스를 중심으로)

  • Han, Seunghee
    • Journal of the Korean Society for information Management / v.35 no.3 / pp.41-75 / 2018
  • The purpose of this study is to analyze, through time-series analysis, the characteristics of Korean media coverage of archives & records management. From January 1999 to June 2018, 4,680 news articles on archives & records management topics were extracted from BigKinds. To examine the characteristics of the coverage, this study analyzed differences in press coverage by period, subject, and type of media. In addition, word-frequency-based content analysis and semantic network analysis were conducted to investigate the content characteristics of the coverage. The results showed that the amount and content of news in the field of records management differed by period, subject, and type of media. The amount of news coverage began to increase after the Presidential Records Management Act was enacted in 2007, and the largest amount of news was reported in 2013. Daily newspapers and financial newspapers reported the largest amount of news. During the first 10 years after 1999, news topics formed around issues arising from the application and diffusion of the concept of archives & records management. Since the enactment of the Presidential Records Management Act, however, archives & records management has become a major factor in political and social issues, and a large amount of political and social news has been reported.