• Title/Summary/Keyword: topic extraction (토픽 추출)

Collaborative Filtering Using Topic Models for Rating Based Recommender Systems (평점 기반 추천시스템을 위한 토픽 모델 협업필터링)

  • Kim, Kwang-Seob;Jung, Ho-Gyeong;Lee, Hyun-Jong;Lee, Hyung-Joon
    • Proceedings of the Korean Information Science Society Conference / 2012.06b / pp.381-383 / 2012
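  • Collaborative filtering has served in much recommender-system research either as a baseline for comparison or as a starting point for developing better recommendation methods. In general, collaborative filtering relies on explicitly observed user behavior. In this study, we use LDA (Latent Dirichlet Allocation) to extract latent characteristics of users and of the items to be recommended, and apply them to collaborative filtering. In experiments on building a movie recommender system, we assumed that a user's preferences can be expressed as proportions over movie genres (user-based) and that a movie can likewise be expressed as proportions over genres (item-based). Based on these assumptions, we defined similarities between users and between movies; applying them to collaborative filtering gave better results than traditional collaborative filtering.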
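The user-based variant described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not say which similarity measure was used, so cosine similarity is an assumption, and the topic proportions, user names, and ratings below are hypothetical stand-ins for LDA output.

```python
import math

def cosine(p, q):
    """Cosine similarity between two equal-length topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

def predict_rating(target, item, user_topics, ratings):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, rated in ratings.items():
        if other == target or item not in rated:
            continue
        sim = cosine(user_topics[target], user_topics[other])
        num += sim * rated[item]
        den += abs(sim)
    return num / den if den else None

# Hypothetical topic proportions over three latent "genres" (assumed LDA output).
user_topics = {
    "alice": [0.7, 0.2, 0.1],
    "bob":   [0.6, 0.3, 0.1],
    "carol": [0.1, 0.1, 0.8],
}
# Hypothetical observed ratings; alice has not rated movie_x yet.
ratings = {
    "alice": {},
    "bob":   {"movie_x": 5.0},
    "carol": {"movie_x": 1.0},
}
prediction = predict_rating("alice", "movie_x", user_topics, ratings)
```

Because alice's topic profile is much closer to bob's than to carol's, the prediction is pulled toward bob's rating. The item-based variant would swap the roles of users and movies.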

A Study on MARC Based Topic Map (Topic Map 기반의 MARC 적용 방안 연구)

  • Jang, Hwa-Su;Ko, Il-Ju
    • Proceedings of the Korean Society of Computer Information Conference / 2008.06a / pp.309-315 / 2008
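  • MARC, the standard tool for bibliographic information processing, has moved toward a web-based XML standard format because of problems with its own format and with organizing metadata for diverse web resources. It has been converted to MARCXML and is used for interoperability between systems, but MARCXML merely recasts the MARC record in an XML structure without considering the semantic characteristics of bibliographic information or the expression of metadata. Topic Map, which is emerging as a core semantic technology, supports distributed management of information and knowledge using XTM, ISO's XML-based standard description language. If XTM, the Topic Map language, is used when building a database of scholarly information resources, it becomes easy to integrate various existing metadata in one place while retaining flexibility and extensibility. However, building a new Topic Map on top of an existing system is difficult and requires considerable cost and time. This study proposes using the database schema and the metadata described by MARC as information sources for extracting, from an existing scholarly database, elements that can be reused in a Topic Map. We argue that this is an efficient development approach: it secures the interoperability between systems that XML provides, while expressing the complex conceptual structure of relationships, the resource types, and the semantic correlations among resources found in basic scholarly materials.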
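The idea of reusing MARC metadata as Topic Map elements can be sketched as follows. This is only an illustration under stated assumptions: the record, the `TAG_MAP` table, and the topic and association type names are hypothetical, and the output is plain Python data rather than XTM. The tags used are standard MARC 21 fields (100 personal name main entry, 245 title statement, 650 topical subject).

```python
# Hypothetical, simplified MARC-like record: tag -> list of values.
record = {
    "100": ["Hong, Gildong"],               # main entry, personal name (placeholder)
    "245": ["Introduction to Topic Maps"],  # title statement (placeholder)
    "650": ["Semantic Web", "Metadata"],    # topical subject headings
}

# Assumed mapping from MARC tags to (topic type, association type).
TAG_MAP = {
    "100": ("person", "authored-by"),
    "650": ("subject", "is-about"),
}

def marc_to_topic_map(rec):
    """Return (topics, associations) derived from one record."""
    work = rec["245"][0]
    topics = {work: "work"}
    associations = []
    for tag, (topic_type, assoc_type) in TAG_MAP.items():
        for value in rec.get(tag, []):
            topics[value] = topic_type
            # Each association links the work to a person or subject topic.
            associations.append((assoc_type, work, value))
    return topics, associations

topics, associations = marc_to_topic_map(record)
```

A real conversion would serialize these topics and associations to XTM and would also draw on the database schema, as the abstract proposes.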

Twitter Data Analysis System using LDA model (LDA 모델을 이용한 트위터 데이터 분석 시스템)

  • Lee, Il Seob;Jang, Jeong Hyeon;Yoo, Kwan-Hee
    • Proceedings of the Korea Information Processing Society Conference / 2017.11a / pp.389-390 / 2017
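  • Many users now access social network services (SNS) through mobile devices, and a vast amount of data is generated through SNS. Because information on SNS is diverse and handled quickly, it reflects the major events of the time well. This paper proposes a system that collects about 1.91 million Twitter data items from January 2015 to August 2017, extracts major keywords through LDA modeling, and identifies the major topics and words of each period.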
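Keyword extraction with LDA, as described in this abstract, can be sketched with a small collapsed Gibbs sampler. The abstract does not state which inference method or library the system uses, so the sampler, the hyperparameters `alpha` and `beta`, and the toy corpus are all assumptions made for illustration.

```python
import random

def lda_gibbs(docs, num_topics, iters=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenised documents.

    Returns (topic_word, doc_topic) count tables.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    topic_total = [0] * num_topics
    doc_topic = [[0] * num_topics for _ in docs]
    assign = []
    # Give every token a random initial topic and record the counts.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            topic_word[z][w] += 1
            topic_total[z] += 1
            doc_topic[d][z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                # Remove the token's current assignment from the counts.
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                doc_topic[d][z] -= 1
                # Conditional p(z = k | all other assignments), unnormalised.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                topic_word[z][w] += 1
                topic_total[z] += 1
                doc_topic[d][z] += 1
    return topic_word, doc_topic

def top_words(topic_word, n=3):
    """The n most frequent words of each topic."""
    return [sorted(tw, key=tw.get, reverse=True)[:n] for tw in topic_word]

# Toy stand-in for tokenised tweets from one period.
docs = [
    "election vote debate candidate".split(),
    "vote election poll candidate".split(),
    "match goal league striker".split(),
    "league goal match coach".split(),
]
topic_word, doc_topic = lda_gibbs(docs, num_topics=2)
keywords = top_words(topic_word)
```

One way to obtain period-specific topics would be to run the sampler separately on the tweets of each period; whether the proposed system does this is not stated in the abstract.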

A Similarity-based Dialogue Modeling with Case Frame and Word Embedding (격틀과 워드 임베딩을 활용한 유사도 기반 대화 모델링)

  • Lee, Hokyung;Bae, Kyoungman;Ko, Youngjoong
    • Annual Conference on Human and Language Technology / 2016.10a / pp.220-225 / 2016
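  • This paper proposes similarity-based dialogue modeling using case frames and word embeddings. Existing similarity-based dialogue modeling methods extract morphemes, morpheme tags, named entities, topic features, and key words from a dialogue corpus and use them as BOW (Bag Of Words) features, so they cannot reflect the positional roles of sentence constituents, such as subject and object, played by the words in the user's input utterance. They also suffer from a morpheme mismatch problem: sentence constituents that are semantically similar but have different morphemes do not contribute to the similarity computation. To address these problems, we propose an improved similarity-based dialogue model that uses constituent-based case frames to capture positional information and word embeddings to resolve the morpheme mismatch problem. The improved model achieves an MRR of about 92%.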
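The MRR (Mean Reciprocal Rank) metric reported above can be computed as in the following sketch. The candidate lists and gold answers are hypothetical, and treating a missing answer as a reciprocal rank of 0 is a common convention assumed here rather than something the abstract specifies.

```python
def mean_reciprocal_rank(ranked_lists, gold):
    """Average of 1/rank of the correct answer over all queries.

    A query whose correct answer is absent from its list contributes 0.
    """
    total = 0.0
    for candidates, answer in zip(ranked_lists, gold):
        if answer in candidates:
            total += 1.0 / (candidates.index(answer) + 1)
    return total / len(ranked_lists)

# Hypothetical ranked system responses for three user utterances.
ranked_lists = [
    ["resp_a", "resp_b", "resp_c"],  # correct answer at rank 1
    ["resp_d", "resp_e", "resp_f"],  # correct answer at rank 2
    ["resp_g", "resp_h", "resp_i"],  # correct answer not retrieved
]
gold = ["resp_a", "resp_e", "resp_z"]
mrr = mean_reciprocal_rank(ranked_lists, gold)  # (1 + 1/2 + 0) / 3 = 0.5
```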

Analysis of Research Trends in SIAM Journal on Applied Mathematics Using Topic Modeling (토픽모델링을 활용한 SIAM Journal on Applied Mathematics의 연구 동향 분석)

  • Kim, Sung-Yeun
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.7 / pp.607-615 / 2020
  • The purpose of this study was to analyze the research status and trends in industrial mathematics using text mining techniques on a sample of 4910 papers published in the SIAM Journal on Applied Mathematics from 1970 to 2019. The R program was used to collect titles, abstracts, and keywords from the papers and to perform topic modeling based on the LDA algorithm. Based on coherence scores for the collected papers, 20 topics were determined to be optimal using Gibbs sampling. The main results were as follows. First, studies on industrial mathematics were conducted in a variety of mathematical fields, including computational mathematics, geometry, mathematical modeling, topology, discrete mathematics, and probability and statistics, with a focus on analysis and algebra. Second, 5 hot topics (mathematical biology, nonlinear partial differential equations, discrete mathematics, statistics, topology) and 1 cold topic (probability theory) were found based on time series regression analysis. Third, among the fields not reflected in the 2015 revised mathematics curriculum, numeral systems, matrices, vectors in space, and complex numbers were extracted as content to be covered in the high school mathematics curriculum. Finally, this study suggested strategies to promote industrial mathematics in Korea, described the study's limitations, and proposed directions for future research.
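The hot/cold topic classification via time series regression can be illustrated roughly as follows. This sketch only fits an ordinary least-squares slope of each topic's yearly share and looks at its sign. The abstract does not give the exact procedure, a real analysis would also test the slope for statistical significance, and the topic names and yearly shares below are made-up numbers.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def label_trend(slope, tol=1e-12):
    """Sign-only labelling; significance testing is deliberately omitted."""
    if slope > tol:
        return "hot"
    if slope < -tol:
        return "cold"
    return "neutral"

# Made-up yearly topic shares (mean topic proportion per publication year).
years = [2015, 2016, 2017, 2018, 2019]
topic_share = {
    "topic_a": [0.04, 0.05, 0.06, 0.07, 0.08],  # rising share
    "topic_b": [0.09, 0.08, 0.07, 0.06, 0.05],  # falling share
}
slopes = {name: ols_slope(years, shares) for name, shares in topic_share.items()}
labels = {name: label_trend(s) for name, s in slopes.items()}
```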

A Comparative Analysis Study of IFLA School Library Guidelines Using Semantic Network Analysis (언어 네트워크 분석을 통한 IFLA의 학교도서관 가이드라인 비교·분석에 관한 연구)

  • Lee, Byeong-Kee
    • Journal of Korean Library and Information Science Society / v.51 no.2 / pp.1-21 / 2020
  • The purpose of this study is to explore the semantic characteristics of the IFLA school library guidelines through network analysis. The guidelines exist in two versions, the 2002 edition and the 2015 revision. This study analyzed both versions from the viewpoint of semantic networks and compared their characteristics. Keywords were extracted from the two texts, and semantic networks were constructed from co-occurrence relations among the keywords. Centrality (degree centrality, closeness centrality, betweenness centrality) was computed on the networks. In addition, topic modeling was conducted using the LDA function of NetMiner4.0. The results are as follows. First, in the centrality comparison, the keywords 'Program', 'Teaching', 'Reading', 'Inquiry', 'Literacy', and 'Media' ranked higher in the 2015 revision than in the 2002 edition. Second, 'Inquiry' (degree centrality) and 'Achievement' (closeness centrality), which were not in the top-ranked keyword list of the 2002 edition, newly appeared in the 2015 revision. Third, the topic modeling analysis showed that, compared with the 2002 edition, the importance of topics on programs and services, teaching and learning activities of teacher librarians, and media and information literacy increased in the 2015 revision.
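Building a keyword co-occurrence network and computing degree centrality on it can be sketched as follows. The study itself used NetMiner4.0; this stand-alone version, its sentence-level co-occurrence window, and the sample keyword lists are assumptions for illustration only.

```python
from itertools import combinations

def cooccurrence_edges(sentences):
    """Undirected edges between keywords that share a sentence."""
    edges = set()
    for words in sentences:
        for a, b in combinations(sorted(set(words)), 2):
            edges.add((a, b))
    return edges

def degree_centrality(edges):
    """Degree of each node divided by the number of other nodes."""
    nodes = {node for edge in edges for node in edge}
    degree = dict.fromkeys(nodes, 0)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return {node: d / (len(nodes) - 1) for node, d in degree.items()}

# Hypothetical keyword lists, one per sentence of a guideline text.
sentences = [
    ["library", "program", "teaching"],
    ["library", "reading"],
    ["program", "literacy"],
]
centrality = degree_centrality(cooccurrence_edges(sentences))
```

Here 'library' and 'program' each connect to three of the four other keywords, so both get a degree centrality of 0.75. Closeness and betweenness centrality would need shortest-path computations that are omitted from this sketch.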

A Study on the Analysis of Related Information through the Establishment of the National Core Technology Network: Focused on Display Technology (국가핵심기술 관계망 구축을 통한 연관정보 분석연구: 디스플레이 기술을 중심으로)

  • Pak, Se Hee;Yoon, Won Seok;Chang, Hang Bae
    • The Journal of Society for e-Business Studies / v.26 no.2 / pp.123-141 / 2021
  • As the economic structure grows more dependent on technology, the importance of National Core Technology is increasing. However, it is difficult to determine the scope of the technology to be protected, because the scope of related technologies is abstract and information disclosure is limited by the nature of National Core Technology. To solve this problem, we propose the most appropriate literature type and analysis method for identifying important technologies related to National Core Technology. We conducted a pilot test applying TF-IDF and LDA topic modeling, two text mining techniques for big data analysis, to four types of literature (news, papers, reports, patents) collected with National Core Technology keywords in the display industry. The results showed that applying LDA topic modeling to patent data yields results highly relevant to National Core Technology. Important technologies related to the upstream and downstream industries of displays, including OLEDs and microLEDs, were identified, and the results were visualized as networks to clarify the scope of important technologies associated with National Core Technology. Through this study, we have reduced the ambiguity in the scope of associated technologies and worked around the limited information disclosure that characterizes National Core Technology.
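The TF-IDF weighting mentioned above can be sketched as follows. This uses one common variant (relative term frequency times the natural log of inverse document frequency), which is an assumption because the abstract does not state which formula was applied. The three tiny "documents" are hypothetical stand-ins for the literature types.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Map each document name to a {term: tf-idf score} dictionary."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per document
    scores = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        scores[name] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores

# Hypothetical tokenised documents, one per literature type.
docs = {
    "patent": ["oled", "panel", "oled", "encapsulation"],
    "news":   ["oled", "market", "panel"],
    "report": ["market", "policy", "oled"],
}
scores = tf_idf(docs)
top_patent_term = max(scores["patent"], key=scores["patent"].get)
```

A term that occurs in every document, such as 'oled' here, gets a score of 0, so the ranking favors terms that are distinctive for one literature type.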

Exploring Issues Related to the Metaverse from the Educational Perspective Using Text Mining Techniques - Focusing on News Big Data (텍스트마이닝 기법을 활용한 교육관점에서의 메타버스 관련 이슈 탐색 - 뉴스 빅데이터를 중심으로)

  • Park, Ju-Yeon;Jeong, Do-Heon
    • Journal of Industrial Convergence / v.20 no.6 / pp.27-35 / 2022
  • The purpose of this study is to analyze metaverse-related issues in news big data from an educational perspective, explore their characteristics, and provide implications for the educational applicability of the metaverse and for future education. To this end, 41,366 cases of metaverse-related data retrieved from portal sites were collected. The weights of all extracted keywords were calculated and ranked using TF-IDF, a representative term weighting model, and a word cloud visualization analysis was then performed. In addition, major topics were analyzed using topic modeling (LDA), a probability-based text mining technique. As a result, topics such as platform industry, future talent, and extension in technology were derived as core issues of the metaverse from an educational perspective. A secondary data analysis under three key themes (technology, job, and education) further found that the metaverse raises issues related to education platform innovation, future job innovation, and future competency innovation in future education. This study is meaningful in that it analyzes a vast amount of news big data in stages to draw out issues from an educational perspective and provide implications for future education.

A Study on the Trends in the Studies on Marine Spatial Planning: Focusing on Topic Modeling (해양공간계획 연구동향 분석 연구: 토픽 모델링을 중심으로)

  • Hwang, Kyu Won;Jang, Ah Reum;Lee, Moon Suk
    • Journal of the Korean Society of Marine Environment & Safety / v.27 no.7 / pp.954-966 / 2021
  • In marine spatial planning around the world, marine spaces are managed by integrating various uses and by establishing systems and laws from the perspective of space utilization. From the perspective of policy establishment, this study applies the policy readiness level to analyze trends in studies on South Korea's marine spatial plans. The scope of the study covered articles published from 2010 to 2020 with 'marine spatial plan' as a keyword. The analysis methods included word frequency, word clouds, and appearance intensity, which were used to identify key issues. Five keywords related to the topics were identified and were in turn used to identify the key themes. The core themes changed across all phases, such as the principles development phase, the institutionalization phase, and the policy verification phase. Going forward, more research is needed in South Korean public organizations and universities.

Analysis of Changes in Restaurant Attributes According to the Spread of Infectious Diseases: Application of Text Mining Techniques (감염병 확산에 따른 레스토랑 선택속성 변화 분석: 텍스트마이닝 기법 적용)

  • Joonil Yoo;Eunji Lee;Chulmo Koo
    • Information Systems Review / v.25 no.4 / pp.89-112 / 2023
  • In March 2020, when COVID-19 was declared a pandemic, various quarantine measures were taken. As a result, many changes occurred in the tourism and hospitality industries. In the restaurant industry in particular, quarantine guidelines such as non-face-to-face services and social distancing were implemented. For decades, research on restaurant attributes has emphasized the importance of three attributes: atmosphere, service quality, and food quality. Nevertheless, to the best of our knowledge, research on restaurant attributes that considers the COVID-19 situation is insufficient. To fill this gap, this study took an exploratory approach to classifying new restaurant attributes based on an understanding of environmental changes. The unit of analysis was 31,115 online reviews registered on Naverplace for 475 general restaurants located in Euljiro, Seoul. We classified restaurant attributes by clustering words in the online reviews using TF-IDF and LDA topic modeling. The analysis derived "prevention of infectious diseases" as a new restaurant attribute in the context of COVID-19, alongside atmosphere, service quality, and food quality. This study has academic significance in that it extends the literature on restaurant attributes by categorizing the three established attributes and presenting a new one. Moreover, the results led to practical recommendations that consider both the operational aspects of restaurants and policy implications.