• Title/Abstract/Keywords: Term frequency-inverse document frequency

Search results: 91 items; processing time: 0.029 seconds

A Study on the Development of Search Algorithm for Identifying the Similar and Redundant Research

  • 박동진;최기석;이명선;이상태
    • 한국콘텐츠학회논문지 / Vol. 9, No. 11 / pp. 54-62 / 2009
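  • To ensure efficient investment, both national bodies and individual research institutes search databases for redundant or similar projects when selecting research projects. Recent advances in Boolean keyword-matching search algorithms, and the search engines built on them, have greatly improved search accuracy, but searching with the limited number of keywords a user enters makes it difficult to identify similar projects and to rank them. This study proposes a document-level search algorithm that analyzes the document of a proposed project, extracts a large number of index terms, assigns weights to them, and compares the result with existing documents to find similar projects. Specifically, it adopts TFIDF (Term Frequency-Inverse Document Frequency), a type of vector-space retrieval model, as its basic structure. The algorithm also incorporates feature weighting suited to the structure of research proposal documents and a K-nearest neighbors (KNN) technique to improve search speed. For the experiments, existing reports with the same structure as actual research proposals were used: 1,000 research reports in four fields, already classified by NDSL, the science and technology information portal operated by KISTI, were sampled.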
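
The document-level retrieval idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' algorithm: it omits their feature weighting, ranks neighbors by brute force rather than with any speed optimization, and uses a hypothetical four-document toy corpus. The function names `tfidf_vectors`, `cosine`, and `top_k_similar` are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / df[term]) for term in df}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (tf[t] / len(doc)) * idf[t] for t in tf})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_similar(query_index, vectors, k):
    """Indices of the k documents most similar to the query document."""
    scores = [
        (cosine(vectors[query_index], v), i)
        for i, v in enumerate(vectors)
        if i != query_index
    ]
    return [i for _, i in sorted(scores, reverse=True)[:k]]


# Hypothetical toy corpus standing in for research proposal documents.
corpus = [
    "solar cell efficiency thin film".split(),
    "thin film solar cell coating".split(),
    "protein folding simulation".split(),
    "gene expression protein analysis".split(),
]
vectors = tfidf_vectors(corpus)
nearest = top_k_similar(0, vectors, k=1)
```

Here document 0 is treated as the newly proposed project, and `nearest` holds the index of the existing document that shares the most heavily weighted terms with it.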

Incorporating Time Constraints into a Recommender System for Museum Visitors

  • Kovavisaruch, La-or;Sanpechuda, Taweesak;Chinda, Krisada;Wongsatho, Thitipong;Wisadsud, Sodsai;Chaiwongyen, Anuwat
    • Journal of information and communication convergence engineering / Vol. 18, No. 2 / pp. 123-131 / 2020
  • After observing that most tourists plan to complete their visits to multiple cultural heritage sites within one day, we surmised that for many museum visitors, the foremost concern is how much time to spend at each location and how to maximize their enjoyment at a site while still balancing their travel itinerary. Recommendation systems in e-commerce are built on knowledge about the users' previous purchasing history; recommendation systems for museums, on the other hand, do not have an equivalent data source available. Recent solutions have incorporated advanced technologies such as algorithms that rely on social filtering, which builds recommendations from the nearest identified similar user. Our paper proposes a different approach that provides dynamic recommendations using social filtering as well as content-based filtering with term frequency-inverse document frequency. The main challenge is to overcome the cold start, in which no information is available on new users entering the system, and thus there is no strong background information for generating a recommendation. In these cases, our solution uses statistical methods to create a recommendation, which can then be used to gather data for future iterations. We are currently running a pilot test at Chao Samphraya National Museum and have received positive feedback to date on the implementation.

Research on Designing Korean Emotional Dictionary using Intelligent Natural Language Crawling System in SNS

  • 이종화
    • 한국정보시스템학회지:정보시스템연구 / Vol. 29, No. 3 / pp. 237-251 / 2020
  • Purpose: This research studied a hierarchical Hangul emotion index by organizing the emotions that SNS users have in mind. As a preliminary study, Plutchik's (1980) English-based emotional standard was reinterpreted in Korean, and hashtags with implicit meaning on SNS were studied. To build a multidimensional emotion dictionary and classify three-dimensional emotions, an emotion seed was selected for each of seven emotion sets, and an emotion word dictionary was constructed by collecting SNS hashtags derived from each emotion seed. We also explore the priority of each Hangul emotion index. Design/methodology/approach: When vectorizing the words of each sentence into a matrix, weights were extracted using TF-IDF (Term Frequency-Inverse Document Frequency), and the NMF (Nonnegative Matrix Factorization) algorithm was used to reduce the dimensionality of the matrix within each emotion set. The emotional dimension was determined using the characteristic value of each emotion word. Cosine distance was used to measure the distance between vectors and thus the similarity of emotion words within an emotion set. Findings: Customer needs analysis depends on the ability to read changes in emotion, and research on Korean emotion words addresses that need. In addition, the ranking of the emotion words within an emotion set can serve as a criterion for reading the depth of an emotion. This sentiment index study suggests that providing companies with effective information for emotional marketing will expand new business opportunities and increase their value. In addition, if the emotion dictionary is eventually connected to the emotional DNA of the product, it will be possible to define the "emotional DNA", which is a set of emotions that the product should have.

Properties of the Twenty-seven Pulses in DongUiBoGam Based on the Eight Important Pulses

  • 이태형;정원모;고병호;박히준;김남일;채윤병
    • Korean Journal of Acupuncture / Vol. 32, No. 4 / pp. 151-159 / 2015
  • Objectives: Pulse diagnosis is considered particularly important among the several methods of diagnosis in DongUiBoGam. In spite of its importance, the many varied pulse descriptions make it difficult to learn and practice pulse diagnosis. In this article, we analyze the properties of the twenty-seven pulses using pulse diagnosis cases from DongUiBoGam to enable a practical understanding of pulse diagnosis. Methods: We constructed four axes according to the eight important pulses, and analyzed the properties of the twenty-seven pulses through the relationship between the four pairs of important pulses and the twenty-seven pulses. To quantify the relevance of the important pulses to the twenty-seven pulses, we used the term frequency-inverse document frequency (TF-IDF) method. Results: We could elicit the properties of the twenty-seven pulses according to the four axes. We also reexamined the categorization of the seven exterior pulses and the eight interior pulses, and of the similar pulses, from DongUiBoGam in light of the analysis results. Conclusions: The eight important pulses allowed us to understand the properties of the twenty-seven pulses more specifically and to see the relationships among the twenty-seven pulses on each axis. However, the limitation arising from the insufficient number of pulse diagnosis cases in this research calls for further research with more sources, such as other traditional medical records or present-day clinical records.

A Structural Analysis of Acupuncture & Moxibustion Points in the NaeGyeong Chapter of DongUiBoGam Using Text Mining

  • 이태형;정원모;이인선;이혜정;김남일;채윤병
    • Korean Journal of Acupuncture / Vol. 30, No. 4 / pp. 230-242 / 2013
  • Objectives: DongUiBoGam is a representative medical text of Korea. This research aims to grasp structurally how DongUiBoGam understands the human body and to review the methods of acupuncture and moxibustion in its NaeGyeong chapter using text mining. Methods: The structure of DongUiBoGam was analyzed using the specific parts of the book that describe its contents, the major premises for understanding the human body, and the processes of treatment. We analyzed the characteristics of each acupoint in relation to causes of diseases & symptoms in the NaeGyeong chapter using Term Frequency-Inverse Document Frequency (TFIDF). Results: Three different categories of pattern identification (PI) were formed after the structural analysis of DongUiBoGam. All causes of diseases & symptoms were transformed according to the three categories of PI. After analyzing the relationship between acupoints and causes of diseases & symptoms, 114 acupoints were visualized with the TFIDF values of the three PI categories. Conclusions: The selection of acupoints in the NaeGyeong chapter of DongUiBoGam was linked to causes of diseases & symptoms based on the three PI categories. Visualizing the bipartite relationships between acupoints and causes of diseases & symptoms made the characteristics of each acupoint easy to understand.

A Feasibility Study on Adopting Individual Information Cognitive Processing as Criteria of Categorization on Apple iTunes Store

  • Zhang, Chao;Wan, Lili
    • 한국정보시스템학회지:정보시스템연구 / Vol. 27, No. 2 / pp. 1-28 / 2018
  • Purpose: More than 7.6 million mobile apps have been approved across the Apple iTunes Store and Google Play. To manage these existing apps, Apple Inc. established twenty-four primary categories, and Google Play had thirty-three. However, both categorizations have shown more and more problems in managing and classifying numerous apps, such as miscategorized apps, cross-attribution problems, and the lack of a categorization keyword index. This study introduces individual information cognitive processing as the classification criterion for updating the current categorization on the Apple iTunes Store, and observes the effectiveness of the new criterion through a classification process on the Apple iTunes Store. Design/Methodology/Approach: A research approach with four stages was carried out, and a series of mixed methods was developed to assess the feasibility of adopting individual information cognitive processing as the categorization criterion. Keyword lists were extracted using machine-learning techniques with Term Frequency-Inverse Document Frequency and Singular Value Decomposition. Using prior research results on the categorization of car apps, we developed individual information cognitive processing. A further keyword extraction process was then performed on the extracted keyword lists. Findings: Using TF-IDF and SVD, keyword lists were extracted from more than five thousand apps. Furthermore, we developed individual information cognitive processing that included a categorization teaching process and a learning process. The top three keywords for each category were extracted. Comparing the extracted results with prior studies, the inter-rater reliability between the two methods was significant, which indicates that individual information cognitive processing is reliable as a criterion of categorization on the Apple iTunes Store. Suggestions for updating the Apple iTunes Store are discussed, and the results may help app store hosts improve their current categorizations and make app discovery and location more efficient for both app developers and users.

An Analysis of IoT Service using Sentiment Analysis on Online Reviews: Focusing on the Characteristics of Service Providers

  • 류민호;조호수
    • 한국산업정보학회논문지 / Vol. 25, No. 5 / pp. 91-102 / 2020
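  • The Internet of Things (IoT) is a field in which diverse providers compete for the same market, and product functions and performance differ according to the main business area and characteristics of the service providers. This paper uses sentiment analysis to evaluate whether satisfaction with a service varies with the characteristics of its provider. To this end, 34,310 reviews of 41 applications in three areas (smart home, AI speaker, and smart car) were collected from domestic and foreign IoT-related services registered on the Google Play Store, and word importance analysis and sentiment analysis were performed. The reviews were analyzed along several dimensions, including keyword importance, service, the provider's existing business area, and domestic versus foreign providers. The analysis found that users' overall evaluation of IoT services was low, although smart home services were rated relatively higher than the others. Manufacturing-based providers and foreign providers had higher satisfaction than providers with other characteristics. The results offer implications for individual companies seeking to identify service improvements suited to their characteristics and to strengthen product competitiveness.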

Analysis on the Trend of The Journal of Information Systems Using TLS Mining

  • 윤지혜;오창규;이종화
    • 한국정보시스템학회지:정보시스템연구 / Vol. 31, No. 1 / pp. 289-304 / 2022
  • Purpose: The development of the network and mobile industries has induced companies to invest in information systems, leading a new industrial revolution. The Journal of Information Systems (JIS), which developed the information system field into a theoretical and practical study in the 1990s, retains a 30-year history of information systems. This study aims to identify the academic value and research trends of JIS by analyzing those trends. Design/methodology/approach: This study analyzes the trend of JIS by combining several methods, named TLS mining analysis. TLS mining analysis consists of a series of analyses: the Term Frequency-Inverse Document Frequency (TF-IDF) weight model, Latent Dirichlet Allocation (LDA) topic modeling, and text mining with Semantic Network Analysis. First, keywords are extracted from the research data using the TF-IDF weight model; topic modeling is then performed using the LDA algorithm to identify issue keywords. Findings: The study used the summary service for published research papers provided by the Korea Citation Index to analyze JIS. The 714 papers published from 2002 to 2021 were divided into two periods: 2002-2011 and 2012-2021. In the first period (2002-2011), research in the information system field focused on E-business strategies, as most companies adopted online business models. In the second period (2012-2021), data-based information technology and new industrial revolution technologies such as artificial intelligence, SNS, and mobile were the main research issues. In addition, keywords for improving the JIS citation index were presented.

A study on the current status of DIY clothing products related to fabric using text mining

  • 이은혜;이하은;최정욱
    • 한국의상디자인학회지 / Vol. 25, No. 2 / pp. 111-122 / 2023
  • This study aims to collect Big Data related to DIY clothing, analyze the results on a year-by-year basis, and understand consumers' perceptions and the status and reality of DIY clothing. The reference period for the evaluation of DIY clothing trends was set from 2012 to 2022. The data were collected and analyzed using Textom, a Big Data solution program certified as Good Software by the Telecommunications Technology Association (TTA). For the analysis of fabric-related DIY products, the keyword was set to "DIY clothing", and the "Espresso K" module was used for data cleansing after collection. Collecting data year by year produced a total of 11 lists, and the collected data were analyzed by period. The findings are as follows. The total number of keywords collected on the search engines "Naver" and "Google" between January 1, 2012 and December 31, 2022 was 16,315, and data trends by period indicate a continuous upward trend. In addition, a keyword analysis was conducted using TF-IDF (Term Frequency-Inverse Document Frequency), a statistical measure that reflects the importance of a word within the data, together with N-gram analysis, which examines the relationships between words. Using these results, it was possible to evaluate the popularity and growth of DIY clothing products in conjunction with the evolving social environment, as well as consumers' desire to explore DIY trends. This study is therefore valuable in that it provides preliminary data for DIY clothing research by analyzing the status and reality of DIY products, and it contributes to the development and production of DIY clothing.

Media-based Analysis of Gasoline Inventory with Korean Text Summarization

  • 윤성연;박민서
    • 문화기술의 융합 / Vol. 9, No. 5 / pp. 509-515 / 2023
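  • Despite continued national efforts to develop alternative energy, the use of petroleum products keeps increasing. In particular, the price of gasoline, a representative petroleum product, fluctuates sharply with international oil prices. Gas stations adjust their gasoline inventory in response to price changes. It is therefore necessary to analyze the main factors behind changes in gasoline inventory in order to understand overall gasoline consumption behavior. This study uses news articles to identify the factors that affect changes in gas stations' gasoline inventory. First, gasoline-related articles are collected automatically by web crawling. Second, the collected articles are summarized with the KoBART (Korean Bidirectional and Auto-Regressive Transformers) text summarization model. Third, the summaries are preprocessed, and the main factors are derived at the word and phrase level using an N-gram language model and TF-IDF (Term Frequency-Inverse Document Frequency). This study makes it possible to identify and predict gasoline consumption patterns.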
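
The third step of the abstract above, ranking words and phrases by TF-IDF over n-grams, might look roughly like the following sketch. It assumes the summaries are already tokenized and skips the crawling and KoBART stages entirely. The helper names `ngrams` and `top_terms` and the English toy summaries are hypothetical stand-ins, not the authors' data or code.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_terms(docs, n, k):
    """For each document, return its k highest-scoring n-grams by TF-IDF."""
    grams = [ngrams(doc, n) for doc in docs]
    # Document frequency of each n-gram across the corpus.
    df = Counter(g for doc in grams for g in set(doc))
    total = len(docs)
    ranked = []
    for doc in grams:
        tf = Counter(doc)
        scores = {
            g: (count / len(doc)) * math.log(total / df[g])
            for g, count in tf.items()
        }
        # Sort by descending score; break ties alphabetically for stable output.
        ranked.append(sorted(scores, key=lambda g: (-scores[g], g))[:k])
    return ranked


# Hypothetical tokenized summaries standing in for summarized news articles.
summaries = [
    "oil price rise cuts gasoline inventory".split(),
    "oil price fall lifts gasoline inventory".split(),
    "fuel tax cut extended".split(),
]
unigrams = top_terms(summaries, n=1, k=2)  # word-level factors
bigrams = top_terms(summaries, n=2, k=2)   # phrase-level factors
```

Terms that appear in most summaries, such as "oil price" here, receive a low inverse document frequency and drop out of the top ranks, so the n-grams that distinguish one summary from the others surface first.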