• Title/Abstract/Keywords: Term Frequency-Inverse Document Frequency


Analysis of Unstructured Data on Detecting of New Drug Indication of Atorvastatin

  • 정휘수;강길원;최웅;박종혁;신광수;서영성
    • Journal of health informatics and statistics / Vol. 43, No. 4 / pp. 329-335 / 2018
  • Objectives: In recent years, there has been a growing need for a way to extract desired information from many medical publications at once. This study was conducted to confirm the usefulness of unstructured data analysis of previously published medical literature in searching for new indications. Methods: New indications were searched for through text mining, network analysis, and topic modeling of 5,057 articles published from 1990 to 2017 on atorvastatin, a treatment for hyperlipidemia. Results: A total of 273 keywords were extracted. In the frequency-based text mining and network analysis, the existing indications of atorvastatin ranked at the top. The novel indications identified by Term Frequency-Inverse Document Frequency (TF-IDF) were atrial fibrillation, heart failure, breast cancer, rheumatoid arthritis, combined hyperlipidemia, arrhythmias, multiple sclerosis, non-alcoholic fatty liver disease, contrast-induced acute kidney injury, and prostate cancer. Conclusions: Unstructured data analysis for discovering new indications from the large body of medical literature is expected to be used in the drug repositioning industry.
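
For reference, one common textbook form of the TF-IDF weight is sketched below. The abstract does not state which variant the authors used, so the symbols ($t$ for a term, $d$ for a document, $N$ for the number of documents in the collection) and the unsmoothed logarithm are assumptions for illustration.

```latex
\[
\mathrm{tfidf}(t, d) \;=\; \mathrm{tf}(t, d)\,\cdot\,\log\frac{N}{\mathrm{df}(t)}
\]
% tf(t, d): number of times term t occurs in document d
% df(t):    number of documents that contain term t
```

Under this form, a term that appears in every document has $\mathrm{df}(t) = N$ and therefore weight 0, which is why well-known existing indications can dominate raw frequency counts while rarer candidates surface under TF-IDF.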

Clustering Meta Information of K-Pop Girl Groups Using Term Frequency-inverse Document Frequency Vectorization

  • 현준서;조재혁
    • Journal of Platform Technology / Vol. 11, No. 3 / pp. 12-23 / 2023
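  • In the K-Pop market of the 2020s, girl groups have drawn more attention than boy groups, and the fourth generation more than the third. To examine whether a generational shift among girl groups has begun, this paper presents a method and results for clustering song lyrics. Meta information on 1,469 songs by 47 groups released from 2013 to 2022 was collected, divided into lyric information and non-lyric meta information, and each part was converted to numeric form. Following prior work, the lyric information was preprocessed by applying term frequency-inverse document frequency vectorization and then keeping only the top vector values. The non-lyric meta information was preprocessed with One-Hot Encoding and added in order to reduce the bias of using lyric information alone and to obtain better clustering results. On the preprocessed data, Spherical K-Means scored 129% higher on the Silhouette Coefficient and 45% higher on the Calinski-Harabasz Score than Hierarchical Clustering. This study is expected to contribute to the history of Korean popular music and to research on the analysis and clustering of girl-group lyrics.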
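
The clustering step can be illustrated with a minimal pure-Python sketch of spherical k-means, which normalizes every vector to unit length and assigns points by cosine similarity. This is not the authors' implementation: the toy rows (three TF-IDF-like values followed by a two-column one-hot field), the fixed initial centroids, and the function names are all hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def spherical_kmeans(vectors, init_centroids, n_iter=10):
    """Cluster unit-normalized vectors by cosine similarity."""
    points = [normalize(v) for v in vectors]
    centroids = [normalize(c) for c in init_centroids]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment: for unit vectors, the dot product equals cosine similarity.
        labels = [
            max(range(len(centroids)), key=lambda j: dot(p, centroids[j]))
            for p in points
        ]
        # Update: average the members, then project back onto the unit sphere.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[j] = normalize(mean)
    return labels, centroids

# Hypothetical rows: [tfidf_1, tfidf_2, tfidf_3, onehot_a, onehot_b]
songs = [
    [0.9, 0.1, 0.0, 1, 0],
    [0.8, 0.2, 0.0, 1, 0],
    [0.0, 0.1, 0.9, 0, 1],
    [0.1, 0.0, 0.8, 0, 1],
]
labels, centroids = spherical_kmeans(songs, [songs[0], songs[2]])
```

Fixing the initial centroids keeps the sketch deterministic; a real run would typically use several random initializations and compare them with a score such as the Silhouette Coefficient.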


Analysis of online parenting community posts on expanded newborn screening for metabolic disorders using topic modeling: a quantitative content analysis

  • 이명선;정현숙;김진선
    • 여성건강간호학회지 / Vol. 29, No. 1 / pp. 20-31 / 2023
  • Purpose: As more newborns have received expanded newborn screening (NBS) for metabolic disorders, the overall number of false-positive results has increased. The purpose of this study was to explore the psychological impacts experienced by mothers related to the NBS process. Methods: An online parenting community in Korea was selected, and questions regarding NBS were collected using web crawling for the period from October 2018 to August 2021. In total, 634 posts were analyzed. The collected unstructured text data were preprocessed, and keyword analysis, topic modeling, and visualization were performed. Results: Of 1,057 words extracted from posts, the top keyword based on 'term frequency-inverse document frequency' values was "hypothyroidism," followed by "discharge," "close examination," "thyroid-stimulating hormone levels," and "jaundice." The top keyword based on the simple frequency of appearance was "XXX hospital," followed by "close examination," "discharge," "breastfeeding," "hypothyroidism," and "professor." As a result of LDA topic modeling, posts related to inborn errors of metabolism (IEMs) were classified into four main themes: "confirmatory tests of IEMs," "mother and newborn with thyroid function problems," "retests of IEMs," and "feeding related to IEMs." Mothers experienced substantial frustration, stress, and anxiety when they received positive NBS results. Conclusion: The online parenting community played an important role in acquiring and sharing information, as well as in providing psychological support related to NBS, among mothers of newborns. Nurses can use this study's findings to develop timely and evidence-based information for parents whose children receive positive NBS results to reduce the negative psychological impact.
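
The gap between the frequency-based and TF-IDF-based rankings reported above can be reproduced on a toy example. The sketch below is an illustration only: the four tokenized posts are hypothetical, and summing `tf * log(N / df)` over documents is one assumed way of aggregating TF-IDF per keyword, since the abstract does not give the exact formula.

```python
import math
from collections import Counter

# Hypothetical tokenized posts; not data from the study.
posts = [
    ["hospital", "retest", "hypothyroidism"],
    ["hospital", "discharge", "jaundice"],
    ["hospital", "hypothyroidism", "hypothyroidism"],
    ["hospital", "breastfeeding"],
]

def raw_frequency(docs):
    """Total number of occurrences of each term across all documents."""
    return Counter(term for doc in docs for term in doc)

def tfidf_totals(docs):
    """Per-term sum over documents of tf(t, d) * log(N / df(t))."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    totals = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            totals[term] += count * math.log(n_docs / df[term])
    return totals

top_by_frequency = raw_frequency(posts).most_common(1)[0][0]
top_by_tfidf = tfidf_totals(posts).most_common(1)[0][0]
print(top_by_frequency, top_by_tfidf)
```

Here "hospital" is the most frequent term but appears in every post, so its inverse document frequency is `log(4 / 4) = 0` and it drops out of the TF-IDF ranking, while "hypothyroidism" rises to the top.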

A Study on the Development of Search Algorithm for Identifying the Similar and Redundant Research

  • 박동진;최기석;이명선;이상태
    • 한국콘텐츠학회논문지 / Vol. 9, No. 11 / pp. 54-62 / 2009
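  • At the national level and within individual research institutions, the project selection process includes searching a database for duplicate or similar projects in order to ensure efficient investment. Recent advances in Boolean keyword-matching search algorithms, and the search engines built on them, have greatly improved search accuracy, but searches driven by the limited number of keywords a user enters make it difficult to identify similar projects and to rank them. This study proposes a document-level search algorithm that analyzes the document of a proposed project to extract a large number of index terms, assigns weights to them, and compares them with existing documents to find similar projects. Specifically, it adopts TFIDF (Term Frequency Inverse Document Frequency), a type of Vector-Space Retrieval model, as its basic structure. The algorithm also reflects feature weighting suited to the structure of research proposal documents and incorporates the K-Nearest Neighbors (KNN) technique to improve search speed. For the experiments, existing reports with the same structure as actual research proposals were used: 1,000 research reports in four fields, already classified by NDSL, the science and technology information portal service operated by KISTI, were sampled.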
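
The document-level retrieval idea described above can be sketched as follows. This is a minimal illustration, not the authors' algorithm: it omits their feature weighting, uses a brute-force top-k scan rather than a speed-oriented KNN scheme, and assumes a smoothed IDF; the function names and the toy token lists are hypothetical.

```python
import math
from collections import Counter

def build_idf(corpus):
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1, so no term gets weight 0."""
    n_docs = len(corpus)
    df = Counter(term for doc in corpus for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector {term: weight}; terms unseen in the corpus are dropped."""
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def k_most_similar(query_tokens, corpus, k=2):
    """Return (document index, similarity) for the k most similar documents."""
    idf = build_idf(corpus)
    query = tfidf_vector(query_tokens, idf)
    scored = [(cosine(query, tfidf_vector(doc, idf)), i) for i, doc in enumerate(corpus)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest similarity first
    return [(i, score) for score, i in scored[:k]]

# Hypothetical index terms extracted from existing reports and a new proposal.
reports = [
    ["battery", "electrode", "lithium"],
    ["protein", "folding", "simulation"],
    ["lithium", "battery", "recycling"],
    ["traffic", "simulation", "network"],
]
proposal = ["lithium", "battery", "electrode", "coating"]
nearest = k_most_similar(proposal, reports, k=2)
```

In this toy run the proposal is ranked closest to reports 0 and 2, the two that share its battery-related terms. Attribute-level weighting could be approximated by multiplying term counts by a per-section factor before building the vectors, though the weights the study used are not given in the abstract.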

Incorporating Time Constraints into a Recommender System for Museum Visitors

  • Kovavisaruch, La-or;Sanpechuda, Taweesak;Chinda, Krisada;Wongsatho, Thitipong;Wisadsud, Sodsai;Chaiwongyen, Anuwat
    • Journal of information and communication convergence engineering / Vol. 18, No. 2 / pp. 123-131 / 2020
  • After observing that most tourists plan to complete their visits to multiple cultural heritage sites within one day, we surmised that for many museum visitors the foremost concern is how much time to spend at each location and how to maximize their enjoyment of a site while still keeping to their travel itinerary. Recommendation systems in e-commerce are built on knowledge about the users' previous purchasing history; recommendation systems for museums, on the other hand, do not have an equivalent data source available. Recent solutions have incorporated advanced technologies such as algorithms that rely on social filtering, which builds recommendations from the nearest identified similar user. Our paper proposes a different approach that provides dynamic recommendations using social filtering as well as content-based filtering with term frequency-inverse document frequency. The main challenge is to overcome a cold start, whereby no information is available on new users entering the system, and thus there is no strong background information for generating the recommendation. In these cases, our solution deploys statistical methods to create a recommendation, which can then be used to gather data for future iterations. We are currently running a pilot test at Chao Samphraya national museum and have received positive feedback to date on the implementation.

Research on Designing Korean Emotional Dictionary using Intelligent Natural Language Crawling System in SNS

  • 이종화
    • 한국정보시스템학회지:정보시스템연구 / Vol. 29, No. 3 / pp. 237-251 / 2020
  • Purpose: This research studied a hierarchical Hangul emotion index by organizing the emotions that SNS users have in mind. In a preliminary study, the researcher reinterpreted Plutchik's (1980) English-based emotional standard in Korean and studied hashtags with implicit meaning on SNS. To build a multidimensional emotion dictionary and classify three-dimensional emotions, emotion seeds were selected to compose seven emotion sets, and an emotion word dictionary was constructed by collecting the SNS hashtags derived from each seed. The study also explores the priority of each Hangul emotion index. Design/methodology/approach: The words making up each sentence were vectorized into a matrix, with weights extracted using TF-IDF (Term Frequency Inverse Document Frequency), and the matrix within each emotion set was reduced in dimension using the NMF (Nonnegative Matrix Factorization) algorithm. The emotional dimension was resolved using the characteristic values of the emotion words. The cosine distance between vectors was used to measure the similarity of emotion words within an emotion set. Findings: Customer needs analysis relies on the ability to read changes in emotion, and research on Korean emotion words addresses that need. In addition, the ranking of emotion words within an emotion set can serve as a criterion for reading the depth of an emotion. The sentiment index of this study is expected to give companies effective information for emotional marketing and thereby expand and add value to new business opportunities. If the emotion dictionary is eventually connected to the emotional DNA of a product, it will be possible to define that "emotional DNA", the set of emotions the product should have.

Properties of the Twenty-seven Pulses in DongUiBoGam Based on the Eight Important Pulses

  • 이태형;정원모;고병호;박히준;김남일;채윤병
    • Korean Journal of Acupuncture / Vol. 32, No. 4 / pp. 151-159 / 2015
  • Objectives: Pulse diagnosis is considered particularly important among the methods of diagnosis in DongUiBoGam. Despite its importance, the large number and variety of pulse descriptions make pulse diagnosis difficult to learn and practice. In this article, we analyzed the properties of the twenty-seven pulses using pulse diagnosis cases from DongUiBoGam to enable a practical understanding of pulse diagnosis. Methods: We constructed four axes from the eight important pulses and analyzed the properties of the twenty-seven pulses through the relationship between the four pairs of important pulses and the twenty-seven pulses. To quantify the relevance of the important pulses to the twenty-seven pulses, we used the term frequency-inverse document frequency (TF-IDF) method. Results: We could derive the properties of the twenty-seven pulses along the four axes. Using the analysis results, we also reexamined the DongUiBoGam categorization into the seven exterior pulses and the eight interior pulses, as well as the similar pulses. Conclusions: The eight important pulses allowed us to understand the properties of the twenty-seven pulses more specifically and to see the relationships among the twenty-seven pulses on each axis. However, because the number of pulse diagnosis cases in this study was insufficient, further research is needed with more sources, such as other traditional medical records or present-day clinical records.

A Structural Analysis of Acupuncture & Moxibustion Points in the NaeGyeong Chapter of DongUiBoGam Using Text Mining

  • 이태형;정원모;이인선;이혜정;김남일;채윤병
    • Korean Journal of Acupuncture / Vol. 30, No. 4 / pp. 230-242 / 2013
  • Objectives: DongUiBoGam is a representative work of medical literature in Korea. This research aims to grasp structurally how DongUiBoGam understands the human body and to review the methods of acupuncture and moxibustion in its NaeGyeong chapter using text mining. Methods: The structure of DongUiBoGam was analyzed using the parts of the book that describe its contents, the major premises for understanding the human body, and the processes of treatment. We analyzed the characteristics of each acupoint in relation to the causes of diseases and symptoms in the NaeGyeong chapter using Term Frequency-Inverse Document Frequency (TFIDF). Results: Three categories of pattern identification (PI) were formed from the structural analysis of DongUiBoGam. All causes of diseases and symptoms were transformed according to the three PI categories. After analyzing the relationship between acupoints and the causes of diseases and symptoms, 114 acupoints were visualized with their TFIDF values for the three PI categories. Conclusions: The selection of acupoints in the NaeGyeong chapter of DongUiBoGam was linked to the causes of diseases and symptoms based on the three PI categories. Visualizing the bipartite relationships between acupoints and the causes of diseases and symptoms made the characteristics of each acupoint easy to understand.

A Feasibility Study on Adopting Individual Information Cognitive Processing as Criteria of Categorization on Apple iTunes Store

  • Zhang, Chao;Wan, Lili
    • 한국정보시스템학회지:정보시스템연구 / Vol. 27, No. 2 / pp. 1-28 / 2018
  • Purpose: More than 7.6 million mobile apps have been approved on the Apple iTunes Store and Google Play. To manage these apps, Apple Inc. established twenty-four primary categories, and Google Play thirty-three. However, both categorizations show a growing number of problems in managing and classifying so many apps, such as miscategorized apps, cross-attribution problems, and the lack of a categorization keyword index. This study introduces individual information cognitive processing as the classification criterion for updating the current categorization on the Apple iTunes Store, and observes the effectiveness of the new criterion in a classification process on the Apple iTunes Store. Design/Methodology/Approach: A research approach with four stages was followed, and a series of mixed methods was developed to assess the feasibility of adopting individual information cognitive processing as a categorization criterion. Keyword lists were extracted using machine-learning techniques with Term Frequency-Inverse Document Frequency and Singular Value Decomposition. Using prior research results on the categorization of car apps, we developed individual information cognitive processing. A further keyword extraction step was then applied to the extracted keyword lists. Findings: Using TF-IDF and SVD, keyword lists were extracted from more than five thousand apps. Furthermore, we developed individual information cognitive processing that included a categorization teaching process and a learning process. The top three keywords for each category were extracted. Comparing the extracted results with prior studies, the inter-rater reliability between the two methods was significant, which supports individual information cognitive processing as a reliable criterion for categorization on the Apple iTunes Store. Suggestions for updating the Apple iTunes Store are discussed in this paper, and the results may help app store hosts improve the current categorizations on app stores and increase the efficiency of app discovery and location for both app developers and users.

An Analysis of IoT Service using Sentiment Analysis on Online Reviews: Focusing on the Characteristics of Service Providers

  • 류민호;조호수
    • 한국산업정보학회논문지 / Vol. 25, No. 5 / pp. 91-102 / 2020
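  • The Internet of Things (IoT) is a field in which diverse providers compete for the same market, and product functions and performance differ according to the main business area and characteristics of the providers offering the services. This paper uses sentiment analysis to evaluate whether satisfaction with a provider's service varies with the provider's characteristics. To this end, 34,310 reviews were collected for a total of 41 applications in three areas (smart home, AI speaker, and smart car) among the domestic and foreign IoT-related services registered on the Google Play Store, and word importance analysis and sentiment analysis were performed. The reviews were analyzed along several dimensions, including keyword importance, service, the provider's existing business area, and domestic versus foreign provider. The results show that users' overall evaluation of IoT services was low, with smart home rated relatively higher than the other services. Manufacturing-based providers and foreign providers had higher satisfaction than providers with other characteristics. The results offer implications for individual companies seeking to identify service improvements suited to their characteristics and to raise product competitiveness.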