• Title/Summary/Keyword: Term frequency-inverse document frequency

Search Result 92, Processing Time 0.024 seconds

A Study on the General Public's Perceptions of Dental Fear Using Unstructured Big Data

  • Han-A Cho;Bo-Young Park
    • Journal of dental hygiene science
    • /
    • v.23 no.4
    • /
    • pp.255-263
    • /
    • 2023
  • Background: This study used text mining techniques to determine public perceptions of dental fear, extracted keywords related to dental fear, identified the connection between the keywords, and categorized and visualized perceptions related to dental fear. Methods: Keywords in texts posted on Internet portal sites (NAVER and Google) between 1 January, 2000, and 31 December, 2022, were collected. The four stages of analysis were used to explore the keywords: frequency analysis, term frequency-inverse document frequency (TF-IDF), centrality analysis and co-occurrence analysis, and convergent correlations. Results: In the top ten keywords based on frequency analysis, the most frequently used keyword was 'treatment,' followed by 'fear,' 'dental implant,' 'conscious sedation,' 'pain,' 'dental fear,' 'comfort,' 'taking medication,' 'experience,' and 'tooth.' In the TF-IDF analysis, the top three keywords were dental implant, conscious sedation, and dental fear. The co-occurrence analysis was used to explore keywords that appear together and showed that 'fear and treatment' and 'treatment and pain' appeared the most frequently. Conclusion: Texts collected via unstructured big data were analyzed to identify general perceptions related to dental fear, and this study is valuable as a source data for understanding public perceptions of dental fear by grouping associated keywords. The results of this study will be helpful to understand dental fear and used as factors affecting oral health in the future.

Classifying Sub-Categories of Apartment Defect Repair Tasks: A Machine Learning Approach (아파트 하자 보수 시설공사 세부공종 머신러닝 분류 시스템에 관한 연구)

  • Kim, Eunhye;Ji, HongGeun;Kim, Jina;Park, Eunil;Ohm, Jay Y.
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.10 no.9
    • /
    • pp.359-366
    • /
    • 2021
  • A number of construction companies in Korea invest considerable human and financial resources to construct a system for managing apartment defect data and for categorizing repair tasks. Thus, this study proposes machine learning models to automatically classify defect complaint text-data into one of the sub categories of 'finishing work' (i.e., one of the defect repair tasks). In the proposed models, we employed two word representation methods (Bag-of-words, Term Frequency-Inverse Document Frequency (TF-IDF)) and two machine learning classifiers (Support Vector Machine, Random Forest). In particular, we conducted both binary- and multi- classification tasks to classify 9 sub categories of finishing work: home appliance installation work, paperwork, painting work, plastering work, interior masonry work, plaster finishing work, indoor furniture installation work, kitchen facility installation work, and tiling work. The machine learning classifiers using the TF-IDF representation method and Random Forest classification achieved more than 90% accuracy, precision, recall, and F1 score. We shed light on the possibility of constructing automated defect classification systems based on the proposed machine learning models.

A Study on the Current Situation and Trend Analysis of The Elderly Healthcare Applications Using Big Data Analysis (텍스트마이닝을 활용한 노인 헬스케어 앱 사용 추이 및 동향 분석)

  • Byun, Hyun;Jeon, Sang-Wan;YI, Eun-Surk
    • Journal of the Korea Convergence Society
    • /
    • v.13 no.5
    • /
    • pp.313-325
    • /
    • 2022
  • The purpose of this study is to examine the changes in the elderly healthcare app market through text mining analysis and to present basic data for activating elderly healthcare apps. Data collection was conducted on Naver, Daum, blog web, and cafe. As for the research method, text mining, TF-IDF(Term frequency-inverse document frequency), emotional analysis, and semantic network analysis were conducted using Textom and Ucinet6, which are big data analysis programs. As a result of this study, a total of six categories were finally derived: resolving the healthcare app information gap, convergence healthcare technology, diffusion media, elderly healthcare app industry, social background, and content. In conclusion, in order for elderly healthcare apps to be accepted and utilized by the elderly, they must have a good diffusion infrastructure, and the effectiveness of healthcare apps must be maximized through the active introduction of convergence technology and content development that can be easily used by the elderly.

Comparative analysis of informationattributes inchemical accident response systems through Unstructured Data: Spotlighting on the OECD Guidelines for Chemical Accident Prevention, Preparedness, and Response (비정형 데이터를 이용한 화학물질 사고 대응 체계 정보속성 비교 분석 : 화학사고 예방, 대비 및 대응을 위한 OECD 지침서를 중심으로)

  • YongJin Kim;Chunghyun Do
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.4
    • /
    • pp.91-110
    • /
    • 2023
  • The importance of manuals is emphasized because chemical accidents require swift response and recovery, and often result in environmental pollution and casualties. In this regard, the OECD revised OECD Guidelines for the Prevention, Preparedness, and Response to Chemical Accidents (referred to as the OECD Guidelines), in June 2023. Moreover, while existing research primarily raises awareness about chemical accidents, highlighting the need for a system-wide response including laws, regulations, and manuals, it was difficult to find comparative research on the attributes of manuals. So, this paper aims to compare and analyze the second and third editions of the OECD Guidelines, in order to uncover the information attributes and implications of the revised version. Specifically, TF-IDF (Term Frequency-Inverse Document Frequency) was applied to understand which keywords have become more important, and Word2Vec was applied to identify keywords that were used similarly and those that were differentiated. Lastly, a 2×2 matrix was proposed, identifying the topics within each quadrant to provide a deeper comparison of the information attributes of the OECD Guidelines. This study offers a framework to help researchers understand information attributes. From a practical perspective, it appears valuable for the revision of standard manuals by domestic government agencies and corporations related to chemistry.

LSTM Android Malicious Behavior Analysis Based on Feature Weighting

  • Yang, Qing;Wang, Xiaoliang;Zheng, Jing;Ge, Wenqi;Bai, Ming;Jiang, Frank
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.6
    • /
    • pp.2188-2203
    • /
    • 2021
  • With the rapid development of mobile Internet, smart phones have been widely popularized, among which Android platform dominates. Due to it is open source, malware on the Android platform is rampant. In order to improve the efficiency of malware detection, this paper proposes deep learning Android malicious detection system based on behavior features. First of all, the detection system adopts the static analysis method to extract different types of behavior features from Android applications, and extract sensitive behavior features through Term frequency-inverse Document Frequency algorithm for each extracted behavior feature to construct detection features through unified abstract expression. Secondly, Long Short-Term Memory neural network model is established to select and learn from the extracted attributes and the learned attributes are used to detect Android malicious applications, Analysis and further optimization of the application behavior parameters, so as to build a deep learning Android malicious detection method based on feature analysis. We use different types of features to evaluate our method and compare it with various machine learning-based methods. Study shows that it outperforms most existing machine learning based approaches and detects 95.31% of the malware.

Searching Patents Effectively in terms of Keyword Distributions (키워드 분포를 고려한 효과적 특허검색기법)

  • Lee, Wookey;Song, Justin Jongsu;Kang, Michael Mingu
    • Journal of Information Technology and Architecture
    • /
    • v.9 no.3
    • /
    • pp.323-331
    • /
    • 2012
  • With the advancement of the area of knowledge and information, Intellectual Property, especially, patents have captured attention more and more emergent. The increasing need for efficient way of patent information search has been essential, but the prevailing patent search engines have included too many noises for the results due to the Boolean models. This has occasioned too much time for the professional experts to investigate the results manually. In this paper, we reveal the differences between the conventional document search and patent search and analyze the limitations of existing patent search. Furthermore, we propose a specialized in patent search, so that the relationship between the keywords within each document and their significance within each patent document search keyword can be identified. Which in turn, the keywords and the relationships have been appointed a ranking for this patent in the upper ranks and the noise in the data sub-ranked. Therefore this approach is proposed to significantly reduce noise ratio of the data from the search results. Finally, in, we demonstrate the superiority of the proposed methodology by comparing the Kipris dataset.

An Analysis of Indications of Meridians in DongUiBoGam Using Data Mining (데이터마이닝을 이용한 동의보감에서 경락의 주치특성 분석)

  • Chae, Younbyoung;Ryu, Yeonhee;Jung, Won-Mo
    • Korean Journal of Acupuncture
    • /
    • v.36 no.4
    • /
    • pp.292-299
    • /
    • 2019
  • Objectives : DongUiBoGam is one of the representative medical literatures in Korea. We used text mining methods and analyzed the characteristics of the indications of each meridian in the second chapter of DongUiBoGam, WaeHyeong, which addresses external body elements. We also visualized the relationships between the meridians and the disease sites. Methods : Using the term frequency-inverse document frequency (TF-IDF) method, we quantified values regarding the indications of each meridian according to the frequency of the occurrences of 14 meridians and 14 disease sites. The spatial patterns of the indications of each meridian were visualized on a human body template according to the TF-IDF values. Using hierarchical clustering methods, twelve meridians were clustered into four groups based on the TF-IDF distributions of each meridian. Results : TF-IDF values of each meridian showed different constellation patterns at different disease sites. The spatial patterns of the indications of each meridian were similar to the route of the corresponding meridian. Conclusions : The present study identified spatial patterns between meridians and disease sites. These findings suggest that the constellations of the indications of meridians are primarily associated with the lines of the meridian system. We strongly believe that these findings will further the current understanding of indications of acupoints and meridians.

Analysis of Media Articles on COVID-19 and Nurses Using Text Mining and Topic Modeling (텍스트 마이닝과 토픽모델링 분석을 활용한 코로나19와 간호사에 대한 언론기사 분석)

  • An, Jiyeon;Yi, Yunjeong;Lee, Bokim
    • Research in Community and Public Health Nursing
    • /
    • v.32 no.4
    • /
    • pp.467-476
    • /
    • 2021
  • Purpose: The purpose of this study is to understand the social perceptions of nurses in the context of the COVID-19 outbreak through analysis of media articles. Methods: Among the media articles reported from January 1st to September 30th, 2020, those containing the keywords '[corona or Wuhan pneumonia or covid] and [nurse or nursing]' are extracted. After the selection process, the text mining and topic modeling are performed on 454 media articles using textom version 4.5. Results: Frequency Top 30 keywords include 'Nurse', 'Corona', 'Isolation', 'Support', 'Shortage', 'Protective Clothing', and so on. Keywords that ranked high in Term Frequency-Inverse Document Frequency (TF-IDF) values are 'Daegu', 'President', 'Gwangju', 'manpower', and so on. As a result of the topic analysis, 10 topics are derived, such as 'Local infection', 'Dispatch of personnel', 'Message for thanks', and 'Delivery of one's heart'. Conclusion: Nurses are both the contributors and victims of COVID-19 prevention. The government and the nurses' community should make efforts to improve poor working conditions and manpower shortages.

A Study on the Perception of Metaverse Fashion Using Big Data Analysis

  • Hosun Lim
    • Fashion & Textile Research Journal
    • /
    • v.25 no.1
    • /
    • pp.72-81
    • /
    • 2023
  • As changes in social and economic paradigms are accelerating, and non-contact has become the new normal due to the COVID-19 pandemic, metaverse services that build societies in online activities and virtual reality are spreading rapidly. This study analyzes the perception and trend of metaverse fashion using big data. TEXTOM was used to extract metaverse and fashion-related words from Naver and Google and analyze their frequency and importance. Additionally, structural equivalence analysis based on the derived main words was conducted to identify the perception and trend of metaverse fashion. The following results were obtained: First, term frequency(TF) analysis revealed the most frequently appearing words were "metaverse," "fashion," "virtual," "brand," "platform," "digital," "world," "Zepeto," "company," and "game." After analyzing TF-inverse document frequency(TF-IDF), "virtual" was the most important, followed by "brand," "platform," "Zepeto," "digital," "world," "industry," "game," "fashion show," and "industry." "Metaverse" and "fashion" were found to have a high TF but low TF-IDF. Further, words such as "virtual," "brand," "platform," "Zepeto," and "digital" had a higher TF-IDF ranking than TF, indicating that they had high importance in the text. Second, convergence of iterated correlations analysis using UNICET revealed four clusters, classified as "virtual world," "metaverse distribution platform," "fashion contents technology investment," and "metaverse fashion week." Fashion brands are hosting virtual fashion shows and stores on metaverse platforms where the virtual and real worlds coexist, and investment in developing metaverse-related technologies is under way.

Impact of Diverse Document-evaluation Measure-based Searching Methods in Big Data Search Accuracy (빅데이터 검색 정확도에 미치는 다양한 측정 방법 기반 검색 기법의 효과)

  • Kim, Ji young;Han, DaHyeon;Kim, Jongkwon
    • Journal of KIISE
    • /
    • v.44 no.5
    • /
    • pp.553-558
    • /
    • 2017
  • With the rapid growth of Big Data, research on extracting meaningful information is being pursued by both academia and industry. Especially, data characteristics derived from analysis, and researcher intention are key factors for search algorithms to obtain accurate output. Therefore, reflecting both data characteristics and researcher intention properly is the final goal of data analysis research. The data analyzed properly can help users to increase loyalty to the service provided by company, and to utilize information more effectively and efficiently. In this paper, we explore various methods of document-evaluation, so that we can improve the accuracy of searching article one of the most frequently searches used in real life. We also analyze the experiment result, and suggest the proper manners to use various methods.