• Title/Summary/Keyword: Text-Network Analysis

Practical Text Mining for Trend Analysis: Ontology to visualization in Aerospace Technology

  • Kim, Yoosin;Ju, Yeonjin;Hong, SeongGwan;Jeong, Seung Ryul
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.8 / pp.4133-4145 / 2017
  • Advances in science and technology improve our lives but also demand ever greater investment. The government has therefore funded promising future technologies, and for several decades substantial public resources have gone into science and technology R&D projects. However, the performance of these public investments remains unclear in many respects, so the planning and evaluation of new investment should rest on data-driven decisions backed by fact-based evidence. To this end, the government has sought evidence on trends and issues in science and technology and has accumulated large databases of research papers, patents, project reports, and R&D information. These databases now support activities such as policy planning, budget allocation, and investment evaluation, but the quality of the information falls short of expectations because text mining is limited in its ability to extract information from unstructured data such as reports and papers. To address this problem, this study proposes a practical text mining methodology for science and technology trend analysis, using aerospace technology as a case, and applies text mining methods such as ontology development, topic analysis, network analysis, and their visualization.

A Study on Analysis of the Trend of Blockchain by Key Words Network Analysis (키워드 네트워크 분석 방법을 활용한 블록체인 트렌드 분석에 관한 연구)

  • Cho, Seong-Hwan
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.11 no.5 / pp.550-555 / 2018
  • This study aims to identify and compare the content and keywords of articles on blockchain applications in various industries. Text mining and semantic network analysis, used as methods of keyword network analysis, were applied to articles containing the terms 'finance', 'energy', and 'logistics', which the media and government frequently mention as areas where blockchain technologies can be applied. Data were collected from 43,093 articles published from January 2017 through July 2018. Crawling was carried out with Python BeautifulSoup, and the data were cleaned to eliminate redundancies among the three terms. Text mining and semantic network analysis of the keywords were then performed using Textom and UCINET. The results showed that the three terms were similar with respect to 'technology' but differed in the content of 'government policy' and 'industry' issues. The frequencies and centralities of the terms also differed.
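A keyword network of the kind described above can be sketched in a few lines. The following is a minimal illustration only, not the Textom/UCINET workflow used in the study: the toy documents, the function names `cooccurrence_edges` and `degree_centrality`, and the choice of document-level co-occurrence are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each article is already reduced to a keyword list.
docs = [
    ["blockchain", "finance", "technology"],
    ["blockchain", "energy", "technology"],
    ["blockchain", "logistics", "policy"],
    ["finance", "policy"],
]


def cooccurrence_edges(documents):
    """Count how many documents each unordered keyword pair appears in together."""
    edges = Counter()
    for keywords in documents:
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return edges


def degree_centrality(edges):
    """Normalized degree: number of distinct neighbours divided by (n - 1)."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


edges = cooccurrence_edges(docs)
# Document frequency of each keyword, for comparison with centrality.
frequency = Counter(word for doc in docs for word in set(doc))
centrality = degree_centrality(edges)

for word, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{word:12s} freq={frequency[word]} degree={score:.2f}")
```

Comparing `frequency` with `centrality` side by side is one simple way to see whether frequent terms are also the best-connected ones.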

A Comparative Study between Ubiquitous City Comprehensive Plan and Ubiquitous City Plan - Focusing on U-Service Plan (유비쿼터스도시종합계획과 유비쿼터스도시계획 비교 연구 -U-서비스 계획을 중심으로-)

  • Yoo, Ji Song;Jeong, Da Woon;Yi, Mi Sook;Min, Kyung Ju
    • Spatial Information Research / v.23 no.2 / pp.83-93 / 2015
  • U-Services offered by local governments under their Ubiquitous City Plans focus only on facility and urban management services, and citizen-oriented U-services exist only as plans. This study compares the U-service plans of the 'Ubiquitous City Comprehensive Plan' and the 'Ubiquitous City Plan' through network text analysis and word frequency analysis in order to draw implications for providing citizen-oriented U-services. Important keywords were extracted from the service plan contents of the 'Ubiquitous City Comprehensive Plan' and of the 'Ubiquitous City Plans' of four local governments, and network text analysis and keyword frequency analysis were performed on the derived keywords. Based on the results, citizens' awareness of U-City can be expected to increase if the next Ubiquitous City Comprehensive Plan promotes the development of citizen-oriented U-services in a variety of sectors through additional services and financial support policies.

Analysis of Unstructured Data on Detecting of New Drug Indication of Atorvastatin (아토바스타틴의 새로운 약물 적응증 탐색을 위한 비정형 데이터 분석)

  • Jeong, Hwee-Soo;Kang, Gil-Won;Choi, Woong;Park, Jong-Hyock;Shin, Kwang-Soo;Suh, Young-Sung
    • Journal of health informatics and statistics / v.43 no.4 / pp.329-335 / 2018
  • Objectives: In recent years, there has been a growing need for a way to extract desired information from multiple pieces of medical literature at once. This study was conducted to confirm the usefulness of unstructured data analysis of previously published medical literature in searching for new indications. Methods: New indications were searched for through text mining, network analysis, and topic modeling of 5,057 articles published from 1990 to 2017 on atorvastatin, a treatment for hyperlipidemia. Results: A total of 273 keywords were extracted. In the frequency and network analyses, the existing indications of atorvastatin ranked at the top. The novel indications identified by Term Frequency-Inverse Document Frequency (TF-IDF) were atrial fibrillation, heart failure, breast cancer, rheumatoid arthritis, combined hyperlipidemia, arrhythmias, multiple sclerosis, non-alcoholic fatty liver disease, contrast-induced acute kidney injury, and prostate cancer. Conclusions: Unstructured data analysis for discovering new indications from massive medical literature is expected to be used in the drug repositioning industry.
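The contrast between raw frequency and TF-IDF in the abstract above can be illustrated with a small sketch. The token lists below are hypothetical placeholders, not data from the study, and the weighting shown (relative term frequency times the natural log of N over document frequency) is one common textbook variant; the abstract does not say which variant the authors used.

```python
import math
from collections import Counter

# Hypothetical tokenized abstracts (illustrative only).
docs = {
    "d1": ["atorvastatin", "hyperlipidemia", "cholesterol"],
    "d2": ["atorvastatin", "hyperlipidemia", "statin"],
    "d3": ["atorvastatin", "atrial", "fibrillation"],
}


def tf_idf(documents):
    """Return {doc_id: {term: tf * idf}} using tf = count / len, idf = ln(N / df)."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents.values():
        doc_freq.update(set(tokens))  # count each term once per document
    scores = {}
    for doc_id, tokens in documents.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[doc_id] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


scores = tf_idf(docs)
for doc_id, weights in scores.items():
    print(doc_id, {term: round(w, 3) for term, w in weights.items()})
```

In this toy corpus, a term that occurs in every document receives a weight of zero, while a term confined to a single document scores highest. That is why TF-IDF can surface rarer, potentially novel terms that a plain frequency ranking buries.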

Enhancing the Performance of Blog Retrieval by User Tagging and Social Network Analysis (사용자 태그와 중심성 지수를 이용한 블로그 검색 성능 향상에 관한 연구)

  • Kim, Eun-Hee;Chung, Young-Mee
    • Journal of the Korean Society for information Management / v.27 no.1 / pp.61-77 / 2010
  • Blogs are now one of the major information resources on the web. The purpose of this study is to enhance the performance of blog retrieval by means of user assigned tags and trackback information. To this end, retrieval experiments were performed with a dataset of 4,908 blog pages together with their associated trackback URLs. In the experiments, text terms, user tags, and network centrality values based on trackbacks were variously combined as retrieval features. The experimental results showed that employing user tags and network centrality values as retrieval features in addition to text words could improve the performance of blog retrieval.

A Multi-Class Classifier of Modified Convolution Neural Network by Dynamic Hyperplane of Support Vector Machine

  • Nur Suhailayani Suhaimi;Zalinda Othman;Mohd Ridzwan Yaakub
    • International Journal of Computer Science & Network Security / v.23 no.11 / pp.21-31 / 2023
  • This paper addresses the problem of evaluating multi-class classification accuracy and simulating multiple classifier performance metrics. Multi-class classification for sentiment analysis poses many challenges, and previous research has largely narrowed to binary classification models because they provide higher accuracy on text data. We therefore take inspiration from the non-linear Support Vector Machine and modify the algorithm by embedding dynamic hyperplanes that represent multiple class labels. We then analyze the performance of multi-class classifiers using macro-accuracy, micro-accuracy, and several other metrics to justify the significance of our algorithm enhancement. Furthermore, we hybridize an Enhanced Convolution Neural Network (ECNN) with a Dynamic Support Vector Machine (DSVM) to demonstrate the effectiveness and efficiency of the classifier on multi-class text data. We performed experiments on three hybrid classifiers: ECNN with Binary SVM (ECNN-BSVM), ECNN with linear Multi-Class SVM (ECNN-MCSVM), and our proposed algorithm (ECNN-DSVM). Comparative experiments on the hybrid algorithms yielded 85.12% for single-metric accuracy and 86.95% on average for multiple metrics. Our modified ECNN-DSVM classifier reached 98.29% micro-accuracy, with an F-score of at most 98%. As a future direction, we aim to analyze hyperplane optimization.
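Micro- and macro-averaged scores can diverge sharply on imbalanced multi-class data, which is presumably why the abstract above reports both. The sketch below assumes one common reading: micro-accuracy as overall accuracy, and macro-accuracy as the unweighted mean of per-class accuracy. The abstract does not give the paper's exact definitions, and the labels are invented for illustration.

```python
from collections import defaultdict


def micro_macro_accuracy(y_true, y_pred):
    """Return (micro, macro) accuracy for equal-length label sequences.

    micro: correct predictions / all predictions.
    macro: unweighted mean over classes of (correct in class / size of class).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    micro = sum(correct.values()) / len(y_true)
    macro = sum(correct[label] / total[label] for label in total) / len(total)
    return micro, macro


# Hypothetical imbalanced three-class sentiment labels.
y_true = ["pos"] * 6 + ["neu"] * 2 + ["neg"] * 2
y_pred = ["pos"] * 6 + ["neu", "pos"] + ["pos", "pos"]

micro, macro = micro_macro_accuracy(y_true, y_pred)
print(f"micro={micro:.2f} macro={macro:.2f}")
```

Here the classifier gets every majority-class example right and every "neg" example wrong, so the micro score looks considerably better than the macro score.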

A study on detective story authors' style differentiation and style structure based on Text Mining (텍스트 마이닝 기법을 활용한 고전 추리 소설 작가 간 문체적 차이와 문체 구조에 대한 연구)

  • Moon, Seok Hyung;Kang, Juyoung
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.89-115 / 2019
  • This study presents the stylistic differences between Arthur Conan Doyle and Agatha Christie, two famous writers of classical mystery novels, through data analysis, and proposes an analytical methodology for the study of style based on text mining. We chose mystery novels because the distinctive devices of the classical mystery give it strong stylistic characteristics, and we chose Arthur Conan Doyle and Agatha Christie because they are well known to general readers, so that people unfamiliar with this kind of research can approach it easily. The primary objective is to identify how the differences appear within the text and to interpret their effects on the reader. Accordingly, in addition to events and characters, which are key elements of mystery novels, the writer's grammatical style was defined as part of style and analyzed. Two series and four books were selected for each writer, and the text was split into sentences to obtain the data. After an emotional score was assigned to each sentence, the emotions were visualized as a graph over the progression of pages, and the progression of events in each novel was traced under eight themes by applying topic modeling page by page. By constructing co-occurrence matrices and performing network analysis, we were able to see visually how relationships between characters changed as events progressed. In addition, all sentences were classified under a grammatical system of six types of writing style to identify differences between writers and between works. This enabled us to identify not only each author's general grammatical writing style but also the stylistic characteristics inherent in their unconscious, and to interpret the effects of these characteristics on the reader.
This series of research processes can help in understanding the context of an entire text on the basis of a defined understanding of style, and in integrating stylistic studies that were previously conducted individually. This prior understanding can also contribute to discovering and clarifying the existence of text in unstructured data, including online text. It could help enable more accurate recognition of emotions and delivery of commands on interactive artificial intelligence platforms that currently convert voice into natural language. As attempts grow to analyze online texts, including new media, in various ways in order to discover social phenomena and managerial value, this study is expected to contribute, in conjunction with such studies, to more meaningful online text analysis and semantic interpretation. However, the data consist of only two or four books per author, which is a limitation in that the analysis was not based on a sufficient quantity of data. Applying writing characteristics defined for Korean text to English text may also be a limitation. Restricting the many possible stylistic characteristics to six, which narrowed the range of interpretation, is a further limitation. It is also regrettable that the research analyzed classical mystery novels rather than text in common use today, and that a wider range of classical mystery writers was not compared. Subsequent research will attempt to increase the diversity of interpretations by taking into account a wider variety of grammatical systems and stylistic structures, and will apply the approach to the online texts that are frequently analyzed today in order to assess its potential for interpretation. This is expected to enable the interpretation and definition of the specific structure of style and to open up a variety of uses.

A Study of Comparison between Cruise Tours in China and U.S.A through Big Data Analytics

  • Shuting, Tao;Kim, Hak-Seon
    • Culinary science and hospitality research / v.23 no.6 / pp.1-11 / 2017
  • The purpose of this study was to compare cruise tours in China and the U.S.A. through semantic network analysis of big data, collecting online data with SCTM (Smart crawling & Text mining), a data collection and processing program. The data covered January 1, 2015 to August 15, 2017; "cruise tour, china" and "cruise tour, usa" were used as keywords to collect related data, and UCINET 6.0 and its packaged NetDraw were used for data analysis. Currently, Chinese cruisers are concerned with cruising destinations, while American cruisers pay more attention to onboard experience and cruising expenditure. CONCOR (convergence of iterated correlation) analysis produced three clusters for Chinese cruise tours (domestic destinations, international destinations, and hospitality tourism) and four groups for American cruise tours (cruise expenditure, onboard experience, cruise brand, and destinations). Because cruise tourism in America is highly developed, this study is also intended to provide meaningful, social network-oriented suggestions for Chinese cruise tourism.
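The idea behind CONCOR can be sketched as follows: correlate the rows of a matrix, then correlate the rows of the resulting correlation matrix, and repeat until every entry is close to +1 or -1; the signs then split the nodes into two blocks. This is a simplified, single-split illustration and not UCINET's implementation. The co-occurrence matrix is invented, only row profiles are correlated, and treating a constant row as uncorrelated is an assumption of this sketch. A full analysis would re-apply the split within each block to obtain three or four clusters as reported above.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant profile counts as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def correlate_rows(matrix):
    """Matrix of pairwise Pearson correlations between rows."""
    return [[pearson(r1, r2) for r2 in matrix] for r1 in matrix]


def concor_split(matrix, max_iter=50, tol=1e-6):
    """Iterate row correlations until entries are near +/-1, then split by sign."""
    corr = correlate_rows(matrix)
    for _ in range(max_iter):
        if all(abs(abs(v) - 1.0) < tol for row in corr for v in row):
            break
        corr = correlate_rows(corr)
    first = [i for i, v in enumerate(corr[0]) if v >= 0]
    second = [i for i, v in enumerate(corr[0]) if v < 0]
    return first, second


# Hypothetical keyword co-occurrence counts (diagonal = keyword frequency).
matrix = [
    [5, 4, 1, 0],
    [4, 5, 0, 1],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
]
blocks = concor_split(matrix)
print(blocks)
```

Keywords 0 and 1 have similar co-occurrence profiles, as do keywords 2 and 3, so the sign pattern separates them into two blocks.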

Research trends over 10 years (2010-2021) in infant and toddler rearing behavior by family caregivers in South Korea: text network and topic modeling

  • In-Hye Song;Kyung-Ah Kang
    • Child Health Nursing Research / v.29 no.3 / pp.182-194 / 2023
  • Purpose: This study analyzed research trends in infant and toddler rearing behavior among family caregivers over a 10-year period (2010-2021). Methods: Text network analysis and topic modeling were employed on data collected from relevant papers, following the extraction and refinement of semantic morphemes. A semantic-centered network was constructed by extracting words from 2,613 English-language abstracts. Data analysis was performed using NetMiner 4.5.0. Results: Frequency analysis, degree centrality, and eigenvector centrality all revealed the terms "scale," "program," and "education" among the top 10 keywords associated with infant and toddler rearing behaviors among family caregivers. The keywords extracted from the analysis were divided into two clusters through cohesion analysis. Additionally, they were classified into two topic groups using topic modeling: "program and evaluation" (64.37%) and "caregivers' role and competency in child development" (35.63%). Conclusion: The roles and competencies of family caregivers are essential for the development of infants and toddlers. Intervention programs and evaluations are necessary to improve rearing behaviors. Future research should determine the role of nurses in supporting family caregivers. Additionally, it should facilitate the development of nursing strategies and intervention programs to promote positive rearing practices.

Network, Centrality, and Topic Analysis on Korea's Trade and Economy with Latin America and the Caribbean Area (한국의 중남미 지역연구 네트워크와 중심성 및 무역과 경제에 대한 토픽 변동분석)

  • Chae-Deug Yi
    • Korea Trade Review / v.47 no.6 / pp.189-209 / 2022
  • This study analyzes papers on Latin America and the Caribbean published in Korea from 2000 to 2020, in order to understand the main subjects and directions of Korean research on the region. As its research methodology, the study uses text mining and social network analysis, including frequency analysis, several centrality analyses, and topic analysis. The empirical results show that keywords and centrality coefficients changed between 2000-2010 and 2011-2020. In 2011-2020, the most frequent keywords shifted from Neoliberalism and culture to policy education, and economy-related words. The degree and closeness centrality analyses ranked the high-frequency keywords highly, whereas the eigenvector centrality ranking differed markedly from the frequency order. The topic analysis shows that culture, language, and Neoliberalism were the most important keywords in 2000-2010, while economy, labor trade, industry, and development became the most important keywords in 2011-2020.
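Eigenvector centrality can diverge from degree or frequency rankings because it weights a node by the importance of its neighbours, not only by how many it has. The sketch below computes it by power iteration on a small invented graph. The node labels, the function name `eigenvector_centrality`, and the use of A + I (which keeps the iteration stable on bipartite graphs without changing the leading eigenvector) are assumptions of this example, not the procedure used in the study.

```python
import math


def eigenvector_centrality(adjacency, max_iter=200, tol=1e-9):
    """Leading eigenvector of a symmetric adjacency matrix via power iteration."""
    n = len(adjacency)
    if n == 0:
        return []
    scores = [1.0 / math.sqrt(n)] * n
    for _ in range(max_iter):
        # Multiply by (A + I): each node keeps its own score and adds its neighbours'.
        updated = [
            scores[i] + sum(adjacency[i][j] * scores[j] for j in range(n))
            for i in range(n)
        ]
        norm = math.sqrt(sum(v * v for v in updated))
        updated = [v / norm for v in updated]
        if max(abs(a - b) for a, b in zip(updated, scores)) < tol:
            return updated
        scores = updated
    return scores


# Hypothetical keyword graph: a hub H with neighbours A, B, C, and a chain C-D-E.
nodes = ["H", "A", "B", "C", "D", "E"]
links = [("H", "A"), ("H", "B"), ("H", "C"), ("C", "D"), ("D", "E")]
index = {name: i for i, name in enumerate(nodes)}
adjacency = [[0] * len(nodes) for _ in nodes]
for a, b in links:
    adjacency[index[a]][index[b]] = 1
    adjacency[index[b]][index[a]] = 1

scores = eigenvector_centrality(adjacency)
for name in nodes:
    degree = sum(adjacency[index[name]])
    print(f"{name}: degree={degree} eigenvector={scores[index[name]]:.3f}")
```

Nodes C and D both have degree 2, yet C scores higher because it is adjacent to the hub. This is the kind of divergence from simple counts that the abstract reports.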