• 제목/요약/키워드: Network Mining

Search Result 1,036, Processing Time 0.028 seconds

Technology Planning through Technology Roadmap: Application of Patent Citation Network (기술로드맵을 통한 기술기획: 특허인용네트워크의 활용)

  • Jeong, Yu-Jin;Yoon, Byung-Un
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.12 no.11
    • /
    • pp.5227-5237
    • /
    • 2011
  • Technology roadmap is a powerful tool that considers relationships of technology, product and market and referred as a supporting technology strategy and planning. There are numerous studies that have attempted to develop technology roadmap and case studies on specific technology areas. However, a number of studies have been dependant on brainstorming and discussion of expert group, delphi technique as qualitative analysis rather than systemic and quantitative analysis. To overcome the limitation, patent analysis considered as quite quantitative analysis is employed in this paper. Therefore, this paper proposes new technology roadmapping based on patent citation network considering technology life cycle and suggests planning for undeveloped technology but considered as promising. At first, patent data and citation information are collected and patent citation network is developed on the basis of collected patent information. Secondly, we investigate a stage of technology in the life cycle by considering patent application year and the technology life cycle, and duration of technology development is estimated. In addition, subsequent technologies are grouped as nodes of a super-level technology to show the evolution of the technology for the period. Finally, a technology roadmap is drawn by linking these technology nodes in a technology layer and estimating the duration of development time. Based on technology roadmap, technology planning is conducted to identify undeveloped technology through text mining and this paper suggests characteristics of technology that needs to be developed in the future. In order to illustrate the process of the proposed approach, technology for hydrogen storage is selected in this paper.

A Study on the Changes in Perspectives on Unwed Mothers in S.Korea and the Direction of Government Polices: 1995~2020 Social Media Big Data Analysis (한국미혼모에 대한 관점 변화와 정부정책의 방향: 1995년~2020년 소셜미디어 빅데이터 분석)

  • Seo, Donghee;Jun, Boksun
    • Journal of the Korea Convergence Society
    • /
    • v.12 no.12
    • /
    • pp.305-313
    • /
    • 2021
  • This study collected and analyzed big data from 1995 to 2020, focusing on the keywords "unwed mother", "single mother," and "single mom" to present appropriate government support policy directions according to changes in perspectives on unwed mothers. Big data collection platform Textom was used to collect data from portal search sites Naver and Daum and refine data. The final refined data were word frequency analysis, TF-IDF analysis, an N-gram analysis provided by Textom. In addition, Network analysis and CONCOR analysis were conducted through the UCINET6 program. As a result of the study, similar words appeared in word frequency analysis and TF-IDF analysis, but they differed by year. In the N-gram analysis, there were similarities in word appearance, but there were many differences in frequency and form of words appearing in series. As a result of CONCOR analysis, it was found that different clusters were formed by year. This study confirms the change in the perspective of unwed mothers through big data analysis, suggests the need for unwed mothers policies for various options for independent women, and policies that embrace pregnancy, childbirth, and parenting without discrimination within the new family form.

Exploring Potential Application Industry for Fintech Technology by Expanding its Terminology: Network Analysis and Topic Modelling Approach (용어 확장을 통한 핀테크 기술 적용가능 산업의 탐색 :네트워크 분석 및 토픽 모델링 접근)

  • Park, Mingyu;Jeon, Byeongmin;Kim, Jongwoo;Geum, Youngjung
    • The Journal of Society for e-Business Studies
    • /
    • v.26 no.1
    • /
    • pp.1-28
    • /
    • 2021
  • FinTech has been discussed as an important business area towards technology-driven financial innovation. The term fintech is a combination of finance and technology, which means ICT technology currently associated with all finance areas. The popularity of the fintech industry has significantly increased over time, with full investment and support for numerous startups. Therefore, both academia and practice tried to analyze the trend of the fintech area. Despite the fact, however, previous research has limitations in terms of collecting relevant databases for fintech and identifying proper application areas. In response, this study proposed a new method for analyzing the trend of Fintech fields by expanding Fintech's terminology and using network analysis and topic modeling. A new Fintech terminology list was created and a total of 18,341 patents were collected from USPTO for 10 years. The co-classification analysis and network analysis was conducted to identify the technological trends of patent classification. In addition, topic modeling was conducted to identify the trends of fintech in order to analyze the contents of fintech. This study is expected to help both managers and investors who want to be involved in technology-driven financial services seize new FinTech technology opportunities.

Digital Transformation: Using D.N.A.(Data, Network, AI) Keywords Generalized DMR Analysis (디지털 전환: D.N.A.(Data, Network, AI) 키워드를 활용한 토픽 모델링)

  • An, Sehwan;Ko, Kangwook;Kim, Youngmin
    • Knowledge Management Research
    • /
    • v.23 no.3
    • /
    • pp.129-152
    • /
    • 2022
  • As a key infrastructure for digital transformation, the spread of data, network, artificial intelligence (D.N.A.) fields and the emergence of promising industries are laying the groundwork for active digital innovation throughout the economy. In this study, by applying the text mining methodology, major topics were derived by using the abstract, publication year, and research field of the study corresponding to the SCIE, SSCI, and A&HCI indexes of the WoS database as input variables. First, main keywords were identified through TF and TF-IDF analysis based on word appearance frequency, and then topic modeling was performed using g-DMR. With the advantage of the topic model that can utilize various types of variables as meta information, it was possible to properly explore the meaning beyond simply deriving a topic. According to the analysis results, topics such as business intelligence, manufacturing production systems, service value creation, telemedicine, and digital education were identified as major research topics in digital transformation. To summarize the results of topic modeling, 1) research on business intelligence has been actively conducted in all areas after COVID-19, and 2) issues such as intelligent manufacturing solutions and metaverses have emerged in the manufacturing field. It has been confirmed that the topic of production systems is receiving attention once again. Finally, 3) Although the topic itself can be viewed separately in terms of technology and service, it was found that it is undesirable to interpret it separately because a number of studies comprehensively deal with various services applied by combining the relevant technologies.

6G Technology Competitiveness and Network Analysis: Focusing on GaN Integrated Circuit Patent Data (6G의 기술경쟁력 및 네트워크 분석: GaN 집적회로 특허 데이터 중심)

  • Woo-Seok Choi;Jin-Yong Kim;Jung-Hwan Lee;Sang-Hyun Choi
    • Journal of Industrial Convergence
    • /
    • v.21 no.3
    • /
    • pp.1-15
    • /
    • 2023
  • Expectations for wireless communication technology are rising as a base technology that promotes innovation in various industries in line with the paradigm of digital transformation in the 21st century beyond the stage of being used only for communication service itself. In this study, in order to compare 6G technological competitiveness between Korea and leading countries, technological competitiveness was confirmed through PFS, CPP, and network analysis based on GaN Integrated Circuit patent data. Korea's 6G technological competitiveness was 0.62 in PFS and 3.93 in CPP, which were 32.8% and 19.9%, respectively, compared to leading countries. In addition, as a result of network analysis, the collaboration rate in the 6G field was 7.2%, and the collaboration ecosystem was very insufficient in most countries. In contrast, it was confirmed that Korea, unlike leading countries, has established a small-scale collaboration ecosystem linked by industry and academia. Thus, it is necessary to establish a strategy for 6G communication technology at the national level so that communication technology can be advanced based on a relatively well-established collaborative ecosystem.

The Prediction of the Helpfulness of Online Review Based on Review Content Using an Explainable Graph Neural Network (설명가능한 그래프 신경망을 활용한 리뷰 콘텐츠 기반의 유용성 예측모형)

  • Eunmi Kim;Yao Ziyan;Taeho Hong
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.4
    • /
    • pp.309-323
    • /
    • 2023
  • As the role of online reviews has become increasingly crucial, numerous studies have been conducted to utilize helpful reviews. Helpful reviews, perceived by customers, have been verified in various research studies to be influenced by factors such as ratings, review length, review content, and so on. The determination of a review's helpfulness is generally based on the number of 'helpful' votes from consumers, with more 'helpful' votes considered to have a more significant impact on consumers' purchasing decisions. However, recently written reviews that have not been exposed to many customers may have relatively few 'helpful' votes and may lack 'helpful' votes altogether due to a lack of participation. Therefore, rather than relying on the number of 'helpful' votes to assess the helpfulness of reviews, we aim to classify them based on review content. In addition, the text of the review emerges as the most influential factor in review helpfulness. This study employs text mining techniques, including topic modeling and sentiment analysis, to analyze the diverse impacts of content and emotions embedded in the review text. In this study, we propose a review helpfulness prediction model based on review content, utilizing movie reviews from IMDb, a global movie information site. We construct a review helpfulness prediction model by using an explainable Graph Neural Network (GNN), while addressing the interpretability limitations of the machine learning model. The explainable graph neural network is expected to provide more reliable information about helpful or non-helpful reviews as it can identify connections between reviews.

Extension Method of Association Rules Using Social Network Analysis (사회연결망 분석을 활용한 연관규칙 확장기법)

  • Lee, Dongwon
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.4
    • /
    • pp.111-126
    • /
    • 2017
  • Recommender systems based on association rule mining significantly contribute to seller's sales by reducing consumers' time to search for products that they want. Recommendations based on the frequency of transactions such as orders can effectively screen out the products that are statistically marketable among multiple products. A product with a high possibility of sales, however, can be omitted from the recommendation if it records insufficient number of transactions at the beginning of the sale. Products missing from the associated recommendations may lose the chance of exposure to consumers, which leads to a decline in the number of transactions. In turn, diminished transactions may create a vicious circle of lost opportunity to be recommended. Thus, initial sales are likely to remain stagnant for a certain period of time. Products that are susceptible to fashion or seasonality, such as clothing, may be greatly affected. This study was aimed at expanding association rules to include into the list of recommendations those products whose initial trading frequency of transactions is low despite the possibility of high sales. The particular purpose is to predict the strength of the direct connection of two unconnected items through the properties of the paths located between them. An association between two items revealed in transactions can be interpreted as the interaction between them, which can be expressed as a link in a social network whose nodes are items. The first step calculates the centralities of the nodes in the middle of the paths that indirectly connect the two nodes without direct connection. The next step identifies the number of the paths and the shortest among them. These extracts are used as independent variables in the regression analysis to predict future connection strength between the nodes. The strength of the connection between the two nodes of the model, which is defined by the number of nodes between the two nodes, is measured after a certain period of time. The regression analysis results confirm that the number of paths between the two products, the distance of the shortest path, and the number of neighboring items connected to the products are significantly related to their potential strength. This study used actual order transaction data collected for three months from February to April in 2016 from an online commerce company. To reduce the complexity of analytics as the scale of the network grows, the analysis was performed only on miscellaneous goods. Two consecutively purchased items were chosen from each customer's transactions to obtain a pair of antecedent and consequent, which secures a link needed for constituting a social network. The direction of the link was determined in the order in which the goods were purchased. Except for the last ten days of the data collection period, the social network of associated items was built for the extraction of independent variables. The model predicts the number of links to be connected in the next ten days from the explanatory variables. Of the 5,711 previously unconnected links, 611 were newly connected for the last ten days. Through experiments, the proposed model demonstrated excellent predictions. Of the 571 links that the proposed model predicts, 269 were confirmed to have been connected. This is 4.4 times more than the average of 61, which can be found without any prediction model. This study is expected to be useful regarding industries whose new products launch quickly with short life cycles, since their exposure time is critical. Also, it can be used to detect diseases that are rarely found in the early stages of medical treatment because of the low incidence of outbreaks. Since the complexity of the social networking analysis is sensitive to the number of nodes and links that make up the network, this study was conducted in a particular category of miscellaneous goods. Future research should consider that this condition may limit the opportunity to detect unexpected associations between products belonging to different categories of classification.

A study on detective story authors' style differentiation and style structure based on Text Mining (텍스트 마이닝 기법을 활용한 고전 추리 소설 작가 간 문체적 차이와 문체 구조에 대한 연구)

  • Moon, Seok Hyung;Kang, Juyoung
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.89-115
    • /
    • 2019
  • This study was conducted to present the stylistic differences between Arthur Conan Doyle and Agatha Christie, famous as writers of classical mystery novels, through data analysis, and further to present the analytical methodology of the study of style based on text mining. The reason why we chose mystery novels for our research is because the unique devices that exist in classical mystery novels have strong stylistic characteristics, and furthermore, by choosing Arthur Conan Doyle and Agatha Christie, who are also famous to the general reader, as subjects of analysis, so that people who are unfamiliar with the research can be familiar with them. The primary objective of this study is to identify how the differences exist within the text and to interpret the effects of these differences on the reader. Accordingly, in addition to events and characters, which are key elements of mystery novels, the writer's grammatical style of writing was defined in style and attempted to analyze it. Two series and four books were selected by each writer, and the text was divided into sentences to secure data. After measuring and granting the emotional score according to each sentence, the emotions of the page progress were visualized as a graph, and the trend of the event progress in the novel was identified under eight themes by applying Topic modeling according to the page. By organizing co-occurrence matrices and performing network analysis, we were able to visually see changes in relationships between people as events progressed. In addition, the entire sentence was divided into a grammatical system based on a total of six types of writing style to identify differences between writers and between works. This enabled us to identify not only the general grammatical writing style of the author, but also the inherent stylistic characteristics in their unconsciousness, and to interpret the effects of these characteristics on the reader. This series of research processes can help to understand the context of the entire text based on a defined understanding of the style, and furthermore, by integrating previously individually conducted stylistic studies. This prior understanding can also contribute to discovering and clarifying the existence of text in unstructured data, including online text. This could help enable more accurate recognition of emotions and delivery of commands on an interactive artificial intelligence platform that currently converts voice into natural language. In the face of increasing attempts to analyze online texts, including New Media, in many ways and discover social phenomena and managerial values, it is expected to contribute to more meaningful online text analysis and semantic interpretation through the links to these studies. However, the fact that the analysis data used in this study are two or four books by author can be considered as a limitation in that the data analysis was not attempted in sufficient quantities. The application of the writing characteristics applied to the Korean text even though it was an English text also could be limitation. The more diverse stylistic characteristics were limited to six, and the less likely interpretation was also considered as a limitation. In addition, it is also regrettable that the research was conducted by analyzing classical mystery novels rather than text that is commonly used today, and that various classical mystery novel writers were not compared. Subsequent research will attempt to increase the diversity of interpretations by taking into account a wider variety of grammatical systems and stylistic structures and will also be applied to the current frequently used online text analysis to assess the potential for interpretation. It is expected that this will enable the interpretation and definition of the specific structure of the style and that various usability can be considered.

Analyzing Different Contexts for Energy Terms through Text Mining of Online Science News Articles (온라인 과학 기사 텍스트 마이닝을 통해 분석한 에너지 용어 사용의 맥락)

  • Oh, Chi Yeong;Kang, Nam-Hwa
    • Journal of Science Education
    • /
    • v.45 no.3
    • /
    • pp.292-303
    • /
    • 2021
  • This study identifies the terms frequently used together with energy in online science news articles and topics of the news reports to find out how the term energy is used in everyday life and to draw implications for science curriculum and instruction about energy. A total of 2,171 online news articles in science category published by 11 major newspaper companies in Korea for one year from March 1, 2018 were selected by using energy as a search term. As a result of natural language processing, a total of 51,224 sentences consisting of 507,901 words were compiled for analysis. Using the R program, term frequency analysis, semantic network analysis, and structural topic modeling were performed. The results show that the terms with exceptionally high frequencies were technology, research, and development, which reflected the characteristics of news articles that report new findings. On the other hand, terms used more than once per two articles were industry-related terms (industry, product, system, production, market) and terms that were sufficiently expected as energy-related terms such as 'electricity' and 'environment.' Meanwhile, 'sun', 'heat', 'temperature', and 'power generation', which are frequently used in energy-related science classes, also appeared as terms belonging to the highest frequency. From a network analysis, two clusters were found including terms related to industry and technology and terms related to basic science and research. From the analysis of terms paired with energy, it was also found that terms related to the use of energy such as 'energy efficiency,' 'energy saving,' and 'energy consumption' were the most frequently used. Out of 16 topics found, four contexts of energy were drawn including 'high-tech industry,' 'industry,' 'basic science,' and 'environment and health.' The results suggest that the introduction of the concept of energy degradation as a starting point for energy classes can be effective. It also shows the need to introduce high-tech industries or the context of environment and health into energy learning.

A Study on Deep Learning Methodology for Bigdata Mining from Smart Farm using Heterogeneous Computing (스마트팜 빅데이터 분석을 위한 이기종간 심층학습 기법 연구)

  • Min, Jae-Ki;Lee, DongHoon
    • Proceedings of the Korean Society for Agricultural Machinery Conference
    • /
    • 2017.04a
    • /
    • pp.162-162
    • /
    • 2017
  • 구글에서 공개한 Tensorflow를 이용한 여러 학문 분야의 연구가 활발하다. 농업 시설환경을 대상으로 한 빅데이터의 축적이 증가함과 아울러 실효적인 정보 획득을 위한 각종 데이터 분석 및 마이닝 기법에 대한 연구 또한 활발한 상황이다. 한편, 타 분야의 성공적인 심층학습기법 응용사례에 비하여 농업 분야에서의 응용은 초기 성장 단계라 할 수 있다. 이는 농업 현장에서 취득한 정보의 난해성 및 완성도 높은 생육/환경 모델링 정보의 부재로 실효적인 전과정 처리 기술 도출에 소요되는 시간, 비용, 연구 환경이 상대적으로 부족하기 때문일 것이다. 특히, 센서 기반 데이터 취득 기술 증가에 따라 비약적으로 방대해진 수집 데이터를 시간 복잡도가 높은 심층 학습 모델링 연산에 기계적으로 단순 적용할 경우 시간 효율적인 측면에서 성공적인 결과 도출에 애로가 있을 것이다. 매우 높은 시간 복잡도를 해결하기 위하여 제시된 하드웨어 가속 기능의 경우 일부 개발환경에 국한이 되어 있다. 일례로, 구글의 Tensorflow는 오픈소스 기반 병렬 클러스터링 기술인 MPICH를 지원하는 알고리즘을 공개하지 않고 있다. 따라서, 본 연구에서는 심층학습 기법 연구에 있어서, 예상 가능한 다양한 자원을 활용하여 최대한 연산의 결과를 빨리 도출할 수 있는 하드웨어적인 접근 방법을 모색하였다. 호스트에서 수행하는 일방적인 학습 알고리즘과 달리 이기종간 심층 학습이 가능하기 위해선 우선, NFS(Network File System)를 이용하여 데이터 계층이 상호 연결이 되어야 한다. 이를 위해서 고속 네트워크를 기반으로 한 NFS의 이용이 필수적이다. 둘째로 제한된 자원의 한계를 극복하기 위한 메모 공유 라이브러리가 필요하다. 셋째로 이기종간 프로세서에 최적화된 병렬 처리용 컴파일러를 이용해야 한다. 가장 중요한 부분은 이기종간의 처리 능력에 따른 작업을 고르게 분배할 수 있는 작업 스케쥴링이 수행되어야 하며, 이는 처리하고자 하는 데이터의 형태에 따라 매우 가변적이므로 해당 데이터 도메인에 대한 엄밀한 사전 벤치마킹이 수행되어야 한다. 이러한 요구조건을 대부분 충족하는 Open-CL ver1.2(https://www.khronos.org/opencl/)를 이용하였다. 최신의 Open-CL 버전은 2.2이나 본 연구를 위하여 준비한 4가지 이기종 시스템에서 모두 공통적으로 지원하는 버전은 1.2이다. 실험적으로 선정된 4가지 이기종 시스템은 1) Windows 10 Pro, 2) Linux-Ubuntu 16.04.4 LTS-x86_64, 3) MAC OS X 10.11 4) Linux-Ubuntu 16.04.4 LTS-ARM Cortext-A15 이다. 비교 분석을 위하여 NVIDIA 사에서 제공하는 Pascal Titan X 2식을 SLI로 구성한 시스템을 준비하였다. 개별 시스템에서 별도로 컴파일 된 바이너리의 이름을 통일하고, 개별 시스템의 코어수를 동일하게 균등 배분하여 100 Hz의 데이터로 입력이 되는 온도 정보와 조도 정보를 입력으로 하고 이를 습도정보에 Linear Gradient Descent Optimizer를 이용하여 Epoch 10,000회의 학습을 수행하였다. 4종의 이기종에서 총 32개의 코어를 이용한 학습에서 17초 내외로 연산 수행을 마쳤으나, 비교 시스템에서는 11초 내외로 연산을 마치는 결과가 나왔다. 기보유 하드웨어의 적절한 활용이 가능한 심층학습 기법에 대한 연구를 지속할 것이다

  • PDF