• Title/Summary/Keyword: 문헌 클러스터링

Search Result 54, Processing Time 0.019 seconds

Extraction of Author Identification Elements of Overseas Academic Papers on Authority Data System for Science and Technology (과학기술 전거데이터 시스템에서의 해외 학술논문 저자 식별요소 추출)

  • Choi, Hyunmi;Lee, Seokhyoung;Kim, Kwangyoung;Kim, Hwanmin
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2013.05a
    • /
    • pp.711-713
    • /
    • 2013
  • Various human resource information of the world can be found according to spread of social network such as facebook and twitter. There are an amounts of researcher information on the science and technology area but it is difficult to find a suitable researcher for research or business such as research partner, because researcher information is not systematically arranged. To solver this problem, we are constructing authority data system for science and technology based on authority information of overseas academic papers. In this paper, in order to construct the authority data, we extracts author identification elements from millions of overseas academic papers, which are published from 1994 to 2012. There are more than 50 author identification elements such as author name, affiliation, paper title, publisher, year, keywords, co-author, co-author's affiliation in Korean, English, Chinese, and Japanese. We construct the element database by extracting and storing an author identification information based on the elements from overseas academic papers. Future works includes that the authority database for overseas academic papers is constructed by storing an academic activities of researchers after author clustering with these extracted elements. The authority data is used to improve the researcher information utilization and activate community to find a suitable research partner or a business examiner.

  • PDF

Measuring the Goodness of Fit of Link Reduction Algorithms for Mapping Intellectual Structures in Bibliometric Analysis (계량서지적 분석에서 지적구조 매핑을 위한 링크 삭감 알고리즘의 적합도 측정)

  • Lee, Jae Yun
    • Journal of the Korean Society for information Management
    • /
    • v.39 no.2
    • /
    • pp.233-254
    • /
    • 2022
  • Link reduction algorithms such as pathfinder network are the widely used methods to overcome problems with the visualization of weighted networks for knowledge domain analysis. This study proposed NetRSQ, an indicator to measure the goodness of fit of a link reduction algorithm for the network visualization. NetRSQ is developed to calculate the fitness of a network based on the rank correlation between the path length and the degree of association between entities. The validity of NetRSQ was investigated with data from previous research which qualitatively evaluated several network generation algorithms. As the primary test result, the higher degree of NetRSQ appeared in the network with better intellectual structures in the quality evaluation of networks built by various methods. The performance of 4 link reduction algorithms was tested in 40 datasets from various domains and compared with NetRSQ. The test shows that there is no specific link reduction algorithm that performs better over others in all cases. Therefore, the NetRSQ can be a useful tool as a basis of reliability to select the most fitting algorithm for the network visualization of intellectual structures.

Analysis of Global Entrepreneurship Trends Due to COVID-19: Focusing on Crunchbase (Covid-19에 따른 글로벌 창업 트렌드 분석: Crunchbase를 중심으로)

  • Shinho Kim;Youngjung Geum
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship
    • /
    • v.18 no.3
    • /
    • pp.141-156
    • /
    • 2023
  • Due to the unprecedented worldwide pandemic of the new Covid-19 infection, business trends of companies have changed significantly. Therefore, it is strongly required to monitor the rapid changes of innovation trends to design and plan future businesses. Since the pandemic, many studies have attempted to analyze business changes, but they are limited to specific industries and are insufficient in terms of data objectivity. In response, this study aims to analyze business trends after Covid-19 using Crunchbase, a global startup data. The data is collected and preprocessed every two years from 2018 to 2021 to compare the business trends. To capture the major trends, a network analysis is conducted for the industry groups and industry information based on the co-occurrence. To analyze the minor trends, LDA-based topic modelling and word2vec-based clustering is used. As a result, e-commerce, education, delivery, game and entertainment industries are promising based on their technological advances, showing extension and diversification of industry boundaries as well as digitalization and servitization of business contents. This study is expected to help venture capitalists and entrepreneurs to understand the rapid changes under the impact of Covid-19 and to make right decisions for the future.

  • PDF

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.109-122
    • /
    • 2014
  • People are nowadays creating a tremendous amount of data on Social Network Service (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive amounts of data generation, thereby greatly influencing society. This is an unmatched phenomenon in history, and now we live in the Age of Big Data. SNS Data is defined as a condition of Big Data where the amount of data (volume), data input and output speeds (velocity), and the variety of data types (variety) are satisfied. If someone intends to discover the trend of an issue in SNS Big Data, this information can be used as a new important source for the creation of new values because this information covers the whole of society. In this study, a Twitter Issue Tracking System (TITS) is designed and established to meet the needs of analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides the following four functions: (1) Provide the topic keyword set that corresponds to daily ranking; (2) Visualize the daily time series graph of a topic for the duration of a month; (3) Provide the importance of a topic through a treemap based on the score system and frequency; (4) Visualize the daily time-series graph of keywords by searching the keyword; The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including the removal of stop words, and noun extraction for processing various unrefined forms of unstructured data. In addition, such analysis requires the latest big data technology to process rapidly a large amount of real-time data, such as the Hadoop distributed system or NoSQL, which is an alternative to relational database. We built TITS based on Hadoop to optimize the processing of big data because Hadoop is designed to scale up from single node computing to thousands of machines. Furthermore, we use MongoDB, which is classified as a NoSQL database. In addition, MongoDB is an open source platform, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike existing relational database, there are no schema or tables with MongoDB, and its most important goal is that of data accessibility and data processing performance. In the Age of Big Data, the visualization of Big Data is more attractive to the Big Data community because it helps analysts to examine such data easily and clearly. Therefore, TITS uses the d3.js library as a visualization tool. This library is designed for the purpose of creating Data Driven Documents that bind document object model (DOM) and any data; the interaction between data is easy and useful for managing real-time data stream with smooth animation. In addition, TITS uses a bootstrap made of pre-configured plug-in style sheets and JavaScript libraries to build a web system. The TITS Graphical User Interface (GUI) is designed using these libraries, and it is capable of detecting issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the superiority of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique that is used in various research areas, including Library and Information Science (LIS). Based on this, we can confirm the utility of storytelling and time series analysis. Third, we develop a web-based system, and make the system available for the real-time discovery of topics. The present study conducted experiments with nearly 150 million tweets in Korea during March 2013.