• Title/Summary/Keyword: 검색 기능

Search Result 1,946, Processing Time 0.024 seconds

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.109-122
    • /
    • 2014
  • People are nowadays creating a tremendous amount of data on Social Network Service (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive amounts of data generation, thereby greatly influencing society. This is an unmatched phenomenon in history, and now we live in the Age of Big Data. SNS Data is defined as a condition of Big Data where the amount of data (volume), data input and output speeds (velocity), and the variety of data types (variety) are satisfied. If someone intends to discover the trend of an issue in SNS Big Data, this information can be used as a new important source for the creation of new values because this information covers the whole of society. In this study, a Twitter Issue Tracking System (TITS) is designed and established to meet the needs of analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides the following four functions: (1) Provide the topic keyword set that corresponds to daily ranking; (2) Visualize the daily time series graph of a topic for the duration of a month; (3) Provide the importance of a topic through a treemap based on the score system and frequency; (4) Visualize the daily time-series graph of keywords by searching the keyword; The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including the removal of stop words, and noun extraction for processing various unrefined forms of unstructured data. In addition, such analysis requires the latest big data technology to process rapidly a large amount of real-time data, such as the Hadoop distributed system or NoSQL, which is an alternative to relational database. We built TITS based on Hadoop to optimize the processing of big data because Hadoop is designed to scale up from single node computing to thousands of machines. Furthermore, we use MongoDB, which is classified as a NoSQL database. In addition, MongoDB is an open source platform, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike existing relational database, there are no schema or tables with MongoDB, and its most important goal is that of data accessibility and data processing performance. In the Age of Big Data, the visualization of Big Data is more attractive to the Big Data community because it helps analysts to examine such data easily and clearly. Therefore, TITS uses the d3.js library as a visualization tool. This library is designed for the purpose of creating Data Driven Documents that bind document object model (DOM) and any data; the interaction between data is easy and useful for managing real-time data stream with smooth animation. In addition, TITS uses a bootstrap made of pre-configured plug-in style sheets and JavaScript libraries to build a web system. The TITS Graphical User Interface (GUI) is designed using these libraries, and it is capable of detecting issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the superiority of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique that is used in various research areas, including Library and Information Science (LIS). Based on this, we can confirm the utility of storytelling and time series analysis. Third, we develop a web-based system, and make the system available for the real-time discovery of topics. The present study conducted experiments with nearly 150 million tweets in Korea during March 2013.

Analysis of Twitter for 2012 South Korea Presidential Election by Text Mining Techniques (텍스트 마이닝을 이용한 2012년 한국대선 관련 트위터 분석)

  • Bae, Jung-Hwan;Son, Ji-Eun;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.3
    • /
    • pp.141-156
    • /
    • 2013
  • Social media is a representative form of the Web 2.0 that shapes the change of a user's information behavior by allowing users to produce their own contents without any expert skills. In particular, as a new communication medium, it has a profound impact on the social change by enabling users to communicate with the masses and acquaintances their opinions and thoughts. Social media data plays a significant role in an emerging Big Data arena. A variety of research areas such as social network analysis, opinion mining, and so on, therefore, have paid attention to discover meaningful information from vast amounts of data buried in social media. Social media has recently become main foci to the field of Information Retrieval and Text Mining because not only it produces massive unstructured textual data in real-time but also it serves as an influential channel for opinion leading. But most of the previous studies have adopted broad-brush and limited approaches. These approaches have made it difficult to find and analyze new information. To overcome these limitations, we developed a real-time Twitter trend mining system to capture the trend in real-time processing big stream datasets of Twitter. The system offers the functions of term co-occurrence retrieval, visualization of Twitter users by query, similarity calculation between two users, topic modeling to keep track of changes of topical trend, and mention-based user network analysis. In addition, we conducted a case study on the 2012 Korean presidential election. We collected 1,737,969 tweets which contain candidates' name and election on Twitter in Korea (http://www.twitter.com/) for one month in 2012 (October 1 to October 31). The case study shows that the system provides useful information and detects the trend of society effectively. The system also retrieves the list of terms co-occurred by given query terms. We compare the results of term co-occurrence retrieval by giving influential candidates' name, 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn' as query terms. General terms which are related to presidential election such as 'Presidential Election', 'Proclamation in Support', Public opinion poll' appear frequently. Also the results show specific terms that differentiate each candidate's feature such as 'Park Jung Hee' and 'Yuk Young Su' from the query 'Guen Hae Park', 'a single candidacy agreement' and 'Time of voting extension' from the query 'Jae In Moon' and 'a single candidacy agreement' and 'down contract' from the query 'Chul Su Ahn'. Our system not only extracts 10 topics along with related terms but also shows topics' dynamic changes over time by employing the multinomial Latent Dirichlet Allocation technique. Each topic can show one of two types of patterns-Rising tendency and Falling tendencydepending on the change of the probability distribution. To determine the relationship between topic trends in Twitter and social issues in the real world, we compare topic trends with related news articles. We are able to identify that Twitter can track the issue faster than the other media, newspapers. The user network in Twitter is different from those of other social media because of distinctive characteristics of making relationships in Twitter. Twitter users can make their relationships by exchanging mentions. We visualize and analyze mention based networks of 136,754 users. We put three candidates' name as query terms-Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn'. The results show that Twitter users mention all candidates' name regardless of their political tendencies. This case study discloses that Twitter could be an effective tool to detect and predict dynamic changes of social issues, and mention-based user networks could show different aspects of user behavior as a unique network that is uniquely found in Twitter.

Directions of Implementing Documentation Strategies for Local Regions (지역 기록화를 위한 도큐멘테이션 전략의 적용)

  • Seol, Moon-Won
    • The Korean Journal of Archival Studies
    • /
    • no.26
    • /
    • pp.103-149
    • /
    • 2010
  • Documentation strategy has been experimented in various subject areas and local regions since late 1980's when it was proposed as archival appraisal and selection methods by archival communities in the United States. Though it was criticized to be too ideal, it needs to shed new light on the potentialities of the strategy for documenting local regions in digital environment. The purpose of this study is to analyse the implementation issues of documentation strategy and to suggest the directions for documenting local regions of Korea through the application of the strategy. The documentation strategy which was developed more than twenty years ago in mostly western countries gives us some implications for documenting local regions even in current digital environments. They are as follows; Firstly, documentation strategy can enhance the value of archivists as well as archives in local regions because archivist should be active shaper of history rather than passive receiver of archives according to the strategy. It can also be a solution for overcoming poor conditions of local archives management in Korea. Secondly, the strategy can encourage cooperation between collecting institutions including museums, libraries, archives, cultural centers, history institutions, etc. in each local region. In the networked environment the cooperation can be achieved more effectively than in traditional environment where the heavy workload of cooperative institutions is needed. Thirdly, the strategy can facilitate solidarity of various groups in local region. According to the analysis of the strategy projects, it is essential to collect their knowledge, passion, and enthusiasm of related groups to effectively implement the strategy. It can also provide a methodology for minor groups of society to document their memories. This study suggests the directions of documenting local regions in consideration of current archival infrastructure of Korean as follows; Firstly, very selective and intensive documentation should be pursued rather than comprehensive one for documenting local regions. Though it is a very political problem to decide what subject has priority for documentation, interests of local community members as well as professional groups should be considered in the decision-making process seriously. Secondly, it is effective to plan integrated representation of local history in the distributed custody of local archives. It would be desirable to implement archival gateway for integrated search and representation of local archives regardless of the location of archives. Thirdly, it is necessary to try digital documentation using Web 2.0 technologies. Documentation strategy as the methodology of selecting and acquiring archives can not avoid subjectivity and prejudices of appraiser completely. To mitigate the problems, open documentation system should be prepared for reflecting different interests of different groups. Fourth, it is desirable to apply a conspectus model used in cooperative collection management of libraries to document local regions digitally. Conspectus can show existing documentation strength and future documentation intensity for each participating institution. Using this, documentation level of each subject area can be set up cooperatively and effectively in the local regions.

A Study on the Development Trend of Artificial Intelligence Using Text Mining Technique: Focused on Open Source Software Projects on Github (텍스트 마이닝 기법을 활용한 인공지능 기술개발 동향 분석 연구: 깃허브 상의 오픈 소스 소프트웨어 프로젝트를 대상으로)

  • Chong, JiSeon;Kim, Dongsung;Lee, Hong Joo;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.1-19
    • /
    • 2019
  • Artificial intelligence (AI) is one of the main driving forces leading the Fourth Industrial Revolution. The technologies associated with AI have already shown superior abilities that are equal to or better than people in many fields including image and speech recognition. Particularly, many efforts have been actively given to identify the current technology trends and analyze development directions of it, because AI technologies can be utilized in a wide range of fields including medical, financial, manufacturing, service, and education fields. Major platforms that can develop complex AI algorithms for learning, reasoning, and recognition have been open to the public as open source projects. As a result, technologies and services that utilize them have increased rapidly. It has been confirmed as one of the major reasons for the fast development of AI technologies. Additionally, the spread of the technology is greatly in debt to open source software, developed by major global companies, supporting natural language recognition, speech recognition, and image recognition. Therefore, this study aimed to identify the practical trend of AI technology development by analyzing OSS projects associated with AI, which have been developed by the online collaboration of many parties. This study searched and collected a list of major projects related to AI, which were generated from 2000 to July 2018 on Github. This study confirmed the development trends of major technologies in detail by applying text mining technique targeting topic information, which indicates the characteristics of the collected projects and technical fields. The results of the analysis showed that the number of software development projects by year was less than 100 projects per year until 2013. However, it increased to 229 projects in 2014 and 597 projects in 2015. Particularly, the number of open source projects related to AI increased rapidly in 2016 (2,559 OSS projects). It was confirmed that the number of projects initiated in 2017 was 14,213, which is almost four-folds of the number of total projects generated from 2009 to 2016 (3,555 projects). The number of projects initiated from Jan to Jul 2018 was 8,737. The development trend of AI-related technologies was evaluated by dividing the study period into three phases. The appearance frequency of topics indicate the technology trends of AI-related OSS projects. The results showed that the natural language processing technology has continued to be at the top in all years. It implied that OSS had been developed continuously. Until 2015, Python, C ++, and Java, programming languages, were listed as the top ten frequently appeared topics. However, after 2016, programming languages other than Python disappeared from the top ten topics. Instead of them, platforms supporting the development of AI algorithms, such as TensorFlow and Keras, are showing high appearance frequency. Additionally, reinforcement learning algorithms and convolutional neural networks, which have been used in various fields, were frequently appeared topics. The results of topic network analysis showed that the most important topics of degree centrality were similar to those of appearance frequency. The main difference was that visualization and medical imaging topics were found at the top of the list, although they were not in the top of the list from 2009 to 2012. The results indicated that OSS was developed in the medical field in order to utilize the AI technology. Moreover, although the computer vision was in the top 10 of the appearance frequency list from 2013 to 2015, they were not in the top 10 of the degree centrality. The topics at the top of the degree centrality list were similar to those at the top of the appearance frequency list. It was found that the ranks of the composite neural network and reinforcement learning were changed slightly. The trend of technology development was examined using the appearance frequency of topics and degree centrality. The results showed that machine learning revealed the highest frequency and the highest degree centrality in all years. Moreover, it is noteworthy that, although the deep learning topic showed a low frequency and a low degree centrality between 2009 and 2012, their ranks abruptly increased between 2013 and 2015. It was confirmed that in recent years both technologies had high appearance frequency and degree centrality. TensorFlow first appeared during the phase of 2013-2015, and the appearance frequency and degree centrality of it soared between 2016 and 2018 to be at the top of the lists after deep learning, python. Computer vision and reinforcement learning did not show an abrupt increase or decrease, and they had relatively low appearance frequency and degree centrality compared with the above-mentioned topics. Based on these analysis results, it is possible to identify the fields in which AI technologies are actively developed. The results of this study can be used as a baseline dataset for more empirical analysis on future technology trends that can be converged.

A Study on Intelligent Value Chain Network System based on Firms' Information (기업정보 기반 지능형 밸류체인 네트워크 시스템에 관한 연구)

  • Sung, Tae-Eung;Kim, Kang-Hoe;Moon, Young-Su;Lee, Ho-Shin
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.3
    • /
    • pp.67-88
    • /
    • 2018
  • Until recently, as we recognize the significance of sustainable growth and competitiveness of small-and-medium sized enterprises (SMEs), governmental support for tangible resources such as R&D, manpower, funds, etc. has been mainly provided. However, it is also true that the inefficiency of support systems such as underestimated or redundant support has been raised because there exist conflicting policies in terms of appropriateness, effectiveness and efficiency of business support. From the perspective of the government or a company, we believe that due to limited resources of SMEs technology development and capacity enhancement through collaboration with external sources is the basis for creating competitive advantage for companies, and also emphasize value creation activities for it. This is why value chain network analysis is necessary in order to analyze inter-company deal relationships from a series of value chains and visualize results through establishing knowledge ecosystems at the corporate level. There exist Technology Opportunity Discovery (TOD) system that provides information on relevant products or technology status of companies with patents through retrievals over patent, product, or company name, CRETOP and KISLINE which both allow to view company (financial) information and credit information, but there exists no online system that provides a list of similar (competitive) companies based on the analysis of value chain network or information on potential clients or demanders that can have business deals in future. Therefore, we focus on the "Value Chain Network System (VCNS)", a support partner for planning the corporate business strategy developed and managed by KISTI, and investigate the types of embedded network-based analysis modules, databases (D/Bs) to support them, and how to utilize the system efficiently. Further we explore the function of network visualization in intelligent value chain analysis system which becomes the core information to understand industrial structure ystem and to develop a company's new product development. In order for a company to have the competitive superiority over other companies, it is necessary to identify who are the competitors with patents or products currently being produced, and searching for similar companies or competitors by each type of industry is the key to securing competitiveness in the commercialization of the target company. In addition, transaction information, which becomes business activity between companies, plays an important role in providing information regarding potential customers when both parties enter similar fields together. Identifying a competitor at the enterprise or industry level by using a network map based on such inter-company sales information can be implemented as a core module of value chain analysis. The Value Chain Network System (VCNS) combines the concepts of value chain and industrial structure analysis with corporate information simply collected to date, so that it can grasp not only the market competition situation of individual companies but also the value chain relationship of a specific industry. Especially, it can be useful as an information analysis tool at the corporate level such as identification of industry structure, identification of competitor trends, analysis of competitors, locating suppliers (sellers) and demanders (buyers), industry trends by item, finding promising items, finding new entrants, finding core companies and items by value chain, and recognizing the patents with corresponding companies, etc. In addition, based on the objectivity and reliability of the analysis results from transaction deals information and financial data, it is expected that value chain network system will be utilized for various purposes such as information support for business evaluation, R&D decision support and mid-term or short-term demand forecasting, in particular to more than 15,000 member companies in Korea, employees in R&D service sectors government-funded research institutes and public organizations. In order to strengthen business competitiveness of companies, technology, patent and market information have been provided so far mainly by government agencies and private research-and-development service companies. This service has been presented in frames of patent analysis (mainly for rating, quantitative analysis) or market analysis (for market prediction and demand forecasting based on market reports). However, there was a limitation to solving the lack of information, which is one of the difficulties that firms in Korea often face in the stage of commercialization. In particular, it is much more difficult to obtain information about competitors and potential candidates. In this study, the real-time value chain analysis and visualization service module based on the proposed network map and the data in hands is compared with the expected market share, estimated sales volume, contact information (which implies potential suppliers for raw material / parts, and potential demanders for complete products / modules). In future research, we intend to carry out the in-depth research for further investigating the indices of competitive factors through participation of research subjects and newly developing competitive indices for competitors or substitute items, and to additively promoting with data mining techniques and algorithms for improving the performance of VCNS.

Analysis of Research Trends in Journal of Distribution Science (유통과학연구의 연구 동향 분석 : 창간호부터 제8권 제3호까지를 중심으로)

  • Kim, Young-Min;Kim, Young-Ei;Youn, Myoung-Kil
    • Journal of Distribution Science
    • /
    • v.8 no.4
    • /
    • pp.5-15
    • /
    • 2010
  • This study investigated research trends of JDS that KODISA published and gave implications to elevate quality of scholarly journals. In other words, the study classified scientific system of distribution area to investigate research trends and to compare it with other scholarly journals of distribution and to give implications for higher level of JDS. KODISA published JDS Vol.1 No.1 for the first time in 1999 followed by Vol.8 No.3 in September 2010 to show 109 theses in total. KODISA investigated subjects, research institutions, number of participants, methodology, frequency of theses in both the Korean language and English, frequency of participation of not only the Koreans but also foreigners and use of references, etc. And, the study investigated JDR of KODIA, JKDM(The Journal of Korean Distribution & Management) and JDA that researched distribution, so that it found out development ways. To investigate research trends of JDS that KODISA publishes, main category was made based on the national science and technology standard classification system of MEST (Ministry Of Education, Science And Technology), table of classification of research areas of NRF(National Research Foundation of Korea), research classification system of both KOREADIMA and KLRA(Korea Logistics Research Association) and distribution science and others that KODISA is looking for, and distribution economy area was divided into general distribution, distribution economy, distribution, distribution information and others, and distribution management was divided into distribution management, marketing, MD and purchasing, consumer behavior and others. The findings were as follow: Firstly, main category occupied 47 theses (43.1%) of distribution economy and 62 theses (56.9%) of distribution management among 109 theses in total. Active research area of distribution economy consisted of 14 theses (12.8%) of distribution information and 9 theses (8.3%) of distribution economy to research distribution as well as distribution information positively every year. The distribution management consisted of 25 theses (22.9%) of distribution management and 20 theses (18.3%) of marketing, These days, research on distribution management, marketing, distribution, distribution information and others is increasing. Secondly, researchers published theses as follow: 55 theses (50.5%) by professor by himself or herself, 12 theses (11.0%) of joint research by professors and businesses, Professors/students published 9 theses (8.3%) followed by 5 theses (4.6%) of researchers, 5 theses (4.6%) of businesses, 4 theses (3.7%) of professors, researchers and businesses and 2 theses (1.8%) of students. Professors published theses less, while businesses, research institutions and graduate school students did more continuously. The number of researchers occupied single researcher (43 theses, 39.5%), two researchers (42 theses, 38.5%) and three researchers or more (24 theses, 22.0%). Thirdly, professors published theses the most at most of areas. Researchers of main category of distribution economy consisted of professors (25 theses, 53.2%), professors and businesses (7 theses, 14.9%), professors and businesses (7 theses, 14.9%), professors and researchers (6 theses, 12.8%) and professors and students (3 theses, 6.3%). And, researchers of main category of distribution management consisted of professors (30 theses, 48.4%), professors and businesses (10 theses, 16.1%), and professors and researchers as well as professors and students (6 theses, 9.7%). Researchers of distribution management consisted of professors, professors and businesses, professors and researchers, researchers and businesses, etc to have various types. Professors mainly researched marketing, MD and purchasing, and consumer behavior, etc to demand active participation of businesses and researchers. Fourthly, research methodology was: Literature research occupied 45 theses (41.3%) the most followed by empirical research based on questionnaire survey (44 theses, 40.4%). General distribution, distribution economy, distribution and distribution management, etc mostly adopted literature research, while marketing did empirical research based on questionnaire survey the most. Fifthly, theses in the Korean language occupied 92.7% (101 theses), while those in English did 7.3% (8 theses). No more than one thesis in English was published until 2006, and 7 theses (11.9%) were published after 2007 to increase. The theses in English were published more to be affirmative. Foreigner researcher published one thesis (0.9%) and both Korean researchers and foreigner researchers jointly published two theses (1.8%) to have very much low participation of foreigner researchers. Sixthly, one thesis of JDS had 27.5 references in average that consisted of 11.1 local references and 16.4 foreign references. And, cited times was 0.4 thesis in average to be low. The distribution economy cited 24.2 references in average (9.4 local references and 14.8 foreign references and JDS had 0.6 cited reference. The distribution management had 30.0 references in average (12.1 local references and 17.9 foreign references) and had 0.3 reference of JDS itself. Seventhly, similar type of scholarly journal had theses in the Korean language and English: JDR( Journal of Distribution Research) of KODIA(Korea Distribution Association) published 92 theses in the Korean language (96.8%) and 3 theses in English (3.2%), that is to say, 95 theses in total. JKDM of KOREADIMA published 132 theses in total that consisted of 93 theses in the Korean language (70.5%) and 39 theses in English (29.5%). Since 2008, JKDM has published scholarly journal in English one time every year. JDS published 52 theses in the Korean language (88.1%) and 7 theses in English (11.9%), that is to say, 59 theses in total. Sixthly, similar type of scholarly journals and research methodology were: JDR's research methodology had 65 empirical researches based on questionnaire survey (68.4%), followed by 17 literature researches (17.9%) and 11 quantitative analyses (11.6%). JKDM made use of various kinds of research methodologies to have 60 questionnaire surveys (45.5%), followed by 40 literature researches (30.3%), 21 quantitative analyses (15.9%), 6 system analyses (4.5%) and 5 case studies (3.8%). And, JDS made use of 30 questionnaire surveys (50.8%), followed by 15 literature researches (25.4%), 7 case studies (11.9%) and 6 quantitative analyses (10.2%). Ninthly, similar types of scholarly journals and Korean researchers and foreigner researchers were: JDR published 93 theses (97.8%) by Korean researchers except for 1 thesis by foreigner researcher and 1 thesis by joint research of the Korean researchers and foreigner researchers. And, JKDM had no foreigner research and 13 theses (9.8%) by joint research of the Korean researchers and foreigner researchers to have more foreigner researchers as well as researchers in foreign countries than similar types of scholarly journals had. And, JDS published 56 theses (94.9%) of the Korean researchers, one thesis (1.7%) of foreigner researcher only, and 2 theses (3.4%) of joint research of both the Koreans and foreigners. Tenthly, similar type of scholarly journals and reference had citation: JDR had 42.5 literatures in average that consisted of 10.9 local literatures (25.7%) and 31.6 foreign literatures (74.3%), and cited times accounted for 1.1 thesis to decrease. JKDM cited 10.5 Korean literatures (36.3%) and 18.4 foreign literatures (63.7%), and number of self-cited literature was no more than 1.1. Number of cited times accounted for 2.9 literatures in 2008 and then decreased continuously since then. JDS cited 26,8 references in average that consisted of 10.9 local references (40.7%) and 15.9 foreign references (59.3%), and number of self-cited accounted for 0.2 reference until 2009, and it increased to be 2.1 references in 2010. The author gives implications based on JDS research trends and investigation on similar type of scholarly journals as follow: Firstly, JDS shall actively invite foreign contributors to prepare for SSCI. Secondly, ratio of theses in English shall increase greatly. Thirdly, various kinds of research methodology shall be accepted to elevate quality of scholarly journals. Fourthly, to increase cited times, Google and other web retrievals shall be reinforced to supply scholarly journals to foreign countries more. Local scholarly journals can be worldwide scholarly journal enough to be acknowledged even in foreign countries by improving the implications above.

  • PDF