• Title/Summary/Keyword: Cluster based network

Search Result 846, Processing Time 0.022 seconds

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.1
    • /
    • pp.1-21
    • /
    • 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized in various industries for practical applications. In the field of business intelligence, it has been employed to discover new market and/or technology opportunities and support rational decision making of business participants. The market information such as market size, market growth rate, and market share is essential for setting companies' business strategies. There has been a continuous demand in various fields for specific product level-market information. However, the information has been generally provided at industry level or broad categories based on classification standards, making it difficult to obtain specific and proper information. In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than that of previously offered. We applied Word2Vec algorithm, a neural network based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows: First, the data related to product information is collected, refined, and restructured into suitable form for applying Word2Vec model. Next, the preprocessed data is embedded into vector space by Word2Vec and then the product groups are derived by extracting similar products names based on cosine similarity calculation. Finally, the sales data on the extracted products is summated to estimate the market size of the product groups. As an experimental data, text data of product names from Statistics Korea's microdata (345,103 cases) were mapped in multidimensional vector space by Word2Vec training. We performed parameters optimization for training and then applied vector dimension of 300 and window size of 15 as optimized parameters for further experiments. We employed index words of Korean Standard Industry Classification (KSIC) as a product name dataset to more efficiently cluster product groups. The product names which are similar to KSIC indexes were extracted based on cosine similarity. The market size of extracted products as one product category was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For the performance verification, the results were compared with actual market size of some items. The Pearson's correlation coefficient was 0.513. Our approach has several advantages differing from the previous studies. First, text mining and machine learning techniques were applied for the first time on market size estimation, overcoming the limitations of traditional sampling based- or multiple assumption required-methods. In addition, the level of market category can be easily and efficiently adjusted according to the purpose of information use by changing cosine similarity threshold. Furthermore, it has a high potential of practical applications since it can resolve unmet needs for detailed market size information in public and private sectors. Specifically, it can be utilized in technology evaluation and technology commercialization support program conducted by governmental institutions, as well as business strategies consulting and market analysis report publishing by private firms. The limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic-based word embedding module can be advanced by giving a proper order in the preprocessed dataset or by combining another algorithm such as Jaccard similarity with Word2Vec. Also, the methods of product group clustering can be changed to other types of unsupervised machine learning algorithm. Our group is currently working on subsequent studies and we expect that it can further improve the performance of the conceptually proposed basic model in this study.

An Analysis of Big Video Data with Cloud Computing in Ubiquitous City (클라우드 컴퓨팅을 이용한 유시티 비디오 빅데이터 분석)

  • Lee, Hak Geon;Yun, Chang Ho;Park, Jong Won;Lee, Yong Woo
    • Journal of Internet Computing and Services
    • /
    • v.15 no.3
    • /
    • pp.45-52
    • /
    • 2014
  • The Ubiquitous-City (U-City) is a smart or intelligent city to satisfy human beings' desire to enjoy IT services with any device, anytime, anywhere. It is a future city model based on Internet of everything or things (IoE or IoT). It includes a lot of video cameras which are networked together. The networked video cameras support a lot of U-City services as one of the main input data together with sensors. They generate huge amount of video information, real big data for the U-City all the time. It is usually required that the U-City manipulates the big data in real-time. And it is not easy at all. Also, many times, it is required that the accumulated video data are analyzed to detect an event or find a figure among them. It requires a lot of computational power and usually takes a lot of time. Currently we can find researches which try to reduce the processing time of the big video data. Cloud computing can be a good solution to address this matter. There are many cloud computing methodologies which can be used to address the matter. MapReduce is an interesting and attractive methodology for it. It has many advantages and is getting popularity in many areas. Video cameras evolve day by day so that the resolution improves sharply. It leads to the exponential growth of the produced data by the networked video cameras. We are coping with real big data when we have to deal with video image data which are produced by the good quality video cameras. A video surveillance system was not useful until we find the cloud computing. But it is now being widely spread in U-Cities since we find some useful methodologies. Video data are unstructured data thus it is not easy to find a good research result of analyzing the data with MapReduce. This paper presents an analyzing system for the video surveillance system, which is a cloud-computing based video data management system. It is easy to deploy, flexible and reliable. It consists of the video manager, the video monitors, the storage for the video images, the storage client and streaming IN component. The "video monitor" for the video images consists of "video translater" and "protocol manager". The "storage" contains MapReduce analyzer. All components were designed according to the functional requirement of video surveillance system. The "streaming IN" component receives the video data from the networked video cameras and delivers them to the "storage client". It also manages the bottleneck of the network to smooth the data stream. The "storage client" receives the video data from the "streaming IN" component and stores them to the storage. It also helps other components to access the storage. The "video monitor" component transfers the video data by smoothly streaming and manages the protocol. The "video translator" sub-component enables users to manage the resolution, the codec and the frame rate of the video image. The "protocol" sub-component manages the Real Time Streaming Protocol (RTSP) and Real Time Messaging Protocol (RTMP). We use Hadoop Distributed File System(HDFS) for the storage of cloud computing. Hadoop stores the data in HDFS and provides the platform that can process data with simple MapReduce programming model. We suggest our own methodology to analyze the video images using MapReduce in this paper. That is, the workflow of video analysis is presented and detailed explanation is given in this paper. The performance evaluation was experiment and we found that our proposed system worked well. The performance evaluation results are presented in this paper with analysis. With our cluster system, we used compressed $1920{\times}1080(FHD)$ resolution video data, H.264 codec and HDFS as video storage. We measured the processing time according to the number of frame per mapper. Tracing the optimal splitting size of input data and the processing time according to the number of node, we found the linearity of the system performance.

Data Mining and Construction of Database Concerning Effects of Vitis Genus (산머루 관련 정보수집 및 데이터베이스의 구축)

  • Kim, Min-A;Jo, Yun-Ju;Shin, Jee-Young;Shin, Min-Kyu;Bae, Hyun-Su;Hong, Moo-Chang;Kim, Yang-Seok
    • Journal of Physiology & Pathology in Korean Medicine
    • /
    • v.26 no.4
    • /
    • pp.551-556
    • /
    • 2012
  • The database for the oriental medicine had been existed in documentation in past times and it has been developed to the database type for random accesses in the information society. However, the aspects of the database are not so diversified and the database for the bio herbal material exists in widened type dictionary style. It is a situation that the database which handles the in-depth raw herbal medicines is not sufficient in its quantity and quality. Korean wild grape is a deciduous plant categorized into the Vitaceae and it was found experimentally that it has various medical effects. It is one of the medical materials with higher potentiality of academic study and commercialization recently because it has a bigger possibility to be applied into diverse industrial fields including the medical product for health, food and beauty. We constituted the cooperative system among the Muju cluster business group for Korean mountain wild grapes, Physiology Laboratory in Kyung Hee University Oriental Medicine and Medical Classics Laboratory in Kyung Hee University Oriental Medicine with a view to focusing on such potentiality and a database for Korean wild grapes was made a touchstone for establishing the in-depth database for the single bio medical materials. First of all, the literatures based on the North East Asia in ancient times had been categorized into the classical literature (Korean literature published by government organization, Korean classical literature, Chinese classical literature and classical literature fro Korean and Chinese oriental medicine) and modern literature (Modern literature for oriental medicine, modern literature for domestic and foreign herbal medicine) to cover the eastern and western research records and writings related to Korean wild grapes and the text-mining work has been performed through the cooperation system with the Medical Classics Laboratory in Kyung Hee University Oriental Medicine. First of all, the data for the experiment and theory for Korean wild grape were collected for the Medline database controlled by the Parliament Library of USA to arrange the domestic and foreign theses with topic for Korean wild grapes and the network hyperlink function and down load function were mounted for self-thesis searching function and active view based on the collected data. The thesis searching function provides various auxiliary functions and the searching is available according to the diverse searching/queries such as the name of sub species of Korean wild grape, the logical intersection index for the active ingredients, efficacy and elements. It was constituted for the researchers who design the Korean wild grape study to design of easier experiment. In addition, the data related to the patents for Korean wild grape which were collected from European Patent Office in response to the commercialization possibility and the system available for searching and view was established in the same viewpoint. Perl was used for the query programming and MS-SQL for database establishment and management in the designing of this database. Currently, the data is available for free use and the address is as follows. http://163.180.41.43:8011/index.html

Policy Change and Innovation of Textile Industry in Daegu·Kyungbuk Region (대구·경북지역 섬유산업의 정책변화와 혁신과제)

  • Shin, Jin-Kyo;Kim, Yo-Han
    • Management & Information Systems Review
    • /
    • v.31 no.3
    • /
    • pp.223-248
    • /
    • 2012
  • This study analyses support policy and structural change of textile industry in Daegu Kyungbuk region, and suggests major issues for textile industry's innovation. In Daegu Kyungbuk, it was 1999 that a policy, so called Milano Project, in order to promote a textile industry was devised. In 2004, the Regional Industrial Promotion Plan was devised. The plan was born from a view point of establishing a regional innovation system and of promoting the innovative clusters under a knowledge based economy. After then, the Regional Industry Promotion Project or Regional Strategic Industry Promotion Project became a core of regional textile industrial policy. Research results indicated that the first stage Milano project (1999-2003) showed both positive and negative effects. There were no long-term development plan, clear vision and strategy. But, core industrial infrastructure for differentiated product development, such as New product Development Support Center and Dyeing Design Practical Application Center, was constructed. The second stage Daegu Textile Industry Promotion Plan (2004-2008) displayed a significant technological performance and new product sales with the assistance of Kyungbuk province. Also, textile industry revealed positive fruits such as financial structure, productivity, and profitability as a result of strong restructuring. In industrial structure, there was a important change from clothe textile material to industry textile material. Most of textile companies did not showed high capability in CEO's technology innovation intention, entrepreneurship, R&D and human resource competency in compare with other industry. We suggested that Daegu Kyungbuk has to select and concentrate on the high-tech textile material and living textile for sustainable development and competitiveness. We also proposed a confidence and cooperation based innovation network and company oriented innovation cluster.

  • PDF

Clustering Method based on Genre Interest for Cold-Start Problem in Movie Recommendation (영화 추천 시스템의 초기 사용자 문제를 위한 장르 선호 기반의 클러스터링 기법)

  • You, Tithrottanak;Rosli, Ahmad Nurzid;Ha, Inay;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.1
    • /
    • pp.57-77
    • /
    • 2013
  • Social media has become one of the most popular media in web and mobile application. In 2011, social networks and blogs are still the top destination of online users, according to a study from Nielsen Company. In their studies, nearly 4 in 5active users visit social network and blog. Social Networks and Blogs sites rule Americans' Internet time, accounting to 23 percent of time spent online. Facebook is the main social network that the U.S internet users spend time more than the other social network services such as Yahoo, Google, AOL Media Network, Twitter, Linked In and so on. In recent trend, most of the companies promote their products in the Facebook by creating the "Facebook Page" that refers to specific product. The "Like" option allows user to subscribed and received updates their interested on from the page. The film makers which produce a lot of films around the world also take part to market and promote their films by exploiting the advantages of using the "Facebook Page". In addition, a great number of streaming service providers allows users to subscribe their service to watch and enjoy movies and TV program. They can instantly watch movies and TV program over the internet to PCs, Macs and TVs. Netflix alone as the world's leading subscription service have more than 30 million streaming members in the United States, Latin America, the United Kingdom and the Nordics. As the matter of facts, a million of movies and TV program with different of genres are offered to the subscriber. In contrast, users need spend a lot time to find the right movies which are related to their interest genre. Recent years there are many researchers who have been propose a method to improve prediction the rating or preference that would give the most related items such as books, music or movies to the garget user or the group of users that have the same interest in the particular items. One of the most popular methods to build recommendation system is traditional Collaborative Filtering (CF). The method compute the similarity of the target user and other users, which then are cluster in the same interest on items according which items that users have been rated. The method then predicts other items from the same group of users to recommend to a group of users. Moreover, There are many items that need to study for suggesting to users such as books, music, movies, news, videos and so on. However, in this paper we only focus on movie as item to recommend to users. In addition, there are many challenges for CF task. Firstly, the "sparsity problem"; it occurs when user information preference is not enough. The recommendation accuracies result is lower compared to the neighbor who composed with a large amount of ratings. The second problem is "cold-start problem"; it occurs whenever new users or items are added into the system, which each has norating or a few rating. For instance, no personalized predictions can be made for a new user without any ratings on the record. In this research we propose a clustering method according to the users' genre interest extracted from social network service (SNS) and user's movies rating information system to solve the "cold-start problem." Our proposed method will clusters the target user together with the other users by combining the user genre interest and the rating information. It is important to realize a huge amount of interesting and useful user's information from Facebook Graph, we can extract information from the "Facebook Page" which "Like" by them. Moreover, we use the Internet Movie Database(IMDb) as the main dataset. The IMDbis online databases that consist of a large amount of information related to movies, TV programs and including actors. This dataset not only used to provide movie information in our Movie Rating Systems, but also as resources to provide movie genre information which extracted from the "Facebook Page". Formerly, the user must login with their Facebook account to login to the Movie Rating System, at the same time our system will collect the genre interest from the "Facebook Page". We conduct many experiments with other methods to see how our method performs and we also compare to the other methods. First, we compared our proposed method in the case of the normal recommendation to see how our system improves the recommendation result. Then we experiment method in case of cold-start problem. Our experiment show that our method is outperform than the other methods. In these two cases of our experimentation, we see that our proposed method produces better result in case both cases.

Analysis of Football Fans' Uniform Consumption: Before and After Son Heung-Min's Transfer to Tottenham Hotspur FC (국내 프로축구 팬들의 유니폼 소비 분석: 손흥민의 토트넘 홋스퍼 FC 이적 전후 비교)

  • Choi, Yeong-Hyeon;Lee, Kyu-Hye
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.3
    • /
    • pp.91-108
    • /
    • 2020
  • Korea's famous soccer players are steadily performing well in international leagues, which led to higher interests of Korean fans in the international leagues. Reflecting the growing social phenomenon of rising interests on international leagues by Korean fans, the study examined the overall consumer perception in the consumption of uniform by domestic soccer fans and compared the changes in perception following the transfers of the players. Among others, the paper examined the consumer perception and purchase factors of soccer fans shown in social media, focusing on periods before and after the recruitment of Heung-Min Son to English Premier League's Tottenham Football Club. To this end, the EPL uniform is the collection keyword the paper utilized and collected consumer postings from domestic website and social media via Python 3.7, and analyzed them using Ucinet 6, NodeXL 1.0.1, and SPSS 25.0 programs. The results of this study can be summarized as follows. First, the uniform of the club that consistently topped the league, has been gaining attention as a popular uniform, and the players' performance, and the players' position have been identified as key factors in the purchase and search of professional football uniforms. In the case of the club, the actual ranking and whether the league won are shown to be important factors in the purchase and search of professional soccer uniforms. The club's emblem and the sponsor logo that will be attached to the uniform are also factors of interest to consumers. In addition, in the decision making process of purchase of a uniform by professional soccer fan, uniform's form, marking, authenticity, and sponsors are found to be more important than price, design, size, and logo. The official online store has emerged as a major purchasing channel, followed by gifts for friends or requests from acquaintances when someone travels to the United Kingdom. Second, a classification of key control categories through the convergence of iteration correlation analysis and Clauset-Newman-Moore clustering algorithm shows differences in the classification of individual groups, but groups that include the EPL's club and player keywords are identified as the key topics in relation to professional football uniforms. Third, between 2002 and 2006, the central theme for professional football uniforms was World Cup and English Premier League, but from 2012 to 2015, the focus has shifted to more interest of domestic and international players in the English Premier League. The subject has changed to the uniform itself from this time on. In this context, the paper can confirm that the major issues regarding the uniforms of professional soccer players have changed since Ji-Sung Park's transfer to Manchester United, and Sung-Yong Ki, Chung-Yong Lee, and Heung-Min Son's good performances in these leagues. The paper also identified that the uniforms of the clubs to which the players have transferred to are of interest. Fourth, both male and female consumers are showing increasing interest in Son's league, the English Premier League, which Tottenham FC belongs to. In particular, the increasing interest in Son has shown a tendency to increase interest in football uniforms for female consumers. This study presents a variety of researches on sports consumption and has value as a consumer study by identifying unique consumption patterns. It is meaningful in that the accuracy of the interpretation has been enhanced by using a cluster analysis via convergence of iteration correlation analysis and Clauset-Newman-Moore clustering algorithm to identify the main topics. Based on the results of this study, the clubs will be able to maximize its profits and maintain good relationships with fans by identifying key drivers of consumer awareness and purchasing for professional soccer fans and establishing an effective marketing strategy.