A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)
-
- Journal of Intelligence and Information Systems
- /
- v.26 no.1
- /
- pp.1-21
- /
- 2020
With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized in various industries for practical applications. In the field of business intelligence, it has been employed to discover new market and/or technology opportunities and support rational decision making of business participants. The market information such as market size, market growth rate, and market share is essential for setting companies' business strategies. There has been a continuous demand in various fields for specific product level-market information. However, the information has been generally provided at industry level or broad categories based on classification standards, making it difficult to obtain specific and proper information. In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than that of previously offered. We applied Word2Vec algorithm, a neural network based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows: First, the data related to product information is collected, refined, and restructured into suitable form for applying Word2Vec model. Next, the preprocessed data is embedded into vector space by Word2Vec and then the product groups are derived by extracting similar products names based on cosine similarity calculation. Finally, the sales data on the extracted products is summated to estimate the market size of the product groups. As an experimental data, text data of product names from Statistics Korea's microdata (345,103 cases) were mapped in multidimensional vector space by Word2Vec training. We performed parameters optimization for training and then applied vector dimension of 300 and window size of 15 as optimized parameters for further experiments. We employed index words of Korean Standard Industry Classification (KSIC) as a product name dataset to more efficiently cluster product groups. The product names which are similar to KSIC indexes were extracted based on cosine similarity. The market size of extracted products as one product category was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For the performance verification, the results were compared with actual market size of some items. The Pearson's correlation coefficient was 0.513. Our approach has several advantages differing from the previous studies. First, text mining and machine learning techniques were applied for the first time on market size estimation, overcoming the limitations of traditional sampling based- or multiple assumption required-methods. In addition, the level of market category can be easily and efficiently adjusted according to the purpose of information use by changing cosine similarity threshold. Furthermore, it has a high potential of practical applications since it can resolve unmet needs for detailed market size information in public and private sectors. Specifically, it can be utilized in technology evaluation and technology commercialization support program conducted by governmental institutions, as well as business strategies consulting and market analysis report publishing by private firms. The limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic-based word embedding module can be advanced by giving a proper order in the preprocessed dataset or by combining another algorithm such as Jaccard similarity with Word2Vec. Also, the methods of product group clustering can be changed to other types of unsupervised machine learning algorithm. Our group is currently working on subsequent studies and we expect that it can further improve the performance of the conceptually proposed basic model in this study.
The purpose of this thesis is to derive a unit hydrograph which may be applied to the ungaged watershed area from the relations between directly measurable unitgraph properties such as peak discharge(qp), time to peak discharge (Tp), and lag time (Lg) and watershed characteristics such as river length(L) from the given station to the upstream limits of the watershed area in km, river length from station to centroid of gravity of the watershed area in km (Lca), and main stream slope in meter per km (S). Other procedure based on routing a time-area diagram through catchment storage named Instantaneous Unit Hydrograph(IUH). Dimensionless unitgraph also analysed in brief. The basic data (1969 to 1973) used in these studies are 9 recording level gages and rating curves, 41 rain gages and pluviographs, and 40 observed unitgraphs through the 9 sub watersheds in Nak Oong River basin. The results summarized in these studies are as follows; 1. Time in hour from start of rise to peak rate (Tp) generally occured at the position of 0.3Tb (time base of hydrograph) with some indication of higher values for larger watershed. The base flow is comparelatively higher than the other small watershed area. 2. Te losses from rainfall were divided into initial loss and continuing loss. Initial loss may be defined as that portion of storm rainfall which is intercepted by vegetation, held in deppression storage or infiltrated at a high rate early in the storm and continuing loss is defined as the loss which continues at a constant rate throughout the duration of the storm after the initial loss has been satisfied. Tis continuing loss approximates the nearly constant rate of infiltration (
This paper describes a collaborative project between academia and industry which focused on improving the marketing and product development strategies for two private label apparel brands of a large regional department store chain in the southeastern United States. The goal of the project was to revitalize product lines of the two brands by incorporating student ideas for new solutions, thereby giving the students practical experience with a real-life industry situation. There were a number of key players involved in the project. A privately-owned department store chain based in the southeastern United States which was seeking an academic partner had recognized a need to update two existing private label brands. They targeted middle-aged consumers looking for casual, moderately priced merchandise. The company was seeking to change direction with both packaging and presentation, and possibly product design. The branding and product development divisions of the company contacted professors in an academic department of a large southeastern state university. Two of the professors agreed that the task would be a good fit for their classes - one was a junior-level Intermediate Brand Management class; the other was a senior-level Fashion Product Development class. The professors felt that by working collaboratively on the project, students would be exposed to a real world scenario, within the security of an academic learning environment. Collaboration within an interdisciplinary team has the advantage of providing experiences and resources beyond the capabilities of a single student and adds "brainpower" to problem-solving processes (Lowman 2000). This goal of improving the capabilities of students directed the instructors in each class to form interdisciplinary teams between the Branding and Product Development classes. In addition, many universities are employing industry partnerships in research and teaching, where collaboration within temporal (semester) and physical (classroom/lab) constraints help to increase students' knowledge and experience of a real-world situation. At the University of Tennessee, the Center of Industrial Services and UT-Knoxville's College of Engineering worked with a company to develop design improvements in its U.S. operations. In this study, Because should be lower case b with a private label retail brand, Wickett, Gaskill and Damhorst's (1999) revised Retail Apparel Product Development Model was used by the product development and brand management teams. This framework was chosen because it addresses apparel product development from the concept to the retail stage. Two classes were involved in this project: a junior level Brand Management class and a senior level Fashion Product Development class. Seven teams were formed which included four students from Brand Management and two students from Product Development. The classes were taught the same semester, but not at the same time. At the beginning of the semester, each class was introduced to the industry partner and given the problem. Half the teams were assigned to the men's brand and half to the women's brand. The teams were responsible for devising approaches to the problem, formulating a timeline for their work, staying in touch with industry representatives and making sure that each member of the team contributed in a positive way. The objective for the teams was to plan, develop, and present a product line using merchandising processes (following the Wickett, Gaskill and Damhorst model) and develop new branding strategies for the proposed lines. The teams performed trend, color, fabrication and target market research; developed sketches for a line; edited the sketches and presented their line plans; wrote specifications; fitted prototypes on fit models, and developed final production samples for presentation to industry. The branding students developed a SWOT analysis, a Brand Measurement report, a mind-map for the brands and a fully integrated Marketing Report which was presented alongside the ideas for the new lines. In future if the opportunity arises to work in this collaborative way with an existing company who wishes to look both at branding and product development strategies, classes will be scheduled at the same time so that students have more time to meet and discuss timelines and assigned tasks. As it was, student groups had to meet outside of each class time and this proved to be a challenging though not uncommon part of teamwork (Pfaff and Huddleston, 2003). Although the logistics of this exercise were time-consuming to set up and administer, professors felt that the benefits to students were multiple. The most important benefit, according to student feedback from both classes, was the opportunity to work with industry professionals, follow their process, and see the results of their work evaluated by the people who made the decisions at the company level. Faculty members were grateful to have a "real-world" case to work with in the classroom to provide focus. Creative ideas and strategies were traded as plans were made, extending and strengthening the departmental links be tween the branding and product development areas. By working not only with students coming from a different knowledge base, but also having to keep in contact with the industry partner and follow the framework and timeline of industry practice, student teams were challenged to produce excellent and innovative work under new circumstances. Working on the product development and branding for "real-life" brands that are struggling gave students an opportunity to see how closely their coursework ties in with the real-world and how creativity, collaboration and flexibility are necessary components of both the design and business aspects of company operations. Industry personnel were impressed by (a) the level and depth of knowledge and execution in the student projects, and (b) the creativity of new ideas for the brands.
Social media is a representative form of the Web 2.0 that shapes the change of a user's information behavior by allowing users to produce their own contents without any expert skills. In particular, as a new communication medium, it has a profound impact on the social change by enabling users to communicate with the masses and acquaintances their opinions and thoughts. Social media data plays a significant role in an emerging Big Data arena. A variety of research areas such as social network analysis, opinion mining, and so on, therefore, have paid attention to discover meaningful information from vast amounts of data buried in social media. Social media has recently become main foci to the field of Information Retrieval and Text Mining because not only it produces massive unstructured textual data in real-time but also it serves as an influential channel for opinion leading. But most of the previous studies have adopted broad-brush and limited approaches. These approaches have made it difficult to find and analyze new information. To overcome these limitations, we developed a real-time Twitter trend mining system to capture the trend in real-time processing big stream datasets of Twitter. The system offers the functions of term co-occurrence retrieval, visualization of Twitter users by query, similarity calculation between two users, topic modeling to keep track of changes of topical trend, and mention-based user network analysis. In addition, we conducted a case study on the 2012 Korean presidential election. We collected 1,737,969 tweets which contain candidates' name and election on Twitter in Korea (http://www.twitter.com/) for one month in 2012 (October 1 to October 31). The case study shows that the system provides useful information and detects the trend of society effectively. The system also retrieves the list of terms co-occurred by given query terms. We compare the results of term co-occurrence retrieval by giving influential candidates' name, 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn' as query terms. General terms which are related to presidential election such as 'Presidential Election', 'Proclamation in Support', Public opinion poll' appear frequently. Also the results show specific terms that differentiate each candidate's feature such as 'Park Jung Hee' and 'Yuk Young Su' from the query 'Guen Hae Park', 'a single candidacy agreement' and 'Time of voting extension' from the query 'Jae In Moon' and 'a single candidacy agreement' and 'down contract' from the query 'Chul Su Ahn'. Our system not only extracts 10 topics along with related terms but also shows topics' dynamic changes over time by employing the multinomial Latent Dirichlet Allocation technique. Each topic can show one of two types of patterns-Rising tendency and Falling tendencydepending on the change of the probability distribution. To determine the relationship between topic trends in Twitter and social issues in the real world, we compare topic trends with related news articles. We are able to identify that Twitter can track the issue faster than the other media, newspapers. The user network in Twitter is different from those of other social media because of distinctive characteristics of making relationships in Twitter. Twitter users can make their relationships by exchanging mentions. We visualize and analyze mention based networks of 136,754 users. We put three candidates' name as query terms-Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn'. The results show that Twitter users mention all candidates' name regardless of their political tendencies. This case study discloses that Twitter could be an effective tool to detect and predict dynamic changes of social issues, and mention-based user networks could show different aspects of user behavior as a unique network that is uniquely found in Twitter.
Thecodiplosis japonesis is sweeping the Pinus densiflora forests from south-west to north-east direction, destroying almost all the aged large trees as well as even the young ones. The front line of infestation is moving slowly but ceaselessly norhwards as a long bottle front. Estimation is that more than 40 percent of the area of P. densiflora forest has been damaged already, however some individuals could escapes from the damage and contribute to restore the site to the previous vegetation composition. When the stands were attacked by this insect, the drastic openings of the upper story of tree canopy formed by exclusively P. densiflora are usually resulted and some environmental factors such as light, temperature, litter accumulation, soil moisture and offers were naturally modified. With these changes after insect invasion, as the time passes, phytosociologic changes of the vegetation are gradually proceeding. If we select the forest according to four categories concerning the history of the insect outbreak, namely, non-attacked (healthy forest), recently damaged (the outbreak occured about 1-2 years ago), severely damaged (occured 5-6 years ago), damage prolonged (occured 10 years ago) and restored (occured about 20 years ago), any directional changes of vegetation composition could be traced these in line with four progressive stages. To elucidate these changes, three survey districts; (1) "Gongju" where the damage was severe and it was outbroken in 1977, (2) "Buyeo" where damage prolonged and (3) "Gochang" as restored, were set, (See Tab. 1). All these were located in the south temperate forest zone which was delimited mainly due to the temporature factor and generally accepted without any opposition at present. In view of temperature, the amount and distribution of precipitation and various soil factor, the overall homogeneity of environmental conditions between survey districts might be accepted. However this did not mean that small changes of edaphic and topographic conditions and microclimates can induce any alteration of vegetation patterns. Again four survey plots were set in each district and inter plot distance was 3 to 4 km. And again four subplots were set within a survey plot. The size of a subplot was