• Title/Summary/Keyword: Form-giving

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.1
    • /
    • pp.1-21
    • /
    • 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data, which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized in various industries for practical applications. In the field of business intelligence, text mining has been employed to discover new market and/or technology opportunities and to support rational decision making by business participants. Market information such as market size, market growth rate, and market share is essential for setting companies' business strategies. There has been continuous demand in various fields for market information at the level of specific products. However, such information has generally been provided at the industry level or in broad categories based on classification standards, making it difficult to obtain specific and appropriate information. In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than previously offered. We applied the Word2Vec algorithm, a neural-network-based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows. First, the data related to product information is collected, refined, and restructured into a form suitable for the Word2Vec model. Next, the preprocessed data is embedded into a vector space by Word2Vec, and the product groups are derived by extracting similar product names based on cosine similarity. Finally, the sales data of the extracted products is summed to estimate the market size of each product group. As experimental data, product names from Statistics Korea's microdata (345,103 cases) were mapped into a multidimensional vector space by Word2Vec training.
We optimized the training parameters and applied a vector dimension of 300 and a window size of 15 in further experiments. We employed the index words of the Korean Standard Industry Classification (KSIC) as a product name dataset to cluster product groups more efficiently. Product names similar to KSIC index words were extracted based on cosine similarity. The market size of the extracted products, treated as one product category, was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For performance verification, the results were compared with the actual market sizes of some items; the Pearson correlation coefficient was 0.513. Our approach has several advantages over previous studies. First, text mining and machine learning techniques were applied to market size estimation for the first time, overcoming the limitations of traditional methods that rely on sampling or require multiple assumptions. In addition, the level of the market category can be adjusted easily and efficiently according to the purpose of information use by changing the cosine similarity threshold. Furthermore, the approach has high potential for practical application since it can resolve unmet needs for detailed market size information in the public and private sectors. Specifically, it can be utilized in technology evaluation and technology commercialization support programs conducted by governmental institutions, as well as in business strategy consulting and market analysis report publishing by private firms. The limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic word embedding module can be advanced by giving a proper order to the preprocessed dataset or by combining Word2Vec with another measure such as Jaccard similarity.
Also, the product group clustering method can be replaced by other types of unsupervised machine learning algorithms. Our group is currently working on subsequent studies, and we expect that they can further improve the performance of the basic model conceptually proposed in this study.
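
The grouping and summation steps described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the three-dimensional vectors, product names, sales figures, and the threshold of 0.8 are all hypothetical stand-ins for trained Word2Vec embeddings (the abstract reports 300 dimensions) and the Statistics Korea microdata.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def estimate_market_size(index_vec, product_vecs, sales, threshold):
    """Collect products whose embedding is close to the index word,
    then sum their sales as the market size of that product group."""
    members = [
        name
        for name, vec in product_vecs.items()
        if cosine_similarity(index_vec, vec) >= threshold
    ]
    return members, sum(sales[name] for name in members)


# Hypothetical toy embeddings and sales figures (not from the paper).
product_vecs = {
    "led lamp": [0.9, 0.1, 0.0],
    "led bulb": [0.8, 0.2, 0.1],
    "steel pipe": [0.0, 0.1, 0.9],
}
sales = {"led lamp": 120.0, "led bulb": 80.0, "steel pipe": 300.0}
index_vec = [1.0, 0.0, 0.0]  # stands in for a KSIC index word vector

members, market_size = estimate_market_size(
    index_vec, product_vecs, sales, threshold=0.8
)
print(members, market_size)
```

Raising the threshold narrows the product group and lowering it widens the group, which is the adjustment of market category level that the abstract mentions.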

Derivation of the Synthetic Unit Hydrograph Based on the Watershed Characteristics (유역특성에 의한 합성단위도의 유도에 관한 연구)

  • 서승덕
    • Magazine of the Korean Society of Agricultural Engineers
    • /
    • v.17 no.1
    • /
    • pp.3642-3654
    • /
    • 1975
  • The purpose of this thesis is to derive a unit hydrograph that can be applied to ungaged watersheds from the relations between directly measurable unitgraph properties, such as peak discharge ($q_p$), time to peak discharge ($T_p$), and lag time ($L_g$), and watershed characteristics, such as river length ($L$) from the given station to the upstream limits of the watershed in km, river length from the station to the centroid of the watershed in km ($L_{ca}$), and main stream slope in meters per km ($S$). Another procedure, based on routing a time-area diagram through catchment storage, is the Instantaneous Unit Hydrograph (IUH). The dimensionless unitgraph is also analysed in brief. The basic data (1969 to 1973) used in these studies come from 9 recording level gages and rating curves, 41 rain gages and pluviographs, and 40 observed unitgraphs for the 9 sub-watersheds in the Nak Dong River basin. The results of these studies are summarized as follows. 1. The time in hours from the start of rise to the peak rate ($T_p$) generally occurred at about $0.3\,T_b$ ($T_b$: time base of the hydrograph), with some indication of higher values for larger watersheds. The base flow is comparatively higher than in other small watersheds. 2. The losses from rainfall were divided into initial loss and continuing loss. Initial loss may be defined as that portion of storm rainfall which is intercepted by vegetation, held in depression storage, or infiltrated at a high rate early in the storm; continuing loss is defined as the loss which continues at a constant rate throughout the duration of the storm after the initial loss has been satisfied. This continuing loss approximates the nearly constant rate of infiltration (${\Phi}$-index method). The loss rate from this analysis was estimated at approximately 50 per cent of the rainfall excess during the period of surface runoff. 3. For stream slope it seems appropriate, as is usual, to consider the main stream only, without giving any specific consideration to tributaries. 
It is desirable to develop a single measure of slope that is representative of the whole stream. Mean channel slopes of 1 meter per 200 meters and 1 meter per 1400 meters were found at Gazang and Jindong, respectively. These slopes are considered slightly low in the light of other river studies. The flood concentration rate might be slightly low in the Nak Dong River basin. 4. It was found that the watershed lag ($L_g$, hrs) could be expressed by $L_g = 0.253\,(L \cdot L_{ca})^{0.4171}$. The product $L \cdot L_{ca}$ is a measure of the size and shape of the watershed. For the logarithms, the correlation coefficient for $L_g$ was 0.97, which indicates that $L_g$ is closely related to the watershed characteristics $L$ and $L_{ca}$. 5. The expression for the basin lag might be expected to take a form containing the slope, $L_g = 0.545\left(\frac{L \cdot L_{ca}}{\sqrt{S}}\right)^{0.346}$. For the logarithms, the correlation coefficient for $L_g$ was 0.97, which indicates that $L_g$ is closely related to the basin characteristics as well. Care is needed in analyses involving the mean slopes. 6. The peak discharge per unit area of the unitgraph for the standard duration $t_r$, in ㎥/sec/$\textrm{km}^2$, was given by $q_p = 10^{-0.52 - 0.0184 L_g}$, with lower values for watersheds with higher lag times. For the logarithms, the correlation coefficient for $q_p$ was 0.998, which indicates high significance. The peak discharge of the unitgraph for an area $A$ could therefore be expected to take the form $Q_p = q_p \cdot A$ (㎥/sec). 7. Using the unitgraph parameter $L_g$, the base length of the unitgraph, in days, was adopted as $T_b = 0.73 + 2.073\left(\frac{L_g}{24}\right)$, with a highly significant correlation coefficient of 0.92. The constants of the above equation are fixed by the procedure used to separate base flow from direct runoff. 8. 
The width $W_{75}$ of the unitgraph at a discharge equal to 75 per cent of the peak discharge, in hours, and the width $W_{50}$ at a discharge equal to 50 per cent of the peak discharge, in hours, can be estimated from $W_{75} = \frac{1.61}{q_b^{1.05}}$ and $W_{50} = \frac{2.5}{q_b^{1.05}}$, respectively. This provides a supplementary guide for sketching the unitgraph. 9. The above equations define the three factors necessary to construct the unitgraph for duration $t_r$. For another duration $t_R$, the lag is $L_{gR} = L_g + 0.2\,(t_R - t_r)$, and this modified lag $L_{gR}$ is used in the equations for $q_p$ and $T_b$. If $t_r$ happens to be equal or close to $t_R$, one may further assume $q_{pR} = q_p$. 10. The triangular hydrograph is a dimensionless unitgraph prepared from the 40 unitgraphs. The equation is $q_p = \frac{K \cdot A \cdot Q}{T_p}$, or $q_p = \frac{0.21\,A \cdot Q}{T_p}$. The constant 0.21 applies to the Nak Dong River basin. 11. The base length of the time-area diagram (T-AD) for the IUH routing is $C = 0.9\left(\frac{L \cdot L_{ca}}{\sqrt{S}}\right)^{1/3}$. The correlation coefficient for $C$ was 0.983, which indicates high significance. The base length of the T-AD was set equal to the time from the midpoint of rainfall excess to the point of contraflexure. The constant $K$ derived in these studies is $K = 8.32 + 0.0213\,\frac{L}{\sqrt{S}}$, with a correlation coefficient of 0.964. 12. In the light of the results analysed in these studies, the average errors in the peak discharge of the synthetic unitgraph, triangular unitgraph, and IUH were estimated as 2.2, 7.7 and 6.4 per cent, respectively, relative to the peak of the observed average unitgraph. Each ordinate of the synthetic unitgraph closely approached the observed one.
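
The regression equations in results 4 to 9 can be chained to sketch a synthetic unitgraph for an ungaged watershed. The Python below is a rough illustration under stated assumptions: the peak discharge relation is read as $q_p = 10^{-0.52 - 0.0184 L_g}$ (the exponent form is a reconstruction of the flattened text), the function names are invented here, and the watershed values (60 km, 30 km, 900 km²) are hypothetical rather than taken from the study.

```python
import math


def lag_time_hours(length_km, centroid_length_km):
    """Watershed lag Lg from L and Lca (result 4)."""
    return 0.253 * (length_km * centroid_length_km) ** 0.4171


def lag_time_with_slope_hours(length_km, centroid_length_km, slope_m_per_km):
    """Watershed lag Lg including main stream slope S (result 5)."""
    ratio = length_km * centroid_length_km / math.sqrt(slope_m_per_km)
    return 0.545 * ratio ** 0.346


def peak_discharge_per_area(lag_hours):
    """Peak discharge per unit area qp; exponent form is an assumed
    reading of the garbled source equation (result 6)."""
    return 10 ** (-0.52 - 0.0184 * lag_hours)


def base_length_days(lag_hours):
    """Base length Tb of the unitgraph in days (result 7)."""
    return 0.73 + 2.073 * (lag_hours / 24)


def adjusted_lag_hours(lag_hours, t_r, t_R):
    """Lag for a non-standard duration tR (result 9)."""
    return lag_hours + 0.2 * (t_R - t_r)


# Hypothetical watershed, for illustration only.
L_KM, LCA_KM, AREA_KM2 = 60.0, 30.0, 900.0

lag = lag_time_hours(L_KM, LCA_KM)
qp = peak_discharge_per_area(lag)
Qp = qp * AREA_KM2  # peak discharge of the unitgraph for the area
Tb = base_length_days(lag)
print(f"Lg={lag:.2f} h, qp={qp:.3f}, Qp={Qp:.1f}, Tb={Tb:.2f} days")
```

The width relations for $W_{75}$ and $W_{50}$ are left out because the abstract writes them in terms of $q_b$, which it does not define.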


Retail Product Development and Brand Management Collaboration between Industry and University Student Teams (산업여대학학생단대지간적령수산품개발화품패관리협작(产业与大学学生团队之间的零售产品开发和品牌管理协作))

  • Carroll, Katherine Emma
    • Journal of Global Scholars of Marketing Science
    • /
    • v.20 no.3
    • /
    • pp.239-248
    • /
    • 2010
  • This paper describes a collaborative project between academia and industry which focused on improving the marketing and product development strategies for two private label apparel brands of a large regional department store chain in the southeastern United States. The goal of the project was to revitalize product lines of the two brands by incorporating student ideas for new solutions, thereby giving the students practical experience with a real-life industry situation. There were a number of key players involved in the project. A privately-owned department store chain based in the southeastern United States which was seeking an academic partner had recognized a need to update two existing private label brands. They targeted middle-aged consumers looking for casual, moderately priced merchandise. The company was seeking to change direction with both packaging and presentation, and possibly product design. The branding and product development divisions of the company contacted professors in an academic department of a large southeastern state university. Two of the professors agreed that the task would be a good fit for their classes - one was a junior-level Intermediate Brand Management class; the other was a senior-level Fashion Product Development class. The professors felt that by working collaboratively on the project, students would be exposed to a real world scenario, within the security of an academic learning environment. Collaboration within an interdisciplinary team has the advantage of providing experiences and resources beyond the capabilities of a single student and adds "brainpower" to problem-solving processes (Lowman 2000). This goal of improving the capabilities of students directed the instructors in each class to form interdisciplinary teams between the Branding and Product Development classes. 
In addition, many universities are employing industry partnerships in research and teaching, where collaboration within temporal (semester) and physical (classroom/lab) constraints helps to increase students' knowledge and experience of a real-world situation. At the University of Tennessee, the Center of Industrial Services and UT-Knoxville's College of Engineering worked with a company to develop design improvements in its U.S. operations. In this study, because the project involved a private label retail brand, Wickett, Gaskill and Damhorst's (1999) revised Retail Apparel Product Development Model was used by the product development and brand management teams. This framework was chosen because it addresses apparel product development from the concept to the retail stage. Two classes were involved in this project: a junior level Brand Management class and a senior level Fashion Product Development class. Seven teams were formed which included four students from Brand Management and two students from Product Development. The classes were taught the same semester, but not at the same time. At the beginning of the semester, each class was introduced to the industry partner and given the problem. Half the teams were assigned to the men's brand and half to the women's brand. The teams were responsible for devising approaches to the problem, formulating a timeline for their work, staying in touch with industry representatives and making sure that each member of the team contributed in a positive way. The objective for the teams was to plan, develop, and present a product line using merchandising processes (following the Wickett, Gaskill and Damhorst model) and develop new branding strategies for the proposed lines. 
The teams performed trend, color, fabrication and target market research; developed sketches for a line; edited the sketches and presented their line plans; wrote specifications; fitted prototypes on fit models; and developed final production samples for presentation to industry. The branding students developed a SWOT analysis, a Brand Measurement report, a mind-map for the brands and a fully integrated Marketing Report, which was presented alongside the ideas for the new lines. In the future, if the opportunity arises to work in this collaborative way with an existing company that wishes to look at both branding and product development strategies, classes will be scheduled at the same time so that students have more time to meet and discuss timelines and assigned tasks. As it was, student groups had to meet outside of each class time, and this proved to be a challenging though not uncommon part of teamwork (Pfaff and Huddleston, 2003). Although the logistics of this exercise were time-consuming to set up and administer, professors felt that the benefits to students were multiple. The most important benefit, according to student feedback from both classes, was the opportunity to work with industry professionals, follow their process, and see the results of their work evaluated by the people who made the decisions at the company level. Faculty members were grateful to have a "real-world" case to work with in the classroom to provide focus. Creative ideas and strategies were traded as plans were made, extending and strengthening the departmental links between the branding and product development areas. By working not only with students coming from a different knowledge base, but also having to keep in contact with the industry partner and follow the framework and timeline of industry practice, student teams were challenged to produce excellent and innovative work under new circumstances. 
Working on the product development and branding for "real-life" brands that are struggling gave students an opportunity to see how closely their coursework ties in with the real-world and how creativity, collaboration and flexibility are necessary components of both the design and business aspects of company operations. Industry personnel were impressed by (a) the level and depth of knowledge and execution in the student projects, and (b) the creativity of new ideas for the brands.

Analysis of Twitter for 2012 South Korea Presidential Election by Text Mining Techniques (텍스트 마이닝을 이용한 2012년 한국대선 관련 트위터 분석)

  • Bae, Jung-Hwan;Son, Ji-Eun;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.3
    • /
    • pp.141-156
    • /
    • 2013
  • Social media is a representative form of Web 2.0 that shapes changes in users' information behavior by allowing users to produce their own content without any expert skills. In particular, as a new communication medium, it has a profound impact on social change by enabling users to communicate their opinions and thoughts to the masses and to acquaintances. Social media data plays a significant role in the emerging Big Data arena. A variety of research areas, such as social network analysis and opinion mining, have therefore sought to discover meaningful information from the vast amounts of data buried in social media. Social media has recently become a main focus of the fields of Information Retrieval and Text Mining because it not only produces massive unstructured textual data in real time but also serves as an influential channel for opinion leading. However, most previous studies have adopted broad-brush and limited approaches, which have made it difficult to find and analyze new information. To overcome these limitations, we developed a real-time Twitter trend mining system that captures trends by processing big stream datasets from Twitter in real time. The system offers the functions of term co-occurrence retrieval, visualization of Twitter users by query, similarity calculation between two users, topic modeling to keep track of changes in topical trends, and mention-based user network analysis. In addition, we conducted a case study on the 2012 Korean presidential election. We collected 1,737,969 tweets containing the candidates' names and the term election on Twitter in Korea (http://www.twitter.com/) for one month in 2012 (October 1 to October 31). The case study shows that the system provides useful information and detects societal trends effectively. The system also retrieves the list of terms that co-occur with given query terms. 
We compare the results of term co-occurrence retrieval using the names of the influential candidates, 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn', as query terms. General terms related to the presidential election, such as 'Presidential Election', 'Proclamation in Support', and 'Public opinion poll', appear frequently. The results also show specific terms that differentiate each candidate, such as 'Park Jung Hee' and 'Yuk Young Su' for the query 'Geun Hae Park'; 'a single candidacy agreement' and 'Time of voting extension' for the query 'Jae In Moon'; and 'a single candidacy agreement' and 'down contract' for the query 'Chul Su Ahn'. Our system not only extracts 10 topics along with related terms but also shows the topics' dynamic changes over time by employing the multinomial Latent Dirichlet Allocation technique. Each topic shows one of two types of patterns, a rising tendency or a falling tendency, depending on the change of the probability distribution. To determine the relationship between topic trends on Twitter and social issues in the real world, we compare topic trends with related news articles. We find that Twitter can track issues faster than other media, namely newspapers. The user network on Twitter differs from those of other social media because of the distinctive way relationships are formed on Twitter: users build relationships by exchanging mentions. We visualize and analyze mention-based networks of 136,754 users, using the three candidates' names, 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn', as query terms. The results show that Twitter users mention all candidates' names regardless of their political tendencies. This case study shows that Twitter could be an effective tool to detect and predict dynamic changes in social issues, and that mention-based user networks could show different aspects of user behavior as a network unique to Twitter.
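
Term co-occurrence retrieval of the kind described in this abstract can be illustrated with a small Python sketch. It assumes whitespace tokenization and counts each term at most once per tweet; the function name and the toy tweets are hypothetical, and the authors' system may tokenize and rank differently.

```python
from collections import Counter


def cooccurring_terms(tweets, query, top_n=10):
    """Return the terms that most often appear in the same tweet as
    the query term, with the number of tweets in which they co-occur."""
    query = query.lower()
    counts = Counter()
    for tweet in tweets:
        tokens = set(tweet.lower().split())  # each term counted once per tweet
        if query in tokens:
            counts.update(tokens - {query})
    return counts.most_common(top_n)


# Hypothetical toy tweets (single-token candidate names for simplicity).
tweets = [
    "moon candidacy agreement poll",
    "moon voting extension",
    "park poll support",
    "moon poll",
]

print(cooccurring_terms(tweets, "moon", top_n=1))
print(cooccurring_terms(tweets, "park", top_n=5))
```

Multi-word queries such as full candidate names would need phrase matching or a morphological analyzer, which this sketch leaves out.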

Ecological Changes of Insect-damaged Pinus densiflora Stands in the Southern Temperate Forest Zone of Korea (I) (솔잎혹파리 피해적송림(被害赤松林)의 생태학적(生態学的) 연구(研究) (I))

  • Yim, Kyong Bin;Lee, Kyong Jae;Kim, Yong Shik
    • Journal of Korean Society of Forest Science
    • /
    • v.52 no.1
    • /
    • pp.58-71
    • /
    • 1981
  • Thecodiplosis japonensis is sweeping through the Pinus densiflora forests from south-west to north-east, destroying almost all the aged large trees and even the young ones. The front line of infestation is moving slowly but ceaselessly northwards as a long battle front. It is estimated that more than 40 percent of the area of P. densiflora forest has already been damaged; however, some individuals escape the damage and contribute to restoring the site to its previous vegetation composition. When stands are attacked by this insect, drastic openings usually result in the upper story of the tree canopy, which is formed exclusively by P. densiflora, and environmental factors such as light, temperature, litter accumulation, soil moisture and others are naturally modified. With these changes after insect invasion, phytosociological changes in the vegetation gradually proceed as time passes. If forests are grouped by the history of the insect outbreak into non-attacked (healthy forest) and four damage stages, namely recently damaged (the outbreak occurred about 1-2 years ago), severely damaged (occurred 5-6 years ago), damage prolonged (occurred 10 years ago) and restored (occurred about 20 years ago), any directional change in vegetation composition can be traced along these four progressive stages. To elucidate these changes, three survey districts were set: (1) "Gongju", where the damage was severe and the outbreak occurred in 1977, (2) "Buyeo", where the damage was prolonged, and (3) "Gochang", as restored (see Tab. 1). All of these are located in the south temperate forest zone, which is delimited mainly by temperature and is generally accepted at present. In view of temperature, the amount and distribution of precipitation, and various soil factors, the overall homogeneity of environmental conditions between the survey districts may be accepted. 
However, this does not exclude the possibility that small changes in edaphic and topographic conditions and microclimate can alter vegetation patterns. Four survey plots were set in each district, with an inter-plot distance of 3 to 4 km, and four subplots were set within each survey plot. The size of a subplot was $10m{\times}10m$ for woody vegetation and $5m{\times}5m$ for ground cover vegetation less than 2 m high. The nested quadrat method was adopted. In selecting survey plots, the following criteria were taken into account: (1) natural growth with more than 80 percent crown density of the upper canopy and more than 5 hectares of area; (2) not affected by natural or artificial disturbances such as fire and thinning operations for the past three decades; (3) lower than 500 m in altitude; (4) less than 20 degrees of slope; and (5) northerly aspect. An intensive vegetation survey was undertaken during the summer of 1980. The vegetation was divided into 3 categories for sampling: the upper layer (dominated mainly by the pine trees), the middle layer, composed of oak species and other broad-leaved trees as well as the pine, and the ground layer or lower layer (shrubby forms of woody plants). In this study our survey concentrated on woody species only. For the vegetation analysis, values of intensity, frequency, cover, relative importance, species diversity, dominance, and similarity and dissimilarity indices were calculated. When importance values were calculated, different relative weights were arbitrarily given to each layer as scores, i.e., 3 points for the upper layer, 2 for the middle layer and 1 for the ground layer. The formula then becomes $$R.I.V.=\frac{3(IV\;upper\;L.)+2(IV\;middle\;L.)+1(IV\;ground\;L.)}{6}$$ The values of the Similarity Index were calculated on the basis of the Relative Importance Value of trees (sum of relative density, frequency and cover). 
The formula used is $$S.I.=\frac{2C}{S_1+S_2}{\times}100=\frac{2C}{100+100}{\times}100=C\,(\%)$$ where $C$ = the sum of the lower of the two quantitative values for species shared by the two communities, $S_1$ = the sum of all values for the first community, and $S_2$ = the sum of all values for the second community. In Tab. 3, the species composition of each plot by layer and by district is presented. Without exception, the species forming the upper layer of the stands was Pinus densiflora. The table gives the relative cover (%), density (number of trees per $500m^2$), the range of height and diameter at breast height, and cone-bearing tendency. For the middle layer, Quercus spp. (Q. aliena, serrata, mongolica, acutissima and variabilis) and Pinus densiflora were the dominant ones. The genera Rhododendron and Lespedeza were abundant in the ground vegetation, but some oaks were also present. (1) Gongju district. The total number of woody species in this district was 26, and the relative importance value of Pinus densiflora for the upper layer was 79.1%; in the middle layer, the R.I.V. values for Quercus acutissima, Pinus densiflora, and Quercus aliena were 22.8%, 18.7% and 10.0%, respectively, and in the ground vegetation Q. mongolica 17.0%, Q. serrata 16.8%, Corylus heterophylla 11.8%, and Q. dentata 11.3%, in that order. (2) Buyeo district. The number of species enumerated in this district was 36, and the R.I.V. of Pinus densiflora for the upper layer was 100%. In the middle layer, the R.I.V. values of Q. variabilis and Q. serrata were 8.6% and 8.5%, respectively. In the ground vegetation, 24 species were counted, none of which had more than 5% of R.I.V. The mean R.I.V. of P. densiflora (totaling three layers and averaging four plots) was 57.7%, in contrast to 46.9% for the Gongju district. (3) Gochang district. The total number of woody species was 23, and the mean R.I.V. of Pinus densiflora was 66.0%, a greater value than those for the two former districts. The next highest value was 6.5% for Q. serrata. 
As time passes after the insect outbreak, the mean R.I.V. of P. densiflora increases in the order 46.9%, 57.7% and 66%. This implies that P. densiflora is returning to its original dominant state. The pooled importance of the genus Quercus decreased as that of Pinus densiflora increased. This trend contradicts the findings from the Kyonggi-do area (the central temperate forest zone) reported previously (Yim et al, 1980). Within the genus Quercus, Quercus acutissima, a warm-loving species, was more abundant in the southern temperate zone, with which the present research is concerned, than in the central temperate zone. The reverse was true of Q. mongolica, a cold-loving species. The species not common to the present survey and the previous report are Carpinus cordata, Betula davurica, Wisteria floribunda, Weigela subsessilis, Gleditsia japonica var. koraiensis, Acer pseudosieboldianum, Euonymus japonica var. macrophylla, Ribes mandshuricum, Pyrus calleryana var. fauriei, Tilia amurensis and Pyrus pyrifolia. In Figure 4 and Table 5, maximum species diversity (maximum H'), species diversity (H') and evenness (J') are presented. The similarity indices between districts are shown in Tab. 5. In Fig. 6, which shows the two-dimensional ordination of plots on X and Y coordinates, the Ai plots aggregate on the left side, the Bi plots on the lower side, and the Ci plots on the upper-right side. The increasing and decreasing patterns of Relative Density and Relative Importance Value by genus or species are given in Fig. 7. Some of the patterns presented here are not consistent with those reported previously (Yim, et al, 1980). 
The present authors attribute this to two distinct types of insect attack: one is the short war type occurring in the south temperate forest zone, in which the insect attack lasts only a few years; the other is the long-drawn war type observed in the temperate forest zone, in which the insect damage continues for several years. These different patterns of infestation might have resulted in different courses of vegetational change. Analysis of the similarity indices between districts gave convincing results: the dissimilarity index was 30% between A and B, 27% between B and C, and 35% between A and C (Table 6). The range of the similarity index was obtained by calculating every possible combination of plots between two districts. Longer temporal separation between communities produced higher values of the dissimilarity index. Ten to 20 years after the insect outbreak, the ground vegetation came to consist mainly of the genera Lespedeza and Rhododendron. The genus Quercus, which held the top dominant state for a while after the insect attack, was giving way to Pinus densiflora. This implies that, where soil fertility, soil moisture and soil depth are good enough, the genus Quercus would not have been so easily taken over by a resistant species like Pinus densiflora, which forms the edaphic climax over vast areas of forest land. Quercus is usually regarded as the representative component of the undisturbed natural forest in the central part of this country.
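
The two index formulas given in this abstract, the layer-weighted Relative Importance Value and the Similarity Index, can be worked through with a short Python sketch. The species values below are hypothetical and chosen only so that each community sums to 100; the dissimilarity index is taken here as 100 minus the similarity index, which is an assumption rather than a definition stated in the abstract.

```python
import math


def relative_importance_value(iv_upper, iv_middle, iv_ground):
    """Layer-weighted R.I.V.: weights 3, 2 and 1 for the upper,
    middle and ground layers, divided by the weight sum 6."""
    return (3 * iv_upper + 2 * iv_middle + 1 * iv_ground) / 6


def similarity_index(community_a, community_b):
    """S.I. = 2C / (S1 + S2) * 100, where C sums the lower of the two
    values for each species shared by both communities."""
    shared = set(community_a) & set(community_b)
    c = sum(min(community_a[sp], community_b[sp]) for sp in shared)
    s1 = sum(community_a.values())
    s2 = sum(community_b.values())
    return 2 * c / (s1 + s2) * 100


# Hypothetical communities; values sum to 100 in each.
plot_a = {"Pinus densiflora": 60.0, "Quercus serrata": 40.0}
plot_b = {
    "Pinus densiflora": 50.0,
    "Quercus serrata": 30.0,
    "Quercus mongolica": 20.0,
}

si = similarity_index(plot_a, plot_b)
dissimilarity = 100 - si  # assumed complement of the similarity index
riv = relative_importance_value(60.0, 30.0, 0.0)
print(si, dissimilarity, riv)
```

When both communities sum to 100, the index reduces to $C$ itself, as the abstract's formula notes.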
