• Title/Summary/Keyword: Number of Users


Word-of-Mouth Effect for Online Sales of K-Beauty Products: Centered on China SINA Weibo and Meipai (K-Beauty 구전효과가 온라인 매출액에 미치는 영향: 중국 SINA Weibo와 Meipai 중심으로)

  • Liu, Meina; Lim, Gyoo Gun
    • Journal of Intelligence and Information Systems, v.25 no.1, pp.197-218, 2019
  • In addition to economic growth and rising national income, China is experiencing rapid growth in cosmetics consumption. About 67% of China's total cosmetics trade volume takes place through e-commerce, and K-Beauty products (Korean cosmetics) are especially popular. According to previous studies, purchases of about 80% of consumer goods such as cosmetics are influenced by word-of-mouth information, because consumers search for product information before buying. Consumers mostly acquire cosmetics-related information through comments posted by other consumers on SNS such as SINA Weibo and WeChat, and recently they have also begun to use beauty-related video channels. Most previous online word-of-mouth research focused on the media themselves, such as Facebook, Twitter, and blogs. However, online word-of-mouth also varies in its informational characteristics and forms of expression, the typical types being text, pictures, and video, and this study focuses on these types. We analyze unstructured data from SINA Weibo, China's representative SNS platform, and Meipai, a video platform, and examine the impact on K-Beauty brand sales by dividing online word-of-mouth information into quantity and direction. We analyzed about 330,000 data points from Meipai and 110,000 from SINA Weibo, together with the basic attributes of the cosmetics. The results show that the volume of online word-of-mouth has a positive effect on cosmetics sales regardless of media type, but online videos had a greater impact than pictures and text. It is therefore more effective for companies to run advertising and promotional activities through video-related channels in parallel with existing SNS, and it is important to increase the frequency of exposure regardless of media type. The positivity of video word-of-mouth was significant, whereas the positivity of picture and text word-of-mouth was not.
Owing to the nature of each information type, video media carry more information than text-oriented media, and video-related channels are emerging all over the world. In particular, many video platforms have launched in China in recent years and have become popular among users in their teens to thirties. As a result, existing SNS users are dispersing to video media. We also analyzed the effect of the type of online information on online cosmetics sales by dividing cosmetics into basic (skincare) and color (makeup) cosmetics. For basic cosmetics, sales were positively affected by the number of online videos and were also affected by negative information in the videos. Unlike color cosmetics, the effects and characteristics of basic cosmetics do not appear immediately, so information such as changes after use is often conveyed over a period of time. Companies therefore need to respond more quickly to issues raised in video media. Color cosmetics are strongly influenced by negative word-of-mouth and are sensitive to picture- and text-oriented media. Pictures and text have the advantage and disadvantage of being easier to produce than video, so complaints and opinions are generally expressed on SNS quickly and immediately. Finally, we analyzed how product diversity affects sales by type of online word-of-mouth information. The analysis confirms that introducing a variety of products in a video channel has a positive effect on online cosmetics sales. The theoretical significance of this study is that, consistent with previous studies, it shows that the online sales of K-Beauty cosmetics are also influenced by word-of-mouth.
However, this study focused on media types: as in previous studies, both media have a positive impact on sales, but video was shown to be more informative and influential than text, in line with media richness. In addition, existing research on information direction holds that negative information has the stronger influence; in this study the relationship was not significant for basic cosmetics, but the effect of negative information was large for color cosmetics. For trend-driven fashion products such as color cosmetics, fast-moving word-of-mouth has an effect. In practical terms, distinguishing between basic and color cosmetics is expected to help companies plan sales and advertising strategies for K-Beauty cosmetics in China. The study also highlights the importance of video advertising strategies using channels such as YouTube and one-person media. The results can serve as baseline data for big data analysis aimed at understanding the Chinese cosmetics market and for helping related companies establish appropriate strategies and marketing.
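
The study splits online word-of-mouth into quantity (volume) and direction (positive or negative) for each media type before relating it to sales. The sketch below shows one way such features could be tabulated. It assumes each post has already been labelled with a brand, a media type (for example text, picture, or video), and a sentiment score from some upstream classifier; the field names and the share-based definition of direction are illustrative assumptions, not the authors' exact variables.

```python
from collections import defaultdict


def wom_features(posts):
    """Aggregate (brand, media_type, sentiment) posts into word-of-mouth features.

    sentiment: +1 positive, 0 neutral, -1 negative (assumed upstream label).
    Returns {(brand, media_type): {"volume", "pos_share", "neg_share"}}.
    """
    tallies = defaultdict(lambda: {"n": 0, "pos": 0, "neg": 0})
    for brand, media_type, sentiment in posts:
        t = tallies[(brand, media_type)]
        t["n"] += 1  # volume: every post counts once
        if sentiment > 0:
            t["pos"] += 1
        elif sentiment < 0:
            t["neg"] += 1
    return {
        key: {
            "volume": t["n"],
            "pos_share": t["pos"] / t["n"],  # direction: share of positive posts
            "neg_share": t["neg"] / t["n"],  # direction: share of negative posts
        }
        for key, t in tallies.items()
    }
```

The resulting per-brand, per-media columns would then be related to sales in a regression; the abstract does not state the exact model specification, so none is shown here.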

Estimation of SCS Runoff Curve Number and Hydrograph by Using Highly Detailed Soil Map(1:5,000) in a Small Watershed, Sosu-myeon, Goesan-gun (SCS-CN 산정을 위한 수치세부정밀토양도 활용과 괴산군 소수면 소유역의 물 유출량 평가)

  • Hong, Suk-Young; Jung, Kang-Ho; Choi, Chol-Uong; Jang, Min-Won; Kim, Yi-Hyun; Sonn, Yeon-Kyu; Ha, Sang-Keun
    • Korean Journal of Soil Science and Fertilizer, v.43 no.3, pp.363-373, 2010
  • "Curve number" (CN) indicates the runoff potential of an area. The US Soil Conservation Service (SCS) CN method is a simple, widely used, and efficient method for estimating the runoff from a rainfall event in a particular area, especially in ungauged basins. Among the uses of soil maps requested by end-users, estimating CN-based rainfall-runoff was dominant, accounting for up to about 80% of the total. This study introduces the use of soil maps for hydrologic and watershed management, focusing on hydrologic soil groups (HSG), and presents a case study assessing effective rainfall and the runoff hydrograph with the SCS-CN method in a small watershed. Based on the detailed soil map (1:25,000) of Korea, the area shares of the hydrologic soil groups were 42.2% (A), 29.4% (B), 18.5% (C), and 9.9% (D) under HSG 1995, and 35.1% (A), 15.7% (B), 5.5% (C), and 43.7% (D) under HSG 2006. Group D accounted for 43.7% of the total under HSG 2006, and 34.1% had been reclassified into it from groups A, B, and C of HSG 1995. The similarity between HSG 1995 and HSG 2006 was about 55%. Our study area, a catchment of approximately 44 km², was located in Sosu-myeon, Goesan-gun, Chungcheongbuk-do. We used a digital elevation model (DEM) to delineate the catchments. The soils were classified into four hydrologic soil groups based on measured infiltration rates and a model of the representative soils of the study area reported by Jung et al. (2006). Digital soil maps (1:5,000) were used to classify hydrologic soil groups at the soil series level. Using high-resolution satellite images, we delineated the boundary of each field or other parcel on screen and then surveyed the land use and cover of each. We calculated a CN for each parcel and used these values, together with a land use and cover map and a hydrologic soil map, to estimate runoff. The catchment CN values, which range from 0 (no runoff) to 100 (all precipitation runs off), were 73 under HSG 1995 and 79 under HSG 2006.
The runoff response (peak runoff and time-to-peak) was examined using the SCS triangular synthetic unit hydrograph, and the results based on HSG 2006 agreed better with field-observed data than those based on HSG 1995.
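
The SCS-CN relations behind the catchment values above are standard. With rainfall P and runoff Q in mm, the maximum potential retention is S = 25400/CN - 254, the initial abstraction is commonly taken as Ia = 0.2S, and Q = (P - Ia)^2 / (P - Ia + S) when P > Ia (otherwise Q = 0). The triangular unit hydrograph uses a time to peak Tp = D/2 + tlag, a peak discharge qp = 0.208 A Q / Tp (A in km², Q in mm, Tp in hours, qp in m³/s), and a base time of 2.67 Tp. The sketch below implements these textbook forms; the area-weighted composite CN and the Ia = 0.2S ratio are assumptions, since the abstract does not give the study's exact settings.

```python
def composite_cn(parcels):
    """Area-weighted curve number from (area_km2, cn) pairs (assumed weighting)."""
    total_area = sum(area for area, _ in parcels)
    return sum(area * cn for area, cn in parcels) / total_area


def potential_retention_mm(cn):
    """Maximum potential retention S in mm (SI form), for 0 < CN <= 100."""
    return 25400.0 / cn - 254.0


def runoff_depth_mm(p_mm, cn, ia_ratio=0.2):
    """Direct runoff depth Q (mm) for storm rainfall P (mm)."""
    s = potential_retention_mm(cn)
    ia = ia_ratio * s  # initial abstraction
    if p_mm <= ia:
        return 0.0  # all rainfall absorbed before runoff starts
    return (p_mm - ia) ** 2 / (p_mm - ia + s)


def triangular_uh(area_km2, runoff_mm, duration_h, lag_h):
    """SCS triangular unit hydrograph: (peak m^3/s, time to peak h, base time h)."""
    tp = duration_h / 2.0 + lag_h             # time to peak
    qp = 0.208 * area_km2 * runoff_mm / tp    # peak discharge, SI constant
    tb = 2.67 * tp                            # base time of the triangle
    return qp, tp, tb
```

For any storm depth large enough to produce runoff, runoff_depth_mm(P, 79) exceeds runoff_depth_mm(P, 73), illustrating how the higher CN obtained with HSG 2006 translates into more predicted runoff.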

Major Class Recommendation System based on Deep learning using Network Analysis (네트워크 분석을 활용한 딥러닝 기반 전공과목 추천 시스템)

  • Lee, Jae Kyu; Park, Heesung; Kim, Wooju
    • Journal of Intelligence and Information Systems, v.27 no.3, pp.95-112, 2021
  • In university education, the choice of major classes plays an important role in students' careers. However, as industry changes, the fields of major subjects offered by each department are diversifying and growing in number, so students have difficulty choosing classes that fit their career paths. In general, students choose classes based on experience, such as their peers' choices or advice from seniors. This has the advantage of reflecting the general situation, but it does not reflect individual tendencies or considerations about existing courses, and it leads to information inequality because the information is shared only among certain students. In addition, as classes have recently been held remotely and interaction among students has decreased, even experience-based decisions have become harder to make. Therefore, this study proposes a recommendation system model that recommends college major classes suited to individual characteristics based on data rather than experience. A recommendation system recommends information and content (music, movies, books, images, etc.) that a specific user may be interested in. Such systems are already widely used in services where individual tendencies matter, such as YouTube and Facebook, and are familiar from the personalized services of content platforms such as over-the-top (OTT) media services. Taking classes is also a kind of content consumption, in that students select classes suited to them from a given list. Unlike other content consumption, however, the selection has a large impact. For example, music and movies are usually consumed once and take little time to consume, so each item is relatively unimportant and choosing one requires little deliberation.
Major classes, by contrast, take a long time to consume because they last a full semester, and each item is highly important and calls for greater care, since the composition of the selected classes affects many things such as career and graduation requirements. Given these characteristics, a recommendation system in education can support decisions that reflect meaningful individual characteristics that experience-based decision-making cannot capture, even though the number of items is relatively small. This study aims to realize personalized education and enhance students' educational satisfaction by presenting a recommendation model for university major classes. The model was developed with class-history data of undergraduate students at a university from 2015 to 2017, with students and their major names used as metadata. The class-history data are implicit feedback: they indicate only whether a class was taken, not preferences for it. As a result, embedding vectors derived to characterize students and classes have low expressive power. To address this, the study proposes Net-NeuMF, a model that generates student and class vectors through network analysis and uses them as model inputs. The model is based on the structure of NeuMF, a representative model for implicit-feedback data that uses one-hot vectors as inputs. The input vectors are generated through network analysis to represent the characteristics of students and classes. To generate student vectors, each student is a node, and two students are connected by a weighted edge if they took the same class. Similarly, to generate class vectors, each class is a node, and two classes are connected if any students took both.
We then apply Node2Vec, a representation-learning method that quantifies the characteristics of each node. To evaluate the model, we used four metrics commonly used for recommendation systems and ran experiments with three embedding dimensions to analyze the effect of dimensionality on the model. The results show better performance on the evaluation metrics at every dimension than when one-hot vectors are used in the existing NeuMF structure. This work thus builds networks of students (users) and classes (items) to increase expressiveness over existing one-hot embeddings, matches the inputs to the characteristics of each component of the model, and achieves better performance than existing methods on various evaluation metrics.
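
The student and class networks described above can be sketched as weighted co-enrollment graphs, followed by node2vec's second-order biased random walk with return parameter p and in-out parameter q. This is a minimal illustration, not the authors' code: using the number of shared classes (or shared students) as the edge weight is an assumption, and the skip-gram training that turns walks into embedding vectors, as well as the NeuMF model itself, are omitted.

```python
import random
from collections import defaultdict
from itertools import combinations


def build_graphs(enrollments):
    """Build weighted student-student and class-class co-enrollment graphs.

    enrollments: iterable of (student, course) pairs.
    Returns two adjacency dicts where graph[u][v] is the edge weight.
    """
    courses_of = defaultdict(set)
    students_of = defaultdict(set)
    for student, course in enrollments:
        courses_of[student].add(course)
        students_of[course].add(student)

    student_graph = defaultdict(lambda: defaultdict(int))
    for students in students_of.values():
        for a, b in combinations(sorted(students), 2):
            student_graph[a][b] += 1  # the pair shares one more class
            student_graph[b][a] += 1

    course_graph = defaultdict(lambda: defaultdict(int))
    for courses in courses_of.values():
        for a, b in combinations(sorted(courses), 2):
            course_graph[a][b] += 1  # one more student took both classes
            course_graph[b][a] += 1
    return student_graph, course_graph


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=random):
    """One node2vec-style second-order biased random walk of up to `length` nodes."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = list(graph.get(cur, {}))
        if not neighbors:
            break  # isolated node: the walk cannot continue
        if len(walk) == 1:
            weights = [graph[cur][n] for n in neighbors]
        else:
            prev = walk[-2]
            weights = []
            for n in neighbors:
                w = graph[cur][n]
                if n == prev:
                    weights.append(w / p)  # step back to the previous node
                elif n in graph[prev]:
                    weights.append(w)      # stay near prev (BFS-like)
                else:
                    weights.append(w / q)  # move away from prev (DFS-like)
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk
```

In a full pipeline, many walks per node would be passed to a Word2Vec-style skip-gram model, and the resulting vectors would replace the one-hot user and item inputs of NeuMF.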

A Study on the Regional Characteristics of Broadband Internet Termination by Coupling Type using Spatial Information based Clustering (공간정보기반 클러스터링을 이용한 초고속인터넷 결합유형별 해지의 지역별 특성연구)

  • Park, Janghyuk; Park, Sangun; Kim, Wooju
    • Journal of Intelligence and Information Systems, v.23 no.3, pp.45-67, 2017
  • According to the 2016 Internet Usage Survey, both the number of Internet users and Internet usage have been increasing, and smartphones are taking a more dominant role than computers as Internet access devices. Some expected demand for high-speed Internet to decrease as smart devices spread; however, the high-speed Internet market is expected to grow slightly for a while owing to faster Giga Internet and the growth of the IoT market. As the broadband Internet market saturates, telecom operators are competing excessively to win new customers; if they knew why customers leave, they could reduce marketing costs through more effective marketing. In this study, we analyzed the relationship between the cancellation rates of telecommunication products and the factors affecting them by combining a telecommunication company's data for three cities (Anyang, Gunpo, and Uiwang) with regional data from KOSIS (Korean Statistical Information Service). In particular, assuming that neighboring areas affect the distribution of cancellation rates by coupling type, we performed spatial cluster analysis on the three types of cancellation rates for each region using the spatial analysis tool SaTScan and analyzed various relationships between the cancellation rates and the regional data. In the analysis phase, we first summarized the characteristics of the clusters derived by combining spatial information with the cancellation data. Next, based on the cluster results, we used analysis of variance, correlation analysis, and regression analysis to examine the relationship between the cancellation rates and the regional data. Based on the results, we proposed marketing methods suited to each region.
Unlike previous studies of regional characteristics, this study is academically distinctive in that it performs clustering based on spatial information so that adjacent regions with similar cancellation types are grouped together. In addition, few previous studies on the determinants of subscription to high-speed Internet services have considered regional characteristics; this study analyzes the relationship between the clusters and the regional characteristics data, assuming that the relevant factors differ by region. We sought more efficient marketing methods that take each region's characteristics into account for new subscriptions and customer management in high-speed Internet. Analysis of variance confirmed significant differences in regional characteristics among the clusters, correlation analysis showed stronger correlations within the clusters than across all regions, and regression analysis was used to analyze the relationship between the cancellation rate and regional characteristics. As a result, we found that the cancellation rate differs depending on regional characteristics, making it possible to target differentiated marketing to each region. The biggest limitation of this study was the difficulty of obtaining enough data to carry out the analysis. In particular, it was difficult to find variables representing regional characteristics at the dong level: most data were published at the city level rather than the dong level, which limited detailed analysis. Data such as income, card usage, and telecommunications company policies or characteristics that could explain cancellations were not available at the time. The most urgent requirement for more sophisticated analysis is to obtain dong-level data on regional characteristics.
Future studies should carry out target marketing based on these results. It would also be meaningful to analyze the effect of marketing by comparing results before and after target marketing, and to build clusters based on new-subscription data as well as cancellation data.
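
SaTScan detects clusters by scanning circular windows over the map and keeping the window with the highest likelihood ratio. The sketch below illustrates the idea with a Kulldorff-style Bernoulli model, treating cancellations as cases among subscribers in each dong. The abstract does not say which probability model or window settings the authors used, so the Bernoulli choice, the 50% population cap, and the data layout are assumptions; SaTScan also assesses significance by Monte Carlo replication, which is omitted here.

```python
import math


def _xlogy(x, y):
    """x * log(y), defined as 0 when x == 0 (avoids log(0))."""
    return 0.0 if x == 0 else x * math.log(y)


def bernoulli_llr(c, n, C, N):
    """Log-likelihood ratio for c cases among n inside a window vs C among N overall."""
    if n == 0 or n == N:
        return 0.0
    inside, outside = c / n, (C - c) / (N - n)
    if inside <= outside:
        return 0.0  # only scan for high-rate clusters
    l1 = (_xlogy(c, inside) + _xlogy(n - c, 1 - inside)
          + _xlogy(C - c, outside) + _xlogy(N - n - C + c, 1 - outside))
    l0 = _xlogy(C, C / N) + _xlogy(N - C, 1 - C / N)
    return l1 - l0


def circular_scan(regions, max_pop_share=0.5):
    """regions: list of (name, x, y, cases, population).

    Grows a circle around each centroid, nearest regions first, and returns
    (best_llr, member_names) for the most likely high-rate cluster.
    """
    C = sum(r[3] for r in regions)
    N = sum(r[4] for r in regions)
    best = (0.0, [])
    for _, cx, cy, _, _ in regions:
        ordered = sorted(regions, key=lambda r: math.hypot(r[1] - cx, r[2] - cy))
        c = n = 0
        members = []
        for name, _, _, cases, pop in ordered:
            if n + pop > max_pop_share * N:
                break  # window would exceed the population cap
            c += cases
            n += pop
            members.append(name)
            llr = bernoulli_llr(c, n, C, N)
            if llr > best[0]:
                best = (llr, list(members))
    return best
```

Running the scan separately for each coupling type's cancellation counts would give one set of candidate clusters per type, which could then be compared with regional variables as the study describes.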

Analyzing the Issue Life Cycle by Mapping Inter-Period Issues (기간별 이슈 매핑을 통한 이슈 생명주기 분석 방법론)

  • Lim, Myungsu; Kim, Namgyu
    • Journal of Intelligence and Information Systems, v.20 no.4, pp.25-41, 2014
  • Recently, the number of social media users has increased rapidly because of the prevalence of smart devices. As a result, the amount of real-time data has been increasing exponentially, which in turn is generating more interest in using such data to create added value. For instance, several attempts are being made to identify social issues by analyzing the search keywords frequently used on news portal sites and the words regularly mentioned on various social media. Topic analysis is used to identify topics and themes in large collections of text documents. Issue tracking, one of the most prevalent applications of topic analysis, investigates changes in the social issues identified through topic analysis. Traditionally, issue tracking is conducted by identifying the main topics of documents covering an entire period at once and then analyzing the occurrence of each topic by period. However, this approach has two limitations. First, when a new period is added, topic analysis must be repeated on all documents of the entire period rather than only on the new documents of the added period, which imposes significant time and cost burdens. This makes the traditional approach difficult to apply in most applications that need to analyze additional periods. Second, issues are not only created and terminated continually; one issue may split into several issues, or several issues may merge into one. In other words, each issue has a life cycle consisting of creation, transition (merging and splitting), and termination. Existing issue tracking methods do not address the connections and effects among these issues.
The purpose of this study is to overcome these two limitations of existing issue tracking: one concerning the analysis method and the other concerning the lack of consideration for the changeability of issues. Suppose we perform a separate topic analysis for each of several periods. Issues from different periods must then be mapped to one another to trace their trends. However, connections between issues of different periods are hard to discover because the issues derived for each period are mutually heterogeneous. To overcome these limitations without analyzing the entire period's documents simultaneously, this study performs the analysis independently for each period and then applies issue mapping to link the issues identified in each period. We present an integrated approach across the detailed periods and depict the issue flow over the entire integrated period. By systematically identifying and examining the entire issue life cycle, including creation, transition (merging and splitting), and extinction, the study analyzes the changeability of issues. The proposed methodology is highly efficient in terms of time and cost while sufficiently considering the changeability of issues. Further, the results can be used to adapt the methodology to practical situations, and applying it to actual Internet news demonstrates its potential practical applications. Consequently, the proposed methodology was able to extend the period of analysis and follow the progress of each issue's life cycle. It can also facilitate a clearer understanding of complex social phenomena through topic analysis.
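
Issue mapping between periods can be illustrated by comparing the topic-word distributions of adjacent periods and linking similar issues. In the sketch below, which is an assumed formulation rather than the paper's exact procedure, each issue is a {term: weight} dictionary, two issues are linked when their cosine similarity reaches a hypothetical threshold, and life-cycle events are read off the links: an earlier issue with no successor terminates, one with several successors splits, a later issue with no predecessor is newly created, and one with several predecessors is a merger.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def map_issues(prev_issues, next_issues, threshold=0.3):
    """Links (i, j) between issue i of the earlier period and issue j of the later one."""
    return [(i, j)
            for i, a in enumerate(prev_issues)
            for j, b in enumerate(next_issues)
            if cosine(a, b) >= threshold]


def life_cycle_events(links, n_prev, n_next):
    """Label issues by how many links they have across the period boundary."""
    out_deg = [0] * n_prev
    in_deg = [0] * n_next
    for i, j in links:
        out_deg[i] += 1
        in_deg[j] += 1
    events = {}
    for i, d in enumerate(out_deg):
        if d == 0:
            events[("prev", i)] = "termination"
        elif d > 1:
            events[("prev", i)] = "split"
    for j, d in enumerate(in_deg):
        if d == 0:
            events[("next", j)] = "creation"
        elif d > 1:
            events[("next", j)] = "merge"
    return events
```

Because each period is modelled on its own, adding a new period only requires one new topic analysis plus one call to map_issues against the previous period.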

Analysis of Telephone Counseling Service on Child Health (전화 아기건강상담을 통해 나타난 우리나라 어머니들의 육아문제 분석)

  • Song Ji-Ho; Han Kyung-Ja; Oh Ka-Sil; Cho Kyoul-Ja; Lee Ja-Hyung; Park Eun-Sook; Cho Kap-Chul; Tak Young-Nan; Ahn Young-Mee
    • Child Health Nursing Research, v.7 no.2, pp.245-257, 2001
  • This study analyzed the services provided by the Child Health Telephone Service Center, a toll-free service operated as part of the community services of the Korean Academic Society of Child Health Nursing. The aim of the study was to describe the child health concerns that caregivers discussed during telephone counseling. The specific objectives were as follows: 1. To analyze the activities of the Center. 2. To describe the characteristics of the caregivers who called for counseling and of their children. 3. To analyze the content of the counseling sessions. 4. To analyze counseling content according to the characteristics of the caregivers and their children. Data were obtained from the counseling records kept by the Center's three counselors from September to December 1999. There were 8,261 calls in total, comprising 15,150 questions. Overlapping questions and questions with similar content were merged, leaving 13,236 questions for the final analyses. Almost all callers (98.4%) were mothers, and 89.6% of them were between 25 and 35 years of age. The callers came from all over the country: the largest share was from the Seoul metropolitan area (36%), followed by Kyung Gi Province (28%) and the Kyung Sang area (20%). Of the 8,261 callers, 72.8% were first-time users. The children in question were roughly evenly divided between boys and girls and ranged in age from one month to six years; the largest group (62.5%) was under six months of age. The 13,236 questions were categorized into 11 problem areas.
In order of frequency, these were physical problems, feeding and nutrition concerns, information on child rearing, growth and development, guidance on using child care facilities, elimination problems, sleep concerns, immunization-related concerns, behavior problems, injuries and accidents, and safety measures. The most frequent counseling topic was physical signs and symptoms (27.3%), followed by feeding and nutrition, information on child rearing, and growth and development. Among physical problems, gastrointestinal signs and symptoms were the most frequent concern (25%), followed by skin problems (23.3%). Loose bowels, vomiting, and constipation were the most frequent gastrointestinal problems. Among skin problems, atopic dermatitis was the most frequent (53.3%), followed by diaper rash. About 80% of the growth and development category concerned physical development, including physiological changes, body growth, and motor and sensory development. This study constitutes the activity report for the Center's first year. The findings are consistent with reports in the literature on child health problems and parents' educational needs. One recommendation is that, because the Center's services are provided only by telephone, the psychology of the counselees and the counselor-counselee relationship must be considered to improve the service.


A Real-Time Stock Market Prediction Using Knowledge Accumulation (지식 누적을 이용한 실시간 주식시장 예측)

  • Kim, Jin-Hwa; Hong, Kwang-Hun; Min, Jin-Young
    • Journal of Intelligence and Information Systems, v.17 no.4, pp.109-130, 2011
  • One of the major problems in data mining is data size, as most data sets are huge these days. Streams of data are normally accumulated in data storage or databases, and transactions on the Internet, mobile devices, and ubiquitous environments produce data streams continuously. Some data sets lie unused in huge data stores because of their size, while others are lost as soon as they are created because, for various reasons, they are not saved. How to use such large data sets and stream data efficiently is a challenging question in data mining research. Stream data is a data set accumulated continuously from a data source into data storage, and in many cases its size grows increasingly large over time. Mining information from such massive data consumes many resources, such as storage, money, and time. These characteristics of stream data make it difficult and expensive to store all the stream data accumulated over time. On the other hand, if only recent or partial data are used to mine information or patterns, valuable and useful information can be lost. To avoid these problems, this study proposes a method that efficiently accumulates information or patterns over time in the form of rule sets. A rule set is mined from each data set in the stream and accumulated into a master rule set store, which also serves as the model for real-time decision making. One main advantage of this method is that it requires much less storage than the traditional approach of saving the whole data set. Another advantage is that the accumulated rule set is used directly as a prediction model: because the rule set is always ready for decision making, user requests can be answered promptly at any time. This makes real-time decision making possible, which is the greatest advantage of the method.
Based on ensemble theory, combining many different models can produce a better-performing prediction model. The consolidated rule set effectively covers the whole data set, whereas the traditional sampling approach covers only part of it. This study uses stock market data, a heterogeneous data set whose characteristics vary over time. Stock market indexes can fluctuate whenever an event influences the market, so the variance of each variable is large compared with that of a homogeneous data set. Prediction on a heterogeneous data set is naturally much harder than on a homogeneous one, because the situations to be predicted are themselves less predictable. This study tests two general mining approaches and compares their prediction performance with that of the proposed method. The first approach induces a rule set from the recent data set to predict the new data set. The second induces a rule set from all data accumulated since the beginning, every time a new data set must be predicted. Neither approach performed as well as the accumulated rule set. The study also experiments with different prediction models: the first builds a prediction model using only the more important rule sets, and the second uses all rule sets, weighting the rules by their performance. The second approach performed better than the first. The experiments also show that the proposed method can be an efficient approach for mining information and patterns from stream data. A limitation of the method is that its application was confined to stock market data; a more dynamic real-time stream data set would be desirable for applying it.
Another problem remains: as the number of rules grows over time, special rules such as redundant or conflicting rules must be managed efficiently.
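
The accumulation scheme can be sketched as a master rule store that merges each newly mined rule set and predicts by performance-weighted voting, the variant that performed better in the experiments. The rule representation (a set of attribute-value conditions and a predicted label, with hit and total counts) is a hypothetical one chosen for illustration, and the rule-induction step that mines each batch is omitted. Merging identical rules is one simple way to handle the redundant rules mentioned above; conflicting rules are kept and settled by the weighted vote.

```python
from collections import defaultdict


class MasterRuleSet:
    """Hypothetical master rule store: (conditions, label) -> [correct, total]."""

    def __init__(self):
        self.stats = {}

    def accumulate(self, rules):
        """Merge a rule set mined from one stream batch.

        rules: iterable of (conditions_dict, label, correct, total).
        Identical rules are merged (redundancy); rules with the same
        conditions but different labels coexist (conflicts are resolved
        at prediction time).
        """
        for conditions, label, correct, total in rules:
            key = (frozenset(conditions.items()), label)
            entry = self.stats.setdefault(key, [0, 0])
            entry[0] += correct
            entry[1] += total

    def predict(self, record, default=None):
        """Weighted vote of all rules whose conditions hold for `record`."""
        votes = defaultdict(float)
        items = set(record.items())
        for (conditions, label), (correct, total) in self.stats.items():
            if total and conditions <= items:    # rule fires on this record
                votes[label] += correct / total  # weight = observed accuracy
        return max(votes, key=votes.get) if votes else default
```

Only the compact rule statistics are stored between batches, which reflects the storage advantage the abstract claims over keeping the whole data set.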

The Present State and Solutions for Archival Arrangement and Description of National Archives & Records Service of Korea (국가기록원의 기록물 정리기술의 현황과 개선방안)

  • Yoon, Ju-Bom
    • Journal of Korean Society of Archives and Records Management, v.4 no.2, pp.118-162, 2004
  • Archival description plays an important role in records control and reference services, and the National Archives has made efforts to carry out archival description. Compared with advanced countries, however, there are differences and problems in both theory and practice. The most serious theoretical gap is that functional classification, maintenance of original order, and multi-level description are not reflected in practice: records are arranged by registration order in volume units, like books, and then placed on shelves. There are also problems with the history of agency changes and with index control, which can inconvenience users. To address this, the study introduces the meaning and importance of arrangement and description, the current situation and problems of arrangement and description at the National Archives, and description guidelines from other countries, followed by an example applying ISAD(G). The paper has eight chapters: chapter 1 is the introduction, chapter 2 covers the meaning and importance of arrangement and description, chapters 3 through 7 are summarized below, and chapter 8 is the conclusion. Chapter 3 explains the GOVT system currently in use and its description element categories, as part of the situation and problems of arrangement and description at the National Archives. Chapter 4 covers guidelines from archives in the U.S.A., England, and Australia. 1. NARA's Lifecycle Data Requirements Guide is introduced, along with how a single title element is described among its description fields. 2. The Public Record Office's description guideline, the National Archives Cataloguing Guidelines Introduction, is presented, including PROCAT and its seven description procedures. 3. The Commonwealth Records Series (CRS) system of the National Archives of Australia is examined, including its registration and description procedures.
Chapter 5 presents an example of applying ISAD(G): the National Archives' description of records produced by the Appeals Commission of the Ministry of Government Administration. Chapters 6 and 7 address the problems identified after applying ISAD(G), namely the naming of records at the processing stage in each institution, the lack of description field categories, the sorting or classification of record types and forms, reference or identifying numbers, the absence of detailed description rules, functional classification, multi-level description, input format, shelf arrangement, and authority control, and propose plans for improving them. The best way to improve arrangement and description at the National Archives is to examine the standards, guidelines, and manuals of archives in advanced countries, so we suggest that much more research on this topic is needed in the academic field.
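
The multi-level description that the abstract finds missing in practice can be pictured as a tree of descriptions, from fonds down to items, each carrying the ISAD(G) elements considered essential for international exchange: reference code, title, dates, level of description, extent, and name of creator. The sketch below is illustrative only: the class, field names, and simplified level list are hypothetical and do not represent the National Archives' GOVT system. It links each lower level to its parent (description from the general to the specific) and lets lower levels inherit the creator instead of repeating it (non-repetition of information).

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

# Simplified, hypothetical level list; ISAD(G) also allows sub-fonds, sub-series, etc.
LEVELS = ("fonds", "series", "file", "item")


@dataclass
class Description:
    reference_code: str            # ISAD(G) 3.1.1 Reference code(s)
    title: str                     # 3.1.2 Title
    dates: str                     # 3.1.3 Date(s)
    level: str                     # 3.1.4 Level of description
    extent: str                    # 3.1.5 Extent and medium
    creator: Optional[str] = None  # 3.2.1 Name of creator(s); may be inherited
    children: List["Description"] = field(default_factory=list)

    def add(self, child: "Description") -> "Description":
        """Link a lower-level description under this one."""
        if LEVELS.index(child.level) <= LEVELS.index(self.level):
            raise ValueError(f"{child.level} cannot sit under {self.level}")
        self.children.append(child)
        return child

    def walk(self, depth: int = 0, inherited: Optional[str] = None
             ) -> Iterator[Tuple[int, str, str, Optional[str]]]:
        """Yield (depth, reference code, title, creator), general to specific.

        A creator stated at a higher level is inherited, not repeated.
        """
        creator = self.creator or inherited
        yield depth, self.reference_code, self.title, creator
        for child in self.children:
            yield from child.walk(depth + 1, creator)
```

Arranging records this way, rather than by registration order in volume units, keeps the context of each file tied to its series and fonds.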

A Methodology for Extracting Shopping-Related Keywords by Analyzing Internet Navigation Patterns (인터넷 검색기록 분석을 통한 쇼핑의도 포함 키워드 자동 추출 기법)

  • Kim, Mingyu;Kim, Namgyu;Jung, Inhwan
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.123-136
    • /
    • 2014
  • Recently, online shopping has developed further as the use of the Internet and a variety of smart mobile devices has become more prevalent. The growth of such shopping has led to the creation of many Internet shopping malls. Consequently, competition among online retailers is becoming increasingly fierce, and many Internet shopping malls are making significant efforts to attract online users to their sites. One such effort is keyword marketing, whereby a retail site pays a fee to have its link shown to potential customers when they enter a specific keyword on an Internet portal site. The price of each keyword is generally set according to the keyword's frequency of appearance. However, it is widely accepted that keyword prices cannot be based solely on frequency, because many keywords appear frequently but have little relationship to shopping. This implies that it is unreasonable for an online shopping mall to spend a great deal on some keywords simply because people use them frequently. Therefore, from the perspective of shopping malls, a specialized process is required to extract meaningful keywords. Further, the demand for automating this extraction process is increasing because of the drive to improve online sales performance. In this study, we propose a methodology that automatically extracts only shopping-related keywords from the entire set of search keywords used on portal sites. We define a shopping-related keyword as a keyword that is used directly before shopping behavior. In other words, only search keywords whose search results lead users to shopping-related pages are extracted from the entire set of search keywords. The rankings of the extracted keywords are then compared with the rankings of the entire set of search keywords. Two types of data are used in our experiment: web browsing history from July 1, 2012 to June 30, 2013, and site information.
The experimental data were obtained from a website-ranking site and from the largest portal site in Korea. The original sample dataset contains 150 million transaction logs. First, portal sites are selected, and the search keywords used on those sites are extracted. Search keywords can be extracted easily by simple parsing. The extracted keywords are ranked according to their frequency. The experiment uses approximately 3.9 million search results from Korea's largest search portal site. As a result, a total of 344,822 search keywords were extracted. Next, the shopping-related keywords were extracted from the entire set of search keywords using web browsing history and site information. As a result, we obtained 4,709 shopping-related keywords. For performance evaluation, we compared the hit ratio of the entire set of search keywords with that of the shopping-related keywords. To achieve this, we extracted 80,298 search keywords from several Internet shopping malls and then chose the top 1,000 keywords as a set of true shopping keywords. We measured the precision, recall, and F-score of both the entire set of keywords and the shopping-related keywords. The F-score was computed as the harmonic mean of precision and recall. The shopping-related keywords derived by the proposed methodology achieved higher precision, recall, and F-score than the entire set of keywords. This study proposes a scheme that obtains shopping-related keywords in a relatively simple manner. We could extract shopping-related keywords easily by examining only the transactions whose next visit is a shopping mall. The resulting shopping-related keyword set is expected to be a useful asset for the many shopping malls that participate in keyword marketing. Moreover, the proposed methodology can easily be applied to build keyword sets for other specific domains as well as for shopping.
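The Python sketch below illustrates the extraction rule described above. A keyword counts as shopping-related when the very next page visited by the same user is a shopping mall. This is only an illustration under stated assumptions, not the authors' implementation. The following are hypothetical:

- the portal host `search.example-portal.com`;
- the query-string key `query`;
- the log format of `(user_id, url)` tuples sorted by user and time;
- the shopping-mall domain set.

The helper `precision_recall_f1` computes precision, recall, and the F-score (harmonic mean of precision and recall) of a predicted keyword set against a set of true shopping keywords.

```python
from collections import Counter
from urllib.parse import parse_qs, urlparse

# Hypothetical portal host and query-string key; real portals differ.
PORTAL_DOMAIN = "search.example-portal.com"
QUERY_PARAM = "query"


def extract_keyword(url):
    """Return the search keyword if `url` is a portal search page, else None."""
    parsed = urlparse(url)
    if parsed.netloc != PORTAL_DOMAIN:
        return None
    values = parse_qs(parsed.query).get(QUERY_PARAM)
    return values[0] if values else None


def shopping_related_keywords(log, shopping_domains):
    """Count every search keyword, and separately those whose next visit is a mall.

    `log` is a list of (user_id, url) tuples sorted by user and then by time.
    """
    all_counts, shop_counts = Counter(), Counter()
    # Pair each record with the following one; a sentinel closes the last pair.
    for (user, url), (next_user, next_url) in zip(log, log[1:] + [(None, "")]):
        keyword = extract_keyword(url)
        if keyword is None:
            continue
        all_counts[keyword] += 1
        if next_user == user and urlparse(next_url).netloc in shopping_domains:
            shop_counts[keyword] += 1
    return all_counts, shop_counts


def precision_recall_f1(predicted, truth):
    """Precision, recall, and F-score (harmonic mean) of a predicted keyword set."""
    predicted, truth = set(predicted), set(truth)
    hits = len(predicted & truth)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Toy log (hypothetical data, not from the study).
log = [
    ("u1", "https://search.example-portal.com/?query=sneakers"),
    ("u1", "https://mall-a.example.com/item/1"),
    ("u1", "https://search.example-portal.com/?query=weather"),
    ("u1", "https://news.example.com/"),
    ("u2", "https://search.example-portal.com/?query=sneakers"),
    ("u2", "https://mall-b.example.com/"),
]
shops = {"mall-a.example.com", "mall-b.example.com"}
all_counts, shop_counts = shopping_related_keywords(log, shops)
top_shop = [kw for kw, _ in shop_counts.most_common(1)]
print(precision_recall_f1(top_shop, {"sneakers", "jacket"}))
```

Ranking `shop_counts` with `most_common()` yields the shopping-related keyword ranking. The study compares this against the ranking of all search keywords, represented here by `all_counts`. In the toy log, "weather" is frequent enough to rank among all keywords but never leads to a mall, so it is excluded.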

Investigating Dynamic Mutation Process of Issues Using Unstructured Text Analysis (비정형 텍스트 분석을 활용한 이슈의 동적 변이과정 고찰)

  • Lim, Myungsu;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.1
    • /
    • pp.1-18
    • /
    • 2016
  • Owing to the extensive use of Web media and the development of the IT industry, a large amount of data has been generated, shared, and stored. Nowadays, various types of unstructured data such as images, sound, video, and text are distributed through Web media. Therefore, many attempts have been made in recent years to discover new value by analyzing these unstructured data. Among these types of unstructured data, text is recognized as the most representative means for users to express and share their opinions on the Web. In this sense, demand for obtaining new insights through text analysis is steadily increasing. Accordingly, text mining is increasingly being used for different purposes in various fields. In particular, issue tracking is being widely studied in both academia and industry. It can be used to extract various issues from text such as news articles and SNS (Social Network Services) posts and to analyze the trends of these issues. Conventionally, issue tracking is used to identify major issues sustained over a long period of time through topic modeling and to analyze the detailed distribution of documents involved in each issue. However, conventional issue tracking assumes that the content composing each issue does not change throughout the entire tracking period. It therefore cannot represent the dynamic mutation process of detailed issues, which can be created, merged, divided, and deleted between periods. Moreover, only keywords that appear consistently throughout the entire period can be derived as issue keywords. As a result, concrete issue keywords such as "nuclear test" and "separated families" may be concealed by more general issue keywords such as "North Korea" in an analysis over a long period of time. This implies that many meaningful but short-lived issues cannot be discovered by conventional issue tracking.
Note that detailed keywords are preferable to general keywords because the former can provide clues for actionable strategies. To overcome these limitations, we analyzed the documents of each detailed period independently. We generated an issue flow diagram based on the similarity of each issue between two consecutive periods. The issue transition pattern among categories was analyzed using the category information of each document. We then applied the proposed methodology to a real case of 53,739 news articles and derived an issue flow diagram from them. Based on the experiment, we proposed the following useful application scenarios for the issue flow diagram. First, we can identify an issue that appears actively during a certain period and promptly disappears in the next period. Second, the preceding and following issues of a particular issue can easily be discovered from the issue flow diagram. This implies that our methodology can be used to discover associations between issues in different periods. Finally, an interesting pattern of one-way and two-way transitions was discovered by analyzing the transition patterns of issues through category analysis. Specifically, we found that a pair of mutually similar categories induces two-way transitions. In contrast, a one-way transition can be recognized as an indicator that issues in a certain category tend to be influenced by issues in another category. For practical application of the proposed methodology, high-quality word and stop-word dictionaries need to be constructed. In addition, the analysis should cover not only the number of documents but also meta-information such as read counts, posting times, and comments. A rigorous performance evaluation or validation of the proposed methodology should be performed in future work.
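The Python sketch below illustrates the issue flow idea. Issues are derived independently for each period, and issues in consecutive periods are linked when they are similar enough. An issue with no successor is one that "promptly disappears". Links between different categories are then labeled one-way or two-way.

The abstract does not specify the similarity measure or cutoff. The following choices are assumptions:

- cosine similarity over `{keyword: weight}` vectors;
- the threshold `0.3`;
- tagging each issue with a single category (the study uses per-document category information);
- all function names, which are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two sparse {keyword: weight} vectors."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def issue_flow_edges(periods, threshold=0.3):
    """Link each issue in period t to similar issues in period t + 1.

    `periods` is a list of periods; each period is a list of issues, and each
    issue is a (category, {keyword: weight}) pair from a per-period topic model.
    Returns ((t, i), (t + 1, j), similarity) edges at or above `threshold`.
    """
    edges = []
    for t in range(len(periods) - 1):
        for i, (_, kw_i) in enumerate(periods[t]):
            for j, (_, kw_j) in enumerate(periods[t + 1]):
                sim = cosine(kw_i, kw_j)
                if sim >= threshold:
                    edges.append(((t, i), (t + 1, j), sim))
    return edges


def vanished_issues(periods, edges):
    """Issues before the last period that have no successor in the next period."""
    has_successor = {src for src, _, _ in edges}
    return [(t, i) for t in range(len(periods) - 1)
            for i in range(len(periods[t])) if (t, i) not in has_successor]


def transition_types(periods, edges):
    """Label each cross-category pair as a one-way or two-way transition."""
    directed = Counter()
    for (t, i), (t2, j), _ in edges:
        src, dst = periods[t][i][0], periods[t2][j][0]
        if src != dst:
            directed[(src, dst)] += 1
    return {tuple(sorted(pair)): "two-way" if pair[::-1] in directed else "one-way"
            for pair in directed}


# Toy issues for three periods (hypothetical, not the study's 53,739 articles).
periods = [
    [("politics", {"north korea": 0.5, "nuclear test": 0.5}),
     ("economy", {"exports": 0.7, "interest rates": 0.3})],
    [("politics", {"north korea": 0.6, "separated families": 0.4}),
     ("society", {"exports": 0.5, "factory": 0.5})],
    [("economy", {"factory": 0.6, "jobs": 0.4})],
]
edges = issue_flow_edges(periods)
print(vanished_issues(periods, edges))
print(transition_types(periods, edges))
```

In this toy input, the "separated families" issue of the second period has no successor and so counts as a short-lived issue. The economy and society categories feed into each other in both directions, which is the two-way pattern the study associates with mutually similar categories.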