• Title/Summary/Keyword: Web-System

Prediction of Key Variables Affecting NBA Playoffs Advancement: Focusing on 3 Points and Turnover Features (미국 프로농구(NBA)의 플레이오프 진출에 영향을 미치는 주요 변수 예측: 3점과 턴오버 속성을 중심으로)

  • An, Sehwan;Kim, Youngmin
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.263-286 / 2022
  • This study collects NBA statistics for the 32 seasons from 1990 to 2022 by web crawling, examines the variables of interest through exploratory data analysis, and generates related derived variables. Unused variables were removed during data cleaning, and correlation analysis, t-tests, and ANOVA were performed on the remaining variables. For the variables of interest, the difference in means between teams that advanced to the playoffs and teams that did not was tested; to complement this, the difference in means among three groups (upper/middle/lower) based on ranking was then checked. Only the current season's data was used as the test set, and 5-fold cross-validation was performed by splitting the remaining data into training and validation sets for model training. Overfitting was ruled out by comparing the cross-validation results with the final results on the test set and confirming that the performance metrics did not differ. Because the raw data is of high quality and the statistical assumptions are satisfied, most of the models performed well despite the small data set. This study not only predicts NBA game results or classifies playoff advancement using machine learning, but also examines, through the importance of the input attributes, whether the variables of interest are among the most important variables. Visualizing SHAP values made it possible to overcome the limits of interpreting feature importance alone and to compensate for the inconsistency of importance calculations as variables are added or removed. A number of variables related to three-pointers and turnovers, the subjects of interest in this study, were found to be among the major variables affecting playoff advancement in the NBA. 
Although this study resembles existing sports data analysis research in covering topics such as match results, playoffs, and championship prediction and in comparing several machine learning models, it differs in that the features of interest are set in advance and statistically verified, and then compared with the machine learning results. It is also differentiated from existing studies by presenting explanatory visualizations using SHAP, an XAI method.
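
The evaluation protocol described in this abstract (latest season held out as the test set, 5-fold cross-validation on the rest) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the `season` field, the function names, and the shuffled-fold scheme are all assumptions.

```python
import random

def holdout_latest_season(rows, season_key="season"):
    """Split rows into (train_pool, test), using the most recent season as the test set.

    `season_key` is a hypothetical field name; the abstract does not describe the schema.
    """
    latest = max(r[season_key] for r in rows)
    train_pool = [r for r in rows if r[season_key] != latest]
    test = [r for r in rows if r[season_key] == latest]
    return train_pool, test

def k_fold_indices(n, k=5, seed=0):
    """Yield (train_idx, valid_idx) pairs; every index appears in exactly one validation fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        valid = folds[i]
        train = [j for f_i, f in enumerate(folds) if f_i != i for j in f]
        yield train, valid
```

A model would be fitted on each `train` split and scored on the matching `valid` split; the averaged validation score is then compared with the score on the held-out season to look for signs of overfitting.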

End-use Analysis of Household Water by Metering (가정용수의 용도별 사용 원단위 분석)

  • Kim, Hwa Soo;Lee, Doo Jin;Kim, Ju Whan;Jung, Kwan Soo
    • KSCE Journal of Civil and Environmental Engineering Research / v.28 no.5B / pp.595-601 / 2008
  • The purpose of this study is to investigate the trends and patterns of household water use in Korea by metering. Water use is classified into toilet, washbowl, bathing, laundry, kitchen, and miscellaneous components. Flow meters were installed in 140 households sampled from across Korea. The data were gathered by a web-based data collection system from 2002 to 2006, together with pre-surveyed information such as occupation, income, family members, housing type, age, floor area, water-saving devices, and education. Reliable data were selected by the upper fence method for each observed water use component, and statistical characteristics were estimated for each residential type to determine liters per capita per day (lpcd). Estimated indoor domestic water use ranges from 150 lpcd to 169 lpcd by housing type, in the order of high-rise apartment, multi-house, and single house. Among the components, toilet use is the largest (38.5 lpcd), followed by laundry (30.8 lpcd), kitchen (28.4 lpcd), bathtub (24.7 lpcd), and washbowl (15.4 lpcd). The results are compared with water use in the U.K. and the U.S. As lifestyles have become more Western, the pattern of water use in Korea tends to resemble that of the U.S. Compared with the survey results of Bradley in 1985, total use increased by thirty liters with economic development, and a slight change in the water use pattern can be found. In the 1985 survey, toilet water accounted for almost half of total water use and laundry water was the lowest at 11%. This study, however, shows that toilet water has decreased to 39 liters, 28% of the total, owing to the spread of water-saving devices and campaigns. It is presumed that the spread of large-capacity washing machines has reduced hand laundry and increased laundry water use. 
The unit water use of each household end use can be applied as a design factor for water and wastewater facilities, and it serves as information for water demand forecasting and conservation policy.
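
The "upper fence" screening mentioned in this abstract can be illustrated with a small sketch. It assumes the conventional Tukey definition, Q3 + 1.5 × IQR; the abstract does not state the multiplier or the quartile method, so both are assumptions, and the readings are invented for illustration.

```python
import statistics

def upper_fence(values, k=1.5):
    """Return Q3 + k * IQR. k = 1.5 is the conventional Tukey choice (an assumption here)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 + k * (q3 - q1)

def drop_above_fence(values, k=1.5):
    """Keep only observations at or below the upper fence."""
    fence = upper_fence(values, k)
    return [v for v in values if v <= fence]

# Hypothetical daily toilet-use readings in lpcd, with one implausible spike.
readings = [30, 32, 35, 36, 38, 40, 41, 39, 37, 400]
cleaned = drop_above_fence(readings)
```

With these made-up readings the spike of 400 lies above the fence and is dropped, while the other nine values are kept.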

Analysis of Tourism Popularity Using T-map Search and Some Trend Data: Focusing on Chuncheon-city, Gangwon-province (T맵 검색지와 썸트랜드 데이터를 이용한 관광인기도분석: 강원도 춘천을 중심으로)

  • TaeWoo Kim;JaeHee Cho
    • Journal of Service Research and Studies / v.12 no.1 / pp.25-35 / 2022
  • Covid-19, whose first patient in Korea was confirmed in January 2020, has affected various fields, and the tourism sector might have been hit the hardest. The damage is especially great in Gangwon-province, where a tourism-based industrial structure forms the basis of the regional economy and the tourism industry is the main source of income for small businesses. To check the situation and extent of this damage, this study conducted an empirical data analysis targeting Chuncheon, which has the most convenient public access among the Gangwon regions, can be visited on a one-day tour by public transportation from Seoul and the metropolitan area, and is generally perceived as a low-cost destination. The general status of the region was first checked using the Chuncheon visitor data provided by the tourist information system. To check the levels of interest in 2019, before Covid-19, and in 2020, after Covid-19, the study then analyzed the general regional image of Chuncheon-city by comparing keywords collected from Sometrend, a web service of Vibe Company Inc., a company specializing in keyword collection, with T-map search site data from SK Telecom, which provides both in-vehicle navigation and communication services. In addition, by developing a tourism popularity index from the keywords and the T-map search site data and comparing the two years, the study examined how much the Covid-19 situation affected the extent to which visitors' interest in the Chuncheon area led to actual visits. According to the results of big data analysis applying the tourism popularity index after designing the data mart, the effect of the Covid-19 situation on tourism popularity in Chuncheon-city, Gangwon-province was not significant, and the image of tourist destinations based on regional characteristics was confirmed. 
It is hoped that the results of this research can serve as useful reference data for tourism economic policy making.

Semi-Quantitative Scoring of Late Gadolinium Enhancement of the Left Ventricle in Patients with Ischemic Cardiomyopathy: Improving Interobserver Reliability and Agreement Using Consensus Guidance from the Asian Society of Cardiovascular Imaging-Practical Tutorial (ASCI-PT) 2020

  • Cherry Kim;Chul Hwan Park;Do Yeon Kim;Jaehyung Cha;Bae Young Lee;Chan Ho Park;Eun-Ju Kang;Hyun Jung Koo;Kakuya Kitagawa;Min Jae Cha;Rungroj Krittayaphong;Sang Il Choi;Sanjaya Viswamitra;Sung Min Ko;Sung Mok Kim;Sung Ho Hwang;Nguyen Ngoc Trang;Whal Lee;Young Jin Kim;Jongmin Lee;Dong Hyun Yang
    • Korean Journal of Radiology / v.23 no.3 / pp.298-307 / 2022
  • Objective: This study aimed to evaluate the effect of implementing the consensus statement from the Asian Society of Cardiovascular Imaging-Practical Tutorial 2020 (ASCI-PT 2020) on the interobserver reliability of cardiac MR with late gadolinium enhancement (CMR-LGE) myocardial viability scoring in the context of ischemic cardiomyopathy. Materials and Methods: A total of 17 cardiovascular imaging experts from five different countries evaluated CMR obtained in 26 patients (male:female, 23:3; median age [interquartile range], 55.5 years [50-61.8]) with ischemic cardiomyopathy. For LGE scoring, based on the 17 segments, the extent of LGE in each segment was graded using a five-point scoring system ranging from 0 to 4, before and after exposure to the consensus statement. All scoring was performed via web-based review. Scores for slices, vascular territories, and total scores were obtained as the sum of the relevant segmental scores. Interobserver reliability for segment scores was assessed using Fleiss' kappa, while the intraclass correlation coefficient (ICC) was used for slice score, vascular territory score, and total score. Interobserver agreement was assessed using the limits of agreement from the mean (LoA). Results: Interobserver reliability (Fleiss' kappa) in each segment ranged from 0.242 to 0.662 before the consensus and increased to 0.301-0.774 after the consensus. The interobserver reliability (ICC) for each slice, each vascular territory, and total score increased after the consensus (slice, 0.728-0.805 and 0.849-0.884; vascular territory, 0.756-0.902 and 0.852-0.941; total score, 0.847 and 0.913, before and after implementing the consensus statement, respectively). Interobserver agreement in scoring also improved with the implementation of the consensus for all slices, vascular territories, and total score. The LoA for the total score narrowed from ± 10.36 points to ± 7.12 points. 
Conclusion: The interobserver reliability and agreement for CMR-LGE scoring for ischemic cardiomyopathy improved when following guidance from the ASCI-PT 2020 consensus statement.
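
For readers unfamiliar with the statistic used for the segment scores, the following is a minimal sketch of the standard Fleiss' kappa computation. The input layout (one row per rated segment, one column per score category, each cell the number of raters choosing that score) is an assumption for illustration; the study's own data and software are not described in the abstract.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned subject i to category j. Every row must sum to the same
    number of raters n (n >= 2)."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject must be rated by the same number of raters")
    n_categories = len(counts[0])

    # Proportion of all assignments that fell into each category.
    p_j = [sum(row[j] for row in counts) / (n_subjects * n_raters)
           for j in range(n_categories)]
    # Per-subject agreement: fraction of rater pairs that agree.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]

    p_bar = sum(p_i) / n_subjects      # mean observed agreement
    p_e = sum(p * p for p in p_j)      # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)
```

In the setting of this study, each row would correspond to one myocardial segment of one patient and the five columns to LGE scores 0 to 4, with 17 raters per row; that mapping is an interpretation of the abstract, not a documented detail.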

The 1998, 1999 Patterns of Care Study for Breast Irradiation After Breast-Conserving Surgery in Korea (1998, 1999년도 우리나라에서 시행된 유방보존수술 후 방사선치료 현황 조사)

  • Suh Chang-Ok;Shin Hyun Soo;Cho Jae Ho;Park Won;Ahn Seung Do;Shin Kyung Hwan;Chung Eun Ji;Keum Ki Chang;Ha Sung Whan;Ahn Sung Ja;Kim Woo Cheol;Lee Myung Za;Ahn Ki Jung
    • Radiation Oncology Journal / v.22 no.3 / pp.192-199 / 2004
  • Purpose: A nationwide survey was performed to determine the patterns of evaluation and treatment in patients with early breast cancer treated with conservative surgery and radiotherapy, and to improve radiotherapy techniques. Materials and Methods: A web-based database system for the Korean Patterns of Care Study (PCS) for 6 common cancers was developed. Two hundred sixty-one randomly selected records of eligible patients treated between 1998 and 1999 at 15 hospitals were reviewed. Results: The patients' ages ranged from 24 to 85 years (median 45 years). Infiltrating ductal carcinoma was the most common histologic type (88.9%), followed by medullary carcinoma (4.2%) and infiltrating lobular carcinoma (1.5%). Pathologic T stage by AJCC was T1 in 59.7% of the cases, T2 in 29.5%, and Tis in 8.8%. Axillary lymph node dissection was performed in 91.2% of the cases, and 69.7% were node negative. AJCC stage was 0 in 8.8% of the cases, I in 44.9%, IIa in 33.3%, and IIb in 8.4%. Estrogen and progesterone receptors were evaluated in 71.6% and 70.9% of the patients, respectively. The method of breast-conserving surgery was excision/lumpectomy in 37.2%, wide excision in 11.5%, quadrantectomy in 23%, and partial mastectomy in 27.5% of the cases. A pathologically confirmed negative margin was obtained in 90.8% of the cases. The pathological margin was involved with tumor in 10 patients and was close (less than 2 mm) in 10 patients. All patients except one received more than 90% of the planned radiotherapy dose. The radiotherapy volume was breast only in 88% of the cases, breast + supraclavicular fossa (SCL) in 5%, and breast + SCL + posterior axillary boost in 4.2%. Only one patient received isolated internal mammary lymph node irradiation. 
The radiation beam used was Co-60 in 8 cases, 4 MV X-ray in 115 cases, 6 MV X-ray in 125 cases, and 10 MV X-ray in 11 cases. The radiation dose to the whole breast was 45-59.4 Gy (median 50.4 Gy) and the boost dose was 8-20 Gy (median 10 Gy). The total radiation dose delivered was 50.4-70.4 Gy (median 60.4 Gy). Conclusion: There was no major deviation from the current standard in the patterns of evaluation and treatment for patients with early breast cancer treated with breast conservation. Some variation was identified in the boost irradiation dose. A separate analysis of the details of radiotherapy planning will follow, and the outcome of treatment is needed to evaluate the process.

Smart Store in Smart City: The Development of Smart Trade Area Analysis System Based on Consumer Sentiments (Smart Store in Smart City: 소비자 감성기반 상권분석 시스템 개발)

  • Yoo, In-Jin;Seo, Bong-Goon;Park, Do-Hyung
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.25-52 / 2018
  • This study performs social network analysis based on consumer sentiment related to locations in Seoul, using data reflecting consumers' web search activities and emotional evaluations associated with commerce. The study focuses on large commercial districts in Seoul. In addition, to consider their various aspects, social network indexes were combined with the trading areas' public data to verify factors affecting each area's sales. Judging by the change in R-squared, the model has a fairly high R-squared value even when it includes only the districts' public data, which is static. However, the R-squared of the model improved considerably when the network indexes derived from the social network analysis were added. A regression analysis of the trading areas' public data showed that five of the twenty-two variables have a significant influence on the sales of a trading area: 'number of market district,' 'residential area per person,' 'satisfaction of residential environment,' 'rate of change of trade,' and 'survival rate over 3 years.' According to the results, 'residential area per person' has the highest standardized beta value and therefore the strongest influence on commercial sales. In addition, 'residential area per person,' 'number of market district,' and 'survival rate over 3 years' were found to have positive effects on the sales of all trading areas. Thus, sales increase as the number of market districts in the trading area, the residential area per person, and the three-year survival rate of stores in the trading area increase. On the other hand, 'satisfaction of residential environment' and 'rate of change of trade' were found to have a negative effect on sales. In the case of 'satisfaction of residential environment,' sales increase when the satisfaction level is low. 
Therefore, as consumer dissatisfaction with the residential environment increases, sales increase. The 'rate of change of trade' shows that sales increase with the decreasing acceleration of transaction frequency. According to the social network analysis, of the 25 regional trading areas in Seoul, Yangcheon-gu has the highest degree of connection. In other words, it has common sentiments with many other trading areas. On the other hand, Nowon-gu and Jungrang-gu have the lowest degree of connection. In other words, they have relatively distinct sentiments from other trading areas. The social network indexes used in the combination model are 'density of ego network,' 'degree centrality,' 'closeness centrality,' 'betweenness centrality,' and 'eigenvector centrality.' The combined model analysis confirmed that the degree centrality and eigenvector centrality of the social network index have a significant influence on sales and the highest influence in the model. 'Degree centrality' has a negative effect on the sales of the districts. This implies that sales decrease when holding various sentiments of other trading area, which conflicts with general social myths. However, this result can be interpreted to mean that if a trading area has low 'degree centrality,' it delivers unique and special sentiments to consumers. The findings of this study can also be interpreted to mean that sales can be increased if the trading area increases consumer recognition by forming a unique sentiment and city atmosphere that distinguish it from other trading areas. On the other hand, 'eigenvector centrality' has the greatest effect on sales in the combined model. In addition, the results confirmed a positive effect on sales. This finding shows that sales increase when a trading area is connected to others with stronger centrality than when it has common sentiments with others. 
This study can be used as an empirical basis for establishing and implementing a city and trading area strategy plan considering consumers' desired sentiments. In addition, we expect to provide entrepreneurs and potential entrepreneurs entering the trading area with sentiments possessed by those in the trading area and directions into the trading area considering the district-sentiment structure.
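
As a rough illustration of one of the network indexes named in this abstract, the sketch below computes normalized degree centrality (number of neighbors divided by n - 1) for an undirected network. How two trading areas are linked by shared sentiment is not specified in the abstract, so the edge list and node names here are purely hypothetical.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Normalized degree centrality for an undirected graph given as (a, b) pairs.

    Only nodes that appear in at least one edge are included; self-loops are ignored.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}

# Hypothetical links between four districts that share a dominant sentiment.
links = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
centrality = degree_centrality(links)
```

In this toy network, district "A" is linked to every other district and gets the maximum value, while "D" has a single link and the lowest value.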

Empirical Analysis on Bitcoin Price Change by Consumer, Industry and Macro-Economy Variables (비트코인 가격 변화에 관한 실증분석: 소비자, 산업, 그리고 거시변수를 중심으로)

  • Lee, Junsik;Kim, Keon-Woo;Park, Do-Hyung
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.195-220 / 2018
  • In this study, we conducted an empirical analysis of the factors that affect changes in the Bitcoin closing price. Previous studies have focused on the security of the blockchain system, the economic ripple effects of cryptocurrency, its legal implications, and consumer acceptance of cryptocurrency. Cryptocurrency has been studied in various areas, and many researchers and others, including governments in many countries, are trying to utilize cryptocurrency and apply its technology. Despite the rapid and dramatic changes in cryptocurrency prices and their growing effects, empirical studies of the factors affecting cryptocurrency price changes have been lacking; there were only a few limited studies, business reports, and short working papers. It is therefore necessary to determine which factors affect changes in the Bitcoin closing price. Hypotheses were constructed along three dimensions (consumer, industry, and macroeconomy), and time series data were collected for the variables of each dimension. The consumer variables are the search traffic for 'Bitcoin,' 'Bitcoin ban,' 'ransomware,' and 'war.' The industry variables are GPU vendors' stock prices and memory vendors' stock prices. The macroeconomic variables are U.S. dollar index futures, FOMC policy interest rates, and the WTI crude oil price. Using these variables, we performed a time series regression analysis to find the relationship between them and changes in the Bitcoin closing price. Before the regression analysis, we performed a unit root test to verify the stationarity of the time series data and avoid spurious regression. Then, using stationary data, we performed the regression analysis. 
As a result of the analysis, we found that changes in the Bitcoin closing price are negatively related to search traffic for 'Bitcoin ban' and to U.S. dollar index futures, while changes in GPU vendors' stock prices and in the WTI crude oil price showed positive effects. A 'Bitcoin ban' directly determines whether Bitcoin trading is maintained or abolished, which is why consumers reacted sensitively and affected the Bitcoin closing price. GPUs are a raw material of Bitcoin mining. Generally, an increase in a company's stock price reflects growth in sales of its products and services, so increases in GPU demand are indirectly reflected in GPU vendors' stock prices. By this interpretation, a rise in GPU prices has put a crimp on Bitcoin mining. Consequently, GPU vendors' stock prices affect changes in the Bitcoin closing price. We also confirmed that U.S. dollar index futures moved in the opposite direction to the Bitcoin closing price. In this respect Bitcoin moved like gold, which consumers consider a safe asset; this suggests that consumers regard Bitcoin as a safe asset. On the other hand, the WTI oil price moved in the same direction as the Bitcoin closing price, which implies that Bitcoin is regarded as an investment asset like products in the raw materials market. The variables that were not significant were search traffic for 'Bitcoin,' 'ransomware,' and 'war,' memory vendors' stock prices, and FOMC policy interest rates. Regarding search traffic for 'Bitcoin,' we judged that interest in Bitcoin did not lead to purchases of Bitcoin; search traffic did not reflect all of the demand for Bitcoin, which implies that some factors regulate and mediate Bitcoin purchases. Regarding search traffic for 'ransomware,' it is hard to say that concern about ransomware determined overall Bitcoin demand, because only a few people were damaged by ransomware and the percentage of hackers demanding Bitcoin was low. 
Also, such information security problems are one-off events, not continuing issues. Search traffic for 'war' was not significant. Like the stock market, Bitcoin is generally negatively related to war, but in exceptional cases such as the Gulf War, war shifts stakeholders' profits and environment. We think that this is the same case. Memory vendors' stock prices were not significant because memory vendors' flagship products were not VRAM, which is essential for Bitcoin supply. Regarding FOMC policy interest rates, when the interest rate is low, surplus capital is invested in securities such as stocks. However, Bitcoin's price fluctuations were large, so consumers did not regard it as an attractive commodity. In addition, unlike the stock market, Bitcoin has no safety mechanisms such as circuit breakers and sidecars. Through this study, we verified which factors affect changes in the Bitcoin closing price and interpreted why such changes happened. In addition, by establishing the characteristics of Bitcoin as both a safe asset and an investment asset, we provide a guide to how consumers, financial institutions, and government organizations can approach cryptocurrency. Moreover, by corroborating the factors affecting changes in the Bitcoin closing price, the study gives researchers clues about which factors should be considered in future cryptocurrency research.
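
The workflow in this abstract (make the series stationary, then regress) can be sketched in a simplified form. The abstract does not say which unit root test or transformation was used; the sketch below assumes first differencing, a common way to obtain a stationary series, and fits a one-variable least-squares line. It does not implement a unit root test, and all numbers are made up.

```python
def first_difference(series):
    """Return the period-to-period changes of a series (length shrinks by one)."""
    return [b - a for a, b in zip(series, series[1:])]

def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = slope * x + intercept for one regressor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical level series: an explanatory index and a price that tracks it.
index_level = [1, 2, 4, 7, 11]
price_level = [3, 5, 9, 15, 23]

# Regress changes on changes rather than levels on levels.
slope, intercept = ols_slope_intercept(
    first_difference(index_level), first_difference(price_level)
)
```

Regressing differenced series instead of trending levels is what guards against the spurious regression the abstract mentions; a real analysis would first confirm stationarity with a formal test and use several regressors.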

Efficient Topic Modeling by Mapping Global and Local Topics (전역 토픽의 지역 매핑을 통한 효율적 토픽 모델링 방안)

  • Choi, Hochang;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.69-94 / 2017
  • Recently, increasing demand for big data analysis has been driving the vigorous development of related technologies and tools. In addition, the development of IT and the increased penetration of smart devices are producing large amounts of data. As a result, data analysis technology is rapidly becoming popular, and attempts to acquire insights through data analysis have been continuously increasing. This means that big data analysis will become more important in various industries for the foreseeable future. Big data analysis has generally been performed by a small number of experts and delivered to those who request the analysis. However, growing interest in big data analysis has stimulated computer programming education and the development of many data analysis programs. Accordingly, the entry barriers to big data analysis are gradually lowering, data analysis technology is spreading, and big data analysis is expected to be performed by the requesters themselves. Along with this, interest in various kinds of unstructured data is continually increasing, with particular attention on text data. The emergence of new web-based platforms and techniques has brought about the mass production of text data and active attempts to analyze it, and the results of text analysis have been utilized in various fields. Text mining is a concept that embraces various theories and techniques for text analysis. Of the many text mining techniques used for various research purposes, topic modeling is one of the most widely used and studied. Topic modeling is a technique that extracts the major issues from a large number of documents, identifies the documents that correspond to each issue, and provides the identified documents as a cluster. It is regarded as a very useful technique in that it reflects the semantic elements of the documents. 
Traditional topic modeling is based on the distribution of key terms across the entire document set. Thus, the entire document set must be analyzed at once to identify the topic of each document. This makes the analysis slow when topic modeling is applied to a large number of documents. It also causes a scalability problem: processing time increases exponentially with the number of documents analyzed. This problem is particularly noticeable when the documents are distributed across multiple systems or regions. To overcome these problems, a divide-and-conquer approach can be applied to topic modeling: a large number of documents is divided into sub-units, and topics are derived by repeating topic modeling on each unit. This method enables topic modeling on a large number of documents with limited system resources and can improve the processing speed of topic modeling. It can also significantly reduce analysis time and cost, because documents can be analyzed in each location without first being combined. However, despite these advantages, the method has two major problems. First, the relationship between the local topics derived from each unit and the global topics derived from the entire document set is unclear: local topics can be identified for each document, but global topics cannot. Second, a method for measuring the accuracy of such a methodology needs to be established; that is, taking the global topics as the ideal answer, the deviation of a local topic from a global topic needs to be measured. Because of these difficulties, this approach has not been studied as much as other topic modeling approaches. In this paper, we propose a topic modeling approach that solves these two problems. 
First, we divide the entire document cluster (global set) into sub-clusters (local sets) and generate a reduced global set (RGS) consisting of representative documents extracted from each local set. We address the first problem by mapping RGS topics to local topics. Along with this, we verify the accuracy of the proposed methodology by checking whether each document is assigned the same topic in the global and local results. Using 24,000 news articles, we conducted experiments to evaluate the practical applicability of the proposed methodology. Through an additional experiment, we confirmed that the proposed methodology can provide results similar to topic modeling on the entire set. We also proposed a reasonable method for comparing the results of the two methods.
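
The idea of mapping local topics to reduced-global-set (RGS) topics can be illustrated with a small sketch. The abstract does not specify the mapping criterion, so this example assumes one plausible choice: each topic is a term-weight dictionary, and each local topic is mapped to the RGS topic with the highest cosine similarity. The topic names and weights are invented.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def map_local_to_global(local_topics, global_topics):
    """Map each local topic name to the most similar global (RGS) topic name."""
    return {
        local_name: max(global_topics,
                        key=lambda g: cosine(local_vec, global_topics[g]))
        for local_name, local_vec in local_topics.items()
    }

# Hypothetical topic-term weights.
rgs_topics = {
    "G_economy": {"market": 0.6, "price": 0.4},
    "G_sports": {"game": 0.7, "team": 0.3},
}
local_topics = {
    "L1": {"price": 0.5, "market": 0.3, "bank": 0.2},
    "L2": {"team": 0.6, "game": 0.4},
}
mapping = map_local_to_global(local_topics, rgs_topics)
```

Accuracy could then be estimated in the spirit of the abstract by counting how many documents receive the same mapped topic under the local run as under the global run.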

Analysis of Research Trends in Journal of Distribution Science (유통과학연구의 연구 동향 분석 : 창간호부터 제8권 제3호까지를 중심으로)

  • Kim, Young-Min;Kim, Young-Ei;Youn, Myoung-Kil
    • Journal of Distribution Science / v.8 no.4 / pp.5-15 / 2010
  • This study investigated research trends of JDS that KODISA published and gave implications to elevate quality of scholarly journals. In other words, the study classified scientific system of distribution area to investigate research trends and to compare it with other scholarly journals of distribution and to give implications for higher level of JDS. KODISA published JDS Vol.1 No.1 for the first time in 1999 followed by Vol.8 No.3 in September 2010 to show 109 theses in total. KODISA investigated subjects, research institutions, number of participants, methodology, frequency of theses in both the Korean language and English, frequency of participation of not only the Koreans but also foreigners and use of references, etc. And, the study investigated JDR of KODIA, JKDM(The Journal of Korean Distribution & Management) and JDA that researched distribution, so that it found out development ways. To investigate research trends of JDS that KODISA publishes, main category was made based on the national science and technology standard classification system of MEST (Ministry Of Education, Science And Technology), table of classification of research areas of NRF(National Research Foundation of Korea), research classification system of both KOREADIMA and KLRA(Korea Logistics Research Association) and distribution science and others that KODISA is looking for, and distribution economy area was divided into general distribution, distribution economy, distribution, distribution information and others, and distribution management was divided into distribution management, marketing, MD and purchasing, consumer behavior and others. The findings were as follow: Firstly, main category occupied 47 theses (43.1%) of distribution economy and 62 theses (56.9%) of distribution management among 109 theses in total. 
Active research area of distribution economy consisted of 14 theses (12.8%) of distribution information and 9 theses (8.3%) of distribution economy to research distribution as well as distribution information positively every year. The distribution management consisted of 25 theses (22.9%) of distribution management and 20 theses (18.3%) of marketing, These days, research on distribution management, marketing, distribution, distribution information and others is increasing. Secondly, researchers published theses as follow: 55 theses (50.5%) by professor by himself or herself, 12 theses (11.0%) of joint research by professors and businesses, Professors/students published 9 theses (8.3%) followed by 5 theses (4.6%) of researchers, 5 theses (4.6%) of businesses, 4 theses (3.7%) of professors, researchers and businesses and 2 theses (1.8%) of students. Professors published theses less, while businesses, research institutions and graduate school students did more continuously. The number of researchers occupied single researcher (43 theses, 39.5%), two researchers (42 theses, 38.5%) and three researchers or more (24 theses, 22.0%). Thirdly, professors published theses the most at most of areas. Researchers of main category of distribution economy consisted of professors (25 theses, 53.2%), professors and businesses (7 theses, 14.9%), professors and businesses (7 theses, 14.9%), professors and researchers (6 theses, 12.8%) and professors and students (3 theses, 6.3%). And, researchers of main category of distribution management consisted of professors (30 theses, 48.4%), professors and businesses (10 theses, 16.1%), and professors and researchers as well as professors and students (6 theses, 9.7%). Researchers of distribution management consisted of professors, professors and businesses, professors and researchers, researchers and businesses, etc to have various types. 
Marketing, MD and purchasing, and consumer behavior were researched mainly by professors, which calls for more active participation by businesses and researchers. Fourthly, regarding research methodology, literature research was the most common (45 theses, 41.3%), followed by empirical research based on questionnaire surveys (44 theses, 40.4%). General distribution, distribution economy, distribution, and distribution management mostly adopted literature research, while marketing relied most on empirical research based on questionnaire surveys. Fifthly, theses in Korean accounted for 92.7% (101 theses) and those in English for 7.3% (8 theses). No more than one thesis in English was published until 2006, and 7 theses (11.9%) were published from 2007 onward, which is a positive increase. Foreign researchers alone published one thesis (0.9%), and Korean and foreign researchers jointly published two theses (1.8%), so the participation of foreign researchers was very low. Sixthly, a JDS thesis had 27.5 references on average, consisting of 11.1 local references and 16.4 foreign references, and the average number of citations to JDS itself was low at 0.4 per thesis. Distribution economy theses cited 24.2 references on average (9.4 local references and 14.8 foreign references), of which 0.6 were to JDS. Distribution management theses cited 30.0 references on average (12.1 local references and 17.9 foreign references), of which 0.3 were to JDS itself. Seventhly, regarding the language of theses in similar scholarly journals: JDR (Journal of Distribution Research) of KODIA (Korea Distribution Association) published 92 theses in Korean (96.8%) and 3 theses in English (3.2%), 95 theses in total. JKDM of KOREADIMA published 132 theses in total, consisting of 93 theses in Korean (70.5%) and 39 theses in English (29.5%). 
Since 2008, JKDM has published one English-language issue every year. JDS published 52 theses in Korean (88.1%) and 7 theses in English (11.9%), 59 theses in total. Eighthly, regarding research methodology in similar scholarly journals: JDR had 65 empirical studies based on questionnaire surveys (68.4%), followed by 17 literature studies (17.9%) and 11 quantitative analyses (11.6%). JKDM used a variety of methodologies, with 60 questionnaire surveys (45.5%), followed by 40 literature studies (30.3%), 21 quantitative analyses (15.9%), 6 system analyses (4.5%), and 5 case studies (3.8%). JDS had 30 questionnaire surveys (50.8%), followed by 15 literature studies (25.4%), 7 case studies (11.9%), and 6 quantitative analyses (10.2%). Ninthly, regarding Korean and foreign researchers in similar scholarly journals: JDR published 93 theses (97.8%) by Korean researchers, 1 thesis by a foreign researcher, and 1 thesis by joint research between Korean and foreign researchers. JKDM had no theses by foreign researchers alone but 13 theses (9.8%) by joint research between Korean and foreign researchers, giving it more participation by foreign researchers and researchers based abroad than the other similar journals. JDS published 56 theses (94.9%) by Korean researchers, one thesis (1.7%) by a foreign researcher alone, and 2 theses (3.4%) by joint research between Korean and foreign researchers. Tenthly, regarding references and citations in similar scholarly journals: a JDR thesis had 42.5 references on average, consisting of 10.9 local references (25.7%) and 31.6 foreign references (74.3%), and its number of citations averaged 1.1 per thesis and was decreasing. JKDM cited 10.5 Korean references (36.3%) and 18.4 foreign references (63.7%) on average, and its number of self-citations was no more than 1.1. 
The number of citations was 2.9 in 2008 and has decreased continuously since then. JDS cited 26.8 references on average, consisting of 10.9 local references (40.7%) and 15.9 foreign references (59.3%); self-citations averaged 0.2 references until 2009 and increased to 2.1 references in 2010. Based on the JDS research trends and the examination of similar scholarly journals, the author offers the following implications. Firstly, JDS should actively invite foreign contributors to prepare for SSCI. Secondly, the ratio of theses in English should increase greatly. Thirdly, a variety of research methodologies should be accepted to raise the quality of the journal. Fourthly, to increase citations, exposure through Google and other web search services should be strengthened so that the journal reaches more readers abroad. By acting on these implications, local scholarly journals can become international journals that are recognized abroad.

Clickstream Big Data Mining for Demographics based Digital Marketing (인구통계특성 기반 디지털 마케팅을 위한 클릭스트림 빅데이터 마이닝)

  • Park, Jiae;Cho, Yoonho
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.3
    • /
    • pp.143-163
    • /
    • 2016
  • The demographics of Internet users are the most basic and important source of information for target marketing and personalized advertising on digital marketing channels, which include email, mobile, and social media. However, it has gradually become difficult to collect the demographics of Internet users because their activities are anonymous in many cases. Although a marketing department can obtain demographics through online or offline surveys, these approaches are expensive, slow, and likely to include false statements. Clickstream data is the record an Internet user leaves behind while visiting websites. As the user clicks anywhere on a webpage, the activity is logged in semi-structured website log files. Such data shows which pages users visited, how long they stayed, how often they visited, when they usually visited, which sites they prefer, what keywords they used to find a site, whether they made a purchase, and so forth. For this reason, some researchers have tried to infer the demographics of Internet users from their clickstream data. They derived various independent variables likely to be correlated with the demographics. These variables include search keywords, frequency and intensity by time, day, and month, the variety of websites visited, text information from the web pages visited, and so on. The demographic attributes to be predicted also vary from paper to paper and cover gender, age, job, location, income, education, marital status, and presence of children. A variety of data mining methods, such as LSA, SVM, decision tree, neural network, logistic regression, and k-nearest neighbors, were used to build prediction models. However, previous research has not yet identified which data mining method is appropriate for predicting each demographic variable. Moreover, the independent variables studied so far need to be reviewed, combined as needed, and evaluated in order to build the best prediction model. 
The objective of this study is to choose the clickstream attributes most likely to be correlated with the demographics from the results of previous research, and then to identify which data mining method is best suited to predicting each demographic attribute. Among the demographic attributes, this paper focuses on predicting gender, age, marital status, residence, and job. Based on the results of previous research, 64 clickstream attributes are used to predict the demographic attributes. The overall process of predictive model building is composed of 4 steps. In the first step, we create user profiles that include 64 clickstream attributes and 5 demographic attributes. The second step performs dimension reduction on the clickstream variables to address the curse of dimensionality and the overfitting problem. We use three approaches, based on decision tree, PCA, and cluster analysis. In the third step, we build alternative predictive models for each demographic variable. SVM, neural network, and logistic regression are used for modeling. The last step evaluates the alternative models in terms of model accuracy and selects the best model. For the experiments, we used clickstream data covering 5 demographics and 16,962,705 online activities for 5,000 Internet users. IBM SPSS Modeler 17.0 was used for our prediction process, and 5-fold cross-validation was conducted to enhance the reliability of our experiments. The experimental results show that there is a specific data mining method well suited to each demographic variable. For example, age prediction performs best when using decision tree based dimension reduction with a neural network, whereas the prediction of gender and marital status is most accurate when applying SVM without dimension reduction. 
We conclude that the online behaviors of Internet users, captured through clickstream data analysis, can be used effectively to predict their demographics and can therefore be applied to digital marketing.
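
The last step of the process described in this abstract, evaluating alternative models by accuracy under 5-fold cross-validation and selecting the best one, can be sketched in plain Python. This is not the authors' implementation: the study used IBM SPSS Modeler 17.0 with SVM, neural network, and logistic regression on real clickstream profiles. The sketch assumes synthetic two-feature data, a hand-written logistic regression, and a majority-class baseline as a stand-in for the alternative models; all function names are hypothetical.

```python
import math
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def train_logistic(X, y, epochs=200, lr=0.1):
    """Fit logistic regression by batch gradient descent; return a predictor."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return lambda x: 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0

def train_majority(X, y):
    """Baseline: always predict the most frequent training label."""
    label = 1 if sum(y) * 2 >= len(y) else 0
    return lambda x: label

def cross_val_accuracy(train_fn, X, y, k=5, seed=0):
    """Mean accuracy over k folds; each fold is held out once for testing."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = train_fn([X[j] for j in train_idx], [y[j] for j in train_idx])
        hits = sum(model(X[j]) == y[j] for j in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / k

# Synthetic stand-in for user profiles: the first feature carries the signal
# for a binary attribute, the second feature is pure noise.
rng = random.Random(42)
X, y = [], []
for _ in range(200):
    label = rng.randint(0, 1)
    center = 1.0 if label else -1.0
    X.append([rng.gauss(center, 1.0), rng.gauss(0.0, 1.0)])
    y.append(label)

# Evaluate every candidate with the same folds and keep the most accurate one.
candidates = {"logistic": train_logistic, "majority": train_majority}
results = {name: cross_val_accuracy(fn, X, y) for name, fn in candidates.items()}
best = max(results, key=results.get)
print(results, "best:", best)
```

Using the same fold assignment for every candidate keeps the comparison fair, since each model is scored on identical held-out users. In the setting the abstract describes, this comparison would be repeated once per demographic attribute and per dimension-reduction approach.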