• Title/Summary/Keyword: Sales data prediction

Prediction of Good Seller in Overseas sales of Domestic Books Using Big Data (빅데이터를 활용한 국내 도서의 해외 판매시 굿셀러 예측)

  • Kim, Nayeon;Kim, Doyoung;Kim, Miryeo;Jung, Jiyeong;Kim, Hyon Hee
    • Proceedings of the Korea Information Processing Society Conference / 2022.05a / pp.401-404 / 2022
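  • As Korean literature spreads across the world, establishing a foothold in overseas markets has become important. This study aimed to predict whether books published overseas during the five years from 2016 to 2020 reached cumulative sales of 5,000 copies or more, the threshold for being classified as a good seller. Because good sellers account for only a small share of all translated books, the data were imbalanced; this study addressed the imbalance by applying the SMOTE technique and ensemble algorithms. As a result, performance improved as the class ratio approached 1:1, and the LightGBM model achieved an AUC of 99.83%, which was verified to be the best predictive performance among the ensemble algorithms compared. In addition, the author was the most important variable in predicting whether cumulative sales reach 5,000 copies, and the country of publication, together with online factors such as the average rating and the number of raters, was also confirmed to be a meaningful variable for sales prediction.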
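The oversampling step described in the abstract can be sketched as follows. This is a minimal pure-Python illustration of the SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not the authors' implementation; the function name `smote`, the feature values, and the parameter `k` are assumptions made for the example.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between
    a randomly chosen minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base (base itself excluded),
        # ranked by squared Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to target
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, target)))
    return synthetic

# Hypothetical toy features: (average rating, log of number of raters)
majority = [(3.1, 2.0), (3.4, 2.2), (2.9, 1.8), (3.6, 2.5),
            (3.0, 2.1), (3.3, 1.9), (3.5, 2.4), (2.8, 2.3)]
minority = [(4.5, 5.0), (4.7, 5.5), (4.4, 4.8), (4.8, 5.2)]

# Oversample the minority class until the class ratio is 1:1
new_points = smote(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
```

The balanced set would then be passed to an ensemble classifier; that step is omitted here because it depends on a third-party library.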

Association Analysis of Product Sales using Sequential Layer Filtering (순차적 레이어 필터링을 이용한 상품 판매 연관도 분석)

  • Sun-Ho Bang;Kang-Hyun Lee;Ji-Young Jang;Tsatsral Telmentugs;Kwang-Sup Shin
    • The Journal of Bigdata / v.7 no.1 / pp.213-224 / 2022
  • In logistics and distribution, Market Basket Analysis (MBA) is an important means of analyzing the correlation between major sales products and of increasing internal operational efficiency. In particular, the results of market basket analysis serve as important reference data for decision-making processes such as product purchase prediction, product recommendation, and in-store product display structure. With the recent development of e-commerce, the number of items handled by a single distribution and logistics company has rapidly increased, and existing analytical methods such as Apriori and FP-Growth have slowed down because the amount of calculation increases exponentially, which limits their use in actual business for examining important association rules. To overcome this limitation, this study first applied, at the Main-Category level, which is the highest level of the product classification system, a utility itemset mining technique that takes the sales volume of products into account, in order to select groups of products that are mainly sold together. Then, at the sub-category level, the types of products sold together were identified using FP-Growth. By using this sequential layer filtering technique, it may be possible to reduce unnecessary calculations and to find practically usable rules for enhancing effectiveness and profitability.
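The two-stage filtering described in the abstract can be sketched as follows. This is an illustrative simplification, not the authors' algorithm: category pairs are scored by summed sales amount as a stand-in for utility itemset mining, and plain pair counting stands in for FP-Growth. The transactions, function names, and thresholds are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each item is (main_category, sub_category, sales_amount)
transactions = [
    [("food", "ramen", 5.0), ("kitchen", "pot", 20.0)],
    [("food", "ramen", 5.0), ("kitchen", "pot", 20.0), ("toys", "puzzle", 8.0)],
    [("food", "kimchi", 7.0), ("kitchen", "pot", 20.0)],
    [("toys", "puzzle", 8.0), ("food", "ramen", 5.0)],
]

def high_utility_category_pairs(transactions, min_utility):
    """Layer 1: keep main-category pairs whose co-purchase sales amount is large."""
    utility = Counter()
    for basket in transactions:
        per_category = Counter()
        for main, _sub, amount in basket:
            per_category[main] += amount
        for a, b in combinations(sorted(per_category), 2):
            utility[(a, b)] += per_category[a] + per_category[b]
    return {pair for pair, value in utility.items() if value >= min_utility}

def frequent_subcategory_pairs(transactions, category_pairs, min_support):
    """Layer 2: count sub-category pairs, but only inside the surviving category pairs."""
    counts = Counter()
    for basket in transactions:
        items = sorted({(main, sub) for main, sub, _amount in basket})
        for (m1, s1), (m2, s2) in combinations(items, 2):
            if (m1, m2) in category_pairs:
                counts[(s1, s2)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

selected = high_utility_category_pairs(transactions, min_utility=50.0)
rules = frequent_subcategory_pairs(transactions, selected, min_support=2)
```

Because the second pass only examines sub-category pairs inside category pairs that survived the first pass, the number of candidate pairs it has to count is reduced, which is the saving the abstract points to.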

Development of a Resort's Cross-selling Prediction Model and Its Interpretation using SHAP (리조트 교차판매 예측모형 개발 및 SHAP을 이용한 해석)

  • Boram Kang;Hyunchul Ahn
    • The Journal of Bigdata / v.7 no.2 / pp.195-204 / 2022
  • The tourism industry is facing a crisis due to the recent COVID-19 pandemic, and improving profitability is vital to overcoming it. In situations such as COVID-19, raising the unit price by selling additional products other than guest rooms to customers who have already visited would be a more efficient way to increase profits than adopting an aggressive sales strategy to increase room occupancy. Previous tourism studies have used machine learning techniques for demand forecasting, but there have been few studies on cross-selling forecasting. Also, in a broader sense, a resort belongs to the same accommodation industry as a hotel. However, no study has specialized in the resort industry, which operates on a membership system and has facilities suitable for lodging and cooking. Therefore, in this study, we propose a cross-selling prediction model that applies various machine learning techniques to an actual resort company's accommodation data. In addition, by applying an explainable artificial intelligence (XAI) technique, we intend to interpret which factors affect cross-selling and to confirm, through empirical analysis, how they affect it.
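The quantity that SHAP estimates, the Shapley value of each feature, can be illustrated with an exact computation on a toy model. This is a minimal sketch that enumerates every feature coalition, which is only feasible for a handful of features; it is not the SHAP library and not the authors' model. The scoring function, feature names, and coefficients are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values: features outside a coalition take their baseline value."""
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without = {f: (x[f] if f in subset else baseline[f]) for f in names}
                with_feature = dict(without)
                with_feature[name] = x[name]
                total += weight * (model(with_feature) - model(without))
        phi[name] = total
    return phi

# Hypothetical cross-selling score; names and coefficients are illustrative only
def score(f):
    return 0.1 + 0.3 * f["is_member"] + 0.05 * f["nights"] + 0.2 * f["is_member"] * f["has_kids"]

guest = {"is_member": 1, "nights": 2, "has_kids": 1}
baseline = {"is_member": 0, "nights": 1, "has_kids": 0}
phi = shapley_values(score, guest, baseline)
```

The attributions sum to the difference between the guest's score and the baseline score, and the interaction term is split evenly between `is_member` and `has_kids`.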

A deep learning analysis of the KOSPI's directions (딥러닝분석과 기술적 분석 지표를 이용한 한국 코스피주가지수 방향성 예측)

  • Lee, Woosik
    • Journal of the Korean Data and Information Science Society / v.28 no.2 / pp.287-295 / 2017
  • Since Google's AlphaGo defeated a world champion Go player in 2016, there has been great interest in deep learning. In the financial sector, the Robo-Advisor, which uses deep learning to build and manage portfolios of financial instruments for investors, has gained significant attention. In this paper, we propose a deep learning algorithm geared toward identifying and forecasting the direction of the KOSPI index, and we compare the accuracy of its predictions. In an application to forecasting the direction of a financial market index, we show that a Robo-Advisor using deep learning has a significant effect on the finance industry. The Robo-Advisor collects massive data such as earnings statements, news reports, and regulatory filings, analyzes them, and advises investors on how to view market trends and identify the best time to purchase financial assets. In addition, the Robo-Advisor allows businesses to learn more about their customers, develop better marketing strategies, increase sales, and decrease costs.

An Empirical Study on the Effects of e-Mail Marketing : A focus on e-Mail Campaign for Credit Card Consumers (이메일 마케팅 성과에 관한 연구: 신용카드 고객을 대상으로 한 캠페인을 중심으로)

  • Shin, Sung-Hoon;Chung, Soo-Yeon;Park, Cheol
    • Information Systems Review / v.11 no.1 / pp.49-67 / 2009
  • E-mail marketing is the cheapest channel in target marketing, and it works remarkably well for marketers who know how to use it. E-mail marketers can integrate transactional and behavioral data to improve the targeted content of e-mail marketing campaigns. E-mail marketing costs little and causes no pollution. However, the e-mail response rate is lower than that of all other channels, so it is very hard for companies to increase their sales volumes even when their computer systems are ready to execute e-mail marketing campaigns. Marketers can send messages to target customers more easily than through other channels, but the probability that customers read them is low, and normal e-mails are continually devalued by spam. This study shows the influence of e-mail marketing on increasing credit card sales, based on real data from a campaign promoted by A bank in the Republic of Korea. Analysis of respondents' traits can help marketers target customers. If additional studies develop response prediction models based on the traits of potential respondents, targeting methods for increasing the effectiveness of e-mail marketing will become better structured and organized.

Spatial Hedonic Modeling using Geographically Weighted LASSO Model (GWL을 적용한 공간 헤도닉 모델링)

  • Jin, Chanwoo;Lee, Gunhak
    • Journal of the Korean Geographical Society / v.49 no.6 / pp.917-934 / 2014
  • The geographically weighted regression (GWR) model has been widely used to estimate spatially heterogeneous real estate prices. The GWR model, however, has limitations in selecting different price determinants over space and in the restricted number of observations available for local estimation. As an alternative, the geographically weighted LASSO (GWL) model has recently been introduced and has received growing interest. In this paper, we explore various local price determinants for real estate using the GWL and assess its applicability to forecasting real estate prices. To do this, we developed three hedonic models, OLS, GWR, and GWL, focusing on the sales prices of apartments in Seoul, and compared them in terms of model fit, prediction, and multicollinearity. As a result, the local models performed better than the global OLS overall, and in particular the GWL showed greater explanatory and predictive power than the other models. Moreover, the GWL provided spatially different sets of price determinants among which no multicollinearity exists. The GWL helps select significant sets of independent variables from a high-dimensional dataset, and hence will be a useful technique for large and complex spatial big data.
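For orientation, one common way to write the local GWL estimate at location $u_i$ is shown below. The abstract does not give the paper's exact specification, so this is an assumed textbook-style formulation: the Gaussian kernel, the bandwidth $h$, the distance $d_{ij}$ between locations $i$ and $j$, and the location-specific penalty $\lambda_i$ are all assumptions.

```latex
w_{ij} = \exp\!\left(-\frac{d_{ij}^{2}}{2h^{2}}\right),
\qquad
\hat{\beta}(u_i) = \arg\min_{\beta}\;
\sum_{j=1}^{n} w_{ij}\Bigl(y_j - \beta_0 - \sum_{k=1}^{p} x_{jk}\beta_k\Bigr)^{2}
+ \lambda_i \sum_{k=1}^{p} \lvert \beta_k \rvert
```

Setting $\lambda_i = 0$ recovers the locally weighted least squares fit of GWR, and additionally setting every $w_{ij} = 1$ recovers global OLS. The $\ell_1$ penalty can shrink some local coefficients exactly to zero, which is how different locations can end up with different sets of price determinants.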

The Effect of Related Party Transactions on Crash Risk (특수관계자 거래가 주가급락에 미치는 영향)

  • Ryu, Hae-Young
    • The Journal of Industrial Distribution & Business / v.9 no.6 / pp.49-55 / 2018
  • Purpose - This paper examines the effect of related party transactions on firm-specific stock price crash risk. Ownership of a typical Korean conglomerate is concentrated in a single family. In those entities, management and board positions are often filled by family members. Therefore, a dominant shareholder can benefit from related party transactions. In Korea, firms have to report related party transactions in financial statement footnotes. However, those are not disclosed in detail. The more related party transactions there are, the greater the information risk. Thus, companies with related party transactions are likely to experience stock price crashes. Research design, data, and methodology - 2,598 firm-year observations are used for the main analysis. The sample is drawn from the TS2000 database for 2009 to 2013, which covers KOSPI-listed firms in Korea. The proxy for related party transactions (RPT) is calculated by dividing total transactions with related parties by total sales. A dummy variable (CRASH) is used as the dependent variable in the regression model. Logistic regression is used to explain the relationship between related party transactions and crash risk. The sample was then separated into two groups, tunneling firms and propping firms, and how the relation between related party transactions and crash risk varies with the features of the transaction was investigated. Results - Using a sample of KOSPI-listed firms in the TS2000 database for the period 2009-2013, I find that stock price crash risk increases as the volume of related party transactions increases. Specifically, I find that the coefficient of RPT is significantly positive, supporting the prediction. In addition, this relationship is strong and robust in tunneling firms. Conclusions - The results show that firms with related party transactions are more likely to experience stock price crashes. The results mean that related party transactions increase the possibility of future stock price crashes by enlarging the information asymmetry between controlling shareholders and minority shareholders. In the case of tunneling, related party transactions are positively associated with stock price crash risk. The result implies that the characteristics of the transaction influence crash risk. This study is related to the literature that investigates the effect of related party transactions on the stock market.

The Effect of Managerial Overconfidence on Crash Risk (경영자과신이 주가급락위험에 미치는 영향)

  • Ryu, Haeyoung
    • The Journal of Industrial Distribution & Business / v.8 no.5 / pp.87-93 / 2017
  • Purpose - This paper investigates whether managerial overconfidence is associated with firm-specific crash risk. Overconfidence leads managers to overestimate the returns of their investment projects and to misperceive negative net present value projects as value creating. They even use voluntary disclosures to convey their optimistic beliefs about the firms' long-term prospects to the stock market. Thus, the overconfidence bias can lead managers to hoard bad news. When bad news accumulates and crosses some tipping point, it comes out all at once, resulting in a stock price crash. Research design, data and methodology - The 7,385 firm-years used for the main analysis are from the KIS Value database between 2006 and 2013. This database covers KOSPI-listed and KOSDAQ-listed firms in Korea. The proxy for overconfidence is based on excess investment in assets: the residual from a regression of total asset growth on sales growth, run by industry-year, is used as the independent variable. If a firm has at least one crash week during a year, it is referred to as a high crash risk firm. The dependent variable is a dummy variable that equals 1 if a firm is a high crash risk firm, and zero otherwise. After examining the relationship between managerial overconfidence and crash risk, the total sample was divided into two sub-samples, chaebol firms and non-chaebol firms, and how the relation between overconfidence and crash risk varies with business group affiliation was investigated. Results - The results showed that managerial overconfidence is positively related to crash risk. Specifically, the coefficient of OVERC is significantly positive, supporting the prediction. The results are strong and robust in non-chaebol firms. Conclusions - The results show that firms with overconfident managers are likely to experience stock price crashes. This study is related to past literature that examines the impact of managerial overconfidence on the stock market. This study contributes to the literature by examining whether overconfidence can explain a firm's future crashes.
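The overconfidence proxy described in the abstract (the residual from regressing total asset growth on sales growth within each industry-year) can be sketched as follows. This is a minimal illustration with a single-regressor OLS and made-up numbers; the field names, firm labels, and data are hypothetical, and the paper's actual regression may include further details not given in the abstract.

```python
from collections import defaultdict

def ols_residuals(xs, ys):
    """Residuals of a simple OLS fit y = a + b*x (requires at least two distinct x values)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def overconfidence_proxy(rows):
    """Run the regression separately for every (industry, year) group and
    return the residual for each (firm, year)."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["industry"], row["year"])].append(row)
    proxy = {}
    for members in groups.values():
        residuals = ols_residuals(
            [r["sales_growth"] for r in members],
            [r["asset_growth"] for r in members],
        )
        for r, e in zip(members, residuals):
            proxy[(r["firm"], r["year"])] = e
    return proxy

# Hypothetical firm-year observations
rows = [
    {"firm": "A", "industry": "retail", "year": 2010, "sales_growth": 0.0, "asset_growth": 0.0},
    {"firm": "B", "industry": "retail", "year": 2010, "sales_growth": 0.1, "asset_growth": 0.1},
    {"firm": "C", "industry": "retail", "year": 2010, "sales_growth": 0.2, "asset_growth": 0.5},
    {"firm": "D", "industry": "steel", "year": 2010, "sales_growth": 0.0, "asset_growth": 0.1},
    {"firm": "E", "industry": "steel", "year": 2010, "sales_growth": 0.2, "asset_growth": 0.3},
]
proxy = overconfidence_proxy(rows)
```

A positive residual means the firm's assets grew faster than its sales growth would predict within its industry-year, which is the excess investment the abstract treats as a sign of overconfidence.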

Application of machine learning models for estimating house price (단독주택가격 추정을 위한 기계학습 모형의 응용)

  • Lee, Chang Ro;Park, Key Ho
    • Journal of the Korean Geographical Society / v.51 no.2 / pp.219-233 / 2016
  • In social science fields, statistical models are used almost exclusively for causal explanation, and explanatory modeling has been the mainstream until now. In contrast, predictive modeling has been rare in these fields. Hence, we focus on constructing a predictive non-parametric model instead of an explanatory model. Gangnam-gu, Seoul was chosen as the study area, and we collected sales data for single-family houses sold between 2011 and 2014. We applied non-parametric models proposed in the machine learning area, including the generalized additive model (GAM), random forest, multivariate adaptive regression splines (MARS), and support vector machines (SVM). Recently developed models such as MARS and SVM were found to be superior in predictive power for house price estimation. Finally, spatial autocorrelation was additionally accounted for in the non-parametric models, and the results showed that their predictive power was enhanced further. We hope that this study will prompt the methodology for property price estimation to be extended from traditional parametric models to non-parametric ones.

A Study on Interactions of Competitive Promotions Between the New and Used Cars (신차와 중고차간 프로모션의 상호작용에 대한 연구)

  • Chang, Kwangpil
    • Asia Marketing Journal / v.14 no.1 / pp.83-98 / 2012
  • In a market where new and used cars compete with each other, we run the risk of obtaining biased estimates of the cross elasticity between them if we focus only on new cars or only on used cars. Unfortunately, most previous studies on the automobile industry have focused only on new car models without taking into account the effect of used cars' pricing policy on new cars' market shares and vice versa, resulting in inadequate prediction of reactive pricing in response to competitors' rebates or price discounts. However, there are some exceptions. Purohit (1992) and Sullivan (1990) looked into both new and used car markets at the same time to examine the effect of new car model launches on used car prices. But their studies have some limitations in that they employed the average used car prices reported in the NADA Used Car Guide instead of actual transaction prices. Some of the conflicting results may be due to this problem in the data. Park (1998) recognized this problem and used actual prices in his study. His work is notable in that he investigated the qualitative effect of new car model launches on the pricing policy of used cars in terms of reinforcement of brand equity. The current work also uses actual prices, like Park (1998), but explores the quantitative aspect of competitive price promotion between new and used cars of the same model. In this study, I develop a model that assumes that the cross elasticity between new and used cars of the same model is higher than that among new and used cars of different models. Specifically, I apply a nested logit model that assumes car model choice at the first stage and the choice between new and used cars at the second stage. This proposed model is compared to the IIA (Independence of Irrelevant Alternatives) model, which assumes that there is no decision hierarchy and that new and used cars of different models are all substitutable at the first stage. 
The data for this study are drawn from Power Information Network (PIN), an affiliate of J.D. Power and Associates. PIN collects sales transaction data from a sample of dealerships in the major metropolitan areas in the U.S. These are retail transactions, i.e., sales or leases to final consumers, excluding fleet sales and including both new car and used car sales. Each observation in the PIN database contains the transaction date, the manufacturer, model year, make, model, trim and other car information, the transaction price, consumer rebates, the interest rate, term, amount financed (when the vehicle is financed or leased), etc. I used data for the compact cars sold from January 2009 to June 2009. The new and used cars of the top nine selling models are included in the study: Mazda 3, Honda Civic, Chevrolet Cobalt, Toyota Corolla, Hyundai Elantra, Ford Focus, Volkswagen Jetta, Nissan Sentra, and Kia Spectra. These models accounted for 87% of category unit sales. Empirical application of the nested logit model showed that the proposed model outperformed the IIA (Independence of Irrelevant Alternatives) model in both calibration and holdout samples. The other comparison model, which assumes choice between new and used cars at the first stage and car model choice at the second stage, turned out to be mis-specified since the dissimilarity parameter (i.e., the inclusive or category value parameter) was estimated to be greater than 1. Post hoc analysis based on the estimated parameters was conducted employing the modified Lanczos iterative method. This method is intuitively appealing. For example, suppose a new car offers a certain amount of rebate and gains market share at first. In response to this rebate, a used car of the same model keeps decreasing its price until it regains the lost market share to maintain the status quo. The new car settles down to a lowered market share due to the used car's reaction. 
The method enables us to find the amount of price discount needed to maintain the status quo and the equilibrium market shares of the new and used cars. In the first simulation, I used Jetta as a focal brand to see how its new and used cars set prices, rebates or APR interactively, assuming that reactive cars respond to price promotion to maintain the status quo. The simulation results showed that the IIA model underestimates cross elasticities and therefore suggests a less aggressive used car price discount in response to new cars' rebates than the proposed nested logit model. In the second simulation, I used Elantra to reconfirm the result for Jetta and came to the same conclusion. In the third simulation, I had Corolla offer a $1,000 rebate to see what the best response for Elantra's new and used cars would be. Interestingly, Elantra's used car could maintain the status quo by offering a lower price discount ($160) than the new car ($205). In future research, we might want to explore the plausibility of the alternative nested logit model. For example, the NUB model, which assumes choice between new and used cars at the first stage and brand choice at the second stage, could be a possibility even though it was rejected in the current study because of mis-specification (a dissimilarity parameter turned out to be higher than 1). The NUB model may have been rejected due to true mis-specification or due to the data structure transmitted from a typical car dealership. In a typical car dealership, both new and used cars of the same model are displayed. Because of this fact, the BNU model, which assumes brand choice at the first stage and choice between new and used cars at the second stage, may have been favored in the current study since customers first choose a dealership (brand) and then choose between new and used cars given this market environment. However, suppose there are dealerships that carry both new and used cars of various models; then the NUB model might fit the data as well as the BNU model. 
Which model is a better description of the data is an empirical question. In addition, it would be interesting to test a probabilistic mixture model of the BNU and NUB on a new data set.
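The difference between the nested logit (brand first, then new versus used) and the IIA model can be illustrated with a small numerical sketch. This uses the standard two-level nested logit probability formula with made-up utilities; it is not the estimated model from the study, and the utility values, dissimilarity values, and function name are hypothetical.

```python
from math import exp, log

def nested_logit_probs(utilities, dissimilarity):
    """Two-level nested logit choice probabilities.
    utilities: {nest: {alternative: V}}; dissimilarity: {nest: value in (0, 1]}."""
    # Inclusive value of each nest: log-sum of the scaled utilities inside it
    inclusive = {
        nest: log(sum(exp(v / dissimilarity[nest]) for v in alts.values()))
        for nest, alts in utilities.items()
    }
    denom = sum(exp(dissimilarity[nest] * inclusive[nest]) for nest in utilities)
    probs = {}
    for nest, alts in utilities.items():
        p_nest = exp(dissimilarity[nest] * inclusive[nest]) / denom
        within = sum(exp(v / dissimilarity[nest]) for v in alts.values())
        for alt, v in alts.items():
            probs[(nest, alt)] = p_nest * exp(v / dissimilarity[nest]) / within
    return probs

# Hypothetical utilities after a rebate raises the utility of the new Jetta
utilities = {
    "Jetta": {"new": 0.5, "used": 0.0},
    "Elantra": {"new": 0.0, "used": 0.0},
}
nested = nested_logit_probs(utilities, {"Jetta": 0.5, "Elantra": 0.5})
iia = nested_logit_probs(utilities, {"Jetta": 1.0, "Elantra": 1.0})
```

With every dissimilarity parameter equal to 1 the formula collapses to the ordinary multinomial logit, so the rebate draws share equally from the used Jetta and from the Elantra alternatives. With values below 1, the used Jetta loses more share than the Elantra alternatives, which is the higher within-model cross elasticity the abstract describes.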
