• Title/Summary/Keyword: K-best algorithm


VKOSPI Forecasting and Option Trading Application Using SVM (SVM을 이용한 VKOSPI 일 중 변화 예측과 실제 옵션 매매에의 적용)

  • Ra, Yun Seon;Choi, Heung Sik;Kim, Sun Woong
    • Journal of Intelligence and Information Systems, v.22 no.4, pp.177-192, 2016
  • Machine learning is a field of artificial intelligence concerned with giving machines the ability to analyze data, make decisions, and forecast on their own. One representative machine learning model is the artificial neural network, a statistical learning algorithm inspired by biological neural networks. Other machine learning models include the decision tree, naive Bayes, and the support vector machine (SVM). Among these, we use the SVM in this study because it is mainly used for classification and regression analysis, which fits our study well. The core principle of the SVM is to find a reasonable hyperplane that separates different groups in the data space. Given information about the data in any two groups, the SVM judges to which group new data belong based on the hyperplane obtained from the given data set. Thus, the more meaningful data there are, the better the machine learns. In recent years, many financial experts have focused on machine learning, seeing the possibility of combining it with the financial field, where vast amounts of financial data exist. Machine learning techniques have proved powerful in describing non-stationary and chaotic stock price dynamics, and many studies have successfully used machine learning algorithms to forecast stock prices. Recently, financial companies have begun to provide Robo-Advisor services (a compound of "robot" and "advisor"), which perform various financial tasks through advanced algorithms using huge amounts of rapidly changing data. A Robo-Advisor's main tasks are to advise investors according to their personal investment propensity and to manage their portfolios automatically. 
In this study, we propose a method of forecasting the Korean volatility index, VKOSPI, using the SVM model and applying the forecast to real option trading to increase trading performance. VKOSPI is a measure of the future volatility of the KOSPI 200 index based on KOSPI 200 index option prices. It is similar to the VIX index, which is based on S&P 500 option prices in the United States. The Korea Exchange (KRX) calculates and announces the VKOSPI index in real time. VKOSPI behaves like ordinary volatility and affects option prices. VKOSPI and option prices move in the same direction regardless of the option type (call and put options with various strike prices). If volatility increases, the premiums of all call and put options increase because the probability of the options being exercised increases. Investors can see in real time how much an option price rises for a given rise in volatility through Vega, the Black-Scholes measure of an option's sensitivity to changes in volatility. Therefore, accurate forecasting of VKOSPI movements is one of the important factors that can generate profit in option trading. In this study, we verified with real option data that accurate forecasts of VKOSPI can produce large profits in real option trading. To the best of our knowledge, there have been no studies that predict the direction of VKOSPI with machine learning and apply the prediction to actual option trading. We predicted daily VKOSPI changes with the SVM model and then entered an intraday option strangle position, which profits as option prices fall, only when VKOSPI was expected to decline during the day. We analyzed the results and tested whether the SVM's predictions are applicable to real option trading. 
The results showed that the prediction accuracy for VKOSPI was 57.83% on average and that the average number of position entries was 43.2, less than half of the benchmark (100). A small number of trades is an indicator of trading efficiency. In addition, the experiment showed that the trading performance was significantly higher than the benchmark.
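
The link between volatility and option premiums described above can be illustrated with the standard Black-Scholes formulas. The sketch below is not the authors' trading system: it assumes European options with no dividends, and the spot level, strikes, rate, and volatilities are hypothetical values chosen only for illustration. It shows that Vega is positive and identical for calls and puts, so the combined premium of a strangle falls when volatility falls.

```python
import math


def norm_pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _d1(spot, strike, rate, vol, t):
    return (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * t) / (vol * math.sqrt(t))


def bs_price(spot, strike, rate, vol, t, kind="call"):
    """Black-Scholes price of a European option (assumption: no dividends)."""
    d1 = _d1(spot, strike, rate, vol, t)
    d2 = d1 - vol * math.sqrt(t)
    discounted_strike = strike * math.exp(-rate * t)
    if kind == "call":
        return spot * norm_cdf(d1) - discounted_strike * norm_cdf(d2)
    return discounted_strike * norm_cdf(-d2) - spot * norm_cdf(-d1)


def bs_vega(spot, strike, rate, vol, t):
    """Vega: price change per unit change in volatility, the same for calls and puts."""
    return spot * norm_pdf(_d1(spot, strike, rate, vol, t)) * math.sqrt(t)


def strangle_value(spot, put_strike, call_strike, rate, vol, t):
    """Combined premium of one put and one call at different strikes."""
    return (bs_price(spot, put_strike, rate, vol, t, "put")
            + bs_price(spot, call_strike, rate, vol, t, "call"))


# Hypothetical inputs: index at 250, one month to expiry, volatility falls from 20% to 18%.
before = strangle_value(250.0, 240.0, 260.0, 0.02, 0.20, 1 / 12)
after = strangle_value(250.0, 240.0, 260.0, 0.02, 0.18, 1 / 12)
print(f"strangle premium: {before:.3f} -> {after:.3f}")
```

A trader who has sold the strangle gains the difference between the two premiums, which is why a position of this kind is entered only when volatility is forecast to decline.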

A Multimodal Profile Ensemble Approach to Development of Recommender Systems Using Big Data (빅데이터 기반 추천시스템 구현을 위한 다중 프로파일 앙상블 기법)

  • Kim, Minjeong;Cho, Yoonho
    • Journal of Intelligence and Information Systems, v.21 no.4, pp.93-110, 2015
  • A recommender system recommends products to the customers who are likely to be interested in them. Various recommender systems have been developed based on automated information filtering technology. Collaborative filtering (CF), one of the most successful recommendation algorithms, has been applied in a number of different domains, such as recommending Web pages, books, movies, music, and products. However, CF is known to have a critical shortcoming. CF finds neighbors whose preferences are similar to those of the target customer and recommends the products those neighbors have liked most. Thus, CF works properly only when there is a sufficient number of ratings on common products from customers. When customer ratings are scarce, CF forms an inaccurate neighborhood, which results in poor recommendations. To improve the performance of CF-based recommender systems, most related studies have focused on developing novel algorithms under the assumption of a single profile, created from users' item ratings, purchase transactions, or Web access logs. With the advent of big data, companies have come to collect more data and to use a wide variety of information in large volumes. Many companies therefore consider it very important to utilize big data, because doing so improves their competitiveness and creates new value. In particular, the use of personal big data in recommender systems is an emerging issue, because personal big data facilitate more accurate identification of users' preferences and behaviors. The proposed recommendation methodology is as follows. First, multimodal user profiles are created from personal big data in order to grasp the preferences and behavior of users from various viewpoints. We derive five user profiles based on personal information: rating, site preference, demographic, Internet usage, and topic in text. 
Next, the similarity between users is calculated based on the profiles, and the neighbors of each user are found from the results. One of three ensemble approaches is applied to calculate the similarity: the similarity of the combined profile, the average similarity of each profile, or the weighted average similarity of each profile. Finally, the products that the neighbors prefer most are recommended to the target users. For the experiments, we used demographic data and a very large volume of Web log transactions for 5,000 panel users of a company that specializes in analyzing Web site rankings. R and SAS E-miner were used to implement the proposed recommender system and to conduct the topic analysis using keyword search, respectively. To evaluate the recommendation performance, we used 60% of the data for training and 40% for testing. Five-fold cross validation was also conducted to enhance the reliability of our experiments. The F1 metric, a widely used combination metric that gives equal weight to recall and precision, was employed for our evaluation. In the evaluation, the proposed methodology achieved a significant improvement over the single-profile CF algorithm. In particular, the ensemble approach using weighted average similarity showed the highest performance: the rate of improvement in F1 is 16.9 percent for the ensemble approach using weighted average similarity and 8.1 percent for the ensemble approach using the average similarity of each profile. From these results, we conclude that the multimodal profile ensemble approach is a viable solution to the problems encountered when customer ratings are scarce. This study is significant in suggesting what kinds of information can be used to create profiles in a big data environment and how they can be combined and utilized effectively. 
However, our methodology needs further study before real-world application. We need to compare the differences in recommendation accuracy when the proposed method is applied to different recommendation algorithms and then identify which combination shows the best performance.
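
The three ensemble approaches named in this abstract can be sketched as follows. The abstract does not state which similarity measure or which weights were used, so cosine similarity, the profile vectors, and the weights below are all assumptions made for illustration. The `f1` helper is the usual harmonic mean of precision and recall.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def combined_similarity(profiles_a, profiles_b):
    """Approach 1: concatenate all profiles into one vector, then compare."""
    names = sorted(profiles_a)
    flat_a = [x for name in names for x in profiles_a[name]]
    flat_b = [x for name in names for x in profiles_b[name]]
    return cosine(flat_a, flat_b)


def average_similarity(profiles_a, profiles_b):
    """Approach 2: compare profile by profile, then take the plain mean."""
    sims = [cosine(profiles_a[name], profiles_b[name]) for name in profiles_a]
    return sum(sims) / len(sims)


def weighted_similarity(profiles_a, profiles_b, weights):
    """Approach 3: compare profile by profile, then take a weighted mean."""
    total = sum(weights[name] for name in profiles_a)
    return sum(weights[name] * cosine(profiles_a[name], profiles_b[name])
               for name in profiles_a) / total


def f1(precision, recall):
    """F1 metric: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Hypothetical users with two of the five profile types.
user_a = {"rating": [5, 3, 0], "site": [1, 0, 1]}
user_b = {"rating": [4, 3, 1], "site": [0, 1, 1]}
weights = {"rating": 0.7, "site": 0.3}  # illustrative weights, not from the paper

print(combined_similarity(user_a, user_b))
print(average_similarity(user_a, user_b))
print(weighted_similarity(user_a, user_b, weights))
```

With equal weights the third approach reduces to the second, so the weights are the only extra degree of freedom the weighted ensemble adds.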

A Study on the Design of Case-based Reasoning Office Knowledge Recommender System for Office Professionals (사례기반추론을 이용한 사무지식 추천시스템)

  • Kim, Myong-Ok;Na, Jung-Ah
    • Journal of Intelligence and Information Systems, v.17 no.3, pp.131-146, 2011
  • It is becoming more essential than ever for office professionals to be competent in information gathering and problem solving in today's global business society. In particular, office professionals not only assist with simple chores but must also make decisions as quickly and efficiently as possible in problematic situations that can end in either profit or loss for their company. Since office professionals rely heavily on their tacit knowledge to solve problems that arise in everyday business situations, it is helpful and efficient to refer to similar business cases from the past and to share or reuse such previous business knowledge for better performance. Case-based reasoning (CBR) is a problem-solving method that utilizes previous similar cases to solve problems. Through CBR, the case closest to the current business situation can be searched for and retrieved from the case or knowledge base and referred to for a new solution. This reduces the time and resources needed and increases the probability of success. The main purpose of this study is to design a system called COKRS (Case-based reasoning Office Knowledge Recommender System) and to develop a prototype for it. COKRS manages cases and their metadata, accepts keywords from the user and searches the case base for the past case most similar to the input keyword, and communicates with users to collect information about the quality of the case provided, continuously applying that information to update the values in the similarity table. Core concepts such as the system architecture, the definition of a case, the meta database, and the similarity table are introduced, and an algorithm to retrieve all similar cases from past work history is proposed. In this research, a case is best defined as a work experience in office administration. However, defining a case in office administration was not an easy task in practice. 
We surveyed 10 office professionals to get an idea of how to define a case in office administration and found that in most cases any type of office work is recorded digitally and/or non-digitally. Therefore, we defined a case for COKRS as a record or document. The similarity table was composed of items from the results of a job analysis of office professionals conducted in previous research. Values between items of the similarity table were initially set based on the researchers' experience and a literature review. The results of this study could also be utilized in other areas of business for knowledge sharing, wherever it is necessary and beneficial to share and learn from past experiences. We expect this research to be a reference for researchers and developers who work in this area or are interested in office knowledge recommendation systems based on CBR. A focus group interview (FGI) was conducted with ten administrative assistants carefully selected from various areas of business. They were given a chance to try out COKRS in an actual work setting and to make suggestions for future improvement. The FGI identified the user interface for saving cases and searching them by keyword as the most positive aspect of COKRS, and identified more efficient transformation of tacit knowledge and know-how into recorded documents as the most urgently needed improvement. The focus group also mentioned that it is essential to secure enough support, encouragement, and reward from the company and to promote a positive attitude and atmosphere toward knowledge sharing for the benefit of everybody in the company.

Treatment Strategies for Depression during Pregnancy and Lactation (임신과 수유기 우울증의 치료 전략)

  • Lee, Soyoung Irene;Jung, Han-Yong
    • Korean Journal of Biological Psychiatry, v.14 no.2, pp.91-98, 2007
  • Objectives : Considering the impact of depressive illness on the physical and mental health of both mother and fetus, specifying a treatment algorithm for depressive disorder during pregnancy is warranted. This article provides a systematic review of treatments for depressive disorder during pregnancy and lactation. Methods : According to the search strategy of the Clinical Research Center for Depression of the Korean Health 21 R & D Project, PubMed and EMBASE were searched using terms related to the treatment of depressive disorders during pregnancy and lactation. Reference lists of related reviews and studies were searched. In addition, relevant practice guidelines were searched using PubMed. All identified clinical literature was reviewed and summarized in a narrative manner. Results : Pharmacotherapy during pregnancy and lactation requires a comprehensive assessment of the risks and benefits of treatment for both mother and fetus or neonate. There is growing evidence that the use of tricyclic antidepressants and selective serotonin reuptake inhibitors during pregnancy and lactation does not result in increased risks of teratogenicity. Treatment strategies are described according to the stage of pregnancy or lactation. FDA categories for antidepressants during pregnancy and lactation are described. In addition, issues regarding electroconvulsive therapy (ECT) and psychosocial treatment are discussed. Conclusion : The treatment option for depressive disorders during pregnancy and lactation depends on the severity of the individual patient's depressive illness. For mild to moderate depression, non-pharmacological treatment should be considered first. For moderate to severe depression, pharmacotherapy should be administered in addition to psychosocial treatment. ECT is recommended for depressive disorder of severe intensity. As research knowledge is limited, the recommendations should be based on the best judgement of psychiatrists.


The Availability of the step optimization in Monaco Planning system (모나코 치료계획 시스템에서 단계적 최적화 조건 실현의 유용성)

  • Kim, Dae Sup
    • The Journal of Korean Society for Radiation Therapy, v.26 no.2, pp.207-216, 2014
  • Purpose : In the Monaco treatment planning system, re-optimization performed under the same conditions as the initial treatment plan produces a different plan. We present a method to reduce this gap and complete the treatment plan. Materials and Methods : In the Monaco treatment planning system, optimization is carried out in two steps when performing the inverse calculation for volumetric modulated radiation therapy or intensity modulated radiation therapy. The first plan was completed with a typical sequential optimization, running both steps without changing the optimization conditions from Step 1 to Step 2. In this experiment, the pencil beam and Monte Carlo algorithms were applied in Step 2. We compared the initial plan with a plan re-optimized under the same optimization conditions and then evaluated the planned dose by measurement. The second plan applied the step optimization when re-optimizing the initial treatment plan. Results : When the ordinary optimization was carried out again under the same conditions as the completed initial treatment plan, the result was not the same. In the treatment planning system comparison, the dose-volume histograms showed a similar trend but exhibited different values that did not satisfy the optimized conditions for dose, dose homogeneity, and dose limits. The comparison dosimetry also showed differences of more than 20%. With a different dose algorithm, the measured results were also not the same. Conclusion : Treatment planning optimization reaches its ultimate goal through a process of repeated trial and error. If re-optimization is carried out trusting only the completed initial treatment plan, a different treatment plan may result, and that similar plan may not satisfy the optimization results. 
When performing re-optimization, the step optimization conditions should be applied, and the dose distribution should be checked throughout the optimization process.

Determinants of Consumer Preference by type of Accommodation: Two Step Cluster Analysis (이단계 군집분석에 의한 농촌관광 편의시설 유형별 소비자 선호 결정요인)

  • Park, Duk-Byeong;Yoon, Yoo-Shik;Lee, Min-Soo
    • Journal of Global Scholars of Marketing Science, v.17 no.3, pp.1-19, 2007
  • 1. Purpose: Rural tourism is undertaken by individuals with different characteristics, needs, and wants. It is important to have information on the characteristics and preferences of the consumers of the different types of existing rural accommodation. This study aims to identify the determinants of consumer preference by type of accommodation. 2. Methodology. 2.1 Sample: Data were collected from 1,000 people by telephone survey with three-stage stratified random sampling in seven metropolitan areas in Korea. Respondents were chosen at a sampling interval from a telephone book published in 2006. The survey was conducted between 4:00 and 10:30 in the afternoon and evening, using systematic sampling that took respondents' life cycle into account. 2.2 Two-step cluster analysis: Our study uses a two-step cluster method to classify the accommodations into a reduced number of groups, so that each group constitutes a type. This method has been suggested as appropriate for clustering large data sets with mixed attributes. It is based on a distance measure that enables data with both continuous and categorical attributes to be clustered. The measure is derived from a probabilistic model in which the distance between two clusters is equivalent to the decrease in the log-likelihood function that results from merging them. 2.3 Multinomial logit analysis: Estimating a multinomial logit model determines the characteristics of the tourists most likely to opt for each type of accommodation. The multinomial logit model is an appropriate framework for exploring and explaining a choice process in which the choice set consists of more than two alternatives. Because its parameters are easy and quick to estimate, the multinomial logit model has been used in many empirical studies of choice in tourism. 3. Findings: The auto-clustering algorithm indicated that a five-cluster solution was the best model, because it minimized the BIC value and the change in BIC between adjacent numbers of clusters. 
The accommodation establishments can be classified into five types: Traditional House, Typical Farmhouse, Farmstay House for Group Tour, Log Cabin for Family, and Log Cabin for Individuals. Group 1 (Traditional House) includes mainly the large accommodation establishments, i.e. those with Ondoll-style rooms that provide meals and one shower room for family tourists, in houses of the original construction style. Group 2 (Typical Farmhouse) encompasses accommodation establishments with Ondoll rooms, a bathroom for each room, and meals provided. It includes, in other words, the tourist accommodations known as "rural houses." Group 3 (Farmstay House for Group) comprises accommodation establishments with Ondoll rooms, no meals provided, self-cooking facilities, and large rooms for more than five persons. Group 4 (Log Cabin for Family) includes mainly the popular accommodation establishments, i.e. those with Ondoll-style rooms and one shower room for family tourists, in Western-style log houses. While the accommodations in this group are not defined by type of construction, the group does include all the original Korean-style constructions. Finally, Group 5 (Log Cabin for Individuals) includes accommodations that are Western-style wooden houses with bedrooms, each with its own bathroom. First, a multinomial logit model is estimated that includes all the explanatory variables considered and takes accommodation Group 2 as the base alternative. The results show the variables and the estimated parameter values of the model giving the probability of choosing each of the five types of accommodation available in rural tourism villages in Korea, according to the socio-economic and trip-related characteristics of the individuals. An initial observation of the analysis reveals that none of the variables income, number of journeys, distance, and residential style of house is explanatory in the choice of rural accommodation. The age and accompany variables are significant for accommodation establishments of Group 1. 
The education and rural residential experience variables are significant for accommodation establishments of Groups 4 and 5. The expenditure and marital status variables are significant for accommodation establishments of Group 4. The gender and occupation variables are significant for accommodation establishments of Group 3. The loyalty variable is significant for accommodation establishments of Groups 3 and 4. The study indicates that significant differences exist among the individuals who choose each type of accommodation at a destination. From this investigation it is evident that several profiles of tourists can be attracted to a rural destination according to the types of accommodation existing there. In addition, the tourist profiles may be used as the basis for investment policy and promotion for each type of accommodation, making use in each case of the variables that indicate a greater likelihood of influencing the tourist's choice of accommodation.
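
The role of the base alternative in a multinomial logit model can be shown with a short sketch. It computes choice probabilities for five accommodation groups, with Group 2 as the base whose utility is fixed at zero, as in the abstract. The explanatory variables and coefficient values are hypothetical placeholders, not the estimates reported in the paper.

```python
import math


def mnl_probabilities(x, coefficients, base):
    """Multinomial logit choice probabilities.

    x            -- explanatory variables for one individual
    coefficients -- {alternative: coefficient vector} for the non-base alternatives
    base         -- base alternative, whose utility is normalized to zero
    """
    utilities = {alt: sum(b * v for b, v in zip(beta, x))
                 for alt, beta in coefficients.items()}
    utilities[base] = 0.0
    # Subtract the maximum utility before exponentiating for numerical stability.
    top = max(utilities.values())
    exps = {alt: math.exp(u - top) for alt, u in utilities.items()}
    total = sum(exps.values())
    return {alt: e / total for alt, e in exps.items()}


# Hypothetical individual: [intercept, age in decades, travels with family (0/1)].
tourist = [1.0, 4.5, 1.0]

# Hypothetical coefficients for Groups 1, 3, 4, and 5 relative to base Group 2.
coefs = {
    "group1": [-1.0, 0.30, 0.2],
    "group3": [0.2, -0.10, -0.5],
    "group4": [-0.3, 0.05, 0.8],
    "group5": [0.1, -0.20, -0.6],
}

probs = mnl_probabilities(tourist, coefs, base="group2")
for alt in sorted(probs):
    print(alt, round(probs[alt], 3))
```

Each coefficient is read relative to the base: a positive coefficient raises the odds of choosing that group over Group 2 as the variable increases.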


A CF-based Health Functional Recommender System using Extended User Similarity Measure (확장된 사용자 유사도를 이용한 CF-기반 건강기능식품 추천 시스템)

  • Sein Hong;Euiju Jeong;Jaekyeong Kim
    • Journal of Intelligence and Information Systems, v.29 no.3, pp.1-17, 2023
  • With the recent rapid development of ICT (Information and Communication Technology) and the popularization of digital devices, the online market continues to grow. As a result, we live in a flood of information, and customers face an information overload problem in which selecting products takes a lot of time and money. A personalized recommender system has therefore become an essential methodology for addressing such issues. Collaborative Filtering (CF) is the most widely used recommender system. Traditional recommender systems mainly utilize quantitative data such as rating values, which results in poor recommendation accuracy because quantitative data cannot fully reflect the user's preference. To solve this problem, studies that reflect qualitative data, such as review contents, are being actively conducted. In this study, text mining was used to quantify user review contents. General CF consists of the following three steps: user-item matrix generation, Top-N neighborhood group search, and Top-K recommendation list generation. We propose a recommendation algorithm that applies an extended similarity measure, which utilizes quantified review contents in addition to user rating values. After review similarity is calculated by applying the TF-IDF, Word2Vec, and Doc2Vec techniques to review content, the extended similarity is created by combining user rating similarity and quantified review contents. To verify this, we used user ratings and review data from the "Health and Personal Care" category of the e-commerce site Amazon. The proposed recommendation model using the extended similarity measure showed superior performance to the traditional recommendation model using a similarity measure based only on user rating values. In addition, among the various text mining techniques, the similarity obtained using the TF-IDF technique showed the best performance when used in the neighbor group search and recommendation list generation steps.
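
A minimal sketch of an extended similarity of this kind follows. The abstract says that rating similarity and review similarity are combined but does not say how, so the linear blend with weight `alpha` is an assumption, as are the smoothed IDF variant, the whitespace tokenizer, and the sample reviews.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict) per document.

    Assumption: term frequency is count / document length and IDF is the
    smoothed form log((1 + N) / (1 + df)) + 1.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def sparse_cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def extended_similarity(rating_sim, review_sim, alpha=0.5):
    """Hypothetical blend: alpha weights ratings, (1 - alpha) weights reviews."""
    return alpha * rating_sim + (1.0 - alpha) * review_sim


# Hypothetical review texts, one per user.
reviews = [
    "great vitamin fast shipping",
    "great vitamin fast shipping",
    "bad taste",
]
vectors = tfidf_vectors(reviews)
review_sim = sparse_cosine(vectors[0], vectors[1])
print(extended_similarity(rating_sim=0.6, review_sim=review_sim, alpha=0.5))
```

The resulting score would replace the plain rating similarity when ranking candidates in the Top-N neighborhood group search.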

Relationships on Magnitude and Frequency of Freshwater Discharge and Rainfall in the Altered Yeongsan Estuary (영산강 하구의 방류와 강우의 규모 및 빈도 상관성 분석)

  • Rhew, Ho-Sang;Lee, Guan-Hong
    • The Sea:JOURNAL OF THE KOREAN SOCIETY OF OCEANOGRAPHY, v.16 no.4, pp.223-237, 2011
  • Intermittent freshwater discharge has a critical influence on the biophysical environments and ecosystems of the Yeongsan Estuary, where the estuary dam has altered the continuous mixing of saltwater and freshwater. Though freshwater discharge is controlled by humans, extreme events are mainly driven by heavy rainfall in the river basin and have various impacts, depending on their magnitude and frequency. This research aims to evaluate the magnitude and frequency of extreme freshwater discharges and to establish the magnitude-frequency relationships between basin-wide rainfall and freshwater inflow. Daily discharge and daily basin-averaged rainfall from Jan 1, 1997 to Aug 31, 2010 were used to determine the relations between discharge and rainfall. Consecutive daily discharges were grouped into independent events using a well-defined event-separation algorithm. Partial duration series were extracted to obtain the proper probability distribution function for extreme discharges and the corresponding rainfall events. Extreme discharge events over the threshold of 133,656,000 $m^3$ number 46 in 13.7 years and follow the Weibull distribution with k=1.4. The 3-day accumulated rainfalls that occurred one day before peak discharges (1day-before-3day-sum rainfall) are chosen as the control variable for discharge, because their magnitude is best correlated with that of the extreme discharge events. The minimum value of the corresponding 1day-before-3day-sum rainfall, 50.98 mm, is initially set as the threshold for selecting discharge-inducing rainfall cases. The number of 1day-before-3day-sum rainfall groups after selection, however, exceeds that of the extreme discharge events. The canonical discriminant analysis indicates that a water level above the target level (-1.35 m EL.) can be useful for dividing the 1day-before-3day-sum rainfall groups into discharge-inducing and non-discharge ones. 
It also shows that the newly set threshold, 104 mm, separates these two cases without errors. The magnitude-frequency relationships between rainfall and discharge are established with the newly selected 1day-before-3day-sum rainfalls: $D=1.111\times10^{8}+1.677\times10^{6}\,\overline{r_{3day}}$ ($\overline{r_{3day}}\geq104$, $R^2=0.459$), $T_d=1.326\,T_{r3}^{0.683}$, $T_d=0.117\exp[0.0155\,\overline{r_{3day}}]$, where $D$ is the quantity of discharge, $\overline{r_{3day}}$ is the 1day-before-3day-sum rainfall, and $T_{r3}$ and $T_d$ are the return periods of the 1day-before-3day-sum rainfall and the freshwater discharge, respectively. These relations provide a framework for evaluating the effect of freshwater discharge on estuarine flow structure, water quality, and ecosystem responses from the perspective of magnitude and frequency.
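
The three fitted relations above can be evaluated directly. The sketch below only transcribes the published coefficients into functions. The function names are hypothetical, the discharge is taken to be in cubic metres and the rainfall in millimetres as in the abstract, and the return periods are assumed to be in years. The 104 mm lower bound is enforced because the abstract gives the linear fit only for rainfall at or above it.

```python
import math

RAIN_THRESHOLD_MM = 104.0  # newly set threshold from the abstract


def discharge_from_rainfall(r3day_mm):
    """D = 1.111e8 + 1.677e6 * r, stated for r >= 104 mm (R^2 = 0.459)."""
    if r3day_mm < RAIN_THRESHOLD_MM:
        raise ValueError("relation is only given for rainfall >= 104 mm")
    return 1.111e8 + 1.677e6 * r3day_mm


def discharge_period_from_rain_period(t_r3):
    """T_d = 1.326 * T_r3 ** 0.683."""
    return 1.326 * t_r3 ** 0.683


def discharge_period_from_rainfall(r3day_mm):
    """T_d = 0.117 * exp(0.0155 * r)."""
    return 0.117 * math.exp(0.0155 * r3day_mm)


# Hypothetical rainfall totals (mm) for the 1day-before-3day-sum variable.
for rain in (104.0, 150.0, 200.0):
    print(rain,
          discharge_from_rainfall(rain),
          discharge_period_from_rainfall(rain))
```

Since the reported coefficient of determination is 0.459, a discharge predicted from rainfall alone should be read as a rough central estimate rather than a precise value.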

Performance analysis of Frequent Itemset Mining Technique based on Transaction Weight Constraints (트랜잭션 가중치 기반의 빈발 아이템셋 마이닝 기법의 성능분석)

  • Yun, Unil;Pyun, Gwangbum
    • Journal of Internet Computing and Services, v.16 no.1, pp.67-74, 2015
  • In recent years, frequent itemset mining that considers the importance of each item has been intensively studied as one of the important issues in the data mining field. According to how they utilize item importance, itemset mining approaches are classified as follows: weighted frequent itemset mining, frequent itemset mining using transactional weights, and utility itemset mining. In this paper, we perform an empirical analysis of frequent itemset mining algorithms based on transactional weights. These mining algorithms compute transactional weights from the weight of each item in large databases. In addition, they discover weighted frequent itemsets on the basis of the item frequency and the weight of each transaction. Consequently, the importance of a certain transaction can be seen through database analysis, because the weight of a transaction is higher if it contains many items with high values. We not only analyze the advantages and disadvantages but also compare the performance of the best-known algorithms in the field of frequent itemset mining based on transactional weights. As a representative of frequent itemset mining using transactional weights, WIS introduces the concept and strategies of transactional weights. In addition, there are various other state-of-the-art algorithms, WIT-FWIs, WIT-FWIs-MODIFY, and WIT-FWIs-DIFF, for extracting itemsets with the weight information. To mine weighted frequent itemsets efficiently, these three algorithms use a special lattice-like data structure called the WIT-tree. The algorithms do not need an additional database scan after the construction of the WIT-tree is finished, since each node of the WIT-tree holds item information such as item and transaction IDs. 
In particular, the traditional algorithms conduct a number of database scans to mine weighted itemsets, whereas the algorithms based on the WIT-tree avoid this overhead in the mining process by reading the database only once. Additionally, the algorithms generate each new itemset of length N+1 from two different itemsets of length N. To discover new weighted itemsets, WIT-FWIs performs the itemset combination process using the information of the transactions that contain all the itemsets. WIT-FWIs-MODIFY has a unique feature that reduces the operations for calculating the frequency of the new itemset. WIT-FWIs-DIFF utilizes a technique based on the difference of two itemsets. To compare and analyze the performance of the algorithms in various environments, we use real datasets of two types (dense and sparse) and measure runtime and maximum memory usage. Moreover, a scalability test is conducted to evaluate the stability of each algorithm when the size of the database changes. As a result, WIT-FWIs and WIT-FWIs-MODIFY show the best performance on the dense dataset, and on the sparse dataset WIT-FWIs-DIFF has better mining efficiency than the other algorithms. Compared to the algorithms using the WIT-tree, WIS, which is based on the Apriori technique, has the worst efficiency because on average it requires far more computations than the others.
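
The ideas in this abstract (transaction weights derived from item weights, transaction ID sets read in a single scan, joining two length-N itemsets into a length-N+1 itemset, and working with the difference of two ID sets) can be sketched on a toy database. This is an illustration, not the WIT-tree implementation: the abstract does not define the transaction weight, so taking it as the mean of the item weights is an assumption, and the database and weights are hypothetical.

```python
def transaction_weights(database, item_weights):
    """Assumed definition: weight of a transaction = mean weight of its items."""
    return {tid: sum(item_weights[i] for i in items) / len(items)
            for tid, items in database.items()}


def build_tidsets(database):
    """One scan of the database: map each single item to the IDs containing it."""
    tidsets = {}
    for tid, items in database.items():
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def weighted_support(tids, weights):
    """Share of the total transaction weight carried by the given transactions."""
    return sum(weights[t] for t in tids) / sum(weights.values())


def join(tids_x, tids_y):
    """Transactions containing both itemsets, found without rescanning."""
    return tids_x & tids_y


def diffset(tids_x, tids_y):
    """Transactions containing X but not Y (the difference-based view)."""
    return tids_x - tids_y


# Hypothetical database and item weights.
db = {1: {"A", "B"}, 2: {"A", "C"}, 3: {"A", "B", "C"}, 4: {"B"}}
item_w = {"A": 0.6, "B": 0.2, "C": 0.4}

tw = transaction_weights(db, item_w)
tids = build_tidsets(db)
tids_ab = join(tids["A"], tids["B"])
print("t(AB) =", sorted(tids_ab))
print("ws(AB) =", weighted_support(tids_ab, tw))
print("d(AB) =", sorted(diffset(tids["A"], tids["B"])))
```

The difference set gives the same count with less data when itemsets overlap heavily, because the size of t(AB) equals the size of t(A) minus the size of d(AB).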