• Title/Abstract/Keywords: important value

Performance analysis of Frequent Itemset Mining Technique based on Transaction Weight Constraints (트랜잭션 가중치 기반의 빈발 아이템셋 마이닝 기법의 성능분석)

  • Yun, Unil;Pyun, Gwangbum
    • Journal of Internet Computing and Services / v.16 no.1 / pp.67-74 / 2015
  • In recent years, frequent itemset mining that considers the importance of each item has been studied intensively as one of the important issues in the data mining field. Depending on how they use item importance, itemset mining approaches are classified as follows: weighted frequent itemset mining, frequent itemset mining using transactional weights, and utility itemset mining. In this paper, we perform an empirical analysis of frequent itemset mining algorithms based on transactional weights. These algorithms compute transactional weights from the weight of each item in large databases, and they discover weighted frequent itemsets on the basis of item frequency and the weight of each transaction. Consequently, database analysis reveals the importance of a given transaction, because a transaction's weight is higher if it contains many items with high values. We analyze the advantages and disadvantages of the best-known algorithms in this field and compare their performance. As a representative of frequent itemset mining using transactional weights, WIS introduces the concept and strategies of transactional weights. In addition, there are several state-of-the-art algorithms, WIT-FWIs, WIT-FWIs-MODIFY, and WIT-FWIs-DIFF, for extracting itemsets with the weight information. To mine weighted frequent itemsets efficiently, these three algorithms use a special lattice-like data structure called the WIT-tree. The algorithms need no additional database scan once the WIT-tree has been constructed, since each node of the WIT-tree holds item information such as item and transaction IDs.
In particular, traditional algorithms perform a number of database scans to mine weighted itemsets, whereas the WIT-tree-based algorithms avoid this overhead by reading the database only once. Additionally, the algorithms generate each new itemset of length N+1 from two different itemsets of length N. To discover new weighted itemsets, WIT-FWIs combines itemsets by using the information of the transactions that contain all the itemsets. WIT-FWIs-MODIFY has a unique feature that reduces the operations needed to calculate the frequency of a new itemset. WIT-FWIs-DIFF uses a technique based on the difference of two itemsets. To compare and analyze the performance of the algorithms in various environments, we use two types of real datasets (i.e., dense and sparse) and measure runtime and maximum memory usage. Moreover, a scalability test is conducted to evaluate the stability of each algorithm as the size of the database changes. As a result, WIT-FWIs and WIT-FWIs-MODIFY show the best performance on the dense dataset, and on the sparse dataset WIT-FWIs-DIFF mines more efficiently than the other algorithms. Compared to the algorithms using the WIT-tree, WIS, which is based on the Apriori technique, has the worst efficiency because on average it requires far more computations than the others.
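
The idea of transaction-weighted mining described in this abstract can be illustrated with a minimal Python sketch. It is not the WIT-tree implementation of any of the cited algorithms. It assumes that a transaction's weight is the mean weight of its items and that an itemset's weighted support is the summed weight of the transactions containing it divided by the total transaction weight; both definitions, and all function names, are assumptions made for illustration. The sketch reads the database once to build transaction-ID sets, then joins two itemsets of length N into one of length N+1 by intersecting their ID sets.

```python
from itertools import combinations


def transaction_weights(db, item_weight):
    # Assumed definition: mean weight of the items in each (non-empty) transaction.
    return {tid: sum(item_weight[i] for i in items) / len(items)
            for tid, items in db.items()}


def mine_weighted_itemsets(db, item_weight, min_ws):
    """Return {itemset: weighted support} for itemsets with support >= min_ws."""
    tw = transaction_weights(db, item_weight)
    total_tw = sum(tw.values())

    def weighted_support(tids):
        return sum(tw[t] for t in tids) / total_tw

    # Single pass over the database: record which transactions hold each item.
    tidsets = {}
    for tid, items in db.items():
        for item in items:
            tidsets.setdefault(frozenset([item]), set()).add(tid)

    level = {x: t for x, t in tidsets.items() if weighted_support(t) >= min_ws}
    found = {}
    while level:
        found.update((x, weighted_support(t)) for x, t in level.items())
        next_level = {}
        # Join two itemsets of length N into a candidate of length N+1.
        for (a, ta), (b, tb) in combinations(level.items(), 2):
            candidate = a | b
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            tids = ta & tb  # transactions containing both parents
            if weighted_support(tids) >= min_ws:
                next_level[candidate] = tids
        level = next_level
    return found


# Hypothetical toy database: transaction ID -> set of items.
example_db = {1: {"a", "b"}, 2: {"a", "c"}, 3: {"a", "b", "c"}}
example_weights = {"a": 0.6, "b": 0.2, "c": 0.4}
example_result = mine_weighted_itemsets(example_db, example_weights, min_ws=0.5)
```

Because a longer itemset can only be contained in a subset of its parents' transactions, its weighted support under these assumptions never exceeds theirs, which is what lets the loop prune level by level.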

Prospective Study on Preoperative Evaluation for the Prediction of Mortality and Morbidity after Lung Cancer Resection (폐암절제술후 발생하는 사망 및 합병증의 예측인자 평가에 관한 전향적 연구)

  • Park, Jeong-Woong;Suh, Gee-Young;Kim, Ho-Cheol;Cheon, Eun-Mee;Chung, Man-Pyo;Kim, Ho-Joong;Kwon, O-Jung;Kim, Kwan-Min;Kim, Jin-Kook;Shim, Young-Mok;Rhee, Chong-H.;Han, Yong-Chol
    • Tuberculosis and Respiratory Diseases / v.45 no.1 / pp.57-67 / 1998
  • Purpose: This study was undertaken to determine the preoperative predictors of mortality and morbidity after lung cancer resection. Method: From October 1, 1995 to August 31, 1996, a prospective study was conducted in 92 lung resection candidates diagnosed with lung cancer. As nonpulmonary preoperative predictors, we considered age, sex, weight loss, hematocrit, serum albumin, EKG, and concomitant illness; as pulmonary predictors, smoking history, presence of pneumonia, dyspnea scale (1 to 4), arterial blood gas analysis while breathing room air, and routine pulmonary function tests. Predicted postoperative (ppo) pulmonary factors such as ppo-$FEV_1$, ppo-diffusing capacity (DLco), the predicted postoperative product (PPP) of ppo-$FEV_1$% $\times$ ppo-DLco%, and ppo-maximal $O_2$ uptake ($VO_2$max) were also considered. Results: There were 78 men and 14 women with a median age of 62 years (range 42 to 82) and a mean $FEV_1$ of $2.37{\pm}0.06$ L. Twenty-nine patients had a decreased $FEV_1$ of less than 2.0 L. Pneumonectomy was performed in 26 patients, bilobectomy in 12, and lobectomy in 54. Pulmonary complications developed in 10 patients, cardiac complications in 9, and other complications (empyema, air leak, bleeding) in 11, and 16 patients were managed in the intensive care unit for more than 48 hours. Three patients died within 30 days after operation. The ppo-$VO_2$max was less than 10 ml/kg/min in these three patients, but its statistical significance could not be determined because of the small number of patients. In multivariate analysis, the predictor related to postoperative death was weight loss (p<0.05), and the predictors of pulmonary complications were weight loss, dyspnea scale, ppo-DLco, and extent of resection (p<0.05).
Conclusions: Based on this study, preoperative nonpulmonary factors such as weight loss and dyspnea scale are more important than the pulmonary factors in predicting postoperative mortality and/or morbidity in lung resection candidates, but the exercise pulmonary function test may be useful. Our study suggests that a ppo-$VO_2$max value of less than 10 ml/kg/min is associated with death after lung cancer resection, but further studies are needed to validate this result.

Requirement and Perception of Parents on the Subject of Home Economics in Middle School (중학교 가정교과에 대한 학부모의 인식 및 요구도)

  • Shin Hyo-Shick;Park Mi-Soog
    • Journal of Korean Home Economics Education Association / v.18 no.3 s.41 / pp.1-22 / 2006
  • The purpose of this study is to find a desirable direction for home economics education by examining the requirements and perceptions of parents of high school students who have completed the home economics course. About 600 parents were surveyed in Seoul-Pusan, Ganwon, Ghynggi province, Choongcheong-Gyungsang province, and Cheonla and Jeju province; of the 600 responses, 560 were judged suitable for analysis. The questionnaire included 61 items on requirements for home economics, one item on the content that should be retained whenever possible, one item on the perception of the aims of home economics, and 11 items on the perception of home economics courses and their management. The data were analyzed for frequency, percentage, mean, and standard deviation, and with t-tests, using the SAS program. The results are summarized as follows. 1. Perception was higher for the view that boys and girls should learn together about the idea of a healthy life, desirable character formation, and knowledge. 2. Among the teaching purposes of home economics, the item on scientific principles and knowledge for improving home life scored 15.7% below the average value. 3. As for the character of home economics, it was perceived as closely related to real life; as for the system, class periods and content in the home economics field were perceived as lacking; and as for instructional content, practice and application were rated higher regardless of sex. 4. The most important area to emphasize in the subject of home economics is the family area. 5. Among the first-grade requirements by middle unit, 'growth and development' was highest at 4.11, whereas 'basic principle and actuality' was lowest at 3.70. 6. For the second-grade home economics content, the middle unit 'young man and consuming life' was highest at 4.09.
7. For the third-grade home economics content, the middle unit 'choice of coming direction and job ethics' was highest at 4.16, and 'preparing meals and evaluation' was lowest at 3.50.

Application of Support Vector Regression for Improving the Performance of the Emotion Prediction Model (감정예측모형의 성과개선을 위한 Support Vector Regression 응용)

  • Kim, Seongjin;Ryoo, Eunchung;Jung, Min Kyu;Kim, Jae Kyeong;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.18 no.3 / pp.185-202 / 2012
  • Since the value of information has been recognized in the information society, the use and collection of information have become important. A facial expression, like an artistic painting, contains a wealth of information and could be described in thousands of words. Following this idea, there have recently been a number of attempts to provide customers and companies with intelligent services that perceive human emotions through facial expressions. For example, the MIT Media Lab, the leading organization in this research area, has developed a human emotion prediction model and has applied its studies to commercial business. In academia, conventional methods such as Multiple Regression Analysis (MRA) and Artificial Neural Networks (ANN) have been applied to predict human emotion in prior studies. However, MRA is generally criticized for its low prediction accuracy. This is inevitable, since MRA can only explain a linear relationship between the dependent variable and the independent variables. To mitigate the limitations of MRA, some studies such as Jung and Kim (2012) have used ANN as an alternative, and they reported that ANN generated more accurate predictions than statistical methods like MRA. However, ANN has also been criticized for overfitting and for the difficulty of network design (e.g., setting the number of layers and the number of nodes in the hidden layers). Against this background, we propose a novel model using Support Vector Regression (SVR) in order to increase prediction accuracy. SVR is an extension of the Support Vector Machine (SVM) designed to solve regression problems. The model produced by SVR depends only on a subset of the training data, because the cost function for building the model ignores any training data that is close (within a threshold ${\varepsilon}$) to the model prediction.
Using SVR, we tried to build a model that can measure the level of arousal and valence from facial features. To validate the usefulness of the proposed model, we collected data on facial reactions to appropriate visual stimuli and extracted features from the data. Next, preprocessing steps were taken to choose statistically significant variables. In total, 297 cases were used for the experiment. As comparative models, we also applied MRA and ANN to the same data set. For SVR, we adopted the '${\varepsilon}$-insensitive loss function' and the 'grid search' technique to find the optimal values of parameters such as C, d, ${\sigma}^2$, and ${\varepsilon}$. For ANN, we adopted a standard three-layer backpropagation network, which has a single hidden layer. The learning rate and momentum rate of ANN were set to 10%, and we used the sigmoid function as the transfer function of the hidden and output nodes. We repeated the experiments while varying the number of nodes in the hidden layer to n/2, n, 3n/2, and 2n, where n is the number of input variables. The stopping condition for ANN was set to 50,000 learning events. We used MAE (Mean Absolute Error) as the measure for performance comparison. From the experiment, we found that SVR achieved the highest prediction accuracy on the hold-out data set compared to MRA and ANN. Regardless of the target variable (the level of arousal or the level of positive/negative valence), SVR showed the best performance on the hold-out data set. ANN also outperformed MRA; however, it showed considerably lower prediction accuracy than SVR for both target variables. The findings of our research are expected to be useful to researchers or practitioners who want to build models for recognizing human emotions.
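
The ${\varepsilon}$-insensitive loss and the MAE measure mentioned in this abstract can be written down in a few lines. The following is a minimal Python sketch of the two quantities only, not the authors' model: it does not train an SVR, and the function names and sample numbers are hypothetical. It shows why predictions that fall within ${\varepsilon}$ of the target contribute nothing to the cost, so that only the remaining cases shape the fitted model.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    # Errors no larger than eps cost nothing; larger errors cost |error| - eps.
    return [max(0.0, abs(t - p) - eps) for t, p in zip(y_true, y_pred)]


def mean_absolute_error(y_true, y_pred):
    # MAE: average absolute deviation between targets and predictions.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical arousal scores and predictions, for illustration only.
targets = [1.0, 2.0, 3.0]
predictions = [1.05, 2.5, 2.0]
losses = eps_insensitive_loss(targets, predictions, eps=0.1)

# Cases with non-zero loss lie outside the eps-tube around the prediction.
outside_tube = [i for i, loss in enumerate(losses) if loss > 0]
```

In this toy data the first case lies inside the tube and is ignored by the loss, while MAE still counts its small error; that difference is why MAE is a neutral yardstick for comparing SVR with MRA and ANN.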

Resolving the 'Gray sheep' Problem Using Social Network Analysis (SNA) in Collaborative Filtering (CF) Recommender Systems (소셜 네트워크 분석 기법을 활용한 협업필터링의 특이취향 사용자(Gray Sheep) 문제 해결)

  • Kim, Minsung;Im, Il
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.137-148 / 2014
  • Recommender systems have become one of the most important technologies in e-commerce. For many consumers, the ultimate reason to shop online is to reduce the effort of information search and purchase, and recommender systems are a key technology for serving this need. Many past studies on recommender systems have been devoted to developing and improving recommendation algorithms, and collaborative filtering (CF) is known to be the most successful one. Despite its success, however, CF has several shortcomings, such as the cold-start, sparsity, and gray sheep problems. To generate recommendations, ordinary CF algorithms require evaluations or preference information directly from users. For new users who have no evaluations or preference information, therefore, CF cannot produce recommendations (cold-start problem). As the numbers of products and customers increase, the scale of the data increases exponentially and most of the data cells are empty. This sparse dataset makes computing recommendations extremely hard (sparsity problem). Since CF is based on the assumption that there are groups of users sharing common preferences or tastes, CF becomes inaccurate if there are many users with rare and unique tastes (gray sheep problem). This study proposes a new algorithm that uses Social Network Analysis (SNA) techniques to resolve the gray sheep problem. We use 'degree centrality' in SNA to identify users with unique preferences (gray sheep). Degree centrality in SNA refers to the number of direct links to and from a node. In a network of users who are connected through common preferences or tastes, those with unique tastes have fewer links to other users (nodes) and are isolated from them. Therefore, gray sheep can be identified by calculating the degree centrality of each node. We divide the dataset into two, gray sheep and others, based on the degree centrality of the users.
Then, different similarity measures and recommendation methods are applied to these two datasets. The detailed algorithm is as follows. Step 1: Convert the initial data, which is a two-mode network (user to item), into a one-mode network (user to user). Step 2: Calculate the degree centrality of each node and separate the nodes whose degree centrality is lower than a pre-set threshold. The threshold value is determined by simulations such that the accuracy of CF for the remaining dataset is maximized. Step 3: An ordinary CF algorithm is applied to the remaining dataset. Step 4: Since the separated dataset consists of users with unique tastes, an ordinary CF algorithm cannot generate recommendations for them. A 'popular item' method is used to generate recommendations for these users. The F measures of the two datasets are weighted by the numbers of nodes and summed to be used as the final performance metric. To test the performance improvement from this new algorithm, an empirical study was conducted using a publicly available dataset, the MovieLens data from the GroupLens research team. We used 100,000 evaluations by 943 users of 1,682 movies. The proposed algorithm was compared with an ordinary CF algorithm using the 'Best-N-neighbors' and 'Cosine' similarity methods. The empirical results show that the F measure improved by about 11% on average when the proposed algorithm was used. Past studies to improve CF performance typically used additional information other than users' evaluations, such as demographic data. Some studies applied SNA techniques as a new similarity metric. This study is novel in that it used SNA to separate the dataset. It shows that the performance of CF can be improved, without any additional information, when SNA techniques are used as proposed. This study has several theoretical and practical implications. It empirically shows that the characteristics of a dataset can affect the performance of CF recommender systems. This helps researchers understand the factors affecting the performance of CF. This study also opens a door for future studies that apply SNA to CF to analyze the characteristics of datasets. In practice, this study provides guidelines for improving the performance of CF recommender systems with a simple modification.
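
Steps 1, 2, and 4 of the algorithm in this abstract can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes two users are linked when they share at least `min_common` items, it normalizes degree by the number of other users, and it takes the threshold as a given number rather than tuning it by simulation. All function and variable names are hypothetical, and Step 3 (ordinary CF on the remaining users) is omitted.

```python
from collections import Counter
from itertools import combinations


def user_network(likes, min_common=1):
    # Step 1: project the user-item (two-mode) data onto a user-user network.
    # Assumption: an edge exists when two users share at least min_common items.
    edges = set()
    for u, v in combinations(sorted(likes), 2):
        if len(likes[u] & likes[v]) >= min_common:
            edges.add((u, v))
    return edges


def degree_centrality(users, edges):
    # Number of direct links per user, normalized by the number of other users.
    degree = {u: 0 for u in users}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    others = len(degree) - 1
    return {u: d / others for u, d in degree.items()}


def split_gray_sheep(likes, threshold, min_common=1):
    # Step 2: users whose centrality falls below the threshold are gray sheep.
    centrality = degree_centrality(likes, user_network(likes, min_common))
    gray = {u for u, c in centrality.items() if c < threshold}
    return gray, set(likes) - gray


def popular_items(likes, exclude, k):
    # Step 4: recommend the most frequently chosen items the user lacks.
    counts = Counter(item for items in likes.values() for item in items)
    return [item for item, _ in counts.most_common() if item not in exclude][:k]


# Hypothetical toy data: user -> set of liked items.
example_likes = {
    "u1": {"A", "B"},
    "u2": {"A", "B", "C"},
    "u3": {"B", "C"},
    "u4": {"Z"},
}
gray_users, other_users = split_gray_sheep(example_likes, threshold=0.5)
```

In the toy data, `u4` shares no item with anyone, so its centrality is zero and it is routed to the popular-item method, while the other three users stay in the dataset that ordinary CF would handle.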

The Effects of Environmental Dynamism on Supply Chain Commitment in the High-tech Industry: The Roles of Flexibility and Dependence (첨단산업의 환경동태성이 공급체인의 결속에 미치는 영향: 유연성과 의존성의 역할)

  • Kim, Sang-Deok;Ji, Seong-Goo
    • Journal of Global Scholars of Marketing Science / v.17 no.2 / pp.31-54 / 2007
  • The exchange between buyers and sellers in the industrial market is changing from short-term to long-term relationships. Long-term relationships are governed mainly by formal contracts or informal agreements, but many scholars now assert that controlling a relationship through formal contracts is inappropriate under environmental dynamism. In this case, partners will depend on each other's flexibility or interdependence. The former, flexibility, provides a general frame of reference, order, and standards against which to guide and assess appropriate behavior in dynamic and ambiguous situations, thus motivating the value-oriented performance goals shared between partners. It is based on social sacrifices, which can potentially minimize opportunistic behaviors. The latter, interdependence, means that each firm has a high level of dependence in a dynamic channel relationship. When interdependence is high in magnitude and symmetric, each firm enjoys a high level of power and the bonds between the firms should be reasonably strong. Strong shared power is likely to promote commitment because of the common interests, attention, and support found in such channel relationships. This study deals with environmental dynamism in the high-tech industry. Firms in the high-tech industry regard coping successfully with environmental changes as a key success factor. However, because few studies deal with environmental dynamism and supply chain commitment in the high-tech industry, it is very difficult to find effective strategies to cope with them. This paper presents the results of an empirical study on the relationship between environmental dynamism and supply chain commitment in the high-tech industry. We examined the effects of consumer, competitor, and technological dynamism on supply chain commitment. Additionally, we examined the moderating effects of flexibility and dependence of supply chains.
This study was confined to the type of high-tech industry characterized by rapid technological change and short product lifecycles. Flexibility among the firms of this industry, which is characterized by fast growth, is more important than in any other industry. Thus, a variety of environmental dynamism can affect a supply chain relationship. The targeted industries were the electronic parts, metal product, computer, electric machine, automobile, and medical precision manufacturing industries. Data were collected as follows. During the survey, the researchers obtained the list of parts suppliers of two companies, N and L, which are internationally competitive in the mobile phone manufacturing industry, and of the suppliers in a business relationship with S company, a semiconductor manufacturer. They were asked to respond to the survey via telephone and e-mail. During the two-month period of February-April 2006, we were able to collect data from 44 companies. The respondents were restricted to persons with direct dealing authority and staff of the subcontractor (the supplier) with at least three months of dealing experience with a manufacturer (an industrial material buyer). The measurement validation procedures covered scale reliability and discriminant and convergent validity. The reliability measurements traditionally employed, such as Cronbach's alpha, were also used. All the reliabilities were greater than .70. A series of exploratory factor analyses was conducted. We conducted confirmatory factor analyses to assess the validity of our measurements. A series of chi-square difference tests was conducted to ensure discriminant validity. For each pair, we estimated two models, an unconstrained model and a constrained model, and compared the two model fits. All these tests supported discriminant validity.
Also, all items loaded significantly on their respective constructs, providing support for convergent validity. We then examined composite reliability and average variance extracted (AVE). The composite reliability of each construct was greater than .70. The AVE of each construct was greater than .50. According to the multiple regression analysis, customer dynamism had a negative effect and competitor dynamism had a positive effect on a supplier's commitment. In addition, flexibility and dependence had significant moderating effects on customer and competitor dynamism. On the other hand, none of the hypotheses about technological dynamism showed significant effects on commitment. In other words, technological dynamism had no direct effect on the supplier's commitment and was not moderated by the flexibility and dependence of the supply chain. This study contributes as one of the few studies on environmental dynamism and supply chain commitment in the high-tech industry. In particular, it verified the effects of three sectors of environmental dynamism on the supplier's commitment. It also empirically tested how the effects were moderated by flexibility and dependence. The results showed that flexibility and interdependence strengthen the supplier's commitment under environmental dynamism in the high-tech industry. Thus, relationship managers in the high-tech industry should make supply chain relationships flexible and interdependent. The limitations of the study are as follows. First, regarding the research setting, the study was conducted in the high-tech industry, in which the direction of change in the power balance of supply chain dyads is usually determined by manufacturers, so the results are difficult to generalize. We need to control for the power structure between partners in a future study. Second, regarding flexibility, we treated it throughout the paper as positive, but it can also be negative, e.g.,
violating an agreement or moving in the wrong direction. Therefore, we need to investigate the multi-dimensionality of flexibility in future research.
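
The reliability and validity statistics named in this abstract (Cronbach's alpha, composite reliability, and AVE) follow standard textbook formulas, which can be sketched in Python as below. The abstract does not give the item scores or factor loadings, so the loadings here are hypothetical; the formulas assume standardized loadings, with each item's error variance taken as one minus its squared loading.

```python
from statistics import variance


def cronbach_alpha(items):
    # items: one list of scores per questionnaire item, respondents in the same order.
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


def composite_reliability(loadings):
    # (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)
    squared_sum = sum(loadings) ** 2
    error = sum(1 - l * l for l in loadings)
    return squared_sum / (squared_sum + error)


def average_variance_extracted(loadings):
    # Mean of the squared standardized loadings.
    return sum(l * l for l in loadings) / len(loadings)


# Hypothetical standardized loadings for one construct.
example_loadings = [0.8, 0.7, 0.9]
cr = composite_reliability(example_loadings)
ave = average_variance_extracted(example_loadings)

# The cutoffs reported in the abstract: reliability above .70, AVE above .50.
passes = cr > 0.70 and ave > 0.50
```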

Prognostic Value of TNM Staging in Small Cell Lung Cancer (소세포폐암의 TNM 병기에 따른 예후)

  • Park, Jae-Yong;Kim, Kwan-Young;Chae, Sang-Cheol;Kim, Jeong-Seok;Kim, Kwon-Yeop;Park, Ki-Su;Cha, Seung-Ik;Kim, Chang-Ho;Kam, Sin;Jung, Tae-Hoon
    • Tuberculosis and Respiratory Diseases / v.45 no.2 / pp.322-332 / 1998
  • Background: Accurate staging is important for determining treatment modalities and predicting prognosis in patients with lung cancer. The simple two-stage system of the Veteran's Administration Lung Cancer Study Group has been used for staging small cell lung cancer (SCLC) because treatment usually consists of chemotherapy with or without radiotherapy. However, this system does not accurately segregate patients into homogeneous prognostic groups. Therefore, a variety of new staging systems have been proposed as more intensive treatments, including either intensive radiotherapy or surgery, enter clinical trials. We evaluated the prognostic importance of TNM staging, which has the advantage of providing a uniform, detailed classification of tumor spread, in patients with SCLC. Methods: The medical records of 166 patients diagnosed with SCLC between January 1989 and December 1996 were reviewed retrospectively. The influence of TNM stage on survival was analyzed in the 147 of these 166 patients who had complete TNM staging data. Results: Three patients were classified in stage I/II, 15 in stage IIIa, 78 in stage IIIb, and 48 in stage IV. Survival rates at 1 and 2 years for these patients were as follows: stage I/II, 75% and 37.5%; stage IIIa, 46.7% and 25.0%; stage IIIb, 34.3% and 11.3%; and stage IV, 2.6% and 0%. The 2-year survival rates for the 84 patients who received chemotherapy (more than 2 cycles) with or without radiotherapy were as follows: stage I/II, 37.5%; stage IIIa, 31.3%; stage IIIb, 13.5%; and stage IV, 0%. Overall outcome differed significantly according to TNM stage whether or not patients received treatment. However, there was no significant difference between stage IIIa and stage IIIb, although median survival and the 2-year survival rate were higher in stage IIIa than in stage IIIb. Conclusion: These results suggest that the TNM staging system may be helpful for predicting the prognosis of patients with SCLC.

Prognostic Value of the Expression of p53 and bcl-2 in Non-Small Cell Lung Cancer (비소세포폐암에서 p53과 bcl-2의 발현이 예후에 미치는 영향)

  • Yang, Seok-Chul;Yoon, Ho-Joo;Shin, Dong-Ho;Park, Sung-Soo;Lee, Jung-Hee;Keum, Joo-Seob;Kong, Gu;Lee, Jung-Dal
    • Tuberculosis and Respiratory Diseases / v.45 no.5 / pp.962-974 / 1998
  • Background: Alteration of the p53 tumor suppressor gene is the most frequently identified genetic change in human neoplasms, including lung carcinoma. It is well known that the bcl-2 oncoprotein protects cells from apoptosis. Recent studies have demonstrated that bcl-2 expression is associated with a favorable prognosis for patients with non-small cell lung carcinoma. However, the precise biologic role of bcl-2 in the development of these tumors is still obscure. p53 and bcl-2 have an important regulatory influence on the apoptotic pathway, and thus their relationship is of interest in tumorigenesis, especially in lung cancer. Purpose: We investigated the prognostic significance of the expression of p53 and bcl-2 in radically resected non-small cell lung cancer. Method: Eighty-four cases of formalin-fixed, paraffin-embedded blocks from primary non-small cell lung cancer resected from 1980 to 1994 at Hanyang University Hospital were available for both clinical follow-up and immunohistochemical staining using monoclonal antibodies for p53 and bcl-2. Results: The histologic classification of the tumors was based on WHO criteria, and the specimens included 45 squamous cell carcinomas (53.6%), 28 adenocarcinomas (33.3%), and 11 large cell carcinomas (13.1%). p53 immunoreactivity was noted in 47 of 84 cases (56.0%). bcl-2 immunoreactivity was noted in 15 of 84 cases (17.9%). The mean survival duration was $64.23{\pm}10.73$ months in the bcl-2 positive group and $35.28{\pm}4.39$ months in the bcl-2 negative group. bcl-2 expression was significantly correlated with survival in radically resected non-small cell lung cancer patients (p=0.03). The mean survival duration was $34.71{\pm}6.12$ months in the p53 positive group and $45.35{\pm}6.30$ months in the p53 negative group (p=0.21). p53 expression was not predictive of survival. There was no correlation between the different combinations of p53 and bcl-2 expression status in our study.
Conclusions: The interaction and regulation of new biologic markers, such as those involved in the apoptotic pathway, are complex. bcl-2 overexpression is a good prognostic factor in non-small cell lung cancer, and p53 expression is not significantly associated with prognosis in non-small cell lung cancer.

Content-based Recommendation Based on Social Network for Personalized News Services (개인화된 뉴스 서비스를 위한 소셜 네트워크 기반의 콘텐츠 추천기법)

  • Hong, Myung-Duk;Oh, Kyeong-Jin;Ga, Myung-Hyun;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems / v.19 no.3 / pp.57-71 / 2013
    • Over a billion people in the world generate new news minute by minute. People forecasts some news but most news are from unexpected events such as natural disasters, accidents, crimes. People spend much time to watch a huge amount of news delivered from many media because they want to understand what is happening now, to predict what might happen in the near future, and to share and discuss on the news. People make better daily decisions through watching and obtaining useful information from news they saw. However, it is difficult that people choose news suitable to them and obtain useful information from the news because there are so many news media such as portal sites, broadcasters, and most news articles consist of gossipy news and breaking news. User interest changes over time and many people have no interest in outdated news. From this fact, applying users' recent interest to personalized news service is also required in news service. It means that personalized news service should dynamically manage user profiles. In this paper, a content-based news recommendation system is proposed to provide the personalized news service. For a personalized service, user's personal information is requisitely required. Social network service is used to extract user information for personalization service. The proposed system constructs dynamic user profile based on recent user information of Facebook, which is one of social network services. User information contains personal information, recent articles, and Facebook Page information. Facebook Pages are used for businesses, organizations and brands to share their contents and connect with people. Facebook users can add Facebook Page to specify their interest in the Page. The proposed system uses this Page information to create user profile, and to match user preferences to news topics. 
However, some Pages are not directly matched to news topic because Page deals with individual objects and do not provide topic information suitable to news. Freebase, which is a large collaborative database of well-known people, places, things, is used to match Page to news topic by using hierarchy information of its objects. By using recent Page information and articles of Facebook users, the proposed systems can own dynamic user profile. The generated user profile is used to measure user preferences on news. To generate news profile, news category predefined by news media is used and keywords of news articles are extracted after analysis of news contents including title, category, and scripts. TF-IDF technique, which reflects how important a word is to a document in a corpus, is used to identify keywords of each news article. For user profile and news profile, same format is used to efficiently measure similarity between user preferences and news. The proposed system calculates all similarity values between user profiles and news profiles. Existing methods of similarity calculation in vector space model do not cover synonym, hypernym and hyponym because they only handle given words in vector space model. The proposed system applies WordNet to similarity calculation to overcome the limitation. Top-N news articles, which have high similarity value for a target user, are recommended to the user. To evaluate the proposed news recommendation system, user profiles are generated using Facebook account with participants consent, and we implement a Web crawler to extract news information from PBS, which is non-profit public broadcasting television network in the United States, and construct news profiles. We compare the performance of the proposed method with that of benchmark algorithms. One is a traditional method based on TF-IDF. Another is 6Sub-Vectors method that divides the points to get keywords into six parts. 
Experimental results demonstrate that the proposed system provides useful news to users by applying users' social network information and WordNet functions, as measured by the prediction error of the recommended news.
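
The profile matching described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that user and news profiles are both sparse term-weight dictionaries, uses plain TF-IDF with cosine similarity, and leaves out the WordNet expansion and Freebase matching entirely. The article names, tokens, and user weights are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Build a sparse TF-IDF vector {term: weight} for each tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = {}
    for name, tokens in docs.items():
        term_freq = Counter(tokens)
        vectors[name] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_n(user_profile, news_vectors, n):
    """Return the names of the n news profiles most similar to the user profile."""
    scored = [(cosine(user_profile, vec), name) for name, vec in news_vectors.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, ties by name
    return [name for _, name in scored[:n]]


# Hypothetical data: three tokenized articles and one user profile.
news_docs = {
    "a1": ["election", "vote", "senate"],
    "a2": ["storm", "flood", "rescue"],
    "a3": ["election", "debate", "poll"],
}
news_vectors = tf_idf_vectors(news_docs)
user_profile = {"election": 1.0, "debate": 0.5}
recommended = top_n(user_profile, news_vectors, 2)
```

Because both profiles use the same dictionary format, a single similarity function serves every user-article pair, which is the efficiency point the abstract makes about sharing one profile format.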

    Development of the Accident Prediction Model for Enlisted Men through an Integrated Approach to Datamining and Textmining (데이터 마이닝과 텍스트 마이닝의 통합적 접근을 통한 병사 사고예측 모델 개발)

    • Yoon, Seungjin;Kim, Suhwan;Shin, Kyungshik
      • Journal of Intelligence and Information Systems
      • /
      • v.21 no.3
      • /
      • pp.1-17
      • /
      • 2015
    • In this paper, we report our observations on a prediction model for the military based on enlisted men's internal data (cumulative records) and external data (SNS data). This work is significant for the military's efforts to supervise enlisted men. In spite of their efforts, many commanders have failed to prevent accidents by their subordinates. One of an officer's important duties is to take care of subordinates in order to prevent unexpected accidents. However, accidents are hard to prevent, so a proper method must be found. Our motivation for presenting this paper is to make it possible to predict accidents using enlisted men's internal and external data. The biggest issue facing the military is the occurrence of accidents by enlisted men related to maladjustment and the relaxation of military discipline. The core method of preventing accidents by soldiers is to identify problems and manage them quickly. Commanders predict accidents by interviewing their soldiers and observing their surroundings. This requires considerable time and effort, and the results differ significantly depending on the capabilities of the commanders. In this paper, we seek to predict accidents with objective data that can easily be obtained. Recently, records of enlisted men, as well as SNS communication between commanders and soldiers, have made it possible to predict and prevent accidents. This paper concerns the application of data mining to identify enlisted men's interests and predict accidents, making use of internal and external (SNS) data. We propose a method that combines topic analysis with a decision tree. The study is conducted in two steps. First, topic analysis is conducted on the SNS data of enlisted men. Second, the decision tree method is used to analyze the internal data together with the results of the first analysis. The dependent variable for these analyses is the presence of any accidents. 
In order to analyze their SNS data, we require tools such as text mining and topic analysis. We used SAS Enterprise Miner 12.1, which provides a text miner module. Our approach for finding their interests is composed of three main phases: collection, topic analysis, and conversion of the topic analysis results into points for use as independent variables. In the first phase, we collect enlisted men's SNS data by commander's ID. After the unstructured SNS data are gathered, the topic analysis phase extracts issues from them. For simplicity, 5 topics (vacation, friends, stress, training, and sports) are extracted from 20,000 articles. In the third phase, using these 5 topics, we quantify them as personal points. After quantifying the topics, we add these results to the independent variables, which are composed of 15 internal data sets. Then, we build two decision trees. The first tree is composed of the internal data only. The second tree is composed of the external data (SNS) as well as the internal data. After that, we compare the misclassification rates reported by SAS E-miner. The first model's misclassification rate is 12.1%, whereas the second model's is 7.8%, so the second model predicts accidents with an accuracy of approximately 92%. The gap between the two models is 4.3 percentage points. Finally, we use the McNemar test to check whether the difference between them is meaningful. The test result indicates that the difference is significant (p-value: 0.0003). This study has two limitations. First, the results of the experiments cannot be generalized, mainly because the experiment is limited to a small number of enlisted men's data. Second, various independent variables in the decision tree model are treated as categorical variables instead of continuous variables, which causes a loss of information. In spite of extensive efforts to provide prediction models for the military, commanders' predictions are accurate only when they have sufficient data about their subordinates. 
Our proposed methodology can support decision-making in the military. This study is expected to contribute to the prevention of accidents in the military through scientific analysis of enlisted men and proper management of them.
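
The model comparison in this abstract relies on the McNemar test, which looks only at the cases where the two classifiers disagree. The Python sketch below shows one way such a comparison could be computed. It assumes the exact (binomial) two-sided form of the test, since the abstract does not say which variant was run in SAS, and the labels and predictions are hypothetical, not data from the study.

```python
from math import comb


def discordant_counts(y_true, pred_a, pred_b):
    """Count cases where exactly one of the two models is correct.

    Returns (b, c): b = model A right and model B wrong,
                    c = model A wrong and model B right.
    """
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts b and c.

    Under the null hypothesis, each discordant case favors either model
    with probability 0.5, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree, so there is no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical example: 1 = accident, 0 = no accident.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
tree_internal = [0, 0, 0, 1, 0, 1, 0, 0, 0, 1]  # tree using internal data only
tree_with_sns = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]  # tree using internal + SNS data
b, c = discordant_counts(y_true, tree_internal, tree_with_sns)
p_value = mcnemar_exact(b, c)
```

Cases that both trees classify correctly, or both misclassify, do not affect the p-value, which is why the test suits a comparison of two models evaluated on the same soldiers.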

