• Title/Summary/Keyword: Learning Pattern


Query-based Answer Extraction using Korean Dependency Parsing (의존 구문 분석을 이용한 질의 기반 정답 추출)

  • Lee, Dokyoung;Kim, Mintae;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.161-177 / 2019
  • In this paper, we study how sentence dependency parsing results can improve answer extraction in a Question-Answering (QA) system. A QA system consists of query analysis, which analyzes the user's query, and answer extraction, which extracts appropriate answers from the document, and various studies have been conducted on both. To improve answer extraction, the grammatical information of sentences must be reflected accurately. Because Korean has free word order and frequently omits sentence components, dependency parsing is well suited to analyzing Korean syntax. In this study, we therefore improved answer extraction by adding features derived from dependency parsing to the inputs of the answer extraction model (Bidirectional LSTM-CRF). We compared the model's performance when given only basic word features generated without dependency parsing against its performance when the Eojeol tag feature and the dependency graph embedding feature were added. Because dependency parsing operates on the Eojeol, the basic sentence unit delimited by spaces, the tag of each Eojeol is obtained as part of the parsing result; the Eojeol tag feature is this tag information. The dependency graph embedding is generated in two steps: building a dependency graph from the parsing result and learning an embedding of that graph.
From the dependency parsing result, a graph is built in which each Eojeol is a node, each dependency between Eojeols is an edge, and each Eojeol tag is a node label. The graph is built as undirected or directed depending on whether the direction of the dependency relation is considered. To embed the graph, we used Graph2Vec, which learns a graph's embedding from the subgraphs that make it up. The maximum path length between nodes can be specified when extracting these subgraphs. With a maximum path length of 1, the embedding reflects only direct dependencies between Eojeols; larger values also bring in indirect dependencies. In the experiment, the maximum path length was varied from 1 to 3 for both undirected and directed graphs, and answer extraction performance was measured for each setting. The results show that both the Eojeol tag feature and the dependency graph embedding feature improve answer extraction. In particular, the highest performance was obtained when the model's input was the embedding of a directed dependency graph built with a maximum path length of 1 in Graph2Vec's subgraph extraction. From these experiments, we concluded that it is better to take the direction of dependency into account and to consider only direct connections between words rather than indirect dependencies. The significance of this study is as follows. First, we improved answer extraction by adding features based on dependency parsing results, taking into account the characteristics of Korean: free word order and frequent omission of sentence components.
Second, we generated features from the dependency parsing results with a learning-based graph embedding method, without manually defining dependency patterns between Eojeols. Future research directions are as follows. In this study, the features derived from dependency parsing were applied only to the answer extraction model as a means of capturing sentence meaning. If their effect is confirmed in other natural language processing models, such as sentiment analysis or named entity recognition, their validity can be verified more rigorously.
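The graph construction and subgraph extraction described above can be sketched as follows. This is a minimal illustration, not the authors' code: the Eojeol tags and head indices are hypothetical, and we assume that Graph2Vec's "maximum path length between nodes" corresponds to the number of Weisfeiler-Lehman relabeling iterations used to extract rooted subgraphs. In the directed case each neighbor label is prefixed with in/out, so dependency direction changes the subgraph tokens. The final step, training a doc2vec-style model on these tokens to obtain the graph embedding, is omitted.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical parse format: one (eojeol_tag, head_index) pair per Eojeol; head_index -1 marks the root.
Parse = List[Tuple[str, int]]
Adjacency = Dict[int, List[Tuple[str, int]]]


def build_dependency_graph(parse: Parse, directed: bool) -> Tuple[Dict[int, str], Adjacency]:
    """Nodes are Eojeols, edges are dependencies, node labels are Eojeol tags."""
    labels = {i: tag for i, (tag, _) in enumerate(parse)}
    adj: Adjacency = {i: [] for i in labels}
    for dep, (_, head) in enumerate(parse):
        if head < 0:
            continue  # the root Eojeol has no head
        if directed:
            adj[dep].append(("out", head))  # dependent points to its head
            adj[head].append(("in", dep))   # head receives an incoming dependency
        else:
            adj[dep].append(("", head))     # direction ignored
            adj[head].append(("", dep))
    return labels, adj


def rooted_subgraph_tokens(labels: Dict[int, str], adj: Adjacency, max_depth: int) -> List[str]:
    """WL-style rooted subgraph labels for depths 0..max_depth (Graph2Vec's 'words')."""
    tokens = list(labels.values())
    current = dict(labels)
    for _ in range(max_depth):
        relabeled = {}
        for node, neighbors in adj.items():
            # Sort neighbor labels so the result does not depend on edge order.
            neighbor_labels = sorted(f"{d}:{current[n]}" for d, n in neighbors)
            relabeled[node] = f"{current[node]}({','.join(neighbor_labels)})"
        current = relabeled
        tokens.extend(current.values())
    return tokens


if __name__ == "__main__":
    parse = [("NP_SBJ", 2), ("NP_OBJ", 2), ("VP", -1)]  # hypothetical three-Eojeol sentence
    labels, adj = build_dependency_graph(parse, directed=True)
    print(Counter(rooted_subgraph_tokens(labels, adj, max_depth=1)))
```

With max_depth=1 only direct dependencies enter the tokens, which mirrors the setting the abstract reports as best; raising it adds tokens that encode indirect dependencies.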

Measuring the Public Service Quality Using Process Mining: Focusing on N City's Building Licensing Complaint Service (프로세스 마이닝을 이용한 공공서비스의 품질 측정: N시의 건축 인허가 민원 서비스를 중심으로)

  • Lee, Jung Seung
    • Journal of Intelligence and Information Systems / v.25 no.4 / pp.35-52 / 2019
  • As public services are provided in various forms, including e-government, public demand for high-quality public services is increasing. Improving public service quality requires continuous measurement and improvement, but traditional surveys are costly, time-consuming, and limited. There is therefore a need for an analytical technique that can measure public service quality quickly and accurately at any time, based on the data that public services generate. In this study, we used process mining to analyze public service quality, taking the building licensing complaint service of N city as the case. This service was chosen because the data needed for analysis could be secured and because the approach to public service quality management could be extended to other institutions. We applied process mining to a total of 3,678 building license complaints filed in N city over two years starting in January 2014, and identified process maps as well as departments with high frequency and long processing times. The analysis showed that, at certain points in time, some departments were crowded with cases while others had relatively few. There were also reasonable grounds to suspect that an increase in the number of complaints increases the time required to complete them. The time required to complete a complaint ranged from the same day to one year and 146 days. The top four departments (the Sewage Treatment Division, the Waterworks Division, the Urban Design Division, and the Green Growth Division) accounted for more than 50% of the cumulative frequency, and the top nine departments for more than 70%. The workload was thus concentrated in a limited number of departments and highly unbalanced among them. Most complaints followed a variety of different process patterns.
The analysis shows that the number of 'complement' decisions (requests for the complainant to supplement documents) has the greatest impact on the length of a complaint. This is interpreted as follows: a 'complement' decision requires a physical period during which the complainant supplements the documents and submits them again, so the entire complaint takes longer to complete. To address this, the overall processing time can be drastically reduced through thorough preparation before or during the filing of complaints, for example by referring to the 'complement' decisions issued on other complaints. Clarifying and disclosing the causes of these decisions, one of the important data items in the system, together with their solutions helps complainants prepare in advance and assures them that documents prepared from the disclosed information will pass. This makes the complaint process transparent and sufficiently predictable. Documents prepared from pre-disclosed information are likely to be processed without problems, which not only shortens the processing period but also improves work efficiency for the processor by eliminating renegotiation and repeated tasks. The results of this study can be used to find departments with a heavy complaint burden at certain points in time and to manage workforce allocation among departments flexibly. In addition, the analysis of which departments take part in consultations, by complaint characteristics, can be used for automation or recommendation when requesting consultation departments. Furthermore, applying machine learning techniques to the various data generated during the complaint process can uncover patterns in that process; turning these patterns into an algorithm and applying it to the system can support the automation and intelligence of civil complaint processing.
This study is expected to help identify future improvements in public service quality through process mining analysis of civil services.
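The department-load and processing-time findings rest on simple aggregations over an event log. A minimal sketch, assuming a hypothetical log in which each event records a complaint ID, the handling department, a decision type, and a date (the study's actual log format and process mining tool are not described in this abstract): it ranks departments by frequency with their cumulative share, and computes each complaint's duration alongside its number of 'complement' decisions.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical event-log schema: (case_id, department, decision, event_date).
Event = Tuple[str, str, str, date]


def department_load(events: List[Event]) -> List[Tuple[str, int, float]]:
    """Event count per department, busiest first, with the cumulative share of all events."""
    counts = Counter(dept for _, dept, _, _ in events)
    total = sum(counts.values())
    rows, running = [], 0
    for dept, n in counts.most_common():
        running += n
        rows.append((dept, n, running / total))
    return rows


def case_durations(events: List[Event]) -> Dict[str, Tuple[int, int]]:
    """Per complaint: (days from first to last event, number of 'complement' decisions)."""
    days = defaultdict(list)
    complements = Counter()
    for case_id, _, decision, day in events:
        days[case_id].append(day)
        if decision == "complement":
            complements[case_id] += 1
    return {c: ((max(ds) - min(ds)).days, complements[c]) for c, ds in days.items()}


if __name__ == "__main__":
    log = [  # hypothetical events, not data from the study
        ("C1", "Sewage Treatment", "consult", date(2014, 1, 2)),
        ("C1", "Sewage Treatment", "complement", date(2014, 1, 10)),
        ("C1", "Waterworks", "approve", date(2014, 2, 1)),
        ("C2", "Sewage Treatment", "approve", date(2014, 1, 5)),
    ]
    print(department_load(log))
    print(case_durations(log))
```

Comparing durations across complaints with different complement counts is the kind of comparison behind the finding that 'complement' decisions most affect complaint length; a full process mining analysis would also reconstruct the process map from the ordered events.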

A Study on People Counting in Public Metro Service using Hybrid CNN-LSTM Algorithm (Hybrid CNN-LSTM 알고리즘을 활용한 도시철도 내 피플 카운팅 연구)

  • Choi, Ji-Hye;Kim, Min-Seung;Lee, Chan-Ho;Choi, Jung-Hwan;Lee, Jeong-Hee;Sung, Tae-Eung
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.131-145 / 2020
  • In line with the trend of industrial innovation, IoT technology used in a variety of fields is emerging as a key element in creating new business models and, combined with big data, in providing user-friendly services. Data accumulated from Internet-of-Things (IoT) devices is used in many ways to build convenience-oriented smart systems, because it enables customized intelligent services through analysis of user environments and patterns. Recently, IoT has been applied to innovation in the public domain, for smart cities and smart transportation, such as solving traffic and crime problems with CCTV. In particular, when planning underground services or building a movement-volume control information system to improve the convenience of citizens and commuters amid congestion on public transportation such as subways and urban railways, both the ease of securing real-time service data and the stability of security must be considered. However, previous studies that use image data are limited by privacy issues and by degraded object detection performance under abnormal conditions. The IoT sensor data used in this study does not require identifying individuals, so it is free from privacy issues and can be used effectively to build intelligent public services for many unspecified people. We use IoT-based infrared sensor devices for an intelligent pedestrian tracking system in a metro service that many people use daily; the temperature data measured by the sensors is transmitted in real time.
The experimental environment for collecting real-time sensor data was set up at the equally spaced midpoints of a 4×4 grid on the ceiling above subway entrances with heavy passenger traffic, and it measured temperature changes as objects entered and left the detection spots. The measured data were preprocessed by setting a reference value for each of the 16 areas and calculating, per unit of time, the difference between each area's temperature and its reference value. This preprocessing is intended to accentuate movement within the detection area. In addition, the values were multiplied by 10 to reflect temperature differences between areas more sensitively; for example, a temperature of 28.5℃ collected from the sensor at a given time was converted to 285 for analysis. The collected sensor data therefore have the characteristics of both time series data and image data with 4×4 resolution. Reflecting these characteristics, we propose CNN-LSTM (Convolutional Neural Network-Long Short Term Memory), a hybrid algorithm that combines a CNN, which performs well in image classification, with an LSTM, which is especially suited to analyzing time series data. The CNN-LSTM algorithm is used to predict the number of people passing through one of the 4×4 detection areas. We validated the proposed model by comparing its performance with other artificial intelligence algorithms: Multi-Layer Perceptron (MLP), Long Short Term Memory (LSTM), and RNN-LSTM (Recurrent Neural Network-Long Short Term Memory). In the experiment, the proposed CNN-LSTM hybrid model showed better predictive performance than MLP, LSTM, and RNN-LSTM.
Using the proposed devices and model, various metro services, such as real-time monitoring of public transport facilities and congestion-based emergency response services, are expected to be provided without legal issues concerning personal information. However, the data were collected at only one side of an entrance, and data collected over a short period were used for prediction, so the model's applicability still needs to be verified in other environments. In the future, the proposed model is expected to become more reliable if experimental data are collected in a sufficient variety of environments or if the training data are supplemented with measurements from other sensors.
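A minimal sketch of the preprocessing described above, assuming NumPy and sensor frames of shape (T, 4, 4) in degrees Celsius. The abstract does not say how the reference values were set, so taking each area's mean over the first few frames is an assumption, as are the window length and the (N, window, 4, 4, 1) layout chosen for a CNN-LSTM input.

```python
import numpy as np


def preprocess(frames: np.ndarray, calib_frames: int = 10) -> np.ndarray:
    """Scale 4x4 temperature frames by 10 and subtract a per-area reference.

    frames: array of shape (T, 4, 4) in degrees Celsius.
    Assumption: each of the 16 areas' reference is its mean over the first
    `calib_frames` frames.
    """
    scaled = frames.astype(float) * 10.0             # e.g. 28.5 -> 285.0
    reference = scaled[:calib_frames].mean(axis=0)   # shape (4, 4), one value per area
    return scaled - reference                        # per-time-step difference from reference


def make_windows(diffs: np.ndarray, window: int) -> np.ndarray:
    """Stack consecutive frames into (N, window, 4, 4, 1) samples for a CNN-LSTM."""
    n = diffs.shape[0] - window + 1
    return np.stack([diffs[i:i + window] for i in range(n)])[..., np.newaxis]


if __name__ == "__main__":
    frames = np.full((12, 4, 4), 28.5)
    frames[11, 0, 0] = 30.0                      # a warm object passes under area (0, 0)
    diffs = preprocess(frames)
    print(diffs[11, 0, 0])                       # 15.0
    print(make_windows(diffs, window=5).shape)   # (8, 5, 4, 4, 1)
```

In a Keras-style model, such windows could feed a TimeDistributed Conv2D stack followed by an LSTM layer; that architecture is our illustration of the CNN-LSTM idea, not the paper's reported configuration.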

A Methodology of Customer Churn Prediction based on Two-Dimensional Loyalty Segmentation (이차원 고객충성도 세그먼트 기반의 고객이탈예측 방법론)

  • Kim, Hyung Su;Hong, Seung Woo
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.111-126 / 2020
  • Most industries, now exposed to a competitive environment, have recently become aware of the importance of customer lifetime value. As a result, preventing customer churn is becoming a more important business issue than acquiring new customers. This is because retaining existing customers is far more economical than acquiring new ones; in fact, the cost of acquiring a new customer is known to be five to six times higher than the cost of retaining an existing one. Companies that effectively prevent churn and improve customer retention are also known to increase not only their profitability but also their brand image, through improved customer satisfaction. Customer churn prediction, previously a sub-area of CRM research, has recently become more important as a big data-based performance marketing theme owing to advances in business machine learning. Research on churn prediction has so far been carried out actively in sectors such as mobile telecommunications, finance, distribution, and gaming, which are highly competitive and where churn management is urgent. These studies focused on improving the churn prediction model itself, for example by comparing the performance of various models, exploring features that are effective for forecasting churn, or developing new ensemble techniques. Their practical utility was limited because most treated the entire customer base as a single group when developing a predictive model. In short, the main purpose of existing research was to improve the performance of the predictive model itself, and research on improving the overall churn prediction process has been relatively scarce.
In practice, customers have different behavioral characteristics due to heterogeneous transaction patterns, and their churn rates differ accordingly, so it is unreasonable to treat all customers as a single group. To predict churn effectively in heterogeneous industries, it is therefore desirable to segment customers by classification criteria such as loyalty and to operate an appropriate churn prediction model for each segment. Some studies have indeed subdivided customers with clustering techniques and applied a churn prediction model to each customer group. Although this can produce better predictions than a single model for the entire customer population, there is still room for improvement: clustering is a mechanical, exploratory grouping technique that computes distances from the inputs and does not reflect the strategic intent of an entity, such as loyalty. This study proposes CCP/2DL (Customer Churn Prediction based on Two-Dimensional Loyalty segmentation), a segment-based churn prediction process built on two-dimensional customer loyalty, on the premise that churn can be managed better by improving the overall process than by improving the model itself. CCP/2DL segments customers on two loyalty dimensions, quantitative and qualitative, performs a secondary grouping of the segments according to their churn patterns, and then independently applies a heterogeneous churn prediction model to each churn-pattern group. To assess its relative merit, the proposed process was compared with the two most commonly applied alternatives: the General churn prediction process and the Clustering-based churn prediction process.
The General churn prediction process used in this study is the most common approach, in which all customers to be predicted are treated as a single group and modeled with one machine learning model. The Clustering-based churn prediction process first segments customers with a clustering technique and then builds a churn prediction model for each group. In an evaluation conducted in cooperation with a global NGO, the proposed CCP/2DL predicted churn better than the other methodologies. This process is not only effective for predicting churn but can also serve as a strategic basis for gaining a variety of customer insights and carrying out related performance marketing activities.
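The three CCP/2DL stages (two-dimensional loyalty segmentation, secondary grouping by churn pattern, one model per group) can be sketched as below. Everything concrete here is an assumption for illustration: the loyalty scores, the median split into high/low on each axis, and the rule that groups segments by whether their churn rate exceeds the overall rate. The paper's actual loyalty measures, grouping criteria, and per-group models are not given in this abstract.

```python
from collections import Counter, defaultdict
from statistics import median
from typing import Dict, List, Tuple

# Hypothetical record: (quantitative_loyalty, qualitative_loyalty, churned).
Customer = Tuple[float, float, bool]


def loyalty_segments(customers: List[Customer]) -> List[str]:
    """Two-dimensional loyalty segment per customer: H/L on each axis, split at the median."""
    q_cut = median(c[0] for c in customers)
    s_cut = median(c[1] for c in customers)
    return [("H" if q >= q_cut else "L") + ("H" if s >= s_cut else "L") for q, s, _ in customers]


def churn_pattern_groups(customers: List[Customer], segments: List[str]):
    """Secondary grouping: segments whose churn rate exceeds the overall rate form one group."""
    overall = sum(c[2] for c in customers) / len(customers)
    size, churned = Counter(segments), Counter()
    for c, seg in zip(customers, segments):
        churned[seg] += int(c[2])
    rates = {seg: churned[seg] / size[seg] for seg in size}
    groups = {seg: ("high_churn" if r > overall else "low_churn") for seg, r in rates.items()}
    return groups, rates


def partition(customers: List[Customer], segments: List[str], groups: Dict[str, str]):
    """Split customers by churn-pattern group; a separate model would be trained on each part."""
    parts = defaultdict(list)
    for c, seg in zip(customers, segments):
        parts[groups[seg]].append(c)
    return parts
```

Each part returned by partition would then get its own churn model (for example, a different classifier per group), which is where this process differs from the General process's single model for all customers.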

FAMILY DYNAMICS OF INCEST PERCEIVED BY ADOLESCENTS (청소년이 지각한 근친상간의 가족역동)

  • Kim, Hun-Soo;Shin, Hwa-Sik
    • Journal of the Korean Academy of Child and Adolescent Psychiatry / v.6 no.1 / pp.56-64 / 1995
  • The family is the primary unit of socialization for children. Among family members, parents are among the most important figures from whom children and adolescents acquire a wide variety of behavior patterns, attitudes, values, and norms. The way family members are organized produces the family's structural functioning, and an abnormal family structure is one of the most important reference models in the learning of antisocial behavior patterns. Incest and child sexual abuse, as well as spouse abuse, elder abuse, and neglect, therefore occur in abnormal family structural settings. In particular, incest, a specific form of sexual abuse, was once thought to be extremely rare, but clinical experience, especially over the past decade, has made us aware that incest and child sexual abuse are not rare and are increasing. The aim of this study was therefore to determine the family problems and dynamics of incest families and the character patterns of adolescent incest victims in Korea. A total of 1,838 adolescents, from middle and high schools (1,237) and juvenile correctional institutions (601), were studied; they were sampled from the Korean student population and the delinquent adolescent population confined in juvenile correctional institutions using proportional stratified random sampling. The subjects' ages ranged from 12 to 21 years. Data were collected through a questionnaire survey and analyzed with SAS on an IBM PC at the Behavior Science Center of Korea University. The statistical methods employed included the chi-square test, principal component analysis, and the t-test. The results were as follows: 1) Of 1,071 subjects, 40 (3.7%) reported incest experiences in their family setting (sibling incest: 1.6%; other types of incest: 2.1%). 2) Adolescent incest victims were more socially maladjusted, immature, impulsive, rigid, anxious, and dependent than non-incest adolescents.
They also showed some problems in academic performance and assertiveness. 3) Other members of incest families showed more psychological and behavioral problems, such as depression, alcoholism, psychotic disorders, and criminal acts, than members of non-incest families, although there is no evidence of a direct relationship between these problems and the incest. 4) Incest families tended to be more dysfunctional than non-incest families. The psychological instability of family members, parental rejection of their children, coldness and indifference among family members, and marital discord between the parents were significantly correlated with incest.


Strategy for Store Management Using SOM Based on RFM (RFM 기반 SOM을 이용한 매장관리 전략 도출)

  • Jeong, Yoon Jeong;Choi, Il Young;Kim, Jae Kyeong;Choi, Ju Choel
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.93-112 / 2015
  • As consumption patterns have changed, existing retail shops have evolved into hypermarkets or convenience stores that mostly offer groceries and daily products. It is therefore important to maintain proper inventory levels and product configuration in order to use limited retail space effectively and increase sales. Accordingly, this study proposes a product configuration and inventory level strategy based on the RFM (Recency, Frequency, Monetary) model and the SOM (self-organizing map) for managing retail shops effectively. The RFM model analyzes customer behavior based on past purchasing activity and can distinguish important customers in large data sets using three variables. R represents recency, which refers to the most recent purchase; the more recent the purchase, the larger R is. F represents frequency, the number of transactions in a particular period, and M represents monetary value, the amount of money spent in a particular period. The RFM method is known to be a very effective model for customer segmentation. In this study, SOM cluster analysis was performed on normalized RFM values. The SOM is regarded as one of the most distinguished artificial neural network models among unsupervised learning tools. It is a popular tool for clustering and visualizing high-dimensional data in such a way that similar items are grouped spatially close to one another, and it has been applied successfully in various technical fields to find patterns. In our research, the procedure seeks sales patterns by analyzing product sales records in terms of Recency, Frequency, and Monetary values. To suggest a business strategy, we then build a decision tree based on the SOM results. To validate the proposed procedure, we used M-mart data collected from 2014.01.01 to 2014.12.31.
Each product is given R, F, and M values, and the products are grouped into 9 clusters using the SOM. We also performed three tests, using weekday data, weekend data, and the whole data, to analyze changes in sales patterns. To propose a strategy for each cluster, we examined the criteria by which products were clustered; the characteristics of the SOM clusters can be explained with decision trees. As a result, the proposed procedure yields an inventory management strategy for each of the 9 clusters. Products in the cluster with the highest values on all three variables (R, F, M) need high inventory levels and should be placed where they lengthen customers' paths through the store. In contrast, products in the cluster with the lowest values on all three variables need low inventory levels and can be placed where visibility is low. Products in the cluster with the highest R value are usually new releases and should be placed at the front of the store. Managers should gradually decrease inventory for products in the cluster with the highest F value, which were purchased frequently in the past, because that cluster has lower R and M values than average. It can be inferred that these products have sold poorly recently and that their total sales will be lower than their frequency suggests. The procedure presented in this study is expected to help raise the profitability of retail stores. The paper is organized as follows. Chapter 2 briefly reviews the related literature. Chapter 3 proposes the research procedure, and Chapter 4 applies it to actual product sales data. Finally, Chapter 5 presents the conclusions and directions for further research.
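The RFM-then-SOM step can be sketched as follows. This is a minimal illustration, not the study's implementation: the sales-record format is hypothetical, R is taken as the day of a product's most recent sale so that more recent products get a larger R (matching the definition above), and the 3×3 map size, learning rate, neighborhood width, and epoch count are assumptions chosen only so that the map has 9 units, matching the 9 clusters reported. The decision-tree step that explains the clusters is not shown.

```python
from typing import Dict, List, Tuple

import numpy as np

# Hypothetical sales record: (product, day_of_year, amount).
Sale = Tuple[str, int, float]


def product_rfm(sales: List[Sale]) -> Tuple[List[str], np.ndarray]:
    """R = day of the last sale (later is larger), F = number of sales, M = total amount."""
    stats: Dict[str, List[float]] = {}
    for product, day, amount in sales:
        r, f, m = stats.get(product, [day, 0, 0.0])
        stats[product] = [max(r, day), f + 1, m + amount]
    names = list(stats)
    return names, np.array([stats[p] for p in names], dtype=float)


def min_max(x: np.ndarray) -> np.ndarray:
    """Scale each column to [0, 1]; constant columns become 0."""
    span = x.max(axis=0) - x.min(axis=0)
    return (x - x.min(axis=0)) / np.where(span == 0, 1.0, span)


def train_som(data: np.ndarray, rows: int = 3, cols: int = 3, epochs: int = 200,
              lr0: float = 0.5, sigma0: float = 1.0, seed: int = 0) -> np.ndarray:
    """Train a rows x cols SOM with a Gaussian neighborhood and linearly decaying rates."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows * cols, data.shape[1]))
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for t in range(epochs):
        decay = 1.0 - t / epochs
        lr, sigma = lr0 * decay, sigma0 * decay + 1e-3
        for x in data[rng.permutation(len(data))]:
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))   # best-matching unit
            dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)       # squared distance on the map grid
            h = np.exp(-dist2 / (2.0 * sigma ** 2))             # neighborhood weights
            weights += lr * h[:, None] * (x - weights)          # pull units toward the sample
    return weights


def assign_clusters(data: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Cluster id = index of each item's best-matching unit (0..rows*cols-1)."""
    return np.argmin(((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2), axis=1)
```

Running product_rfm separately on weekday, weekend, and full-year records mirrors the structure of the three tests described above; with real data, each product's cluster id would then serve as the target for the explanatory decision tree.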

Research Framework for International Franchising (국제프랜차이징 연구요소 및 연구방향)

  • Kim, Ju-Young;Lim, Young-Kyun;Shim, Jae-Duck
    • Journal of Global Scholars of Marketing Science / v.18 no.4 / pp.61-118 / 2008
  • The purpose of this research is to construct a research framework for international franchising based on the existing literature and to identify the research components within that framework. A franchise can be defined as a management arrangement that allows a franchisee to use various management assets of the franchisor to make or sell a product or service. Franchises can be divided into product distribution franchises, which are designed to sell products, and business format franchises, which are designed to run the business as a whole, whatever its form. International franchising can be defined as a way for a franchisor to internationalize into a foreign country by providing its business format or package to a franchisee in the host country. International franchising has grown fast over the last four decades, but academic research on it is quite limited. In Korea especially, research on international franchising has taken the form of single-case studies or survey-based empirical studies grounded in domestic franchise theory. This paper therefore reviews the existing literature on international franchising, provides a research framework, and seeks to stimulate new research in the field. The research components of international franchising include the motives and environmental factors behind the decision to expand internationally through franchising, entry modes and development plans, contracts and management strategy, and various performance measures from different perspectives. First, one motive for international franchising is the collection of fees from franchisees. Franchising also provides an easier way to expand into foreign countries. Other motives include increasing total sales volume, occupying a better strategic position, obtaining quality resources, and improving efficiency. Environmental factors that facilitate international franchising encompass economic conditions, trends, and legal or political factors in the host and/or home countries.
In addition, the franchisor's control power and risk management capability play a critical role in a successful franchising contract. The final decision to enter a foreign country via franchising is determined by numerous franchisor factors, such as history, size, growth, competitiveness, management system, bonding capability, and industry characteristics. After deciding to enter a foreign country, the franchisor needs to choose an entry mode for international franchising. Contractual modes include master franchising, area development franchising, licensing, direct franchising, and joint ventures. Theories of entry mode selection draw on the concepts of efficiency, the knowledge-based approach, the competence-based approach, agency theory, and governance cost. The next step after the entry decision is operation strategy, which starts with selecting a target city and target country. To find and screen targets, the franchisor needs to collect information about candidates; critical information includes brand patents, commercial laws, regulations, market conditions, country risk, and industry analysis. After selecting a target city in the target country, the franchisor needs to select a franchisee, in other words a partner. The most important criteria for selecting partners are financial credibility and capability and possession of real estate; cultural similarity and knowledge of the franchisor and/or its home country are also recognized as critical criteria. The most important element of operating strategy is the legal documentation between franchisor and franchisee across the home and host countries. The terms and conditions in these documents provide objective information about the characteristics of franchising agreements for academic research. They include definitions of terminology, territory and exclusivity, term of agreement, initial fee, continuing fees, clearing currency, and rights concerning sub-franchising.
Legal documents may also contain terms on softer elements, such as training programs and operation manuals, and on harder elements, such as the competent court and terms of expiration. The next element of operating strategy concerns products and services. Especially for business format franchising, product/service deliverables, benefit communicators, system identifiers (architectural features), and format facilitators are listed as strategic elements. Another important product/service decision is standardization versus customization. The rationale for standardization is cost reduction, efficiency, consistency, image congruence, brand awareness, and price competitiveness; standardization also enables large-scale R&D and innovative changes in management style. Another element of operating strategy is control management. The simplest way to control a franchise contract is to rely on legal terms, that is, a contractual control system. Other control systems include administrative and ethical control systems. The contractual control system is a coercive source of power, but franchisors usually avoid using legal power because it does not help build a positive relationship; instead, self-regulation is widely used. The administrative control system uses control mechanisms from the ordinary working relationship. Its main components are support activities for franchisees and communication methods: for example, the franchisor provides advertising, training, manuals, and delivery, and the franchisee follows the franchisor's direction. Another component is building the franchisor's brand power. The last research element is the performance of international franchising, which can be divided into franchisor performance and franchisee performance. The conceptual performance measures of the franchisor are simple but hard to obtain objectively: profit, sales, cost, experience, and brand power. The performance measures on the franchisee side are mostly about benefits to the host country.
These include small business development, employment promotion, the introduction of new business models, and an improved level of technology. There are also indirect benefits, such as increased tax revenue, the refinement of corporate citizenship, regional economic clustering, and an improved international balance. Beyond these economic effects, the host country also undergoes socio-cultural change, including demographic change, social trends, changing customer values, social communication, and social globalization; this is sometimes called the westernization or McDonaldization of society. In addition, the paper reviews theories frequently applied in international franchising research, such as agency theory, the resource-based view, transaction cost theory, organizational learning theory, and international expansion theories. Resource-based theory is used for strategic decisions based on resources, such as decisions on entry and cooperation that depend on the resources of the franchisee and franchisor. Transaction cost theory can be applied to determine mutual trust or satisfaction among franchising parties. Agency theory tries to explain strategic decisions that reduce the problems caused by using agents, for example in research on control systems in franchising agreements. Organizational learning theory is relatively new in franchising research; it assumes that an organization tries to maximize its performance and learning. Internalization theory favors direct investment as a strategic decision that removes the inefficiency of market transactions and is applied in research on contract terms. Oligopolistic competition theory is used to explain the various entry modes for international expansion, and competency theory supports strategic decisions that exploit key competitive advantages. Furthermore, the paper suggests both qualitative and quantitative research methodologies for more rigorous international franchising research. 
Quantitative research needs more real data rather than survey data, which usually reflects respondents' judgments. Research based on real data is essential for verifying theory more rigorously, but real quantitative data is quite hard to obtain. Qualitative research beyond single case studies is also highly recommended. Because international franchising has a limited number of cases, scientific research based on grounded theory and ethnographic studies can be used. A scientific case study differs from a single case study in its data collection and analysis methods; its key concepts are triangulation in measurement, logical coding, and comparison. Finally, after summarizing research trends in Korea, the paper offers an overall research direction for international franchising. International franchising research in Korea falls into two types: studies of Korean franchisors going overseas and studies of Korean franchisees of foreign franchisors. Research on Korean franchisors shows two common patterns: it usually deals with the success story of a single franchisor, and it focuses on the same industry and country. Therefore, international franchising research needs to broaden its focus to wider subjects, using scientific research methodology and developing new theory.


Analyzing Contextual Polarity of Unstructured Data for Measuring Subjective Well-Being (주관적 웰빙 상태 측정을 위한 비정형 데이터의 상황기반 긍부정성 분석 방법)

  • Choi, Sukjae;Song, Yeongeun;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems
    • v.22 no.1
    • pp.83-105
    • 2016
  • Measuring an individual's subjective wellbeing in an accurate, unobtrusive, and cost-effective manner is a core success factor of the wellbeing support system, which is a type of medical IT service. However, although very accurate, measurements with a self-report questionnaire and wearable sensors are costly and obtrusive when the wellbeing support system must run in real time. Recently, inferring the state of subjective wellbeing from unstructured data with conventional sentiment analysis has been proposed as an alternative that resolves the drawbacks of the self-report questionnaire and wearable sensors. However, this approach does not consider contextual polarity, which lowers measurement accuracy. Moreover, there is no sentiment word net or ontology for the subjective wellbeing domain. Hence, this paper proposes a method to extract keywords, and their contextual polarity, that represent the subjective wellbeing state from unstructured text on online websites in order to improve the inference accuracy of sentiment analysis. The proposed method is as follows. First, a set of general sentiment words is proposed. SentiWordNet, the most widely used such dictionary, was adopted; it contains about 100,000 words, including nouns, verbs, adjectives, and adverbs, with polarities ranging from -1.0 (extremely negative) to 1.0 (extremely positive). Second, corpora on subjective wellbeing (SWB corpora) were obtained by crawling online text. A survey was conducted to prepare a learning dataset that includes each individual's opinion and self-reported level of wellness, such as stress and depression. The participants were asked to describe their feelings about online news on two topics. Next, three data sources were extracted from the SWB corpora: demographic information, psychographic information, and the structural characteristics of the text (e.g., the number of words used in the text and simple statistics on the special characters used). 
These were used to adjust the estimated level of a specific SWB factor. Finally, a set of reasoning rules was generated for each wellbeing factor to estimate an individual's SWB from the text that the individual wrote. The experimental results suggested that using contextual polarity for each SWB factor (e.g., stress, depression) significantly improved estimation accuracy compared with conventional sentiment analysis methods incorporating SentiWordNet. Although literature on Korean sentiment analysis is available, those studies used only a limited set of sentiment words. Because of the small number of words, many sentences are overlooked when estimating the level of sentiment. The proposed method, by contrast, can identify multiple sentiment-neutral words as sentiment words in the context of a specific SWB factor. The results also suggest that a specific type of senti-word dictionary containing contextual polarity needs to be constructed alongside a common-sense dictionary such as SenticNet. These efforts will enrich and enlarge the application area of sentic computing. The study is helpful to practitioners and managers of wellness services in that several characteristics of unstructured text relevant to improving SWB have been identified. Consistent with the literature, the results showed that gender and age affect the SWB state when individuals are exposed to an identical cue from online text. In addition, the length of the textual response and the usage pattern of special characters were found to indicate the individual's SWB. These findings imply that better SWB measurement should involve collecting the textual structure and the individual's demographic conditions. In the future, the proposed method should be improved by automatically identifying contextual polarity in order to enlarge the vocabulary cost-effectively.
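To make the idea of contextual polarity concrete, here is a minimal Python sketch, assuming a SentiWordNet-style general lexicon (polarities in [-1.0, 1.0]) that a factor-specific lexicon can override. The words, scores, and the names `GENERAL_LEXICON`, `CONTEXTUAL_LEXICON`, `word_polarity`, and `text_polarity` are hypothetical illustrations. This is not the authors' rule-generation method, and it ignores the demographic and text-structure features the abstract also uses.

```python
from typing import Dict, List, Tuple

# Hypothetical general lexicon in the style of SentiWordNet:
# each word maps to a polarity in [-1.0, 1.0]. Values are illustrative only.
GENERAL_LEXICON: Dict[str, float] = {
    "happy": 0.8,
    "tired": -0.4,
    "deadline": 0.0,  # neutral in the general lexicon
    "alone": 0.0,     # neutral in the general lexicon
}

# Hypothetical factor-specific (contextual) polarities: a word that is
# neutral in general can carry polarity for a specific SWB factor.
CONTEXTUAL_LEXICON: Dict[Tuple[str, str], float] = {
    ("stress", "deadline"): -0.6,
    ("depression", "alone"): -0.7,
}


def word_polarity(word: str, factor: str) -> float:
    """Return the contextual polarity for (factor, word) if one is defined,
    otherwise fall back to the general polarity (0.0 for unknown words)."""
    return CONTEXTUAL_LEXICON.get((factor, word), GENERAL_LEXICON.get(word, 0.0))


def text_polarity(tokens: List[str], factor: str) -> float:
    """Average polarity over the tokens that carry non-zero polarity for the
    given factor; returns 0.0 when no token is sentiment-bearing."""
    scores = [word_polarity(t, factor) for t in tokens]
    scores = [s for s in scores if s != 0.0]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, `text_polarity(["deadline", "tired"], "stress")` averages -0.6 and -0.4, giving -0.5, whereas for a factor with no contextual entry the word "deadline" stays neutral and only "tired" counts. This mirrors the abstract's point that a general lexicon overlooks words whose polarity only appears in the context of a specific SWB factor.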

An Analysis of the Change of Secondary Earth Science Teachers' Knowledge about the East Sea's Currents through Drawing Schematic Current Maps (해류도 그리기를 통한 중등학교 지구과학 교사들의 동해 해류에 대한 지식의 변화 분석)

  • Park, Kyung-Ae;Park, Ji-Eun;Lee, Ki-Young;Choi, Byoung-Ju;Lee, Sang-Ho;Kim, Young-Taeg;Lee, Eun-Il
    • Journal of the Korean earth science society
    • v.36 no.3
    • pp.258-279
    • 2015
  • The purpose of this study was to analyze the change in secondary earth science teachers' knowledge about the currents of the East Sea through drawing schematic maps of ocean currents. For this purpose, thirty-two earth science teachers participated in a six-hour training course of learning and practice related to ocean current schematic maps. The participating teachers drew the ocean current schematic map of the East Sea in three phases, i.e., pre-, post-, and delayed-post phases. All the maps drawn by the participants were converted into digital image data, and a detailed analysis was performed to investigate the participating teachers' knowledge about the currents of the East Sea. The findings are as follows. First, the participating teachers had background knowledge about the ocean current map but held incorrect knowledge about some concepts. Second, after the training, teachers' knowledge about the East Sea's currents increased, while the differences in knowledge among individual teachers decreased. This pattern was more evident in the delayed-post phase of drawing than in the post phase immediately after training. Third, the participating teachers were strongly aware of the need to improve the ocean current schematic map of the East Sea in science textbooks in terms of scientific knowledge. In addition, they were highly satisfied with the teacher training because they perceived it as meaningful in various respects: recognizing the importance of content knowledge and its connection with instructional strategies, the needs of the secondary science curriculum, and the nature of scientific knowledge. The results imply that teachers' subject matter knowledge plays a significant role in making science teaching effective.

The Analysis on the Relationship between Firms' Exposures to SNS and Stock Prices in Korea (기업의 SNS 노출과 주식 수익률간의 관계 분석)

  • Kim, Taehwan;Jung, Woo-Jin;Lee, Sang-Yong Tom
    • Asia pacific journal of information systems
    • v.24 no.2
    • pp.233-253
    • 2014
  • Can the stock market really be predicted? Stock market prediction has attracted much attention from many fields, including business, economics, statistics, and mathematics. Early research on stock market prediction was based on random walk theory (RWT) and the efficient market hypothesis (EMH). According to the EMH, stock markets are driven largely by new information rather than by present and past prices; because new information is unpredictable, stock prices follow a random walk. Despite these theories, Schumaker [2010] observed that people keep trying to predict the stock market using artificial intelligence, statistical estimates, and mathematical models. Mathematical approaches include Percolation Methods, Log-Periodic Oscillations, and Wavelet Transforms to model future prices. Examples of artificial intelligence approaches that deal with optimization and machine learning are Genetic Algorithms, Support Vector Machines (SVM), and Neural Networks. Statistical approaches typically predict the future from past stock market data. Recently, financial engineers have started to predict stock price movement patterns using SNS data. SNS is a place where people's opinions and ideas flow freely and affect others' beliefs. Through word of mouth in SNS, people share product usage experiences, subjective feelings, and the sentiment or mood that commonly accompanies them. An increasing number of empirical analyses of sentiment and mood are based on textual collections of public user-generated data on the web. Opinion mining is a domain of data mining that extracts the public opinions expressed in SNS. There have been many studies of opinion mining from web sources such as product reviews, forum posts, and blogs. In relation to this literature, we try to understand the effects of firms' SNS exposures on stock prices in Korea. Similarly to Bollen et al. [2011], we empirically analyze the impact of SNS exposures on stock return rates. 
We use Social Metrics by Daum Soft, an SNS big data analysis company in Korea. Social Metrics provides trends and public opinions on Twitter and blogs using natural language processing and analysis tools. It collects the sentences circulated on Twitter in real time, breaks them down into word units, and then extracts keywords. In this study, we classify firms' SNS exposures into two groups: positive and negative. To test the correlation and causal relationship between SNS exposures and stock returns, we first collect the stock prices of 252 firms and the KRX100 index on the Korea Stock Exchange (KRX) from May 25, 2012 to September 1, 2012. We also gather public attitudes (positive, negative) toward these firms from Social Metrics over the same period. We conduct a regression analysis between stock prices and the number of SNS exposures. Having checked the correlation between the two variables, we perform a Granger causality test to determine the direction of causation between them. The results show that the total number of SNS exposures is positively related to stock market returns. The number of positive mentions also has a positive relationship with stock market returns. In contrast, the number of negative mentions has a negative relationship with stock market returns, but this relationship is not statistically significant. This suggests that the impact of positive mentions is larger than that of negative mentions. We also investigate whether the impacts are moderated by industry type and firm size, and find that the impact of SNS exposures is larger for IT firms than for non-IT firms and larger for small firms than for large firms. The Granger causality test results show that changes in stock returns are caused by SNS exposures, while causation in the opposite direction is not significant. 
Therefore, the relationship between SNS exposures and stock prices exhibits unidirectional causality: the more a firm is exposed in SNS, the more likely its stock price is to increase, whereas stock price changes may not lead to more SNS mentions.
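The Granger causality test used in this study asks whether past values of one series (here, SNS exposures) improve the prediction of another (stock returns) beyond that series' own past. Below is a minimal pure-Python sketch of the bivariate F-test, assuming a single lag, an intercept, and ordinary least squares. The names `ols_rss`, `granger_f`, and `simulate` and the toy data are hypothetical; the abstract does not state the authors' lag order or model specification.

```python
import random
from typing import List, Tuple


def ols_rss(X: List[List[float]], y: List[float]) -> float:
    """Fit y = X b by ordinary least squares (normal equations solved with
    Gauss-Jordan elimination) and return the residual sum of squares."""
    k = len(X[0])
    # Augmented normal-equation matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        # Partial pivoting: move the row with the largest entry into place.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((yi - sum(b * xi for b, xi in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def granger_f(y: List[float], x: List[float], lag: int = 1) -> float:
    """F statistic for H0: lags of x do not help predict y beyond y's own lags."""
    restricted, unrestricted, target = [], [], []
    for t in range(lag, len(y)):
        own = [y[t - i] for i in range(1, lag + 1)]
        other = [x[t - i] for i in range(1, lag + 1)]
        restricted.append([1.0] + own)            # intercept + own lags
        unrestricted.append([1.0] + own + other)  # ... + lags of x
        target.append(y[t])
    rss_r = ols_rss(restricted, target)
    rss_u = ols_rss(unrestricted, target)
    dof = len(target) - (1 + 2 * lag)  # observations minus unrestricted params
    return ((rss_r - rss_u) / lag) / (rss_u / dof)


def simulate(n: int, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Hypothetical toy data: x is white noise and y follows x with a one-step lag."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0.0, 1.0) for t in range(1, n)]
    return x, y
```

With `x, y = simulate(200)`, `granger_f(y, x)` is large because lagged `x` drives `y`, while `granger_f(x, y)` stays small, which is the one-directional pattern the abstract reports for SNS exposures and returns. In practice, a library routine such as `grangercausalitytests` in `statsmodels.tsa.stattools` also reports p-values and handles several lags.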