• Title/Summary/Keyword: extraction

Search Results: 17,050

EVALUATION OF SERUM LEVELS OF SYSTEMIC STATUS IN ORAL AND MAXILLOFACIAL SURGERY PATIENTS (구강악안면 수술을 받은 환자들에서의 전신영양평가)

  • Kim, Uk-Kyu;Kim, Yong-Deok;Byun, June-Ho;Shin, Sang-Hun;Chung, In-Kyo
    • Journal of the Korean Association of Oral and Maxillofacial Surgeons
    • /
    • v.29 no.5
    • /
    • pp.301-314
    • /
    • 2003
  • The purposes of this retrospective study were to assess changes in serum parameters in oral and maxillofacial surgery patients after operation and to determine which laboratory parameters over the treatment period were associated with recovery of systemic condition. Several serum parameters were chosen to assess systemic nutritional status. The sample comprised randomly selected subjects from three patient groups - oral cancer, odontogenic abscess, and facial bone fracture - treated at the Department of Oral and Maxillofacial Surgery, Pusan National University Hospital, from September 1, 1998, to September 1, 2002. Each group consisted of 10 patients. Each patient chart was examined, and blood sample parameters were reviewed together with clinical signs, symptoms, and vital signs on the preoperative day, at postoperative day 1, and at postoperative week 1. Several parameters were analyzed statistically to extract mean values and differences between the period groups. The findings for the serum parameters of the cancer, abscess, and fracture groups were as follows: 1. In cancer patients, Hb, MCV, albumin, cholesterol, LDH, AST, ALT, neutrophil, platelet, leukocyte, Na, K, Cl, BUN, and creatinine were analyzed. Values of Hb, albumin, AST, neutrophil, leukocyte, and Cl showed significant differences according to period. 2. In abscess patients, CRP, ESR, leukocyte, body temperature, and neutrophil were analyzed. Values of CRP, leukocyte, body temperature, and neutrophil showed significant differences according to period. 3. In fracture patients, the same parameters as in the cancer patients were chosen. Only values of platelet and Cl showed significant differences according to period. 4. In cancer patients, correlations were analyzed statistically using Pearson's coefficient. Positive correlations were found between Hb and albumin, K, and Na (P<0.05). A positive correlation was also found between neutrophil and leukocyte (P<0.05). Positive correlations were found between cholesterol and ALT; LDH and both platelet and creatinine; platelet and BUN; and Na and K (P<0.01). 5. In abscess patients, Pearson's correlation values were analyzed for the parameters. A positive correlation was found only between CRP and neutrophil (P<0.05). 6. In fracture patients, the correlations of the parameters were also statistically analyzed. Positive correlations were found between MCV and K; albumin and LDH; AST and the three parameters creatinine, Na, and Cl; K and neutrophil; and neutrophil and the three parameters leukocyte, BUN, and K (P<0.05). Positive correlations were found between LDH and AST, and between ALT and both AST and creatinine (P<0.01). This retrospective clinical study showed that only the CRP levels in abscess patients may be useful in determining clinical infection status; the levels of the other parameters in the cancer and fracture patients did not show significant value as diagnostic aids for clinical status.
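The pairwise Pearson correlation analysis described above can be sketched in a few lines. The serum values below are hypothetical placeholders, not data from the study, and the plain-Python formula stands in for whatever statistical package the authors used:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical Hb and albumin values for 10 patients (illustration only).
hb      = [13.1, 12.4, 11.8, 14.0, 12.9, 13.5, 11.2, 12.0, 13.8, 12.6]
albumin = [4.1, 3.8, 3.6, 4.4, 3.9, 4.2, 3.4, 3.7, 4.3, 3.9]
r = pearson_r(hb, albumin)   # positive r suggests the two parameters co-vary
```

A significance test (the study's P<0.05 cutoffs) would additionally convert r to a t statistic with n-2 degrees of freedom, which is omitted here for brevity.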

Customer Behavior Prediction of Binary Classification Model Using Unstructured Information and Convolution Neural Network: The Case of Online Storefront (비정형 정보와 CNN 기법을 활용한 이진 분류 모델의 고객 행태 예측: 전자상거래 사례를 중심으로)

  • Kim, Seungsoo;Kim, Jongwoo
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.221-241
    • /
    • 2018
  • Deep learning has been getting attention recently. The deep learning technique applied in image-recognition competitions such as the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and in AlphaGo is the Convolutional Neural Network (CNN). CNN is characterized by dividing the input image into small sections to recognize partial features and combining them to recognize the whole. Deep learning technologies are expected to bring many changes to our lives, but until now their applications have been limited to image recognition and natural language processing. The use of deep learning techniques for business problems is still at an early research stage. If their performance is proved, they can be applied to traditional business problems such as marketing response prediction, fraudulent transaction detection, bankruptcy prediction, and so on. It is therefore a very meaningful experiment to diagnose the possibility of solving business problems with deep learning, based on the case of online shopping companies, which have big data, make customer behavior relatively easy to identify, and offer high utilization value. In online shopping companies especially, the competitive environment is changing rapidly and becoming more intense, so analysis of customer behavior for maximizing profit is becoming more and more important. In this study, we propose a 'CNN model of Heterogeneous Information Integration' using CNN as a way to improve the prediction of customer behavior in online shopping enterprises. The proposed model combines structured and unstructured information and learns with a convolutional neural network on top of a multi-layer perceptron structure. To optimize its performance, the model's three architectural components - 'heterogeneous information integration', 'unstructured information vector conversion', and 'multi-layer perceptron design' - are each evaluated, and the proposed model is confirmed based on the results. The target variables for predicting customer behavior are defined as six binary classification problems: re-purchaser, churner, frequent shopper, frequent refund shopper, high-amount shopper, and high-discount shopper. To verify the usefulness of the proposed model, we conducted experiments using actual data from a specific online shopping company in Korea: its transaction, customer, and VOC (voice of customer) data. The data extraction criteria cover 47,947 customers who registered at least one VOC in January 2011 (one month). The customer profiles of these customers, a total of 19 months of trading data from September 2010 to March 2012, and the VOCs posted during that month are used. The experiment is divided into two stages. In the first stage, we evaluate the three architectures that affect the performance of the proposed model and select optimal parameters; we then evaluate the performance of the proposed model. Experimental results show that the proposed model, which combines structured and unstructured information, is superior to NBC (Naïve Bayes classification), SVM (support vector machine), and ANN (artificial neural network). It is therefore significant that the use of unstructured information contributes to predicting customer behavior, and that CNN can be applied to business problems as well as image recognition and natural language processing problems. The experiments confirm that CNN is effective in understanding and interpreting the meaning of context in textual VOC data. It is also significant that this empirical research, based on the actual data of an e-commerce company, can extract very meaningful information for customer behavior prediction from VOC data written in text format directly by customers. Finally, through the various experiments, the proposed model provides useful information for future research related to parameter selection and performance.
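As a rough illustration of the heterogeneous-information-integration idea, the sketch below runs a 1-D convolution with max-pooling over a token sequence (standing in for the unstructured VOC text) and concatenates the result with structured profile features before a logistic output. All shapes, weights, and feature names are invented for illustration; the paper's actual architecture and parameters are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Unstructured input: a VOC text as a sequence of 10 token embeddings (dim 8).
tokens = rng.normal(size=(10, 8))
# Structured input: hypothetical profile features (tenure, orders, refund rate).
profile = np.array([2.5, 14.0, 0.1])

# Four convolution filters, each spanning 3 consecutive tokens: the CNN reads
# small local context windows and combines them into a whole-text feature.
filters = rng.normal(size=(4, 3, 8))

def conv_maxpool(seq, filters):
    """Slide each filter over the token sequence, apply ReLU, max-pool over time."""
    n_f, w, d = filters.shape
    windows = np.stack([seq[i:i + w] for i in range(len(seq) - w + 1)])  # (T, w, d)
    feats = np.einsum('twd,fwd->tf', windows, filters)                   # (T, n_f)
    return np.maximum(feats, 0.0).max(axis=0)                            # (n_f,)

text_vec = conv_maxpool(tokens, filters)

# Heterogeneous information integration: concatenate the pooled text features
# with the structured profile, then a logistic unit for one binary target
# (e.g. churn / no churn). Weights here are random, not trained.
x = np.concatenate([text_vec, profile])
w_out = rng.normal(size=x.shape)
p_churn = 1.0 / (1.0 + np.exp(-(x @ w_out)))
```

In the actual study each of the six binary targets would get its own trained output layer, and the filters would be learned by backpropagation rather than drawn at random.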

Extension Method of Association Rules Using Social Network Analysis (사회연결망 분석을 활용한 연관규칙 확장기법)

  • Lee, Dongwon
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.4
    • /
    • pp.111-126
    • /
    • 2017
  • Recommender systems based on association rule mining contribute significantly to sellers' sales by reducing the time consumers spend searching for the products they want. Recommendations based on the frequency of transactions, such as orders, can effectively screen out the products that are statistically marketable among multiple products. A product with a high possibility of sales, however, can be omitted from the recommendations if it records an insufficient number of transactions at the beginning of the sale. Products missing from the association-based recommendations may lose the chance of exposure to consumers, which leads to a decline in the number of transactions; in turn, diminished transactions create a vicious circle of lost opportunities to be recommended, so initial sales are likely to remain stagnant for a certain period of time. Products that are susceptible to fashion or seasonality, such as clothing, may be greatly affected. This study aimed to expand association rules to include in the list of recommendations those products whose initial transaction frequency is low despite a high possibility of sales. The particular purpose is to predict the strength of the direct connection between two unconnected items through the properties of the paths located between them. An association between two items revealed in transactions can be interpreted as an interaction between them, which can be expressed as a link in a social network whose nodes are items. The first step calculates the centralities of the nodes in the middle of the paths that indirectly connect two nodes lacking a direct connection. The next step identifies the number of such paths and the shortest among them. These extracted measures are used as independent variables in a regression analysis to predict the future connection strength between the nodes. The connection strength between two nodes, defined by the number of links formed between them, is measured after a certain period of time. The regression results confirm that the number of paths between two products, the length of the shortest path, and the number of neighboring items connected to the products are significantly related to their potential connection strength. This study used actual order transaction data collected over three months, from February to April 2016, from an online commerce company. To reduce the complexity of the analytics as the scale of the network grows, the analysis was performed only on miscellaneous goods. Two consecutively purchased items were chosen from each customer's transactions to obtain an antecedent-consequent pair, which secures a link for constituting the social network. The direction of each link was determined by the order in which the goods were purchased. Excluding the last ten days of the data collection period, the social network of associated items was built for the extraction of the independent variables. The model predicts, from the explanatory variables, the number of links to be connected in the next ten days. Of the 5,711 previously unconnected links, 611 were newly connected during the last ten days. Through experiments, the proposed model demonstrated excellent predictions: of the 571 links that the proposed model predicted, 269 were confirmed to have been connected. This is 4.4 times more than the average of 61 that can be found without any prediction model. This study is expected to be useful for industries that launch new products quickly with short life cycles, since their exposure time is critical. It can also be used to detect diseases that are rarely found in the early stages of medical treatment because of the low incidence of outbreaks. Since the complexity of social network analysis is sensitive to the number of nodes and links that make up the network, this study was conducted on a particular category of miscellaneous goods. Future research should consider that this condition may limit the opportunity to detect unexpected associations between products belonging to different categories.
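Two of the path-based explanatory variables described above - the shortest-path distance between an unconnected item pair and the number of shortest paths between them - can be extracted with a single breadth-first search. The toy co-purchase links below are hypothetical, and this sketch covers only the feature extraction, not the subsequent regression:

```python
from collections import deque, defaultdict

# Hypothetical directed co-purchase links (antecedent -> consequent).
edges = [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D'), ('D', 'E')]
graph = defaultdict(list)
for u, v in edges:
    graph[u].append(v)

def path_features(graph, src, dst):
    """Shortest-path distance from src to dst and the number of shortest
    paths, via breadth-first search -- two of the explanatory variables
    used to predict whether an as-yet unconnected pair will form a link."""
    dist = {src: 0}
    count = {src: 1}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in graph[u]:
            if v not in dist:              # first time v is reached
                dist[v] = dist[u] + 1
                count[v] = count[u]
                q.append(v)
            elif dist[v] == dist[u] + 1:   # another shortest route into v
                count[v] += count[u]
    return dist.get(dst), count.get(dst, 0)

# 'A' and 'E' share no direct link; their indirect connection:
d, n_paths = path_features(graph, 'A', 'E')   # d = 3, n_paths = 2 (via B and via C)
```

These per-pair features, together with neighbor counts and centralities, would then feed the regression that predicts future connection strength.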

Suggestion of Learning Objectives in Social Dental Hygiene: Oral Health Administration Area (사회치위생학의 학습목표 제안: 구강보건행정 영역)

  • Park, Su-Kyung;Lee, Ga-Yeong;Jang, Young-Eun;Yoo, Sang-Hee;Kim, Yeun-Ju;Lee, Sue-Hyang;Kim, Han-Nah;Jo, Hye-Won;Kim, Myoung-Hee;Kim, Hee-Kyoung;Ryu, Da-Young;Kim, Min-Ji;Shin, Sun-Jung;Kim, Nam-Hee;Yoon, Mi-Sook
    • Journal of dental hygiene science
    • /
    • v.18 no.2
    • /
    • pp.85-96
    • /
    • 2018
  • The purpose of this study is to propose learning objectives in social dental hygiene by analyzing and reviewing the learning objectives in the oral health administration area of the existing public oral health curriculum. This is a cross-sectional study. The subjects, selected by convenience sampling, were 15 members of the social dental hygiene subcommittee of the Korean Society of Dental Hygiene Science. Data were collected by self-administered questionnaire. The research tool is based on the 48 items of division A in the book of learning objectives for the dental hygienist national examination; this study classified each of them by 'dental hygiene job relevance', 'dental hygiene competency relevance', 'timeliness', and 'value discrimination of educational goal setting' to comprise 192 items. To collect expert opinions, this study also conducted a Delphi survey of 7 academic experts. Statistical analysis was performed using the IBM SPSS Statistics ver. 23.0 program (IBM Co., Armonk, NY, USA). Responses were recoded according to the degree of relevance of each learning objective, and frequency analysis was performed. This study removed 18 items from the learning objectives for the dental hygienist national examination in the oral health administration area of public oral health. Fifteen objectives were revised and 15 existing learning objectives were maintained. Forty-five learning objectives were proposed as new social dental hygiene learning objectives. The topics of the learning objectives are divided into social security and medical assistance, the oral health care system, oral health administration, and oral health policy. The results show that it is necessary to construct the learning objectives of social dental hygiene in response to the changing situation of the time. The contents of education should be revised in the order of revision of learning objectives, development of competencies, development of learning materials, and the national examination.

The Jurisdictional Precedent Analysis of Medical Dispute in Dental Field (치과임상영역에서 발생된 의료분쟁의 판례분석)

  • Kwon, Byung-Ki;Ahn, Hyoung-Joon;Kang, Jin-Kyu;Kim, Chong-Youl;Choi, Jong-Hoon
    • Journal of Oral Medicine and Pain
    • /
    • v.31 no.4
    • /
    • pp.283-296
    • /
    • 2006
  • Along with the development of scientific technologies, health care has been advancing remarkably, and as the quality of social life improves with increasing interest in health, the demand for medical services is rapidly increasing. However, medical accidents and medical disputes are also rapidly increasing due to various factors, such as people's growing sense of their rights, lack of understanding of the nature of medical practice, over-expectation of medical techniques, a commercialized medical supply system, moral degeneracy and unawareness of medical jurisprudence among doctors, a widespread trend of mutual distrust, and the lack of a systematized device for the resolution of medical disputes. This study analysed 30 civil suit cases between 1994 and 2004, selected from the medical dispute cases in the dental field with judgements collected from organizations related to dentistry and the Department of Oral Medicine, Yonsei University Dental Hospital. The following results were drawn from the analyses: 1. The distribution by year showed a rapid increase in medical disputes after the year 2000. 2. Among the types of medical dispute, suits associated with tooth extraction accounted for 36.7% of all cases. 3. As for the causes of medical disputes, uncomfortable feelings and dissatisfaction with the treatment accounted for 36.7%; death and permanent damage accounted for 16.7% each. 4. Winning the suit, compulsory mediation, and recommendation for settlement accounted for 60.0% of the judgement results for the plaintiff. 5. By type of medical organization involved, 60.0% were private dental clinics and 30.0% were university dental hospitals. 6. By level of trial, 30.0% of the disputes progressed to a second or third trial. 7. Regarding the amounts claimed for damages, claims between 50 million and 100 million won accounted for 36.7%, and claims of more than 100 million won for 13.3%; among the judgement amounts, awards from 10 million to 30 million won accounted for 40.0%, and awards of more than 100 million won for 6.7%. 8. In 26.7% of the suits, 2 or more dentists were involved. 9. As for the time taken until judgement, 46.7% took 11 to 20 months and 36.7% took 21 to 30 months. 10. Regarding medical malpractice, 46.7% were judged guilty, and 70% of the cases underwent medical judgement or verification by specialists during the suit. 11. Of the cases lost by doctors (18 cases), 72.2% were due to breach of the duty of care in practice and 16.7% were due to failure to explain to the patient. Medical disputes occurring in the field of dentistry usually involve relatively low-risk cases. Hence, the importance of explanation to the patient is emphasized, and since levels of patient satisfaction are subjective, improvement of the patient-dentist relationship and recovery of autonomy within the dental profession are essential, in addition to the reduction of technical malpractice. Moreover, measures for managing medical disputes should be established by complementing the current doctors' and hospitals' medical malpractice insurance, which is run irrationally, and by establishing a system in which education as well as consultation for medical disputes, led by groups of dental clinicians and academic scholars, is accessible.

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.109-122
    • /
    • 2014
  • People are nowadays creating a tremendous amount of data on Social Network Services (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive amounts of data generation, thereby greatly influencing society. This is an unmatched phenomenon in history, and now we live in the Age of Big Data. SNS data satisfies the conditions of Big Data: the amount of data (volume), data input and output speeds (velocity), and the variety of data types (variety). Trends of issues discovered in SNS Big Data can be used as an important new source for the creation of value, because this information covers the whole of society. In this study, a Twitter Issue Tracking System (TITS) is designed and established to meet the needs of analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides the following four functions: (1) provide the topic keyword set corresponding to the daily ranking; (2) visualize the daily time-series graph of a topic for the duration of a month; (3) show the importance of a topic through a treemap based on the score system and frequency; (4) visualize the daily time-series graph of keywords found by keyword search. The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including the removal of stop words and noun extraction, for processing various unrefined forms of unstructured data. In addition, such analysis requires the latest big data technology to rapidly process large amounts of real-time data, such as the Hadoop distributed system or NoSQL, which is an alternative to relational databases. We built TITS on Hadoop to optimize the processing of big data, because Hadoop is designed to scale up from single-node computing to thousands of machines. Furthermore, we use MongoDB, which is classified as a NoSQL database. MongoDB is an open-source, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike existing relational databases, MongoDB has no schemas or tables, and its most important goals are data accessibility and data processing performance. In the Age of Big Data, visualization is attractive to the Big Data community because it helps analysts examine data easily and clearly; therefore, TITS uses the d3.js library as a visualization tool. This library is designed for creating Data-Driven Documents that bind the document object model (DOM) to data; the interaction with data is easy and useful for managing a real-time data stream with smooth animation. In addition, TITS uses Bootstrap, composed of pre-configured style sheets and JavaScript plug-ins, to build the web system. The TITS Graphical User Interface (GUI) is designed using these libraries and is capable of detecting issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the superiority of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique that is used in various research areas, including Library and Information Science (LIS); based on this, we can confirm the utility of storytelling and time series analysis. Third, we develop a web-based system and make it available for the real-time discovery of topics. The present study conducted experiments with nearly 150 million tweets in Korea during March 2013.
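The daily topic-keyword ranking pipeline (stop-word removal, term extraction, per-day frequency ranking) can be sketched as below. The tweets, stop-word list, and use of plain whitespace tokenization are placeholders for illustration; TITS itself works on Korean text with proper noun extraction:

```python
from collections import Counter, defaultdict

# Hypothetical tweet stream: (date, text) pairs after crawling.
tweets = [
    ('2013-03-01', 'big data analysis with hadoop is fast'),
    ('2013-03-01', 'hadoop cluster handles big data in real time'),
    ('2013-03-02', 'topic modeling finds issues in twitter data'),
]
STOP_WORDS = {'is', 'with', 'in', 'the', 'a'}

def daily_keyword_ranking(tweets, top_n=3):
    """Per-day keyword frequency ranking -- a minimal stand-in for TITS's
    stop-word removal / term extraction / daily topic keyword set."""
    by_day = defaultdict(Counter)
    for day, text in tweets:
        tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
        by_day[day].update(tokens)
    return {day: [w for w, _ in c.most_common(top_n)] for day, c in by_day.items()}

ranking = daily_keyword_ranking(tweets)
```

The frequency counts produced this way are also what a treemap-style importance view would consume; a topic model such as LDA would replace the raw counts with per-topic keyword distributions.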

Sesquiterpenoids Bioconversion Analysis by Wood Rot Fungi

  • Lee, Su-Yeon;Ryu, Sun-Hwa;Choi, In-Gyu;Kim, Myungkil
    • Proceedings of the Korean Society of Mycology Conference (한국균학회소식:학술대회논문집)
    • /
    • 2016.05a
    • /
    • pp.19-20
    • /
    • 2016
  • Sesquiterpenoids are defined as C15 compounds derived from farnesyl pyrophosphate (FPP), and their complex structures are found in the tissues of many diverse plants (Degenhardt et al. 2009). FPP's long chain length and additional double bond enable its conversion into a huge range of mono-, di-, and tri-cyclic structures. A number of cyclic sesquiterpenes with alcohol, aldehyde, and ketone derivatives have key biological and medicinal properties (Fraga 1999). Fungi, such as the wood-rotting Polyporus brumalis, are excellent sources of pharmaceutically interesting natural products such as sesquiterpenoids. In this study, we investigated the biosynthesis of P. brumalis sesquiterpenoids on modified medium. Fungal suspensions of 11 white rot species were inoculated in modified medium containing C6H12O6, C4H12N2O6, KH2PO4, MgSO4, and CaCl2 for 20 days. Cultivation was stopped by solvent extraction with separation of the mycelium. The metabolites were identified as propionic acid (1), mevalonic acid lactone (2), β-eudesmane (3), and β-eudesmol (4), respectively (Figure 1). The main peaks of β-eudesmane and β-eudesmol, which are indicative of sesquiterpene structures, were consistently detected at days 5, 7, 12, and 15. These results demonstrated the existence of terpene metabolism in the mycelium of P. brumalis. Polyporus spp. are known to generate flavor components such as methyl 2,4-dihydroxy-3,6-dimethyl benzoate; 2-hydroxy-4-methoxy-6-methyl benzoic acid; 3-hydroxy-5-methyl phenol; and 3-methoxy-2,5-dimethyl phenol in submerged cultures (Hoffmann and Esser 1978). Drimanes of sesquiterpenes were reported as metabolites from P. arcularius and shown to exhibit antimicrobial activity against Gram-positive bacteria such as Staphylococcus aureus (Fleck et al. 1996). The main metabolites of P. brumalis, β-eudesmol and β-eudesmane, were categorized as eudesmane-type sesquiterpene structures. The eudesmane skeleton could be biosynthesized from FPP-derived IPP, and approximately 1,000 such structures have been identified in plants as essential oils. The biosynthesis of eudesmol by P. brumalis may thus be an important tool for the production of useful natural compounds, as presumed from its identified potent bioactivity in plants. Essential oils comprising eudesmane-type sesquiterpenoids have been extensively researched (Wu et al. 2006). β-Eudesmol is a well-known and important eudesmane alcohol with an anticholinergic effect in the vascular endothelium (Tsuneki et al. 2005). Additionally, recent studies demonstrated that β-eudesmol acts as a channel blocker for nicotinic acetylcholine receptors at the neuromuscular junction and can inhibit angiogenesis in vitro and in vivo by blocking the mitogen-activated protein kinase (MAPK) signaling pathway (Seo et al. 2011). Variation of nutrients was conducted to determine the optimum conditions for the biosynthesis of sesquiterpenes by P. brumalis. Genes encoding terpene synthases, which are crucial to the terpene synthesis pathway, generally respond to environmental factors such as pH, temperature, and available nutrients (Hoffmeister and Keller 2007, Yu and Keller 2005). Calvo et al. described the effect of the major nutrients, carbon and nitrogen, on the synthesis of secondary metabolites (Calvo et al. 2002). P. brumalis did not synthesize sesquiterpenes under all growth conditions. The differences in metabolites observed between P. brumalis grown in PDB and in modified medium highlighted the potential effect of inorganic sources such as C4H12N2O6, KH2PO4, MgSO4, and CaCl2 on sesquiterpene synthesis. β-Eudesmol was apparent throughout cultivation except when P. brumalis was grown on MgSO4-free medium.
These results demonstrated that MgSO4 can specifically control the biosynthesis of β-eudesmol. Magnesium has been reported as a cofactor that binds to sesquiterpene synthase (Agger et al. 2008). Specifically, the Mg2+ ions bind to two conserved metal-binding motifs. These metal ions complex with the substrate pyrophosphate, thereby promoting the ionization of the leaving group of FPP and resulting in the generation of a highly reactive allylic cation. The effect of the magnesium source on sesquiterpene biosynthesis was also identified via analysis of the concentration of total carbohydrates. Our current study offered further insight that fungal sesquiterpene biosynthesis can be controlled by nutrients. To profile the metabolites of P. brumalis, the cultures were extracted based on the growth curve. Although metabolites were produced during mycelial growth, it was difficult to detect significant changes in metabolite production, especially at low concentrations. These compounds may be of interest for understanding their synthetic mechanisms in P. brumalis. The synthesis of terpene compounds began during the growth phase at day 9; sesquiterpene synthesis occurred after growth was complete. At day 9, drimenol, farnesol, and mevalonic acid lactone were identified. Mevalonic acid lactone is the precursor of the mevalonate pathway and, in particular, a precursor of a number of biologically important lipids, including cholesterol hormones (Buckley et al. 2002). Farnesol is the precursor of sesquiterpenoids. Drimenol compounds, bicyclic sesquiterpene alcohols, can be synthesized from trans,trans-farnesol via cyclization and rearrangement (Polovinka et al. 1994); they have also been identified as secondary metabolites in the basidiomycete Lentinus lepideus. After 12 days, in the growth phase, β-elemene, caryophyllene, δ-cadinene, and eudesmane were detected together with β-eudesmol. The data showed the synthesis of sesquiterpene hydrocarbons with bicyclic structures. These compounds can be synthesized from FPP by cyclization. Cyclic terpenoids are synthesized through the formation of a carbon skeleton from linear precursors by terpene cyclase, followed by chemical modification such as oxidation, reduction, and methylation. Sesquiterpene cyclase is a key branch-point enzyme that catalyzes the complex intramolecular cyclization of the linear prenyl diphosphate into cyclic hydrocarbons (Toyomasu et al. 2007). After 20 days, in the stationary phase, the oxygenated structures eudesmol, elemol, and caryophyllene oxide were detected; thus, sesquiterpenes were identified after growth. From these results, we showed that terpene metabolism in wood-rotting fungi occurs in the stationary phase and can be controlled by magnesium supplementation of the growth medium. In conclusion, we identified P. brumalis as a wood-rotting fungus that can produce sesquiterpenes. To mechanistically understand eudesmane-type sesquiterpene biosynthesis in P. brumalis, further research into the genes regulating the dynamics of this biosynthesis is warranted.


Construction of Consumer Confidence index based on Sentiment analysis using News articles (뉴스기사를 이용한 소비자의 경기심리지수 생성)

  • Song, Minchae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.3
    • /
    • pp.1-27
    • /
    • 2017
  • It is known that the economic sentiment index and macroeconomic indicators are closely related because economic agent's judgment and forecast of the business conditions affect economic fluctuations. For this reason, consumer sentiment or confidence provides steady fodder for business and is treated as an important piece of economic information. In Korea, private consumption accounts and consumer sentiment index highly relevant for both, which is a very important economic indicator for evaluating and forecasting the domestic economic situation. However, despite offering relevant insights into private consumption and GDP, the traditional approach to measuring the consumer confidence based on the survey has several limits. One possible weakness is that it takes considerable time to research, collect, and aggregate the data. If certain urgent issues arise, timely information will not be announced until the end of each month. In addition, the survey only contains information derived from questionnaire items, which means it can be difficult to catch up to the direct effects of newly arising issues. The survey also faces potential declines in response rates and erroneous responses. Therefore, it is necessary to find a way to complement it. For this purpose, we construct and assess an index designed to measure consumer economic sentiment index using sentiment analysis. Unlike the survey-based measures, our index relies on textual analysis to extract sentiment from economic and financial news articles. In particular, text data such as news articles and SNS are timely and cover a wide range of issues; because such sources can quickly capture the economic impact of specific economic issues, they have great potential as economic indicators. There exist two main approaches to the automatic extraction of sentiment from a text, we apply the lexicon-based approach, using sentiment lexicon dictionaries of words annotated with the semantic orientations. 
In creating the sentiment lexicon dictionaries, we enter the semantic orientation of individual words manually, though we do not attempt a full linguistic analysis (one that involves analysis of word senses or argument structure); this is the limitation of our research and further work in that direction remains possible. In this study, we generate a time series index of economic sentiment in the news. The construction of the index consists of three broad steps: (1) Collecting a large corpus of economic news articles on the web, (2) Applying lexicon-based methods for sentiment analysis of each article to score the article in terms of sentiment orientation (positive, negative and neutral), and (3) Constructing an economic sentiment index of consumers by aggregating monthly time series for each sentiment word. In line with existing scholarly assessments of the relationship between the consumer confidence index and macroeconomic indicators, any new index should be assessed for its usefulness. We examine the new index's usefulness by comparing other economic indicators to the CSI. To check the usefulness of the newly index based on sentiment analysis, trend and cross - correlation analysis are carried out to analyze the relations and lagged structure. Finally, we analyze the forecasting power using the one step ahead of out of sample prediction. As a result, the news sentiment index correlates strongly with related contemporaneous key indicators in almost all experiments. We also find that news sentiment shocks predict future economic activity in most cases. In almost all experiments, the news sentiment index strongly correlates with related contemporaneous key indicators. Furthermore, in most cases, news sentiment shocks predict future economic activity; in head-to-head comparisons, the news sentiment measures outperform survey-based sentiment index as CSI. Policy makers want to understand consumer or public opinions about existing or proposed policies. 
To capture such opinions and enable government decision-makers to respond quickly, various web media, SNS, and news articles can be monitored. Textual data such as news articles and social networks (Twitter, Facebook, and blogs) are generated at high speed and cover a wide range of issues; because such sources can quickly capture the economic impact of specific issues, they have great potential as economic indicators. Although research using unstructured data in economic analysis is in its early stages, the utilization of such data is expected to increase greatly once its usefulness is confirmed.
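
The lexicon-based scoring and monthly aggregation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy English lexicon, the sample articles, and the simple sign-of-sum scoring rule are all assumptions (the paper uses manually built Korean sentiment lexicons over a large news corpus).

```python
from collections import defaultdict

# Hypothetical toy sentiment lexicon mapping words to semantic orientations;
# the paper's actual Korean lexicon was built manually and is not shown here.
LEXICON = {"growth": +1, "recovery": +1, "surge": +1,
           "recession": -1, "slump": -1, "decline": -1}

def score_article(text):
    """Score one article as +1 (positive), -1 (negative) or 0 (neutral)
    by summing the orientations of the lexicon words it contains."""
    total = sum(LEXICON.get(t, 0) for t in text.lower().split())
    return (total > 0) - (total < 0)   # sign of the sum

def monthly_index(articles):
    """Aggregate per-article scores into a monthly sentiment index:
    mean score per month, in [-1, +1]."""
    by_month = defaultdict(list)
    for month, text in articles:
        by_month[month].append(score_article(text))
    return {m: sum(s) / len(s) for m, s in sorted(by_month.items())}

# Invented sample corpus, keyed by publication month.
articles = [("2023-01", "exports surge and recovery continues"),
            ("2023-01", "a deep slump in manufacturing"),
            ("2023-02", "decline in consumption amid recession")]
print(monthly_index(articles))
```

Averaging article-level sign scores per month is one simple aggregation choice; the resulting series can then be compared against the CSI by cross-correlation, as the study does.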

Analysis of the Time-dependent Relation between TV Ratings and the Content of Microblogs (TV 시청률과 마이크로블로그 내용어와의 시간대별 관계 분석)

  • Choeh, Joon Yeon;Baek, Haedeuk;Choi, Jinho
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.1
    • /
    • pp.163-176
    • /
    • 2014
  • Social media is becoming the platform for users to communicate their activities, status, emotions, and experiences to other people. In recent years, microblogs such as Twitter have gained in popularity because of their ease of use, speed, and reach. Compared to a conventional web blog, a microblog lowers users' effort and investment in content generation by encouraging shorter posts. There has been a lot of research into capturing social phenomena and analyzing the chatter of microblogs, but measuring television ratings has received little attention so far. Currently, the most common method of measuring TV ratings uses an electronic metering device installed in a small number of sampled households. Microblogs allow users to post short messages, share daily updates, and conveniently keep in touch; in the same way, microblog users interact with each other while watching television or movies, or visiting a new place. When measuring TV ratings, some features are significant during certain hours of the day or days of the week, whereas the same features are meaningless during other time periods. Thus, the importance of features can change during the day, and a model capturing this time-sensitive relevance is required to estimate TV ratings. Modeling the time-related characteristics of features is therefore key to measuring TV ratings through microblogs, and we show that capturing the time-dependency of features is vital to improving accuracy. To explore the relationship between the content of microblogs and TV ratings, we collected Twitter data using the Get Search component of the Twitter REST API from January 2013 to October 2013. There are about 300 thousand posts in our data set. After excluding data such as advertising or promoted tweets, we selected 149 thousand tweets for analysis.
The number of tweets reaches its maximum on the broadcasting day and increases rapidly around the broadcasting time. This result stems from the characteristics of the public channel, which broadcasts the program at a predetermined time. From our analysis, we find that count-based features such as the number of tweets or retweets have a low correlation with TV ratings, implying that a simple tweet rate does not reflect satisfaction with, or response to, the TV programs. Content-based features extracted from the content of tweets have a relatively high correlation with TV ratings. Further, some emoticons and newly coined words that are not tagged in the morpheme extraction process have a strong relationship with TV ratings. We also find a time-dependency in feature correlations between the periods before and after the broadcasting time. Since the TV program is broadcast regularly at a predetermined time, users post tweets expressing their expectation of the program or disappointment over not being able to watch it. The features that are highly correlated before the broadcast differ from those after the broadcast, showing that the relevance of words to TV programs can change according to the time of the tweets. Among the 336 words that fulfill the minimum requirements for candidate features, 145 words have their highest correlation before the broadcasting time, whereas 68 words reach their highest correlation after broadcasting. Interestingly, some words that express the impossibility of watching the program show a high relevance despite their negative meaning. Understanding the time-dependency of features can help improve the accuracy of TV ratings measurement. This research contributes a basis for estimating the response to, or satisfaction with, broadcast programs using the time-dependency of words in Twitter chatter.
More research is needed to refine the methodology for predicting or measuring TV ratings.
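
The before/after-broadcast correlation analysis described above can be illustrated with a plain Pearson correlation. Everything below is a sketch: the weekly ratings and word-count figures are invented for illustration, whereas the paper's actual features came from morpheme-extracted Twitter data collected via the REST API.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-episode data: counts of one word in tweets posted
# before vs. after the broadcast, alongside each episode's TV rating.
ratings     = [10.2, 11.5, 9.8, 12.1, 13.0]
word_before = [40, 55, 35, 60, 70]   # e.g. an expectation word
word_after  = [12, 9, 14, 8, 7]      # e.g. a disappointment word

r_before = pearson(ratings, word_before)
r_after  = pearson(ratings, word_after)

# A feature can be informative only in one time window: here the word
# tracks ratings before the broadcast but moves oppositely afterwards.
print(round(r_before, 3), round(r_after, 3))
```

Computing the coefficient separately per time window is exactly the kind of time-dependent relevance the study argues a ratings model must capture.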

Automatic gasometer reading system using selective optical character recognition (관심 문자열 인식 기술을 이용한 가스계량기 자동 검침 시스템)

  • Lee, Kyohyuk;Kim, Taeyeon;Kim, Wooju
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.2
    • /
    • pp.1-25
    • /
    • 2020
  • In this paper, we suggest an application system architecture which provides an accurate, fast, and efficient automatic gasometer reading function. The system captures a gasometer image using a mobile device camera, transmits the image to a cloud server over a private LTE network, and analyzes the image to extract the device ID and gas usage amount by selective optical character recognition based on deep learning. In general, an image contains many types of characters, and optical character recognition extracts all of them; but some applications need to ignore not-of-interest character types and focus only on specific ones. For example, an automatic gasometer reading system only needs to extract the device ID and gas usage amount from gasometer images in order to bill users. Not-of-interest character strings, such as device type, manufacturer, manufacturing date, and specification, are not valuable to the application. Thus, the application has to analyze only the region of interest and the specific character types in it to extract valuable information. We adopted CNN (Convolutional Neural Network)-based object detection and CRNN (Convolutional Recurrent Neural Network) technology for selective optical character recognition, which analyzes only the region of interest. We built three neural networks for the application system.
The first is a convolutional neural network that detects the regions of interest containing the gas usage amount and device ID character strings; the second is another convolutional neural network that transforms the spatial information of a region of interest into sequential feature vectors; and the third is a bidirectional long short-term memory network that converts the sequential features into character strings by mapping feature vectors to characters through time-series analysis. In this research, the character strings of interest are the device ID, which consists of 12 Arabic numerals, and the gas usage amount, which consists of 4-5 Arabic numerals. All system components are implemented on the Amazon Web Services cloud with an Intel Xeon E5-2686 v4 CPU and an NVIDIA Tesla V100 GPU. The architecture adopts a master-slave processing structure for efficient, fast parallel processing, coping with about 700,000 requests per day. The mobile device captures the gasometer image and transmits it to the master process in the AWS cloud. The master process runs on the Intel Xeon CPU and pushes each reading request from a mobile device into an input queue with a FIFO (First In, First Out) structure. The slave process consists of the three deep neural networks that conduct the character recognition and runs on the NVIDIA GPU; it continuously polls the input queue for recognition requests. When a request arrives, the slave process converts the image into the device ID character string, the gas usage amount character string, and the position information of the strings, places the result in the output queue, and returns to polling the input queue. The master process gets the final information from the output queue and delivers it to the mobile device. We used a total of 27,120 gasometer images for training, validation, and testing of the three deep neural networks.
22,985 images were used for training and validation, and 4,135 images for testing. We randomly split the 22,985 images at an 8:2 ratio into training and validation sets for each training epoch. The 4,135 test images were categorized into five types (normal, noise, reflex, scale, and slant): normal means clean image data; noise means images with noise signals; reflex means images with light reflection in the gasometer region; scale means images with a small object size due to long-distance capturing; and slant means images that are not horizontally flat. The final character string recognition accuracies for the device ID and gas usage amount on normal data are 0.960 and 0.864, respectively.
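
The master-slave queueing structure described above can be sketched in a few lines. This is a schematic, not the paper's implementation: in-process queues and a thread stand in for the cloud input/output queues, and recognize() returns hypothetical strings in place of the three-network GPU pipeline.

```python
import queue
import threading

# FIFO queues: the master pushes reading requests into input_q,
# the slave returns recognition results through output_q.
input_q, output_q = queue.Queue(), queue.Queue()

def recognize(image_id):
    # Placeholder for the GPU recognition step (region detection ->
    # feature extraction -> BiLSTM decoding); the device ID and usage
    # strings returned here are made-up examples.
    return {"image": image_id, "device_id": "000000000042", "usage": "1234"}

def slave():
    while True:
        req = input_q.get()           # poll the FIFO input queue
        if req is None:               # sentinel: shut down the worker
            break
        output_q.put(recognize(req))  # push the result for the master

worker = threading.Thread(target=slave)
worker.start()

for image_id in ["img-001", "img-002"]:  # master: enqueue requests
    input_q.put(image_id)
input_q.put(None)

worker.join()
results = [output_q.get() for _ in range(2)]
print([r["image"] for r in results])
```

Because the input queue is FIFO and a single slave serves it, requests are answered in arrival order; in the deployed system, multiple GPU-backed slaves would poll the same queue to absorb the roughly 700,000 daily requests in parallel.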