• Title/Summary/Keyword: Data Utilization (데이터 활용)

A Study on analysis of contrasts and variation in SUV with the passage of uptake time in 18F-FDOPA Brain PET/CT (18F-FDOPA Brain PET/CT 검사의 영상 대조도 분석 및 섭취 시간에 따른 SUV변화 고찰)

  • Seo, Kang rok;Lee, Jeong eun;Ko, Hyun soo;Ryu, Jae kwang;Nam, Ki pyo
    • The Korean Journal of Nuclear Medicine Technology
    • /
    • v.23 no.1
    • /
    • pp.69-74
    • /
    • 2019
  • Purpose: $^{18}F$-FDOPA, an amino acid tracer, is particularly attractive for imaging brain tumors because of its high uptake in tumor tissue and low uptake in normal brain tissue, whereas $^{18}F$-FDG shows high uptake in both tumor tissue and normal brain tissue. The purpose of this study was to compare the contrast of $^{18}F$-FDOPA Brain PET/CT with that of $^{18}F$-FDG Brain PET/CT and to find the optimal scan time by analyzing the variation in SUV with uptake time. Materials and Methods: A region of interest of approximately $350 mm^2$ was placed at the center of the tumor and of the cerebellum in 12 patients ($51.4{\pm}12.8$ yrs) who had undergone both $^{18}F$-FDG Brain PET/CT and $^{18}F$-FDOPA Brain PET/CT at least once each. The $SUV_{max}$ was measured, and the tumor-to-cerebellum $SUV_{max}$ ratio (T/C ratio) was calculated. For the SUV analysis, list-mode data from 25 patients ($49.{\pm}10.3$ yrs) were divided into 15 frames of 2 minutes each, and the T/C ratio was calculated for each frame. SPSS 21 was used to compare the T/C ratios of $^{18}F$-FDOPA and $^{18}F$-FDG. Results: The T/C ratio of $^{18}F$-FDOPA Brain PET/CT was higher than that of $^{18}F$-FDG Brain PET/CT, and the difference was significant on a paired t-test (t=-5.214, p<0.001). In the analysis of changes over time, $SUV_{max}$ peaked at $5.6{\pm}2.9$ in the fourth frame (6 to 8 minutes), and the T/C ratio also peaked in the fourth frame (6 to 8 minutes). Comparing the conventional 10-to-30-minute image with a 6-to-26-minute image, the $SUV_{max}$ and T/C ratio of the 6-to-26-minute image were higher by 0.2 and 0.1, respectively. Conclusion: $^{18}F$-FDOPA Brain PET/CT is advantageous for image reading because its T/C ratio was higher than that of $^{18}F$-FDG Brain PET/CT. In addition, for $^{18}F$-FDOPA Brain PET/CT there was no meaningful difference between the conventional 10-to-30-minute image and the 6-to-26-minute image. Continued research may establish whether the examination time of $^{18}F$-FDOPA Brain PET/CT can be shortened, and the additional scan data may help physicians read images more accurately.
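The quantitative core of this study is a ratio and a paired comparison: a tumor-to-cerebellum ratio of $SUV_{max}$ per time frame, and a paired t-test between tracers. A minimal sketch of that computation follows; the arrays and values below are hypothetical stand-ins, not the study's data or software:

    import numpy as np
    from scipy import stats

    # Hypothetical SUVmax values: rows = patients, columns = 15 frames of 2 min.
    rng = np.random.default_rng(0)
    suvmax_tumor = rng.uniform(2, 7, size=(25, 15))
    suvmax_cerebellum = rng.uniform(1, 3, size=(25, 15))

    # Tumor-to-cerebellum ratio (T/C ratio) per patient and frame.
    tc_ratio = suvmax_tumor / suvmax_cerebellum

    # Frame (0-based) where the mean T/C ratio peaks; frame 3 = minutes 6-8.
    peak = int(np.argmax(tc_ratio.mean(axis=0)))
    print(f"peak T/C ratio in frame {peak} ({peak * 2}-{peak * 2 + 2} min)")

    # Paired t-test between per-patient T/C ratios of the two tracers
    # (simulated here; the study compared 12 patients scanned with both).
    tc_fdg = rng.uniform(1.0, 2.0, size=12)
    tc_fdopa = tc_fdg + rng.uniform(0.2, 1.0, size=12)
    t_stat, p_value = stats.ttest_rel(tc_fdg, tc_fdopa)
    print(f"paired t-test: t={t_stat:.3f}, p={p_value:.3f}")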

Analysis of Korean Dietary Patterns using Food Intake Data - Focusing on Kimchi and Alcoholic Beverages (식품섭취량을 활용한 우리나라 식이 패턴 분석 - 김치류 및 주류 중심으로)

  • Kim, Soo-Hwaun;Choi, Jang-Duck;Kim, Sheen-Hee;Lee, Joon-Goo;Kwon, Yu-Jihn;Shin, Choonshik;Shin, Min-Su;Chun, So-Young;Kang, Gil-Jin
    • Journal of Food Hygiene and Safety
    • /
    • v.34 no.3
    • /
    • pp.251-262
    • /
    • 2019
  • In this study, we analyzed Korean dietary habits using food intake data from the Korea National Health and Nutrition Examination Survey (KNHANES) of the Korea Centers for Disease Control and Prevention, and we proposed a set of management guidelines for future Korean dietary habits. A total of 839 food items (1,419 foods) were analyzed according to the food categories in the "Food Code", the representative food classification system in Korea. The average total daily food intake was 1,585.77 g/day, with raw and processed foods accounting for 858.96 g/day and 726.81 g/day, respectively. Cereal grains contributed the highest proportion of food intake. Among the top 15 consumed food groups, over 90% of subjects consumed cereal grains (99.09%) and root and tuber vegetables (95.80%). In the analysis by item, rice, Korean cabbage kimchi, apple, radish, egg, chili pepper, onion, wheat, soybean curd, potato, cucumber, and pork were the items consumed in major amounts (at least 1% of the average daily intake, 15.86 g/day) and frequently (by more than 25% of subjects, 5,168 persons), and Korean spices were at the top of this list. Among kimchi types, the intake of Korean cabbage kimchi (64.89 g/day) was the highest. Among alcoholic beverages, intake was highest in the order of beer (63.53 g/day), soju (39.11 g/day), and makgeolli (19.70 g/day), while intake frequency was highest in the order of soju (11.3%), beer (7.2%), and sake (6.6%). Analysis of intake trends over time showed that intake of cereal grains has steadily decreased while that of beverages has slightly risen. The consumption frequency of some alcoholic beverages, including makgeolli, wine, sake, and black raspberry wine, has decreased gradually year by year, and the consumption of kimchi has been gradually decreasing as well.
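The abstract's definition of "major and frequently consumed" items is easy to operationalize: an item qualifies if its average daily intake is at least 1% of total average intake and it is eaten by more than 25% of subjects. A minimal sketch in Python, with a hypothetical toy table standing in for the KNHANES records:

    import pandas as pd

    # Hypothetical long-format intake records: one row per subject-item pair.
    records = pd.DataFrame({
        "subject": [1, 1, 2, 2, 3, 3, 3, 4],
        "item": ["rice", "kimchi", "rice", "beer", "rice", "kimchi", "apple", "rice"],
        "grams": [250.0, 60.0, 300.0, 150.0, 280.0, 70.0, 120.0, 260.0],
    })
    n_subjects = records["subject"].nunique()

    # Average daily intake per item over all subjects (non-consumers count as 0).
    avg_intake = records.groupby("item")["grams"].sum() / n_subjects
    total_daily = avg_intake.sum()

    # Share of subjects consuming each item at least once.
    consumers = records.groupby("item")["subject"].nunique() / n_subjects

    major_and_frequent = avg_intake[(avg_intake >= 0.01 * total_daily)
                                    & (consumers > 0.25)]
    print(major_and_frequent.sort_values(ascending=False))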

Deriving adoption strategies of deep learning open source framework through case studies (딥러닝 오픈소스 프레임워크의 사례연구를 통한 도입 전략 도출)

  • Choi, Eunjoo;Lee, Junyeong;Han, Ingoo
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.4
    • /
    • pp.27-65
    • /
    • 2020
  • Many information and communication technology companies have released their internally developed AI technologies to the public, for example Google's TensorFlow, Facebook's PyTorch, and Microsoft's CNTK. By releasing deep learning software as open source, a company can strengthen its relationship with the developer community and the artificial intelligence (AI) ecosystem, while users can experiment with, implement, and improve the software. Accordingly, the field of machine learning is growing rapidly, and developers are using and reproducing various learning algorithms in each field. Although open source software has been analyzed from various angles, studies that help industry develop or use deep learning open source software are lacking. This study therefore attempts to derive an adoption strategy through case studies of a deep learning open source framework. Based on the technology-organization-environment (TOE) framework and a literature review on open source software adoption, we employed a case study framework that includes technological factors (perceived relative advantage, perceived compatibility, perceived complexity, and perceived trialability), organizational factors (management support and knowledge & expertise), and environmental factors (availability of technology skills and services, and platform long-term viability). We conducted a case study analysis of three companies' adoption cases (two successes and one failure) and found that seven of the eight TOE factors, along with several factors concerning the company, team, and resources, are significant for the adoption of a deep learning open source framework. By organizing the case study results, we identified five important success factors for adopting a deep learning framework: the knowledge and expertise of the developers in the team, the hardware (GPU) environment, a data enterprise cooperation system, a deep learning framework platform, and a deep learning framework work tool service. For an organization to successfully adopt a deep learning open source framework, at the stage of using the framework: first, the hardware (GPU) environment for the AI R&D group must support the knowledge and expertise of the developers in the team; second, the use of deep learning frameworks by research developers should be supported by collecting and managing data inside and outside the company through a data enterprise cooperation system; third, deep learning research expertise must be supplemented through cooperation with researchers from academic institutions such as universities and research institutes. By satisfying these three procedures at the usage stage, companies will increase the number of deep learning research developers, the ability to use the deep learning framework, and the availability of GPU resources. In the proliferation stage of the deep learning framework: fourth, the company builds a deep learning framework platform that improves the research efficiency and effectiveness of the developers, for example by automatically optimizing the hardware (GPU) environment; fifth, the deep learning framework tool service team complements the developers' expertise by sharing information from the external deep learning open source community with the in-house community and by activating developer retraining and seminars.
To implement the five identified success factors, a step-by-step enterprise procedure for adopting a deep learning framework was proposed: defining the project problem, confirming that deep learning is the right methodology, confirming that a deep learning framework is the right tool, using the deep learning framework in the enterprise, and spreading the framework across the enterprise. The first three steps (defining the project problem, confirming the methodology, and confirming the tool) are pre-considerations for adopting a deep learning open source framework. Once these three pre-considerations are clear, the next two steps (using the framework in the enterprise and spreading it across the enterprise) can proceed. In the fourth step, the knowledge and expertise of the developers in the team are important, in addition to the hardware (GPU) environment and the data enterprise cooperation system. In the final step, the five important factors are realized for a successful adoption of the deep learning open source framework. This study provides strategic implications for companies adopting or using a deep learning framework according to the needs of each industry and business.

A study on the classification of research topics based on COVID-19 academic research using Topic modeling (토픽모델링을 활용한 COVID-19 학술 연구 기반 연구 주제 분류에 관한 연구)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.1
    • /
    • pp.155-174
    • /
    • 2022
  • From January 2020 to October 2021, more than 500,000 academic studies related to COVID-19 (coronavirus disease 2019, a severe respiratory syndrome) were published. The rapid increase in the number of papers related to COVID-19 places time and technical constraints on healthcare professionals and policy makers who need to find important research quickly. In this study, we therefore propose a method of extracting useful information from the text of this extensive literature using the LDA and Word2vec algorithms. Papers matching the keywords of interest were extracted from the COVID-19 papers, and their detailed topics were identified. The data came from the CORD-19 dataset on Kaggle, a free academic resource prepared by major research groups and the White House in response to the COVID-19 pandemic and updated weekly. The research method has two main parts. First, 41,062 articles were obtained through data filtering and pre-processing of the abstracts of 47,110 academic papers that include full text. Exploratory data analysis with a Python program was used to count COVID-19 publications by year and to identify the top 10 journals with active research. The LDA and Word2vec algorithms were used to derive research topics related to COVID-19, and after analyzing related words, their similarity was measured. Second, papers containing 'vaccine' and 'treatment' were extracted from the topics derived from all papers, yielding 4,555 papers related to 'vaccine' and 5,971 papers related to 'treatment'. For each collection, detailed topics were analyzed with the LDA and Word2vec algorithms, and clustering through PCA dimension reduction was applied, using the t-SNE algorithm to visualize groups of papers with similar themes.
A noteworthy finding of this study is that topics that did not emerge from the topic modeling of the full set of COVID-19 papers did emerge from the topic modeling performed separately for each research topic. For example, topic modeling of the papers related to 'vaccine' extracted a new topic, Topic 05 'neutralizing antibodies'. A neutralizing antibody is an antibody that protects cells from infection when a virus enters the body, and is said to play an important role in the production of therapeutic agents and in vaccine development. Likewise, extracting topics from the papers related to 'treatment' revealed a new topic, Topic 05 'cytokine'. A cytokine storm occurs when the body's immune cells attack normal cells instead of defending against the attack. Hidden topics that could not be found across the entire corpus were thus classified by keyword, and topic modeling was performed again to find these detailed topics. In this study, we proposed a method of extracting topics from a large body of literature using the LDA algorithm and extracting similar words using the skip-gram variant of Word2vec, which predicts context words from a center word. Combining the LDA model and the Word2vec model aimed to achieve better performance by relating documents to LDA topics and to one another through Word2vec similarity. In addition, as a clustering method based on PCA dimension reduction, we presented a way of intuitively classifying documents by using the t-SNE technique to group documents with similar themes into a structured organization. In a situation where academic papers on COVID-19 are published faster than the many researchers working to overcome COVID-19 can keep up with, this approach can save the precious time and effort of healthcare professionals and policy makers and help them rapidly gain new insights. It is also expected to serve as basic data for researchers exploring new research directions.
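The pipeline described here (LDA for topics, skip-gram Word2vec for similar words, PCA plus t-SNE for visual grouping) can be sketched compactly. The following is an illustrative Python sketch on a toy corpus, not the paper's actual CORD-19 code; the package choices (gensim, scikit-learn) and all data are assumptions:

    import numpy as np
    from gensim import corpora, models
    from gensim.models import Word2Vec
    from sklearn.decomposition import PCA
    from sklearn.manifold import TSNE

    # Toy tokenized corpus standing in for CORD-19 abstracts.
    docs = [["vaccine", "antibody", "trial"],
            ["treatment", "cytokine", "drug"],
            ["vaccine", "dose", "immunity"],
            ["treatment", "drug", "patient"]]

    # LDA: derive topics from the bag-of-words corpus.
    dictionary = corpora.Dictionary(docs)
    bow = [dictionary.doc2bow(d) for d in docs]
    lda = models.LdaModel(bow, num_topics=2, id2word=dictionary, passes=10)
    print(lda.print_topics())

    # Word2vec in skip-gram mode (sg=1): words similar to a query term.
    w2v = Word2Vec(docs, vector_size=16, window=2, min_count=1, sg=1, epochs=50)
    print(w2v.wv.most_similar("vaccine", topn=3))

    # Document-topic vectors -> PCA -> t-SNE for a 2-D map of similar papers.
    doc_topics = np.array([[prob for _, prob in
                            lda.get_document_topics(b, minimum_probability=0.0)]
                           for b in bow])
    reduced = PCA(n_components=2).fit_transform(doc_topics)
    embedded = TSNE(n_components=2, perplexity=2, init="random",
                    random_state=0).fit_transform(reduced)
    print(embedded)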

    The Pattern Analysis of Financial Distress for Non-audited Firms using Data Mining (데이터마이닝 기법을 활용한 비외감기업의 부실화 유형 분석)

    • Lee, Su Hyun;Park, Jung Min;Lee, Hyoung Yong
      • Journal of Intelligence and Information Systems
      • /
      • v.21 no.4
      • /
      • pp.111-131
      • /
      • 2015
    • Only a handful of studies have analyzed the patterns of corporate distress, compared with the many studies on bankruptcy prediction. The few that exist mainly focus on audited firms, because financial data are easier to collect for these firms. In reality, however, corporate financial distress is a far more common and critical phenomenon for non-audited firms, which are mainly small and medium sized firms. The purpose of this paper is to classify non-audited firms under distress according to their financial ratios using a data mining technique, the Self-Organizing Map (SOM). SOM is a type of artificial neural network that is trained with unsupervised learning to produce a lower-dimensional discretized representation of the input space of the training samples, called a map. SOM differs from other artificial neural networks in that it applies competitive learning rather than error-correction learning such as backpropagation with gradient descent, and in that it uses a neighborhood function to preserve the topological properties of the input space. It is a popular and successful clustering algorithm. In this study, we classify the types of financially distressed firms, specifically non-audited firms. In the empirical test, we collected 10 financial ratios of 100 non-audited firms that fell into distress in 2004, covering the previous two years (2002 and 2003). Using these financial ratios and the SOM algorithm, five distinct patterns were distinguished. In pattern 1, financial distress was very serious in almost all financial ratios; 12% of the firms fell into this pattern. In pattern 2, financial distress was weak in almost all financial ratios; 14% of the firms fell into this pattern. In pattern 3, the growth ratio was the worst among all patterns; the firms of this pattern may have been under distress due to severe competition in their industries, and approximately 30% of the firms fell into this group. In pattern 4, the growth ratio was higher than in any other pattern, but the cash ratio and profitability ratio did not keep pace with it; the firms of this pattern appear to have fallen into distress while expanding their business, and about 25% of the firms were in this pattern. Last, pattern 5 encompassed very solvent firms; these firms were perhaps distressed by a bad short-term strategic decision or by problems with the firms' entrepreneurs, and approximately 18% of the firms fell under this pattern. This study makes both an academic and an empirical contribution. Academically, non-audited companies, which tend to go bankrupt easily and whose financial data are unstructured or easily manipulated, are classified with a data mining technique (the Self-Organizing Map), rather than large audited firms with well-prepared and reliable financial data. Empirically, even though only the financial data of non-audited firms are analyzed, the analysis is useful for finding first-order symptoms of financial distress, which helps to forecast bankruptcy and to manage early-warning signals. A limitation of this research is that only 100 corporations were analyzed, owing to the difficulty of collecting financial data on non-audited firms, which made it hard to proceed to an analysis by category or size.
Also, non-financial qualitative data are crucial for the analysis of bankruptcy; such qualitative factors should be taken into account in future work. This study sheds some light on distress prediction for non-audited small and medium sized firms.
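Since SOM is the core technique here, a minimal sketch may help. The example below uses the third-party minisom package (an assumption, not the authors' tool) on hypothetical standardized ratios:

    import numpy as np
    from minisom import MiniSom  # pip install minisom

    # Hypothetical matrix: 100 firms x 10 standardized financial ratios.
    rng = np.random.default_rng(0)
    ratios = rng.standard_normal((100, 10))

    # A small map: nodes compete for each input and drag their neighbors
    # along, preserving the topology of the ratio space.
    som = MiniSom(x=3, y=2, input_len=10, sigma=1.0, learning_rate=0.5,
                  random_seed=0)
    som.random_weights_init(ratios)
    som.train_random(ratios, num_iteration=1000)

    # Assign each firm to its best-matching node; the map nodes play the
    # role of the five distress patterns described in the paper.
    assignments = np.array([som.winner(row) for row in ratios])
    nodes, counts = np.unique(assignments, axis=0, return_counts=True)
    for node, count in zip(nodes, counts):
        print(f"node {tuple(node)}: {count} firms ({count}% of the sample)")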

    An Expert System for the Estimation of the Growth Curve Parameters of New Markets (신규시장 성장모형의 모수 추정을 위한 전문가 시스템)

    • Lee, Dongwon;Jung, Yeojin;Jung, Jaekwon;Park, Dohyung
      • Journal of Intelligence and Information Systems
      • /
      • v.21 no.4
      • /
      • pp.17-35
      • /
      • 2015
    • Demand forecasting is the activity of estimating the quantity of a product or service that consumers will purchase over a certain period of time. Developing precise forecasting models is considered important, since corporations can make strategic decisions on new markets based on the future demand the models estimate. Many studies have developed market growth curve models, such as the Bass, Logistic, and Gompertz models, which estimate future demand when a market is in its early stage. Among these, the Bass model, which explains demand through two types of adopters, innovators and imitators, has been widely used in forecasting. Such models require sufficient demand observations to ensure reliable results. At the beginning of a new market, however, the observations are not sufficient for the models to estimate the market's future demand precisely. For this reason, demand inferred from the most adjacent markets is often used as a reference in such cases. Reference markets can be those whose products are developed with the same categorical technologies. A market's demand may be expected to follow a pattern similar to that of a reference market when the adoption pattern of a product in the market is determined mainly by the technology related to the product. However, this process does not always ensure satisfactory results, because the similarity between markets is judged by intuition and/or experience. There are two major drawbacks that human experts cannot handle effectively in this approach: the abundance of candidate reference markets to consider, and the difficulty of calculating the similarity between markets. First, there can be too many markets to consider when selecting reference markets. Mostly, markets in the same category of an industrial hierarchy can serve as reference markets because they are usually based on similar technologies. However, markets can be classified into different categories even if they are based on the same generic technologies, so markets in other categories also need to be considered as potential candidates. Second, even domain experts cannot consistently calculate the similarity between markets with their own qualitative standards. This inconsistency implies missing adjacent reference markets, which may lead to imprecise estimation of future demand. Even when no reference markets are missing, the new market's parameters can hardly be estimated from the reference markets without quantitative standards. For this reason, this study proposes a case-based expert system that helps experts overcome these drawbacks in discovering reference markets. First, the study proposes using the Euclidean distance measure to calculate the similarity between markets. Based on their similarities, markets are grouped into clusters, and missing markets with the characteristics of each cluster are searched for. Potential candidate reference markets are extracted and recommended to users. After iterating these steps, definite reference markets are determined according to the user's selection among the candidates, and finally the new market's parameters are estimated from the reference markets. Two techniques are used in this procedure: a clustering data mining technique, and the content-based filtering of recommender systems. The proposed system, implemented with these techniques, can determine the most adjacent markets based on whether a user accepts the candidate markets.
Experiments involving five ICT experts were conducted to validate the usefulness of the system. In the experiments, the experts were given a list of 16 ICT markets whose parameters were to be estimated. For each market, the experts first estimated the parameters of its growth curve model by intuition, and then with the system. A comparison of the results shows that the experts' parameter estimates are closer when they use the system than when they guess without it.
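Once reference markets are chosen, the new market's growth curve parameters must be estimated. As an illustration of what fitting a Bass model looks like (a sketch on synthetic data, not the paper's system), the cumulative-adoption form $N(t) = m(1 - e^{-(p+q)t}) / (1 + (q/p)e^{-(p+q)t})$ can be fitted with nonlinear least squares:

    import numpy as np
    from scipy.optimize import curve_fit

    def bass_cumulative(t, m, p, q):
        # N(t) = m * (1 - e^{-(p+q)t}) / (1 + (q/p) * e^{-(p+q)t})
        e = np.exp(-(p + q) * t)
        return m * (1 - e) / (1 + (q / p) * e)

    # Synthetic "observed" demand from a hypothetical reference market.
    t = np.arange(1, 11, dtype=float)
    truth = bass_cumulative(t, m=10000, p=0.03, q=0.4)
    observed = truth + np.random.default_rng(1).normal(0, 100, size=t.size)

    # Fit m, p, q with rough initial guesses and positivity bounds.
    (m_hat, p_hat, q_hat), _ = curve_fit(
        bass_cumulative, t, observed,
        p0=[observed[-1] * 2, 0.01, 0.3],
        bounds=([0, 1e-6, 1e-6], [np.inf, 1.0, 1.0]))
    print(f"m={m_hat:.0f}, p={p_hat:.4f}, q={q_hat:.4f}")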

    Study on 3D Printer Suitable for Character Merchandise Production Training (캐릭터 상품 제작 교육에 적합한 3D프린터 연구)

    • Kwon, Dong-Hyun
      • Cartoon and Animation Studies
      • /
      • s.41
      • /
      • pp.455-486
      • /
      • 2015
    • The 3D printing technology, which began with a patent registration in 1986, attracted little attention outside a handful of companies because of the low awareness at the time. Today, however, as patents expire after their 20-year term, the price of 3D printers has fallen to a level where individuals can buy them, and the technology is drawing attention from industry as well as the general public, helped by the generalization of online information exchange and improvements in computer performance that make it natural to accept 3D work and share 3D data. Because 3D printing is based on digital data, which can be transmitted, revised, and supplemented digitally, and because it requires no molding, it may bring a groundbreaking change to the manufacturing process, and it can have the same effect in the character merchandise sector. 3D printers are becoming a necessity in producing the figure merchandise at the forefront of the recently popular kidult culture. Given the expected demand from industrial sites related to character merchandise, and the lower prices that follow patent expiration and technology sharing, introducing courses that train people to use 3D printers at education sites seems essential for expanding employment opportunities and cultivating people capable of further creative work. However, the information available when seeking to introduce 3D printers into school education is limited. The press and information media mention only general facts, such as the growth of the industry or the promising future value of 3D printers, and academic research likewise remains at an introductory level, analyzing the industry's size and the technology's applicable scope or introducing the printing methods. This lack of information causes problems at education sites: introducing the technology without examining practical details, such as a comparison of strengths and weaknesses, forces schools to incur time and opportunity costs while learning through trial and error, and the losses are significant if an expensive machine turns out to be unsuited to school education. This research targeted general users without a technical background rather than specialists. Instead of merely introducing 3D printer technologies as previous work has done, it compares their strengths and weaknesses and analyzes the problems and cautions that arise in use, explains what features a 3D printer should have when used in education for developing figure merchandise as optional cultural content in cartoon-related departments, and seeks to provide information of practical help for future education using 3D printers. In the main body, the technologies are classified from a new perspective: by support-structure method, types of materials, two-dimensional printing method, and three-dimensional printing method.
This classification was chosen to allow easy comparison of the practical problems encountered in use. In conclusion, the most suitable 3D printer was found to be one using the FDM method, which is comparatively cheap and has low repair, maintenance, and material costs, although its output quality is somewhat lower, and it is additionally recommended to select a vendor that provides good technical support.

    Application of Support Vector Regression for Improving the Performance of the Emotion Prediction Model (감정예측모형의 성과개선을 위한 Support Vector Regression 응용)

    • Kim, Seongjin;Ryoo, Eunchung;Jung, Min Kyu;Kim, Jae Kyeong;Ahn, Hyunchul
      • Journal of Intelligence and Information Systems
      • /
      • v.18 no.3
      • /
      • pp.185-202
      • /
      • 2012
    • Since the value of information has been recognized in the information society, the use and collection of information have become important. Like an artistic painting, a facial expression contains a wealth of information and can be described in a thousand words. Following this idea, there have recently been a number of attempts to provide customers and companies with intelligent services that perceive human emotions from facial expressions. For example, MIT Media Lab, a leading organization in this research area, has developed a human emotion prediction model and applied its studies to commercial business. In the academic area, conventional methods such as Multiple Regression Analysis (MRA) and Artificial Neural Networks (ANN) have been applied to predict human emotion in prior studies. However, MRA is generally criticized for its low prediction accuracy, which is inevitable since MRA can only explain a linear relationship between the independent variables and the dependent variable. To mitigate the limitations of MRA, some studies, such as Jung and Kim (2012), have used ANN as an alternative and reported that ANN produces more accurate predictions than statistical methods like MRA. However, ANN has also been criticized for overfitting and for the difficulty of network design (e.g., setting the number of layers and the number of nodes in the hidden layers). Against this background, we propose a novel model using Support Vector Regression (SVR) to increase prediction accuracy. SVR is an extended version of the Support Vector Machine (SVM) designed to solve regression problems. The model produced by SVR depends only on a subset of the training data, because the cost function for building the model ignores any training data that is close (within a threshold ${\varepsilon}$) to the model prediction. Using SVR, we built a model that measures the levels of arousal and valence from facial features. To validate the usefulness of the proposed model, we collected data on facial reactions to appropriate visual stimuli and extracted features from the data. Preprocessing steps were then taken to choose statistically significant variables. In total, 297 cases were used for the experiment. As comparative models, we also applied MRA and ANN to the same data set. For SVR, we adopted the ${\varepsilon}$-insensitive loss function and a grid search to find the optimal values of parameters such as C, d, ${\sigma}^2$, and ${\varepsilon}$. For ANN, we adopted a standard three-layer backpropagation network with a single hidden layer. The learning rate and momentum rate of the ANN were set to 10%, and we used the sigmoid function as the transfer function of the hidden and output nodes. We repeated the experiments, varying the number of nodes in the hidden layer over n/2, n, 3n/2, and 2n, where n is the number of input variables. The stopping condition for the ANN was set to 50,000 learning events, and MAE (Mean Absolute Error) was used as the measure for performance comparison. From the experiment, we found that SVR achieved the highest prediction accuracy on the hold-out data set compared to MRA and ANN. Regardless of the target variable (the level of arousal, or the level of positive/negative valence), SVR showed the best performance on the hold-out data set.
ANN also outperformed MRA, but its prediction accuracy was considerably lower than SVR's for both target variables. The findings of this research are expected to be useful to researchers and practitioners who wish to build models for recognizing human emotions.
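The modeling recipe in this abstract (an ${\varepsilon}$-insensitive SVR tuned by grid search and scored by MAE on a hold-out set) maps directly onto standard tooling. A minimal scikit-learn sketch on synthetic data follows; the features, targets, and parameter grids are assumptions, not the study's:

    import numpy as np
    from sklearn.svm import SVR
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.metrics import mean_absolute_error

    # Hypothetical data: 297 cases, a few facial features, one target
    # (e.g., the level of arousal). Not the study's actual data.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((297, 8))
    y = X[:, 0] * 0.7 - X[:, 3] * 0.4 + rng.normal(0, 0.2, size=297)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    # Grid search over C, gamma (kernel width), and the epsilon tube.
    grid = GridSearchCV(
        SVR(kernel="rbf"),
        param_grid={"C": [0.1, 1, 10],
                    "gamma": [0.01, 0.1, 1],
                    "epsilon": [0.01, 0.1, 0.5]},
        scoring="neg_mean_absolute_error", cv=5)
    grid.fit(X_train, y_train)

    pred = grid.best_estimator_.predict(X_test)
    print("best params:", grid.best_params_)
    print("hold-out MAE:", mean_absolute_error(y_test, pred))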

    Ensemble of Nested Dichotomies for Activity Recognition Using Accelerometer Data on Smartphone (Ensemble of Nested Dichotomies 기법을 이용한 스마트폰 가속도 센서 데이터 기반의 동작 인지)

    • Ha, Eu Tteum;Kim, Jeongmin;Ryu, Kwang Ryel
      • Journal of Intelligence and Information Systems
      • /
      • v.19 no.4
      • /
      • pp.123-132
      • /
      • 2013
    • As smartphones are equipped with various sensors, such as the accelerometer, GPS, gravity sensor, gyros, ambient light sensor, and proximity sensor, there has been much research on using these sensors to create valuable applications. Human activity recognition is one such application, motivated by welfare uses such as support for the elderly, measurement of calorie consumption, and analysis of lifestyles and exercise patterns. One challenge in using smartphone sensors for activity recognition is that the number of sensors should be minimized to save battery power. When the number of sensors is restricted, it is difficult to build a highly accurate activity recognizer, because subtly different activities are hard to distinguish with only limited information. The difficulty becomes especially severe when the number of activity classes to distinguish is large. In this paper, we show that a fairly accurate classifier distinguishing ten different activities can be built from a single sensor's data, the smartphone accelerometer. Our approach to this ten-class problem is the ensemble of nested dichotomies (END) method, which transforms a multi-class problem into multiple two-class problems. END builds a committee of binary classifiers arranged in a binary tree. At the root of the tree, the set of all classes is split into two subsets by a binary classifier. At each child node, a subset of classes is again split into two smaller subsets by another binary classifier. Continuing in this way yields a binary tree in which each leaf node contains a single class; this tree is a nested dichotomy that can make multi-class predictions. Depending on how the classes are split at each node, different trees can be obtained. Since some classes may be correlated, a particular tree may perform better than others, but the best tree can hardly be identified without deep domain knowledge. The END method copes with this problem by building multiple dichotomy trees randomly during learning and combining their predictions during classification. END is generally known to perform well even when the base learner cannot model complex decision boundaries. As the base classifier at each node of the dichotomy, we used another ensemble classifier, the random forest. A random forest is built by repeatedly generating decision trees, each with a different random subset of features, on bootstrap samples. By combining bagging with random feature subset selection, a random forest enjoys more diverse ensemble members than simple bagging. Overall, our ensemble of nested dichotomies can be seen as a committee of committees of decision trees that handles a multi-class problem with high accuracy. The ten activity classes distinguished in this paper are 'Sitting', 'Standing', 'Walking', 'Running', 'Walking Uphill', 'Walking Downhill', 'Running Uphill', 'Running Downhill', 'Falling', and 'Hobbling'.
The features used for classification include not only the magnitude of the acceleration vector at each time point but also the maximum, minimum, and standard deviation of the vector magnitude within a time window covering the last 2 seconds. For experiments comparing the performance of END with that of other methods, accelerometer data were collected every 0.1 second for 2 minutes per activity from 5 volunteers. Of the 5,900 ($=5{\times}(60{\times}2-2)/0.1$) data points collected for each activity (the data for the first 2 seconds were discarded because they lack time window data), 4,700 were used for training and the rest for testing. Although 'Walking Uphill' is often confused with similar activities, END classified all ten activities with a fairly high accuracy of 98.4%, while a decision tree, a k-nearest neighbor classifier, and a one-versus-rest support vector machine achieved 97.6%, 96.5%, and 97.6%, respectively.
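The nested-dichotomy construction lends itself to a short recursive sketch. Below is an illustrative Python implementation of a single random nested dichotomy with random-forest base classifiers; a full END would train many such trees and combine their predictions. The data, feature set, and class labels are hypothetical:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    class NestedDichotomy:
        """A random binary tree over class labels; each internal node holds
        a binary classifier deciding between its two class subsets."""
        def __init__(self, classes, rng):
            self.classes = list(classes)
            if len(self.classes) > 1:
                order = rng.permutation(len(self.classes))
                half = len(self.classes) // 2
                left = [self.classes[i] for i in order[:half]]
                right = [self.classes[i] for i in order[half:]]
                self.clf = RandomForestClassifier(n_estimators=50, random_state=0)
                self.left = NestedDichotomy(left, rng)
                self.right = NestedDichotomy(right, rng)

        def fit(self, X, y):
            if len(self.classes) == 1:
                return self
            mask = np.isin(y, self.classes)
            X, y = X[mask], y[mask]
            # Binary target: does the sample's class fall in the right subset?
            self.clf.fit(X, np.isin(y, self.right.classes).astype(int))
            self.left.fit(X, y)
            self.right.fit(X, y)
            return self

        def predict_one(self, x):
            node = self
            while len(node.classes) > 1:
                go_right = node.clf.predict(x.reshape(1, -1))[0] == 1
                node = node.right if go_right else node.left
            return node.classes[0]

    # Hypothetical windowed accelerometer features: magnitude plus max, min,
    # and standard deviation of magnitude over the last 2 seconds.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 4))
    y = rng.integers(0, 10, size=500)  # ten activity classes, labeled 0-9

    nd = NestedDichotomy(range(10), rng).fit(X, y)
    print(nd.predict_one(X[0]))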

    Analysis of the Time-dependent Relation between TV Ratings and the Content of Microblogs (TV 시청률과 마이크로블로그 내용어와의 시간대별 관계 분석)

    • Choeh, Joon Yeon;Baek, Haedeuk;Choi, Jinho
      • Journal of Intelligence and Information Systems
      • /
      • v.20 no.1
      • /
      • pp.163-176
      • /
      • 2014
    • Social media has become a platform on which users communicate their activities, status, emotions, and experiences to other people. In recent years, microblogs such as Twitter have gained popularity because of their ease of use, speed, and reach. Compared to a conventional web blog, a microblog lowers the user's effort and investment in content generation by encouraging shorter posts. There has been much research into capturing social phenomena by analyzing the chatter of microblogs, but measuring television ratings has received little attention so far. Currently, the most common method of measuring TV ratings uses an electronic metering device installed in a small number of sampled households. Microblogs allow users to post short messages, share daily updates, and conveniently keep in touch; in the same way, microblog users interact with one another while watching television or movies or visiting a new place. For measuring TV ratings, some features are significant during certain hours of the day or days of the week, whereas the same features are meaningless during other periods. Thus, the importance of features can change throughout the day, and a model that captures this time-sensitive relevance is required to estimate TV ratings. Modeling the time-related characteristics of features is therefore key to measuring TV ratings through microblogs, and we show that capturing the time-dependency of features is vital for improving accuracy. To explore the relationship between the content of microblogs and TV ratings, we collected Twitter data using the Get Search component of the Twitter REST API from January 2013 to October 2013. The data set contains about 300 thousand posts; after excluding data such as advertising or promoted tweets, we selected 149 thousand tweets for analysis. The number of tweets reaches its maximum on the broadcasting day and increases rapidly around the broadcasting time, which stems from the characteristics of a public channel that broadcasts the program at a predetermined time. Our analysis finds that count-based features, such as the number of tweets or retweets, have a low correlation with TV ratings, implying that a simple tweet rate does not reflect satisfaction with or response to the TV programs. Content-based features extracted from the tweets have a relatively high correlation with TV ratings, and some emoticons and newly coined words that are not tagged in the morpheme extraction process have a strong relationship with TV ratings. We also find a time-dependency in the correlation of features between the periods before and after the broadcasting time. Since a TV program is broadcast regularly at a predetermined time, users post tweets expressing their expectations for the program or their disappointment at not being able to watch it, and the features that correlate highly before the broadcast differ from those after it. This result shows that the relevance of words to TV programs can change according to the time of the tweets. Among the 336 words that fulfill the minimum requirements for candidate features, 145 words reach their highest correlation before the broadcasting time, whereas 68 words reach it after broadcasting. Interestingly, some words expressing the impossibility of watching the program show high relevance despite their negative meaning.
Understanding the time-dependency of features can help improve the accuracy of TV ratings measurement. This research provides a basis for estimating the response to, or satisfaction with, broadcast programs using the time-dependency of words in Twitter chatter. More research is needed to refine the methodology for predicting or measuring TV ratings.
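The time-split correlation analysis at the heart of this study is straightforward to reproduce in outline. A minimal sketch on synthetic per-episode data (illustrative Python, not the authors' pipeline): for each candidate word, correlate its tweet counts with ratings separately in the before- and after-broadcast windows and keep the window where the correlation peaks:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    episodes = 30

    # Hypothetical per-episode data: counts of one candidate word in the
    # windows before and after broadcast, plus the episode's rating.
    df = pd.DataFrame({"rating": rng.uniform(5, 15, episodes)})
    df["count_before"] = (df["rating"] * 3 + rng.normal(0, 4, episodes)).round()
    df["count_after"] = rng.poisson(20, episodes).astype(float)

    corr_before = df["rating"].corr(df["count_before"])
    corr_after = df["rating"].corr(df["count_after"])
    print(f"correlation before broadcast: {corr_before:.2f}")
    print(f"correlation after broadcast:  {corr_after:.2f}")
    # A word whose correlation peaks before broadcast would join the "before"
    # feature set when modeling the time-dependent relevance of words.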

