• Title/Summary/Keyword: standard classification system


A Study on the Traditional Geographic System Recognition and Environmental Value Estimate of Hannamkeumbuk-Keumbuk Mountains for the Establishment of a Management Plan (관리계획 수립을 위한 한남금북.금북정맥의 전통적 지리체계인식과 환경가치 추정 연구)

  • Kang, Kee-Rae;Kim, Dong-Pil
    • Journal of the Korean Institute of Landscape Architecture, v.40 no.1, pp.23-33, 2012
  • This study examined how well users of the Hannamkeumbuk-Keumbuk Mountains know Baekdudaegan and its attached mountain chains, a traditional geographic system recorded in the Sangyungpyo, and collected basic data such as the degree of awareness and use behaviors. It also estimated, in current monetary terms, the environmental value of the Hannamkeumbuk-Keumbuk Mountains, secondary mountain chains attached to the Baekdudaegan Range that separate the central and southern parts of Korea and act as an ecosystem buffer. Asked about the traditional classification standard of mountain chains and about Baekdudaegan, more than 70% of respondents answered that they had heard of or knew them, but 66.8% were not aware of the Hannamkeumbuk-Keumbuk Mountains. Awareness of Baekdudaegan is therefore high, while awareness of its attached mountain chains is very poor. The value was estimated with the contingent valuation method (CVM), which is widely used to value environmental goods, using a double-bounded dichotomous choice (DBDC) response format. As a result, the additional benefit a person gains from a visit to the Hannamkeumbuk-Keumbuk Mountains was estimated at 5,813 won, which is very low compared with the average visit cost of 51,984 won. The low value is attributed to degraded environmental conditions, monotonous trails, and ongoing indiscriminate environmental destruction. The results offer a new perspective on public relations activities and resource conservation for Baekdudaegan and its attached mountain chains, and on understanding visitor perceptions and providing efficient services for visitors to the Hannamkeumbuk-Keumbuk Mountains. The study can serve as basic data for planning and management to increase the mountains' value and preserve them. 
Further studies are needed to build a framework for dividing work and management among the various organizations involved, so that management of the Hannamkeumbuk-Keumbuk Mountains can be properly established and their value enhanced.

Drainage Analysis for the Anyang-cheon Upper-watershed Management Planning (유역관리계획수립(流域管理計劃樹立)에 관(關)한 기초적(基礎的) 연구(硏究))

  • Woo, Bo Myeong
    • Journal of Korean Society of Forest Science, v.42 no.1, pp.39-54, 1979
  • Stream characteristics such as the number, length, and order of stream channels and the drainage density are essential elements of drainage analysis in watershed management planning for a drainage basin. The drainage net is the pattern of tributaries and master streams in a drainage basin as delineated on a planimetric map. Stream order is a measure of the position of a stream in the hierarchy of tributaries. Drainage density is the quotient of the cumulative stream length and the total drainage area, that is, a length per unit of area. In this study, the Anyang-cheon upper watershed was selected for survey and analysis of the stream system and drainage density, in order to collect data useful for effective watershed management planning. The Anyang-cheon upper watershed consists of about 12,600 hectares of drainage area and includes 13 Sub-streams. The total length of the Stream (as defined in the Stream Law) in the survey area was measured at 71.2 km, and that of the Small-stream as described in the Saemaul Stream Survey Book (1972) was calculated at 43,010 meters. In addition to these lengths, this study measured about 43,410 meters of Small-stream and about 71,900 meters of Torrential valley. Drainage density among the 13 Sub-streams with sub-watersheds ranged from 14.79 to 24.10, and the average drainage density of the entire watershed was 18.21 when the length of the Torrential valley is included and 12.50 when it is excluded. A standard classification system for distinguishing among the Stream, Sub-stream, Small-stream, Torrent, and Torrential valley needs to be established through joint study by the authorities concerned.
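
The drainage density figures in this abstract can be roughly reproduced from the lengths and area it reports. The sketch below is only an illustration: the abstract does not state the unit, so metres per hectare is an assumption inferred from the numbers, as is the choice of which lengths enter each total. The variable names are hypothetical.

```python
def drainage_density(total_length_m: float, area_ha: float) -> float:
    """Cumulative stream length divided by drainage area (here in m per ha)."""
    return total_length_m / area_ha


# Figures quoted in the abstract; the area is stated as "about 12,600 hectares".
AREA_HA = 12_600
LENGTHS_M = {
    "stream": 71_200,                    # 71.2 km under the Stream Law
    "small_stream_survey_book": 43_010,  # Saemaul Stream Survey Book (1972)
    "small_stream_this_study": 43_410,   # additionally measured in the study
    "torrential_valley": 71_900,         # additionally measured in the study
}

# Assumption: the "including" total sums all four lengths, and the
# "excluding" total drops only the Torrential valley.
total_incl = sum(LENGTHS_M.values())
total_excl = total_incl - LENGTHS_M["torrential_valley"]

density_incl = drainage_density(total_incl, AREA_HA)
density_excl = drainage_density(total_excl, AREA_HA)
print(f"including torrential valley: {density_incl:.2f} m/ha")
print(f"excluding torrential valley: {density_excl:.2f} m/ha")
```

Under these assumptions the two values land within about 0.01 of the reported 18.21 and 12.50, which suggests the reported densities are expressed in metres of channel per hectare.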


A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems, v.26 no.1, pp.1-21, 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data, which constitutes a large portion of big data. Over the past decades, text mining technologies have been used in various industries for practical applications. In business intelligence, text mining has been employed to discover new market and/or technology opportunities and to support rational decision making by business participants. Market information such as market size, market growth rate, and market share is essential for setting companies' business strategies. There has been continuous demand in various fields for market information at the level of specific products. However, such information has generally been provided at the industry level or for broad categories based on classification standards, making it difficult to obtain specific and appropriate information. We therefore propose a new methodology that estimates the market sizes of product groups at a more detailed level than previously offered. We applied the Word2Vec algorithm, a neural-network-based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows. First, data related to product information is collected, refined, and restructured into a form suitable for the Word2Vec model. Next, the preprocessed data is embedded into a vector space by Word2Vec, and product groups are derived by extracting similar product names based on cosine similarity. Finally, the sales data for the extracted products is summed to estimate the market size of each product group. As experimental data, product names from Statistics Korea's microdata (345,103 cases) were mapped into a multidimensional vector space by Word2Vec training. 
We optimized the training parameters and then applied a vector dimension of 300 and a window size of 15 in further experiments. We employed the index words of the Korean Standard Industry Classification (KSIC) as a product name dataset to cluster product groups more efficiently. Product names similar to KSIC index words were extracted based on cosine similarity, and the market size of the extracted products, treated as one product category, was calculated from individual companies' sales data. The proposed model automatically estimated the market sizes of 11,654 specific product lines. For performance verification, the results were compared with the actual market sizes of some items; the Pearson correlation coefficient was 0.513. Our approach has several advantages over previous studies. First, text mining and machine learning techniques were applied to market size estimation for the first time, overcoming the limitations of traditional methods that rely on sampling or require multiple assumptions. Second, the level of the market category can be adjusted easily and efficiently to suit the purpose of the information by changing the cosine similarity threshold. Third, the approach has high potential for practical application, since it can meet the unmet need for detailed market size information in the public and private sectors. Specifically, it can be used in technology evaluation and technology commercialization support programs conducted by governmental institutions, as well as in business strategy consulting and market analysis reports by private firms. The limitation of our study is that the presented model needs to be improved in accuracy and reliability. The semantic word embedding module could be advanced by imposing a proper order on the preprocessed dataset or by combining Word2Vec with another measure such as Jaccard similarity. 
Also, the product group clustering method could be changed to other types of unsupervised machine learning algorithms. Our group is currently working on follow-up studies, which we expect will further improve the performance of the basic model conceptually proposed in this study.
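
The last two steps described in this abstract, grouping product names by cosine similarity to an index word and summing their sales, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 3-dimensional vectors stand in for trained Word2Vec embeddings, and the product names, sales figures, and function names are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def estimate_market_size(index_vec, product_vecs, sales, threshold):
    """Group products whose similarity to the index word meets the threshold,
    then sum their sales as the market size of that product group."""
    group = [name for name, vec in product_vecs.items()
             if cosine(index_vec, vec) >= threshold]
    return group, sum(sales[name] for name in group)


# Hypothetical toy embeddings (a trained model would supply 300-dim vectors).
index_vec = [1.0, 0.0, 0.0]  # stands in for a classification index word
product_vecs = {
    "car tire": [0.9, 0.1, 0.0],
    "truck tire": [0.8, 0.2, 0.1],
    "office chair": [0.0, 1.0, 0.2],
}
sales = {"car tire": 120, "truck tire": 80, "office chair": 300}

broad_group, broad_size = estimate_market_size(index_vec, product_vecs, sales, 0.90)
narrow_group, narrow_size = estimate_market_size(index_vec, product_vecs, sales, 0.98)
print(broad_group, broad_size)
print(narrow_group, narrow_size)
```

Raising the threshold from 0.90 to 0.98 drops "truck tire" from the group, which mirrors the abstract's point that the granularity of the market category can be tuned through the cosine similarity threshold.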

KNU Korean Sentiment Lexicon: Bi-LSTM-based Method for Building a Korean Sentiment Lexicon (Bi-LSTM 기반의 한국어 감성사전 구축 방안)

  • Park, Sang-Min;Na, Chul-Won;Choi, Min-Seong;Lee, Da-Hee;On, Byung-Won
    • Journal of Intelligence and Information Systems, v.24 no.4, pp.219-240, 2018
  • Sentiment analysis, one of the text mining techniques, is a method for extracting subjective content embedded in text documents. Recently, sentiment analysis methods have been widely used in many fields. For example, data-driven surveys analyze the subjectivity of text posted by users, and market research analyzes users' review posts to quantify a target product's reputation among users. The basic method of sentiment analysis is to use a sentiment dictionary (or lexicon), a list of sentiment vocabulary with positive, neutral, or negative semantics. In general, the meaning of many sentiment words differs across domains. For example, the sentiment word 'sad' is negative in many fields but not for a movie. To perform accurate sentiment analysis, we need to build a sentiment dictionary for the given domain. However, building such a lexicon is time-consuming, and many sentiment words are missed unless a general-purpose sentiment lexicon is used. To address this problem, several studies have constructed sentiment lexicons suited to a specific domain based on 'OPEN HANGUL' and 'SentiWordNet', which are general-purpose sentiment lexicons. However, OPEN HANGUL is no longer in service, and SentiWordNet does not work well because of language differences in converting Korean words into English words. These restrictions limit the use of such general-purpose sentiment lexicons as seed data for building a domain-specific sentiment lexicon. In this article, we construct the 'KNU Korean Sentiment Lexicon (KNU-KSL)', a new general-purpose Korean sentiment dictionary that is more advanced than existing general-purpose lexicons. 
The proposed dictionary, a list of domain-independent sentiment words such as 'thank you', 'worthy', and 'impressed', is built so that a sentiment dictionary for a target domain can be constructed quickly. Specifically, it builds the sentiment vocabulary by analyzing the glosses in the Standard Korean Language Dictionary (SKLD) in the following steps. First, we propose a sentiment classification model based on Bidirectional Long Short-Term Memory (Bi-LSTM). Second, the proposed deep learning model automatically classifies each gloss as positive or negative. Third, positive words and phrases are extracted from the glosses classified as positive, and negative words and phrases from the glosses classified as negative. Our experimental results show that the average accuracy of the proposed sentiment classification model is up to 89.45%. In addition, the sentiment dictionary is extended using various external sources, including SentiWordNet, SenticNet, Emotional Verbs, and Sentiment Lexicon 0603. Furthermore, we add sentiment information about frequently used coined words and emoticons that appear mainly on the Web. The KNU-KSL contains a total of 14,843 sentiment entries, each of which is a 1-gram, a 2-gram, a phrase, or a sentence pattern. Unlike existing sentiment dictionaries, it is composed of words that are not affected by particular domains. The recent trend in sentiment analysis is to use deep learning without sentiment dictionaries, and the importance of developing sentiment dictionaries has gradually declined. However, a recent study shows that the words in a sentiment dictionary can be used as features of deep learning models, yielding more accurate sentiment analysis (Teng, Z., 2016). This result indicates that a sentiment dictionary is useful not only for sentiment analysis itself but also as a source of features that improve the accuracy of deep learning models. 
The proposed dictionary can be used as basic data for constructing the sentiment lexicon of a particular domain and as features of deep learning models. It is also useful for automatically and quickly building large training sets for deep learning models.
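
The dictionary-based method this abstract calls the basic approach to sentiment analysis can be illustrated with a small lookup over 1-grams and 2-grams. This is a sketch under assumptions: the toy lexicon, its +1/-1 polarity values, and the entry 'not worthy' are hypothetical and are not taken from KNU-KSL, whose entries are Korean; the Bi-LSTM gloss classifier itself is not reproduced here.

```python
def score_text(tokens, lexicon):
    """Sum the polarity of matched lexicon entries, trying a 2-gram match
    at each position before falling back to a 1-gram match."""
    total, i = 0, 0
    while i < len(tokens):
        bigram = " ".join(tokens[i:i + 2])
        if i + 1 < len(tokens) and bigram in lexicon:
            total += lexicon[bigram]
            i += 2  # both tokens of the 2-gram are consumed
        elif tokens[i] in lexicon:
            total += lexicon[tokens[i]]
            i += 1
        else:
            i += 1  # token carries no sentiment in this lexicon
    return total


# Hypothetical toy lexicon: positive entries +1, negative entries -1.
LEXICON = {
    "thank you": 1,
    "worthy": 1,
    "impressed": 1,
    "sad": -1,
    "not worthy": -1,
}

positive_score = score_text("thank you it was worthy".split(), LEXICON)
negative_score = score_text("not worthy and sad".split(), LEXICON)
print(positive_score, negative_score)
```

Matching the longer entry first keeps 'not worthy' from being scored as the positive 1-gram 'worthy', which is one reason a lexicon that mixes 1-grams, 2-grams, and phrases needs a defined matching order.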

The Characteristic of Laws on the Kind of Urban Green Spaces and the Legal Requirements for the Green Spaces of Urban Habitat in China (중국의 도시녹지 종류와 도시거주구 녹지의 설치 기준에 관한 법제도의 현황과 특성)

  • Shin, Ick-Soon
    • Journal of the Korean Institute of Landscape Architecture, v.41 no.3, pp.1-11, 2013
  • This study investigated Chinese laws on the kinds of urban green spaces and the legal requirements for green spaces in urban habitats, and analyzed their specific features, to provide basic data for deciding whether to introduce the relevant Chinese provisions into Korea. The findings can be summarized as follows. First, the Chinese concept of urban green spaces (g.s.), classified into 5 kinds (park g.s., production g.s., protection g.s., attachment g.s., and other g.s.), places parks and green spaces in the same category, unlike the Korean system, which only distinguishes between parks and green spaces. The Chinese urban park is classified into 4 kinds (composite park, community park, special park, linear park) in the 'Standard for urban green spaces classification', which ranks low in the legal system. Second, when the green space ratio of urban green spaces is calculated in China, green rooftop landscaping is not counted as green space area, except on the rooftop of a basement or semi-basement building to which residents have easy access. The green space requirements and the compulsory ratio to be secured are regulated for each of 3 habitat kinds (habitat, small habitat, minimum habitat) when a residential plan is drawn up. Third, a habitat green space plan is obliged to establish a green space system and is required to conserve and improve existing trees and green spaces. The green space ratio for reconstruction of an old habitat is relaxed below that for a new habitat, and, unusually, the gradient of green spaces is specified. The details and requirements for establishment and the minimum area for each class of central green space (habitat park, children's park, minimum habitat's green spaces) are regulated. 
In particular, for garden-style green spaces of the minimum habitat, the interval between the south and north houses and the green space area that must be secured are specified for two types (closed-type and open-type green spaces) according to whether the buildings are middle-rise or high-rise. The method of calculating green space area is set out in a special regulation. Fourth, a general index (area per person) of public green spaces within a habitat is set for each of the 3 habitat kinds; the index for reconstruction of a deteriorated zone can be relaxed by a specified amount. The location and scale, minimum width, and minimum area per site of public green spaces are regulated. Site layout principles for public green spaces are also regulated, including adjacency to a road, the ratio of greening area to total area, the securing of open space, and the sunshine shadow line boundary. Fifth, the minimum horizontal distance between underground cables and surrounding trees is regulated as an item to consider for green spaces when underground cables are installed. The principles for establishing green spaces within public service facilities are regulated according to the kind of service provided. Whether to import the special regulations that exist in Chinese laws but not in Korean laws should be examined. The results of this study will help win domestic landscape architects' support for research on Chinese urban green space laws, which requires immediate attention, and offer a good opportunity to advance the internationalization of Korean landscape architecture laws.

Study on the Patterns of Helicopter Emergency Medical Services in Ullung Island (울릉도 지역의 헬리콥터를 이용한 응급환자 후송 실태)

  • Kim, Tae-Hun;Lim, Hyun-Sul;Lee, Kwan
    • Journal of agricultural medicine and community health, v.27 no.1, pp.115-123, 2002
  • Objective: The aim of this study was to evaluate the patterns of helicopter emergency medical services (HEMS) in Ullung Island. Methods: The authors reviewed records from emergency room diaries and the lists of helicopter transfers at the Ullung Public Health Medical Center over the 5-year period from Jan 1, 1997 to Dec 31, 2001. Results: One hundred thirteen cases were transferred by helicopter in 88 flights. By year, the number of flights was 13 (14.8%) and the number of cases was 15 (13.3%) in 1997; 17 (19.3%) and 21 (18.6%) in 1998; 18 (20.5%) and 20 (17.7%) in 1999; 17 (19.3%) and 20 (17.7%) in 2000; and 23 (26.1%) and 37 (32.7%) in 2001. By kind of helicopter, the number of flights was 46 (52.3%) and the number of cases was 60 (53.1%) for the Maritime Police, and 19 (21.6%) and 28 (25.1%) for 119 Rescue. By time of day, there were no night flights. By sex and age, there were 75 male cases (66.4%) and 28 cases (28.3%) of patients aged sixty years and over. By month, the number of flights was 11 (12.5%) and the number of cases was 15 (13.3%) in November; 10 flights (11.4%) and 14 cases (12.4%) in March; and 7 cases (8.0%) in each of September, October, and April. The most common season for helicopter transfers was autumn. By transfer destination, there were 48 cases (42.5%) to Pohang city, Gyeongsangbukdo; 35 (31.0%) to Gangnung city, Gangwondo; and 17 (15.0%) to Daegu metropolitan city. By condition, there were 27 cases (23.9%) of cerebrovascular accident, 13 (11.5%) of fracture, and 11 (9.7%) of head injury. By admission department, there were 42 cases (37.2%) in Neurosurgery, 21 (18.6%) in Internal Medicine, and 13 (11.5%) in Orthopedic Surgery. According to the Korea Standard Classification of Disease (3-KSCD), circulatory system disease (IX) and injury, intoxication and others (XIX) were the two most frequent categories with 34 cases (30.1%) each, followed by digestive system disease (XI) with 23 cases (20.4%). 
Conclusions: HEMS in Ullung Island leave much to be desired. The helicopters cannot fly at night and are not equipped with medical facilities. HEMS are essential for islands such as Ullung Island. We hope that night flights, equipment-monitoring systems for emergency patients in the helicopters, and a law on HEMS for islands will all be established.


Present status and prospect for development of mushrooms in Korea

  • Jang, Kab-Yeul;Oh, Youn-Lee;Oh, Minji;Im, Ji-Hoon;Lee, Seul-Ki;Kong, Won-Sik
    • 한국균학회소식:학술대회논문집, 2018.05a, pp.27-27, 2018
  • The production scale of mushroom cultivation in Korea is approximately 600 billion won, which is 1.6% of Korean gross agricultural output. About 190,000 tons of mushrooms are harvested in Korea annually. Although the numbers of mushroom farms and cultivators are steadily decreasing, total mushroom yields are increasing owing to large-scale cultivation facilities and automation. The recent spread of the well-being trend has increased mushroom consumption in Korea: annual per capita consumption was 3.9 kg ('13), a little higher than the European average. Exports of mushrooms, mainly Flammulina velutipes and Pleurotus ostreatus, increased from the mid-2000s but have recently declined slightly. However, exports to Vietnam, Hong Kong, the United States, and the Netherlands have continued, and exports to Australia, Canada, Southeast Asia, and other destinations have recently increased. Canned Agaricus bisporus was the first export of the Korean mushroom industry, and this business reached its sales peak in 1977-1978. As Korea initiated trade with China in 1980, international mushroom prices fell sharply, which shrank the domestic market. Amid high demand for new items to substitute for A. bisporus, the oyster mushroom (Pleurotus ostreatus) received attention because it seemed to suit the taste of Korean consumers. Although a log cultivation technique for the oyster mushroom was developed in the early 1970s, this method requires a great deal of labor. We therefore developed a shelf cultivation technique that is easier to manage and allows mass production. In this technique, the growing shelf is mainly made up of fermented rice straw, a P. ostreatus medium unique in the world that was used only in South Korea. Later, the use of cotton waste as an additional medium material increased productivity. 
Standard cultivation techniques and environmental control systems that can produce mushrooms stably throughout the year are currently being developed. The increase in oyster mushroom production may stimulate the domestic market and contribute to industrial development. In addition, oyster mushroom production technology formed the basis for the development of bottle cultivation, and the mushroom cultivation technology developed with bottles made mass production possible. In particular, the bottle cultivation method using liquid spawn can be an opportunity to export F. velutipes and P. eryngii. Korea was also the second country in the world, after Japan, to develop white varieties of F. velutipes. We also developed the new A. bisporus cultivar "Sae-ah", which is easy to grow in Korea. To lead the mushroom industry, we will continue to develop internationally competitive cultivars and to improve cultivation techniques. Mushroom research in Korea nowadays focuses on the analysis of mushroom genetics in combination with the development of new mushroom varieties, mushroom physiology, and cultivation. Further topics of study are environmental factors for cultivation, disease control, development and utilization of mushroom substrate resources, post-harvest management, and improvement of marketable traits. Finally, the RDA manages the collection, classification, identification, and preservation of mushroom resources. To keep up with the increasing application of biotechnology in agricultural research, the genome project of various mushrooms and the draft of the genetic map have just been completed. A broad range of future studies based on this project is anticipated. The mushroom industry in Korea continues to grow, and its productivity is increasing rapidly through the development of new mushroom cultivars and automated plastic bottle cultivation. Consumption of medicinal mushrooms such as Ganoderma lucidum and Phellinus linteus is also increasing strongly. 
Recently, the edible and medicinal mushroom business has suffered from over-production and distribution problems. Fortunately, the expansion of mushroom exports has helped ease the negative effects on the mushroom industry.


Transfer Learning using Multiple ConvNet Layers Activation Features with Principal Component Analysis for Image Classification (전이학습 기반 다중 컨볼류션 신경망 레이어의 활성화 특징과 주성분 분석을 이용한 이미지 분류 방법)

  • Byambajav, Batkhuu;Alikhanov, Jumabek;Fang, Yang;Ko, Seunghyun;Jo, Geun Sik
    • Journal of Intelligence and Information Systems, v.24 no.1, pp.205-225, 2018
  • The Convolutional Neural Network (ConvNet) is a class of powerful deep neural networks that can analyze and learn hierarchies of visual features. The first such neural network, the Neocognitron, was introduced in the 1980s. At that time, neural networks were not widely used in industry or academia because large-scale datasets were scarce and computational power was low. A few decades later, in 2012, Krizhevsky made a breakthrough in the ILSVRC-12 visual recognition competition using a Convolutional Neural Network, which revived interest in neural networks. The success of the Convolutional Neural Network rests on two main factors: the emergence of advanced hardware (GPUs) capable of sufficient parallel computation, and the availability of large-scale datasets such as ImageNet (ILSVRC) for training. Unfortunately, many new domains are bottlenecked by these factors. For most domains, gathering a large-scale dataset to train a ConvNet is difficult and requires a great deal of effort. Moreover, even with a large-scale dataset, training a ConvNet from scratch requires expensive resources and is time-consuming. Both obstacles can be addressed with transfer learning, a method for transferring knowledge from a source domain to a new domain. There are two major transfer learning cases: using the ConvNet as a fixed feature extractor, and fine-tuning the ConvNet on a new dataset. In the first case, a pre-trained ConvNet (for example, one trained on ImageNet) computes feed-forward activations of the image, and activation features are extracted from specific layers. In the second case, the ConvNet classifier is replaced and retrained on the new dataset, and the weights of the pre-trained network are then fine-tuned with backpropagation. In this paper, we focus only on using multiple ConvNet layers as a fixed feature extractor. 
However, applying high-dimensional features extracted directly from multiple ConvNet layers is still a challenging problem. We observe that features extracted from different ConvNet layers capture different characteristics of the image, which means that a better representation could be obtained by finding the optimal combination of multiple ConvNet layers. Based on that observation, we propose to employ multiple ConvNet layer representations for transfer learning instead of a single ConvNet layer representation. Our primary pipeline has three steps. First, images from the target task are fed forward through a pre-trained AlexNet, and the activation features of its three fully connected layers are extracted. Second, the activation features of the three layers are concatenated to obtain a multiple ConvNet layer representation, because it carries more information about an image. When the three fully connected layer features are concatenated, the resulting image representation has 9192 (4096+4096+1000) dimensions. However, features extracted from multiple ConvNet layers are redundant and noisy, since they are extracted from the same ConvNet. Thus, as a third step, we use Principal Component Analysis (PCA) to select salient features before the training phase. With salient features, the classifier can classify images more accurately, and the performance of transfer learning improves. To evaluate the proposed method, experiments were conducted on three standard datasets (Caltech-256, VOC07, and SUN397) to compare multiple ConvNet layer representations against single ConvNet layer representations, using PCA for feature selection and dimension reduction. Our experiments demonstrated the importance of feature selection for multiple ConvNet layer representations. 
Moreover, our proposed approach achieved 75.6% accuracy compared with 73.9% for the FC7 layer on the Caltech-256 dataset, 73.1% compared with 69.2% for the FC8 layer on the VOC07 dataset, and 52.2% compared with 48.7% for the FC7 layer on the SUN397 dataset. We also showed that our proposed approach achieved superior performance compared with existing work, with accuracy improvements of 2.8%, 2.1%, and 3.1% on the Caltech-256, VOC07, and SUN397 datasets, respectively.
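
The concatenate-then-reduce idea in this abstract can be sketched in pure Python on toy data. Everything below is an assumption made for illustration: the layer sizes 4, 4, and 2 stand in for the 4096, 4096, and 1000 activations, the random "activations" are synthetic rather than AlexNet outputs, and PCA is computed by power iteration with deflation, which is one simple way to obtain principal components and not necessarily what the authors used.

```python
import math
import random


def concat_features(*layers):
    """Concatenate per-layer activation vectors into one feature vector."""
    out = []
    for layer in layers:
        out.extend(layer)
    return out


def center(rows):
    """Subtract the per-dimension mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def top_components(rows, k, iters=200, seed=0):
    """Top-k principal directions of centered rows (power iteration + deflation)."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[a] * r[b] for r in rows) / (n - 1) for b in range(d)]
           for a in range(d)]
    comps = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        cv = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        lam = sum(v[a] * cv[a] for a in range(d))  # variance along v
        comps.append(v)
        # Deflate: remove the found direction before searching for the next.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return comps


def project(rows, comps):
    """Project each row onto the given component directions."""
    return [[sum(x * c for x, c in zip(r, comp)) for comp in comps] for r in rows]


# Synthetic stand-ins for three layer activations that share one latent factor,
# which makes the concatenated features redundant, as the abstract notes.
rng = random.Random(42)
samples = []
for _ in range(30):
    z = rng.gauss(0.0, 1.0)
    fc6 = [z + rng.gauss(0.0, 0.1) for _ in range(4)]
    fc7 = [z + rng.gauss(0.0, 0.1) for _ in range(4)]
    fc8 = [z + rng.gauss(0.0, 0.1) for _ in range(2)]
    samples.append(concat_features(fc6, fc7, fc8))

centered = center(samples)
components = top_components(centered, k=2)
reduced = project(centered, components)
print(len(samples[0]), "->", len(reduced[0]))
```

With the real layer sizes the concatenated vector would have 4096 + 4096 + 1000 = 9192 dimensions, so a library implementation of PCA would be the practical choice; the pure-Python version here only shows the mechanics on a 10-dimensional toy.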

Ensemble of Nested Dichotomies for Activity Recognition Using Accelerometer Data on Smartphone (Ensemble of Nested Dichotomies 기법을 이용한 스마트폰 가속도 센서 데이터 기반의 동작 인지)

  • Ha, Eu Tteum;Kim, Jeongmin;Ryu, Kwang Ryel
    • Journal of Intelligence and Information Systems, v.19 no.4, pp.123-132, 2013
  • As the smartphones are equipped with various sensors such as the accelerometer, GPS, gravity sensor, gyros, ambient light sensor, proximity sensor, and so on, there have been many research works on making use of these sensors to create valuable applications. Human activity recognition is one such application that is motivated by various welfare applications such as the support for the elderly, measurement of calorie consumption, analysis of lifestyles, analysis of exercise patterns, and so on. One of the challenges faced when using the smartphone sensors for activity recognition is that the number of sensors used should be minimized to save the battery power. When the number of sensors used are restricted, it is difficult to realize a highly accurate activity recognizer or a classifier because it is hard to distinguish between subtly different activities relying on only limited information. The difficulty gets especially severe when the number of different activity classes to be distinguished is very large. In this paper, we show that a fairly accurate classifier can be built that can distinguish ten different activities by using only a single sensor data, i.e., the smartphone accelerometer data. The approach that we take to dealing with this ten-class problem is to use the ensemble of nested dichotomy (END) method that transforms a multi-class problem into multiple two-class problems. END builds a committee of binary classifiers in a nested fashion using a binary tree. At the root of the binary tree, the set of all the classes are split into two subsets of classes by using a binary classifier. At a child node of the tree, a subset of classes is again split into two smaller subsets by using another binary classifier. Continuing in this way, we can obtain a binary tree where each leaf node contains a single class. This binary tree can be viewed as a nested dichotomy that can make multi-class predictions. 
Depending on how a set of classes are split into two subsets at each node, the final tree that we obtain can be different. Since there can be some classes that are correlated, a particular tree may perform better than the others. However, we can hardly identify the best tree without deep domain knowledge. The END method copes with this problem by building multiple dichotomy trees randomly during learning, and then combining the predictions made by each tree during classification. The END method is generally known to perform well even when the base learner is unable to model complex decision boundaries As the base classifier at each node of the dichotomy, we have used another ensemble classifier called the random forest. A random forest is built by repeatedly generating a decision tree each time with a different random subset of features using a bootstrap sample. By combining bagging with random feature subset selection, a random forest enjoys the advantage of having more diverse ensemble members than a simple bagging. As an overall result, our ensemble of nested dichotomy can actually be seen as a committee of committees of decision trees that can deal with a multi-class problem with high accuracy. The ten classes of activities that we distinguish in this paper are 'Sitting', 'Standing', 'Walking', 'Running', 'Walking Uphill', 'Walking Downhill', 'Running Uphill', 'Running Downhill', 'Falling', and 'Hobbling'. The features used for classifying these activities include not only the magnitude of acceleration vector at each time point but also the maximum, the minimum, and the standard deviation of vector magnitude within a time window of the last 2 seconds, etc. For experiments to compare the performance of END with those of other methods, the accelerometer data has been collected at every 0.1 second for 2 minutes for each activity from 5 volunteers. 
Of the 5,900 ($=5{\times}(60{\times}2-2)/0.1$) samples collected for each activity (the data for the first 2 seconds are discarded because they lack a full time window), 4,700 were used for training and the rest for testing. Although 'Walking Uphill' is often confused with other similar activities, END was found to classify all ten activities with a fairly high accuracy of 98.4%. By comparison, the accuracies achieved by a decision tree, a k-nearest neighbor classifier, and a one-versus-rest support vector machine were 97.6%, 96.5%, and 97.6%, respectively.
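
The nested-dichotomy idea described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the random splitting rule (shuffle, then cut) is a simplification, and `make_toy_p_left` is a hypothetical stand-in for the trained binary classifier at each node (a random forest in the paper). The function names are illustrative assumptions. The sketch only shows how a tree of two-class decisions yields multi-class probabilities, and how several random trees are averaged.

```python
import random

# The ten activity classes named in the abstract.
ACTIVITIES = [
    "Sitting", "Standing", "Walking", "Running", "Walking Uphill",
    "Walking Downhill", "Running Uphill", "Running Downhill",
    "Falling", "Hobbling",
]


def build_dichotomy(classes, rng):
    """Recursively split a class list into two random non-empty subsets.

    Returns a class name for a leaf, or a (left, right) tuple for a split.
    Simplified splitting rule (assumption): shuffle, then cut at a random point.
    """
    classes = list(classes)
    if len(classes) == 1:
        return classes[0]
    rng.shuffle(classes)
    cut = rng.randint(1, len(classes) - 1)  # keeps both sides non-empty
    return (build_dichotomy(classes[:cut], rng),
            build_dichotomy(classes[cut:], rng))


def leaves(tree):
    """List the classes found at the leaves of a dichotomy tree."""
    if not isinstance(tree, tuple):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def class_probs(tree, p_left, prob=1.0):
    """Multiply branch probabilities along each root-to-leaf path.

    p_left(left_classes, right_classes) must return the probability that
    the sample belongs to the left subset at that node.
    """
    if not isinstance(tree, tuple):
        return {tree: prob}
    left, right = tree
    p = p_left(frozenset(leaves(left)), frozenset(leaves(right)))
    out = class_probs(left, p_left, prob * p)
    out.update(class_probs(right, p_left, prob * (1.0 - p)))
    return out


def ensemble_probs(trees, p_left):
    """Average the class probabilities over all dichotomy trees."""
    total = {}
    for tree in trees:
        for cls, p in class_probs(tree, p_left).items():
            total[cls] = total.get(cls, 0.0) + p / len(trees)
    return total


def make_toy_p_left(scores):
    """Hypothetical stand-in for a trained binary classifier at a node.

    It derives the left-branch probability from fixed per-class scores.
    """
    def p_left(left, right):
        l = sum(scores[c] for c in left)
        r = sum(scores[c] for c in right)
        return l / (l + r)
    return p_left
```

In use, one would build several trees with `build_dichotomy(ACTIVITIES, random.Random(seed))`, replace the toy `p_left` with a classifier trained per node, and predict the class with the largest value returned by `ensemble_probs`.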

A Deep Learning Based Approach to Recognizing Accompanying Status of Smartphone Users Using Multimodal Data (스마트폰 다종 데이터를 활용한 딥러닝 기반의 사용자 동행 상태 인식)

  • Kim, Kilho;Choi, Sangwoo;Chae, Moon-jung;Park, Heewoong;Lee, Jaehong;Park, Jonghun
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.163-177
    • /
    • 2019
  • As smartphones have become widely used, human activity recognition (HAR) tasks that recognize the personal activities of smartphone users from multimodal data have been actively studied. The research area is expanding from the recognition of simple body movements of an individual user to the recognition of low-level and high-level behavior. However, HAR tasks that recognize interaction behavior with other people, such as whether the user is accompanying or communicating with someone else, have received less attention so far. Previous research on recognizing interaction behavior has usually depended on audio, Bluetooth, and Wi-Fi sensors, which are vulnerable to privacy issues and require much time to collect enough data. In contrast, physical sensors, including the accelerometer, magnetic field sensor, and gyroscope, are less vulnerable to privacy issues and can collect a large amount of data within a short time. In this paper, we propose a deep learning based method for detecting accompanying status using only multimodal physical sensor data from the accelerometer, magnetic field sensor, and gyroscope. The accompanying status is defined as a subset of user interaction behavior: whether the user is accompanied by an acquaintance at close distance and whether the user is actively communicating with that acquaintance. We propose a framework based on convolutional neural networks (CNN) and long short-term memory (LSTM) recurrent networks for classifying accompanying and conversation. First, we introduce a data preprocessing method consisting of time synchronization of multimodal data from different physical sensors, data normalization, and sequence data generation. We applied nearest interpolation to synchronize the timestamps of data collected from different sensors. 
Normalization was performed for each x, y, z axis value of the sensor data, and the sequence data were generated with the sliding window method. The sequence data then became the input to the CNN, which extracts feature maps representing local dependencies of the original sequence. The CNN consisted of 3 convolutional layers and had no pooling layer, in order to preserve the temporal information of the sequence data. Next, the LSTM recurrent networks received the feature maps, learned long-term dependencies from them, and extracted features. The LSTM recurrent networks consisted of two layers, each with 128 cells. Finally, the extracted features were classified by a softmax classifier. The loss function of the model was the cross-entropy function, and the weights of the model were randomly initialized from a normal distribution with a mean of 0 and a standard deviation of 0.1. The model was trained using the adaptive moment estimation (ADAM) optimization algorithm, and the mini-batch size was set to 128. We applied dropout to the input values of the LSTM recurrent networks to prevent overfitting. The initial learning rate was set to 0.001, and it decayed exponentially by a factor of 0.99 at the end of each training epoch. An Android smartphone application was developed and released to collect data. We collected smartphone data from a total of 18 subjects. Using these data, the model classified accompanying and conversation with accuracies of 98.74% and 98.83%, respectively. Both the F1 score and the accuracy of the model were higher than those of the majority vote classifier, support vector machine, and deep recurrent neural network. In future research, we will focus on more rigorous multimodal sensor data synchronization methods that minimize timestamp differences. In addition, we will further study transfer learning methods that allow models fitted to the training data to be transferred to evaluation data that follow a different distribution. 
This is expected to yield a model that maintains robust recognition performance under changes in the data that were not considered during model training.
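
The preprocessing steps named in the abstract above (nearest-interpolation time synchronization, per-axis normalization, and sliding-window sequence generation) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, z-score normalization is assumed because the abstract does not say which normalization was used, and the window size and step are free parameters that the abstract does not specify.

```python
from bisect import bisect_left
from statistics import mean, pstdev


def nearest_resample(timestamps, values, target_times):
    """Nearest interpolation: for each target time, take the sample whose
    timestamp is closest. `timestamps` must be sorted in ascending order."""
    out = []
    for t in target_times:
        i = bisect_left(timestamps, t)
        if i == 0:
            j = 0
        elif i == len(timestamps):
            j = len(timestamps) - 1
        else:
            # Pick the closer neighbor; ties go to the earlier sample.
            j = i if timestamps[i] - t < t - timestamps[i - 1] else i - 1
        out.append(values[j])
    return out


def normalize_axis(xs):
    """Z-score normalization of one axis (assumed normalization scheme)."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma if sigma > 0 else 0.0 for x in xs]


def sliding_windows(rows, size, step):
    """Cut a time-ordered list of rows into overlapping fixed-length windows."""
    return [rows[s:s + size] for s in range(0, len(rows) - size + 1, step)]
```

A typical flow would resample each sensor onto a shared time grid with `nearest_resample`, normalize the x, y, and z axes separately with `normalize_axis`, and then pass the resulting rows to `sliding_windows` to produce the sequences fed to the CNN.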