• Title/Summary/Keyword: FOREST CLASSIFICATION

Critiques of 'The Endangered and Protected Wild Species List in Korea' Proposed by Korea Ministry of Environment and Listing Process - Is This the Best Process for the Current National Management of Endangered Wildlife and Plants in Korea? - (2011년 환경부 멸종위기종 등록절차 및 대상 멸종위기종 식물 목록 재고-과연 현재 국가 멸종위기종 관리가 최선의 방안인가? -)

  • Kim, Hui;Lee, Byong Cheon;Kim, Yong Shik;Chang, Chin-Sung
    • Journal of Korean Society of Forest Science / v.101 no.1 / pp.7-19 / 2012
  • After announcing legislation for threatened and endangered species under the List of Endangered and Threatened Wildlife and Plants in 2005, the Korea Ministry of Environment proposed in June 2011 to amend the list, delisting or reclassifying endangered species using new quantitative criteria for two levels (I and II) as well as status reviews. Under the proposed amendment, 40 species remained in their original endangered status, 19 species were delisted, 5 species were proposed as candidates for delisting, 29 species were given a new endangered listing, and 3 species were proposed for an endangered listing in Korea. We assessed the threatened status of 98 plants using the IUCN Red List Criteria (version 3.1) at the global level, and compared the Ministry's revised criteria with the IUCN Red List Criteria and the ESA criteria used in the USA. Most species proposed by the Ministry do not qualify as threatened. One of the major difficulties in applying the IUCN Red List Criteria at the global scale was the lack of knowledge on the status of species at broader geographic scales. Under the current classification process, many species that are endangered at the global level, such as Abeliophyllum distichum, Leontice microrhyncha, Echinosophora koreensis, Leontopodium coreanum, Iris odaesanensis, and Corylopsis coreana, were excluded. Knowledge gaps and uncertainties mean that the number of taxa at high risk of extinction may be substantially greater than is currently understood. Because information on its taxonomic status is lacking, the Red List status of Physocarpus insularis remains controversial. Also, Caragana koreana, an invalidly published name, should be excluded from the list. 
Although the Korea Ministry of Environment insisted that this procedure applied modified IUCN threat categories and definitions, the evaluation was based only on subjective views and a misapplication of the IUCN Red List Criteria. The current listings by the Korea Ministry of Environment should be challenged. We suggest that broad species concepts be applied to endemic species and that criteria resting on adequate quantitative knowledge be used. We also suggest that the highest priority for the Red List be given first, at the global scale, to species endemic at least to the Korean peninsula.

The prediction of the stock price movement after IPO using machine learning and text analysis based on TF-IDF (증권신고서의 TF-IDF 텍스트 분석과 기계학습을 이용한 공모주의 상장 이후 주가 등락 예측)

  • Yang, Suyeon;Lee, Chaerok;Won, Jonggwan;Hong, Taeho
    • Journal of Intelligence and Information Systems / v.28 no.2 / pp.237-262 / 2022
  • There has been a growing interest in IPOs (Initial Public Offerings) due to the profitable returns that IPO stocks can offer to investors. However, IPOs can be speculative investments that may involve substantial risk as well because shares tend to be volatile, and the supply of IPO shares is often highly limited. Therefore, it is crucially important that IPO investors are well informed of the issuing firms and the market before deciding whether to invest or not. Unlike institutional investors, individual investors are at a disadvantage since there are few opportunities for individuals to obtain information on the IPOs. In this regard, the purpose of this study is to provide individual investors with the information they may consider when making an IPO investment decision. This study presents a model that uses machine learning and text analysis to predict whether an IPO stock price would move up or down after the first 5 trading days. Our sample includes 691 Korean IPOs from June 2009 to December 2020. The input variables for the prediction are three tone variables created from IPO prospectuses and quantitative variables that are either firm-specific, issue-specific, or market-specific. The three prospectus tone variables indicate the percentage of positive, neutral, and negative sentences in a prospectus, respectively. We considered only the sentences in the Risk Factors section of a prospectus for the tone analysis in this study. All sentences were classified into 'positive', 'neutral', and 'negative' via text analysis using TF-IDF (Term Frequency - Inverse Document Frequency). Measuring the tone of each sentence was conducted by machine learning instead of a lexicon-based approach due to the lack of sentiment dictionaries suitable for Korean text analysis in the context of finance. 
For this reason, the training set was created by randomly selecting 10% of the sentences from each prospectus, and each selected sentence was read and labeled manually. A Support Vector Machine model trained on this set was then used to predict the tone of the sentences in the test set. Finally, the percentages of positive, neutral, and negative sentences in each prospectus were calculated from the model's predictions. To predict the price movement of an IPO stock, four machine learning techniques were applied: Logistic Regression, Random Forest, Support Vector Machine, and Artificial Neural Network. According to the results, models that use quantitative variables using technical analysis together with prospectus tone variables show higher accuracy than models that use only quantitative variables. More specifically, prediction accuracy improved by 1.45 percentage points in the Random Forest model, 4.34 percentage points in the Artificial Neural Network model, and 5.07 percentage points in the Support Vector Machine model. Among the techniques tested, the Artificial Neural Network model using both quantitative variables and prospectus tone variables had the highest prediction accuracy, 61.59%. The results indicate that the tone of a prospectus is a significant factor in predicting the price movement of an IPO stock. In addition, the McNemar test was used to check whether the difference between the models is statistically significant. The model using only quantitative variables was compared with the model using both the quantitative variables and the prospectus tone variables, and the improvement in predictive performance was confirmed to be significant at the 1% level.
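The McNemar comparison mentioned at the end of this abstract can be illustrated with a minimal sketch. It assumes the continuity-corrected chi-square form of the test, which the abstract does not specify, and the counts are hypothetical rather than taken from the paper. The function name `mcnemar_test` and its parameters are introduced here for illustration only.

```python
import math


def mcnemar_test(only_a_correct: int, only_b_correct: int) -> tuple:
    """Continuity-corrected McNemar test on the discordant pairs.

    only_a_correct: test cases that model A got right and model B got wrong.
    only_b_correct: test cases that model B got right and model A got wrong.
    Returns (chi-square statistic, p-value).
    """
    n = only_a_correct + only_b_correct
    if n == 0:
        raise ValueError("no discordant pairs; the test is undefined")
    chi2 = (abs(only_a_correct - only_b_correct) - 1) ** 2 / n
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts (assumption): A = quantitative variables only,
# B = quantitative variables plus prospectus tone variables.
chi2, p = mcnemar_test(only_a_correct=10, only_b_correct=25)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

Only the cases on which the two models disagree enter the statistic, which is why the test suits a comparison of two classifiers evaluated on the same test set.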

Comparative Study on the Carbon Stock Changes Measurement Methodologies of Perennial Woody Crops-focusing on Overseas Cases (다년생 목본작물의 탄소축적 변화량 산정방법론 비교 연구-해외사례를 중심으로)

  • Hae-In Lee;Yong-Ju Lee;Kyeong-Hak Lee;Chang-Bae Lee
    • Korean Journal of Agricultural and Forest Meteorology / v.25 no.4 / pp.258-266 / 2023
  • This study analyzed methodologies for estimating the carbon stocks of perennial woody crops and research cases from overseas countries. We found that Australia, Bulgaria, Canada, and Japan use the stock-difference method, while Austria, Denmark, and Germany estimate the change in carbon stock with the gain-loss method. In some countries, research has moved beyond developing country-specific factors (tier 2) to estimating carbon stock change from image data (tier 3). In South Korea, convergence studies at this third stage have been conducted in the forestry field, but advanced research in the agricultural field is at a beginning stage. Based on these results, we suggest four directions for future research: 1) securing country-specific factors for emissions and removals in the agricultural field by developing allometric equations and carbon conversion factors for perennial woody crops, to improve the completeness of emission and removal statistics; 2) conducting policy studies on refining the calculation of cultivation area using fruit-tree maturity based on biomass; 3) developing more advanced estimation techniques for perennial woody crops in the agricultural sector using allometric equations and remote sensing based on the agricultural and forestry satellite scheduled to be launched in 2025, and establishing a matrix and monitoring system for perennial woody crop cultivation areas in the agricultural sector; and 4) estimating soil carbon stock change, which is currently estimated by treating all agricultural areas as one, by land sub-classification in order to implement a dynamic carbon cycle model. 
This study suggests detailed guidelines and advanced methods for calculating carbon stock change for perennial woody crops, which support the 2050 Carbon Neutral Strategy of the Ministry of Agriculture, Food, and Rural Affairs and can stimulate related research in the agricultural sector.
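The two estimation approaches named in this abstract can be contrasted with a short sketch. It follows the general form of the stock-difference and gain-loss equations in the IPCC inventory guidelines. The function names and all numbers are hypothetical and are not taken from the paper.

```python
def stock_difference(c_t1: float, c_t2: float, t1: float, t2: float) -> float:
    """Annual carbon stock change from two inventories (t C per year).

    c_t1, c_t2: carbon stock measured at times t1 and t2 (t C).
    """
    if t2 <= t1:
        raise ValueError("t2 must be later than t1")
    return (c_t2 - c_t1) / (t2 - t1)


def gain_loss(annual_gain: float, annual_loss: float) -> float:
    """Annual carbon stock change as gains minus losses (t C per year)."""
    return annual_gain - annual_loss


# Hypothetical orchard (assumption): 1,200 t C in 2015 and 1,260 t C in 2020.
print(stock_difference(1200.0, 1260.0, 2015, 2020))  # 12.0

# Hypothetical annual growth of 20 t C and removals of 8 t C (assumption).
print(gain_loss(20.0, 8.0))  # 12.0
```

The stock-difference method needs two full inventories of the same area, whereas the gain-loss method needs annual growth and removal data, so the choice usually depends on which data a country already collects.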

Analysis on the Relation between the Morphological Physical and Chemical Properties of Forest Soils and the Growth of the Pinus koraiensis Sieb. et Zucc. and Larix leptolepis Gord by Quantification (수량화(數量化)에 의(依)한 우리나라 삼림토양(森林土壤)의 형태학적(形態学的) 및 이화학적(理化学的) 성질(性質)과 잣나무 및 낙엽송(落葉松)의 생장(生長) 상관분석(相關分析))

  • Chung, In Koo
    • Journal of Korean Society of Forest Science / v.53 no.1 / pp.1-26 / 1981
  • 1. To supply basic information on species siting and forest fertilization by identifying the soil properties that each tree species demands, data on the morphological, physical, and chemical properties of forest soils in Korea in relation to tree growth were collected over the last 10 years, quantified according to quantification theory, and analyzed by multivariate analysis.
    2. The test species, Japanese larch (Larix leptolepis Gord) and Korean white pine (Pinus koraiensis S. et Z.), can be planted over extensive areas from the middle to the north of the temperate forest zone and are the two most recommended reforestation species in Korea. However, their respective site demands are little known, and during reforestation they have been confused or assumed to demand the same sites. Korean white pine planted on larch sites has shown relatively good growth, but Japanese larch planted on Korean white pine sites can hardly be said to grow well. To understand this difference, we studied with the aid of a computer how the soil's morphological, physical, and chemical factors affect tree growth.
    3. All the stands examined are mature man-made forests. Dominant trees were cut as samples from 294 Japanese larch plots and 259 Korean white pine plots, and site index was determined through stem analysis. For each site index, soil profiles were prepared in the corresponding forest land. Soil samples were taken from each profile horizon, and forest-land productivity classification tables were worked out for each species from physical and chemical analyses of the samples, in order to study the relationships between tree growth and the physical, chemical, and combined physical and chemical properties of the soil.
    4. For the physical properties of soil, Japanese larch growth is influenced by the following factors in decreasing order of weight: deposit form, soil depth, soil moisture, altitude, relief, soil type, depth of A-horizon, soil consistency, content of organic matter, soil texture, bed rock, gravel content, aspect, and slope. For Korean white pine the order is soil type, soil consistency, bed rock, aspect, depth of A-horizon, soil moisture, altitude, relief, deposit form, soil depth, soil texture, gravel content, and slope.
    5. For the chemical properties of soil, Japanese larch growth is influenced by the following factors in order: base saturation, organic matter, CaO, C/N ratio, effective $P_2O_5$, pH, exchangeable $K_2O$, T-N, MgO, CEC, Total Base, and Na. For Korean white pine the order is effective $P_2O_5$, Total Base, T-N, Na, C/N ratio, pH, CaO, base saturation, organic matter, exchangeable $K_2O$, CEC, and MgO.
    6. For the combined physical and chemical properties of soil, Japanese larch growth is influenced by the following factors in order: soil depth, deposit form, soil moisture, pH, relief, soil type, altitude, T-N, soil consistency, effective $P_2O_5$, soil texture, depth of A-horizon, Total Base, exchangeable $K_2O$, and base saturation. For Korean white pine the order is soil type, soil consistency, aspect, effective $P_2O_5$, depth of A-horizon, exchangeable $K_2O$, soil moisture, Total Base, altitude, soil depth, base saturation, relief, T-N, C/N ratio, and deposit form.
    7. The multiple correlation coefficient with the physical properties of forest soil is 0.9272 for Japanese larch and 0.8996 for Korean white pine. With the chemical properties it is 0.7474 for larch and 0.7365 for Korean white pine. The physical properties of soil are thus found to be more closely related to tree growth than the chemical properties. However, this seems to be due to inadequate expression of the soil's chemical factors, and the chemical properties are shown to be no less important than the physical properties. For the combined physical and chemical properties, consisting of the important morphological and physical factors as well as chemical factors of forest soils, the multiple correlation coefficient is 0.9434 for larch and 0.9103 for Korean white pine, the highest correlation obtained.
    8. As the partial correlation coefficients show, Japanese larch needs deeper soil than Korean white pine, and it demands colluvial and creeping soils as deposit form. For the larch, soil moisture should be moderately moist to not moist and pH should be from 5.5 to 6.1. The larch's demands for T-N, soil texture, and soil nutrients are higher than those of Korean white pine. Thus soil depth, deposit form, relief, soil moisture, pH, N, altitude, and soil texture are good indicators for siting larch and Korean white pine, while soil type and soil consistency are only limited indicators of species siting because they vary widely as plantation environments. For larch siting, soil depth, deposit form, relief, soil moisture, pH, soil type, N, and soil texture are indicators of good growth; for Korean white pine they are soil type, soil consistency, effective $P_2O_5$, and exchangeable $K_2O$. In soil nutrients, larch has been found to demand more than Korean white pine except for $K_2O$, which Korean white pine generally demands more than Japanese larch.
    9. The physical properties of soil have so far been known to affect tree growth to the greatest extent. However, this study shows through computer analysis that the chemical properties of soil are no less important for tree growth than the physical properties, and it clarifies the site demands of Japanese larch and Korean white pine, which had been uncertain so far.

Evaluation of Grade-Classification of Wood Waste in Korea by Characteristic Analysis (국내 폐목재 특성분석을 통한 등급화 평가)

  • Kim, Joung-Dae;Park, Joon-Seok;Do, In-Hwan;Hong, Soo-Youl;Oh, Gil-Jong;Chung, David;Yoon, Jung-In;Phae, Chae-Gun
    • Journal of Korean Society of Environmental Engineers / v.30 no.11 / pp.1102-1110 / 2008
  • This research was performed to analyze the characteristics of wood wastes by origin and to suggest a grade classification for them. Korean proximate analysis was conducted, and heating value, heavy metal concentrations, and Cl concentration were analyzed for the grade classification. Wood wastes were sampled by origin from forest, living, construction and demolition, and industrial areas. The moisture content of most wood wastes ranged from 5 to 10%. Their VS (volatile solids) and ash contents were > 95% and < 5%, respectively. Most wood wastes, except wood used for growing mushrooms, met the standard (low heating value $\geq$ 3,500 kcal/kg) for refuse-derived fuel. The CCA (Cr, Cu, As) concentration of wood wastes from benches, wasted fishing boats, and railroad crossties was higher than that of the others. Cl content was approximately 1.3% in wood boxes for fish and $\leq$ 0.2% in the other wood wastes. The Cl content of all wood wastes used in this research met the standard (Cl $\leq$ 0.2%, dry weight basis) for refuse-derived fuel. If the wood wastes were classified into three grades, plywood would be in the 2nd grade, and MDF (medium-density fiberboard), wooden benches, painted electric wire drums, wasted fishing boats, and railroad crossties would be in the 3rd grade.

Genesis and Classification of the Red-Yellow Soils derived from Residuum on Acidic and Intermediate Rocks -II. Songjeong series (산성암(酸性岩) 및 중성암(中性岩)의 잔적층(殘積層)에 발달(發達)한 적황색토(赤黃色土)의 생성(生成) 및 분류(分類) -제(第)II보(報) 송정통(松汀統)에 관(關)하여)

  • Um, Ki Tae
    • Korean Journal of Soil Science and Fertilizer / v.6 no.2 / pp.75-81 / 1973
  • The morphological, physical, and chemical properties of the Songjeong series, derived from acidic crystalline rocks, are presented, together with the genesis and classification of the series. Morphologically, these soils have brown to dark brown loam A horizons and yellowish red to red clay loam Bt horizons with moderate, medium subangular blocky structure and thin patchy clay cutans on the ped faces. C horizons are very deep, yellowish red to yellowish brown fine sandy loam or sandy loam with original rock structure. Physically, the particle size distribution indicates that clay increases with depth down to the argillic horizons, but below the argillic horizons the clay content decreases. The moisture holding capacity of Songjeong soils is fairly good. Chemically, soil reaction is strongly to very strongly acid throughout the profile, and the content of organic matter is less than 1 per cent except in the A horizons. Cation exchange capacity ranges from 5 to 9 me/100g of soil, and base saturation is less than 35 per cent throughout the profile. The natural fertility of Songjeong soils is usually low. For use as crop land they need lime, organic matter, and heavy application of fertilizer. These soils occur in a temperate and humid climate under coniferous, deciduous, and mixed forest vegetation. Songjeong soils are classified as Red-Yellow Soils. Characteristically, Songjeong soils are similar to the Red-Yellow Podzolic soils of the United States but lack A2 horizons, and they are quite like the Red-Yellow Soils of Japan. According to the new classification system, the 7th Approximation of the USDA, Songjeong soils can be classified as a fine loamy, mesic family of Typic Hapludults, and in the FAO/UNESCO World Soil Map project as Orthic Acrisols.

Vegetation classification based on remote sensing data for river management (하천 관리를 위한 원격탐사 자료 기반 식생 분류 기법)

  • Lee, Chanjoo;Rogers, Christine;Geerling, Gertjan;Pennin, Ellis
    • Proceedings of the Korea Water Resources Association Conference / 2021.06a / pp.6-7 / 2021
  • Vegetation development in rivers is an important issue not only in academic fields such as geomorphology, ecology, and hydraulics, but also in river management practice. The problem of river vegetation is directly connected to reconciling the conflicting values of flood management and ecosystem conservation. In Korea, the issue of river vegetation and land formation has been raised continuously since the 2000s under various conditions, such as regulated rivers downstream of dams, small eutrophicated tributaries, and the floodplain sites of the four major river projects. Against this background, this study proposes a method for classifying the distribution of vegetation in rivers based on remote sensing data, and presents the results of applying it to the Naeseong Stream. The Naeseong Stream is a representative example of a river landscape that has changed due to vegetation development from 2014 to the present. The remote sensing data used in the study are images from the Sentinel 1 and 2 satellites, which are operated by the European Space Agency (ESA), provided through Google Earth Engine. For the ground truth, a manually classified dataset of the surface of the Naeseong Stream in 2016 was used, in which the area is divided into eight types including water, sand, and herbaceous and woody vegetation. Classification used the random forest technique, one of the machine learning algorithms. 1,000 samples were extracted from 10 pre-selected polygon regions, and half of them were used as training data and half as verification data. The accuracy based on the verification data was found to be 82~85%. The model established through training was also applied to images from 2016 to 2020, and the year-to-year changes in vegetation zones are presented. The technical limitations of the method and measures for improving it are also discussed. 
By providing quantitative information of the vegetation distribution, this technique is expected to be useful in practical management of vegetation such as thinning and rejuvenation of river vegetation as well as technical fields such as flood level calculation and flow-vegetation coupled modeling in rivers.
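The sample handling described in this abstract (samples drawn from polygon regions, half for training and half for verification, with accuracy reported on the verification half) can be sketched as follows. This is only an illustration of the split and the accuracy measure. It does not implement the random forest itself, and it assumes 100 samples per polygon, which is one reading of "1,000 samples from 10 polygons". All names and data are hypothetical.

```python
import random
from collections import defaultdict


def split_half_per_group(samples, seed=0):
    """Split samples so that half of each group goes to training.

    samples: list of (group_id, features, label) tuples.
    Returns (training, verification) lists.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for sample in samples:
        by_group[sample[0]].append(sample)
    training, verification = [], []
    for group_id in sorted(by_group):
        items = by_group[group_id][:]
        rng.shuffle(items)
        half = len(items) // 2
        training.extend(items[:half])
        verification.extend(items[half:])
    return training, verification


def overall_accuracy(true_labels, predicted_labels):
    """Fraction of verification samples whose predicted class is correct."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical data (assumption): 10 polygons with 100 samples each.
samples = [(polygon, None, "water") for polygon in range(10) for _ in range(100)]
training, verification = split_half_per_group(samples)
print(len(training), len(verification))  # 500 500
```

Splitting within each polygon keeps every region represented in both halves. Samples from the same polygon are spatially correlated, so accuracy measured this way may be optimistic compared with verification on separate polygons.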

An Analysis of Morphological Variation in Abies koreana Wilson and A. nephrolepis (Traut.) Maxim. of Korea (Pinaceae) and Their Phylogenetic Problems (한국산(韓國産) 분비나무와 구상나무의 형질분석(形質分析)과 종간유연관계(種間類緣關係))

  • Chang, Chin-Sung;Jeon, Jeong Ill;Hyun, Jung Oh
    • Journal of Korean Society of Forest Science / v.86 no.3 / pp.378-390 / 1997
  • Ten populations of Korean fir (Abies koreana Wilson) and Manchurian fir [A. nephrolepis (Traut.) Maxim.] were sampled in South Korea to investigate patterns of intraspecific variation in these species and to evaluate the recognition of the two species. Principal components analysis and cluster analysis were performed on both seed-cone data and needle morphology data. The characters that contributed most to the separation between A. koreana and A. nephrolepis along the three principal component axes were leaf width, length of seed, width of seed wing, length of seed wing, cone width, width of scale, and length of bract tip, but these characters were not diagnostic because their ranges overlap. None of these characters, including bract position (exserted and recurved vs. exserted and straight), was therefore reliable in distinguishing the two taxa. The individuals of A. koreana from Mt. Chi-ri appeared quite unique, probably on account of their larger cone size and longer scale tip, while those from Mt. Hal-la were generally distinct from the others in their larger seed and seed wing and greater scale width. The Mt. Duk-yu specimens of A. koreana appeared somewhat smaller, but more data are needed because of the small sample size. In general, a gradual clinal geographic trend can be detected in the position of resin ducts in the leaves of A. koreana. The southern population, Mt. Hal-la (an insular population), was generally distinct from the northern populations (Mt. Chi-ri, Mt. Ga-ya, and Mt. Duk-yu) in the position of the resin duct (medial, within mesophyll vs. marginal, close to epidermis: 100% vs. 75 or 50%). Although no sharp boundary separating the two species could be detected based on cone and needle morphology, the observed clinal pattern was distinct in the northern populations of A. koreana and the southern population of A. nephrolepis. 
In a preceding study of flavonoid variation in 20 species in eastern Asia, flavanone (5-deoxyflavanone) was found to be characteristic of A. faxoniana Rehder et Wilson and A. georgei Orr of China and of A. koreana of Korea. A. faxoniana, which is assumed to be a primitive species, has resin ducts in both the medial and the marginal positions, while A. georgei and A. koreana are identified by the marginal position of the resin duct. With respect to foliar flavonoid chemistry, A. koreana was distinct from A. nephrolepis: the southernmost samples (Mt. Hal-la and Mt. Chi-ri) contained additional flavonoid derivatives (mainly flavanone) that were not found in the northernmost samples of A. nephrolepis, except in a few individuals from the Mt. Seo-rak and Mt. Tae-bak populations of Kwang-won province. The presence of A. koreana type flavonoids in the two Chinese species suggested that the position of the resin duct may be a phyletic character. Abies koreana, together with the two Chinese taxa, exhibited the most elaborate and specialized flavonoid profile within Abies in eastern Asia. Contrary to our initial expectations, apparent intermediates between A. nephrolepis and A. koreana were found on the Duk-yu and Ga-ya mountains. The pattern of variation in resin duct position and flavonoid chemistry in these populations of A. koreana suggested that genetic interchange or natural hybridization had occurred between the two species. In our opinion, the evidence needed to resolve the status of this taxon remains inconclusive until the intermediate individuals from Mts. Duk-yu and Ga-ya are shown to result from hybridization between the two species.

Ensemble of Nested Dichotomies for Activity Recognition Using Accelerometer Data on Smartphone (Ensemble of Nested Dichotomies 기법을 이용한 스마트폰 가속도 센서 데이터 기반의 동작 인지)

  • Ha, Eu Tteum;Kim, Jeongmin;Ryu, Kwang Ryel
    • Journal of Intelligence and Information Systems / v.19 no.4 / pp.123-132 / 2013
  • As smartphones are equipped with various sensors such as the accelerometer, GPS, gravity sensor, gyroscope, ambient light sensor, and proximity sensor, much research has been done on using these sensors to create valuable applications. Human activity recognition is one such application, motivated by various welfare uses such as support for the elderly, measurement of calorie consumption, analysis of lifestyles, and analysis of exercise patterns. One of the challenges in using smartphone sensors for activity recognition is that the number of sensors used should be minimized to save battery power. When the number of sensors is restricted, it is difficult to build a highly accurate activity recognizer or classifier, because subtly different activities are hard to distinguish from limited information. The difficulty becomes especially severe when the number of activity classes to be distinguished is very large. In this paper, we show that a fairly accurate classifier that distinguishes ten different activities can be built using data from a single sensor, the smartphone accelerometer. Our approach to this ten-class problem is to use the ensemble of nested dichotomies (END) method, which transforms a multi-class problem into multiple two-class problems. END builds a committee of binary classifiers in a nested fashion using a binary tree. At the root of the binary tree, the set of all classes is split into two subsets of classes by a binary classifier. At a child node of the tree, a subset of classes is again split into two smaller subsets by another binary classifier. Continuing in this way, we obtain a binary tree in which each leaf node contains a single class. This binary tree can be viewed as a nested dichotomy that can make multi-class predictions. 
Depending on how the set of classes is split into two subsets at each node, the final tree can differ. Since some classes may be correlated, a particular tree may perform better than the others. However, the best tree can hardly be identified without deep domain knowledge. The END method copes with this problem by building multiple dichotomy trees randomly during learning and then combining the predictions made by each tree during classification. The END method is generally known to perform well even when the base learner is unable to model complex decision boundaries. As the base classifier at each node of the dichotomy, we have used another ensemble classifier, the random forest. A random forest is built by repeatedly generating decision trees, each from a bootstrap sample with a different random subset of features. By combining bagging with random feature subset selection, a random forest has more diverse ensemble members than simple bagging. Overall, our ensemble of nested dichotomies can be seen as a committee of committees of decision trees that can deal with a multi-class problem with high accuracy. The ten classes of activities that we distinguish in this paper are 'Sitting', 'Standing', 'Walking', 'Running', 'Walking Uphill', 'Walking Downhill', 'Running Uphill', 'Running Downhill', 'Falling', and 'Hobbling'. The features used for classifying these activities include not only the magnitude of the acceleration vector at each time point but also the maximum, the minimum, and the standard deviation of the vector magnitude within a time window of the last 2 seconds, etc. For experiments comparing the performance of END with that of other methods, accelerometer data were collected every 0.1 second for 2 minutes for each activity from 5 volunteers. 
Of the 5,900 ($=5{\times}(60{\times}2-2)/0.1$) data points collected for each activity (the data for the first 2 seconds are discarded because they have no time window data), 4,700 were used for training and the rest for testing. Although 'Walking Uphill' is often confused with some other similar activities, END was found to classify all ten activities with a fairly high accuracy of 98.4%. By comparison, the accuracies achieved by a decision tree, a k-nearest neighbor classifier, and a one-versus-rest support vector machine were 97.6%, 96.5%, and 97.6%, respectively.
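The random construction of nested dichotomies described in this abstract can be sketched structurally as follows. The sketch only builds the class-splitting trees. It trains no classifiers, and the shuffle-and-cut splitting rule is an assumption chosen for brevity, not necessarily the sampling scheme used in the paper. All function names are hypothetical.

```python
import random

ACTIVITIES = [
    "Sitting", "Standing", "Walking", "Running", "Walking Uphill",
    "Walking Downhill", "Running Uphill", "Running Downhill",
    "Falling", "Hobbling",
]


def build_random_dichotomy(classes, rng):
    """Recursively split a list of classes into two non-empty subsets.

    A leaf is a class name (str); an internal node is a (left, right) tuple
    that would hold one binary classifier in a full implementation.
    """
    if len(classes) == 1:
        return classes[0]
    shuffled = classes[:]
    rng.shuffle(shuffled)
    cut = rng.randint(1, len(shuffled) - 1)  # keeps both sides non-empty
    return (
        build_random_dichotomy(shuffled[:cut], rng),
        build_random_dichotomy(shuffled[cut:], rng),
    )


def leaves(tree):
    """List the class names at the leaves of a dichotomy tree."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def count_binary_classifiers(tree):
    """Number of internal nodes, i.e. binary classifiers the tree needs."""
    if isinstance(tree, str):
        return 0
    left, right = tree
    return 1 + count_binary_classifiers(left) + count_binary_classifiers(right)


rng = random.Random(42)
ensemble = [build_random_dichotomy(ACTIVITIES, rng) for _ in range(5)]
for tree in ensemble:
    print(count_binary_classifiers(tree), "binary classifiers")
```

Whatever shape a tree takes, ten classes always require nine binary classifiers, one per internal node. The ensemble differs only in which groups of activities each classifier has to separate.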

Stock Price Prediction by Utilizing Category Neutral Terms: Text Mining Approach (카테고리 중립 단어 활용을 통한 주가 예측 방안: 텍스트 마이닝 활용)

  • Lee, Minsik;Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.123-138 / 2017
  • Since the stock market is driven by the expectations of traders, studies have been conducted to predict stock price movements through analysis of various sources of text data. Research has addressed not only the relationship between text data and fluctuations in stock prices, but also stock trading based on news articles and social media responses. Studies that predict stock price movements have applied classification algorithms to a term-document matrix, in the same way as other text mining approaches. Because documents contain many words, it is better to select the words that contribute more when building a term-document matrix. Words with too low a frequency or importance are removed based on word frequency. Words are also selected according to their contribution, by measuring the degree to which a word contributes to correctly classifying a document. The basic idea in constructing a term-document matrix has been to collect all the documents to be analyzed and to select and use the words that influence the classification. In this study, we analyze the documents for each individual stock and select the words that are irrelevant to all categories as neutral words. We extract the words around each selected neutral word and use them to generate the term-document matrix. The idea is that stock movement is less related to the presence of the neutral words themselves, and that the words surrounding a neutral word are more likely to affect stock price movements. The generated term-document matrix is then applied to an algorithm that classifies stock price fluctuations. We first removed stop words and selected neutral words for each stock. We then excluded, from the selected words, those that also appear in news articles for other stocks. 
Through an online news portal, we collected four months of news articles on the top 10 stocks by market capitalization. We used the first three months of news articles as training data and applied the remaining month of articles to the model to predict the stock price movement of the next day. We used SVM, Boosting, and Random Forest to build the models and predict stock price movements. The stock market was open for a total of 80 days during the four months (2016/02/01 ~ 2016/05/31); the first 60 days were used as the training set and the remaining 20 days as the test set. The word-based algorithm proposed in this study showed better classification performance than the word selection method based on sparsity. This study predicted stock price volatility by collecting and analyzing news articles on the top 10 stocks by market capitalization. We used a classification model based on the term-document matrix to estimate stock price fluctuations, and compared the performance of the existing sparsity-based word extraction method with that of the suggested method of removing words from the term-document matrix. The suggested method differs from the existing word extraction method in that it uses not only the news articles for the corresponding stock but also news on other stocks to determine the words to extract. In other words, it removes not only the words that appear in both rising and falling cases but also the words that commonly appear in the news for other stocks. When prediction accuracy was compared, the suggested method showed higher accuracy. The limitations of this study are that stock price prediction was set up only to classify rises and falls, and that the experiment was conducted only for the top ten stocks. The 10 stocks used in the experiment do not represent the entire stock market. In addition, it is difficult to show investment performance, because stock price fluctuation and profit rate may differ. 
Further research is therefore needed using more stocks and predicting yields through trading simulation.
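The idea of keeping only the words that surround neutral words, and then building a term-document matrix from them, can be sketched as follows. The window size, the exclusion of the neutral words themselves, and the toy documents are assumptions made for illustration. The abstract does not state how wide the surrounding context is, and the function names are hypothetical.

```python
from collections import Counter


def context_terms(tokens, neutral_words, window=2):
    """Keep tokens that lie within `window` positions of a neutral word.

    Neutral words themselves are dropped (an assumption of this sketch).
    """
    kept = []
    for i, token in enumerate(tokens):
        if token in neutral_words:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        if any(tokens[j] in neutral_words for j in range(lo, hi) if j != i):
            kept.append(token)
    return kept


def term_document_matrix(docs_terms):
    """Build a count matrix: one row per document, one column per term."""
    vocabulary = sorted({term for terms in docs_terms for term in terms})
    rows = []
    for terms in docs_terms:
        counts = Counter(terms)
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows


# Toy tokenized news articles and a toy neutral word (all hypothetical).
docs = [
    ["company", "announced", "record", "profit", "today"],
    ["company", "announced", "heavy", "loss", "today"],
]
neutral = {"announced"}

selected = [context_terms(doc, neutral, window=2) for doc in docs]
vocabulary, matrix = term_document_matrix(selected)
print(vocabulary)
print(matrix)
```

The resulting rows could then be passed as feature vectors to any of the classifiers named in the abstract. Selecting the neutral words themselves, which the study does per stock, is outside the scope of this sketch.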