• Title/Summary/Keyword: Supervised Data

Search results: 660

Characterizing the Spatial Distribution of Oak Wilt Disease Using Remote Sensing Data (원격탐사자료를 이용한 참나무시들음병 피해목의 공간분포특성 분석)

  • Cha, Sungeun;Lee, Woo-Kyun;Kim, Moonil;Lee, Sle-Gee;Jo, Hyun-Woo;Choi, Won-Il
    • Journal of Korean Society of Forest Science / v.106 no.3 / pp.310-319 / 2017
  • This study categorized damaged trees by supervised classification of time-series aerial photographs of the Bukhan, Cheonggye, and Suri mountains, because oak wilt disease appeared to be concentrated in the metropolitan region. To analyze the spatial characteristics of the damaged areas, geographical variables such as elevation and slope were statistically analyzed to confirm their correlation. From the statistical analysis of Moran's I we obtained the following: (i) Moran's I for Bukhan mountain was estimated at 0.25, 0.32, and 0.24 in 2009, 2010, and 2012, respectively; (ii) for Cheonggye mountain at 0.26, 0.32, and 0.22 in 2010, 2012, and 2014, respectively; and (iii) for Suri mountain at 0.42 and 0.42 in 2012 and 2014, respectively. These values suggest that the damaged trees are distributed in clusters. In addition, we conducted hotspot analysis to identify how the damaged-tree clusters shift over time and verified that the hotspots move in a time series. Across the entire hotspot area (z-score > 1.65), there was an 80 percent probability of oak wilt disease occurring in broadleaf or mixed-stand forests at elevations of 200~400 m and slopes of 20~40 degrees. This indicates that oak wilt disease hotspots can arise in, or shift into, areas with these geographical features or forest conditions. The outcome can therefore serve as a basic resource for predicting oak wilt disease spread patterns and can help policy makers implement measures to prevent disease- and insect-pest-related damage.
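The clustering inference in the abstract above hinges on Moran's I. As a sketch of the statistic itself, here is a minimal implementation on a toy one-dimensional row of damaged/undamaged sites; the values and adjacency weights are hypothetical illustrations, not the paper's data:

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I: near +1 = clustered, 0 = random, -1 = dispersed."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = len(x)
    z = x - x.mean()                      # deviations from the mean
    s0 = w.sum()                          # sum of all spatial weights
    num = n * (w * np.outer(z, z)).sum()  # cross-products of neighbors
    den = s0 * (z ** 2).sum()
    return num / den

# Toy example: 6 sites, binary "damaged tree present" values, with
# simple adjacency weights -- all numbers hypothetical.
x = [1, 1, 1, 0, 0, 0]                    # damage clustered at one end
w = np.zeros((6, 6))
for i in range(5):
    w[i, i + 1] = w[i + 1, i] = 1         # neighbors share an edge

print(round(morans_i(x, w), 3))           # -> 0.6, i.e. clustered
```

A positive value, as here, matches the paper's interpretation that damaged trees occur in clusters.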

Optimal supervised LSA method using selective feature dimension reduction (선택적 자질 차원 축소를 이용한 최적의 지도적 LSA 방법)

  • Kim, Jung-Ho;Kim, Myung-Kyu;Cha, Myung-Hoon;In, Joo-Ho;Chae, Soo-Hoan
    • Science of Emotion and Sensibility / v.13 no.1 / pp.47-60 / 2010
  • Most classification research has used kNN (k-Nearest Neighbor) and SVM (Support Vector Machine), which are learning-based models, or the Bayesian classifier and NNA (Neural Network Algorithm), which are statistics-based methods. However, these face space and time limitations when classifying the very large number of web pages on today's internet. Moreover, most classification studies use a uni-gram feature representation, which poorly captures the real meaning of words. Korean web page classification faces the additional problem that Korean words are often polysemous. For these reasons, LSA (Latent Semantic Analysis) has been proposed for classification in this environment (large data sets and polysemy). LSA uses SVD (Singular Value Decomposition), which decomposes the original term-document matrix into three matrices and reduces their dimension. This yields a new low-dimensional semantic space for representing vectors, which makes classification efficient and exposes the latent meaning of words and documents (or web pages). Although LSA classifies well, it has a drawback: as SVD reduces the dimensions of the matrix and creates the new semantic space, it selects dimensions that represent vectors well rather than dimensions that discriminate between them. This is why LSA does not improve classification performance as much as expected. In this paper, we propose a new LSA that selects the optimal dimensions for both discriminating and representing vectors, minimizing this drawback and improving performance. The proposed method shows better and more stable performance than other LSA variants in low-dimensional spaces. In addition, we derive further improvement in classification by creating and selecting features, removing stopwords, and statistically weighting specific values.
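The SVD dimension-reduction step at the heart of LSA can be sketched in a few lines. The term-document counts below are toy values chosen so that two pairs of documents share vocabulary; real LSA would work on a much larger, weighted matrix:

```python
import numpy as np

# Toy term-document matrix (rows = terms, cols = documents);
# all counts are hypothetical.
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
], dtype=float)

# SVD: A = U @ diag(s) @ Vt; keeping the top k dimensions forms
# the low-dimensional latent semantic space.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
docs_k = (np.diag(s[:k]) @ Vt[:k]).T     # documents in k-dim space

def cos(a, b):
    """Cosine similarity between two document vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Documents 0-1 share vocabulary and should land closer together
# in the reduced space than documents 0 and 2, which share none.
print(cos(docs_k[0], docs_k[1]) > cos(docs_k[0], docs_k[2]))  # -> True
```

The paper's point is that which `k` dimensions to keep should be chosen for discrimination, not only representation; plain SVD, as here, optimizes only the latter.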


Performance of Investment Strategy using Investor-specific Transaction Information and Machine Learning (투자자별 거래정보와 머신러닝을 활용한 투자전략의 성과)

  • Kim, Kyung Mock;Kim, Sun Woong;Choi, Heung Sik
    • Journal of Intelligence and Information Systems / v.27 no.1 / pp.65-82 / 2021
  • Stock market investors are generally split into foreign investors, institutional investors, and individual investors. Compared to individual investors, professional investor groups such as foreign investors have an advantage in information and financial power, and as a result foreign investors are known to show good investment performance among market participants. The purpose of this study is to propose an investment strategy that combines investor-specific transaction information and machine learning, and to analyze the portfolio performance of the proposed model using actual stock prices and investor-specific transaction data. The Korea Exchange offers securities firms daily information on the purchase and sale volume of each investor type. We developed a data collection program in the C# programming language using an API provided by Daishin Securities CybosPlus, and collected daily opening prices, closing prices, and investor-specific net purchase data for 151 of the 200 KOSPI stocks from January 2, 2007 to July 31, 2017. The self-organizing map is an artificial neural network that performs clustering by unsupervised learning, introduced by Teuvo Kohonen in 1984. Competition is implemented among neurons within the map surface, and all connections form a non-recursive network running from the bottom layer to the top. It can also be expanded to multiple layers, although a single layer is commonly used. Linear functions serve as the activation functions of the neurons, and the learning rule is the instar rule, a form of general competitive learning. The backpropagation model, by contrast, is an artificial neural network that performs classification by supervised learning. We grouped and transformed investor-specific transaction volume data with the self-organizing map and used the groups to train backpropagation models.
Based on the estimates for the verification data produced by the trained model, the portfolios were rebalanced monthly. For performance analysis, a passive portfolio was designated, and KOSPI 200 and KOSPI index returns were obtained as proxies for market returns. Performance was evaluated using the equally-weighted portfolio return, compound interest rate, annual return, maximum drawdown (MDD), standard deviation, and Sharpe ratio. The buy-and-hold return of the top 10 stocks by market capitalization was designated as the benchmark, buy-and-hold being the best strategy under the efficient market hypothesis. The prediction rate of the backpropagation model on the learning data was significantly high at 96.61%, and the prediction rate on the verification data was a relatively high 57.1%. The quality of the self-organizing map grouping can be judged from the backpropagation results: if the grouping had been poor, the backpropagation model would have learned poorly. By this measure, the machine learning here is judged to have learned better than in previous studies. Our portfolio doubled the return of the benchmark and outperformed the market returns of the KOSPI and KOSPI 200 indexes. It also showed better results than the benchmark on the MDD and standard deviation risk indicators, and its Sharpe ratio was higher than those of the benchmark and the stock market indexes. Through this, we presented a direction for portfolio composition programs using machine learning and investor-specific transaction information and showed that they can support programs for real stock investment. The reported return results from monthly portfolio composition with assets rebalanced to equal proportions.
Better outcomes are expected if, when forming the monthly portfolio, continuously suggested stocks are held rather than sold and re-bought. The strategy therefore appears applicable to real transactions.
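The self-organizing map step described in the abstract above can be sketched minimally: two map units perform winner-take-all competitive learning on investor net-purchase vectors. The data, dimensions, and learning rate below are all invented for illustration, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical daily net-purchase vectors (foreign, institutional,
# individual investors) drawn from two distinct regimes.
data = np.vstack([
    rng.normal([ 1.0,  0.5, -1.0], 0.1, (20, 3)),   # "smart money" days
    rng.normal([-1.0, -0.5,  1.0], 0.1, (20, 3)),   # retail-driven days
])

# Minimal 1-D self-organizing map with two units: competitive
# learning moves the winning unit toward each sample (instar-style).
w = rng.normal(0, 0.1, (2, 3))
lr = 0.5
for epoch in range(20):
    for x in data:
        win = np.argmin(((w - x) ** 2).sum(axis=1))  # best-matching unit
        w[win] += lr * (x - w[win])                  # pull winner toward x
    lr *= 0.9                                        # decay learning rate

# Assign each day to its nearest unit; the two synthetic regimes
# should map to different units.
groups = [int(np.argmin(((w - x) ** 2).sum(axis=1))) for x in data]
print(len(set(groups[:20])) == 1 and groups[0] != groups[20])
```

In the paper's pipeline these group assignments, rather than the raw volumes, then become inputs for the supervised backpropagation classifier.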

The way to make training data for deep learning model to recognize keywords in product catalog image at E-commerce (온라인 쇼핑몰에서 상품 설명 이미지 내의 키워드 인식을 위한 딥러닝 훈련 데이터 자동 생성 방안)

  • Kim, Kitae;Oh, Wonseok;Lim, Geunwon;Cha, Eunwoo;Shin, Minyoung;Kim, Jongwoo
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.1-23 / 2018
  • Since the start of the 21st century, various high-quality services have emerged with the growth of the internet and information and communication technologies. In particular, the E-commerce industry, in which Amazon and eBay stand out, is growing explosively. As E-commerce grows, customers can easily compare products and find what they want to buy, because more products are registered at online shopping malls. However, a problem has arisen with this growth: with so many products registered, it has become difficult for customers to find what they really need in the flood of products. When customers search with a generalized keyword, too many products come up; conversely, few products are found if customers type in product details, because concrete product attributes are rarely registered as text. In this situation, automatically recognizing text in images can be a solution. Because the bulk of product details are written in catalogs in image format, most product information cannot be found by the current text-based search systems. If the information in images can be converted to text, customers can search by product details, which makes shopping more convenient. Various existing OCR (Optical Character Recognition) programs can recognize text in images, but they are hard to apply to catalogs because they struggle in certain circumstances, for example when text is small or fonts are inconsistent. Therefore, this research suggests a way to recognize keywords in catalogs with deep learning, the state of the art in image recognition since the 2010s.
The Single Shot MultiBox Detector (SSD), a well-regarded model for object detection, can be used with its structure redesigned to account for the differences between text and generic objects. One issue is that the SSD model needs a large amount of labeled training data, because deep learning models of this kind are trained by supervised learning. One could label the location and class of each piece of text in catalogs manually, but manual collection raises many problems: some keywords would be missed through human error; collection would be too time-consuming given the scale of data needed, or too costly if many workers were hired to shorten the time; and if specific keywords need to be trained, finding images containing those words is itself difficult. To solve this data issue, this research developed a program that creates training data automatically. The program composes images containing various keywords and pictures, like a catalog, and saves the location information of the keywords at the same time. With this program, not only can data be collected efficiently, but the performance of the SSD model also improves: the model recorded an 81.99% recognition rate with 20,000 images created by the program. Moreover, this research tested the SSD model's efficiency under different data conditions to analyze which features of the data influence text recognition performance. The number of labeled keywords, the addition of overlapping keyword labels, the presence of unlabeled keywords, the spacing between keywords, and differences in background images all affected the SSD model's performance. These findings can guide performance improvement of the SSD model, or of other deep-learning-based text recognizers, through higher-quality data.
The SSD model redesigned to recognize text in images, and the program developed for creating training data, are expected to contribute to better search systems in E-commerce: suppliers can spend less time registering keywords for products, and customers can search for products using the details written in catalogs.
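The essential output of such an automatic training-data generator is an image paired with keyword bounding boxes. In the sketch below, each hypothetical keyword is stamped as a bright block standing in for rendered text (a real generator would draw actual glyphs with an image library) while its box is recorded as an SSD-style label; the vocabulary, image sizes, and label format are all invented for illustration:

```python
import random

import numpy as np

random.seed(0)

# Hypothetical keyword vocabulary for a product catalog.
KEYWORDS = ["cotton", "waterproof", "USB-C", "stainless"]

def make_sample(width=300, height=200):
    """Create one synthetic 'catalog image' plus its labels.

    Each keyword is stamped as a bright block whose bounding box is
    recorded -- the (image, box, class) triples are what SSD-style
    supervised training consumes."""
    img = np.zeros((height, width), dtype=np.uint8)
    labels = []
    for word in random.sample(KEYWORDS, 2):
        w, h = 8 * len(word), 12                 # crude text-extent estimate
        x = random.randrange(0, width - w)
        y = random.randrange(0, height - h)
        img[y:y + h, x:x + w] = 255              # stand-in for rendered text
        labels.append({"word": word, "box": [x, y, x + w, y + h]})
    return img, labels

img, labels = make_sample()
print(len(labels))  # -> 2 labeled keywords per generated image
```

Because the generator places the text itself, the box coordinates are exact by construction, avoiding the labeling errors the abstract attributes to manual annotation.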

Status of Industrial Environments of Some Industries in Taegu Kyungpook Area (대구지방 산업장에 있어서 건강장애요인과 작업환경검사에 대한 기업인의 수용태도 (ll))

  • Kim, Du-Hui;Seong, Su-Won
    • Monthly Occupational Health (월간산업보건) / s.8 / pp.4-30 / 1988
  • An examination of working environments was conducted to obtain more detailed data about harmful working environments and to contribute to more effective management. The study covered 722 factories located in Taegu city and eight counties in Kyungpook Province, Korea, over one year, from February 1 to December 30, 1986. The number of workers exposed to harmful materials was 37,697, or 45.2% of the 83,368 workers. The results by exposure were as follows: 1. For noise, the proportion exceeding the 8-hour TLV was 59%; this included nail-cutting in the assembled-metal manufacturing industry and the weaving process in textiles. 2. Dust in the mill process of coal manufacturing industries exceeded the TLV for class-two dust at all sites and exceeded the TLV in 6% of cases overall. 3. The workplaces with illumination below 70 lux were food storage facilities, textile auto-winders, wood-ware painting, and coal mixing; 44% of all cases were below the standard. 4. In the temperature-index investigation (WBGT), about 12% of all subjects exceeded the limit value, including rolling-machine and reducing-room workers. 5. For organic solvents, the TLV was exceeded in about 8% of cases. The processes exceeding the TLV, by material, were as follows: 1) Toluene: adhesive work in assembled-metal manufacturing; 2) Xylene: printing and paint mixing in chemical manufacturing; 3) Methyl ethyl ketone: paint mixing in all parts examined and, partially, coating machines in chemical manufacturing; 4) Methyl isobutyl ketone: printing in chemical manufacturing; 5) Acetone: vapor polishing in assembled-metal manufacturing. 6. Among the specified chemical materials, the airborne concentration of HCl exceeded the TLV in one of the three assembled-metal manufacturing factories examined.
Others, such as benzene, acetic acid, formic acid, sodium hydroxide, formalin, ammonia, copper, and chromate, were below the TLV in indoor atmospheric concentration; overall, the proportion exceeding the TLV was about 0.8%. 7. The concentrations of inorganic lead were below the TLV at all sites examined. These results show that the current management of working environments is not satisfactory, and that more active management is needed.


Studies on the Processing Factors of Pesticide in Dried Carrot from Field Trial and Dipping Test (포장 및 침지실험 당근의 건조에 의한 농약 가공계수 산출 연구)

  • Park, Kun-Sang;Suh, Jung-Hyuck;Choi, Jeong-Heui;Kim, Sun-Gu;Lee, Hyo-Ku;Shim, Jae-Han
    • The Korean Journal of Pesticide Science / v.13 no.4 / pp.209-215 / 2009
  • This study was performed to produce processing factors for pesticides in dried carrot, essential data for establishing the maximum residue limits (MRLs) of pesticides in dried carrot. The target pesticides were azinphos-methyl, chlorpyrifos, captan, endosulfan, and trichlorfon; these pesticides appear on Korea's MRL list for carrot and the USA's MRL list for dried foods. To infiltrate these pesticides up to each MRL level in carrot, a dipping test was performed in the laboratory. A supervised residue trial of the pesticides on carrot was also conducted in a greenhouse to capture the field-trial tendency. In the laboratory dipping test (including a drying examination), the processing factors of carrot at various concentrations and temperatures could be evaluated. In the field test, the processing factors were 5.9 for azinphos-methyl, 1.7 for captan, 7.6 for chlorpyrifos, 6 for endosulfan, and 0 for trichlorfon. The laboratory dipping test under various conditions gave more precise processing factors than the field trial: the factors obtained from the dipping test were 0~4.7 at the various pesticide concentrations and 0~6.7 at various drying temperatures. The lowest processing factors were 0~0.6 for trichlorfon, and the higher were 3.0~5.8 for chlorpyrifos. The highest processing factor was 9.1, for captan.
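The processing factor reduces to a simple ratio. Assuming the usual convention that it is the residue in the processed (dried) commodity divided by the residue in the raw commodity, a sketch with hypothetical residue values:

```python
def processing_factor(residue_raw, residue_dried):
    """Processing factor = residue in the dried commodity divided by
    residue in the raw commodity (both in mg/kg).

    Values above 1 mean drying concentrates the pesticide; 0 means the
    compound is lost entirely during processing (as reported for
    trichlorfon in the field test)."""
    if residue_raw == 0:
        raise ValueError("raw residue must be non-zero")
    return residue_dried / residue_raw

# Hypothetical residues (mg/kg) chosen so the factor matches the
# order of magnitude reported for chlorpyrifos in the field trial.
print(processing_factor(0.5, 3.8))  # -> 7.6
```

A factor like this, multiplied by the raw-commodity MRL, is what supports setting an MRL for the dried food.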

Proposal of Security Orchestration Service Model based on Cyber Security Framework (사이버보안 프레임워크 기반의 보안 오케스트레이션 서비스 모델 제안)

  • Lee, Se-Ho;Jo, In-June
    • The Journal of the Korea Contents Association / v.20 no.7 / pp.618-628 / 2020
  • The purpose of this paper is to propose a new security orchestration service model that combines, on the basis of a cyber security framework, the various security solutions that have so far been introduced and operated individually. At present, to respond to varied and intelligent cyber attacks, organizations have built single-purpose security devices together with SIEM and AI solutions that integrate and manage them, and have opened cyber security frameworks and security control centers for systematic prevention and response. However, because the cybersecurity framework is document-oriented and security personnel are limited, in reality control work rarely goes beyond fragmentary responses to the important detection events of TMS/IPS. To improve on this, the model in this paper selects the targets to be protected through work characteristics and vulnerable-asset identification, and then collects logs with SIEM. Based on asset information, we established proactive methods and three detection strategies driven by threat information. AI and SIEM are used to determine quickly whether an attack has occurred, and an automatic blocking function is linked to the firewall and IPS. In addition, by learning TMS/IPS detection events automatically through supervised machine learning, we improved the efficiency of control work, and through unsupervised machine learning results we established a threat hunting work system centered on big data analysis.
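As an illustration of the supervised-learning step for TMS/IPS detection events, the sketch below learns a per-signature incident rate from hypothetical analyst-labeled alerts and uses it to auto-triage new ones. The event fields, signature names, and threshold are all invented for illustration, not the paper's:

```python
from collections import Counter

# Hypothetical labeled TMS/IPS detection events: analysts mark past
# alerts as true incidents (1) or benign/false positives (0).
train = [
    ({"sig": "sql_injection", "dst_port": 80},   1),
    ({"sig": "sql_injection", "dst_port": 8080}, 1),
    ({"sig": "icmp_sweep",    "dst_port": 0},    0),
    ({"sig": "icmp_sweep",    "dst_port": 0},    0),
]

# Minimal supervised model: per-signature incident rate learned
# from the labels.
rates = {}
for event, label in train:
    c = rates.setdefault(event["sig"], Counter())
    c[label] += 1

def triage(event, threshold=0.5):
    """Auto-block high-confidence signatures, ignore known noise,
    and route unseen signatures to human threat hunting."""
    c = rates.get(event["sig"])
    if c is None:
        return "hunt"                       # unseen -> unsupervised/hunting side
    p = c[1] / (c[0] + c[1])                # learned incident rate
    return "block" if p >= threshold else "ignore"

print(triage({"sig": "sql_injection", "dst_port": 443}))  # -> block
print(triage({"sig": "dns_tunnel",    "dst_port": 53}))   # -> hunt
```

The "hunt" branch corresponds loosely to the abstract's division of labor: supervised learning automates known detection events, while novel patterns feed the unsupervised threat-hunting workflow.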

3D Quantitative Analysis of Cell Nuclei Based on Digital Image Cytometry (디지털 영상 세포 측정법에 기반한 세포핵의 3차원 정량적 분석)

  • Kim, Tae-Yun;Choi, Hyun-Ju;Choi, Heung-Kook
    • Journal of Korea Multimedia Society / v.10 no.7 / pp.846-855 / 2007
  • Significant feature extraction in cancer cell image analysis is an important process for grading cell carcinoma. In this study, we propose a method for 3D quantitative analysis of cell nuclei based upon digital image cytometry. First, we acquired volumetric renal cell carcinoma data for each grade using confocal laser scanning microscopy and segmented cell nuclei using color features in a supervised learning scheme. For 3D visualization, we used a contour-based method for surface rendering and a 3D texture mapping method for volume rendering. We then defined and extracted 3D morphological features of the cell nuclei. To evaluate which quantitative 3D features could contribute diagnostic information, we analyzed the statistical significance of the extracted 3D features in each grade using analysis of variance (ANOVA). Finally, we compared the 2D and 3D features of the cell nuclei and analyzed the correlations between them. We found statistically significant correlations between nuclear grade and 3D morphological features. The proposed method has potential as a foundation for developing a new nuclear grading system for accurate diagnosis and prediction of prognosis.
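The ANOVA step can be illustrated by computing the one-way F statistic directly: between-group variance of a feature across grades divided by within-group variance. The grade-wise "nuclear volume" measurements below are hypothetical stand-ins, not the paper's data:

```python
import numpy as np

def f_oneway(*groups):
    """One-way ANOVA F statistic: between-group vs within-group variance."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand = np.concatenate(groups).mean()
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical 3D nuclear volume measurements (um^3) for three grades.
g1 = [180, 190, 185, 188]
g2 = [230, 240, 235, 238]
g3 = [300, 310, 305, 308]
print(f_oneway(g1, g2, g3) > 10)  # large F -> the feature separates grades
```

A large F (compared against the F distribution with k-1 and n-k degrees of freedom) is what justifies calling a 3D feature statistically significant for grading.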


Detection of Clavibacter michiganensis subsp. michiganensis Assisted by Micro-Raman Spectroscopy under Laboratory Conditions

  • Perez, Moises Roberto Vallejo;Contreras, Hugo Ricardo Navarro;Herrera, Jesus A. Sosa;Avila, Jose Pablo Lara;Tobias, Hugo Magdaleno Ramirez;Martinez, Fernando Diaz-Barriga;Ramirez, Rogelio Flores;Vazquez, Angel Gabriel Rodriguez
    • The Plant Pathology Journal / v.34 no.5 / pp.381-392 / 2018
  • Clavibacter michiganensis subsp. michiganensis (Cmm) is a quarantine-worthy pest in México. New technologies must be implemented and validated to reduce the time needed for bacterial detection under laboratory conditions, and Raman spectroscopy is a promising technology with the features needed to characterize and identify bacteria. Under controlled conditions, a contagion process was induced with Cmm and the disease epidemiology was monitored. The micro-Raman spectroscopy technique (532 nm laser) was evaluated for its performance in assisting Cmm detection through its characteristic Raman spectral fingerprint. Our experiment was conducted with tomato plants in a completely randomized block design (13 plants × 4 rows). Cmm infection was confirmed by 16S rDNA, and plants showed symptoms 48 to 72 h after inoculation; the incidence and severity in the plant population varied over time and kept an aggregated spatial pattern. The contagion process reached 79% just 24 days after the epidemic was induced. Micro-Raman spectroscopy proved its speed, efficiency, and usefulness as a non-destructive method for the preliminary detection of Cmm. Carotenoid-specific bands at 1146 and 1510 cm⁻¹ were the distinguishing markers. Chemometric analyses showed the best performance from PCA-LDA supervised classification algorithms applied to the Raman spectrum data, with 100% on all classifier metrics (sensitivity, specificity, accuracy, negative and positive predictive value), which allowed us to differentiate Cmm from other endophytic bacteria (Bacillus and Pantoea). The unsupervised K-means algorithm also performed well (100, 96, 98, 91, and 100%, respectively).
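The PCA-LDA pipeline named in the abstract above can be sketched on synthetic "spectra" in which one class carries extra intensity in a marker band, loosely mimicking the carotenoid bands; all data, dimensions, and noise levels are fabricated for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic 50-point "Raman spectra": the Cmm class gets extra
# intensity in a marker band -- entirely fake data.
def spectra(n, marker):
    base = rng.normal(0, 0.05, (n, 50))
    base[:, 20:25] += marker              # marker-band intensity
    return base

X = np.vstack([spectra(30, 1.0), spectra(30, 0.0)])   # Cmm vs other
y = np.array([1] * 30 + [0] * 30)

# PCA: project centered data onto the top principal components.
mu = X.mean(axis=0)
U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
Z = (X - mu) @ Vt[:5].T

# Fisher LDA for two classes: w ~ Sw^-1 (m1 - m0).
m0, m1 = Z[y == 0].mean(axis=0), Z[y == 1].mean(axis=0)
Sw = np.cov(Z[y == 0].T) + np.cov(Z[y == 1].T)        # within-class scatter
w = np.linalg.solve(Sw, m1 - m0)

# Classify by thresholding the LDA projection at the class midpoint.
scores = Z @ w
thresh = (scores[y == 1].mean() + scores[y == 0].mean()) / 2
pred = (scores > thresh).astype(int)
print((pred == y).mean())  # well-separated toy data -> accuracy 1.0
```

On real spectra the same pipeline would be fit on a training split and scored on held-out samples to obtain the sensitivity/specificity metrics the paper reports.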

A neural-based predictive model of the compressive strength of waste LCD glass concrete

  • Kao, Chih-Han;Wang, Chien-Chih;Wang, Her-Yung
    • Computers and Concrete / v.19 no.5 / pp.457-465 / 2017
  • The Taiwanese liquid crystal display (LCD) industry has traditionally produced a huge amount of waste glass that ends up in landfills. Recycling waste glass can reduce the material cost of concrete and promote sustainable environmental protection. Because concrete is widely used as a structural material, the compressive strength of concrete with a variety of mixtures must be studied with predictive models to achieve more precise results. To create an efficient waste LCD glass concrete (WLGC) design proportion, related studies used multivariable regression analysis to develop an equation for the compressive strength of waste LCD glass concrete. However, the relationship between the mix proportion of waste LCD glass and compressive strength is complex and nonlinear, which weakens the multivariable regression model's predictions during the initial growth phase of compressive strength; the R ratio for the predictive multivariable regression model is 0.96. Neural networks (NN) have a superior ability to handle nonlinear relationships among multiple variables through supervised learning. This study developed a multivariable prediction model for waste LCD glass concrete compressive strength by analyzing a series of laboratory test results with a neural network algorithm obtained in a related prior study. The study also trained the prediction model by evaluating several combinations of factors, such as different numbers of input variables and relevance filters for the input variables, adjusted to enhance predictive ability based on the training mechanism of the NN and the characteristics of waste LCD glass concrete.
In selecting input variables, evaluating relevance proved better than adding dimensions for NN prediction of WLGC compressive strength. The model's predictive ability was examined using test results from the same data pool; the R ratio was approximately 0.996. With appropriate input variables, the validation results indicated that the neural model predicts with greater accuracy than the multivariable regression model during the initial growth phase of compressive strength. The neural-based predictive model for compressive strength therefore promotes the application of waste LCD glass concrete.
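As a hedged sketch of the supervised-NN idea, the code below trains a one-hidden-layer network by gradient descent on an invented nonlinear mix-design/strength relationship; the input variables, target rule, architecture, and hyperparameters are all illustrative, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical mix-design inputs: water/binder ratio and waste-glass
# replacement fraction; the "strength" target is an assumed rule.
X = rng.uniform([0.3, 0.0], [0.6, 0.4], (200, 2))
y = 80 - 90 * X[:, 0] + 20 * X[:, 1] * (1 - X[:, 1])  # toy strength (MPa)

# Standardize inputs and target for stable training.
Xs = (X - X.mean(0)) / X.std(0)
ys = (y - y.mean()) / y.std()

# One-hidden-layer network (tanh hidden, linear output) trained by
# batch gradient descent on mean squared error.
W1 = rng.normal(0, 0.5, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 1)); b2 = np.zeros(1)
lr = 0.1
for step in range(5000):
    H = np.tanh(Xs @ W1 + b1)
    out = (H @ W2 + b2).ravel()
    err = out - ys
    gW2 = H.T @ err[:, None] / len(Xs)
    gb2 = err.mean(keepdims=True)
    dH = err[:, None] @ W2.T * (1 - H ** 2)           # backprop through tanh
    gW1 = Xs.T @ dH / len(Xs); gb1 = dH.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

pred = (np.tanh(Xs @ W1 + b1) @ W2 + b2).ravel()
r = np.corrcoef(pred, ys)[0, 1]                       # akin to an "R ratio"
print(r > 0.9)
```

The correlation `r` between predicted and measured (standardized) strength plays the role of the R ratio the paper uses to compare the NN against the multivariable regression model.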