• Title/Summary/Keyword: Improving simulation accuracy (모의 정확도 향상)

A Study of the Adjustment and Treatment Depending on the Change of Prostate Location Using DIPS in Proton Beam Therapy for Prostate Gland in which a Fiducial Gold Marker was Inserted (Fiducial Gold Marker가 삽입된 전립선암 양성자 치료 시 Digital Image Positioning System (DIPS)을 이용한 전립선의 위치변화에 따른 보정에 관한 연구)

  • Park, June-Ki;Kim, Sun-Young;Kim, Tae-Yoon;Choi, Kye-Sook;Yeom, Doo-Suk;Kang, Dong-Yoon;Choi, Seung-O;Park, Ji-Youn
    • The Journal of Korean Society for Radiation Therapy / v.20 no.1 / pp.25-29 / 2008
  • Purpose: To monitor changes in the location of the prostate gland using DIPS, and to examine setup correction and proton beam therapy according to prostate movement, in proton beam therapy for prostate cancer with inserted fiducial gold markers. Materials and Methods: The study included ten patients with prostate cancer who have received proton beam therapy since April 2008. To monitor changes in prostate location, three fiducial gold markers were inserted before treatment. To minimize prostate movement, patients were instructed to urinate before treatment, to drink a set amount of water, and to have a rectal balloon placed. The set-up position was identical to that used for CT simulation. PA (posterior-anterior) and lateral images were obtained with both DIPS (digital image positioning system) and plain radiography, and the two imaging modalities were compared. Changes in the location of the fiducial gold markers were then assessed along three coordinates (x, y, z) and corrected, after which proton beam therapy was delivered. Results: Images taken with plain radiography were compared with those taken with DIPS. Relative to the reference bony landmarks, the mean changes in fiducial gold marker location with respect to the isocenter in the ten patients were ±0.116 cm on the X-axis, ±0.19 cm on the Y-axis, and ±0.176 cm on the Z-axis. The changes in prostate location differed among the ten patients. They ranged from RT 0.04 cm to RT 0.24 cm on the X-axis, from Inf 0.03 cm to Sup 0.42 cm on the Y-axis, and from Post 0.05 cm to Ant 0.35 cm on the Z-axis. Conclusion: Pre-treatment preparation was performed to minimize prostate movement. Even so, the three fiducial gold markers inserted in the prostate showed daily variation in all patients.
These data confirm the need for a gold marker matching system that accounts for daily variation in proton beam therapy, where more accurate target definition is required. Correcting the target center for changes in prostate location with DIPS, as in this study, is expected to improve the accuracy of proton beam therapy.
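The correction step described above can be sketched in Python. It locates the three fiducial markers along x, y and z, then shifts the patient so that they match the reference. This is a minimal sketch under stated assumptions, not the DIPS workflow, and the function names are hypothetical. The assumptions are:

- Marker coordinates from the planning reference and from the daily image are already in the same isocenter-based frame, in cm.
- The prostate moves as a rigid body (translation only, no rotation).
- The marker centroid defines the shift.
- The reported "±" values are mean absolute displacements per axis.

```python
from statistics import mean

# (x, y, z) coordinates in cm relative to the isocenter (assumed common frame)
Point = tuple[float, float, float]


def centroid(points: list[Point]) -> Point:
    """Mean position of the fiducial gold markers on each axis."""
    return tuple(mean(p[axis] for p in points) for axis in range(3))


def setup_correction(reference: list[Point], daily: list[Point]) -> Point:
    """Translation (cm) that moves the daily marker centroid back onto the
    reference centroid (rigid-body, translation-only assumption)."""
    ref_c = centroid(reference)
    day_c = centroid(daily)
    return tuple(r - d for r, d in zip(ref_c, day_c))


def mean_abs_shift(shifts: list[Point]) -> Point:
    """Mean absolute displacement per axis across fractions or patients."""
    return tuple(mean(abs(s[axis]) for s in shifts) for axis in range(3))


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    # Same markers on a treatment day, displaced by (0.1, -0.2, 0.05) cm
    daily = [(0.1, -0.2, 0.05), (1.1, -0.2, 0.05), (0.1, 0.8, 0.05)]
    print(setup_correction(reference, daily))  # approximately (-0.1, 0.2, -0.05)
```

Using the centroid averages out small errors in locating individual markers. A clinical system would also check inter-marker distances for marker migration, which this sketch does not do.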

Comparison of Breeding Value by Establishment of Genomic Relationship Matrix in Pure Landrace Population (유전체 관계행렬 구성에 따른 Landrace 순종돈의 육종가 비교)

  • Lee, Joon-Ho;Cho, Kwang-Hyun;Cho, Chung-Il;Park, Kyung-Do;Lee, Deuk Hwan
    • Journal of Animal Science and Technology / v.55 no.3 / pp.165-171 / 2013
  • A genomic relationship matrix (GRM) was constructed from whole-genome SNP markers of swine. Genomic breeding values were then estimated by replacing the pedigree-based numerator relationship matrix (NRM) with the GRM. Genotypes for 40,706 SNP markers from 448 purebred Landrace pigs were used. Five GRM construction methods (G05, GMF, GOF, GOF* and GN) were compared with one another and with the NRM. GOF coefficients, which use the observed frequency of each allele, showed the smallest deviation from NRM coefficients. GMF coefficients, which use the average minor allele frequency, deviated greatly from NRM coefficients, so the mean of the coefficients was expected to shift depending on how allele frequencies are accounted for. All GRM construction methods except GOF* showed normally distributed Mendelian sampling. Breeding values (BV) were estimated for days to 90 kg (D90KG) and average back-fat thickness (ABF) using the NRM and the GRMs. The correlation between NRM-based and GRM-based BVs was highest for GOF. GOF* overestimated the genetic variance, which confirms that the scale of the GRM is closely related to the estimation of genetic variance. With the same amount of phenotypic information, BVs based on genomic information were more accurate than BVs based on pedigree information, and this tendency was more pronounced for ABF than for D90KG. Genetic evaluation using a genomic relationship matrix could be useful when phenotypic or pedigree information is lacking, and for predicting BVs of young animals without phenotypes.
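The abstract does not give the construction formulas. GRMs of this kind are commonly built with VanRaden's (2008) first method:

G = ZZ' / (2 Σ p_j(1 - p_j))

Here Z is the 0/1/2 genotype matrix with each SNP column centred by 2p_j, and p_j is the allele frequency at SNP j. The pure-Python sketch below shows how the choice of allele frequency p changes G:

- Passing p = 0.5 for every SNP gives a G05-style matrix.
- Using the observed frequencies gives a GOF-style matrix.

That mapping to the paper's G05 and GOF is an assumption. The GMF, GOF* and GN variants are not reproduced here.

```python
from typing import Optional


def vanraden_grm(genotypes: list[list[int]],
                 freqs: Optional[list[float]] = None) -> list[list[float]]:
    """Genomic relationship matrix following VanRaden's first method.

    genotypes: n animals x m SNPs, coded 0/1/2 (copies of the counted allele).
    freqs: allele frequency per SNP; None means use the observed frequencies.
    """
    n, m = len(genotypes), len(genotypes[0])
    if freqs is None:
        # Observed frequency of the counted allele at each SNP
        freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    # Centre each SNP column: Z = M - 2p
    z = [[row[j] - 2 * freqs[j] for j in range(m)] for row in genotypes]
    # Scaling makes G comparable in magnitude to the pedigree-based NRM
    scale = 2 * sum(p * (1 - p) for p in freqs)
    return [[sum(a * b for a, b in zip(zi, zk)) / scale for zk in z] for zi in z]


if __name__ == "__main__":
    geno = [[0, 0, 2], [2, 1, 0], [1, 2, 1]]
    g_observed = vanraden_grm(geno)          # GOF-style (assumed mapping)
    g_half = vanraden_grm(geno, [0.5] * 3)   # G05-style (assumed mapping)
    for row in g_observed:
        print([round(v, 3) for v in row])
```

Because the centring and the scale both depend on p, a poor choice of p shifts the mean of G and inflates or deflates its diagonal. This is consistent with the abstract's point that GRM scale affects the estimated genetic variance.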

A Basic Study on the Status of e-Navigation-Related Industries (e-Navigation 관련 산업현황에 관한 기초연구)

  • Choe, Han-Gyu;Gang, Byeong-Jae
    • Korea Ship Safety Technology Authority Research Report (선박안전기술공단연구보고서) / s.4 / pp.1-108 / 2007
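  • At its 53rd session on 23 July 2007, IMO's NAV Sub-Committee (Sub-Committee on Safety of Navigation) defined e-Navigation as follows. It is the collection, exchange and display of maritime information on board and ashore by electronic means, to enhance berth-to-berth navigation and related services, for safety and security at sea and protection of the marine environment.

In November 2005, Dr. Stephen, the UK transport minister, gave a speech at the Royal Institute of Navigation. He stressed that, for maritime safety and environmental protection, both the traffic control centres that monitor ship movements and the ships under way need more useful and accurate information. He cited air navigation, which advanced technology has automated, as a model. He argued that marine navigation should likewise shift to e-Navigation, a concept in which all navigation-related facilities and tasks are replaced by electronic means. He also said that the UK would lead the necessary work.

Stephen listed four benefits of introducing e-Navigation:
○ a lower probability of accidents caused by navigational errors
○ an efficient response for search and rescue and for limiting the spread of damage when accidents occur
○ lower costs, because traditional aids to navigation would no longer need to be installed
○ commercial benefits from simplified port entry and departure procedures and more efficient use of sea routes

He also identified four expected obstacles to a transition to e-Navigation:
○ the cost of building the system, which would be particularly difficult for developing countries
○ the problem of inducing all ships in all sea areas worldwide to join the e-Navigation system so that its benefits can be achieved
○ the standardization of electronic chart display and bridge equipment
○ the design and construction of shore-based e-Navigation centres

At MSC 81 (Maritime Safety Committee) in 2005, the UK proposed a work item on "Development of an e-Navigation strategy", jointly with Japan, the Marshall Islands, the Netherlands, Norway, Singapore and the United States. IMO adopted it at MSC 82 in 2006. It decided that, through the NAV Sub-Committee, it would establish a concrete concept of e-Navigation by 2008 and formulate the strategic vision and policy for future development. A correspondence group on e-Navigation strategy development was then formed with the UK as chair. Over the past years, 19 countries and 16 specialist organizations have participated in it and carried out the following work:
○ Defining the concept and purpose of e-Navigation
○ Identifying key issues and priorities for e-Navigation
○ Identifying the benefits and drawbacks of developing e-Navigation
○ Identifying the roles of IMO, member states and others
○ Drafting a work plan for further development, including an implementation plan

The e-Navigation strategy development work item at IMO is scheduled to run until 2008. Key elements of the strategy are:
○ the range of services e-Navigation will cover
○ the infrastructure and equipment needed to provide those services
○ who will bear the costs of building and operating that infrastructure
○ a comparison of the benefits of e-Navigation against the investment required

Positions differ among governments, shipowners, port operators and seafarers, and developed and developing countries are at different economic levels. These differences will make strategy development very difficult, so reaching an agreed strategy may take somewhat longer than scheduled.

Once the strategy is complete, the first phase will standardize vessel traffic service systems, ship bridge equipment, radio communication equipment and similar systems. Intense competition among countries to have their own technologies adopted as standards is expected in this process. In the second phase, software and hardware will be developed to provide diverse and rich services under the e-Navigation framework. The growth of related service industries after the internet spread on land over the past decade suggests how this will unfold. Under e-Navigation, ship navigation will shift to a completely different paradigm.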
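For example, the complex procedures now required for port entry and departure will become a one-stop shop. Navigation will change from the current ship-centred model to one in which shore-based e-Navigation centres are actively involved. Maritime information will also be shared and used far more widely over wireless internet.

The potential e-Navigation market is roughly estimated at about 200 trillion won in total. The direct market, such as intelligent integrated navigation systems newly installed on ships and shore-based monitoring and support systems, accounts for about 50 trillion won. The indirect market, such as electronic charts, communication equipment and related service content, accounts for 150 trillion won. It is now time to formulate a strategy for capturing this huge market.

Until now, a few companies in developed countries have monopolized the navigation equipment industry. Korea is an advanced country in both shipbuilding and shipping, but it has relied mostly on imports in this field. Under e-Navigation, the overall market will grow and equipment specifications will be standardized, so a few companies are expected to find it hard to keep their current monopoly. e-Navigation is therefore a good opportunity for Korea to win a share of the navigation equipment market. In particular, by actively leveraging its position as the world's leading shipbuilder, Korea could gain a competitive edge over other countries. Moreover, the services market is closely tied to IT, and Korea, as an IT powerhouse, can be considered sufficiently competitive there.

However, the EU and other developed countries have been researching e-Navigation for about ten years. The EU's MarNIS project is now nearly complete and appears ready for immediate practical use. Although late, Korea must hurry its research to catch up. Recognizing the importance of e-Navigation, Korea formed a working group of experts from industry, academia and research institutes in 2006 and held workshops. The Ministry of Maritime Affairs and Fisheries is also planning and promoting a research program to develop core e-Navigation technologies. However, the technical standards for navigation and communication equipment currently follow the ITU Radio Regulations (RR), IMO resolutions and the SOLAS Convention. Compared with international trends in these regulations and resolutions, much of Korea's domestic technology has not been standardized.

This study identifies the elements of the e-Navigation system that require standardization. It also surveys the domestic industries related to e-Navigation (integrated electronic navigation systems), such as electronic charts and AIS, in order to examine trends in domestic e-Navigation technology development.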

Effect of Herd-mix Feeding System formulated by Energy Requirement Levels on the Performance of Lactating Cows (에너지 요구수준에 의하여 조제한 자가배합사료 사양체계가 젖소의 산유능력에 미치는 영향)

  • Sung, H.G.;Kim, D.K.
    • Journal of Animal Science and Technology / v.46 no.5 / pp.773-782 / 2004
  • The objective of this work was to establish an approach for implementing a TMR feeding system on farms, by testing the effects of a herd-mix feeding system on the performance of lactating cows. Fifty-six Holstein cows were fed over a 16-month experimental period. For the first 4 months, the animals were kept on the conventional feeding system, in which forage and concentrate were fed separately. For the remaining 12 months, they were given three types of herd-mix rations, formulated to the mean energy concentration requirement of the top-ranking half of the cows in each herd. Milk yield performance was compared with that under the conventional system using a switch-over design. The herd-mix feeding system substantially improved milk yield (P<0.01) and milk fat percentage (P<0.05). Milk yield and milk fat were enhanced by the herd-mix feeding system in all lactation numbers. In particular, the herd-mix feeding system significantly increased the following (P<0.05):
○ actual milk yield (AMY) and milk fat in first-lactation cows
○ AMY and 4.0% fat-corrected milk yield (FCM) in second-lactation cows

In early and mid-lactation, the herd-mix feeding system gave higher AMY and FCM than the conventional feeding system, although milk fat did not differ. In late lactation, AMY, milk fat and FCM were generally increased by the herd-mix feeding system, and milk fat and FCM were significantly higher than under the conventional feeding system (P<0.01). The herd-mix feeding system produced a favorable lactation curve with higher FCM persistency (93.24%) than the conventional system (92.69%). Income over feed cost with the herd-mix feeding system was 1.4-fold higher than with the conventional feeding method.
In conclusion, these results suggest that a herd-mix feeding system increases both the milk yield of a dairy herd and the gross income of the dairy farm. This requires a correct TMR energy concentration and well-chosen feed ingredients. The yield gain comes from better cow performance in early to mid-lactation and from improved persistency.
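The FCM trait mentioned above is usually computed with the standard 4% fat-correction formula:

FCM = 0.4 × milk + 15 × fat yield (both in kg)

The abstract does not state which formula the authors used, so treat this as an assumption. The sketch also includes a plain income-over-feed-cost calculation. The prices in the usage example are hypothetical.

```python
def fcm_4pct(milk_kg: float, fat_pct: float) -> float:
    """4% fat-corrected milk (kg): 0.4 * milk + 15 * fat yield."""
    fat_kg = milk_kg * fat_pct / 100.0
    return 0.4 * milk_kg + 15.0 * fat_kg


def income_over_feed_cost(milk_kg: float, milk_price: float, feed_cost: float) -> float:
    """Milk revenue minus feed cost, in the same currency unit as the inputs."""
    return milk_kg * milk_price - feed_cost


if __name__ == "__main__":
    # A cow giving 30 kg/day at 3.5% fat corresponds to about 27.75 kg FCM
    print(round(fcm_4pct(30.0, 3.5), 2))
    # Hypothetical prices: 900 per kg of milk, 12,000 feed cost per day
    print(income_over_feed_cost(30.0, 900.0, 12000.0))
```

At exactly 4% fat, FCM equals actual milk yield, which is a quick sanity check on the coefficients.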

Development of Prediction Model for Capsaicinoids Content in Red-Pepper Powder Using Near-Infrared Spectroscopy - Particle Size Effect (근적외선 스펙트럼을 이용한 고춧가루의 캡사이신 함량 예측 모델 개발 - 입자의 영향)

  • Mo, Changyeun;Kang, Sukwon;Lee, Kangjin;Lim, Jong-Guk;Cho, Byoung-Kwan;Lee, Hyun-Dong
    • Food Engineering Progress / v.15 no.1 / pp.48-55 / 2011
  • In this research, near-infrared absorption from 1,100 to 2,300 nm was measured to determine the capsaicinoid content of red-pepper powder. The instrument was an acousto-optic tunable filter (AOTF) spectrometer with a sample plate and sample-rotating unit. Non-spicy red pepper from one location (Younggwang-gun, Korea) was mixed with a spicy variety (var. Chungyang). The mixtures were separated by particle size: below 0.425 mm, 0.425-0.71 mm, and 0.71-1.4 mm. Partial least squares regression (PLSR) models predicting capsaicinoid content for each particle size were developed from the AOTF spectra. Capsaicinoid contents determined by HPLC served as the reference values. With cross-validation, the PLSR models for the three particle-size fractions had R_V² = 0.948-0.979 and standard error of prediction (SEP) = 6.56-7.94 mg%. The prediction error was lower for smaller particle sizes. The best PLSR model covered red-pepper powder below 1.4 mm with cross-validation. It used range normalization, standard normal variate and first-derivative preprocessing, and gave R_V² = 0.959 and SEP = 8.82 mg%.
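The preprocessing steps named above (range normalization, standard normal variate and first derivative) and the SEP statistic can be sketched in a few lines of Python. This is an illustration, not the authors' pipeline:

- The first derivative is a simple finite difference. NIR software often uses a Savitzky-Golay derivative instead.
- The abstract does not state the order in which the steps are chained.
- SEP is assumed to be the bias-corrected standard deviation of the prediction residuals.
- The PLSR fit itself is omitted.

```python
from statistics import mean, stdev


def range_normalize(spectrum: list[float]) -> list[float]:
    """Scale a spectrum linearly to [0, 1] (min-max / range normalization)."""
    lo, hi = min(spectrum), max(spectrum)
    return [(v - lo) / (hi - lo) for v in spectrum]


def snv(spectrum: list[float]) -> list[float]:
    """Standard normal variate: centre each spectrum on its own mean and
    divide by its own (sample) standard deviation."""
    mu, sd = mean(spectrum), stdev(spectrum)
    return [(v - mu) / sd for v in spectrum]


def first_derivative(spectrum: list[float], step_nm: float = 1.0) -> list[float]:
    """Forward finite-difference first derivative (one point shorter)."""
    return [(b - a) / step_nm for a, b in zip(spectrum, spectrum[1:])]


def sep(reference: list[float], predicted: list[float]) -> float:
    """Bias-corrected standard error of prediction (assumed definition)."""
    residuals = [p - r for r, p in zip(reference, predicted)]
    bias = mean(residuals)
    return (sum((e - bias) ** 2 for e in residuals) / (len(residuals) - 1)) ** 0.5


if __name__ == "__main__":
    absorbance = [0.42, 0.45, 0.51, 0.48, 0.44]  # hypothetical AOTF readings
    print([round(v, 3) for v in snv(absorbance)])
    print([round(v, 3) for v in first_derivative(range_normalize(absorbance))])
```

SNV and range normalization act on each spectrum separately. They reduce scatter differences caused by particle size, the effect this study examines. The derivative removes baseline offsets.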

Development of Correction Formulas for KMA AAOS Soil Moisture Observation Data (기상청 농업기상관측망 토양수분 관측자료 보정식 개발)

  • Choi, Sung-Won;Park, Juhan;Kang, Minseok;Kim, Jongho;Sohn, Seungwon;Cho, Sungsik;Chun, Hyenchung;Jung, Ki-Yuol
    • Korean Journal of Agricultural and Forest Meteorology / v.24 no.1 / pp.13-34 / 2022
  • Soil moisture data have been collected at 11 agrometeorological stations operated by the Korea Meteorological Administration (KMA). This study aimed to verify the accuracy of KMA soil moisture data and to develop correction formulas to improve their quality. Soil from the observation fields was sampled to analyze the physical properties that affect soil water content. Soil texture was classified as sandy loam or loamy sand at most sites. The bulk density of the soil samples averaged about 1.5 g/cm³. Silt and clay contents were also closely related to bulk density and water-holding capacity. The EnviroSCAN model served as the reference sensor and was calibrated with a self-manufactured "reference soil moisture observation system". The calibrated reference sensor was compared with the KMA field sensor at least three times at each of the 11 sites. Overall, the two sensors showed similar fluctuations over time. At some sites, however, the field sensor gave relatively lower soil moisture values than the reference sensor. A linear correction formula was derived for each site and depth from the range and mean of the data observed over the comparison period. Applying this correction improved agreement between the sensors at the Suwon site. An approach was also developed to estimate correction values for periods with no calculated correction formula. In summary, soil moisture data should be corrected at regular intervals at all observation sites, e.g., twice a year, to improve data quality.
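The abstract says only that each site-and-depth formula was linear and derived from "the range and average" of the observed data. One plausible reading is a linear map that makes the field sensor's range and mean match the reference sensor's. The sketch below implements that reading. It is an assumption, not the published formula, and the example data are hypothetical.

```python
from statistics import mean


def range_mean_correction(field: list[float], reference: list[float]) -> tuple[float, float]:
    """Slope and intercept of a linear correction that maps the field sensor's
    range and mean onto the reference sensor's range and mean."""
    slope = (max(reference) - min(reference)) / (max(field) - min(field))
    intercept = mean(reference) - slope * mean(field)
    return slope, intercept


def apply_correction(values: list[float], slope: float, intercept: float) -> list[float]:
    """Corrected soil moisture = slope * raw + intercept."""
    return [slope * v + intercept for v in values]


if __name__ == "__main__":
    # Hypothetical volumetric water content (%) at one site and depth
    field = [12.0, 15.0, 18.0, 21.0]
    reference = [16.0, 20.5, 25.0, 29.5]
    slope, intercept = range_mean_correction(field, reference)
    print(slope, intercept)  # 1.5 -2.0
    print(apply_correction([14.0], slope, intercept))  # [19.0]
```

Unlike a least-squares fit, matching range and mean does not require the two sensors' readings to be paired in time. It is, however, sensitive to outliers at the extremes.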

Detection of Wildfire Burned Areas in California Using Deep Learning and Landsat 8 Images (딥러닝과 Landsat 8 영상을 이용한 캘리포니아 산불 피해지 탐지)

  • Youngmin Seo;Youjeong Youn;Seoyeon Kim;Jonggu Kang;Yemin Jeong;Soyeon Choi;Yungyo Im;Yangwon Lee
    • Korean Journal of Remote Sensing / v.39 no.6_1 / pp.1413-1425 / 2023
  • The increasing frequency of wildfires due to climate change is causing severe loss of life and property. Wildfires destroy vegetation and, depending on their intensity and frequency, drive ecosystem changes. These changes in turn affect wildfire occurrence and cause secondary damage. Accurate estimation of the areas affected by wildfires is therefore fundamental. Satellite remote sensing is used for forest fire detection because it can rapidly acquire topographic and meteorological information about the affected area after a fire. Deep learning algorithms such as convolutional neural networks (CNNs) and transformer models also show high performance for accurate monitoring of burned regions. To date, however, the application of deep learning models has been limited, and few reports provide quantitative performance evaluations for practical field use. This study therefore emphasizes a comparative analysis of the performance gains from both model selection and data design. Deep learning models for detecting wildfire-damaged areas were examined using Landsat 8 satellite images of California. The detection performance of multiple models, such as U-Net and High-Resolution Network-Object Contextual Representation (HRNet-OCR), was comprehensively compared and analyzed. Wildfire-related spectral indices, such as the normalized difference vegetation index (NDVI) and the normalized burn ratio (NBR), were used as input channels. They reflect the degree of vegetation cover and surface moisture content. The mean intersection over union (mIoU) was 0.831 for U-Net and 0.848 for HRNet-OCR, indicating high segmentation performance. Adding spectral indices to the base spectral bands increased the metric values for all combinations. This confirms that augmenting the input data with spectral indices improves pixel-level classification.
This approach could also be applied to other satellite imagery to support recovery strategies for burned areas.
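NDVI and NBR are both normalized differences of two bands:

NDVI = (NIR - Red) / (NIR + Red)
NBR = (NIR - SWIR2) / (NIR + SWIR2)

For Landsat 8 OLI, NIR, Red and SWIR 2 are bands 5, 4 and 7. The sketch below computes both indices from reflectance values. It also computes the mIoU metric on a flattened binary mask (0 = unburned, 1 = burned). Skipping classes that appear in neither mask is one common convention and is assumed here; the paper's exact mIoU implementation is not given.

```python
def normalized_difference(a: float, b: float) -> float:
    """(a - b) / (a + b), the form shared by NDVI and NBR."""
    return (a - b) / (a + b)


def ndvi(nir: float, red: float) -> float:
    """NDVI from Landsat 8 OLI band 5 (NIR) and band 4 (red)."""
    return normalized_difference(nir, red)


def nbr(nir: float, swir2: float) -> float:
    """NBR from Landsat 8 OLI band 5 (NIR) and band 7 (SWIR 2)."""
    return normalized_difference(nir, swir2)


def mean_iou(pred: list[int], truth: list[int], num_classes: int = 2) -> float:
    """Mean intersection over union across classes present in either mask."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union:  # skip classes absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious)


if __name__ == "__main__":
    # Hypothetical reflectance values for a vegetated pixel
    print(round(ndvi(0.45, 0.05), 3), round(nbr(0.45, 0.15), 3))  # 0.8 0.5
    # Burned surfaces tend toward low or negative NBR
    print(round(nbr(0.15, 0.30), 3))
    print(round(mean_iou([1, 1, 0, 0], [1, 0, 0, 0]), 3))  # 0.583
```

As extra input channels, these indices give the network a precomputed contrast between vegetated and burned surfaces.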

Utilization of Smart Farms in Open-field Agriculture Based on Digital Twin (디지털 트윈 기반 노지스마트팜 활용방안)

  • Kim, Sukgu
    • Proceedings of the Korean Society of Crop Science Conference / 2023.04a / pp.7-7 / 2023
  • The main technologies of the Fourth Industrial Revolution currently include big data, the Internet of Things, artificial intelligence, blockchain, mixed reality (MR) and drones. In particular, the "digital twin" has recently become a global technology trend. It is a virtual model in a computer that replicates a physical object. Creating and simulating a digital twin of software-virtualized assets, instead of real physical assets, yields accurate information about real farming: its current state, agricultural productivity, farm work scenarios and so on. This study aims to streamline agricultural work through an integrated control system. The system is built by constructing digital twin data for the main production areas of open-field agriculture and by designing and building a smart farm complex. It provides automatic water management, remote growth forecasting, drone control and pest forecasting. The study also aims to spread digital, environment-controlled agriculture in Korea. Big-data analysis would guide the application of appropriate amounts of fertilizers and pesticides, which reduces labor, improves crop productivity and minimizes environmental load. These open-field agricultural technologies offer three benefits:
○ reduced labor through digital farming and cultivation management
○ optimized water use and prevention of soil pollution in preparation for climate change
○ quantitative growth management of open-field crops, by securing digital data on cultivation environments nationwide

They are also a way to directly implement carbon-neutral RED++ activities by improving agricultural productivity. Analyzing and predicting growth status from high-precision, high-definition image-based crop growth data is very effective for managing digital farming work.
The Southern Crop Department of the National Institute of Crop Science has conducted research and development on various types of open-field smart farms, such as subsurface drip irrigation and subsurface drainage. Starting this year, commercialization is underway in earnest. Smart farm facilities are being established and technology distributed to agricultural technology complexes across the country. This study describes a case of establishing agricultural fields that combine digital twin technology with open-field smart farm technology, together with future utilization plans.

The Validation Study of the Questionnaire for Sasang Constitution Classification (the 2nd edition revised in 1995) - In the field of profile analysis (사상체질분류검사지(四象體質分類檢査紙)(QSCC)II에 대(對)한 타당화(妥當化) 연구(硏究) -각(各) 체질집단(體質集團)의 군집별(群集別) Profile 분석(分析)을 중심(中心)으로-)

  • Lee, Jung-Chan;Go, Byeong-Hui;Song, Il-Byeong
    • Journal of Sasang Constitutional Medicine / v.8 no.1 / pp.247-294 / 1996
  • Data were collected with the newly revised QSCC from an outpatient group examined at Kyung-Hee Medical Center and from a general-population group. The author used these data in statistical analyses to validate the revised questionnaire. First, discriminant analysis of the patient-group data gave the correct discrimination rate. Next, norms were derived from the standardization of the general-population group. These norms were applied to the outpatient group's Sasang scale scores to obtain T-scores. Multivariate cluster analysis then divided each constitutional group into subgroups, and their distinctive features were investigated. The results are summarized as follows:
1. The questionnaire's validity was established by a correct discrimination rate of 70.08%, that is, the agreement between predicted and actual groups.
2. In the profile analysis, each constitutional group responded markedly higher on its own scale, so the questionnaire appears suitable for constitutional discrimination.
3. In the profile analysis of each constitutional group, the Soyang group showed the strongest power of expression and the Soeum group the weakest. The Taieum group showed a kind of dual property.
4. Each constitutional group split into three subgroups. The so-called seceder subgroup differed clearly from the contrasted groups in its profile features, as follows.
(1) The Soyang-in seceder group showed a considerably passive disposition, unlike the general character of the Soyang group. Notably, it gave comparatively high responses on the Soeum scale.
(2) The Taieum-in seceder group scored low overall, indicating a passive disposition. The Taieum group generally scores high on both the Taiyang and Taieum scales, but this subgroup scored sharply lower on the Taiyang scale.
(3) The Soeum-in seceder group showed a profile similar to that of the Soyang group, indicating that the passive property of the Soeum group was largely diluted.
These results confirm the validity of the newly revised questionnaire, and the profile analysis revealed the properties of the seceder groups to some degree. The results are expected to serve as research material for the next edition of the questionnaire. As noted several times in the main text, improving the questionnaire will require further investigation of the differences between the seceder groups and the contrasted groups.
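The two statistics used in this validation can be sketched as follows:

- The correct discrimination rate is the percentage of respondents whose predicted constitution matches their actual one.
- T-scores rescale raw scale scores against the norm group: T = 50 + 10(x - mean)/SD.

The linear T-score is the standard convention and is assumed here. The QSCC II norm values are not given in the abstract, so the numbers in the usage example are hypothetical.

```python
def t_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Linear T-score: mean 50, SD 10 in the norm (general-population) group."""
    return 50.0 + 10.0 * (raw - norm_mean) / norm_sd


def discrimination_rate(predicted: list[str], actual: list[str]) -> float:
    """Percentage of cases whose predicted group equals the actual group."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)


if __name__ == "__main__":
    # Hypothetical norms for one Sasang scale: mean 20, SD 5
    print(t_score(27.5, 20.0, 5.0))  # 65.0
    actual = ["Soyang", "Soeum", "Taieum", "Soyang"]
    predicted = ["Soyang", "Taieum", "Taieum", "Soyang"]
    print(discrimination_rate(predicted, actual))  # 75.0
```

Putting every scale on the same T metric is what allows the profiles of different constitutional groups and subgroups to be compared side by side.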

Sentiment Analysis of Korean Reviews Using CNN: Focusing on Morpheme Embedding (CNN을 적용한 한국어 상품평 감성분석: 형태소 임베딩을 중심으로)

  • Park, Hyun-jung;Song, Min-chae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.59-83 / 2018
  • Sentiment analysis is increasingly important for understanding the needs of customers and the public, and various deep learning models have been actively applied to English texts. In such work, the sentences in the training and test datasets are usually converted into sequences of word vectors before being fed into the model. Here, word vectors generally refer to vector representations of the words obtained by splitting a sentence on space characters. There are several ways to derive word vectors. One is Word2Vec, which was used to produce the 300-dimensional Google word vectors from about 100 billion words of Google News data. These vectors have been widely used in sentiment analysis of reviews from various fields, such as restaurants, movies, laptops and cameras. Korean differs from English here. It is a typical agglutinative language with well-developed postpositions and endings, so morphemes play an essential role in its sentiment analysis and sentence structure analysis. A morpheme can be defined as the smallest meaningful unit of a language, and a word consists of one or more morphemes. For example, the word '예쁘고' ('pretty and') consists of the morphemes '예쁘' (adjective) and '고' (connective ending). Given the significance of Korean morphemes, it seems reasonable to adopt the morpheme as the basic unit in Korean sentiment analysis. This study therefore uses 'morpheme vectors' as input to a deep learning model, rather than the 'word vectors' mainly used for English text. A morpheme vector is a vector representation of a morpheme. It can be derived by applying an existing word vector derivation method to sentences divided into their constituent morphemes. This raises several questions. What range of POS (part-of-speech) tags is desirable when deriving morpheme vectors to improve the classification accuracy of a deep learning model?
Korean has a high proportion of homonyms, so is it appropriate to apply a typical word vector model, which relies primarily on word forms? Korean product reviews contain many grammatical mistakes and variations. When morpheme vectors are drawn from such text, will preprocessing such as correcting spelling or spacing errors affect classification accuracy? We seek empirical answers to these fundamental issues, which are likely to arise first when applying deep learning models to Korean texts. As a starting point, we summarize them as three central research questions:
○ First, which is more effective as the initial input of a deep learning model? The options are morpheme vectors from grammatically correct texts in a domain other than the analysis target, or morpheme vectors from considerably ungrammatical texts in the same domain.
○ Second, what is an appropriate morpheme vector derivation method for Korean with respect to the range of POS tags, homonyms, text preprocessing and minimum frequency?
○ Third, can deep learning achieve a satisfactory level of classification accuracy in Korean sentiment analysis?

To address these questions, we generate various types of morpheme vectors that reflect them. We then compare classification accuracy using a non-static CNN (convolutional neural network) model that takes the morpheme vectors as input. The training and test datasets consist of 17,260 cosmetics product reviews from Naver Shopping. Morpheme vectors are derived from both same-domain and other-domain data. The same-domain data are about 2 million Naver Shopping cosmetics product reviews. The other-domain data are 520,000 items of Naver News data, arguably the counterpart of Google News data. The six primary sets of morpheme vectors constructed in this study differ in the following three criteria.
First, they come from two types of data source: Naver News, with high grammatical correctness, and Naver Shopping cosmetics product reviews, with low grammatical correctness. Second, they differ in the degree of preprocessing. Either sentences are only split, or they are split and then given additional spelling and spacing corrections. Third, they vary in the form of input fed into the word vector model: morphemes alone, or morphemes with their POS tags attached. The morpheme vectors further vary in the range of POS tags considered, the minimum frequency of included morphemes, and the random initialization range. All morpheme vectors are derived with the CBOW (continuous bag-of-words) model, using a context window of 5 and a vector dimension of 300. Three choices appear to improve classification accuracy:
○ using same-domain text, even with lower grammatical correctness
○ performing spelling and spacing corrections in addition to sentence splitting
○ including morphemes of all POS tags, including the unanalyzable category

Two other choices do not appear to have a definite influence on classification accuracy: attaching POS tags, which was devised for the high proportion of homonyms in Korean, and the minimum frequency threshold for including a morpheme.
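A small pure-Python sketch can illustrate the input variants compared in the study:

- morphemes with or without POS tags attached
- a restricted range of POS tags
- a minimum frequency threshold
- a CBOW context window of 5

The sketch assumes a morphological analyzer has already produced (morpheme, tag) pairs using Sejong-style tags, such as VA (adjective) and EC (connective ending). The function names are hypothetical. `cbow_pairs` only builds the (context, target) examples that a CBOW model learns from; it does not train vectors.

```python
from collections import Counter
from typing import Optional

# One analyzed sentence: [(morpheme, POS tag), ...] from a morphological analyzer
Analysis = list[tuple[str, str]]


def to_tokens(sentence: Analysis, attach_tag: bool = True,
              allowed_prefixes: Optional[tuple[str, ...]] = None) -> list[str]:
    """Turn analyzer output into word-vector tokens, optionally keeping only
    tags that start with one of `allowed_prefixes` (e.g. ("N", "V"))."""
    tokens = []
    for morph, tag in sentence:
        if allowed_prefixes is not None and not tag.startswith(tuple(allowed_prefixes)):
            continue
        tokens.append(f"{morph}/{tag}" if attach_tag else morph)
    return tokens


def build_corpus(sentences: list[Analysis], min_count: int = 1,
                 **token_options) -> list[list[str]]:
    """Tokenize every sentence, then drop tokens rarer than `min_count`."""
    tokenized = [to_tokens(s, **token_options) for s in sentences]
    counts = Counter(t for sent in tokenized for t in sent)
    return [[t for t in sent if counts[t] >= min_count] for sent in tokenized]


def cbow_pairs(tokens: list[str], window: int = 5) -> list[tuple[list[str], str]]:
    """(context, target) examples for CBOW: each target token with up to
    `window` neighbours on each side."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            pairs.append((context, target))
    return pairs


if __name__ == "__main__":
    review = [("예쁘", "VA"), ("고", "EC"), ("좋", "VA"), ("다", "EF")]
    print(to_tokens(review))  # ['예쁘/VA', '고/EC', '좋/VA', '다/EF']
    print(to_tokens(review, attach_tag=False, allowed_prefixes=("V",)))  # ['예쁘', '좋']
    print(cbow_pairs(to_tokens(review, attach_tag=False), window=1)[0])  # (['고'], '예쁘')
```

In practice, the token lists from `build_corpus` would be passed to a Word2Vec implementation. For example, gensim's `Word2Vec(sentences, vector_size=300, window=5, sg=0)` trains a CBOW model (`sg=0`), with `min_count` set per experiment. Attaching tags makes homonyms such as a noun and a verb stem with the same spelling distinct tokens.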