• Title/Summary/Keyword: 결정 알고리즘 (decision algorithm)


Improving the Accuracy of Document Classification by Learning Heterogeneity (이질성 학습을 통한 문서 분류의 정확성 향상 기법)

  • Wong, William Xiu Shun;Hyun, Yoonjin;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.3
    • /
    • pp.21-44
    • /
    • 2018
  • In recent years, the rapid development of internet technology and the popularization of smart devices have produced massive amounts of text data, distributed through media platforms such as the World Wide Web, Internet news feeds, microblogs, and social media. This enormous amount of easily obtained information, however, lacks organization, a problem that has drawn the interest of many researchers and that calls for professionals capable of classifying relevant information; hence text classification was introduced. Text classification, a challenging task in modern data analysis, assigns a text document to one or more predefined categories or classes. Various techniques are available, such as K-Nearest Neighbor, the Naïve Bayes algorithm, Support Vector Machine, Decision Tree, and Artificial Neural Network. When dealing with huge amounts of text data, however, model performance and accuracy become a challenge: depending on the words used in the corpus and the features created for classification, the performance of a text classification model can vary considerably. Most previous attempts have proposed a new algorithm or modified an existing one, a line of research that can be said to have reached its limits. In this study, instead of proposing or modifying an algorithm, we search for a way to modify the use of the data. It is widely known that classifier performance is influenced by the quality of the training data upon which the classifier is built, and real-world datasets usually contain noise, which can affect the decisions made by classifiers built from them.
In this study, we consider that data from different domains, i.e., heterogeneous data, may have noise characteristics that can be exploited in the classification process. A classifier is normally built on the assumption that the characteristics of the training data and the target data are the same or very similar. For unstructured data such as text, however, the features are determined by the vocabulary of the documents, so if the viewpoints of the training data and the target data differ, the features may also differ between them. We attempt to improve classification accuracy by strengthening the robustness of the document classifier, artificially injecting noise into the process of constructing it. Data coming from various sources are likely to be formatted differently, which causes difficulties for traditional machine learning algorithms: they are not designed to recognize different data representations at once and to combine them into the same generalization. To utilize heterogeneous data in the learning process of the document classifier, we therefore apply semi-supervised learning. Because unlabeled data may degrade the performance of the classifier, we further propose a method called the Rule Selection-Based Ensemble Semi-Supervised Learning Algorithm (RSESLA) that selects only the documents contributing to the accuracy improvement of the classifier. RSESLA creates multiple views by manipulating the features using different types of classification models and different types of heterogeneous data, and the most confident classification rules are selected and applied for the final decision making.
In this paper, three different types of real-world data sources were used: news, Twitter, and blogs.
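The idea of artificially injecting noise while building the classifier can be sketched in a few lines. The following is an illustrative Python sketch, not the paper's RSESLA; the label-flipping scheme, flip rate, and toy labels are all assumptions:

```python
import numpy as np

def inject_label_noise(labels, noise_rate, seed=0):
    """Randomly flip a fraction of labels to a different class, simulating the
    noisy/heterogeneous training data used to harden the classifier."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).copy()
    n_flip = int(round(noise_rate * len(labels)))
    idx = rng.choice(len(labels), size=n_flip, replace=False)
    classes = np.unique(labels)
    for i in idx:
        # reassign to a different class chosen uniformly at random
        others = classes[classes != labels[i]]
        labels[i] = rng.choice(others)
    return labels

y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_noisy = inject_label_noise(y, 0.2)
print(sum(int(a != b) for a, b in zip(y, y_noisy)))  # number of flipped labels
```

A classifier trained on such perturbed copies alongside the clean data would then be evaluated for robustness on the untouched target set.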

Analysis of the Characteristics of the Seismic Source and the Wave Propagation Parameters in the Region of the Southeastern Korean Peninsula (한반도 남동부 지진의 지각매질 특성 및 지진원 특성 변수 연구)

  • Kim, Jun-Kyoung;Kang, Ik-Bum
    • Journal of the Korean Society of Hazard Mitigation
    • /
    • v.2 no.1 s.4
    • /
    • pp.135-141
    • /
    • 2002
  • Both the non-linear damping values of the deep and shallow crustal materials and the seismic source parameters were estimated from near-field seismic ground motions observed in the southeastern Korean Peninsula. The non-linear numerical algorithm applied in this study is the Levenberg-Marquardt method. All 25 sets of horizontal ground motions (east-west and north-south components at each seismic station) from 3 events (micro to macro scale) were used for the analysis of damping values and source parameters. The non-linear damping values of the deep and shallow crustal materials were found to be similar to those of the western United States. The seismic source parameters found in this study also showed that the resulting stress-drop values are relatively low compared to those of the western United States. Consequently, comparisons of the various seismic parameters from this study with United States seismo-tectonic data suggest that the seismo-tectonic characteristics of the southeastern Korean Peninsula are more similar to those of the western U.S.
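As a rough illustration of the kind of non-linear least-squares inversion involved, the following is a minimal Levenberg-Marquardt loop fitted to a toy exponential attenuation model. The model, data, and starting values are invented for illustration; this is not the paper's actual damping/source inversion:

```python
import numpy as np

# Minimal Levenberg-Marquardt fit of y = A*exp(-k*t), a toy analogue of the
# non-linear inversion used for damping and source parameters.
def lm_fit(t, y, p0, n_iter=50, lam=1e-3):
    p = np.array(p0, float)
    for _ in range(n_iter):
        A, k = p
        r = y - A * np.exp(-k * t)                      # residuals
        J = np.column_stack([np.exp(-k * t),            # d(model)/dA
                             -A * t * np.exp(-k * t)])  # d(model)/dk
        step = np.linalg.solve(J.T @ J + lam * np.eye(2), J.T @ r)
        p_new = p + step
        # accept the step only if it reduces the cost; otherwise raise damping
        if np.sum((y - p_new[0] * np.exp(-p_new[1] * t)) ** 2) < np.sum(r ** 2):
            p, lam = p_new, lam * 0.5
        else:
            lam *= 2.0
    return p

t = np.linspace(0.0, 5.0, 50)
y = 2.0 * np.exp(-0.7 * t)          # noiseless synthetic observations
A, k = lm_fit(t, y, p0=[1.0, 1.0])
print(round(A, 3), round(k, 3))
```

The damping parameter interpolates between gradient descent (large lam) and Gauss-Newton (small lam), which is what makes the method robust for non-linear inversions like this.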

Liver Splitting Using 2 Points for Liver Graft Volumetry (간 이식편의 체적 예측을 위한 2점 이용 간 분리)

  • Seo, Jeong-Joo;Park, Jong-Won
    • The KIPS Transactions:PartB
    • /
    • v.19B no.2
    • /
    • pp.123-126
    • /
    • 2012
  • This paper proposes a method to separate the liver into left and right lobes for simple and exact volumetry of the liver graft on abdominal MDCT (Multi-Detector Computed Tomography) images before living-donor liver transplantation. Using this algorithm, a medical team can evaluate an accurate liver graft with minimal interaction between the team and the system, ensuring the donor's and recipient's safety. On the segmented liver image, 2 points (PMHV: a point in the Middle Hepatic Vein, and PPV: a point at the beginning of the right branch of the Portal Vein) are selected to separate the liver into left and right lobes. The middle hepatic vein is automatically segmented using PMHV, and the cutting line is decided on the basis of the segmented vein. The liver is separated by connecting the cutting line and PPV, and the volume and ratio of the liver graft are estimated. To verify the accuracy, the volumes estimated using the 2 points were compared with manual volumes processed and estimated by a diagnostic radiologist and with the graft weight measured during surgery. The mean ± standard deviation of the differences between the actual weights and the estimated volumes was 162.38 ± 124.39 cm³ for manual segmentation and 107.69 ± 97.24 cm³ for the 2-point method. The correlation coefficient between the actual weight and the manually estimated volume is 0.79, while that between the actual weight and the volume estimated using the 2 points is 0.87. After selecting the 2 points, the time required to separate the liver into left and right lobes and compute their volumes was measured to confirm that the algorithm can be used in real time during surgery. The mean ± standard deviation of the processing time was 57.28 ± 32.81 sec per data set (149.17 ± 55.92 pages).
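Once the liver is split, graft volume estimation reduces to counting labeled voxels and scaling by the physical voxel size. A minimal sketch, with an invented mask and illustrative voxel spacing:

```python
import numpy as np

# Toy volumetry: count labeled voxels in a segmentation mask and convert to
# physical volume using the CT voxel spacing (values here are illustrative).
def lobe_volume_cm3(mask, spacing_mm=(0.7, 0.7, 1.0)):
    voxel_mm3 = spacing_mm[0] * spacing_mm[1] * spacing_mm[2]
    return mask.sum() * voxel_mm3 / 1000.0  # mm^3 -> cm^3

# A 100x100x100 mask with half the voxels labeled as one lobe
mask = np.zeros((100, 100, 100), dtype=bool)
mask[:, :, :50] = True
print(round(lobe_volume_cm3(mask), 1))
```

The graft ratio reported in the paper would then be the ratio of the two lobe volumes computed this way.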

Comparisons between the Two Dose Profiles Extracted from Leksell GammaPlan and Calculated by Variable Ellipsoid Modeling Technique (렉셀 감마플랜(LGP)에서 추출된 선량 분포와 가변 타원체 모형화기술(VEMT)에 의해 계산된 선량 분포 사이의 비교)

  • Hur, Beong Ik
    • Journal of the Korean Society of Radiology
    • /
    • v.11 no.1
    • /
    • pp.9-17
    • /
    • 2017
  • A high degree of precision and accuracy in Gamma Knife Radiosurgery (GKRS) is a fundamental requirement for therapeutic success. Elaborate radiation delivery and dose gradients with a steep fall-off of radiation are applied clinically, necessitating a dedicated Quality Assurance (QA) program to guarantee dosimetric and geometric accuracy and to reduce all the risk factors that can occur in GKRS. In this study, as part of QA, we verified the accuracy of the single-shot dose profiles used in the algorithm of the Gamma Knife Perfexion (PFX) treatment planning system by employing the Variable Ellipsoid Modeling Technique (VEMT). We evaluated the dose distributions of single shots directed to the center of a spherical ABC phantom of 160 mm diameter on the Gamma Knife PFX. Collimating configurations of 4, 8, and 16 mm along the x, y, and z axes were studied. The Gamma Knife PFX treatment planning system used in GKRS is Leksell GammaPlan (LGP) ver. 10.1.1. Verification of this kind reinforces the accuracy of GKRS, on which clinical application must ultimately be based. Specifically, the width at the 50% isodose level, i.e., the Full Width at Half Maximum (FWHM), was verified under the condition that a patient's head is simulated as a sphere of 160 mm diameter. All dose profiles along the x, y, and z axes predicted through VEMT were excellently consistent with the dose profiles from LGP within specifications (≤ 1 mm at the 50% isodose level), except for a small difference in FWHM and penumbra (isodose levels 20%~80%) along the z axis for the 4 mm and 8 mm collimating configurations. The maximum discrepancy in FWHM was less than 2.3% for all collimating configurations, and the maximum discrepancy in penumbra occurred for the 8 mm collimator along the z axis.
The difference in FWHM and penumbra between the dose distributions obtained with VEMT and LGP is too small to have clinical significance in GKRS. The results of this study may serve as a reference for medical physicists involved in GKRS worldwide. Using VEMT as an independent verification of LGP results, we can confirm the validity of the dose distributions for all collimating configurations through the regular preventive maintenance program and clinically assure safe and accurate treatment for GKRS patients. Thus, VEMT is expected to become a part of QA that helps verify and operate the system safely.
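FWHM extraction from a 1-D dose profile can be sketched as follows; the Gaussian profile and its width are invented stand-ins, not VEMT or LGP output:

```python
import numpy as np

# Measure the FWHM of a 1-D dose profile by linear interpolation at the
# 50% isodose level, the quantity compared between VEMT and LGP.
def fwhm(x, dose):
    half = dose.max() / 2.0
    above = np.where(dose >= half)[0]
    i, j = above[0], above[-1]
    # interpolate the two 50% crossings on either side of the peak
    left = np.interp(half, [dose[i - 1], dose[i]], [x[i - 1], x[i]])
    right = np.interp(half, [dose[j + 1], dose[j]], [x[j + 1], x[j]])
    return right - left

x = np.linspace(-10, 10, 401)                 # position in mm
profile = np.exp(-x**2 / (2 * 1.7**2))        # Gaussian beam, sigma = 1.7 mm
print(round(fwhm(x, profile), 2))             # ~ 2.355 * sigma
```

The 20%~80% penumbra can be measured the same way by interpolating the 0.2 and 0.8 levels instead of 0.5.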

Red Tide Detection through Image Fusion of GOCI and Landsat OLI (GOCI와 Landsat OLI 영상 융합을 통한 적조 탐지)

  • Shin, Jisun;Kim, Keunyong;Min, Jee-Eun;Ryu, Joo-Hyung
    • Korean Journal of Remote Sensing
    • /
    • v.34 no.2_2
    • /
    • pp.377-391
    • /
    • 2018
  • In order to monitor red tide efficiently over a wide range, the need for red tide detection using remote sensing is increasing. Previous studies, however, have focused on developing red tide detection algorithms for ocean colour sensors. In this study, we propose the use of multiple sensors to overcome the inaccuracy of red tide detection and the limited usability of remote sensing data in coastal areas with high turbidity, which are pointed out as limitations of satellite-based red tide monitoring. The study areas were selected based on the red tide information provided by the National Institute of Fisheries Science, and spatial fusion and spectral-based fusion were attempted using GOCI images from an ocean colour sensor and Landsat OLI images from a terrestrial sensor. Spatial fusion of the two images yielded improved detection both for the coastal red tide, which was impossible to observe in GOCI images, and for the outer sea areas, where the quality of the Landsat OLI image was low. Spectral-based fusion was performed at both the feature level and the raw-data level, and the red tide distribution patterns derived by the two methods showed no significant difference. In the feature-level method, however, the red tide area tends to be overestimated when the spatial resolution of the image is low: pixel segmentation by the linear spectral unmixing method showed that the difference in the red tide area grows as the number of pixels with a low red tide ratio increases. At the raw-data level, the Gram-Schmidt sharpening method estimated a somewhat larger area than the PC spectral sharpening method, but no significant difference was observed. This study shows that coastal red tide in highly turbid water, as well as red tide in outer sea areas, can be detected through spatial fusion of ocean colour and terrestrial sensors, and the various spectral-based fusion methods presented suggest a more accurate way to estimate the red tide area.
These results are expected to provide more precise detection of red tide around the Korean Peninsula and the accurate red tide area information needed to determine countermeasures for effective red tide control.
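Linear spectral unmixing, used above for pixel segmentation, expresses a mixed pixel spectrum as a non-negative combination of endmember spectra. A minimal sketch with invented two-endmember, three-band spectra (real GOCI/OLI endmembers would have more bands):

```python
import numpy as np

# Columns of `endmembers` are the reference spectra; the values are made up.
endmembers = np.array([[0.10, 0.30, 0.60],    # "red tide" spectrum
                       [0.40, 0.35, 0.20]]).T  # "seawater" spectrum

def unmix(pixel):
    """Estimate endmember fractions by least squares, then clip to
    non-negative values and renormalize so the fractions sum to 1."""
    coeffs, *_ = np.linalg.lstsq(endmembers, pixel, rcond=None)
    coeffs = np.clip(coeffs, 0, None)
    return coeffs / coeffs.sum()

# A pixel that is 70% red tide and 30% seawater is recovered exactly
mixed = 0.7 * endmembers[:, 0] + 0.3 * endmembers[:, 1]
frac = unmix(mixed)
print(np.round(frac, 2))
```

Summing the red tide fractions over all pixels, times the pixel area, gives a sub-pixel estimate of the red tide area.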

Development of JPEG2000 Viewer for Mobile Image System (이동형 의료영상 장치를 위한 JPEG2000 영상 뷰어 개발)

  • 김새롬;정해조;강원석;이재훈;이상호;신성범;유선국;김희중
    • Progress in Medical Physics
    • /
    • v.14 no.2
    • /
    • pp.124-130
    • /
    • 2003
  • Currently, as a consequence of PACS (Picture Archiving and Communication System) implementation, many hospitals are replacing conventional film-type interpretation of diagnostic medical images with digital-format interpretation that can also be saved and retrieved. However, the big limitation of PACS is considered to be its lack of mobility. The purpose of this study was to determine the optimal communication packet size, taking into account the errors that occur in wireless communication, for medical images encoded with the JPEG2000 compression method, and to embody an auto-error-correction technique preventing the packet loss that occurs during wireless communication. A PC-class server was installed with capabilities to load and collect data, save images, and connect with other networks. Image data were compressed using the JPEG2000 algorithm, which supports high energy compaction and compression ratios, for communication over a wireless network, and were transmitted in block units coded by JPEG2000 to prevent packet loss on the wireless network. When JPEG2000 image data were decoded on a PDA (Personal Digital Assistant), decoding was instantaneous for an MR (Magnetic Resonance) head image of 256×256 pixels, while it took approximately 5 seconds to decode a CR (Computed Radiography) chest image of 800×790 pixels. In transmission over a CDMA 1X (Code-Division Multiple Access) module, 256 bytes/sec was a stable transmission rate, but packets were lost at a transmission rate of 1 Kbyte/sec. Over wireless LAN, by contrast, packets were not lost even at rates above 1 Kbyte/sec. Current PACS is not compatible with wireless networks because it has no interface between wired and wireless. Thus, this mobile JPEG2000 image viewing system was developed to complement mobility, a limitation of PACS.
Moreover, the weak connections of the wireless network were compensated for by re-transmitting image data within a set limit. The results of this study are expected to play an interface role between current wired-network PACS and mobile devices.
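The block-wise retransmission idea can be sketched as below; the sequence-numbered blocks, channel model, and retry limit are illustrative assumptions, not the system's actual protocol:

```python
# Send JPEG2000 code blocks, detect missing ones by sequence number, and
# re-request them up to a retry limit (all details here are illustrative).
def receive(blocks, lossy_channel, max_retries=3):
    received = {}
    pending = set(range(len(blocks)))
    for _ in range(max_retries + 1):
        for i in sorted(pending):
            data = lossy_channel(i, blocks[i])
            if data is not None:
                received[i] = data
        pending -= received.keys()
        if not pending:          # every block arrived; stop retrying
            break
    return [received.get(i) for i in range(len(blocks))]

# Toy channel that drops block 2 on its first transmission only
drops = {2: 1}
def channel(i, data):
    if drops.get(i, 0) > 0:
        drops[i] -= 1
        return None
    return data

blocks = [b"hdr", b"b1", b"b2", b"b3"]
print(receive(blocks, channel) == blocks)  # recovered after one retransmission
```

Because JPEG2000 code blocks are independently decodable, any block that never arrives degrades only its own region rather than corrupting the whole image.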


CNN-based Recommendation Model for Classifying HS Code (HS 코드 분류를 위한 CNN 기반의 추천 모델 개발)

  • Lee, Dongju;Kim, Gunwoo;Choi, Keunho
    • Management & Information Systems Review
    • /
    • v.39 no.3
    • /
    • pp.1-16
    • /
    • 2020
  • The current tariff return system requires the taxpayer to calculate the tax amount and pay it on his or her own responsibility; in principle, the duty and responsibility of the reporting payment system are imposed only on the taxpayer, who is required to calculate and pay the tax accurately. When the taxpayer fails to fulfill this duty and responsibility, an additional tax is imposed by collecting the tax shortfall. For this reason, item classification, together with tariff assessment, is among the most difficult tasks and could pose a significant risk to entities if items are misclassified; import declarations are therefore consigned, for a substantial fee, to customs specialists. The purpose of this study is to classify the HS items to be reported upon import declaration and to indicate the HS codes to be recorded on the declaration. HS items were classified using the images attached to the item classification decision cases published by the Korea Customs Service. For image classification, a CNN, the deep learning algorithm commonly used for image recognition, was adopted, and the VGG16, VGG19, ResNet50, and Inception-V3 models were used among CNN models. To improve classification accuracy, two datasets were created: Dataset1 selected the five headings with the most HS code images, and Dataset2 was built by dividing Chapter 87, the chapter with the most images among the 2-digit HS codes, into five headings. Classification accuracy was highest when HS item classification was learned with Dataset2 using the Inception-V3 model, while ResNet50 had the lowest classification accuracy.
This study identified the possibility of HS item classification based on the first image registered in an item classification decision case, and its second contribution is that HS item classification, which had not been attempted before, was performed through CNN models.
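The final recommendation step, turning an image classifier's class probabilities into a short list of candidate HS codes, can be sketched as follows; the codes and probabilities below are invented for illustration, not taken from the study:

```python
import numpy as np

# Hypothetical HS headings the classifier was trained on
HS_CODES = ["8471", "8517", "8708", "9018", "8704"]

def recommend(probs, k=3):
    """Return the k most likely HS codes with their probabilities,
    e.g. from the softmax output of a CNN such as Inception-V3."""
    order = np.argsort(probs)[::-1][:k]
    return [(HS_CODES[i], float(probs[i])) for i in order]

probs = np.array([0.05, 0.55, 0.10, 0.25, 0.05])
print(recommend(probs, k=2))
```

Presenting the top-k codes rather than a single prediction lets the declarant or customs specialist make the final call, which fits the recommendation framing of the paper.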

Enhancement of Inter-Image Statistical Correlation for Accurate Multi-Sensor Image Registration (정밀한 다중센서 영상정합을 위한 통계적 상관성의 증대기법)

  • Kim, Kyoung-Soo;Lee, Jin-Hak;Ra, Jong-Beom
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.42 no.4 s.304
    • /
    • pp.1-12
    • /
    • 2005
  • Image registration is a process to establish the spatial correspondence between images of the same scene acquired at different viewpoints, at different times, or by different sensors. This paper presents a new algorithm for robust registration of images acquired by multiple sensors having different modalities, here EO (electro-optic) and IR (infrared). Two approaches, feature-based and intensity-based, are usually possible for image registration. In the former, selection of accurate common features is crucial for high performance, but features in the EO image are often not the same as those in the IR image, so this approach is inadequate for registering EO/IR images. In the latter, normalized mutual information (NMI) has been widely used as a similarity measure due to its high accuracy and robustness, and NMI-based image registration methods assume that the statistical correlation between the two images is global. Unfortunately, since EO and IR images often do not satisfy this assumption, registration accuracy is not high enough for some applications. In this paper, we propose a two-stage NMI-based registration method based on an analysis of the statistical correlation between EO/IR images. In the first stage, for robust registration, we propose two preprocessing schemes: extraction of statistically correlated regions (ESCR) and enhancement of statistical correlation by filtering (ESCF). For each image, ESCR automatically extracts the regions that are highly correlated to the corresponding regions in the other image, and ESCF adaptively filters each image to enhance the statistical correlation between them. In the second stage, the two output images are registered by an NMI-based algorithm. The proposed method provides promising results for various EO/IR sensor image pairs in terms of accuracy, robustness, and speed.
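Histogram-based NMI, the similarity measure used in the second stage, can be computed as below; the bin count and random test images are arbitrary choices for illustration:

```python
import numpy as np

# Normalized mutual information NMI = (H(X) + H(Y)) / H(X, Y),
# estimated from a joint intensity histogram of the two images.
def nmi(a, b, bins=32):
    hist, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = hist / hist.sum()
    px = pxy.sum(axis=1)
    py = pxy.sum(axis=0)
    def H(p):
        p = p[p > 0]
        return -np.sum(p * np.log(p))
    return (H(px) + H(py)) / H(pxy.ravel())

img = np.random.default_rng(0).random((64, 64))
print(round(nmi(img, img), 3))                                    # identical images: maximal NMI
print(round(nmi(img, np.random.default_rng(1).random((64, 64))), 3))  # independent images: near 1
```

A registration loop would transform one image over candidate offsets/rotations and keep the transform that maximizes this score.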

Studies on the ecological variations of rice plant under the different seasonal cultures -II. A study on the year variations and prediction of heading dates of paddy rice under the different seasonal cultures- (재배시기 이동에 의한 수도의 생태변이에 관한 연구 -II. 재배시기 이동에 의한 수도출수기의 년차간변이와 그 조기예측-)

  • Hyun-Ok Choi
    • KOREAN JOURNAL OF CROP SCIENCE
    • /
    • v.3
    • /
    • pp.41-48
    • /
    • 1965
  • This study aimed to determine the magnitude of year-to-year variation in rice heading dates under different seasonal cultures and to estimate the heading date in advance. Using six rice varieties, Kwansan, Suwon #82, Suwon #144, Norin #17, Yukoo #132, and Paltal, early, ordinary, and late seasonal cultures had been carried out at the Paddy Crop Division, Crop Experiment Station at Suwon for the six-year period 1959 to 1964. In addition, the data of the standard rice cultures at the Provincial Offices of Rural Development for the 12-year period 1953 to 1964 were analyzed to clarify the relationship between the variation of rice heading dates and meteorological data related to the locations and years. The results of this study are as follows: 1. Year-to-year variation of rice heading dates was as high as 14 to 21 days in the early seasonal culture and 7 to 14 days in the ordinary seasonal culture, while it was as low as one to seven days in the late seasonal culture, the lowest among the three. The magnitude of variation depended greatly on variety, cultural season, and location. 2. A close negative correlation was found between the accumulated average air temperature for the 40 days from 31 days after seeding and the number of days to heading in the early seasonal culture. Accordingly, it was considered possible to predict the rice heading date by calculating the accumulated average air temperature for this period and applying the linear regression Y = a + bX. Estimation of the heading date in the late seasonal culture, on the other hand, requires further study. In the ordinary seasonal culture, no significant correlation between the accumulated average air temperature and the number of days to heading was obtained in the six-year experiments conducted at Suwon.
In the standard cultures at the Provincial Offices of Rural Development, the relationship between the accumulated average air temperature for the 70 days from seeding and the number of days to heading differed by variety: some varieties showed a significant correlation between the two factors while others did not. There was, however, no regional difference in this relationship.
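The prediction step is the plain linear regression Y = a + bX noted above, with X the accumulated average air temperature and Y the number of days to heading. A sketch with invented numbers (the negative slope mirrors the negative correlation reported):

```python
import numpy as np

# Illustrative data only: accumulated mean air temperature over the 40-day
# window (degree-days) vs. days from seeding to heading.
X = np.array([880.0, 900.0, 920.0, 950.0, 980.0])
Y = np.array([112.0, 109.0, 106.0, 102.0, 97.0])

b, a = np.polyfit(X, Y, 1)        # slope b and intercept a of Y = a + bX
predict = lambda x: a + b * x

# Predicted days to heading for a season accumulating 940 degree-days
print(round(predict(940.0), 1))
```

Fitted on real multi-year station data, such a regression is what allows the heading date to be estimated early in the season.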


Ecoclimatic Map over North-East Asia Using SPOT/VEGETATION 10-day Synthesis Data (SPOT/VEGETATION NDVI 자료를 이용한 동북아시아의 생태기후지도)

  • Park Youn-Young;Han Kyung-Soo
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.8 no.2
    • /
    • pp.86-96
    • /
    • 2006
  • Ecoclimap-1, a complete global database of surface parameters at 1-km resolution, was presented previously. It is intended to initialize the soil-vegetation-atmosphere transfer schemes in meteorological and climate models. Surface parameters in the Ecoclimap-1 database are provided as per-class values by an ecoclimatic base map obtained from a simple merging of land cover and climate maps. The principal objective of this ecoclimatic map is to capture the intra-class variability of the vegetation life cycle that the usual land cover map cannot describe. Even with an ecoclimatic map combining land cover and climate, however, the intra-class variability remained too high inside some classes. In this study, a new strategy is defined: the idea is to use the information contained in S10 NDVI SPOT/VEGETATION profiles to split a land cover class into more homogeneous sub-classes, using an intra-class unsupervised sub-clustering methodology instead of simple merging. This study was performed to provide a new ecoclimatic map over Northeast Asia in the framework of the Ecoclimap-2 global database construction for surface parameters. We used the University of Maryland's 1-km Global Land Cover Database (UMD) and a climate map to determine the initial number of clusters for intra-class sub-clustering. An unsupervised classification process using six years of NDVI profiles allows the discrimination of different behaviors within each land cover class. We checked the spatial coherence of the classes and, where necessary, carried out an aggregation step for clusters having similar NDVI time-series profiles. The mapping system yielded 29 ecosystems for the study area. For climate-related studies, this new ecosystem map may serve as a base map for constructing the Ecoclimap-2 database and for improving the quality of surface climatology in climate models.
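The intra-class sub-clustering can be sketched with a toy k-means over synthetic 10-day NDVI profiles; the deterministic initialization, profile shapes, and cluster count are illustrative assumptions, not the study's actual method:

```python
import numpy as np

# Toy k-means: split one land-cover class into phenologically homogeneous
# sub-classes based on annual NDVI profiles (36 ten-day composites).
def kmeans(X, k, iters=10):
    # deterministic farthest-point initialization for reproducibility
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(-1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels

t = np.arange(36)
early = np.sin(np.pi * t / 36)                  # early-peaking NDVI cycle
late = np.sin(np.pi * (t - 9) / 36).clip(0)     # delayed cycle
X = np.vstack([early + 0.01 * i for i in range(5)] +
              [late + 0.01 * i for i in range(5)])
labels = kmeans(X, k=2)
print(labels)
```

Pixels with the same land-cover label but different seasonal NDVI behavior end up in different sub-classes, which is the homogeneity the ecoclimatic map is after.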