• Title/Summary/Keyword: Region-based Retrieval


Developing of Text Plagiarism Detection Model using Korean Corpus Data (한글 말뭉치를 이용한 한글 표절 탐색 모델 개발)

  • Ryu, Chang-Keon;Kim, Hyong-Jun;Cho, Hwan-Gue
    • Journal of KIISE:Computing Practices and Letters / v.14 no.2 / pp.231-235 / 2008
  • Recently we have witnessed several plagiarism scandals involving academic papers and novels, and document plagiarism is becoming more frequent and more serious. Although plagiarism in English text has long been studied, systematic and complete studies of plagiarism in Korean documents are hard to find. Since the linguistic features of Korean are quite different from those of English, English-based methods cannot be applied directly to Korean documents. In this paper, we propose a new plagiarism detection method for Korean and test our algorithm thoroughly on a benchmark Korean text corpus. The proposed method is based on "k-mer" and "local alignment", which locate the plagiarized regions of document pairs quickly and accurately. Using a Korean corpus that contains more than 10 million words, we establish a probability model for the local alignment score (random similarity by chance). The experiments show that our system detects plagiarized documents successfully.
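
The abstract names two building blocks, k-mers and local alignment, without giving parameters. The following is a minimal sketch of how the two could be combined, not the authors' implementation: the k-mer length, the scoring values (`match`, `mismatch`, `gap`), and all function names are assumptions chosen for illustration. A cheap k-mer overlap test screens document pairs, and a Smith-Waterman style local alignment then scores the best matching region.

```python
def kmers(text, k=3):
    """Return the set of character k-mers in text, ignoring whitespace."""
    s = "".join(text.split())
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def shared_kmer_ratio(a, b, k=3):
    """Fraction of the smaller k-mer set that both documents share."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / min(len(ka), len(kb))


def local_alignment(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment.

    Returns (best_score, (end_in_a, end_in_b)), where the end positions
    are 1-based indices of the last aligned characters.
    Scoring values are illustrative assumptions.
    """
    best, best_end = 0, (0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            if cur[j] > best:
                best, best_end = cur[j], (i, j)
        prev = cur
    return best, best_end
```

Because Python strings are sequences of Unicode characters, the same functions work on Hangul text. In practice the alignment score would be compared against the chance-similarity model the abstract mentions, which this sketch does not attempt to reproduce.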

Retrieval of Sulfur Dioxide Column Density from TROPOMI Using the Principle Component Analysis Method (주성분분석방법을 이용한 TROPOMI로부터 이산화황 칼럼농도 산출 연구)

  • Yang, Jiwon;Choi, Wonei;Park, Junsung;Kim, Daewon;Kang, Hyeongwoo;Lee, Hanlim
    • Korean Journal of Remote Sensing / v.35 no.6_3 / pp.1173-1185 / 2019
  • For the first time, we retrieved sulfur dioxide (SO2) vertical column density (VCD) over industrial and volcanic areas from the TROPOspheric Monitoring Instrument (TROPOMI) using the principal component analysis (PCA) algorithm. The SO2 VCDs retrieved by the PCA algorithm from TROPOMI raw data were then compared with those retrieved by the Differential Optical Absorption Spectroscopy (DOAS) algorithm (the TROPOMI Level 2 SO2 product). In East Asia, where large amounts of SO2 are released near the surface by anthropogenic sources such as fossil fuels, the mean SO2 VCD retrieved by the PCA (DOAS) algorithm was 0.05 DU (-0.02 DU). The correlation between the SO2 VCDs retrieved by the PCA algorithm and those retrieved by the DOAS algorithm was low (slope = 0.64; correlation coefficient (R) = 0.51) under cloudy conditions. However, for cloud fractions below 0.5, the slope and correlation coefficient between the two outputs increased to 0.68 and 0.61, respectively. This indicates that, in both algorithms, the sensitivity of the SO2 retrieval to the surface is reduced when the cloud fraction is high. In contrast, the correlation between the volcanic SO2 VCDs retrieved by the PCA algorithm and those retrieved by the DOAS algorithm was high (R = 0.90) under cloudy conditions. This good agreement for volcanic SO2 is thought to reflect the higher accuracy of satellite-based SO2 VCD retrieval for SO2 that is distributed mainly in the upper troposphere or lower stratosphere, as it is in volcanic regions.

Sensitivity of COMS/GOCI Measured Top-of-atmosphere Reflectances to Atmospheric Aerosol Properties (COMS/GOCI 관측값의 대기 에어러솔의 특성에 대한 민감도 분석)

  • Lee, Kwon-Ho;Kim, Young-Joon
    • Korean Journal of Remote Sensing / v.24 no.6 / pp.559-569 / 2008
  • The Geostationary Ocean Color Imager (GOCI) on board the Communication Ocean Meteorological Satellite (COMS), the first geostationary ocean color sensor, requires accurate atmospheric correction because its eight bands are also affected by atmospheric constituents such as gases, molecules, and aerosols. Unlike gases and molecules, aerosols interact with sunlight through complex scattering and absorption properties, so high-quality ocean remote sensing requires an understanding of aerosol-radiation interactions. In this study, we present the microphysical and optical properties of aerosols using the Optical Property of Aerosol and Cloud (OPAC) aerosol models. These aerosol optical properties were then used to analyze the relationship between the theoretical satellite-measured radiation obtained from radiative transfer calculations and the aerosol optical thickness (AOT) under various conditions (aerosol type and loading). We find that the choice of aerosol type makes little difference to the AOT retrieval for AOT<0.2. For larger values, the difference between true and retrieved AOT increases as AOT increases. We also investigate the differences between the AOT and Ångström exponent obtained from standard algorithms and from this study, and compare both with ground-based sunphotometer observations. Over the northeast Asian region, these comparisons suggest that the spatially averaged mean AOT retrieved in this study is much better than that from the standard ocean color algorithm. These results will be useful for aerosol retrieval and atmospheric correction in COMS/GOCI data processing.

X-linked recessive myotubular myopathy with MTM1 mutations

  • Han, Young-Mi;Kwon, Kyoung-Ah;Lee, Yun-Jin;Nam, Sang-Ook;Park, Kyung-Hee;Byun, Shin-Yun;Kim, Gu-Hwan;Yoo, Han-Wook
    • Clinical and Experimental Pediatrics / v.56 no.3 / pp.139-142 / 2013
  • X-linked recessive myotubular myopathy (XLMTM) is a severe congenital muscle disorder caused by mutations in the MTM1 gene and characterized by severe hypotonia and generalized muscle weakness in affected males. It is generally a fatal disorder during the neonatal period and early infancy. The diagnosis is based on typical histopathological findings on muscle biopsy, combined with suggestive clinical features. We experienced a case of a newborn who required intubation and ventilator care because of profound hypotonia and respiratory difficulty. The preliminary diagnosis at the time of request for retrieval was hypoxic ischemic encephalopathy, but the infant was clinically reevaluated for generalized weakness and muscle atrophy. Muscle biopsies showed variability in fiber size and centrally located nuclei in nearly all the fibers. We detected an MTM1 gene mutation of c.1261-1C>A in the intron 10 region, and diagnosed the neonate with myotubular myopathy. The same mutation was detected in his mother.

An Efficient Object Extraction Scheme for Low Depth-of-Field Images (낮은 피사계 심도 영상에서 관심 물체의 효율적인 추출 방법)

  • Park Jung-Woo;Lee Jae-Ho;Kim Chang-Ick
    • Journal of Korea Multimedia Society / v.9 no.9 / pp.1139-1149 / 2006
  • This paper describes a novel and efficient algorithm that extracts focused objects from still images with low depth-of-field (DOF). The algorithm consists of four modules. In the first module, a higher-order statistics (HOS) map, which represents the spatial distribution of the high-frequency components, is obtained from an input low-DOF image [1]. The second module finds an object-of-interest (OOI) candidate by using the characteristics of the HOS map. Since the candidate region may contain holes, the third module detects and fills them. To obtain the final OOI, the last module removes background pixels from the OOI candidate. The experimental results show that the proposed method is highly useful in various applications, such as image indexing for content-based retrieval from large image databases, image analysis for digital cameras, and video analysis for virtual reality, immersive video systems, photo-realistic video scene generation, and video indexing systems.


A Korean Community-based Question Answering System Using Multiple Machine Learning Methods (다중 기계학습 방법을 이용한 한국어 커뮤니티 기반 질의-응답 시스템)

  • Kwon, Sunjae;Kim, Juae;Kang, Sangwoo;Seo, Jungyun
    • Journal of KIISE / v.43 no.10 / pp.1085-1093 / 2016
  • A community-based Question Answering system provides answers to questions from the documents uploaded to web communities. To improve question analysis, earlier methods either developed specific rules suited to a target region or applied machine learning to only part of the process. However, these methods incur an excessive cost when expanding to new fields, or lead to a system that is overfitted to a specific field. This paper proposes a multiple machine learning method that automates the overall process by applying an appropriate machine learning technique at each step of a community-based Question Answering system. The system is divided into a question analysis part and an answer selection part. The question analysis part consists of the question focus extractor, which uses conditional random fields to analyze the focused phrases in questions, and the question type classifier, which uses a support vector machine to classify the topics of questions. In the answer selection part, we train the weights used by the similarity estimation models through an artificial neural network. In addition, the results of morphological analysis are often unreliable for data uploaded to web communities. We therefore suggest a method that minimizes the impact of morphological analysis by using character features in the question analysis stage. The proposed system outperforms the former system, achieving a Mean Average Precision of 0.765 and an R-Precision of 0.872.
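
The two evaluation measures reported above have standard definitions, which the sketch below follows. It is an illustration only: the function names and the input layout (a ranked list of answer ids plus a set of relevant ids per question) are assumptions, not taken from the paper.

```python
def average_precision(ranked, relevant):
    """Average of precision@k taken at each rank k that holds a relevant item."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranked, relevant) pairs, one per question."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def r_precision(ranked, relevant):
    """Precision within the top R results, where R is the number of relevant items."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for item in ranked[:r] if item in relevant) / r
```

For a ranking `["a", "b", "c", "d"]` with relevant set `{"a", "c"}`, average precision is (1/1 + 2/3) / 2 and R-Precision is 1/2, since only one of the top two results is relevant.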

Overview and Prospective of Satellite Chlorophyll-a Concentration Retrieval Algorithms Suitable for Coastal Turbid Sea Waters (연안 혼탁 해수에 적합한 위성 클로로필-a 농도 산출 알고리즘 개관과 전망)

  • Park, Ji-Eun;Park, Kyung-Ae;Lee, Ji-Hyun
    • Journal of the Korean earth science society / v.42 no.3 / pp.247-263 / 2021
  • Climate change has been accelerating in coastal waters recently; therefore, the importance of coastal environmental monitoring is also increasing. Chlorophyll-a concentration, an important marine variable, has been retrieved in the surface layer of the global ocean for decades through various ocean color satellites and used in many research fields. However, the commonly used chlorophyll-a concentration algorithm is suitable only for clear water and cannot be applied to turbid waters, because differences in their components and optical properties cause significant errors. In addition, designing a standard algorithm for coastal waters is difficult because optical characteristics differ from one coastal area to another. To overcome this problem, various algorithms have been developed that take into account the components and the variations in optical properties of highly turbid coastal waters. Chlorophyll-a concentration retrieval algorithms can be categorized into empirical algorithms, semi-analytic algorithms, and machine learning algorithms. These algorithms mainly use the blue-green band ratio, based on the reflectance spectrum of sea water, as their basic form. In contrast, algorithms developed for turbid water utilize the green-red band ratio, the red-near-infrared band ratio, and the inherent optical properties to compensate for the effect of dissolved organic matter and suspended sediments in coastal areas. Reliable retrieval of satellite chlorophyll-a concentration from turbid waters is essential for monitoring the coastal environment and understanding changes in the marine ecosystem. Therefore, this study summarizes the existing algorithms that have been used for monitoring turbid Case 2 water and presents the problems associated with the monitoring and study of the seas around the Korean Peninsula. We also summarize the prospects for future ocean color satellites, which can yield more accurate and diverse results regarding the ecological environment with the development of multi-spectral and hyperspectral sensors.

The Development of Travel Demand Nowcasting Model Based on Travelers' Attention: Focusing on Web Search Traffic Information (여행자 관심 기반 스마트 여행 수요 예측 모형 개발: 웹검색 트래픽 정보를 중심으로)

  • Park, Do-Hyung
    • The Journal of Information Systems / v.26 no.3 / pp.171-185 / 2017
  • Purpose: Recently, there have been growing attempts to analyze social phenomena, consumption trends, and consumption behavior through vast amounts of customer data, such as web search traffic information and social buzz information, in fields such as flu prediction and real estate price prediction. Internet portal service providers such as Google and Naver disclose the web search traffic of online users through services such as Google Trends and Naver Trends. Academia and industry are paying attention to research on the information search behavior of online users and its utilization based on web search traffic information. Although many studies predict social phenomena, consumption trends, political polls, and so on from web search traffic information, it is hard to find research that uses it to explain and predict tourism demand and to establish tourism policy. In this study, we use web search traffic information to explain the tourism demand for major cities in Gangwon-do, a representative tourist area in Korea, and to develop a nowcasting model for that demand. Design/methodology/approach: In the first step, literature reviews on travel demand and on web search traffic were conducted in parallel. In the second step, we conducted qualitative research to confirm the information retrieval behavior of travelers. In the third step, we extracted the representative tourist cities of Gangwon-do and confirmed which keywords were used to search for them. In the fourth step, we collected tourist demand data to be used as the dependent variable and collected the web search traffic of each keyword to be used as independent variables. In the fifth step, we set up a time series benchmark model and added the web search traffic information to it to confirm whether the prediction model improved. In the last step, we analyzed the prediction models finally selected as optimal and confirmed the influence of the keywords on the prediction of travel demand. Findings: This study developed a tourism demand forecasting model for Gangwon-do, a representative tourist destination in Korea, by extending web search traffic information to tourism demand forecasting. We compared the proposed model with the existing benchmark time series model and confirmed the superiority of the proposed model. In addition, this study confirms that web search traffic information is positively correlated with travel demand and precedes it by one or two months, which supports its suitability as a predictor. Furthermore, by deriving the search keywords that have a significant effect on the tourism demand forecast for each city, the representative characteristics of each region can be identified.

Improvement of Cloud-data Filtering Method Using Spectrum of AERI (AERI 스펙트럼 분석을 통한 구름에 영향을 받은 스펙트럼 자료 제거 방법 개선)

  • Cho, Joon-Sik;Goo, Tae-Young;Shin, Jinho
    • Korean Journal of Remote Sensing / v.31 no.2 / pp.137-148 / 2015
  • The National Institute of Meteorological Research (NIMR) has operated a Fourier Transform InfraRed (FTIR) spectrometer, the Atmospheric Emitted Radiance Interferometer (AERI), on Anmyeon island, Korea, since June 2010. Because the ground-based AERI carries a hyper-spectral infrared sensor similar to those on satellites, it could serve as an alternative means of validating satellite-based remote sensing. For this reason, the NIMR has focused on improving the quality of AERI retrievals, particularly the cloud-data filtering method. An AERI spectrum measured on a typical clear day was selected as the reference spectrum, and the atmospheric window region was used for the comparison. We tested a range of thresholds in order to select a valid one. We retrieved methane using both the new method, which uses the reference spectrum, and the previous method, which uses KLAPS cloud cover information, and compared each set of retrievals with ground-based in-situ measurements. The quality of the AERI methane retrievals was significantly better with the new method than with the KLAPS-based method. In addition, the vertical total column of methane from AERI agrees well with that from GOSAT.

Mining Maximal Frequent Contiguous Sequences in Biological Data Sequences (생물학적 데이터 서열들에서 빈번한 최대길이 연속 서열 마이닝)

  • Kang, Tae-Ho;Yoo, Jae-Soo
    • The KIPS Transactions:PartD / v.15D no.2 / pp.155-162 / 2008
  • Biological sequences such as DNA sequences and amino acid sequences typically contain a large number of items. They have contiguous sequences that ordinarily consist of hundreds of frequent items. In biological sequence analysis (BSA), frequent contiguous sequence search is one of the most important operations. Many studies have addressed the efficient mining of sequential patterns. Most of the existing methods for mining sequential patterns are based on the Apriori algorithm. In particular, the prefixSpan algorithm is one of the most efficient sequential pattern mining schemes based on the Apriori algorithm. However, since the algorithm expands sequential patterns from frequent patterns of length 1, it is not suitable for biological datasets with long frequent contiguous sequences. In recent years, the MacosVSpan algorithm was proposed, based on the idea of the prefixSpan algorithm, to significantly reduce its recursive process. However, the algorithm is still inefficient for mining frequent contiguous sequences from long biological data sequences. In this paper, we propose an efficient method for mining maximal frequent contiguous sequences in large biological data sequences by constructing a spanning tree with a fixed length. To verify the superiority of the proposed method, we perform experiments in various environments. The experiments show that the proposed method is much more efficient than MacosVSpan in terms of retrieval performance.
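
To make the problem statement concrete, the sketch below mines maximal frequent contiguous sequences by brute force. It is a naive baseline for illustration, not the fixed-length spanning-tree method the paper proposes and not MacosVSpan. The function names and the definition of support used here (the number of input sequences that contain the substring) are assumptions.

```python
from collections import defaultdict


def frequent_contiguous(seqs, min_support):
    """Map each contiguous subsequence with support >= min_support to its support.

    Support is counted as the number of input sequences containing the
    subsequence (an assumption for this sketch). Lengths are explored level
    by level and the search stops at the first length with no frequent item,
    which is safe because every substring of a frequent string is frequent.
    """
    frequent = {}
    length = 1
    while True:
        owners = defaultdict(set)
        for idx, s in enumerate(seqs):
            for i in range(len(s) - length + 1):
                owners[s[i:i + length]].add(idx)
        level = {sub: len(ids) for sub, ids in owners.items()
                 if len(ids) >= min_support}
        if not level:
            return frequent
        frequent.update(level)
        length += 1


def maximal_frequent_contiguous(seqs, min_support):
    """Return the frequent contiguous subsequences with no frequent extension."""
    frequent = frequent_contiguous(seqs, min_support)
    covered = set()
    for sub in frequent:
        if len(sub) > 1:
            # Dropping the first or last symbol gives a shorter frequent
            # string that is therefore not maximal.
            covered.add(sub[1:])
            covered.add(sub[:-1])
    return sorted(s for s in frequent if s not in covered)
```

With the inputs `["ACGTAC", "TTACGT", "GGACGA"]` and `min_support=3`, the only maximal frequent contiguous sequence is `ACG`. The brute-force enumeration is quadratic in sequence length per level, which is exactly the cost that specialized methods aim to avoid on long biological sequences.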