• Title/Summary/Keyword: 왜곡 보정 (distortion correction)


Multi-Vector Document Embedding Using Semantic Decomposition of Complex Documents (복합 문서의 의미적 분해를 통한 다중 벡터 문서 임베딩 방법론)

  • Park, Jongin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.19-41 / 2019
  • With the rapidly increasing demand for text data analysis, research and investment in text mining are being actively conducted not only in academia but also in various industries. Text mining is generally conducted in two steps. In the first step, the text of the collected documents is tokenized and structured to convert the original documents into a computer-readable form. In the second step, tasks such as document classification, clustering, and topic modeling are conducted according to the purpose of analysis. Until recently, text mining research has focused on the second step. However, with the discovery that the text structuring process substantially influences the quality of the analysis results, various embedding methods have been actively studied to preserve the meaning of words and documents when representing text data as vectors. Unlike structured data, which can be directly applied to a variety of operations and traditional analysis techniques, unstructured text must first be structured into a form the computer can understand. Mapping arbitrary objects into a space of a given dimension while maintaining their algebraic properties is called "embedding." Recently, attempts have been made to embed not only words but also sentences, paragraphs, and entire documents. In particular, as the demand for document embedding increases rapidly, many algorithms have been developed to support it. Among them, doc2Vec, which extends word2Vec and embeds each document into one vector, is the most widely used. However, traditional document embedding methods such as doc2Vec generate a vector for each document using all the words in the document.
This has two limitations: the document vector is affected not only by core words but also by miscellaneous words, and each document is mapped to a single vector, so a complex document covering multiple subjects cannot be represented accurately. In this paper, we propose a new multi-vector document embedding method to overcome these limitations of the traditional document embedding methods. This study targets documents that explicitly separate body content and keywords; for a document without keywords, the method can be applied after extracting keywords through various analysis methods, but since keyword extraction is not the core subject of the proposed method, we describe the process for documents with predefined keywords. The proposed method consists of (1) Parsing, (2) Word Embedding, (3) Keyword Vector Extraction, (4) Keyword Clustering, and (5) Multiple-Vector Generation. Specifically, all text in a document is tokenized and each token is represented as an N-dimensional real-valued vector through word embedding. Then, to avoid the influence of miscellaneous words, the vectors corresponding to the keywords of each document are extracted and form a set of keyword vectors for that document. Next, clustering is conducted on each document's keyword set to identify the multiple subjects included in the document. Finally, one vector is generated from the keyword vectors constituting each cluster. Experiments on 3,147 academic papers revealed that the single-vector traditional approach cannot properly map complex documents because of interference among subjects in each vector.
With the proposed multi-vector method, we confirmed that complex documents can be vectorized more accurately by eliminating the interference among subjects.
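
The five-step pipeline above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the pretrained `embeddings` dictionary, the tiny k-means routine, and the fixed cluster count are all assumptions of the example.

```python
import numpy as np
from typing import Dict, List

def multi_vector_embedding(keywords: List[str],
                           embeddings: Dict[str, np.ndarray],
                           n_clusters: int = 2,
                           n_iter: int = 20,
                           seed: int = 0) -> np.ndarray:
    """Return one vector per identified subject cluster of a document's keywords."""
    # (3) Keyword Vector Extraction: keep only keywords we can embed.
    vecs = np.array([embeddings[w] for w in keywords if w in embeddings])
    # (4) Keyword Clustering: a tiny k-means on the keyword vectors.
    rng = np.random.default_rng(seed)
    centers = vecs[rng.choice(len(vecs), n_clusters, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(((vecs[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([vecs[labels == k].mean(axis=0)
                            if np.any(labels == k) else centers[k]
                            for k in range(n_clusters)])
    # (5) Multiple-Vector Generation: one mean vector per cluster.
    return centers
```

With keywords from two subjects (e.g. hydrology and neural networks), each returned vector sits at the centroid of one subject, instead of a single vector averaging both.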

Analysis of National Stream Drying Phenomena using DrySAT-WFT Model: Focusing on Inflow of Dam and Weir Watersheds in 5 River Basins (DrySAT-WFT 모형을 활용한 전국 하천건천화 분석: 전국 5대강 댐·보 유역의 유입량을 중심으로)

  • LEE, Yong-Gwan;JUNG, Chung-Gil;KIM, Won-Jin;KIM, Seong-Joon
    • Journal of the Korean Association of Geographic Information Studies / v.23 no.2 / pp.53-69 / 2020
  • The increase of impermeable area due to industrialization and urban development distorts the hydrological circulation system and causes serious stream drying phenomena. To manage this, it is necessary to develop impact assessment technology for stream drying that enables quantitative evaluation and prediction. In this study, the causes of streamflow reduction were assessed for dam and weir watersheds in the five major river basins of South Korea using the distributed hydrological model DrySAT-WFT (Drying Stream Assessment Tool and Water Flow Tracking) and GIS time series data. For the modeling, 5 influencing factors of stream drying (soil erosion, forest growth, road-river disconnection, groundwater use, and urban development) were selected and prepared as GIS-based time series spatial data from 1976 to 2015. The DrySAT-WFT was calibrated and validated from 2005 to 2015 at 8 multipurpose dam watersheds (Chungju, Soyang, Andong, Imha, Hapcheon, Seomjin river, Juam, and Yongdam) and 4 gauging stations (Osucheon, Mihocheon, Maruek, and Chogang), respectively. The calibration results showed a coefficient of determination (R2) of 0.76 on average (0.66 to 0.84) and a Nash-Sutcliffe model efficiency of 0.62 on average (0.52 to 0.72). Holding the 2010s (2006~2015) weather condition fixed for the whole period, the streamflow impact was estimated by applying the GIS data of each decade (1980s: 1976~1985, 1990s: 1986~1995, 2000s: 1996~2005, 2010s: 2006~2015). The results showed that, compared to the 1980s, the 2010s average wet streamflow (Q95) decreased by 4.1~6.3%, the average normal streamflow (Q185) by 6.7~9.1%, and the average drought streamflow (Q355) by 8.4~10.4%. During 1975~2015, the increase of groundwater use accounted for 40.5% of the contribution among the 5 influencing factors, followed by forest growth with 29.0%.
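
The wet, normal, and drought flows (Q95, Q185, Q355) reported above are flow-duration values: the discharge equalled or exceeded on 95, 185, and 355 days of the year. A minimal sketch of that computation; the synthetic series in the test is illustrative, not the study's records.

```python
import numpy as np

def flow_duration(daily_flow, ranks=(95, 185, 355)):
    """Return {n: flow equalled or exceeded on n days} from one year of daily flows."""
    ordered = np.sort(np.asarray(daily_flow, dtype=float))[::-1]  # descending
    return {n: float(ordered[n - 1]) for n in ranks}              # n-th largest value
```

Comparing these values between decades, as in the study, quantifies how much the wet, normal, and drought portions of the flow regime have declined.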

The Effectiveness of CT and MRI Contrast Agent for SUV in 18F-FDG PET/CT Scanning (18F-FDG PET/CT 검사에서 정량분석에 관한 CT와 MRI 조영제의 효과)

  • Cha, Sangyoung;Cho, Yonggwi;Lee, Yongki;Song, Jongnam;Choi, Namgil
    • Journal of the Korean Society of Radiology / v.10 no.4 / pp.255-261 / 2016
  • In this study, among the various factors that influence SUV, we compared and analyzed the change of SUV using commercially available CT (4 types) and MRI (3 types) contrast agents. We used a Discovery 690 PET/CT (GE) and a NEMA NU2-1994 PET phantom as experimental equipment. The study was conducted as follows: first, we filled the phantom about two-thirds with distilled water and injected the radioisotope (18F-FDG, 37 MBq) and contrast agent. Second, we mixed each CT contrast agent and each MRI contrast agent with the distilled water separately, stirred the fluid, and then filled the phantom completely with distilled water so as not to create air bubbles. In the emission scan, the scanning time was 15 minutes, starting 40 minutes after mixing the contrast agent with the distilled water. For the transmission scan, we used CT scanning with tube voltage 120 kVp, tube current 40 mA, rotation time 0.5 sec, slice thickness 3.27 mm, and DFOV 30 cm. For analysis, we set ROIs on the 10th, 15th, 20th, 25th, and 30th slices and measured SUVmean and SUVmax. Consequently, all images with the 3 types of MRI contrast agent mixed into the distilled water showed higher SUVmean than the pure FDG image, but without statistical significance; in SUVmax, the values were higher with statistical significance. The 4 images with the CT contrast agents showed significant differences in both SUVmean and SUVmax. Attenuation correction in PET/CT has been executed through various methods to produce high quality images, but we found that using CT and MRI contrast agents before PET/CT scanning can distort the image and decrease its diagnostic value. For that reason, the order of examinations in the hospital should be arranged so that one examination does not disturb the results of another; through this, superior medical service can be provided to patients.
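
The SUV compared above normalizes the measured activity concentration by the injected dose per body weight. A minimal sketch of the quantity; the unit conversion assumes a tissue density of about 1 g/mL, and the numbers in the test are illustrative, not the study's measurements.

```python
def suv(activity_conc_kbq_ml: float, injected_dose_mbq: float,
        body_weight_kg: float) -> float:
    """Standardized Uptake Value: tissue concentration / (injected dose / body weight).

    Assumes tissue density ~1 g/mL, so the ratio is dimensionless:
    kBq/mL -> Bq/g, MBq -> Bq, kg -> g.
    """
    conc_bq_per_g = activity_conc_kbq_ml * 1000.0
    dose_per_weight = injected_dose_mbq * 1e6 / (body_weight_kg * 1000.0)
    return conc_bq_per_g / dose_per_weight
```

An SUV of 1.0 means the region holds exactly the uniformly distributed activity; contrast agents that perturb the CT-based attenuation correction shift this ratio, which is what the study quantifies.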

Study on data preprocessing methods for considering snow accumulation and snow melt in dam inflow prediction using machine learning & deep learning models (머신러닝&딥러닝 모델을 활용한 댐 일유입량 예측시 융적설을 고려하기 위한 데이터 전처리에 대한 방법 연구)

  • Jo, Youngsik;Jung, Kwansue
    • Journal of Korea Water Resources Association / v.57 no.1 / pp.35-44 / 2024
  • Research in dam inflow prediction has actively explored data-driven machine learning and deep learning (ML&DL) tools across diverse domains. For precise dam inflow prediction, it is crucial not only to improve inherent model performance but also to account for model characteristics and to preprocess the data. In particular, in dam basins influenced by snow accumulation, such as the Soyang Dam basin, rain gauges equipped with heating facilities record snowfall as rainfall, which distorts the correlation between snow accumulation and rainfall. This study focuses on the preprocessing of rainfall data essential for applying ML&DL models to dam inflow prediction in basins affected by snow accumulation. This is vital to address physically driven phenomena such as reduced outflow during winter, when precipitation is stored as snow, and increased outflow during spring melt despite minimal or no rain. Three machine learning models (SVM, RF, LGBM) and two deep learning models (LSTM, TCN) were built by combining rainfall and inflow series. With optimal hyperparameter tuning, the appropriate model was selected, resulting in high predictive performance with NSE ranging from 0.842 to 0.894. Moreover, to generate rainfall correction data considering snow accumulation, a simulated snow accumulation algorithm was developed. Applying this correction to the machine learning and deep learning models yielded NSE values ranging from 0.841 to 0.896, a similarly high level of predictive performance. Notably, during the snow accumulation period, adjusting rainfall in the training phase led to a more accurate simulation of the observed inflow. This underscores the importance of thoughtful data preprocessing, taking physical factors such as snowfall and snowmelt into account, in constructing data models.
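
A minimal sketch of the kind of simulated snow-accumulation correction described above: precipitation falling at or below a threshold temperature is stored as snowpack and released later by degree-day melt. The threshold and melt factor here are illustrative assumptions, not the study's calibrated values.

```python
def correct_rainfall(precip_mm, temp_c, t_snow=0.0, melt_factor=2.0):
    """Return the effective daily water input (rain + snowmelt) in mm.

    t_snow: temperature at/below which precipitation is treated as snow (deg C).
    melt_factor: degree-day melt rate (mm per deg C per day).
    """
    snowpack = 0.0
    effective = []
    for p, t in zip(precip_mm, temp_c):
        if t <= t_snow:
            snowpack += p          # store precipitation as snow; no input today
            rain = 0.0
        else:
            rain = p
            melt = min(snowpack, melt_factor * (t - t_snow))  # degree-day melt
            snowpack -= melt
            rain += melt           # melt joins the day's water input
        effective.append(rain)
    return effective
```

Feeding this corrected series to the ML&DL models, instead of the raw gauge record, restores the winter storage and spring release pattern that the heated gauges erase.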

Evaluation of the reconstruction of image acquired from CT simulator to reduce metal artifact (Metal artifact 감소를 위한 CT simulator 영상 재구성의 유용성 평가)

  • Choi, Ji Hun;Park, Jin Hong;Choi, Byung Don;Won, Hui Su;Chang, Nam Jun;Goo, Jang Hyun;Hong, Joo Wan
    • The Journal of Korean Society for Radiation Therapy / v.26 no.2 / pp.191-197 / 2014
  • Purpose: This study assesses the usefulness of metal artifact reduction for orthopedic implants (O-MAR) in decreasing metal artifacts from high-density materials when acquiring CT images. Materials and Methods: With a CT simulator, original CT images were acquired from Gammex and Rando phantoms; the phantoms with high-density materials inserted were then scanned to obtain CT images with metal artifacts, and O-MAR was applied to those images. To evaluate the CT images of the Gammex phantom, 5 regions of interest (ROIs) were placed at 5 organs and 3 ROIs were set at points affected by artifacts. The averages of the standard deviation (SD) and CT numbers were compared with those of the original image. To assess dose variations in tissue around the high-density materials, cylindrical volumes were placed at 3 locations in the Rando phantom images using Eclipse. A treatment plan was created with 6 MV, 7 fields, a 15×15 cm² field, and 100 cGy per fraction, and the mean doses were compared with a plan using the original image. Results: In the test with the Gammex phantom, CT numbers showed little difference at the established points, and the 3 points affected by artifacts in particular had nearly identical values. In the O-MAR images, SD was lower at all 8 points than in the non-O-MAR images. In the test using the Rando phantom, the dose to tissue around the high-density materials differed little between the original CT image and the CT image with O-MAR. Conclusion: The CT images using O-MAR clearly delineated the boundary of tissue around high-density materials, and applying O-MAR was useful for correcting CT numbers.

ALGORITHMS FOR MOVING OBJECT DETECTION: YSTAR-NEOPAT SURVEY PROGRAM (이동천체 후보 검출을 위한 알고리즘 개발: YSTAR-NEOPAT 탐사프로그램)

  • Bae, Young-Ho;Byun, Yong-Ik;Kang, Yong-Woo;Park, Sun-Youp;Oh, Se-Heon;Yu, Seoung-Yeol;Han, Won-Young;Yim, Hong-Suh;Moon, Hong-Kyu
    • Journal of Astronomy and Space Sciences / v.22 no.4 / pp.393-408 / 2005
  • We developed and compared two automatic algorithms for moving object detection in the YSTAR-NEOPAT sky survey program. One method, called the starlist comparison method, identifies moving object candidates by comparing the photometry data tables from successive images. The other method, called the image subtraction method, identifies candidates by subtracting one image from another, which isolates sources moving against the background stars. The efficiency and accuracy of these algorithms were tested using actual survey data from the YSTAR-NEOPAT telescope system. For the detected candidates, we performed eyeball inspection of animated images to confirm the validity of the asteroid detections. The main conclusions are as follows. First, the optical distortion in the YSTAR-NEOPAT wide-field images can be properly corrected by comparison with the USNO-B1.0 catalog, and the astrometric accuracy can be kept at around 1.5 arcsec. Second, image subtraction provides more robust and accurate detection of moving objects. For two different thresholds of 2.0 and 4.0σ, the image subtraction method uncovered 34 and 12 candidates, and most of them were confirmed to be real. The starlist comparison method detected many more candidates, 60 and 6 for each threshold level, but nearly half of them turned out to be false detections.
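
The image subtraction idea above can be sketched as follows: static stars cancel in the difference of two aligned frames, and a moving source leaves a positive residual at its new position, thresholded in units of the difference image's standard deviation (as in the 2.0 and 4.0 sigma levels quoted). The assumption that the frames are already aligned, and the synthetic frames in the test, are illustrative.

```python
import numpy as np

def moving_candidates(frame1, frame2, k_sigma=4.0):
    """Return (row, col) pixel positions of bright residuals in frame2 - frame1."""
    diff = frame2.astype(float) - frame1.astype(float)  # static sources cancel
    sigma = diff.std()                                  # noise scale of the difference
    return np.argwhere(diff > k_sigma * sigma)          # positive residuals only
```

A source that moved between exposures appears once as a negative residual (old position) and once as a positive one (new position); keeping only positive residuals flags where it arrived.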

Development of a Small Gamma Camera Using NaI(Tl)-Position Sensitive Photomultiplier Tube for Breast Imaging (NaI (Tl) 섬광결정과 위치민감형 광전자증배관을 이용한 유방암 진단용 소형 감마카메라 개발)

  • Kim, Jong-Ho;Choi, Yong;Kwon, Hong-Seong;Kim, Hee-Joung;Kim, Sang-Eun;Choe, Yearn-Seong;Lee, Kyung-Han;Kim, Moon-Hae;Joo, Koan-Sik;Kim, Byung-Tae
    • The Korean Journal of Nuclear Medicine / v.32 no.4 / pp.365-373 / 1998
  • Purpose: The conventional gamma camera is not ideal for scintimammography because of its large detector size (~500 mm in width), causing high cost and low image quality. We are developing a small gamma camera dedicated to breast imaging. Materials and Methods: The small gamma camera system consists of a NaI(Tl) crystal (60 mm × 60 mm × 6 mm) coupled with a Hamamatsu R3941 position sensitive photomultiplier tube (PSPMT), a resistor chain circuit, preamplifiers, nuclear instrument modules, an analog-to-digital converter, and a personal computer for control and display. The PSPMT was read out using a standard resistive charge division which multiplexes the 34 cross-wire anode channels into 4 signals (X+, X-, Y+, Y-). Those signals were individually amplified by four preamplifiers and then shaped and amplified by amplifiers. The signals were discriminated and digitized via a triggering signal and used to localize the position of an event by applying Anger logic. Results: The intrinsic sensitivity of the system was approximately 8,000 counts/sec/µCi. High quality flood and hole mask images were obtained. A breast phantom containing 2~7 mm diameter spheres was successfully imaged with a parallel hole collimator. The image displayed accurate size and activity distribution over the imaging field of view. Conclusion: We have successfully developed a small gamma camera using a NaI(Tl)-PSPMT detector and nuclear instrument modules. The small gamma camera developed in this study might improve the diagnostic accuracy of scintimammography by optimally imaging the breast.
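
The Anger logic mentioned above localizes an event from the four multiplexed signals as the ratio of the difference to the sum of opposing channels; a minimal sketch in normalized coordinates (the scaling to physical crystal coordinates is omitted):

```python
def anger_position(x_plus, x_minus, y_plus, y_minus):
    """Return normalized (x, y) in [-1, 1] from the four resistive-readout signals.

    The charge divides between opposing ends of the resistor chain in
    proportion to where the scintillation light lands, so the difference
    over the sum recovers the event position independent of total energy.
    """
    x = (x_plus - x_minus) / (x_plus + x_minus)
    y = (y_plus - y_minus) / (y_plus + y_minus)
    return x, y
```

Because both axes divide by the summed charge, the computed position is insensitive to the deposited energy of the individual gamma event.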


A Diagnostic Analysis on the Conservation Status for the Maintenance of the Front Wall of Jungjeongdang Area of Dodong-Seowon (도동서원 중정당 전면 담장의 보수를 위한 진단학적 보존 상태 분석)

  • Kim, Kyu-Yeon
    • Journal of the Korean Institute of Traditional Landscape Architecture / v.37 no.1 / pp.1-11 / 2019
  • This study analyzed the conservation status of the front wall of the Jungjeongdang area of Dodong-Seowon using diagnostic methodology. The study proceeded as photogrammetry and mapping, investigation of materials and conservation status, and analysis and evaluation of conservation status. The results are as follows. First, for the photogrammetry, each photograph was taken with overlap, the distortions of the photographs were corrected, and the photographs were synthesized; based on this, actual survey drawings of the wall were prepared. Second, regarding materials and conservation status, the wall is in the form of Wapyeondam; the materials of the head part are tile, mud, and lime, and those of the body part are mud and tile. The mud was mixed with gravel, sand, and straw. At the base, amorphous natural stones and mud were used. The most notable damage to the wall is erosion of the base, and some disintegration appears in the body. Biological patina is present on the head and the base, and vegetation such as lichen is concentrated on part of the body. There were superficial deposits on the head, and some tiles were broken or lost. Deep fissures are concentrated in parts of the eastern wall. Third, in the analysis and evaluation of the conservation status, the erosion of the base and the disintegration of the body suggest that physical damage to the wall will continue, so immediate action is necessary. The distribution of biological patina and vegetation does not appear to cause great problems for the wall, but it should be reduced for aesthetic reasons. Cracked or missing tiles need to be replaced, and the deep cracks in the eastern wall appear to have been caused by subsidence, so reinforcement of the foundation is necessary to prevent further damage.

Simulation of the Ocean Circulation Around Ulleungdo and Dokdo Using a Numerical Model of High-Resolution Nested Grid (초고해상도 둥지격자 수치모델을 이용한 울릉도-독도 해역 해양순환 모의)

  • Kim, Daehyuk;Shin, Hong-Ryeol;Choi, Min-bum;Choi, Young-Jin;Choi, Byoung-Ju;Seo, Gwang-Ho;Kwon, Seok-Jae;Kang, Boonsoon
    • Journal of Korean Society of Coastal and Ocean Engineers / v.32 no.6 / pp.587-601 / 2020
  • The ocean circulation in the East Sea and the Ulleungdo-Dokdo region was simulated using the ROMS (Regional Ocean Modeling System) model. Adopting the East Sea 3 km model and the HYCOM 9 km data, an Ulleungdo 1 km model and an Ulleungdo-Dokdo 300 m model were constructed with a one-way grid nesting method. During model development, a correction method was proposed for the distortion of the open boundary data that may be caused by differences in bathymetry between the mother and child models and by the interpolation/extrapolation method. Using this model, super-high resolution ocean circulation with a horizontal resolution of 300 m near the Ulleungdo and Dokdo region was simulated for the year 2018. Despite applying the same conditions except for the initial and boundary data, the numerical model results showed significantly different characteristics in the study area. These results were therefore compared and verified using surface current data estimated from satellite altimeter data and temperature data from NIFS (National Institute of Fisheries Science). They suggest that, in general, the improvement from one-way grid nesting with the HYCOM data in RMSE, mean bias, pattern correlation, and vector correlation is greater in the 300 m model than in the 1 km model. However, when nesting from the East Sea 3 km model, the simulations of the 1 km model were better than those of the 300 m model. The models better resolved distinct ridge/trough structures of the isotherms in vertical sections of water temperature at the higher horizontal resolution. Furthermore, a Karman vortex street, not present in the Ulleungdo 1 km model, was simulated in the Ulleungdo-Dokdo 300 m model due to the terrain effect of the islands.
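
The verification statistics named above (RMSE, mean bias, and pattern correlation) can be sketched as follows; the sample arrays in the test are illustrative, not the model or observation fields:

```python
import numpy as np

def verify(model, obs):
    """Return (RMSE, mean bias, pattern correlation) of two matching fields."""
    m = np.asarray(model, dtype=float).ravel()
    o = np.asarray(obs, dtype=float).ravel()
    rmse = float(np.sqrt(np.mean((m - o) ** 2)))   # typical error magnitude
    bias = float(np.mean(m - o))                   # systematic over/underestimate
    corr = float(np.corrcoef(m, o)[0, 1])          # Pearson correlation of the patterns
    return rmse, bias, corr
```

RMSE and bias measure amplitude errors while the correlation measures whether the spatial pattern is right, which is why the study reports them side by side for each nesting configuration.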

Analysis of Respiratory Motional Effect on the Cone-beam CT Image (Cone-beam CT 영상 획득 시 호흡에 의한 영향 분석)

  • Song, Ju-Young;Nah, Byung-Sik;Chung, Woong-Ki;Ahn, Sung-Ja;Nam, Taek-Keun;Yoon, Mi-Sun
    • Progress in Medical Physics / v.18 no.2 / pp.81-86 / 2007
  • The cone-beam CT (CBCT) acquired using the on-board imager (OBI) attached to a linear accelerator is widely used for image guided radiation therapy. In this study, the effect of respiratory motion on the quality of the CBCT image was evaluated. A phantom system was constructed to simulate respiratory motion. One part of the system is a moving plate with a motor-driving component which can control the motion cycle and motion range. The other part is a solid water phantom containing a small cubic phantom (2×2×2 cm³) surrounded by air, which simulates a small tumor volume in a lung air cavity. CBCT images of the phantom were acquired in 20 different cases and compared with the image in the static state. The 20 cases combine 4 motion ranges (0.7 cm, 1.6 cm, 2.4 cm, 3.1 cm) and 5 motion cycles (2, 3, 4, 5, 6 sec). The difference in CT number in the coronal image was evaluated as the degree of image degradation. The relative average pixel intensity values, compared with the CT numbers of the static CBCT image, were 71.07% at 0.7 cm motion range, 48.88% at 1.6 cm, 30.60% at 2.4 cm, and 17.38% at 3.1 cm. The tumor phantom sizes, defined as the length over which CT numbers differed from air, increased with the motion range (2.1 cm: no motion, 2.66 cm: 0.7 cm motion, 3.06 cm: 1.6 cm motion, 3.62 cm: 2.4 cm motion, 4.04 cm: 3.1 cm motion). This study shows that respiratory motion in a region of inhomogeneous structures can degrade the image quality of CBCT, and it must be considered in the process of setup error correction using CBCT images.
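
The image-quality measure used above, the average pixel intensity of the moving-phantom CBCT relative to the static scan, can be sketched as follows; the ROI arrays in the test are illustrative, not the study's images:

```python
import numpy as np

def relative_intensity(moving_roi, static_roi):
    """Mean pixel value in the moving-case ROI as a percentage of the static case.

    Motion smears the small high-contrast phantom over more voxels, so the
    mean CT number inside the original ROI drops as the motion range grows.
    """
    return 100.0 * float(np.mean(moving_roi)) / float(np.mean(static_roi))
```

Tracking this percentage across motion ranges reproduces the kind of monotonic degradation reported above (71.07% down to 17.38%).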
