Search | Korea Science

An Efficient Multidimensional Scaling Method based on CUDA and Divide-and-Conquer (CUDA 및 분할-정복 기반의 효율적인 다차원 척도법)

Park, Sung-In;Hwang, Kyu-Baek
- Journal of KIISE: Computing Practices and Letters, v.16 no.4, pp.427-431, 2010
Multidimensional scaling (MDS) is a widely used dimensionality reduction method whose purpose is to represent high-dimensional data in a low-dimensional space while preserving the distances among objects as much as possible. MDS has mainly been applied to data visualization and feature selection. Among the various MDS methods, classical MDS is not readily applicable to data with a large number of objects on ordinary desktop computers because of its computational complexity. More precisely, it must solve eigenpair problems on dissimilarity matrices based on Euclidean distance. Thus, the running time and memory requirements of classical MDS grow rapidly as n (the number of objects) increases, restricting its use in large-scale domains. In this paper, we propose an efficient approximation algorithm for classical MDS based on divide-and-conquer and CUDA. Through a set of experiments, we show that our approach is highly efficient and effective for the analysis and visualization of data consisting of several thousand objects.
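
The eigenpair step that the abstract identifies as the bottleneck can be illustrated with a minimal sketch of textbook classical MDS. This is not the authors' divide-and-conquer CUDA approximation, which the abstract does not detail; the function name `classical_mds` and the random test data are assumptions made for illustration. The dense n x n matrix and the full eigendecomposition are what make memory grow as O(n^2) and time as roughly O(n^3).

```python
import numpy as np

def classical_mds(D, k=2):
    """Embed n objects in k dimensions from an n x n Euclidean distance matrix D.

    Textbook classical MDS (hypothetical helper, not the paper's algorithm).
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    w, V = np.linalg.eigh(B)                  # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]             # indices of the k largest eigenvalues
    w_top = np.clip(w[idx], 0.0, None)        # guard against tiny negative values
    return V[:, idx] * np.sqrt(w_top)         # n x k coordinates

# Illustrative data: 50 random points that already live in 2-D.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

Y = classical_mds(D, k=2)
D_hat = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
max_err = float(np.abs(D - D_hat).max())      # distances are preserved up to rounding
```

Because the test points are two-dimensional, a 2-D embedding reproduces the pairwise distances up to floating-point error; for genuinely high-dimensional data the embedding is only an approximation.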

F_MixBERT: Sentiment Analysis Model using Focal Loss for Imbalanced E-commerce Reviews

Fengqian Pang;Xi Chen;Letong Li;Xin Xu;Zhiqiang Xing
- KSII Transactions on Internet and Information Systems (TIIS), v.18 no.2, pp.263-283, 2024
Users' comments after online shopping are critical to product reputation and business improvement. These comments, sometimes known as e-commerce reviews, influence other customers' purchasing decisions. To cope with the large volume of e-commerce reviews, automatic analysis based on machine learning and deep learning is drawing increasing attention. A core task therein is sentiment analysis. However, e-commerce reviews exhibit the following characteristics: (1) inconsistency between comment content and the star rating; (2) a large amount of unlabeled data, i.e., comments without a star rating; and (3) data imbalance caused by sparse negative comments. This paper employs Bidirectional Encoder Representations from Transformers (BERT), one of the best natural language processing models, as the base model. Based on the above data characteristics, we propose the F_MixBERT framework to use inconsistent low-quality data and unlabeled data more effectively and to resolve the problem of data imbalance. In the framework, the proposed MixBERT incorporates the MixMatch approach into BERT's high-dimensional vectors to train on the unlabeled and low-quality data with generated pseudo labels. Meanwhile, data imbalance is addressed by focal loss, which penalizes the contribution of large-scale data and easily identifiable data to the total loss. Comparative experiments demonstrate that the proposed framework outperforms BERT and MixBERT for sentiment analysis of e-commerce comments.
https://doi.org/10.3837/tiis.2024.02.001
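
How focal loss reduces the weight of easily identified examples can be seen in a small sketch of the standard binary form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). The abstract does not state which variant or hyperparameters F_MixBERT uses, so the binary formulation, the function name `focal_loss`, and the defaults gamma = 2 and alpha = 0.25 are assumptions here.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for a predicted positive-class probability p and label y in {0, 1}.

    Hypothetical helper; the paper's exact variant is not given in the abstract.
    """
    p_t = p if y == 1 else 1.0 - p               # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha   # class-balancing weight
    p_t = min(max(p_t, 1e-12), 1.0)              # avoid log(0)
    # (1 - p_t)^gamma shrinks the loss of confidently correct (easy) examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_loss(0.95, 1)      # confident, correct prediction
hard = focal_loss(0.30, 1)      # poorly classified example
ce_easy = -math.log(0.95)       # plain cross-entropy for the easy example
```

With gamma = 0 and alpha = 1 the expression reduces to ordinary cross-entropy for positive examples, which makes the modulating factor easy to compare against a baseline.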

A Novel Fundus Image Reading Tool for Efficient Generation of a Multi-dimensional Categorical Image Database for Machine Learning Algorithm Training

Park, Sang Jun;Shin, Joo Young;Kim, Sangkeun;Son, Jaemin;Jung, Kyu-Hwan;Park, Kyu Hyung
- Journal of Korean Medical Science, v.33 no.43, pp.239.1-239.12, 2018
Background: We described a novel multi-step retinal fundus image reading system for providing high-quality large-scale data for machine learning algorithms, and assessed grader variability in the large-scale dataset generated with this system. Methods: A 5-step retinal fundus image reading tool was developed that rates image quality, presence of abnormality, findings with location information, diagnoses, and clinical significance. Each image was evaluated by 3 different graders. Agreement among graders for each decision was evaluated. Results: A total of 234,242 readings of 79,458 images were collected from 55 licensed ophthalmologists over 6 months. Of the images, 34,364 were graded as abnormal by at least one rater. Of these, all three raters agreed on abnormality in 46.6%, while 69.9% were rated as abnormal by two or more raters. The agreement rate of at least two raters on a given finding was 26.7%-65.2%, and the complete agreement rate of all three raters was 5.7%-43.3%. As for diagnoses, agreement of at least two raters was 35.6%-65.6%, and the complete agreement rate was 11.0%-40.0%. Agreement on findings and diagnoses was higher when restricted to images with prior complete agreement on abnormality. Retinal and glaucoma specialists showed higher agreement on findings and diagnoses within their corresponding subspecialties. Conclusion: This novel reading tool for retinal fundus images generated a large-scale dataset with a high level of information, which can be utilized in the future development of machine learning-based algorithms for automated identification of abnormal conditions and of clinical decision support systems. These results emphasize the importance of addressing grader variability in algorithm development.
https://doi.org/10.3346/jkms.2018.33.e239
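
The two abnormality agreement figures in the abstract (rated abnormal by two or more raters, and by all three) share one denominator: images flagged by at least one rater. A minimal sketch of that calculation follows; the function name `agreement_rates` and the five toy ratings are hypothetical and are not drawn from the study's data.

```python
def agreement_rates(ratings):
    """Return (two_or_more, all_three) agreement rates on abnormality.

    ratings: list of 3-tuples of booleans, one tuple per image, one flag per grader.
    Both rates are computed over images flagged abnormal by at least one grader.
    """
    flagged = [r for r in ratings if any(r)]
    if not flagged:
        return 0.0, 0.0
    two_or_more = sum(1 for r in flagged if sum(r) >= 2) / len(flagged)
    all_three = sum(1 for r in flagged if sum(r) == 3) / len(flagged)
    return two_or_more, all_three

# Toy ratings (hypothetical): four of the five images are flagged at least once.
toy = [
    (True, True, True),
    (True, True, False),
    (True, False, False),
    (False, False, False),
    (False, True, True),
]
two_or_more, all_three = agreement_rates(toy)
```

On the toy data, three of the four flagged images have two or more abnormal ratings and one has all three, giving rates of 0.75 and 0.25.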

CNN based data anomaly detection using multi-channel imagery for structural health monitoring

Shajihan, Shaik Althaf V.;Wang, Shuo;Zhai, Guanghao;Spencer, Billie F. Jr.
- Smart Structures and Systems, v.29 no.1, pp.181-193, 2022
Data-driven structural health monitoring (SHM) of civil infrastructure can be used to continuously assess the state of a structure, allowing preemptive safety measures to be carried out. Long-term monitoring of large-scale civil infrastructure often involves data-collection using a network of numerous sensors of various types. Malfunctioning sensors in the network are common, which can disrupt the condition assessment and even lead to false-negative indications of damage. The overwhelming size of the data collected renders manual approaches to ensure data quality intractable. The task of detecting and classifying an anomaly in the raw data is non-trivial. We propose an approach to automate this task, improving upon the previously developed technique of image-based pre-processing on one-dimensional (1D) data by enriching the features of the neural network input data with multiple channels. In particular, feature engineering is employed to convert the measured time histories into a 3-channel image comprised of (i) the time history, (ii) the spectrogram, and (iii) the probability density function representation of the signal. To demonstrate this approach, a CNN model is designed and trained on a dataset consisting of acceleration records of sensors installed on a long-span bridge, with the goal of fault detection and classification. The effect of imbalance in anomaly patterns observed is studied to better account for unseen test cases. The proposed framework achieves high overall accuracy and recall even when tested on an unseen dataset that is much larger than the samples used for training, offering a viable solution for implementation on full-scale structures where limited labeled-training data is available.
https://doi.org/10.12989/sss.2022.29.1.181

Proteomic Screening of Antigenic Proteins from the Hard Tick, Haemaphysalis longicornis (Acari: Ixodidae)

Kim, Young-Ha;Islam, Mohammad Saiful;You, Myung-Jo
- Parasites, Hosts and Diseases, v.53 no.1, pp.85-93, 2015
Proteomic tools allow large-scale, high-throughput analyses for the detection, identification, and functional investigation of the proteome. For detection of antigens from Haemaphysalis longicornis, a 1-dimensional electrophoresis (1-DE) quantitative immunoblotting technique combined with 2-dimensional electrophoresis (2-DE) immunoblotting was applied to whole body proteins from unfed and partially fed female ticks. Reactivity bands and 2-DE immunoblotting were performed following 2-DE to identify protein spots. The proteome of the partially fed female had a larger number of lower molecular weight proteins than that of the unfed female tick. The total number of detected spots was 818 for unfed and 670 for partially fed female ticks. The 2-DE immunoblotting identified 10 antigenic spots from unfed females and 8 antigenic spots from partially fed females. Matrix Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry (MALDI-TOF) of relevant spots identified calreticulin, putative secreted WC salivary protein, and a conserved hypothetical protein from the National Center for Biotechnology Information and Swiss-Prot protein sequence databases. These findings indicate that most of the whole body components of these ticks are non-immunogenic. The data reported here will provide guidance in the identification of antigenic proteins to prevent infestation and diseases transmitted by H. longicornis.
https://doi.org/10.3347/kjp.2015.53.1.85

Estimating the Application Possibility of High-resolution Satellite Image for Update and Revision of Digital Map (수치지도의 수정 및 갱신을 위한 고해상도 위성영상의 적용 가능성 평가)

강준묵;이철희;이형석
- Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography, v.20 no.3, pp.313-321, 2002
With the supply of high-resolution satellite images, there is growing interest in updating and revising digital maps and thematic maps based on satellite imagery. This study examined the possibility of updating and revising existing digital maps at scales of 1/5,000 and 1/25,000 using IKONOS satellite imagery. We performed geometric correction of an IKONOS mono-image using ground control points from the existing digital map, and created an ortho-image by extracting a digital elevation model from the three-dimensional contour data and elevations on the existing digital map. We revised changed features by screen digitizing, overlapping the orthorectified satellite image with the existing digital map, and used features in the unchanged area of the satellite image for positional accuracy analysis. As a result, the rectification error was calculated as $\pm$3.35 m (RMSE). There is a good possibility of updating digital maps at scales of 1/10,000 or smaller. Updating large-scale digital maps at scales of 1/5,000 or larger would be possible if stereo images and ground control point surveying were used.

3-D Resistivity Imaging of a Large Scale Tumulus (대형 고분에서의 3차원 전기비저항 탐사)

Oh, Hyun-Dok;Yi, Myeong-Jong;Kim, Jung-Ho;Shin, Jong-Woo
- Geophysics and Geophysical Exploration, v.14 no.4, pp.316-323, 2011
To test the applicability of resistivity survey methods to the archaeological prospection of a large-scale tumulus, a three-dimensional resistivity survey was conducted at the 3rd tumulus at Bokam-ri, in Naju city, South Korea. Since accurate topographic relief of the tumulus and accurate electrode locations are required to obtain a high-resolution image of the subsurface, electrodes were installed after laying out grids with threads, a technique commonly used in archaeological investigation. Data were measured using a 2 m electrode spacing with a line spacing of 1 m, and each survey line was shifted by 1 m to form an effective grid of 1 m $\times$ 1 m. Through 3-D inversion of the data, we obtained a 3-D image of the tumulus in which we could identify the distinct signature of buried tombs made of stone. The results were compared with previous excavation results, and we are convinced that the 3-D resistivity imaging method is very useful for investigating a large-scale tumulus.
https://doi.org/10.7582/GGE.2011.14.4.316

A Study on the Architecture Design of Road and Facility Operation Management System for 3D Spatial Data Processing (3차원 공간데이터 처리를 위한 차로 및 시설물 운영 관리 시스템 아키텍처 설계 연구)

KIM, Duck-Ho;KIM, Sung-Jin;LEE, Jung-Uck
- Journal of the Korean Association of Geographic Information Studies, v.24 no.4, pp.136-147, 2021
Autonomous driving technologies are developing step by step according to the level of driving automation. Operation and management technology for the roads on which autonomous vehicles travel must develop in step with autonomous driving technology. However, road operation management currently relies only on two-dimensional information, which limits the systematic operation, management, and maintenance of lane and facility information. This study proposes an operation management system architecture capable of 3D spatial information-based operation management by designing a convergence database that can process real-time big data together with high-definition road map data. With a high-definition road map based operation management system for lane and facility maintenance, it will be possible to visualize and manage facilities, let multiple users edit and analyze data, link various GIS software, and efficiently process large-scale real-time data.
https://doi.org/10.11108/kagis.2021.24.4.136

MASSIVE STRUCTURES OF GALAXIES AT HIGH REDSHIFTS IN THE GREAT OBSERVATORIES ORIGINS DEEP SURVEY FIELDS

Kang, Eugene;Im, Myungshin
- Journal of The Korean Astronomical Society, v.48 no.1, pp.21-55, 2015
If the Universe is dominated by cold dark matter and dark energy as in the currently popular $\Lambda$CDM cosmology, it is expected that large scale structures form gradually, with galaxy clusters of mass $M \geq 10^{14} M_{\odot}$ appearing at around 6 Gyr after the Big Bang ($z \sim 1$). Here, we report the discovery of 59 massive structures of galaxies with masses greater than a few times $10^{13} M_{\odot}$ at redshifts between $z = 0.6$ and $4.5$ in the Great Observatories Origins Deep Survey fields. The massive structures are identified by running top-hat filters on the two dimensional spatial distribution of magnitude-limited samples of galaxies using a combination of spectroscopic and photometric redshifts. We analyze the Millennium simulation data in a similar way to the analysis of the observational data in order to test the $\Lambda$CDM cosmology. We find that there are too many massive structures ($M > 7 \times 10^{13} M_{\odot}$) observed at $z > 2$ in comparison with the simulation predictions by a factor of a few, giving a probability of < 1/2500 of the observed data being consistent with the simulation. Our result suggests that massive structures have emerged early, but the reason for the discrepancy with the simulation is unclear. It could be due to the limitation of the simulation such as the lack of key, unrecognized ingredients (strong non-Gaussianity or other baryonic physics), or simply a difficulty in the halo mass estimation from observation, or a fundamental problem of the $\Lambda$CDM cosmology. On the other hand, the over-abundance of massive structures at high redshifts does not favor heavy neutrino mass of ~ 0.3 eV or larger, as heavy neutrinos make the discrepancy between the observation and the simulation more pronounced by a factor of 3 or more.
https://doi.org/10.5303/JKAS.2015.48.1.21

Korea Emissions Inventory Processing Using the US EPA's SMOKE System

Kim, Soon-Tae;Moon, Nan-Kyoung;Byun, Dae-Won W.
- Asian Journal of Atmospheric Environment, v.2 no.1, pp.34-46, 2008
Emissions inputs for use in air quality modeling of Korea were generated with the emissions inventory data from the National Institute of Environmental Research (NIER), maintained under the Clean Air Policy Support System (CAPSS) database. Source Classification Codes (SCC) in the Korea emissions inventory were adapted to use with the U.S. EPA's Sparse Matrix Operator Kernel Emissions (SMOKE) by finding the best-matching SMOKE default SCCs for the chemical speciation and temporal allocation. A set of 19 surrogate spatial allocation factors for South Korea were developed utilizing the Multi-scale Integrated Modeling System (MIMS) Spatial Allocator and Korean GIS databases. The mobile and area source emissions data, after temporal allocation, show typical sinusoidal diurnal variations with high peaks during daytime, while point source emissions show weak diurnal variations. The model-ready emissions are speciated for the carbon bond version 4 (CB-4) chemical mechanism. Volatile organic carbon (VOC) emissions from painting related industries in area source category significantly contribute to TOL (Toluene) and XYL (Xylene) emissions. ETH (Ethylene) emissions are largely contributed from point industrial incineration facilities and various mobile sources. On the other hand, a large portion of OLE (Olefin) emissions are speciated from mobile sources in addition to those contributed by the polypropylene industry in point source. It was found that FORM (Formaldehyde) is mostly emitted from petroleum industry and heavy duty diesel vehicles. Chemical speciation of PM2.5 emissions shows that PEC (primary fine elemental carbon) and POA (primary fine organic aerosol) are the most abundant species from diesel and gasoline vehicles. 
To reduce uncertainties in processing the Korea emission inventory due to the mapping of Korean SCCs to those of U.S., it would be practical to develop and use domestic source profiles for the top 10 SCCs for area and point sources and top 5 SCCs for on-road mobile sources when VOC emissions from the sources are more than 90% of the total.
https://doi.org/10.5572/ajae.2008.2.1.034
