Search | Korea Science

Data Processing of AutoML-based Classification Models for Improving Performance in Unbalanced Classes (불균형 클래스에서 AutoML 기반 분류 모델의 성능 향상을 위한 데이터 처리)

Lee, Dong-Joon;Kang, Ji-Soo;Chung, Kyungyong
- Journal of Convergence for Information Technology
- v.11 no.6
- pp.49-54
- 2021
With the recent development of smart healthcare technology, interest in everyday diseases is increasing. However, healthcare data is imbalanced between positive and negative cases. This imbalance arises because, for any given disease, people without the disease far outnumber patients, which makes positive data difficult to collect. Such imbalances must be corrected because they degrade the performance of models trained for disease prediction and analysis. Therefore, for a detection model that determines whether a subject has a disease, this paper replaces missing values through multiple imputation and resolves the class imbalance through over-sampling. Using the preprocessed data, we generate several models with AutoML and combine the top three into an ensemble model.
https://doi.org/10.22156/CS4SMB.2021.11.06.049
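The abstract does not say which over-sampling technique or ensemble rule was used. As a minimal sketch, assuming plain random over-sampling (duplicating rows of the smaller class) and a majority vote over the three models with the best validation scores, the last two steps could look like this. The names `random_oversample` and `top_k_vote` are hypothetical, and the multiple-imputation and AutoML steps are not shown.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]
# (validation score, predict function) for one trained model
ScoredModel = Tuple[float, Callable[[Row], int]]


def random_oversample(X: List[Row], y: List[int], seed: int = 0) -> Tuple[List[Row], List[int]]:
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest one (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        members = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(members)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


def top_k_vote(models: List[ScoredModel], x: Row, k: int = 3) -> int:
    """Majority vote among the k models with the highest validation score."""
    best = sorted(models, key=lambda m: m[0], reverse=True)[:k]
    votes = Counter(predict(x) for _, predict in best)
    return votes.most_common(1)[0][0]
```

Over-sampling should be applied to the training split only, so that duplicated rows do not leak into the validation data used to rank the models.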

A Study on the 3D Measurement Data Application: The Detailed Restoration Modeling of Mireuksajiseoktap (미륵사지석탑 정밀복원모형 제작을 중심으로 한 3차원 실측데이터의 활용 연구)

Moon, Seang Hyen
- Korean Journal of Heritage: History & Science
- v.44 no.2
- pp.76-95
- 2011
Mireuksajiseoktap (the stone pagoda at the Mireuksa Temple site) has been dismantled and is now at the restoration design stage. Before the restoration design is finalized and the actual restoration is carried out, several methods, such as building a restoration model and running a 3D simulation, have been requested to make the design more detailed and clear. This thesis proposes a method for building a detailed model to support a better restoration plan, using the widely used techniques of reverse engineering and rapid prototyping. It also describes each stage required to build the model, including 3D measurement, database construction, and 3D simulation. The findings are as follows. First, the interior and exterior of the dismantled pagoda were modeled as a single whole rather than as separate pieces, so that its appearance can be grasped more clearly in virtual form. Second, by providing baseline data for 3D design, the study makes a 3D review of the 2D design possible. Third, the individual characteristics of each member, such as changes in its position, can be understood through comparative analysis and examination of how the members are joined. Finally, from a structural perspective, by identifying the damaged parts and weak points of the pagoda, the study can serve as reference material for structural reinforcement design. In dismantling, repairing, and restoring cultural properties, work that demands careful attention and precision, reinforcement and restoration designs based on 2D drawings are prone to spatial and temporal errors. The larger and more complex the structure, the harder it is to analyze its current condition and to produce a precise design.
A series of preliminary reviews based on measured 3D data can be an effective way to minimize such spatial and temporal errors, by enabling a more precise plan and resolving these difficulties.
https://doi.org/10.22755/kjchs.2011.44.2.76

Application of Data mining for improving and predicting yield in wafer fabrication system (데이터마이닝을 이용한 반도체 FAB공정의 수율개선 및 예측)

백동현;한창희
- Journal of Intelligence and Information Systems
- v.9 no.1
- pp.157-177
- 2003
This paper presents a comprehensive and successful application of data mining methodologies to improve and predict wafer yield in a semiconductor wafer fabrication system. As the wafer fabrication process grows more complex and the volume of collected technical data keeps growing, it is difficult to analyze the causes of yield deterioration effectively by statistical or heuristic approaches. First, this paper applies a clustering method to identify automatically from the data, rather than by visual inspection, the AUF (Area Uniform Failure) phenomenon, in which bad chips occur in a specific area of the wafer. Next, sequential pattern analysis and classification methods are applied to find, respectively, the machines and the parameters that cause low yield. Furthermore, a radial basis function (RBF) method is used to predict the yield of wafers that are still in process. Finally, this paper demonstrates Y2R-PLUS (Yield Rapid Ramp-up, Prediction, analysis & Up Support), an information system developed to analyze and predict wafer yield at a Korean semiconductor manufacturer.
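The abstract names a radial basis function method for predicting the yield of in-process wafers but gives no details. A minimal sketch, assuming an exact-interpolation Gaussian RBF model whose inputs are hypothetical numeric process measurements per wafer and whose target is yield; the class `RBFRegressor`, the width `sigma`, and the input features are illustrative, not the Y2R-PLUS implementation.

```python
import math
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w


class RBFRegressor:
    """Gaussian RBF model with one center per training wafer (hypothetical)."""

    def __init__(self, sigma: float = 1.0) -> None:
        self.sigma = sigma

    def _phi(self, a: Sequence[float], b: Sequence[float]) -> float:
        return math.exp(-sq_dist(a, b) / (2.0 * self.sigma ** 2))

    def fit(self, X: List[Sequence[float]], y: List[float]) -> "RBFRegressor":
        self.centers = [list(x) for x in X]
        gram = [[self._phi(a, c) for c in self.centers] for a in self.centers]
        self.weights = solve(gram, list(y))
        return self

    def predict(self, x: Sequence[float]) -> float:
        return sum(w * self._phi(x, c) for w, c in zip(self.weights, self.centers))
```

With one center per training wafer the model reproduces the training yields exactly; for noisy fab data, an RBF network with fewer centers and least-squares weights is the more usual choice.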

Exploratory Spatial Data Analysis (ESDA) for Age-Specific Migration Characteristics : A Case Study on Daegu Metropolitan City (연령별 인구이동 특성에 대한 탐색적 공간 데이터 분석 (ESDA) : 대구시를 사례로)

Kim, Kam-Young
- Journal of the Korean Association of Regional Geographers
- v.16 no.5
- pp.590-609
- 2010
The purpose of the study is to propose and evaluate Exploratory Spatial Data Analysis (ESDA) methods for examining age-specific population migration characteristics. First, a population migration pyramid, a pyramid-shaped graph of in-migration, out-migration, and net migration by age (or age group), was developed as a tool for exploring age-specific migration propensities and structures. Second, various spatial statistics techniques based on local indicators of spatial association (LISA), such as Local Moran's $I_i$, Getis-Ord $G_i^*$, and AMOEBA, were suggested as ways to detect spatial clusters of age-specific net migration rates. These ESDA techniques were applied to age-specific population migration in Daegu Metropolitan City. The results demonstrated that the proposed ESDA methods can effectively reveal new information and patterns, such as the contribution of age-specific migration propensities to population change in a given region, relationships among different age groups, hot and cold spots of age-specific net migration rates, and similarities between age-specific spatial clusters.
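Local Moran's $I_i$ compares each region's deviation from the mean with the average deviation of its neighbors. A minimal sketch, assuming row-standardized contiguity weights (each neighbor of region $i$ weighted $1/|N(i)|$) and the variance estimate $m_2 = \sum_k z_k^2 / n$; the function name, neighbor lists, and example rates are hypothetical. Positive values suggest high-high or low-low clusters, negative values suggest spatial outliers; significance testing (usually by permutation) is omitted.

```python
from typing import Dict, List, Sequence


def local_morans_i(values: Sequence[float], neighbors: Dict[int, List[int]]) -> List[float]:
    """Local Moran's I_i = (z_i / m2) * sum_j w_ij z_j with row-standardized w."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]          # deviations from the mean
    m2 = sum(d * d for d in z) / n          # second moment of the deviations
    result = []
    for i in range(n):
        nbrs = neighbors.get(i, [])
        # Spatial lag: average deviation of region i's neighbors.
        lag = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        result.append(z[i] / m2 * lag)
    return result
```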

Based on Multiple Reference Stations Ionospheric Anomaly Monitoring Algorithm on Consistency of Local Ionosphere (협역 전리층의 일관성을 이용한 다중 기준국 기반 전리층 이상 현상 감시 기법)

Song, Choongwon;Jang, JinHyeok;Sung, Sangkyung;Lee, Young Jae
- Journal of the Korean Society for Aeronautical & Space Sciences
- v.45 no.7
- pp.550-557
- 2017
Ionospheric delay, which affects the accuracy of GNSS positioning, is caused by free electrons in the ionosphere. The size of this delay varies with solar activity, region, and time. A dual-frequency receiver can effectively eliminate the delay by exploiting the difference in the ionosphere's refractive index between the L1 and L2 frequencies. A single-frequency receiver, however, must rely on limited corrections, such as an ionospheric model in standalone GNSS or pseudorange corrections (PRC) in differential GNSS. These corrections are generally effective under normal conditions, but they may become useless when the total electron content (TEC) increases sharply over a local area. This paper proposes an algorithm that uses multiple reference stations to monitor local ionospheric anomalies. For verification, the algorithm was run on measurement data from an ionospheric storm day (20 November 2003). The algorithm is expected to detect local ionospheric anomalies and improve the reliability of ionospheric corrections for standalone receivers.
https://doi.org/10.5139/JKSAS.2017.45.7.550
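The dual-frequency correction mentioned above rests on the first-order ionospheric group delay $I = 40.3\,\mathrm{TEC}/f^2$ (meters, with TEC in electrons/m² and $f$ in Hz), which differs between L1 and L2. With $\gamma = (f_{L1}/f_{L2})^2$, the L2 delay is $\gamma$ times the L1 delay, so the ionosphere-free pseudorange $(\gamma P_1 - P_2)/(\gamma - 1)$ cancels it. The sketch below is a minimal illustration that ignores clock errors, hardware biases, and noise; it is not the paper's multi-station monitoring algorithm.

```python
F_L1 = 1575.42e6  # GPS L1 carrier frequency, Hz
F_L2 = 1227.60e6  # GPS L2 carrier frequency, Hz
TECU = 1e16       # 1 TEC unit = 1e16 electrons per m^2


def iono_delay_m(tec_tecu: float, freq_hz: float) -> float:
    """First-order ionospheric group delay in meters for a slant TEC in TECU."""
    return 40.3 * tec_tecu * TECU / freq_hz ** 2


def iono_free(p1: float, p2: float) -> float:
    """Ionosphere-free combination of L1 and L2 pseudoranges (meters)."""
    gamma = (F_L1 / F_L2) ** 2
    return (gamma * p1 - p2) / (gamma - 1.0)
```

For a slant TEC of 50 TECU the L1 delay is about 8.1 m. A single-frequency receiver cannot form this combination, which is why it depends on model or PRC corrections.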

Usefulness of Data Mining in Criminal Investigation (데이터 마이닝의 범죄수사 적용 가능성)

Kim, Joon-Woo;Sohn, Joong-Kweon;Lee, Sang-Han
- Journal of Forensic and Investigative Science
- v.1 no.2
- pp.5-19
- 2006
Data mining is an information extraction activity that discovers hidden facts contained in databases. Using a combination of machine learning, statistical analysis, modeling techniques, and database technology, data mining finds patterns and subtle relationships in data and infers rules that allow future results to be predicted. Typical applications include market segmentation, customer profiling, fraud detection, evaluation of retail promotions, and credit risk analysis. Law enforcement agencies handle large volumes of data when investigating crimes, and that volume is growing as computerized data processing develops. We now face the new challenge of discovering knowledge in that data. Data mining can be applied in criminal investigation to identify offenders by analyzing complex relational data structures and free text, such as criminal records and statements. This study aimed to evaluate the possible applications of data mining in practical criminal investigation and its limitations. When data mining is used to identify crime patterns, cases of habitual crimes such as fraud and burglary can be clustered. Neural network modeling, one of the tools of data mining, can be applied to distinguishing a suspect's photograph or handwriting from that of a convicted offender, or to criminal profiling. A case study of actual insurance fraud showed that data mining is useful against organized crime such as gang activity, terrorism, and money laundering. However, the outputs of data mining in criminal investigation should be evaluated with caution, because data mining offers only clues, not conclusions. Legal regulation is needed to prevent abuse by law enforcement agencies and to protect personal privacy and human rights.

Analysis of YouTube Trending Video Dataset by Country and Category (YouTube 인기 급상승 동영상 데이터셋의 국가별-카테고리별 분석)

Jung, Jimin;Kim, Seungjin;Jung, Sungwook;Lee, Dongyun
- Proceedings of the Korean Institute of Information and Communication Sciences Conference
- 2022.05a
- pp.209-211
- 2022
YouTube, a video platform used by millions of people worldwide, offers a trending-videos service that lists videos whose popularity is rising rapidly. Using a public Kaggle dataset, this study aims to understand the characteristics and cultural differences of each country and to demonstrate the usefulness of public datasets. For this purpose, we analyze data from 11 countries, 15 categories, and about 1.1 million trending videos. We use Python to compute the number of videos in each category, the period for which trending videos remain selected, and the ratio of unique videos. In future work, we plan to use machine learning to predict whether a video will be selected as trending and for how long, in order to help diagnose individual videos and to plan channel operations and strategies.
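The three per-country statistics described in the abstract can be sketched with the standard library alone. The record layout `Row` and its field names are assumptions (the actual Kaggle CSV columns may differ); "videos per category" is counted here as distinct video IDs, and the "selection period" is approximated as the number of distinct days a video appears on a country's trending list.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, Iterable, NamedTuple, Tuple


class Row(NamedTuple):
    # Hypothetical record layout for one trending-list entry.
    country: str
    video_id: str
    category: str
    trending_date: date


def category_counts(rows: Iterable[Row]) -> Dict[Tuple[str, str], int]:
    """Distinct videos per (country, category)."""
    seen = defaultdict(set)
    for r in rows:
        seen[(r.country, r.category)].add(r.video_id)
    return {key: len(ids) for key, ids in seen.items()}


def unique_ratio(rows: Iterable[Row]) -> Dict[str, float]:
    """Distinct videos divided by trending-list entries, per country."""
    rows = list(rows)
    total = Counter(r.country for r in rows)
    distinct = defaultdict(set)
    for r in rows:
        distinct[r.country].add(r.video_id)
    return {c: len(distinct[c]) / total[c] for c in total}


def trending_days(rows: Iterable[Row]) -> Dict[Tuple[str, str], int]:
    """Number of distinct days each video spent on a country's trending list."""
    days = defaultdict(set)
    for r in rows:
        days[(r.country, r.video_id)].add(r.trending_date)
    return {key: len(d) for key, d in days.items()}
```

A low unique-video ratio means the same videos stay on the list for many days, so the ratio and the trending-day counts are two views of the same persistence.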

Spatio-Temporal Patterns of a Public Bike Sharing System in Seoul - Focusing on Yeouido District - (서울시 공공자전거 공유시스템(PBSS)의 시공간적 이용 패턴 분석 - 서울시 여의도동을 중심으로 -)

Yun, Seung-yong;Min, Kyung-hun;Ko, Ha-jung
- Journal of the Korean Institute of Landscape Architecture
- v.48 no.1
- pp.1-14
- 2020
Various policies and studies regarding the use of Public Bike Sharing Systems (PBSS) and Programs (PBSP) have been conducted worldwide as the number of such systems and programs has grown. Although everyday PBSS use generates a variety of phenomena and demands, most research and policy in South Korea has focused on commuting. This study aimed to understand the various demands for PBSS by classifying usage patterns and analyzing their features, using 2018 PBSS usage data for the Yeouido district. The rental stations were classified into three types based on weekday and weekend usage rates. Yeouido accounted for 4.3% of all PBSS usage in Seoul, while its rental stations made up 2% of all rental stations in Seoul's urban areas. Rental stations with higher weekday utilization rates showed high utilization in all four seasons and were mainly located in work and residential areas. The other stations showed usage concentrated in spring (April-May) and autumn (September-October), and were located close to park entrances. In addition, at stations with high weekend utilization, rentals and returns were more concentrated at particular stations than at stations with high weekday utilization. Therefore, rather than being operated uniformly, PBSS management and programs should reflect these varied usage demands. The results provide baseline data for effective PBSS operation by monitoring PBSS demand in spatio-temporal terms.
https://doi.org/10.9715/KILA.2020.48.1.001
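The abstract does not state the classification rule for the three station types. A minimal sketch, assuming stations are classified by the ratio of average rentals per weekday to average rentals per weekend day (averaged per day, because a week has five weekdays but only two weekend days), with hypothetical cut-offs of 1.2 and 0.8; `classify_station` and `daily_series` are illustrative names.

```python
from datetime import date, timedelta
from typing import Dict, List


def daily_series(start: date, counts: List[int]) -> Dict[date, int]:
    """Map consecutive days starting at `start` to rental counts (helper)."""
    return {start + timedelta(days=i): n for i, n in enumerate(counts)}


def classify_station(rentals_by_day: Dict[date, int], hi: float = 1.2, lo: float = 0.8) -> str:
    """Label a station 'weekday', 'weekend', or 'mixed' (hypothetical rule)."""
    weekday = [n for d, n in rentals_by_day.items() if d.weekday() < 5]
    weekend = [n for d, n in rentals_by_day.items() if d.weekday() >= 5]
    # Average per day, since a week has 5 weekdays but only 2 weekend days.
    wd = sum(weekday) / len(weekday) if weekday else 0.0
    we = sum(weekend) / len(weekend) if weekend else 0.0
    if we == 0.0:
        return "weekday" if wd > 0 else "mixed"
    ratio = wd / we
    if ratio >= hi:
        return "weekday"
    if ratio <= lo:
        return "weekend"
    return "mixed"
```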

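Analysis of the Relationship between Changes in the Meteorological and Marine Environments of the Incheon Area Due to Global Warming: Focusing on the Abductive Inquiry Method (지구온난화에 따른 인천 지역 기상환경과 해양환경 변화의 관계 분석 : 귀추적 탐구 방법을 중심으로)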

Lee, Hyo-Nyeong;O, Hui-Jin;Lee, In-Ho;Kim, Min-Gi;Lee, Gyeong-Seop;Lee, Jun-Ho;Kim, Yeong-Geun;Jo, Su-Ho
- Korean Earth Science Society: Conference Proceedings (한국지구과학회:학술대회논문집)
- 2010.04a
- p.70
- 2010
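The purpose of this study is to apply strategies associated with the abductive inquiry method to reinterpret and understand various types of measured data related to global warming, drawing on relevant facts, principles, laws, and previous research findings, from an integrated Earth perspective centered on the interactions and influences among the components of the Earth system. After learning about the disciplinary character, recent trends, and nature of Earth science (Earth system science) and the characteristics of its objects of inquiry, as well as the abductive inquiry method, which fits the essential attributes of Earth science well, students carried out research activities to identify the causes and effects of observed phenomena (outcomes) by analyzing meteorological and oceanographic data from Incheon and various other regions. To help them fully understand abductive inquiry, students worked through examples of scientists who use the method and through simulated activities, experiencing the various thinking strategies it uses (e.g., data restructuring, analogy, and conceptual combination). To understand and reinterpret, from an Earth system perspective, the phenomena associated with global warming (including those they investigated) and their effects, students searched for and collected data on the components of the Earth system (e.g., the hydrosphere and the atmosphere). They 1) investigated the Earth system and global warming, 2) confirmed the variability of global warming and climate change, and 3) analyzed previous research findings related to global warming. In addition, while learning and applying the abductive inquiry method, they 1) assessed the current state of global warming and climate change; 2) analyzed monthly and seasonal temperature changes in Incheon and surveyed their trends (exploration: defining the research question); 3) analyzed trends in air and water temperature in Incheon and Sokcho (investigation: identifying causes); 4) analyzed trends in mean sea-level pressure in Sokcho; 5) based on these results, re-examined and confirmed the factors influencing global warming through a literature review and an analysis of previous studies (selection and explanation); 6) analyzed the causes of global warming in Incheon and Sokcho and addressed gaps in that analysis (explanation), for which they 7) supplemented the findings by analyzing data from Busan and Pohang to check whether global warming is more severe in winter (further investigation and explanation); and 8) consulted experts on the analysis results and their interpretation.
The educational outcomes of the study can be summarized as follows. First, it improved students' ability to solve Earth-environment problems using the abductive inquiry method. In addition, learning about the nature of inquiry in Earth science, its recent trends, and the characteristics of its objects of study helped develop the students' basic literacy and aptitude as Earth science learners, and applying social-science research methods to pure-science research improved their problem-solving ability and systems thinking as scientists.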
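The study analyzes monthly and seasonal temperature trends for Incheon. One common way to quantify such a trend is an ordinary least-squares slope of seasonal means against year. The sketch assumes meteorological seasons (December grouped with the following January and February) and reports the slope in degrees per decade; the input layout and function names are hypothetical, and the study itself may have used a different method.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Meteorological seasons (assumption: the study's grouping may differ).
SEASON = {12: "DJF", 1: "DJF", 2: "DJF", 3: "MAM", 4: "MAM", 5: "MAM",
          6: "JJA", 7: "JJA", 8: "JJA", 9: "SON", 10: "SON", 11: "SON"}


def ols_slope(xs: List[float], ys: List[float]) -> float:
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def seasonal_trends(monthly: Dict[Tuple[int, int], float]) -> Dict[str, float]:
    """Trend of seasonal-mean temperature per season, in degrees per decade.

    `monthly` maps (year, month) to a monthly mean temperature."""
    buckets: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for (year, month), temp in monthly.items():
        # December is grouped with the following year's January and February.
        season_year = year + 1 if month == 12 else year
        buckets[(SEASON[month], season_year)].append(temp)
    series: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for (season, yr), temps in buckets.items():
        series[season].append((yr, sum(temps) / len(temps)))
    return {
        season: 10.0 * ols_slope([float(y) for y, _ in pts], [m for _, m in pts])
        for season, pts in series.items()
        if len(pts) >= 2
    }
```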

Determining on Model-based Clusters of Time Series Data (시계열데이터의 모델기반 클러스터 결정)

Jeon, Jin-Ho;Lee, Gye-Sung
- The Journal of the Korea Contents Association
- v.7 no.6
- pp.22-30
- 2007
Most real-world systems, such as the world economy, the stock market, and medical applications, involve a series of dynamic and complex phenomena. One common way to understand such systems is to build a model and analyze the system's behavior. In this paper, we investigate methods for finding the best clustering of time series data. As a first step, a BIC (Bayesian Information Criterion) approximation is used to determine the number of clusters. We also propose a search technique that improves clustering efficiency, based on an analysis of the relationship between data size and BIC values. Two clustering approaches, model-based and similarity-based, are analyzed and compared. Experiments on real data (stock prices) were performed to check validity. The experiments confirmed that the BIC approximation suggests the best number of clusters, provided the data set is relatively large. They also confirmed that model-based clustering produces more reliable clusters than similarity-based clustering.
https://doi.org/10.5392/JKCA.2007.7.6.022
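The BIC-based choice of the number of clusters can be illustrated in one dimension. This is a simplified sketch, not the paper's procedure: it clusters with k-means (a hard-assignment special case of a Gaussian mixture with a shared variance), scores each k with $\mathrm{BIC} = -2\ln L + p\ln n$ using $p = 2k$ free parameters ($k-1$ mixing weights, $k$ means, one variance), and keeps the k with the lowest BIC. The function names are hypothetical, and under this sign convention lower is better.

```python
import math
from typing import List, Sequence


def kmeans_1d(xs: Sequence[float], k: int, iters: int = 100) -> List[List[float]]:
    """Lloyd's k-means on 1-D data; returns the final non-empty clusters."""
    data = sorted(xs)
    n = len(data)
    # Deterministic start: evenly spaced order statistics as initial centers.
    centers = [data[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    for _ in range(iters):
        groups: List[List[float]] = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda c: abs(x - centers[c]))
            groups[nearest].append(x)
        new_centers = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return [g for g in groups if g]


def bic(groups: List[List[float]]) -> float:
    """BIC of a hard-assigned 1-D Gaussian mixture with a shared variance."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    means = [sum(g) / len(g) for g in groups]
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    var = max(sse / n, 1e-12)  # maximum-likelihood shared variance
    loglik = sum(len(g) * math.log(len(g) / n) for g in groups)  # mixing weights
    loglik += -0.5 * n * math.log(2 * math.pi * var) - sse / (2 * var)
    p = (k - 1) + k + 1  # weights, means, one variance
    return -2.0 * loglik + p * math.log(n)


def choose_k(xs: Sequence[float], k_max: int) -> int:
    """Number of clusters with the lowest BIC among 1..k_max."""
    scores = {k: bic(kmeans_1d(xs, k)) for k in range(1, k_max + 1)}
    return min(scores, key=scores.get)
```

The $p\ln n$ penalty is what stops the score from always favoring more clusters; with very few points the penalty and the likelihood estimates are both unreliable, which is consistent with the abstract's caveat that BIC works best when the data set is relatively large.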
