Search | Korea Science

Detecting Errors in POS-Tagged Corpus on XGBoost and Cross Validation (XGBoost와 교차검증을 이용한 품사부착말뭉치에서의 오류 탐지)

Choi, Min-Seok;Kim, Chang-Hyun;Park, Ho-Min;Cheon, Min-Ah;Yoon, Ho;Namgoong, Young;Kim, Jae-Kyun;Kim, Jae-Hoon
- KIPS Transactions on Software and Data Engineering / v.9 no.7 / pp.221-228 / 2020
A part-of-speech (POS) tagged corpus is a collection of electronic text in which each word is annotated with its corresponding POS tag, and it is widely used as training data for natural language processing. Such training data is generally assumed to be free of errors, but in reality it contains various types of errors, which degrade the performance of systems trained on it. To alleviate this problem, we propose a novel method for detecting errors in an existing POS-tagged corpus using an XGBoost classifier and cross-validation as an evaluation technique. We first train a POS-tagging classifier on the POS-tagged corpus, which contains some errors, and then apply it to the same corpus using cross-validation; however, the classifier cannot detect errors directly because there is no training data for POS tagging errors. We therefore detect errors by comparing the outputs of the classifier (the POS probabilities) while adjusting hyperparameters. The hyperparameters are estimated from a small-scale error-tagged corpus, whose text is sampled from the POS-tagged corpus and in which POS errors are marked up by experts. We use recall and precision, which are widely used in information retrieval, as evaluation metrics. Because not all detected errors can be checked, we show that the proposed method is valid by comparing the distributions of the sample (the error-tagged corpus) and the population (the POS-tagged corpus). In the near future, we will apply the proposed method to a dependency tree-tagged corpus and a semantic role-tagged corpus.
https://doi.org/10.3745/KTSDE.2020.9.7.221
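
The detection idea in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: a per-word tag frequency model stands in for the XGBoost classifier, and the `margin` threshold and the rule "flag a token when the annotated tag's probability trails the top prediction by more than `margin`" are assumptions, since the abstract does not specify the exact comparison.

```python
from collections import Counter, defaultdict

def train(tokens):
    """Stand-in classifier: per-word tag counts (the paper uses XGBoost)."""
    counts = defaultdict(Counter)
    for word, tag in tokens:
        counts[word][tag] += 1
    return counts

def predict_proba(model, word):
    """Return {tag: probability} for a word, or {} if the word is unseen."""
    tag_counts = model.get(word)
    if not tag_counts:
        return {}
    total = sum(tag_counts.values())
    return {tag: n / total for tag, n in tag_counts.items()}

def detect_errors(corpus, k=3, margin=0.5):
    """Return indices of tokens whose annotated tag looks suspicious.

    Each token is scored by a model trained on the other k-1 folds.
    `margin` is a hypothetical hyperparameter; it would be tuned on a
    small expert-checked sample, as the abstract describes.
    """
    suspects = []
    for fold in range(k):
        held_out = [i for i in range(len(corpus)) if i % k == fold]
        training = [corpus[i] for i in range(len(corpus)) if i % k != fold]
        model = train(training)
        for i in held_out:
            word, tag = corpus[i]
            proba = predict_proba(model, word)
            if not proba:
                continue  # no evidence for words unseen in training
            best = max(proba.values())
            if best - proba.get(tag, 0.0) > margin:
                suspects.append(i)
    return sorted(suspects)

# Toy corpus (made up) with one injected annotation error at index 4.
corpus = [("the", "DET")] * 8 + [("run", "VERB")] * 8
corpus[4] = ("the", "NOUN")
suspects = detect_errors(corpus)
print(suspects)
```

Raising `margin` flags fewer tokens (favoring precision), while lowering it flags more (favoring recall), which is why a small expert-annotated sample is needed to choose it.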

Police Officers' Cognitions of Police Investigation Specialization (수사경과제에 대한 경찰공무원의 인식)

Choi, Mu-Chan
- The Journal of the Korea Contents Association / v.9 no.6 / pp.289-299 / 2009
This study set out to analyze the perceptions of investigative police officers and division police officers regarding Police Investigation Specialization, which had been in effect for four years, to identify its problems, and to search for alternative policies. The results led to the following alternative policies. First, communication among members should be facilitated by regularly exchanging jobs between a certain percentage of investigative officers and division members, integrating job education and special work, and developing diverse support programs for detective activities so that members have opportunities to experience and understand investigation. The second suggestion concerns the investigation members' morale. A range of measures should be taken to boost their morale, such as allocating a separate budget and personnel to support the investigation department and the handling of major criminal and civil cases, giving each investigation team an office and an investigation room to improve the working environment, and readjusting the promotion ratio of Police Investigation Specialization to introduce a promotion system suited to the characteristics of each type of investigation. The third suggestion is to secure job efficiency. It is necessary to reinforce the current short-term specialized education program to train practical and professional investigators, to open the certification exam for professional investigators to all members so that every officer with the required capabilities can have their abilities recognized, and to create a system for removing members who are idle at work by reflecting poor performance records when deciding whom to dismiss from Police Investigation Specialization. Finally, it is important to divide duties rationally.
A rational division of duties can be ensured by setting guidelines on direct case handling by team leaders so that they can devote themselves to their duties, defining objective criteria for measuring investigation workload, and creating dedicated systems and teams for simple and minor incidents so that experienced investigators can deal with high-profile cases.
https://doi.org/10.5392/JKCA.2009.9.6.289

Optimization Model for the Mixing Ratio of Coatings Based on the Design of Experiments Using Big Data Analysis (빅데이터 분석을 활용한 실험계획법 기반의 코팅제 배합비율 최적화 모형)

Noh, Seong Yeo;Kim, Young-Jin
- KIPS Transactions on Computer and Communication Systems / v.3 no.10 / pp.383-392 / 2014
Research on coatings is one of the most active research areas in the polymer industry. Coatings are becoming increasingly important in the electronics, medical, and optical fields. In particular, the technical requirements for the performance and accuracy of coatings are increasing with the development of automotive and electronic parts. In addition, demand for more intelligent and automated systems in the industry is increasing with the introduction of the IoT and big data analysis based on environmental and context information. In this paper, we propose an optimization model for coating formulation data based on the design of experiments, using Internet technologies and big data analytics. The coating formulation is calculated from data analysis based on the experimental design, and it is corrected for errors using the formulation data from the actual production site, the operators' modifications, and the corrected result data. Furthermore, taking the existing coating formulation as reference data, we derive an optimization model that corrects the reference values by leveraging big data analysis and Internet of Things technology, using manufacturing environment and context information to maintain color and quality, the most important factors. The data obtained from experiment and analysis improve the accuracy of the formulation data and make it possible to shorten the working hours per LOT. The data also shorten production time by reducing the delivery time per treatment, and can contribute to cost reduction and a lower defect rate. Further, standard data can be obtained for the manufacturing process of various models.
https://doi.org/10.3745/KTCCS.2014.3.10.383

Incremental Maintenance of Horizontal Views Using a PIVOT Operation and a Differential File in Relational DBMSs (관계형 데이터베이스에서 PIVOT 연산과 차등 파일을 이용한 수평 뷰의 점진적인 관리)

Shin, Sung-Hyun;Kim, Jin-Ho;Moon, Yang-Sae;Kim, Sang-Wook
- The KIPS Transactions:PartD / v.16D no.4 / pp.463-474 / 2009
To analyze multidimensional data conveniently and efficiently, OLAP (On-Line Analytical Processing) systems and e-business applications widely use views in a horizontal form to represent measurement values over multiple dimensions. These views can be stored as materialized views derived from several sources in order to support access to the integrated data. Horizontal views can provide effective access for complex OLAP or e-business queries. However, maintaining the horizontal views becomes a problem because the data sources are distributed over remote sites. We need a method that propagates changes in the source tables to the corresponding horizontal views. In this paper, we address incremental maintenance of horizontal views, which makes it possible to reflect changes in the source tables efficiently. We first propose an overall framework that processes queries over horizontal views transformed from source tables in a vertical form. Under the proposed framework, we propagate changes in the vertical tables to the corresponding horizontal views. To execute this view maintenance process efficiently, we keep every change to the vertical tables in a differential file and then modify the horizontal views using the differential file. Because the differential file is represented in a vertical form, its tuples must be converted to a horizontal form before they are applied to the out-of-date horizontal view. With this mechanism, horizontal views can be refreshed efficiently from the changes in a differential file without accessing the source tables. Experimental results show that the proposed method improves average performance by 1.2 to 5.0 times over the existing methods.
https://doi.org/10.3745/KIPSTD.2009.16-D.4.463
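
As a rough illustration of the mechanism described in the abstract above, the sketch below refreshes a horizontal view from a vertical differential file without rereading the source table. It is an in-memory toy, not the paper's DBMS-level method: the `(key, attribute, value)` row layout, the `(op, key, attribute, value)` differential record format, and the function names are all assumptions made for illustration.

```python
def pivot(vertical_rows):
    """Build a horizontal view {key: {attribute: value}} from vertical rows.

    Each vertical row is a (key, attribute, value) tuple.
    """
    view = {}
    for key, attr, value in vertical_rows:
        view.setdefault(key, {})[attr] = value
    return view

def apply_differential(view, diff):
    """Refresh the horizontal view in place from a vertical differential file.

    Each entry is (op, key, attribute, value) with op in
    {"insert", "update", "delete"}; this layout is hypothetical.
    """
    for op, key, attr, value in diff:
        if op in ("insert", "update"):
            view.setdefault(key, {})[attr] = value
        elif op == "delete":
            row = view.get(key, {})
            row.pop(attr, None)
            if not row:
                view.pop(key, None)  # drop rows left with no measures
        else:
            raise ValueError(f"unknown operation: {op}")
    return view

# Made-up source table in vertical form: (product, quarter, sales).
source = [("p1", "Q1", 10), ("p1", "Q2", 12), ("p2", "Q1", 7)]
view = pivot(source)

# Changes recorded since the view was materialized.
diff = [
    ("update", "p1", "Q2", 15),
    ("insert", "p2", "Q2", 9),
    ("delete", "p2", "Q1", None),
]
apply_differential(view, diff)
print(view)
```

The refreshed view matches what a full re-pivot of the updated source would produce, but only the changed tuples are touched.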

An Experimental Study for the Shear Property and the Temperature Dependency of Seismic Isolation Bearings (지진격리받침의 전단특성 및 온도의존성에 대한 실험적 연구)

Cho, Chang-Beck;Kwahk, Im-Jong;Kim, Young-Jin
- Journal of the Earthquake Engineering Society of Korea / v.12 no.1 / pp.67-77 / 2008
Seismic isolation has been studied continuously as a seismic engineering solution for reducing the sectional forces and damage that earthquakes cause in structures. To ensure reliable design and installation of seismic isolation systems, seismic isolation bearings should be fabricated under a well-planned quality control process, followed by proper evaluation tests of their seismic performance. In this study, shear property evaluation tests were carried out on lead rubber bearings (LRB) and rubber bearings (RB), and temperature dependency tests were also carried out to evaluate how the shear properties change with temperature. After the evaluation tests, the measured shear properties were compared with their design values, and the deviations were analyzed against the allowable error ranges specified in the Highway Bridge Design Specifications. The results showed that a considerable number of isolation bearings deviate so much from their design values that their errors exceed or are very close to the allowable ranges. The temperature dependency tests showed that the shear properties of isolation bearings would change considerably with changes in temperature during their service period. If these two types of change in shear properties are superposed, the shear properties may deviate from their original design values by more than 50%.
https://doi.org/10.5000/EESK.2008.12.1.067

A 2-Step Global Optimization Algorithm for TDOA/FDOA of Communication Signals (통신 신호에서 TDOA/FDOA 정보 추출을 위한 2-단계 전역 최적화 알고리즘)

Kim, Dong-Gyu;Park, Jin-Oh;Lee, Moon Seok;Park, Young-Mi;Kim, Hyoung-Nam
- Journal of the Institute of Electronics and Information Engineers / v.52 no.4 / pp.37-45 / 2015
In modern electronic warfare systems, demand for more accurate estimation methods based on TDOA (time difference of arrival) and FDOA (frequency difference of arrival) has increased. TDOA/FDOA localization consists of a two-stage procedure: the extraction of information from signals and the estimation of the emitter location. For the extraction stage, various algorithms based on the CAF (complex ambiguity function), which is known as a basic method, have been presented. When TDOA and FDOA information is extracted from communication signals using a conventional CAF-based method, a considerably long integration time is required to accurately estimate the position of an unknown emitter more than 300 km from the sensors. Such a long integration time yields a huge amount of data to transmit from the sensors to a central processing unit, resulting in heavy computational complexity. Therefore, we theoretically analyze the integration time for TDOA/FDOA information using the CRLB (Cramér-Rao lower bound) and propose a two-stage global optimization algorithm that can minimize the transmission time and the computational complexity. The proposed method is compared with conventional CAF-based algorithms in terms of computational complexity and the CRLB to verify its estimation performance.
https://doi.org/10.5573/ieie.2015.52.4.037

A Study on the GIS-based Deterministic MCDA Techniques for Evaluating the Flood Damage Reduction Alternatives (확정론적 다중의사결정기법을 이용한 최적 홍수저감대책 선정 기법 연구)

Lim, Kwang-Suop;Kim, Joo-Cheol;Hwang, Eui-Ho;Lee, Sang-Uk
- Journal of Korea Water Resources Association / v.44 no.12 / pp.1015-1029 / 2011
Conventional MCDA techniques have been used in the field of water resources in the past. A GIS can offer an effective spatial data-handling tool that can enhance water resources modeling through interfaces with sophisticated models. However, a GIS has limited capability as far as the analysis of the value structure is concerned. MCDA techniques provide the tools for aggregating the geographical data and the decision maker's preferences into a one-dimensional value for analyzing alternative decisions. In other words, MCDA allows multiple criteria to be used in deciding upon the best alternatives. The combination of GIS and MCDA capabilities is of critical importance in spatial multi-criteria analysis. The advantage of having spatial data is that it allows the unique characteristics at every point to be considered. The purpose of this study is to identify, review, and evaluate the performance of a number of conventional MCDA techniques for integration with GIS. Although a number of techniques have been applied in many fields, this study considers only the techniques that have been applied to floodplain decision-making problems. Two different methods for multi-criteria evaluation were selected to be integrated with GIS: Compromise Programming (CP) and Spatial Compromise Programming (SCP). The target region for a demonstration application of the methodology was the Suyoung River Basin in Korea.
https://doi.org/10.3741/JKWRA.2011.44.12.1015
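
For readers unfamiliar with Compromise Programming, the sketch below shows one common textbook form of the method, in which each alternative is ranked by its weighted L_p distance from an ideal point built from the best value of every criterion. It is not taken from the paper: the alternatives, scores, weights, and the choice of p = 2 are made up, and the spatial variant (SCP) would apply a calculation of this kind per grid cell rather than once per alternative.

```python
def compromise_distance(scores, weights, p=2.0):
    """Return the L_p distance of each alternative from the ideal point.

    scores[name][i] is the value of criterion i for an alternative, with
    larger values assumed to be better. Each gap to the ideal is scaled
    by the criterion's range so that criteria are comparable.
    """
    n = len(weights)
    best = [max(row[i] for row in scores.values()) for i in range(n)]
    worst = [min(row[i] for row in scores.values()) for i in range(n)]
    distances = {}
    for name, row in scores.items():
        total = 0.0
        for i in range(n):
            span = best[i] - worst[i]
            gap = (best[i] - row[i]) / span if span else 0.0
            total += (weights[i] * gap) ** p
        distances[name] = total ** (1.0 / p)
    return distances

# Hypothetical flood-reduction alternatives scored on two criteria.
scores = {
    "A": [1.0, 0.0],  # best on criterion 0, worst on criterion 1
    "B": [0.5, 0.5],  # balanced
    "C": [0.0, 1.0],  # worst on criterion 0, best on criterion 1
}
weights = [0.5, 0.5]
distances = compromise_distance(scores, weights, p=2.0)
preferred = min(distances, key=distances.get)
print(preferred, distances)
```

With p = 2 the balanced alternative is preferred because large shortfalls on a single criterion are penalized more heavily than moderate shortfalls on several.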

NRZ versus RZ Modulation Format in Lumped Dispersion Managed Systems (집중형 분산 제어 시스템에서 NRZ 변조 형식 대 RZ 변조 형식)

Lee, Seong-Real;Cho, Sung-Eon
- Journal of the Korea Institute of Information and Communication Engineering / v.12 no.2 / pp.328-335 / 2008
The system performance of the NRZ format in a WDM transmission system with lumped dispersion management (DM) and an optical phase conjugator (OPC) is compared with that of the RZ format. It is confirmed that the eye opening penalty (EOP) of both the NRZ and RZ formats in a WDM transmission system with lumped DM combined with OPC is greatly improved over that in a WDM system with only OPC. For the best improvement across all WDM channels, the optimal net residual dispersion (NRD) for the RZ format is so small that the path-averaged dispersion coefficient becomes almost zero, while that for the NRZ format is larger. It is also confirmed that, in lumped DM with the optimal NRD, the EOP is improved more for the RZ format than for the NRZ format. This is because lumped DM combined with OPC suppresses the signal distortion due to intrachannel four-wave mixing (IFWM) and intrachannel cross-phase modulation (IXPM). Consequently, the lumped DM combined with OPC proposed in this paper is an effective technique for mitigating intrachannel nonlinearities in WDM systems transmitting the RZ format.
https://doi.org/10.6109/jkiice.2008.12.2.328

Automatic Traffic Data Collection Using Simulated Satellite Imagery (인공위성영상을 이용한 교통량측량 자동화)

조우석
- Korean Journal of Remote Sensing / v.11 no.3 / pp.101-116 / 1995
The fact that the demands on traffic data collection are imposed by economic and safety considerations raises the question of the potential for complementing existing traffic data collection programs with satellite data. Evaluating and monitoring traffic characteristics is becoming increasingly important as worsening congestion, declining economic situations, and increasing environmental sensitivities are forcing the government and municipalities to make better use of existing roadway capacities. The present system of using automatic counters at selected points on highways works well from a temporal point of view (i.e., during a specific period of time at one location). However, the present system does not cover the spatial aspects of the entire road system (i.e., for every location during specific periods of time); the counters are employed only at points and only on selected highways. This lack of spatial coverage is due, in part, to the cost of the automatic counter systems (fixed procurement and maintenance costs) and of the personnel required to deploy them. The current procedure is believed to work fairly well in the aggregate mode, at the macro level. However, at the micro level, the numbers are more suspect. In addition, the statistics only work when assuming a certain homogeneity among characteristics of highways in the same class, an assumption that is impossible to test when little or no data is gathered on many of the highways for a given class. In this paper, a remote sensing system as a complement to the existing system is considered and implemented. Since satellite imagery with high resolution is not available, digitized panchromatic imagery acquired from an aircraft platform is utilized for an initial test of the feasibility and performance capability of remote sensing data.
Different levels of imagery resolution are evaluated in an attempt to determine what vehicle types could be classified and counted against a background of pavement types, which might be expected in panchromatic satellite imagery. The results of a systematic study with three different levels of resolution (1 m, 2 m, and 4 m) show that the panchromatic reflectances of vehicles and pavements are distributed so similarly that it would be difficult to classify remotely sensed vehicles on pavement systematically and analytically within the panchromatic range. Analysis of the aerial photographs shows that the shadows of the vehicles could be a cue for vehicle detection.
https://doi.org/10.7780/kjrs.1995.11.3.101

Multiple Linear Regression Analysis of PV Power Forecasting for Evaluation and Selection of Suitable PV Sites (태양광 발전소 건설부지 평가 및 선정을 위한 선형회귀분석 기반 태양광 발전량 추정 모델)

Heo, Jae;Park, Bumsoo;Kim, Byungil;Han, SangUk
- Korean Journal of Construction Engineering and Management / v.20 no.6 / pp.126-131 / 2019
Estimating the available solar energy at particular locations is critical for finding and assessing suitable PV sites. The amount of PV power generation is, however, affected by various geographical factors (e.g., weather), which may make it difficult to identify the complex relationship between the affecting factors and power output and to apply findings from one study to different locations. This study thus undertakes a regression analysis using data collected from 172 PV plants spatially distributed across Korea to identify critical weather conditions and estimate the potential power generation of PV systems. The data include solar radiation, precipitation, fine dust, humidity, temperature, cloud amount, sunshine duration, and wind speed. The estimated PV power generation is then compared with the actual PV power generation to evaluate prediction performance. As a result, the proposed model achieves a MAPE of 11.696% and an R-squared of 0.979. It is also found that all variables except humidity are statistically significant in predicting the efficiency of PV power generation. Accordingly, this study may facilitate the understanding of what weather conditions can be considered and the estimation of PV power generation when evaluating and determining suitable locations for PV facilities.
https://doi.org/10.6106/KJCEM.2019.20.6.126
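
The two metrics reported in the abstract above, MAPE and R-squared, follow standard definitions and can be computed as in the sketch below. The sample numbers are made up for illustration and are unrelated to the study's data.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical measured vs. estimated generation for three plants.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(mape(actual, predicted), r_squared(actual, predicted))
```

MAPE expresses the average error relative to each plant's actual output, while R-squared measures how much of the variation across plants the model explains, so the two can diverge when plant sizes differ widely.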

Search Result 25,977, Processing Time 0.051 seconds
