• Title/Summary/Keyword: 데이터 평가 모델 (data evaluation model)

Development of Information Extraction System from Multi Source Unstructured Documents for Knowledge Base Expansion (지식베이스 확장을 위한 멀티소스 비정형 문서에서의 정보 추출 시스템의 개발)

  • Choi, Hyunseung;Kim, Mintae;Kim, Wooju;Shin, Dongwook;Lee, Yong Hun
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.111-136 / 2018
  • In this paper, we propose a methodology for extracting answer information for queries from various types of unstructured documents collected from multiple sources on the web, in order to expand a knowledge base. The proposed methodology consists of the following steps. 1) Collect relevant documents from Wikipedia, Naver encyclopedia, and Naver news for queries separated into "subject-predicate" form, and classify the proper documents. 2) Determine whether each sentence is suitable for information extraction and derive a confidence. 3) Based on the predicate feature, extract the information from the proper sentences and derive the overall confidence of the information extraction result. To evaluate the performance of the information extraction system, we selected 400 queries from the artificial intelligence speaker of SK-Telecom. The proposed system showed higher performance indices than the baseline model. The contribution of this study is a sequence tagging model based on bi-directional LSTM-CRF that uses the predicate feature of the query; with it, we developed a robust model that maintains high recall even on various types of unstructured documents collected from multiple sources. Information extraction for knowledge base expansion should take into account the heterogeneous characteristics of source-specific document types. Previous research has the limitation that performance is poor when extracting information from document types that differ from the training data. The proposed methodology proved to extract information effectively from various types of unstructured documents compared to the baseline model.
In addition, by predicting the suitability of documents and sentences for information extraction before the extraction step, this study can prevent unnecessary extraction attempts on documents that do not contain the answer information. It is meaningful in that it provides a method by which precision can be maintained even in an actual web environment. Information extraction for knowledge base expansion targets unstructured documents on the real web, so it cannot be guaranteed that a document contains the correct answer. When question answering is performed on the real web, previous machine reading comprehension studies show low precision because they frequently attempt to extract an answer even from documents that contain no correct answer. The policy of predicting the suitability of documents and sentences for information extraction is meaningful in that it helps maintain extraction performance even in a real web environment. The limitations of this study and future research directions are as follows. First, there is a problem related to data preprocessing. In this study, the unit of knowledge extraction is determined through morphological analysis based on the open source KoNLPy Python package, and the extraction result can be incorrect when the morphological analysis is not performed properly. To enhance the performance of information extraction, it is necessary to develop an advanced morpheme analyzer. Second, there is the problem of entity ambiguity. The information extraction system of this study cannot distinguish between different entities that share the same name. If several people with the same name appear in the news, the system may not extract information about the one intended by the query.
In future research, it is necessary to take measures to distinguish people with the same name. Third, there is the problem of evaluation query data. In this study, we selected 400 user queries collected from SK Telecom's interactive artificial intelligence speaker to evaluate the performance of the information extraction system. We developed an evaluation data set using 2,800 documents (400 questions * 7 articles per question: 1 Wikipedia, 3 Naver encyclopedia, 3 Naver news) by judging whether a correct answer is included or not. To ensure the external validity of the study, it is desirable to use more queries to determine the performance of the system, but this is a costly activity that must be done manually. Future research needs to evaluate the system on more queries. It is also necessary to develop a Korean benchmark data set for information extraction systems that answer queries from multi-source web documents, in order to build an environment in which results can be evaluated more objectively.

Exploring User Attitude to Information Privacy (개인정보 노출에 대한 인터넷 사용자의 태도에 관한 연구)

  • Baek, Seung Ik;Choi, Duk Sun
    • The Journal of Society for e-Business Studies / v.20 no.1 / pp.45-59 / 2015
  • As many companies have become interested in big data, they have invested considerable resources in obtaining more customer data. Some companies even try to trade the data illegally. To collect more customer data, companies offer customers various incentive programs. However, the results usually fall well short of their expectations. This study explores the relative importance of the factors that influence customers' attitudes toward providing their personal information. It conducts a conjoint analysis to assess trade-offs among five influential factors: monetary reward, concern for data collection, concern for secondary use, concern for unauthorized use, and concern for errors. The study finds that customers' attitudes toward providing personal information are most influenced by the concern for secondary use. Furthermore, it shows that the relative importance of these factors differs between the light internet user group and the heavy internet user group. Monetary rewards appeal to heavy internet users more than to light internet users.

Lifting Work Process Optimization Method in High-rise Building Construction Through Improvement of CYCLONE Modeling Method (CYCLONE 모델링 기법 개선을 통한 초고층 공사의 자재 양중 작업 프로세스 최적화 연구)

  • Hawng, Doowon;Kwon, Okyung;Choi, Yoonki
    • Korean Journal of Construction Engineering and Management / v.18 no.2 / pp.58-70 / 2017
  • Planning for material lifting operations is one of the key processes in high-rise building construction. Several previous studies have used rough calculations by referring to existing practices or establishing a target value for lifting cycle time or operating rate. The purpose of this study is therefore to propose a material lifting process optimization method that reduces the lifting cycle time and increases the operating rate. In this study, we improve the cyclic operation network (CYCLONE) modeling method to consider the duration and zone information of each work task. This method can be used to hand over work tasks to another crew group in the work process. Using this methodology, the study optimizes the material lifting process, performs a sensitivity analysis, and evaluates the field applicability of the proposed optimization method. The optimized process was then applied to a high-rise building construction site. The lifting work process time and operating rate of the simulated as-is lifting process data, the optimized process data, and the field application result data were compared for each lifting height. This comparison confirmed the effectiveness of the optimization methodology.

Validity and reliability of a Korean version of the wellness evaluation of lifestyle (K-WEL) (한국형 웰니스 생활양식 측정도구 (K-WEL)의 타당도와 신뢰도 검증)

  • Kim, Hee Sook;Song, Yeonungsuk;Kwon, So-Hi
    • Journal of the Korean Data and Information Science Society / v.27 no.6 / pp.1609-1619 / 2016
  • The aim of this study was to assess the construct validity and reliability of a Korean version of the Wellness Evaluation of Lifestyle (K-WEL). A total of 345 nursing students completed the 99-item K-WEL. Construct validity was examined with exploratory and confirmatory factor analyses using SPSS WIN 22.0 and AMOS 18.0. The final K-WEL consisted of 71 scored items, 14 subscales (self worth, work, spirituality, gender identity, love, friendship, realistic belief, leisure, exercise, nutrition, stress management, emotional responsiveness, sense of control, sense of humor) and 4 factors (essential, social, physical, and coping self). The goodness of fit of the final research model was acceptable, as shown by $\chi^2=225.12$, p<.001, CMIN/DF=3.17, RMSEA=.08, NFI=.87, IFI=.91, CFI=.91. Convergent and discriminant validity were evaluated by AVE (.61~.69) and C.R. (.79~.89). Cronbach's alpha values for the K-WEL subscales ranged from .55 to .87. This study shows that the K-WEL is a valid and reliable instrument for assessing multidimensional aspects of wellness.
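
The subscale reliabilities reported above are Cronbach's alpha values. As a rough illustration of how such a coefficient is computed, here is a minimal Python sketch. It is not the authors' SPSS procedure; the function name `cronbach_alpha` and the toy scores are hypothetical and chosen only for illustration.

```python
from statistics import variance  # sample variance (n - 1 denominator)


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score table.

    `scores` is a list of equal-length lists; each inner list holds one
    respondent's item scores. Illustrative helper, not the study's code.
    """
    k = len(scores[0])                      # number of items in the subscale
    items = list(zip(*scores))              # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical toy data: 4 respondents, 3 items on a 1-5 scale.
toy_scores = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
]
alpha = cronbach_alpha(toy_scores)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Alpha rises when items vary together across respondents, which is why it is read as a measure of a subscale's internal consistency.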

Estimation of Natural Frequencies in Osteoporotic Mouse Femur: A Finite Element Analysis and a Vibration Test (골다공증에 걸린 쥐 대퇴골의 고유진동수 예측: 유한 요소 해석 및 진동 실험)

  • Kim, Yoon-Hyuk;Byun, Chang-Hwan;Oh, Taek-Yul
    • Journal of the Korean Society for Nondestructive Testing / v.25 no.4 / pp.239-246 / 2005
  • In this study, a finite element analysis and a vibration test were performed to estimate the natural frequencies of mouse femurs with osteoporosis. The femurs were divided into three groups: the osteoporotic group, the treated group, and the normal group. For the finite element analysis, a micro finite element model of the femur was reconstructed using Micro-CT images and the voxel mesh generation algorithm. In the vibration test, the natural frequencies were measured by the mobility test. According to the results, the averaged natural frequencies in the osteoporotic group were the highest, followed by those in the treated group. The finite element models were validated to within 15% error by comparing the natural frequencies from the finite element analysis with those from the vibration test. The developed Micro-CT system, the voxel mesh generation algorithm, the presented finite element analysis, and the vibration test could be useful for investigating structural changes in bone tissue and for the diagnosis and treatment of osteoporosis.

Building Energy Savings due to Incorporated Daylight-Glazing Systems (통합 채광시스템의 건물 냉난방 에너지 성능평가)

  • Kim, Jeong-Tai;Ahn, Hyun-Tae;Kim, Gon
    • Journal of the Korean Institute of Illuminating and Electrical Installation Engineers / v.19 no.6 / pp.1-8 / 2005
  • The quantity of light available for a space can be translated in terms of the amount of energy savings through a building energy simulation. To obtain significant energy savings in general illumination, the electric lighting system must be combined with a daylight-activated dimmer control. A prototype configuration of an office interior has been established, and the integration between the building envelope and the lighting and HVAC systems is evaluated based on computer modeling of a lighting control facility. First, an energy-efficient luminaire system is designed, and the lighting analysis program Lumen-Micro 2000 predicts the optimal layout of conventional fluorescent lighting fixtures to meet the designed lighting level and calculates the unit power density, which translates into the demanded amount of electric lighting energy. A dimming control system integrated with the contribution of daylighting has been applied to the operation of the artificial lighting. The annual cooling load due to lighting and the projected savings in cooling load due to daylighting under an overcast diffuse sky are evaluated by the computer software ENER-Win. In brief, the results from the building energy simulation with measured daylight illumination levels and the performance of the lighting control system indicate that daylighting can save over 70 percent of the energy required for general illumination in the perimeter zones throughout the year. In addition, 25% of the electric energy for cooling and almost all of the heating energy may be saved by dimming and turning off the luminaires in the perimeter zones.

Context Awareness Model using the Improved Google Activity Recognition (개선된 Google Activity Recognition을 이용한 상황인지 모델)

  • Baek, Seungeun;Park, Sangwon
    • KIPS Transactions on Software and Data Engineering / v.4 no.1 / pp.57-64 / 2015
  • Activity recognition technology is gaining attention because it can provide useful information according to the user's situation. Before smartphones became widespread, activity recognition research had to infer the user's activity using standalone sensors. Now, with the development of the IT industry, the user's activity can be inferred using the smartphone's built-in sensors, so research on activity recognition has become more active. By applying an activity recognition system, we can develop services such as recommending applications according to the user's preferences or providing route information. Some previous activity recognition systems consume too much energy because they use the GPS sensor. In contrast, the activity recognition system recently released by Google (Google Activity Recognition) needs only a little power because it uses the 'Network Provider' instead of GPS, which makes it suitable for smartphone applications. However, when we tested the performance of Google Activity Recognition, we found that it is difficult to obtain the user's exact activity because of unnecessary activity elements and some incorrect recognitions. Because new services based on activity recognition need more exact recognition, in this paper we describe the problems of Google Activity Recognition and propose AGAR (Advanced Google Activity Recognition), a method that improves accuracy. To assess the value of AGAR, we also compare its performance with that of other activity recognition systems and demonstrate its applicability by developing an example program.

Retrieval of Land Surface Temperature Using Landsat 8 Images with Deep Neural Networks (Landsat 8 영상을 이용한 심층신경망 기반의 지표면온도 산출)

  • Kim, Seoyeon;Lee, Soo-Jin;Lee, Yang-Won
    • Korean Journal of Remote Sensing / v.36 no.3 / pp.487-501 / 2020
  • As a viable option for retrieval of LST (Land Surface Temperature), this paper presents a DNN (Deep Neural Network) based approach using 148 Landsat 8 images for South Korea. Because the brightness temperature and emissivity for the band 10 (approx. 11-㎛ wavelength) of Landsat 8 are derived by combining physics-based equations and empirical coefficients, they include uncertainties according to regional conditions such as meteorology, climate, topography, and vegetation. To overcome this, we used several land surface variables such as NDVI (Normalized Difference Vegetation Index), land cover types, topographic factors (elevation, slope, aspect, and ruggedness) as well as the T0 calculated from the brightness temperature and emissivity. We optimized four seasonal DNN models using the input variables and in-situ observations from ASOS (Automated Synoptic Observing System) to retrieve the LST, which is an advanced approach when compared with the existing method of the bias correction using a linear equation. The validation statistics from the 1,728 matchups during 2013-2019 showed a good performance of the CC=0.910~0.917 and RMSE=3.245~3.365℃, especially for spring and fall. Also, our DNN models produced a stable LST for all types of land cover. A future work using big data from Landsat 5/7/8 with additional land surface variables will be necessary for a more reliable retrieval of LST for high-resolution satellite images.
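
The validation statistics quoted above (CC and RMSE) compare retrieved LST against in-situ observations. The sketch below shows one common way to compute the two measures in Python, assuming CC denotes the Pearson correlation coefficient. The function names and the sample temperatures are hypothetical and are not taken from the paper.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between paired values (same units as input)."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def pearson_cc(predicted, observed):
    """Pearson correlation coefficient between paired values."""
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    spread_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    spread_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    return cov / (spread_p * spread_o)


# Hypothetical matchups: model-retrieved LST vs. station observations (deg C).
retrieved = [12.1, 18.4, 25.0, 30.2]
station = [11.0, 19.0, 24.0, 31.5]
sample_rmse = rmse(retrieved, station)
sample_cc = pearson_cc(retrieved, station)
print(f"CC = {sample_cc:.3f}, RMSE = {sample_rmse:.3f}")
```

CC measures how well the retrieved values track the observations, while RMSE measures the typical size of the error in degrees, so the two are usually reported together.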

A Study on the Standardization of Education Modules for ARPA/Radar Simulation (ARPA/레이더 시뮬레이션 교육 모듈의 표준화 연구)

  • Park, Young-Soo
    • Journal of the Korean Society of Marine Environment & Safety / v.22 no.6 / pp.631-638 / 2016
  • A mariner cadet gains the ability to identify and avoid potential collisions with other ships through ARPA/Radar simulation education. This research first surveyed domestic and overseas rules on the simulation education (e.g., MOMAF's Standard, the STCW Convention, etc.). The investigation found that these rules specify only the content and timing of the simulation-based education, and that maritime education institutions issue the related certification autonomously after a student has taken the simulation, because no simulation education module exists to further guide the ARPA/Radar simulation. As a result, it is difficult for students to acquire consistent maritime ability through ARPA/Radar simulation. This paper discusses the standardization of these education modules to produce more consistent mariner ability, and verifies the degree of improvement in education that would be achieved by enacting the proposed education module. The simulation education system used in maritime institutions in Korea was investigated, and scenarios reflecting traffic flow in actual waterways were proposed based on marine traffic surveys so that teaching modules can educate and assess more effectively based on core marine abilities. Improvements in education and training were also verified using data collected over 2 years based on a standardized module. Each education institution can enact an effective, systematic education approach using the standardized ARPA/Radar education modules proposed in this paper, which can lay a foundation for safer vessel navigation by improving maritime abilities.

Korean Abbreviation Generation using Sequence to Sequence Learning (Sequence-to-sequence 학습을 이용한 한국어 약어 생성)

  • Choi, Su Jeong;Park, Seong-Bae;Kim, Kweon-Yang
    • KIISE Transactions on Computing Practices / v.23 no.3 / pp.183-187 / 2017
  • Smartphone users prefer fast reading and texting, so they frequently use abbreviated sequences of words and phrases. Nowadays, abbreviations are widely used, from chat terms to technical terms. Gathering abbreviations would therefore be helpful to many services, including information retrieval and recommendation systems. However, gathering abbreviations manually requires too much effort and cost, because new abbreviations are continuously generated whenever new material such as a TV program or a phenomenon appears. Thus, abbreviations need to be generated automatically. Existing methods for generating Korean abbreviations use a rule-based approach. The rule-based approach is limited in that it is unable to generate irregular abbreviations. Another problem is deciding the correct abbreviation among the candidate abbreviations generated by the rules. To address these limitations, we propose a method of generating Korean abbreviations automatically using sequence-to-sequence learning. Sequence-to-sequence learning can generate irregular abbreviations and does not lead to the problem of deciding the correct abbreviation among candidates. Accordingly, it is suitable for generating Korean abbreviations. To evaluate the proposed method, we use two types of datasets. The experimental results show that our method is effective for irregular abbreviations.