• Title/Summary/Keyword: Large amount of point data


The Unsupervised Learning-based Language Modeling of Word Comprehension in Korean

  • Kim, Euhee
    • Journal of the Korea Society of Computer and Information
    • /
    • v.24 no.11
    • /
    • pp.41-49
    • /
    • 2019
  • We build an unsupervised machine learning-based language model that can estimate the amount of information needed to process words consisting of subword-level morphemes and syllables. We then investigate whether the reading times of words, which reflect their morphemic and syllabic structures, are predicted by an information-theoretic measure such as surprisal. Specifically, the proposed Morfessor-based unsupervised machine learning model is first trained on a large dataset of sentences from the Sejong Corpus and then applied to estimate the information-theoretic measure for each word in the test data of Korean words. The reading times of the words in the test data are taken from the Korean Lexicon Project (KLP) Database. A comparison between the information-theoretic measures of the words in question and the corresponding reading times, using a linear mixed-effects model, reveals a reliable correlation between surprisal and reading time. We conclude that surprisal is positively related to processing effort (i.e., reading time), confirming the surprisal hypothesis.
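
To make the surprisal measure concrete, the sketch below trains a toy unigram model over morpheme-segmented words and computes a word's surprisal as the summed negative log-probability of its morphemes. This is a minimal stand-in that assumes pre-segmented input; the paper itself derives segmentations with a Morfessor model trained on the Sejong Corpus.

```python
import math
from collections import Counter

def train_unigram(corpus_morphs):
    """Count morpheme frequencies in a (pre-segmented) training corpus."""
    counts = Counter(m for word in corpus_morphs for m in word)
    total = sum(counts.values())
    return counts, total

def word_surprisal(word_morphs, counts, total, alpha=1.0):
    """Surprisal of a word = sum of -log2 P(morpheme), add-alpha smoothed."""
    vocab = len(counts) + 1  # +1 slot for unseen morphemes
    s = 0.0
    for m in word_morphs:
        p = (counts[m] + alpha) / (total + alpha * vocab)
        s += -math.log2(p)
    return s

# toy usage: each word is a list of morpheme strings
corpus = [["먹", "었", "다"], ["학교", "에"], ["먹", "는", "다"]]
counts, total = train_unigram(corpus)
print(word_surprisal(["먹", "었", "다"], counts, total))
```

In the paper's setup, per-word surprisal values like these would then be entered as a fixed effect in a linear mixed-effects model predicting KLP reading times.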

Generation of Masked Face Image Using Deep Convolutional Autoencoder (컨볼루션 오토인코더를 이용한 마스크 착용 얼굴 이미지 생성)

  • Lee, Seung Ho
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.8
    • /
    • pp.1136-1141
    • /
    • 2022
  • Research on face recognition for masked faces has become increasingly important due to the COVID-19 pandemic. To realize stable and practical recognition performance, a large amount of facial image data must be acquired for training. However, it is difficult for researchers to obtain masked face images for each human subject. This paper proposes a novel method to synthesize a face image with a virtual mask pattern. In this method, a pair of images from a single human subject, one masked and one unmasked, is fed into a convolutional autoencoder as training data. This allows the network to learn the geometric relationship between the face and the mask. In the inference step, given an unseen face image, the learned convolutional autoencoder generates a synthetic face image with a mask pattern. The proposed method can rapidly generate realistic masked face images, and it is practical compared to methods that rely on facial feature point detection.
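
The paper does not publish its network, but the scheme it describes, training an encoder-decoder on (unmasked, masked) image pairs so that inference maps a new face to its masked counterpart, can be sketched as below. The layer shapes, 128x128 image size, and MSE objective are assumptions for illustration, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class MaskAutoencoder(nn.Module):
    """Maps an unmasked face image to its masked version."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1),   # 128 -> 64
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),  # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), # 32 -> 16
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
            nn.Sigmoid(),  # pixel values in [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = MaskAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# one training step on a batch of (unmasked, masked) pairs
unmasked = torch.rand(8, 3, 128, 128)  # stand-in for real data
masked = torch.rand(8, 3, 128, 128)
opt.zero_grad()
loss = loss_fn(model(unmasked), masked)
loss.backward()
opt.step()
```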

Ontology Construction of Diet Data for Food Hygiene Informatization (식품 위생 정보화를 위한 식단 정보 온톨로지 구축과 활용)

  • Cha, Kyung-Ae;Yeo, Sun-Dong;Yoon, Seong-Wook;Hong, Won-Kee
    • Journal of rehabilitation welfare engineering & assistive technology
    • /
    • v.11 no.1
    • /
    • pp.21-27
    • /
    • 2017
  • To guarantee the effectiveness of the HACCP (Hazard Analysis and Critical Control Points) system, it is necessary to develop an ontology-based information system that can automatically manage the large amount of HACCP records and information derived from HACCP operation results. In this paper, we construct a food information ontology that represents the relationships between ingredients, recipes, and features of food categories. Moreover, we develop a HACCP automation application that adopts the ontology, verifying the semantic quality of the designed ontology model by performing HACCP processes such as HACCP diet classification. We expect this work to contribute to the development of food hygiene informatization and to improve the accuracy of HACCP data through the semantic system.
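
One way to realize such a food ontology in code is with RDF triples. The sketch below uses the rdflib library with hypothetical class and property names (Menu, Ingredient, hasIngredient are illustrative, not from the paper) to show how menu-to-ingredient relationships can be stored and queried for a simple classification, in the spirit of the HACCP diet classification described above.

```python
from rdflib import Graph, Namespace, Literal, RDF, RDFS

FOOD = Namespace("http://example.org/food#")  # hypothetical namespace
g = Graph()
g.bind("food", FOOD)

# classes and properties (names are illustrative)
g.add((FOOD.Menu, RDF.type, RDFS.Class))
g.add((FOOD.Ingredient, RDF.type, RDFS.Class))
g.add((FOOD.hasIngredient, RDF.type, RDF.Property))

# a menu item and its ingredients
g.add((FOOD.KimchiStew, RDF.type, FOOD.Menu))
g.add((FOOD.KimchiStew, RDFS.label, Literal("김치찌개", lang="ko")))
g.add((FOOD.KimchiStew, FOOD.hasIngredient, FOOD.Kimchi))
g.add((FOOD.KimchiStew, FOOD.hasIngredient, FOOD.Pork))

# simple classification query: which menus contain pork?
for menu in g.subjects(FOOD.hasIngredient, FOOD.Pork):
    print(menu)
```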

Bankruptcy Prediction Modeling Using Qualitative Information Based on Big Data Analytics (빅데이터 기반의 정성 정보를 활용한 부도 예측 모형 구축)

  • Jo, Nam-ok;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.2
    • /
    • pp.33-56
    • /
    • 2016
  • Many researchers have focused on developing bankruptcy prediction models using modeling techniques such as statistical methods, including multiple discriminant analysis (MDA) and logit analysis, or artificial intelligence techniques, including artificial neural networks (ANN), decision trees, and support vector machines (SVM), to secure enhanced performance. Most of the bankruptcy prediction models in academic studies have used financial ratios as their main input variables. The bankruptcy of firms is associated both with each firm's financial state and with the external economic situation. However, the inclusion of qualitative information, such as the economic atmosphere, has not been actively discussed, despite the fact that exploiting only financial ratios has some drawbacks. Accounting information, such as financial ratios, is based on past data, and it is usually determined one year before bankruptcy. Thus, a time lag exists between the point of closing financial statements and the point of credit evaluation. In addition, financial ratios do not capture environmental factors such as external economic conditions. Therefore, using only financial ratios may be insufficient for constructing a bankruptcy prediction model, because they essentially reflect past corporate internal accounting information while neglecting recent information. Thus, qualitative information must be added to the conventional bankruptcy prediction model to supplement the accounting information. Due to the lack of an analytic mechanism for obtaining and processing qualitative information from various information sources, previous studies have made little use of qualitative information. Recently, however, big data analytics, such as text mining techniques, have been drawing much attention in academia and industry, with an increasing amount of unstructured text data available on the web. A few previous studies have sought to adopt big data analytics in business prediction modeling. Nevertheless, the use of qualitative information on the web for business prediction modeling is still in its early stage, restricted to limited applications such as stock prediction and movie revenue prediction. Thus, it is necessary to apply big data analytics techniques, such as text mining, to various business prediction problems, including credit risk evaluation. Analytic methods are required for processing qualitative information represented in unstructured text form, due to the complexity of managing and processing unstructured text data. This study proposes a bankruptcy prediction model for Korean small- and medium-sized construction firms using both quantitative information, such as financial ratios, and qualitative information acquired from economic news articles. The performance of the proposed method depends on how well the qualitative information is transformed into quantitative information suitable for incorporation into the bankruptcy prediction model. We employ big data analytics techniques, especially text mining, as a mechanism for processing qualitative information. A sentiment index is computed at the industry level by extracting sentiment from a large amount of text data, quantifying the external economic atmosphere represented in the media. The proposed method involves keyword-based sentiment analysis using a domain-specific sentiment lexicon to extract sentiment from economic news articles.
The generated sentiment lexicon is designed to represent sentiment for the construction business by considering the relationship between each occurring term and the actual economic condition of the industry, rather than the inherent semantics of the term. The experimental results show that incorporating qualitative information based on big data analytics into the traditional, accounting-based bankruptcy prediction model is effective in enhancing predictive performance. The sentiment variable extracted from economic news articles had an impact on corporate bankruptcy. In particular, a negative sentiment variable improved the accuracy of corporate bankruptcy prediction, because the bankruptcy of construction firms is sensitive to poor economic conditions. The bankruptcy prediction model using qualitative information based on big data analytics contributes to the field in that it reflects not only relatively recent information but also environmental factors, such as external economic conditions.
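
To make the keyword-based sentiment index concrete, here is a minimal sketch: a hand-made domain lexicon (the entries are placeholders; the paper builds a construction-specific lexicon from economic news) is matched against article text, and an industry-level index is computed from the positive and negative keyword counts.

```python
import re
from collections import Counter

# domain-specific lexicon (entries are illustrative placeholders)
POSITIVE = {"recovery", "growth", "boom", "expansion"}
NEGATIVE = {"slump", "default", "downturn", "insolvency"}

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def sentiment_index(articles):
    """Industry-level index: (pos - neg) / (pos + neg) over all articles."""
    counts = Counter(t for a in articles for t in tokenize(a))
    pos = sum(counts[w] for w in POSITIVE)
    neg = sum(counts[w] for w in NEGATIVE)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

news = ["Construction orders slump as insolvency fears grow",
        "Housing market shows signs of recovery and growth"]
print(sentiment_index(news))
```

In the proposed model, an index of this kind would enter the bankruptcy classifier as an additional variable alongside the financial ratios.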

An Analysis of Information Visualization Problems using User Interface Design Principles (이용자 인터페이스 설계 원칙에 의한 정보시각화 시스템 평가 및 문제점 분석)

  • Lee, Jee-Yeon
    • Journal of Information Management
    • /
    • v.34 no.2
    • /
    • pp.67-88
    • /
    • 2003
  • There has been increased interest in information visualization. Information visualization has been considered a way to summarize textual data so that users can access a large amount of data more efficiently and effectively. However, many information visualization techniques stem from scientific visualization techniques, which may be difficult for regular users to understand. More importantly, the system models used by most information visualization techniques have no real-world counterpart. For example, most users do not represent or process textual data in terms of a fisheye view or a topological map. This means that current information visualization systems offer no affordance from the users' point of view. In this paper, we analyze this problem by using user interface design principles to point out what is lacking in current information visualization systems. More specifically, we applied Nielsen's heuristic evaluation technique to review four representative information visualization techniques. The analysis results confirmed our original hypothesis about why current information visualization systems are not part of mainstream information systems. Finally, we suggest investing more effort in improving the currently prevalent and familiar bullet-list style of textual information presentation, based on usability studies and intelligent content analysis.

Design Thinking Methodology for Social Innovation using Big Data and Qualitative Research (사회혁신분야에서 근거이론 기반 질적연구와 빅데이터 분석을 활용한 디자인 씽킹 방법론)

  • Park, Sang Hyeok;Oh, Seung Hee;Park, Soon Hwa
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship
    • /
    • v.13 no.4
    • /
    • pp.169-181
    • /
    • 2018
  • Under a constantly intensifying global competitive environment, many companies are exploring new business opportunities in the field of social innovation through creating shared value. In pursuing social innovation, clarifying the problem to be solved and grasping its cause is the key starting point. Among the many problem-solving methodologies, design thinking has recently attracted the most attention in various fields. Design thinking is a creative problem-solving method used as a business innovation tool to empathize with human needs and uncover latent desires that the public is not aware of, and it is actively used as a tool for solving social problems. However, one of the difficulties experienced by many design thinking project participants is analyzing the observed data efficiently. When data are analyzed only offline, analyzing a large amount of data takes a long time, and the processing of unstructured data is limited. This makes it difficult to find fundamental problems in the data collected through observation while performing design thinking. The purpose of this study is to integrate qualitative and quantitative data analysis methods so that the analysis of data collected at the observation stage of a design thinking project for social innovation becomes more scientific, complementing the limits of the design thinking process. The integrated methodology presented in this study is expected to contribute to innovation performance through design thinking by providing practical guidelines and implications for design thinking practitioners as a valuable tool for social innovation.

Deep Learning-based Professional Image Interpretation Using Expertise Transplant (전문성 이식을 통한 딥러닝 기반 전문 이미지 해석 방법론)

  • Kim, Taejin;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.2
    • /
    • pp.79-104
    • /
    • 2020
  • Recently, as deep learning has attracted attention, its use is being considered as a method for solving problems in various fields. In particular, deep learning is known to perform excellently when applied to unstructured data such as text, sound, and images, and many studies have proven its effectiveness. Owing to the remarkable development of text and image deep learning technology, interest in image captioning technology and its applications is rapidly increasing. Image captioning is a technique that automatically generates relevant captions for a given image by handling both image comprehension and text generation simultaneously. In spite of the high entry barrier of image captioning, which requires analysts to process both image and text data, it has established itself as one of the key fields in A.I. research owing to its wide applicability. In addition, much research has been conducted to improve the performance of image captioning in various respects. Recent studies attempt to create advanced captions that not only describe an image accurately but also convey the information contained in the image more sophisticatedly. Despite these efforts, it is difficult to find research that interprets images from the perspective of domain experts in each field rather than from the perspective of the general public. Even for the same image, the parts of interest may differ according to the professional field of the person who encounters the image. Moreover, the way of interpreting and expressing the image also differs according to the level of expertise. The public tends to recognize an image from a holistic and general perspective, that is, by identifying the image's constituent objects and their relationships. The domain experts, on the contrary, tend to recognize the image by focusing on the specific elements necessary to interpret it based on their expertise. This implies that the meaningful parts of an image differ depending on the viewer's perspective, even for the same image, and image captioning needs to reflect this phenomenon. Therefore, in this study, we propose a method to generate captions specialized for each domain by utilizing the expertise of experts in the corresponding domain. Specifically, after pre-training on a large amount of general data, the expertise of the field is transplanted through transfer learning with a small amount of expert data. However, a simple application of transfer learning using expert data may invoke another type of problem. Simultaneous learning with captions of various characteristics may cause a so-called 'inter-observation interference' problem, which makes it difficult to learn each characteristic point of view purely. When learning with a vast amount of data, most of this interference is self-purified and has little impact on the learning results. In fine-tuning on a small amount of data, on the contrary, the impact of such interference on learning can be relatively large. To solve this problem, we propose a novel 'Character-Independent Transfer-learning' that performs transfer learning independently for each characteristic.
In order to confirm the feasibility of the proposed methodology, we performed experiments utilizing the results of pre-training on the MSCOCO dataset, which comprises 120,000 images and about 600,000 general captions. Additionally, following the advice of an art therapist, about 300 pairs of images and expert captions were created, and the data were used for the expertise transplantation experiments. As a result, we confirmed that the captions generated by the proposed methodology reflect the transplanted expertise, whereas captions generated through learning on general data contain much content irrelevant to the expert interpretation. In this paper, we propose a novel approach to specialized image interpretation: a method that uses transfer learning to generate captions specialized for a specific domain. In the future, by applying the proposed methodology to expertise transplantation in various fields, we expect much research to be conducted to solve the problem of the lack of expert data and to improve the performance of image captioning.
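
A minimal sketch of the character-independent transfer idea, under heavy simplifying assumptions: a frozen module stands in for the MSCOCO-pretrained visual encoder, a single linear layer stands in for the caption decoder, and each 'characteristic' gets its own independently fine-tuned copy of the decoder so that gradients from one viewpoint never mix with another's. The real system would use a full captioning network; names like finetune_head and the characteristic labels are illustrative.

```python
import copy
import torch
import torch.nn as nn

# stand-ins for a pre-trained captioning model (toy sizes)
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256))
decoder = nn.Linear(256, 1000)  # toy "caption head" over a 1000-token vocab

for p in encoder.parameters():
    p.requires_grad = False  # keep the general visual knowledge intact

def finetune_head(head, pairs, epochs=5):
    """Fine-tune one decoder copy on one characteristic's captions only."""
    opt = torch.optim.Adam(head.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for image, token_target in pairs:
            opt.zero_grad()
            loss = loss_fn(head(encoder(image)), token_target)
            loss.backward()
            opt.step()
    return head

# character-independent transfer: one independent decoder copy per
# characteristic, so the viewpoints cannot interfere with each other
toy_pair = (torch.rand(1, 3, 64, 64), torch.tensor([3]))
expert_data = {"emotion": [toy_pair], "color_use": [toy_pair]}
heads = {c: finetune_head(copy.deepcopy(decoder), data)
         for c, data in expert_data.items()}
```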

A Study on Determinants of Commercial Land Values in Gwangju City (광주시 상업지 지가의 형성요인에 관한 연구)

  • Lee, Hyun-Wook
    • Journal of the Korean association of regional geographers
    • /
    • v.2 no.2
    • /
    • pp.159-171
    • /
    • 1996
  • The aim of this study is to identify which factors affect commercial land values and how they act, by analyzing the distribution of commercial land values in Gwangju city with multiple regression analysis. The major findings of this study are as follows. (1) Looking at the changes in the distribution of commercial land values between 1989 and 1996, we see that the area of higher commercial land values extends along the main arterial roads. This is related to urbanization in the urban fringe, while the decline of commercial land values occurs in the city center, despite its long history as a commercial district. The decline is due to a poor fit with rapid changes in the commercial environment, caused by fragmented lots, old buildings, traffic congestion, and so on. (2) The regions where commercial land values rose greatly are the west, where the newly planned city center of Sangmu-dong was constructed, and the southwest, which is related to the extension of high-density apartments and the location of big discount stores. (3) From the changes in the commercial land values distribution map, together with road and topographical maps, we find that commercial land values are related to various factors: distance from the CBD, traffic convenience, the reputation of the commercial district, road conditions, the size of the supporting area, the degree of commercial land use, and so on. (4) From these related factors, six variables are extracted by operational definition: the spatial distance from the city center, the walking distance to a bus stop, the road width, the amount of bus traffic, the amount of pedestrian traffic, and the number of shops. (5) Data on the seven variables are collected at the highest-value point of each Dong. We apply multiple regression analysis with commercial land value as the dependent variable and the six extracted variables as independent variables. (6) The regression on the determinants of commercial land values shows that the variables most strongly related to commercial land values are the amount of pedestrian traffic and the spatial distance from the city center; these two variables explain 65% of the variance in commercial land values. (7) To account for the unexplained 35%, we carry out a residual analysis. It shows that the model underestimates values in the downtown area and overestimates them in the urban fringe. This feature is due to the simple single-core structure of Gwangju city and the limits of this regression model.
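
The regression itself is standard ordinary least squares with six predictors. The sketch below reproduces the setup on synthetic stand-in data (the paper's survey values are not available here), using statsmodels to obtain the coefficients, the share of explained variance, and the residuals used in step (7).

```python
import numpy as np
import statsmodels.api as sm

# synthetic stand-ins for the six predictors measured at the
# highest-value point of each Dong
n = 40
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(0, 10, n),    # spatial distance from the city center (km)
    rng.uniform(0, 15, n),    # walking distance to a bus stop (min)
    rng.uniform(6, 40, n),    # road width (m)
    rng.uniform(0, 500, n),   # bus traffic volume
    rng.uniform(0, 5000, n),  # pedestrian traffic volume
    rng.integers(1, 200, n),  # number of shops
])
y = 3000 - 150 * X[:, 0] + 0.8 * X[:, 4] + rng.normal(0, 300, n)

model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.rsquared)   # share of variance explained (0.65 in the paper)
print(model.params)     # coefficients for each determinant
print(model.resid[:5])  # residuals, as used in the residual analysis
```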


Process Fault Probability Generation via ARIMA Time Series Modeling of Etch Tool Data

  • Arshad, Muhammad Zeeshan;Nawaz, Javeria;Park, Jin-Su;Shin, Sung-Won;Hong, Sang-Jeen
    • Proceedings of the Korean Vacuum Society Conference
    • /
    • 2012.02a
    • /
    • pp.241-241
    • /
    • 2012
  • The semiconductor industry has been taking advantage of improvements in process technology in order to maintain reduced device geometries and stringent performance specifications. As a result, semiconductor manufacturing has grown to hundreds of process steps in sequence, and this number is expected to keep increasing, which may in turn reduce yield. With a large amount of investment at stake, this motivates tighter process control and fault diagnosis. The continuous improvement in the semiconductor industry demands advancements in process control and monitoring to the same degree. Any fault in the process must be detected and classified with a high degree of precision, and it should be diagnosed if possible. A detected abnormality in the system is then classified to locate the source of the variation. The performance of a fault detection system is directly reflected in the yield, so a highly capable fault detection system is always desirable. In this research, time series modeling of data from etch equipment has been investigated for the ultimate purpose of fault diagnosis. The tool data consisted of a number of different parameters, each recorded at fixed time points. As the data had been collected over a number of runs, it was not synchronized, due to variable delays and offsets in the data acquisition system and networks. The data was therefore synchronized using a variant of the Dynamic Time Warping (DTW) algorithm. The AutoRegressive Integrated Moving Average (ARIMA) model was then applied to the synchronized data. The ARIMA model combines the autoregressive model and the moving average model to relate the present value of the time series to its past values. As new parameter values are received from the equipment, the model uses them together with the previous ones to provide one-step-ahead predictions for each parameter. The statistical comparison of these predictions with the actual values gives each parameter's probability of fault at each time point and, once a run has finished, for each run. This work will be extended by applying a suitable probability generating function and combining the probabilities of different parameters using Dempster-Shafer Theory (DST). DST provides a way to combine evidence available from different sources and gives a joint degree of belief in a hypothesis. This will give a combined belief of fault in the process with high precision.
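
A minimal sketch of the one-step-ahead scheme, under stated assumptions: statsmodels' ARIMA is refit on the history at each step, the prediction error is standardized by the in-sample residual spread, and the result is mapped to a per-time-point fault probability. The (1,1,1) order, the warm-up length, and the z-score-to-probability mapping are illustrative choices, not the authors'.

```python
import numpy as np
from scipy.stats import norm
from statsmodels.tsa.arima.model import ARIMA

def fault_probabilities(series, order=(1, 1, 1), warmup=50):
    """One-step-ahead ARIMA predictions; large standardized prediction
    errors are mapped to a probability of fault at each time point."""
    probs = []
    for t in range(warmup, len(series)):
        fit = ARIMA(series[:t], order=order).fit()
        pred = fit.forecast(1)[0]
        sigma = np.std(fit.resid) or 1e-9
        z = abs(series[t] - pred) / sigma
        probs.append(2 * norm.cdf(z) - 1)  # 0 = as expected, ~1 = faulty
    return np.array(probs)

# toy sensor trace with an injected step fault at t = 80
rng = np.random.default_rng(1)
trace = np.cumsum(rng.normal(0, 1, 100))
trace[80:] += 8.0
print(fault_probabilities(trace).round(2)[-25:])
```

Per-parameter probabilities produced this way are what the authors propose to fuse across parameters with Dempster-Shafer combination.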


Health Status of Elderly Persons in Korea (한국노인의 건강상태에 대한 조사연구)

  • 최영희;김문실;변영순;원종순
    • Journal of Korean Academy of Nursing
    • /
    • v.20 no.3
    • /
    • pp.307-323
    • /
    • 1990
  • This study was done to design and test an instrument to measure the health status of the elderly, including physical, psychological, and social dimensions. Data collection was done from July 18 to August 17, 1990. Subjects were 412 older persons in Korea. A convenience sample was used, but the place of residence was stratified into large, medium, and small city and rural areas. Participants located in Sudaemun-Gu, Mapo-Gu, and Kangnam-Gu, Seoul were interviewed by trained nursing students, and those in Chungju, Jonju, Chuncheon, and Jinju by professors of nursing colleges. Rural residents were interviewed by community health practitioners working in Kyungsang-Buk-Do, Kyungsang-Nam-Do, Jonla-Buk-Do, and Kyung-Ki-Do. The tool developed for this study was a structured questionnaire based on previous literature and then tested for reliability and validity. This tool contained 20 physical health status items, 17 mental-emotional health status items, and 38 social health status items. The physical health status items clustered into six factors, covering personal hygiene, activity, home management, digestive, sexual, sensory, and elimination functions. The mental-emotional health status items clustered into two factors, mental health and emotional health. The social health status items clustered into seven factors: grandparent, parent, spouse, friend, kinship, group member, and religious role functions. Data analysis included percentages, averages, standard deviations, t-tests, and ANOVA. The results of the analysis were as follows: 1. The tool measuring the health status of the elderly developed for this research had a relatively high reliability, indicated by a Cronbach's alpha of 0.97793. 2. The average score of the subjects' physical health status was 4.054 on a 5-point Likert scale, mental-emotional health status was 3.803, social health status was 2.939, and the total average was 3.521. The subjects' social health status was the lowest, mental-emotional health status was next, and physical health status was the highest. 3. Educational background, perceived health status, and the amount of pocket money were related to physical and mental-emotional health status, and family structure was related to mental-emotional, physical, and social health status. Occupation was related to physical and mental-emotional status. Area of residence was related to mental-emotional and social status. Source of living expenses was related to physical and mental-emotional health status, marital status to mental-emotional and social health status, the number of people living in the home to physical health status, and religion to social health status. The following conclusions were derived from the above results: 1. The health status of the Korean elderly was relatively sound, but social health status was the most vulnerable; social activity for the Korean elderly is needed to improve social health. 2. Educational background, perceived health status, and the amount of pocket money must be considered in health assessment criteria for the elderly; family structure, marital status, occupation, residence variables, and sources of living expenses must also be considered significant. 3. A health education program based on the educational background of the elderly, and the provision of an occupational socioeconomic welfare policy, will be useful for increasing the social health status of the Korean elderly.
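
Since the instrument's reliability is summarized by a single Cronbach's alpha (0.97793 above), a short sketch of how that coefficient is computed from an item-response matrix may be useful; the response data below are invented for illustration.

```python
import numpy as np

def cronbach_alpha(items):
    """items: (n_subjects, n_items) matrix of Likert responses.
    alpha = k/(k-1) * (1 - sum(item variances) / var(total score))."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# toy 5-point responses from 6 subjects on 4 items
resp = [[4, 5, 4, 4], [3, 3, 4, 3], [5, 5, 5, 4],
        [2, 3, 2, 3], [4, 4, 5, 4], [3, 4, 3, 3]]
print(round(cronbach_alpha(resp), 3))
```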
