• Title/Summary/Keyword: 집합론

Search Result 279, Processing Time 0.025 seconds

A method for metadata extraction from a collection of records using Named Entity Recognition in Natural Language Processing (자연어 처리의 개체명 인식을 통한 기록집합체의 메타데이터 추출 방안)

  • Chiho Song
    • Journal of Korean Society of Archives and Records Management
    • /
    • v.24 no.2
    • /
    • pp.65-88
    • /
    • 2024
  • This pilot study explores a method of extracting metadata values and descriptions from records using named entity recognition (NER), a technique in natural language processing (NLP), a subfield of artificial intelligence. The study focuses on handwritten records from the Guro Industrial Complex, produced during the 1960s and 1970s, comprising approximately 1,200 pages and 80,000 words. After the preprocessing process of the records, which included digitization, the study employed a publicly available language API based on Google's Bidirectional Encoder Representations from Transformers (BERT) language model to recognize entity names within the text. As a result, 173 names of people and 314 of organizations and institutions were extracted from the Guro Industrial Complex's past records. These extracted entities are expected to serve as direct search terms for accessing the contents of the records. Furthermore, the study identified challenges that arose when applying the theoretical methodology of NLP to real-world records consisting of semistructured text. It also presents potential solutions and implications to consider when addressing these issues.

A Survey of DEA Applications in Measuring the Efficiency Performance of Construction Organizations (비모수 분석방법에 의한 국내 건설조직 성과 측정 방향에 관한 연구 - DEA를 이용한 국내 연구 문헌 고찰을 기반으로 -)

  • Lee, Yoon-Sun
    • Korean Journal of Construction Engineering and Management
    • /
    • v.15 no.5
    • /
    • pp.103-114
    • /
    • 2014
  • Data envelopment analysis (DEA) measures the relative efficiency of decision making units (DMUs) with multiple performance factors that are grouped into outputs and inputs. DEA has proven to be superior to simple aggregation of performance measures, and is also useful for evaluating the performance of construction companies for comparison with competitor performance. The purpose of this study was to survey literatures on the application of DEA methodology and to propose a methodological scheme to measure the performance of construction organizations. Articles on previous studies were surveyed and examined as part of a comprehensive review. The survey revealed that the application of DEA in the construction industry was li mited. Further, the survey indicated that there is a need for the development of a methodological framework on the special goals and subjects of performance measurement, methods of data structure and collection, selection of appropriate DEA models, analysis of results, and post-test. Based on the survey, this study identified and discussed the types of major issues and topics for future studies from a methodological perspective, which could be helpful to researchers interested in using DEA to study performance issues in construction organizations.

Line Matching Method for Linking Wayfinding Process with the Road Name Address System (길찾기 과정의 도로명주소 체계 연계를 위한 선형 객체 매칭 방법)

  • Bang, Yoon Sik;Yu, Ki Yun
    • Journal of Korean Society for Geospatial Information Science
    • /
    • v.24 no.4
    • /
    • pp.115-123
    • /
    • 2016
  • The road name address system has been in effect in Korea since 2012. However, the existing address system is still being used in many fields because of the difference between the spatial awareness of people and the road name address system. For the spatial awareness based on the road name address system, various spatial datasets in daily life should be referenced by the road names. The goal of this paper is to link the road name address system with the wayfinding process, which is closely related to the spatial awareness. To achieve our goal, we designed and implemented a geometric matching method for spatial data sets. This method generates network neighborhoods from road objects in the 'road name address map' and the 'pedestrian network data'. Then it computes the geometric similarities between the neighborhoods to identify corresponding road name for each object in the network data. The performance by F0.5 was assessed at 0.936 and it was improved to 0.978 by the manual check for 10% of the test data selected by the similarity. By help of our method, the road name address system can be utilized in the wayfinding services, and further in the spatial awareness of people.

Realtime Facial Expression Control and Projection of Facial Motion Data using Locally Linear Embedding (LLE 알고리즘을 사용한 얼굴 모션 데이터의 투영 및 실시간 표정제어)

  • Kim, Sung-Ho
    • The Journal of the Korea Contents Association
    • /
    • v.7 no.2
    • /
    • pp.117-124
    • /
    • 2007
  • This paper describes methodology that enables animators to create the facial expression animations and to control the facial expressions in real-time by reusing motion capture datas. In order to achieve this, we fix a facial expression state expression method to express facial states based on facial motion data. In addition, by distributing facial expressions into intuitive space using LLE algorithm, it is possible to create the animations or to control the expressions in real-time from facial expression space using user interface. In this paper, approximately 2400 facial expression frames are used to generate facial expression space. In addition, by navigating facial expression space projected on the 2-dimensional plane, it is possible to create the animations or to control the expressions of 3-dimensional avatars in real-time by selecting a series of expressions from facial expression space. In order to distribute approximately 2400 facial expression data into intuitional space, there is need to represents the state of each expressions from facial expression frames. In order to achieve this, the distance matrix that presents the distances between pairs of feature points on the faces, is used. In order to distribute this datas, LLE algorithm is used for visualization in 2-dimensional plane. Animators are told to control facial expressions or to create animations when using the user interface of this system. This paper evaluates the results of the experiment.

A study on discharge estimation for the event using a deep learning algorithm (딥러닝 알고리즘을 이용한 강우 발생시의 유량 추정에 관한 연구)

  • Song, Chul Min
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2021.06a
    • /
    • pp.246-246
    • /
    • 2021
  • 본 연구는 강우 발생시 유량을 추정하는 것에 목적이 있다. 이를 위해 본 연구는 선행연구의 모형 개발방법론에서 벗어나 딥러닝 알고리즘 중 하나인 합성곱 신경망 (convolution neural network)과 수문학적 이미지 (hydrological image)를 이용하여 강우 발생시 유량을 추정하였다. 합성곱 신경망은 일반적으로 분류 문제 (classification)을 해결하기 위한 목적으로 개발되었기 때문에 불특정 연속변수인 유량을 모의하기에는 적합하지 않다. 이를 위해 본 연구에서는 합성곱 신경망의 완전 연결층 (Fully connected layer)를 개선하여 연속변수를 모의할 수 있도록 개선하였다. 대부분 합성곱 신경망은 RGB (red, green, blue) 사진 (photograph)을 이용하여 해당 사진이 나타내는 것을 예측하는 목적으로 사용하지만, 본 연구의 경우 일반 RGB 사진을 이용하여 유출량을 예측하는 것은 경험적 모형의 전제(독립변수와 종속변수의 관계)를 무너뜨리는 결과를 초래할 수 있다. 이를 위해 본 연구에서는 임의의 유역에 대해 2차원 공간에서 무차원의 수문학적 속성을 갖는 grid의 집합으로 정의되는 수문학적 이미지는 입력자료로 활용했다. 합성곱 신경망의 구조는 Convolution Layer와 Pulling Layer가 5회 반복하는 구조로 설정하고, 이후 Flatten Layer, 2개의 Dense Layer, 1개의 Batch Normalization Layer를 배열하고, 다시 1개의 Dense Layer가 이어지는 구조로 설계하였다. 마지막 Dense Layer의 활성화 함수는 분류모형에 이용되는 softmax 또는 sigmoid 함수를 대신하여 회귀모형에서 자주 사용되는 Linear 함수로 설정하였다. 이와 함께 각 층의 활성화 함수는 정규화 선형함수 (ReLu)를 이용하였으며, 모형의 학습 평가 및 검정을 판단하기 위해 MSE 및 MAE를 사용했다. 또한, 모형평가는 NSE와 RMSE를 이용하였다. 그 결과, 모형의 학습 평가에 대한 MSE는 11.629.8 m3/s에서 118.6 m3/s로, MAE는 25.4 m3/s에서 4.7 m3/s로 감소하였으며, 모형의 검정에 대한 MSE는 1,997.9 m3/s에서 527.9 m3/s로, MAE는 21.5 m3/s에서 9.4 m3/s로 감소한 것으로 나타났다. 또한, 모형평가를 위한 NSE는 0.7, RMSE는 27.0 m3/s로 나타나, 본 연구의 모형은 양호(moderate)한 것으로 판단하였다. 이에, 본 연구를 통해 제시된 방법론에 기반을 두어 CNN 모형 구조의 확장과 수문학적 이미지의 개선 또는 새로운 이미지 개발 등을 추진할 경우 모형의 예측 성능이 향상될 수 있는 여지가 있으며, 원격탐사 분야나, 위성 영상을 이용한 전 지구적 또는 광역 단위의 실시간 유량 모의 분야 등으로의 응용이 가능할 것으로 기대된다.

  • PDF

Evaluation of Web Service Similarity Assessment Methods (웹서비스 유사성 평가 방법들의 실험적 평가)

  • Hwang, You-Sub
    • Journal of Intelligence and Information Systems
    • /
    • v.15 no.4
    • /
    • pp.1-22
    • /
    • 2009
  • The World Wide Web is transitioning from being a mere collection of documents that contain useful information toward providing a collection of services that perform useful tasks. The emerging Web service technology has been envisioned as the next technological wave and is expected to play an important role in this recent transformation of the Web. By providing interoperable interface standards for application-to-application communication, Web services can be combined with component based software development to promote application interaction and integration both within and across enterprises. To make Web services for service-oriented computing operational, it is important that Web service repositories not only be well-structured but also provide efficient tools for developers to find reusable Web service components that meet their needs. As the potential of Web services for service-oriented computing is being widely recognized, the demand for effective Web service discovery mechanisms is concomitantly growing. A number of techniques for Web service discovery have been proposed, but the discovery challenge has not been satisfactorily addressed. Unfortunately, most existing solutions are either too rudimentary to be useful or too domain dependent to be generalizable. In this paper, we propose a Web service organizing framework that combines clustering techniques with string matching and leverages the semantics of the XML-based service specification in WSDL documents. We believe that this is one of the first attempts at applying data mining techniques in the Web service discovery domain. Our proposed approach has several appealing features : (1) It minimizes the requirement of prior knowledge from both service consumers and publishers; (2) It avoids exploiting domain dependent ontologies; and (3) It is able to visualize the semantic relationships among Web services. We have developed a prototype system based on the proposed framework using an unsupervised artificial neural network and empirically evaluated the proposed approach and tool using real Web service descriptions drawn from operational Web service registries. We report on some preliminary results demonstrating the efficacy of the proposed approach.

  • PDF

Multi-Vector Document Embedding Using Semantic Decomposition of Complex Documents (복합 문서의 의미적 분해를 통한 다중 벡터 문서 임베딩 방법론)

  • Park, Jongin;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.19-41
    • /
    • 2019
  • According to the rapidly increasing demand for text data analysis, research and investment in text mining are being actively conducted not only in academia but also in various industries. Text mining is generally conducted in two steps. In the first step, the text of the collected document is tokenized and structured to convert the original document into a computer-readable form. In the second step, tasks such as document classification, clustering, and topic modeling are conducted according to the purpose of analysis. Until recently, text mining-related studies have been focused on the application of the second steps, such as document classification, clustering, and topic modeling. However, with the discovery that the text structuring process substantially influences the quality of the analysis results, various embedding methods have actively been studied to improve the quality of analysis results by preserving the meaning of words and documents in the process of representing text data as vectors. Unlike structured data, which can be directly applied to a variety of operations and traditional analysis techniques, Unstructured text should be preceded by a structuring task that transforms the original document into a form that the computer can understand before analysis. It is called "Embedding" that arbitrary objects are mapped to a specific dimension space while maintaining algebraic properties for structuring the text data. Recently, attempts have been made to embed not only words but also sentences, paragraphs, and entire documents in various aspects. Particularly, with the demand for analysis of document embedding increases rapidly, many algorithms have been developed to support it. Among them, doc2Vec which extends word2Vec and embeds each document into one vector is most widely used. However, the traditional document embedding method represented by doc2Vec generates a vector for each document using the whole corpus included in the document. This causes a limit that the document vector is affected by not only core words but also miscellaneous words. Additionally, the traditional document embedding schemes usually map each document into a single corresponding vector. Therefore, it is difficult to represent a complex document with multiple subjects into a single vector accurately using the traditional approach. In this paper, we propose a new multi-vector document embedding method to overcome these limitations of the traditional document embedding methods. This study targets documents that explicitly separate body content and keywords. In the case of a document without keywords, this method can be applied after extract keywords through various analysis methods. However, since this is not the core subject of the proposed method, we introduce the process of applying the proposed method to documents that predefine keywords in the text. The proposed method consists of (1) Parsing, (2) Word Embedding, (3) Keyword Vector Extraction, (4) Keyword Clustering, and (5) Multiple-Vector Generation. The specific process is as follows. all text in a document is tokenized and each token is represented as a vector having N-dimensional real value through word embedding. After that, to overcome the limitations of the traditional document embedding method that is affected by not only the core word but also the miscellaneous words, vectors corresponding to the keywords of each document are extracted and make up sets of keyword vector for each document. Next, clustering is conducted on a set of keywords for each document to identify multiple subjects included in the document. Finally, a Multi-vector is generated from vectors of keywords constituting each cluster. The experiments for 3.147 academic papers revealed that the single vector-based traditional approach cannot properly map complex documents because of interference among subjects in each vector. With the proposed multi-vector based method, we ascertained that complex documents can be vectorized more accurately by eliminating the interference among subjects.

A study on distinctive view of Cheng I's the sage-theory (정이(程?) 성인론(聖人論)의 특징에 관한 고찰)

  • Kim, Sang-Rae
    • The Journal of Korean Philosophical History
    • /
    • no.56
    • /
    • pp.151-180
    • /
    • 2018
  • Since the completion of the theories on human ethics and moral had been established to pursue by Confucian thinkers like Confucius and Mencius, they generally had agreed to present the basic principles for human education which every human could be the sage. In these principles for human ethics and morality there is on the premise that the knowledge about your own ethical and that the completion of the so-called act(爲) and learning(學). They had given to us that how to get a goal for the ethical and moral lives there are several academic oriented methodology will have act and learning set. In the point of achieving complete figures which act and learning for good society, there was named the sage(聖). This concept sage has two major types. One is on for the political figures that completed, and the other one is for the realm of academic side. Confucian as above mentioned the moral human being is equipped with a complete personality and political ability to make man and society perfect. Confucius has been understood as a complete human being. Yes, ideal for these two types of figures will be fulfilled in some way? They take a mystical ability to a priori or a posteriori, such as human effort can reach the sage. There are many thinkers are obvious and logical answer for this major problem in the system of confucian philosophy I have been trying. About the sage(聖), inherently natural learning(生知) occur to the position sage or knowledge (學知), can lead to there are two of the doctrine for that problem. With the study of learning and knowledge on human beings and real society the two systems concerned together. In fact, the main content of the "Analects of Confucius" we have a set of ethical and moral values not the benevolent conversation about Jin(仁) and his disciples a steady emphasis but on in praise of learning (學) for. However, at the time in Han Tang(漢唐) Han Wi(韓愈) and Wang Chung(王充), according to such thinkers the sage is already a priori determined, cannot be reached by human effort. But At the beginning of the Neo-Confucianism, Cheng I(程?) for the pioneer this Song(宋) scholars, regarding this issue could rebirth the thought that every human could be the sage through the learning as the pre-Chin(先秦) times.

An Analytical Approach Using Topic Mining for Improving the Service Quality of Hotels (호텔 산업의 서비스 품질 향상을 위한 토픽 마이닝 기반 분석 방법)

  • Moon, Hyun Sil;Sung, David;Kim, Jae Kyeong
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.21-41
    • /
    • 2019
  • Thanks to the rapid development of information technologies, the data available on Internet have grown rapidly. In this era of big data, many studies have attempted to offer insights and express the effects of data analysis. In the tourism and hospitality industry, many firms and studies in the era of big data have paid attention to online reviews on social media because of their large influence over customers. As tourism is an information-intensive industry, the effect of these information networks on social media platforms is more remarkable compared to any other types of media. However, there are some limitations to the improvements in service quality that can be made based on opinions on social media platforms. Users on social media platforms represent their opinions as text, images, and so on. Raw data sets from these reviews are unstructured. Moreover, these data sets are too big to extract new information and hidden knowledge by human competences. To use them for business intelligence and analytics applications, proper big data techniques like Natural Language Processing and data mining techniques are needed. This study suggests an analytical approach to directly yield insights from these reviews to improve the service quality of hotels. Our proposed approach consists of topic mining to extract topics contained in the reviews and the decision tree modeling to explain the relationship between topics and ratings. Topic mining refers to a method for finding a group of words from a collection of documents that represents a document. Among several topic mining methods, we adopted the Latent Dirichlet Allocation algorithm, which is considered as the most universal algorithm. However, LDA is not enough to find insights that can improve service quality because it cannot find the relationship between topics and ratings. To overcome this limitation, we also use the Classification and Regression Tree method, which is a kind of decision tree technique. Through the CART method, we can find what topics are related to positive or negative ratings of a hotel and visualize the results. Therefore, this study aims to investigate the representation of an analytical approach for the improvement of hotel service quality from unstructured review data sets. Through experiments for four hotels in Hong Kong, we can find the strengths and weaknesses of services for each hotel and suggest improvements to aid in customer satisfaction. Especially from positive reviews, we find what these hotels should maintain for service quality. For example, compared with the other hotels, a hotel has a good location and room condition which are extracted from positive reviews for it. In contrast, we also find what they should modify in their services from negative reviews. For example, a hotel should improve room condition related to soundproof. These results mean that our approach is useful in finding some insights for the service quality of hotels. That is, from the enormous size of review data, our approach can provide practical suggestions for hotel managers to improve their service quality. In the past, studies for improving service quality relied on surveys or interviews of customers. However, these methods are often costly and time consuming and the results may be biased by biased sampling or untrustworthy answers. The proposed approach directly obtains honest feedback from customers' online reviews and draws some insights through a type of big data analysis. So it will be a more useful tool to overcome the limitations of surveys or interviews. Moreover, our approach easily obtains the service quality information of other hotels or services in the tourism industry because it needs only open online reviews and ratings as input data. Furthermore, the performance of our approach will be better if other structured and unstructured data sources are added.

Classification of Domestic Freight Data and Application for Network Models in the Era of 'Government 3.0' ('정부 3.0' 시대를 맞이한 국내 화물 자료의 집계 수준에 따른 분류체계 구축 및 네트워크 모형 적용방안)

  • YOO, Han Sol;KIM, Nam Seok
    • Journal of Korean Society of Transportation
    • /
    • v.33 no.4
    • /
    • pp.379-392
    • /
    • 2015
  • Freight flow data in Korea has been collected for a variety of purposes by various organizations. However, since the representation and format of the data varies, it has not been substantially used for freight analyses and furthermore for freight policies. In order to increase the applicability of those data sets, it is required to bring them in a table and compare for finding the differences. Then, it is shown that the raw data can be aggregated by a particular criterion such as mode, origin and destination, and type commodity. This study aims to examine the freight data issue in terms of three different points of view. First, we investigated various freight volume data sets which are released by several organizations. Second, we tried to develop formulations for freight volume data. Third, we discussed how to apply the formulations to network models in which particular OR (Operations Research) techniques are used. The results emphasized that some data might be useless for modeling once they are aggregated. As a result of examining the freight volume data, this study found that 14 organizations share their data sets at various aggregation levels. This study is not an ordinary research article, which normally includes data analysis, because it seems to be impossible to conduct extensive case studies. The reason is that the data dealt in this study are diverse. Nevertheless, this study might guide the research direction in the freight transport research society in terms of data issue. Especially, it can be concluded that this study is a timely research because the governmemt has emphasized the importance of sharing data to public throughout 'government 3.0' for research purpose.