• Title/Summary/Keyword: Text Construction

A Study on Constructing a Digital Archive System of the Modern Korean Christian Collections (근대 한국기독교 자료의 디지털 아카이브 시스템 구축에 관한 연구)

  • Yang, Ji-Ann
    • The Journal of the Korea Contents Association / v.22 no.8 / pp.681-691 / 2022
  • The purpose of this study is to construct a digital archive system by analyzing the collections of the Korean Christian Museum at S University, which holds a large number of materials related to Korean Christianity published in the modern period, from Korea's enlightenment until liberation. To construct the digital archive system, indexes and metadata for the collection are compiled according to a pre-defined format. After the selected collection is digitized, a database is built from the metadata, and the actual system is divided into a web standard-based management system and a user service system. In addition, a content-based search system is constructed, which reports the matching value of retrieval results in units of one character and offers automatic search term completion to enhance user convenience. As a result, museum collections whose original texts are difficult to access are digitized and made easy to use, laying the foundation for the long-term development of humanities content and improving the accessibility and availability of the collections for both researchers and the public.
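
The search features mentioned in this abstract (search term completion and a per-character matching value) could be sketched roughly as follows. This is only an illustration: the function names `complete` and `char_match_ratio`, the prefix-based completion, and the in-order character scoring are all assumptions, since the abstract does not describe the system's actual algorithms.

```python
from bisect import bisect_left


def complete(prefix, terms, limit=5):
    """Return up to `limit` indexed terms that start with `prefix` (hypothetical helper)."""
    ordered = sorted(terms)
    # Binary search for the first term that is >= prefix.
    start = bisect_left(ordered, prefix)
    matches = []
    for term in ordered[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can share the prefix
        matches.append(term)
        if len(matches) == limit:
            break
    return matches


def char_match_ratio(query, text):
    """Fraction of query characters found, in order, in `text` (assumed scoring rule)."""
    if not query:
        return 0.0
    position = 0
    hits = 0
    for ch in query:
        found = text.find(ch, position)
        if found != -1:
            hits += 1
            position = found + 1
    return hits / len(query)


if __name__ == "__main__":
    index_terms = ["bible", "biblical", "hymn", "bibliography"]
    print(complete("bib", index_terms))
    print(char_match_ratio("hymx", "hymnal"))
```

A production system would more likely keep the terms pre-sorted or in a trie rather than sorting on every call; the sort is kept here only to make the sketch self-contained.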

Test Dataset for validating the meaning of Table Machine Reading Language Model (표 기계독해 언어 모형의 의미 검증을 위한 테스트 데이터셋)

  • YU, Jae-Min;Cho, Sanghyun;Kwon, Hyuk-Chul
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.10a / pp.164-167 / 2022
  • In table machine comprehension, the knowledge required of language models and the structural form of tables change depending on the domain, causing greater performance degradation than with text data. In this paper, we propose a pre-training data construction method based on meaningful tabular data selection, together with an adversarial learning method, for building a pre-trained table language model that is robust to these domain changes in table machine reading. To detect tabular data used only for decorating web documents, which carries no structural information, among the extracted table data, we defined heuristic rules that identify header data and select table data. An adversarial learning method between tabular data and infobox data, which carries knowledge about entities, was applied. Compared with training on the existing unrefined data, training on the refined data increased F1 by 3.45 and EM by 4.14 on the KorQuAD table data, and F1 by 19.38 and EM by 4.22 on the Spec table QA data.
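
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of exact match (EM) and token-level F1 as commonly computed for extractive question answering. Whitespace tokenization and the function names are assumptions made for illustration; the evaluation scripts actually used for the KorQuAD and Spec table QA data may normalize or tokenize answers differently.

```python
from collections import Counter


def exact_match(prediction, reference):
    """1.0 if the answers are identical after trimming whitespace, else 0.0."""
    return float(prediction.strip() == reference.strip())


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall (whitespace tokens assumed)."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("Seoul", "Seoul"))
    print(token_f1("Seoul city", "Seoul"))
```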

A Study on the Construction of Financial-Specific Language Model Applicable to the Financial Institutions (금융권에 적용 가능한 금융특화언어모델 구축방안에 관한 연구)

  • Jae Kwon Bae
    • Journal of Korea Society of Industrial Information Systems / v.29 no.3 / pp.79-87 / 2024
  • Recently, the importance of pre-trained language models (PLMs) has been emphasized for natural language processing (NLP) tasks such as text classification, sentiment analysis, and question answering. Korean PLMs show high performance on general-purpose NLP but are weak in domains such as finance, medicine, and law. The main goal of this study is to propose a language model training process and method for building a financial-specific language model that performs well not only in the financial domain but also in general-purpose domains. The five steps for building the financial-specific language model are (1) financial data collection and preprocessing, (2) selection of a model architecture such as a PLM or foundation model, (3) domain data learning and instruction tuning, (4) model verification and evaluation, and (5) model deployment and utilization. Through this process, the study presents a method for constructing pre-training data that takes advantage of the characteristics of the financial domain, as well as efficient LLM training methods, namely adaptive learning and instruction tuning techniques.

Construction and Utilization Plan of Steep Slope and Underground Spatial Information DB for Steep Slope Disaster Prevention (급경사지방재를 위한 급경사지정보 및 지하공간정보 DB 구축과 활용 방안 연구)

  • Lee, Kyungchul;Jang, Yonggu;Song, Jihye;Kang, Injoon
    • Journal of the Korean GEO-environmental Society / v.15 no.7 / pp.13-21 / 2014
  • Recently, natural disasters have occurred more frequently than in the past. The National Emergency Management Agency of Korea has been preparing an integrated management system for steep slope lands. Information is available from steep slope inspection sheets and from underground spatial information related to the prevention of steep slope disasters. Nevertheless, building a complete DB system to prevent hazards and secure safety is an urgent task, mainly because the information in the National Disaster Management System is restricted to brief text-based data. Therefore, the purpose of this study is to suggest a method for building a steep slope DB system for disaster prevention and maximizing its availability. This study shows how to build a web-based DB system rooted in the steep slope inspection sheets. It also discusses a method for establishing an ideal DB system that links the Ministry of Land, Infrastructure and Transport and the National Emergency Management Agency. Furthermore, optimizing DB utilization will assist the various U-IT-based integrated steep slope management systems, which are ongoing projects.

Implementation of Character and Object Metadata Generation System for Media Archive Construction (미디어 아카이브 구축을 위한 등장인물, 사물 메타데이터 생성 시스템 구현)

  • Cho, Sungman;Lee, Seungju;Lee, Jaehyeon;Park, Gooman
    • Journal of Broadcast Engineering / v.24 no.6 / pp.1076-1084 / 2019
  • In this paper, we introduce a system that extracts metadata by recognizing characters and objects in media using deep learning technology. In the field of broadcasting, multimedia content such as video, audio, images, and text has long been converted to digital form, but a vast amount of unconverted resources remains. Building media archives requires a lot of manual work, which is time-consuming and costly. Therefore, implementing a deep learning-based metadata generation system makes it possible to save time and cost in constructing media archives. The whole system consists of four elements: a training data generation module, an object recognition module, a character recognition module, and an API server. The deep learning network module and the face recognition module are implemented to recognize characters and objects in the media and describe them as metadata. The training data generation module was designed separately to facilitate the construction of data for training the neural networks, and the face recognition and object recognition functions were configured as an API server. We trained the two neural networks using data on 1,500 persons and 80 kinds of objects and confirmed an accuracy of 98% on the character test data and 42% on the object data.

Semi-automatic Construction of Learning Set and Integration of Automatic Classification for Academic Literature in Technical Sciences (기술과학 분야 학술문헌에 대한 학습집합 반자동 구축 및 자동 분류 통합 연구)

  • Kim, Seon-Wu;Ko, Gun-Woo;Choi, Won-Jun;Jeong, Hee-Seok;Yoon, Hwa-Mook;Choi, Sung-Pil
    • Journal of the Korean Society for Information Management / v.35 no.4 / pp.141-164 / 2018
  • Recently, as the amount of academic literature has increased rapidly and complex research has been actively conducted, researchers have had difficulty analyzing trends in previous research. To solve this problem, it is necessary to classify information in units of academic papers. However, in Korea, no academic database provides such information. In this paper, we propose an automatic classification system that can classify domestic academic literature into multiple classes. To this end, academic documents in the technical science field written in Korean were first collected and mapped to class 600 of the DDC using the K-Means clustering technique, to construct a learning set that supports multiple classification. The resulting training set comprised 63,915 documents in the Korean technical science field, excluding records for which metadata does not exist. Using this training set, we implemented and trained a deep learning-based automatic classification engine for academic documents. Experiments on a manually built experimental set showed 78.32% accuracy and 72.45% F1 for multiple classification.

Civic Participation in Smart City : A Role and Direction (스마트도시 구현을 위한 시민참여의 역할과 방향에 관한 연구)

  • Nam, Woo-Min;Park, Keon Chul
    • Journal of Internet Computing and Services / v.23 no.6 / pp.79-86 / 2022
  • This study aims to analyze research trends on civic participation in smart cities and to present implications for policy makers, industry professionals, and researchers. As rapid urbanization defines the development trend of modern cities, urban problems such as transportation, environment, and energy are spreading and intensifying. Countries around the world are introducing smart cities to solve these urban problems and to achieve sustainable development. Recently, many countries have been shifting urban planning from top-down to bottom-up by actively engaging citizens to participate, directly and indirectly, in the urban construction process. Although the construction of smart cities is being promoted in Korea to solve urban problems, awareness of smart cities and civic participation are low. To overcome this situation, ideas and methods that can increase civic participation in smart cities are continuously being discussed. Therefore, in this study, publications containing both 'Smart Cities' and 'Participation (Engagement)' were collected from the Scopus DB, the topics of related studies were categorized, and research trends were analyzed using topic modeling. The results are expected to serve as evidence for understanding the direction of civic participation research in smart cities and for setting the direction of related future research.

Definition and Division in Intelligent Service Facility for Integrating Management (지능화시설의 통합운영관리를 위한 정의 및 구분에 관한 연구)

  • PARK, Jeong-Woo;YIM, Du-Hyun;NAM, Kwang-Woo;KIM, Jin-Young
    • Journal of the Korean Association of Geographic Information Studies / v.19 no.4 / pp.52-62 / 2016
  • Smart City is urban development for complex problem solving that provides convenience and safety for citizens, and it is a blueprint for future cities. In 2008, the Korean government defined the construction, management, and government support of U-Cities in the legislation, Act on the Construction, Etc. of Ubiquitous Cities (Ubiquitous City Act), which included definitions of terms used in the act. In addition, the Minister of Land, Infrastructure and Transport has established a "ubiquitous city master plan" considering this legislation. The concept of U-Cities is complex, due to the mix of informatization and urban planning. Because of this complexity, the foundation of relevant regulations is inadequate, which is impeding the establishment and implementation of practical plans. Smart City intelligent service facilities are not easy to define and classify, because technology is rapidly changing and includes various devices for gathering and expressing information. The purpose of this study is to complement the legal definition of the intelligent service facility, which is necessary for integrated management and operation. The related laws and regulations on U-City were analyzed using text-mining techniques to identify insufficient legal definitions of intelligent service facilities. Using data gathered from interviews with officials responsible for constructing U-Cities, this study identified problems generated by implementing intelligent service facilities at the field level. This strategy should contribute to improved efficiency management, the foundation for building integrated utilization between departments. Efficiencies include providing a clear concept for establishing five-year renewable plans for U-Cities.

A Suggestion for Spatiotemporal Analysis Model of Complaints on Officially Assessed Land Price by Big Data Mining (빅데이터 마이닝에 의한 공시지가 민원의 시공간적 분석모델 제시)

  • Cho, Tae In;Choi, Byoung Gil;Na, Young Woo;Moon, Young Seob;Kim, Se Hun
    • Journal of Cadastre & Land InformatiX / v.48 no.2 / pp.79-98 / 2018
  • The purpose of this study is to suggest a model for analyzing the spatio-temporal characteristics of civil complaints about the officially assessed land price based on big data mining. Specifically, this study sought the underlying reasons for the civil complaints from a spatio-temporal perspective, rather than from institutional factors, and suggested a model for monitoring trends in the occurrence of such complaints. The official documents of 6,481 civil complaints about the officially assessed land price in the Jung-gu district of Incheon Metropolitan City from 2006 to 2015, along with their temporal and spatial properties, were collected and used for the analysis. Frequencies of major key words were examined using a text mining method. Correlations among major key words were studied through social network analysis. By calculating term frequency (TF) and term frequency-inverse document frequency (TF-IDF), which correspond to the weighted values of key words, we identified the major key words associated with the occurrence of civil complaints about the officially assessed land price. Then the spatio-temporal characteristics of the civil complaints were examined through hot spot analysis based on the Getis-Ord $G_i^*$ statistic. The characteristics of the civil complaints were found to be changing, forming clusters that are linked spatio-temporally. Using text mining and social network analysis, we found that the reasons for the occurrence of civil complaints about the officially assessed land price can be identified quantitatively from natural language. TF and TF-IDF, the weighted values of key words, can be used as main explanatory variables for analyzing the spatio-temporal characteristics of these civil complaints, since these statistics differ over time and across regions.
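
The TF and TF-IDF keyword weights referred to in this abstract can be illustrated with a small sketch. It assumes the common textbook definitions (relative term frequency, and the natural logarithm of total documents over documents containing the term); the function name `tf_idf` and the sample tokens are hypothetical, and the study's exact weighting variant is not stated in the abstract.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    Returns one {term: weight} dict per document, with
    tf = count / document length and idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    # Hypothetical tokenized complaint texts, for illustration only.
    complaints = [["land", "price", "price"], ["land", "tax"]]
    for doc_weights in tf_idf(complaints):
        print(doc_weights)
```

With this definition, a term that appears in every document (here "land") receives a weight of zero, which is why TF-IDF highlights terms that distinguish one group of complaints from another.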

Predicting the Direction of the Stock Index by Using a Domain-Specific Sentiment Dictionary (주가지수 방향성 예측을 위한 주제지향 감성사전 구축 방안)

  • Yu, Eunji;Kim, Yoosin;Kim, Namgyu;Jeong, Seung Ryul
    • Journal of Intelligence and Information Systems / v.19 no.1 / pp.95-110 / 2013
  • Recently, the amount of unstructured data being generated through a variety of social media has been increasing rapidly, resulting in the increasing need to collect, store, search for, analyze, and visualize this data. This kind of data cannot be handled appropriately by using the traditional methodologies usually used for analyzing structured data because of its vast volume and unstructured nature. In this situation, many attempts are being made to analyze unstructured data such as text files and log files through various commercial or noncommercial analytical tools. Among the various contemporary issues dealt with in the literature of unstructured text data analysis, the concepts and techniques of opinion mining have been attracting much attention from pioneer researchers and business practitioners. Opinion mining or sentiment analysis refers to a series of processes that analyze participants' opinions, sentiments, evaluations, attitudes, and emotions about selected products, services, organizations, social issues, and so on. In other words, many attempts based on various opinion mining techniques are being made to resolve complicated issues that could not have otherwise been solved by existing traditional approaches. One of the most representative attempts using the opinion mining technique may be the recent research that proposed an intelligent model for predicting the direction of the stock index. This model works mainly on the basis of opinions extracted from an overwhelming number of economic news reports. News content published on various media is obviously a traditional example of unstructured text data. Every day, a large volume of new content is created, digitalized, and subsequently distributed to us via online or offline channels. Many studies have revealed that we make better decisions on political, economic, and social issues by analyzing news and other related information.
In this sense, we expect to predict the fluctuation of stock markets partly by analyzing the relationship between economic news reports and the pattern of stock prices. So far, in the literature on opinion mining, most studies including ours have utilized a sentiment dictionary to elicit sentiment polarity or sentiment value from a large number of documents. A sentiment dictionary consists of pairs of selected words and their sentiment values. Sentiment classifiers refer to the dictionary to formulate the sentiment polarity of words, sentences in a document, and the whole document. However, most traditional approaches have common limitations in that they do not consider the flexibility of sentiment polarity, that is, the sentiment polarity or sentiment value of a word is fixed and cannot be changed in a traditional sentiment dictionary. In the real world, however, the sentiment polarity of a word can vary depending on the time, situation, and purpose of the analysis. It can also be contradictory in nature. The flexibility of sentiment polarity motivated us to conduct this study. In this paper, we have stated that sentiment polarity should be assigned, not merely on the basis of the inherent meaning of a word but on the basis of its ad hoc meaning within a particular context. To implement our idea, we presented an intelligent investment decision-support model based on opinion mining that performs the scraping and parsing of massive volumes of economic news on the web, tags sentiment words, classifies sentiment polarity of the news, and finally predicts the direction of the next day's stock index. In addition, we applied a domain-specific sentiment dictionary instead of a general purpose one to classify each piece of news as either positive or negative. For the purpose of performance evaluation, we performed intensive experiments and investigated the prediction accuracy of our model. 
For the experiments to predict the direction of the stock index, we gathered and analyzed 1,072 articles about stock markets published by "M" and "E" media between July 2011 and September 2011.
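
The dictionary-based pipeline described in this abstract (tag sentiment words, classify each news item, then predict the index direction) could look roughly like the sketch below. Everything specific here is an assumption for illustration: the function names, the sample dictionary values, the rule that a non-positive score counts as negative, and the majority vote used for the final prediction. The abstract does not state the authors' actual scoring or prediction rule.

```python
def classify_article(tokens, sentiment_dict):
    """Label one tokenized article by summing the sentiment values of its words.

    Words missing from the dictionary contribute 0. A score of exactly 0 is
    labeled "negative" here, which is an arbitrary tie-breaking assumption.
    """
    score = sum(sentiment_dict.get(token, 0.0) for token in tokens)
    return "positive" if score > 0 else "negative"


def predict_direction(articles, sentiment_dict):
    """Predict "up" if positive articles outnumber negative ones, else "down"."""
    labels = [classify_article(tokens, sentiment_dict) for tokens in articles]
    positives = labels.count("positive")
    negatives = len(labels) - positives
    return "up" if positives > negatives else "down"


if __name__ == "__main__":
    # Hypothetical domain-specific dictionary: word -> sentiment value.
    finance_dict = {"surge": 1.0, "rally": 0.5, "plunge": -1.0, "default": -0.8}
    news = [
        ["stocks", "surge"],
        ["bond", "default", "plunge"],
        ["market", "rally"],
    ]
    print(predict_direction(news, finance_dict))
```

Swapping `finance_dict` for a general-purpose dictionary is the only change needed to compare the two kinds of dictionary in this sketch, which mirrors the comparison the abstract describes.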