• Title/Summary/Keyword: Text Construction

Multiple Cause Model-based Topic Extraction and Semantic Kernel Construction from Text Documents (다중요인모델에 기반한 텍스트 문서에서의 토픽 추출 및 의미 커널 구축)

  • Chang, Jeong-Ho;Zhang, Byoung-Tak
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.5
    • /
    • pp.595-604
    • /
    • 2004
  • Automatic analysis of concepts or semantic relations from text documents enables not only efficient acquisition of relevant information, but also comparison of documents at the concept level. We present a multiple cause model-based approach to text analysis, where latent topics are automatically extracted from document sets and similarity between documents is measured by semantic kernels constructed from the extracted topics. In our approach, a document is assumed to be generated by various combinations of underlying topics. A topic is defined by a set of words that are related to the same topic or co-occur frequently within a document. In a network representing a multiple-cause model, each topic is identified by a group of words having high connection weights from a latent node. To facilitate learning and inference in multiple-cause models, some approximation methods are required, and we utilize an approximation by Helmholtz machines. In an experiment on the TDT-2 data set, we extract sets of meaningful words where each set contains some theme-specific terms. Using semantic kernels constructed from latent topics extracted by multiple cause models, we also achieve significant improvements over the basic vector space model in terms of retrieval effectiveness.
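The idea of a topic-based semantic kernel can be illustrated with a minimal sketch. It assumes the common form K(d1, d2) = (W d1) · (W d2), where W holds one row of term weights per latent topic; the abstract does not state the exact kernel used, and the vocabulary, weights, and function names below are hypothetical.

```python
# Hypothetical vocabulary order: ["ship", "vessel", "bank", "loan"]

def project(doc, topic_weights):
    """Map a term-count vector into topic space (one score per topic)."""
    return [sum(w * x for w, x in zip(topic, doc)) for topic in topic_weights]

def semantic_kernel(d1, d2, topic_weights):
    """Inner product of two documents after projection onto the topics."""
    p1 = project(d1, topic_weights)
    p2 = project(d2, topic_weights)
    return sum(a * b for a, b in zip(p1, p2))

def plain_dot(d1, d2):
    """Basic vector space model similarity (no topic information)."""
    return sum(a * b for a, b in zip(d1, d2))

# Assumed topic weights: topic 0 groups "ship"/"vessel", topic 1 groups "bank"/"loan".
topics = [[1, 1, 0, 0],
          [0, 0, 1, 1]]

doc_a = [1, 0, 0, 0]  # mentions only "ship"
doc_b = [0, 1, 0, 0]  # mentions only "vessel"

baseline = plain_dot(doc_a, doc_b)              # no shared terms
kernel = semantic_kernel(doc_a, doc_b, topics)  # shared topic
```

In this toy case the two documents share no terms, so the plain dot product is zero, while the kernel gives a positive score because both documents load on the same topic.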

An Artificial Neural Network Based Phrase Network Construction Method for Structuring Facility Error Types (설비 오류 유형 구조화를 위한 인공신경망 기반 구절 네트워크 구축 방법)

  • Roh, Younghoon;Choi, Eunyoung;Choi, Yerim
    • Journal of Internet Computing and Services
    • /
    • v.19 no.6
    • /
    • pp.21-29
    • /
    • 2018
  • In the era of the Fourth Industrial Revolution, the concept of the smart factory is emerging. There are efforts to use data analysis to predict facility errors, which reduce utilization and productivity. Such prediction requires a facility error log, that is, data describing the situation in which a facility error occurred and the type of the error. However, in many manufacturing companies, the types of facility error are not precisely defined and categorized. The worker who operates the facilities records the error type as unstructured text based on his or her empirical judgement, which makes the data impossible to analyze. Therefore, this paper proposes a framework for constructing a phrase network to support the identification and classification of facility error types from error logs written by operators. Specifically, phrases indicating the error types are extracted from text data by using a dictionary that classifies terms by their usage. Then, a phrase network is constructed by calculating the similarity between the extracted phrases. The performance of the proposed method was evaluated on real-world facility error logs. The proposed method is expected to contribute to the accurate identification of error types and to the prediction of facility errors.
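The network-construction step can be sketched as follows. This is only an illustration: it assumes each extracted phrase already has a numeric vector and links two phrases when their cosine similarity reaches a threshold. The phrases, vectors, threshold, and function names are hypothetical, and the abstract does not specify the similarity measure the authors use.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_phrase_network(vectors, threshold):
    """Return {(phrase_a, phrase_b): similarity} for pairs at or above threshold."""
    edges = {}
    for p, q in combinations(sorted(vectors), 2):
        score = cosine(vectors[p], vectors[q])
        if score >= threshold:
            edges[(p, q)] = score
    return edges

# Hypothetical phrase vectors (for example, produced by a trained embedding model).
phrase_vectors = {
    "belt jam": [1.0, 0.1],
    "belt stuck": [0.9, 0.2],
    "sensor fault": [0.0, 1.0],
}

network = build_phrase_network(phrase_vectors, threshold=0.9)
```

With these assumed vectors, only the two belt-related phrases are connected, which is the kind of grouping that could help an analyst consolidate differently worded error types.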

Development of Intelligent OCR Technology to Utilize Document Image Data (문서 이미지 데이터 활용을 위한 지능형 OCR 기술 개발)

  • Kim, Sangjun;Yu, Donghui;Hwang, Soyoung;Kim, Minho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2022.05a
    • /
    • pp.212-215
    • /
    • 2022
  • In today's era of so-called digital transformation, the need to build and use big data has increased in various fields. Today, much data is produced and stored in forms suited to digital devices and media, but for a long time in the past the production and storage of data was dominated by printed books. Therefore, Optical Character Recognition (OCR) technology is needed to turn the vast amount of printed books accumulated over that time into big data. In this study, a system for digitizing the structure and content of document objects inside a scanned book image is proposed. The proposed system largely consists of the following three steps. 1) Recognition of the areas of document objects (table, equation, picture, text body) in the scanned book image. 2) OCR processing of each area by the text body, table, and formula modules according to the recognized document object areas. 3) Collection of the processed document information, which is returned in JSON format. The model proposed in this study is based on an open-source project, with additional training and improvements. The intelligent OCR system proposed in this study showed performance on the level of commercial OCR software in processing four types of document objects (table, equation, image, text body).

Economic Feasibility Analysis of Nationwide Expansion of Agro-meteorological Early Warning Service for Weather Risk Management in Korea (농업기상재해 조기경보서비스의 전국 확대에 따른 경제적 타당성 분석)

  • Sangtaek Seo;Yun Hee Jeong;Soo Jin Kim;Kyo-Moon Shim
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.25 no.3
    • /
    • pp.236-244
    • /
    • 2023
  • The purpose of this study was to examine the economic feasibility of expanding agro-meteorological early warning services nationwide. The net present value method, one of the cost-benefit analysis methods, was applied to the analysis. The benefit items in the net present value were the reduction in damage, estimated from crop insurance data, and the willingness to pay for the use of early warning services. The cost items included system construction and maintenance costs, and text transmission costs. The analysis found that the nationwide expansion of early warning services was economically feasible, and that its economic effect varied with the level of text message use (10% to 40%, in 10 percentage-point steps) by participating farmers. In the future, the economic effect of early warning services is expected to increase further as more farmers participate and as crop damage caused by climate change increases. The economic effect of early warning services should be enhanced further by actively using apps and the web, as well as text messages, to deliver information.
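The net present value method mentioned in this abstract discounts each year's net benefit back to the present: NPV = Σ (B_t − C_t) / (1 + r)^t. A minimal sketch follows; the cash flows and discount rate are made-up illustrative numbers, not figures from the study, and the function name is hypothetical.

```python
def net_present_value(benefits, costs, discount_rate):
    """Sum of discounted net benefits; index 0 is the present year (undiscounted)."""
    return sum(
        (benefit - cost) / (1 + discount_rate) ** year
        for year, (benefit, cost) in enumerate(zip(benefits, costs))
    )

# Illustrative only: pay 100 now to build a system, gain 110 in benefits next year.
example_benefits = [0.0, 110.0]
example_costs = [100.0, 0.0]

npv_at_10_percent = net_present_value(example_benefits, example_costs, 0.10)
npv_at_5_percent = net_present_value(example_benefits, example_costs, 0.05)
```

A project is usually judged economically feasible when its NPV is positive. In this toy case the NPV is approximately zero at a 10% discount rate and positive at 5%, which shows how sensitive the conclusion can be to the chosen rate.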

Analysis of major issues in the field of Maritime Autonomous Surface Ships using text mining: focusing on S.Korea news data (텍스트 마이닝을 활용한 자율운항선박 분야 주요 이슈 분석 : 국내 뉴스 데이터를 중심으로)

  • Hyeyeong Lee;Jin Sick Kim;Byung Soo Gu;Moon Ju Nam;Kook Jin Jang;Sung Won Han;Joo Yeoun Lee;Myoung Sug Chung
    • Journal of the Korean Society of Systems Engineering
    • /
    • v.20 no.spc1
    • /
    • pp.12-29
    • /
    • 2024
  • The purpose of this study is to identify the social issues discussed in Korea regarding Maritime Autonomous Surface Ships (MASS), the most advanced ICT field in the shipbuilding industry, and to suggest policy implications. In recent years, it has become important to reflect social issues of public interest in the policymaking process. For this reason, an increasing number of studies use media data and social media to identify public opinion. In this study, we collected 2,843 domestic media articles related to MASS from 2017 to 2022, when MASS was officially discussed at the International Maritime Organization, and analyzed them using text mining techniques. Through term frequency-inverse document frequency (TF-IDF) analysis, major keywords such as 'shipbuilding,' 'shipping,' 'US,' and 'HD Hyundai' were derived. For LDA topic modeling, we selected eight topics with the highest coherence score (-2.2) and analyzed the main news for each topic. According to the combined analysis of five years, the topics '1. Technology integration of the shipbuilding industry' and '3. Shipping industry in the post-COVID-19 era' received the most media attention, each accounting for 16%. Conversely, the topic '5. MASS pilotage areas' received the least media attention, accounting for 8%. Based on the results of the study, the implications for policy, society, and international security are as follows. First, from a policy perspective, the government should consider the current situation of each industry sector and introduce MASS carefully and in stages, as they will affect the shipbuilding, port, and shipping industries, and a radical introduction may cause various adverse effects. Second, from a social perspective, while the positive aspects of MASS are often reported, there are also negative issues such as cybersecurity and the loss of seafarer jobs, which require institutional development and strategic timing of commercialization. Third, from a security perspective, MASS are expected to change the paradigm of future maritime warfare, and South Korea is promoting the construction of a maritime force based on unmanned systems, but this study emphasizes the need for a clear plan and military leadership to secure and develop the technology. This study has academic and policy implications in that it sheds light on the multidimensional political and social issues of MASS through news data analysis and suggests implications from national, regional, strategic, and security perspectives beyond legal and institutional discussions.
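The TF-IDF weighting used for keyword extraction can be illustrated with a small sketch. It assumes the textbook form tf × log(N / df) with length-normalized term frequency; the study may have used a different variant or a library implementation, and the tokenized documents and function name below are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical tokenized news snippets.
articles = [
    ["ship", "autonomous", "ship"],
    ["ship", "port"],
    ["ship", "security"],
]

weights = tf_idf(articles)
```

Here "ship" occurs in every article, so its weight is zero everywhere, while "autonomous" occurs in only one article and receives a positive weight there. This is why TF-IDF surfaces distinctive keywords rather than merely frequent ones.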

A Study on the Perception and Experience of Daejeon Public Library Users Using Text Mining: Focusing on SNS and Online News Articles (텍스트마이닝을 활용한 대전시 공공도서관 이용자의 인식과 경험 연구 - SNS와 온라인 뉴스 기사를 중심으로 -)

  • Jiwon Choi;Seung-Jin Kwak
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.58 no.2
    • /
    • pp.363-384
    • /
    • 2024
  • This study was conducted to examine users' experiences with the public libraries in Daejeon using big data analysis, focusing on text mining techniques. To this end, first, users' overall evaluation and perception of the public libraries in Daejeon were explored by collecting data from social media. Second, pending issues under public discussion were identified by analyzing online news articles. The analysis showed, first, that users with children accounted for a high proportion of library users. Next, the topics found through LDA analysis fell into four categories: 'cultural event/program', 'data use', 'physical environment and facilities', and 'library service'. Finally, keywords related to the additional construction of libraries and complex cultural spaces and to the establishment of a library cooperation system were central in the news article data. Based on these results, it was proposed to build libraries in consideration of regional balance and to create a social parenting community network through business agreements with childcare institutions. This will contribute to identifying the policy and social trends of public libraries in Daejeon and to implementing data-based public library operations that reflect local community demands.

Construction of Vegetation Information Management System Using GIS (GIS를 이용한 식생정보 통합관리시스템 구축 방안)

  • Song, Ji Hye;Kang, In Joon;Hong, Soon Heon;Park, Dong Hyun
    • Journal of Korean Society for Geospatial Information Science
    • /
    • v.22 no.4
    • /
    • pp.99-106
    • /
    • 2014
  • Since 1960, forests and ecosystems have been rapidly destroyed by industrialization and urbanization. Accordingly, studies that produce vegetation maps for forest and ecosystem management have continued. A national natural environment survey has been conducted in Korea since 1986. In addition, vegetation information has been managed through the forest geospatial information service (FGIS) of the Department of Environment since the NGIS project was promoted in 1995. However, this service provides dominant species information only as text, and some vegetation information is not provided to end users. Therefore, we suggest a method for constructing a GIS-based vegetation information management system to solve these problems. We also suggest a method for connecting related systems to support accurate analysis, planning, and decision-making.

The Current Status of Utilization and Demand on Cancer Information in the Faculties of Medical School in Korea (국내 의과대학 교수의 암정보 활용 현황과 요구도)

  • Lim, Min-Kyung;Park, Sook-Kyung;Yang, Jeong-Hee;Lee, Young-Sung
    • Journal of Preventive Medicine and Public Health
    • /
    • v.36 no.1
    • /
    • pp.39-46
    • /
    • 2003
  • Objectives : To investigate the availability of and demand for overall cancer-related information, and to establish a basic plan for the construction of a cancer database and information system based on research results from Korea. Methods : Postal and telephone surveys of 323 faculty members affiliated with medical universities and colleges in Korea were carried out between August 2001 and November 2001. The data were analyzed with descriptive statistical methods with regard to the present status of and demand for health and cancer-related information. Results : Most subjects (over 80%) used health-related information provided on Internet websites from foreign countries, such as Medline, but a similar comprehensive information system was lacking in Korea. The construction of a cancer-related database of domestic research results was found to be in great demand. Information on registration and statistics (52.8%), study results (48.5%) and study resources (37.4%) were the major components required in the database. In constructing a database of cancer-related research results, a full-text service, continuous updating of data, and the development of a standardized, user-friendly search tool were regarded as necessary components. The formulation of an information sharing system for cancer-related clinical trials was found to be quite feasible. Conclusion : This study demonstrated the great importance of cancer information systems and the strong demand for an accessible cancer-related database based on Korean research results.

Television Debates: Genre Conventions and Their Limits as Public Spheres (사회적 공론장으로서 텔레비전 토론 프로그램: 장르 관습과 한계)

  • Kim, Hoon-Soon;Kim, Eun-Jung
    • Korean journal of communication and information
    • /
    • v.18
    • /
    • pp.63-97
    • /
    • 2002
  • Public debate is an essential communication process in our society, and it is now generally carried out on television. The purpose of this study is to discuss the potential and limits of TV debate as a public sphere. First, we examine how television constructs public debate in order to identify the conventions of the genre. Second, we examine its limitations and potential as a public sphere. Using text analysis, we analyse four TV debate programs over one month (June 2001) in terms of format construction, nature of the agenda, characteristics of the panels and chairman, audience participation, and type of knowledge. The results show that although the number of programs has increased, many TV debates are not differentiated from each other in format, panel, or content, and merely reproduce genre conventions. Especially in policy debates, abstract agendas, male-dominated panels, limited audience participation, elitism, and authoritarianism prevail. The genre's preconceived formulae and fixed conventions restrict its own potential as a participatory and democratic public sphere. Therefore, for TV debates to function as an open public sphere, they need to be flexible, and the proper frame for a mass-media public sphere needs to be re-examined.

Construction Scheme of Training Data using Automated Exploring of Boundary Categories (경계범주 자동탐색에 의한 확장된 학습체계 구성방법)

  • Choi, Yun-Jeong;Jee, Jeong-Gyu;Park, Seung-Soo
    • The KIPS Transactions:PartB
    • /
    • v.16B no.6
    • /
    • pp.479-488
    • /
    • 2009
  • This paper presents a reinforced scheme for constructing training data that improves text classification through automatic search of boundary categories. Documents lying in a boundary area are usually misclassified because they include multiple topics and features, which is the main factor that we focus on. In this paper, we propose an automated methodology for exploring the optimal boundary category, based on previous research. We treat the boundary area among target categories as a new category that requires training, and the resulting boundary categories are then added semantically to the target categories. In experiments, we applied our method to complex documents by intentionally introducing errors into the training process. The experimental results show that our system has high accuracy and reliability in a noisy environment.