• Title/Summary/Keyword: Text Document

Operation Technique of Spatial Data Change Recognition Data per File (파일 단위 공간데이터 변경 인식 데이터 운영 기법)

  • LEE, Bong-Jun
    • Journal of the Korean Association of Geographic Information Studies / v.24 no.4 / pp.184-193 / 2021
  • A spatial data management system updates its stored information by extracting, from each newly obtained spatial data file, only the information that differs from what it already holds. To extract only the changed objects, every object in the new file must be compared against the existing information. This study set out to improve on this full-inspection method, given that the volume of frequently updated spatial information is growing and updates are required at the national level. It considers a way to determine, from information in the file alone, whether any spatial objects in it have changed, before the individual objects in a newly acquired spatial data file are inspected. Because spatial data files are structured, unlike general image or text document files, whether a file has changed can be determined at the file level more simply than with the existing method of creating and managing file hashes. Reducing the number of files that require full inspection is expected to improve the system's use of resources by shortening both the overall data quality inspection time and the data extraction time.
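The file-level pre-check described above can be sketched roughly as follows. The abstract does not say which information in the file is compared, so the `FileSummary` fields (feature count and bounding box) and the function names are hypothetical; `content_hash` shows the file-hash baseline the abstract contrasts against.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FileSummary:
    """Hypothetical file-level summary; the fields are illustrative assumptions."""
    feature_count: int
    bbox: Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def content_hash(data: bytes) -> str:
    """Baseline approach: a hash over the whole file contents."""
    return hashlib.sha256(data).hexdigest()


def files_needing_inspection(
    stored: Dict[str, FileSummary],
    incoming: Dict[str, FileSummary],
) -> List[str]:
    """Return names of incoming files that are new or whose summary differs."""
    return sorted(
        name for name, summary in incoming.items()
        if stored.get(name) != summary
    )
```

Only the files returned by `files_needing_inspection` would go on to object-by-object comparison. With the illustrative fields chosen here, equal summaries do not prove that no object changed, so a real system would need summary fields that reliably reflect edits.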

A Case Study on the Application of AI-OCR for Data Transformation of Paper Records (종이기록 데이터화를 위한 AI-OCR 적용 사례연구)

  • Ahn, Sejin;Hwang, Hyunho;Yim, Jin Hee
    • Journal of the Korean Society for Information Management / v.39 no.3 / pp.165-193 / 2022
  • Digital technology is at the center of change in the modern work environment. In particular, in public institutions that document their work through records produced by business management systems and document production systems, the records management system is itself the work environment. Gimpo City applied to the 2021 public cloud leading project of the National Information Society Agency (NIA) to respond proactively to the era of Fourth Industrial Revolution technologies, and carried out a public cloud-based AI-OCR technology enhancement project with 330 million won in support. Through this project, non-electronic records, which had been limited to searching on standardized index values and viewing images, were converted into data. In addition, a 98% recognition rate was achieved by applying a new technology, AI-OCR. Because digital technology was used to improve work efficiency, productivity, development cost, and the level of records management service for internal and external users, we share this case as a direction for strengthening records management expertise and innovating the work environment.

Development of Scaffolding Strategies Model by Information Search Process (ISP) (정보탐색과정(ISP)에 의한 스캐폴딩 전략 모형 개발)

  • Jeong-Hoon Lim
    • Journal of Korean Library and Information Science Society / v.54 no.1 / pp.143-165 / 2023
  • This study proposes a scaffolding strategy applicable to the information search process, using Kuhlthau's ISP model, which presented a design and implementation strategy for the mediation role in the learning process. To this end, the relevant literature was reviewed to categorize scaffolding strategies, and impressions were collected through surveys of 150 middle school students in the Daejeon area after they took a project class to which the scaffolding strategy based on the ISP model was applied. The collected data were preprocessed into a form suitable for analysis so that word frequencies could be extracted, and topic analysis was performed using STM (Structural Topic Modeling). After the optimal number of topics was determined and topics were extracted for each stage of the ISP model, the extracted topics were classified into three types: cognitive domain-macro perspective, cognitive domain-micro perspective, and emotional domain perspective. In this process, we focused on cognitive verbs and emotional verbs among the words extracted through text mining, and presented a scaffolding strategy model for each topic by reviewing representative document cases. The results suggest that providing an appropriate scaffolding strategy at each ISP model stage can be expected to have a positive effect on learners' self-directed task solving.

A Study on Establishing a Market Entry Strategy for the Satellite Industry Using Future Signal Detection Techniques (미래신호 탐지 기법을 활용한 위성산업 시장의 진입 전략 수립 연구)

  • Sehyoung Kim;Jaehyeong Park;Hansol Lee;Juyoung Kang
    • Journal of Intelligence and Information Systems / v.29 no.3 / pp.249-265 / 2023
  • Recently, the satellite industry has been turning to the private-led 'New Space' paradigm, a departure from the traditional government-led industry. The space industry, regarded as a next-generation growth engine, still receives relatively little attention in Korea compared to the global market. Therefore, the purpose of this study is to explore future signals that can help private companies in the domestic satellite industry determine their market entry strategies. To this end, this study draws on future signal theory and the Keyword Portfolio Map method to analyze keyword potential in patent document data based on keyword growth rate and keyword occurrence frequency. In addition, news data were collected to categorize future signals into first symptoms and early information. This categorization serves as an interpretive indicator of how the keywords reveal their actual potential outside of patent documents. This study describes the process of data collection and analysis used to explore future signals, and visualizes keyword maps to trace the evolution of each keyword in the collected documents from a weak signal to a strong signal. The research process can contribute methodologically to, and expand the scope of, existing research on future signals, and the results can contribute to the establishment of new industry planning and research directions in the satellite industry.
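A keyword portfolio map of the kind mentioned above places each keyword by occurrence frequency and growth rate. The following is a minimal sketch, not the authors' procedure: splitting the map at the median, the quadrant labels, and the function names are assumptions made for illustration.

```python
from statistics import median
from typing import Dict, List


def mean_growth_rate(counts: List[int]) -> float:
    """Average year-over-year growth rate; years with a zero base are skipped."""
    rates = [(cur - prev) / prev for prev, cur in zip(counts, counts[1:]) if prev > 0]
    return sum(rates) / len(rates) if rates else 0.0


def classify_keywords(yearly_counts: Dict[str, List[int]]) -> Dict[str, str]:
    """Label each keyword by its quadrant on a frequency/growth-rate map."""
    freq = {k: sum(v) for k, v in yearly_counts.items()}
    growth = {k: mean_growth_rate(v) for k, v in yearly_counts.items()}
    # Assumption: the map is split at the median of each axis.
    freq_cut = median(freq.values())
    growth_cut = median(growth.values())

    labels = {}
    for keyword in yearly_counts:
        high_freq = freq[keyword] > freq_cut
        high_growth = growth[keyword] > growth_cut
        if high_freq and high_growth:
            labels[keyword] = "strong signal"
        elif high_growth:
            labels[keyword] = "weak signal"
        elif high_freq:
            labels[keyword] = "strong but slow-growing signal"
        else:
            labels[keyword] = "latent signal"
    return labels
```

Keywords with low frequency but high growth land in the weak-signal quadrant, which is where a keyword would sit before it evolves into a strong signal.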

A School-tailored High School Integrated Science Q&A Chatbot with Sentence-BERT: Development and One-Year Usage Analysis (인공지능 문장 분류 모델 Sentence-BERT 기반 학교 맞춤형 고등학교 통합과학 질문-답변 챗봇 -개발 및 1년간 사용 분석-)

  • Gyeongmo Min;Junehee Yoo
    • Journal of The Korean Association For Science Education / v.44 no.3 / pp.231-248 / 2024
  • This study developed a chatbot for first-year high school students, employing open-source software and the Korean Sentence-BERT model for AI-powered document classification. The chatbot utilizes the Sentence-BERT model to find the six most similar Q&A pairs to a student's query and presents them in a carousel format. The initial dataset, built from online resources, was refined and expanded based on student feedback and usability throughout the operational period. By the end of the 2023 academic year, the chatbot integrated a total of 30,819 datasets and recorded 3,457 student interactions. Analysis revealed students' inclination to use the chatbot when prompted by teachers during classes and primarily during self-study sessions after school, with an average of 2.1 to 2.2 inquiries per session, mostly via mobile phones. Text mining identified student input terms encompassing not only science-related queries but also aspects of school life such as assessment scope. Topic modeling using BERTopic, based on Sentence-BERT, categorized 88% of student questions into 35 topics, shedding light on common student interests. A year-end survey confirmed the efficacy of the carousel format and the chatbot's role in addressing curiosities beyond integrated science learning objectives. This study underscores the importance of developing chatbots tailored for student use in public education and highlights their educational potential through long-term usage analysis.
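The retrieval step, finding the six Q&A pairs most similar to a query, can be sketched as a top-k search over sentence embeddings. This sketch assumes cosine similarity and takes plain lists of floats in place of real Sentence-BERT embeddings; the function names are hypothetical and the abstract does not state the similarity measure used.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_similar(
    query_vec: Sequence[float],
    qa_vecs: Sequence[Sequence[float]],
    k: int = 6,
) -> List[Tuple[int, float]]:
    """Return (index, similarity) for the k stored Q&A embeddings closest to the query."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(qa_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The returned indices would then be used to look up the matching Q&A pairs for display, for example as the cards of a carousel.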

Analysis of Generative AI Technology Trends Based on Patent Data (특허 데이터 기반 생성형 AI 기술 동향 분석)

  • Seongmu Ryu;Taewon Song;Minjeong Lee;Yoonju Choi;Soonuk Seol
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.17 no.1 / pp.1-9 / 2024
  • This paper analyzes the trends in generative AI technology based on patent application documents. To achieve this, we selected 5,433 generative AI-related patents filed in South Korea, the United States, and Europe from 2003 to 2023, and analyzed the data by country, technology category, year, and applicant, presenting it visually to find insights and understand the flow of technology. The analysis shows that patents in the image category account for 36.9%, the largest share, with a continuous increase in filings, while filings in the text/document and music/speech categories have either decreased or remained stable since 2019. Although the company with the highest number of filings is a South Korean company, four out of the top five filers are U.S. companies, and all companies have filed the majority of their patents in the U.S., indicating that generative AI is growing and competing centered around the U.S. market. The findings of this paper are expected to be useful for future research and development in generative AI, as well as for formulating strategies for acquiring intellectual property.

A Study on the documentary characteristics of acupuncture and moxibustion recorded in Dusagyeong(杜思敬)'s "Jesaengbalsu(濟生拔粹)" (두사경(杜思敬)의 "제생발수(濟生拔粹)"에 수록된 침구의적(鍼灸醫籍)에 관한 문헌)

  • Kim, Jung-Ho;Kim, Ki-Wook;Park, Hyun-Guk
    • Journal of Korean Medical Classics / v.22 no.2 / pp.71-83 / 2009
  • The documentary characteristics of the acupuncture and moxibustion texts recorded in Dusagyeong(杜思敬)'s "Jesaengbalsu(濟生拔粹)" can be summarized in three major parts.
    1. "Gyeolgo-ungichimbeop(潔古雲岐鍼法)" and "Dutaesachimbeop(竇太師鍼法)" (1) "Gyeolgo-ungichimbeop" was edited by Dusagyeong of the Won dynasty and recorded in "Jesaengbalsu". Du, influenced by his teacher Heohyeong(許衡), followed Janggyeolgo(張潔古) and his son Jangbyeok(張璧); he collected Jang's work "Chimgu-pyeon(鍼灸篇)" and named it "Gyeolgo-ungichimbeop", taking the content from the medical books of Jang and his student Wang-haejang(王海藏). (2) The original edition of "Jesaengbalsu" exists today. The "Gyeolgo-ungichimbeop" listed in the index of "Jesaengbalsu" contains two collections, the first being "Gyeolgo-ungichimbeop" and the second "Dutaesachimbeop(竇太師鍼法)". (3) The acupuncture methods of Gyeolgo(潔古) and Un-gija(雲岐子) can be seen in Un-gija's "Bomyeongjipryuyo(保命集類要)" and Wang-haejang's "Chasananji(此事難知)". (4) The related acupuncture methods are 'Non-gyeong-rak-yeongsubosabeop(論經絡迎隨補瀉法)', 'Gyeong-rakchwiwonbeop(經絡取原法)', 'Jeopgyeongbeop(接經法)', and 'Sang-hanyeolbyeongjabeop(傷寒熱病刺法)'. (5) In Du's edition, the entire text of 'Gyeolgojajetongbeop(潔古刺諸痛法)' and 'Jasimtongjehyeol(刺心痛諸穴)' and the first half of 'Jeopgyeongbeop(接經法)' are all recorded in "Somunbyeonggigi-uibomyeongjip(素問病機氣宜保命集)". The existing "Somunbyeonggigi-uibomyeongjip" combines the unfinished posthumous works of Yuwanso(劉完素), "Gi-ui(氣宜)" and "Byeonggi(病機)", with works such as Jangwonso(張元素)'s "Bomyeongseo(保命書)". (6) Of the titles "Gyeolgo-ungichimbeop" and "Dutaesachimbeop", chapters 14 to 19, "Dutaesachimbeop", should be grouped at the end; the 16th chapter, which Du added, was placed after chapter 14, "Yujujiyobu(流注指要賦)"; and chapters 20 and 21 should be placed in "Gyeolgo-ungichimbeop" after chapter 13.
    2. "Chimgyeongjeok-yeongjip(鍼經摘英集)" (1) "Chimgyeongjeok-yeongjip" is a collection of the acupuncture and moxibustion contents of medical books from the Geum and Won dynasties that Dusagyeong collected and organized during the Won dynasty. It consists of 5 chapters: "Guchimshik(九鍼式)", "Jeolyangchwisuhyeolbeop(折量取腧穴法)", "Bosabeop(補瀉法)", "Yongchimhoheupbeop(用鍼呼吸法)", and "Chibyeongjik-ralgyeol(治病直剌訣)". (2) As for the contents: first, the nine acupuncture needles[九鍼] listed in "Guchimshik(九鍼式)" form the earliest existing document to systematically illustrate the 'nine classical needles' in drawings and text, reflecting the forms of the needles of the era. Second, "Jeolyangchwisuhyeolbeop(折量取腧穴法)" has the same basic way of measuring points[量穴法] as Wang-yuil's "Dong-insuhyeolchimgudo-gyeong(銅人腧穴鍼灸圖經)" and the same point selection rules as "Jeonyeongbang(全嬰方)". Third, in "Bosabeop(補瀉法)", "Somun(素問)" and Janggyeolgo's "Yeongsubosabeop(迎隨補瀉法)" are put together. Fourth, "Yongchimhoheupbeop(用鍼呼吸法)" records the cold and heat supplementation and draining[寒熱補瀉] method, which combines breathing with inner and outer rotation[外 內撚]. Fifth, "Chibyeongjik-ralgyeol(治病直剌訣)" is the main part of "Chimgyeongjeok-yeongjip(鍼經摘英集)", listing 69 acupuncture treatments that reflect Du's scholastic ideas on aspects such as syndrome differentiation[辨證], needling method, and type of needle[鍼具]. (3) The content of this book was quoted by "Bojebang Chimgumun(普濟方 鍼灸門)", and when Gomu compiled "Chimguchwiyeong", he placed this book's acupuncture treatments for the main indications of disease patterns[鍼方主治病證] under the related main indications of acupuncture points[腧穴主治證], which influenced later books on acupuncture points.
    3. "Chimgyeongjeolyo(鍼經節要)" (1) Consists of 1 volume. The original title of this book is "Dong-insuhyeolchimgudo-gyeong(銅人腧穴鍼灸圖經)"; its author is Wang-yuil of the Northern Song dynasty, and it was written in the 4th year of the Cheonseong(天聖) era of the Song dynasty (1026). (2) Dusagyeong selected the contents on the pathology of the 12 meridians in volumes one and two, and the introduction and five transport points[五輸穴] in volume 5, of "Dong-indo-gyeong(銅人圖經)" and named the selection "Chimgyeongjeolyo". During the Won dynasty it was recorded in "Jesaengbalsu".

Analysis of Research Trends of 'Word of Mouth (WoM)' through Main Path and Word Co-occurrence Network (주경로 분석과 연관어 네트워크 분석을 통한 '구전(WoM)' 관련 연구동향 분석)

  • Shin, Hyunbo;Kim, Hea-Jin
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.179-200 / 2019
  • Word-of-mouth (WoM) is defined as consumer activity that shares information about consumption. WoM activities have long been recognized as important in corporate marketing processes and have received much attention, especially in the marketing field. Recently, with the development of the Internet, the ways in which people exchange information in online news and online communities have expanded, and WoM has diversified into forms such as word of mouth, scores, ratings, and likes. Social media gives online users easy access to information, and online WoM is considered a key source of information. Although various studies on WoM have followed this phenomenon, no meta-analysis has comprehensively analyzed them. This study proposes a method that applies text mining techniques to scholarly big data to extract major studies and grasp their main issues, in order to find the trend of WoM research. To this end, a total of 4,389 documents published from 1941 to 2018 were collected from Scopus (www.scopus.com), a citation database, using the keyword 'Word-of-mouth', and the data were refined through preprocessing such as English morphological analysis, stopword removal, and noun extraction. We adopted main path analysis (MPA) and word co-occurrence network analysis. MPA detects key studies, is used to track the development trajectory of an academic field, and presents the research trend from a macro perspective. For this, we constructed a citation network based on the collected data, in which a node is a document and a link is a citation relation. We then detected the key-route main path by applying SPC (Search Path Count) weights. As a result, a main path composed of 30 documents was extracted from the citation network. The main path made it possible to confirm how the academic area changed with the times, reflecting industrial change across various industrial groups. The results of MPA revealed that WoM research falls into five periods: (1) establishment of aspects and critical elements of WoM, (2) relationship analysis between WoM variables, (3) beginning of research on online WoM, (4) relationship analysis between WoM and purchase, and (5) broadening of topics. Changes within the industry, such as online development and social media, were reflected in the results. Very recent studies showed that the topics and approaches related to WoM were diversifying in response to circumstantial changes. However, even though WoM was used in diverse fields, the main stream of WoM research, from start to end, concerned marketing and identifying the influential factors that spread WoM. Word co-occurrence network analysis presents the research trend from a micro perspective. A word co-occurrence network was constructed to analyze the relationships between keywords, and social network analysis (SNA) was applied. We divided the data into three periods to investigate periodic changes and trends in the discussion of WoM. SNA showed that Period 1 (1941~2008) consisted of clusters regarding relationship, source, and consumers. Period 2 (2009~2013) contained clusters of satisfaction, community, social networks, review, and internet. The clusters of Period 3 (2014~2018) involved satisfaction, medium, review, and interview. The periodic changes in clusters showed a transition from offline to online WoM, and the media of WoM have become an important factor in spreading the word. This study conducted a quantitative meta-analysis of WoM based on scholarly big data. Its main contribution is that it provides a micro perspective on the WoM research trend as well as a macro perspective. Its limitation is that the citation network constructed for MPA is based only on the direct citation relations among the collected documents.
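The SPC (Search Path Count) weighting used in main path analysis can be illustrated on a toy citation network. In this sketch each edge points from a cited document to the document citing it, the network is assumed to be acyclic with no duplicate edges, and the function name `spc_weights` is hypothetical. It computes only the edge weights; the key-route search over those weights is not shown.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]  # (cited document, citing document)


def spc_weights(edges: Iterable[Edge]) -> Dict[Edge, int]:
    """SPC of an edge = number of source-to-sink paths that pass through it."""
    edge_list = list(edges)
    succ = defaultdict(list)
    pred = defaultdict(list)
    nodes = set()
    for u, v in edge_list:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))

    # Topological order via Kahn's algorithm.
    indegree = {n: len(pred[n]) for n in nodes}
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in succ[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if len(order) != len(nodes):
        raise ValueError("citation network must be acyclic")

    # Number of paths reaching each node from any source (no incoming edges).
    from_sources = {}
    for node in order:
        from_sources[node] = sum(from_sources[p] for p in pred[node]) if pred[node] else 1
    # Number of paths from each node to any sink (no outgoing edges).
    to_sinks = {}
    for node in reversed(order):
        to_sinks[node] = sum(to_sinks[s] for s in succ[node]) if succ[node] else 1

    return {(u, v): from_sources[u] * to_sinks[v] for u, v in edge_list}
```

Edges with high SPC lie on many source-to-sink citation paths, which is why they are candidates for the main path.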

Production Date and Patrons of Korean Treasure #978: Transcription of the Avatamsaka Sutra (Zhou Version) in Gold on White Paper (보물 제978호 <백지금니대방광불화엄경(白紙金泥大方廣佛華嚴經) 권(卷)29>의 조성 연대 및 발원자 고찰)

  • Won, Seunghyun
    • MISULJARYO - National Museum of Korea Art Journal / v.98 / pp.78-103 / 2020
  • Transcribed Buddhist sutras generally consist of a frontispiece illustration, sutra illustrations, and sutra text, although some parts may be lost over time. Most transcribed sutras originally include an official record of the transcription (saseonggi) at either the beginning or end of the volume, which documents various details of the production, including who commissioned the sutra and when it was transcribed. If such records are unavailable or difficult to decipher, the date of the sutra can only be estimated by comparison to other works with known production dates. This is the case with Korean Treasure #978, the "Transcription of the Avatamsaka Sutra (Zhou Version) in Gold on White Paper" (hereinafter, "Avatamsaka Sutra, Volume 29"), which does not contain any details of its production. Based on formal comparisons, the volume has been estimated to date from the early Joseon period. Important criteria for estimating the production date include the type of calligraphy script and the overall expression of the sutra illustrations. However, these features are missing from some early Joseon sutras, making it difficult to definitively assert which characteristics are representative of the period. Also, transcribed sutras from the late Goryeo period (after 1350) and early Joseon period are often very similar in terms of the expression of the frontispiece illustrations and sutra illustrations. From the late Goryeo period through the early Joseon period, the illustrations of transcribed sutras, which had previously been relatively detailed and realistic, gradually became more formalized and stylized. Significantly, Avatamsaka Sutra, Volume 29 includes illustrations showing both styles of expression (i.e., realistic and formalized). Moreover, the hemp leaf design on the frontispiece and the border around the sutra illustrations are unique features that have never been seen on any other transcribed sutras. 
Notably, however, Avatamsaka Sutra in Gold on White Paper, Volume 26 (hereinafter, "Avatamsaka Sutra, Volume 26"), which has not yet been introduced in academic research, is complete with frontispiece, sutra illustrations, and sutra text. This sutra is identical to Avatamsaka Sutra, Volume 29 in size, composition, and details, and is thus estimated to have been produced at the same time and by the same patrons. According to the record at the end of the volume, Avatamsaka Sutra, Volume 26 was commissioned in 1348 by Gi Cheol (d. 1365), which corresponds to the estimated date of Avatamsaka Sutra, Volume 29 derived by formal comparison. Based on this new information, Avatamsaka Sutra, Volume 29 was likely produced in the late Goryeo period rather than the early Joseon period, as has previously been presumed. The new study of Avatamsaka Sutra, Volume 26 also seems to confirm that both sutras were transcribed by highly skilled artisans in 1348 of the late Goryeo period, a transitional phase in the expression of sutra illustrations.

A Proposal of a Keyword Extraction System for Detecting Social Issues (사회문제 해결형 기술수요 발굴을 위한 키워드 추출 시스템 제안)

  • Jeong, Dami;Kim, Jaeseok;Kim, Gi-Nam;Heo, Jong-Uk;On, Byung-Won;Kang, Mijung
    • Journal of Intelligence and Information Systems / v.19 no.3 / pp.1-23 / 2013
  • To discover significant social issues that modern society urgently needs to solve, such as unemployment, economic crisis, and social welfare, researchers in the existing approach usually collect opinions from professional experts and scholars through online or offline surveys. However, this method is not always effective. Because of the expense, a large number of survey replies is seldom gathered, and in some cases it is hard to find professionals who deal with specific social issues. Thus, the sample set is often small and may be biased. Furthermore, several experts may reach totally different conclusions about a social issue because each has a subjective point of view and a different background. In this case, it is considerably hard to figure out what the current social issues are and which of them are really important. To overcome the shortcomings of the current approach, in this paper we develop a prototype system that semi-automatically detects keywords representing social issues and problems from about 1.3 million news articles issued by about 10 major domestic presses in Korea from June 2009 until July 2012. Our proposed system consists of (1) collecting and extracting texts from the collected news articles, (2) identifying only news articles related to social issues, (3) analyzing the lexical items of Korean sentences, (4) finding a set of topics regarding social keywords over time based on probabilistic topic modeling, (5) matching relevant paragraphs to a given topic, and (6) visualizing social keywords for easy understanding. In particular, we propose a novel matching algorithm relying on generative models. The goal of the matching algorithm is to best match paragraphs to each topic. Technically, using a topic model such as Latent Dirichlet Allocation (LDA), we can obtain a set of topics, each of which has relevant terms and their probability values. In our problem, given a set of text documents (e.g., news articles), LDA produces a set of topic clusters, and each topic cluster is then labeled by human annotators, where each topic label stands for a social keyword. For example, suppose there is a topic (e.g., Topic1 = {(unemployment, 0.4), (layoff, 0.3), (business, 0.3)}) and a human annotator labels Topic1 "Unemployment Problem". In this example, it is non-trivial to understand what happened to the unemployment problem in our society. In other words, looking only at social keywords gives no idea of the detailed events occurring in our society. To tackle this matter, we develop a matching algorithm that computes the probability of a paragraph given a topic, relying on (i) topic terms and (ii) their probability values. Given a set of text documents, we segment each document into paragraphs. Meanwhile, using LDA, we extract a set of topics from the text documents. Through our matching process, each paragraph is assigned to the topic it best matches, so that each topic finally has several best-matched paragraphs. For instance, suppose there are a topic (e.g., Unemployment Problem) and its best-matched paragraph (e.g., "Up to 300 workers lost their jobs in XXX company at Seoul"). In this case, we can grasp the detailed information behind the social keyword, such as "300 workers", "unemployment", "XXX company", and "Seoul". In addition, our system visualizes social keywords over time. Therefore, through our matching process and keyword visualization, most researchers will be able to detect social issues easily and quickly. Through this prototype system, we have detected various social issues appearing in our society and have shown the effectiveness of our proposed methods through experimental results. Note that our proof-of-concept system is available at http://dslab.snu.ac.kr/demo.html.
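The paragraph-to-topic matching step can be illustrated with a simple stand-in. The abstract does not give the authors' formula, so this sketch assumes a unigram log-likelihood over the topic's term probabilities, with a small floor probability for terms the topic does not list. The function names, the floor value, and the "Welfare Problem" topic are hypothetical; Topic1 is the example from the abstract.

```python
import math
from typing import Dict, List

Topic = Dict[str, float]  # term -> probability under the topic


def log_likelihood(tokens: List[str], topic: Topic, floor: float = 1e-6) -> float:
    """Log-probability of a paragraph's tokens under a topic (assumed unigram model)."""
    return sum(math.log(topic.get(token, floor)) for token in tokens)


def best_topic(tokens: List[str], topics: Dict[str, Topic]) -> str:
    """Assign a paragraph to the topic under which it is most probable."""
    return max(topics, key=lambda name: log_likelihood(tokens, topics[name]))


topics = {
    # Topic1 from the abstract, labeled by a human annotator.
    "Unemployment Problem": {"unemployment": 0.4, "layoff": 0.3, "business": 0.3},
    # Hypothetical second topic, added only so there is something to compare against.
    "Welfare Problem": {"pension": 0.5, "welfare": 0.5},
}
```

Calling `best_topic(["layoff", "business", "seoul"], topics)` assigns that paragraph to "Unemployment Problem", because two of its three tokens are terms of that topic.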