• Title/Summary/Keyword: Text data


An Analysis of News Media Coverage of the QRcode: Based on 2008-2023 News Big Data (QR코드에 대한 언론 보도 경향: 2008-2023년 뉴스 빅데이터 분석)

  • Sunjeong Kim;Jisu Lee
    • Journal of the Korean Society for Information Management
    • /
    • v.41 no.2
    • /
    • pp.269-294
    • /
    • 2024
  • This study analyzed news media coverage of QRcodes in Korea over a 16-year period (2008 to 2023). A total of 13,335 articles were extracted from the Korea Press Foundation's BigKinds, and quantitative and content analyses were conducted on the news frames. The results indicated that the quantity of news coverage increased over the period, peaking in 2020, and that the most frequently discussed topic was 'IT_Science'. The keyword analysis indicated that the primary words were 'QRcode', 'smartphone', 'service', 'application', and 'payment'. The news media primarily focused on the QRcode's ability to provide instant access and on its recognition technology. This study demonstrates that advanced information and communication technologies and the increased prevalence of mobile devices have led to a rise in the utilization of QRcodes, and that QRcodes have become a significant information medium in contemporary society.

Performance Improvement of Topic Modeling using BART based Document Summarization (BART 기반 문서 요약을 통한 토픽 모델링 성능 향상)

  • Eun Su Kim;Hyun Yoo;Kyungyong Chung
    • Journal of Internet Computing and Services
    • /
    • v.25 no.3
    • /
    • pp.27-33
    • /
    • 2024
  • The environment of academic research is continuously changing due to the growth of information, which raises the need for an effective way to analyze and organize large numbers of documents. In this paper, we propose improving topic modeling performance through BART (Bidirectional and Auto-Regressive Transformers) based document summarization. The proposed method uses a BART-based document summarization model to extract the core content of each document and improves topic modeling performance with the LDA (Latent Dirichlet Allocation) algorithm. We suggest an approach to improve the performance and efficiency of LDA topic modeling through document summarization and validate it through experiments. The experimental results show that the BART-based model for summarizing article data captures the important information of the original articles, with F1-scores of 0.5819, 0.4384, and 0.5038 in the ROUGE-1, ROUGE-2, and ROUGE-L evaluations, respectively. In addition, topic modeling using summarized documents performs about 8.08% better than topic modeling using full text in a comparison using the perplexity metric. This contributes to reducing data throughput and improving efficiency in the topic modeling process.
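
As a rough illustration of the pipeline this abstract describes (not the paper's own code), the sketch below summarizes documents with the public facebook/bart-large-cnn checkpoint standing in for the paper's fine-tuned BART model, then fits scikit-learn's LDA on the summaries and reports perplexity; the corpus and hyperparameters are placeholders:

```python
# Sketch: summarize with BART, then topic-model the summaries with LDA.
from transformers import pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Placeholder corpus; the paper uses Korean article data.
documents = [
    "First long article text ...",
    "Second long article text ...",
]

# Public checkpoint as a stand-in for the paper's fine-tuned summarizer.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
summaries = [
    summarizer(doc, max_length=60, min_length=10, do_sample=False)[0]["summary_text"]
    for doc in documents
]

# LDA over a bag-of-words of the summaries instead of the full texts.
bow = CountVectorizer(stop_words="english").fit_transform(summaries)
lda = LatentDirichletAllocation(n_components=5, random_state=0).fit(bow)

# Perplexity, the metric the abstract compares on (lower is better).
print("perplexity:", lda.perplexity(bow))
```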

The Correlation between Amylin and Insulin by Treatment with 2-Deoxy-D-glucose and/or Mannose in Rat Insulinoma INS-1E Cells

  • H.S. Kim;S.S. Joo;Y.-M. Yoo
    • The Korean Journal of Physiology and Pharmacology
    • /
    • v.72 no.4
    • /
    • pp.517-528
    • /
    • 2021
  • Amylin, or islet amyloid polypeptide (IAPP), is a peptide synthesized and secreted with insulin by the pancreatic β-cells. A role for amylin in the pathogenesis of type 2 diabetes (T2D), by causing insulin resistance or inhibiting insulin synthesis and secretion, has been suggested by in vitro and in vivo studies. These studies are consistent with the effect of endogenous amylin on pancreatic β-cells to modulate and/or restrain insulin secretion. Here, we report the correlation between amylin and insulin in rat insulinoma INS-1E cells treated with 2-deoxy-D-glucose (2-DG) and/or mannose. Cell viability was not affected by 24 h of treatment with 2-DG and/or mannose, but it was significantly decreased by 48 h of treatment with 5 and 10 mM 2-DG. In the 24 h treatment, the synthesis of insulin in the cells and the secretion of insulin into the media showed a significant inverse association. In the 48 h treatment, amylin synthesis vs. secretion and insulin synthesis vs. secretion showed significant inverse relations, and the synthesis of amylin vs. insulin and the secretion of amylin vs. insulin showed significant inverse relationships. Levels of p-ERK, antioxidant enzymes (Cu/Zn-superoxide dismutase (SOD), Mn-SOD, and catalase), and endoplasmic reticulum (ER) stress markers (cleaved caspase-12, CHOP, p-SAPK/JNK, and BiP/GRP78) were significantly increased or decreased by the 24 h and 48 h treatments. These data suggest correlations between the synthesis of amylin by the cells and its secretion into the media, between the synthesis of amylin and that of insulin, and between the secretion of amylin and that of insulin under 2-DG and/or mannose treatment in rat insulinoma INS-1E cells. Therefore, these results provide primary data for the hypothesis that amylin-insulin relationships may be involved in human amylin toxicity in pancreatic β-cells.

Improving Explainability of Generative Pre-trained Transformer Model for Classification of Construction Accident Types: Validation of Saliency Visualization

  • Byunghee YOO;Yuncheul WOO;Jinwoo KIM;Moonseo PARK;Changbum Ryan AHN
    • International conference on construction engineering and project management
    • /
    • 2024.07a
    • /
    • pp.1284-1284
    • /
    • 2024
  • Leveraging large language models and safety accident report data has unique potential for analyzing construction accidents, including the classification of accident types, injured body parts, and work processes from unstructured free-text accident scenarios. We previously proposed a novel approach that harnesses a fine-tuned Generative Pre-trained Transformer to classify six types of construction accidents (caught-in-between, cuts, falls, struck-by, trips, and others) with an accuracy of 82.33%. Furthermore, we proposed saliency visualization, a methodology for discerning which words within a sentence describing a construction accident a black-box model deems important. It helps users understand how individual words in an input sentence affect the final output and seeks to make the model's predictions more understandable and interpretable. This involves deliberately altering the position of words within a sentence to reveal their specific roles in shaping the overall output. However, the validation of saliency visualization results remains insufficient and needs further analysis. In this context, this study aims to qualitatively validate the effectiveness of saliency visualization methods. In this validation, the elements with the highest importance scores were qualitatively checked against the construction accident risk factors (e.g., "the 4m pipe," "ear," "to extract staircase") emerging from the Construction Safety Management Integrated Information data scenarios provided by the Ministry of Land, Infrastructure and Transport, Republic of Korea. Additionally, construction accident precursors (e.g., "grinding," "pipe," "slippery floor") identified in the existing literature, which are early indicators or warning signs of potential accidents, were compared with the words receiving the highest saliency importance scores. We observed that the words highlighted by saliency visualization are included among the pre-identified accident precursors and risk factors. This study highlights how saliency visualization enhances the interpretability of models built on large language models, providing valuable insights into the underlying causes driving accident predictions.
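
The paper's saliency method perturbs word positions; as a hedged illustration of the same word-attribution idea, the sketch below uses the simpler occlusion variant, scoring each word by the drop in the classifier's confidence when that word is removed. The `classify` callable is a hypothetical stand-in for the fine-tuned GPT accident-type classifier, which is not public here:

```python
# Occlusion-based saliency sketch for any text classifier.
from typing import Callable, List, Tuple

def occlusion_saliency(
    sentence: str,
    classify: Callable[[str], float],  # returns P(predicted accident type | text)
) -> List[Tuple[str, float]]:
    """Score each word by the probability drop when it is removed."""
    words = sentence.split()
    base = classify(sentence)
    scores = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        # Importance = how much the prediction confidence falls without the word.
        scores.append((word, base - classify(reduced)))
    # Highest drop first: these words matter most to the prediction.
    return sorted(scores, key=lambda s: s[1], reverse=True)
```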

Latent topics-based product reputation mining (잠재 토픽 기반의 제품 평판 마이닝)

  • Park, Sang-Min;On, Byung-Won
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.39-70
    • /
    • 2017
  • Data-driven analytics techniques have recently been applied to public surveys. Instead of simply gathering survey results or expert opinions to research the preference for a recently launched product, enterprises need a way to collect and analyze various types of online data and then accurately figure out customer preferences. In the main concept of existing data-based survey methods, the sentiment lexicon for a particular domain is first constructed by domain experts, who judge the positive, neutral, or negative meanings of the words used frequently in the collected text documents. In order to research the preference for a particular product, the existing approach (1) collects review posts related to the product from several product review web sites; (2) extracts sentences (or phrases) from the collection after pre-processing steps such as stemming and removal of stop words; (3) classifies the polarity (positive or negative sense) of each sentence (or phrase) based on the sentiment lexicon; and (4) estimates the positive and negative ratios of the product by dividing the numbers of positive and negative sentences (or phrases) by the total number of sentences (or phrases) in the collection. Furthermore, the existing approach automatically finds important sentences (or phrases) carrying positive or negative meaning toward the product. As a motivating example, given a product like the Sonata made by Hyundai Motors, customers often want to see a summary note of the positive points in the 'car design' aspect as well as the negative points in the same aspect. They also want more useful information regarding other aspects such as 'car quality', 'car performance', and 'car service'. Such information will enable customers to make good choices when they attempt to purchase brand-new vehicles, and automobile makers will be able to figure out the preference and the positive/negative points for new models on the market; the weak points of those models can then be improved based on the sentiment analysis. For this, the existing approach computes the sentiment score of each sentence (or phrase) and then selects the top-k sentences (or phrases) with the highest positive and negative scores. However, the existing approach has several shortcomings that limit its application to real systems: (1) The main aspects (e.g., car design, quality, performance, and service) of a product (e.g., Hyundai Sonata) are not considered. As a result, sentiment analysis without aspects merely reports to customers and car makers a summary note containing the positive and negative ratios of the product and the top-k sentences (or phrases) with the highest sentiment scores in the entire corpus; the main aspects of the target product need to be considered in the sentiment analysis. (2) In general, since the same word has different meanings across domains, a sentiment lexicon appropriate for each domain needs to be constructed, and an efficient way to construct the per-domain lexicon is required because sentiment lexicon construction is labor intensive and time consuming.
To address the above problems, in this article we propose a novel product reputation mining algorithm that (1) extracts topics hidden in the review documents written by customers; (2) mines main aspects based on the extracted topics; (3) measures the positive and negative ratios of the product using those aspects; and (4) presents a digest in which a few important sentences with positive and negative meanings are listed for each aspect. Unlike the existing approach, using hidden topics lets experts construct the sentiment lexicon easily and quickly. Furthermore, by reinforcing topic semantics, we can improve the accuracy of product reputation mining more substantially than the existing approach. In the experiments, we collected a large set of review documents on domestic vehicles such as the K5, SM5, and Avante; measured the positive and negative ratios of the three cars; showed the top-k positive and negative summaries per aspect; and conducted statistical analysis. Our experimental results clearly show the effectiveness of the proposed method compared with the existing method.
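
A minimal sketch of steps (1)-(3) of the proposed algorithm, using scikit-learn's LDA and a toy hand-picked lexicon in place of the topic-derived lexicon the paper constructs; the reviews, lexicon, and topic count are placeholder assumptions:

```python
# Sketch: LDA topics as aspects, then a per-aspect positive/negative ratio.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Placeholder reviews and a toy lexicon; the paper derives its lexicon
# from the hidden topics rather than hand-picking words as done here.
reviews = [
    "The design is beautiful but the engine is noisy",
    "Great performance although the interior quality is poor",
    "Beautiful exterior design and a great look",
    "Noisy engine and poor fuel performance",
]
POSITIVE = {"beautiful", "great"}
NEGATIVE = {"noisy", "poor"}

bow = CountVectorizer(stop_words="english").fit_transform(reviews)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(bow)

# Treat each review's dominant topic as its aspect and tally sentiment words.
ratios = {}
for review, topic_dist in zip(reviews, lda.transform(bow)):
    aspect = int(topic_dist.argmax())
    words = review.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    p, n = ratios.get(aspect, (0, 0))
    ratios[aspect] = (p + pos, n + neg)

for aspect, (p, n) in ratios.items():
    total = (p + n) or 1
    print(f"aspect {aspect}: {p / total:.0%} positive, {n / total:.0%} negative")
```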

Target Word Selection Disambiguation using Untagged Text Data in English-Korean Machine Translation (영한 기계 번역에서 미가공 텍스트 데이터를 이용한 대역어 선택 중의성 해소)

  • Kim Yu-Seop;Chang Jeong-Ho
    • The KIPS Transactions: Part B
    • /
    • v.11B no.6
    • /
    • pp.749-758
    • /
    • 2004
  • In this paper, we propose a new method that utilizes only a raw corpus, without additional human effort, to disambiguate target word selection in English-Korean machine translation. We use two data-driven techniques: Latent Semantic Analysis (LSA) and Probabilistic Latent Semantic Analysis (PLSA). These techniques can represent the complex semantic structure of a given context, such as a text passage. We construct linguistic semantic knowledge using the two techniques and apply it to target word selection in English-Korean machine translation, utilizing grammatical relationships stored in a dictionary. We use the k-nearest neighbor learning algorithm to resolve the data sparseness problem in target word selection, estimating the distance between instances based on these models. In the experiments, we use TREC AP news data to construct the latent semantic space and a Wall Street Journal corpus to evaluate target word selection. With the latent semantic analysis methods, the accuracy of target word selection improved by over 10%, and PLSA showed better accuracy than the LSA method. Finally, using correlation analysis, we showed the relationship between the accuracy and two important factors: the dimensionality of the latent space and the k value in k-NN learning.
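
As an illustrative sketch (not the paper's implementation), the following builds a small latent semantic space with truncated SVD over TF-IDF and uses k-NN over that space to select the Korean translation of an ambiguous English word; the contexts, labels, and dimensionality are toy assumptions:

```python
# Sketch: LSA space + k-NN for target word selection of the ambiguous "bank".
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Toy example contexts; the paper builds its space from TREC AP news.
contexts = [
    "deposit money in the bank account",
    "the bank approved the loan application",
    "fishing on the grassy bank of the river",
    "trees grow along the river bank",
]
translations = ["은행", "은행", "둑", "둑"]  # candidate Korean target words

# Latent semantic space: TF-IDF followed by truncated SVD (LSA).
lsa = make_pipeline(TfidfVectorizer(), TruncatedSVD(n_components=2, random_state=0))
X = lsa.fit_transform(contexts)

# k-NN in the latent space picks the translation of the nearest contexts.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, translations)
print(knn.predict(lsa.transform(["she opened a bank account yesterday"])))
```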

Mapping Categories of Heterogeneous Sources Using Text Analytics (텍스트 분석을 통한 이종 매체 카테고리 다중 매핑 방법론)

  • Kim, Dasom;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.4
    • /
    • pp.193-215
    • /
    • 2016
  • In recent years, the proliferation of diverse social networking services has led users to use many mediums simultaneously, depending on their individual purposes and tastes. While collecting information about particular themes, they usually employ various mediums such as social networking services, Internet news, and blogs. However, in terms of management, each document circulated through these diverse mediums is placed in different categories on the basis of each source's policies and standards, hindering any attempt to conduct research on a specific category across different kinds of sources. For example, documents about "applying for foreign travel" can be classified into "Information Technology," "Travel," or "Life and Culture" according to the particular standard of each source. Likewise, with different definitions and levels of specificity, similar categories can be named and structured differently from source to source. To overcome these limitations, this study proposes a plan for mapping categories between sources on different mediums while keeping the existing category system of each medium as it is. Specifically, by re-classifying individual documents from the viewpoints of the various sources and storing the results of this classification as extra attributes, this study proposes a logical layer through which users can search for a specific document across multiple heterogeneous sources with different category names as if they belonged to the same source. In addition, by collecting 6,000 news articles from two Internet news portals, experiments were conducted to compare accuracy across sources, between supervised and semi-supervised learning, and between homogeneous and heterogeneous learning data. It is particularly interesting that in some categories, the classification accuracy of semi-supervised learning using heterogeneous learning data proved higher than that of supervised and semi-supervised learning using homogeneous learning data. This study has the following significance. First, it proposes a logical plan for establishing a system that integrates and manages heterogeneous mediums with different classification systems while maintaining the existing physical classification systems as they are. The results exhibit very different classification accuracies depending on the heterogeneity of the learning data, which is expected to spur further studies for enhancing the performance of the proposed methodology through the analysis of per-category characteristics. In addition, there is an increasing demand for the search, collection, and analysis of documents from diverse mediums, and the scope of Internet search is not restricted to one medium; however, since each medium has a different categorical structure and different names, it is very difficult to search a specific category across heterogeneous mediums. The proposed methodology is also significant for presenting a plan that surveys all the documents according to the categorical classification standards of the site the user selects, while maintaining the existing sites' characteristics and structures as they are. The proposed methodology needs to be further complemented in the following respects. First, since only an indirect comparison and evaluation of its performance was made, future studies need to conduct more direct tests of its accuracy.
That is, after re-classifying the documents of the target source on the basis of the category system of the existing source, the accuracy of the classification needs to be verified through evaluation by actual users. The classification accuracy also needs to be increased by making the methodology more sophisticated. Furthermore, the characteristics of the categories in which heterogeneous semi-supervised learning showed higher classification accuracy than supervised learning deserve study, as they may assist in obtaining heterogeneous documents from diverse mediums and in finding ways to enhance document classification accuracy.
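
A hedged sketch of the semi-supervised setup the experiments compare, using scikit-learn's SelfTrainingClassifier with labeled articles from one source and unlabeled articles from another; the texts, labels, and threshold are illustrative assumptions standing in for the 6,000 collected articles:

```python
# Sketch: self-training with heterogeneous (cross-source) learning data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

# Labeled articles from source A plus unlabeled articles from source B.
texts = [
    "new smartphone released with a faster chip",   # source A, labeled
    "home team wins the championship final",        # source A, labeled
    "app update improves battery life and camera",  # source B, unlabeled
    "striker scores twice in the derby match",      # source B, unlabeled
]
labels = [0, 1, -1, -1]  # 0 = IT, 1 = Sports, -1 marks unlabeled documents

X = TfidfVectorizer().fit_transform(texts)

# Self-training: pseudo-label confident unlabeled documents and retrain.
clf = SelfTrainingClassifier(LogisticRegression(), threshold=0.55)
clf.fit(X, labels)
print(clf.predict(X[2:]))  # predicted categories for the source-B articles
```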

The Development of Real-time Video Associated Data Service System for T-DMB (T-DMB 실시간 비디오 부가데이터 서비스 시스템 개발)

  • Kim Sang-Hun;Kwak Chun-Sub;Kim Man-Sik
    • Journal of Broadcast Engineering
    • /
    • v.10 no.4 s.29
    • /
    • pp.474-487
    • /
    • 2005
  • T-DMB (Terrestrial Digital Multimedia Broadcasting) adopted the MPEG-4 BIFS (Binary Format for Scenes) Core2D scene description profile and graphics profile as the standard for its video associated data service. Using BIFS, objects such as text, still images, circles, and polygons can be overlaid on the main display of the receiving end according to properties designated at the broadcasting side, and clickable buttons and website links can be attached to desired objects. A variety of interactive data services can therefore be offered through BIFS. In this paper, we implement a real-time video associated data service system for T-DMB. Our system places emphasis on real-time data service driven by user operation and on interworking and stability with our previously developed video encoder. It consists of a BIFS Real-time System, an Automatic Stream Control System, and a Receiving Monitoring System. Its basic functions are designed, as a top priority, to reflect T-DMB programs and the characteristics of the program production environment. The developed system was used in a BIFS trial service via KBS T-DMB and is expected to be used in the T-DMB main service after improvements such as strengthening system stability.

XML Document Editing System for the Structural Processing of Digital Documents Including Mathematical Formulas (수식을 포함한 전자문헌의 구조적 처리를 위한 XML 문서편집시스템)

  • 윤화묵;유범종;김창수;정회경
    • Journal of the Korean Society for Information Management
    • /
    • v.19 no.4
    • /
    • pp.96-111
    • /
    • 2002
  • Large quantities of accumulated data exist within institutions and organizations, but most of the data remain in forms specific to each institution or organization, making the exchange and sharing of information difficult. The concept of knowledge information resource management was introduced to overcome this disadvantage, and the digitization of knowledge information resources for sharing and management is under way. In science, technology, and education in particular, XML is being adopted to structurally process the data needed for the exchange and sharing of knowledge information resources; however, because the many mathematical formulas appearing in electronic documents in these fields are processed as unstructured data such as images or plain text, limitations arise in searching, indexing, and reusability. MathML has drawn interest as a way to overcome this, and a solution is needed that can process MathML easily and efficiently within structured documents. In this paper, we design and implement an XML document editing system that supports the structural processing of electronic documents for knowledge information resources and that creates and renders MathML easily within structured documents, without requiring expert knowledge of MathML.
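
To illustrate the abstract's point that a formula stored as MathML is structured, searchable markup rather than an opaque image, here is a standalone sketch that builds the MathML for (a+b)/2 with Python's standard library; the paper's editor itself is an interactive GUI and is not reproduced here:

```python
# Sketch: a formula as structured MathML markup, built programmatically.
import xml.etree.ElementTree as ET

math = ET.Element("math", xmlns="http://www.w3.org/1998/Math/MathML")
frac = ET.SubElement(math, "mfrac")         # fraction: numerator / denominator
num = ET.SubElement(frac, "mrow")           # numerator: a + b
ET.SubElement(num, "mi").text = "a"
ET.SubElement(num, "mo").text = "+"
ET.SubElement(num, "mi").text = "b"
ET.SubElement(frac, "mn").text = "2"        # denominator: 2

print(ET.tostring(math, encoding="unicode"))

# Because the structure is explicit, an indexer can find every identifier,
# which is impossible when the formula is stored as an image:
print([mi.text for mi in math.iter("mi")])  # ['a', 'b']
```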

Design of a Database Retrieval System Using a Virtual Database in an Intranet (인트라넷에서 가상데이터베이스를 이용한 데이터베이스 검색 시스템의 설계)

  • Lee, Dong-Wook;Park, Young-Bae
    • The Transactions of the Korea Information Processing Society
    • /
    • v.5 no.6
    • /
    • pp.1404-1417
    • /
    • 1998
  • Currently, there exist two different methods for database retrieval on the Internet: the first is to use a search engine, and the second is to use plug-in or ActiveX technology. If a search engine, which searches using indices built from keywords of simple text data, is used to access a database, then (1) it is not possible to access more than one database at a time; (2) it is not possible to support the various conditional retrievals a query language provides; and (3) the set of data returned might include many unwanted items, i.e., the precision rate might be relatively low. Plug-in or ActiveX technology makes use of the Web browser to execute clients' queries for database retrieval. The problems with this approach are that it is not possible to activate more than one DBMS simultaneously, even if they are of the same data model, and that it is not possible to execute user queries other than the ones previously defined by the client program. In this paper, to resolve the aforementioned problems, we design and implement a database retrieval system using a virtual database, which makes it possible to provide a direct query interface through a conventional Web browser. We assume that the virtual database is designed and aggregated from more than one relational database using the same data model.
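
A minimal sketch of the virtual-database idea under stated assumptions: SQLite files stand in for the member relational databases (same data model), the table schema and file names are hypothetical, and the Web-browser query interface from the paper is omitted:

```python
# Sketch: one query interface fanned out over several member databases.
import sqlite3
from typing import List, Tuple

class VirtualDatabase:
    """Runs one SQL query against every member database and merges the rows."""

    def __init__(self, paths: List[str]):
        self.paths = paths

    def query(self, sql: str, params: Tuple = ()) -> List[tuple]:
        rows: List[tuple] = []
        for path in self.paths:  # same data model assumed in every member DB
            with sqlite3.connect(path) as conn:
                rows.extend(conn.execute(sql, params).fetchall())
        return rows

# Demo setup: two member databases with an identical (hypothetical) schema.
for path, title in [("site_a.db", "Data Modeling"), ("site_b.db", "Query Design")]:
    with sqlite3.connect(path) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS books (title TEXT, year INT)")
        conn.execute("INSERT INTO books VALUES (?, ?)", (title, 1996))

# One conditional query reaches both databases at once, which a plain
# keyword-index search engine cannot do.
vdb = VirtualDatabase(["site_a.db", "site_b.db"])
print(vdb.query("SELECT title FROM books WHERE year > ?", (1995,)))
```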
