• Title/Summary/Keyword: Text comparing


Effects of Medium Experience on Medium Perception and Communication Process (텍스트매체 사용에 있어서 매체 경험이 매체 인지와 의사소통과정에 미치는 영향)

  • Yang, Jae-Ho;Lee, Hyun-Kyu;Suh, Kil-Soo
    • Asia pacific journal of information systems
    • /
    • v.9 no.3
    • /
    • pp.1-23
    • /
    • 1999
  • The objective of this study is to examine media richness theory and the social information processing model by analyzing the effect of media experience on media perception and the communication process. To this end, a laboratory experiment was conducted. The independent variable was text medium experience, and a face-to-face medium was added as a control group. The dependent variables were medium perception and communication process. Medium perception includes perceived richness, medium feeling, task satisfaction, and communication satisfaction. Communication processes were also analyzed to compare the treatment groups. The results can be summarized in two findings. First, the face-to-face group showed higher perceived richness than the text medium groups, and the experienced text medium group perceived its text medium as richer than the inexperienced text medium group did. Second, the experienced text medium group showed more interaction between subjects than the inexperienced text medium group. The experienced text medium group also showed more agreement and meta-communication, both of which were found in the face-to-face group. The results support media richness theory, in that the face-to-face medium was perceived as richer than the text medium, and they also support the social information processing model, based on the comparison between the experienced and inexperienced text medium groups. The text medium, although thought to be the leanest, can be perceived as richer if users have substantial experience with it.


CR-M-SpanBERT: Multiple embedding-based DNN coreference resolution using self-attention SpanBERT

  • Joon-young Jung
    • ETRI Journal
    • /
    • v.46 no.1
    • /
    • pp.35-47
    • /
    • 2024
  • This study introduces CR-M-SpanBERT, a coreference resolution (CR) model that utilizes multiple embedding-based span bidirectional encoder representations from transformers for antecedent recognition in natural language (NL) text. Information extraction studies aim to extract knowledge from NL text autonomously and cost-effectively. However, the extracted information may not represent knowledge accurately owing to the presence of ambiguous entities. Therefore, we propose a CR model that identifies mentions referring to the same entity in NL text. CR requires understanding both the syntax and the semantics of the NL text simultaneously. Therefore, multiple embeddings are generated for CR, which can include syntactic and semantic information for each word. We evaluate the effectiveness of CR-M-SpanBERT by comparing it to a model that uses SpanBERT as the language model in CR studies. The results demonstrate that our proposed deep neural network model achieves high recognition accuracy for extracting antecedents from NL text. Additionally, it requires fewer epochs to achieve an average F1 score greater than 75% compared with the conventional SpanBERT approach.

Patent Document Similarity Based on Image Analysis Using the SIFT-Algorithm and OCR-Text

  • Park, Jeong Beom;Mandl, Thomas;Kim, Do Wan
    • International Journal of Contents
    • /
    • v.13 no.4
    • /
    • pp.70-79
    • /
    • 2017
  • Images are an important element in patents, and many experts use images to analyze a patent or to check differences between patents. However, there is little research on image analysis for patents, partly because image processing is an advanced technology and partly because patent images typically consist of visual parts as well as text and numbers. This study suggests two methods for using image processing: the Scale Invariant Feature Transform (SIFT) algorithm and Optical Character Recognition (OCR). The first method, which works with SIFT, uses image feature points. Through feature matching, it can be applied to calculate the similarity between documents containing these images. In the second method, OCR is used to extract text from the images. By using numbers extracted from an image, it is possible to extract the corresponding related text within the text passages. Subsequently, document similarity can be calculated based on the extracted text. The feasibility of the suggested methods is shown by comparing them with an existing method that calculates similarity based only on text. Additionally, the correlation between the two similarity measures is low, which shows that they capture different aspects of the patent content.
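
The second, OCR-based method described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the numerals have already been read from a figure by some OCR step (not shown), and the helper names `sentences_for_numerals` and `cosine_similarity`, the sample descriptions, and the choice of a bag-of-words cosine measure are all assumptions made for the example.

```python
import math
import re
from collections import Counter
from typing import List, Set


def sentences_for_numerals(description: str, numerals: Set[str]) -> List[str]:
    """Keep the sentences of a description that mention any OCR-extracted numeral."""
    sentences = re.split(r"(?<=[.!?])\s+", description.strip())
    return [s for s in sentences if numerals & set(re.findall(r"\d+", s))]


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Bag-of-words cosine similarity (an assumed measure, not taken from the paper)."""
    a = Counter(re.findall(r"[a-z]+", text_a.lower()))
    b = Counter(re.findall(r"[a-z]+", text_b.lower()))
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical patent descriptions and hypothetical OCR output from their figures.
doc_a = "The housing 10 encloses a motor 12. The invention relates to pumps. The motor 12 drives a shaft 14."
doc_b = "A casing 20 surrounds a motor 22. The motor 22 drives a fan 24. Background art is discussed first."
extracted_a = sentences_for_numerals(doc_a, {"10", "12"})
extracted_b = sentences_for_numerals(doc_b, {"22"})

similarity = cosine_similarity(" ".join(extracted_a), " ".join(extracted_b))
print(extracted_a, extracted_b, round(similarity, 3))
```

Only the sentences tied to figure numerals enter the comparison, which is one plausible reason such a measure could diverge from a similarity computed over the full text.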

Unstructured Data Quantification Scheme Based on Text Mining for User Feedback Extraction (사용자 의견 추출을 위한 텍스트 마이닝 기반 비정형 데이터 정량화 방안)

  • Jo, Jung-Heum;Chung, Yong-Taek;Choi, Seong-Wook;Ok, Changsoo
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.41 no.4
    • /
    • pp.131-137
    • /
    • 2018
  • People write reviews of numerous products or services on the Internet, in their blogs or on community bulletin boards. These unstructured data contain the authors' emotions and opinions about a product or service, which can provide important information for future product design or marketing. However, this text-based information cannot be evaluated quantitatively, and thus it is difficult to apply to mathematical models or optimization problems for product design and improvement. Therefore, this study proposes a method to quantitatively extract users' opinions or preferences about a specific product or service by utilizing the large amount of text-based information available online. The extracted unstructured text is decomposed into basic unit words, and a positive rate is evaluated using existing emotional dictionaries and additional lists proposed in this study. This can be a way to effectively utilize unstructured text data, which are being generated and stored in vast quantities, in product or service design. Finally, to verify the effectiveness of the proposed method, a case study was conducted using movie review data retrieved from a portal website. By comparing the positive rates calculated by the proposed framework with user ratings for movies, a guideline on text-mining-based evaluation of unstructured data is provided.
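
The dictionary-based positive rate mentioned in this abstract might look something like the sketch below. The abstract does not give the exact formula, so the definition used here (positive words divided by all sentiment-bearing words) is an assumption, and the tiny `POSITIVE` and `NEGATIVE` word sets are hypothetical stand-ins for the emotional dictionaries and additional lists used in the study.

```python
import re
from typing import Optional

# Hypothetical mini lexicons; the study's actual dictionaries are not reproduced here.
POSITIVE = {"good", "great", "fun", "moving", "excellent"}
NEGATIVE = {"bad", "boring", "dull", "poor", "terrible"}


def positive_rate(review: str) -> Optional[float]:
    """Share of sentiment-bearing words that are positive; None if no such words occur."""
    words = re.findall(r"[a-z']+", review.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return pos / total if total else None


reviews = [
    "Great cast and a moving story, but the ending was dull.",
    "Boring plot and terrible pacing.",
]
rates = [positive_rate(r) for r in reviews]
print(rates)
```

Rates computed this way could then be set against user ratings for the same movies, as in the case study the abstract describes.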

An International Comparative Study on Home Economics Text Books of Middle School (중학교 가정교과서의 국제비교 연구)

  • 차미경;윤인경
    • Journal of Korean Home Economics Education Association
    • /
    • v.3 no.1
    • /
    • pp.113-129
    • /
    • 1991
  • This study was conducted to compare the outward aspects, objectives, and contents of middle school Home Economics textbooks of Korea, Japan, the U.S.A., and England. The results are summarized as follows. 1. Outward aspects of the textbooks: The Korean textbooks were small in size, and the quality of paper was inferior to that of the foreign textbooks. The Japanese textbooks were written by many authors and contained many lab works and data. The U.S. textbooks were large in size, made with good-quality paper, and contained many colour pictures. The English textbooks contained many problems and lab works. 2. Objectives of Home Economics and unit objectives: The objective of the Home Economics subject was written only in the Korean textbooks. The unit objectives were described most concretely and in most detail in the Korean textbooks compared with those of the other countries. 3. Contents: The Korean textbooks covered all six areas of foods, clothing, housing, home management, family, and occupation, and theoretical explanations prevailed. The Japanese textbooks contained numerous lab works and lacked the two areas of home management and occupation; the contents included a few practical lab works. The U.S. textbooks covered all six areas of Home Economics, with special emphasis placed on self-discovery and self-development, and vocational guidance was also stressed. The English textbook contained only three areas of Home Economics (clothing, foods, and housing); the number of areas was limited, but the basic theories of the covered areas were intended to lead to self-comprehension through questions and lab works.


Study of 'Ji-Qi-Shang-Chong' in Shang-han-lun's 15th Text (상한론(傷寒論) 15조(條)의 '기기상충(其氣上衝)'에 대한 고찰)

  • Lee, Seung-Jun;Kim, Yeong-Mok
    • Journal of Physiology & Pathology in Korean Medicine
    • /
    • v.25 no.6
    • /
    • pp.961-967
    • /
    • 2011
  • This study examines 'Ji-Qi-Shang-Chong(其氣上衝)' in the 15th text of Shang-han-lun("傷寒論"). Shang-han-lun is a basic text on pathology in Traditional Korean Medicine, written by Zhang-Zhong-Jing(張仲景). The text describes many cases of patients with particular symptoms, how to treat them, which herbal medicine to give them, and the side effects of incorrect treatments. Among these cases, the 15th text mentions a symptom called 'Ji-Qi-Shang-Chong(其氣上衝)' but gives no detailed description of it. This study therefore aims to establish the exact meaning of 'Ji-Qi-Shang-Chong(其氣上衝)' in the 15th text by comparing the views of historical medical practitioners and by analyzing the passage in terms of bibliography, pathology, herb pharmacology, herbal medicine, and pharmacology. The bibliographical analysis indicates that this sentence has been transmitted from the original Shang-han-lun written by Zhang-Zhong-Jing(張仲景). The former part of the sentence, "太陽病, 下之後, 其氣上衝者, 可與桂枝湯", corresponds most closely to Zhong-Jing(仲景)'s original, while the latter part may have been emended.

Prediction of Physical Examination Demand Using Text Mining (텍스트 마이닝을 이용한 건강검진 수요 예측)

  • Park, Kyungbo;Kim, Mi Ryang
    • Journal of Information Technology Services
    • /
    • v.21 no.5
    • /
    • pp.95-106
    • /
    • 2022
  • Recently, physical examinations have become an important strategy to reduce costs for individuals and society. Pre-examination counseling is important for an effective physical examination. However, counseling is often incomplete because the demand for physical examinations is not predicted. Therefore, in this study, the demand for physical examinations was predicted using text mining and stepwise regression. The analysis showed that the most recent text data had high explanatory power for the demand for physical examinations. Large amounts of data also had high explanatory power. In addition, a high frequency of the term "health food" was found to reduce the number of physical examination customers, and the higher the frequency of the word "food", the lower the number of physical examination customers. However, when the term "wild ginseng" appeared frequently on Twitter, the number of physical examination customers visiting hospitals increased. In other words, customers consume efficiently by comparing the price of a physical examination with the price of consumer goods. The proposed research framework can help predict demand in other industries.

A WWMBERT-based Method for Improving Chinese Text Classification Task (중국어 텍스트 분류 작업의 개선을 위한 WWMBERT 기반 방식)

  • Wang, Xinyuan;Joe, Inwhee
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2021.05a
    • /
    • pp.408-410
    • /
    • 2021
  • In the NLP field, the pre-trained model BERT, released by the Google team in 2018, has shown remarkable results on a variety of tasks. Subsequently, many variant models have been derived from the original BERT, such as RoBERTa and ERNIEBERT. In this paper, the WWMBERT (Whole Word Masking BERT) model, which is suited to Chinese text tasks, was used as the baseline model of our experiment. The experiment targets the "text-level Chinese text classification task" and combines TAPT (Task-Adaptive Pretraining) with the "Multi-Sample Dropout method" to improve the model, then compares the experimental results. The experimental datasets and the scoring standard are both consistent with the official WWMBERT model, with accuracy as the scoring standard. The official WWMBERT model reports the maximum and average values of multiple experimental runs as its scores: 97.70% (97.50%) on the development set and 97.70% (97.50%) on the test set for the "text-level Chinese text classification task". Compared with these scores, the results in this paper increased by 0.35% (0.5%) on the development set and by 0.31% (0.48%) on the test set. The original baseline model has been significantly improved.

Text Classification Using Parallel Word-level and Character-level Embeddings in Convolutional Neural Networks

  • Geonu Kim;Jungyeon Jang;Juwon Lee;Kitae Kim;Woonyoung Yeo;Jong Woo Kim
    • Asia pacific journal of information systems
    • /
    • v.29 no.4
    • /
    • pp.771-788
    • /
    • 2019
  • Deep learning techniques such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) show better performance in text classification than traditional approaches such as Support Vector Machines (SVMs) and Naïve Bayesian approaches. When using CNNs for text classification tasks, word embedding or character embedding is a step that transforms words or characters into fixed-size vectors before feeding them into convolutional layers. In this paper, we propose a parallel word-level and character-level embedding approach in CNNs for text classification. The proposed approach can capture word-level and character-level patterns concurrently in CNNs. To show the usefulness of the proposed approach, we perform experiments with two English and three Korean text datasets. The experimental results show that character-level embedding works better in Korean and word-level embedding performs well in English. The experimental results also reveal that the proposed approach provides better performance than traditional CNNs with word-level embedding or character-level embedding in both Korean and English documents. A more detailed investigation shows that, compared with the traditional embedding approaches, the proposed approach tends to perform better when the amount of data is relatively small.

Text Region Detection using Edge and Regional Minima/Maxima Transformation from Natural Scene Images (에지 및 국부적 최소/최대 변환을 이용한 자연 이미지로부터 텍스트 영역 검출)

  • Park, Jong-Cheon;Lee, Keun-Wang
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.10 no.2
    • /
    • pp.358-363
    • /
    • 2009
  • Text region detection from natural scene images is used in a variety of applications, and much research is needed in this field. Recent methods detect text regions using algorithms that combine edge-based and connected-component-based approaches. Therefore, this paper proposes a text region detection method that uses an edge and regional minima/maxima transformation algorithm on natural scene images. The method detects the connected components of the edges and the regional minima/maxima and labels them. The labeled regions are analyzed to detect text candidate regions, and the detected text candidates are combined into a single text candidate image. The candidates are validated by comparing the similarity and adjacency of individual characters, after which the final text regions are detected. In experiments, the proposed algorithm improved the accuracy of text region detection by combining the edge and regional minima/maxima connected component detection methods.
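
The connected-component labeling step mentioned in this abstract can be illustrated with a generic sketch. The code below is a plain breadth-first labeling of 4-connected foreground pixels in a small binary mask; it is not the paper's algorithm, the edge and regional minima/maxima transformation that would produce the mask is not shown, and the function name `label_components` and the sample mask are assumptions for the example.

```python
from collections import deque


def label_components(mask):
    """Label 4-connected foreground regions; return (labels, bounding boxes).

    Each bounding box is (top, left, bottom, right) in row/column indices.
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    boxes = {}
    next_label = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not labels[r][c]:
                next_label += 1
                labels[r][c] = next_label
                queue = deque([(r, c)])
                top, left, bottom, right = r, c, r, c
                while queue:
                    y, x = queue.popleft()
                    top, bottom = min(top, y), max(bottom, y)
                    left, right = min(left, x), max(right, x)
                    # Visit the four direct neighbours of the current pixel.
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
                boxes[next_label] = (top, left, bottom, right)
    return labels, boxes


# Hypothetical binary mask standing in for a thresholded edge map.
mask = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 0, 1],
]
labels, boxes = label_components(mask)
print(boxes)
```

The resulting bounding boxes are the kind of per-region data that a later stage could analyze, for example by comparing the size and adjacency of neighbouring regions, to select text candidates.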