• Title/Summary/Keyword: Text comparing

Search Results: 270

Location Inference of Twitter Users using Timeline Data (타임라인데이터를 이용한 트위터 사용자의 거주 지역 유추방법)

  • Kang, Ae Tti;Kang, Young Ok
    • Spatial Information Research
    • /
    • v.23 no.2
    • /
    • pp.69-81
    • /
    • 2015
  • If the residential areas of SNS users can be inferred by analyzing SNS big data, this can offer an alternative for spatial big data research that suffers from location sparsity and ecological error. In this study, we developed a method for inferring the residential areas of Twitter users from the daily-life activity patterns found in their timeline data. We identified these patterns from users' movement patterns and from the regional cognition words that users write in tweets; the corresponding models are named the daily movement pattern model and the daily activity field model, respectively. We then selected the variables to be used in each model. The dependent variable was defined as 0 if the area from which a user mainly tweets is his or her home location (HL), and as 1 otherwise. According to our results from discriminant analysis, the hit ratios of the two models were 67.5% and 57.5%, respectively. We tested both models on the timeline data of stress-related tweets. As a result, we inferred the residential areas of 5,301 of 48,235 users and obtained 9,606 stress-related tweets with a residential area, about 44 times the count of geo-tagged tweets. We think the methodology used in this study can serve not only to secure more location data in SNS big data studies, but also to link SNS big data with regional statistics for analyzing regional phenomena.
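
The evaluation step above (a binary home-location label scored by hit ratio) can be sketched as follows; the labels and predictions are illustrative, not data from the study.

```python
# Minimal sketch of the hit-ratio evaluation: each user gets a binary
# label (0 = tweets mainly from the home location, 1 = otherwise), and
# the hit ratio is the share of users predicted correctly.

def hit_ratio(y_true, y_pred):
    """Fraction of users whose residential-area label was predicted correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# hypothetical labels and model predictions for 8 users
labels      = [0, 0, 1, 0, 1, 1, 0, 1]
predictions = [0, 1, 1, 0, 1, 0, 0, 1]

print(f"hit ratio: {hit_ratio(labels, predictions):.1%}")  # 6 of 8 correct
```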

Discussion of the procedures and contents of Gangneung Danoje as a county festival (고을축제로서 강릉단오제의 절차와 내용에 대한 검토)

  • Han, Yang-Myong
    • (The) Research of the performance art and culture
    • /
    • no.18
    • /
    • pp.563-598
    • /
    • 2009
  • Gangneung Danoje is a local festival that originated in the county festivals handed down from premodern society. It was designated an important intangible cultural asset in 1966 and has been appreciated as a representative traditional festival of Korea since UNESCO designated it as 'the Oral and Intangible Heritage of Humanity' in 2005. Generally, it is known as a festival that keeps up the premodern tradition. However, if we pay attention to the texts of the festival performed in Gangneung today, we can see that it differs in framework and contents from the festival performed in the 19th century. I think this change is a result of cultural adaptation to a changed situation of transmission; in particular, today's festival texts are the result of the pursuit, restoration, and reproduction of its traditional form so that it could be designated as a cultural asset. In this paper, after grasping the traditional form of Gangneung Danoje from the extant data related to it, I compare its traditional text with the existing text reconstructed at the time it was designated an important intangible cultural asset. To do this, I verified the composition of a county festival by investigating the general aspects of county festivals in the Chosŏn dynasty, brought out the aspects of change by comparing the existing text with the procedures and contents of Gangneung Danoje before the Japanese occupation, and discussed some of these changes. As a result, I ascertained that the present Gangneung Danoje differs greatly from the festival transmitted in premodern society in its structure, in the time and space of the festival, and in the contents of the performance comprising the procedures of meeting the god, seeing the god off, and entertaining the god.

Analyzing the Trend of False·Exaggerated Advertisement Keywords Using Text-mining Methodology (1990-2019) (텍스트마이닝 기법을 활용한 허위·과장광고 관련 기사의 트렌드 분석(1990-2019))

  • Kim, Do-Hee;Kim, Min-Jeong
    • The Journal of the Korea Contents Association
    • /
    • v.21 no.4
    • /
    • pp.38-49
    • /
    • 2021
  • This study analyzed trends in the term 'false and exaggerated advertisement' across 5,141 newspaper articles from 1990 to 2019 using text-mining methodology. First, we identified the most frequent keywords of false and exaggerated advertisements through frequency analysis of all articles and examined the context linking the extracted keywords. Next, to examine how false and exaggerated advertisements have changed, we performed frequency analysis on the articles separated into 10-year periods and identified the tendencies of keywords at issue by comparing the number of academic papers on each period's top keywords. Finally, we identified trends in false and exaggerated advertisements based on the detailed keywords within each topic using topic modeling. The results confirmed that topics at issue at specific times were extracted as frequent keywords, and that keyword trends by period changed in connection with social and environmental factors. This study is meaningful in helping consumers spend wisely by cultivating background knowledge about unfair advertising. Furthermore, the core keyword extraction is expected to convey the true purpose of advertising and deliver its implications to companies and employees engaged in misconduct.
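
The per-period frequency analysis described above can be sketched roughly as follows; the articles and keyword tokens are hypothetical placeholders.

```python
from collections import Counter

# Sketch of per-decade keyword frequency analysis: articles are bucketed
# into 10-year windows and the most frequent tokens are reported per window.

articles = [  # (year, extracted keyword tokens) -- illustrative data
    (1992, ["health", "food", "false", "ad"]),
    (1998, ["food", "ad", "tv"]),
    (2005, ["internet", "ad", "false"]),
    (2013, ["online", "review", "false", "ad"]),
]

def top_keywords_by_decade(docs, k=2):
    buckets = {}
    for year, tokens in docs:
        decade = (year // 10) * 10
        buckets.setdefault(decade, Counter()).update(tokens)
    return {d: [w for w, _ in c.most_common(k)] for d, c in buckets.items()}

print(top_keywords_by_decade(articles))
```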

A Study on Comparison of Later Commentaries about Kyeokguk theory of Jeokcheonsu (『적천수(滴天髓)』 격국론의 후대 평주 간 비교연구)

  • Yi, Bo-young;Kim, Ki-Seung
    • Industry Promotion Research
    • /
    • v.7 no.1
    • /
    • pp.81-87
    • /
    • 2022
  • This study compares and analyzes various editions of Jeokcheonsu, aiming to determine why different views have arisen in commentaries that diverge from the perspective of a single original text, and which interpretation among them is more valid. The largest source of misunderstanding of Myeongri theory in Jeokcheonsu is its Kyeokguk theory. Jeokcheonsu does not set a high value on Kyeokguk, and it is highly regarded as a Myeongri classic that emphasizes Eokbuyongsin. However, classifying the original text by theory shows about 5 sentences that directly mention Eokbu theory, but 9 sentences that explain Kyeokguk theory, and 15 if sentences explaining Jonggyeok and Hwagyeok are included. Given that metaphorical language predominates, it is also clear that the book was not written to be read by a beginner in Myeongri; it is a Myeongri text written to convey more profound logic and enlightenment to readers who have already learned the principles of Myeongri. A single sentence, 'Jaegwaninsubunpyeonjeong Gyeomronsiksanggyeokgukjeong', would have sufficed to explain the Kyeokguk theory, because the text is written on the assumption of the reader's level. Among the later commentaries on the Myeongri theory contained in Jeokcheonsu, four commentators' notes on the original passages 'Palkyeok', 'Gwansal', 'Sangkwan', 'Wolryeong', 'Saengsi', and 'Cheongtak' related to Kyeokguk theory were compared and analyzed.

A Study on Dataset Generation Method for Korean Language Information Extraction from Generative Large Language Model and Prompt Engineering (생성형 대규모 언어 모델과 프롬프트 엔지니어링을 통한 한국어 텍스트 기반 정보 추출 데이터셋 구축 방법)

  • Jeong Young Sang;Ji Seung Hyun;Kwon Da Rong Sae
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.11
    • /
    • pp.481-492
    • /
    • 2023
  • This study explores how to build a Korean dataset for extracting information from text using generative large language models. In modern society, mixed information circulates rapidly, and categorizing and extracting it effectively is crucial to decision-making; however, Korean datasets for training are still scarce. To overcome this, this study extracts information via text-based zero-shot learning with a generative large language model in order to build a purpose-built Korean dataset. The language model is instructed to produce the desired result through prompt engineering in the form "system"-"instruction"-"source input"-"output format", and the dataset is built by exploiting the model's in-context learning through the input sentences. We validate the approach by comparing the generated dataset with an existing benchmark dataset, achieving 25.47% higher performance than the KLUE-RoBERTa-large model on the relation extraction task. These results demonstrate the feasibility of extracting knowledge elements from Korean text and are expected to contribute to AI research. Furthermore, this methodology can be applied to various fields and purposes and holds potential for building diverse Korean datasets.
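
The four-part prompt layout the abstract describes ("system"-"instruction"-"source input"-"output format") might look something like the sketch below; the field wording and the example sentence are illustrative, not the paper's actual prompts.

```python
import json

# Illustrative four-part prompt for zero-shot relation extraction,
# following the "system"-"instruction"-"source input"-"output format" layout.

def build_extraction_prompt(source_text: str) -> list:
    return [
        {"role": "system",
         "content": "You are an information-extraction assistant for Korean text."},
        {"role": "user",
         "content": (
             "Instruction: extract all (subject, relation, object) triples.\n"
             f"Source input: {source_text}\n"
             'Output format: JSON list of {"subject": ..., "relation": ..., "object": ...}'
         )},
    ]

prompt = build_extraction_prompt("세종대왕은 한글을 창제했다.")
print(json.dumps(prompt, ensure_ascii=False, indent=2))
```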

Efficient Topic Modeling by Mapping Global and Local Topics (전역 토픽의 지역 매핑을 통한 효율적 토픽 모델링 방안)

  • Choi, Hochang;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.3
    • /
    • pp.69-94
    • /
    • 2017
  • Recently, growing demand for big data analysis has driven the vigorous development of related technologies and tools. At the same time, advances in IT and the rising penetration rate of smart devices are producing large amounts of data. Data analysis technology is consequently becoming popular rapidly, and attempts to acquire insights through data analysis continue to increase, which means that big data analysis will become more important across industries for the foreseeable future. Big data analysis has generally been performed by a small number of experts and delivered to each party requesting the analysis. However, rising interest in big data analysis has spurred computer programming education and the development of many data analysis programs. Accordingly, the entry barriers to big data analysis are gradually falling and the technology is spreading, so big data analysis is increasingly expected to be performed by those who need it themselves. Along with this, interest in various kinds of unstructured data, and in text data especially, continues to grow. The emergence of new web-based platforms and techniques has brought about the mass production of text data and active attempts to analyze it, and the results of text analysis are being utilized in various fields. Text mining is a concept embracing various theories and techniques for text analysis. Among the many text mining techniques used for various research purposes, topic modeling is one of the most widely used and studied. Topic modeling extracts the major issues from a large set of documents, identifies the documents corresponding to each issue, and provides the identified documents as a cluster. It is evaluated as a very useful technique in that it reflects the semantic elements of documents.
Traditional topic modeling is based on the distribution of key terms across the entire document set, so the entire set must be analyzed at once to identify the topic of each document. As a result, analysis takes a long time when topic modeling is applied to many documents, and there is a scalability problem: processing time increases exponentially with the number of analysis objects. This problem is particularly noticeable when the documents are distributed across multiple systems or regions. To overcome these problems, a divide-and-conquer approach can be applied to topic modeling: a large number of documents is divided into sub-units, and topics are derived by repeating topic modeling on each unit. This method allows topic modeling on a large number of documents with limited system resources and can improve processing speed. It can also significantly reduce analysis time and cost, since documents can be analyzed at each location without first being combined. Despite these advantages, however, the method has two major problems. First, the relationship between the local topics derived from each unit and the global topics derived from the entire document set is unclear: local topics can be identified within each unit, but global topics cannot. Second, a method for measuring the accuracy of such an approach needs to be established; that is, assuming the global topics are the ideal answer, the deviation of the local topics from the global topics must be measured. Because of these difficulties, this approach has been studied far less than other lines of topic modeling research. In this paper, we propose a topic modeling approach that solves these two problems.
First, we divide the entire document cluster (global set) into sub-clusters (local sets) and generate a reduced global set (RGS) consisting of delegate documents extracted from each local set. We address the first problem by mapping RGS topics to local topics. We then verify the accuracy of the proposed methodology by detecting whether documents are assigned to the same topic in the global and local results. Using 24,000 news articles, we conducted experiments to evaluate the practical applicability of the proposed methodology. An additional experiment confirmed that the proposed methodology provides results similar to topic modeling on the entire set, and we also proposed a reasonable method for comparing the results of the two approaches.
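
The local-to-global topic mapping described above can be sketched as a nearest-neighbor match on topic-word weight vectors; the vectors and topic names below are illustrative, not the paper's actual model output.

```python
import math

# Each topic is represented by a weight vector over a shared vocabulary;
# every local-set topic is mapped to the global (RGS) topic with the
# highest cosine similarity.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# rows = topics, columns = word weights over a 3-word vocabulary (hypothetical)
global_topics = {"G0": [0.9, 0.1, 0.0], "G1": [0.0, 0.2, 0.8]}
local_topics  = {"L0": [0.8, 0.2, 0.0], "L1": [0.1, 0.1, 0.9]}

mapping = {
    name: max(global_topics, key=lambda g: cosine(vec, global_topics[g]))
    for name, vec in local_topics.items()
}
print(mapping)  # each local topic paired with its closest global topic
```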

An Implementation of Car Navigation System using NFC (NFC를 활용한 자동차 내비게이션 시스템의 구현)

  • Shin, Yejin;Seol, Soonuk
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.18 no.5
    • /
    • pp.1194-1200
    • /
    • 2014
  • One of the problems with car navigation systems is that a route search requires many inputs, which can even cause accidents due to reduced concentration while driving. To address this, we design and implement a safe and easy-to-use car navigation system that allows a driver to look up directions with a single tap. Our smartphone app creates an NFC tag by deriving location information from text, voice, GPS, and so on. The driver can then get directions simply by tapping a phone or card carrying the tag on the navigation system. We show the feasibility and effectiveness of our system by comparing our implementation with existing car navigation systems.
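
The tag-creation step described above (deriving location information and writing it for one-tap lookup) might be sketched as follows; the geocoder stub and the choice of payload are assumptions, though the `geo:` URI scheme itself is a standard way to encode a point (RFC 5870).

```python
# Hypothetical geocoder lookup standing in for the app's text/voice/GPS handling
GEOCODE_STUB = {"Seoul Station": (37.5547, 126.9706)}

def make_nav_payload(destination_text):
    """Serialize a destination as a geo: URI a navigation app can open on tap."""
    lat, lon = GEOCODE_STUB[destination_text]
    return f"geo:{lat},{lon}?q={destination_text.replace(' ', '+')}"

print(make_nav_payload("Seoul Station"))
```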

A Review of the Opinion Target Extraction using Sequence Labeling Algorithms based on Features Combinations

  • Aziz, Noor Azeera Abdul;Maarof, Mohd Aizaini;Zainal, Anazida;Alkawaz, Mohammed Hazim
    • Journal of Internet Computing and Services
    • /
    • v.17 no.5
    • /
    • pp.111-119
    • /
    • 2016
  • In recent years, opinion analysis has been one of the key research fronts in many domains, and opinion target extraction is an essential step within it. The target usually refers to the noun or noun phrase denoting the entity discussed by the opinion holder. Extracting the opinion target makes opinion analysis more precise and additionally helps identify opinion polarity, so that users can perceive opinions on a target in detail, including all its features. One of the most commonly employed algorithms is a sequence labeling algorithm, Conditional Random Fields. This article reviews recent opinion target extraction approaches based on sequence labeling algorithms and their feature combinations, analyzing and comparing these approaches. A good selection of feature combinations tends to yield better accuracy. Feature combination is an essential process for identifying and removing unneeded, irrelevant, and redundant attributes that do not contribute to, or may even decrease, the accuracy of a predictive model. Hence, this review contributes to opinion analysis in general and assists researchers working on opinion target extraction in particular.
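
One of the feature combinations such reviews discuss, lexical plus POS plus left-context features fed to a CRF, can be sketched as follows; the feature names and the example sentence are illustrative.

```python
# Per-token feature extraction of the kind a CRF sequence labeler consumes.

def token_features(tokens, pos_tags, i):
    """Features for token i: lexical + POS + left-context (one combination)."""
    word, pos = tokens[i], pos_tags[i]
    feats = {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "pos": pos,
        "pos.is_noun": pos.startswith("NN"),  # opinion targets are usually nouns
    }
    if i > 0:
        feats["-1:word.lower"] = tokens[i - 1].lower()
        feats["-1:pos"] = pos_tags[i - 1]
    else:
        feats["BOS"] = True  # beginning of sentence
    return feats

sent = ["The", "battery", "life", "is", "great"]
pos  = ["DT", "NN", "NN", "VBZ", "JJ"]
features = [token_features(sent, pos, i) for i in range(len(sent))]
print(features[1])
```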

Closed loop type MCV(Main Control Valve) for Hydraulic Excavator (유압 굴삭기용 폐루프 타입 MCV(Main Control Valve))

  • Lim T.H.;Lee H.S.;Yang S.Y.
    • Proceedings of the Korean Society of Precision Engineering Conference
    • /
    • 2005.06a
    • /
    • pp.864-870
    • /
    • 2005
  • Hydraulic excavators are popular in the construction field because of their multi-functionality and economic efficiency. Mathematical models of excavators contain many nonlinearities, such as the nonlinear opening characteristics and dead zone of the main control valve (MCV) and oil temperature variation. The objective of this paper is to develop a simulator for a hydraulic excavator using AMESim. Components and the whole circuit are expressed graphically, while parameters and nonlinear characteristics are entered in text form. The simulation results show that a fixed MCV spring stiffness cannot achieve accurate spool displacement across the whole range of P-Q diagrams. This paper therefore proposes a closed-loop MCV containing a proportional gain, which can reduce the displacement error. The capability of the closed-loop MCV is verified by comparison with the conventional MCV using the AMESim simulator. The simulator can be used to forecast excavator behavior when new components, new mechanical attachments, hydraulic circuit changes, or new control algorithms are applied, and could serve as a development platform for various new excavators.
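
The closed-loop idea, proportional feedback driving spool displacement toward its reference rather than relying on a fixed spring stiffness, can be illustrated with a toy first-order simulation (not the paper's AMESim model; all numbers are assumptions).

```python
# Toy proportional closed loop: the spool displacement x is repeatedly
# corrected by gain * error, so it converges to the reference, whereas an
# open-loop spool settles wherever the fixed spring puts it.

def simulate(reference, gain, steps=200, dt=0.01):
    x = 0.0  # spool displacement
    for _ in range(steps):
        error = reference - x
        x += gain * error * dt  # first-order response under proportional control
    return x

open_loop = 0.7  # hypothetical steady state with fixed spring stiffness only
closed_loop = simulate(reference=1.0, gain=20.0)

print(abs(1.0 - open_loop), abs(1.0 - closed_loop))  # closed-loop error is smaller
```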

A Study on Informetric Analysis for Measuring the Qualitative Research Performance (연구성과의 질적 평가를 위한 계량정보학적 분석에 관한 연구)

  • Kang, Dae-Shin;Moon, Sung-Been
    • Journal of the Korean Society for information Management
    • /
    • v.26 no.3
    • /
    • pp.377-394
    • /
    • 2009
  • Existing bibliometric methods have limitations in satisfying the various requests of interested parties, including researchers, managers, and policy makers, who want to identify 1) which research group or researcher is the key player, and the overall trends, in particular technological sub-fields; 2) which research groups, institutions, or countries mainly use their research outputs; 3) what spin-offs flow from research outputs into scientific and technological fields; and 4) where they stand when comparing their quantitative and qualitative research outputs with those of competing institutions. It is therefore essential to develop new informetric indicators and methodologies to satisfy stakeholders' various demands and to strengthen qualitative analysis in measuring research performance. This study suggests informetric indicators such as an article quality index, a citation impact index, an international cooperation index, and an excellent article production index, along with methodologies including citation analysis and text mining.
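
One possible form of a citation impact index like the one suggested above is mean citations per paper normalized by a field baseline; the exact formula in the paper may differ, and the numbers below are illustrative.

```python
# Citation impact index: an institution's mean citations per paper
# relative to the field average (1.0 = exactly average for the field).

def citation_impact_index(citations, field_mean_citations):
    """Mean citations per paper divided by the field-wide mean."""
    mean = sum(citations) / len(citations)
    return mean / field_mean_citations

inst_citations = [12, 3, 0, 25, 10]  # hypothetical per-paper citation counts
print(round(citation_impact_index(inst_citations, field_mean_citations=8.0), 2))
```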