• Title/Summary/Keyword: Text data

Analysis of the National Police Agency business trends using text mining (텍스트 마이닝 기법을 이용한 경찰청 업무 트렌드 분석)

  • Sun, Hyunseok;Lim, Changwon
    • The Korean Journal of Applied Statistics / v.32 no.2 / pp.301-317 / 2019
  • Significant research has been conducted on how to discover insights from text data using statistical techniques. In this study, we analyzed text data produced by the Korean National Police Agency to identify yearly trends in police work and to compare work characteristics among local authorities by identifying distinctive keywords in the documents each authority produced. Preprocessing was tailored to the characteristics of each data set, and word frequencies were calculated for each document in order to draw meaningful conclusions. Because simple term frequency within a document does not adequately describe the characteristics of keywords, the frequency of each term was recalculated using term frequency-inverse document frequency (TF-IDF) weights. L2 norm normalization was used to make word frequencies comparable. The analysis can serve as basic data for future policies to improve police work and as a method to improve the efficiency of the police service; it also helps identify demand for improvements in indoor work.
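
The weighting described in this abstract (TF-IDF followed by L2 normalization) can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: whitespace tokenization, the smoothed IDF variant, and the three toy documents are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_l2(docs):
    """Return one {term: weight} dict per document, scaled to unit L2 norm."""
    # Assumption: whitespace tokenization stands in for real preprocessing.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Assumption: smoothed IDF; the abstract does not state which variant was used.
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else weights)
    return vectors


# Hypothetical toy documents, not data from the study.
docs = ["patrol traffic patrol", "traffic accident report", "cyber crime report"]
vectors = tfidf_l2(docs)
```

After normalization every document vector has length 1, so weights can be compared across documents of different sizes; in the first toy document, "patrol" outweighs "traffic" because it is both more frequent there and rarer elsewhere.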

A study on the Elements of Interest for VR Game Users Using Text Mining and Text Network Analysis - Focused on STEAM User Review Data - (텍스트마이닝과 네트워크 분석을 적용한 VR 게임 사용자의 관심 요소 연구 - STEAM 사용자 리뷰 데이터를 중심으로 -)

  • Wui, Min-Young;Na, Ji Young;Park, Young Il
    • Journal of Korea Game Society / v.18 no.6 / pp.69-82 / 2018
  • The need for high-quality VR content has risen steadily in recent years. This study therefore investigated the factors that interest users of VR games, which receive the most attention among VR content. We used STEAM review data and applied text mining and network analysis. As a result, we identified four word clusters related to VR game users, which we named 'presence', 'first person view game', 'auditory factor' and 'interaction'. The study is meaningful in two respects. First, user-related research should be helpful for developing high-quality VR games. Second, it confirms that review data from VR game users can be structured, analyzed and used.

Rural Tourism Image and Major Activity Space in Gochang County Shown in Social Data - Focusing on the Keyword 'Gochang-gun Travel' - (소셜데이터에 나타난 고창군의 농촌관광 이미지와 주요 활동공간 - '고창군 여행' 키워드를 중심으로 -)

  • Kim, Young-Jin;Son, Gwangryul;Lee, Dongchae;Son, Yong-hoon
    • Journal of Korean Society of Rural Planning / v.27 no.3 / pp.103-116 / 2021
  • In this study, the characteristics of the rural tourism image perceived by urban residents were analyzed through text analysis of blog data. To examine the images related to rural tourism, blog posts written with the keyword "Gochang-gun travel" were used. LDA topic analysis, one of the text mining techniques, was used for the analysis. For the tourism image of Gochang-gun, 9 topics were derived, and 112 major places appeared. These were divided into 3 main activities and 5 object spaces through a review of the keywords and the original text of the blog data. The analysis showed that the traditional main resources of the region (Seonun mountain, Seonun temple, and Gochang-eup fortress) formed topics. On the other hand, world heritage sites such as the dolmens and Ungok wetland did not appear as topics. In particular, farms operated by the private sector formed individual topics, so theme farms can be seen as an important resource for tourism in Gochang-gun. The distribution of place keywords also made it possible to understand the characteristics of travel by region and the usage behavior of visitors. In Gochang-gun, visitors were unevenly distributed across regions. This seems to be the result of Gochang-gun seeking to revitalize local tourism by focusing on natural, ecological, and scenic resources. It is necessary to establish a plan for balanced regional development and to develop other types of tourism resources. This study is distinctive in that it identified the types and characteristics of rural tourism images in the region as perceived by visitors, as well as the status of tourism at the regional level.

Single Shot Detector for Detecting Clickable Object in Mobile Device Screen (모바일 디바이스 화면의 클릭 가능한 객체 탐지를 위한 싱글 샷 디텍터)

  • Jo, Min-Seok;Chun, Hye-won;Han, Seong-Soo;Jeong, Chang-Sung
    • KIPS Transactions on Software and Data Engineering / v.11 no.1 / pp.29-34 / 2022
  • We propose a novel network architecture and build a dataset for recognizing clickable objects on mobile device screens. The data were collected from clickable objects on mobile device screens of numerous resolutions, and a total of 24,937 annotations were divided into seven categories: text, edit text, image, button, region, status bar, and navigation bar. We use the Deconvolution Single Shot Detector as a baseline and combine a backbone network with Squeeze-and-Excitation blocks, the Single Shot Detector layer structure to derive inference results, and the Feature Pyramid Network structure. We also extract features efficiently by changing the input resolution of the network from the existing 1:1 ratio to a 1:2 ratio similar to a mobile device screen. In experiments on the dataset we built, the mean average precision improved by up to 101% compared to the baseline.

A Case Study on Text Analysis Using Meal Kit Product Review Data (밀키트 제품 리뷰 데이터를 이용한 텍스트 분석 사례 연구)

  • Choi, Hyeseon;Yeon, Kyupil
    • The Journal of the Korea Contents Association / v.22 no.5 / pp.1-15 / 2022
  • In this study, text analysis was performed on meal kit product review data to identify factors affecting the evaluation of meal kit products. The data were collected by scraping 334,498 reviews of meal kit products from the Naver shopping site. After preprocessing the text data, word clouds and sentiment analyses based on word frequency and normalized TF-IDF were produced. A logistic regression model was applied to predict the polarity of reviews of meal kit products. From the logistic regression models derived for each product category, the main factors that caused positive and negative emotions were identified. As a result, it was verified that text analysis can be a useful tool that provides a basis for maximizing positive factors for a specific category, menu, and ingredient and for removing negative risk factors when developing a meal kit product.
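
The idea of reading polarity factors off a logistic regression model can be illustrated as follows. This is a hedged sketch only: it uses binary word-presence features and plain stochastic gradient descent rather than the normalized TF-IDF features and fitting procedure of the study, and the four toy reviews and the names `train_logreg` and `predict` are hypothetical.

```python
import math


def train_logreg(samples, labels, epochs=200, lr=0.5):
    """Fit per-word weights and a bias with stochastic gradient descent."""
    vocab = sorted({word for text in samples for word in text.split()})
    weights = {word: 0.0 for word in vocab}
    bias = 0.0
    for _ in range(epochs):
        for text, y in zip(samples, labels):
            words = set(text.split())
            z = bias + sum(weights[w] for w in words)
            p = 1.0 / (1.0 + math.exp(-z))
            err = y - p  # gradient of the log-likelihood with respect to z
            bias += lr * err
            for w in words:
                weights[w] += lr * err
    return weights, bias


def predict(text, weights, bias):
    """Return the estimated probability that a review is positive."""
    # Words not seen during training contribute nothing.
    z = bias + sum(weights.get(w, 0.0) for w in set(text.split()))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toy reviews (1 = positive, 0 = negative), not data from the study.
reviews = ["fresh tasty", "tasty easy", "late spoiled", "spoiled bland"]
polarity = [1, 1, 0, 0]
weights, bias = train_logreg(reviews, polarity)
```

Words with large positive weights push a review toward the positive class and words with large negative weights push it toward the negative class, which is how per-category models can surface candidate positive and negative factors.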

The Effect of Text Information Frame Ratio and Font Size on the Text Readability of Circle Smartwatch

  • Park, Seungtaek;Park, Jaekyu;Choe, Jaeho;Jung, Eui S.
    • Journal of the Ergonomics Society of Korea / v.33 no.6 / pp.499-513 / 2014
  • Objective: The objective of this study was to examine the frame ratio of text information and the font size for a circle smartwatch. Background: Electronics manufacturers have recently tried to carry the original metaphor of the traditional (circular) wristwatch over to the smartwatch. They are moving away from the square display in order to improve emotional customer satisfaction. Method: The experiments examined twenty levels of text information design, combining four frame ratios (1:1, 4:3, 16:9, 21:9) and five font sizes (6pt, 7pt, 8pt, 9pt, 10pt). Nineteen participants volunteered for the experiment. Dependent variables were WPM (Words per Minute), reading preference, design preference and total preference. The small circle display was built using the data of a circle display (1.3inch) exhibited at IFA (Internationale Funkausstellung) 2014. Results: ANOVA (Analysis of Variance) revealed that frame ratio and font size affected WPM and task time preference. ANOVA results for reading preference, design preference and total preference were grouped by post-hoc LSD (Least Significant Difference) analysis. Among users, the 16:9 and 21:9 display ratios and the 9pt font size were preferred. In conclusion, the 16:9 display ratio and 9pt font size are suitable for text information on a 1.3inch circle display. Conclusion: The study shows that the 16:9 display ratio and 9pt font size are more suitable than the alternatives for text information on a 1.3inch circle display. This is mainly because the order of frame ratio and font size may affect the usability of reading long text information on a small circle display. Therefore, when developers design a circle display, the square frame ratio and font size should be considered according to the circle size. Application: The 16:9 display ratio and 9pt font size may be used as a text information frame in circle display design guidelines for smartwatches.

Automatic Quality Evaluation with Completeness and Succinctness for Text Summarization (완전성과 간결성을 고려한 텍스트 요약 품질의 자동 평가 기법)

  • Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.125-148 / 2018
  • Recently, as the demand for big data analysis increases, cases of analyzing unstructured data and using the results are also increasing. Among the various types of unstructured data, text is used as a means of communicating information in almost all fields. In addition, many analysts are interested in text because the amount of data is very large and it is relatively easy to collect compared to other unstructured and structured data. Among the various text analysis applications, the following have been actively studied: document classification, which classifies documents into predetermined categories; topic modeling, which extracts major topics from a large number of documents; sentiment analysis or opinion mining, which identifies emotions or opinions contained in texts; and text summarization, which summarizes the main contents of one or several documents. In particular, text summarization is actively applied in business through news summary services, privacy policy summary services, etc. In addition, much research has been done in academia on the extraction approach, which selectively provides the main elements of the document, and the abstraction approach, which extracts elements of the document and composes new sentences by combining them. However, techniques for evaluating the quality of automatically summarized documents have not progressed as much as techniques for automatic text summarization. Most existing studies on the quality evaluation of summarization manually summarized documents, used these summaries as reference documents, and measured the similarity between the automatic summary and the reference document. Specifically, automatic summarization is performed on the full text through various techniques, and the result is compared with the reference document, which is an ideal summary, to measure the quality of the automatic summarization.
Reference documents are provided in two major ways. The most common way is manual summarization, in which a person creates an ideal summary by hand. Since this method requires human intervention in preparing the summary, writing the summary takes a lot of time and cost, and the evaluation result may differ depending on who performs the summarization. To overcome these limitations, attempts have been made to measure the quality of summary documents without human intervention. As a representative attempt, a method was recently devised that reduces the size of the full text and measures the similarity between the reduced full text and the automatic summary. In this method, the more often the frequent terms of the full text appear in the summary, the better the quality of the summary is judged to be. However, since summarization essentially means reducing a large amount of content while minimizing content omissions, it is unreasonable to say that a "good summary" based only on frequency is always a "good summary" in the essential sense. To overcome the limitations of this previous approach to summarization evaluation, this study proposes an automatic quality evaluation method for text summarization based on the essential meaning of summarization. Specifically, succinctness is defined as an element indicating how little content is duplicated among the sentences of the summary, and completeness is defined as an element indicating how little of the content is missing from the summary. In this paper, we propose a method for automatic quality evaluation of text summarization based on the concepts of succinctness and completeness.
To evaluate the practical applicability of the proposed methodology, 29,671 sentences were extracted from TripAdvisor's hotel reviews, the reviews were summarized for each hotel, and the quality of the summaries was evaluated according to the proposed methodology. The paper also provides a way to integrate completeness and succinctness, which are in a trade-off relationship, into an F-score, and proposes a method to perform optimal summarization by changing the sentence-similarity threshold.
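
One possible reading of the completeness, succinctness and F-score idea is sketched below. The abstract does not give the exact formulas, so everything here is an assumption: Jaccard word overlap as the sentence similarity, the coverage and non-duplication ratios, the harmonic-mean F-score, the 0.5 threshold, and the toy sentences are all illustrative choices rather than the authors' definitions.

```python
def jaccard(a, b):
    """Word-overlap similarity between two sentences (assumed measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def completeness(source, summary, threshold):
    """Share of source sentences matched by at least one summary sentence."""
    covered = sum(
        1 for s in source if any(jaccard(s, t) >= threshold for t in summary)
    )
    return covered / len(source)


def succinctness(summary, threshold):
    """Share of summary sentences that do not repeat an earlier summary sentence."""
    unique = sum(
        1
        for i, t in enumerate(summary)
        if all(jaccard(t, u) < threshold for u in summary[:i])
    )
    return unique / len(summary)


def f_score(c, s):
    """Harmonic mean, balancing the trade-off between the two measures."""
    return 2 * c * s / (c + s) if c + s else 0.0


# Hypothetical toy review sentences; both lists are assumed to be non-empty.
source = ["the room was clean", "the staff was friendly", "breakfast was cold"]
summary = ["the room was clean", "the room was very clean"]
c = completeness(source, summary, threshold=0.5)
s = succinctness(summary, threshold=0.5)
score = f_score(c, s)
```

Raising or lowering the similarity threshold changes which sentences count as covered or duplicated, so sweeping the threshold and keeping the value with the highest F-score is one way to picture the optimization the abstract mentions.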

Multidimensional Analysis of XML Documents using XML Cubes (XML 큐브를 이용한 다차원 XML 문서 분석)

  • Park, Byung-Kwon
    • Proceedings of the Korea Association of Information Systems Conference / 2005.05a / pp.65-78 / 2005
  • Nowadays, large numbers of XML documents are available on the Internet. Thus, we need to analyze them multi-dimensionally in the same way as relational data. In this paper, we propose a new framework for multidimensional analysis of XML documents, which we call XML-OLAP. We base XML-OLAP on XML warehouses, where all fact data as well as dimension data are stored as XML documents. We build XML cubes from XML warehouses. We propose a new multidimensional expression language for XML cubes, which we call XML-MDX. XML-MDX statements target XML cubes and use XQuery expressions to designate the measure data. They specify text mining operators for aggregating the text that constitutes the measure data. We evaluate XML-OLAP by applying it to a U.S. patent XML warehouse. We use XML-MDX queries, which demonstrate that XML-OLAP is effective for multi-dimensionally analyzing the U.S. patents.

A study for system design that guarantees the integrity of computer files based on blockchain and checksum

  • Kim, Minyoung
    • International Journal of Advanced Culture Technology / v.9 no.4 / pp.392-401 / 2021
  • When a data file is shared through various methods on the Internet, the file may be damaged in various ways. To guard against this, some websites publish the checksum value of the download target file as text. The checksum value of the downloaded file is then compared with the published checksum value. If they are the same, the file is regarded as identical to the original. However, a checksum value provided in text form is easily tampered with by an attacker. If the correct checksum cannot be verified, the reliability and integrity of the data file cannot be ensured. In this paper, a checksum value is generated to ensure the integrity and reliability of a data file, and this value and related file information are stored in the blockchain. We then introduce the design and implementation of a system that shares the checksum value stored in the blockchain and compares it with other people's files.
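
The general idea (store a file checksum in a tamper-evident chain, then compare it with the checksum of a downloaded copy) can be sketched with the standard library. This is not the system designed in the paper: SHA-256, the in-memory list standing in for a blockchain, the record layout, and the helper names `add_record`, `chain_is_valid` and `file_matches` are all assumptions for illustration.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder previous-hash for the first record


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def record_hash(rec) -> str:
    """Hash the record body so later changes to it become detectable."""
    body = {"file": rec["file"], "checksum": rec["checksum"], "prev": rec["prev"]}
    return sha256_hex(json.dumps(body, sort_keys=True).encode())


def add_record(chain, filename, data):
    """Append a record holding the file checksum, linked to the previous record."""
    rec = {
        "file": filename,
        "checksum": sha256_hex(data),
        "prev": chain[-1]["hash"] if chain else GENESIS,
    }
    rec["hash"] = record_hash(rec)
    chain.append(rec)
    return rec


def chain_is_valid(chain) -> bool:
    """Check every link and every record hash."""
    prev = GENESIS
    for rec in chain:
        if rec["prev"] != prev or rec["hash"] != record_hash(rec):
            return False
        prev = rec["hash"]
    return True


def file_matches(chain, filename, data) -> bool:
    """Compare a downloaded file's checksum with the stored one."""
    for rec in chain:
        if rec["file"] == filename:
            return hmac.compare_digest(rec["checksum"], sha256_hex(data))
    return False


# Hypothetical usage with in-memory bytes instead of real files.
chain = []
add_record(chain, "report.pdf", b"original contents")
intact = file_matches(chain, "report.pdf", b"original contents")
corrupted = file_matches(chain, "report.pdf", b"modified contents")
```

Because each record's hash covers its checksum and the previous record's hash, overwriting a stored checksum makes `chain_is_valid` fail, which is the property that a plain-text checksum on a web page lacks.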

Evaluation of Similarity Analysis of Newspaper Article Using Natural Language Processing

  • Ayako Ohshiro;Takeo Okazaki;Takashi Kano;Shinichiro Ueda
    • International Journal of Computer Science & Network Security / v.24 no.6 / pp.1-7 / 2024
  • Comparing text features involves evaluating the "similarity" between texts, so it is crucial to use appropriate similarity measures. This study used various techniques to assess the similarities between newspaper articles, including deep learning and a previously proposed method: a combination of Pointwise Mutual Information (PMI) and Word Pair Matching (WPM), denoted as PMI+WPM. For performance comparison, law data from medical research in Japan were used as validation data in evaluating the PMI+WPM method. The comparative analysis revealed that the distribution of similarities in text data varies depending on the evaluation technique and genre. For newspaper data, non-deep learning methods demonstrated better similarity evaluation accuracy than deep learning methods. Additionally, evaluating similarities in law data is more challenging than in newspaper articles. Although deep learning is the prevalent method for evaluating textual similarities, this study demonstrates that non-deep learning methods can be effective for Japanese-based texts.
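
Pointwise Mutual Information, one half of the PMI+WPM method named above, has a standard definition: PMI(x, y) = log2(p(x, y) / (p(x) p(y))). The sketch below computes it from sentence-level co-occurrence. The co-occurrence window (one sentence), the whitespace tokenization and the toy sentences are assumptions; the Word Pair Matching step is not described in the abstract and is not sketched here.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(sentences):
    """Return {(word_a, word_b): PMI} for every pair that co-occurs in a sentence."""
    n = len(sentences)
    word_count = Counter()
    pair_count = Counter()
    for sentence in sentences:
        # Count each word once per sentence; pairs are alphabetically ordered tuples.
        words = sorted(set(sentence.lower().split()))
        word_count.update(words)
        pair_count.update(combinations(words, 2))
    # p(a, b) / (p(a) * p(b)) simplifies to count_ab * n / (count_a * count_b).
    return {
        (a, b): math.log2(c * n / (word_count[a] * word_count[b]))
        for (a, b), c in pair_count.items()
    }


# Hypothetical toy sentences, not data from the study.
sentences = ["court ruling appeal", "court ruling", "weather forecast"]
pmi = pmi_table(sentences)
```

A pair that appears together more often than its individual frequencies would predict gets a positive score, so in the toy data "forecast" and "weather", which only ever occur together, score higher than "court" and "ruling".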