• Title/Summary/Keyword: Text Frequency Analysis (텍스트 빈도 분석)


Maritime Safety Tribunal Ruling Analysis using SentenceBERT (SentenceBERT 모델을 활용한 해양안전심판 재결서 분석 방법에 대한 연구)

  • Bori Yoon;SeKil Park;Hyerim Bae;Sunghyun Sim
    • Journal of the Korean Society of Marine Environment & Safety / v.29 no.7 / pp.843-856 / 2023
  • The global surge in maritime traffic has increased the number of ship collisions, causing significant economic, environmental, physical, and human damage. The causes of these accidents are multifaceted, often arising from a combination of crew judgment errors, negligence, complex navigation routes, weather conditions, and technical deficiencies in the vessels. Because each incident carries intricate nuances and contextual information, a methodology capable of capturing the semantics and context of sentences is needed. Accordingly, this study used the SentenceBERT model to analyze maritime safety tribunal rulings on ship collisions in the Busan sea area over the last 20 years. The analysis identified important keywords potentially associated with the causes of these incidents, and a cluster analysis based on how frequently specific keywords appear was conducted and visualized. This information can serve as foundational data for identifying accident causes in advance and for developing collision prevention and response strategies.
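
The keyword-frequency clustering step can be illustrated with a minimal sketch, assuming each ruling has already been reduced to counts over a fixed keyword list. The keywords, ruling names, and counts below are hypothetical, and the sketch uses plain k-means on relative keyword frequencies; it does not reproduce the SentenceBERT embedding stage or the authors' actual clustering settings.

```python
import math
import random

def normalize(counts):
    """Turn raw keyword counts into relative frequencies that sum to 1."""
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means on equal-length vectors; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical keyword list and per-ruling counts, for illustration only.
keywords = ["lookout", "speed", "fog", "signal", "fatigue"]
counts = {
    "ruling_a": [5, 1, 0, 2, 0],
    "ruling_b": [6, 0, 1, 3, 0],
    "ruling_c": [0, 1, 6, 0, 1],
    "ruling_d": [1, 0, 5, 1, 0],
}
names = list(counts)
points = [normalize(counts[n]) for n in names]
labels, centroids = kmeans(points, k=2)
for name, label in zip(names, labels):
    print(name, "-> cluster", label)
```

Normalizing counts first keeps long rulings from dominating the distance computation simply because they mention every keyword more often.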

Online Privacy Protection: An Analysis of Social Media Reactions to Data Breaches (온라인 정보 보호: 소셜 미디어 내 정보 유출 반응 분석)

  • Seungwoo Seo;Youngjoon Go;Hong Joo Lee
    • Knowledge Management Research / v.25 no.1 / pp.1-19 / 2024
  • This study analyzed changes in the social media reactions of data subjects to major personal data breach incidents in South Korea from January 2014 to October 2022. We collected a total of 1,317 posts written on Naver Blogs within the week immediately following each incident. Applying LDA topic modeling to these posts identified five main topics, including personal data breaches, hacking, and information technology. Analyzing the temporal changes in topic distribution, we found that immediately after a breach, topics directly mentioning the incident had the highest proportion; as time passed, however, the proportion of topics only indirectly related to the breach increased. This suggests that data subjects' attention shifts from the specific incident to related topics over time and that their interest in personal data protection declines. These findings point to a need for future research on how data subjects' privacy awareness changes after personal data breach incidents.
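
Tracking how topic shares shift over the days after an incident can be sketched as follows, assuming each post has already been given a topic mixture by an LDA model and tagged with the number of days since the incident. The function name, topic labels, and numbers are hypothetical; the study's actual aggregation may differ.

```python
from collections import defaultdict

def topic_share_by_day(posts, num_topics):
    """Average the per-post topic mixtures within each day after an incident.

    posts: iterable of (day_offset, mixture) pairs, where mixture holds
    num_topics probabilities (e.g. an LDA document-topic distribution).
    Returns {day_offset: [mean share of topic 0, topic 1, ...]} sorted by day.
    """
    sums = defaultdict(lambda: [0.0] * num_topics)
    counts = defaultdict(int)
    for day, mixture in posts:
        acc = sums[day]
        for t, p in enumerate(mixture):
            acc[t] += p
        counts[day] += 1
    return {day: [s / counts[day] for s in sums[day]] for day in sorted(sums)}

# Hypothetical posts: topic 0 = "the incident itself", topic 1 = "related issues".
posts = [
    (0, [0.8, 0.2]), (0, [0.6, 0.4]),
    (3, [0.5, 0.5]),
    (6, [0.2, 0.8]), (6, [0.3, 0.7]),
]
shares = topic_share_by_day(posts, num_topics=2)
for day, share in shares.items():
    print(f"day {day}: " + ", ".join(f"{s:.2f}" for s in share))
```

On these made-up numbers, the share of the incident topic falls from about 0.70 on day 0 to 0.25 on day 6, which is the kind of shift the abstract describes.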

Web Site Keyword Selection Method by Considering Semantic Similarity Based on Word2Vec (Word2Vec 기반의 의미적 유사도를 고려한 웹사이트 키워드 선택 기법)

  • Lee, Donghun;Kim, Kwanho
    • The Journal of Society for e-Business Studies / v.23 no.2 / pp.83-96 / 2018
  • Extracting keywords that represent documents is important because they support automated services such as document search, classification, and recommendation, and convey a document's content quickly. However, extracting keywords from the frequency of words in web site documents, or with graph algorithms based on word co-occurrence, has two problems: the web page structure can introduce many words unrelated to the topic, and the limited performance of Korean tokenizers makes it difficult to extract semantic keywords. In this paper, we propose a method that selects candidate keywords based on semantic similarity, addressing both the failure to extract semantic keywords and the poor accuracy of Korean tokenizer analysis. Finally, a filtering step removes inconsistent keywords to produce the final semantic keywords. Experiments on real web pages of small businesses show that the proposed method performs 34.52% better than a keyword selection technique based on statistical similarity. This confirms that considering the semantic similarity between words and removing inconsistent keywords improves keyword extraction from documents.
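
A minimal sketch of similarity-based keyword selection, assuming word vectors are already available from a trained Word2Vec model and that a few seed words (for example, from the page title) describe the site's topic. The seed-word idea, the function names, the similarity threshold, and the 3-dimensional toy vectors are illustrative assumptions, not the authors' procedure.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def select_keywords(freqs, vectors, seeds, top_k=3, min_sim=0.5):
    """Rank frequency-based candidates by semantic similarity to the seed terms.

    freqs: {word: count} candidate words from the page.
    vectors: {word: embedding}, standing in for a trained Word2Vec model.
    seeds: words assumed to describe the site's topic (at least one must have a vector).
    Candidates below min_sim are filtered out as inconsistent.
    """
    seed_vecs = [vectors[s] for s in seeds if s in vectors]
    dim = len(seed_vecs[0])
    centroid = [sum(v[i] for v in seed_vecs) / len(seed_vecs) for i in range(dim)]
    scored = [(cosine(vectors[w], centroid), freqs[w], w) for w in freqs if w in vectors]
    scored.sort(reverse=True)  # highest similarity first, frequency breaks ties
    return [w for sim, _, w in scored if sim >= min_sim][:top_k]

# Toy 3-d "embeddings" and page counts, invented for illustration.
vectors = {
    "bread": [0.9, 0.1, 0.1],
    "cake": [0.8, 0.2, 0.1],
    "bakery": [0.9, 0.0, 0.2],
    "login": [0.1, 0.9, 0.1],
    "copyright": [0.0, 0.8, 0.3],
}
freqs = {"bread": 5, "cake": 4, "bakery": 3, "login": 6, "copyright": 4}
selected = select_keywords(freqs, vectors, seeds=["bakery"])
print(selected)
```

Here the most frequent word, "login", is discarded because it is semantically far from the seed, which is the kind of topic-unrelated page-structure word a purely frequency-based method would keep.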

A Study on Marine Accident Ontology Development and Data Management: Based on a Situation Report Analysis of Southwest Coast Marine Accidents in Korea (해양사고 온톨로지 구축 및 데이터 관리방안 연구: 서해남부해역 선박사고 상황보고서 분석을 중심으로)

  • Lee, Young Jai;Kang, Seong Kyung;Gu, Ja-Yeong
    • Journal of the Korean Society of Marine Environment & Safety / v.25 no.4 / pp.423-432 / 2019
  • Along with the annual increase in marine activities, the frequency of marine accidents is rising. Accordingly, various research activities and policies for marine safety are being implemented. Despite these efforts, the number of accidents is increasing every year, calling their effectiveness into question. Previous studies relying on annual statistical reports propose precautionary measures for items that stand out when statistical categories are compared. Since the 2000s, large-scale marine accidents have occurred repeatedly, and case studies have examined the "accident response." Likewise, annual statistics and accident cases serve as core data for domestic maritime safety policy. However, they are merely summaries of post-accident results. In this study, the limitations of current marine research and policy are evaluated through a literature review of marine accident case studies and analyses. In addition, the ontology of the marine accident information classification system is revised, through an attribute analysis of vessel accident situation reports and text mining, to overcome the currently limited use of this information. The revised attributes consist of the reporter, the report method, the rescue organization, corrective measures, vulnerability of response, payloads, cause of oil spill, damage pattern, and the result of the accident response. These can be used consistently in the future as standard classification terms to collect and use information more efficiently. The research also proposes a data collection and quality assurance method for the practical use of the ontology. A clear understanding of the problems presently faced in marine safety will allow "sufficient quality information" to be leveraged for diverse research and effective policies.

Analysis of Municipal Ordinances for Smart Cities of Municipal Governments: Using Topic Modeling (지방자치단체의 스마트시티 조례 분석: 토픽모델링을 활용하여)

  • Hyungjun Seo
    • Informatization Policy / v.30 no.1 / pp.41-66 / 2023
  • This study aims to reveal the direction of municipal ordinances for smart cities by applying topic modeling to 74 ordinances from 72 municipal governments. The most frequent keywords relate to the establishment and operation of the Smart City Committee. Latent Dirichlet Allocation (LDA) topic modeling classifies the ordinances into eight topics: Topic 1 (security of smart city processes), Topic 2 (promotion of the smart city industry), Topic 3 (composition of a smart city consultative body for local residents), Topic 4 (support system for smart cities), Topic 5 (management of personal information), Topic 6 (use of smart city data), Topic 7 (implementation of intelligent public administration), and Topic 8 (smart city promotion). By region, Topics 5, 6, and 8, which mostly concern the practical operation of smart cities, account for a significant share of the ordinances in the Seoul metropolitan area. In contrast, Topics 2, 3, and 4, which mostly concern the initial implementation of smart cities, account for a significant share of the ordinances in provincial areas.
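
For readers unfamiliar with LDA, the following is a minimal collapsed Gibbs sampler in plain Python. The toy tokenized documents, the hyperparameters (alpha, beta, iteration count), and the function name are assumptions for illustration; the study's preprocessing (Korean ordinances would first need morphological analysis) and software are not described in the abstract.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (toy scale, pure Python).

    docs: list of token lists.
    Returns (doc_topic, topic_word): doc_topic[d][k] is the estimated share of
    topic k in document d; topic_word[k][w] is p(word w | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    ndk = [[0] * num_topics for _ in docs]                       # doc-topic counts
    nkw = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]   # topic-word counts
    nk = [0] * num_topics                                        # tokens per topic
    z = []                                                       # topic of each token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    topics = range(num_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                # Record the new assignment.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    doc_topic = [[(ndk[d][t] + alpha) / (len(doc) + num_topics * alpha) for t in topics]
                 for d, doc in enumerate(docs)]
    topic_word = [{w: (nkw[t][w] + beta) / (nk[t] + V * beta) for w in vocab}
                  for t in topics]
    return doc_topic, topic_word

# Hypothetical, already-tokenized ordinance fragments.
docs = [
    "committee member operation committee".split(),
    "data privacy security data".split(),
    "committee operation member".split(),
    "privacy data security".split(),
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for t, dist in enumerate(topic_word):
    print(f"topic {t}:", sorted(dist, key=dist.get, reverse=True)[:3])
for d, mix in enumerate(doc_topic):
    print(f"doc {d}: dominant topic", max(range(len(mix)), key=mix.__getitem__))
```

Each ordinance could then be assigned its highest-probability topic, and those assignments tallied by region to compare metropolitan and provincial areas.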

An Exploration of Discrepancies between Text and Content Knowledge of Pre-service Elementary Teachers through an Analysis of Questions and Answers Created in the Interactive Reading of a Teacher's Guide: Focusing on a 'Shadow and Mirror' Unit (상호작용적 독해 과정에서 생성된 질문과 답변의 분석을 통한 교사용 지도서와 초등예비교사의 내용지식 사이의 불일치 탐색 -'그림자와 거울' 단원을 중심으로)

  • Arla Go;Jiwon Lee
    • Journal of The Korean Association For Science Education / v.43 no.3 / pp.253-263 / 2023
  • This study explored the discrepancy between the text of a teacher's guide on the straight-line propagation and reflection of light and the content knowledge of pre-service elementary teachers. A total of 455 questions and 543 answers generated by 279 pre-service elementary teachers after reading the 'Shadow and Mirror' unit of the teacher's guide were analyzed. The questions were classified by concept and by type of discrepancy, and the answers were analyzed for accuracy. By concept, the pre-service teachers were most curious about shadows under straight-line propagation, mirror images under reflection, and light sources among the other concepts. Questions with a low correct-answer rate due to incorrect answers or non-answers, such as those concerning the superposition of light by reflection, the principles of experimental tools, and images formed by lenses, were only partially covered or not covered at all in the teacher's guide. By type of discrepancy, questions arising from knowledge deficits were more frequent than those arising from knowledge clashes. This indicates that the teacher's guide should be supplemented with the concepts teachers need to know. Discrepancies due to knowledge clashes often stem from conflicts between what is experienced in everyday life and what is presented in textbooks. Therefore, the discrepancy between the teacher's guide and pre-service teachers' knowledge should be reduced by having the guide address the differences between everyday contexts and textbook contexts.

Latent topics-based product reputation mining (잠재 토픽 기반의 제품 평판 마이닝)

  • Park, Sang-Min;On, Byung-Won
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.39-70 / 2017
  • Data-driven analytics techniques have recently been applied to public surveys. Instead of simply gathering survey results or expert opinions to study preferences for a newly launched product, enterprises need a way to collect and analyze various types of online data and accurately determine customer preferences. In existing data-based survey methods, domain experts first construct a sentiment lexicon for a particular domain, usually by judging whether frequently used words in the collected text documents have positive, neutral, or negative meanings. To study the preference for a particular product, the existing approach (1) collects review posts related to the product from several review web sites; (2) extracts sentences (or phrases) from the collection after pre-processing steps such as stemming and stop-word removal; (3) classifies the polarity (positive or negative) of each sentence (or phrase) based on the sentiment lexicon; and (4) estimates the positive and negative ratios of the product by dividing the numbers of positive and negative sentences (or phrases) by the total number of sentences (or phrases) in the collection. The existing approach also automatically finds important sentences (or phrases) expressing positive or negative opinions about the product. As a motivating example, given a product such as the Hyundai Motors Sonata, customers often want a summary of the positive points of its 'car design' aspect as well as the negative points of the same aspect. They also want useful information on other aspects, such as 'car quality', 'car performance', and 'car service.' Such information helps customers make good choices when purchasing new vehicles.
In addition, automobile makers can determine the preferences and positive/negative points for new models on the market, and the models' weak points can then be improved based on the sentiment analysis. For this purpose, the existing approach computes the sentiment score of each sentence (or phrase) and selects the top-k sentences (or phrases) with the highest positive and negative scores. However, the existing approach has several shortcomings that limit its use in real applications. Its main disadvantages are as follows: (1) The main aspects of a product (e.g., the design, quality, performance, and service of the Hyundai Sonata) are not considered. Because the sentiment analysis ignores aspects, customers and car makers receive only a summary with the product's overall positive and negative ratios and the top-k sentences (or phrases) with the highest sentiment scores in the entire corpus. This is not enough; the main aspects of the target product need to be considered in the sentiment analysis. (2) Because the same word generally has different meanings across domains, a sentiment lexicon suited to each domain must be constructed. Since lexicon construction is labor-intensive and time-consuming, an efficient way to build a lexicon per domain is required. To address these problems, we propose a novel product reputation mining algorithm that (1) extracts topics hidden in review documents written by customers; (2) mines the main aspects based on the extracted topics; (3) measures the positive and negative ratios of the product for each aspect; and (4) presents a digest listing a few important positive and negative sentences for each aspect. Unlike the existing approach, using hidden topics lets experts construct the sentiment lexicon easily and quickly.
Furthermore, by reinforcing topic semantics, the proposed algorithm achieves higher product reputation mining accuracy than the existing approach. In the experiments, we collected a large number of review documents on domestic vehicles such as the K5, SM5, and Avante; measured the positive and negative ratios of the three cars; presented the top-k positive and negative summaries per aspect; and conducted a statistical analysis. The experimental results show the effectiveness of the proposed method compared with the existing method.
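
Steps (3) and (4) of the existing approach, combined with the per-aspect reporting the authors argue for, can be sketched as below. This is an assumption-laden illustration: the lexicon, the English sentences, and the hand-written aspect cue words are hypothetical, whereas the proposed method mines aspects from latent topics rather than fixed word lists.

```python
from collections import defaultdict

def sentence_score(tokens, lexicon):
    """Sum of lexicon polarities over a sentence's tokens (unknown words count 0)."""
    return sum(lexicon.get(t, 0) for t in tokens)

def aspect_reputation(sentences, lexicon, aspects, top_k=1):
    """Per-aspect positive/negative ratios plus the top-k sentences of each polarity.

    sentences: list of token lists; aspects: {aspect name: set of cue words}.
    A sentence is assigned to every aspect whose cue words it contains.
    """
    grouped = defaultdict(list)
    for tokens in sentences:
        score = sentence_score(tokens, lexicon)
        for aspect, cues in aspects.items():
            if cues & set(tokens):
                grouped[aspect].append((score, " ".join(tokens)))
    report = {}
    for aspect, items in grouped.items():
        pos = [it for it in items if it[0] > 0]
        neg = [it for it in items if it[0] < 0]
        report[aspect] = {
            "pos_ratio": len(pos) / len(items),
            "neg_ratio": len(neg) / len(items),
            "top_pos": [s for _, s in sorted(pos, reverse=True)[:top_k]],
            "top_neg": [s for _, s in sorted(neg)[:top_k]],
        }
    return report

# Hypothetical lexicon, aspect cue words, and review sentences.
lexicon = {"stylish": 1, "sleek": 1, "smooth": 1, "cramped": -1, "noisy": -1, "rattles": -1}
aspects = {
    "design": {"exterior", "interior", "look"},
    "performance": {"engine", "ride", "acceleration"},
}
reviews = [
    "the exterior look is stylish and sleek",
    "the interior feels cramped",
    "the engine is noisy",
    "acceleration is smooth",
    "the ride is smooth but the engine rattles",
]
report = aspect_reputation([r.split() for r in reviews], lexicon, aspects)
for aspect, r in report.items():
    print(f"{aspect}: +{r['pos_ratio']:.2f} / -{r['neg_ratio']:.2f}", r["top_pos"], r["top_neg"])
```

A sentence whose lexicon scores cancel out (score 0) counts toward its aspect's total but is neither positive nor negative, so the two ratios need not sum to 1.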

A Study on Eva Armisen's Artworks -Focused on Beauty of Universality, Deterritorialization of Art and Design- (에바 알머슨 작품 연구 -보편성의 미, 미술과 디자인의 탈경계를 중심으로-)

  • Byun, Trina Hyunjin
    • The Journal of the Korea Contents Association / v.16 no.8 / pp.435-447 / 2016
  • In the 21st century, interaction between fine art and design has become increasingly prevalent. In this paper, the author analyzes the major works of the Spanish artist Eva Armisén from a cultural criticism perspective and proposes a framework for a deeper understanding of her artworks, which reflect characteristics of contemporary art and culture such as the deterritorialization of art and design. The analysis finds that the main theme of her artworks is the preferred attitude of a human being in relationships with others, rather than the daily life or childhood innocence with which the public usually associates her work. Her main female character appears to have been formed by blending her aesthetic reasoning with the characteristics and cultural elements of this era. This means that the domain in which the public enjoys beauty has been extending from the beautiful to the beauty of universality. The analysis also finds in her work the phenomenon of deterritorialization, a characteristic of post-modern art and design that dismantles an existing, repressive order. However, several aspects of her works, such as the relationship between text and image, formative elements, and aphorisms, remain to be studied.

Analysis method of patent document to Forecast Patent Registration (특허 등록 예측을 위한 특허 문서 분석 방법)

  • Koo, Jung-Min;Park, Sang-Sung;Shin, Young-Geun;Jung, Won-Kyo;Jang, Dong-Sik
    • Journal of the Korea Academia-Industrial cooperation Society / v.11 no.4 / pp.1458-1467 / 2010
  • Recently, imitation and infringement of intellectual property rights have been recognized as impediments to national industrial growth. To prevent the large losses these impediments cause, many researchers are studying the protection and efficient management of intellectual property in various ways. In particular, predicting patent registration is an important part of protecting and asserting intellectual property rights. In this study, we propose a patent document analysis method that uses text mining to predict whether a patent will be registered or rejected. First, the proposed method builds a database from the word frequencies of rejected patent documents. Then, comparing other patent documents with this database yields a similarity value between each patent document and the database. We used k-means, a partitioning clustering algorithm, to select the criterion value for patent rejection. As a result, we found that patents similar to rejected patents have a strong possibility of rejection. U.S. patent documents on Bluetooth, solar battery, and display technologies were used as experimental data.
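
The pipeline in this abstract (a word-frequency database of rejected patents, a similarity value per patent, and k-means to pick a rejection cut-off) can be sketched as below. Cosine similarity, whitespace tokenization, two-cluster 1-D k-means with the cut-off at the midpoint between cluster centres, and the toy patent texts are all assumptions; the abstract does not specify the similarity measure or how the criterion value is derived from the clusters.

```python
import math
from collections import Counter

def tokenize(text):
    """Very rough tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_rejection_db(rejected_texts):
    """Pool word frequencies over all rejected patent documents."""
    db = Counter()
    for text in rejected_texts:
        db.update(tokenize(text))
    return db

def kmeans_1d_threshold(values, iters=100):
    """Two-cluster 1-D k-means; returns the midpoint between the cluster centres."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        mid = (lo + hi) / 2
        low = [v for v in values if v < mid]
        high = [v for v in values if v >= mid]
        new_lo = sum(low) / len(low) if low else lo
        new_hi = sum(high) / len(high) if high else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # centres stopped moving
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2

# Hypothetical snippets standing in for full patent texts.
rejected = [
    "wireless pairing method using bluetooth signal",
    "bluetooth pairing device signal strength",
]
candidates = {
    "patent_a": "bluetooth pairing signal method",
    "patent_b": "solar cell electrode layer",
    "patent_c": "wireless bluetooth device pairing",
    "patent_d": "display panel pixel driving circuit",
}
db = build_rejection_db(rejected)
sims = {pid: cosine(Counter(tokenize(text)), db) for pid, text in candidates.items()}
threshold = kmeans_1d_threshold(list(sims.values()))
flagged = {pid for pid, s in sims.items() if s >= threshold}
print(f"cut-off {threshold:.3f}; likely rejections: {sorted(flagged)}")
```

With two clusters in one dimension, assigning each score to its nearest centre is equivalent to splitting at the midpoint between the centres, which is why that midpoint serves as the cut-off here.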

Exploration on Modern People's Emotion regarding Abolition of Racing Model (레이싱 모델 폐지에 관한 현대인의 감성 탐색)

  • Jung, Sang-Pil
    • Journal of Digital Convergence / v.18 no.11 / pp.571-579 / 2020
  • The purpose of this study was to explore modern people's emotions regarding sex commercialization in relation to the abolition of grid girls. Based on 'reply journalism', the study collected 15 blog posts, 10 online cafe posts, 1 YouTube video clip, and 364 replies associated with these three types of online content. The data were analyzed using interpretive text analysis. The analysis of the replies shows that the strongest emotion of modern people regarding the abolition of grid girls is anti-feminism, which includes hatred toward feminists and even women, criticism of feminism, and the notion that 'women's enemies are women themselves'. In addition, sympathy toward racing models who lost their jobs, demands that the same abolition be applied to people in similar occupations, calls for spatial separation between men and women, and agreement with the abolition of racing models were found. Modern people's emotions differed from feminists' views on sex commercialization and racing models; rather, ordinary people doubted and even criticized the rationales of feminism. These results imply that, unlike feminists' view of racing models as sexually commercialized, the social image of racing models has changed, and that people wish the occupation to be respected as an ordinary job, free of issues of sex commercialization.