• Title/Summary/Keyword: frequency-based text analysis (빈도 기반 텍스트 분석)


A Study on an Automatic Summarization System Using Verb-Based Sentence Patterns (술어기반 문형정보를 이용한 자동요약시스템에 관한 연구)

  • 최인숙;정영미
    • Journal of the Korean Society for Information Management
    • /
    • v.18 no.4
    • /
    • pp.37-55
    • /
    • 2001
  • The purpose of this study is to present a text summarization system using a knowledge base containing information about verbs and their arguments that is statistically obtained from a subject domain. The system consists of two modules: the training module and the summarization module. The training module extracts cue verbs and their basic sentence patterns by counting the frequency of verbs and case markers, respectively, and the summarization module substantiates basic sentence patterns and generates summaries. Basic sentence patterns are substantiated by applying substantiation rules to the syntactic structure of sentences. A summary is then produced by connecting the simple sentences generated through the substantiation of basic sentence patterns. Articles on ‘robbery’ in the daily newspapers are selected for a test collection. The system generates natural summaries without losing any essential information by combining both cue verbs and essential arguments. In addition, the use of statistical techniques makes it possible to apply this system to other subject domains through its learning capability.


WV-BTM: A Technique on Improving Accuracy of Topic Model for Short Texts in SNS (WV-BTM: SNS 단문의 주제 분석을 위한 토픽 모델 정확도 개선 기법)

  • Song, Ae-Rin;Park, Young-Ho
    • Journal of Digital Contents Society
    • /
    • v.19 no.1
    • /
    • pp.51-58
    • /
    • 2018
  • As the number of users and the volume of data on SNS have increased explosively, research based on SNS big data has become active. In social mining, Latent Dirichlet Allocation (LDA), a typical topic modeling technique, is used to identify the similarity between texts in large volumes of unclassified SNS text data and to extract trends from them. However, LDA has difficulty inferring high-level topics from short-sentence data because of the semantic sparsity caused by infrequent word occurrence. BTM addressed this limitation of LDA by modeling combinations of two words. However, BTM has its own limitation: because it is influenced more by the high-frequency word in each combined pair, it cannot calculate weights that consider the relation of the words to each topic. In this paper, we propose a technique that improves the accuracy of the existing BTM by reflecting the semantic relations between words.
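
The "combination of two words" that BTM relies on can be illustrated with a minimal sketch. It assumes each short text is already tokenized and treats every unordered word pair in a text as one biterm. The function names and the toy corpus are hypothetical, and the word-vector weighting that WV-BTM adds is not shown because the abstract does not specify it.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) from one short text."""
    # Sorting each pair makes ("b", "a") and ("a", "b") the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def count_biterms(documents):
    """Count biterms over a corpus of already-tokenized short texts."""
    counts = Counter()
    for tokens in documents:
        counts.update(extract_biterms(tokens))
    return counts


# Toy corpus, invented for illustration only.
docs = [
    ["phone", "battery", "life"],
    ["battery", "life", "short"],
]
biterm_counts = count_biterms(docs)
```

Because biterms are counted over the whole corpus rather than per document, pairs that recur across short texts accumulate evidence even when each text is only a few words long.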

A Study on the Product Planning Model based on Word2Vec using On-offline Comment Analysis (온·오프라인 댓글 분석이 활용된 Word2Vec 기반 상품기획 모델연구)

  • Ahn, Yeong-Hwi;Jung, Jin-Young;Park, Koo-Rack
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2021.07a
    • /
    • pp.79-80
    • /
    • 2021
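  • The Internet is transforming our economy into a digital economy, and e-commerce is growing as well. Accordingly, the positive and negative product reviews that buyers leave in e-commerce can be key information for product planning. In this paper, a structured dataset of 10,000 entries on a vertical silent mouse was analyzed for similarity using Word2Vec, the top 50 words from a frequency analysis of online product reviews were presented, and a survey was conducted after participants used the actual product. In the similarity analysis of online reviews, the advantages associated with the keyword "click" were pain (.986) and design (.982), and the disadvantages were adaptation (.866) and discomfort (.854). In the offline reviews, the advantage was design (17 respondents) and the disadvantage was discomfort (11 respondents). In addition, by comparing online and offline reviews, the positive and negative meanings expressed by buyers were cross-checked, which can be regarded as providing meaningful information. The product planning process proposed in this study can therefore be applied as a strategy for developing new products and improving existing ones.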

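
Similarity scores between word vectors, like the values in parentheses in the abstract above, are commonly computed as cosine similarity. The sketch below assumes that measure, which the abstract does not state explicitly. The three-dimensional vectors and the helper names are hypothetical; real Word2Vec vectors would be learned from the review corpus.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def most_similar(query, vectors, top_n=2):
    """Rank all other words by cosine similarity to `query`."""
    scores = [
        (word, cosine_similarity(vectors[query], vec))
        for word, vec in vectors.items()
        if word != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical toy vectors, not taken from the paper.
toy_vectors = {
    "click": [1.0, 0.0, 1.0],
    "design": [1.0, 0.1, 0.9],
    "delivery": [0.0, 1.0, 0.0],
}
ranking = most_similar("click", toy_vectors)
```

Words whose vectors point in nearly the same direction as the query score close to 1, which is how a keyword such as "click" would be linked to its most closely associated review terms.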

Topic-Network based Topic Shift Detection on Twitter (트위터 데이터를 이용한 네트워크 기반 토픽 변화 추적 연구)

  • Jin, Seol A;Heo, Go Eun;Jeong, Yoo Kyung;Song, Min
    • Journal of the Korean Society for Information Management
    • /
    • v.30 no.1
    • /
    • pp.285-302
    • /
    • 2013
  • This study identified topic shifts and patterns over time by analyzing an enormous amount of Twitter data, which is characterized by high accessibility and brevity. First, we extracted keywords for a certain product and used them to represent a topic network, which allows intuitive understanding of the keywords associated with topics through nodes and edges obtained by co-word analysis. We conducted temporal analysis of term co-occurrence as well as topic modeling to examine the results of the network analysis. In addition, a comparison of topic shifts on Twitter with the corresponding retrieval results from newspapers confirms that Twitter responds immediately to news media and spreads negative issues quickly. Our findings suggest that companies can use the proposed technique to identify the public's negative opinions as quickly as possible and apply it to timely decision making and effective responses to their customers.

The Stream of Uncertainty in Scientific Knowledge using Topic Modeling (토픽 모델링 기반 과학적 지식의 불확실성의 흐름에 관한 연구)

  • Heo, Go Eun
    • Journal of the Korean Society for Information Management
    • /
    • v.36 no.1
    • /
    • pp.191-213
    • /
    • 2019
  • The process of obtaining scientific knowledge is conducted through research. Researchers deal with the uncertainty of science and establish the certainty of scientific knowledge. In other words, dealing with uncertainty is an essential step in obtaining scientific knowledge. Existing studies were predominantly hedging studies based on linguistic approaches or, in computational linguistics, manually constructed corpora of uncertainty words. They have only been able to identify the characteristics of uncertainty in a particular research field based on simple frequency. Therefore, in this study, we examine patterns of scientific knowledge based on uncertainty words over time in biomedical literature, where biomedical claims in sentences play an important role. For this purpose, biomedical propositions are analyzed based on semantic predications provided by UMLS, and DMR topic modeling, a useful method for identifying patterns in disciplines, is applied to understand trends in entity-based topics with uncertainty. The results confirm that, as research develops over time, uncertainty in scientific knowledge tends to decrease.

The Effect of Changes in Airbnb Host's Marketing Strategy on Listing Performance in the COVID-19 Pandemic (COVID-19 팬데믹에서 Airbnb 호스트의 마케팅 전략의 변화가 공유성과에 미치는 영향)

  • Kim, So Yeong;Sim, Ji Hwan;Chung, Yeo Jin
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.1-27
    • /
    • 2021
  • The entire tourism industry is being hit hard by COVID-19 as a global pandemic. Accommodation sharing services such as Airbnb, which have recently expanded due to the spread of the sharing economy, are particularly affected by the pandemic because transactions are made based on trust and communication between consumer and supplier. As the pandemic changes individuals' perceptions of and behavior toward travel, strategies for the recovery of the tourism industry have been discussed. However, since most studies present macro strategies from the perspective of traditional lodging providers and the government, there is a significant lack of discussion on differentiated pandemic response strategies that consider the peculiarity of the sharing economy, which is centered on peer-to-peer transactions. This study discusses marketing strategy for individual Airbnb hosts during COVID-19. We empirically analyze the effect of changes in the listing descriptions posted by Airbnb hosts on listing performance after the outbreak of COVID-19. We extract nine aspects described in the listing descriptions using the Attention-Based Aspect Extraction model, a deep learning-based aspect extraction method. We model the effect of aspect changes on listing performance after the outbreak by observing the frequency with which each aspect appears in the text. In addition, we compare those effects across the types of Airbnb listing. In this way, the study presents an idea for a pandemic crisis response strategy that individual providers of accommodation sharing services can adopt depending on the listing type.

Design and Implementation of a Distribute Multimedia System (분산 멀티미디어 스트리밍 시스템 설계 및 구현)

  • Kim, Sang-Kuk;Shin, Hwa-Jong;Kim, Se-Young;Shin, Dong-Kyoo;Shin, Dong-Il
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2000.10a
    • /
    • pp.677-680
    • /
    • 2000
  • Since the advent of the Web, representing and delivering information through text and images has been the most widely used method on the Internet. However, with the rapid advance of Web technologies, increasing network speeds, and the rapid spread of the Internet, information is increasingly represented and delivered through multimedia data rather than through simple text- and image-centered HTML documents. Accordingly, streaming protocols for transmitting multimedia data have also emerged. Recently, as improved computer performance and faster networks (the spread of high-speed communication services) have made multimedia data transmission possible, Internet broadcasting stations that take the form of conventional terrestrial or CATV stations and provide real-time live broadcasting and VOD (Video On Demand) services over the Internet have been appearing rapidly.[11] Internet broadcasting is developing on the basis of multimedia streaming technology, which enables real-time delivery of video and audio, and real-time transport protocols, which can transmit multimedia in real time. Most Internet broadcasters offering multimedia streaming services use RealNetworks' RealSystem or Microsoft's WMT (Windows Media Technologies) as their streaming server. Based on a comparative analysis of Real Server and WMT, this paper designs and implements a Java-based streaming server with a distributed server architecture that supports real-time transport protocols and multimedia streaming technology, middleware that controls the load among servers, and a client that can play multimedia streams.


Falling Accidents Analysis in Construction Sites by Using Topic Modeling (토픽 모델링을 이용한 건설현장 추락재해 분석)

  • Ryu, Hanguk
    • Journal of the Korea Convergence Society
    • /
    • v.10 no.7
    • /
    • pp.175-182
    • /
    • 2019
  • We classify topics on fall incidents occurring at construction sites using topic modeling, a machine learning technique, and analyze the causes of the accidents for each topic. To apply topic modeling based on latent Dirichlet allocation, the text data was preprocessed, and the model was evaluated with a perplexity score to improve its reliability. The most common falling accidents happened to daily workers at small construction sites. Most of the causes were a lack of safety equipment, inadequate arrangement and wearing, and low performance of safety equipment. To prevent and reduce falling accidents, it is important to educate the daily workers at small construction sites, keep the workplace in order, and check that personal safety equipment and devices are worn.
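
The perplexity score mentioned above can be sketched as the exponential of the negative average log-likelihood per token. The sketch assumes a fitted topic model is available as two probability tables; the names `doc_topic` and `topic_word` and the toy values are hypothetical and are not taken from the paper.

```python
import math


def perplexity(documents, doc_topic, topic_word):
    """Perplexity of tokenized documents under a fitted topic model.

    doc_topic[d][k]  : probability of topic k in document d
    topic_word[k][w] : probability of word w under topic k
    """
    log_likelihood = 0.0
    token_count = 0
    for d, tokens in enumerate(documents):
        for word in tokens:
            # p(word | document) marginalizes over all topics.
            p = sum(
                doc_topic[d][k] * topic_word[k].get(word, 0.0)
                for k in range(len(topic_word))
            )
            # math.log raises ValueError if the model gives the word zero probability.
            log_likelihood += math.log(p)
            token_count += 1
    if token_count == 0:
        raise ValueError("perplexity needs at least one token")
    return math.exp(-log_likelihood / token_count)


# Toy model: two topics, each uniform over a four-word vocabulary.
uniform = {"fall": 0.25, "ladder": 0.25, "roof": 0.25, "helmet": 0.25}
toy_score = perplexity([["fall", "roof"]], [[0.5, 0.5]], [uniform, dict(uniform)])
```

Lower perplexity means the model assigns higher probability to the text, so the score is often compared across candidate numbers of topics. The uniform toy model scores exactly the vocabulary size, which is the value an uninformative model would reach.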

The Research of New Multimedia Design Development on Internet - Focus on the Type - (인터넷에서의 뉴멀티미디어 디자인 개발에 관한 연구 - 서체의 활용을 중심으로 -)

  • 류성현;신계옥;이은주;이현주
    • Archives of design research
    • /
    • v.11 no.3
    • /
    • pp.47-55
    • /
    • 1998
  • Homepage design on the Web is growing faster than any other medium as an integrative method of conveying information. At the beginning, homepages were designed mostly with text; however, they have changed to use multimedia. Design on the Web changes with the development of computer technology. The final destination of the Web is a user platform consisting of the pixels of a monitor screen. A monitor requires a different approach from printed material because of its limited range of presentation and the properties of light. This paper explores the possibilities for expressing type, which can be the basic structure of HTML. Through a case study of homepages, it analyzes the kinds of type, presentation methods, frequency of use, and design variations.


Comparison of Readability between Documents in the Community Question-Answering (질의응답 커뮤니티에서 문서 간 이독성 비교)

  • Mun, Gil-Seong
    • The Journal of the Korea Contents Association
    • /
    • v.20 no.10
    • /
    • pp.25-34
    • /
    • 2020
  • Community question answering services are one of the main sources of information and knowledge on the Web. The quality of information in question and answer documents is determined by the clarity of the question and the relevance of the answers, and the readability of a document is a key factor in evaluating that quality. This study measures the quality of documents used in a community question answering service. For this purpose, we compare the frequency of occurrence by vocabulary level in community documents and measure the readability index of documents by the author's institution. To measure the readability index, we used the Dale-Chall formula, which is calculated from vocabulary level and sentence length. The results show that the vocabulary used in the answers is more difficult than that in the questions and that the sentences are longer. A gap in readability between questions and answers is also found across authoring institutions. The results of this study can be used as basic data for improving online counseling services.
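
As a rough illustration of how a Dale-Chall score combines vocabulary level and sentence length, the sketch below uses the constants of the standard English-language New Dale-Chall formula. This is an assumption: the abstract does not say which word list or which adaptation the study applied to its documents, and the `familiar_words` set here is a hypothetical stand-in for the real list of familiar words.

```python
def dale_chall_score(sentences, familiar_words):
    """New Dale-Chall raw score for tokenized sentences.

    sentences      : list of sentences, each a list of word tokens
    familiar_words : set of lowercase words treated as easy (assumed list)
    """
    words = [w.lower() for sentence in sentences for w in sentence]
    if not words:
        raise ValueError("need at least one sentence with one word")
    difficult = sum(1 for w in words if w not in familiar_words)
    pct_difficult = 100.0 * difficult / len(words)
    avg_sentence_len = len(words) / len(sentences)
    score = 0.1579 * pct_difficult + 0.0496 * avg_sentence_len
    # The formula adds a constant when more than 5% of the words are difficult.
    if pct_difficult > 5.0:
        score += 3.6365
    return score


# Toy input, invented for illustration.
familiar = {"the", "cat", "sat", "dog", "ran"}
easy_text = [["the", "cat", "sat"], ["the", "dog", "ran"]]
hard_text = [["the", "cat", "sat"], ["the", "dog", "perambulated"]]
easy_score = dale_chall_score(easy_text, familiar)
hard_score = dale_chall_score(hard_text, familiar)
```

A higher score indicates harder text, so under this formula answers with rarer vocabulary and longer sentences would score higher than the questions they respond to.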