• Title/Summary/Keyword: 자주 사용된 키워드 (frequently used keywords)


An Efficient Frequent Melody Indexing Method to Improve Performance of Query-By-Humming System (허밍 질의 처리 시스템의 성능 향상을 위한 효율적인 빈번 멜로디 인덱싱 방법)

  • You, Jin-Hee;Park, Sang-Hyun
    • Journal of KIISE: Databases / v.34 no.4 / pp.283-303 / 2007
  • Recently, the efficient storage and retrieval of enormous music data has become one of the important issues in multimedia databases. The most common MIR (Music Information Retrieval) method is a text-based approach that uses text information to search for a desired piece of music. However, if users do not remember a keyword associated with the music, the system cannot give them correct answers. Moreover, since such systems are implemented only for exact matching between the query and the music data, they cannot retrieve similar music data and are therefore inappropriate for similarity matching. To solve this problem, we propose an Efficient Query-By-Humming System (EQBHS) with a content-based indexing method that efficiently retrieves and stores music even when a user's humming query is inaccurate. To accelerate query processing in EQBHS, we design indices for significant melodies, namely 1) frequent melodies occurring many times within a single piece of music, on the assumption that users hum what they can easily remember, and 2) melodies partitioned by rests. In addition, we propose an error-tolerant mapping method from notes to characters to make searching efficient, together with a frequent melody extraction algorithm. We verified the assumption about frequent melodies by means of a questionnaire and compared the performance of the proposed EQBHS with an N-gram method in various experiments on a large collection of music data.
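
A minimal sketch of the two ideas the abstract combines: a note-to-character mapping that quantizes pitch intervals so similar hummed melodies map to similar strings, and frequent-melody extraction that keeps n-grams repeated within one piece. The quantization step, alphabet, and thresholds here are illustrative assumptions, not the paper's actual scheme.

```python
from collections import Counter

# Hypothetical note-to-character mapping: each pitch interval between
# consecutive notes is quantized to a letter, so a melody becomes a string
# that tolerates small humming errors.
def melody_to_string(pitches, step=2):
    chars = []
    for prev, cur in zip(pitches, pitches[1:]):
        interval = max(-12, min(12, cur - prev))   # clamp to one octave
        chars.append(chr(ord('a') + (interval + 12) // step))
    return "".join(chars)

# Frequent melody extraction: count fixed-length substrings (n-grams) of the
# melody string and keep those repeated within a single piece of music.
def frequent_melodies(melody_string, n=3, min_count=2):
    grams = Counter(melody_string[i:i + n]
                    for i in range(len(melody_string) - n + 1))
    return [g for g, c in grams.items() if c >= min_count]

song = [60, 62, 64, 60, 60, 62, 64, 60, 64, 65, 67]  # "Frere Jacques" MIDI pitches
print(frequent_melodies(melody_to_string(song)))      # -> ['hhe']
```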

Research Outcomes and Limitations of Records and Archives Organization in Korea (국내 기록조직 연구의 성과와 과제)

  • Lee, Eun-Ju;Rho, Jee-Hyun
    • Journal of Korean Society of Archives and Records Management / v.20 no.4 / pp.129-146 / 2020
  • This study aims to investigate the outcomes and limitations of research on records and archives organization published in Korea, and in particular to examine in depth how this area of research has contributed to improvements and changes in the country's records management field. To this end, 150 journal articles related to records and archives organization were gathered. After refined keywords were extracted from the titles and author-assigned keywords, terminology analysis and content analysis were conducted. Terminology analysis (frequency and network analysis) identified frequently discussed topics and the relationships between them, while content analysis revealed the detailed contents of the two main topics and their meanings.
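
As a rough illustration of the terminology analysis described above (frequency plus network analysis over refined keywords), the following sketch counts keyword frequencies and keyword co-occurrence within articles; the keyword lists are invented, not drawn from the 150 articles.

```python
from collections import Counter
from itertools import combinations

papers = [  # invented author-assigned keyword lists
    ["archival description", "metadata", "ISAD(G)"],
    ["archival description", "authority control"],
    ["metadata", "authority control", "archival description"],
]

# Frequency analysis: how often each refined keyword appears.
freq = Counter(kw for paper in papers for kw in paper)
print(freq.most_common(3))

# Network analysis: count keyword co-occurrence within the same article;
# heavily weighted pairs indicate closely related topics.
edges = Counter(frozenset(pair)
                for paper in papers
                for pair in combinations(sorted(set(paper)), 2))
for pair, weight in edges.most_common(3):
    print(sorted(pair), weight)
```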

A Methodology for Extracting Shopping-Related Keywords by Analyzing Internet Navigation Patterns (인터넷 검색기록 분석을 통한 쇼핑의도 포함 키워드 자동 추출 기법)

  • Kim, Mingyu;Kim, Namgyu;Jung, Inhwan
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.123-136 / 2014
  • Recently, online shopping has developed further as the use of the Internet and a variety of smart mobile devices has become more prevalent. The increase in the scale of such shopping has led to the creation of many Internet shopping malls. Consequently, competition among online retailers is increasingly fierce, and many Internet shopping malls make significant attempts to attract online users to their sites. One such attempt is keyword marketing, whereby a retail site pays a fee to expose its link to potential customers when they enter a specific keyword on an Internet portal site. The price of each keyword is generally estimated from the keyword's frequency of appearance. However, it is widely accepted that the price of keywords cannot be based solely on their frequency, because many keywords appear frequently but have little relationship to shopping. This implies that it is unreasonable for an online shopping mall to spend a great deal on some keywords simply because people frequently use them. Therefore, from the perspective of shopping malls, a specialized process is required to extract meaningful keywords, and the demand for automating this extraction is increasing as retailers seek to improve online sales performance. In this study, we propose a methodology that automatically extracts only shopping-related keywords from the entire set of search keywords used on portal sites. We define a shopping-related keyword as a keyword that is used directly before shopping behavior; that is, only search keywords whose search results page leads to shopping-related pages are extracted from the entire set of search keywords. The rankings of the extracted keywords are then compared with the rankings of the entire set of search keywords. Two types of data are used in the experiment: web browsing history from July 1, 2012 to June 30, 2013, and site information. The experimental dataset came from a website-ranking service and the biggest portal site in Korea; the original sample contains 150 million transaction logs. First, portal sites are selected, and the search keywords used on those sites are extracted by simple parsing and ranked according to their frequency. The experiment uses approximately 3.9 million search results from Korea's largest search portal site, from which a total of 344,822 search keywords were extracted. Next, using the web browsing history and site information, the shopping-related keywords were taken from the entire set of search keywords, yielding 4,709 shopping-related keywords. For performance evaluation, we compared the hit ratios of all search keywords with those of the shopping-related keywords. To achieve this, we extracted 80,298 search keywords from several Internet shopping malls and chose the top 1,000 as a set of true shopping keywords. We then measured the precision, recall, and F-score (the harmonic mean of precision and recall) of the entire keyword set and of the shopping-related keywords; the precision, recall, and F-score of the shopping-related keywords derived by the proposed methodology were all higher. This study thus proposes a scheme that obtains shopping-related keywords in a relatively simple manner: shopping-related keywords can be extracted simply by examining transactions whose next visit is a shopping mall. The resulting shopping-related keyword set is expected to be a useful asset for the many shopping malls that participate in keyword marketing, and the proposed methodology can easily be applied to the construction of keyword sets for other special areas as well.
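
A simplified sketch of the extraction rule and evaluation described above: a search keyword counts as shopping-related when the transaction following the search leads to a shopping-mall page, and the result is scored by precision, recall, and F-score. The log entries, domains, and true keyword set are invented stand-ins for the paper's 150 million transaction logs.

```python
SHOPPING_SITES = {"mall-a.example.com", "mall-b.example.com"}

browsing_log = [  # (search_keyword, next_visited_domain), invented examples
    ("running shoes", "mall-a.example.com"),
    ("weather today", "news.example.com"),
    ("running shoes", "mall-b.example.com"),
    ("laptop bag",    "mall-a.example.com"),
]

# Extraction rule: keep keywords whose next visit is a shopping mall.
shopping_keywords = {kw for kw, next_site in browsing_log
                     if next_site in SHOPPING_SITES}

# Evaluation as in the paper: precision, recall, and the F-score
# (harmonic mean of precision and recall) against a true keyword set.
true_keywords = {"running shoes", "laptop bag", "wireless mouse"}
tp = len(shopping_keywords & true_keywords)
precision = tp / len(shopping_keywords)
recall = tp / len(true_keywords)
f_score = 2 * precision * recall / (precision + recall)
print(shopping_keywords, round(precision, 2), round(recall, 2), round(f_score, 2))
```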

Knowledge Graph-based Korean New Words Detection Mechanism for Spam Filtering (스팸 필터링을 위한 지식 그래프 기반의 신조어 감지 매커니즘)

  • Kim, Ji-hye;Jeong, Ok-ran
    • Journal of Internet Computing and Services / v.21 no.1 / pp.79-85 / 2020
  • Today, spam texts on smartphones are blocked by a simple string comparison between text messages and spam keywords, or by blocking known spam phone numbers. As a result, spam texts are sent in gradually changed forms to prevent them from being automatically blocked. In particular, words included in spam keyword lists are replaced with abnormal forms using special characters, Chinese characters, and whitespace so that they are not detected by simple string matching. Traditional spam filtering methods cannot block such spam texts well, so new technologies are needed to respond to changing spam text messages. In this paper, we propose a knowledge graph-based new word detection mechanism that can detect new words frequently used in spam texts and respond to changing spam texts. We also show experimental results on the performance obtained when the detected Korean new words are applied to the Naive Bayes algorithm.
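
A minimal sketch of the final step in the abstract, assuming the knowledge-graph detection stage has already produced a list of new words: those words become the bag-of-words vocabulary for a Naive Bayes spam classifier. The messages, labels, and detected words are invented; the detection mechanism itself is not shown.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

detected_new_words = ["일수", "대출문의"]   # assumed output of the detection stage

texts = ["일수 대출문의 환영", "오늘 회의 일정 공유합니다",
         "무료 대출문의 상담", "점심 같이 먹을까요"]
labels = [1, 0, 1, 0]                      # 1 = spam, 0 = ham

# Count occurrences of the detected new words in each message and train
# a Naive Bayes classifier on those counts.
vec = CountVectorizer(vocabulary=detected_new_words)
X = vec.fit_transform(texts)
clf = MultinomialNB().fit(X, labels)
print(clf.predict(vec.transform(["대출문의 전화주세요"])))   # -> [1] (spam)
```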

New Input Keyword Extraction of Equipments Involved in Ignition Using Morphological Analysis (형태소 분석을 이용한 발화관련 기기의 새로운 입력 키워드 추출)

  • Kim, Eun Ju;Choi, Jeung Woo;Ryu, Joung Woo
    • Fire Science and Engineering / v.28 no.2 / pp.91-97 / 2014
  • New types of fire accidents appear and existing types disappear as society changes rapidly. We propose a methodology for extracting new nouns from fire investigation data, where each record is an accident report produced by fire investigators. These new nouns can be used to revise the existing categories for classifying fire accidents. Using the proposed method, we analyzed the morphology of product names and ignition summaries for fire accidents classified under the 'etc.' sub-category of the 'equipment involved in ignition' category. From the product names, we found 'dryer' as a new sub-category of the agricultural equipment category and 'boiler' in the seasonal appliance category. From the ignition summaries, we also extracted the new input keywords 'aquarium' in the commercial facilities category and 'monitor' in the video/audio apparatus category. Using these four sub-categories, we reclassified 548 (14.39%) of the 3,808 fire accidents previously assigned to the 'etc.' sub-category.
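
A rough sketch of the noun-extraction step described above, using the KoNLPy Okt analyzer as a stand-in (the paper does not name its morphological analyzer): nouns are extracted from ignition summaries, counted, and surfaced as new-keyword candidates when they are not already input keywords of an existing sub-category. The summaries and keyword set are invented.

```python
from collections import Counter
from konlpy.tag import Okt   # assumed analyzer, not necessarily the paper's

existing_keywords = {"난로", "전기장판"}   # assumed current category keywords
summaries = ["수족관 히터 과열로 발화",
             "모니터 내부 전기적 요인으로 발화 추정"]

# Morphological analysis: keep nouns of length > 1, count them, and
# report frequent nouns not yet covered by an existing sub-category.
okt = Okt()
nouns = Counter(n for s in summaries for n in okt.nouns(s) if len(n) > 1)
new_candidates = [n for n, c in nouns.most_common() if n not in existing_keywords]
print(new_candidates)   # e.g. ['수족관', '모니터', ...]
```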

A three-step sentence searching method for implementing a chatting system (채팅 시스템 구현을 위한 3단계 문장 검색 방법)

  • Jeon, Won-Pyo;Song, Yoeng-Kil;Kim, Hark-Soo
    • Journal of Advanced Marine Engineering and Technology / v.37 no.2 / pp.205-212 / 2013
  • Previous chatting systems have generally used methods based on lexical agreement between users' input sentences and target sentences in a database. However, these methods often suffer from well-known lexical disagreement problems. To resolve some of these problems, we propose a three-step sentence searching method in which each step is applied only when the previous step fails. The first step compares common keyword sequences between users' inputs and target sentences at the lexical level. The second step compares sentence types and semantic markers between users' inputs and target sentences at the semantic level. The last step matches users' inputs against predefined lexico-syntactic patterns. In experiments, the proposed method showed better response precision and user satisfaction than simple keyword matching methods.
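
A schematic sketch of the three-step fallback: each step runs only when the previous one fails, with a lexical keyword-overlap test, a semantic comparison of sentence type and marker, and a lexico-syntactic pattern match. The database entries, thresholds, and patterns are invented stand-ins for the paper's actual comparisons.

```python
import re

DB = [("what time do you wake up", ("wh-question", "time"), "Around seven."),
      ("do you like coffee",       ("yn-question", "food"), "Yes, a lot.")]

PATTERNS = [(re.compile(r"tell me about (\w+)"), "I know little about {}.")]

def respond(query, query_info):
    words = set(query.split())
    for target, _, answer in DB:                  # step 1: lexical level
        if len(words & set(target.split())) >= 3:
            return answer
    for _, info, answer in DB:                    # step 2: semantic level
        if info == query_info:
            return answer
    for pat, template in PATTERNS:                # step 3: pattern match
        m = pat.search(query)
        if m:
            return template.format(m.group(1))
    return None

print(respond("when do you usually wake up", ("wh-question", "time")))
```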

Korea National College of Agriculture and Fisheries in Naver News by Web Crawling: Based on Keyword Analysis and Semantic Network Analysis (웹 크롤링에 의한 네이버 뉴스에서의 한국농수산대학 - 키워드 분석과 의미연결망분석 -)

  • Joo, J.S.;Lee, S.Y.;Kim, S.H.;Park, N.B.
    • Journal of Practical Agriculture & Fisheries Research / v.23 no.2 / pp.71-86 / 2021
  • This study was conducted to find information on the university's image from words related to 'Korea National College of Agriculture and Fisheries (KNCAF)' in Naver News. For this purpose, word frequency analysis, TF-IDF evaluation, and semantic network analysis were performed using web crawling technology. In the word frequency analysis, 'agriculture', 'education', 'support', 'farmer', 'youth', 'university', 'business', 'rural', and 'CEO' were important words. In the TF-IDF evaluation, the key words were 'farmer', 'drone', 'Ministry of Agriculture, Food and Rural Affairs', 'Jeonbuk', 'young farmer', 'agriculture', 'Jeonju', 'university', 'device', and 'spreading'. In the semantic network analysis, the bigrams showed high correlations in the order of 'youth'-'farmer', 'digital'-'agriculture', 'farming'-'settlement', 'agriculture'-'rural', and 'digital'-'turnover'. When the importance of keywords was evaluated with five centrality indices, 'agriculture' ranked first, and the keywords in second place were 'farmer' (Cc, Cb), 'education' (Cd, Cp), and 'future' (Ce). Spearman's rank correlation coefficients between the centrality indices showed that degree centrality and PageRank centrality produced the most similar rankings. In terms of word frequency, the KNCAF articles in Naver News used words such as 'agriculture', 'education', 'support', 'farmer', and 'youth' as important words; however, in the evaluation that also considers document frequency, words such as 'farmer', 'drone', 'Ministry of Agriculture, Food and Rural Affairs', 'Jeonbuk', and 'young farmer' turned out to be key words. Centrality analysis considering the network connectivity between words was best suited to evaluation by Cd and Cp, and the words with strong centrality were 'agriculture', 'education', 'future', 'farmer', 'digital', 'support', and 'utilization'.
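
A minimal sketch of the five-centrality evaluation mentioned above, computed with networkx on an invented bigram graph (the real graph comes from word co-occurrences in Naver News articles):

```python
import networkx as nx

# Invented co-occurrence graph; edges loosely mirror the bigrams above.
G = nx.Graph()
G.add_edges_from([("youth", "farmer"), ("digital", "agriculture"),
                  ("agriculture", "rural"), ("agriculture", "education"),
                  ("education", "future"), ("farmer", "agriculture")])

indices = {
    "Cd (degree)":      nx.degree_centrality(G),
    "Cc (closeness)":   nx.closeness_centrality(G),
    "Cb (betweenness)": nx.betweenness_centrality(G),
    "Ce (eigenvector)": nx.eigenvector_centrality(G),
    "Cp (PageRank)":    nx.pagerank(G),
}
for name, scores in indices.items():
    top = max(scores, key=scores.get)
    print(name, "->", top)   # 'agriculture' ranks first on each index here
```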

A Study on Research Topics for Thyroid Cancer in Korea (국내 갑상선암 연구 주제 동향 분석)

  • Yang, Ji-Yeon;Shin, Seung-Hyeok;Heo, Seong-Min;Lee, Tae-Gyeong
    • Proceedings of the Korean Society of Computer Information Conference / 2019.01a / pp.409-410 / 2019
  • In this paper, we propose a text-based approach to identifying research trends in thyroid cancer in Korea. The incidence of thyroid cancer in Korea surged in the 2000s, sparking controversy over overdiagnosis, but self-correcting efforts in various fields have greatly reduced the number of surgical patients. In this study, we used text mining techniques to collect and analyze the keywords and abstracts of thyroid cancer-related papers registered in DBpia. In the 1980s, most papers were case reports; in the 1990s, early diagnosis through screening appeared frequently; and in the 2000s, discussions of examination methods using various kinds of equipment and of the detection of minute cancers increased. In the 2010s, many studies addressed patients' quality of life. Clear changes in thyroid cancer research topics have appeared over the past several decades, and these results are expected to serve as basic data for future research.


Analysis of YouTube's role as a new platform between media and consumers

  • Hur, Tai-Sung;Im, Jung-ju;Song, Da-hye
    • Journal of the Korea Society of Computer and Information / v.27 no.2 / pp.53-60 / 2022
  • YouTube realistically presents fake news and biased content based on unverified claims, owing to its low entry barriers and the ambiguity of its video regulation standards. This study therefore aims to analyze the influence of the media and YouTube on individual behavior and the relationship between them. Data from YouTube and Twitter were randomly collected with Selenium, BeautifulSoup, and the Twitter API to identify the 31 most frequently mentioned keywords. Based on these 31 keywords, data were collected from YouTube, Twitter, and Naver News, and positive, negative, and neutral sentiments were classified and quantified with the VADER model from the Natural Language Toolkit (NLTK) and used as analysis data. Correlation analysis of the data confirmed that the more negative the news, the more positive the content on YouTube, and that the positivity index of YouTube content is proportional to the positive and negative values on Twitter. The study finds that YouTube content does not match the sentiment index shown in the news because it is secondarily processed and emotionally colored. In other words, processed YouTube content intuitively affects Twitter's positive and negative figures, which serve as channels of communication. The results suggest that YouTube helps individuals discern information in the current situation, where the emergence of yellow media that stimulate people's interests and instincts has made accurate judgment of information difficult.
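
A minimal sketch of the sentiment-quantification step described above, using NLTK's VADER model on invented sample texts; the collection steps (Selenium, BeautifulSoup, Twitter API) are omitted, and note that VADER's lexicon is English, so Korean text would need translation first.

```python
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

for text in ["This video is wonderful and very helpful!",
             "Terrible reporting, full of misleading claims."]:
    scores = sia.polarity_scores(text)   # pos/neg/neu plus compound in [-1, 1]
    print(round(scores["compound"], 3), text)
```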

Comparing Corporate and Public ESG Perceptions Using Text Mining and ChatGPT Analysis: Based on Sustainability Reports and Social Media (텍스트마이닝과 ChatGPT 분석을 활용한 기업과 대중의 ESG 인식 비교: 지속가능경영보고서와 소셜미디어를 기반으로)

  • Jae-Hoon Choi;Sung-Byung Yang;Sang-Hyeak Yoon
    • Journal of Intelligence and Information Systems / v.29 no.4 / pp.347-373 / 2023
  • As the significance of ESG (Environmental, Social, and Governance) management grows as a driver of sustainable growth, this study examines and compares ESG trends and interrelationships from both corporate and public viewpoints. Employing a combination of Latent Dirichlet Allocation (LDA) topic modeling and semantic network analysis, we analyzed sustainability reports alongside corresponding social media datasets. In addition, an in-depth examination of social media content was conducted using Joint Sentiment Topic modeling (JST), further enriched by semantic network analysis (SNA). Complementing text mining with the assistance of ChatGPT, the study identified 25 distinct ESG topics. It highlighted differences between companies, which aim to avoid risks and build trust, and the general public, whose concerns range from investment options to working conditions. Key terms such as 'greenwashing', 'serious accidents', and 'boycotts' show that many people doubt how companies handle ESG issues. The findings lay the foundation for plans that serve key ESG stakeholders, including businesses, government agencies, customers, and investors, and provide guidance for creating more trustworthy and effective ESG strategies, helping to direct the discussion on ESG effectiveness.
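
A minimal sketch of the LDA topic modeling step described above, run with scikit-learn on invented ESG-flavored snippets; the study's actual pipeline also includes JST, semantic network analysis, and ChatGPT-assisted interpretation.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["carbon emissions reduction renewable energy climate",
        "board governance audit transparency shareholder rights",
        "employee safety working conditions serious accidents",
        "greenwashing boycott consumer trust brand reputation"]

# Fit a small LDA model on bag-of-words counts and print top terms per topic.
vec = CountVectorizer()
X = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [terms[j] for j in topic.argsort()[-4:][::-1]]
    print(f"topic {i}:", ", ".join(top))
```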