• Title/Summary/Keyword: Keywords Extraction

Web Site Keyword Selection Method by Considering Semantic Similarity Based on Word2Vec (Word2Vec 기반의 의미적 유사도를 고려한 웹사이트 키워드 선택 기법)

  • Lee, Donghun;Kim, Kwanho
    • The Journal of Society for e-Business Studies / v.23 no.2 / pp.83-96 / 2018
  • Extracting keywords that represent documents is important because they convey document information quickly and support automated services such as document search, classification, and recommendation. However, keyword extraction based on word frequency in web site documents, or on graph algorithms over word co-occurrence, has two problems: the web page structure potentially contains various words unrelated to the topic, and the limited performance of Korean tokenizers makes it difficult to extract semantic keywords. In this paper, we propose a method that selects candidate keywords based on semantic similarity, addressing both the failure to extract semantic keywords and the poor accuracy of Korean tokenizer analysis. Finally, a filtering process removes inconsistent keywords to yield the final semantic keywords. Experiments on real web pages of small businesses show that the proposed method improves performance by 34.52% over a keyword selection technique based on statistical similarity. This confirms that considering semantic similarity between words and removing inconsistent keywords improves keyword extraction from documents.
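
The selection-then-filtering idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0.5 threshold, and the three-dimensional toy vectors are assumptions made for illustration, and real Word2Vec vectors would be learned from a corpus.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def select_keywords(candidates, topic_vec, vectors, threshold=0.5):
    """Keep candidates whose vector is close to the topic vector, best first."""
    scored = [(w, cosine(vectors[w], topic_vec)) for w in candidates if w in vectors]
    # Filtering step: drop words that are not semantically consistent with the topic.
    kept = [(w, s) for w, s in scored if s >= threshold]
    return [w for w, _ in sorted(kept, key=lambda p: p[1], reverse=True)]

# Hypothetical toy embeddings (assumption); not taken from the paper.
TOY_VECTORS = {
    "bakery": [0.9, 0.1, 0.0],
    "bread": [0.8, 0.2, 0.1],
    "login": [0.0, 0.1, 0.9],
    "copyright": [0.1, 0.0, 0.8],
}

selected = select_keywords(
    ["bread", "login", "copyright", "bakery"], TOY_VECTORS["bakery"], TOY_VECTORS
)
print(selected)
```

In this toy setup, page-structure words such as "login" and "copyright" fall below the threshold, while the topical words are kept.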

A Study on the Deduction of Social Issues Applying Word Embedding: With an Emphasis on News Articles related to the Disabled (단어 임베딩(Word Embedding) 기법을 적용한 키워드 중심의 사회적 이슈 도출 연구: 장애인 관련 뉴스 기사를 중심으로)

  • Choi, Garam;Choi, Sung-Pil
    • Journal of the Korean Society for Information Management / v.35 no.1 / pp.231-250 / 2018
  • In this paper, we propose a new methodology for extracting and formalizing subjective topics at a specific time using a set of keywords extracted automatically from online news articles. To do this, we first extracted a set of keywords by applying TF-IDF methods, which were selected through a series of comparative experiments on various statistical weighting schemes that measure the importance of individual words in a large set of texts. To calculate the semantic relations between the extracted keywords effectively, we constructed a set of word embedding vectors from about 1,000,000 separately collected news articles. The extracted keywords were represented as numerical vectors and clustered with the K-means algorithm. A qualitative in-depth analysis of the resulting keyword clusters showed that most clusters were appropriate topics with sufficient semantic concentration for labels to be assigned easily.
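
As a rough illustration of the keyword extraction step only, the sketch below ranks terms with one common TF-IDF variant (relative term frequency times log inverse document frequency). The abstract does not say which variant the authors selected, so the formula, the function name, and the sample tokens are assumptions; the embedding and K-means steps are omitted.

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_k=2):
    """Return the top_k highest-weighted terms for each non-empty tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    result = []
    for doc in docs:
        tf = Counter(doc)
        weights = {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        # Sort by descending weight; break ties alphabetically for determinism.
        ranked = sorted(weights, key=lambda t: (-weights[t], t))
        result.append(ranked[:top_k])
    return result

# Hypothetical pre-tokenized articles (assumption).
docs = [
    ["welfare", "policy", "news", "welfare"],
    ["employment", "quota", "news"],
    ["transport", "access", "news", "access"],
]
keywords = tfidf_keywords(docs)
print(keywords)
```

A term such as "news" that occurs in every document receives weight zero, which is why it never ranks as a keyword here.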

Relevance Feedback Agent for Improving Precision in Korean Web Information Retrieval System (한국어 웹 정보검색 시스템의 정확도 향상을 위한 연관 피드백 에이전트)

  • Baek, Jun-Ho;Choe, Jun-Hyeok;Lee, Jeong-Hyeon
    • The Transactions of the Korea Information Processing Society / v.6 no.7 / pp.1832-1840 / 1999
  • Because existing Korean Web IR systems generally use a Boolean model, it is difficult to retrieve the desired information in a single query. Also, because web documents feature frequent abbreviations and many links, keyword extraction using the inverse document frequency extracts improper keywords, which adds to the problem of ambiguous meaning. Therefore, users must repeatedly modify their queries until they get the proper information. In this paper, we design and implement a relevance feedback agent system to resolve these problems. The system extracts the proper information in response to the user's preferred keywords and stores these keywords in a preference DB table. When users retrieve this information later, the system searches for it by adding the relevant keywords to the user's queries. As a result, the system reduces the number of query modifications and improves the efficiency of the IR system.
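
The query expansion behaviour described above can be sketched as follows. This is a toy model under stated assumptions: the class name `FeedbackAgent`, its methods, and the in-memory dictionary standing in for the preference DB table are all hypothetical, since the abstract does not describe the actual design.

```python
from collections import defaultdict

class FeedbackAgent:
    """Toy preference store that expands queries with previously preferred keywords."""

    def __init__(self):
        # Stand-in for the preference DB table: query term -> preferred keywords.
        self.preferences = defaultdict(list)

    def record(self, query_term, preferred_keywords):
        """Remember keywords the user marked as relevant for a query term."""
        for kw in preferred_keywords:
            if kw not in self.preferences[query_term]:
                self.preferences[query_term].append(kw)

    def expand(self, query_terms):
        """Append stored keywords to the query, keeping order and avoiding duplicates."""
        expanded = list(query_terms)
        for term in query_terms:
            for kw in self.preferences.get(term, []):
                if kw not in expanded:
                    expanded.append(kw)
        return expanded

agent = FeedbackAgent()
agent.record("jaguar", ["car", "dealer"])
expanded_query = agent.expand(["jaguar", "price"])
print(expanded_query)
```

Expanding an ambiguous term with stored preferences is what lets a later search succeed without the user rewriting the query by hand.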

A Design of Similar Video Recommendation System using Extracted Words in Big Data Cluster (빅데이터 클러스터에서의 추출된 형태소를 이용한 유사 동영상 추천 시스템 설계)

  • Lee, Hyun-Sup;Kim, Jindeog
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.2 / pp.172-178 / 2020
  • To recommend content, companies generally use collaborative filtering, which takes into account both user preferences and video (item) similarities. Such services are primarily intended to improve user convenience by leveraging personal preferences such as search keywords and viewing time. Videos are also ranked around the keywords specified for them. However, analyzing video similarity with a limited set of keywords has its limits, and the problem becomes serious when the specified keywords do not properly reflect the item. This paper proposes a system that identifies the characteristics of a video as it is, without human intervention, and analyzes and recommends videos based on their similarities. The proposed system analyzes similarity by taking into account all words (keywords) with distinct meanings extracted from training videos; because of the large scale of the data and operations, it applies processing methods designed for big data clusters.

Group-wise Keyword Extraction of the External Audit using Text Mining and Association Rules (텍스트마이닝과 연관규칙을 이용한 외부감사 실시내용의 그룹별 핵심어 추출)

  • Seong, Yoonseok;Lee, Donghee;Jung, Uk
    • Journal of Korean Society for Quality Management / v.50 no.1 / pp.77-89 / 2022
  • Purpose: To improve a company's audit quality, an in-depth analysis is required to categorize audit reports, which are text documents containing the details of the external audit. This study introduces a systematic methodology for extracting, from audit reports collected as text documents, the keywords of each group that distinguish groups such as 'audit plan' and 'interim audit'. Methods: In the first step, the documents are preprocessed through text mining. In the second step, the documents are classified into groups using machine learning techniques, and the important vocabulary that has a dominant influence on classification performance is extracted. In the third step, the association rules for each group's documents are found. In the last step, the final keywords representing the characteristics of each group are extracted by comparing the vocabulary important for classification with the vocabulary important in each group's association rules. Results: This study quantitatively calculates the importance of the vocabulary used in audit reports based on machine learning, rather than relying on qualitative research methods such as literature search, expert evaluation, and the Delphi technique. The case study shows that the extracted keywords describe the characteristics of each group well. Conclusion: This study is meaningful in that it lays the foundation for quantitative follow-up studies on key vocabulary at each stage of auditing.
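
The third step, finding association rules within one group's documents, can be sketched with plain support and confidence over term pairs. This is a simplified illustration, not the study's procedure: the function name, the thresholds, and the sample term sets are assumptions, and a full miner such as Apriori would also handle larger itemsets.

```python
from itertools import combinations
from collections import Counter

def pair_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return sorted (antecedent, consequent, support, confidence) rules for term pairs."""
    n = len(transactions)
    item_count = Counter(item for t in transactions for item in set(t))
    pair_count = Counter(
        pair for t in transactions for pair in combinations(sorted(set(t)), 2)
    )
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n  # fraction of documents containing both terms
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_count[x]  # P(y in doc | x in doc)
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)

# Hypothetical term sets from one group's audit documents (assumption).
docs = [
    {"inventory", "count", "observation"},
    {"inventory", "count"},
    {"inventory", "planning"},
    {"count", "inventory", "sampling"},
]
rules = pair_rules(docs)
print(rules)
```

With these sample documents only the rule from "count" to "inventory" passes both thresholds; the reverse direction has confidence 0.75 and is dropped.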

Sudden sensorineural hearing loss after third molar extraction: Case report and literature review (제 3대구치 발치 후 발생한 돌발성 난청: 증례보고 및 문헌 고찰)

  • Kim, Hyung Ki;Kim, Il-hyung;Ku, Jeong-Kui;Noh, Min-Ho
    • The Journal of the Korean Dental Association / v.58 no.7 / pp.404-411 / 2020
  • This study reports an unusual complication in a 22-year-old male who presented with sudden hearing loss after extraction of the right mandibular third molar under local anesthesia with 3.6 ml of 2% lidocaine. The department of oral and maxillofacial surgery prescribed a total of 8.75 mg of oral dexamethasone for 1 week immediately after the extraction, but hearing did not improve after 1 week. After referral to otolaryngology, a total of 600 mg of oral methylon and hyperbaric oxygen therapy were administered for 2 weeks. The patient's hearing improved 6 weeks after the extraction, but tinnitus persisted even after 12 months. The cause and treatment are discussed with a literature review, based on a search of PubMed and Google Scholar in October 2019 with the keywords ['hearing loss' AND ('dental' OR 'tooth extraction' OR 'teeth extraction')]. A total of five cases were reported after tooth extraction with local anesthesia. Sudden hearing loss could be associated with local anesthesia containing vasoconstrictors. Early steroid therapy (extensive medication and intra-tympanic injection) and hyperbaric oxygen therapy within 2 weeks were recommended. With proper treatment, hearing could improve, but additional symptoms such as tinnitus and dizziness might remain.

Analyzing the Trend of False·Exaggerated Advertisement Keywords Using Text-mining Methodology (1990-2019) (텍스트마이닝 기법을 활용한 허위·과장광고 관련 기사의 트렌드 분석(1990-2019))

  • Kim, Do-Hee;Kim, Min-Jeong
    • The Journal of the Korea Contents Association / v.21 no.4 / pp.38-49 / 2021
  • This study analyzed the trend of the term 'false and exaggerated advertisement' in 5,141 newspaper articles from 1990 to 2019 using text mining methodology. First, we identified the most frequent keywords related to false and exaggerated advertisements through frequency analysis of all the newspaper articles and examined the context between the extracted keywords. Next, to examine how false and exaggerated advertisements have changed, we performed frequency analysis on the articles separated into 10-year periods and identified the tendency of the keywords that became issues by comparing the number of academic papers on the subject of the top keywords of each year. Finally, we identified trends in false and exaggerated advertisements based on the detailed keywords within each topic using topic modeling. The results confirmed that topics that became issues at a specific time were extracted as frequent keywords, and that keyword trends by period changed in connection with social and environmental factors. This study is meaningful in helping consumers spend wisely by building background knowledge about unfair advertising. Furthermore, the core keyword extraction is expected to convey the true purpose of advertising and deliver its implications to companies and related employees who commit misconduct.
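
The per-period frequency analysis can be sketched in a few lines. The function name and the sample articles below are hypothetical and only illustrate grouping tokenized articles into 10-year periods; they are not data from the study, and the topic modeling step is not covered.

```python
from collections import Counter, defaultdict

def top_keywords_by_decade(articles, top_k=1):
    """Group (year, tokens) articles into decades and return the top terms per decade."""
    counts = defaultdict(Counter)
    for year, tokens in articles:
        decade = (year // 10) * 10  # e.g. 1994 -> 1990
        counts[decade].update(tokens)
    return {
        decade: [t for t, _ in sorted(c.items(), key=lambda p: (-p[1], p[0]))[:top_k]]
        for decade, c in sorted(counts.items())
    }

# Hypothetical sample articles (assumption).
articles = [
    (1994, ["medicine", "advertisement", "medicine"]),
    (1998, ["medicine", "fine"]),
    (2005, ["cosmetics", "diet", "diet"]),
    (2017, ["online", "influencer", "online"]),
]
trend = top_keywords_by_decade(articles)
print(trend)
```

Comparing the top terms across the resulting periods is the simplest way to see which keywords became issues at a given time.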

A Design of KP AGENT for Intelligent Information Retrieval (지능형 정보검색을 위한 KP AGENT의 설계)

  • 박경우;배상현
    • Journal of the Korea Institute of Information and Communication Engineering / v.4 no.2 / pp.443-451 / 2000
  • Until now, there have been various science information databases that store science and technology information, but they do not satisfy users' needs. Therefore, from the users' standpoint, this paper proposes the technology information space as a new paradigm that supplements the function of science information DBs. ICPIS, which takes as input papers described with keywords, offers itemized summaries of their contents and the visual display and comparison of similar papers. It also supplies abundant summary and survey information covering more than ten volumes of information and communication papers, starting with causal relation extraction. The component that plays a significant role in ICPIS is called KP, a package of domain knowledge that unifies the extraction and structural description of technology information. ICPIS extracts technology information from the papers through natural language processing guided by the itemized KP keywords, and forms the summary structure prescribed in KP.

Academic Conference Categorization According to Subjects Using Topical Information Extraction from Conference Websites (학회 웹사이트의 토픽 정보추출을 이용한 주제에 따른 학회 자동분류 기법)

  • Lee, Sue Kyoung;Kim, Kwanho
    • The Journal of Society for e-Business Studies / v.22 no.2 / pp.61-77 / 2017
  • Recently, the amount of academic conference information on the Internet has rapidly increased, and automatic classification of this information by research subject enables researchers to find related conferences efficiently. The information provided by most conference listing services is limited to title, date, location, and website URL. Among these features, however, only the title contains topical words, which causes an information insufficiency problem. Therefore, we propose methods that aim to resolve the information insufficiency problem by utilizing web contents. Specifically, the proposed methods extract the main contents from an HTML document collected using a website URL. Based on the similarity between the title of a conference and its main contents, topical keywords are selected to reinforce the important keywords among the main contents. Experiments on a real-world dataset showed that the additional information extracted from conference websites successfully improves conference classification performance. We plan to further improve the accuracy of conference classification by considering the structure of websites.

A Study on Keywords Extraction from Entertainment News using Bigdata Processing (빅데이터 처리를 통한 연예 뉴스에서의 키워드 추출에 관한 연구)

  • Yoo, Sang-Hyun;Lee, Sang-Jun
    • Journal of The Korea Society of Information Technology Policy & Management / v.11 no.6 / pp.1503-1507 / 2019
  • With the soft nature of online entertainment news articles and the growing number of quick-turnaround articles in the entertainment sector, many people now have access to front-page entertainment articles and can form their own assessments of celebrities. Although a celebrity's reputation is a key factor in the business strategy of an entertainment agency, which should make the most of its affiliated celebrities, it is not easy to analyze systematically which news articles are about which celebrities in a real-time environment. Based on the number of celebrity mentions in entertainment news data, this paper proposes an entertainment news keyword analysis system that extracts the celebrities who are the subject of an article and associates them with their entertainment agencies. Through the proposed system, advertisers and entertainment agencies can use the results as reference material for judging a celebrity's business value. In addition, it can lay the groundwork for investment strategies by helping brokerages and investors predict the outlook of entertainment companies.
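
The mention-counting idea can be sketched as below. Everything here is an assumption for illustration: the `AGENCY_OF` lookup table, the placeholder names, and the simple substring count stand in for whatever entity recognition and big data processing the proposed system actually uses.

```python
from collections import Counter

# Hypothetical lookup table (assumption); the abstract does not describe the data model.
AGENCY_OF = {"Celebrity A": "Agency X", "Celebrity B": "Agency Y"}

def main_subject(article_text, agency_of=AGENCY_OF):
    """Return (celebrity, agency, mentions) for the most-mentioned known name, or None."""
    mentions = Counter({name: article_text.count(name) for name in agency_of})
    name, count = mentions.most_common(1)[0]
    if count == 0:
        return None  # no known celebrity appears in the article
    return name, agency_of[name], count

article = (
    "Celebrity A joins a new drama. Celebrity B praised Celebrity A. "
    "Celebrity A said the role was a challenge."
)
subject = main_subject(article)
print(subject)
```

Aggregating such per-article results by agency would give the kind of reference material the abstract mentions for advertisers and investors.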