• Title/Abstract/Keywords: keywords

Search results: 2,360 items (processing time: 0.025 s)

자연어 처리 기법을 활용한 산업재해 위험요인 구조화 (Structuring Risk Factors of Industrial Incidents Using Natural Language Process)

  • 강성식;장성록;이종빈;서용윤
    • 한국안전학회지 / Vol. 36, No. 1 / pp. 56-63 / 2021
  • The narrative texts of industrial accident reports help to identify accident risk factors. They relate the accident triggers to the sequence of events and the outcomes of an accident. In particular, a set of related keywords in the context of the narrative can represent how the accident proceeded. Previous studies on text analytics for structuring accident reports have been limited to extracting individual keywords without context. To remedy this shortcoming, we propose a context-based analysis using a Natural Language Processing (NLP) algorithm. This study applies Word2Vec, an NLP word-embedding algorithm trained by a neural network with supervised learning, to extract adjacent keywords. During processing, Word2Vec takes adjacent keywords in the narrative texts as inputs for its supervised learning; the resulting keyword weights form vectors that represent how closely keywords neighbor one another. Similar keyword weights mean that the keywords are closely arranged within sentences in the narrative text. Consequently, a set of keywords with similar weights represents similar accidents. We extracted ten accident processes containing related keywords and used them to understand the risk factors that determine how an accident proceeds. This information helps identify how a checklist for an accident report should be structured.
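As a rough illustration of how adjacent keywords can serve as training inputs, the sketch below builds (center, context) pairs with a sliding window, the way a skip-gram style Word2Vec setup typically does. The sample tokens, the window size, and the function name `skipgram_pairs` are assumptions for illustration, not details taken from the paper, and the embedding training itself is not shown.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for each token and its neighbors within `window`."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical tokenized accident narrative (not from the paper's data).
narrative = ["worker", "ladder", "slip", "fall", "fracture"]
pairs = skipgram_pairs(narrative, window=1)
```

With a window of 1, only directly adjacent keywords are paired, so "slip" is paired with "ladder" and "fall" but not with "worker".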

의사결정나무를 활용한 온라인 소비자 리뷰 평가에 영향을 주는 핵심 키워드 도출 연구: 별점과 좋아요를 중심으로 (Core Keywords Extraction for Evaluating Online Consumer Reviews Using a Decision Tree: Focusing on Star Ratings and Helpfulness Votes)

  • 민경수;유동희
    • 한국정보시스템학회지:정보시스템연구 / Vol. 32, No. 3 / pp. 133-150 / 2023
  • Purpose: This study aims to develop classification models using a decision tree algorithm to identify the core keywords and rules that influence online consumer review evaluations for robot vacuum cleaners on Amazon.com. Unlike previous studies, we analyze the core keywords that affect evaluation results by dividing the evaluators of online consumer reviews into self-evaluation (star ratings) and peer evaluation (helpfulness votes). We investigate whether the core keywords influencing star ratings and helpfulness votes vary across products and whether the core keywords related to star ratings or helpfulness votes are similar across all products. Design/methodology/approach: We used random under-sampling to balance the dataset. To evaluate the classification model's performance, we progressively removed independent variables based on decreasing importance through backward elimination. As a result, we identified the classification models that best predict star ratings and helpfulness votes for each product's online consumer reviews. Findings: The core keywords influencing self-evaluation and peer evaluation vary across products, and even for the same model or features the core keywords are not consistent. Therefore, producers and marketing managers need to analyze the core keywords of each product to highlight its advantages and prepare customized strategies that compensate for its shortcomings.
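The random under-sampling step mentioned above can be sketched as follows. This is a minimal sketch, assuming the goal is to shrink every class to the size of the smallest one; the function name `random_undersample`, the labels, and the sample reviews are hypothetical and not taken from the study.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def random_undersample(samples: List[Tuple[str, str]], seed: int = 0) -> List[Tuple[str, str]]:
    """Randomly drop samples from larger classes until every class matches the smallest one."""
    by_label: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))
    n_min = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n_min))
    return balanced


# Hypothetical reviews labelled by star-rating class (illustrative only).
reviews = [("great suction", "high")] * 6 + [("stopped working", "low")] * 2
balanced = random_undersample(reviews)
```

After balancing, both classes contribute the same number of reviews, so a decision tree trained on the result is less likely to simply favor the majority class.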

A Study on the General Public's Perceptions of Dental Fear Using Unstructured Big Data

  • Han-A Cho;Bo-Young Park
    • 치위생과학회지 / Vol. 23, No. 4 / pp. 255-263 / 2023
  • Background: This study used text mining techniques to determine public perceptions of dental fear, extracted keywords related to dental fear, identified the connections between the keywords, and categorized and visualized perceptions related to dental fear. Methods: Keywords in texts posted on Internet portal sites (NAVER and Google) between 1 January 2000 and 31 December 2022 were collected. Four stages of analysis were used to explore the keywords: frequency analysis, term frequency-inverse document frequency (TF-IDF), centrality analysis and co-occurrence analysis, and convergent correlations. Results: Among the top ten keywords in the frequency analysis, the most frequently used keyword was 'treatment,' followed by 'fear,' 'dental implant,' 'conscious sedation,' 'pain,' 'dental fear,' 'comfort,' 'taking medication,' 'experience,' and 'tooth.' In the TF-IDF analysis, the top three keywords were 'dental implant,' 'conscious sedation,' and 'dental fear.' The co-occurrence analysis, which explores keywords that appear together, showed that 'fear and treatment' and 'treatment and pain' appeared most frequently. Conclusion: Texts collected as unstructured big data were analyzed to identify general perceptions related to dental fear, and this study is valuable as source data for understanding public perceptions of dental fear by grouping associated keywords. The results of this study will be helpful in understanding dental fear and can be used as factors affecting oral health in the future.
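Two of the analysis stages named above, TF-IDF and co-occurrence analysis, can be sketched in a few lines. This is a minimal sketch using one common TF-IDF variant (relative term frequency times the log of inverse document frequency); the study does not state which variant it used. The tokenized posts and function names are hypothetical, and each post is assumed to be non-empty.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Score each term per document as (count / doc length) * log(N / document frequency)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()})
    return scores


def cooccurrence(docs: List[List[str]]) -> Counter:
    """Count how many documents contain each unordered keyword pair."""
    pair_counts: Counter = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


# Hypothetical tokenized posts (not the study's data).
posts = [
    ["fear", "treatment", "pain"],
    ["treatment", "pain", "implant"],
    ["fear", "treatment", "sedation"],
]
scores = tf_idf(posts)
pairs = cooccurrence(posts)
```

In this toy data "treatment" appears in every post, so its TF-IDF score is zero even though it is the most frequent word, which is one way a frequency ranking and a TF-IDF ranking can differ.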

키워드 자동 생성에 대한 새로운 접근법: 역 벡터공간모델을 이용한 키워드 할당 방법 (A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model)

  • 조원진;노상규;윤지영;박진수
    • Asia pacific journal of information systems / Vol. 21, No. 1 / pp. 103-122 / 2011
  • Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine complete documents to determine whether they might be useful. For this reason, some online documents are accompanied by a list of keywords specified by the authors to guide users by facilitating the filtering process. A set of keywords is thus often considered a condensed version of the whole document and therefore plays an important role in document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents could also benefit from the use of keywords, including Web pages, email messages, news reports, magazine articles, and business papers. Although the potential benefit is large, the implementation itself is the obstacle: manually assigning keywords to all documents is a daunting or even impractical task, because it is extremely tedious and time-consuming and requires a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are two main approaches to achieving this aim: the keyword assignment approach and the keyword extraction approach. Both approaches use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former approach, there is a given vocabulary, and the aim is to match its terms to the texts. In other words, the keyword assignment approach seeks to select the words from a controlled vocabulary that best describe a document.
Although this approach is domain dependent and is not easy to transfer and expand, it can generate implicit keywords that do not appear in a document. In the latter approach, on the other hand, the aim is to extract keywords according to their relevance in the text without a prior vocabulary. In this approach, automatic keyword generation is treated as a classification task, and keywords are commonly extracted using supervised learning techniques. Thus, keyword extraction algorithms classify candidate keywords in a document into positive or negative examples. Several systems, such as Extractor and Kea, were developed using the keyword extraction approach. The most indicative words in a document are selected as keywords for that document; as a result, keyword extraction is limited to terms that appear in the document and cannot generate implicit keywords that are not included in it. According to the experimental results of Turney, about 64% to 90% of the keywords assigned by authors can be found in the full text of an article. Conversely, this means that 10% to 36% of the keywords assigned by authors do not appear in the article and cannot be generated by keyword extraction algorithms. Our preliminary experiment also shows that 37% of the keywords assigned by authors are not included in the full text. This is why we decided to adopt the keyword assignment approach. In this paper, we propose a new approach for automatic keyword assignment, namely IVSM (Inverse Vector Space Model). The model is based on the vector space model, a conventional information retrieval model that represents documents and queries as vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets.
The keyword assignment process of IVSM is as follows: (1) calculating the vector length of each keyword set based on each keyword weight; (2) preprocessing and parsing a target document that does not have keywords; (3) calculating the vector length of the target document based on the term frequency; (4) measuring the cosine similarity between each keyword set and the target document; and (5) generating keywords that have high similarity scores. Two keyword generation systems were implemented applying IVSM: an IVSM system for a Web-based community service and a stand-alone IVSM system. The first is implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers and has been tested on a number of academic papers, including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between the IVSM-generated keywords and the author-assigned keywords. In our experiment, the precision of IVSM applied to the Web-based community service and to academic journals was 0.75 and 0.71, respectively. Both systems perform much better than baseline systems that generate keywords based on simple probability. IVSM also shows performance comparable to Extractor, a representative keyword extraction system developed by Turney. As electronic documents increase, we expect that the IVSM proposed in this paper can be applied to many electronic documents in Web-based communities and digital libraries.
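The five-step process described above can be sketched as a cosine-similarity ranking. This is a minimal sketch under stated assumptions, not the authors' implementation: each candidate keyword is assumed to come with a term-weight profile (how such profiles are built is not described in the abstract), and the names `cosine`, `assign_keywords`, and `profiles` as well as the sample data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))  # vector length of a
    norm_b = math.sqrt(sum(w * w for w in b.values()))  # vector length of b
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_keywords(
    doc_tokens: List[str],
    profiles: Dict[str, Dict[str, float]],
    top_k: int = 1,
) -> List[Tuple[str, float]]:
    """Rank candidate keywords by similarity between their profile and the document's TF vector."""
    doc_vec = {t: float(c) for t, c in Counter(doc_tokens).items()}
    ranked = sorted(((cosine(doc_vec, p), kw) for kw, p in profiles.items()), reverse=True)
    return [(kw, score) for score, kw in ranked[:top_k]]


# Hypothetical keyword profiles (assumed to be learned from documents that already have keywords).
profiles = {
    "logistics": {"port": 2.0, "shipping": 3.0, "cargo": 1.0},
    "fashion": {"style": 3.0, "trend": 2.0},
}
target = ["shipping", "port", "cargo", "shipping"]  # parsed target document
result = assign_keywords(target, profiles)
```

Note that the assigned keyword "logistics" never appears in the target tokens, which illustrates how an assignment approach can produce implicit keywords that an extraction approach cannot.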

건설사업관리기술자가 알아야 할 안전키워드에 관한 연구 (A study on Safety Keywords that Construction Project Managers Should Know)

  • 김상원;최유진;홍성욱;안태한
    • 한국건축시공학회:학술대회논문집 / 한국건축시공학회 2017 Spring Academic Conference / pp. 95-96 / 2017
  • Construction project management engineers should know a total of eleven safety keywords. Two keywords apply before construction: safety organization plan and safety management plan. Seven keywords follow: safety compliance, document keeping, safety check, safety management fee, hazardous risk prevention plan, safety education, and safety accident. The final two keywords are the safety inspection comprehensive report and the safety management document. Engineers need to understand each keyword at the relevant stage and do their best in safety work, recognizing that safety inspections are carried out every month and that safety work should be done faithfully.

Conceptual Extraction of Compound Korean Keywords

  • Lee, Samuel Sangkon
    • Journal of Information Processing Systems / Vol. 16, No. 2 / pp. 447-459 / 2020
  • After reading a document, people construct a concept of the information they consumed and merge multiple words into keywords that represent the material. With that in mind, this study proposes a smarter and more efficient keyword extraction method. It uses scholarly journals as the basis for establishing production rules from the concept information of words appearing in a document, so that author-provided keywords remain usable even when they do not appear in the body of the document. This study also presents a new way to determine the importance of each keyword while excluding irrelevant keywords. To validate the extracted keywords, titles and abstracts of journals on natural language and auditory language were collected for analysis. A comparison of author-provided keywords with the keywords produced by the developed system showed that the system was highly useful, with an accuracy rate of up to 96%.

Metadata Processing Technique for Similar Image Search of Mobile Platform

  • Seo, Jung-Hee
    • Journal of information and communication convergence engineering / Vol. 19, No. 1 / pp. 36-41 / 2021
  • Text-based image retrieval is not only cumbersome, as it requires the user to enter keywords manually, but is also limited in its semantic handling of keywords. Content-based image retrieval, by contrast, uses visual processing by a computer to address the problems of text retrieval more fundamentally. Vision applications, such as the extraction and mapping of image characteristics, require the processing of a large amount of data in a mobile environment, which makes efficient power consumption difficult. Hence, an effective image retrieval method for mobile platforms is proposed herein. To give visual meaning to the keywords inserted into images, the method improves retrieval efficiency by extracting keywords from the exchangeable image file format metadata of images retrieved through content-based similar image retrieval and then automatically adding those keywords to images captured on mobile devices. Additionally, users can manually add or modify keywords in the image metadata.

A Sustainable Tourism Study in Underdeveloped Areas Using Big Data Analysis Techniques

  • Hyun-Seok Kim;Sang-Hak Lee;Gi-Hwan Ryu
    • International Journal of Internet, Broadcasting and Communication / Vol. 16, No. 2 / pp. 112-118 / 2024
  • Underdeveloped areas are emerging as a social problem. Industrialization drove the population to the cities, creating underdeveloped areas, which in turn cause social problems such as population decline and aging. It is necessary to study the continuous tourism development of underdeveloped areas through development and improvement projects. This study uses social media big data to investigate keywords related to underdeveloped areas and to examine the connections between them. Its purpose was to conduct core research divided by type and to investigate the keywords of tourism in underdeveloped areas through CONCOR analysis. As a result, keywords were connected for each type: redevelopment, regional development, regional economy, and underdeveloped areas. Through this, the keywords for sustainable tourism in underdeveloped areas were identified. It is hoped that this study will help develop sustainable tourism based on the keywords of underdeveloped areas.

빅데이터를 활용한 국가생태문화탐방로 이용자의 경험분석 - 부안 마실길과 군산 구불길을 대상으로 - (An Analysis of the Experience of Users of National Ecological and Cultural Exploration Routes Using Big Data - A Focus on the Buan Masil Road and Gunsan Gubul Road -)

  • 이현정;안병철
    • 한국환경복원기술학회지 / Vol. 23, No. 6 / pp. 151-166 / 2020
  • Various experience keywords were derived through text mining analysis of two National Ecological and Cultural Exploration Routes, and the interaction between the experience keywords was analyzed using the degree centrality, closeness centrality, and betweenness centrality values calculated through centrality analysis. The results are as follows. First, in the text mining analysis, "walking" appeared as the top keyword in periods I, II, and III for both target areas. Keywords related to stay-type accommodation, "rental cottage" and "recreational forest," were derived for Masil Road, whereas no accommodation-related keywords were derived for Gubul Road. Second, in the centrality analysis, degree centrality was high for the keywords "walking," "sea," "look," and "salt flats" on Masil Road and "walking," "lake," and "park" on Gubul Road. The keywords located at the center are "walking" and "sea" for Masil Road and "walking" for Gubul Road. The influential keyword is "experience" for Masil Road and "history" for Gubul Road. Third, the top keywords for Masil Road are related to courses 1 through 8, which suggests that visitors use the trails of courses 1 through 8 evenly. For Gubul Road, however, only a few courses appear among the top keywords, which suggests that three courses, Gubul Road 6, 6-1, and 8, are visited intensively as the main courses.
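Two of the centrality measures mentioned above can be sketched on a small keyword network. This is a minimal sketch, assuming an undirected, unweighted, connected graph stored as adjacency sets; the edges below are hypothetical and are not the study's network, and betweenness centrality is omitted for brevity.

```python
from collections import deque
from typing import Dict, Set


def degree_centrality(graph: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(graph)
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def closeness_centrality(graph: Dict[str, Set[str]]) -> Dict[str, float]:
    """(Number of reachable nodes) / (sum of shortest-path distances), using BFS."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


# Hypothetical co-occurrence network of experience keywords (illustrative edges only).
network = {
    "walking": {"sea", "look", "salt flats"},
    "sea": {"walking", "look"},
    "look": {"walking", "sea"},
    "salt flats": {"walking"},
}
degree = degree_centrality(network)
closeness = closeness_centrality(network)
```

In this toy network "walking" is linked to every other keyword, so it scores highest on both measures, while "salt flats" reaches the other keywords only through "walking".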

SNS를 이용한 잠재적 광고 키워드 추출 시스템 설계 및 구현 (Design and Implementation of Potential Advertisement Keyword Extraction System Using SNS)

  • 서현곤;박희완
    • 한국융합학회논문지 / Vol. 9, No. 7 / pp. 17-24 / 2018
  • 빅데이터 처리 분야에서 중요한 이슈 중 하나는 인터넷의 주요 키워드를 추출하고 이것을 이용하여 필요한 정보를 가공하는 것이다. 현재까지 제안된 대부분의 키워드 추출 방법들은 대형 포털 사이트의 검색기능을 기반으로 이미 게시된 글이나 작성된 문서 또는 고정된 내용에 기반하고 있다. 본 논문에서는 SNS에 게시되는 다양한 이슈, 대화, 관심 분야, 의견 등 동적인 메시지를 기반으로 이슈 키워드 및 연관 키워드를 추출하여 잠재적 쇼핑 연관 키워드 광고 마케팅에 도움을 주는 시스템(KAES: Keyword Advertisement Extraction System based on SNS)을 개발한다. KAES 시스템은 특정 계정 리스트를 작성하여 SNS에서 빈도수가 가장 많은 핵심 키워드 및 연관 키워드를 추출한다.