• Title/Summary/Keyword: document classification

Application of Advertisement Filtering Model and Method for its Performance Improvement (광고 글 필터링 모델 적용 및 성능 향상 방안)

  • Park, Raegeun;Yun, Hyeok-Jin;Shin, Ui-Cheol;Ahn, Young-Jin;Jeong, Seungdo
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.11 / pp.1-8 / 2020
  • The exponential growth of internet data in recent years has driven progress in fields such as deep learning, but it has also brought side effects in the form of commercial advertisements such as viral marketing. These advertisements not only undermine the internet's core purpose of sharing high-quality information, but also increase the time users spend searching for it. In this study, we define an advertisement as "a text that obscures the essence of information transmission" and propose a model that filters information according to that definition. The proposed model consists of an advertisement filtering component and a component for improving filtering performance, and it is designed to improve continuously. We collected data for advertisement filtering and trained a document classifier using KorBERT. Experiments were conducted to verify the model's performance. On data combining five topics, accuracy and precision were 89.2% and 84.3%, respectively. This indicates high performance, even considering the atypical characteristics of advertisements. The approach is expected to reduce wasted time and fatigue in information searches, because the model effectively delivers high-quality information to users by identifying and filtering out advertisement paragraphs.
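
The accuracy and precision figures reported above are standard binary classification metrics. As a rough illustration of how they are computed, the sketch below treats "advertisement" as the positive class. The function name and the toy labels are hypothetical and are not taken from the study's data or code.

```python
# Illustrative sketch only: the labels below are made up, not the study's data.
def accuracy_and_precision(y_true, y_pred, positive=1):
    """Return (accuracy, precision), treating `positive` as the advertisement class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # True positives: advertisements correctly flagged as advertisements.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: ordinary text wrongly flagged as an advertisement.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision


y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # 1 = advertisement, 0 = ordinary text
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
acc, prec = accuracy_and_precision(y_true, y_pred)
print(f"accuracy={acc:.3f} precision={prec:.3f}")
```

Precision matters for a filter of this kind because a false positive removes legitimate text from what the user sees.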

A Study on the Progression and Prevalence of Myopia according to Age for the Last Five Years : from 2008 to 2012 (최근 5년간 연령에 따른 근시 유병률 진행에 관한 연구 : 2008년에서 2012년 중심으로)

  • Lee, Wan-Seok;Ye, Ki-Hun;Shin, Bum-Joo
    • Journal of Korean Ophthalmic Optics Society / v.19 no.1 / pp.121-133 / 2014
  • Purpose: In this study, we analyzed the progression and prevalence of myopia according to age over the last five years. Methods: We conducted a comparative analysis of the progression and prevalence of myopia using Korean National Health and Nutrition Examination Survey documents from 2008 to 2012. Results: Classifying myopia by age group over the last five years, the prevalence of low myopia was 25.5% for ages 5-11, 25.1% for ages 12-18, 27.3% for ages 19-29, 30.7% for ages 30-39, 29.6% for ages 40-49, 19.2% for ages 50-59, 11.8% for ages 60-69, and 20.2% for ages over 70. The prevalence of moderate myopia was 21.7% for ages 5-11, 43.6% for ages 12-18, 36.2% for ages 19-29, 30.0% for ages 30-39, 20.4% for ages 40-49, 9.9% for ages 50-59, 5.2% for ages 60-69, and 7.6% for ages over 70. The prevalence of high myopia was 2.1% for ages 5-11, 11.7% for ages 12-18, 11.5% for ages 19-29, 6.9% for ages 30-39, 5.6% for ages 40-49, 1.9% for ages 50-59, 1.5% for ages 60-69, and 1.0% for ages over 70. Conclusions: The increasing progression and prevalence of myopia must be recognized as important, and greater social attention is needed to the prevention of vision deterioration and to eye health welfare.

Impact of Word Embedding Methods on Performance of Sentiment Analysis with Machine Learning Techniques

  • Park, Hoyeon;Kim, Kyoung-jae
    • Journal of the Korea Society of Computer and Information / v.25 no.8 / pp.181-188 / 2020
  • In this study, we conduct a comparative study to confirm the impact of various word embedding techniques on the performance of sentiment analysis. Sentiment analysis is an opinion mining technique that uses natural language processing to identify and extract subjective information from text, and it can be used to classify the sentiment of product reviews or comments. Since sentiment can be classified as either positive or negative, sentiment analysis can be treated as a general classification problem. For sentiment analysis, text must be converted into a form that a computer can process. In natural language processing, this conversion of text such as a word or document into a vector is called word embedding. Various techniques, such as Bag of Words, TF-IDF, and Word2Vec, are used as word embedding techniques. Until now, there have not been many studies on which word embedding techniques are suitable for sentiment analysis. In this study, Bag of Words, TF-IDF, and Word2Vec are used to compare and analyze the performance of movie review sentiment analysis. The data set for this study is the IMDB data set, which is widely used in text mining. The results show that TF-IDF and Bag of Words outperformed Word2Vec, and that TF-IDF performed better than Bag of Words, although the difference was not very significant.
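
To make the Bag of Words and TF-IDF representations mentioned above concrete, here is a minimal pure-Python sketch. It assumes whitespace tokenization and the unsmoothed weighting idf = log(N / df). The abstract does not state the paper's actual preprocessing or weighting variant, and the toy reviews are invented.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for lowercased, whitespace-split documents."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Weight raw term counts by idf = log(N / df); smoothed variants also exist."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents in which each vocabulary word occurs.
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    return vocab, [[c * w for c, w in zip(vec, idf)] for vec in counts]


reviews = ["great movie great cast", "boring movie", "great fun"]  # toy data
vocab, bow = bag_of_words(reviews)
_, weighted = tf_idf(reviews)
print(vocab)
print(bow[0])
print(weighted[0])
```

Either set of vectors can then be fed to a conventional classifier. Word2Vec differs in that it learns dense vectors from word co-occurrence rather than counting terms.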

Categorization of POIs Using Word and Context information (관심 지점 명칭의 단어와 문맥 정보를 활용한 관심 지점의 분류)

  • Choi, Su Jeong;Park, Seong-Bae
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.5 / pp.470-476 / 2014
  • A point of interest (POI) is a specific point location such as a cafe, a gallery, a shop, or a park. It consists of a name, a category, a location, and so on. This information is necessary for location-based applications, and the category is the most basic part of it. However, category information should be gathered automatically because gathering it manually is costly. In this paper, we propose a novel method to estimate the category of POIs automatically using inner words and local context. An inner word is a word contained in a POI's name. A POI's name sometimes exposes category information, so the name is used as inner word information in estimating the category. Local context information means the words around a POI's name in a document that mentions the name. This context includes information useful for estimating the category. The proposed method is evaluated on two data sets. According to the experimental results, the model combining inner words and local context shows higher accuracy than models using either one alone.
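
The two feature sources described above can be sketched as follows. This is an assumption-laden illustration, not the authors' implementation: the function names, the whitespace tokenization, and the window size of two tokens are all hypothetical choices.

```python
def inner_words(poi_name):
    """Words that make up the POI's name (whitespace tokenization is an assumption)."""
    return poi_name.lower().split()


def local_context(document, poi_name, window=2):
    """Words within `window` tokens on either side of each mention of the POI name."""
    tokens = document.lower().split()
    name = poi_name.lower().split()
    n = len(name)
    if n == 0:
        return []
    context = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == name:
            context.extend(tokens[max(0, i - window):i])   # words before the mention
            context.extend(tokens[i + n:i + n + window])   # words after the mention
    return context


def poi_features(document, poi_name, window=2):
    """Combine both feature sources, as the abstract's best-performing model does."""
    return {
        "inner": inner_words(poi_name),
        "context": local_context(document, poi_name, window),
    }


doc = "we had coffee at blue bottle cafe near the park"  # toy sentence
features = poi_features(doc, "Blue Bottle Cafe")
print(features)
```

Here the inner word "cafe" already hints at the category, while context words such as "coffee" supply evidence when the name alone is uninformative.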

The Study of Information Strategy Plan to Design OASIS' Future Model (오아시스(전통의학정보포털)의 미래모형 설계를 위한 정보화전략계획 연구)

  • Yea, Sang-Jun;Kim, Chul;Kim, Jin-Hyun;Kim, Sang-Kyun;Jang, Hyun-Chul;Kim, Ik-Tae;Jang, Yun-Ji;Seong, Bo-Seok;Song, Mi-Young
    • Korean Journal of Oriental Medicine / v.17 no.2 / pp.63-71 / 2011
  • Objectives: We studied an ISP (information strategy plan) for OASIS spanning 5 years. Through this study, we aimed to produce a comprehensive road map for upgrading the service systematically and carrying out the related projects. If the road map is followed, OASIS will become a core infrastructure service contributing to the improvement of TKM (traditional Korean medicine) research capability. Methods: We carried out a three-step ISP method composed of environmental analysis, current status analysis, and future planning. We used papers, reports, and trend analysis documents as base materials and conducted a survey to gather opinions from users and TKM experts. We limited this study to drawing up the conceptual design of OASIS. Results: From the environmental analysis, we found that China and the USA had built the largest TM databases. We conducted the survey to identify ways to activate OASIS, and we benchmarked advanced services through the current status analysis. Finally, we determined 'maximize the research value based on the open TKM knowledge infrastructure' as the vision of OASIS, and we designed the future system of OASIS, which is composed of a service layer, an application layer, and a contents layer. Conclusion: First, TKM-related documents, research materials, researcher information, and standards are merged to raise the level of TKM information. Concretely, large-scale TKM information infrastructure projects such as TKM information classification code development, TKM library network building, and CAM research information provision are carried out at the same time.

A Study On Managing Electronic Mail Messages as Records of Public Institutions (공공기관의 이메일기록 관리 방안 연구)

  • Song, Ji Hyoun
    • The Korean Journal of Archival Studies / no.15 / pp.141-183 / 2007
  • It is no overstatement to say that email is now used more frequently and more conveniently than telephones and facsimiles, not only in daily life but also in business transactions. It is also evident that email will be used more and more as a means of communication within and between organizations. If the information sent and received via email serves as business records, then emails should be managed uniformly as public records. Currently, however, no specific policies or guidelines for the management of email records are available, and most public employees do not realize that emails are actual records of the organization. Three research methods were used in this study to establish an email records management scheme. First, bibliographic research was conducted to describe the definition and types of email records indicated in the guidelines of each nation, as well as how they differ from transitory email messages. Second, the email management guidelines and policies of public institutions in England, the United States, Australia, and Canada, which are regarded as advanced countries in records management, were analyzed to examine advanced examples of email management. To manage email records effectively, the functional requirements (capture, classification, storage, access, tracking, disposition, and role and responsibility) were categorized in this thesis based on ISO 15489. Because the designs of these foreign guidelines differ from one another, their common factors were extracted and placed within these seven stages. Lastly, this thesis analyzed the characteristics of the email system within the Electronic Document Management System of existing administrative institutions. It also examined the overall environment of email records management in public institutions and sought ways to improve it.
In essence, focusing on the crucial factors of email management drawn from the email management guidelines of foreign nations and the analysis of their policies, this thesis proposes an email records management scheme for Korean public institutions, as well as an email management model suitable for the forthcoming e-government era.

A Method for Prediction of Quality Defects in Manufacturing Using Natural Language Processing and Machine Learning (자연어 처리 및 기계학습을 활용한 제조업 현장의 품질 불량 예측 방법론)

  • Roh, Jeong-Min;Kim, Yongsung
    • Journal of Platform Technology / v.9 no.3 / pp.52-62 / 2021
  • Quality control is critical at manufacturing sites and is key to predicting the risk of quality defects before manufacturing. However, the reliability of manual quality control methods is affected by human and physical limitations because manufacturing processes vary across industries. These limitations become particularly obvious in domains with numerous manufacturing processes, such as the manufacture of major nuclear equipment. This study proposed a novel method for predicting the risk of quality defects by using natural language processing and machine learning. The study used production data collected over 6 years at a factory that manufactures main equipment installed in nuclear power plants. In the preprocessing stage of the text data, a mapping method was applied to the word dictionary so that domain knowledge could be appropriately reflected, and a hybrid algorithm combining n-gram, Term Frequency-Inverse Document Frequency, and Singular Value Decomposition was constructed for sentence vectorization. Next, in the experiment to classify the risky processes resulting in poor quality, k-fold cross-validation was applied to categorize cases from Unigram to cumulative Trigram. Furthermore, to achieve objective experimental results, Naive Bayes and Support Vector Machine were used as classification algorithms, and a maximum accuracy and F1-score of 0.7685 and 0.8641, respectively, were achieved. Thus, the proposed method is effective. The performance of the proposed method was compared with the votes of field engineers, and the results revealed that the proposed method outperformed the field engineers. Thus, the method can be implemented for quality control at manufacturing sites.
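
As a rough illustration of the n-gram step in the vectorization described above, the sketch below builds cumulative n-gram features. It assumes that "cumulative Trigram" means unigrams, bigrams, and trigrams used together. That reading, the function names, and the sample sentence are assumptions rather than details confirmed by the abstract.

```python
def ngrams(tokens, n):
    """All contiguous n-token sequences, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def cumulative_ngrams(text, max_n=3):
    """Features for every order from 1 up to max_n (assumed meaning of 'cumulative')."""
    tokens = text.lower().split()
    features = []
    for n in range(1, max_n + 1):
        features.extend(ngrams(tokens, n))
    return features


# Hypothetical work-record sentence, not taken from the study's data.
feats = cumulative_ngrams("weld seam inspection failed", max_n=3)
print(feats)
```

In the pipeline the abstract describes, features like these would be weighted by TF-IDF and reduced with Singular Value Decomposition before classification.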

Analysis of the Status of Natural Language Processing Technology Based on Deep Learning (딥러닝 중심의 자연어 처리 기술 현황 분석)

  • Park, Sang-Un
    • The Journal of Bigdata / v.6 no.1 / pp.63-81 / 2021
  • The performance of natural language processing is improving rapidly due to the recent development and application of machine learning and deep learning technologies, and as a result, its field of application is expanding. In particular, as the demand for analysis of unstructured text data increases, interest in NLP (Natural Language Processing) is also increasing. However, due to the complexity and difficulty of natural language preprocessing and of machine learning and deep learning theories, there are still high barriers to the use of natural language processing. In this paper, to support an overall understanding of NLP, we examine the main fields of NLP that are currently being actively researched and the current state of the major technologies centered on machine learning and deep learning, with the aim of providing a foundation for understanding and utilizing NLP more easily. To that end, we investigated how NLP has changed within AI (artificial intelligence) through changes in the taxonomy of AI technology. The main areas of NLP, which consist of language models, text classification, text generation, document summarization, question answering, and machine translation, were explained with state-of-the-art deep learning models. In addition, the major deep learning models utilized in NLP were explained, and data sets and evaluation measures for performance evaluation were summarized. We hope that researchers who want to utilize NLP for various purposes in their fields will be able to understand the overall technical status and the main technologies of NLP through this paper.

Methodological Status and Improvement of Additional Evaluation of Health Impact Items in Environmental Impact Assessment (환경영향평가서 내 건강영향 항목 추가·평가의 방법론적 현황과 개선)

  • Ha, Jongsik
    • Journal of Environmental Impact Assessment / v.29 no.6 / pp.453-466 / 2020
  • In Environmental Impact Assessment documents, the addition and evaluation of health impact items is written up under the hygiene and public health items, only for specific development projects, and is then reviewed. However, since the publication of the evaluation manual on the addition and evaluation of health impact items in 2011, there has been continuing demand for methodology and improvement plans, despite partial improvements. Therefore, in order to propose a methodological improvement of the evaluation manual, this technical paper identified detailed improvement requirements based on the consultation opinions on hygiene and public health items, and investigated and suggested ways to address them by reviewing the research conducted so far. As for the improvement requirements, the contents related to mitigation plan, post management, effect prediction, assessment, and present-condition investigation were presented in Environmental Impact Assessment documents for the entire development project at a frequency of 93%, 85%, 80%, 74%, and 67%, respectively. In particular, the detailed improvement requirements related to the mitigation plan consisted of an establishment direction and the management of the development project. Considering the current evaluation manual and the frequency of the improvement requirements, this paper proposed concrete methods or improvement plans for the major methodologies in each classification of hygiene and public health items. Furthermore, it proposed a comprehensive evaluation methodology related to whether a project is implemented, which is not provided in the current assessment manual.

A Study on the Design of the Appraisal System of Permanent Archival Institutions : Focused on the Seoul Metropolitan Archives (영구기록물관리기관의 재평가체계 설계 연구 서울기록원을 중심으로)

  • Lee, Eunjung;Kim, Dabeen;Kim, Sunyou;Kim, Heejin;Ryu, Hanjo
    • The Korean Journal of Archival Studies / no.76 / pp.5-37 / 2023
  • This study aimed to design an evaluation system applicable to permanent records management institutions, focusing on the Seoul Metropolitan Archives, in order to implement reevaluation at such institutions. To this end, areas for evaluating evidential, administrative, and historical values were established, and detailed evaluation factors were derived. To apply these evaluation factors effectively, the evaluation procedure was designed in three stages. In the first stage, law-based evaluation, long-term preservation was determined by identifying the position of policymakers and the legal form of records, which can be evaluated immediately according to clear standards. For records whose long-term preservation was not determined in the first stage, the second stage, business function-based evaluation, reorganized record management standards, official document classification tables, pledges, and policies into evaluation factors and applied them comprehensively to review the validity of long-term preservation of the held records. Records not judged suitable for long-term preservation in the second stage were then judged in the third stage, subject-based evaluation, by applying historical events, cultural assets, and collection policies. The designed evaluation system is significant in that it minimizes arbitrariness in evaluation and increases the efficiency of evaluation, and it was confirmed that the system makes it possible to evaluate records comprehensively, reflecting their various contexts and values. In addition, a reevaluation system suitable for permanent records management institutions was established by combining macro-evaluation and micro-evaluation in a balanced way.