• Title/Summary/Keyword: selective bigram (선택적 바이그램)

Search Result 9, Processing Time 0.023 seconds

Feature Extraction to Detect Hoax Articles (낚시성 인터넷 신문기사 검출을 위한 특징 추출)

  • Heo, Seong-Wan;Sohn, Kyung-Ah
    • Journal of KIISE / v.43 no.11 / pp.1210-1215 / 2016
  • Readership of online newspapers has grown with the proliferation of smart devices. However, fierce competition between Internet newspaper companies has resulted in a large increase in the number of hoax articles. Hoax articles are those whose title does not convey the content of the main story, which misleads readers about the contents. We note that hoax articles have certain characteristics, such as unnecessary celebrity quotations, a mismatch between title and content, or incomplete sentences. Based on these, we extract and validate features to identify hoax articles. We build a large-scale training dataset by analyzing text keywords in replies to articles and extract five effective features. We evaluate the performance of a support vector machine classifier on the extracted features and observe 92% accuracy on our validation set. In addition, we present a selective bigram model to measure the consistency between the title and content, which can be used effectively to analyze short texts in general.

Dual SMS SPAM Filtering: A Graph-based Feature Weighting Method (듀얼 SMS 스팸 필터링: 그래프 기반 자질 가중치 기법)

  • Hwang, Jae-Won;Ko, Young-Joong
    • Annual Conference on Human and Language Technology / 2014.10a / pp.95-99 / 2014
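  • This paper proposes a dual SMS spam filtering technique for SMS spam, which has recently increased rapidly and become a social issue. Filtering based on patterns and spam-word dictionaries is unsuitable for SMS messages that keep increasing and changing into new variants, because it requires a great deal of manual work. An automated system using machine learning is therefore needed, and for effective machine learning the methods for feature selection and feature weighting are important. However, because SMS messages are short, few features appear in them, which makes classification difficult. To address this problem, we expand the features through sliding-window-based N-gram expansion and build graphs from the expanded features to represent shallow structural characteristics. HAM and SPAM graphs are constructed separately, with the N-gram features that appear in the training data as vertices and the occurrence frequencies of the features as edge weights. Based on these graphs, the final feature weights are determined using node importance and edge weights. When an input message arrives, two feature vectors are generated for it, one from the spam graph and one from the ham graph. The generated feature vectors are passed to a Support Vector Machine (SVM) to obtain an SVM probability score for each, and these scores determine whether the message is spam. In three experimental settings, the F1-score improved over a baseline system using bigram features and binary weights by about 2.7% at most and 0.5% at least, for an average improvement of about 1.35%.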
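The sliding-window N-gram expansion mentioned in this abstract can be illustrated with a minimal Python sketch. The abstract does not give the window size, the tokenization, or the exact graph construction, so the function names (`window_bigrams`, `class_frequencies`), the default `window=3`, and the toy messages are all assumptions made for illustration, not the authors' implementation. The sketch only shows how a window wider than two tokens yields more bigram features from a short message, and how feature frequencies can be counted separately for ham and spam.

```python
from collections import Counter


def window_bigrams(tokens, window=3):
    """Pair each token with the tokens that follow it inside the window.

    With window=2 this yields ordinary adjacent bigrams; a larger window
    adds skip-style pairs, which gives short messages more features.
    """
    pairs = []
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + window]:
            pairs.append((left, right))
    return pairs


def class_frequencies(messages, window=3):
    """Count how often each expanded bigram occurs in one class of messages."""
    counts = Counter()
    for tokens in messages:
        counts.update(window_bigrams(tokens, window))
    return counts


# Hypothetical toy data; real SMS data would need proper tokenization.
ham = [["see", "you", "at", "lunch"]]
spam = [["free", "prize", "call", "now"], ["free", "call", "now"]]

ham_counts = class_frequencies(ham)
spam_counts = class_frequencies(spam)
```

Keeping one frequency table per class mirrors the separate HAM and SPAM graphs described in the abstract; how those frequencies become edge weights and final feature weights is not specified there and is left out of the sketch.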


A Study of Building Digital Capacity of Museum Professionals through the Use of Virtual Museum (가상박물관 활용을 통한 박물관 전문인력의 디지털 역량 강화 방안 연구)

  • Kim, Seon-Mi;Lee, Jong-Wook
    • The Journal of the Korea Contents Association / v.22 no.10 / pp.39-46 / 2022
  • The overall digital transformation of society is progressing rapidly with the coronavirus epidemic. In the field of cultural heritage and museums in particular, digital transformation is taking place throughout the preservation, management, and utilization of cultural heritage. In response, it is increasingly important to cultivate the digital literacy that museum professionals need to select and utilize digital cultural heritage information. However, current digital capacity education for museum professionals falls short of cultivating digital literacy because its theoretical and practical instruction is one-way. To overcome this, we propose a digital capacity building program using virtual museums. We propose a curriculum based on participatory museum, cooperative learning, and project-based learning theories. Learners experience the entire process of acquiring, selecting, and utilizing digital cultural heritage information through individual, cooperative, constant, exhibition, and project-based learning programs. The program was evaluated by experts in education, museum education, and ICT education to assess its usability and derive improvements. This study will contribute to building the digital capacity of museum professionals.

Effect of Motive for Major Selection on Major Satisfaction, Campus-life Satisfaction, and Self-directed Learning Ability among Nursing Students (간호대학생의 전공선택동기가 전공만족도, 대학생활만족도 및 자기주도 학습능력에 미치는 영향)

  • Kim, Yu-Jeong;Yoo, Hana;Park, Mijeong
    • Journal of the Korea Academia-Industrial cooperation Society / v.17 no.10 / pp.261-270 / 2016
  • This study surveyed nursing students' motives for choosing nursing as their major and examined how such motives affect their satisfaction with their major, campus-life satisfaction, and self-directed learning ability. This study was conducted as a descriptive survey. Data were collected using a self-report questionnaire from the 1st to the 15th of April, 2015, and the questionnaires from 195 nursing students were analyzed using Fisher's exact test, t-test, one-way ANOVA, Mann-Whitney test, and ANCOVA. Only 41.5% of the nursing students chose nursing as their major because of their aptitude and interest. The motive for selecting the major was found to have a significant effect on satisfaction with the major (p<.001), campus-life satisfaction (p=.008), and self-directed learning ability (p=.001). Middle and high school students should be provided with various types of information on nursing, so that they have the opportunity to choose nursing based on their aptitude and interest before entering university. Once they start university, nursing students' adjustment to campus life and learning ability should be enhanced through various extracurricular activity programs in order to stimulate their interest in the major.

A Comparative Study on Optimal Feature Identification and Combination for Korean Dialogue Act Classification (한국어 화행 분류를 위한 최적의 자질 인식 및 조합의 비교 연구)

  • Kim, Min-Jeong;Park, Jae-Hyun;Kim, Sang-Bum;Rim, Hae-Chang;Lee, Do-Gil
    • Journal of KIISE:Software and Applications / v.35 no.11 / pp.681-691 / 2008
  • In this paper, we have evaluated and compared each feature and feature combinations necessary for statistical Korean dialogue act classification. We have implemented a Korean dialogue act classification system by using the Support Vector Machine method. The experimental results show that the POS bigram does not work well and the morpheme-POS pair and other features can be complementary to each other. In addition, a small number of features, which are selected by a feature selection technique such as chi-square, are enough to show steady performance of dialogue act classification. We also found that the last eojeol plays an important role in classifying an entire sentence, and that Korean characteristics such as free order and frequent subject ellipsis can affect the performance of dialogue act classification.

A Study on the Differences in Hotel Choice Factors according to the Payment Level of Accommodation Charge (숙박비 지불수준에 따른 호텔선택요인 차이연구)

  • Nam, Taeg-Yeong
    • The Journal of the Convergence on Culture Technology / v.6 no.3 / pp.33-43 / 2020
  • The purpose of this study is to investigate how much hotel customers pay for accommodation and to analyze the differences in hotel choice factors according to the level of payment (low, medium, and high prices), in order to present marketing measures for attracting customers by hotel price range. To this end, a survey was conducted on hotel customers from February 1, 2020 to April 30, 2020. A total of 350 questionnaires were distributed; 45 inappropriate copies were eliminated, and 305 questionnaires were finally used for analysis. According to the analysis, among the basic factors, the biggest differences between groups were in hotel size, breakfast menu, restaurants, and auxiliary facilities. There were also differences between groups in the amenity section of the room factor and in outside tourism programs in the incidental factor. The main factors were analyzed as the most important factor, although there were no differences between groups. Based on this, the marketing plan is proposed as follows. Low-cost hotels are targeted at women in their 20s with high school diplomas, and low-cost price policies and promotions are recommended. Mid-priced hotels are targeted at men in their 40s with college degrees, and they should strive to operate shuttle buses, promote room prices, and educate employees. In the case of high-priced hotels, it was analyzed that overall service reinforcement, employee education, and viral marketing are important, targeting high school graduates in their 20s.

VRIFA: A Prediction and Nonlinear SVM Visualization Tool using LRBF kernel and Nomogram (VRIFA: LRBF 커널과 Nomogram을 이용한 예측 및 비선형 SVM 시각화도구)

  • Kim, Sung-Chul;Yu, Hwan-Jo
    • Journal of Korea Multimedia Society / v.13 no.5 / pp.722-729 / 2010
  • Prediction problems are common in medical domains. For example, computer-aided diagnosis or prognosis is a key component of a CDSS (Clinical Decision Support System). SVMs with nonlinear kernels, such as RBF kernels, have shown superior accuracy in prediction problems. However, physicians do not favor them for medical prediction problems because nonlinear SVMs are difficult to visualize, which makes it hard to give physicians an intuitive interpretation of prediction results. The nomogram was proposed to visualize SVM classification models, but it cannot visualize nonlinear SVM models. The Localized Radial Basis Function (LRBF) kernel was proposed; it shows accuracy comparable to the RBF kernel while being easier to interpret, since it can be linearly decomposed. This paper presents a new tool named VRIFA, which integrates the nomogram and the LRBF kernel to provide users with an interactive visualization of nonlinear SVM models. VRIFA visualizes the internal structure of nonlinear SVM models, showing the effect of each feature, the magnitude of the effect, and the change in the prediction output. VRIFA also performs nomogram-based feature selection while training a model in order to remove noisy or redundant features and improve prediction accuracy. The area under the ROC curve (AUC) can be used to evaluate the prediction result when the data set is highly imbalanced. The tool can be used by biomedical researchers for computer-aided diagnosis and risk factor analysis for diseases.

Distribution of HCV Genotypes in Chronic Korean HCV Patients

  • Lee, Kyung-Ok;Jeong, Su-Jin;Byun, Ji-Young;Shim, Ae-Sug;Seong, Hye-Soon;Kim, Kyung-Tae
    • Korean Journal of Clinical Laboratory Science / v.39 no.1 / pp.49-55 / 2007
  • HCV is a single-stranded RNA virus and more than 1 million new cases are reported annually worldwide. The six major HCV genotypes and numerous subtypes vary in their geographic distribution. It is thought that genetic heterogeneity of HCV may account for some of the differences in disease outcome and response to treatment observed in HCV infected persons. In this study, we determined HCV genotypes among chronic Korean HCV patients and evaluated direct sequence PCR protocols developed. For the study, 232 chronic HCV patient sera were used. HCV RNA was extracted and two pairs of consensus PCR primers were selected in 5'UTR region for amplification of HCV RNA. Amplification products obtained from the HCV positive cases were subjected to automatic sequencing. Sequences were compared with those in GenBank by using the BLAST program. From this study, five HCV genotypes, 1b, 2a, 2b, 2c and 3a were found. HCV genotypes 4, 5 and 6 were not determined. HCV genotype 1b (53.9%, 125/232) and 2a (35.8%, 83/232) were most frequently found. This group was followed by 2b (3.9%, 9/232), 3a (3.4%, 8/232) and 2c (3.0%, 7/232). The data presented here suggest a complex distribution of HCV types and they were well correlated with other reports on Koreans and will be helpful for type-specific follow-up of Korean HCV patients. This study showed that 5'UTR direct sequence analysis is a sensitive and rapid method to identify HCV genotypes.
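The genotype percentages reported in this abstract can be rechecked from the stated counts. The short Python sketch below uses only the numbers given in the abstract (232 sera and the five per-genotype counts); the variable names are illustrative.

```python
# Genotype counts as reported in the abstract (out of 232 patient sera).
counts = {"1b": 125, "2a": 83, "2b": 9, "3a": 8, "2c": 7}

total = sum(counts.values())

# Percentage share of each genotype, rounded to one decimal place.
shares = {genotype: round(100 * n / total, 1) for genotype, n in counts.items()}

for genotype, share in shares.items():
    print(f"{genotype}: {share}% ({counts[genotype]}/{total})")
```

The five counts sum to 232, so every serum in the study is accounted for by one of the five reported genotypes, and the recomputed shares match the percentages quoted in the abstract.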


Parotid Gland Tumors (이하선종양에 대한 임상적고찰)

  • 박혁동;심윤상;오경균;이용식
    • Proceedings of the KOR-BRONCHOESO Conference / 1993.05a / pp.97-97 / 1993
  • Primary tumors arise infrequently in the parotid gland, and generally only about 20 to 40 percent prove to be malignant. They are characterized by histopathologic diversity, slow tumor growth, and a significant proportion of patients who have received previous treatment elsewhere. We retrospectively reviewed 101 cases of parotid gland tumors treated over eight years (1985-1992). Non-neoplastic tumor-like lesions were all excluded.
