• 제목/요약/키워드: D-LDA

검색결과 66건 처리시간 0.024초

A Proposal of a Keyword Extraction System for Detecting Social Issues (사회문제 해결형 기술수요 발굴을 위한 키워드 추출 시스템 제안)

  • Jeong, Dami;Kim, Jaeseok;Kim, Gi-Nam;Heo, Jong-Uk;On, Byung-Won;Kang, Mijung
    • Journal of Intelligence and Information Systems
    • /
    • 제19권3호
    • /
    • pp.1-23
    • /
    • 2013
  • To discover significant social issues such as unemployment, economy crisis, social welfare etc. that are urgent issues to be solved in a modern society, in the existing approach, researchers usually collect opinions from professional experts and scholars through either online or offline surveys. However, such a method does not seem to be effective from time to time. As usual, due to the problem of expense, a large number of survey replies are seldom gathered. In some cases, it is also hard to find out professional persons dealing with specific social issues. Thus, the sample set is often small and may have some bias. Furthermore, regarding a social issue, several experts may make totally different conclusions because each expert has his subjective point of view and different background. In this case, it is considerably hard to figure out what current social issues are and which social issues are really important. To surmount the shortcomings of the current approach, in this paper, we develop a prototype system that semi-automatically detects social issue keywords representing social issues and problems from about 1.3 million news articles issued by about 10 major domestic presses in Korea from June 2009 until July 2012. Our proposed system consists of (1) collecting and extracting texts from the collected news articles, (2) identifying only news articles related to social issues, (3) analyzing the lexical items of Korean sentences, (4) finding a set of topics regarding social keywords over time based on probabilistic topic modeling, (5) matching relevant paragraphs to a given topic, and (6) visualizing social keywords for easy understanding. In particular, we propose a novel matching algorithm relying on generative models. The goal of our proposed matching algorithm is to best match paragraphs to each topic. Technically, using a topic model such as Latent Dirichlet Allocation (LDA), we can obtain a set of topics, each of which has relevant terms and their probability values. In our problem, given a set of text documents (e.g., news articles), LDA shows a set of topic clusters, and then each topic cluster is labeled by human annotators, where each topic label stands for a social keyword. For example, suppose there is a topic (e.g., Topic1 = {(unemployment, 0.4), (layoff, 0.3), (business, 0.3)}) and then a human annotator labels "Unemployment Problem" on Topic1. In this example, it is non-trivial to understand what happened to the unemployment problem in our society. In other words, taking a look at only social keywords, we have no idea of the detailed events occurring in our society. To tackle this matter, we develop the matching algorithm that computes the probability value of a paragraph given a topic, relying on (i) topic terms and (ii) their probability values. For instance, given a set of text documents, we segment each text document to paragraphs. In the meantime, using LDA, we can extract a set of topics from the text documents. Based on our matching process, each paragraph is assigned to a topic, indicating that the paragraph best matches the topic. Finally, each topic has several best matched paragraphs. Furthermore, assuming there are a topic (e.g., Unemployment Problem) and the best matched paragraph (e.g., Up to 300 workers lost their jobs in XXX company at Seoul). In this case, we can grasp the detailed information of the social keyword such as "300 workers", "unemployment", "XXX company", and "Seoul". In addition, our system visualizes social keywords over time. Therefore, through our matching process and keyword visualization, most researchers will be able to detect social issues easily and quickly. Through this prototype system, we have detected various social issues appearing in our society and also showed effectiveness of our proposed methods according to our experimental results. Note that you can also use our proof-of-concept system in http://dslab.snu.ac.kr/demo.html.

Wavelet based Fuzzy Integral System for 3D Face Recognition (퍼지적분을 이용한 웨이블릿 기반의 3차원 얼굴 인식)

  • Lee, Yeung-Hak;Shim, Jae-Chang
    • Journal of KIISE:Software and Applications
    • /
    • 제35권10호
    • /
    • pp.616-626
    • /
    • 2008
  • The face shape extracted by the depth values has different appearance as the most important facial feature information and the face images decomposed into frequency subband are signified personal features in detail. In this paper, we develop a method for recognizing the range face images by combining the multiple frequency domains for each depth image and depth fusion using fuzzy integral. For the proposed approach, the first step tries to find the nose tip that has a protrusion shape on the face from the extracted face area. It is used as the reference point to normalize for orientated facial pose and extract multiple areas by the depth threshold values. In the second step, we adopt as features for the authentication problem the wavelet coefficient extracted from some wavelet subband to use feature information. The third step of approach concerns the application of eigenface and Linear Discriminant Analysis (LDA) method to reduce the dimension and classify. In the last step, the aggregation of the individual classifiers using the fuzzy integral is explained for extracted coefficient at each resolution level. In the experimental results, using the depth threshold value 60 (DT60) show the highest recognition rate among the regions, and the depth fusion method achieves 98.6% recognition rate, incase of fuzzy integral.

3D Face Recognition using Wavelet Transform Based on Fuzzy Clustering Algorithm (펴지 군집화 알고리즘 기반의 웨이블릿 변환을 이용한 3차원 얼굴 인식)

  • Lee, Yeung-Hak
    • Journal of Korea Multimedia Society
    • /
    • 제11권11호
    • /
    • pp.1501-1514
    • /
    • 2008
  • The face shape extracted by the depth values has different appearance as the most important facial information. The face images decomposed into frequency subband are signified personal features in detail. In this paper, we develop a method for recognizing the range face images by multiple frequency domains for each depth image using the modified fuzzy c-mean algorithm. For the proposed approach, the first step tries to find the nose tip that has a protrusion shape on the face from the extracted face area. And the second step takes into consideration of the orientated frontal posture to normalize. Multiple contour line areas which have a different shape for each person are extracted by the depth threshold values from the reference point, nose tip. And then, the frequency component extracted from the wavelet subband can be adopted as feature information for the authentication problems. The third step of approach concerns the application of eigenface to reduce the dimension. And the linear discriminant analysis (LDA) method to improve the classification ability between the similar features is adapted. In the last step, the individual classifiers using the modified fuzzy c-mean method based on the K-NN to initialize the membership degree is explained for extracted coefficient at each resolution level. In the experimental results, using the depth threshold value 60 (DT60) showed the highest recognition rate among the extracted regions, and the proposed classification method achieved 98.3% recognition rate, incase of fuzzy cluster.

  • PDF

3D Face Recognition in the Multiple-Contour Line Area Using Fuzzy Integral (얼굴의 등고선 영역을 이용한 퍼지적분 기반의 3차원 얼굴 인식)

  • Lee, Yeung-Hak
    • Journal of Korea Multimedia Society
    • /
    • 제11권4호
    • /
    • pp.423-433
    • /
    • 2008
  • The surface curvatures extracted from the face contain the most important personal facial information. In particular, the face shape using the depth information represents personal features in detail. In this paper, we develop a method for recognizing the range face images by combining the multiple face regions using fuzzy integral. For the proposed approach, the first step tries to find the nose tip that has a protrusion shape on the face from the extracted face area and has to take into consideration of the orientated frontal posture to normalize. Multiple areas are extracted by the depth threshold values from reference point, nose tip. And then, we calculate the curvature features: principal curvature, gaussian curvature, and mean curvature for each region. The second step of approach concerns the application of eigenface and Linear Discriminant Analysis(LDA) method to reduce the dimension and classify. In the last step, the aggregation of the individual classifiers using the fuzzy integral is explained for each region. In the experimental results, using the depth threshold value 40 (DT40) show the highest recognition rate among the regions, and the maximum curvature achieves 98% recognition rate, incase of fuzzy integral.

  • PDF

An Exploratory Research Trends Analysis in Journal of the Korea Contents Association using Topic Modeling (토픽 모델링을 활용한 한국콘텐츠학회 논문지 연구 동향 탐색)

  • Seok, Hye-Eun;Kim, Soo-Young;Lee, Yeon-Su;Cho, Hyun-Young;Lee, Soo-Kyoung;Kim, Kyoung-Hwa
    • The Journal of the Korea Contents Association
    • /
    • 제21권12호
    • /
    • pp.95-106
    • /
    • 2021
  • The purpose of this study is to derive major topics in content R&D and provide directions for academic development by exploring research trends over the past 20 years using topic modeling targeting 9,858 papers published in the Journal of the Korean Contents Association. To secure the reliability and validity of the extracted topics, not only the quantitative evaluation technique but also the qualitative technique were applied step-by-step and repeated until a corpus of the level agreed upon by the researchers was generated, and detailed analysis procedures were presented accordingly. As a result of the analysis, 8 core topics were extracted. This shows that the Korean Contents Association is publishing convergence and complex research papers in various fields without limiting to a specific academic field. Also, before 2012, the proportion of topics in the field of engineering and technology appeared relatively high, while after 2012, the proportion of topics in the field of social sciences appeared relatively high. Specifically, the topic of 'social welfare' showed a fourfold increase in the second half compared to the first half. Through topic-specific trend analysis, we focused on the turning point in time at which the inflection point of the trend line appeared, explored the external variables that affected the research trend of the topic, and identified the relationship between the topic and the external variable. It is hoped that the results of this study can provide implications for active discussions in domestic content-related R&D and industrial fields.

Speech Recognition Using Linear Discriminant Analysis and Common Vector Extraction (선형 판별분석과 공통벡터 추출방법을 이용한 음성인식)

  • 남명우;노승용
    • The Journal of the Acoustical Society of Korea
    • /
    • 제20권4호
    • /
    • pp.35-41
    • /
    • 2001
  • This paper describes Linear Discriminant Analysis and common vector extraction for speech recognition. Voice signal contains psychological and physiological properties of the speaker as well as dialect differences, acoustical environment effects, and phase differences. For these reasons, the same word spelled out by different speakers can be very different heard. This property of speech signal make it very difficult to extract common properties in the same speech class (word or phoneme). Linear algebra method like BT (Karhunen-Loeve Transformation) is generally used for common properties extraction In the speech signals, but common vector extraction which is suggested by M. Bilginer et at. is used in this paper. The method of M. Bilginer et al. extracts the optimized common vector from the speech signals used for training. And it has 100% recognition accuracy in the trained data which is used for common vector extraction. In spite of these characteristics, the method has some drawback-we cannot use numbers of speech signal for training and the discriminant information among common vectors is not defined. This paper suggests advanced method which can reduce error rate by maximizing the discriminant information among common vectors. And novel method to normalize the size of common vector also added. The result shows improved performance of algorithm and better recognition accuracy of 2% than conventional method.

  • PDF