A Proposal of a Keyword Extraction System for Detecting Social Issues (사회문제 해결형 기술수요 발굴을 위한 키워드 추출 시스템 제안)
Journal of Intelligence and Information Systems / v.19 no.3 / pp.1-23 / 2013
To discover significant social issues that urgently need solving in modern society, such as unemployment, economic crisis, and social welfare, researchers in the existing approach usually collect opinions from experts and scholars through online or offline surveys. However, this method is not always effective. Because of the cost, large numbers of survey replies are seldom gathered, and in some cases it is hard to find experts who deal with a specific social issue. Thus, the sample set is often small and may be biased. Furthermore, several experts may reach totally different conclusions about the same social issue because each has a subjective point of view and a different background. In this case, it is considerably hard to figure out what the current social issues are and which of them are really important. To overcome these shortcomings, in this paper we develop a prototype system that semi-automatically detects social issue keywords representing social issues and problems from about 1.3 million news articles published by about 10 major domestic news outlets in Korea from June 2009 until July 2012. Our proposed system consists of (1) collecting news articles and extracting their texts, (2) identifying only the news articles related to social issues, (3) analyzing the lexical items of Korean sentences, (4) finding a set of topics regarding social keywords over time based on probabilistic topic modeling, (5) matching relevant paragraphs to a given topic, and (6) visualizing social keywords for easy understanding. In particular, we propose a novel matching algorithm relying on generative models. The goal of our proposed matching algorithm is to best match paragraphs to each topic.
Technically, using a topic model such as Latent Dirichlet Allocation (LDA), we can obtain a set of topics, each of which has relevant terms and their probability values. In our problem, given a set of text documents (e.g., news articles), LDA produces a set of topic clusters, and each topic cluster is then labeled by human annotators, where each topic label stands for a social keyword. For example, suppose there is a topic (e.g., Topic1 = {(unemployment, 0.4), (layoff, 0.3), (business, 0.3)}) and a human annotator labels Topic1 as "Unemployment Problem". In this example, it is non-trivial to understand what happened regarding the unemployment problem in our society. In other words, by looking only at social keywords, we have no idea of the detailed events occurring in our society. To tackle this matter, we develop a matching algorithm that computes the probability value of a paragraph given a topic, relying on (i) the topic terms and (ii) their probability values. For instance, given a set of text documents, we segment each text document into paragraphs. In the meantime, using LDA, we extract a set of topics from the text documents. Through our matching process, each paragraph is assigned to the topic it best matches, so that each topic ends up with several best-matched paragraphs. Furthermore, suppose there are a topic (e.g., Unemployment Problem) and its best-matched paragraph (e.g., "Up to 300 workers lost their jobs in XXX company at Seoul"). In this case, we can grasp the detailed information behind the social keyword, such as "300 workers", "unemployment", "XXX company", and "Seoul". In addition, our system visualizes social keywords over time. Therefore, through our matching process and keyword visualization, most researchers will be able to detect social issues easily and quickly.
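The matching step described above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the scoring rule here is an assumption: each paragraph is scored by its average per-token log-probability under a unigram model built from the topic terms and their probability values, with a small assumed background probability for words the topic does not list. The function names, the `BACKGROUND_PROB` value, the "Social Welfare" topic, and the sample paragraphs are all hypothetical, and the simple English tokenizer stands in for the Korean lexical analysis used by the actual system.

```python
import math
import re

# Assumed smoothing value for words that a topic does not list (hypothetical).
BACKGROUND_PROB = 1e-4


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (English only)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def paragraph_score(paragraph, topic_terms, background=BACKGROUND_PROB):
    """Average per-token log-probability of a paragraph under a unigram topic model."""
    tokens = tokenize(paragraph)
    if not tokens:
        return float("-inf")  # an empty paragraph cannot match any topic
    total = sum(math.log(topic_terms.get(tok, background)) for tok in tokens)
    return total / len(tokens)


def best_topic(paragraph, topics):
    """Return the label of the topic under which the paragraph scores highest."""
    return max(topics, key=lambda label: paragraph_score(paragraph, topics[label]))


def match_paragraphs(paragraphs, topics):
    """Assign every paragraph to its best-matching topic label."""
    matches = {label: [] for label in topics}
    for paragraph in paragraphs:
        matches[best_topic(paragraph, topics)].append(paragraph)
    return matches


# Topic1 follows the example in the text; the second topic is made up for illustration.
topics = {
    "Unemployment Problem": {"unemployment": 0.4, "layoff": 0.3, "business": 0.3},
    "Social Welfare": {"welfare": 0.5, "pension": 0.3, "elderly": 0.2},
}

# Hypothetical paragraphs, as if segmented from news articles.
paragraphs = [
    "The layoff at XXX company pushed unemployment higher in Seoul.",
    "The pension reform expands welfare for elderly citizens.",
]

for label, matched in match_paragraphs(paragraphs, topics).items():
    print(label, matched)
```

Dividing by the paragraph length keeps long paragraphs from being penalized simply for containing more words. A real system would likely also need a principled smoothing scheme instead of a fixed background constant.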
Through this prototype system, we have detected various social issues appearing in our society and have also shown the effectiveness of our proposed methods in our experimental results. Note that our proof-of-concept system is available at http://dslab.snu.ac.kr/demo.html.
To achieve stable and high yields of lowland rice in Korea, the characteristics of the rice plant, including its vegetative and physiological responses, plant type formation, and yield components, have been studied in order to obtain fundamental data for improving cultural practices, especially fertilizer application. Furthermore, the environmental conditions in Korea, including temperature, light, precipitation, and soil conditions, have been compared in a broad sense with those in Japan, and the application of nitrogen, phosphorus, potassium, silicate, and other micro-nutrients is described in relation to these environmental conditions for the improvement of fertilizer application.
1. The average yield of polished rice per 10 ares in Korea is about 204 kg, and this value is much lower than those in Japan and Taiwan, where they produce 77% to 13% more than in Korea. The rate of yield increase a year in Korea is 4.2 kg, but in Japan and Taiwan the rates of yield increase a year are 81% and 62%, respectively. It was also found that the coefficient of variation of yield is 7.7% in Korea, 6.7% in Japan, and 2.5% in Taiwan. This means that the stability of rice production in Korea is very low compared with that in Japan and Taiwan.
2. The results of the 'annual yield estimation experiment' show that there are big differences in plant type formation between rice crops grown in Japan and Korea. The important differences found were as follows: (1) The numbers of spikelets per 3.3 square meters are 891 in Korea and 1,007 in Japan (13% more than in Korea). (2) The number of tillers per 3.3 square meters at the stage of maximum tillering is 1,150 in Korea, but in Japan it is 19% more than in Korea. (3) The ratio of effective tillers to total tillers is 77.5% in Korea and 74.7% in Japan, which seems to be higher in Korea than in Japan. But the ratio in Korea is very low when the total numbers of tillers in both countries are considered. (4) The ratio of grain to straw is 85.4% in Korea and 96.3% in Japan.
3. The average temperatures during the growing season in the areas of Suwon, Kwangjoo, and Taegu are almost the same as those in the district of Jookokoo (Fookoo yama) in Japan, i.e., the temperatures during the rice-growing season in Korea are similar to those in the warm southern regions of Japan.
4. Considering the minimum temperatures at the stage of limiting transplanting, 13