• Title/Summary/Keyword: Social big data analysis

A Comparison Study of RNN, CNN, and GAN Models in Sequential Recommendation (순차적 추천에서의 RNN, CNN 및 GAN 모델 비교 연구)

  • Yoon, Ji Hyung;Chung, Jaewon;Jang, Beakcheol
    • Journal of Internet Computing and Services / v.23 no.4 / pp.21-33 / 2022
  • Recommender systems are now widely used in fields such as movies, music, online shopping, and social media. Recommender models have evolved from correlation analysis using the Apriori model, which can be regarded as the first-generation model in the field. Since 2005, many models have been proposed, including deep learning-based models, which are receiving a great deal of attention. Recommender models can be classified into collaborative filtering methods, content-based methods, and hybrid methods that combine the two. However, these basic methods are gradually losing their standing in the field because they fail to adapt to changing internal and external factors, such as rapidly changing user-item interactions and the growth of big data. Meanwhile, deep learning methodologies are becoming more important in recommender systems because of advantages such as nonlinear transformation, representation learning, sequence modeling, and flexibility. This paper classifies, compares, and analyzes RNN-, CNN-, and GAN-based models, which are deep learning models suited to sequential modeling and able to analyze user-item interactions accurately and flexibly.

A Study on the Landscape Cognition of Wind Power Plant in Social Media (소셜미디어에 나타난 풍력발전시설의 경관 인식 연구)

  • Woo, Kyung-Sook;Suh, Joo-Hwan
    • Journal of the Korean Institute of Landscape Architecture / v.50 no.5 / pp.69-79 / 2022
  • This study assesses how the landscape of wind power facilities is currently perceived, given that these renewable energy sources also provide sightseeing, tourism, and other opportunities. To this end, social media data on the landscape of wind power facilities, as experienced by visitors from different regions, was analyzed. The results showed that the common landscape characteristics of wind power facilities depend on the scale of the facilities, the distance between the facilities and overlook points, the visual openness of the facilities from the overlook points, and the terrain where the facilities are located. In addition, preference is higher in places where the shape of the facilities and the surrounding landscape can be clearly seen; flat ground and the sea are considered better landscapes. Negative keywords about the landscape appear for Gade Mountain in Taibai, Meifeng Mountain in Taibai, Taiqi Mountain, and Gyeongju Wind Power Generation Facilities on Gyeongshang Road in Gangwon. Negative keywords occur when wind power facilities are viewed at close range: because of the steep viewing angle, viewers see the size of the facility and the ridge at the same time and can feel overwhelmed and psychologically pressured. In contrast, positive landscape adjectives are associated with wind power facilities on flat ground or at sea. Visitors feel that the visual volume of the landscape is fully secured on flat ground or at sea, and that the facility is a symbolic element that can represent the site. This study analyzes landscape awareness based on the opinions of visitors who have experienced wind power facilities. However, wind power facilities are built in different areas with different landscape characteristics, and there are many variables, such as viewpoints and observers, so the results are difficult to generalize.
In recent years, landscape damage caused by the construction of wind power facilities has become a pressing issue, and domestic methods for evaluating the landscape of wind power facilities are unsatisfactory. When evaluating the landscape of wind power facilities, the scale of the facilities, the inherent natural characteristics of the area where they are installed, and the distance between the facilities and overlook points are therefore important elements to consider. In addition, wind power facilities are sited in the natural environment, which needs to be protected. From the landscape perspective, it is therefore necessary to study the landscape of wind power facilities together with the surrounding environment.

Analysis on Change in Korean Marriage Behaviors (한국인 혼인행태 변화분석)

  • 이삼식
    • Korea journal of population studies / v.16 no.2 / pp.84-110 / 1993
  • This study aims to identify recent changes in marriage behaviors in Korea. The data used are the vital statistics compiled from the vital registration system, whose registration form is combined into one form with the civil registration form. According to the analysis, the number of marriages has steadily increased since 1970, from about 300,000 in the early 1970s to about 400,000 in the late 1980s, approximately in line with the change in the size of the population of marriageable age. The few exceptions seen in the 1970s seem to result from the social upheavals of the 1950s, since the birth cohorts affected by low fertility during the Korean War and the post-war baby-boom generations characterized by high fertility entered the marriage market in the 1970s. However, the marriage rate shows only a small increase, from around 7 in the early 1970s to around 9 in the late 1980s, indicating that the prevalence of marriage changed little during this period. It is also found that the proportion of remarriages among all marriages increased to around 10 percent in 1989, while that of first marriages decreased. This can be attributed to the higher prevalence of divorce and to the collapse of the Confucian ethic, which helped expedite the remarriage of widows. Although this proportion is small compared with that of more developed countries, the proportion of remarriages is likely to continue to increase. The age at first marriage (AFM), which directly affects the span of exposure to the risk of pregnancy, has increased to about 28 for males and about 25 for females in recent years.
However, the large difference in AFM between urban and rural areas has narrowed as a result of the increasing involuntary postponement of marriage among rural young people, who have had difficulty finding a bride or bridegroom in rural areas characterized by heavy out-migration of the young, particularly female, population. The present study shows an inverse relationship between AFM and educational attainment; i.e., the higher the educational attainment, the lower the AFM. The conditions considered in choosing a spouse were class and family in the past, but are now educational attainment, job, and personal characteristics. With regard to age, in recent years males prefer females younger than themselves by 3 years on average, and vice versa, down from 4-5 years at the beginning of the 1970s. The age difference between bride and bridegroom tends to decrease as educational attainment increases. This may be because persons with higher educational attainment prefer love marriages and hence are more likely to choose partners of about the same age. As for education, the bridegroom typically has a higher educational level than the bride. It is also significant that the proportion of love marriages has increased, whereas that of traditional arranged marriages has decreased. This is more true in urban areas than in rural areas, indicating that rights as well as responsibilities for marriage have been handed over from parents to the young population. In conclusion, the changes in marriage behaviors in Korea are characterized by an increasing tendency to postpone first marriage, a higher prevalence of divorce and, as a result, of remarriage, an increase in love marriages, and a narrowing age difference between bride and bridegroom.
These changes are the main results of rapid industrialization, increased opportunities for education and economic activity, and changing ideals of marriage over the past decades. These phenomena prevailing in Korean society would affect not only the family structure, which will become less proliferated, but also population size and structure. Most importantly, the changes in Koreans' marriage behaviors and their impact on society (norms, values, and morals of individuals and families in the social aspect; population size and structure in the demographic aspect; and economic development in the economic aspect) should be integrated into planning for the future.

A Methodology for Automatic Multi-Categorization of Single-Categorized Documents (단일 카테고리 문서의 다중 카테고리 자동확장 방법론)

  • Hong, Jin-Sung;Kim, Namgyu;Lee, Sangwon
    • Journal of Intelligence and Information Systems / v.20 no.3 / pp.77-92 / 2014
  • The rapid growth in the use of social media and the Internet has recently produced numerous documents containing unstructured data and text. Each document is usually assigned a specific category for the convenience of users. In the past, categorization was performed manually. However, manual categorization cannot guarantee accuracy and requires a large amount of time and cost. Many studies have pursued the automatic creation of categories to overcome these limitations. Unfortunately, most of these methods cannot be applied to complex documents with multiple topics because they assume that a document can be assigned to only one category. To overcome this limitation, some studies have attempted to categorize each document into multiple categories. However, these are also limited in that their learning process requires training on a multi-categorized document set. These methods therefore cannot be applied to multi-categorization of most documents unless multi-categorized training sets are provided. To remove the requirement for a multi-categorized training set imposed by traditional multi-categorization algorithms, we propose a new methodology that extends the category of a single-categorized document to multiple categories by analyzing the relationships among categories, topics, and documents. First, we find the relationship between documents and topics by using the result of topic analysis for single-categorized documents. Second, we construct a correspondence table between topics and categories by investigating the relationship between them. Finally, we calculate the matching scores of each document for multiple categories.
The results imply that a document can be classified into a certain category if and only if its matching score is higher than the predefined threshold. For example, a document can be classified into three categories if their matching scores all exceed the predefined threshold. The main contribution of our study is that our methodology can improve the applicability of traditional multi-category classifiers by generating multi-categorized documents from single-categorized documents. Additionally, we propose a module for verifying the accuracy of the proposed methodology. For performance evaluation, we performed intensive experiments with news articles. News articles are clearly categorized by theme and contain less vulgar language and slang than other ordinary text documents. We collected news articles from July 2012 to June 2013. The number of articles varies widely across categories. This is because readers have different levels of interest in each category and because the frequency of events differs by category. To minimize the distortion caused by the different numbers of articles per category, we extracted 3,000 articles from each of the eight categories, for a total of 24,000 articles used in our experiments. The eight categories were "IT Science," "Economy," "Society," "Life and Culture," "World," "Sports," "Entertainment," and "Politics." Using the collected news articles, we calculated document/category correspondence scores from the topic/category and document/topic correspondence scores. The document/category correspondence score indicates the degree to which each document corresponds to a certain category. As a result, we could present two additional categories for each of 23,089 documents.
Precision, recall, and F-score were 0.605, 0.629, and 0.617, respectively, when only the top 1 predicted category was evaluated, whereas they were 0.838, 0.290, and 0.431 when the top 1-3 predicted categories were considered. Interestingly, precision, recall, and F-score varied widely across the eight categories.
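
The scoring and thresholding steps described in this abstract can be illustrated with a minimal Python sketch. It assumes that the document/category score is a weighted sum of topic/category scores, which is one plausible reading of the abstract rather than the authors' confirmed formula. The function names, the example scores, and the threshold value are all hypothetical. The last two lines recompute the F-scores from the precision and recall values reported above.

```python
from typing import List, Sequence


def f_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def document_category_scores(
    doc_topic: Sequence[float],
    topic_category: Sequence[Sequence[float]],
) -> List[float]:
    """Weight each topic/category score by the document's topic score and sum.

    Assumption: this weighted sum stands in for the paper's matching score.
    """
    n_categories = len(topic_category[0])
    return [
        sum(doc_topic[t] * topic_category[t][c] for t in range(len(doc_topic)))
        for c in range(n_categories)
    ]


def assign_categories(
    scores: Sequence[float], names: Sequence[str], threshold: float
) -> List[str]:
    """Keep every category whose matching score exceeds the threshold."""
    return [name for name, score in zip(names, scores) if score > threshold]


# Hypothetical scores for one document over three topics and two categories.
doc_topic = [0.7, 0.2, 0.1]
topic_category = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
scores = document_category_scores(doc_topic, topic_category)
print(assign_categories(scores, ["Economy", "Politics"], threshold=0.25))

# F-scores recomputed from the precision and recall reported in the abstract.
print(round(f_score(0.605, 0.629), 3))  # top-1 setting
print(round(f_score(0.838, 0.290), 3))  # top 1-3 setting
```

The recomputed values, 0.617 and 0.431, agree with the F-scores reported in the abstract, which suggests that the reported F-score is the standard harmonic mean of precision and recall.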

Analysis of the Weight of SWOT Factors of Korean Venture Companies Based on the Industry 4.0 (4차 산업혁명 기반 한국 벤처기업의 SWOT요인에 대한 중요도 분석)

  • Lee, Dongik;Lee, Sangsuk
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.16 no.4 / pp.115-133 / 2021
  • This study examines the concept and related technologies of the 4th industrial revolution, which have so far been used inconsistently, along with the resulting socio-economic changes and influences and the responses of major countries to the 4th industrial revolution. On this basis, SWOT factors were derived and the importance of each factor was calculated for Korean venture companies preparing for the 4th industrial revolution, with the intention of helping the government and policymakers set directions for related policies. The study also aims to suggest a direction for securing global competitiveness to Korean venture entrepreneurs and to support basic and systematic analysis for further in-depth academic research. For this study, a total of 21 items were derived through extensive literature and data research to identify the competency factors that Korean venture companies need, in the face of internal and external environmental changes, to be globally competitive in the era of the 4th Industrial Revolution. After the SWOT factors were reviewed by three expert groups and confirmed through a Delphi survey, the importance of each item was analyzed using AHP, a systematic decision-making technique. The analysis showed that Strength (48%), Opportunity (25%), Threat (16%), and Weakness (11%) were considered important, in that order. Among the sub-items, 'quick and flexible commercialization capability', 'platform/big data/non-face-to-face service activation', and 'ICT infrastructure and its utilization' showed comparatively high importance. On the other hand, the three lowest-ranked items, 'macro-economic stability and social infrastructure', 'difficulty in entering overseas markets due to global protectionism', and 'absolutely inferior in foreign investment', were found to have low priority.
An item-level correlation analysis was conducted to identify differences of opinion among the industry, academic, and policy expert groups. Industry and academic experts showed a high correlation, and industry and policy experts showed a moderate correlation, indicating no significant difference of opinion between those groups. The correlation between academic and policy experts was not statistically significant (p<0.01), which was interpreted as a difference of opinion on importance. This was because policy experts highly valued 'quick and flexible commercialization', a strength item, and 'excellent educational system and high-quality manpower' and 'creation of new markets', which are opportunity items, while academic experts placed great importance on 'support part of government policy', a strength item. The implication of this study is that, for Korean venture companies to secure competitiveness in the field of the 4th industrial revolution, policy should preferentially support the relevant strength and opportunity items. The high variability in the details of the strength and opportunity factors suggests that these differences should be actively reviewed and reflected in policy.
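
The abstract reports AHP-derived weights but not the experts' pairwise judgments. As an illustration of how AHP turns a pairwise comparison matrix into priority weights, here is a minimal Python sketch using the row geometric mean method, a common approximation to the principal-eigenvector prioritization. The comparison matrix is hypothetical: it is constructed from the reported weights so that it is perfectly consistent, purely to show the mechanics, and it is not the study's data. The function name `ahp_weights` is likewise an assumption.

```python
import math
from typing import List, Sequence


def ahp_weights(pairwise: Sequence[Sequence[float]]) -> List[float]:
    """Approximate AHP priority weights with the row geometric mean method.

    pairwise[i][j] states how much more important factor i is than factor j.
    """
    geometric_means = [math.prod(row) ** (1.0 / len(row)) for row in pairwise]
    total = sum(geometric_means)
    return [g / total for g in geometric_means]


factors = ["Strength", "Opportunity", "Threat", "Weakness"]
reported = [0.48, 0.25, 0.16, 0.11]  # weights reported in the abstract

# Hypothetical, perfectly consistent matrix: entry (i, j) is w_i / w_j.
pairwise = [[wi / wj for wj in reported] for wi in reported]

for name, weight in zip(factors, ahp_weights(pairwise)):
    print(f"{name}: {weight:.2f}")
```

For a perfectly consistent matrix, the method recovers the underlying weights. Real expert judgments are rarely consistent, so a full analysis would typically also check a consistency ratio before accepting the weights.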

A Method for Evaluating News Value based on Supply and Demand of Information Using Text Analysis (텍스트 분석을 활용한 정보의 수요 공급 기반 뉴스 가치 평가 방안)

  • Lee, Donghoon;Choi, Hochang;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.22 no.4 / pp.45-67 / 2016
  • Given the recent development of smart devices, users are producing, sharing, and acquiring a variety of information via the Internet and social network services (SNSs). Because users tend to use multiple media simultaneously according to their goals and preferences, domestic SNS users use around 2.09 media concurrently on average. Since the information provided by such media is usually textually represented, recent studies have been actively conducting textual analysis in order to understand users more deeply. Earlier studies using textual analysis focused on analyzing a document's contents without substantive consideration of the diverse characteristics of the source medium. However, current studies argue that analytical and interpretive approaches should be applied differently according to the characteristics of a document's source. Documents can be classified into the following types: informative documents for delivering information, expressive documents for expressing emotions and aesthetics, operational documents for inducing the recipient's behavior, and audiovisual media documents for supplementing the above three functions through images and music. Further, documents can be classified according to their contents, which comprise facts, concepts, procedures, principles, rules, stories, opinions, and descriptions. Documents have unique characteristics according to the source media by which they are distributed. In terms of newspapers, only highly trained people tend to write articles for public dissemination. In contrast, with SNSs, various types of users can freely write any message and such messages are distributed in an unpredictable way. Again, in the case of newspapers, each article exists independently and does not tend to have any relation to other articles. However, messages (original tweets) on Twitter, for example, are highly organized and regularly duplicated and repeated through replies and retweets. 
There have been many studies focusing on the different characteristics between newspapers and SNSs. However, it is difficult to find a study that focuses on the difference between the two media from the perspective of supply and demand. We can regard the articles of newspapers as a kind of information supply, whereas messages on various SNSs represent a demand for information. By investigating traditional newspapers and SNSs from the perspective of supply and demand of information, we can explore and explain the information dilemma more clearly. For example, there may be superfluous issues that are heavily reported in newspaper articles despite the fact that users seldom have much interest in these issues. Such overproduced information is not only a waste of media resources but also makes it difficult to find valuable, in-demand information. Further, some issues that are covered by only a few newspapers may be of high interest to SNS users. To alleviate the deleterious effects of information asymmetries, it is necessary to analyze the supply and demand of each information source and, accordingly, provide information flexibly. Such an approach would allow the value of information to be explored and approximated on the basis of the supply-demand balance. Conceptually, this is very similar to the price of goods or services being determined by the supply-demand relationship. Adopting this concept, media companies could focus on the production of highly in-demand issues that are in short supply. In this study, we selected Internet news sites and Twitter as representative media for investigating information supply and demand, respectively. We present the notion of News Value Index (NVI), which evaluates the value of news information in terms of the magnitude of Twitter messages associated with it. In addition, we visualize the change of information value over time using the NVI. We conducted an analysis using 387,014 news articles and 31,674,795 Twitter messages. 
The analysis revealed interesting patterns: most issues show an NVI lower than the average over all issues, whereas a few issues show an NVI steadily higher than the average.
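
The abstract does not give the formula for the News Value Index. As a rough illustration of the supply-demand idea only, the Python sketch below scores each issue by the ratio of its share of Twitter messages (demand) to its share of news articles (supply). This ratio is an assumed stand-in, not the authors' definition of NVI, and the function name, issue labels, and counts are all hypothetical.

```python
from typing import Dict


def supply_demand_index(
    articles: Dict[str, int], tweets: Dict[str, int]
) -> Dict[str, float]:
    """Score each issue by demand share divided by supply share.

    Assumption: news articles proxy information supply and tweets proxy demand.
    Issues with no articles are skipped because their supply share is zero.
    """
    total_articles = sum(articles.values())
    total_tweets = sum(tweets.values())
    index = {}
    for issue, n_articles in articles.items():
        if n_articles == 0:
            continue
        supply_share = n_articles / total_articles
        demand_share = tweets.get(issue, 0) / total_tweets
        index[issue] = demand_share / supply_share
    return index


# Hypothetical counts for three issues.
articles = {"issue_a": 50, "issue_b": 30, "issue_c": 20}
tweets = {"issue_a": 100, "issue_b": 300, "issue_c": 600}

for issue, value in supply_demand_index(articles, tweets).items():
    print(issue, round(value, 2))
```

Under this assumed definition, a value above 1 marks an issue that attracts more attention on Twitter than its news coverage would suggest, and a value below 1 marks an issue that is covered more heavily than it is discussed.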

The Analysis of Urban Park Catchment Areas - Perspectives from Quality Service of Hangang Park - (한강공원의 질적 서비스와 이용자 영향권의 상관관계 분석)

  • Lee, Seo Hyo;Kim, Harry;Lee, Jae Ho
    • Journal of the Korean Institute of Landscape Architecture / v.49 no.6 / pp.27-36 / 2021
  • At a time when the equitable use of urban parks is emerging as a social issue, this study was initiated to expand the influence of urban parks by improving the quality of park services, thereby reducing the areas not covered by urban park services. The study targeted Hangang Park in Seoul, where the qualitative service of parks shows the greatest differences. The relationship between the qualitative services of the park and the user's sphere of influence, which indicates the distribution of park users, was examined to assess the effect of improvements in service quality. As a research method, the top three districts and the bottom three districts were selected through the Hangang Park user satisfaction survey conducted from 2017 to 2019, and a qualitative service evaluation was carried out. It was derived using the data acquired in September. Afterward, a spatial autocorrelation analysis was performed on the user's sphere of influence to verify it further, both numerically and visually. The results showed that the user influence in the top three districts, which have high-quality service, was stronger and wider than that in the bottom three districts. This confirms that the quality of park service affects user influence. It also shows that, to realize park equity, it is necessary to improve service quality through continuous management and improvement of individual parks as well as through the creation of new parks. This study is significant in that it recognizes the limitations of research on park services from a supplier's point of view and evaluates the qualitative services of parks from the perspective of actual park users. We propose an alternative for lowering the park deprivation index.

Analysis of Urban Growth Pattern and Characteristics by Administrative District Hierarchy : 1985~2005 (행정구역 위계별 도시성장 패턴 및 특성 분석 : 1985~2005를 중심으로)

  • Park, So-Young;Jeon, Sung-Woo;Choi, Chul-Uong
    • Journal of the Korean Association of Geographic Information Studies / v.12 no.4 / pp.34-47 / 2009
  • Rapid urbanization is causing environmental and ecological damage, environmentally reckless development, and social and economic problems. It is important to understand the state and characteristics of urban growth and to reflect them in policies that address the problems of urbanization and promote sustainable and efficient development of national land. This research aims to provide basic data for establishing urban policy by analyzing the state and characteristics of urban growth over the past 20 years across the entire country rather than in a single district. To this end, urban districts were sampled using the 1980s and 2000s versions of the land cover map produced by the Ministry of Environment, and a pattern analysis of urban growth by administrative district rank was conducted using GIS and statistical techniques. As a result, the developed area after the 1980s increased by 2.5 times compared with that before the 1980s, and in the farm villages neighboring the national capital region in particular, it increased by 21.2 times. Special cities and metropolitan cities were developed in districts that are low in altitude, close to principal roads and major downtown areas, high in road ratio, and restricted environmentally, ecologically, and legally, and land there was converted from mountains, forests, and grassland to urban land. On the other hand, farm villages neighboring a large city, farm villages neighboring the national capital region, and local farm villages were developed in districts that are high in altitude, far from principal roads and major downtown areas, low in road ratio, and not restricted environmentally, ecologically, or legally, and land there was converted from farmland to urban land. That is, as urban growth around big cities expands, urban development has proceeded actively in suburban districts despite unfavorable topographical conditions, owing to the lack of available land and to various regulations and policies.

Korean Word Sense Disambiguation using Dictionary and Corpus (사전과 말뭉치를 이용한 한국어 단어 중의성 해소)

  • Jeong, Hanjo;Park, Byeonghwa
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.1-13 / 2015
  • As opinion mining in big data applications has come to the fore, a lot of research on unstructured data has been conducted. Social media on the Internet generate unstructured or semi-structured data every second, often written in the natural languages we use in daily life. Many words in human languages have multiple meanings or senses. As a result, it is very difficult for computers to extract useful information from these datasets. Traditional web search engines are usually based on keyword search, which can return results far from users' intentions. Even though much progress has been made in recent years in improving search engines so that they provide appropriate results, there is still much room for improvement. Word sense disambiguation plays a very important role in natural language processing and is considered one of the most difficult problems in this area. Major approaches to word sense disambiguation can be classified as knowledge-based, supervised corpus-based, and unsupervised corpus-based approaches. This paper presents a method that automatically generates a corpus for word sense disambiguation by taking advantage of examples in existing dictionaries, thereby avoiding expensive sense-tagging processes. It evaluates the effectiveness of the method with a Naïve Bayes model, a supervised learning algorithm, using the Korean standard unabridged dictionary and the Sejong Corpus. The Korean standard unabridged dictionary has approximately 57,000 sentences. The Sejong Corpus has about 790,000 sentences tagged with both part-of-speech and senses. In the experiments, the Korean standard unabridged dictionary and the Sejong Corpus were evaluated both in combination and as separate entities using cross validation. Only nouns, the target of word sense disambiguation, were selected.
93,522 word senses among 265,655 nouns and 56,914 sentences from related proverbs and examples were additionally combined in the corpus. The Sejong Corpus was easily merged with the Korean standard unabridged dictionary because the Sejong Corpus was tagged based on sense indices defined by the dictionary. Sense vectors were formed after the merged corpus was created. Terms used in creating the sense vectors were added to the named entity dictionary of a Korean morphological analyzer. Using the extended named entity dictionary, term vectors were extracted from the input sentences. Given the extracted term vectors and the sense vector model built during the pre-processing stage, the sense-tagged terms were determined by vector space model-based word sense disambiguation. In addition, this study shows the effectiveness of the corpus merged from the examples in the Korean standard unabridged dictionary and the Sejong Corpus. The experiments show better precision and recall with the merged corpus. This study suggests that the method can practically enhance the performance of Internet search engines and help capture the meaning of a sentence more accurately in natural language processing tasks related to search engines, opinion mining, and text mining. The Naïve Bayes classifier used in this study is a supervised learning algorithm based on Bayes' theorem. The Naïve Bayes classifier assumes that all senses are independent. Even though this assumption is not realistic and ignores the correlation between attributes, the Naïve Bayes classifier is widely used because of its simplicity, and in practice it is known to be very effective in many applications such as text classification and medical diagnosis. However, further research needs to be carried out to consider all possible combinations and/or partial combinations of all senses in a sentence.
Also, the effectiveness of word sense disambiguation may be improved if rhetorical structures or morphological dependencies between words are analyzed through syntactic analysis.
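
To make the Naïve Bayes approach mentioned in this abstract concrete, here is a minimal Python sketch of a Naïve Bayes word sense classifier trained on sense-tagged example sentences. It is a toy under stated assumptions, not the paper's system: the English examples for the word "bank" are hypothetical stand-ins for dictionary examples, the add-one (Laplace) smoothing is an assumed choice, and the function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Model = Tuple[Counter, Dict[str, Counter], Set[str]]


def train(examples: Iterable[Tuple[str, List[str]]]) -> Model:
    """Count senses and the context words seen with each sense."""
    sense_counts: Counter = Counter()
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocabulary: Set[str] = set()
    for sense, words in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocabulary.update(words)
    return sense_counts, word_counts, vocabulary


def disambiguate(context: List[str], model: Model) -> str:
    """Return the sense with the highest posterior log-probability."""
    sense_counts, word_counts, vocabulary = model
    total_examples = sum(sense_counts.values())
    best_sense, best_log_prob = "", -math.inf
    for sense, count in sense_counts.items():
        log_prob = math.log(count / total_examples)  # prior P(sense)
        denominator = sum(word_counts[sense].values()) + len(vocabulary)
        for word in context:
            # Add-one smoothing keeps unseen words from zeroing the product.
            log_prob += math.log((word_counts[sense][word] + 1) / denominator)
        if log_prob > best_log_prob:
            best_sense, best_log_prob = sense, log_prob
    return best_sense


# Hypothetical sense-tagged examples for the ambiguous word "bank".
examples = [
    ("bank/finance", ["deposit", "money", "account"]),
    ("bank/finance", ["loan", "interest", "money"]),
    ("bank/river", ["river", "water", "fishing"]),
    ("bank/river", ["river", "grass", "shore"]),
]
model = train(examples)
print(disambiguate(["money", "loan"], model))
print(disambiguate(["river", "fishing"], model))
```

Each context word contributes to the score independently of the others, which is the simplifying assumption that keeps the classifier cheap to train and apply.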

A study on the employment preparation cost and attitude of college student for Job-seeking (국내 대학생의 취업태도 및 취업준비 비용에 관한 연구)

  • Chung, Bhum-Suk;Jeong, Hwa-Min
    • Management & Information Systems Review / v.33 no.4 / pp.1-19 / 2014
  • This study focuses on university students' job attitudes and the cost of employment preparation. Nowadays, many university and college students spend large sums on employment preparation, such as studying foreign languages, obtaining various certificates, and paying for orthodontic treatment and clothing for job interviews. This study investigated the cost of employment preparation and the job attitudes of 484 university and college students. The collected data were analyzed with SPSS 12.0 using frequency analysis, factor analysis, reliability assessment, correlation tests, t-tests, and one-way ANOVA. University students spent more than college students on employment preparation such as language training abroad, private training, and clothing. Students in social science fields also spent more on language training abroad and clothing than students in computer science and design fields. Female students spent more than male students on orthodontic treatment. The costs of language training abroad, private training, and clothing are affected by the students' family socioeconomic background. Regarding job attitudes, university students feel more positive than college students about employment efficacy and their perception of the educational environment. In sum, the cost of employment preparation differs by university type, major, sex, and family socioeconomic background. Students' employment efficacy and perception of the educational environment also differ between university and college students. Therefore, to improve job attitudes and develop students' ability to prepare for employment, educational programs should be arranged in schools, and continued research is needed.
