• Title/Summary/Keyword: Common number network

Search Results: 210

A Study on the Age Distribution Factors of One Person Household in Seoul using Multiple Regression Analysis (다중회귀분석을 이용한 서울시 1인 가구의 연령별 분포요인에 관한 연구)

  • Lee, SunHee;Yoon, DongHyeun;Koh, JuneHwan
    • Spatial Information Research, v.23 no.3, pp.11-21, 2015
  • While the total population of Seoul has been in constant decline over the last few years, the number of households has increased owing to the trend toward smaller households. In 2010, small households in the metropolitan area accounted for 44% of all households, and Statistics Korea has projected that one-person households will be the most common household type by 2020, making up more than 30% of all households. Because the reasons for this rise differ by age, for example in preferred housing types or surrounding environments, this research proposes the hypothesis that age differences shape the spatial distribution of one-person households. Accordingly, a multiple regression analysis was performed on the facilities regarded as spatial distribution factors of one-person households, with independent variables derived from facility areas calculated as area ratios of each spatial unit after a network-based service area analysis. The spatial unit is the census output area of Seoul, and on this basis the relationship between the number of one-person households by age and the distribution factors was examined. In addition, the five spatial regions (downtown, northeast, southeast, northwest, southwest) were entered as dummy variables so that results could be obtained for each region. The results show that the areas occupied differ by age: people in their 20s prefer housing near universities, those in their 30s prefer leased or monthly rental housing, those in their 40s prefer monthly rental housing, and those over 60 prefer housing with a floor area of less than $40m^2$. Since one-person households thus prefer different housing environments depending on age, housing policies that take this into account should be proposed.
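The kind of model described above can be sketched briefly. The snippet below is an illustrative example only, not the authors' code: it regresses a synthetic count of one-person households in one age group on hypothetical facility-area ratios, with the five Seoul sub-regions entered as dummy variables, using pandas and statsmodels.

```python
# Illustrative sketch (synthetic data, hypothetical column names), not the study's code.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200  # pretend census output areas

df = pd.DataFrame({
    "households_20s": rng.poisson(30, n),        # dependent variable (one age group)
    "univ_area_ratio": rng.uniform(0, 1, n),     # facility-area ratios assumed to come
    "rental_area_ratio": rng.uniform(0, 1, n),   # from the network-based service-area analysis
    "small_unit_ratio": rng.uniform(0, 1, n),
    "region": rng.choice(["downtown", "northeast", "southeast",
                          "northwest", "southwest"], n),
})

# Encode the five sub-regions as dummy variables (downtown as the reference category).
dummies = pd.get_dummies(df["region"], prefix="reg", drop_first=True).astype(float)
X = sm.add_constant(pd.concat([df[["univ_area_ratio", "rental_area_ratio",
                                   "small_unit_ratio"]], dummies], axis=1))
y = df["households_20s"]

model = sm.OLS(y, X).fit()
print(model.summary())  # coefficients show how each facility factor relates to the counts
```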

A Study on Searching for Export Candidate Countries of the Korean Food and Beverage Industry Using Node2vec Graph Embedding and Light GBM Link Prediction (Node2vec 그래프 임베딩과 Light GBM 링크 예측을 활용한 식음료 산업의 수출 후보국가 탐색 연구)

  • Lee, Jae-Seong;Jun, Seung-Pyo;Seo, Jinny
    • Journal of Intelligence and Information Systems, v.27 no.4, pp.73-95, 2021
  • This study uses the Node2vec graph embedding method and LightGBM link prediction to explore untapped export candidate countries for Korea's food and beverage industry. Node2vec addresses the relatively weak representation of structural equivalence that characterizes existing link prediction methods based on the number of common neighbors, and it is therefore known to perform well at capturing both community structure and structural equivalence in a network. The vectors obtained by embedding the network in this way are generated from fixed-length walks starting at each designated node, so the resulting node representations are easy to feed as input to downstream models such as Logistic Regression, Support Vector Machines, and Random Forests. Based on these features of the Node2vec graph embedding method, this study applied it to international trade data of the Korean food and beverage industry, with the aim of contributing to extensive-margin diversification for Korea within the industry's global value chain. The optimal predictive model derived in this study recorded a precision of 0.95, a recall of 0.79, and an F1 score of 0.86, showing excellent performance. This was superior to the binary classifier based on Logistic Regression set as the baseline model, which recorded a precision of 0.95, a recall of 0.73, and an F1 score of 0.83. In addition, the LightGBM-based optimal prediction model outperformed the link prediction model of previous studies used here as a benchmark: the benchmark model recorded a recall of only 0.75, whereas the proposed model achieved a recall of 0.79. The difference in performance between the benchmark model and our model stems from the model learning strategy. In this study, trades were grouped by trade value, and prediction models were trained differently for these groups. Specifically, we compared (1) randomly masking some trades and training the model without any condition on trade value, (2) randomly masking some of the trades with an above-average trade value and training the model, and (3) randomly masking some of the trades in the top 25% by trade value and training the model. The experiments confirmed that the model trained by randomly masking some of the trades with above-average trade value performed best and most stably. Additional investigation showed that most of the potential export candidates for Korea derived through this model appeared appropriate. Taken together, this study demonstrates the practical utility of a link prediction method combining Node2vec and LightGBM, and it provides useful implications for weight update strategies that enable better link prediction during model training. This study also has policy utility because graph-embedding-based link prediction has rarely been applied to trade transactions. The results support a rapid response to changes in the global value chain, such as the recent US-China trade conflict and Japan's export regulations, and we believe they are sufficiently useful as a tool for policy decision-making.
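As a rough illustration of the pipeline described above (library choices, parameters, and the toy graph are assumptions, not the authors' setup), the sketch below embeds a graph with the node2vec package, builds edge features as the Hadamard product of node vectors, and scores candidate links with a LightGBM classifier. The paper's trade-value-dependent masking strategy is replaced here by purely random negative sampling.

```python
# Illustrative sketch only, not the authors' pipeline.
import networkx as nx
import numpy as np
from node2vec import Node2Vec
from lightgbm import LGBMClassifier
from sklearn.metrics import precision_score, recall_score, f1_score

# Toy stand-in for the trade graph (synthetic random graph).
G = nx.fast_gnp_random_graph(200, 0.05, seed=42)

# Learn node embeddings with biased random walks.
n2v = Node2Vec(G, dimensions=32, walk_length=20, num_walks=50, workers=1)
emb = n2v.fit(window=5, min_count=1).wv

def edge_vec(u, v):
    """Hadamard product of the two node embeddings as the edge feature."""
    return emb[str(u)] * emb[str(v)]

# Positive examples: existing edges. Negative examples: sampled non-edges.
pos = list(G.edges())
rng = np.random.default_rng(0)
neg = []
while len(neg) < len(pos):
    u, v = rng.integers(0, 200, size=2)
    if u != v and not G.has_edge(u, v):
        neg.append((u, v))

X = np.array([edge_vec(u, v) for u, v in pos + neg])
y = np.array([1] * len(pos) + [0] * len(neg))

idx = rng.permutation(len(y))
split = int(0.8 * len(y))
train, test = idx[:split], idx[split:]

clf = LGBMClassifier(n_estimators=200, learning_rate=0.05)
clf.fit(X[train], y[train])
pred = clf.predict(X[test])
print("precision", precision_score(y[test], pred),
      "recall", recall_score(y[test], pred),
      "f1", f1_score(y[test], pred))
```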

National Survey of Sarcoidosis in Korea (유육종증 전국실태조사)

  • Scientific Committee, The Korean Academy of Tuberculosis and Respiratory Diseases (대한결핵 및 호흡기학회 학술위원회)
    • Tuberculosis and Respiratory Diseases, v.39 no.6, pp.453-473, 1992
  • Background: A national survey was performed to estimate the incidence of sarcoidosis in Korea. The clinical data of confirmed cases were analysed for the practice of primary care physicians and pulmonary specialists. Methods: The study period was from January 1991 to December 1992. Data were collected retrospectively by correspondence, using return postcards, with physicians in the departments of internal medicine, dermatology, ophthalmology and neurology of hospitals with more than 100 beds. For confirmed and suspicious cases of sarcoidosis, case record charts of clinical and laboratory findings were obtained in detail. Results: 1) Postcards were sent to 523 departments in 213 hospitals: internal medicine 41%, dermatology 20%, ophthalmology 20% and neurology 19%. 2) Postcards were returned from 241 departments (response rate 48%). 3) There were 113 confirmed cases and 10 suspicious cases from 50 departments. The confirmed cases came from internal medicine (81%), dermatology (13%), ophthalmology (3%) and neurology (3%); of these, 78 confirmed cases were analysed, coming from internal medicine (92%), dermatology (5%), and neurology (3%). 4) The analysed cases spanned 1980 to 1992; one case was reported in 1980 and the number gradually increased to 18 cases in 1991. 5) The majority of patients (84.4%) were in the age group of 20 to 49 years. 6) The male-to-female ratio was 1:1.5. 7) The most common chief complaints were respiratory symptoms, dermatologic symptoms, generalized discomfort, visual changes, arthralgia, abdominal pain, and swallowing difficulty, in that order; 16% of the patients were asymptomatic. 8) The mean interval between symptom onset and diagnosis was 2 months. 9) The most common symptoms were of respiratory, general, dermatologic, ophthalmologic, neurologic and cardiac origin, in that order. 10) Hemoglobin, hematocrit and platelet counts were in the normal range. 58% of the patients had lymphopenia, with lymphocytes measuring less than 30% of the white cell count. The CD4 to CD8 lymphocyte ratio was $1.73 \pm 1.16$ (range 0.43 to 4.62). ESR was elevated in 43% of the cases. 11) Blood chemistry was normal in most cases. Serum angiotensin converting enzyme (S-ACE) was $66.8 \pm 58.6$ U/L (range 8.79 to 265 U/L). Proteinuria of more than 150 mg was found in 42.9% of the patients. 12) Serum IgG was elevated in 43.5%, IgA in 45.5%, IgM in 59.1% and IgE in 46.7%. Complement C3 and C4 levels were in the normal range. Anti-nuclear antibody was detected in 11% of the cases. The Kveim test was performed in 3 cases and was positive in all of them. 13) FVC was decreased in 17.3%, FEV1 in 11.5%, FEV1/FVC in 10%, TLC in 15.2%, and DLco in 64.7%. 14) PaO2 was decreased below 90 mmHg in 48.6% and PaCO2 was increased above 45 mmHg in 5.7%. 15) In BAL fluid, macrophages accounted for $51.4 \pm 19.2$%, lymphocytes for $44.4 \pm 21.1$%, and the CD4 to CD8 lymphocyte ratio was $3.41 \pm 2.07$. 16) There were no differences in laboratory findings between males and females. 17) Hilar enlargement on chest PA was present in 87.9% (bilateral in 78.8% and unilateral in 9.1%). 18) According to Siltzbach's classification, stage 0 accounted for 5%, stage 1 for 58.3%, stage 2 for 28.3%, and stage 3 for 8.3%. 19) Hilar enlargement on chest CT was present in 92.6% (bilateral in 76.4% and unilateral in 16.2%). 20) HRCT was done in 16 cases; the most common findings were nodules, interlobular thickening and focal patchy infiltration, in that order, and two cases showed normal findings. 21) Other radiologic examinations showed bone change in one case and splenomegaly in two cases. 22) Gallium scan was done in 12 cases; radioactivity was increased in hilar and mediastinal lymph nodes in 8 cases and in the lung parenchyma in 2 cases. 23) The pathologic diagnosis was most commonly obtained by transbronchial lung biopsy (TBLB, 47.3%), followed by skin and mediastinal lymph node biopsy (34.5%), peripheral lymph node biopsy (23.6%), open lung biopsy (18.2%) and bronchial biopsy. 24) The most common pathologic findings were non-caseating granuloma (100%), multi-nucleated giant cells (47.3%), hyalinized acellular scar (34.5%), reticulin fibrin network (20%), inclusion bodies (10.9%), necrosis (9.1%), and lymphangitic distribution of granuloma (1.8%), in that order. Conclusion: Clinical, laboratory, radiologic and pathologic findings were summarized. These collected data will assist in the detection and staging of sarcoidosis in Korea in the near future.

The Effects of Global Entrepreneurship and Social Capital Within Supply Chain on the Export Performance (글로벌 기업가정신과 공급사슬 내 사회적 자본이 수출성과에 미치는 영향)

  • Yoon, Heon-Deok;Kwak, Ki-Young;Seo, Ri-Bin
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship, v.7 no.3, pp.1-16, 2012
  • In the international business environment, global supply chain management is considered a vital strategic challenge for small and medium-sized enterprises (SMEs), which lack the resources and capabilities of large corporations to exploit overseas markets, because it allows them to expand their business domains into overseas markets by establishing strategic alliances with global supply chain partners. Although a wide range of previous studies have emphasized cooperative networks in the chain, most ignore the importance of developing relational characteristics such as trust and reciprocity with partners. Moreover, studies verifying the relational factors that influence firms' export performance have proposed different and inconsistent factors. Social capital theory, which concerns the social qualities and networks that facilitate close cooperation between individuals and between organizations, provides an integrated view for identifying these relational characteristics in terms of network, trust and reciprocal norms. Meanwhile, a number of researchers have shown that global entrepreneurship is an internal and intangible resource necessary to promote SMEs' internationalization; upon closer examination, however, they cannot clearly explain its influencing mechanism in inter-firm cooperative relationships. This study verifies the effect of social capital accumulated within the global supply chain on SMEs' qualitative and quantitative export performance. In addition, we shed new light on global entrepreneurship, which is expected to be related to the formation of social capital and the enhancement of export performance. For this purpose, questionnaires developed through a literature review were collected from 192 Korean SMEs affiliated with the Korean Medium Industries Association and the Global Chief Executive Officer's Club, focusing on their members' international business. The multiple regression analysis shows that social capital (network, trust and reciprocal norms shared with global supply chain partners) as well as global entrepreneurship (innovativeness, proactiveness and risk-taking) have positive effects on SMEs' export performance. Global entrepreneurship also positively affects social capital, which partially mediates the relationship between global entrepreneurship and performance. These results indicate a structural process: global entrepreneurship (input), social capital (output), and export performance (outcome). In other words, a firm should consistently invest in and develop social capital with its global supply chain partners in order to achieve common goals, establish strategic collaborations and obtain long-term export performance. Furthermore, global entrepreneurship should be fostered within the organization so as to build up social capital. More detailed practical issues are discussed in the conclusion.
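A minimal sketch of the kind of regression-and-mediation test the abstract describes, with simulated data and hypothetical composite scores (GE, SC, EP); it is not the authors' analysis, only the Baron-and-Kenny-style logic behind a partial mediation claim.

```python
# Minimal sketch (simulated data, hypothetical variable names), not the study's analysis.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 192  # same sample size as the survey, but the data here are simulated
ge = rng.normal(size=n)                                      # global entrepreneurship composite
sc = 0.5 * ge + rng.normal(scale=0.8, size=n)                # social capital composite
ep = 0.3 * ge + 0.4 * sc + rng.normal(scale=0.8, size=n)     # export performance

df = pd.DataFrame({"GE": ge, "SC": sc, "EP": ep})

# Step 1: GE -> EP (total effect)
m1 = sm.OLS(df.EP, sm.add_constant(df[["GE"]])).fit()
# Step 2: GE -> SC (effect on the mediator)
m2 = sm.OLS(df.SC, sm.add_constant(df[["GE"]])).fit()
# Step 3: GE + SC -> EP; if GE's coefficient shrinks but stays significant,
# social capital partially mediates the relationship.
m3 = sm.OLS(df.EP, sm.add_constant(df[["GE", "SC"]])).fit()

print("total GE effect:", round(m1.params["GE"], 3))
print("direct GE effect:", round(m3.params["GE"], 3),
      "| SC effect:", round(m3.params["SC"], 3))
```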

A Study on the Revitalization of Tourism Industry through Big Data Analysis (한국관광 실태조사 빅 데이터 분석을 통한 관광산업 활성화 방안 연구)

  • Lee, Jungmi;Liu, Meina;Lim, Gyoo Gun
    • Journal of Intelligence and Information Systems, v.24 no.2, pp.149-169, 2018
  • Korea is currently accumulating a large amount of data in public institutions based on the open public data policy and "Government 3.0", and a great deal of data has accumulated in the tourism field in particular. However, academic discussions utilizing tourism data are still limited. The openness of data on restaurants, hotels, and online tourism information, as well as the use of SNS big data in tourism, also remain limited, so the utilization of tourism big data analysis is still low. In this paper, we analyzed the factors influencing foreign tourists' satisfaction in Korea from numerical data using data mining techniques and R programming. Specifically, we sought ways to revitalize the tourism industry by analyzing about 36,000 records of big data from the "Survey on the actual situation of foreign tourists" conducted from 2013 to 2015 by the Korea Culture & Tourism Research Institute. To do this, we analyzed the factors that strongly influence foreign tourists' 'satisfaction', 'revisit intention', and 'recommendation' variables, and examined the practical influence of those variables. As a procedure, we first integrated the foreign tourist survey data conducted by the Korea Culture & Tourism Research Institute and stored in the tourist information system from 2013 to 2015, eliminated unnecessary variables inconsistent with the research purpose, and modified some variables to improve the accuracy of the analysis. We then analyzed the factors affecting the dependent variables using data mining methods: decision trees (C5.0, CART, CHAID, QUEST), artificial neural networks, and logistic regression analysis in IBM SPSS Modeler 16.0. The seven variables with the greatest effect on each dependent variable were derived. The analysis found that the seven major variables influencing 'overall satisfaction' were sightseeing spot attraction, food satisfaction, accommodation satisfaction, traffic satisfaction, guide service satisfaction, number of visited places, and country, with food satisfaction and sightseeing spot attraction having the greatest influence. The seven variables with the greatest influence on 'revisit intention' were country, travel motivation, activity, food satisfaction, best activity, guide service satisfaction and sightseeing spot attraction, with food satisfaction and Korean Wave-related travel motivation being the most influential. Lastly, the seven variables with the greatest influence on 'recommendation intention' were country, sightseeing spot attraction, number of visited places, food satisfaction, activity, tour guide service satisfaction and cost, with country, sightseeing spot attraction, and food satisfaction being the most influential. In addition, to examine the influence of each independent variable more deeply, we used R programming to quantify their effects. Food satisfaction and sightseeing spot attraction had a greater effect on overall satisfaction than the other influential variables. For revisit intention, travel motivation related to the Korean Wave had a higher $\beta$ value than the other variables; a policy that enhances tourist attractions related to the Korean Wave will therefore be needed to lead to substantial revisits. Lastly, recommendation showed the same result as overall satisfaction: sightseeing spot attraction and food satisfaction had higher $\beta$ values than the other variables. From this analysis, we found that 'food satisfaction' and 'sightseeing spot attraction' were the common factors influencing the three dependent variables ('overall satisfaction', 'revisit intention' and 'recommendation') and significantly affected the satisfaction of travel in Korea. The purpose of this study is to examine how to revitalize foreign tourism in Korea through big data analysis, and the results are expected to serve as basic data for analyzing tourism data, establishing effective tourism policy, and developing activation plans that can contribute to tourism development in Korea in the future.
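The study itself used IBM SPSS Modeler and R, but the idea of ranking which survey variables most influence 'overall satisfaction' can be illustrated with a CART-style decision tree in scikit-learn. The data and column names below are synthetic placeholders, not the actual survey.

```python
# Illustrative sketch only: synthetic survey-like data, not the actual dataset.
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(7)
n = 1000
df = pd.DataFrame({
    "food_satisfaction": rng.integers(1, 6, n),
    "sightseeing_attraction": rng.integers(1, 6, n),
    "accommodation_satisfaction": rng.integers(1, 6, n),
    "traffic_satisfaction": rng.integers(1, 6, n),
    "guide_service_satisfaction": rng.integers(1, 6, n),
    "num_visited_places": rng.integers(1, 10, n),
})
# Synthetic binary target loosely driven by food and sightseeing scores.
target = ((df.food_satisfaction + df.sightseeing_attraction) >= 7).astype(int)

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(df, target)
importance = pd.Series(tree.feature_importances_, index=df.columns)
print(importance.sort_values(ascending=False))  # top entries ~ most influential variables
```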

The Future of Radio and its Role in the Era of Smart Media (스마트미디어 시대 속 라디오의 미래와 역할 고찰)

  • KWON, Youngsung;SONG, Haeryong
    • Trans-, v.1, pp.117-139, 2016
  • Radio, the first broadcasting medium in history, is also the first mobile medium, and it fits well with today's mobile ecosystem built on mobile communication networks. As a result, it is easily accessible to consumers, can readily engage individual consumers, and its program content has strong appeal to individual listeners, allowing it to form intimacy with audiences at the closest distance. However, radio listening ratings have fallen sharply as the medium has undergone changes driven by competing media such as TV and the internet and has been affected by the relative constancy hypothesis; radio now faces even stiffer competition with the emergence of the smartphone. In this environment, radio made moves to evolve into digital radio offering improved sound, stronger reception, and more channels, but it shifted abruptly to DMB, and portable multimedia DMB is having serious marketability problems because of the smartphone. Yet the listening rating of analogue radio broadcasting, which has remained in place, was 13.99% in 2014, an increase of 47% from 2011, and the share of listeners under the age of 18 increased 2.4 times from 2011 to 2014, a unique and interesting phenomenon. Accordingly, this paper compared the characteristics of the internet and of radio, which share the traits of everyday use, information, individuality, participation, adventurousness, alternative media, expertise, and sound media. The paper then examined how radio is listened to, finding that direct terrestrial antenna reception through an in-vehicle device is the most common form while using transportation. Finally, it sought to investigate the future of radio based on an understanding of the rise in radio listening ratings, especially by comparing it with the characteristics of the smart generation centered on the smartphone and the internet. The results showed that entertainment and amusement, which the smart generation used to obtain selectively from fragmentary information, are attempting to change, and that radio is expected to become an influential medium in the future through its advantages of 'selected information' and reliability. Considering these possibilities, however, radio needs to build up the expertise and reliability of its broadcast content alongside digitalization, and it will be able to secure its own competitiveness by focusing on diverse experiences and cultural exposure.

A User Profile-based Filtering Method for Information Search in Smart TV Environment (스마트 TV 환경에서 정보 검색을 위한 사용자 프로파일 기반 필터링 방법)

  • Sean, Visal;Oh, Kyeong-Jin;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems, v.18 no.3, pp.97-117, 2012
  • Nowadays, Internet users tend to do a variety of things at the same time, such as web browsing, social networking and multimedia consumption. While watching a video, once a user becomes interested in a product, the user has to search for information to learn more about it. With the conventional approach, the user has to search for it separately with search engines like Bing or Google, which can be inconvenient and time-consuming. For this reason, a video annotation platform has been developed to provide users with more convenient and more interactive ways of engaging with video content. In a future smart TV environment, users can follow annotated information, for example a link to a vendor selling the product of interest. It is even better to let users search for information by discussing it directly with friends: users can effectively get useful and relevant information about a product from friends who share common interests or may have experienced it before, which is more reliable than results from search engines. Social networking services provide an appropriate environment for people to share products, show new things to their friends, and share their personal experiences of a specific product, while also absorbing the most relevant information about products they are interested in through comments or discussion among friends. However, within a very large graph of friends, determining the most appropriate people to ask about a specific product is still a limitation of the existing conventional approach. When users want to share or discuss a product, they simply share it to all friends as a new feed, which means a newly posted article is blindly spread to all friends regardless of their background interests or knowledge. The number of responses is then huge, and users cannot easily absorb the relevant and useful ones, since friends come from various fields of interest and knowledge. To overcome this limitation, we propose a method that filters a user's friends for information search by leveraging semantic video annotation and social networking services. Our method identifies who can give the user useful information about a specific product. By examining existing Facebook information about users and their social graph, we construct a user profile of product interest. With the user's permission and authentication, the user's activities are enriched with domain-specific ontologies such as GoodRelations and the BestBuy data source. We also assume that the object in the video has already been annotated using Linked Data, so the detailed information about the product the user wants to ask about is retrieved via its product URI. Our system calculates the similarities among these profiles in order to identify the most suitable friends for seeking information about the product, and filters the user's friends according to a score that ranks who is most likely to give the user useful information about the specific product of interest. We conducted an experiment with a group of respondents to verify and evaluate our system. First, a user profile accuracy evaluation demonstrates how correctly the constructed user profile of product interest represents the user's interests. Then, the filtering method is evaluated by inspecting the ranked results with human judgment. The results show that our method filters effectively and efficiently. Our system fulfills user needs by supporting the user in selecting appropriate friends for seeking useful information about a specific product the user is curious about, and as a result it helps influence and support users in their purchase decisions.
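The friend-scoring step can be illustrated with a minimal cosine-similarity ranking over hypothetical interest vectors; the feature space, names, and values below are invented for illustration and are not the system's actual profile representation.

```python
# Minimal sketch (hypothetical data) of ranking friends by similarity to a product.
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical feature space, e.g. weights over product categories derived from
# an ontology-enriched user profile of product interest.
product = np.array([0.9, 0.1, 0.0, 0.4])          # the annotated product in the video
friends = {
    "alice": np.array([0.8, 0.2, 0.1, 0.5]),
    "bob":   np.array([0.1, 0.9, 0.7, 0.0]),
    "carol": np.array([0.6, 0.0, 0.2, 0.9]),
}

ranking = sorted(((cosine(v, product), name) for name, v in friends.items()),
                 reverse=True)
for score, name in ranking:
    print(f"{name}: {score:.3f}")   # highest-scoring friends are suggested first
```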

Selective Word Embedding for Sentence Classification by Considering Information Gain and Word Similarity (문장 분류를 위한 정보 이득 및 유사도에 따른 단어 제거와 선택적 단어 임베딩 방안)

  • Lee, Min Seok;Yang, Seok Woo;Lee, Hong Joo
    • Journal of Intelligence and Information Systems, v.25 no.4, pp.105-122, 2019
  • Dimensionality reduction is one of the methods for handling big data in text mining. For dimensionality reduction, we should consider the density of the data, which has a significant influence on the performance of sentence classification. Higher-dimensional data require a great deal of computation and can eventually cause high computational cost and overfitting, so a dimension reduction step is necessary to improve model performance. Diverse methods have been proposed, ranging from simply reducing noise in the data, such as misspellings or informal text, to incorporating semantic and syntactic information. In addition, the representation and selection of text features affect the performance of classifiers for sentence classification, one of the fields of Natural Language Processing. The common goal of dimension reduction is to find a latent space that is representative of the raw data in the observation space. Existing methods use various algorithms for dimensionality reduction, such as feature extraction and feature selection. Beyond these algorithms, word embeddings, which learn low-dimensional vector space representations of words that capture semantic and syntactic information, are also used. To improve performance, recent studies have suggested methods in which the word dictionary is modified according to the positive and negative scores of pre-defined words. The basic idea of this study is that similar words have similar vector representations: once a feature selection algorithm identifies unimportant words, we assume that words similar to them also have little impact on sentence classification. This study proposes two ways to achieve more accurate classification: selective word elimination under specific rules, and the construction of word embeddings based on Word2Vec. To select low-importance words from the text, we use information gain to measure importance and cosine similarity to search for similar words. First, we eliminate words with comparatively low information gain values from the raw text and build word embeddings. Second, we additionally select words that are similar to the low-information-gain words and build word embeddings. Finally, the filtered text and word embeddings are applied to deep learning models: a Convolutional Neural Network and an Attention-Based Bidirectional LSTM. This study uses customer reviews of Kindle products on Amazon.com, IMDB, and Yelp as datasets and classifies each with the deep learning models. Reviews that received more than five helpful votes and whose ratio of helpful votes exceeded 70% were classified as helpful reviews. Because Yelp only shows the number of helpful votes, we extracted 100,000 reviews with more than five helpful votes by random sampling from 750,000 reviews. Minimal preprocessing, such as removing numbers and special characters, was applied to each dataset. To evaluate the proposed methods, we compared them with Word2Vec and GloVe embeddings that used all the words, and showed that one of the proposed methods outperforms the embeddings built on all the words: removing unimportant words yields better performance, but removing too many words lowers it. For future research, diverse preprocessing approaches and in-depth analysis of word co-occurrence should be considered when measuring similarity between words. We also applied the proposed method only with Word2Vec; other embedding methods such as GloVe, fastText, and ELMo could be combined with the proposed elimination methods, making it possible to explore the combinations between word embedding methods and elimination methods.
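A compact sketch of the proposed filtering idea, under assumptions: information gain is approximated with scikit-learn's mutual information on a bag-of-words matrix, and the removal set is expanded with Word2Vec neighbours from gensim. The toy corpus, thresholds, and parameters are illustrative only, not the authors' settings.

```python
# Illustrative sketch of selective word elimination, not the authors' exact code.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import mutual_info_classif
from gensim.models import Word2Vec

docs = ["the battery life is great", "awful battery and bad screen",
        "great value for the price", "bad value awful screen"]
labels = np.array([1, 0, 1, 0])  # toy helpful / not-helpful labels

# 1) Information gain per word, approximated with mutual information on bag-of-words.
vec = CountVectorizer()
X = vec.fit_transform(docs)
ig = mutual_info_classif(X, labels, discrete_features=True, random_state=0)
words = np.array(vec.get_feature_names_out())
low_ig = set(words[np.argsort(ig)[:3]])          # lowest-information words

# 2) Word2Vec trained on the (tiny, toy) corpus; expand the removal set with
#    words whose cosine similarity to a low-IG word exceeds a threshold.
tokens = [d.split() for d in docs]
w2v = Word2Vec(sentences=tokens, vector_size=25, window=3,
               min_count=1, workers=1, seed=0)
expanded = set(low_ig)
for w in low_ig:
    if w in w2v.wv:
        expanded |= {s for s, sim in w2v.wv.most_similar(w, topn=2) if sim > 0.0}

filtered = [" ".join(t for t in d.split() if t not in expanded) for d in docs]
print(filtered)   # these filtered texts would then feed the CNN / BiLSTM classifiers
```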

The Pattern Analysis of Financial Distress for Non-audited Firms using Data Mining (데이터마이닝 기법을 활용한 비외감기업의 부실화 유형 분석)

  • Lee, Su Hyun;Park, Jung Min;Lee, Hyoung Yong
    • Journal of Intelligence and Information Systems, v.21 no.4, pp.111-131, 2015
  • Only a handful of studies have been conducted on pattern analysis of corporate distress, compared with research on bankruptcy prediction, and the few that exist mainly focus on audited firms because financial data are easier to collect for them. In reality, however, corporate financial distress is a far more common and critical phenomenon for non-audited firms, which are mainly small and medium-sized firms. The purpose of this paper is to classify non-audited firms under distress according to their financial ratios using a data mining technique, the Self-Organizing Map (SOM). A SOM is a type of artificial neural network trained with unsupervised learning to produce a lower-dimensional discretized representation of the input space of the training samples, called a map. It differs from other artificial neural networks in that it applies competitive learning rather than error-correction learning such as backpropagation with gradient descent, and in that it uses a neighborhood function to preserve the topological properties of the input space; it is one of the popular and successful clustering algorithms. In this study we classify types of financially distressed firms, specifically non-audited firms. In the empirical test, we collected 10 financial ratios of 100 non-audited firms under distress in 2004 for the previous two years (2002 and 2003). Using these financial ratios and the SOM algorithm, five distinct patterns were distinguished. In pattern 1, financial distress was very serious in almost all financial ratios; 12% of the firms fell into this pattern. In pattern 2, financial distress was weak in almost all financial ratios; 14% of the firms fell into this pattern. In pattern 3, the growth ratio was the worst among all patterns; these firms may be under distress because of severe competition in their industries, and approximately 30% of the firms fell into this group. In pattern 4, the growth ratio was higher than in any other pattern, but the cash ratio and profitability ratio did not keep pace with it; these firms appear to have become distressed while pursuing business expansion, and about 25% of the firms were in this pattern. Lastly, pattern 5 encompassed very solvent firms, which were perhaps distressed because of a bad short-term strategic decision or problems with the firms' entrepreneurs; approximately 18% of the firms fell under this pattern. This study makes both academic and empirical contributions. Academically, it uses a data mining technique (the Self-Organizing Map) to classify non-audited companies, which tend to go bankrupt easily and have unstructured or easily manipulated financial data, rather than large audited firms with well-prepared and reliable financial data. Empirically, even though only the financial data of non-audited firms are analyzed, the approach is useful for detecting the first symptoms of financial distress, which supports bankruptcy prediction and the management of early warning and alert signals. The limitation of this research is that only 100 companies were analyzed because of the difficulty of collecting financial data for non-audited firms, which made analysis by category or size difference impossible. Also, non-financial qualitative data are crucial for the analysis of bankruptcy, so non-financial qualitative factors should be taken into account in future studies. This study sheds some light on distress prediction for non-audited small and medium-sized firms in the future.
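A minimal sketch of the SOM clustering step using synthetic financial ratios; MiniSom is an assumed implementation choice here, not necessarily what the authors used, and the 3x2 map is just an illustrative grid size.

```python
# Minimal sketch (synthetic data, MiniSom assumed as the SOM implementation).
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(3)
ratios = rng.normal(size=(100, 10))                          # 100 firms x 10 financial ratios
ratios = (ratios - ratios.mean(axis=0)) / ratios.std(axis=0) # standardize each ratio

som = MiniSom(x=3, y=2, input_len=10, sigma=0.5, learning_rate=0.5, random_seed=0)
som.random_weights_init(ratios)
som.train_random(ratios, 1000)   # unsupervised, competitive learning

# Assign each firm to its best-matching unit; the map units play the role of the
# distress "patterns" discussed above.
patterns = [som.winner(r) for r in ratios]
unique, counts = np.unique(patterns, axis=0, return_counts=True)
for node, cnt in zip(unique, counts):
    print(tuple(node), f"{cnt} firms")
```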

A Study of 'Emotion Trigger' by Text Mining Techniques (텍스트 마이닝을 이용한 감정 유발 요인 'Emotion Trigger'에 관한 연구)

  • An, Juyoung;Bae, Junghwan;Han, Namgi;Song, Min
    • Journal of Intelligence and Information Systems, v.21 no.2, pp.69-92, 2015
  • The explosion of social media data has led researchers to apply text mining techniques to analyze big social media data more rigorously. Although social media text analysis algorithms have improved, previous approaches still have limitations. In sentiment analysis of social media written in Korean, there are two typical approaches. One is the linguistic approach using machine learning, which is the most common; some studies have added grammatical factors to the feature sets used to train classification models. The other adopts semantic analysis for sentiment analysis, but this approach has mainly been applied to English texts. To overcome these limitations, this study applies the Word2Vec algorithm, an extension of neural network algorithms, to handle the richer semantic features that have been underestimated in existing sentiment analysis. The result of adopting the Word2Vec algorithm is compared with the result of co-occurrence analysis to identify the difference between the two approaches. The results show that the Word2Vec algorithm extracts about three times more related words expressing emotion about the keyword than co-occurrence analysis does. The difference between the two results comes from Word2Vec's vectorization of semantic features, so the Word2Vec algorithm is able to catch hidden related words that traditional analysis does not find. In addition, Part-Of-Speech (POS) tagging for Korean is used to detect adjectives as 'emotional words'. The emotional words extracted from the text are converted into word vectors by the Word2Vec algorithm to find related words, and among these related words, nouns are selected because each of them may have a causal relationship with the emotional word in the sentence. The process of extracting these trigger factors of emotional words is named 'Emotion Trigger' in this study. As a case study, the datasets were collected by searching with three keywords: professor, prosecutor, and doctor, since these keywords carry rich public emotion and opinion. Preliminary data collection was conducted to select secondary keywords for data gathering; the secondary keywords used to gather the data for the actual analysis were as follows: Professor (sexual assault, misappropriation of research money, recruitment irregularities, polifessor), Doctor (Shin Hae-chul Sky Hospital, drinking and plastic surgery, rebate), Prosecutor (lewd behavior, sponsor). The text data comprise about 100,000 documents (Professor: 25,720; Doctor: 35,110; Prosecutor: 43,225) gathered from news, blogs, and Twitter to reflect various levels of public emotion in the analysis. Gephi (http://gephi.github.io) was used for visualization, and all programs used in text processing and analysis were written in Java. The contributions of this study are as follows. First, different approaches to sentiment analysis are integrated to overcome the limitations of existing approaches. Second, finding Emotion Triggers can detect hidden connections to public emotion that existing methods cannot detect. Finally, the approach used in this study can be generalized regardless of the type of text data. The limitation of this study is that it is hard to claim that the words extracted by the Emotion Trigger process have a significantly causal relationship with the emotional words in a sentence. Future work will clarify the causal relationship between emotional words and the words extracted by the Emotion Trigger by comparing them with manually tagged relationships. Furthermore, the text data used for the Emotion Trigger come from Twitter, so they have a number of distinct features that we did not address in this study; these features will be considered in further work.
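The mechanics of the Emotion Trigger step can be sketched as follows. The study worked on Korean text with Java tooling; this toy Python/gensim version only mirrors the idea of taking the Word2Vec neighbourhood of an emotional word and keeping noun neighbours, with a placeholder noun list standing in for a real Korean POS tagger.

```python
# Toy sketch of the Emotion Trigger idea (English stand-in corpus, not the study's data).
from gensim.models import Word2Vec

corpus = [
    ["the", "professor", "was", "angry", "about", "the", "scandal"],
    ["students", "were", "angry", "about", "tuition"],
    ["the", "doctor", "was", "proud", "of", "the", "hospital"],
    ["people", "were", "angry", "about", "the", "prosecutor"],
]
# Placeholder POS dictionary standing in for a real Korean POS tagger.
NOUNS = {"professor", "scandal", "students", "tuition", "doctor",
         "hospital", "people", "prosecutor"}

model = Word2Vec(sentences=corpus, vector_size=25, window=3,
                 min_count=1, workers=1, seed=0)

emotional_word = "angry"
neighbours = model.wv.most_similar(emotional_word, topn=10)
triggers = [(w, round(sim, 3)) for w, sim in neighbours if w in NOUNS]
print(triggers)   # noun neighbours of the emotional word = candidate Emotion Triggers
```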