• Title/Summary/Keyword: K5 system

A Study on Garden Design Principles in "Sakuteiki(作庭記)" - Focused on the "Fungsu Theory"(風水論) - (「사쿠테이키(作庭記)」의 작정원리 연구 - 풍수론(風水論)을 중심으로 -)

  • Kim, Seung-Yoon
    • Journal of the Korean Institute of Landscape Architecture / v.41 no.6 / pp.1-19 / 2013
  • This study reviews 'Sakuteiki(作庭記)', the Book of Garden Making, compiled at the end of the 11th century during the Heian period of Japan, from an East Asian perspective. 'Sakuteiki' is a book of garden theory, the oldest in the world as well as in Asia, and it contains the traditional knowledge of ancient Japanese garden culture, which originated from the continent (Korea and China). The traditional knowledge of East Asian garden culture reviewed in this paper is the "Fungsu Theory" (風水, Asian traditional ecology: Fengshui in Chinese; Fusui in Japanese), which stems from the practice of seeking sound and blessed places to live in. Viewed from modern landscape architecture, the Fungsu Theory corresponds to ecology (science). The Fungsu Theory was established around the Han Dynasty of China together with the Yinyangwuxing(陰陽五行) Theory and was widely used for making human residences, including gardens. It was transmitted to Japan via Korea as well as through direct exchange between Japan and China. This study reinterprets the garden design principles represented in Sakuteiki, which were organized under 5 key words according to the Fungsu Theory. The 5 key words are "the place in harmony of four guardian gods(四神相應地)", "planting trees in the four cardinal directions", "flow of Chi(氣)", "curved line and asymmetry", and "mountain is the king, water is the people". The garden design principles of "the place in harmony of four guardian gods(四神相應地)" and "planting trees in the four cardinal directions" correspond to "Myeongdang-ron(明堂論, Theory of propitious site)". The place in harmony of four guardian gods mentioned in Sakuteiki is a landform surrounded by flowing water to the east, a great path to the west, a pond to the south, and a hill to the north. This theory originated from Zhaijing(宅經, Classic of Dwelling Sites) of China.
According to this principle the city was planned, and the residence of the aristocrat during the Heian period was built as a miniature model of it. At the residence, the location of the garden surrounded by the four gods (the flow of water, the great path, the pond, and the hill) is the Myeongdang(明堂, the propitious site: Mingtang in Chinese; Meido in Japanese). Sakuteiki explains how to substitute for the four gods by planting trees in the four cardinal directions when they are not provided by nature. This way of planting originated from Zhaijing(宅經) and also goes back to Qiminyaoshu(齊民要術), compiled in 6th-century China. In this way of planting, the number of trees suggested in Sakuteiki is related to Hetu(河圖) and Luoshu(洛書), which are the iconography of Yi(易), the philosophy of change, in ancient China. This way of planting corresponds to that of Yongdoseo(龍圖墅, the villa based on the principle of Hetu) presented in Sanrimgyeongje(山林經濟), a 17th-century Korean encyclopedia on agriculture and living. The garden design principles of "the flow of Chi(氣)" and "curved line and asymmetry" are connected to the "Saenggi Theory(生氣論, Theory of vitality)". Sakuteiki explains the right flow of Chi(氣) through the proper flow and the reverse flow of the garden stream, and also suggests the curved line of the garden stream, the asymmetric arrangement of bridges and stones in the garden, and the indented shape of pond edges, which are ways of accumulating Chi(氣) and therefore lead to the "Saenggi Theory" of the Fungsu Theory. The last design principle, "mountain is the king, water is the people", is related to the "Hyeongguk Theory(形局論, Theory of form)" of the Fungsu Theory. Sakuteiki explains the meaning of the garden through a metaphor that views the mountain as the king, water as the people, and stones as the king's retainers. It compares the situation in which the king governs the people with the help of his retainers to the ecological phenomenon in which the mountain (earth) controls water with the help of stones.
This principle befits "Hyeongguk Theory(形局論, Theory of form)" of the Fungsu Theory which explains landform on the analogy of social systems, people, animals and things. As above, major garden design principles represented in Sakuteiki can be interpreted in the context of the Fungsu Theory, the traditional knowledge system in East Asia. Therefore, we can find the significance of Sakuteiki in that the wisdom of ancient garden culture in East-Asia was integrated in it, although it described the knowhow of a specific garden style in a specific period of Japan.

Image Watermarking for Copyright Protection of Images on Shopping Mall (쇼핑몰 이미지 저작권보호를 위한 영상 워터마킹)

  • Bae, Kyoung-Yul
    • Journal of Intelligence and Information Systems / v.19 no.4 / pp.147-157 / 2013
  • With the advent of a digital environment that can be accessed anytime and anywhere through high-speed networks, the free distribution and use of digital content have become possible. Ironically, this environment gives rise to a variety of copyright infringements, and product images used in online shopping malls are frequently pirated. Whether shopping mall images are creative works is a controversial issue. According to a Supreme Court decision in 2001, advertising photographs of ham products merely reproduce the appearance of the objects as they are and do not constitute creative expression. The photographer's loss was nevertheless recognized, and damages were estimated at the typical cost of an advertising photo shoot. According to a Seoul District Court precedent in 2003, if the photographer's personality and creativity are present in the selection of the subject, the composition of the set, the control of the direction and amount of light, the camera angle, the shutter speed, the shutter chance, other shooting methods, and the developing and printing process, the work should be protected by copyright law. To receive copyright protection under the law, a shopping mall image must therefore do more than simply convey the state of the product; effort is required so that the photographer's personality and creativity can be recognized. Accordingly, the cost of making shopping mall images increases, and the need for copyright protection grows. Product images in online shopping malls have a very distinctive configuration, unlike general pictures such as portraits and landscape photos, and therefore general image watermarking techniques cannot satisfy their watermarking requirements.
Because the background of product images commonly used in shopping malls is white, black, or a grayscale gradient, it is difficult to use that space to embed a watermark, and the area is very sensitive to even a slight change. In this paper, the characteristics of images used in shopping malls are analyzed, and a watermarking technology suited to shopping mall images is proposed. The proposed technology divides a product image into smaller blocks, transforms the blocks by DCT (Discrete Cosine Transform), and then inserts the watermark information into the image by quantizing the DCT coefficients. Because uniform treatment of the DCT coefficients during quantization causes visible blocking artifacts, the proposed algorithm uses a weighted mask that quantizes the coefficients located at block boundaries finely and the coefficients located in the center area of the block coarsely. This mask improves the subjective visual quality as well as the objective quality of the images. In addition, to improve the security of the algorithm, the blocks in which the watermark is embedded are randomly selected, and a turbo code is used to reduce the BER when extracting the watermark. The PSNR (Peak Signal to Noise Ratio) of shopping mall images watermarked by the proposed algorithm is 40.7~48.5 dB, and the BER (Bit Error Rate) after JPEG compression with QF = 70 is 0. This means that the watermarked image is of high quality and that the algorithm is robust to the JPEG compression generally used in online shopping malls. The BER is also 0 for a 40% change in size and for 40 degrees of rotation. In general, shopping malls use compressed images with a QF higher than 90. Because a pirated image is replicated from the original image, the proposed algorithm can identify copyright infringement in most cases. As the experimental results show, the proposed algorithm is suitable for shopping mall images with simple backgrounds.
However, future studies should enhance the robustness of the proposed algorithm, because some robustness is lost after the mask process.
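
The block-DCT quantization step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 8x8 block size, the coefficient position (2, 1), the quantization step of 16, and all function names are assumptions chosen for illustration, and the weighted mask, random block selection, and turbo coding mentioned in the abstract are left out. The sketch hides one bit in one DCT coefficient by moving it to the nearest multiple of the step with matching parity, then reports BER and PSNR for a toy block.

```python
import math

N = 8  # block size; an assumption, the abstract does not state it


def dct_matrix(n=N):
    """Orthonormal DCT-II basis C, so that coeffs = C * block * C^T."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


C = dct_matrix()
CT = [list(col) for col in zip(*C)]  # transpose of C


def dct2(block):
    return matmul(matmul(C, block), CT)


def idct2(coeffs):
    return matmul(matmul(CT, coeffs), C)


def embed_bit(block, bit, pos=(2, 1), step=16.0):
    """Move one coefficient to the nearest multiple of step whose parity equals bit."""
    coeffs = dct2(block)
    u, v = pos
    q = 2 * round((coeffs[u][v] / step - bit) / 2) + bit
    coeffs[u][v] = q * step
    return idct2(coeffs)


def extract_bit(block, pos=(2, 1), step=16.0):
    """Read the bit back as the parity of the quantized coefficient."""
    u, v = pos
    return round(dct2(block)[u][v] / step) % 2


def psnr(original, marked, peak=255.0):
    mse = sum((a - b) ** 2 for ra, rb in zip(original, marked)
              for a, b in zip(ra, rb)) / (len(original) * len(original[0]))
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def bit_error_rate(sent, received):
    return sum(s != r for s, r in zip(sent, received)) / len(sent)


# Toy data: a light, nearly flat block, like the plain backgrounds described above.
block = [[200 + r + c for c in range(N)] for r in range(N)]
bits = [0, 1]
marked = [embed_bit(block, b) for b in bits]
# Round to integer pixel values before extraction, as a stored image would be.
recovered = [extract_bit([[round(p) for p in row] for row in m]) for m in marked]
print(bit_error_rate(bits, recovered), [psnr(block, m) for m in marked])
```

A larger step makes the hidden bit survive stronger distortion but lowers the PSNR, which is the trade-off between robustness and visual quality that the abstract discusses.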

The Present Status and a Proposal of the Prospective Measures for Parasitic Diseases Control in Korea (우리나라 기생충병관리의 현황(現況)과 효율적방안에 관(關)한 연구(硏究))

  • Loh, In-Kyu
    • Journal of Preventive Medicine and Public Health / v.3 no.1 / pp.1-16 / 1970
  • The present status of control measures for helminthic infections of public health importance in Korea was surveyed in 1969, and the following results were obtained. The parasitic examinations and the Ascaris treatment of positive cases carried out from 1966 to 1969 produced poor results and could not decrease the infection rate. These activities need to be improved or strengthened. The mass treatment activities for paragonimiasis and clonorchiasis in the areas designated by the Ministry of Health were carried out from 1965 to 1968, with no good results in decreasing the estimated number of patients. There were too many pharmaceutical companies producing many kinds of anthelmintics. It may be better to reduce the number of anthelmintics produced and to control their quality. Human feces, the most important source of helminthic infections, was generally not treated in sanitary ways because of the poor sewerage system and lack of sewage treatment plants in urban areas and the insanitary latrines in rural areas. A total of 170 field soil specimens were collected from 34 of the 55 urban and tourist areas where the use of night soil as a fertilizer has been prohibited by regulation, and examined for parasite contamination; Ascaris eggs were detected in 44%. Vegetable specimens, 64 each from the supply agents of parasite-free vegetables and from general markets, were collected and examined for parasite contamination; Ascaris eggs were detected in 25% and 36%, respectively. The parasite control activities and the parasitological examination techniques of the health centers of the country were not satisfactory. The budget of the Ministry of Health for parasite control was very poor. The actual expenditure needed for the cellophane thick smear technique was 8 Won per specimen.
In principle, the control of helminthic infections should be directed toward breaking the chain of events in the life cycle of the parasites and eliminating the environmental and host factors concerned with the infections, and the following methods may be pointed out. 1) Mass treatment might be done to eliminate human reservoirs of an infection. 2) Animal reservoirs that are related to human infections might be eliminated. 3) The excreta of reservoirs, particularly human feces, should be treated in sanitary ways by means of sanitary sewerage systems and sewage treatment plants in urban areas and sanitary latrines such as waterborne latrines, aqua privies, and pit latrines in rural areas. National economic development and prohibition of the habit of using night soil as a fertilizer might be very important factors in achieving this purpose. 4) The control of vehicles and intermediate hosts might be done by means of preventing soil contamination with parasites, food sanitation, insect control, and snail control. 5) The improvement of insanitary attitudes and bad habits related to parasitic infections might be achieved by prohibiting the habit of using night soil as a fertilizer and by improving eating habits and personal hygiene. 6) Chemoprophylactic measures and vaccination may be effective in preventing infection or the development of a parasite into an adult once the body has been invaded by parasites. Further studies and development of these kinds of measures are needed.

Attitude Confidence and User Resistance for Purchasing Wearable Devices on Virtual Reality: Based on Virtual Reality Headgears (가상현실 웨어러블 기기의 구매 촉진을 위한 태도 자신감과 사용자 저항 태도: 가상현실 헤드기어를 중심으로)

  • Sohn, Bong-Jin;Park, Da-Sul;Choi, Jaewon
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.165-183 / 2016
  • Over the past decade, technological devices have diffused rapidly and the number of devices has risen, resulting in the growth of virtual reality technology. The technology market has rapidly shifted from smartphones to wearable devices based on virtual reality. Virtual reality can make users feel as if they were in a real situation through sensing interaction, voice, motion capture, and so on. Facebook.com, Google, Samsung, LG, Sony, and others have been investigating the development of virtual reality platforms. The prices of virtual reality devices have also decreased to 30% of those at launch. Thus, the market infrastructure for virtual reality has developed rapidly to create a marketplace. However, most consumers perceive virtual reality as not easy to purchase or use. As a result, consumers in the early market have not formed a positive attitude toward the devices or purchased them. Among previous studies related to virtual reality, few focus on why virtual reality devices have remained in the early stage of adoption and diffusion in the market. Most previous studies considered the reasons for the difficult adoption of innovative products from the viewpoints of the Typology of Innovation Resistance, MIR(Management of Innovation Resistant), and UTAUT & UTAUT2. However, product-based antecedents are also important for increasing user intention to purchase and use products in the technology market. In this study, we focus on user acceptance and resistance in order to promote the purchase and usage of wearable virtual reality devices, based on headgear products like Galaxy Gear. In particular, we added variables such as attitude confidence as a dimension of user resistance. The research questions of this study are as follows. First, how do attitude confidence and innovativeness resistance affect user intention to use? Second, what factors related to content and brand contexts can affect user intention to use?
This research collected data from participants in their 20s to 50s located in South Korea who have experience using virtual reality headgear. To collect the data, this study ran a pilot test, and the face validity and content validity of the questionnaire were evaluated through face-to-face interviews with three specialists. In cleansing the data, we dropped some outliers and irrelevant responses. In total, 156 responses were used for testing the suggested hypotheses. With the collected data, the demographics and the relationships among variables were analyzed using structural equation modeling with PLS. The respondents, who have experience using social commerce sites, comprised 86 males (55.1%) and 70 females (44.9%). The respondents are mostly in their 20s (74.4%) and 30s (16.7%). 126 respondents (80.8%) have used virtual reality devices. The results of our model estimation are as follows. With the exception of Hypotheses 1 and 7, which deal with the relationships between brand awareness and attitude confidence and between quality of content and perceived enjoyment, all of our hypotheses were supported. In line with our hypotheses, perceived ease of use (H2) and user innovativeness (H3) were supported as positive influences on attitude confidence. This finding indicates that the more the ease of use and innovativeness of the devices increase, the more users' attitude confidence increases. Perceived price (H4), enjoyment (H5), and quantity of contents (H6) significantly affect user resistance. Specifically, perceived price positively affects user innovativeness resistance, whereas perceived enjoyment and quantity of contents negatively affect it. In addition, aesthetic exterior (H6) was positively associated with perceived price (p<0.01). Projection quality (H8) can also increase perceived enjoyment (p<0.05).
Finally, attitude confidence (H10) increased user intention to use virtual reality devices. However, user resistance (H11) negatively affects user intention to use virtual reality devices. The findings of this study show that attitude confidence and user innovativeness resistance influence customer intention to use virtual reality devices differently. There are two distinct antecedents of attitude confidence: perceived ease of use and user innovativeness. This study identified the antecedents of the different roles of perceived price (aesthetic exterior) and perceived enjoyment (quality of contents and projection quality). The findings indicate that brand awareness and quality of contents for virtual reality have not yet formed within the virtual reality market. Therefore, firms should develop brand awareness for their products in the virtual reality market to increase market share.

Feasibility of Deep Learning Algorithms for Binary Classification Problems (이진 분류문제에서의 딥러닝 알고리즘의 활용 가능성 평가)

  • Kim, Kitae;Lee, Bomi;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.23 no.1 / pp.95-108 / 2017
  • Recently, AlphaGo, the Baduk (Go) artificial intelligence program by Google DeepMind, won a decisive victory against Lee Sedol. Many people thought that machines would not be able to beat a human at Go because, unlike in chess, the number of possible paths for making a move exceeds the number of atoms in the universe, but the result was the opposite of what people predicted. After the match, artificial intelligence came into focus as a core technology of the fourth industrial revolution and attracted attention from various application domains. In particular, deep learning has attracted attention as the core artificial intelligence technique used in the AlphaGo algorithm. Deep learning is already being applied to many problems. It performs especially well in image recognition. It also performs well on high-dimensional data such as voice, image, and natural language, for which it was difficult to obtain good performance using existing machine learning techniques. In contrast, however, it is difficult to find deep learning research on traditional business data and structured data analysis. In this study, we tried to find out whether the deep learning techniques studied so far can be used not only for the recognition of high-dimensional data but also for binary classification problems in traditional business data analysis, such as customer churn analysis, marketing response prediction, and default prediction. We also compare the performance of the deep learning techniques with that of traditional artificial neural network models. The experimental data in the paper is the telemarketing response data of a bank in Portugal. It has input variables such as age, occupation, loan status, and the number of previous telemarketing contacts, and a binary target variable that records whether the customer intends to open an account or not.
In this study, to evaluate the applicability of deep learning algorithms and techniques to binary classification problems, we compared the performance of various models using the CNN and LSTM algorithms and dropout, which are widely used in deep learning, with that of MLP models, which are traditional artificial neural network models. However, since not all network design alternatives can be tested due to the nature of artificial neural networks, the experiment was conducted with restricted settings for the number of hidden layers, the number of neurons in the hidden layer, the number of output data (filters), and the conditions for applying the dropout technique. The F1 score, rather than the overall accuracy, was used to evaluate the models in order to show how well they classify the class of interest. The detailed methods for applying each deep learning technique in the experiment are as follows. The CNN algorithm reads values adjacent to a specific value and recognizes features, but in business data the distance between fields does not matter because each field is usually independent. In this experiment, we set the filter size of the CNN algorithm to the number of fields so that the characteristics of the whole record are learned at once, and added a hidden layer to make decisions based on the additional features. For the model with two LSTM layers, the input direction of the second layer is reversed relative to the first layer in order to reduce the influence of the position of each field. For the dropout technique, we set the neurons of each hidden layer to be dropped with a probability of 0.5. The experimental results show that the prediction model with the highest F1 score was the CNN model using the dropout technique, and the next best model was the MLP model with two hidden layers using the dropout technique.
In this study, we obtained several findings from the experiment. First, models using the dropout technique make slightly more conservative predictions than those without it and generally show better classification performance. Second, CNN models show better classification performance than MLP models. This is interesting because CNN has shown good performance in binary classification problems, to which it has rarely been applied, as well as in the fields where its effectiveness has been proven. Third, the LSTM algorithm seems to be unsuitable for binary classification problems because the training time is too long compared to the performance improvement. From these results, we can confirm that some deep learning algorithms can be applied to solve business binary classification problems.
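
The choice of the F1 score over overall accuracy in this abstract can be made concrete with a small sketch. The labels below are invented toy data, not the Portuguese bank data set, and the function names are hypothetical. With 10 positive cases out of 100, a model that always predicts "no" reaches high accuracy while finding none of the customers of interest, which the F1 score exposes.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy, imbalanced labels (assumed for illustration): 10 responders, 90 non-responders.
y_true = [1] * 10 + [0] * 90
always_no = [0] * 100                              # never predicts a responder
some_model = [1] * 6 + [0] * 4 + [1] * 4 + [0] * 86  # 6 hits, 4 misses, 4 false alarms

for name, pred in [("always_no", always_no), ("some_model", some_model)]:
    print(name, accuracy(y_true, pred), f1_score(y_true, pred))
```

The two accuracies are close, while the F1 scores differ sharply, which is why a metric focused on the class of interest suits response prediction.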

Visualizing the Results of Opinion Mining from Social Media Contents: Case Study of a Noodle Company (소셜미디어 콘텐츠의 오피니언 마이닝결과 시각화: N라면 사례 분석 연구)

  • Kim, Yoosin;Kwon, Do Young;Jeong, Seung Ryul
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.89-105 / 2014
  • Since the emergence of the Internet, social media with highly interactive Web 2.0 applications has provided a very user-friendly means for consumers and companies to communicate with each other. Users routinely publish content involving their opinions and interests in social media such as blogs, forums, chat rooms, and discussion boards, and the content is released in real time on the Internet. For that reason, many researchers and marketers regard social media content as a source of information for business analytics to develop business insights, and many studies have reported results on mining business intelligence from social media content. In particular, opinion mining and sentiment analysis, as techniques to extract, classify, understand, and assess the opinions implicit in text, are frequently applied to social media content analysis because they emphasize determining sentiment polarity and extracting authors' opinions. A number of frameworks, methods, techniques, and tools have been presented by these researchers. However, we have found some weaknesses in their methods, which are often technically complicated and not sufficiently user-friendly for supporting business decisions and planning. In this study, we attempted to formulate a more comprehensive and practical approach to conducting opinion mining with visual deliverables. First, we described the entire cycle of practical opinion mining using social media content, from the initial data gathering stage to the final presentation session. Our proposed approach to opinion mining consists of four phases: collecting, qualifying, analyzing, and visualizing. In the first phase, analysts have to choose the target social media. Each target medium requires a different way for analysts to gain access, such as open APIs, search tools, DB2DB interfaces, purchasing content, and so on. The second phase is pre-processing to generate useful materials for meaningful analysis.
If we do not remove garbage data, the results of social media analysis will not provide meaningful and useful business insights. To clean social media data, natural language processing techniques should be applied. The next step is the opinion mining phase, where the cleansed social media content set is analyzed. The qualified data set includes not only user-generated content but also content identification information such as creation date, author name, user id, content id, hit counts, review or reply, favorite, etc. Depending on the purpose of the analysis, researchers or data analysts can select a suitable mining tool. Topic extraction and buzz analysis are usually related to market trend analysis, while sentiment analysis is used to conduct reputation analysis. There are also various applications, such as stock prediction, product recommendation, sales forecasting, and so on. The last phase is visualization and presentation of the analysis results. The major focus and purpose of this phase are to explain the results of the analysis and help users comprehend their meaning. Therefore, to the extent possible, deliverables from this phase should be made simple, clear, and easy to understand, rather than complex and flashy. To illustrate our approach, we conducted a case study on a leading Korean instant noodle company. We targeted the leading company, NS Food, with 66.5% of market share; the firm has kept the No. 1 position in the Korean "Ramen" business for several decades. We collected a total of 11,869 pieces of content including blogs, forum posts, and news articles. After collecting the social media content data, we generated language resources specific to the instant noodle business for data manipulation and analysis using natural language processing. In addition, we tried to classify the content into more detailed categories such as marketing features, environment, reputation, etc.
In these phases, we used freeware packages of the R project such as TM, KoNLP, ggplot2, and plyr. As a result, we presented several useful visualization outputs such as domain-specific lexicons, volume and sentiment graphs, topic word clouds, heat maps, valence tree maps, and other visualized images to provide vivid, full-colored examples using open library software packages of the R project. With a swift glance, business actors can quickly detect areas that are weak, strong, positive, negative, quiet, or loud. A heat map can explain the movement of sentiment or volume in a matrix of categories and time, showing the density of color over time periods. The valence tree map, one of the most comprehensive and holistic visualization models, should be very helpful for analysts and decision makers in quickly understanding the "big picture" business situation with a hierarchical structure, since a tree map can present buzz volume and sentiment as a visualized result for a certain period. This case study offers real-world business insights from market sensing, which demonstrate to practical-minded business users how they can use these types of results for timely decision making in response to ongoing changes in the market. We believe our approach can provide a practical and reliable guide to opinion mining with visualized results that are immediately useful, not just in the food industry but in other industries as well.

A Study of Factors Associated with Software Developers Job Turnover (데이터마이닝을 활용한 소프트웨어 개발인력의 업무 지속수행의도 결정요인 분석)

  • Jeon, In-Ho;Park, Sun W.;Park, Yoon-Joo
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.191-204 / 2015
  • According to the '2013 Performance Assessment Report on the Financial Program' from the National Assembly Budget Office, the unfilled recruitment ratio of software (SW) developers in South Korea was 25% in the 2012 fiscal year. Moreover, the unfilled recruitment ratio of highly qualified SW developers reaches almost 80%. This phenomenon is more severe in small and medium enterprises with fewer than 300 employees. Young job-seekers in South Korea increasingly avoid becoming SW developers, and even current SW developers want to change careers, which hinders the national development of IT industries. The Korean government has recently recognized the problem and implemented policies to foster young SW developers. Due to this effort, it has become easier to find young, entry-level SW developers. However, it is still hard for many IT companies to recruit highly qualified SW developers, because long-term experience is important for becoming a SW development expert. Thus, improving the job continuity intentions of current SW developers is more important than fostering new SW developers. Therefore, this study surveyed the job continuity intentions of SW developers and analyzed the factors associated with them. As a method, we carried out a survey from September 2014 to October 2014, targeting 130 SW developers who were working in IT industries in South Korea. We gathered the demographic information and characteristics of the respondents, the work environments of the SW industry, and the social positions of SW developers. Afterward, a regression analysis and a decision tree method were performed to analyze the data. These two methods are widely used data mining techniques that have explanatory ability and are mutually complementary. We first performed a linear regression to find the important factors associated with the job continuity intention of SW developers.
The result showed that the 'expected age' up to which one can work as a SW developer was the most significant factor associated with the job continuity intention. We supposed that the major cause of this phenomenon is the structural problem of IT industries in South Korea, which requires SW developers to move from development to management as they are promoted. Also, the 'motivation' to become a SW developer and the 'personality (introverted tendency)' of a SW developer are highly important factors associated with the job continuity intention. Next, the decision tree method was performed to extract the characteristics of highly motivated developers and less motivated ones. We used the well-known C4.5 algorithm for the decision tree analysis. The results showed that 'motivation', 'personality', and 'expected age' were also important factors influencing the job continuity intention, which was similar to the results of the regression analysis. In addition, the 'ability to learn' new technology was a crucial factor in the decision rules for job continuity. In other words, a person with a high ability to learn new technology tends to work as a SW developer for a longer period of time. The decision rules also showed that the 'social position' of SW developers and the 'prospect' of the SW industry were minor factors influencing job continuity intentions. On the other hand, 'type of employment (regular position/non-regular position)' and 'type of company (ordering company/service providing company)' did not affect the job continuity intention in either method. In this research, we examined the job continuity intentions of SW developers who were actually working at IT companies in South Korea, and we analyzed the factors associated with them. These results can be used for human resource management in many IT companies when recruiting or fostering highly qualified SW experts.
It can also help to build SW developer fostering policy and to solve the problem of unfilled recruitment of SW Developers in South Korea.
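
As a rough illustration of the decision tree step, C4.5 ranks candidate split attributes by gain ratio (information gain divided by split information). The sketch below computes that criterion on four made-up respondents; the rows, attribute names, and values are hypothetical and are not the survey data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    """Gain ratio of splitting `rows` on `attribute` with respect to `target`."""
    total = len(rows)
    base = entropy([r[target] for r in rows])
    # Group the target labels by each value of the candidate attribute.
    partitions = {}
    for r in rows:
        partitions.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    split_info = -sum(
        len(p) / total * math.log2(len(p) / total) for p in partitions.values()
    )
    if split_info == 0.0:  # attribute has a single value, so it cannot split
        return 0.0
    return (base - remainder) / split_info


# Hypothetical toy respondents (assumption, for illustration only).
rows = [
    {"motivation": "high", "employment": "regular", "stay": "yes"},
    {"motivation": "high", "employment": "non-regular", "stay": "yes"},
    {"motivation": "low", "employment": "regular", "stay": "no"},
    {"motivation": "low", "employment": "non-regular", "stay": "no"},
]

ratio_motivation = gain_ratio(rows, "motivation", "stay")
ratio_employment = gain_ratio(rows, "employment", "stay")
```

In this toy set, 'motivation' separates the two classes perfectly while 'employment' carries no information, so a C4.5-style learner would split on 'motivation' first.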

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.1
    • /
    • pp.1-21
    • /
    • 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data, which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized in various industries for practical applications. In the field of business intelligence, they have been employed to discover new market and/or technology opportunities and to support rational decision making by business participants. Market information such as market size, market growth rate, and market share is essential for setting companies' business strategies. There has been continuous demand in various fields for product-level market information. However, the information has generally been provided at the industry level or in broad categories based on classification standards, making it difficult to obtain specific and appropriate information. In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than those previously offered. We applied the Word2Vec algorithm, a neural network based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows: First, the data related to product information is collected, refined, and restructured into a form suitable for applying the Word2Vec model. Next, the preprocessed data is embedded into a vector space by Word2Vec, and then the product groups are derived by extracting similar product names based on cosine similarity calculation. Finally, the sales data on the extracted products is summed to estimate the market size of the product groups. As experimental data, text data of product names from Statistics Korea's microdata (345,103 cases) were mapped into a multidimensional vector space by Word2Vec training. 
We performed parameter optimization for training and then applied a vector dimension of 300 and a window size of 15 as the optimized parameters for further experiments. We employed index words of the Korean Standard Industry Classification (KSIC) as a product name dataset to cluster product groups more efficiently. The product names similar to KSIC indexes were extracted based on cosine similarity. The market size of the extracted products as one product category was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For performance verification, the results were compared with the actual market sizes of some items. The Pearson's correlation coefficient was 0.513. Our approach has several advantages over previous studies. First, text mining and machine learning techniques were applied for the first time to market size estimation, overcoming the limitations of traditional methods that rely on sampling or require multiple assumptions. In addition, the level of market category can be easily and efficiently adjusted according to the purpose of information use by changing the cosine similarity threshold. Furthermore, it has high potential for practical applications since it can resolve unmet needs for detailed market size information in the public and private sectors. Specifically, it can be utilized in technology evaluation and technology commercialization support programs conducted by governmental institutions, as well as in business strategy consulting and market analysis report publishing by private firms. The limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic-based word embedding module can be advanced by giving a proper order to the preprocessed dataset or by combining another algorithm such as Jaccard similarity with Word2Vec. 
Also, the method of product group clustering can be changed to other types of unsupervised machine learning algorithms. Our group is currently working on subsequent studies, and we expect that they can further improve the performance of the basic model conceptually proposed in this study.
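
The bottom-up estimation described above (select product names whose vectors are close to an index word, then sum their sales) can be sketched as follows. This is a minimal sketch, not the authors' implementation: the three-dimensional vectors, product names, sales figures, and the 0.8 threshold are all hypothetical stand-ins for trained 300-dimensional Word2Vec embeddings and real microdata.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def estimate_market_size(index_vector, product_vectors, sales, threshold):
    """Group products similar to the index word and sum their sales."""
    group = [
        name
        for name, vec in product_vectors.items()
        if cosine_similarity(index_vector, vec) >= threshold
    ]
    return group, sum(sales[name] for name in group)


# Hypothetical toy embeddings and sales (assumptions, for illustration only).
product_vectors = {
    "laptop computer": [0.9, 0.1, 0.0],
    "notebook pc": [0.8, 0.2, 0.1],
    "rice cooker": [0.0, 0.1, 0.9],
}
sales = {"laptop computer": 120.0, "notebook pc": 80.0, "rice cooker": 50.0}
index_vector = [1.0, 0.0, 0.0]  # stands in for the vector of a KSIC index word

group, market_size = estimate_market_size(
    index_vector, product_vectors, sales, threshold=0.8
)
```

Raising or lowering `threshold` narrows or widens the product group, which corresponds to the adjustable market category level mentioned in the abstract.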

Sentiment Analysis of Korean Reviews Using CNN: Focusing on Morpheme Embedding (CNN을 적용한 한국어 상품평 감성분석: 형태소 임베딩을 중심으로)

  • Park, Hyun-jung;Song, Min-chae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.59-83
    • /
    • 2018
  • With the increasing importance of sentiment analysis for grasping the needs of customers and the public, various types of deep learning models have been actively applied to English texts. In sentiment analysis of English texts by deep learning, natural language sentences included in training and test datasets are usually converted into sequences of word vectors before being entered into the deep learning models. In this case, word vectors generally refer to vector representations of words obtained by splitting a sentence at space characters. There are several ways to derive word vectors, one of which is Word2Vec, used for producing the 300-dimensional Google word vectors from about 100 billion words of Google News data. They have been widely used in studies of sentiment analysis of reviews from various fields such as restaurants, movies, laptops, and cameras. Unlike in English, morphemes play an essential role in sentiment analysis and sentence structure analysis in Korean, which is a typical agglutinative language with developed postpositions and endings. A morpheme can be defined as the smallest meaningful unit of a language, and a word consists of one or more morphemes. For example, for the word '예쁘고', the morphemes are '예쁘(= adjective)' and '고(= connective ending)'. Reflecting the significance of Korean morphemes, it seems reasonable to adopt morphemes as the basic unit in Korean sentiment analysis. Therefore, in this study, we use 'morpheme vectors' as an input to a deep learning model rather than the 'word vectors' mainly used for English text. A morpheme vector is a vector representation of a morpheme and can be derived by applying an existing word vector derivation mechanism to sentences divided into their constituent morphemes. This raises several questions. What is the desirable range of POS (Part-Of-Speech) tags when deriving morpheme vectors for improving the classification accuracy of a deep learning model? 
Is it proper to apply a typical word vector model, which primarily relies on the form of words, to Korean, which has a high homonym ratio? Will text preprocessing such as correcting spelling or spacing errors affect the classification accuracy, especially when deriving morpheme vectors from Korean product reviews with many grammatical mistakes and variations? We seek empirical answers to these fundamental issues, which may be encountered first when applying various deep learning models to Korean texts. As a starting point, we summarized these issues as three central research questions. First, which is more effective as the initial input of a deep learning model: morpheme vectors from grammatically correct texts of a domain other than the analysis target, or morpheme vectors from considerably ungrammatical texts of the same domain? Second, what is an appropriate morpheme vector derivation method for Korean regarding the range of POS tags, homonyms, text preprocessing, and minimum frequency? Third, can we achieve a satisfactory level of classification accuracy when applying deep learning to Korean sentiment analysis? As an approach to these research questions, we generate various types of morpheme vectors reflecting the research questions and then compare the classification accuracy through a non-static CNN (Convolutional Neural Network) model that takes in the morpheme vectors. As training and test datasets, 17,260 cosmetics product reviews from Naver Shopping are used. To derive morpheme vectors, we use data from the same domain as the target and data from another domain: about 2 million cosmetics product reviews from Naver Shopping and 520,000 items of Naver News data, arguably corresponding to Google's News data. The six primary sets of morpheme vectors constructed in this study differ in terms of the following three criteria. 
First, they come from two types of data source: Naver News, of high grammatical correctness, and Naver Shopping's cosmetics product reviews, of low grammatical correctness. Second, they are distinguished by the degree of data preprocessing, namely, only splitting sentences, or additionally applying spelling and spacing corrections after sentence separation. Third, they vary in the form of input fed into a word vector model: whether the morphemes are entered into the word vector model by themselves or with their POS tags attached. The morpheme vectors further vary depending on the range of POS tags considered, the minimum frequency of morphemes included, and the random initialization range. All morpheme vectors are derived through the CBOW (Continuous Bag-Of-Words) model with a context window of 5 and a vector dimension of 300. It seems that utilizing same-domain text even with a lower degree of grammatical correctness, performing spelling and spacing corrections as well as sentence splitting, and incorporating morphemes of any POS tags, including the incomprehensible category, lead to better classification accuracy. The POS tag attachment, which is devised for the high proportion of homonyms in Korean, and the minimum frequency standard for the morpheme to be included seem not to have any definite influence on the classification accuracy.
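
To make the input variants concrete, the sketch below shows how morphemes might be turned into tokens with or without POS tags attached, and how CBOW training pairs (context morphemes, target morpheme) are formed for a given context window. It is only an illustration under assumptions: the tag labels `ADJ` and `END` and the `morpheme/TAG` token format are hypothetical, and no actual morphological analyzer or Word2Vec training is involved.

```python
def attach_pos(morphemes, with_tag):
    """Turn (morpheme, tag) pairs into tokens, optionally as 'morpheme/TAG'."""
    return [f"{m}/{t}" if with_tag else m for m, t in morphemes]


def cbow_pairs(tokens, window):
    """Build (context, target) pairs: up to `window` tokens on each side."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        pairs.append((left + right, target))
    return pairs


# The word '예쁘고' split into its two morphemes; tag names are hypothetical.
analyzed = [("예쁘", "ADJ"), ("고", "END")]

plain_tokens = attach_pos(analyzed, with_tag=False)
tagged_tokens = attach_pos(analyzed, with_tag=True)

# Window of 5 as in the abstract; short sentences simply yield shorter contexts.
pairs = cbow_pairs(tagged_tokens, window=5)
```

Attaching the tag makes homonyms with different parts of speech distinct tokens, which is the motivation the abstract gives for the POS tag attachment variant.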

A Study of The Medical Classics in the 'Āyurveda' ('아유르베다'(Āyurveda)의 의경(醫經)에 관한 연구)

  • Kim, Ki-Wook;Park, Hyun-Kuk;Seo, Ji-Young
    • Journal of Korean Medical classics
    • /
    • v.20 no.4
    • /
    • pp.91-117
    • /
    • 2007
  • Through a simple study of the medical classics in the 'Āyurveda', we have summarized them as follows. 1) Traditional Indian medicine started in the Ganges river area at about 1500 B.C.E., and traces of medical science can be found in the "Rigveda" and "Atharvaveda". 2) The "Charaka" and "Suśhruta(妙聞集)", ancient texts from India, are not the work of one person, but the result of the work and errors of different doctors and philosophers. Due to the lack of historical records, the dates of Charaka's and Suśhruta(妙聞)'s lives are not exactly known. The completion of the "Charaka" is estimated at the 1st to 2nd century C.E. in northwestern India, and the "Suśhruta" is estimated to have been completed in the 3rd to 4th century C.E. in central India. Also, the "Charaka" contains details on internal medicine, while the "Suśhruta" contains more details on surgery by comparison. 3) 'Vāgbhata', one of the revered Vriddha Trayi(triad of the ancients, 三醫聖) of the 'Āyurveda', lived and worked in about the 7th century and wrote the "Aṣṭānga hṛdaya saṃhitā(八支集)" and "Aṣṭānga Sangraha samhitā(八心集)", where he tried to reconcile and unify the "Charaka" and "Suśhruta". The "Aṣṭānga Sangraha samhitā" was translated into Tibetan and Arabic at about the 8th to 9th century, and if we summarize the medicinal plants recorded in the "Charaka", "Suśhruta" and "Aṣṭānga Sangraha samhitā", there are 240, 370, and 240 types, respectively. 4) The 'Madhava' focused on one of the subjects of Indian medicine, 'Nidāna', i.e., "the cause of diseases(病因論)", and in one of the copies found by Bower, dated to the 4th century C.E., we can see that it uses prescriptions from the "BuHaLaJi(布哈拉集)", "Charaka", and "Suśhruta". 
5) According to the "Charaka", there were 8 branches of ancient medicine in India: treatment of the body(kayacikitsa), special surgery(salakya), removal of alien substances(salyapahartka), treatment of poison or mis-combined medicines(visagaravairodhikaprasamana), the study of ghosts(bhutavidya), pediatrics(kaumarabhrtya), perennial youth and long life(rasayana), and the strengthening of the essence of the body(vajikarana). 6) The 'Āyurveda', which originated from ancient experience, was recorded in Sanskrit, which was a theorization of knowledge, and was also written in verse to make memorizing easy, which made medicine the exclusive possession of the Brahmin. The first annotations were 1060 for the "Charaka", 1200 for the "Suśhruta", 1150 for the "Aṣṭānga Sangraha samhitā", and 1100 for the "Nidāna". The use of various mineral medicines in the "Charaka", the use of mercury as internal medicine in the "Aṣṭānga Sangraha samhitā", and the palpation of the pulse for diagnosis in the 'Āyurveda' and 'XiZhang(西藏)' medicine are similar to TCM's pulse diagnostics. Coexistence with Arabian 'Unani' medicine, compromise with western medicine, and the reactionary trend restored the 'Āyurveda' today. 7) The "Charaka" is a book inclined to internal medicine that investigates the origin of human disease, using the dualism of the 'Samkhya', the natural philosophy of the 'Vaisesika', and the logic of the 'Nyaya' in its medical theories; its structure has 16 syllables per line and 2 lines per poem, and it is recorded in poetry and prose. Also, the "Charaka" can be summarized into the introduction, cause, judgement, body, sensory organs, treatment, pharmaceuticals, and end, and can be seen as a work that strongly reflects the moral code of Brahmin and Aryans. 
8) In extracting bloody pus, the "Charaka" introduces a 'sharp tool' bloodletting treatment, while the "Suśhruta" introduces many surgical methods such as the use of gourd dippers and horns, and sucking the blood with leeches. Also, the "Suśhruta" has 19 chapters specializing in ophthalmology, and shows 76 types of eye diseases and their treatments. 9) Since anatomy did not develop in Indian medicine, the inner structure of the human body was not well known. The only exceptions are 'GuXiangXue(骨相學)', which developed from 'Atharvaveda' times, and the "Aṣṭānga Sangraha samhitā". In the 'ShenTiLun(身體論)' of the "Aṣṭānga Sangraha samhitā" there is a thorough account of the development of a child from pregnancy to birth. The 'Āyurveda' is not just an ancient traditional medical system; it is being called alternative medicine in the west because of its ability to supplement western medicine, and as its effects are being proved scientifically it is gaining attention worldwide. We would like to say that what we have researched is just a small fragment and a limited view, and we would like to correct and supplement any insufficient parts through more research of new records.
