• Title/Summary/Keyword: Systems Performance

Effects of Auxin and Fog Treatments on the Green-Wood Cutting of the Mature Trees in Prunus yedoensis (왕벚나무 성숙목의 녹지삽목에서 Auxin 및 Fog 처리 효과)

  • Kim, Chang-Soo;Kim, Zin-Suh
    • Journal of Korean Society of Forest Science, v.96 no.6, pp.676-683, 2007
  • In an attempt to develop an efficient method for propagating mature Prunus yedoensis trees (45 to 55 years old), green-wood cuttings from normal branches and from sprout branches were given three auxin treatments (Rootone [1-naphthylacetamide 0.4%], IBA 100 ppm, and an untreated control) and two fog treatments (0.9 L/min and 0.54 L/min). In the early stage, the Rootone treatment gave a higher percentage of rooting (PR) and mean number of roots per cutting (NR) than the IBA treatment. In the late stage, however, PR and NR in the Rootone treatment became lower than those in the IBA 100 ppm treatment. Root development ceased 62 days after the cuttings were taken in all treatments. The IBA 100 ppm treatment showed the best root development (PR = 89.5%, NR = 6.5, root length LR = 6.4 cm). PR (76.5%) and NR (6.4) under the 0.9 L/min fog treatment were higher than those under the 0.54 L/min fog treatment (PR = 71.7%, NR = 5.4). Cuttings from sprouts (PR = 74.8%, NR = 5.9, LR = 5.7 cm) rooted slightly better than cuttings from shoots (PR = 73.3%, NR = 5.9, LR = 5.4 cm). Statistically significant interactions were found among most combinations of the three factors (auxin treatment, fog treatment, and cutting type). The highest PR, 98.0%, was obtained with the combination of shoot cuttings, IBA 100 ppm, and 0.54 L/min fog. For NR, cuttings from normal branches outperformed cuttings from sprout branches under the 0.9 L/min fog treatment, whereas the reverse held under the 0.54 L/min fog treatment. The perigon development of roots, which reflects the number and direction of roots, was best in the IBA treatment (85.6%).

A Study on the UIC(University & Industry Collaboration) Model for Global New Business (글로벌 사업 진출을 위한 산학협력 협업촉진모델: 경남 G대학 GTEP 사업 실험사례연구)

  • Baek, Jong-ok;Park, Sang-hyeok;Seol, Byung-moon
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship, v.10 no.6, pp.69-80, 2015
  • A system and environment that promote collaboration are very important for competitiveness. What factors, then, enable the members of an organization to collaborate? Collaboration is defined here as many people in an organization working together toward common goals by sharing information and processes to improve labor productivity. The factors that promote collaboration are a shared vision, organizational principles and rules that reflect that vision, online system development, and communication methods. First, when the vision is made concrete and shared, members empathize with it more strongly, and active, voluntary participation in the organization's activities can be achieved. Second, when members know the rules and principles and accept them as a united whole, this leads to good performance. In addition, sharing business activities for self-development and making this a regular activity helps create an environment and atmosphere in which teams can collaborate. Third, systematically building an online collaboration system makes work efficient and rapid. The case of the student team and Company A showed that cloud services and social media can deliver low-cost, high-efficiency services. Because the latest information technology keeps changing, continuing education must be provided so that members can make use of the organization's systems and participate actively. Fourth, it is very important for the company to communicate actively with people inside and outside the organization in order to change its image and create corporate performance; efforts that reflect the latest trend of actively using social media for communication are needed. To develop a systematic collaboration-promoting model, each step should match the corresponding organizational role.
First, the Chief Executive Officer should establish a firm and clear vision and propagate it with conviction so that members empathize with it and gain a sense of belonging. Second, middle managers should systematically propagate the CEO's vision and build a system that establishes the organization's rules and principles. Third, general employees should internalize the company's vision and fulfil their roles, including in dealings with outside companies. The purpose of this study is to identify the factors that promote collaboration in organizations that collaborate well, using a strategic alignment model based on the Golden Circle, and to derive success factors through a case analysis of student teams that used the latest information technology tools for smart work and business know-how. The results are expected to provide a foundation for future empirical studies.

Design of Ultrasonic Nebulizer for Inhalation Toxicology Study of Cadmium with Application of Engineering Methodology and Performance Evaluation with Light-Scattering Photometer (공학적 기법을 응용한 카드뮴의 흡입독성 연구를 위한 초음파 네뷸라이져의 설계 그리고 광산란 광도계를 이용한 성능평가)

  • Jeung Jae Yeal;Milton Donald K.;Kim Tae Hyeung;Lee Jong Young;Chong Myoung Soo;Ko Kwang Jae;Kim Sang Duck;Kang Sung Ho;Song Young Sun;Lee Ki Nam
    • Journal of Physiology & Pathology in Korean Medicine, v.16 no.3, pp.464-471, 2002
  • The authors applied several engineering methodologies to a classical ultrasonic nebulizer to overcome its shortcomings and, after several rounds of trial and error, obtained several meaningful results. A light-scattering photometer was used to evaluate the modified ultrasonic nebulizer for inhalation toxicology of cadmium. This paper reports one part of an inhalation exposure system for inhalation toxicology studies of cadmium. Under the test conditions of a source temperature of 50℃ and an inlet-duct band temperature of 150℃, aerosol generation results for sodium chloride and cadmium chloride were as follows. The coefficients of variation (CV) over repeated trials for sodium chloride and cadmium chloride were 3.38 and 4.77 for 10 g, 2.47 and 5.02 for 5 g, and 4.70 and 2.98 for 2.5 g, respectively; all CVs were within the 10% acceptance limit. Changes in counts per minute (CPM) of NaCl and CdCl₂ over 5 repeated trials were similar. The CPM ratios of CdCl₂ to NaCl were 1.13 for 10 g, 0.76 for 5 g, and 1.06 for 2.5 g, so aerosol generation of cadmium chloride relative to sodium chloride was highest at 10 g. Relative to the theoretical efficiency, efficiency increases of 24.50% for 5 g NaCl, 14.91% for 2.5 g NaCl, and 16.48% for 2.5 g CdCl₂ were observed, whereas 5 g CdCl₂ showed a 0.04% decrease. When the source temperature (20, 50, 70℃) and inlet-duct band temperature (20, 50, 100, 150, 200℃) were varied, aerosol generation results for NaCl and CdCl₂ were as follows. CPM trends were similar for each quantity except 10 g NaCl at an inlet-duct band temperature of 200℃, and the highest CPM at each inlet-duct band temperature was observed at a source temperature of 70℃. The highest CPMs for 10, 5, and 2.5 g NaCl were observed at a source temperature of 70℃ and an inlet-duct band temperature of 20℃. Aerosol generation of cadmium chloride increased with source temperature, except at an inlet-duct band temperature of 200℃.
The highest CPMs for 10, 5, and 2.5 g CdCl₂ were observed at a source temperature of 70℃ and an inlet-duct band temperature of 20℃, a result similar to NaCl aerosol generation. Observed efficiencies of 5 and 2.5 g NaCl were similar to their theoretical efficiencies, whereas 5 g CdCl₂ showed a 3.08% efficiency decrease and 2.5 g CdCl₂ a 17.47% efficiency increase. The CdCl₂/NaCl CPM ratio for 10 g differed from those for 5 and 2.5 g, and the 2.5 g ratio was higher than the 5 g ratio. In conclusion, maximum aerosol generation for NaCl and CdCl₂ is obtained by setting an appropriate inlet-duct band temperature for each material and raising the source temperature. Sodium chloride can be used to evaluate the performance of the aerosol generator and inhalation exposure system and to predict cadmium aerosol concentrations in them.
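
The repeatability criterion above (CV within 10%) and the CdCl₂/NaCl CPM ratio are simple statistics over repeated photometer readings. A minimal sketch, assuming the CV is the sample standard deviation divided by the mean and the ratio is taken between mean CPMs; the readings below are hypothetical, not the paper's raw data:

```python
import statistics
from typing import Sequence

ACCEPTANCE_CV_PERCENT = 10.0  # acceptance limit quoted in the abstract


def cv_percent(values: Sequence[float]) -> float:
    """Coefficient of variation in percent, using the sample standard deviation (n - 1)."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


def cpm_ratio(cdcl2_cpm: Sequence[float], nacl_cpm: Sequence[float]) -> float:
    """Ratio of mean CdCl2 CPM to mean NaCl CPM for the same loaded mass."""
    return statistics.mean(cdcl2_cpm) / statistics.mean(nacl_cpm)


# Hypothetical CPM readings for 5 repeated trials with 10 g loaded.
nacl_10g = [1000.0, 1040.0, 980.0, 1010.0, 970.0]
cdcl2_10g = [1130.0, 1100.0, 1160.0, 1120.0, 1140.0]

for name, runs in (("NaCl", nacl_10g), ("CdCl2", cdcl2_10g)):
    cv = cv_percent(runs)
    verdict = "within" if cv <= ACCEPTANCE_CV_PERCENT else "outside"
    print(f"{name}: CV = {cv:.2f}% ({verdict} the acceptance limit)")
print(f"CdCl2/NaCl CPM ratio = {cpm_ratio(cdcl2_10g, nacl_10g):.2f}")
```

Whether the paper used the sample or population standard deviation is not stated; swapping in `statistics.pstdev` changes the CV only slightly for five trials.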

Independent Production Routines and Environmental Changes In 'Comprehensive Programming Television Channels' in Korea Focusing on Interviews with Independent Producers, Broadcast Writers and Individuals Involved with the TV Channels (종합편성채널의 독립제작 환경과 관행에 관한 연구 독립PD, 작가 및 종합편성채널 관계자 심층인터뷰를 중심으로)

  • Choi, Sun Young;Han, Hee Jeong
    • Korean journal of communication and information, v.73, pp.56-91, 2015
  • This study examined changes in the independent production environment since January 2011, when comprehensive programming television channels (JTBC, MBN, Channel A, TV Chosun) emerged in Korea, from the perspectives of flexible specialization of labor and media routines. In-depth interviews were conducted with thirteen individuals, including producers from independent production companies, broadcast writers, and individuals involved with these TV channels. First, the interview results indicated that the comprehensive programming channels had established a flexible specialization production system, meaning that they depended heavily on independent producers for everything except their own news programs. Moreover, absurd contract practices, such as those tied to TV ratings and performance systems, could make it difficult to produce diverse programs. Second, compared with terrestrial TV stations, these channels introduced some positive changes, such as higher production costs and an incentive system. However, the incentive system also intensifies internal competition within the channels and fuels contract competition among independent companies, which can eventually lead to the channels holding exclusive rights to certain content and, hence, to unfair business practices. Third, because of the newspaper-broadcast cross-ownership of the comprehensive programming channels, hierarchical independent production practices can take hold under the influence of newspaper proprietors and of executives or managers who previously worked for newspapers.
Lastly, from the interviews with independent producers and individuals involved with the TV channels about their perceptions of the comprehensive programming channels, it could not be ascertained whether producing programs covering diverse items and genres is difficult because programming autonomy has been distorted by capital or the advertising market. In this context, it is not surprising that some comprehensive programming channels stated that they prioritize profit and performance in programming. In conclusion, complementary institutional and legal measures are imperative to redress the existing dysfunctional routines in independent production for Korea's comprehensive programming TV channels.

A Study of Quality Control of Nuclear Medicine Counting System and Gamma Camera (핵의학 계측기기 및 감마카메라의 정도관리 연구)

  • 손혜경;김희중;정해조;정하규;이종두;유형식
    • Progress in Medical Physics, v.12 no.2, pp.103-112, 2001
  • Purpose: The purpose of this study was to investigate the current status of nuclear medicine quality control in Korea and to test selected quality-control protocols for nuclear medicine counting systems and gamma cameras. Materials and Methods: Fifty-three hospitals were surveyed to investigate the current status of nuclear medicine quality control in Korea. The precision of the dose calibrator was measured with 35.52 MBq of Tc-99m for 2 minutes, and that of the thyroid uptake system with 5.14 MBq of Tc-99m for 10 seconds every minute. The sensitivity of the CeraSPECT™ with a low-energy high-resolution parallel-hole collimator was measured using two cylindrical phantoms containing Tc-99m, 15 cm in diameter and 12 cm and 30 cm in height, and a sensitivity correction factor for the CeraSPECT™ was calculated from the phantom data. System planar sensitivity, uniformity, count rate and spatial resolution were measured for the Varicam gamma camera with a low-energy high-resolution parallel-hole collimator, using a 20% energy window centered at 140 keV and 256×256 or 512×512 matrices. Results: The survey showed that quality control of dose calibrators and well counters was poorly performed, whereas quality control of gamma cameras and other systems was relatively well performed. The precision of the dose calibrator was ±1.4% (within ±5%), and the chi-square value of the thyroid uptake system was χ² = 29.7 (> 16.92). The sensitivity of the CeraSPECT™ was higher in central slices than in edge slices, and correcting patient data for this nonuniform sensitivity gave better results than before correction. The system planar sensitivity of the Varicam gamma camera was 4.39 CPM/MBq. At 20% count loss, the observed count rate was 81,926 counts/sec (head 1) and 90,741 counts/sec (head 2), for input count rates of 102,407 counts/sec (head 1) and 113,427 counts/sec (head 2).
The spatial resolution without a scatter medium was 8.16 mm FWHM and 14.85 mm FWTM, and with a scatter medium it was 8.87 mm FWHM and 18.87 mm FWTM. Conclusion: It is necessary to recognize the importance of quality control and to perform quality control of nuclear medicine devices.
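
The thyroid uptake result is reported as a chi-square value compared against 16.92. A minimal sketch of the common counting-precision chi-square test, assuming ten repeated counts (9 degrees of freedom, for which 16.92 is the upper 5% point and 3.325 the lower 5% point); the count values are hypothetical and the paper's exact procedure is not given here:

```python
from typing import Sequence

# Chi-square percentage points for 9 degrees of freedom (10 repeated counts).
CHI2_DF9_LOWER = 3.325   # P = 0.95
CHI2_DF9_UPPER = 16.919  # P = 0.05, the 16.92 limit quoted in the abstract


def counting_chi_square(counts: Sequence[float]) -> float:
    """Chi-square statistic for repeated counts: sum((N_i - mean)^2) / mean.

    For Poisson counting the variance equals the mean, so this statistic is
    approximately chi-square distributed with n - 1 degrees of freedom.
    """
    mean = sum(counts) / len(counts)
    return sum((c - mean) ** 2 for c in counts) / mean


def chi_square_acceptable(counts: Sequence[float]) -> bool:
    """True if the statistic lies between the df = 9 limits (requires 10 counts)."""
    if len(counts) != 10:
        raise ValueError("the limits above assume 10 measurements (9 degrees of freedom)")
    return CHI2_DF9_LOWER <= counting_chi_square(counts) <= CHI2_DF9_UPPER


# Hypothetical counts: one stable series, one with excess (non-Poisson) scatter.
stable = [10000, 10150, 9850, 10100, 9900, 10080, 9920, 10050, 9950, 10000]
unstable = [10000, 10300, 9700, 10250, 9750, 10200, 9800, 10100, 9900, 10000]

for label, series in (("stable", stable), ("unstable", unstable)):
    print(label, round(counting_chi_square(series), 2), chi_square_acceptable(series))
```

A value above the upper limit, as reported for the thyroid uptake system, indicates more scatter between repeated counts than counting statistics alone would explain.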

Participant Characteristic and Educational Effects for Cyber Agricultural Technology Training Courses (사이버농업기술교육 참가자의 특성과 교육효과)

  • Kang, Dae-Koo
    • Journal of Agricultural Extension & Community Development, v.21 no.1, pp.35-82, 2014
  • The main objectives were to identify the learner characteristics and educational effects of the cyber agricultural technology courses of the Rural Development Administration (RDA). The research used a literature review and an Internet-based survey. For the survey, a two-stage stratified sampling method was applied to the RDA cyber training member database, using course type (open course or certificate course) and enrollment year as strata. The instrument was composed from a literature review on cyber education effects and educational effect factors, and items on learner characteristics were added. The survey was sent to the sampled members by e-mail, and 316 responses were returned through the Google survey system. After data cleaning, 303 responses were analyzed with chi-square tests, t-tests and F-tests at a significance level of .05. The results were as follows. First, respondents were mainly men (77.9%); the largest monthly income group was 2,000,000 to 3,000,000 won (24%); 48% held a bachelor's degree; respondents in their forties and fifties accounted for 75%; and only 12.2% changed jobs after learning, so most respondents' jobs did not change. Most respondents had not majored in agriculture. Learners' learning styles comprised two or more types, such as concrete-sequential, mixed and abstract-random, so e-learning courses should be developed for each learner type. Second, learners studied 3.2 days a week, 53.53 minutes per session, for a total of 172.63 minutes a week. They were very or generally eager to study and took two or more subjects. Their motives for cyber education were farming knowledge, personal competency development and improved job performance, and they selected subjects according to their interests. Learners who took only one subject attributed this to lack of time, other reasons, or a lack of interesting subjects, whereas those who took more subjects cited job performance benefits and the effectiveness of previous subjects. Most learners completed their subjects, but about a quarter (26.7%) did not finish because they were busy.
Their entry behavior was insufficient for e-learning courses, and their computer and Internet skills were at an intermediate level, similar to their software skills. They regarded the RDA cyber courses as convenient because they are free of time and space limits, and as useful for knowledge acquisition and personal competency development. The learners comprised those taking open courses only (12.5%), certificate courses only (25.7%), and both (36.3%). Third, e-learning learners' satisfaction and academic achievement were good, and in the learning-application category, the educational services offered for doing their jobs were rated good. However, the effect of cyber education was not rated good; in particular, its effect on increasing agricultural income was poor, because most learners were not farmers and could not apply their knowledge to farming. Content structure and design, content comprehension, and content amount were rated good. Learners who took more subjects reported better effects, and learners who took both open and certificate courses were more satisfied than those who took open courses only. Based on these results, the study recommends offering specialized cyber courses before the main courses in the RDA training system, increasing support staff and faculty, building a blended learning system with local RDA offices, and introducing a cyber tutor system.

A Study on Forecasting Accuracy Improvement of Case Based Reasoning Approach Using Fuzzy Relation (퍼지 관계를 활용한 사례기반추론 예측 정확성 향상에 관한 연구)

  • Lee, In-Ho;Shin, Kyung-Shik
    • Journal of Intelligence and Information Systems, v.16 no.4, pp.67-84, 2010
  • In business, forecasting is the task of estimating what is likely to happen in the future in order to make managerial decisions and plans. Accurate forecasting is therefore very important for major managerial decisions and is the basis for various business strategies. However, uncertainty and complexity in the future business environment make it very difficult to produce unbiased and consistent estimates. That is why scientific forecasting models should be used to support business decision making, and why efforts should be made to minimize a model's forecasting error, the difference between the observed value and the estimate. Nevertheless, minimizing this error is not an easy task. Case-based reasoning is a problem-solving method that uses similar past cases to solve the current problem. To build a successful case-based reasoning model, it is very important to retrieve not only the most similar case but also the most relevant one. To retrieve similar and relevant cases from past cases, measuring the similarity between cases is a key factor. In particular, if the cases contain symbolic data, distances are more difficult to measure. The purpose of this study is to improve the forecasting accuracy of the case-based reasoning approach using fuzzy relations and composition. Two methods are adopted to measure the similarity between cases containing symbolic data: one derives the similarity matrix from binary logic (a judgment of whether two symbolic values are the same), and the other derives it from fuzzy relations and composition. The study proceeds in the following order: data gathering and preprocessing, model building and analysis, validation analysis, and conclusion. First, in data gathering and preprocessing, we collect a data set with a categorical dependent variable.
The data set is cross-sectional, and its independent variables include several qualitative variables expressed as symbolic data. The research data consist of many financial ratios and the corresponding bond ratings of Korean companies. The ratings cover all bonds rated by one of the bond rating agencies in Korea. The total sample includes 1,816 companies whose commercial papers were rated between 1997 and 2000. Credit grades are defined as outputs and classified into 5 rating categories (A1, A2, A3, B, C) according to credit level. Second, in model building and analysis, we derive similarity matrices from binary logic and from fuzzy composition to measure the similarity between cases containing symbolic data; the fuzzy compositions used are max-min, max-product and max-average. Case-based reasoning is then carried out with the derived similarity matrices. Third, in validation analysis, we validate the models with the McNemar test on hit ratios. Finally, we draw conclusions from the study. The similarity measure based on fuzzy relations and composition shows better forecasting performance than the one based on binary logic for measuring the similarity between two symbolic values. However, the differences in forecasting performance among the types of fuzzy composition are not statistically significant. The contribution of this study is an alternative methodology in which fuzzy relations and fuzzy composition are applied to measure the similarity between two symbolic values, which is the most important factor in building a case-based reasoning model.
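
The two similarity schemes above can be contrasted on a small example. A minimal sketch, assuming the similarity between symbolic values is obtained as the composition R ∘ Rᵀ of a fuzzy relation R (values × descriptive attributes) using the standard max-min, max-product and max-average compositions; the relation `R` and its attributes are hypothetical illustrations, not the paper's data:

```python
from typing import Callable, Dict, List

Matrix = List[List[float]]


def compose(r: Matrix, s: Matrix, op: Callable[[float, float], float]) -> Matrix:
    """Max-op composition: (R o S)[i][k] = max over j of op(R[i][j], S[j][k])."""
    inner = len(s)
    if any(len(row) != inner for row in r):
        raise ValueError("inner dimensions of R and S must match")
    cols = len(s[0])
    return [[max(op(r[i][j], s[j][k]) for j in range(inner)) for k in range(cols)]
            for i in range(len(r))]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


COMPOSITIONS: Dict[str, Callable[[float, float], float]] = {
    "max-min": min,
    "max-product": lambda a, b: a * b,
    "max-average": lambda a, b: (a + b) / 2.0,
}


def binary_similarity(n: int) -> Matrix:
    """Binary logic: two symbolic values are similar (1) only if they are identical."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def fuzzy_similarity(relation: Matrix, kind: str) -> Matrix:
    """Similarity between symbolic values as the composition R o R^T."""
    return compose(relation, transpose(relation), COMPOSITIONS[kind])


# Hypothetical fuzzy relation: rows are three values of one symbolic variable,
# columns are their membership degrees in two illustrative attributes.
R = [
    [1.0, 0.2],
    [0.7, 1.0],
    [0.1, 1.0],
]

if __name__ == "__main__":
    print("binary:", binary_similarity(len(R)))
    for kind in COMPOSITIONS:
        print(kind, fuzzy_similarity(R, kind))
```

Binary logic scores every pair of distinct values as 0. Each fuzzy composition instead gives a graded similarity that a case-retrieval step could use as the local similarity for that variable. For values in [0, 1], max-product never exceeds max-min, and max-average never falls below it.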

A Methodology for Extracting Shopping-Related Keywords by Analyzing Internet Navigation Patterns (인터넷 검색기록 분석을 통한 쇼핑의도 포함 키워드 자동 추출 기법)

  • Kim, Mingyu;Kim, Namgyu;Jung, Inhwan
    • Journal of Intelligence and Information Systems, v.20 no.2, pp.123-136, 2014
  • Recently, online shopping has grown further as the Internet and a variety of smart mobile devices have become more prevalent. The increase in the scale of such shopping has led to the creation of many Internet shopping malls. Consequently, competition among online retailers is becoming increasingly fierce, and many Internet shopping malls are making significant attempts to attract online users to their sites. One such attempt is keyword marketing, whereby a retail site pays a fee to show its link to potential customers when they enter a specific keyword on an Internet portal site. The price of each keyword is generally set according to the keyword's frequency of appearance. However, it is widely accepted that keyword prices cannot be based solely on frequency, because many keywords appear frequently but have little relationship to shopping. This implies that it is unreasonable for an online shopping mall to spend a great deal on some keywords simply because people use them frequently. Therefore, from the perspective of shopping malls, a specialized process is required to extract meaningful keywords. Further, the demand for automating this extraction process is increasing because of the drive to improve online sales performance. In this study, we propose a methodology that automatically extracts only shopping-related keywords from the entire set of search keywords used on portal sites. We define a shopping-related keyword as a keyword that is used directly before shopping behavior. In other words, only search keywords whose search results page leads to a shopping-related page are extracted from the entire set of search keywords. The extracted keywords' rankings are then compared with the rankings of the entire set of search keywords. Two types of data are used in the experiment: web browsing history from July 1, 2012 to June 30, 2013, and site information.
The experimental data were obtained from a website-ranking site and from the largest portal site in Korea. The original sample contains 150 million transaction logs. First, portal sites are selected and the search keywords used on them are extracted, which can be done easily by simple parsing. The extracted keywords are ranked by frequency. The experiment uses approximately 3.9 million search results from Korea's largest search portal site, from which a total of 344,822 search keywords were extracted. Next, using the web browsing history and site information, the shopping-related keywords were identified within the entire set of search keywords, yielding 4,709 shopping-related keywords. For performance evaluation, we compared the hit ratios of all search keywords with those of the shopping-related keywords. To do so, we extracted 80,298 search keywords from several Internet shopping malls and chose the top 1,000 as the set of true shopping keywords. We measured the precision, recall, and F-score of the entire keyword set and of the shopping-related keywords, where the F-score is the harmonic mean of precision and recall. The precision, recall, and F-score of the shopping-related keywords derived by the proposed methodology were higher than those of the entire keyword set. This study proposes a scheme that obtains shopping-related keywords in a relatively simple manner: they can be extracted simply by examining transactions whose next visit is a shopping mall. The resulting shopping-related keyword set is expected to be a useful asset for the many shopping malls that participate in keyword marketing. Moreover, the proposed methodology can easily be applied to building keyword sets for other special areas as well as shopping.
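
The extraction rule (keep a portal search keyword when the same user's next visit is a shopping mall) and the evaluation by precision, recall and F-score can be sketched as follows. This is a minimal illustration, assuming time-ordered log rows of the form (user, site, keyword); the site names, log and "true" keyword set are hypothetical, and the paper's top-1,000 selection from mall searches is not reproduced:

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set, Tuple

# One browsing-log row: (user_id, visited_site, search keyword on that page or None).
LogRow = Tuple[str, str, Optional[str]]


def extract_keywords(log: Iterable[LogRow], portal_sites: Set[str],
                     shopping_sites: Set[str]) -> Tuple[Counter, Counter]:
    """Count all portal search keywords and the subset followed by a shopping visit.

    Rows must be in time order. A keyword counts as shopping-related when the
    same user's very next visit after the search is a shopping-mall site.
    """
    all_counts: Counter = Counter()
    shopping_counts: Counter = Counter()
    pending: Dict[str, str] = {}  # user -> keyword searched on that user's previous row
    for user, site, keyword in log:
        previous = pending.pop(user, None)
        if previous is not None and site in shopping_sites:
            shopping_counts[previous] += 1
        if site in portal_sites and keyword:
            all_counts[keyword] += 1
            pending[user] = keyword
    return all_counts, shopping_counts


def precision_recall_f(predicted: Set[str], truth: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall and F-score (harmonic mean of precision and recall)."""
    hits = len(predicted & truth)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Hypothetical sites, log and reference set of true shopping keywords.
PORTALS = {"portal.example"}
MALLS = {"mall.example"}
LOG = [
    ("u1", "portal.example", "laptop"), ("u1", "mall.example", None),
    ("u2", "portal.example", "weather"), ("u2", "news.example", None),
    ("u3", "portal.example", "laptop"), ("u3", "mall.example", None),
    ("u3", "portal.example", "sneakers"), ("u3", "mall.example", None),
    ("u2", "portal.example", "weather"),
]
TRUE_SHOPPING = {"laptop", "sneakers", "shoes"}

all_kw, shop_kw = extract_keywords(LOG, PORTALS, MALLS)
print("all keywords:", precision_recall_f(set(all_kw), TRUE_SHOPPING))
print("shopping-related:", precision_recall_f(set(shop_kw), TRUE_SHOPPING))
```

The frequent but non-shopping keyword "weather" lowers the precision of the full keyword set, while the shopping-related subset keeps the same recall at higher precision. This mirrors the direction of the comparison reported in the abstract.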

VKOSPI Forecasting and Option Trading Application Using SVM (SVM을 이용한 VKOSPI 일 중 변화 예측과 실제 옵션 매매에의 적용)

  • Ra, Yun Seon;Choi, Heung Sik;Kim, Sun Woong
    • Journal of Intelligence and Information Systems, v.22 no.4, pp.177-192, 2016
  • Machine learning is a field of artificial intelligence concerned with giving machines the ability to perform their own data analysis, decision making and forecasting. One representative machine learning model is the artificial neural network, a statistical learning algorithm inspired by the structure of biological neural networks; other models include decision trees, naive Bayes and SVM (support vector machine). This study uses the SVM model because it is mainly used for classification and regression, which suits this study well. The core principle of SVM is to find a hyperplane that separates the different groups in the data space. Given data from two groups, the SVM model judges which group new data belongs to based on the hyperplane obtained from the given data set. Accordingly, the more meaningful data are available, the better the model learns. In recent years, many financial experts have focused on machine learning, seeing potential in combining it with the financial field, where vast amounts of financial data exist. Machine learning techniques have proved powerful in describing non-stationary and chaotic stock price dynamics, and many studies have successfully forecast stock prices using machine learning algorithms. Recently, financial companies have begun to provide Robo-Advisor services (a compound of "robot" and "advisor"), which perform various financial tasks through advanced algorithms applied to huge, rapidly changing amounts of data. A Robo-Advisor's main tasks are to advise investors according to their personal investment propensity and to manage their portfolios automatically.
In this study, we propose a method that forecasts the Korean volatility index, VKOSPI, with the SVM model and applies the forecasts to real option trading to improve trading performance. VKOSPI measures the future volatility of the KOSPI 200 index based on KOSPI 200 index option prices, and is similar to the VIX index, which is based on S&P 500 option prices in the United States. The Korea Exchange (KRX) calculates and announces the VKOSPI in real time. Like ordinary volatility, VKOSPI affects option prices: VKOSPI and option prices move in the same direction regardless of option type (calls and puts at various strike prices). When volatility increases, all call and put option premiums increase because the probability that the options will be exercised increases. Investors can track in real time how much an option's price rises per unit rise in volatility through Vega, the Black-Scholes measure of an option's sensitivity to changes in volatility. Accurate forecasting of VKOSPI movements is therefore one of the important factors that can generate profit in option trading. Using real option data, we verified that accurate VKOSPI forecasts can produce large profits in real option trading. To the best of our knowledge, no previous study has predicted the direction of VKOSPI with machine learning and applied the predictions to actual option trading. In this study, daily VKOSPI changes were predicted with the SVM model, and an intraday option strangle position, which profits as option prices fall, was taken only when VKOSPI was expected to decline during the day. We analyzed the results and tested whether SVM-based predictions are applicable to real option trading.
The prediction accuracy for VKOSPI was 57.83% on average, and positions were entered 43.2 times, less than half as often as the benchmark (100 times). A small number of trades indicates trading efficiency. In addition, the experiment showed that trading performance was significantly higher than the benchmark.
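
The argument that rising volatility raises both call and put premiums rests on Black-Scholes vega being positive and identical for a call and a put with the same strike and expiry. The same property is why a strangle (an out-of-the-money put plus an out-of-the-money call) loses value when volatility falls. A minimal sketch, assuming the standard Black-Scholes model without dividends; the index level, strikes, maturity and rate are hypothetical, not KOSPI 200 market data:

```python
import math
from typing import Tuple


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _d1_d2(s: float, k: float, t: float, r: float, sigma: float) -> Tuple[float, float]:
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    return d1, d1 - sigma * math.sqrt(t)


def bs_call(s: float, k: float, t: float, r: float, sigma: float) -> float:
    d1, d2 = _d1_d2(s, k, t, r, sigma)
    return s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def bs_put(s: float, k: float, t: float, r: float, sigma: float) -> float:
    d1, d2 = _d1_d2(s, k, t, r, sigma)
    return k * math.exp(-r * t) * norm_cdf(-d2) - s * norm_cdf(-d1)


def vega(s: float, k: float, t: float, r: float, sigma: float) -> float:
    """dPrice/dsigma; the same for a call and a put with equal strike and expiry."""
    d1, _ = _d1_d2(s, k, t, r, sigma)
    return s * norm_pdf(d1) * math.sqrt(t)


def strangle_value(s: float, k_put: float, k_call: float, t: float, r: float,
                   sigma: float) -> float:
    """Premium of a put struck at k_put plus a call struck at k_call (k_put < s < k_call)."""
    return bs_put(s, k_put, t, r, sigma) + bs_call(s, k_call, t, r, sigma)


if __name__ == "__main__":
    # Hypothetical inputs: index level 300, strikes 290/310, 30 days, 2% rate.
    args = dict(s=300.0, k_put=290.0, k_call=310.0, t=30 / 365, r=0.02)
    before = strangle_value(sigma=0.20, **args)
    after = strangle_value(sigma=0.18, **args)
    print(f"strangle premium {before:.2f} -> {after:.2f} when volatility falls")
```

A trader who has sold the strangle gains roughly the combined vega of the two legs times the volatility drop. This is why an expected VKOSPI decline is the entry signal described above.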

Korean Word Sense Disambiguation using Dictionary and Corpus (사전과 말뭉치를 이용한 한국어 단어 중의성 해소)

  • Jeong, Hanjo;Park, Byeonghwa
    • Journal of Intelligence and Information Systems, v.21 no.1, pp.1-13, 2015
  • As opinion mining in big data applications has attracted attention, much research on unstructured data has been conducted. Social media on the Internet generate unstructured or semi-structured data every second, and much of it is written in the natural languages we use in daily life. Many words in human languages have multiple meanings or senses, which makes it very difficult for computers to extract useful information from these data sets. Traditional web search engines are usually based on keyword search and can therefore return incorrect results far from users' intentions. Although much progress has been made in recent years in improving search engines so that they provide appropriate results, there is still much room for improvement. Word sense disambiguation can play a very important role in natural language processing and is considered one of the most difficult problems in the field. Major approaches to word sense disambiguation can be classified as knowledge-based, supervised corpus-based, and unsupervised corpus-based. This paper presents a method that automatically generates a corpus for word sense disambiguation from the examples in existing dictionaries, avoiding expensive sense-tagging. The effectiveness of the method is evaluated with a Naïve Bayes model, a supervised learning algorithm, using the Korean Standard Unabridged Dictionary and the Sejong Corpus. The Korean Standard Unabridged Dictionary contains approximately 57,000 sentences, and the Sejong Corpus contains about 790,000 sentences tagged with both part of speech and sense. In the experiments, the dictionary and the Sejong Corpus were evaluated both combined and separately using cross-validation. Only nouns, the targets of disambiguation, were selected.
In addition, 93,522 word senses among 265,655 nouns and 56,914 sentences from related proverbs and examples were combined into the corpus. The Sejong Corpus merged easily with the Korean Standard Unabridged Dictionary because it was tagged with the sense indices defined by that dictionary. Sense vectors were formed after the merged corpus was created, and the terms used in creating them were added to the named entity dictionary of the Korean morphological analyzer. Using the extended named entity dictionary, terms were extracted from the input sentences and term vectors were created for the sentences. Given the extracted term vectors and the sense vector model built during preprocessing, the sense of each term was determined by word sense disambiguation based on the vector space model. This study also shows the effectiveness of the merged corpus built from the examples in the Korean Standard Unabridged Dictionary and the Sejong Corpus: the experiment shows better precision and recall with the merged corpus. The method can practically enhance the performance of Internet search engines and help capture the more accurate meaning of a sentence in natural language processing tasks related to search engines, opinion mining, and text mining. The Naïve Bayes classifier used in this study is a supervised learning algorithm based on Bayes' theorem, and it assumes that all features are conditionally independent given the sense. Although this assumption is unrealistic and ignores correlations between attributes, the Naïve Bayes classifier is widely used because of its simplicity and is known to be very effective in practice in many applications such as text classification and medical diagnosis. However, further research is needed to consider all possible combinations, or partial combinations, of the senses in a sentence.
Also, the effectiveness of word sense disambiguation may be improved if rhetorical structures or morphological dependencies between words are analyzed through syntactic analysis.
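
The classifier named above is Naïve Bayes, which picks the sense maximizing P(sense) times the product of P(word | sense) over the context words. A minimal sketch of multinomial Naïve Bayes with add-one smoothing follows. It assumes a tiny hypothetical sense-tagged corpus for the ambiguous noun 배 (pear / ship / belly), with English context words standing in for morphological-analyzer output. The paper's dictionary-derived corpus and its vector-space step are not reproduced:

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Example = Tuple[str, List[str]]  # (sense label, context words around the target noun)


class NaiveBayesWSD:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self, examples: List[Example]) -> None:
        self.sense_counts: Counter = Counter()
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: Set[str] = set()
        for sense, context in examples:
            self.sense_counts[sense] += 1
            self.word_counts[sense].update(context)
            self.vocab.update(context)
        self.total = sum(self.sense_counts.values())

    def log_score(self, sense: str, context: List[str]) -> float:
        """log P(sense) + sum of log P(word | sense), treating words as independent given the sense."""
        score = math.log(self.sense_counts[sense] / self.total)
        denom = sum(self.word_counts[sense].values()) + len(self.vocab)
        for word in context:
            if word in self.vocab:  # words never seen in training are skipped
                score += math.log((self.word_counts[sense][word] + 1) / denom)
        return score

    def disambiguate(self, context: List[str]) -> str:
        return max(self.sense_counts, key=lambda s: self.log_score(s, context))


# Hypothetical sense-tagged examples for the noun 배.
TRAIN: List[Example] = [
    ("pear", ["eat", "sweet", "fruit"]),
    ("pear", ["peel", "fruit", "autumn"]),
    ("ship", ["port", "sail", "sea"]),
    ("ship", ["sea", "captain", "board"]),
    ("belly", ["ache", "hungry"]),
]

model = NaiveBayesWSD(TRAIN)
print(model.disambiguate(["sea", "sail"]))
```

Working in log space avoids underflow when contexts are long. Add-one smoothing keeps a single unseen (sense, word) pair from zeroing out a sense.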