Search | Korea Science

Smart Store in Smart City: The Development of Smart Trade Area Analysis System Based on Consumer Sentiments (Smart Store in Smart City: 소비자 감성기반 상권분석 시스템 개발)

Yoo, In-Jin;Seo, Bong-Goon;Park, Do-Hyung
- Journal of Intelligence and Information Systems
- v.24 no.1
- pp.25-52
- 2018
This study performs a social network analysis of consumer sentiment associated with locations in Seoul, using data that reflect consumers' web search activities and emotional evaluations related to commerce. The study focuses on large commercial districts in Seoul. In addition, to consider various aspects of these districts, social network indexes were combined with the trading areas' public data to verify the factors affecting each area's sales. The change in R-squared shows that the model has a fairly high R-squared value even when it includes only the districts' public data, which are static. However, the R-squared of the model improved considerably further when the network indexes derived from the social network analysis were added. A regression analysis of the trading areas' public data showed that five of the twenty-two variables have a significant influence on the sales of a trading area: 'number of market district,' 'residential area per person,' 'satisfaction of residential environment,' 'rate of change of trade,' and 'survival rate over 3 years.' Of these, 'residential area per person' has the highest standardized beta value and therefore the strongest influence on commercial sales. In addition, 'residential area per person,' 'number of market district,' and 'survival rate over 3 years' were found to have positive effects on the sales of all trading areas. Thus, sales increase as the number of market districts in the trading area increases, as the residential area per person increases, and as the three-year survival rate of stores in the trading area increases. On the other hand, 'satisfaction of residential environment' and 'rate of change of trade' were found to have a negative effect on sales. In the case of 'satisfaction of residential environment,' sales increase when the satisfaction level is low. 
Therefore, as consumer dissatisfaction with the residential environment increases, sales increase. The 'rate of change of trade' shows that sales increase with the decreasing acceleration of transaction frequency. According to the social network analysis, of the 25 regional trading areas in Seoul, Yangcheon-gu has the highest degree of connection. In other words, it has common sentiments with many other trading areas. On the other hand, Nowon-gu and Jungrang-gu have the lowest degree of connection. In other words, they have relatively distinct sentiments from other trading areas. The social network indexes used in the combined model are 'density of ego network,' 'degree centrality,' 'closeness centrality,' 'betweenness centrality,' and 'eigenvector centrality.' The combined model analysis confirmed that, among the social network indexes, degree centrality and eigenvector centrality have a significant influence on sales and the highest influence in the model. 'Degree centrality' has a negative effect on the sales of the districts. This implies that sales decrease when a trading area shares many sentiments with other trading areas, which runs counter to common belief. However, this result can be interpreted to mean that a trading area with low 'degree centrality' delivers unique and special sentiments to consumers. The findings can also be interpreted to mean that sales can be increased if a trading area raises consumer recognition by forming a unique sentiment and city atmosphere that distinguish it from other trading areas. On the other hand, 'eigenvector centrality' has the greatest effect on sales in the combined model, and the results confirmed that this effect is positive. This finding shows that sales increase more when a trading area is connected to others with stronger centrality than when it merely has common sentiments with others. 
This study can be used as an empirical basis for establishing and implementing city and trading area strategy plans that consider the sentiments consumers desire. In addition, we expect the study to inform entrepreneurs and potential entrepreneurs entering a trading area about the sentiments associated with that area and to suggest directions for entering it in light of the district-sentiment structure.
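The two network indexes that the abstract reports as significant can be illustrated with a minimal sketch. The toy network, node names, and function names below are hypothetical and are not taken from the study; the sketch only shows how degree centrality and eigenvector centrality are commonly computed on an undirected graph.

```python
from math import sqrt

def degree_centrality(adj):
    """Share of the other nodes that each node is directly linked to."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}

def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Power iteration on (A + I); the identity shift keeps the same
    eigenvectors as A but avoids oscillation on bipartite graphs."""
    score = {node: 1.0 for node in adj}
    for _ in range(iterations):
        new = {node: score[node] + sum(score[nb] for nb in adj[node])
               for node in adj}
        norm = sqrt(sum(v * v for v in new.values()))
        new = {node: v / norm for node, v in new.items()}
        if max(abs(new[k] - score[k]) for k in adj) < tol:
            return new
        score = new
    return score

# Hypothetical toy network: an edge means two areas share a sentiment.
toy = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
deg = degree_centrality(toy)
eig = eigenvector_centrality(toy)
```

In this toy case node D has low degree centrality but still receives some eigenvector centrality from its single link to the well-connected node A, which is the distinction between the two indexes that the abstract draws on.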
https://doi.org/10.13088/jiis.2018.24.1.025

Major Class Recommendation System based on Deep learning using Network Analysis (네트워크 분석을 활용한 딥러닝 기반 전공과목 추천 시스템)

Lee, Jae Kyu;Park, Heesung;Kim, Wooju
- Journal of Intelligence and Information Systems
- v.27 no.3
- pp.95-112
- 2021
In university education, the choice of major classes plays an important role in students' careers. However, in line with changes in industry, the major subjects offered by each department are diversifying and increasing in number. As a result, students have difficulty choosing and taking classes that fit their career paths. In general, students choose classes based on experience, such as the choices of peers or advice from seniors. This approach has the advantage of taking the general situation into account, but it does not reflect individual tendencies or take existing courses into consideration, and it leads to information inequality because the information is shared only among specific students. In addition, as classes have recently been conducted remotely and exchanges between students have decreased, even experience-based decisions have become difficult to make. Therefore, this study proposes a recommendation system model that recommends college major classes suited to individual characteristics based on data rather than experience. A recommendation system recommends information and content (music, movies, books, images, etc.) that a specific user may be interested in. Such systems are already widely used in services where individual tendencies matter, such as YouTube and Facebook, and are familiar from personalized content services such as over-the-top (OTT) media services. Taking classes is also a kind of content consumption in that students select classes suited to them from a set list of content. However, unlike other content consumption, the result of the selection has a large influence. For example, music and movies are usually consumed once, and the time required to consume them is short. Therefore, the importance of each item is relatively low, and selection does not require deep deliberation. 
Major classes, by contrast, have a long consumption time because they must be taken for a full semester, and each item has high importance and requires greater caution in selection because the composition of the selected classes affects many things, such as career and graduation requirements. Given these unique characteristics of major classes, a recommendation system in the education field supports decision-making that reflects meaningful individual characteristics that experience-based decision-making cannot capture, even though the range of items is relatively small. This study aims to realize personalized education and enhance students' educational satisfaction by presenting a recommendation model for university major classes. The model study used the class history data of undergraduate students at University from 2015 to 2017, with students and their major names as metadata. The class history data is implicit feedback data that only indicates whether content was consumed and does not reflect preferences for classes. Therefore, embedding vectors derived from it to characterize students and classes have low expressive power. With these issues in mind, this study proposes a Net-NeuMF model that generates vectors of students and classes through network analysis and uses them as the input values of the model. The model is based on the structure of NeuMF, a representative model for implicit feedback data that uses one-hot vectors. The input vectors of the model are generated to represent the characteristics of students and classes through network analysis. To generate a vector representing a student, each student is set as a node, and a weighted edge connects two students if they take the same class. Similarly, to generate a vector representing a class, each class is set as a node, and an edge connects two classes if any student has taken both. 
We then apply Node2Vec, a representation learning method that quantifies the characteristics of each node. To evaluate the model, we used four indicators that are commonly used for recommendation systems, and experiments were conducted on three different embedding dimensions to analyze their impact on the model. The results show better performance on the evaluation metrics, regardless of dimension, than the existing NeuMF structure with one-hot vectors. Thus, this work contributes by building networks of students (users) and classes (items) to increase expressiveness over existing one-hot embeddings, by matching the characteristics of each structure that constitutes the model, and by showing better performance on various evaluation metrics compared with existing methodologies.
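The student network described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the sample data, and the choice of "number of shared classes" as the edge weight are assumptions, since the abstract says only that students who take the same class are connected with a weight.

```python
from collections import defaultdict
from itertools import combinations

def co_enrollment_edges(history):
    """history maps a student id to the set of class ids taken.
    Returns {(student_a, student_b): number of shared classes}."""
    by_class = defaultdict(set)
    for student, classes in history.items():
        for class_id in classes:
            by_class[class_id].add(student)
    weights = defaultdict(int)
    for students in by_class.values():
        # Each pair of students in the same class gains one unit of weight.
        for a, b in combinations(sorted(students), 2):
            weights[(a, b)] += 1
    return dict(weights)

# Hypothetical class histories.
history = {"s1": {"DB", "ML"}, "s2": {"DB", "ML", "OS"}, "s3": {"OS"}}
edges = co_enrollment_edges(history)
```

A weighted edge list of this kind is the sort of input a Node2Vec implementation would take to learn one embedding per student; the class network could be built the same way by swapping the roles of students and classes.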
https://doi.org/10.13088/jiis.2021.27.3.095

A Study on the Road Safety Analysis Model: Focused on National Highway Areas in Cheonbuk Province (도로 안전성 분석 모형에 관한 연구: 전라북도 국도 권역을 중심으로)

Lim, Joonbeom;Kim, Joon-Ki;Lee, Soobeom;Kim, Hyunjin
- KSCE Journal of Civil and Environmental Engineering Research
- v.34 no.2
- pp.583-595
- 2014
Currently, Korean transportation policies aim to increase safety and achieve environment-friendly, efficient operation by avoiding the construction and expansion of roads and instead upgrading road alignments and facilities. This is shown by the fact that there are 22 road expansion projects (30%) and 50 road improvement projects (70%) under the 3rd Five-Year Plan for National Highways ('11~'15), whereas there were 53 road expansion projects (71%) and 22 road improvement projects (29%) under the 2nd Five-Year Plan for National Highways. For more effective road improvement projects, projects need to be chosen after an objective and scientific safety assessment of each road, and the safety improvement expected from each project needs to be assessed. This study is intended to develop a model for this road safety analysis and assessment. The major objective of this study is to create a road safety analysis and assessment model appropriate for Korea, based on the HSM (Highway Safety Manual) of the U.S. To build the data for model development, five routes in Cheonbuk province were divided into homogeneous sections, each considered to have identical geometric structure factors, and representative values of geometric structure, facilities, traffic volume, climate conditions, and land use were collected from the resulting 1,452 sections. The collected data were processed, and a correlation analysis of each road element was carried out to see which factors had a large effect on traffic accidents. 
On the basis of these results, an accident model was then established as a negative binomial regression model. Using the developed model, a safety performance function (SPF), which predicts the number of accidents from traffic volume and road section length, and Crash Modification Factors (CMFs), which determine how accident frequency changes with road geometric structure and traffic properties, were extracted.
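The way an SPF and CMFs are typically combined in HSM-style prediction can be sketched as below. The log-linear SPF form, the function names, and all coefficient values are illustrative assumptions; the abstract does not report the fitted model.

```python
from math import exp, log, prod

def spf_crashes(aadt, length_km, intercept, aadt_coef):
    """Baseline expected crashes per year for a section, using a
    log-linear SPF in traffic volume (AADT) scaled by section length.
    The functional form and coefficients are hypothetical."""
    return length_km * exp(intercept + aadt_coef * log(aadt))

def adjusted_crashes(baseline, cmfs):
    """Scale the baseline by each crash modification factor; a CMF
    below 1 lowers the expected crash count, above 1 raises it."""
    return baseline * prod(cmfs)

# Hypothetical section: 12,000 vehicles/day, 2.5 km, made-up coefficients.
baseline = spf_crashes(aadt=12000, length_km=2.5, intercept=-7.0, aadt_coef=0.8)
predicted = adjusted_crashes(baseline, cmfs=[0.90, 1.15])
```

Keeping the SPF and the CMFs separate makes it possible to compare a section before and after a proposed improvement by changing only the relevant factor.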
https://doi.org/10.12652/Ksce.2014.34.2.0583

Predicting Regional Soybean Yield using Crop Growth Simulation Model (작물 생육 모델을 이용한 지역단위 콩 수량 예측)

Ban, Ho-Young;Choi, Doug-Hwan;Ahn, Joong-Bae;Lee, Byun-Woo
- Korean Journal of Remote Sensing
- v.33 no.5_2
- pp.699-708
- 2017
The present study aimed to develop an approach for predicting soybean yield with a crop growth simulation model at the regional level, where detailed and site-specific information on cultivation management practices is not easily accessible as model input. The CROPGRO-Soybean model included in the Decision Support System for Agrotechnology Transfer (DSSAT) was employed for this study, and Illinois, a major soybean production region of the USA, was selected as the study region. As a first step in predicting the soybean yield of Illinois with the CROPGRO-Soybean model, genetic coefficients representative of each soybean maturity group (MG I~VI) were estimated through two years of sowing date experiments using domestic and foreign cultivars of diverse maturity at the Seoul National University Farm ($37.27^{\circ}N$, $126.99^{\circ}E$). The model using the representative genetic coefficients simulated the developmental stages of cultivars within each maturity group fairly well. Soybean yields for grids of $10km{\times}10km$ in Illinois were simulated from 2000 to 2011 with weather data under 18 simulation conditions comprising the combinations of three maturity groups, three seeding dates, and two irrigation regimes. Planting dates and maturity groups were assigned differently to the three sub-regions divided longitudinally. The yearly state yields estimated by averaging all the grid yields simulated under non-irrigated and fully irrigated conditions differed greatly from the statistical yields and did not explain the annual trend of yield increase due to improved cultivation technologies. Using the observed grain yield data of the 9 agricultural districts in Illinois and the district yields estimated from the simulated grid yields under the 18 simulation conditions, a multiple regression model was constructed to estimate soybean yield at the agricultural district level. A year variable was also added to this model to reflect the yearly yield trend. 
This model explained the yearly and district yield variation fairly well, with a coefficient of determination of $R^2=0.61$ (n = 108). Yearly state yields, calculated by weighting the model-estimated yearly average agricultural district yields by the cultivation area of each agricultural district, corresponded very closely ($R^2=0.80$) to the yearly statistical state yields. Furthermore, the model predicted the state yield fairly well for 2012, a year whose data were not used for model construction and in which a severe yield reduction was recorded due to drought.
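The area weighting used to aggregate district yields into a state yield can be sketched as follows. The district names and numbers are hypothetical placeholders, not values from the study.

```python
def area_weighted_yield(district_yields, district_areas):
    """State-level yield as the cultivation-area-weighted mean of
    district yields (both dicts are keyed by district name)."""
    total_area = sum(district_areas[d] for d in district_yields)
    weighted = sum(district_yields[d] * district_areas[d] for d in district_yields)
    return weighted / total_area

# Hypothetical district yields (t/ha) and cultivation areas (kha).
yields = {"north": 3.0, "south": 2.0}
areas = {"north": 100.0, "south": 300.0}
state_yield = area_weighted_yield(yields, areas)
```

Weighting by cultivation area keeps a small district with an unusual yield from shifting the state figure as much as a plain average would.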
https://doi.org/10.7780/kjrs.2017.33.5.2.9

An Analysis of Body Shapes in Aged Abdominal Obese Women for Apparel Pattern Design (복부비만 노년 여성의 의복패턴설계를 위한 체형연구)

Kim, Soo-A;Choi, Hei-Sun
- Journal of the Korean Society of Clothing and Textiles
- v.30 no.12 s.159
- pp.1690-1696
- 2006
The purpose of this study is to provide basic data useful for designing apparel patterns for aged women with abdominal obesity. Body measurements were taken from 318 randomly selected women aged over 60 whose places of activity were colleges, sports centers, or business sites in Seoul and the neighboring districts. A total of 33 items on the upper and lower body were used for the anthropometric measurement and analysis. The collected measurement data were processed statistically with the SPSS 12.0 program using descriptive statistics, t-tests, frequency analysis, and correlation analysis. The results of the study are as follows. 1. Subjects were classified into two groups based on the analysis of the measurement data. Of the total subjects (n=318), 251 women (about 79 percent) had an abdominal obese body type, and the elderly women in this group usually had a large abdomen rather than large hips. The criterion for abdominal obesity was based on the waist-hip ratio, WHR (=0.85). 2. Aged abdominal obese women showed much larger sizes in most body measurements except some vertical length items, including bust point-bust point, front interscye, and back interscye along with the circumference and depth of the armscye, bust, waist, abdomen, and hip, while showing no difference in height, biacromial breadth, hip width, neck shoulder point to breast point, and crotch length. 3. The Vervaeck index (=100.1) and Rohrer index (=1.7) indicated that the abdominal obese women were fat over the whole body. The aspect ratios of the waist (=0.86), abdomen (=0.92), and hip (=0.75) were also high, indicating that the cross sections in those regions were close to circular. 4. The correlation coefficients between hip circumference and the remaining measurement items, and between hip circumference including the abdominal protrusion and the remaining measurement items, showed some differences for each group. 
In the abdominal obese group, the former are smaller than the latter. 5. For abdominal obese women, hip circumference including the abdominal protrusion is more strongly correlated than plain hip circumference with the other items used to make apparel patterns, such as waist circumference and depth of armscye. This result indicates that, unlike for women of common body types, hip circumference including the abdominal protrusion must be considered when making apparel patterns for abdominal obese women.
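The grouping criterion and the cross-section ratio mentioned above can be sketched in a few lines. The 0.85 cutoff comes from the abstract; whether the study treated a value exactly at the cutoff as obese, and the definition of the aspect ratio as depth divided by breadth, are assumptions made here for illustration, as are the function names and sample measurements.

```python
WHR_CUTOFF = 0.85  # criterion value stated in the abstract

def waist_hip_ratio(waist_cm, hip_cm):
    """Waist circumference divided by hip circumference."""
    return waist_cm / hip_cm

def is_abdominally_obese(waist_cm, hip_cm, cutoff=WHR_CUTOFF):
    # Assumption: values at or above the cutoff count as abdominal obesity.
    return waist_hip_ratio(waist_cm, hip_cm) >= cutoff

def cross_section_aspect_ratio(depth_cm, breadth_cm):
    """Assumed definition: depth / breadth, so values near 1 indicate
    a nearly circular cross section."""
    return depth_cm / breadth_cm

# Hypothetical measurements in centimetres.
whr = waist_hip_ratio(88.0, 100.0)
```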

A study on the CRM strategy for medium and small industry of distribution (중소유통업체의 CRM 도입방안에 관한 연구)

Kim, Gi-Pyoung
- Journal of Distribution Science
- v.8 no.3
- pp.37-47
- 2010
CRM refers to operating activities that maintain and promote good relationships with customers in order to ultimately maximize the company's profits: understanding the value of customers to meet their demands, establishing a strategy that maximizes customer Life Time Value, and operating the business successfully by integrating customer management processes. In Korea, many large businesses are taking the initiative in introducing CRM for use in their marketing strategies; however, most small and medium-sized companies either do not understand CRM clearly or find it difficult to introduce because of the large investment required. This study presents a CRM promotion strategy and activity plan suited to small and medium-sized companies by surveying precedents and analyzing the success factors of leading companies that have already implemented CRM, so that distributors can build close relationships with consumers, overcome their weakness in scale, and strengthen their competitiveness in a rapidly changing and fiercely competitive market. There are five stages in building CRM: recognizing the need to establish CRM, establishing an integrated CRM database, establishing customer analysis and a marketing strategy through data mining, putting the customer analysis obtained through data mining to practical use, and implementing response analysis and a closed-loop process. The case study of leading companies shows that CRM is needed in types of business where companies are in constant contact with their customers. To meet customers' needs, these companies actively analyze their customer information. On this basis, they develop their own CRM programs personalized for their customers in order to provide high-quality services and products. For customers who help them make profits, they conduct a VIP marketing strategy to keep those customers from breaking off their relationships with the company. CRM should be executed through continuous management. 
In other words, customer profitability should be maximized through customer segmentation. Maximizing customer profitability is the key to CRM. The success factors of CRM among distributors in Korea are as follows. First, top management must have the will to pursue CS management. Second, a company-wide culture of respect for customers should be created. Third, specialized customer management and CRM workers should be trained. Fourth, CRM behaviors should be developed for all staff members. Fifth, CRM should be carried out through systematic cooperation between related departments. To make use of the case study for CRM, a company should understand its customers and establish customer management programs in order to set the optimal CRM strategy and pursue it continuously according to a long-term plan. For this, customers should be segmented based on collected information and customer data, and a responsive customer system should be designed with a strategy differentiated by customer class. For future CRM, integrated CRM, in which customer information is gathered in one place, is essential. As customers' expectations rise considerably, effective ways to meet those expectations should be pursued. With the rapid improvement of IT, RFID (Radio Frequency Identification) has appeared, making it possible to obtain large amounts of information about products and customers in real time. A strategy for successful CRM promotion should improve the organizations in charge of customer contact, re-plan the customer management processes, and establish a system integrated with the marketing strategy, so as to maintain good relationships with customers according to a long-term plan and a method suited to market conditions, and to run a company-wide program. 
In addition, a CRM program should be continuously improved and complemented to fit the company's characteristics. In particular, a strategy for successful CRM for small and medium-sized distributors should be as follows. First, they should change their existing perception of CRM and take in-depth care of their customers. Second, they should benchmark the CRM techniques of leading companies and identify success points they can use. Third, they should seek the methods best suited to their particular conditions by developing ideas that combine their own strong points with marketing. Fourth, they should develop a CRM model that promotes relationships with individual customers through small but noticeable events, following the precedents of small businesses in Switzerland.

A study on urban heat islands over the metropolitan Seoul area, using satellite images (원격탐사기법에 의한 도시열섬 연구)

;Lee, Hyoun-Young
- Journal of the Korean Geographical Society
- v.40
- pp.1-13
- 1989
The brightness temperature from NOAA AVHRR CH 4 images was examined for the metropolitan Seoul area, the capital city of Korea, to detect the characteristics of the urban heat island. Surface data from 21 meteorological stations were compared with the brightness temperatures. Through computer enhancement techniques, more than 20 heat islands could be recognized in South Korea, with 1 km spatial resolution at a scale of 1:200,000 (Fig. 3, 4 and 6). The results of the analysis of AVHRR CH 4 images over the metropolitan Seoul area can be summarized as follows. (1) The pattern of brightness temperature distribution in the metropolitan Seoul area shows a relatively strong temperature contrast between urban and rural areas. There is some indication of a warm brightness temperature zone characterizing built-up areas, including the CBD, densely populated residential districts, and industrial zones. Cool brightness temperatures are associated with the major hills such as Bukhan-san, Nam-san and Kwanak-san, or with the major water bodies such as the Han-gang and reservoirs. Although the influence of the river and reservoirs is obvious in the brightness temperature, that of small-scale land use features such as parks in the cities is not apparent. (2) One can find a linear relationship between the brightness temperature and air temperature for 10 major cities, where the difference between the two variables is larger in big cities. Though the coefficient value is 0.82, one can infer that the factors of the heat islands cannot be explained by the size of the cities alone. The magnitude of the horizontal brightness temperature difference between urban and rural areas is found to be greater than that of the horizontal air temperature difference in Korea. (3) One can also find high heat island intensity in some smaller cities such as Changwon ($T_{u-r}=9.0^{\circ}$C) and Po-hang ($T_{u-r}=7.1^{\circ}$C). 
The industrial location quotient of Chang-won is the second highest in the country, and that of Po-hang the third. (4) A comparison of the enhanced thermal infrared imagery from 1986 and 1989 with the map at a scale of 1:200,000 for the metropolitan Seoul area shows the extent of possible urbanization changes. In the last three years, the heat islands have extended in area. (5) Although the overall database is small, the data in Fig. 3 suggest that brightness temperature could be utilized for the study of heat island characteristics. Satellite observations are required to study and monitor the impact of urban heat islands on the climate and environment on a global scale. This type of remote sensing provides a means of monitoring the growth of urban and suburban areas and its impact on the environment.

A Study on Trust Transfer in Traditional Fintech of Smart Banking (핀테크 서비스에서 오프라인에서 온라인으로의 신뢰전이에 관한 연구 - 스마트뱅킹을 중심으로 -)

Ai, Di;Kwon, Sun-Dong;Lee, Su-Chul;Ko, Mi-Hyun;Lee, Bo-Hyung
- Management & Information Systems Review
- v.36 no.3
- pp.167-184
- 2017
In this study, we investigated the effect of offline banking trust on smart banking trust. As factors influencing smart banking trust, this study compared offline banking trust with smart banking's system quality and information quality. For the empirical study, 186 questionnaire responses were collected from smart banking users, and the data were analyzed using Smart-PLS 2.0. As a result, trust transfer in FinTech services was verified by the significant effect of offline banking trust on smart banking trust. It was also shown that the effect of offline banking trust on smart banking trust is smaller than that of smart banking itself. The contributions of this study are both academic and industrial. First, the academic contribution: previous studies on banking focused on either offline banking or smart banking, but this study focused on the relationship between offline and online banking and showed that offline banking trust affects smart banking trust. Next, the industrial contribution: this study showed that the offline banking characteristics of traditional commercial banks affect trust in emerging smart banking services. This means that emerging FinTech companies are at a disadvantage relative to traditional commercial banks in the competition to build trust. Unlike traditional commercial banks, emerging FinTech companies are innovating customer convenience by arming themselves with new technologies such as the mobile Internet, social networks, cloud technology, and big data. However, these FinTech strengths alone cannot guarantee the trust needed for financial transactions, because banking customers do not easily change the habits or inertia they have acquired while using traditional banks. 
Therefore, emerging FinTech companies should strive to create disruptive value that reflects their connections with various Internet services and the strength of online interaction such as social services, where they have an advantage in customer contact. Emerging FinTech companies should also strive to build trust in their services, focusing on young people who have low resistance to new services.

Game Theoretic Optimization of Investment Portfolio Considering the Performance of Information Security Countermeasure (정보보호 대책의 성능을 고려한 투자 포트폴리오의 게임 이론적 최적화)

Lee, Sang-Hoon;Kim, Tae-Sung
- Journal of Intelligence and Information Systems
- v.26 no.3
- pp.37-50
- 2020
Information security has become an important issue worldwide. As information and communication technologies such as the Internet of Things, big data, cloud, and artificial intelligence develop, the need for information security is increasing. Although the necessity of information security is growing with the development of information and communication technology, interest in information security investment remains insufficient. In general, the effect of information security investment is difficult to measure, so appropriate investment is not being made, and organizations are decreasing their information security investment. In addition, since the types and specifications of information security measures are diverse, it is difficult to compare and evaluate information security countermeasures objectively, and decision-making methods for information security investment are lacking. For an organization to develop, policies and decisions related to information security are essential, and the effect of information security investment needs to be measured. Therefore, this study proposes a method of constructing an investment portfolio for information security measures using game theory and derives an optimal defence probability. In a two-person game model, the information security manager and the attacker are assumed to be the players, and the information security countermeasures and information security threats are assumed to be their respective strategies. A zero-sum game, in which the sum of the players' payoffs is zero, is assumed, and we derive the solution of a mixed strategy game in which each strategy is selected according to a probability distribution over the strategies. In the real world, various types of information security threats exist, so multiple information security measures should be considered to maintain an appropriate level of information security for information systems. 
We assume that the defence ratio of the information security countermeasures is known, and we derive the optimal solution of the mixed strategy game using linear programming. The contributions of this study are as follows. First, we conduct analysis using real performance data of information security measures. Information security managers of organizations can use the methodology suggested in this study to make practical decisions when establishing investment portfolio for information security countermeasures. Second, the investment weight of information security countermeasures is derived. Since we derive the weight of each information security measure, not just whether or not information security measures have been invested, it is easy to construct an information security investment portfolio in a situation where investment decisions need to be made in consideration of a number of information security countermeasures. Finally, it is possible to find the optimal defence probability after constructing an investment portfolio of information security countermeasures. The information security managers of organizations can measure the specific investment effect by drawing out information security countermeasures that fit the organization's information security investment budget. Also, numerical examples are presented and computational results are analyzed. Based on the performance of various information security countermeasures: Firewall, IPS, and Antivirus, data related to information security measures are collected to construct a portfolio of information security countermeasures. The defence ratio of the information security countermeasures is created using a uniform distribution, and a coverage of performance is derived based on the report of each information security countermeasure. 
In the numerical examples with Firewall, IPS, and Antivirus as the information security countermeasures, the optimal investment weights of Firewall, IPS, and Antivirus are 60.74%, 39.26%, and 0%, respectively. With these weights, the defence probability of the organization is maximized at 83.87%. When the methodology and examples of this study are used in practice, information security managers can consider various types of information security measures, and the appropriate investment level of each measure can be reflected in the organization's budget.
https://doi.org/10.13088/jiis.2020.26.3.037
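
The abstract above describes solving a zero-sum, mixed-strategy game by linear programming to obtain investment weights and a defence probability. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a payoff matrix whose entry for countermeasure i and threat j is the defence ratio, and it solves the small linear program exactly by enumerating its vertices rather than calling an LP library. The function names and the 2x2 matrix are hypothetical illustrations and are not data from the study.

```python
from fractions import Fraction
from itertools import combinations


def solve_linear(rows, rhs):
    """Solve a square linear system exactly (Gauss-Jordan); None if singular."""
    n = len(rows)
    aug = [list(map(Fraction, r)) + [Fraction(b)] for r, b in zip(rows, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [a / p for a in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


def optimal_defence(payoff):
    """Maximize v subject to sum_i x_i * payoff[i][j] >= v for every threat j,
    sum_i x_i = 1 and x_i >= 0. Returns (weights x, game value v)."""
    m, n = len(payoff), len(payoff[0])
    # Unknowns are (x_1, ..., x_m, v); every inequality reads coeffs . z >= 0.
    ineqs = [[payoff[i][j] for i in range(m)] + [-1] for j in range(n)]
    ineqs += [[1 if k == i else 0 for k in range(m)] + [0] for i in range(m)]
    eq = [1] * m + [0]  # the weights sum to one
    best = None
    # A vertex makes m inequalities tight in addition to the equality.
    for tight in combinations(range(len(ineqs)), m):
        z = solve_linear([eq] + [ineqs[t] for t in tight], [1] + [0] * m)
        if z is None:
            continue
        feasible = all(sum(c * zi for c, zi in zip(row, z)) >= 0 for row in ineqs)
        if feasible and (best is None or z[-1] > best[-1]):
            best = z
    return best[:-1], best[-1]


# Hypothetical defence ratios: rows are countermeasures, columns are threats.
payoff = [
    [Fraction(9, 10), Fraction(3, 10)],
    [Fraction(4, 10), Fraction(8, 10)],
]
weights, value = optimal_defence(payoff)
```

For this toy matrix the sketch yields investment weights of 2/5 and 3/5 and a guaranteed defence probability of 3/5. Vertex enumeration is only practical for a handful of strategies; a larger portfolio would normally be handed to a dedicated LP solver.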

Korean Word Sense Disambiguation using Dictionary and Corpus (사전과 말뭉치를 이용한 한국어 단어 중의성 해소)

Jeong, Hanjo;Park, Byeonghwa
- Journal of Intelligence and Information Systems
- /
- v.21 no.1
- /
- pp.1-13
- /
- 2015
As opinion mining in big data applications has gained attention, a great deal of research on unstructured data has been conducted. Social media on the Internet generate unstructured or semi-structured data every second, much of it written in the natural languages we use in daily life. Many words in human languages have multiple meanings or senses. As a result, it is very difficult for computers to extract useful information from these datasets. Traditional web search engines are usually based on keyword search, which produces incorrect results that are far from users' intentions. Even though much progress has been made in recent years in enhancing the performance of search engines to provide users with appropriate results, there is still much room for improvement. Word sense disambiguation can play a very important role in natural language processing and is considered one of the most difficult problems in this area. Major approaches to word sense disambiguation can be classified as knowledge-based, supervised corpus-based, and unsupervised corpus-based approaches. This paper presents a method that automatically generates a corpus for word sense disambiguation by taking advantage of examples in existing dictionaries, avoiding expensive sense-tagging processes. It evaluates the effectiveness of the method with a Naïve Bayes model, a supervised learning algorithm, using the Korean standard unabridged dictionary and the Sejong Corpus. The Korean standard unabridged dictionary has approximately 57,000 sentences. The Sejong Corpus has about 790,000 sentences tagged with both part-of-speech and senses. In the experiment, the Korean standard unabridged dictionary and the Sejong Corpus were tested both in combination and separately using cross-validation. Only nouns, the targets of word sense disambiguation, were selected. 
In addition, 93,522 word senses among 265,655 nouns and 56,914 sentences from related proverbs and examples were combined into the corpus. The Sejong Corpus was easily merged with the Korean standard unabridged dictionary because the Sejong Corpus was tagged with the sense indices defined by that dictionary. Sense vectors were formed after the merged corpus was created. Terms used in creating the sense vectors were added to the named entity dictionary of a Korean morphological analyzer. Using the extended named entity dictionary, terms were extracted from the input sentences and term vectors for the sentences were created. Given the extracted term vector and the sense vector model built during the pre-processing stage, the sense-tagged terms were determined by vector space model based word sense disambiguation. In addition, this study shows the effectiveness of the corpus merged from the examples in the Korean standard unabridged dictionary and the Sejong Corpus. The experiment shows that better precision and recall are obtained with the merged corpus. This study suggests that the method can practically enhance the performance of internet search engines and help determine the meaning of a sentence more accurately in natural language processing tasks related to search engines, opinion mining, and text mining. The Naïve Bayes classifier used in this study is a supervised learning algorithm based on Bayes' theorem. The Naïve Bayes classifier assumes that all senses are independent. Even though this assumption is not realistic and ignores the correlation between attributes, the Naïve Bayes classifier is widely used because of its simplicity, and in practice it is known to be very effective in many applications such as text classification and medical diagnosis. However, further research needs to be carried out to consider all possible combinations and/or partial combinations of all senses in a sentence. 
Also, the effectiveness of word sense disambiguation may be improved if rhetorical structures or morphological dependencies between words are analyzed through syntactic analysis.
https://doi.org/10.13088/jiis.2015.21.1.01
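
The abstract above trains a Naïve Bayes classifier on sense-tagged example sentences to choose the sense of an ambiguous noun. The following is a minimal sketch of that general technique, not the authors' system. It assumes bag-of-words context features with add-one (Laplace) smoothing, and it uses a hypothetical English toy corpus for the word "bank" in place of the Korean dictionary examples and the Sejong Corpus. The function names and sense labels are illustrative only.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count senses and context words from (sense, context_words) pairs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sense, words in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab


def disambiguate(model, context):
    """Return the sense maximizing log P(sense) + sum of log P(word | sense)."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, count in sense_counts.items():
        score = math.log(count / total)
        # Add-one smoothing keeps unseen (word, sense) pairs from zeroing the score.
        denom = sum(word_counts[sense].values()) + len(vocab)
        for word in context:
            if word in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[sense][word] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Hypothetical sense-tagged examples, standing in for dictionary example sentences.
examples = [
    ("bank/finance", ["deposit", "money", "account"]),
    ("bank/finance", ["loan", "interest", "money"]),
    ("bank/river", ["river", "water", "fishing"]),
    ("bank/river", ["muddy", "river", "shore"]),
]
model = train(examples)
finance_guess = disambiguate(model, ["money", "loan"])
river_guess = disambiguate(model, ["river", "shore"])
```

With this toy data, a context containing "money" and "loan" is assigned the finance sense and a context containing "river" and "shore" is assigned the river sense. Working in log space avoids numerical underflow when contexts are long.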

