• Title/Summary/Keyword: Classification Database

Salient Region Detection Algorithm for Music Video Browsing (뮤직비디오 브라우징을 위한 중요 구간 검출 알고리즘)

  • Kim, Hyoung-Gook;Shin, Dong
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.2
    • /
    • pp.112-118
    • /
    • 2009
  • This paper proposes a fast algorithm for detecting the salient region of a music video, intended for browsing systems on mobile devices and digital video recorders (DVRs). The input music video is decomposed into its music and video tracks. For the music track, the music highlight, including the chorus, is detected through structure analysis using energy-based peak position detection. Using emotional models generated by an SVM-AdaBoost learning algorithm, the music signal is automatically classified into one of several predefined emotional classes. For the video track, face scenes containing the singer or an actor/actress are detected with a boosted cascade of simple features. Finally, the salient region is generated by aligning the boundaries of the music highlight and the visual face scene. Users first select their favorite music videos on the mobile device or DVR, guided by each video's emotion information, and can then quickly browse the 30-second salient region produced by the proposed algorithm. A mean opinion score (MOS) test with a database of 200 music videos was conducted to compare the detected salient region with the predefined manual part. The MOS results show that the salient region detected by the proposed method performed much better than the predefined manual part chosen without audiovisual processing.
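
The energy-based highlight step in this abstract can be illustrated with a much simpler stand-in. The sketch below is not the paper's structure analysis: it only computes per-frame energy and picks the contiguous window with the highest total energy. The function names, frame size, and toy signal are all hypothetical.

```python
def frame_energies(samples, frame_size):
    """Sum of squared samples for each non-overlapping frame."""
    return [sum(s * s for s in samples[i:i + frame_size])
            for i in range(0, len(samples) - frame_size + 1, frame_size)]

def highlight_window(energies, window_frames):
    """Start index of the contiguous window with the highest total energy."""
    current = sum(energies[:window_frames])
    best_start, best_sum = 0, current
    for start in range(1, len(energies) - window_frames + 1):
        # Slide the window by one frame: add the new frame, drop the old one.
        current += energies[start + window_frames - 1] - energies[start - 1]
        if current > best_sum:
            best_start, best_sum = start, current
    return best_start

# Toy signal (assumption): quiet, then a loud passage, then quiet again.
samples = [0.1] * 8 + [1.0] * 4 + [0.1] * 4
energies = frame_energies(samples, frame_size=2)
start = highlight_window(energies, window_frames=2)
print(start)
```

In a real system the window length would correspond to the 30-second browsing segment mentioned in the abstract, and the result would still have to be aligned with the detected face scenes.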

Suitability Classes for Italian Ryegrass (Lolium multiflorum Lam.) Using Soil and Climate Digital Database in Gangwon Province (강원도에서 토양과 기후 데이터베이스를 이용한 이탈리안 라이그라스의 재배 적지 구분)

  • Kim, Kyung-Dae;Sung, Kyung-Il;Jung, Yeong-Sang;Lee, Hyun-Il;Kim, Eun-Jeong;Nejad, Jalil Ghassemi;Jo, Mu-Hwan;Lim, Young-Chul
    • Journal of The Korean Society of Grassland and Forage Science
    • /
    • v.32 no.4
    • /
    • pp.437-446
    • /
    • 2012
  • As part of establishing a suitability classification for forage production, the national soil and climate databases were applied to Italian ryegrass (Lolium multiflorum Lam., IRG) in Gangwon Province. The soil database was from Heugtoram of the National Academy of Agricultural Science, and the climate database was from the National Center for Agro-Meteorology. Soil physical properties (soil texture, drainage, slope, available depth, and surface rock content) and soil chemical properties (acidity, salinity, and organic matter content) were selected as soil factors, and criteria and weighting factors were scored for each. The climate factors selected were average daily minimum temperature, average temperature from March to May, the number of days with an average temperature above $5^{\circ}C$ from September to December, and the number of days and amount of precipitation from October to May of the following year; criteria and weighting factors were scored for these as well. Electronic maps were developed from these scores using the national soil and climate databases. Based on soil scores, gently sloping areas of Goseong, Sogcho, Gangreung, and Samcheog in the east coastal region were classified as possible and/or proper areas for IRG cultivation in Gangwon Province. Lands with gentle or moderate slope in Cheolwon, Yanggu, Chuncheon, Hweongseong, Pyungchang, and Jeongsun on the western slope of the Taebaeg mountains were likewise classified as possible and/or proper areas. Based on climate scores, the east coastal areas of Goseong, Sogcho, Yangyang, Gangreung, and Samcheog could be classified as possible or proper areas. Most areas west of the Taebaeg mountains were classified as not suitable for IRG production. In scattered areas of Chuncheon and Weonju where the scores exceeded 60, IRG cultivation should be carefully managed for good production.
For better application of electronic maps.
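
The scoring of criteria and weighting factors described in this abstract can be sketched as a weighted sum. This is a minimal illustration only: the factor names, weights, and per-factor scores are hypothetical, not the paper's criteria. The value 60 comes from the abstract's mention of scores exceeding 60, but using it as a class cut-off is an assumption.

```python
# Hypothetical weights (they sum to 1.0); the abstract does not list the real ones.
WEIGHTS = {"texture": 0.2, "drainage": 0.3, "slope": 0.3, "acidity": 0.2}

def suitability_score(factor_scores, weights=WEIGHTS):
    """Weighted sum of per-factor scores, each on a 0-100 scale."""
    return sum(weights[name] * factor_scores[name] for name in weights)

def classify(score, threshold=60.0):
    """Assumed two-class rule: at or above the threshold counts as 'possible'."""
    return "possible" if score >= threshold else "not suitable"

# One hypothetical map cell with made-up factor scores.
cell = {"texture": 80, "drainage": 70, "slope": 50, "acidity": 60}
score = suitability_score(cell)
print(round(score, 1), classify(score))
```

Applying the same function to every cell of a soil or climate grid would yield the kind of score map the abstract calls an electronic map.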

Response Modeling for the Marketing Promotion with Weighted Case Based Reasoning Under Imbalanced Data Distribution (불균형 데이터 환경에서 변수가중치를 적용한 사례기반추론 기반의 고객반응 예측)

  • Kim, Eunmi;Hong, Taeho
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.1
    • /
    • pp.29-45
    • /
    • 2015
  • Response modeling is a well-known research topic aimed at better predicting customers' responses to marketing promotions. A response model reduces marketing cost by identifying prospective customers in a very large customer database and predicting their purchase intention, whereas a promotion derived from an undifferentiated marketing strategy incurs unnecessary cost. The big data environment has also accelerated the development of response models with data mining techniques such as case-based reasoning (CBR), neural networks, and support vector machines. CBR is one of the major tools in business because it is simple and robust to apply to response modeling. It remains an attractive technique for business data mining even though it has not shown high performance compared with other machine learning techniques. Many studies have therefore tried to improve CBR for business data mining with enhanced algorithms or with the support of other techniques such as genetic algorithms, decision trees, and AHP (Analytic Hierarchy Process). Ahn and Kim (2008) used logit, neural networks, and CBR to predict which customers would purchase the items promoted by a marketing department, and optimized the number of neighbors k for k-nearest neighbor with a genetic algorithm to improve the performance of the integrated model. Hong and Park (2009) noted that an approach integrating CBR with logit, neural networks, and support vector machines (SVMs) predicted customer response to marketing promotions better than the individual logit, neural network, and SVM models. This paper presents an approach to predicting customers' response to marketing promotions with case-based reasoning. The proposed model was developed by applying a different weight to each feature.
We fitted a logit model to a database containing promotion and purchase data for bath soap, and then used its coefficients as the feature weights for CBR. We empirically compared the proposed weighted CBR model with neural networks and a pure CBR model and found that the weighted CBR model outperformed the pure CBR model. Imbalanced data is a common problem when building data mining models for real-world classification tasks such as bankruptcy prediction, intrusion detection, fraud detection, churn management, and response modeling. Imbalanced data means that the number of instances in one class is remarkably small or large compared with the number of instances in the other classes. A classification model such as a response model has trouble learning patterns from such data because it tends to ignore the minority class while classifying the majority class correctly. Sampling, which can be categorized into undersampling and oversampling, is one of the most representative approaches to the problems caused by an imbalanced data distribution. CBR, however, is not sensitive to data distribution because, unlike machine learning algorithms, it does not learn from data. In this study, we investigated the robustness of the proposed model while changing the ratio of response customers to nonresponse customers, because in the real world the customers who respond to a promotion are always a small fraction of those who do not. We simulated the proposed model 100 times with different ratios of response customers to nonresponse customers to validate its robustness under imbalanced data distributions. Finally, we found that the proposed CBR-based model outperformed the compared models on the imbalanced data sets.
Our study is expected to improve the performance of response models for promotion programs with CBR under real-world imbalanced data distributions.
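
The idea of weighting CBR features with logit coefficients can be sketched as a weighted nearest-neighbor retrieval. The sketch rests on assumptions: the coefficients and case base are invented, taking the absolute value of each coefficient as its weight is one plausible reading of the abstract, and majority voting over k cases is a common CBR choice rather than the paper's confirmed procedure.

```python
import math

def weighted_distance(a, b, weights):
    """Euclidean distance with a per-feature weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def predict_response(query, cases, weights, k=3):
    """Retrieve the k most similar past cases and return the majority label (1 = responded)."""
    nearest = sorted(cases, key=lambda c: weighted_distance(query, c[0], weights))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

# Hypothetical logit coefficients for two features; absolute values serve as weights.
logit_coefficients = [1.8, -0.2]
weights = [abs(c) for c in logit_coefficients]

# Toy case base: (feature vector, response label).
cases = [([0.9, 0.1], 1), ([0.8, 0.9], 1), ([0.7, 0.5], 1),
         ([0.1, 0.2], 0), ([0.2, 0.8], 0), ([0.3, 0.4], 0)]

print(predict_response([0.75, 0.2], cases, weights))
```

Because retrieval only compares a query with stored cases, changing the share of responders in the case base changes which neighbors are available but requires no retraining. This is the property the abstract relies on when it varies the class ratio.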

Classification of Cultivation Region for Soybean (Glycine max [L.]) in South Korea Based on 30 Years of Weather Indices (평년기상을 활용한 우리나라의 콩 재배지역 구분)

  • Dong-Kyung Yoon;Jaesung Park;Jinhee Seo;Okjae Won;Man-Soo Choi;Hyeon Su Lee;Chaewon Lee
    • KOREAN JOURNAL OF CROP SCIENCE
    • /
    • v.69 no.1
    • /
    • pp.49-60
    • /
    • 2024
  • A region can be divided into cultivation zones based on homogeneity in the weather variables that have the greatest influence on crop growth and yield. As a preliminary step toward classifying the agroclimatic zones of soybean, this study classified soybean cultivation zones using weather indices. Meteorological factors affecting soybeans were determined through correlation analysis over a 10-year period (2013 to 2022) using data for the Miryang and Suwon regions from the soybean yield trial database of the Rural Development Administration, Korea and the meteorological database of the Korea Meteorological Administration. Growth characteristics were highly correlated with minimum temperature, daily temperature range, and precipitation during the vegetative growth stages, and yield components were highly correlated with maximum temperature, daily temperature range, and precipitation during the reproductive growth stages. k-means clustering divided the soybean cultivation area into three zones. Zone 1 was the central inland region and southern Gyeonggi-do; Zone 2 was the southern part of the west coast, the southern part of the east coast, and the South Sea; and Zone 3 included parts of eastern Gyeonggi-do, Gangwon-do, and areas with high altitudes. Zone 1, which has a wide latitude range, was further subdivided into three cultivation zones. The results of this study may provide useful information for estimating agrometeorological characteristics and predicting the success of soybean cultivation in South Korea.
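
The zoning step above relies on k-means clustering. A minimal, dependency-free sketch of Lloyd's algorithm follows. The two-dimensional weather indices, the choice of initial centroids, and the fixed iteration count are illustrative assumptions, not the study's data or settings.

```python
def nearest_index(point, centroids):
    """Index of the centroid closest to the point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])))

def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm: assign points to the nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_index(p, centroids)].append(p)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    labels = [nearest_index(p, centroids) for p in points]
    return labels, centroids

# Hypothetical sites described by (mean temperature, precipitation index).
points = [(12.0, 11.0), (12.5, 11.5), (14.5, 14.0),
          (14.8, 13.6), (9.0, 12.5), (9.4, 12.9)]
initial = [points[0], points[2], points[4]]  # deterministic start, k = 3
labels, centroids = kmeans(points, initial)
print(labels)
```

In practice the indices would usually be standardized before clustering so that no single variable dominates the distance.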

Development of Sentiment Analysis Model for the hot topic detection of online stock forums (온라인 주식 포럼의 핫토픽 탐지를 위한 감성분석 모형의 개발)

  • Hong, Taeho;Lee, Taewon;Li, Jingjing
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.1
    • /
    • pp.187-204
    • /
    • 2016
  • Document classification based on emotional polarity has emerged as a welcome task owing to the explosion of data on the Web. In the big data age, there are too many information sources to consult when making decisions. For example, when considering travel to a city, a person may search for reviews through a search engine such as Google or on social networking services (SNSs) such as blogs, Twitter, and Facebook. The emotional polarity of positive and negative reviews helps a user decide whether to make the trip. Sentiment analysis of customer reviews has become an important research topic as data mining technology is widely adopted for text mining of the Web. Sentiment analysis classifies documents through machine learning techniques such as decision trees, neural networks, and support vector machines (SVMs), and is used to determine the attitude, position, and sensibility of people who write articles about various topics published on the Web. Regardless of their polarity, emotional reviews are very helpful material for analyzing customers' opinions. Sentiment analysis helps reveal what customers really want almost instantly with the help of automated text mining techniques. It applies text mining to text on the Web to extract subjective information and to determine the attitude or position of the person who wrote an article and expressed an opinion about a particular topic. In this study, we developed a model that selects hot topics from user posts on China's online stock forum by using the k-means algorithm and a self-organizing map (SOM). In addition, we developed a detection model to predict hot topics by using machine learning techniques such as logit, decision trees, and SVM.
We employed sentiment analysis to develop our model for the selection and detection of hot topics from China's online stock forum. The sentiment analysis calculates a sentiment value for a document by comparing it against, and classifying it according to, a polarity sentiment dictionary (positive or negative). The online stock forum was an attractive site because of its information about stock investment. Users post numerous texts about stock movements, analyzing the market in light of government policy announcements, market reports, reports from research institutes on the economy, and even rumors. We divided the online forum's topics into 21 categories for sentiment analysis. One hundred forty-four topics were selected from the 21 categories of the online stock forums. The posts were crawled to build a positive and negative text database. We ultimately obtained 21,141 posts on 88 topics by preprocessing the text from March 2013 to February 2015. An interest index was defined to select the hot topics, and the k-means algorithm and SOM produced equivalent results on this data. We developed decision tree models to detect hot topics with three algorithms: CHAID, CART, and C4.5. The results of CHAID were subpar compared with the others. We also employed SVM to detect the hot topics from negative data. The SVM models were trained with the radial basis function (RBF) kernel, with parameters chosen by grid search. Detecting hot topics with sentiment analysis gives investors the latest trends and hot topics in the stock forum so that they no longer need to search the vast amounts of information on the Web. Our proposed model is also helpful for rapidly determining customers' signals or attitudes towards government policy and firms' products and services.
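
The dictionary-based sentiment value mentioned above can be sketched as a polarity ratio. The English word lists and the scoring formula below are hypothetical stand-ins. The study worked with Chinese forum posts and its own polarity dictionary, neither of which is reproduced here.

```python
# Hypothetical polarity dictionary (assumption, for illustration only).
POSITIVE = {"rise", "gain", "bullish", "profit"}
NEGATIVE = {"fall", "loss", "bearish", "risk"}

def sentiment_value(text):
    """Score in [-1, 1]: (positive hits - negative hits) / total hits, or 0.0 if none."""
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

posts = ["bullish gain expected despite risk", "heavy loss and more risk ahead"]
for post in posts:
    print(post, "->", sentiment_value(post))
```

Scores like these could then be aggregated per topic and passed to a clustering or classification step, as the abstract describes for hot topic selection and detection.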

Natural Language Processing Model for Data Visualization Interaction in Chatbot Environment (챗봇 환경에서 데이터 시각화 인터랙션을 위한 자연어처리 모델)

  • Oh, Sang Heon;Hur, Su Jin;Kim, Sung-Hee
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.9 no.11
    • /
    • pp.281-290
    • /
    • 2020
  • With the spread of smartphones, services that use personalized data are increasing. Healthcare-related services in particular deal with a variety of data and use data visualization techniques to present it effectively. As visualization techniques are adopted, interaction with visualizations naturally gains importance. In a PC environment, interaction with data visualizations is performed with a mouse, so various kinds of data filtering are available. In a mobile environment, by contrast, the screen is small and it is difficult to tell whether interaction is possible, so only the limited visualizations offered by the app are available through button touches. To overcome this limitation, we aim to enable data visualization interactions through conversations with a chatbot so that users can examine their individual data through various visualizations. This requires converting the user's natural language request into a database query and retrieving the result data from the database in which the data are periodically stored. Many studies address converting natural language into queries, but converting user requests into queries in the context of a visualization has not yet been studied. This paper therefore focuses on query generation in a situation where the data visualization technique has been determined in advance. The supported interactions are filtering on x-axis values and comparison between two groups. The test scenario used step-count data: filtering on the x-axis period was shown as a bar graph, and the comparison between two groups was shown as a line graph. To develop a natural language processing model that can handle information requests made through the visualization, about 15,800 training examples were collected through a survey of 1,000 people.
The performance evaluation showed about 89% accuracy for the classification model and about 99% accuracy for the query generation model.
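
The two supported interactions, period filtering and two-group comparison, can be sketched as a tiny request-to-query pipeline. Everything here is an assumption made for illustration: the keyword rule stands in for the paper's trained classification model, and the `steps` table, its columns, and the SQL templates are hypothetical.

```python
import sqlite3

def classify_intent(utterance):
    """Toy keyword rule standing in for a trained intent classifier."""
    return "compare" if "compare" in utterance.lower() else "filter"

def build_query(intent):
    """Return a parameterized SQL template for the given intent."""
    if intent == "filter":
        return ("SELECT day, step_count FROM steps "
                "WHERE day BETWEEN ? AND ? ORDER BY day")
    return ("SELECT grp, AVG(step_count) FROM steps "
            "WHERE day BETWEEN ? AND ? GROUP BY grp ORDER BY grp")

# In-memory database with made-up step counts for two groups.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE steps (day TEXT, grp TEXT, step_count INTEGER)")
conn.executemany("INSERT INTO steps VALUES (?, ?, ?)", [
    ("2020-01-01", "A", 4000), ("2020-01-02", "A", 6000),
    ("2020-01-01", "B", 8000), ("2020-01-03", "B", 9000),
])

sql = build_query(classify_intent("Compare the two groups for early January"))
rows = conn.execute(sql, ("2020-01-01", "2020-01-02")).fetchall()
print(rows)
```

Keeping the date range as bound parameters rather than splicing it into the SQL string avoids injection problems when the values come from free-form user text.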

Review of Research Trends and Evaluation Tools for Clinical Studies of Neck Pain and Cervical Spondylosis : Using the Pubmed Database (Pubmed분석을 통한 경추통과 경추 척추증의 임상연구 최신동향 및 평가도구에 관한 고찰)

  • Kim, Myung Kwan;Kim, Young-Il;Kim, Eun Seok;Jung, In Chul;Park, Yang-Chun;Jeon, Ju Hyun
    • Journal of Physiology & Pathology in Korean Medicine
    • /
    • v.32 no.4
    • /
    • pp.232-246
    • /
    • 2018
  • The purpose of this study is to contribute to clinical research on neck pain and cervical spondylosis by reviewing the latest research trends and evaluation tools through an analysis of clinical studies published over the last 5 years. The review targeted 70 papers that satisfied the selection conditions among the RCT papers retrieved from Pubmed (https://www.ncbi.nlm.nih.gov/pubmed) with the search terms "neck pain" or "cervical spondylosis" from March 2011 to February 2016. Papers were numbered in order of publication date and analyzed by classifying their contents into 1) pain classification, 2) treatment type, 3) treatment duration, 4) treatment time, 5) number of participants, 6) evaluation tools and methods of research, and 7) evaluation duration. By pain classification, 55 papers targeted chronic neck pain, 6 targeted acute and subacute neck pain, and 2 targeted subacute and chronic neck pain. By intervention, 43 papers studied physical therapy, 3 acupuncture, 1 herbal fomentation, and 5 medication, and 18 were multilateral comparisons of the efficacy of various interventions. As for the research period, 50 papers were based on the treatment period, 16 on the number of treatments, and 4 on periods that differed by group. As for treatment duration, cases of 1 month or more but less than 3 months were the most common, followed by cases of less than 1 month and cases of 3 months or more but less than 6 months.
As for treatment frequency, the treatment group and the control group received the same number of treatments in 51 papers. Acupuncture, manual therapy, and injection therapy were common among studies with one or two treatments; physical therapy and electroacupuncture corresponded mainly to studies with 3 or more but fewer than 10 treatments; and retrospective observation and exercise programs corresponded mainly to studies with more than 30 treatments. As for the number of subjects, studies with 50 or more but fewer than 100 were the most common, followed by those with 20 or more but fewer than 50. Seven evaluation tools were cited 10 times or more: VAS, NRS, PPT, NDI, NPQ, CROM, and SF-36. As for the evaluation period, 37 papers evaluated only during the treatment period, and 33 papers conducted follow-up. Among follow-up periods, less than 3 months was the most common, followed by 6 months or more but less than 1 year, and 3 months or more but less than 6 months. Future clinical research on cervical pain will need to consider appropriate intervention methods, the frequency and duration of treatment, the follow-up period, an appropriate number of subjects, and the selection of evaluation tools for objective validity. In addition, randomization, double blinding, and similar design features will have to be considered for research with a high level of evidence.

Hydrogeochemistry and Statistical Analysis of Water Quality for Small Potable Water Supply System in Nonsan Area (논산지역 마을상수도 수질의 수리지화학 및 통계 분석)

  • Ko, Kyung-Seok;Ahn, Joo-Sung;Suk, Hee-Jun;Lee, Jin-Soo;Kim, Hyeong-Soo
    • Journal of Soil and Groundwater Environment
    • /
    • v.13 no.6
    • /
    • pp.72-84
    • /
    • 2008
  • This study was carried out to provide proper management plans for small potable water supply systems in the Nonsan area through water quality monitoring, hydrogeochemical investigation, and multivariate statistical analyses. The Nonsan area is a typical rural area that depends heavily on small water supply systems for potable use. The geology of the area is dominated by granite, along with metasedimentary rocks, gneiss, and volcanic rocks. Monitoring of the small potable water supply systems showed that 13-21% of groundwaters exceeded the groundwater standard for drinking water, which is 5 to 8 times higher than the nationwide survey result (2.5% on average). The major components exceeding the standard limits are nitrate-nitrogen, turbidity, total coliform, bacteria, fluoride, and arsenic. High nitrate contamination observed in the southern and northern parts of the study area seems to be caused by cultivation practices such as greenhouses. Although Ca and $HCO_3$ are the dominant species in groundwater, concentrations of Na, Cl, and $NO_3$ have increased in the granitic area, indicating anthropogenic contamination. Principal components analysis divides the groundwaters into 2 groups, granite and metasedimentary rock/gneiss areas, with the second principal component representing anthropogenic pollution from cultivation and residence. Discriminant analysis, with an error of 5.56% between initial classification and prediction on geology, explains the geochemical characteristics of groundwaters by geology more clearly than principal components analysis. Based on these results, multivariate statistical analysis can be considered an effective method for analyzing the integrated hydrogeochemical characteristics and clearly discriminating variations in groundwater quality.
The results for the small potable water supply systems in the study area showed that groundwater chemistry is determined by the mixed influence of land use, soil properties, and topography, which are controlled by geology. For central and local governments to properly control and manage small water supply systems, it is recommended to construct a comprehensive database system for the groundwater environment, including geology, land use, and topography.

Study on Discharge Characteristics of Water Pollutants among Industrial Wastewater per Industrial Classification and the Probability Evaluation (업종별 산업폐수중 수질오염물질 배출 특성 및 개연성 평가 연구)

  • Ahn, Tae-ung;Kim, Won-ky;Son, Dae-hee;Yeom, Ick-tae;Kim, Jae-hoon;Yu, Soon-ju
    • Journal of Korean Society of Environmental Engineers
    • /
    • v.38 no.1
    • /
    • pp.14-24
    • /
    • 2016
  • Information on the pollutants in industrial wastewater discharge is essential not only for specifying the key pollutants to be managed in the permitting process but also for the dischargers' design of treatment facilities. In this study, wastewater quality analysis, including the specified hazardous water pollutants, was conducted for three industrial categories. General descriptions of wastewater occurrence, major sources, and treatment facilities were also investigated to obtain an integrated database of pollutant inventories for the industrial categories. In addition, based on the analysis of raw wastewater and final effluent, the detected pollutant items were confirmed by analyzing their presence in raw or supplementary materials, their potential formation as byproducts, and their possible inclusion as impurities. The three industrial categories are petrochemical basic compounds, basic organic compounds, and thermal power generation. Petrochemical basic compound manufacturing facilities emit 31 water pollutants, including 16 specified hazardous water pollutants. Basic organic compound manufacturing facilities discharge 30 pollutants, including 14 specified hazardous water pollutants. Thermal power generation facilities emit 20 pollutants, 8 of which are specified hazardous water pollutants. These substances were finalized as the emission inventories of water pollutants through the probability evaluation. The compounds detected for each category were screened by investigating the possible causes of their occurrence and confirmed as the final water pollutant inventories.

A Study on a Type of Regeneration Project on Old Industrial Complex (노후산업단지 재생사업 추진 유형에 관한 연구)

  • Kim, Joo-hoon;Byun, Byung-seol
    • Journal of the Economic Geographical Society of Korea
    • /
    • v.21 no.2
    • /
    • pp.192-211
    • /
    • 2018
  • In view of the significant influence of old industrial complexes, the Ministry of Land, Infrastructure and Transport chose 4 districts for the first pilot project in September 2009. In December 2014, the second pilot project districts were established. In addition, there were 10 districts in April 2016 and 5 districts in April 2016 as the third pilot project and 5 districts in March 2017 as the fourth pilot project. To promote the smooth operation of regeneration projects, the effective area designation and special system stipulated in Article 39.12-13 of the Industrial Location and Development Act, revised in May 2015, were introduced. The effective area is a method intended to promote the spread of regeneration projects by making their progress visible, pursuing each project effectively and in consideration of the geographical features of the region and its industry group. Because an unreasonably set effective area may delay a regeneration project and give rise to many problems, planning criteria and classifications and an objective promotion method suited to the individual characteristics of each aged industrial complex are needed. Data Envelopment Analysis (DEA) and a database of old industrial complexes were constructed and used to classify the types of regeneration projects. This study therefore seeks to strengthen the competitiveness of aged industrial complexes by examining the correlation between the diagnosis of 83 aged industrial complex sites, the regeneration projects supported by the Ministry of Land, and the types of project promotion for aged industrial complexes. The results can be used as a guideline for assessing the feasibility of such projects.