• Title/Summary/Keyword: Weather classification


Sentiment Analysis of Movie Review Using Integrated CNN-LSTM Model (CNN-LSTM 조합모델을 이용한 영화리뷰 감성분석)

  • Park, Ho-yeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems / v.25 no.4 / pp.141-154 / 2019
  • Internet technology and social media are growing rapidly, and data mining technology has evolved to handle unstructured document representations in a variety of applications. Sentiment analysis is an important technology that can distinguish low-quality from high-quality content using text data about products, and it has proliferated within text mining. Sentiment analysis mainly analyzes people's opinions in text data by assigning them to predefined categories such as positive and negative. It has been studied from various directions in terms of accuracy, from simple rule-based approaches to dictionary-based approaches using predefined labels. Sentiment analysis is one of the most active research areas in natural language processing and is widely studied in text mining. Because real online reviews are openly available to others, information is easy to collect, and the reviews also affect business. In marketing, real-world information from customers is gathered from websites rather than surveys. Whether a website's posts are positive or negative is reflected in sales, so companies try to identify this information. However, many reviews on a website are not always good and are difficult to identify. Earlier studies in this area used review data from the Amazon.com shopping mall, whereas recent studies use data on stock market trends, blogs, news articles, weather forecasts, IMDB, and Facebook. However, accuracy remains limited because sentiment calculations change according to the subject, paragraph, sentiment lexicon direction, and sentence strength. This study aims to classify the polarity of sentiment into positive and negative categories and to increase prediction accuracy using the pretrained IMDB review data set.
First, popular machine learning algorithms for sentiment-related text classification, such as NB (naive Bayes), SVM (support vector machines), XGBoost, RF (random forests), and Gradient Boost, are adopted as comparative models. Second, deep learning has demonstrated the ability to extract complex, discriminative features from data. Representative algorithms are CNN (convolutional neural networks), RNN (recurrent neural networks), and LSTM (long short-term memory). CNN can be used similarly to BoW when processing a sentence in vector format, but it does not consider the sequential attributes of data. RNN handles ordered data well because it takes the time information of the data into account, but it suffers from the long-term dependency problem. LSTM is used to solve this problem. For comparison, CNN and LSTM were chosen as simple deep learning models. In addition to the classical machine learning algorithms, CNN, LSTM, and the integrated models were analyzed. Although the algorithms have many parameters, we examined the relationship between parameter values and precision to find the optimal combination, and we tried to understand how, and how well, the models work for sentiment analysis. This study proposes an integrated CNN and LSTM algorithm to extract the positive and negative features of text. The reasons for combining the two algorithms are as follows. CNN can automatically extract features for classification by applying convolution layers and massively parallel processing, whereas LSTM is not capable of highly parallel processing. Like faucets, the input, output, and forget gates of the LSTM can be opened and closed at the desired time. These gates have the advantage of placing memory blocks on hidden nodes. The memory block of the LSTM may not store all the data, but it can address the long-term dependency problem that CNN cannot handle.
Furthermore, when LSTM is attached to CNN's pooling layer, the model has an end-to-end structure, so that spatial and temporal features can be learned simultaneously. The combined CNN-LSTM model achieved an accuracy of 90.33%. It is slower than CNN but faster than LSTM, and it was more accurate than the other models. In addition, each word embedding layer can be improved when training the kernel step by step. CNN-LSTM can compensate for the weaknesses of each model, and the end-to-end structure of LSTM has the advantage of improving learning layer by layer. For these reasons, this study tries to enhance the classification accuracy of movie reviews using the integrated CNN-LSTM model.
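The data flow described in this abstract (convolution, pooling, then an LSTM with input, forget, and output gates) can be illustrated with a minimal pure-Python sketch. This is not the authors' model: the scalar inputs standing in for word embeddings, the single kernel, and every weight value below are assumptions chosen only to make the example runnable.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def conv1d_relu(seq, kernel, bias=0.0):
    """Slide one kernel over the sequence and apply ReLU (the CNN feature extractor)."""
    k = len(kernel)
    return [max(0.0, sum(w * seq[i + j] for j, w in enumerate(kernel)) + bias)
            for i in range(len(seq) - k + 1)]

def max_pool(seq, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(seq[i:i + size]) for i in range(0, len(seq) - size + 1, size)]

def lstm_last_hidden(seq, params):
    """Run a scalar LSTM over seq and return the final hidden state.

    params maps each gate name to (input weight, recurrent weight, bias).
    """
    h, c = 0.0, 0.0
    for x in seq:
        pre = {name: wx * x + wh * h + b for name, (wx, wh, b) in params.items()}
        i = sigmoid(pre["input"])    # input gate: how much new information to write
        f = sigmoid(pre["forget"])   # forget gate: how much old memory to keep
        o = sigmoid(pre["output"])   # output gate: how much memory to expose
        g = math.tanh(pre["cell"])   # candidate memory content
        c = f * c + i * g            # update the memory cell
        h = o * math.tanh(c)         # new hidden state
    return h

# Hypothetical per-word scores standing in for embedded review tokens.
review = [0.1, 0.8, 0.9, -0.2, -0.7, 0.3, 0.6, 0.4, -0.1]
kernel = [0.5, 1.0, 0.5]             # assumed convolution kernel of width 3
lstm_params = {                      # assumed, untrained weights
    "input": (0.9, 0.2, 0.0),
    "forget": (0.3, 0.1, 1.0),
    "output": (0.7, 0.2, 0.0),
    "cell": (1.2, 0.4, 0.0),
}

features = max_pool(conv1d_relu(review, kernel))               # CNN stage
score = sigmoid(2.0 * lstm_last_hidden(features, lstm_params))  # LSTM stage + classifier
print(len(features), 0.0 < score < 1.0)
```

In a trained model the weights would be learned from labeled reviews, and a score above 0.5 would be read as positive polarity.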

Investigation of Daily Life and Consciousness of Longevous People in Korea -(1) The Regional Features of Longevity Areas- (우리나라 장수자(長壽者)의 생활(生活) 및 의식조사(意識調査)에 관한 연구(硏究) -(1) 장수지역(長壽地域)의 지역적(地域的) 특성(特性)-)

  • Choi, Jin-Ho;Pyeun, Jae-Hyeung;Rhim, Chae-Hwan;Yang, Jong-Soon;Kim, Soo-Hyun;Kim, Jeung-Han;Lee, Byeong-Ho;Woo, Soon-Im;Choe, Sun-Nam;Byun, Dae-Seok
    • Journal of the Korean Society of Food Culture / v.1 no.2 / pp.116-126 / 1986
  • This study is one part of an investigation into the daily life and consciousness of longevous people in Korea, and it examines the regional features of longevity areas. Daily life and consciousness were surveyed for 379 subjects (121 male, 258 female) aged 80 years or older, from June to November 1985. This paper reports the longevity rate, distribution, classification, and weather of the longevity districts, together with actual conditions such as the daily-life functions and educational level of longevous people.
    1. The number of longevous people in Korea was 171,449 (male 42,842, female 128,607), and the average longevity rate was 0.46% of the total population of Korea (male 0.23%, female 0.69%).
    2. Among the shi and do of Korea, Cheju (1.03%) had the highest longevity rate, followed by Chonnam (0.79%), Chonbuk (0.66%), Kyongbuk (0.65%), and Kyongnam (0.61%), whereas large cities such as Inchon (0.22%), Seoul (0.23%), Pusan (0.23%), and Taegu (0.28%) had remarkably lower rates than seaside and mountain districts.
    3. Seventeen guns had a longevity rate above 1.0%: 10 in Chonnam, 2 each in Kyongbuk and Kyongnam, and 1 each in Kyonggi, Chonbuk, and Cheju.
    4. Of these districts, Pukcheju (1.65%) had the highest rate, followed by Namhae (1.56%), Sungju (1.24%), Posong (1.22%), and Koksong (1.20%). The highest figures (male 0.71%, female 2.51%) were observed in Pukcheju, in contrast with the national average longevity rates of 0.23% (male) and 0.69% (female).
    5. The female/male ratio of longevous people in Korea was 3.0. It is therefore believed that the longevity rate of females was about 3 times that of males.
    6. The longevity districts were classified into seven seaside districts, three isolated-island districts, and seven rural mountain districts.
    7. In the longevity districts, the annual average temperature ranged from 11.2 to 14.8°C, and the annual average rainfall ranged from 878.5 to 1585.9 mm.
    8. By educational level, uneducated people (71.5%) formed the largest group of longevous people, followed by those who attended village school (15.8%) and those with elementary school education or above (4.8%).
    9. By daily-life function, the aged who moved actively (53.0%) formed the largest group, followed by those who moved a little (23.5%). The health of these longevous people, judged by daily-life function, is therefore considered very gratifying.
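The headline figures in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted in the abstract; the variable names are illustrative.

```python
# Counts of people aged 80 or older, as quoted in the abstract.
male, female = 42_842, 128_607
total = male + female

# Female/male ratio computed from head counts.
ratio_by_count = female / male

# Female/male ratio computed from the reported longevity rates (percent).
rate_male, rate_female = 0.23, 0.69
ratio_by_rate = rate_female / rate_male

# Survey sample: 121 men and 258 women.
sample_total = 121 + 258

print(total, round(ratio_by_count, 2), round(ratio_by_rate, 2), sample_total)
```

Both ratios come out at about 3.0, and the counts sum to the reported totals of 171,449 and 379, so the quoted figures are internally consistent.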


Estimation of Annual Trends and Environmental Effects on the Racing Records of Jeju Horses (제주마 주파기록에 대한 연도별 추세 및 환경효과 분석)

  • Lee, Jongan;Lee, Soo Hyun;Lee, Jae-Gu;Kim, Nam-Young;Choi, Jae-Young;Shin, Sang-Min;Choi, Jung-Woo;Cho, In-Cheol;Yang, Byoung-Chul
    • Journal of Life Science / v.31 no.9 / pp.840-848 / 2021
  • This study was conducted to estimate annual trends and environmental effects in the racing records of Jeju horses. The Korea Racing Authority (KRA) collected 48,645 observations for 2,167 Jeju horses from 2002 to 2019. Racing records were preprocessed to eliminate errors that occurred during data collection, and racing times were adjusted to allow comparison between race distances. A stepwise Akaike information criterion (AIC) variable selection method was applied to select the environmental variables affecting racing records. The annual change in race time was -0.242 seconds, that is, times improved by 0.242 seconds per year. The model with the lowest AIC value was obtained when variables were selected in the following order: year, budam classification, jockey ranking, trainer ranking, track condition, weather, age, and gender. The most suitable model was obtained when the jockey ranking and age variables were treated as random effects. Our findings have potential for application as basic data when building models for evaluating the genetic abilities of Jeju horses.
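Stepwise AIC selection, as named in this abstract, can be sketched as a forward search that keeps adding the variable that lowers AIC the most. The example below is an illustration under stated assumptions, not the authors' analysis: it uses ordinary least squares on synthetic data, ignores random effects, and the variable names, coefficients (apart from the -0.242 seconds per year quoted above), and noise level are all hypothetical.

```python
import math
import random

def ols_rss(columns, y):
    """Residual sum of squares of a least-squares fit with an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X^T X | X^T y].
    M = [[sum(X[i][r] * X[i][s] for i in range(n)) for s in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gaussian elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(M[r][k]))
        M[k], M[pivot] = M[pivot], M[k]
        for r in range(k + 1, p):
            factor = M[r][k] / M[k][k]
            for s in range(k, p + 1):
                M[r][s] -= factor * M[k][s]
    # Back substitution for the coefficients.
    b = [0.0] * p
    for k in reversed(range(p)):
        b[k] = (M[k][p] - sum(M[k][s] * b[s] for s in range(k + 1, p))) / M[k][k]
    return sum((y[i] - sum(X[i][j] * b[j] for j in range(p))) ** 2 for i in range(n))

def aic(rss, n, k):
    """Gaussian AIC up to an additive constant, with k fitted coefficients."""
    return n * math.log(rss / n) + 2 * k

def forward_stepwise(candidates, y):
    """Add, one at a time, the variable that lowers AIC most; stop when none does."""
    n, selected = len(y), []
    best = aic(ols_rss([], y), n, 1)
    while len(selected) < len(candidates):
        trials = []
        for name in candidates:
            if name not in selected:
                cols = [candidates[v] for v in selected + [name]]
                trials.append((aic(ols_rss(cols, y), n, len(cols) + 1), name))
        score, name = min(trials)
        if score >= best:
            break
        selected.append(name)
        best = score
    return selected, best

# Synthetic data: race time falls 0.242 s per year and rises with age (assumed 0.3 s).
rng = random.Random(0)
n = 200
candidates = {
    "year": [float(rng.randint(0, 17)) for _ in range(n)],   # years since 2002
    "age": [float(rng.randint(2, 8)) for _ in range(n)],
    "noise": [rng.gauss(0.0, 1.0) for _ in range(n)],        # unrelated variable
}
y = [100.0 - 0.242 * candidates["year"][i] + 0.3 * candidates["age"][i]
     + rng.gauss(0.0, 0.5) for i in range(n)]

selected, best_aic = forward_stepwise(candidates, y)
print(selected)
```

On this synthetic data the two informative variables enter first, in order of how much variance they explain; whether the unrelated variable is also admitted depends on the random draw.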

Development of 1ST-Model for 1 hour-heavy rain damage scale prediction based on AI models (1시간 호우피해 규모 예측을 위한 AI 기반의 1ST-모형 개발)

  • Lee, Joonhak;Lee, Haneul;Kang, Narae;Hwang, Seokhwan;Kim, Hung Soo;Kim, Soojun
    • Journal of Korea Water Resources Association / v.56 no.5 / pp.311-323 / 2023
  • To reduce disaster damage from localized heavy rains, floods, and urban inundation, it is important to know in advance whether a natural disaster will occur. Currently, heavy rain watches and heavy rain warnings are issued in Korea according to the criteria of the Korea Meteorological Administration. However, because a single criterion is applied to the whole country, heavy rain damage in a specific region cannot be clearly recognized in advance. Therefore, in this paper, we tried to reset the current criteria for special weather reports so that they consider regional characteristics, and to predict the damage caused by rainfall 1 hour ahead. Gyeonggi Province, which has more frequent heavy rain damage than other regions, was selected as the study area. The disaster-inducing rainfall, or hazard-triggering rainfall, was then set by using hourly rainfall and heavy rain damage data while considering local characteristics. The heavy rain damage prediction model was developed with a decision tree model and a random forest model, which are machine learning techniques, using the disaster-inducing rainfall and rainfall data. In addition, long short-term memory and deep neural network models were used to predict rainfall 1 hour ahead. The rainfall predicted by the developed prediction model was applied to the trained classification model to predict whether rain damage will occur 1 hour later, and we call this the 1ST-Model. The 1ST-Model can be used to prevent and prepare for heavy rain disasters, and it is expected to contribute greatly to reducing damage caused by heavy rain.
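The two-stage idea behind the 1ST-Model (forecast rainfall 1 hour ahead, then pass the forecast to a damage classifier) can be sketched as follows. This is a simplified stand-in, not the published model: a one-split decision stump replaces the decision tree and random forest, a linear extrapolation replaces the LSTM and deep neural network forecasters, and all rainfall values and labels are invented for illustration.

```python
def fit_stump(rainfall, damaged):
    """Learn a one-split decision tree: the hourly-rainfall threshold that
    misclassifies the fewest records (a stand-in for hazard-triggering rainfall)."""
    values = sorted(set(rainfall))
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(t):
        return sum((r >= t) != d for r, d in zip(rainfall, damaged))

    return min(thresholds, key=errors)

def forecast_next_hour(history):
    """Stand-in forecaster: extrapolate the last two hourly values, floored at zero."""
    return max(0.0, 2 * history[-1] - history[-2])

def predict_damage(history, threshold):
    """Stage 1 forecasts rainfall 1 hour ahead; stage 2 classifies the forecast."""
    return forecast_next_hour(history) >= threshold

# Hypothetical training records: hourly rainfall (mm) and whether damage occurred.
rainfall = [5, 12, 18, 25, 33, 41, 50, 62]
damaged = [False, False, False, False, True, True, True, True]

threshold = fit_stump(rainfall, damaged)
rising = predict_damage([10, 20, 32], threshold)   # rainfall increasing
easing = predict_damage([10, 8, 6], threshold)     # rainfall decreasing
print(threshold, rising, easing)
```

Fitting the threshold on region-specific records mirrors the abstract's point that a single nationwide criterion does not reflect local conditions.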