Search | Korea Science

KNU Korean Sentiment Lexicon: Bi-LSTM-based Method for Building a Korean Sentiment Lexicon (Bi-LSTM 기반의 한국어 감성사전 구축 방안)

Park, Sang-Min;Na, Chul-Won;Choi, Min-Seong;Lee, Da-Hee;On, Byung-Won
- Journal of Intelligence and Information Systems, v.24 no.4, pp.219-240, 2018
Sentiment analysis, a text mining technique, extracts subjective content embedded in text documents, and it has recently been widely used in many fields. For example, data-driven surveys analyze the subjectivity of text posted by users, and market research analyzes users' review posts to quantify the reputation of a target product. The basic method of sentiment analysis relies on a sentiment dictionary (or lexicon), a list of sentiment vocabulary with positive, neutral, or negative semantics. In general, the meaning of many sentiment words is likely to differ across domains. For example, the sentiment word 'sad' is negative in many fields but not for a movie. Accurate sentiment analysis therefore requires a sentiment dictionary built for the given domain. However, building such a lexicon is time-consuming, and many sentiment words are missed unless a general-purpose sentiment lexicon is used. To address this problem, several studies have constructed sentiment lexicons for specific domains based on 'OPEN HANGUL' and 'SentiWordNet', which are general-purpose sentiment lexicons. However, OPEN HANGUL is no longer in service, and SentiWordNet does not work well because of language differences that arise when converting Korean words into English words. These limitations restrict the use of such general-purpose sentiment lexicons as seed data for building a domain-specific sentiment lexicon. In this article, we construct the 'KNU Korean Sentiment Lexicon (KNU-KSL)', a new general-purpose Korean sentiment dictionary that is more advanced than existing general-purpose lexicons.
The proposed dictionary, a list of domain-independent sentiment words such as 'thank you', 'worthy', and 'impressed', is built so that a sentiment dictionary for a target domain can be constructed quickly. In particular, it builds its sentiment vocabulary by analyzing the glosses contained in the Standard Korean Language Dictionary (SKLD) through the following procedure. First, we propose a sentiment classification model based on Bidirectional Long Short-Term Memory (Bi-LSTM). Second, the proposed deep learning model automatically classifies each gloss as either positive or negative. Third, positive words and phrases are extracted from the glosses classified as positive, while negative words and phrases are extracted from the glosses classified as negative. Our experimental results show that the average accuracy of the proposed sentiment classification model is up to 89.45%. In addition, the sentiment dictionary is extended using various external sources including SentiWordNet, SenticNet, Emotional Verbs, and Sentiment Lexicon 0603. Furthermore, we add sentiment information about frequently used coined words and emoticons that are used mainly on the Web. The KNU-KSL contains a total of 14,843 sentiment entries, each of which is a 1-gram, a 2-gram, a phrase, or a sentence pattern. Unlike existing sentiment dictionaries, it is composed of words that are not affected by particular domains. The recent trend in sentiment analysis is to use deep learning techniques without sentiment dictionaries, so the importance of developing sentiment dictionaries has gradually declined. However, one recent study shows that the words in a sentiment dictionary can be used as features of deep learning models, resulting in more accurate sentiment analysis (Teng, Z., 2016). This result indicates that a sentiment dictionary is useful not only for sentiment analysis itself but also as a source of features that improve the accuracy of deep learning models.
The proposed dictionary can be used as basic data for constructing the sentiment lexicon of a particular domain and as features of deep learning models. It is also useful for automatically and quickly building large training sets for deep learning models.
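
As a rough illustration of how a lexicon made of 1-grams and 2-grams can be applied to text, the following minimal sketch scores a token sequence by longest-match lookup. The lexicon entries, the polarity scores, and the function names `score_tokens` and `label` are hypothetical and chosen for illustration only; they are not taken from KNU-KSL.

```python
# Hypothetical toy lexicon: the entries and polarity scores below are
# illustrative assumptions, not actual KNU-KSL data.
LEXICON = {
    ("thank", "you"): 1,   # 2-gram entry
    ("worthy",): 1,
    ("impressed",): 1,
    ("boring",): -1,
    ("waste", "of"): -1,   # 2-gram entry
}

# Longest n-gram length present in the lexicon.
MAX_N = max(len(key) for key in LEXICON)


def score_tokens(tokens):
    """Sum polarity scores, preferring the longest n-gram match at each position."""
    total = 0
    i = 0
    while i < len(tokens):
        for n in range(MAX_N, 0, -1):
            gram = tuple(tokens[i:i + n])
            if len(gram) == n and gram in LEXICON:
                total += LEXICON[gram]
                i += n  # skip past the matched n-gram
                break
        else:
            i += 1  # no lexicon entry starts here; move on
    return total


def label(text):
    """Map a lower-cased, whitespace-tokenized text to a coarse polarity label."""
    score = score_tokens(text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Whitespace tokenization is a simplification here; Korean text would normally need morphological analysis before lookup.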
https://doi.org/10.13088/jiis.2018.24.4.219

A Deep Learning Based Approach to Recognizing Accompanying Status of Smartphone Users Using Multimodal Data (스마트폰 다종 데이터를 활용한 딥러닝 기반의 사용자 동행 상태 인식)

Kim, Kilho;Choi, Sangwoo;Chae, Moon-jung;Park, Heewoong;Lee, Jaehong;Park, Jonghun
- Journal of Intelligence and Information Systems, v.25 no.1, pp.163-177, 2019
As smartphones have become widely used, human activity recognition (HAR) tasks that recognize the personal activities of smartphone users from multimodal data have been actively studied. The research area is expanding from recognizing the simple body movements of an individual user to recognizing low-level and high-level behavior. However, HAR tasks that recognize interaction behavior with other people, such as whether the user is accompanying or communicating with someone else, have received less attention so far. Previous research on recognizing interaction behavior has usually depended on audio, Bluetooth, and Wi-Fi sensors, which are vulnerable to privacy issues and require much time to collect enough data. In contrast, physical sensors, including accelerometer, magnetic field, and gyroscope sensors, are less vulnerable to privacy issues and can collect a large amount of data within a short time. In this paper, we propose a deep learning based method for detecting accompanying status using only multimodal physical sensor data from the accelerometer, magnetic field sensor, and gyroscope. The accompanying status is defined as a redefinition of part of the user's interaction behavior, covering whether the user is accompanying an acquaintance at close distance and whether the user is actively communicating with that acquaintance. We propose a framework based on convolutional neural networks (CNN) and long short-term memory (LSTM) recurrent networks for classifying accompanying and conversation. First, a data preprocessing method is introduced, consisting of time synchronization of multimodal data from different physical sensors, data normalization, and sequence data generation. We applied nearest interpolation to synchronize the timestamps of data collected from different sensors.
Normalization was performed for each x, y, and z axis value of the sensor data, and the sequence data was generated with the sliding window method. The sequence data then became the input to the CNN, which extracts feature maps representing local dependencies of the original sequence. The CNN consisted of three convolutional layers and had no pooling layer, in order to preserve the temporal information of the sequence data. Next, the LSTM recurrent networks received the feature maps, learned long-term dependencies from them, and extracted features. The LSTM recurrent networks consisted of two layers, each with 128 cells. Finally, the extracted features were classified by a softmax classifier. The loss function of the model was the cross-entropy function, and the weights of the model were randomly initialized from a normal distribution with a mean of 0 and a standard deviation of 0.1. The model was trained using the adaptive moment estimation (Adam) optimization algorithm, and the mini-batch size was set to 128. We applied dropout to the input values of the LSTM recurrent networks to prevent overfitting. The initial learning rate was set to 0.001 and was decayed exponentially by a factor of 0.99 at the end of each training epoch. An Android smartphone application was developed and released to collect data. We collected smartphone data from a total of 18 subjects. Using the data, the model classified accompanying and conversation with 98.74% and 98.83% accuracy, respectively. Both the F1 score and the accuracy of the model were higher than those of the majority vote classifier, support vector machine, and deep recurrent neural network. In future research, we will focus on more rigorous multimodal sensor data synchronization methods that minimize timestamp differences. In addition, we will further study transfer learning methods that enable models trained on the training data to be transferred to evaluation data that follows a different distribution.
We expect this to yield a model that shows robust recognition performance against changes in data that were not considered in the model learning stage.
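
The preprocessing steps and the learning rate schedule described in this abstract can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say which normalization was used, so z-score standardization is assumed, and the function names, window size, and step are hypothetical.

```python
from bisect import bisect_left
from statistics import mean, pstdev


def nearest_sync(ref_times, times, values):
    """For each reference timestamp, take the value whose timestamp is nearest.

    `times` must be sorted ascending; ties go to the earlier sample.
    """
    out = []
    for t in ref_times:
        j = bisect_left(times, t)
        if j == 0:
            k = 0
        elif j == len(times):
            k = len(times) - 1
        else:
            k = j if times[j] - t < t - times[j - 1] else j - 1
        out.append(values[k])
    return out


def zscore(series):
    """Standardize one axis to zero mean and unit variance (assumed normalization).

    A constant series maps to all zeros to avoid division by zero.
    """
    mu, sigma = mean(series), pstdev(series)
    return [(x - mu) / sigma if sigma else 0.0 for x in series]


def sliding_windows(rows, size, step):
    """Cut a list of per-timestep rows into fixed-length, possibly overlapping windows."""
    return [rows[i:i + size] for i in range(0, len(rows) - size + 1, step)]


def learning_rate(epoch, initial=0.001, decay=0.99):
    """Learning rate after `epoch` completed epochs of exponential decay."""
    return initial * decay ** epoch
```

In this sketch one sensor's timestamps serve as the reference clock, each axis is standardized independently, and the windows produced by `sliding_windows` would be the sequences fed to the CNN.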
https://doi.org/10.13088/jiis.2019.25.1.163

The Variation of Natural Population of Pinus densiflora S. et Z. in Korea (III) -Genetic Variation of the Progeny Originated from Mt. Chu-wang, An-Myon Island and Mt. O-Dae Populations- (소나무 천연집단(天然集團)의 변이(變異)에 관(關)한 연구(硏究)(III) -주왕산(周王山), 안면도(安眠島), 오대산(五臺山) 소나무집단(集團)의 차대(次代)의 유전변이(遺傳變異)-)

Yim, Kyong Bin;Kwon, Ki Won
- Journal of Korean Society of Forest Science, v.32 no.1, pp.36-63, 1976
The purpose of this study is to elucidate the genetic variation of the natural forest of Pinus densiflora. Three natural populations of the species, considered to be of phenotypically superior quality, were selected. The locations and conditions of the populations are shown in Tables 1 and 2. The morphological traits of tree and needle and some other characteristics were already presented in the first report of this series, in which population and family differences in the observed characteristics were statistically analyzed. Twenty trees were sampled from each population, i.e., 60 trees in total. During the autumn of 1974, mature cones were collected from each tree and open-pollinated seeds were extracted in the laboratory. Immediately after cone collection, while the cones were still closed, their morphological characteristics were measured. Seed and seed-wing dimensions were also studied. In the spring of 1975, the seeds were sown in the experimental tree nursery located in Suweon. In April 1976, the 1-0 seedlings were transplanted according to the predetermined experimental design, a randomized block design with three replications. Because of cone-setting conditions, the number of families from which progenies were raised was not equal among populations. The number of families was 20 in population 1, 18 in population 2, and 15 in population 3. Each randomized block thus contained seedlings of 53 families from 3 populations. The present paper is mainly concerned with the variation of some characteristics of cones, seeds, needles, the growth performance of seedlings, and the chlorophyll and monoterpene compositions of needles. The results obtained are summarized as follows. 1. The meteorological data, obtained by averaging 30 years of records from the station nearest to each population, are shown in Fig. 3, 4, and 5. The distributional pattern of monthly precipitation is quite similar among locations.
However, the precipitation density at population 2 (Seosan area) during the growing season is lower than at the other two populations. Population 1 (Cheong-song area) and population 3 (Pyong-chang area) are located inland, whereas population 2 is on the western seacoast. Differences in the average monthly air temperatures and the average monthly lowest temperatures among populations can hardly be found. 2. Available information on each of the mother trees (families) studied, such as age, stem height, diameter at breast height, clear-bole length, crown conditions, and others, is shown in Tables 6, 7, and 8. 3. The measurements of fresh cone weight, cone length, and the widest cone diameter are given in Table 9. All these traits showed highly significant differences among populations and among families within populations. A population difference was also found in the cone index, that is, the length-diameter ratio. 4. Seed-wing length and seed-wing width showed population differences, and family differences were also found in both characteristics. Although not discussed in this paper, seed-wing colours and shapes indicate a specificity that is inherent to individual trees, as shown in photo 3 on page 50. The colour and shape are fully the expression of the genetic makeup of the mother tree, which is why these traits vary little. Significant differences among populations and among families were found in characteristics such as 1000-seed weight, seed length, seed width, and seed thickness, as shown in Table 11. For all these dimensions, the values are always larger in population 1, which is younger than the other two. The population differences evaluated by cone, seed, and seed-wing sizes could partly be attributed to growth vigor. 5. The correlations between the characteristics of cone and seed are presented in Table 12.
As shown, positive correlations between cone diameter and seed-wing width were found in all populations studied. The correlation between seed-wing length and seed length was significantly positive in populations 1 and 3 but not in population 2, where the r-value was as small as 0.002. The correlation between cone length and seed-wing length was highly significant in population 1 but not in population 2. 6. Differences among progenies in growth performance, such as 1-0 and 1-1 seedling height and root collar diameter, were highly significant among populations as well as among families within populations (Table 13). 7. The narrow-sense heritability values of population characteristics were estimated on the basis of variance components. The values based on seedling height at the 1-0 and 1-1 age stages ranged from 0.146 to 0.288, and the values for root collar diameter ranged from 0.060 to 0.130 (Table 14). These heritability values varied according to characteristics and seedling ages. It must be stated here that, in calculating the heritability values, the variance of population was divided by the variance values of environment (error), family, and population. The present authors intend to add heritability values based on the family level in the coming report. It may be that, as tree age increases in the future, the heritability values will be altered or lowered. In heritability values reported previously by many authors for pines at ages 7 to 15, the values for height growth generally ranged from 0.2 to 0.4. The values we obtained are lower than these. 8. The correlations between seedling growth and seed characteristics were examined, and the resulting values are shown in Table 16. Contrary to our hypothesis of a positive correlation between 1-0 seedling height and seed weight, no significant correlation was found. However, 1-0 seedling height correlated positively with seed length.
Significant correlations between 1-0 and 1-1 seedling height were also found. 9. The numbers of stomata rows, counted separately on the abaxial and adaxial sides, showed highly significant differences among populations, but serration density did not. For serration density, the differences among families within populations were highly significant (Table 17). It should be noted that the correlation between stomata rows on the abaxial side and on the adaxial side was highly significant in all populations. The correlation coefficients between progenies and parents for stomata rows on the abaxial side were non-significant in all populations studied (Table 18). 10. The chlorophyll b content of the needles was slightly higher than the chlorophyll a content irrespective of the population examined. The differences in chlorophyll a, b, and a plus b contents were highly significant among populations but not among families within populations, as shown in Table 20. The contents of chlorophyll a and b are presented by individual trees of each population in Table 21. 11. The occurrence of monoterpene components was examined by gas-liquid chromatography (Shimadzu, GC-1C type) to evaluate population differences. Some papers have reported the chemical geography of pines based on monoterpene composition, but the number of populations studied here is not enough to address this question. The monoterpenes observed in needles were α-pinene, camphene, β-pinene, myrcene, limonene, β-phellandrene, and terpinolene, plus two unknowns. In the analysis of monoterpene composition, the number of sample trees varied with population, i.e., 18 families for population 1, 15 for population 2, and 11 for population 3 (Tables 22, 23, and 24). The histograms (Fig. 6) of the 7 monoterpene components by population show noticeably higher percentages of α-pinene irrespective of population, with β-phellandrene next in order.
The minor monoterpene components of Pinus densiflora, namely camphene, myrcene, limonene, and terpinolene, generally made up less than 10 percent of the total. The average coefficients of variation of α-pinene and β-phellandrene were 11 percent. In contrast, the average coefficients of variation of camphene, limonene, and terpinolene varied from 20 to 30 percent. Significant differences between populations were observed only in myrcene and β-phellandrene (Table 25).
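
The derived quantities mentioned in this abstract (cone index, the variance-ratio heritability estimate, and the coefficient of variation) can be illustrated with a short sketch. It assumes that the heritability denominator is the sum of the population, family, and error variance components, which is one reading of the abstract's wording; the function names and any numbers passed to them are hypothetical and are not data from the paper.

```python
from statistics import mean, stdev


def cone_index(length, diameter):
    """Length-to-diameter ratio of a cone."""
    return length / diameter


def variance_ratio_heritability(var_population, var_family, var_error):
    """Population variance over the sum of population, family and error variance.

    The summed denominator is an assumption about the abstract's description.
    """
    total = var_population + var_family + var_error
    return var_population / total


def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)
```

For instance, hypothetical variance components of 2.0 (population), 3.0 (family), and 5.0 (error) would be combined as `variance_ratio_heritability(2.0, 3.0, 5.0)`.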


