• Title/Summary/Keyword: System Performance Improvement (시스템 성능 향상)


Target Word Selection Disambiguation using Untagged Text Data in English-Korean Machine Translation (영한 기계 번역에서 미가공 텍스트 데이터를 이용한 대역어 선택 중의성 해소)

  • Kim Yu-Seop; Chang Jeong-Ho
    • The KIPS Transactions: Part B / v.11B no.6 / pp.749-758 / 2004
  • In this paper, we propose a new method that uses only a raw corpus, without additional human effort, to resolve target word selection ambiguity in English-Korean machine translation. We use two data-driven techniques: Latent Semantic Analysis (LSA) and Probabilistic Latent Semantic Analysis (PLSA). These techniques can represent complex semantic structures of given contexts, such as text passages. We construct linguistic semantic knowledge using the two techniques and use this knowledge for target word selection in English-Korean machine translation. For target word selection, we utilize grammatical relationships stored in a dictionary. We use the k-nearest neighbor learning algorithm to resolve the data sparseness problem in target word selection and estimate the distance between instances based on these models. In the experiments, we use the TREC AP news data to construct the latent semantic space and the Wall Street Journal corpus to evaluate target word selection. With the latent semantic analysis methods, the accuracy of target word selection improved by over 10%, and PLSA showed better accuracy than LSA. Finally, we show the relationship between accuracy and two important factors, the dimensionality of the latent space and the k value of k-NN learning, using correlation analysis.
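To make the pipeline concrete, the following is a minimal sketch, not the paper's implementation: terms are projected into an LSA latent space via truncated SVD of a toy term-document matrix, and a k-nearest-neighbor score compares two hypothetical Korean translation candidates for an ambiguous English word given one context word from a grammatical relation. The corpus, candidate senses, and cue words are all illustrative assumptions.

```python
# Hedged sketch: k-NN target-word selection over an LSA latent space.
# The corpus, candidate translations, and cue words below are made-up
# placeholders; the paper's dictionary and TREC/WSJ data are not used.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "the bank approved the loan and the interest rate",
    "the river bank was flooded after heavy rain",
    "deposit money at the bank to earn interest",
    "we walked along the bank of the river",
]

# Build a term-document matrix and project terms into a low-dimensional
# latent semantic space (LSA = truncated SVD of the term-document matrix).
vec = CountVectorizer()
X = vec.fit_transform(corpus)            # documents x terms
svd = TruncatedSVD(n_components=2, random_state=0)
doc_vecs = svd.fit_transform(X)          # documents in latent space
term_vecs = svd.components_.T            # terms in latent space
vocab = {w: i for i, w in enumerate(vec.get_feature_names_out())}

def knn_score(context_word, sense_cues, k=2):
    """Score each candidate sense by the mean cosine similarity of its k
    nearest cue terms to the given grammatical-relation context word."""
    cw = term_vecs[vocab[context_word]].reshape(1, -1)
    scores = {}
    for sense, cues in sense_cues.items():
        sims = sorted(
            (cosine_similarity(cw, term_vecs[vocab[c]].reshape(1, -1))[0, 0]
             for c in cues if c in vocab),
            reverse=True,
        )
        scores[sense] = float(np.mean(sims[:k])) if sims else float("-inf")
    return scores

# Compare two candidate senses of 'bank' given the context word 'loan'.
print(knn_score("loan", {"은행": ["deposit", "interest"], "둑": ["river", "rain"]}))
```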

A 10b 250MS/s $1.8mm^2$ 85mW 0.13um CMOS ADC Based on High-Accuracy Integrated Capacitors (높은 정확도를 가진 집적 커패시터 기반의 10비트 250MS/s $1.8mm^2$ 85mW 0.13um CMOS A/D 변환기)

  • Sa, Doo-Hwan; Choi, Hee-Cheol; Kim, Young-Lok; Lee, Seung-Hoon
    • Journal of the Institute of Electronics Engineers of Korea SD / v.43 no.11 s.353 / pp.58-68 / 2006
  • This work proposes a 10b 250MS/s $1.8mm^2$ 85mW 0.13um CMOS A/D Converter (ADC) for high-performance integrated systems such as next-generation DTV and WLAN that simultaneously require low voltage, low power, and small area at high speed. The proposed 3-stage pipeline ADC minimizes chip area and power dissipation at the target resolution and sampling rate. The input SHA maintains 10b resolution with either gate-bootstrapped or nominal CMOS sampling switches. The SHA and two MDACs, based on a conventional 2-stage amplifier, employ optimized trans-conductance ratios of the two amplifier stages to achieve the required DC gain, bandwidth, and phase margin. The proposed signal-insensitive 3-D fully symmetric capacitor layout reduces the device mismatch of the two MDACs. The low-noise on-chip current and voltage references can optionally be replaced by off-chip voltage references. The prototype ADC is implemented in a 0.13um 1P8M CMOS process. The measured DNL and INL are within 0.24LSB and 0.35LSB, while the ADC shows a maximum SNDR of 54dB and 48dB and a maximum SFDR of 67dB and 61dB at 200MS/s and 250MS/s, respectively. The ADC with an active die area of $1.8mm^2$ consumes 85mW at 250MS/s with a 1.2V supply.
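As a worked illustration not taken from the paper, the reported numbers can be plugged into the conventional Walden figure of merit, FoM = P / (2^ENOB * fs), assuming the same 85 mW power at both sampling rates:

```python
# Hedged illustration (not from the abstract): the conventional Walden
# figure of merit computed from the reported 85 mW, 54 dB peak SNDR at
# 200 MS/s and 48 dB at 250 MS/s.  Power at 200 MS/s is assumed equal to
# the 250 MS/s figure, which the abstract does not state.
def walden_fom(power_w, sndr_db, fs_hz):
    enob = (sndr_db - 1.76) / 6.02           # effective number of bits
    return power_w / (2 ** enob * fs_hz)      # joules per conversion-step

for sndr, fs in [(54, 200e6), (48, 250e6)]:
    fom = walden_fom(85e-3, sndr, fs)
    print(f"SNDR {sndr} dB @ {fs/1e6:.0f} MS/s -> FoM ~ {fom*1e12:.2f} pJ/step")
```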

Dual Codec Based Joint Bit Rate Control Scheme for Terrestrial Stereoscopic 3DTV Broadcast (지상파 스테레오스코픽 3DTV 방송을 위한 이종 부호화기 기반 합동 비트율 제어 연구)

  • Chang, Yong-Jun; Kim, Mun-Churl
    • Journal of Broadcast Engineering / v.16 no.2 / pp.216-225 / 2011
  • Following the proliferation of three-dimensional video content and displays, many terrestrial broadcasters have been preparing stereoscopic 3DTV services. In terrestrial stereoscopic broadcasting, it is difficult to code and transmit two video sequences while sustaining quality as high as 2DTV broadcasts because of the limited bandwidth defined by existing digital TV standards such as ATSC. Thus, a terrestrial 3DTV broadcasting system with heterogeneous video codecs, in which the left and right images are coded with MPEG-2 and H.264/AVC, respectively, is considered in order to achieve both high-quality broadcasting and compatibility with existing 2DTV viewers. Without significant changes to current terrestrial broadcasting systems, we propose a joint rate control scheme for stereoscopic 3DTV service based on this heterogeneous dual-codec system. The proposed joint rate control scheme applies to the MPEG-2 encoder the quadratic rate-quantization model adopted in H.264/AVC. The controller is then designed so that the sum of the left and right bitstreams meets the bandwidth requirement of the broadcasting standard while the sum of the image distortions is minimized by adjusting the quantization parameters obtained from the proposed optimization scheme. In addition, the optimization includes a constraint that keeps the quality difference between the left and right images around a desired level in order to mitigate negative effects on the human visual system. Experimental results demonstrate that the proposed bit rate control scheme outperforms rate control in which each video coding standard uses its own bit rate control algorithm independently, with a 2.02% increase in PSNR, a 77.6% decrease in the average absolute quality difference, and a 74.38% reduction in the variance of the quality difference.
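The following is a minimal sketch of the quadratic rate-quantization model mentioned in the abstract, solved for the quantization step given a per-frame bit budget; the model coefficients, MAD value, frame rate, and the 60/40 budget split between views are illustrative assumptions, not the paper's parameters.

```python
# Hedged sketch of a quadratic rate-quantization (R-Q) model:
#   target_bits = (a*MAD)/Q + (b*MAD)/Q^2
# Solving this for Q gives the quantization step for the next frame.
# Coefficients a, b and the MAD value below are illustrative only.
import math

def solve_q(target_bits, mad, a, b):
    """Return the quantization step Q satisfying the quadratic R-Q model."""
    if b == 0:                               # degenerate linear model
        return a * mad / target_bits
    # Rearranged: target_bits*Q^2 - a*mad*Q - b*mad = 0  (quadratic in Q)
    disc = (a * mad) ** 2 + 4 * target_bits * b * mad
    return (a * mad + math.sqrt(disc)) / (2 * target_bits)

# Example: split a 15 Mbps budget between an MPEG-2 left view and an
# H.264/AVC right view, then derive each view's Q from the shared model.
frame_budget_bits = 15_000_000 / 30          # per-frame budget at 30 fps
left_bits, right_bits = 0.6 * frame_budget_bits, 0.4 * frame_budget_bits
print(solve_q(left_bits,  mad=6.0, a=2500.0, b=80000.0))
print(solve_q(right_bits, mad=6.0, a=2500.0, b=80000.0))
```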

Effects of Polyols on Antimicrobial and Preservative Efficacy in Cosmetics (화학방부제 배합량 감소를 위한 폴리올류의 항균, 방부영향력 연구)

  • Shin, Kye-Ho; Kwack, Il-Young; Lee, Sung-Won; Suh, Kyung-Hee; Moon, Sung-Joon; Chang, Ih-Seop
    • Journal of the Society of Cosmetic Scientists of Korea / v.33 no.2 / pp.111-115 / 2007
  • It is inevitable to use germicidal agents such as parabens, imidazolidinyl urea, phenoxyethanol, and chlorphenesin to preserve cosmetics. Although effective in reducing microbiological contamination, chemical preservatives are irritating, allergenic, and even toxic to human skin, so there is a need to decrease or eliminate their use in cosmetic products. Glycerin, butylene glycol (BG), propylene glycol (PG), and dipropylene glycol (DPG) are widely used in cosmetics as skin conditioning agents or solvents. At high concentrations they have antimicrobial activity, but they can deteriorate product qualities such as sensory feel or safety. The purpose of this study is to evaluate the effects of polyols on antimicrobial and preservative efficacy and to confirm whether adjusting polyols can decrease the content of preservatives without deteriorating the quality of cosmetics. The effects of common polyols on the antimicrobial activities of general preservatives were measured. BG and PG significantly (p < 0.05) increased the activities of preservatives, while glycerin had little influence. Regression analysis of the results with S. aureus indicated that adding 1% of PG increased the activities of preservatives by $2.1{\sim}8.4\%$ and 1% of BG by $1.8{\sim}8.4\%$. Challenge test results for oil-in-water lotions and creams showed that BG and PG at $5.5{\sim}9.9\%$ improved the efficacy of the preservative systems by up to 40%, whereas glycerin had little effect. The measured rates of improvement were analogous to the inferences from the regression analysis. It can be concluded that it is possible to reduce total chemical preservatives by up to 40%, and consequently to improve the safety and sensory quality of cosmetics, through precise control of polyols. In addition, with this approach, low-preservative, paraben-free, and even preservative-free systems can be expected in the near future.
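As a hedged illustration of the regression step described above, the sketch below fits a straight line to fabricated activity-versus-PG-concentration data and reads the slope as the activity gain per 1% of added polyol; the data points are placeholders, not the paper's S. aureus measurements.

```python
# Hedged sketch of the regression analysis: fit a linear model of
# preservative activity vs. polyol concentration and read the slope as the
# activity gain per 1% of added polyol.  The data points are fabricated.
import numpy as np

pg_percent = np.array([0.0, 2.0, 4.0, 6.0, 8.0])        # % propylene glycol
activity   = np.array([100., 108., 117., 124., 133.])    # relative activity

slope, intercept = np.polyfit(pg_percent, activity, 1)
print(f"~{slope/intercept*100:.1f}% activity increase per 1% PG added")
```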

Response Modeling for the Marketing Promotion with Weighted Case Based Reasoning Under Imbalanced Data Distribution (불균형 데이터 환경에서 변수가중치를 적용한 사례기반추론 기반의 고객반응 예측)

  • Kim, Eunmi; Hong, Taeho
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.29-45 / 2015
  • Response modeling is a well-known research issue for those seeking better performance in predicting customers' responses to marketing promotions. A response model can reduce marketing cost by identifying prospective customers in a very large customer database and predicting the purchase intention of the selected customers, whereas a promotion derived from an undifferentiated marketing strategy results in unnecessary cost. In addition, the big data environment has accelerated the development of response models with data mining techniques such as CBR, neural networks, and support vector machines. CBR is one of the major tools in business because it is simple and robust to apply to response modeling, even though it has not shown high performance compared to other machine learning techniques. Thus many studies have tried to improve CBR for business data mining with enhanced algorithms or with the support of other techniques such as genetic algorithms, decision trees, and AHP (Analytic Hierarchy Process). Ahn and Kim (2008) used logit, neural networks, and CBR to predict which customers would purchase the promoted items, and optimized the k of the k-nearest neighbor with a genetic algorithm to improve the performance of the integrated model. Hong and Park (2009) noted that an integrated approach combining CBR with logit, neural networks, and Support Vector Machine (SVM) predicted customers' responses to marketing promotions better than each individual model. This paper presents an approach to predicting customers' responses to a marketing promotion with Case Based Reasoning in which different weights are applied to each feature. We fitted a logit model on a database containing promotion and purchase data for bath soap, and then used its coefficients as feature weights in CBR. We empirically compared the proposed weighted CBR-based model with neural networks and a pure CBR-based model and found that the proposed weighted CBR-based model showed superior performance to the pure CBR model. Imbalanced data is a common problem when building classification models on real data, as in bankruptcy prediction, intrusion detection, fraud detection, churn management, and response modeling. Imbalanced data means that the number of instances in one class is remarkably small or large compared to the number of instances in other classes. A classification model such as a response model has trouble learning patterns from such data because it tends to ignore the minority class while classifying the majority class correctly. To resolve the problem caused by an imbalanced data distribution, sampling is one of the most representative approaches and can be categorized into undersampling and oversampling. CBR, however, is less sensitive to the class distribution because, unlike most machine learning algorithms, it does not build a global model from the data.
In this study, we investigated the robustness of our proposed model while changing the ratio of response customers to nonresponse customers, because in the real world the customers who respond to a promotion are always a small fraction of those who do not. We simulated the proposed model 100 times to validate its robustness under different ratios of response to nonresponse customers in an imbalanced data distribution. We found that our proposed CBR-based model showed superior performance to the compared models on the imbalanced data sets. Our study is expected to improve the performance of response models for promotion programs with CBR under the imbalanced data distributions of the real world.
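A minimal sketch of the weighted-CBR idea, under the assumption that CBR retrieval can be approximated by k-nearest-neighbor search: a logit model is fitted first and the absolute values of its coefficients are used to weight the distance metric. The synthetic imbalanced data stand in for the bath-soap promotion records used in the paper.

```python
# Hedged sketch of weighted CBR: fit a logistic regression, use |coefficients|
# as feature weights, and run k-nearest-neighbor retrieval with a weighted
# distance.  The data are synthetic, not the paper's promotion/purchase records.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=8, weights=[0.9, 0.1],
                           random_state=0)            # imbalanced response data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Step 1: logit model; its coefficients indicate each feature's importance.
logit = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
w = np.abs(logit.coef_).ravel()

# Step 2: weighted CBR = k-NN on features rescaled by the logit weights
# (scaling features by sqrt(w) makes Euclidean distance a weighted distance).
knn_weighted = KNeighborsClassifier(n_neighbors=5).fit(X_tr * np.sqrt(w), y_tr)
knn_plain    = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)

print("pure CBR    :", knn_plain.score(X_te, y_te))
print("weighted CBR:", knn_weighted.score(X_te * np.sqrt(w), y_te))
```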

Predicting link of R&D network to stimulate collaboration among education, industry, and research (산학연 협업 활성화를 위한 R&D 네트워크 연결 예측 연구)

  • Park, Mi-yeon; Lee, Sangheon; Jin, Guocheng; Shen, Hongme; Kim, Wooju
    • Journal of Intelligence and Information Systems / v.21 no.3 / pp.37-52 / 2015
  • Recent global trends show expanding and strengthening collaboration among industry, academia, and research institutes, and growing R&D networks. Greater support for networks and cooperative research would open possibilities for the evolution of new scholarly and industrial fields and the development of new theories arising from synergized research. Similarly, the national need for a strategy that can efficiently and effectively support R&D networks established through government R&D projects is on the rise. Despite the growing urgency, because policy has habitually depended on simple personal information about R&D participants and on generalized statistical references, policies concerning network systems remain disappointing and inadequate. Accordingly, we analyzed the relationships among the participants in R&D projects and, on the foundation of the resulting industry-academia-research network, predicted changes that may arise within the network. To predict these R&D network transitions, Common Neighbor and Jaccard's Coefficient models were taken as baseline models, and a new prediction model was proposed to address their limitations and to increase the accuracy of link prediction; a comparative analysis was then made between the models. By effectively predicting changes in R&D networks, these results serve as a stepping-stone toward a strategy that supports a desirable industry-academia-research network and toward national policies that can effectively and efficiently sponsor integrated R&D. Although weighted versions of both the Common Neighbor and Jaccard's Coefficient models provided positive outcomes, the improvement in accuracy was larger for the weighted Common Neighbor model. The unweighted Common Neighbor model predicted 650 out of 4,136 links, whereas the weighted Common Neighbor model predicted 50 more, for a total of 700. While the Jaccard model showed slight numerical improvements, the differences were found to be insignificant.
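For reference, a minimal sketch of the two baseline link-prediction indices named above, Common Neighbor and Jaccard's Coefficient, evaluated on a small illustrative collaboration graph (the node names and edges are made up; the paper used a government R&D network):

```python
# Hedged sketch of the Common Neighbor and Jaccard baseline indices on a toy
# collaboration graph.  Node names and edges are illustrative placeholders.
from itertools import combinations

adj = {                       # adjacency list: node -> set of neighbors
    "univA": {"instX", "firmK"},
    "univB": {"instX", "firmL"},
    "firmK": {"univA", "instX"},
    "firmL": {"univB"},
    "instX": {"univA", "univB", "firmK"},
}

def common_neighbors(u, v):
    return len(adj[u] & adj[v])

def jaccard(u, v):
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

# Score every currently unconnected pair; high scores are predicted links.
candidates = [(u, v) for u, v in combinations(adj, 2) if v not in adj[u]]
for u, v in sorted(candidates, key=lambda p: -common_neighbors(*p)):
    print(f"{u}-{v}: CN={common_neighbors(u, v)}, Jaccard={jaccard(u, v):.2f}")
```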

Selective Word Embedding for Sentence Classification by Considering Information Gain and Word Similarity (문장 분류를 위한 정보 이득 및 유사도에 따른 단어 제거와 선택적 단어 임베딩 방안)

  • Lee, Min Seok; Yang, Seok Woo; Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.25 no.4 / pp.105-122 / 2019
  • Dimensionality reduction is one of the methods for handling big data in text mining. For dimensionality reduction, we should consider the density of the data, which has a significant influence on the performance of sentence classification. Higher-dimensional data require more computation and can eventually cause high computational cost and overfitting in the model, so a dimension reduction process is necessary to improve model performance. Diverse methods have been proposed, from simply reducing noise in the data, such as misspellings or informal text, to incorporating semantic and syntactic information. In addition, the representation and selection of text features affect the performance of the classifier for sentence classification, one of the fields of Natural Language Processing. The common goal of dimension reduction is to find a latent space that is representative of the raw data in the observation space. Existing methods use various algorithms for dimensionality reduction, such as feature extraction and feature selection. In addition to these algorithms, word embeddings, which learn low-dimensional vector space representations of words that capture semantic and syntactic information, are also utilized. To improve performance, recent studies have suggested methods in which the word dictionary is modified according to the positive and negative scores of pre-defined words. The basic idea of this study is that similar words have similar vector representations: once a feature selection algorithm identifies unimportant words, we assume that words similar to them also have little impact on sentence classification. This study proposes two methods for more accurate classification that selectively eliminate words under specific rules and construct word embeddings based on Word2Vec. To select words of low importance from the text, we use the information gain algorithm to measure importance and cosine similarity to search for similar words. First, we eliminate words with comparatively low information gain values from the raw text and build the word embedding. Second, we additionally remove words that are similar to the words with low information gain values and build the word embedding. Finally, the filtered text and word embeddings are fed into deep learning models: a Convolutional Neural Network and an Attention-Based Bidirectional LSTM. This study uses customer reviews of Kindle products on Amazon.com, IMDB, and Yelp as datasets and classifies each dataset with the deep learning models. Reviews that received more than five helpful votes and whose ratio of helpful votes exceeded 70% were classified as helpful reviews; since Yelp only shows the number of helpful votes, we randomly sampled 100,000 reviews that received more than five helpful votes from 750,000 reviews. Minimal preprocessing, such as removing numbers and special characters, was applied to each dataset. To evaluate the proposed methods, we compared their performance with Word2Vec and GloVe embeddings that used all the words. One of the proposed methods outperformed the embeddings that used all the words: removing unimportant words improved performance, but removing too many words lowered it.
For future research, diverse preprocessing methods and an in-depth analysis of word co-occurrence for measuring word similarity should be considered. We also applied the proposed method only with Word2Vec; other embedding methods such as GloVe, fastText, and ELMo could be combined with the proposed elimination methods, and the possible combinations of word embedding and elimination methods could be explored.
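A minimal sketch of the word-elimination step, under stated assumptions: information gain is approximated with mutual information from scikit-learn, similar words are found with a toy gensim Word2Vec model, and the corpus and thresholds are placeholders rather than the Amazon/IMDB/Yelp setup of the paper.

```python
# Hedged sketch of selective word elimination: score words by information gain
# (approximated here with mutual information), drop the lowest-scoring words,
# and additionally drop words whose Word2Vec cosine similarity to an eliminated
# word is high.  Corpus and thresholds are toy placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import mutual_info_classif
from gensim.models import Word2Vec

docs = ["great battery and great screen", "terrible battery broke fast",
        "love the screen quality", "broke after one week terrible"]
labels = [1, 0, 1, 0]

vec = CountVectorizer()
X = vec.fit_transform(docs)
words = vec.get_feature_names_out()
ig = mutual_info_classif(X, labels, discrete_features=True, random_state=0)

low_ig = {w for w, s in zip(words, ig) if s < 0.1}       # low-importance words

w2v = Word2Vec([d.split() for d in docs], vector_size=20, min_count=1, seed=0)
also_drop = {sim for w in low_ig if w in w2v.wv
             for sim, score in w2v.wv.most_similar(w, topn=2) if score > 0.5}

keep = [w for w in words if w not in low_ig | also_drop]
print("kept vocabulary:", keep)
```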

A Study on the Cause Analysis and Countermeasures of the Traditional Market for Fires in the TRIZ Method (TRIZ 기법에 의한 재래시장 화재의 원인분석과 대책에 관한 연구)

  • Seo, Yong-Goo; Min, Se-Hong
    • Fire Science and Engineering / v.31 no.4 / pp.95-102 / 2017
  • Fires in traditional markets have occurred frequently in recent years, and most of them have spread into large fires, causing serious damage. The standing of traditional markets, which handle distribution for ordinary people, has shrunk greatly under the aggressive marketing of large domestic companies and large foreign distributors since the full opening of the domestic distribution market. Most traditional markets have histories and traditions ranging from decades to centuries and have grown steadily alongside the lives of ordinary people and the development of the local economy. Fires there tend to develop into large fires because nearly all goods can be flammable owing to deteriorated facilities, arbitrary modification of equipment, and crowded merchandise. Furthermore, most stores are small, so the passages are narrow and pedestrian movement is restricted. Accordingly, traditional markets are structurally vulnerable to fire because of their initially unplanned layout, which leads to large-scale fire damage. This study systematically classifies and analyzes fire risk factors by applying the TRIZ tool in order to extract the fundamental problems behind traditional market fires and respond to them actively. Based on this analysis, the study prepares specific measures to prevent fires, to keep a fire from expanding into a large fire, and to minimize fire damage. From the derived fire-spread risk factors of traditional markets, the study presents passive measures such as improving fire resistance and establishing fire safety islands, as well as active and institutional measures such as mandating automatic fire notification systems, applying extra-high-pressure pump systems, and separating electrical lines.

Anomaly Detection for User Action with Generative Adversarial Networks (적대적 생성 모델을 활용한 사용자 행위 이상 탐지 방법)

  • Choi, Nam woong; Kim, Wooju
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.43-62 / 2019
  • In the past, anomaly detection was dominated by methods that determined whether an abnormality existed based on statistics derived from the data. This was possible because data were simple in dimension, so classical statistical methods worked effectively. However, as data have become more complex in the era of big data, it has become harder to accurately analyze and predict the data generated throughout industry in the conventional way, and supervised learning algorithms based on SVMs and decision trees came into use. However, supervised models predict test data accurately only when the class distribution is balanced, whereas most data generated in industry have imbalanced classes, so the predictions of supervised models are not always valid. To overcome these drawbacks, many studies now use unsupervised models that are not influenced by the class distribution, such as autoencoders or generative adversarial networks. In this paper, we propose a method to detect anomalies using generative adversarial networks. AnoGAN, introduced by Schlegl et al. (2017), is a model composed of convolutional neural networks that performs anomaly detection on medical images. In contrast, anomaly detection on sequence data with generative adversarial networks has been studied far less than on image data. Li et al. (2018) proposed a model that uses LSTM, a type of recurrent neural network, to classify abnormalities in numerical sequence data, but it was not applied to categorical sequence data, nor did it use the feature matching method of Salimans et al. (2016). This suggests that there is considerable room for research on anomaly detection for sequence data with generative adversarial networks. To learn the sequence data, the generative adversarial network is built from LSTMs: the generator uses a 2-layer stacked LSTM with 32-dimensional and 64-dimensional hidden unit layers, and the discriminator uses an LSTM with a 64-dimensional hidden unit layer. Existing work on anomaly detection for sequence data derives anomaly scores from the entropy of the probability of the actual data, but in this paper, as mentioned earlier, anomaly scores are derived with the feature matching technique. In addition, the latent variable optimization process was designed with an LSTM to improve model performance. The modified generative adversarial model was more accurate than the autoencoder in all experiments in terms of precision and was approximately 7% higher in accuracy. The generative adversarial network also outperformed the autoencoder in robustness, because it learns the data distribution from real categorical sequence data and is not swayed by a single normal instance, whereas the autoencoder is. In the robustness test, the accuracy of the autoencoder was 92% and that of the generative adversarial network was 96%; in terms of sensitivity, the autoencoder reached 40% and the generative adversarial network 51%.
Experiments were also conducted to show how much the performance changes with different structures for optimizing the latent variables; as a result, sensitivity improved by about 1%. These results offer a new perspective on optimizing latent variables, which had previously received relatively little attention.
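A simplified sketch of the feature-matching anomaly score described above, with LSTM-based generator and discriminator; the layer sizes, the soft one-hot sequence representation, and the latent-search loop are illustrative assumptions rather than the paper's exact architecture or training procedure.

```python
# Hedged sketch of a feature-matching anomaly score with LSTM generator and
# discriminator.  Sizes, training, and data are simplified placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = 50

class LSTMDiscriminator(nn.Module):
    """LSTM over (soft) one-hot token sequences; last hidden state = features."""
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(VOCAB, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def features(self, x):                 # x: (batch, seq_len, VOCAB)
        _, (h, _) = self.lstm(x)
        return h[-1]                       # (batch, hidden)

    def forward(self, x):
        return torch.sigmoid(self.out(self.features(x)))

class LSTMGenerator(nn.Module):
    """2-layer stacked LSTM mapping latent sequences to soft token sequences."""
    def __init__(self, z_dim=32, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(z_dim, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, VOCAB)

    def forward(self, z):                  # z: (batch, seq_len, z_dim)
        h, _ = self.lstm(z)
        return F.softmax(self.out(h), dim=-1)

def anomaly_score(tokens, D, G, z_dim=32, steps=50, lr=0.05):
    """Feature matching: search a latent z whose generated sequence matches the
    input in the discriminator's feature space; the residual distance is the score."""
    x = F.one_hot(tokens, VOCAB).float()
    z = torch.randn(x.size(0), x.size(1), z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        loss = (D.features(x) - D.features(G(z))).norm(dim=1).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Toy usage with untrained networks (scores are meaningless until D and G are trained).
D, G = LSTMDiscriminator(), LSTMGenerator()
tokens = torch.randint(0, VOCAB, (4, 10))   # a batch of categorical sequences
print(anomaly_score(tokens, D, G))
```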

Label Embedding for Improving Classification Accuracy Using AutoEncoder with Skip-Connections (다중 레이블 분류의 정확도 향상을 위한 스킵 연결 오토인코더 기반 레이블 임베딩 방법론)

  • Kim, Museong; Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.175-197 / 2021
  • Recently, with the development of deep learning technology, research on unstructured data analysis is being actively conducted and is showing remarkable results in various fields such as classification, summarization, and generation. Among text analysis fields, text classification is the most widely used technology in academia and industry. Text classification includes binary classification with one label among two classes, multi-class classification with one label among several classes, and multi-label classification with multiple labels among several classes. In particular, multi-label classification requires a different training method from binary and multi-class classification because of the characteristic of having multiple labels. In addition, since the number of labels to be predicted grows as the number of labels and classes increases, performance improvement becomes difficult because of the increased prediction difficulty. To overcome these limitations, research on label embedding is being actively conducted, in which (i) the initially given high-dimensional label space is compressed into a low-dimensional latent label space, (ii) training is performed to predict the compressed label, and (iii) the predicted label is restored to the high-dimensional original label space. Typical label embedding techniques include Principal Label Space Transformation (PLST), Multi-Label Classification via Boolean Matrix Decomposition (MLC-BMaD), and Bayesian Multi-Label Compressed Sensing (BML-CS). However, since these techniques consider only linear relationships between labels or compress the labels by random transformation, they cannot capture the non-linear relationships between labels and therefore cannot create a latent label space that sufficiently contains the information of the original labels. Recently, there have been increasing attempts to improve performance by applying deep learning to label embedding. Label embedding using an autoencoder, a deep learning model that is effective for data compression and restoration, is representative. However, traditional autoencoder-based label embedding suffers a large amount of information loss when compressing a high-dimensional label space with a myriad of classes into a low-dimensional latent label space; this is related to the vanishing gradient problem that occurs during backpropagation. To solve this problem, skip connections were devised: by adding a layer's input to its output, gradients are preserved during backpropagation and efficient learning is possible even when the network is deep. Skip connections are mainly used for image feature extraction in convolutional neural networks, but studies using skip connections in autoencoders or in the label embedding process are still lacking. Therefore, in this study, we propose an autoencoder-based label embedding methodology in which skip connections are added to both the encoder and the decoder to form a low-dimensional latent label space that reflects the information of the high-dimensional label space well. The proposed methodology was applied to actual paper keywords to derive the high-dimensional keyword label space and the low-dimensional latent label space.
Using this, we conducted an experiment to predict the compressed keyword vector in the latent label space from the paper abstract and to evaluate multi-label classification by restoring the predicted keyword vector to the original label space. As a result, multi-label classification based on the proposed methodology showed far superior performance on the accuracy, precision, recall, and F1 score indicators compared to traditional multi-label classification methods. This shows that the low-dimensional latent label space derived through the proposed methodology reflects the information of the high-dimensional label space well, which ultimately improves the performance of multi-label classification itself. In addition, the utility of the proposed methodology was examined by comparing its performance according to domain characteristics and the dimensionality of the latent label space.
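A minimal sketch of a label-embedding autoencoder with skip connections on both the encoder and decoder, in the spirit of the proposed methodology; the layer sizes and the way the skip connections are wired are assumptions, not the paper's exact architecture.

```python
# Hedged sketch of a label-embedding autoencoder with skip connections.
# Layer sizes and skip wiring are illustrative assumptions.
import torch
import torch.nn as nn

class SkipBlock(nn.Module):
    """Linear layer with a skip (residual) connection: out = relu(Wx + b) + proj(x)."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.fc = nn.Linear(d_in, d_out)
        self.proj = nn.Linear(d_in, d_out, bias=False) if d_in != d_out else nn.Identity()

    def forward(self, x):
        return torch.relu(self.fc(x)) + self.proj(x)

class LabelAutoencoder(nn.Module):
    """Compresses a high-dimensional multi-hot label vector into a low-dimensional
    latent label space and restores it, with skip connections in both halves."""
    def __init__(self, n_labels=1000, latent=64):
        super().__init__()
        self.encoder = nn.Sequential(SkipBlock(n_labels, 256), SkipBlock(256, latent))
        self.decoder = nn.Sequential(SkipBlock(latent, 256), nn.Linear(256, n_labels))

    def forward(self, y):
        z = self.encoder(y)                     # low-dimensional latent label
        return torch.sigmoid(self.decoder(z)), z

# Toy usage: reconstruct random sparse multi-hot label vectors.
model = LabelAutoencoder()
y = (torch.rand(8, 1000) < 0.01).float()        # sparse multi-hot labels
y_hat, z = model(y)
loss = nn.functional.binary_cross_entropy(y_hat, y)
loss.backward()
print(z.shape, y_hat.shape, float(loss))
```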