• Title/Summary/Keyword: Support vectors

Text Classification Using Parallel Word-level and Character-level Embeddings in Convolutional Neural Networks

  • Geonu Kim;Jungyeon Jang;Juwon Lee;Kitae Kim;Woonyoung Yeo;Jong Woo Kim
    • Asia pacific journal of information systems / v.29 no.4 / pp.771-788 / 2019
  • Deep learning techniques such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) show superior performance in text classification compared to traditional approaches such as Support Vector Machines (SVMs) and Naïve Bayes. When using CNNs for text classification, word embedding or character embedding is the step that transforms words or characters into fixed-size vectors before feeding them into the convolutional layers. In this paper, we propose a parallel word-level and character-level embedding approach in CNNs for text classification, which captures word-level and character-level patterns concurrently. To show the usefulness of the proposed approach, we perform experiments with two English and three Korean text datasets. The experimental results show that character-level embedding works better in Korean while word-level embedding performs well in English. The results also reveal that the proposed approach outperforms traditional CNNs with word-level or character-level embedding alone on both Korean and English documents. A more detailed investigation finds that, compared to the traditional embedding approaches, the proposed approach tends to perform better when the amount of data is relatively small.
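
A minimal sketch of the parallel embedding idea described above: two convolutional branches, one over word embeddings and one over character embeddings, whose pooled outputs are concatenated before classification. Vocabulary sizes, filter counts, and layer shapes are hypothetical, not the authors' exact architecture:

```python
import torch
import torch.nn as nn

class ParallelEmbeddingCNN(nn.Module):
    """Two parallel branches: word-level and character-level embeddings,
    each followed by a 1-D convolution and global max pooling."""
    def __init__(self, word_vocab=20000, char_vocab=100, emb_dim=128,
                 n_filters=64, kernel=5, n_classes=4):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, emb_dim)
        self.char_emb = nn.Embedding(char_vocab, emb_dim)
        self.word_conv = nn.Conv1d(emb_dim, n_filters, kernel)
        self.char_conv = nn.Conv1d(emb_dim, n_filters, kernel)
        self.fc = nn.Linear(2 * n_filters, n_classes)

    def forward(self, word_ids, char_ids):
        # (batch, seq) -> (batch, emb_dim, seq) for Conv1d
        w = self.word_emb(word_ids).transpose(1, 2)
        c = self.char_emb(char_ids).transpose(1, 2)
        w = torch.relu(self.word_conv(w)).max(dim=2).values
        c = torch.relu(self.char_conv(c)).max(dim=2).values
        # Concatenate both views so word- and character-level
        # patterns inform the classifier concurrently
        return self.fc(torch.cat([w, c], dim=1))
```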

An Effective Face Authentication Method for Resource-Constrained Devices (제한된 자원을 갖는 장치에서 효과적인 얼굴 인증 방법)

  • Lee Kyunghee;Byun Hyeran
    • Journal of KIISE:Software and Applications / v.31 no.9 / pp.1233-1245 / 2004
  • Though biometrics are a good tool for authenticating a person in terms of security and convenience, typical biometric authentication algorithms may not be executable on resource-constrained devices such as smart cards. Thus, to execute biometric processing on resource-constrained devices, it is desirable to develop a lightweight authentication algorithm that requires only a small amount of memory and computation. Also, among biological features, the face is one of the most acceptable biometrics, because humans use it in their visual interactions and acquiring face images is non-intrusive. We present a new face authentication algorithm in this paper. Our contribution is twofold. One is a face authentication algorithm with a low memory requirement, which uses support vector machines (SVM) with a feature set extracted by genetic algorithms (GA). The other is a method to further reduce, if needed, the amount of memory required for authentication, at the expense of verification rate, by changing a controllable system parameter for the feature set size. Given a pre-defined amount of memory, this capability makes it practical to mount our algorithm on memory-constrained devices. The experimental results on various databases show that our face authentication algorithm, using an SVM whose input vectors consist of discriminating features extracted by GA, performs much better than the algorithm without GA-based feature selection, in terms of both accuracy and memory requirement. The experiments also show that the number of features to be selected is controllable by a system parameter.
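
A toy sketch of the GA-plus-SVM feature selection loop described above, with cross-validated SVM accuracy as the fitness of a binary feature mask. Population size, mutation rate, and the fitness function are hypothetical; the authors' encoding may differ:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def ga_select_features(X, y, n_gen=20, pop_size=30, seed=0):
    """Toy GA: evolve binary masks over feature columns;
    fitness = 3-fold CV accuracy of an RBF SVM on the masked data."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, n_feat))

    def fitness(mask):
        if mask.sum() == 0:
            return 0.0
        cols = mask.astype(bool)
        return cross_val_score(SVC(kernel="rbf"), X[:, cols], y, cv=3).mean()

    for _ in range(n_gen):
        scores = np.array([fitness(m) for m in pop])
        # Keep the better half, refill with crossover + bit-flip mutation
        elite = pop[np.argsort(scores)[-pop_size // 2:]]
        children = []
        while len(children) < pop_size - len(elite):
            a, b = elite[rng.integers(len(elite), size=2)]
            cut = rng.integers(1, n_feat)
            child = np.concatenate([a[:cut], b[cut:]])
            flips = rng.random(n_feat) < 0.02
            child[flips] ^= 1
            children.append(child)
        pop = np.vstack([elite, children])

    scores = np.array([fitness(m) for m in pop])
    return pop[scores.argmax()].astype(bool)
```

Shrinking the allowed mask size (e.g., by penalizing large masks in the fitness) would trade verification rate for memory, mirroring the controllable system parameter mentioned in the abstract.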

Recognition of Superimposed Patterns with Selective Attention based on SVM (SVM기반의 선택적 주의집중을 이용한 중첩 패턴 인식)

  • Bae, Kyu-Chan;Park, Hyung-Min;Oh, Sang-Hoon;Choi, Youg-Sun;Lee, Soo-Young
    • Journal of the Institute of Electronics Engineers of Korea SP / v.42 no.5 s.305 / pp.123-136 / 2005
  • We propose a recognition system for superimposed patterns based on a selective attention model and an SVM, which produces better performance than an artificial neural network. The proposed selective attention model places an attention layer in front of the SVM; it modifies the SVM's inputs and behaves as a selective filter. The philosophy behind the selective attention model is to find a stopping criterion for training and to define a confidence measure for the selective attention's outcome. A support vector represents the surrounding sample vectors, and the support vector closest to the initial input vector under consideration is chosen. The minimal Euclidean distance between the attention-modified input vector and the chosen support vector defines the stopping criterion. It is difficult to define the confidence measure of selective attention with the common selective attention model. A new way of defining the confidence measure can be set under the constraint that each modified input pixel does not cross the boundary of the original input pixel, which increases the range of applicable information. This method uses the following information: the Euclidean distance between the input pattern and the modified pattern, the output of the SVM, and the output of the hidden neuron corresponding to the support vector closest to the initial input pattern. For the recognition experiments, 45 different combinations of USPS digit data are used. Better recognition performance is observed when selective attention is applied along with the SVM than with the SVM alone. Also, the proposed selective attention shows better performance than the common selective attention model.
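
A small sketch of the stopping signal described above: the Euclidean distance from the attention-modified input to its nearest support vector of a trained scikit-learn SVM. The attention update itself is not shown and the threshold is a hypothetical tuning parameter:

```python
import numpy as np
from sklearn.svm import SVC

def nearest_sv_distance(svm: SVC, x_mod: np.ndarray) -> float:
    """Euclidean distance from the attention-modified input x_mod to the
    closest support vector; a small value can serve as the stopping
    criterion for the attention iteration (illustrative sketch)."""
    dists = np.linalg.norm(svm.support_vectors_ - x_mod, axis=1)
    return float(dists.min())

# Usage sketch: stop adapting once the modified input is close enough
# to a trained example.
# while nearest_sv_distance(svm, x_mod) > 0.1:
#     x_mod = attention_step(x_mod)   # hypothetical attention update
```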

DISTRACTION OSTEOGENESIS OF THE MIDFACE WITH A RIGID EXTERNAL DISTRACTOR (RED) (강성 외장형 신장기(Rigid External Distractor)를 이용한 중안면부의 골신장술)

  • Oh, Jung-Hwan;Kuebler, Alexander;Zoeller, Joachim E.
    • Journal of the Korean Association of Oral and Maxillofacial Surgeons / v.28 no.2 / pp.161-164 / 2002
  • Recently, distraction osteogenesis has been used to correct skeletal malformations and discrepancies in the craniofacial area. It is also considered an alternative in the treatment of severe midfacial hypoplasia. There are several types of distractors for midfacial distraction, such as subcutaneous distractors and rigid external distractors. We used a rigid external distractor (RED) for correction of craniofacial hypoplasia. Seven patients underwent midfacial distraction osteogenesis with a rigid external distractor between April 2000 and July 2001. Three patients suffered from Apert's syndrome, three from Crouzon's syndrome, and one from midfacial hypoplasia due to midfacial radiotherapy during childhood. The mean distance of distraction was 19.8 mm (range 10-25 mm), and the distraction lasted 24 days on average. The patients showed no severe complications such as infection, optic disturbance, or incorrect distraction vectors. One patient complained of pain at the site of the occipital fixation of the distractor. In one patient who had undergone subtotal craniectomy 3 months before the Le Fort III distraction, the distractor was dislocated because the cranial bone was too weak to support it. This report shows that the application of a rigid external distractor and transfacial pull results in exact control of the distraction vectors and excellent correction of midfacial hypoplasia without severe complications.

Lesion Detection in Chest X-ray Images based on Coreset of Patch Features (패치 특징 코어세트 기반의 흉부 X-Ray 영상에서의 병변 유무 감지)

  • Kim, Hyun-bin;Chun, Jun-Chul
    • Journal of Internet Computing and Services / v.23 no.3 / pp.35-45 / 2022
  • Even in recent years, treatment of first-aid patients is still often delayed due to a shortage of medical resources in marginalized areas. Research on automating the analysis of medical data, to address the inaccessibility of medical services and the shortage of medical personnel, is ongoing. Computer vision-based medical inspection automation requires substantial cost for collecting and labeling training data. These problems stand out when classifying lesions that are rare, or pathological features and pathogeneses that are difficult to define clearly by visual inspection. Anomaly detection is attracting attention as a method that can significantly reduce the cost of data collection by adopting an unsupervised learning strategy. In this paper, we propose a method for detecting abnormal chest X-ray images, based on existing anomaly detection techniques, as follows. (1) Normalize the brightness range of medical images resampled to an optimal resolution. (2) Select a subset of feature vectors with high representative power (a coreset) from the set of intermediate-level patch features extracted from lesion-free images. (3) Measure the difference of a test image's patch features from the selected lesion-free feature vectors using a nearest-neighbor search algorithm. The proposed system can simultaneously perform anomaly classification and localization for each image. In this paper, the anomaly detection performance of the proposed system on chest X-ray images of PA projection is measured and reported under detailed conditions. We demonstrate the effectiveness of anomaly detection for medical images with a classification AUROC of 0.705 on a random subset extracted from the PadChest dataset. The proposed system can be used to improve the clinical diagnosis workflow of medical institutions and can effectively support early diagnosis in medically underserved areas.
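
A compact sketch of steps (2) and (3) above: a greedy k-center coreset over lesion-free patch features, and a nearest-neighbor anomaly score for a test image. This is the generic patch-based anomaly detection pattern; the authors' feature extractor and exact scoring are not reproduced here:

```python
import numpy as np

def greedy_coreset(features: np.ndarray, m: int) -> np.ndarray:
    """Greedy k-center selection: keep m patch features (rows) that
    cover the rest, a common coreset strategy for patch-based
    anomaly detection (sketch)."""
    idx = [0]
    d = np.linalg.norm(features - features[0], axis=1)
    for _ in range(m - 1):
        nxt = int(d.argmax())            # farthest-from-coreset point
        idx.append(nxt)
        d = np.minimum(d, np.linalg.norm(features - features[nxt], axis=1))
    return features[idx]

def anomaly_score(test_patches: np.ndarray, coreset: np.ndarray) -> float:
    """Image-level score: max over patches of the distance to the
    nearest lesion-free coreset feature. Per-patch distances also give
    a localization map."""
    dists = np.linalg.norm(test_patches[:, None, :] - coreset[None, :, :],
                           axis=2)
    return float(dists.min(axis=1).max())
```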

Statistical Techniques to Detect Sensor Drifts (센서드리프트 판별을 위한 통계적 탐지기술 고찰)

  • Seo, In-Yong;Shin, Ho-Cheol;Park, Moon-Ghu;Kim, Seong-Jun
    • Journal of the Korea Society for Simulation / v.18 no.3 / pp.103-112 / 2009
  • In a nuclear power plant (NPP), periodic sensor calibrations are required to assure that sensors are operating correctly. However, only a few sensors are actually found to be faulty at calibration. For the safe operation of an NPP and the reduction of unnecessary calibration, on-line calibration monitoring is needed. In this paper, principal component-based auto-associative support vector regression (PCSVR) is proposed for sensor signal validation in an NPP. It combines the attractive merits of principal component analysis (PCA) for extracting predominant feature vectors with those of AASVR, which easily represents complicated processes that are difficult to model analytically or mechanistically. Using real plant startup data from the Kori Nuclear Power Plant Unit 3, the SVR hyperparameters were optimized by response surface methodology (RSM). Moreover, statistical techniques are integrated with PCSVR for failure detection. The residuals between the estimated and measured signals are tested by the Shewhart control chart, the exponentially weighted moving average (EWMA), the cumulative sum (CUSUM), and the generalized likelihood ratio test (GLRT) to detect whether a sensor has failed. This study shows that the GLRT is a strong candidate for the detection of sensor drift.
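
A small sketch of one of the residual tests named above, a two-sided CUSUM on the residuals between estimated and measured signals. The allowance k and threshold h are textbook defaults in sigma units; the values tuned in the paper may differ:

```python
import numpy as np

def cusum_drift(residuals: np.ndarray, k: float = 0.5, h: float = 5.0):
    """Two-sided CUSUM on standardized residuals (estimated - measured).
    Returns the time indices at which a drift alarm is raised."""
    z = (residuals - residuals.mean()) / residuals.std()
    s_hi = s_lo = 0.0
    alarms = []
    for t, zt in enumerate(z):
        s_hi = max(0.0, s_hi + zt - k)   # accumulates upward drift
        s_lo = max(0.0, s_lo - zt - k)   # accumulates downward drift
        if s_hi > h or s_lo > h:
            alarms.append(t)             # possible drift onset
            s_hi = s_lo = 0.0            # reset after an alarm
    return alarms
```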

Multi-Vector Document Embedding Using Semantic Decomposition of Complex Documents (복합 문서의 의미적 분해를 통한 다중 벡터 문서 임베딩 방법론)

  • Park, Jongin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.19-41 / 2019
  • According to the rapidly increasing demand for text data analysis, research and investment in text mining are being actively conducted not only in academia but also in various industries. Text mining is generally conducted in two steps. In the first step, the text of the collected documents is tokenized and structured to convert the original documents into a computer-readable form. In the second step, tasks such as document classification, clustering, and topic modeling are conducted according to the purpose of the analysis. Until recently, text mining studies focused on the second step. However, with the discovery that the text structuring process substantially influences the quality of the analysis results, various embedding methods have been actively studied to improve the quality of analysis results by preserving the meaning of words and documents when representing text data as vectors. Unlike structured data, which can be directly fed into a variety of operations and traditional analysis techniques, unstructured text must first be structured into a form the computer can understand. Mapping arbitrary objects into a fixed-dimensional space while maintaining their algebraic properties is called "embedding." Recently, attempts have been made to embed not only words but also sentences, paragraphs, and entire documents. In particular, as the demand for document embedding increases rapidly, many algorithms have been developed to support it. Among them, doc2vec, which extends word2vec and embeds each document into one vector, is the most widely used. However, the traditional document embedding method represented by doc2vec generates a vector for each document using all the words in the document, so the document vector is affected not only by core words but also by miscellaneous words. Additionally, traditional document embedding schemes usually map each document to a single vector, making it difficult to accurately represent a complex document covering multiple subjects. In this paper, we propose a new multi-vector document embedding method to overcome these limitations. This study targets documents that explicitly separate body content and keywords; for a document without keywords, the method can be applied after extracting keywords through various analysis techniques. Since keyword extraction is not the core subject of the proposed method, we describe the process for documents whose keywords are predefined in the text. The proposed method consists of (1) parsing, (2) word embedding, (3) keyword vector extraction, (4) keyword clustering, and (5) multiple-vector generation. The specific process is as follows. All text in a document is tokenized, and each token is represented as an N-dimensional real-valued vector through word embedding. Then, to overcome the limitation that a document vector is affected by miscellaneous words as well as core words, the vectors corresponding to the keywords of each document are extracted to form a keyword-vector set per document. Next, clustering is conducted on each document's keyword-vector set to identify the multiple subjects it contains. Finally, a multi-vector representation is generated from the keyword vectors constituting each cluster. Experiments on 3,147 academic papers revealed that the single-vector traditional approach cannot properly map complex documents because of interference among subjects within each vector, whereas the proposed multi-vector method vectorizes complex documents more accurately by eliminating that interference.
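
A minimal sketch of steps (4) and (5): cluster a document's keyword vectors and emit one vector per cluster. The fixed cluster count here is a hypothetical simplification; the paper determines the number of subjects per document:

```python
import numpy as np
from sklearn.cluster import KMeans

def multi_vector_embedding(keyword_vectors: np.ndarray, n_subjects: int = 3):
    """Cluster a document's keyword vectors and return one vector per
    cluster (the mean of its members), yielding a multi-vector
    representation of a complex document (sketch)."""
    km = KMeans(n_clusters=n_subjects, n_init=10).fit(keyword_vectors)
    return np.stack([keyword_vectors[km.labels_ == c].mean(axis=0)
                     for c in range(n_subjects)])
```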

Frame Rate Conversion Algorithm Using Adaptive Search-based Motion Estimation (적응적 탐색기반 움직임 추정을 사용한 프레임 율 변환 알고리즘)

  • Kim, Young-Duk;Chang, Joon-Young;Kang, Moon-Gi
    • Journal of the Institute of Electronics Engineers of Korea SP / v.46 no.3 / pp.18-27 / 2009
  • In this paper, we propose a frame rate conversion (FRC) algorithm using adaptive search-based motion estimation (ME). The proposed ME method uses recursive search, 3-step search, and single predicted search as candidate search strategies. The best of the three candidates is adaptively selected on a block basis according to the predicted motion type; this adaptation improves the accuracy of the estimated motion vectors while limiting the increase in computational load. To support the proposed ME method, the image is divided into three regions with different motion types. Experimental results show that the proposed FRC method achieves better image quality than existing algorithms in both subjective and objective measures.
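
One of the three candidate strategies named above is the classic 3-step block-matching search; a generic textbook version appears below. The authors' adaptive selection among strategies and their exact matching cost are not reproduced:

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two blocks."""
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def three_step_search(cur, ref, by, bx, bs=16):
    """Generic 3-step search: refine the motion vector of the block at
    (by, bx) in the current frame by halving the search step 4 -> 2 -> 1."""
    block = cur[by:by+bs, bx:bx+bs]
    mv = np.zeros(2, dtype=int)                    # best motion vector so far
    best = sad(block, ref[by:by+bs, bx:bx+bs])
    for step in (4, 2, 1):
        best_off = (0, 0)
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                y, x = by + mv[0] + dy, bx + mv[1] + dx
                if 0 <= y <= ref.shape[0] - bs and 0 <= x <= ref.shape[1] - bs:
                    cost = sad(block, ref[y:y+bs, x:x+bs])
                    if cost < best:
                        best, best_off = cost, (dy, dx)
        mv += best_off                             # move the search center
    return tuple(mv)                               # (dy, dx) displacement
```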

A Study on Continuous Management Strategy for Published Coordinates of National Geodetic Control Points using GPS Network Adjustment (GPS 측지망 조정을 통한 국가기준점 성과의 상시 산정 체계에 관한 연구)

  • Jung, Kwang-Ho;Lee, Hung-Kyu
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.29 no.4 / pp.367-380 / 2011
  • This paper focuses on deriving a GPS-based geodetic network adjustment strategy to continuously determine the coordinate sets of the national geodetic control points. After a review of the domestic literature on the topic and of overseas case studies of countries that recently reformed their geodetic infrastructure, a simplified geodetic network consisting of two layers, namely a GPS active network and a passive network, is proposed to maximize the effectiveness of the network adjustment by reducing the number of passive points. Furthermore, a GPS data processing and network adjustment procedure is derived to support the continuous management scheme. While the active layer adopts a sequential least-squares adjustment based on multi-baseline solutions, the passive layer employs a multi-session adjustment of 3-dimensional baseline vectors. Finally, an experimental adjustment of a network comprising 24 active and 6,900 passive stations is performed to demonstrate the efficiency and effectiveness of the proposed method.
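
A minimal sketch of a sequential least-squares update as used for the active layer: each new batch of baseline observations is folded into the current coordinate estimate without re-solving the whole network. This is the generic adjustment formula; the matrix names are illustrative, not the authors' notation:

```python
import numpy as np

def sequential_lsq_update(x, P, A, l, W):
    """Fold a new observation batch l = A x + v (weight matrix W, i.e.
    observation covariance W^-1) into the current estimate x with
    covariance P. Generic sequential adjustment step (sketch)."""
    # Gain from the combined normal equations
    K = P @ A.T @ np.linalg.inv(A @ P @ A.T + np.linalg.inv(W))
    x_new = x + K @ (l - A @ x)            # updated station coordinates
    P_new = (np.eye(len(x)) - K @ A) @ P   # updated covariance
    return x_new, P_new
```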

The extension of the largest generalized-eigenvalue based distance metric $D_{ij}^{(1)}$ in arbitrary feature spaces to classify composite data points

  • Daoud, Mosaab
    • Genomics & Informatics / v.17 no.4 / pp.39.1-39.20 / 2019
  • Analyzing patterns in data points embedded in linear and non-linear feature spaces is one of the common research problems across different research areas, for example data mining, machine learning, pattern recognition, and multivariate analysis. In this paper, the data points are heterogeneous sets of biosequences (composite data points); a composite data point is a set of ordinary data points (e.g., a set of feature vectors). We theoretically extend the derivation of the largest generalized-eigenvalue based distance metric $D_{ij}^{(1)}$ to any linear and non-linear feature space, and we prove that $D_{ij}^{(1)}$ is a metric under any linear or non-linear feature transformation function. We show the sufficiency and efficiency of the decision rule $\bar{\delta}_{\Xi_i}$ (i.e., the mean of $D_{ij}^{(1)}$) for classifying heterogeneous sets of biosequences, compared with the decision rules $\min_{\Xi_i}$ and $\mathrm{median}_{\Xi_i}$. We analyze the impact of linear and non-linear transformation functions on classifying/clustering collections of heterogeneous sets of biosequences, and we empirically show the impact of sequence length in simulated heterogeneous sequence sets on the classification and clustering results in linear and non-linear feature spaces. We also propose a new concept, the limiting dispersion map of the existing clusters in heterogeneous sets of biosequences embedded in linear and non-linear feature spaces, based on the limiting distribution of nucleotide compositions estimated from real data sets. Finally, empirical conclusions and scientific evidence are drawn from the experiments to support the theoretical results stated in this paper.
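
A hedged sketch of a largest-generalized-eigenvalue distance between two composite data points, each summarized by the covariance of its member feature vectors: solve the generalized eigenproblem for the covariance pair and take the largest eigenvalue. This only illustrates the construction; the regularization, the covariance choice, and the exact definition of $D_{ij}^{(1)}$ in the paper may differ:

```python
import numpy as np
from scipy.linalg import eigh

def gev_distance(X: np.ndarray, Y: np.ndarray, reg: float = 1e-6) -> float:
    """Largest generalized eigenvalue of the covariance pair of two
    composite data points X and Y (rows = feature vectors).
    Solves S_x v = lambda S_y v and returns max(lambda) (sketch)."""
    Sx = np.cov(X, rowvar=False) + reg * np.eye(X.shape[1])
    Sy = np.cov(Y, rowvar=False) + reg * np.eye(Y.shape[1])
    # eigh solves the symmetric-definite generalized eigenproblem
    lam = eigh(Sx, Sy, eigvals_only=True)
    return float(lam.max())
```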