• Title/Summary/Keyword: Capture


Selective Word Embedding for Sentence Classification by Considering Information Gain and Word Similarity (문장 분류를 위한 정보 이득 및 유사도에 따른 단어 제거와 선택적 단어 임베딩 방안)

  • Lee, Min Seok;Yang, Seok Woo;Lee, Hong Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.4
    • /
    • pp.105-122
    • /
    • 2019
  • Dimensionality reduction is one of the methods used to handle big data in text mining. For dimensionality reduction, we should consider the density of data, which has a significant influence on the performance of sentence classification. Data of higher dimensions require many computations, which can lead to high computational cost and overfitting in the model. Thus, a dimension reduction process is necessary to improve the performance of the model. Diverse methods have been proposed, from merely lessening noise in the data, such as misspellings or informal text, to incorporating semantic and syntactic information. In addition, the representation and selection of text features affect the performance of the classifier for sentence classification, one of the fields of Natural Language Processing. The common goal of dimension reduction is to find a latent space that is representative of the raw data in the observation space. Existing methods utilize various algorithms for dimensionality reduction, such as feature extraction and feature selection. In addition to these algorithms, word embeddings, which learn low-dimensional vector space representations of words that capture semantic and syntactic information, are also utilized. To improve performance, recent studies have suggested methods in which the word dictionary is modified according to the positive and negative scores of pre-defined words. The basic idea of this study is that similar words have similar vector representations. Once the feature selection algorithm identifies words that are not important, we assume that words similar to them also have little impact on sentence classification. This study proposes two ways to achieve more accurate classification: conducting selective word elimination under specific rules and constructing word embeddings based on Word2Vec.
To select words of low importance from the text, we use the information gain algorithm to measure importance and cosine similarity to search for similar words. First, we eliminate words with comparatively low information gain values from the raw text and form the word embedding. Second, we additionally select words that are similar to those with low information gain values and build the word embedding. Finally, the filtered text and word embeddings are applied to deep learning models: a Convolutional Neural Network and an Attention-Based Bidirectional LSTM. This study uses customer reviews of Kindle products on Amazon.com, IMDB, and Yelp as datasets, and classifies each dataset using the deep learning models. Reviews that received more than five helpful votes, with a ratio of helpful votes over 70%, were classified as helpful reviews. Yelp, however, only shows the number of helpful votes, so we extracted 100,000 reviews that received more than five helpful votes by random sampling from among 750,000 reviews. Minimal preprocessing, such as removing numbers and special characters from the text data, was applied to each dataset. To evaluate the proposed methods, we compared their performance against Word2Vec and GloVe word embeddings that used all the words. We showed that one of the proposed methods outperforms the embeddings that use all the words: by removing unimportant words, we obtain better performance. However, removing too many words lowered performance. Future research should consider diverse preprocessing methods and an in-depth analysis of word co-occurrence for measuring similarity values among words. Also, we only applied the proposed method with Word2Vec; other embedding methods such as GloVe, fastText, and ELMo can be combined with the proposed methods, making it possible to explore the possible combinations of word embedding and elimination methods.
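The two elimination steps the abstract describes (drop words with low information gain, then also drop words whose embeddings are cosine-similar to them) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code; the function names, toy word vectors, and thresholds are hypothetical.

```python
import numpy as np

def entropy(labels):
    # Shannon entropy of a label array, in bits
    counts = np.bincount(labels)
    p = counts[counts > 0] / len(labels)
    return -np.sum(p * np.log2(p))

def information_gain(word, docs, labels):
    # IG of a word = H(labels) - sum over presence/absence of weighted entropies
    labels = np.asarray(labels)
    present = np.array([word in d for d in docs])
    gain = entropy(labels)
    for mask in (present, ~present):
        if mask.any():
            gain -= mask.mean() * entropy(labels[mask])
    return gain

def words_to_drop(docs, labels, vectors, ig_threshold, sim_threshold):
    # docs: list of word sets; vectors: word -> embedding (e.g. from Word2Vec)
    vocab = sorted({w for d in docs for w in d})
    low_ig = {w for w in vocab if information_gain(w, docs, labels) < ig_threshold}
    drop = set(low_ig)
    # Step 2: also drop words cosine-similar to any low-IG word
    for w in vocab:
        if w in drop or w not in vectors:
            continue
        for lw in low_ig:
            if lw in vectors:
                a, b = vectors[w], vectors[lw]
                sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
                if sim > sim_threshold:
                    drop.add(w)
                    break
    return drop
```

In practice the embeddings would come from a trained Word2Vec model and the thresholds would be tuned; here they are placeholders.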

Development and Validation of Korean Composite Burn Index (KCBI) (한국형 산불피해강도지수(KCBI)의 개발 및 검증)

  • Lee, Hyunjoo;Lee, Joo-Mee;Won, Myoung-Soo;Lee, Sang-Woo
    • Journal of Korean Society of Forest Science
    • /
    • v.101 no.1
    • /
    • pp.163-174
    • /
    • 2012
  • The CBI (Composite Burn Index), developed by the USDA Forest Service, is an index to measure burn severity based on remote sensing. In Korea, the CBI has been used to investigate the burn severity of fire sites for the last few years. However, it has been argued that the CBI is not adequate to capture the unique characteristics of Korean forests, and there has been a demand to develop a KCBI (Korean Composite Burn Index). In this regard, this study aimed to develop the KCBI by adjusting the CBI and to validate its applicability using remote sensing techniques. Uljin and Youngduk, two large fire sites burned in 2011, were selected as study areas, and forty-four sampling plots were assigned in each study area for field survey. Burn severity (BS) of the study areas was estimated by analyzing NDVI from SPOT images taken one month after the fires. The applicability of the KCBI was validated with a correlation analysis between KCBI index values and NDVI values and their confusion matrix. The results showed that KCBI index values and NDVI values were closely correlated in both Uljin (r = -0.54, p<0.01) and Youngduk (r = -0.61, p<0.01). Thus, this result supports that the proposed KCBI is an adequate index to measure the burn severity of fire sites in Korea. There were a number of limitations, such as the low correlation coefficients between BS and KCBI and the skewed distribution of KCBI sampling plots toward the High and Extreme classes. Despite these limitations, the proposed KCBI showed high potential for estimating the burn severity of fire sites in Korea, and could be improved by addressing these limitations in further studies.
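The validation step described above (computing NDVI from spectral bands, then correlating index values against NDVI) can be sketched with the standard NDVI formula and a Pearson correlation; the function names and sample values below are illustrative, not from the paper.

```python
import numpy as np

def ndvi(nir, red):
    # Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)

def pearson_r(x, y):
    # Pearson correlation coefficient between two paired samples
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))
```

Since higher burn severity generally lowers post-fire NDVI, a negative correlation (as in the paper's r = -0.54 and r = -0.61) is the expected sign.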

Integrated Rotary Genetic Analysis Microsystem for Influenza A Virus Detection

  • Jung, Jae Hwan;Park, Byung Hyun;Choi, Seok Jin;Seo, Tae Seok
    • Proceedings of the Korean Vacuum Society Conference
    • /
    • 2013.08a
    • /
    • pp.88-89
    • /
    • 2013
  • A variety of influenza A viruses from animal hosts are continuously prevalent throughout the world, causing human epidemics that result in millions of infections and enormous industrial and economic damage. Thus, early diagnosis of such pathogens is of paramount importance for biomedical examination and public healthcare screening. To address this issue, here we propose a fully integrated rotary genetic analysis system, called the Rotary Genetic Analyzer, for rapid on-site detection of influenza A viruses. The Rotary Genetic Analyzer is made up of four parts: a disposable microchip, a servo motor for precise, high-rate spinning of the chip, thermal blocks for temperature control, and a miniaturized optical fluorescence detector, as shown in Fig. 1. Each thermal block, made from duralumin, is integrated with a film heater at the bottom and a resistance temperature detector (RTD) in the middle. For efficient RT-PCR performance, three thermal blocks are placed on the rotary stage, and the temperature of each block corresponds to a thermal cycling step, namely $95^{\circ}C$ (denaturation), $58^{\circ}C$ (annealing), and $72^{\circ}C$ (extension). Rotary RT-PCR was performed to amplify the target gene, which was monitored by the optical fluorescence detector above the extension block. The disposable microdevice (10 cm diameter) consists of a solid-phase-extraction-based sample pretreatment unit, a bead chamber, and a 4 ${\mu}L$ PCR chamber, as shown in Fig. 2. The microchip is fabricated from a patterned polycarbonate (PC) sheet of 1 mm thickness and a PC film of 130 ${\mu}m$ thickness, whose layers are thermally bonded at $138^{\circ}C$ using acetone vapour. Silica-treated glass microbeads of 150~212 ${\mu}m$ diameter are introduced into the sample pretreatment chambers and held in place by a weir structure to construct the solid-phase extraction system. Fig. 3 shows strobed images of the sequential loading of three samples.
Three samples were loaded into the reservoirs simultaneously (Fig. 3A); the influenza A H3N2 viral RNA sample was then loaded at 5000 RPM for 10 sec (Fig. 3B). Washing buffer followed at 5000 RPM for 5 min (Fig. 3C), and the angular frequency was decreased to 100 RPM for siphon priming of the PCR cocktail into the channel, as shown in Fig. 3D. Finally, the PCR cocktail was loaded into the bead chamber at 2000 RPM for 10 sec, and the speed was then increased to 5000 RPM for 1 min to recover as much of the PCR cocktail containing the RNA template as possible (Fig. 3E). In this system, the waste from the RNA sample and washing buffer was transported to the waste chamber, which was fully filled through precise optimization; the PCR cocktail could then be transported to the PCR chamber. Fig. 3F shows the final image of the sample pretreatment: the PCR cocktail containing the RNA template was successfully isolated from the waste. To detect the influenza A H3N2 virus, the purified RNA with the PCR cocktail in the PCR chamber was amplified after performing RNA capture on the proposed microdevice. Fluorescence images at cycles 0 and 40 are shown in Fig. 4A. The fluorescence signal at cycle 40 was drastically increased, confirming detection of the influenza A H3N2 virus. Real-time profiles were successfully obtained using the optical fluorescence detector, as shown in Fig. 4B. Rotary PCR and off-chip PCR were compared with the same amount of influenza A H3N2 virus; the Ct value of the Rotary PCR was smaller than that of the off-chip PCR, without contamination. The whole process of sample pretreatment and RT-PCR could be accomplished in 30 min on the fully integrated Rotary Genetic Analyzer system. We have demonstrated a fully integrated and portable Rotary Genetic Analyzer for detecting the gene expression of influenza A virus, with 'sample-in-answer-out' capability including sample pretreatment, rotary amplification, and optical detection. Target gene amplification was monitored in real time using the integrated Rotary Genetic Analyzer system.


Swelling and Mechanical Property Change of Shale and Sandstone in Supercritical CO2 (초임계 CO2에 의한 셰일 및 사암의 물성변화 및 스웰링에 관한 연구)

  • Choi, Chae-Soon;Song, Jae-Joon
    • Tunnel and Underground Space
    • /
    • v.22 no.4
    • /
    • pp.266-275
    • /
    • 2012
  • In this study, a method is devised to implement a supercritical $CO_2$ ($scCO_2$) injection environment on a laboratory scale and to investigate the effects of $scCO_2$ on the properties of rock specimens. Specimens of shale and sandstone, which normally constitute the cap rock and reservoir rock, respectively, were kept in a laboratory reactor chamber with $scCO_2$ for two weeks. In this stage, a chemical reaction between the rock surfaces and the $scCO_2$ was induced. The effect of saline water was also investigated by comparing three conditions ($scCO_2$-rock, $scCO_2-H_2O$-rock and $scCO_2$-brine(1M)-rock). Finally, we checked the changes in the properties before and after the reaction by destructive and nondestructive testing procedures. The swelling of shale was a main concern in this case. The experimental results suggested that $scCO_2$ has a greater effect on the swelling of shale than pure water or brine. It was also observed that the largest swelling displacement of shale occurred after a reaction with the $H_2O-scCO_2$ solution. The results of the series of destructive and nondestructive tests indicate that although each property change of the rock differed depending on the reaction conditions, the $H_2O-scCO_2$ solution had the greatest effect. In this study, shale was highly sensitive to the reaction conditions. These results provide fundamental information pertaining to the stability of $CO_2$ storage sites with respect to physical and chemical reactions between the rocks in these sites and $scCO_2$.

Convenient Nucleic Acid Detection for Tomato spotted wilt virus: Virion Captured/RT-PCR (VC/RT-PCR) (Tomato spotted wilt virus를 위한 간편한 식물바이러스 핵산진단법: Virion Captured/RT-PCR (VC/RT-PCR))

  • Cho Jeom-Deog;Kim Jeong-Soo;Kim Hyun-Ran;Chung Bong-Nam;Ryu Ki-Hyun
    • Research in Plant Disease
    • /
    • v.12 no.2
    • /
    • pp.139-143
    • /
    • 2006
  • Virion-captured reverse transcription polymerase chain reaction (VC/RT-PCR) can detect plant viruses quickly and accurately. In VC/RT-PCR, no antibody is needed, unlike in immuno-captured RT-PCR (IC/RT-PCR), an earlier improvement of RT-PCR for plant viruses, and viral nucleic acids can be obtained easily within 30 minutes owing to the property of the polypropylene PCR tube, which holds and immobilizes viral particles on its surface. For the virion capture of Tomato spotted wilt virus (TSWV), extraction buffers were tested. The optimum macerating buffer for TSWV was 0.01 M potassium phosphate buffer, pH 7.0, containing 0.5% sodium sulfite. The viral crude sap was incubated for 30 min at $4^{\circ}C$. The virions in the PCR tubes were washed two times with 0.01 M PBS containing 0.05% Tween-20. The washed virions were immediately treated at $95^{\circ}C$ for 1 min in RNase-free water and chilled quickly on ice. The viral RNAs released by this heat treatment were used for RT-PCR. A dilution end point of $10^{-5}$ from the crude sap of plants infected with TSWV showed relatively high detection sensitivity for VC/RT-PCR. In multiplex detection using two or more primers, interference arose from primer-primer interactions and from the plant species. The result of multiplex RT-PCR was influenced by the combination of primers and the kind of plant, so an optimum extraction buffer for multiplex detection by VC/RT-PCR should be developed.

Development of a Simultaneous Analysis Method for DDT (DDD & DDE) in Ginseng (인삼 중 DDT(DDD 및 DDE) 분석법의 개발)

  • Kim, Sung-Dan;Cho, Tae-Hee;Han, Eun-Jung;Park, Seoung-Gyu;Han, Chang-Ho;Jo, Han-Bin;Choi, Byung-Hyun
    • Korean Journal of Food Science and Technology
    • /
    • v.40 no.2
    • /
    • pp.123-128
    • /
    • 2008
  • The MRLs (maximum residue limits) of DDT (DDD and DDE) in fresh ginseng, dried ginseng, and steamed red ginseng are set as low as 0.01 mg/kg, 0.05 mg/kg, and 0.05 mg/kg, respectively. Therefore, this study was undertaken to develop a simple and highly sensitive analysis method, as well as to reduce interfering ginseng matrix peaks, for the determination of DDT isomers (o,p'-DDE, p,p'-DDE, o,p'-DDD, p,p'-DDD, o,p'-DDT, and p,p'-DDT) in fresh ginseng, dried ginseng, and steamed red ginseng at the 0.01 mg/kg level. The method used acetonitrile extraction for simultaneous analysis, followed by normal-phase Florisil solid-phase extraction (SPE) column clean-up. The purification method entailed the following steps: (1) dissolve the concentrated sample extract in 7 mL hexane; (2) add 3 mL of $H_2SO_4$; (3) vigorously shake on a vortex mixer; (4) centrifuge at 2000 rpm for 5 min; (5) transfer 3.5 mL of the supernatant to the Florisil-SPE column (500 mg/6 mL); and (6) elute the SPE column with 1.5 mL of hexane and 10 mL of ether/hexane (6:94). The determination of DDT isomers was carried out with a gas chromatograph equipped with a micro electron capture detector (GC-${\mu}$ECD). The hexane and ether/hexane (6:94) eluate significantly removed chromatographic interferences, and the addition of 30% $H_2SO_4$ to the acetonitrile extract effectively reduced many interfering ginseng matrix peaks, allowing the determination of the DDT isomers at the 0.01 mg/kg level. The recoveries of the six fortified DDT isomers (most at 0.01 mg/kg) from fresh ginseng, dried ginseng, and steamed red ginseng ranged from 87.9 to 99.6%. The MDLs (method detection limits) ranged from 0.003 to 0.009 mg/kg. In conclusion, this method for the determination of DDT isomers is sensitive, rapid, simple, and inexpensive.

A Semantic Classification Model for e-Catalogs (전자 카탈로그를 위한 의미적 분류 모형)

  • Kim Dongkyu;Lee Sang-goo;Chun Jonghoon;Choi Dong-Hoon
    • Journal of KIISE:Databases
    • /
    • v.33 no.1
    • /
    • pp.102-116
    • /
    • 2006
  • Electronic catalogs (or e-catalogs) hold information about the goods and services offered or requested by the participants and consequently form the basis of an e-commerce transaction. Catalog management is complicated by a number of factors, and product classification is at the core of these issues. The classification hierarchy is used for spend analysis, customs regulation, and product identification. Classification is the foundation on which product databases are designed and plays a central role in almost all aspects of the management and use of product information. However, product classification has received little formal treatment in terms of its underlying model, operations, and semantics. We believe that the lack of a logical model for classification introduces a number of problems, not only for the classification itself but also for the product database in general. A classification needs to meet diverse user views to support efficient and convenient use of product information. It needs to change and evolve frequently, without breaking consistency, to handle the introduction of new products, the extinction of existing products, class reorganization, and class specialization. It also needs to be merged and mapped with other classification schemes without information loss when B2B transactions occur. To satisfy these requirements, a classification scheme should be dynamic enough to accommodate such changes at the right time and cost. The existing classification schemes widely used today, such as UNSPSC and eClass, however, have many limitations in meeting these requirements. In this paper, we try to understand what it means to classify products and present how best to represent classification schemes so as to capture the semantics behind the classifications and facilitate mappings between them. Product information carries rich semantics, such as class attributes like material, time, and place, as well as integrity constraints.
We analyze the dynamic features of product databases and the limitations of existing code-based classification schemes, and we describe a semantic classification model that satisfies the requirements for the dynamic features of product databases. It provides a means to explicitly and formally express more semantics for product classes and organizes class relationships into a graph. We believe the model proposed in this paper satisfies the requirements and challenges that have been raised by previous works.
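The idea of organizing class relationships into a graph with explicitly attached attributes can be sketched as follows; the class names and the small API here are hypothetical illustrations of attribute inheritance along a class graph, not the paper's actual model.

```python
class ClassificationScheme:
    """A product classification scheme as a graph of classes.

    Each class carries its own attribute set; a class may have several
    parents, and its effective attributes are inherited along the graph.
    """

    def __init__(self):
        self.attrs = {}      # class name -> set of its own attributes
        self.parents = {}    # class name -> list of parent class names

    def add_class(self, name, attrs=(), parents=()):
        self.attrs[name] = set(attrs)
        self.parents[name] = list(parents)

    def all_attrs(self, name):
        # Collect attributes of the class and everything reachable upward
        seen, out = set(), set()
        stack = [name]
        while stack:
            c = stack.pop()
            if c in seen:
                continue
            seen.add(c)
            out |= self.attrs.get(c, set())
            stack.extend(self.parents.get(c, []))
        return out
```

Representing the hierarchy as a graph rather than a fixed code list is what allows reorganization and multi-parent mappings without rewriting every class.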

The Influence of Daily Social Interaction and Physical Activity on Daily Happiness of Korean Urban Older Adults (도시노인의 사회적 교류, 신체활동과 일상적 행복감의 관련성: 개인특성의 맥락효과를 고려하여)

  • Han, Gyounghae;Choi, Heejin
    • 한국노년학
    • /
    • v.38 no.4
    • /
    • pp.1083-1105
    • /
    • 2018
  • The present study sought to capture day-to-day fluctuation in daily happiness among Korean urban older adults and to examine whether the within-person fluctuation of daily happiness is explained by the social and physical activities the older adults engage in each day. We also examined whether the within-person association between daily social and physical activities and daily happiness varies by individual characteristics (i.e., gender, age, educational level, and health). In addition, we explored the relationships between the level and fluctuation of daily happiness and the level of global happiness. The data were collected through a multi-method approach, which included a general survey, the daily diary method, and the collection of physical activity data through activity monitors. In total, 175 urban older adults participated in seven days of daily diary surveys. Data on the number of steps and the time spent on sedentary activities, light-intensity physical activities, and moderate-to-vigorous-intensity physical activities were also collected during the same period from a sub-sample of 16 participants using activity monitors. Hierarchical linear modeling was applied for the analysis. The results were as follows. First, the level of happiness of older adults fluctuated over the week, and the patterns of fluctuation varied by gender and health. Second, socializing with their children and friends elevated their levels of happiness; moreover, the impact of contact with siblings on daily happiness was greater for the unhealthy group compared to the healthy group. Third, older adults were happier on days when they walked more, but daily happiness decreased on days when they spent more time on light-intensity physical activities. Lastly, a higher level of daily happiness was related to a higher level of global happiness, but the degree of fluctuation of daily happiness was not related to the level of global happiness. The implications of these results and suggestions for future research are discussed.
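A core preparatory step in such multilevel diary analyses is separating each person's average happiness (between-person level) from their day-to-day deviations (within-person fluctuation). A minimal sketch of that within-person centering, with illustrative names and data rather than the study's own:

```python
import numpy as np

def within_person_center(values_by_person):
    """Split each person's daily scores into a person mean (between-person
    component) and daily deviations from it (within-person fluctuation)."""
    out = {}
    for pid, vals in values_by_person.items():
        v = np.asarray(vals, dtype=float)
        out[pid] = (v.mean(), v - v.mean())
    return out
```

In a hierarchical linear model, the deviations would enter at level 1 (days) and the person means at level 2 (individuals).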

A Study on Process Model for Systematic Management of Archival Objects (행정박물의 체계적 관리를 위한 프로세스 구축방안)

  • Lee, Ye-Kyoung;Kim, Keum-Ei;Lee, Jin-Hee
    • The Korean Journal of Archival Studies
    • /
    • no.17
    • /
    • pp.157-202
    • /
    • 2008
  • Archival Objects are defined as objects having historical, aesthetic, and artistic value as well as archival value, created and used for a particular purpose in a business process. Increasingly, many countries, including Canada, Australia, and China, have recognized the importance of Archival Objects and designated them as national records. In Korea, Archival Objects were incorporated into national records through the '2006 Plan for the Archives and Records Management Reform', by which the National Archives and Records Service provided a foothold for a comprehensive national records management plan that includes Archival Objects. Also, by revising the Records and Archives Management Act in 2007, the National Archives and Records Service declared an aggressive will to manage Archival Objects. Until now, objects held in public institutions have been easily damaged, because the definition and scope of Archival Objects were ambiguous and no management system suited to their material characteristics existed. Even though the revised Records and Archives Management Act suggests a definition and declares the responsibility of management, a management system attentive to the various shapes and materials of objects still needs to be established. Therefore, this study briefly defines Archival Objects and surveys the actual management conditions at five institutions. By reviewing the results of the institutional survey, the Records and Archives Management Act, and the actual Records Management System, we identified several problems. To solve these problems, we provide an objects management process in the order capture ${\rightarrow}$ register ${\rightarrow}$ description ${\rightarrow}$ preservation ${\rightarrow}$ use ${\rightarrow}$ disposition. In addition, close cooperation between the records center and the museum of each institution should be established for unified management at the national level. This study is significant in laying a foundation for managing Archival Objects systematically. With further study, we hope to advance the management of valuable Archival Objects.

Real-time CRM Strategy of Big Data and Smart Offering System: KB Kookmin Card Case (KB국민카드의 빅데이터를 활용한 실시간 CRM 전략: 스마트 오퍼링 시스템)

  • Choi, Jaewon;Sohn, Bongjin;Lim, Hyuna
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.2
    • /
    • pp.1-23
    • /
    • 2019
  • Big data refers to data that is difficult to store, manage, and analyze with existing software. As consumers' changing lifestyles increase the size and variety of their needs, companies are investing considerable time and money to understand those needs. Companies in various industries utilize big data to improve their products and services, analyze unstructured data, and respond in real time to feedback on products and services. The financial industry operates decision support systems that use financial data to develop financial products and manage customer risk. The use of big data by financial institutions can effectively create added value along the value chain and makes it possible to develop more advanced customer relationship management strategies. Financial institutions can utilize the purchase data and unstructured data generated by credit cards, making it possible to identify and satisfy customers' desires. CRM has become a granular process that can be measured in real time as it has grown together with information and knowledge systems. With the development of information services and CRM, platforms have changed, and it has become possible to meet consumer needs in various environments. Recently, as consumer needs have diversified, more companies are providing systematic marketing services using data mining and advanced CRM (Customer Relationship Management) techniques. KB Kookmin Card, which started as a credit card business in 1980, achieved early stabilization of its processes and computer systems and actively participated in introducing new technologies and systems. In 2011, the bank and credit card businesses were separated, and the company led the 'Hye-dam Card' and 'One Card' markets, which deviated from existing concepts. In 2017, the total use of domestic credit cards and check cards grew by 5.6% year-on-year to 886 trillion won. In 2018, the company received a long-term rating of AA+ in its credit card evaluation, confirming through effective marketing strategies and services that its credit rating stood at the top of the industry.
At present, KB Kookmin Card emphasizes strategies to meet the individual needs of customers and to maximize the lifetime value of consumers by utilizing customers' payment data. KB Kookmin Card combines internal and external big data, conducts marketing in real time, and has built a system for monitoring. Using customer card information, it has built a marketing system that detects real-time behavior from big data such as homepage visits and purchase histories. The system is designed to capture customer action events in real time and execute marketing by utilizing the stores, locations, amounts, and usage patterns of card transactions. The company has created more than 280 different scenarios based on the customer's life cycle and conducts marketing plans to accommodate various customer groups in real time. It operates the Smart Offering System, a highly efficient marketing management system that detects customers' card usage, behavior, and location information in real time and provides further refined services in combination with various apps. This study traces the evolution from traditional CRM to the current CRM strategy. Finally, we examine the current CRM strategy through KB Kookmin Card's big data utilization strategy and marketing activities, and propose a marketing plan for KB Kookmin Card's future CRM strategy. For the success and continuous growth of the Smart Offering System, KB Kookmin Card should invest in securing increasingly sophisticated ICT technology and human resources. It is necessary to establish a strategy for securing profit from a long-term perspective and to proceed systematically. Especially in the current situation, where privacy violations and personal information leakage are pressing issues, efforts should be made to foster customers' acceptance of marketing that uses customer information and to build a corporate image that emphasizes security.
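Matching real-time card transaction events against a library of pre-defined marketing scenarios, as the case describes, can be sketched roughly as rule evaluation over event fields. The event fields, scenario rules, and function name below are hypothetical illustrations, not KB Kookmin Card's actual system.

```python
def match_scenarios(event, scenarios):
    """Return the ids of scenarios whose every condition holds for the event.

    `scenarios` maps a scenario id to a dict of conditions; a condition value
    may be a plain value (tested for equality) or a callable predicate.
    """
    hits = []
    for sid, conditions in scenarios.items():
        ok = True
        for field, cond in conditions.items():
            value = event.get(field)
            matched = cond(value) if callable(cond) else value == cond
            if not matched:
                ok = False
                break
        if ok:
            hits.append(sid)
    return hits
```

A production system would evaluate such rules on a streaming platform over millions of events; this sketch only shows the matching logic for a single event.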