• Title/Summary/Keyword: 시스템 알고리즘 (system algorithm)


Target Word Selection Disambiguation using Untagged Text Data in English-Korean Machine Translation (영한 기계 번역에서 미가공 텍스트 데이터를 이용한 대역어 선택 중의성 해소)

  • Kim Yu-Seop;Chang Jeong-Ho
    • The KIPS Transactions:PartB
    • /
    • v.11B no.6
    • /
    • pp.749-758
    • /
    • 2004
  • In this paper, we propose a new method that uses only a raw corpus, without additional human effort, to disambiguate target word selection in English-Korean machine translation. We use two data-driven techniques: Latent Semantic Analysis (LSA) and Probabilistic Latent Semantic Analysis (PLSA). Both can represent the complex semantic structure of a given context, such as a text passage. We construct linguistic semantic knowledge with these two techniques and use it for target word selection in English-Korean machine translation. For target word selection, we utilize grammatical relationships stored in a dictionary. We use a k-nearest neighbor (k-NN) learning algorithm to resolve the data sparseness problem in target word selection and estimate the distance between instances based on these models. In the experiments, we use TREC AP news data to construct the latent semantic space and the Wall Street Journal corpus to evaluate target word selection. With the latent semantic analysis methods, the accuracy of target word selection improved by over 10%, and PLSA showed better accuracy than LSA. Finally, using correlation analysis, we show how the accuracy relates to two important factors: the dimensionality of the latent space and the k value of the k-NN learning.
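A minimal sketch, not the paper's implementation: it builds an LSA latent space from a small untagged corpus and uses k-NN over the latent context vectors to choose a Korean translation for the ambiguous English word "bank". The corpus, candidate translations, k, and the latent dimensionality are all hypothetical placeholders.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.neighbors import KNeighborsClassifier

corpus = [
    "the bank approved the loan application",
    "she deposited cash at the bank counter",
    "the river bank was flooded after the storm",
    "we walked along the grassy bank of the stream",
]
labels = ["은행", "은행", "둑", "둑"]       # candidate target words for "bank" (illustrative)

vec = CountVectorizer()
X = vec.fit_transform(corpus)               # term-context matrix from raw text
lsa = TruncatedSVD(n_components=2)          # latent-space dimensionality (a tuning factor)
Z = lsa.fit_transform(X)                    # contexts projected into the latent space

knn = KNeighborsClassifier(n_neighbors=3, metric="cosine")
knn.fit(Z, labels)

new_context = vec.transform(["borrow money from the bank"])
print(knn.predict(lsa.transform(new_context)))   # predicted translation
```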

A development of DS/CDMA MODEM architecture and its implementation (DS/CDMA 모뎀 구조와 ASIC Chip Set 개발)

  • 김제우;박종현;김석중;심복태;이홍직
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.22 no.6
    • /
    • pp.1210-1230
    • /
    • 1997
  • In this paper, we suggest an architecture for a DS/CDMA transceiver composed of one pilot channel used as a reference and multiple traffic channels. The pilot channel, an unmodulated PN code, is used as the reference signal for PN code synchronization and data demodulation. The coherent demodulation architecture is exploited for the reverse link as well as for the forward link. The characteristics of the suggested DS/CDMA system are as follows. First, we suggest an interlaced quadrature spreading (IQS) method: the PN code for the I-phase of the 1st channel is used for the Q-phase of the 2nd channel, the PN code for the Q-phase of the 1st channel is used for the I-phase of the 2nd channel, and so on, which is quite different from the existing spreading schemes of DS/CDMA systems such as IS-95 digital cellular CDMA or W-CDMA for PCS. IQS spreading drastically reduces the zero-crossing rate of the RF signals. Second, we introduce an adaptive threshold setting for PN code synchronization and an initial acquisition method that uses a single PN code generator and halves the acquisition time compared with existing ones, and we exploit state machines to reduce the reacquisition time. Third, various functions, such as automatic frequency control (AFC), automatic level control (ALC), a bit-error-rate (BER) estimator, and spectral shaping for reducing adjacent-channel interference, are introduced to improve the system performance. Fourth, we designed and implemented the DS/CDMA MODEM for variable-transmission-rate applications from 16 Kbps to 1.024 Mbps. We developed and verified the DS/CDMA MODEM architecture through mathematical analysis and various kinds of simulations. The ASIC design was done using VHDL coding and synthesis. To cope with several different kinds of applications, we developed transmitter and receiver ASICs separately. While a single transmitter or receiver ASIC contains three channels (one for the pilot and the others for the traffic channels), the number of channels can be expanded up to 64 by combining several transmitter ASICs. The ASICs are now in use in a line-of-sight (LOS) radio equipment.
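A minimal illustrative sketch, not the ASIC implementation: it mimics the interlaced quadrature spreading (IQS) idea described above, where the I-phase PN code of channel 1 spreads the Q-phase of channel 2 and vice versa. The PN codes, data symbols, and chip count are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64                                     # chips per symbol (hypothetical)
pn_i1 = rng.choice([-1, 1], N)             # I-phase PN code of channel 1
pn_q1 = rng.choice([-1, 1], N)             # Q-phase PN code of channel 1

def iqs_spread(d_i, d_q, pn_i, pn_q):
    """Spread one I/Q data symbol pair with the given PN codes."""
    return d_i * pn_i, d_q * pn_q

# Channel 1 uses (pn_i1, pn_q1); channel 2 swaps them (the interlacing step).
ch1_i, ch1_q = iqs_spread(+1, -1, pn_i1, pn_q1)
ch2_i, ch2_q = iqs_spread(-1, +1, pn_q1, pn_i1)

# Composite baseband signal; fewer simultaneous sign changes on I and Q is what
# lowers the zero-crossing rate of the resulting RF signal.
composite = (ch1_i + ch2_i) + 1j * (ch1_q + ch2_q)
print(composite[:8])
```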


Target Advertisement Service using a Viewer's Profile Reasoning (시청자 프로파일 추론 기법을 이용한 표적 광고 서비스)

  • Kim Munjo;Im Jeongyeon;Kang Sanggil;Kim Munchrul;Kang Kyungok
    • Journal of Broadcast Engineering
    • /
    • v.10 no.1 s.26
    • /
    • pp.43-56
    • /
    • 2005
  • In the existing broadcasting environment, it is not easy to provide a bi-directional service between a broadcasting server and a TV audience. In uni-directional broadcasting environments, most TV programs are scheduled according to viewers' popular watching times, and the advertisement content in these programs is mainly arranged by the popularity and ages of the audience. Audiences make an effort to sort and select their favorite programs, but the advertisements accompanying the programs they watch are not delivered to the appropriate audiences efficiently. Such randomly provided advertisement content can cause audience indifference and avoidance. In this paper, we propose a target advertisement service for the appropriate distribution of advertisement content. The proposed service estimates an audience member's profile without requiring any private information and provides target-advertised content based on the estimated profile. For the experiments, we used real audiences' TV usage history, such as the ages, gender, and viewing times of programs, from AC Nielsen Korea. We show the accuracy of the proposed target advertisement algorithm using NDS (Normalized Distance Sum) and the vector correlation method, and describe the implementation of our target advertisement service system.
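A minimal sketch of the two agreement measures named above, under an assumed formulation: the profile vectors and the exact NDS definition are illustrative, not taken from the paper.

```python
import numpy as np

actual   = np.array([0.70, 0.10, 0.15, 0.05])   # e.g. true preference over 4 genres
inferred = np.array([0.60, 0.15, 0.20, 0.05])   # profile estimated by the service

# Normalized distance sum: 1.0 means the inferred profile matches the actual one exactly.
nds = 1.0 - np.sum(np.abs(actual - inferred)) / len(actual)
# Vector correlation between the two profiles.
corr = np.corrcoef(actual, inferred)[0, 1]

print(f"NDS = {nds:.3f}, correlation = {corr:.3f}")
```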

Ontology-based User Customized Search Service Considering User Intention (온톨로지 기반의 사용자 의도를 고려한 맞춤형 검색 서비스)

  • Kim, Sukyoung;Kim, Gunwoo
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.4
    • /
    • pp.129-143
    • /
    • 2012
  • Recently, the rapid progress of standardized web technologies and the worldwide proliferation of web users have brought an explosive increase in the production and consumption of information documents on the web. In addition, most companies produce, share, and manage a huge number of information documents needed to perform their business, and they also collect, store, and manage many web documents published on the web. Along with this increase in the information documents that must be managed, the need for a solution that locates information documents accurately among a huge number of information sources has grown, and the market for search engine solutions is expanding accordingly. The most important function of a search engine is to locate accurate information documents within huge information sources. The major metric for evaluating search accuracy is relevance, which consists of two measures, precision and recall. Precision is a measure of exactness: what percentage of the information returned as true answers actually is. Recall is a measure of completeness: what percentage of the true answers is retrieved. The two measures are weighted differently according to the applied domain: when information such as patent documents or research papers must be searched exhaustively, it is better to increase recall, whereas when the amount of information is small, it is better to increase precision. Most existing web search engines use a keyword search method that returns web documents containing the keywords entered by a user. This method quickly locates all matching web documents, even when many search words are entered, but it has a fundamental limitation: it does not consider the user's search intention, so it retrieves irrelevant results along with relevant ones, and additional time and effort are needed to sort the relevant results out of everything returned. That is, keyword search can increase recall, but it struggles to locate the web documents a user actually wants because it provides no means of understanding the user's intention and reflecting it in the search process. This research therefore suggests a new method that combines an ontology-based search solution with the core search functionality of existing search engine solutions. The method enables a search engine to provide optimal search results by inferring the user's search intention. To that end, we build an ontology containing the concepts and relationships of a specific domain. The ontology is used to infer synonyms of the search keywords entered by a user, so that the user's search intention is reflected in the search process more actively than in existing search engines. Based on the proposed method, we implement a prototype search system and test it in the patent domain, where we experiment on retrieving documents relevant to a patent. The experiment shows that our system increases both recall and precision and improves search productivity through an improved user interface that lets a user interact with the search system effectively. In future research, we will validate the performance of our prototype system against other search engine solutions and extend the applied domain to other information search settings such as portals.
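A minimal sketch of the assumed design, not the prototype itself: a user's keywords are expanded with synonyms inferred from a small domain ontology before being passed to a keyword matcher. The ontology content, the query, and the naive search backend are placeholders.

```python
ontology_synonyms = {
    "battery": {"secondary cell", "accumulator"},
    "electrode": {"anode", "cathode"},
}

def expand_query(keywords):
    """Return the user's keywords plus their ontology-inferred synonyms."""
    expanded = set(keywords)
    for kw in keywords:
        expanded |= ontology_synonyms.get(kw, set())
    return expanded

def keyword_search(terms, documents):
    """Naive stand-in for the underlying search engine: match any expanded term."""
    return [d for d in documents if any(t in d.lower() for t in terms)]

docs = ["A lithium secondary cell with improved anode", "A gearbox housing design"]
print(keyword_search(expand_query({"battery", "electrode"}), docs))
```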

Improving the Accuracy of Document Classification by Learning Heterogeneity (이질성 학습을 통한 문서 분류의 정확성 향상 기법)

  • Wong, William Xiu Shun;Hyun, Yoonjin;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.3
    • /
    • pp.21-44
    • /
    • 2018
  • In recent years, the rapid development of internet technology and the popularization of smart devices have produced massive amounts of text data, distributed through media platforms such as the World Wide Web, internet news feeds, microblogs, and social media. However, this enormous amount of easily obtained information lacks organization, which has drawn the interest of many researchers and created a need to classify relevant information automatically; hence text classification. Text classification is a challenging task in modern data analysis: a text document must be assigned to one or more predefined categories or classes. Various techniques are available, such as k-Nearest Neighbor, the Naïve Bayes algorithm, Support Vector Machines, Decision Trees, and Artificial Neural Networks. However, when dealing with huge amounts of text data, model performance and accuracy become a challenge; depending on the vocabulary of the corpus and the features created for classification, the performance of a text classification model can vary. Most prior attempts propose a new algorithm or modify an existing one, and this line of research has arguably reached its limits. In this study, instead of proposing or modifying an algorithm, we focus on modifying how the data are used. It is widely known that classifier performance is influenced by the quality of the training data on which the classifier is built. Real-world datasets usually contain noise, and this noisy data can affect the decisions made by classifiers built from them. In this study, we consider that data from different domains, that is, heterogeneous data, may carry noise-like characteristics that can be exploited in the classification process. Machine learning algorithms build classifiers under the assumption that the characteristics of the training data and the target data are the same or very similar. For unstructured data such as text, however, the features are determined by the vocabulary of the documents, so if the viewpoints of the training data and the target data differ, their features may also differ. We therefore attempt to improve classification accuracy by strengthening the robustness of the document classifier through artificially injecting noise into its construction. Because data from various sources are usually formatted differently, traditional machine learning algorithms, which are not designed to handle different data representations at once within the same generalization, struggle with them. To utilize heterogeneous data in the learning process of a document classifier, we apply semi-supervised learning. However, unlabeled data may degrade the performance of the document classifier. We therefore propose a method called the Rule Selection-Based Ensemble Semi-Supervised Learning Algorithm (RSESLA) to select only the documents that contribute to the accuracy improvement of the classifier. RSESLA creates multiple views by manipulating the features using different types of classification models and different types of heterogeneous data. The most confident classification rules are selected and applied for the final decision making. In this paper, three types of real-world data sources were used: news, Twitter, and blogs.
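A minimal sketch, not RSESLA itself: a plain self-training loop in the spirit of semi-supervised document classification, where only predictions made with high confidence on unlabeled (possibly heterogeneous) documents are added to the training set. The threshold, model, and data are placeholders.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

labeled_docs   = ["stock market falls", "team wins the final", "new vaccine trial"]
labels         = ["economy", "sports", "health"]
unlabeled_docs = ["league championship game tonight", "central bank raises rates"]

vec = TfidfVectorizer()
X_l = vec.fit_transform(labeled_docs)
X_u = vec.transform(unlabeled_docs)

clf = LogisticRegression(max_iter=1000).fit(X_l, labels)
proba = clf.predict_proba(X_u)
confident = proba.max(axis=1) >= 0.5            # confidence threshold (hypothetical)

# Keep only the confidently pseudo-labeled documents, then retrain the classifier.
new_docs = [d for d, ok in zip(unlabeled_docs, confident) if ok]
new_labels = clf.predict(X_u)[confident]
clf = clf.fit(vec.transform(labeled_docs + new_docs),
              np.concatenate([labels, new_labels]))
```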

Development of Measuring Technique for Milk Composition by Using Visible-Near Infrared Spectroscopy (가시광선-근적외선 분광법을 이용한 유성분 측정 기술 개발)

  • Choi, Chang-Hyun;Yun, Hyun-Woong;Kim, Yong-Joo
    • Food Science and Preservation
    • /
    • v.19 no.1
    • /
    • pp.95-103
    • /
    • 2012
  • The objective of this study was to develop models to predict the milk properties (fat, protein, SNF, lactose, and MUN) of unhomogenized milk using visible and near-infrared (NIR) spectroscopy. A total of 180 milk samples were collected from dairy farms. To determine the optimal measurement temperature, the milk samples were kept at three temperatures ($5^{\circ}C$, $20^{\circ}C$, and $40^{\circ}C$). A spectrophotometer was used to measure the reflectance spectra of the samples. Multiple linear regression (MLR) models with a stepwise method were developed to select the optimal wavelengths. Preprocessing methods were used to minimize spectroscopic noise, and partial-least-squares (PLS) models were developed to predict the milk properties of the unhomogenized milk. The PLS results showed a good correlation between the predicted and measured milk properties of the samples at $40^{\circ}C$ and 400~2,500 nm. The optimal wavelength range for fat and protein was 1,600~1,800 nm, and normalization improved the prediction performance. SNF and lactose were optimized at 1,600~1,900 nm, and MUN at 600~800 nm. The best preprocessing methods for SNF, lactose, and MUN were smoothing, MSC, and second derivative, respectively. The correlation coefficients between the predicted and measured fat, protein, SNF, lactose, and MUN were 0.98, 0.90, 0.82, 0.75, and 0.61, respectively. These results indicate that the models can be used to assess milk quality.
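A minimal sketch of the assumed workflow: fitting a partial-least-squares (PLS) model that maps visible/NIR reflectance spectra to a milk property such as fat content and reporting the correlation between predicted and measured values. The spectra, reference values, and number of latent components are synthetic placeholders.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
spectra = rng.random((180, 210))             # 180 samples x wavelength channels (placeholder)
fat = 3.5 + 0.8 * rng.standard_normal(180)   # reference fat content (%), placeholder

X_tr, X_te, y_tr, y_te = train_test_split(spectra, fat, test_size=0.25, random_state=0)
pls = PLSRegression(n_components=8).fit(X_tr, y_tr)

pred = pls.predict(X_te).ravel()
r = np.corrcoef(pred, y_te)[0, 1]            # correlation between predicted and measured
print(f"r = {r:.2f}")
```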

Advanced Improvement for Frequent Pattern Mining using Bit-Clustering (비트 클러스터링을 이용한 빈발 패턴 탐사의 성능 개선 방안)

  • Kim, Eui-Chan;Kim, Kye-Hyun;Lee, Chul-Yong;Park, Eun-Ji
    • Journal of Korea Spatial Information System Society
    • /
    • v.9 no.1
    • /
    • pp.105-115
    • /
    • 2007
  • Data mining extracts interesting knowledge from large databases. Among the numerous data mining techniques, research is primarily concentrated on clustering and association rules. Clustering, an active research topic, mainly deals with analyzing spatial and attribute data, while association rule mining deals with identifying frequent patterns. An improved Apriori algorithm using an existing bit-clustering algorithm has previously been proposed. In an effort to identify an alternative algorithm that improves on Apriori, we investigated FP-Growth and discussed the possibility of adopting bit-clustering to solve the problems of FP-Growth. FP-Growth using bit-clustering demonstrated better performance than the existing method. We used chess data in our experiments for the pattern mining evaluation and built FP-trees with different minimum support values. For high minimum support values, results similar to those of the existing technique were obtained; in the other cases, however, the technique proposed in this paper performed better than the existing technique. As a result, the proposed technique is considered to offer higher performance. In addition, a method to apply bit-clustering to GML data is proposed.
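A minimal sketch of an assumed bitwise formulation, not the paper's implementation: each item is stored as a bit vector over transactions, so the support of an itemset is the popcount of the AND of its members' bit vectors. Transactions and the minimum support are toy placeholders.

```python
from itertools import combinations

transactions = [{"a", "b", "c"}, {"a", "c"}, {"a", "d"}, {"b", "c"}, {"a", "b", "c"}]
min_support = 2

# One bit per transaction for each item.
bits = {}
for t_idx, t in enumerate(transactions):
    for item in t:
        bits[item] = bits.get(item, 0) | (1 << t_idx)

def support(itemset):
    """Support of an itemset = popcount of the AND of its items' bit vectors."""
    mask = ~0
    for item in itemset:
        mask &= bits[item]
    return bin(mask & ((1 << len(transactions)) - 1)).count("1")

frequent = [fs for size in (1, 2, 3)
            for fs in combinations(sorted(bits), size)
            if support(fs) >= min_support]
print(frequent)
```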


Liver Splitting Using 2 Points for Liver Graft Volumetry (간 이식편의 체적 예측을 위한 2점 이용 간 분리)

  • Seo, Jeong-Joo;Park, Jong-Won
    • The KIPS Transactions:PartB
    • /
    • v.19B no.2
    • /
    • pp.123-126
    • /
    • 2012
  • This paper proposes a method to separate a liver into left and right lobes for simple and exact volumetry of the liver graft on abdominal MDCT (Multi-Detector Computed Tomography) images before living donor liver transplantation. Using this algorithm, a medical team can evaluate the liver graft accurately with minimal interaction between the team and the system, helping ensure the donor's and recipient's safety. On the segmented liver image, 2 points (PMHV, a point in the Middle Hepatic Vein, and PPV, a point at the beginning of the right branch of the Portal Vein) are selected to separate the liver into left and right lobes. The middle hepatic vein is automatically segmented using PMHV, and the cutting line is decided on the basis of the segmented vein. The liver is separated by connecting the cutting line and PPV, and the volume and ratio of the liver graft are estimated. To verify the accuracy of the estimated volume, the volume obtained with the 2 points is compared with the volume from manual segmentation by a diagnostic radiologist and with the weight measured during surgery. The mean ${\pm}$ standard deviation of the differences between the actual weights and the estimated volumes was $162.38{\pm}124.39cm^3$ for manual segmentation and $107.69{\pm}97.24cm^3$ for the 2-points method. The correlation coefficient between the actual weight and the manually estimated volume is 0.79, and that between the actual weight and the volume estimated using the 2 points is 0.87. After selecting the 2 points, the time required to separate the liver into left and right lobes and to compute their volumes was measured to confirm that the algorithm can be used in real time during surgery: the mean ${\pm}$ standard deviation of the processing time was $57.28{\pm}32.81sec$ per data set ($149.17{\pm}55.92$ slices).
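A minimal sketch of an assumed downstream calculation: converting voxel counts of the separated left and right lobes into volumes and a graft ratio, given the CT voxel spacing. The voxel counts and spacing below are placeholders, not values from the paper.

```python
right_lobe_voxels = 1_450_000                     # voxels in the separated right lobe (placeholder)
left_lobe_voxels  =   830_000                     # voxels in the separated left lobe (placeholder)
voxel_mm3 = 0.7 * 0.7 * 3.0                       # in-plane spacing x slice thickness (mm^3)

right_cm3 = right_lobe_voxels * voxel_mm3 / 1000.0
left_cm3  = left_lobe_voxels  * voxel_mm3 / 1000.0
graft_ratio = right_cm3 / (right_cm3 + left_cm3)  # graft fraction of whole-liver volume

print(f"right lobe: {right_cm3:.1f} cm^3, left lobe: {left_cm3:.1f} cm^3, "
      f"graft ratio: {graft_ratio:.2%}")
```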

The Evaluation of Meteorological Inputs retrieved from MODIS for Estimation of Gross Primary Productivity in the US Corn Belt Region (MODIS 위성 영상 기반의 일차생산성 알고리즘 입력 기상 자료의 신뢰도 평가: 미국 Corn Belt 지역을 중심으로)

  • Lee, Ji-Hye;Kang, Sin-Kyu;Jang, Keun-Chang;Ko, Jong-Han;Hong, Suk-Young
    • Korean Journal of Remote Sensing
    • /
    • v.27 no.4
    • /
    • pp.481-494
    • /
    • 2011
  • Investigation of the $CO_2$ exchange between the biosphere and the atmosphere at regional, continental, and global scales can be approached by combining remote sensing with carbon cycle process modeling to estimate vegetation productivity. The NASA Earth Observing System (EOS) currently produces a regular global estimate of gross primary productivity (GPP) and annual net primary productivity (NPP) of the entire terrestrial surface at 1 km spatial resolution. The MODIS GPP algorithm uses meteorological data provided by the NASA Data Assimilation Office (DAO), but sub-pixel heterogeneity and complex terrain are generally not resolved because of the coarse spatial resolution of the DAO data ($1^{\circ}\times1.25^{\circ}$). In this study, we estimated the meteorological inputs from MODIS products of the AQUA and TERRA satellites at 5 km spatial resolution for finer GPP and/or NPP determination. The derived variables included temperature, VPD, and solar radiation. Data from seven AmeriFlux sites located in the Corn Belt region were used to evaluate the MODIS-derived inputs. MODIS-derived air temperature showed good agreement with ground-based observations; the mean error (ME) and correlation coefficient (R) ranged from $-0.9^{\circ}C$ to $+5.2^{\circ}C$ and from 0.83 to 0.98, respectively. VPD agreed with the tower observations somewhat more coarsely (ME = -183.8 Pa ~ +382.1 Pa; R = 0.51 ~ 0.92). MODIS-derived shortwave radiation showed a good correlation with the observations but was slightly overestimated (ME = -0.4 MJ $day^{-1}$ ~ +7.9 MJ $day^{-1}$; R = 0.67 ~ 0.97). Our results indicate that inputs derived from MODIS atmosphere and land products can provide a useful tool for estimating crop GPP.
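A minimal sketch of the two agreement statistics quoted above, mean error (ME) and correlation coefficient (R), computed between satellite-derived values and tower observations. The arrays below are synthetic placeholders, not AmeriFlux data.

```python
import numpy as np

tower = np.array([12.1, 15.3, 18.7, 21.0, 19.5])   # e.g. observed air temperature (deg C)
modis = np.array([12.8, 16.0, 19.1, 22.4, 20.3])   # MODIS-derived values at the same times

me = np.mean(modis - tower)                        # positive ME indicates overestimation
r = np.corrcoef(modis, tower)[0, 1]
print(f"ME = {me:+.2f}, R = {r:.2f}")
```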

Quality Control of Agro-meteorological Data Measured at Suwon Weather Station of Korea Meteorological Administration (기상청 수원기상대 농업기상 관측요소의 품질관리)

  • Oh, Gyu-Lim;Lee, Seung-Jae;Choi, Byoung-Choel;Kim, Joon;Kim, Kyu-Rang;Choi, Sung-Won;Lee, Byong-Lyol
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.17 no.1
    • /
    • pp.25-34
    • /
    • 2015
  • In this research, we applied a quality control (QC) procedure to the agro-meteorological data measured at the Suwon weather station of the Korea Meteorological Administration (KMA). The QC was conducted in six steps based on the KMA Real-time Quality control system for Meteorological Observation Data (RQMOD) and four steps based on the International Soil Moisture Network (ISMN) QC modules. In addition, we set up our own empirical method to remove erroneous data that could not be filtered by the RQMOD and ISMN methods. After these QC procedures, a well-refined agro-meteorological dataset was compiled for both air and soil temperatures. Our research suggests that soil moisture requires more detailed and reliable grounds for removing doubtful data, especially in winter when it shows abnormal variations. The raw data and the QC-processed data are now available at the NCAM website (http://ncam.kr/page/req/agri_weather.php).
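A minimal sketch of two checks commonly found in such QC procedures (not the RQMOD or ISMN modules themselves): a physical range check and a step (spike) check applied to an hourly air temperature series. The series and thresholds are illustrative.

```python
import numpy as np

temps = np.array([3.2, 3.5, 3.4, 25.0, 3.6, 3.7, -45.0, 3.9])   # deg C, placeholder series

range_ok = (temps > -40.0) & (temps < 45.0)                      # physical-limit check
step_ok = np.ones_like(temps, dtype=bool)
step_ok[1:] = np.abs(np.diff(temps)) < 10.0                      # flag jumps of 10 deg C/h or more

flags = np.where(range_ok & step_ok, "good", "doubtful")
print(list(zip(temps, flags)))
```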