• Title/Summary/Keyword: phrase-based

Search results: 233

Enhanced Method for Person Name Retrieval in Academic Information Service (학술정보서비스에서 인명검색 고도화 방법)

  • Han, Hee-Jun;Yae, Yong-Hee;You, Beom-Jong
    • The Journal of the Korea Contents Association
    • /
    • v.10 no.2
    • /
    • pp.490-498
    • /
    • 2010
  • Whether on the web or not, all academic information has a creator who produced it. The creator can be an individual, organization, institution, or country. Most information consists of a title, author, and content. An article is described by title, author, keywords, abstract, publisher, ISSN (International Standard Serial Number), etc., and patent information consists of metadata such as invention title, applicant, inventors, agents, application number, and claim items. Most web-based academic information services provide search functions to users by processing and handling these metadata, and search on the author field is important. In this paper, we propose effective index management for person name search, along with search techniques that use a boosting factor and a near operation based on phrase search to improve the precision of search results. We also describe person name retrieval results that include alternative name expressions, co-authors, and persons in the same research field. The approach presented in this paper efficiently provides accurate data and additional search results to users.

Relation Extraction based on Composite Kernel combining Pattern Similarity of Predicate-Argument Structure (술어-논항 구조의 패턴 유사도를 결합한 혼합 커널 기반관계 추출)

  • Jeong, Chang-Hoo;Choi, Sung-Pil;Choi, Yun-Soo;Song, Sa-Kwang;Chun, Hong-Woo
    • Journal of Internet Computing and Services
    • /
    • v.12 no.5
    • /
    • pp.73-85
    • /
    • 2011
  • A great deal of valuable textual information in the literature can be used to extract relations between named entities. This paper proposes a composite kernel approach. The composite kernel calculates similarities based on the following information: (1) phrase structure, via the convolution parse tree kernel, which has shown encouraging results; and (2) predicate-argument structure patterns. In other words, the approach handles syntactic structure as well as semantic structure in a mutually complementary way. The proposed approach was evaluated on various types of test collections and performed better than the previous approach, which uses only information from syntactic structures. It also performed better than the state-of-the-art approach.

A Unit Selection Methods using Flexible Break in a Japanese TTS (일본어 합성기에서 유동 Break를 이용한 합성단위 선택 방법)

  • Song, Young-Hwan;Na, Deok-Su;Kim, Jong-Kuk;Bae, Myung-Jin;Lee, Jong-Seok
    • The Journal of the Acoustical Society of Korea
    • /
    • v.26 no.8
    • /
    • pp.403-408
    • /
    • 2007
  • In a large corpus-based speech synthesizer, a break, a parameter that influences naturalness and intelligibility, is used as an important feature during unit selection. Japanese is a language with intonation, which is indicated by relative differences in pitch height; APs (Accentual Phrases) are placed according to changes in accent, and a break occurs on an AP boundary. Although a break can be predicted using J-ToBI (Japanese Tones and Break Indices) with a rule-based or statistical approach, it is very difficult to predict a break exactly because of its flexibility. Therefore, this paper proposes a method that conducts the unit search by dividing breaks into two types, fixed breaks and flexible breaks, in order to exploit the advantages of a large-scale corpus that includes various types of prosody. In an experiment, the proposed unit selection method helped enhance the naturalness of the synthesized speech.

Latent topics-based product reputation mining (잠재 토픽 기반의 제품 평판 마이닝)

  • Park, Sang-Min;On, Byung-Won
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.39-70
    • /
    • 2017
  • Data-driven analytics techniques have recently been applied to public surveys. Instead of simply gathering survey results or expert opinions to study the preference for a recently launched product, enterprises need a way to collect and analyze various types of online data and then accurately determine customer preferences. In existing data-based survey methods, the sentiment lexicon for a particular domain is first constructed by domain experts, who usually judge the positive, neutral, or negative meanings of frequently used words in the collected text documents. To study the preference for a particular product, the existing approach (1) collects review posts related to the product from several product review web sites; (2) extracts sentences (or phrases) from the collection after pre-processing steps such as stemming and stop-word removal; (3) classifies the polarity (positive or negative) of each sentence (or phrase) based on the sentiment lexicon; and (4) estimates the positive and negative ratios of the product by dividing the numbers of positive and negative sentences (or phrases) by the total number of sentences (or phrases) in the collection. Furthermore, the existing approach automatically finds important sentences (or phrases) with positive and negative meanings toward the product. As a motivating example, given a product like the Sonata made by Hyundai Motors, customers often want to see a summary note listing the positive points in the 'car design' aspect as well as the negative points in the same aspect. They also want more useful information regarding other aspects such as 'car quality', 'car performance', and 'car service.' Such information will enable customers to make good choices when they purchase brand-new vehicles.
In addition, automobile makers will be able to determine the preference for, and the positive and negative points of, new models on the market. In the near future, the weak points of the models can be improved through sentiment analysis. For this, the existing approach computes the sentiment score of each sentence (or phrase) and then selects the top-k sentences (or phrases) with the highest positive and negative scores. However, the existing approach has several shortcomings that limit its use in real applications. Its main disadvantages are as follows: (1) The main aspects (e.g., car design, quality, performance, and service) of a product (e.g., Hyundai Sonata) are not considered. Because the sentiment analysis ignores aspects, customers and car makers receive only a summary note containing the positive and negative ratios of the product and the top-k sentences (or phrases) with the highest sentiment scores in the entire corpus. This is not enough; the main aspects of the target product need to be considered in the sentiment analysis. (2) In general, since the same word has different meanings across different domains, a sentiment lexicon suited to each domain needs to be constructed. An efficient way to construct the sentiment lexicon per domain is required because lexicon construction is labor intensive and time consuming. To address these problems, in this article we propose a novel product reputation mining algorithm that (1) extracts topics hidden in review documents written by customers; (2) mines main aspects based on the extracted topics; (3) measures the positive and negative ratios of the product using the aspects; and (4) presents a digest in which a few important sentences with positive and negative meanings are listed for each aspect. Unlike the existing approach, using hidden topics lets experts construct the sentiment lexicon easily and quickly.
Furthermore, by reinforcing topic semantics, we can improve the accuracy of product reputation mining substantially over that of the existing approach. In the experiments, we collected a large set of review documents on domestic vehicles such as the K5, SM5, and Avante; measured the positive and negative ratios of the three cars; showed the top-k positive and negative summaries per aspect; and conducted statistical analysis. Our experimental results clearly show the effectiveness of the proposed method compared with the existing method.
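
Steps (2) to (4) of the existing lexicon-based approach described in this abstract can be illustrated with a minimal Python sketch. The lexicon, stop-word list, and function names below are hypothetical and chosen only for illustration; stemming is omitted, and this is not the authors' implementation.

```python
import re

# Hypothetical toy lexicon and stop words; in the abstract, the lexicon is
# built per domain by experts.
POSITIVE = {"good", "great", "comfortable"}
NEGATIVE = {"bad", "noisy", "poor"}
STOP_WORDS = {"the", "is", "a", "and", "but", "are"}


def sentence_polarity(sentence):
    """Label one sentence by counting lexicon hits after stop-word removal."""
    tokens = [t for t in re.findall(r"[a-z]+", sentence.lower())
              if t not in STOP_WORDS]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def polarity_ratios(sentences):
    """Return (positive ratio, negative ratio) over all sentences."""
    labels = [sentence_polarity(s) for s in sentences]
    total = len(labels)
    if total == 0:
        return 0.0, 0.0
    return labels.count("positive") / total, labels.count("negative") / total


reviews = [
    "The design is great",
    "The engine is noisy",
    "Seats are comfortable",
    "It has four doors",
]
print(polarity_ratios(reviews))  # (0.5, 0.25)
```

Note that the ratios are computed over the whole collection, which is exactly the aspect-blind summary the abstract criticizes; an aspect-aware variant would first group sentences by aspect.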

The Development of an Automatic Indexing System based on a Thesaurus (시소러스를 기반으로 하는 자동색인 시스템에 관한 연구)

  • 임형묵;정상철
    • Korean Journal of Cognitive Science
    • /
    • v.4 no.1
    • /
    • pp.213-242
    • /
    • 1993
  • During the past decades, several automatic indexing systems have been developed, such as single term indexing, phrase indexing, and thesaurus-based indexing systems. Among these, single term indexing has been known to be superior to the others despite its simplicity in extracting meaningful terms. On the other hand, thesaurus-based indexing has been regarded as producing a low retrieval rate, mainly because thesauri do not usually have enough index terms, so that much of the text data fails to be indexed if it does not match any index term in the thesaurus. This paper develops a thesaurus-based indexing system, THINS, that yields a higher retrieval rate than other systems by analyzing text data syntactically and matching it partially with index terms in the thesaurus. First, the system analyzes the input text syntactically using the machine translation system MATES/EK and extracts noun phrases. After deleting stop words from the noun phrases and stemming the remaining words, it tries to index them with similar index terms in the thesaurus as much as possible. We conduct an experiment with the CACM data set that compares the retrieval effectiveness of THINS with that of single term indexing under HYKIS, a thesaurus-based information retrieval system. It turns out that THINS yields about 10 percent higher precision than single term indexing, while showing 8 to 9 percent lower recall. This shows that THINS improves considerably on previous thesaurus-based systems, which yield 25 or 30 percent lower precision than single term indexing. We also argue that the relatively lower recall is caused by the fact that CRCS, the thesaurus included in the CACM data set, is very incomplete, having only slightly more than one thousand terms; thus THINS is expected to produce a much higher rate if it is used with a currently available large thesaurus.

Corpus-based Korean Text-to-speech Conversion System (콜퍼스에 기반한 한국어 문장/음성변환 시스템)

  • Kim, Sang-hun;Park, Jun;Lee, Young-jik
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.3
    • /
    • pp.24-33
    • /
    • 2001
  • This paper describes a baseline implementation of a corpus-based Korean TTS system. Conventional TTS systems that use small-sized speech data still generate machine-like synthetic speech. To overcome this problem, we introduce a corpus-based TTS system that can generate natural synthetic speech without prosodic modifications. The corpus should contain the natural prosody of the source speech and multiple instances of each synthesis unit. To make phone-level synthesis units, we train a speech recognizer with the target speech and then perform automatic phoneme segmentation. We also detect the fine pitch period using laryngograph signals, which is used for prosodic feature extraction. For break strength allocation, four levels of break indices are determined from pause length and attached to phones to reflect prosodic variations at phrase boundaries. To predict the break strength from text, we use statistical information on POS (Part-of-Speech) sequences. The best triphone sequences are selected by Viterbi search, minimizing the accumulated Euclidean distance of the concatenation distortion. To obtain high-quality synthetic speech suitable for commercial use, we introduce a domain-specific database. By adding the domain-specific database to the general-domain database, we can greatly improve the quality of synthetic speech in the specific domain. In a subjective evaluation, the new Korean corpus-based TTS system shows better naturalness than the conventional demisyllable-based one.
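
The Viterbi search over candidate units mentioned in this abstract can be sketched in Python as follows. This is a simplified illustration, not the authors' system: it assumes each candidate unit is represented by a single feature vector (a hypothetical simplification; real systems typically compare features at the unit boundaries), and the function names are invented for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_units(candidates):
    """Pick one candidate per position so that the accumulated Euclidean
    distance between consecutive choices is minimal.

    candidates[i] is a list of feature vectors for target position i.
    Returns (list of chosen candidate indices, total cost).
    """
    if not candidates:
        return [], 0.0
    cost = [0.0] * len(candidates[0])  # best cost ending at each candidate
    back = []                          # back-pointers, one list per transition
    for prev, cur in zip(candidates, candidates[1:]):
        new_cost, pointers = [], []
        for vec in cur:
            best_j = min(range(len(prev)),
                         key=lambda j: cost[j] + euclidean(prev[j], vec))
            new_cost.append(cost[best_j] + euclidean(prev[best_j], vec))
            pointers.append(best_j)
        cost = new_cost
        back.append(pointers)
    # Trace back from the cheapest final candidate.
    end = min(range(len(cost)), key=cost.__getitem__)
    path = [end]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, cost[end]


units = [
    [(0.0,), (10.0,)],
    [(1.0,), (9.0,)],
    [(8.0,), (3.0,)],
]
path, total = select_units(units)
print(path, total)  # [1, 1, 0] 2.0
```

Dynamic programming keeps only the best accumulated cost per candidate at each position, so the search is linear in the number of positions rather than exponential.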


A Study on the Major Elements of an Arbitration Clause in International Investment Contracts (국제투자계약상의 중재조항(Arbitration Clause)의 주요 구성요소에 관한 연구)

  • Oh, Won-Suk;Seo, Kyung
    • THE INTERNATIONAL COMMERCE & LAW REVIEW
    • /
    • v.38
    • /
    • pp.155-180
    • /
    • 2008
  • The purpose of this paper is to examine the major elements of an arbitration clause in international investment contracts and to help investors, especially foreign investors, consider these elements when they draft contracts. First, it is very important to describe the extent of the arbitrable issues broadly, using a phrase such as "disputes in connection with". Furthermore, to be enforceable, the issues must be subject matter that can be submitted to arbitration under the laws of the place of arbitration and the law applicable to the merits of the dispute (N.Y. Convention, Art. II). Second, the appointment of the arbitrators is usually based on the principle of freedom of contract. If the parties do not agree on the appointment, it is decided by the tribunal in accordance with the arbitration rules of the institution. Third, in institutional arbitration, the procedural rules are the arbitration rules of the arbitration institution, unless otherwise agreed. Fourth, the most important element of an arbitration clause is the place of arbitration. Here, too, the principle of freedom of contract has priority. Unless otherwise agreed, Washington is the place of arbitration in ICSID arbitration, whereas in ICC arbitration a neutral third country may be the place of arbitration. In ad hoc arbitration, however, both parties should indicate the place. If they do not, the whole arbitration may be paralysed by an uncooperative party. Besides the major elements, I examine the relation between the arbitration clause and award enforcement in terms of sovereign immunity. The enforcement of awards in the field of state contracts, in which the State itself or a State enterprise is the contract partner, may encounter the problem of sovereign immunity. To avoid this problem, it is advisable for the parties to insert a clause such as ICSID Model Clause XIX.


A Study on the Research Trends in Domestic/International Information Science Articles by Co-word Analysis (동시출현단어 분석을 통한 국내외 정보학 학회지 연구동향 파악)

  • Kim, Ha Jin;Song, Min
    • Journal of the Korean Society for Information Management
    • /
    • v.31 no.1
    • /
    • pp.99-118
    • /
    • 2014
  • This paper carries out a co-word analysis of nouns and noun phrases using text-mining techniques in order to grasp research trends in domestic and international information science articles. The analysis is based on the collected titles and articles of papers published in the Journal of the Korean Society for Information Management (KOSIM) and the Journal of the American Society for Information Science and Technology (JASIST) from 1990 to 2013. Dividing the whole period into five publication windows, the analysis proceeds as follows: 1) analysis of high-frequency co-word pairs to examine the overall trends in both sets of information science articles; 2) analysis of each word appearing with high-frequency keywords to grasp the detailed subjects; and 3) focused network analysis of trends after 2010, when distinctively new keywords appeared. The results show that a considerable portion of KOSIM studies concern topics such as libraries, information services, information users, and information organization, whereas JASIST has focused on information retrieval, information users, web information, and bibliometrics.
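
The core counting step of a co-word analysis like the one described in this abstract can be sketched in a few lines of Python. The function name and the toy term lists are hypothetical, and the sketch assumes noun and noun-phrase extraction has already been done; it is not the authors' pipeline.

```python
from collections import Counter
from itertools import combinations


def coword_counts(documents):
    """Count, for every unordered term pair, how many documents contain both.

    documents is a list of term lists (one list per title or article).
    Pairs are stored as alphabetically sorted tuples.
    """
    counts = Counter()
    for terms in documents:
        # Deduplicate so a pair counts at most once per document.
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return counts


docs = [
    ["information", "retrieval", "web"],
    ["information", "retrieval"],
    ["library", "information"],
]
counts = coword_counts(docs)
print(counts.most_common(1))  # [(('information', 'retrieval'), 2)]
```

Running the same count separately on each publication window would give the per-period pair frequencies that the trend comparison relies on.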

Study on Performance Evaluation of Automatic license plate recognition program using Emgu CV (Emgu CV를 이용한 자동차 번호판 자동 인식 프로그램의 성능 평가에 관한 연구)

  • Kim, Nam-Woo;Hur, Chang-Wu
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.20 no.6
    • /
    • pp.1209-1214
    • /
    • 2016
  • LPR (license plate recognition) is one of the most popular surveillance technologies, based on optical character recognition applied to video and images. LPR requires many processing steps. The first is localization of the license plate, together with normalization that adjusts the plate's size, spacing, contrast, and brightness. The next is character segmentation, which isolates the individual characters for optical character recognition, followed by character recognition. The last is phrase analysis of shape, size, and position, which differ by year and by region, carried out by comparison against a license plate database. This paper describes performance results for license plate recognition software implemented using EmguCV, which finds the plate location and recognizes the characters using Tesseract OCR, a well-known open-source optical character recognition engine, with respect to the capture angle of the plate, image size, and brightness.

Automatic semantic annotation of web documents by SVM machine learning (SVM 기계학습을 이용한 웹문서의 자동 의미 태깅)

  • Hwang, Woon-Ho;Kang, Sin-Jae
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.12 no.2
    • /
    • pp.49-59
    • /
    • 2007
  • This paper presents a system that performs automatic semantic annotation to realize the "Semantic Web." Since it is impossible to tag the numerous documents on the web manually, it is necessary to gather a large number of Korean web documents as training data and to extract features using natural language processing techniques and a thesaurus. We then constructed concept classifiers with the SVM (support vector machine) learning algorithm. To suit the characteristics of the Korean language, morphological analysis and syntactic analysis were used in this system to extract feature information. Based on these analyses, concept codes are mapped using the Kadokawa thesaurus, which makes it possible to map similar words and phrases to one concept code when building training vectors. This helped raise the recall of our system. The experimental results show that the system has some potential for semantic annotation.
