• Title/Summary/Keyword: base-generated topic


Why Are Sentential Subjects Not Allowed in Seem-type Verbs in English?

  • Jang, Youngjun
    • Journal of English Language & Literature
    • /
    • v.55 no.6
    • /
    • pp.1245-1261
    • /
    • 2009
  • The purpose of this paper is to show the internal structure of the so-called sentential subject constructions in English. The constructions examined in this paper include It seems that John failed in the syntax exam vs. *That John failed in the syntax exam seems, and It really stinks that the Giants lost the World Series vs. That the Giants lost the World Series really stinks. As seen above, the English verb seem does not tolerate a sentential subject. This is in sharp contrast to other English verbs such as suck, blow, bite, and stink, which do allow sentential subjects. There are several issues regarding these constructions. First, where is the sentential subject located? Second, is the sentential subject assigned structural Case? Third, is the sentential subject extraposed, or does it remain in its base-generated complement position? Fourth, is the sentential subject a base-generated topic in the specifier position of CP, as Alrenga (2005) claims? In this paper, we argue that sentential subjects are base-generated in the specifier of the verbal phrase in the case of stink-type verbs, while they are licensed as a complement to verbs like seem. We also argue that a sentential subject can be raised in seem-type verbal constructions if it is part of the complement small clause.

Company Name Discrimination in Tweets using Topic Signatures Extracted from News Corpus

  • Hong, Beomseok;Kim, Yanggon;Lee, Sang Ho
    • Journal of Computing Science and Engineering
    • /
    • v.10 no.4
    • /
    • pp.128-136
    • /
    • 2016
  • It is impossible for any human being to analyze the more than 500 million tweets that are generated per day. Lexical ambiguities on Twitter make it difficult to retrieve the desired data and relevant topics. Most of the solutions for the word sense disambiguation problem rely on knowledge base systems. Unfortunately, it is expensive and time-consuming to manually create a knowledge base system, resulting in a knowledge acquisition bottleneck. To solve the knowledge-acquisition bottleneck, a topic signature is used to disambiguate words. In this paper, we evaluate the effectiveness of various features of newspapers on the topic signature extraction for word sense discrimination in tweets. Based on our results, topic signatures obtained from a snippet feature exhibit higher accuracy in discriminating company names than those from the article body. We conclude that topic signatures extracted from news articles improve the accuracy of word sense discrimination in the automated analysis of tweets.
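The signature-then-overlap pipeline the abstract describes can be illustrated with a small sketch. This is not the paper's implementation: the smoothed frequency-ratio scoring, the toy corpora, and all function names below are illustrative assumptions; the paper extracts signatures from specific news features (e.g., snippets) and evaluates on real tweets.

```python
from collections import Counter

def topic_signature(docs, background, top_n=5):
    """Rank terms by how much more often they appear in the topic
    corpus than in the background corpus; keep the top_n terms."""
    topic_counts = Counter(w for d in docs for w in d.lower().split())
    bg_counts = Counter(w for d in background for w in d.lower().split())
    total_topic = sum(topic_counts.values())
    total_bg = sum(bg_counts.values()) or 1
    def score(w):
        p_topic = topic_counts[w] / total_topic
        p_bg = (bg_counts[w] + 1) / (total_bg + len(bg_counts))  # add-one smoothing
        return p_topic / p_bg
    ranked = sorted(topic_counts, key=lambda w: -score(w))
    return set(ranked[:top_n])

def discriminate(tweet, signatures):
    """Assign the tweet to the sense whose signature overlaps it most."""
    words = set(tweet.lower().split())
    return max(signatures, key=lambda sense: len(words & signatures[sense]))
```

Given per-sense news corpora (e.g., "apple" the fruit vs. the company), `topic_signature` builds one term set per sense and `discriminate` labels a tweet by maximum overlap, which is the word sense discrimination step the paper evaluates.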

Design and Implementation of a Tag-based Topic Map Generation System (태그 기반 토픽맵 생성 시스템의 설계 및 구현)

  • Lee, Si-Hwa;Lee, Man-Hyoung;Hwang, Dae-Hoon
    • Journal of Korea Multimedia Society
    • /
    • v.13 no.5
    • /
    • pp.730-739
    • /
    • 2010
  • One of the core technologies of Web 2.0 is tagging, which is widely applied to multimedia data such as blog documents, images, and video. However, contrary to the expectation that tags would be reused in information retrieval and thereby maximize retrieval efficiency, unacceptable retrieval results appear owing to the inherent limitations of tags. In this paper, building on preceding research on image retrieval through tag clustering, we design and implement a topic map generation system, a form of semantic knowledge system. The tag information in each cluster is automatically converted into topics of the topic map. The generated topics are linked by semantic relationships derived from WordNet, and each topic is assigned occurrence information appropriate to its topic pair, so that a topic map with a semantic knowledge structure can be generated. As a result, the topic map proposed in this paper can support not only users' information retrieval demands through semantic navigation but also convenient and rich information services.
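The abstract does not spell out the clustering step, so the sketch below is only one plausible reading: a greedy single-link grouping of tags by co-occurrence on shared resources, with the WordNet linking and occurrence assignment omitted. The threshold, data shapes, and function names are assumptions.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence(tagged_resources):
    """Count how often two tags are attached to the same resource."""
    pairs = defaultdict(int)
    for tags in tagged_resources:
        for a, b in combinations(sorted(set(tags)), 2):
            pairs[(a, b)] += 1
    return pairs

def cluster_tags(tagged_resources, min_cooccur=2):
    """Greedy single-link clustering via union-find: tags that co-occur
    at least min_cooccur times end up in the same cluster."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    def union(a, b):
        parent[find(a)] = find(b)
    for tags in tagged_resources:
        for t in tags:
            find(t)  # register every tag
    for (a, b), n in cooccurrence(tagged_resources).items():
        if n >= min_cooccur:
            union(a, b)
    clusters = defaultdict(set)
    for t in parent:
        clusters[find(t)].add(t)
    return list(clusters.values())
```

Each resulting tag cluster would then become a candidate topic of the topic map, to be related to other topics via WordNet in the paper's full pipeline.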

Automatic Detection of Off-topic Documents using ConceptNet and Essay Prompt in Automated English Essay Scoring (영어 작문 자동채점에서 ConceptNet과 작문 프롬프트를 이용한 주제-이탈 문서의 자동 검출)

  • Lee, Kong Joo;Lee, Gyoung Ho
    • Journal of KIISE
    • /
    • v.42 no.12
    • /
    • pp.1522-1534
    • /
    • 2015
  • This work presents a new method that can predict, without the use of training data, whether an input essay is written on a given topic. ConceptNet is a common-sense knowledge base that is generated automatically from sentences extracted from a variety of document types. An essay prompt is the topic that an essay should be written about. The method proposed in this paper uses ConceptNet and an essay prompt to decide whether or not an input essay is off-topic. We introduce a way to find the shortest path between two nodes on ConceptNet, as well as a way to calculate the semantic similarity between two nodes. Both an essay prompt and a student's essay can be represented by concept nodes in ConceptNet. The semantic similarity between the concepts representing an essay prompt and those representing a student's essay can be used to rank "on-topicness"; if the ranking is low, the essay is regarded as off-topic. We used eight different essay prompts and a student-essay collection for the performance evaluation, and our proposed method shows better performance than previous studies. As ConceptNet enables simple text inference, our new method looks very promising for essay prompts that require simple inference.
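The two primitives the abstract names, shortest path between concept nodes and a path-based similarity, can be sketched over any adjacency-list graph. The paper's actual similarity measure may differ; 1/(1 + distance) is one common path-based choice and is an assumption here, as is the toy graph.

```python
from collections import deque

def shortest_path_len(graph, src, dst):
    """BFS shortest-path length between two concept nodes;
    returns None if no path exists."""
    if src == dst:
        return 0
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def similarity(graph, a, b):
    """Path-based semantic similarity: 1 / (1 + shortest-path length)."""
    d = shortest_path_len(graph, a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)
```

Averaging such similarities between prompt concepts and essay concepts yields the "on-topicness" score; an essay whose score falls below a threshold would be flagged as off-topic.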

Application of Grid-based Approach for Auto Mesh Generation of Vacuum Chamber (자동 요소망 생성을 위한 격자구성기법 적용)

  • Lee J.S.;Park Y.J.;Chang Y.S.;Choi J.B.;Kim Y.J.
    • Proceedings of the Korean Society of Precision Engineering Conference
    • /
    • 2005.06a
    • /
    • pp.844-847
    • /
    • 2005
  • Seamless analysis of complex geometry is a topic of great interest. However, gaps remain between industrial applications and fundamental academic studies owing to the time-consuming modeling process. To resolve this problem, an automatic mesh generation program based on a grid-based approach has been developed for IT products in the present study. First, a base mesh and a skin mesh are generated using the entity information extracted from an IGES file. Second, a provisional core mesh with a rugged boundary geometry is constructed by superimposing the skin mesh and the base mesh generated from the CAD model. Finally, the positions of the boundary nodes are adjusted by node modification and smoothing techniques to produce a quality mesh. To verify mesh quality, the hexahedral mesh constructed automatically by the program is compared with the corresponding tetrahedral free mesh and hexahedral mapped mesh through static finite element analyses. The grid-based approach is thus anticipated to serve as a promising pre-processor for the integrity evaluation of various IT products.
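The first two steps of the grid-based approach can be illustrated in 2D: lay a uniform base grid over the bounding box and keep the cells whose centers fall inside the geometry, giving the provisional core mesh with a rugged boundary. The disk geometry, cell size, and function names below are toy assumptions; the paper works with 3D hexahedra from IGES entities and follows this step with boundary-node adjustment.

```python
def grid_core_cells(nx, ny, cell, inside):
    """Keep the (i, j) grid cells whose centers lie inside the geometry,
    producing the provisional (rugged-boundary) core mesh."""
    core = []
    for i in range(nx):
        for j in range(ny):
            cx, cy = (i + 0.5) * cell, (j + 0.5) * cell
            if inside(cx, cy):
                core.append((i, j))
    return core

# Toy geometry standing in for the CAD model: a unit disk centered at (1, 1).
inside_disk = lambda x, y: (x - 1.0) ** 2 + (y - 1.0) ** 2 <= 1.0
```

The kept cells approximate the domain area (here roughly pi / cell area cells); the subsequent node modification and smoothing steps would then snap the rugged boundary onto the true geometry.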


A Minimum Expected Length Insertion Algorithm and Grouping Local Search for the Heterogeneous Probabilistic Traveling Salesman Problem (이종 확률적 외판원 문제를 위한 최소 평균거리 삽입 및 집단적 지역 탐색 알고리듬)

  • Kim, Seung-Mo;Choi, Ki-Seok
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.33 no.3
    • /
    • pp.114-122
    • /
    • 2010
  • The Probabilistic Traveling Salesman Problem (PTSP) is an important topic in the study of the traveling salesman problem and stochastic routing problems. The goal of the PTSP is to find an a priori tour visiting all customers with minimum expected length; on any given realization, the tour simply skips the customers not requiring a visit. There is much existing research on the homogeneous version of the problem, where all customers have an identical visiting probability. In contrast, research on the heterogeneous version is scarce, and most of it has focused on search-based algorithms. In this paper, we propose a simple construction algorithm to solve the heterogeneous PTSP. The Minimum Expected Length Insertion (MELI) algorithm is a construction algorithm that builds the visiting sequence by repeatedly inserting, between two customers already in the sequence, the customer that yields the minimum expected length. In comparisons against optimal solutions, the MELI algorithm generates better solutions when the average probability is low and the customers have different visiting probabilities. We also suggest a local search method that improves the initial solution generated by the MELI algorithm.
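The quantity the MELI criterion minimizes, the expected length of an a priori tour, has a standard closed form: customer j is reached directly from customer i exactly when both are present and every customer between them on the tour is absent. A minimal sketch of that evaluator follows; the data layout (coordinate list, probability map) is an assumption, not the paper's code.

```python
import math

def expected_tour_length(tour, coords, prob):
    """Expected length of an a priori PTSP tour: sum over ordered pairs
    (i, j) along the cyclic tour of d(i, j) times the probability that
    i and j are present and everyone strictly between them is absent."""
    n = len(tour)
    d = lambda a, b: math.dist(coords[a], coords[b])
    total = 0.0
    for s in range(n):            # position of i on the tour
        for t in range(1, n):     # j sits t steps ahead of i
            i, j = tour[s], tour[(s + t) % n]
            skip = 1.0
            for u in range(1, t):  # customers strictly between i and j
                skip *= 1.0 - prob[tour[(s + u) % n]]
            total += prob[i] * prob[j] * skip * d(i, j)
    return total
```

With all visiting probabilities equal to 1 only consecutive pairs contribute, so the expression collapses to the deterministic tour length; MELI would call this evaluator for each candidate insertion position and keep the cheapest.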

Bankruptcy prediction using an improved bagging ensemble (개선된 배깅 앙상블을 활용한 기업부도예측)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.4
    • /
    • pp.121-139
    • /
    • 2014
  • Predicting corporate failure has been an important topic in accounting and finance. The costs associated with bankruptcy are high, so the accuracy of bankruptcy prediction is of great importance for financial institutions. Many researchers have dealt with bankruptcy prediction over the past three decades. The current research attempts to use ensemble models to improve the performance of bankruptcy prediction. Ensemble classification combines individually trained classifiers to obtain more accurate predictions than individual models, and ensemble techniques have been shown to be very useful for improving the generalization ability of a classifier. Bagging is the most commonly used method for constructing ensemble classifiers. In bagging, different training data subsets are randomly drawn with replacement from the original training dataset, and base classifiers are trained on the different bootstrap samples. Instance selection retains critical instances while removing irrelevant and harmful instances from the original set. Instance selection and bagging are both well known in data mining; however, few studies have dealt with their integration. This study proposes an improved bagging ensemble based on instance selection using genetic algorithms (GA) for improving the performance of SVM. GA is an efficient optimization procedure based on the theory of natural selection and evolution. GA uses the idea of survival of the fittest by progressively accepting better solutions to the problem. GA searches by maintaining a population of solutions from which better solutions are created, rather than making incremental changes to a single solution. The initial solution population is generated randomly and evolves into the next generation through genetic operators such as selection, crossover, and mutation. The solutions, coded as strings, are evaluated by a fitness function.
The proposed model consists of two phases: GA-based instance selection and instance-based bagging. In the first phase, GA is used to select the optimal instance subset that serves as input data for the bagging model. The chromosome is encoded as a binary string over the instances. In this phase, the population size was set to 100 and the maximum number of generations to 150; the crossover rate and mutation rate were set to 0.7 and 0.1, respectively. We used the prediction accuracy of the model as the fitness function of the GA: an SVM model is trained on the training set using the selected instance subset, and its prediction accuracy over the test set is used as the fitness value in order to avoid overfitting. In the second phase, we used the optimal instance subset selected in the first phase as input data for the bagging model, with SVM as the base classifier and majority voting as the combining method. This study applies the proposed model to the bankruptcy prediction problem using a real data set from Korean companies. The research data contain 1,832 externally non-audited firms: 916 bankruptcy cases and 916 non-bankruptcy cases. Financial ratios categorized as stability, profitability, growth, activity, and cash flow were investigated through a literature review and basic statistical methods, and we selected 8 financial ratios as the final input variables. We separated the whole data set into training, test, and validation subsets. We compared the proposed model with several comparative models, including a simple individual SVM model, a simple bagging model, and an instance selection based SVM model. McNemar tests were used to examine whether the proposed model significantly outperforms the other models. The experimental results show that the proposed model outperforms the other models.
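The two-phase pipeline can be sketched end to end at toy scale. This is not the paper's implementation: a 1-NN classifier stands in for the SVM base learner, the population size, generation count, and all function names are illustrative, and the GA operators (tournament selection, one-point crossover, bit-flip mutation) are common defaults assumed here.

```python
import numpy as np

rng = np.random.default_rng(0)

def knn_predict(train_X, train_y, X):
    """1-NN prediction (toy stand-in for the paper's SVM base classifier)."""
    d = ((X[:, None, :] - train_X[None, :, :]) ** 2).sum(-1)
    return train_y[d.argmin(axis=1)]

def ga_select(X, y, Xval, yval, pop=20, gens=15, cx=0.7, mut=0.1):
    """Phase 1 - GA instance selection: binary chromosomes mark kept
    instances; fitness is held-out accuracy of the base classifier."""
    n = len(X)
    P = rng.random((pop, n)) < 0.5          # random initial population
    def fit(mask):
        if mask.sum() == 0:
            return 0.0
        return (knn_predict(X[mask], y[mask], Xval) == yval).mean()
    for _ in range(gens):
        scores = np.array([fit(m) for m in P])
        # binary tournament selection
        idx = [max(rng.integers(0, pop, 2), key=lambda i: scores[i])
               for _ in range(pop)]
        P = P[idx].copy()
        for i in range(0, pop - 1, 2):      # one-point crossover
            if rng.random() < cx:
                c = rng.integers(1, n)
                P[i, c:], P[i + 1, c:] = P[i + 1, c:].copy(), P[i, c:].copy()
        P ^= rng.random((pop, n)) < mut     # bit-flip mutation
    scores = np.array([fit(m) for m in P])
    return P[scores.argmax()]               # best instance mask

def bagging_predict(X, y, Xtest, n_estimators=9):
    """Phase 2 - bagging on the selected instances with majority voting
    (binary 0/1 labels assumed)."""
    votes = []
    for _ in range(n_estimators):
        b = rng.integers(0, len(X), len(X))  # bootstrap sample
        votes.append(knn_predict(X[b], y[b], Xtest))
    return (np.stack(votes).mean(axis=0) > 0.5).astype(int)
```

Evaluating GA fitness on a held-out split rather than the training instances themselves mirrors the paper's choice of using test-set accuracy as the fitness value to avoid overfitting the selection.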