• Title/Summary/Keyword: sequence-to-sequence model

Search Result 695, Processing Time 0.025 seconds

Automatic Text Categorization Using Passage-based Weight Function and Passage Type (문단 단위 가중치 함수와 문단 타입을 이용한 문서 범주화)

  • Joo, Won-Kyun;Kim, Jin-Suk;Choi, Ki-Seok
    • The KIPS Transactions: Part B / v.12B no.6 s.102 / pp.703-714 / 2005
  • Research in text categorization has been confined to whole-document-level classification, probably due to the lack of full-text test collections. However, the full-length documents now available in large quantities have renewed interest in text classification. A document is usually written in an organized structure to present its main topic(s). This structure can be expressed as a sequence of sub-topic text blocks, or passages. To reflect the sub-topic structure of a document, we propose a new passage-level (passage-based) text categorization model, which segments a test document into several passages, assigns categories to each passage, and merges the passage categories into document categories. Compared with traditional document-level categorization, this model requires two additional steps: passage splitting and category merging. Using four subsets of the Reuters text categorization test collection and a full-text test collection whose documents range from tens to hundreds of kilobytes, we evaluated the proposed model, in particular the effectiveness of various passage types and the importance of passage location in category merging. Our results show that simple windows are best for all test collections used in these experiments. We also found that passages contribute to the main topic(s) to different degrees, depending on their location in the test document.

A Development of Generalized Coupled Markov Chain Model for Stochastic Prediction on Two-Dimensional Space (수정 연쇄 말콥체인을 이용한 2차원 공간의 추계론적 예측기법의 개발)

  • Park Eun-Gyu
    • Journal of Soil and Groundwater Environment / v.10 no.5 / pp.52-60 / 2005
  • The conceptual model of an under-sampled study area includes a great amount of uncertainty. In this study, we investigate the applicability of the Markov chain model in a spatial domain as a tool for minimizing the uncertainty arising from the lack of data. A new formulation is developed to generalize the previous two-dimensional coupled Markov chain model, giving it more versatility to fit any computational sequence. Furthermore, the computational algorithm is improved to utilize more conditioning information and to reduce artifacts caused by sequential computation, such as artificial parcel inclination. The generalized 2D coupled Markov chain (GCMC) is tested by applying it to a hypothetical soil map to evaluate its appropriateness as a substitute for conventional geostatistical models. Compared with the sequential indicator model (SIS), the simulation results from GCMC show lower entropy at the boundaries of indicators, which is closer to real soil maps. For under-sampled indicators, however, GCMC under-estimates the presence of the indicators, a shortcoming common to all other geostatistical models. To reduce this under-estimation, further study on including data fusion (or assimilation) in the GCMC is required.

Application of Hydro-Cartographic Generalization on Buildings for 2-Dimensional Inundation Analysis (2차원 침수해석을 위한 수리학적 건물 일반화 기법의 적용)

  • PARK, In-Hyeok;JIN, Gi-Ho;JEON, Ka-Young;HA, Sung-Ryong
    • Journal of the Korean Association of Geographic Information Studies / v.18 no.2 / pp.1-15 / 2015
  • Urban flooding has threatened human beings and facilities with chemical and physical hazards since the beginning of human civilization. Recent studies have emphasized the integration of data and models for effective urban flood inundation modeling. However, the model set-up process tends to be time-consuming and to require a high level of data processing skill. Furthermore, despite the use of high-resolution grid data, inundation depth and velocity vary with the building treatment method in a 2-D inundation model, because undesirable grids are generated and reduce the reliability of the simulation results. Thus, a building generalization process, or enhancement of building orthogonality, is required to minimize the distortion of buildings before building footprints are converted into grid data. This study aims to develop a building generalization method for 2-dimensional inundation analysis to enhance model reliability, and to investigate the effect of the building generalization method on urban inundation in terms of geographical engineering and hydraulic engineering. The results indicate that, to improve the reliability of 2-dimensional inundation analysis, the building generalization method developed in this study should be applied using a Digital Building Model (DBM) before model implementation in urban areas. The proposed building generalization sequence is aggregation followed by simplification, and the threshold of each method should be determined by considering spatial characteristics; it should not exceed the sum of the average and the standard deviation of the building gap.

Review of Site Characterization Methodology for Deep Geological Disposal of Radioactive Waste (방사성폐기물의 심층 처분을 위한 부지특성조사 방법론 해외 사례 연구)

  • Park, Kyung-Woo;Kim, Kyung-Su;Koh, Yong-Kwon;Jo, Yeonguk;Ji, Sung-Hoon
    • Journal of Nuclear Fuel Cycle and Waste Technology (JNFCWT) / v.15 no.3 / pp.239-256 / 2017
  • In the process of site selection for radioactive waste disposal, site characterization must be carried out to obtain the input parameters needed to assess the safety and feasibility of a deep geological repository. In this paper, methodologies of site characterization for radioactive waste disposal in Korea are suggested based on foreign cases of site characterization. The IAEA recommends that site characterization for radioactive waste disposal be performed stepwise, with the site characterization period divided into preliminary and detailed stages, in sequence. This methodology has been followed by several foreign countries in their geological disposal programs. General properties related to the geological environment were obtained at the preliminary site characterization stage; more detailed site characteristics were investigated during the detailed site characterization stage. The results of investigating the geology, hydrogeology, geochemistry, rock mechanics, solute transport and thermal properties of a site have to be combined and constructed in the form of a site descriptive model. Based on this site descriptive model, the site characteristics can be evaluated to assess the suitability of the site for radioactive waste disposal. According to foreign site characterization cases, 7 or 8 years are expected to be needed for site characterization; however, the time required may increase if no proper national strategy is provided.

A Study on the Transmission of a Transgene in the Offspring of Transgenic Mice (형질전환 생쥐의 후손에서 외래 유전자의 유전성에 대한 연구)

  • 염행철
    • Korean Journal of Animal Reproduction / v.20 no.4 / pp.453-458 / 1997
  • It is known that the incorporation of genes into transgenic mice is generally stable and is passed on to succeeding generations in a Mendelian fashion. In this report, transgenic mice were used as a model to evaluate whether transgenes are transmitted according to Mendelian principles in successive generations and how they are transmitted to the offspring. A 3.0 kb linear DNA fragment, containing the MMTV LTR, bovine αS1-casein cDNA and the SV40 splicing and polyadenylation site, was microinjected into fertilized mouse embryos. The tail DNAs of the resulting pups were subjected to dot and Southern hybridizations to screen for transgenic founders. The DNAs of their offspring were analyzed by PCR to confirm the transmission of the transgene from F0. Of 72 live pups, four (5.6%), 3 males and 1 female, were positive for the transgene. The rates of transmission from F0 into F1 were 33.3, 7.7, 0, and 62.5%. Those from F1 into F2 were 63.6, 5.9, and 68.8%, and those from F2 into F3 were 85.7 and 88.2%. This report demonstrates the transmission pattern of transgenes from transgenic mice to their offspring: transmission either did or did not follow a Mendelian pattern. Deletion or loss of the transgenes from F0 in some lines became apparent in the succeeding generations.


Design and Implementation of a Real-Time Lipreading System Using PCA & HMM (PCA와 HMM을 이용한 실시간 립리딩 시스템의 설계 및 구현)

  • Lee Chi-geun;Lee Eun-suk;Jung Sung-tae;Lee Sang-seol
    • Journal of Korea Multimedia Society / v.7 no.11 / pp.1597-1609 / 2004
  • Many lipreading systems have been proposed to compensate for the drop in speech recognition rate in noisy environments. Previous lipreading systems work only under specific conditions, such as artificial lighting and a predefined background color. In this paper, we propose a real-time lipreading system that allows the speaker to move and relaxes the restrictions on color and lighting conditions. The proposed system extracts the face and lip regions, and the essential visual information, in real time from an input video sequence captured with a common PC camera. It then recognizes uttered words from the visual information in real time. It uses a hue histogram model to extract the face and lip regions, the mean shift algorithm to track the face of a moving speaker, PCA (Principal Component Analysis) to extract the visual information for learning and testing, and an HMM (Hidden Markov Model) as the recognition algorithm. The experimental results show that our system achieves a recognition rate of 90% for speaker-dependent lipreading and, when combined with audio speech recognition, increases the speech recognition rate up to 40-85% depending on the noise level.


Personalized Session-based Recommendation for Set-Top Box Audience Targeting (셋톱박스 오디언스 타겟팅을 위한 세션 기반 개인화 추천 시스템 개발)

  • Jisoo Cha;Koosup Jeong;Wooyoung Kim;Jaewon Yang;Sangduk Baek;Wonjun Lee;Seoho Jang;Taejoon Park;Chanwoo Jeong;Wooju Kim
    • Journal of Intelligence and Information Systems / v.29 no.2 / pp.323-338 / 2023
  • TV advertising based on deep analysis of audience viewing patterns is important for set-top box audience targeting. Applying session-based recommendation (SBR) models to internet commercials, or recommendation based on a user's search history, has proved effective in previous studies, but applying SBR to TV advertising has been difficult in South Korea because the data were unavailable. Also, traditional SBR has limitations in dealing with user preferences, especially in data with user identification information. To tackle these problems, we first obtain set-top box data from three major broadcasting companies in South Korea (SKB, KT, LGU+) through collaboration with the Korea Broadcast Advertising Corporation (KOBACO); the data contain the viewing sequences of 4,847 anonymized users over 6 months, respectively. Second, we develop a personalized session-based recommendation model to deal with hierarchical user-session-item data. Experiments were conducted on the set-top box audience dataset and two other public datasets for validation. As a result, our proposed model outperformed the baseline model on some criteria.

Endless debates on the extant basal-most angiosperm (현생 기저 피자식물에 대한 끝나지 않는 논쟁)

  • Kim, Sangtae
    • Korean Journal of Plant Taxonomy / v.40 no.1 / pp.1-15 / 2010
  • Recognizing the basal group in a taxon is one of the most important factors in understanding the evolutionary history of that group of life. Many botanists have proposed, based on morphological and fossil evidence, a sister to all other angiosperms in order to understand the origin and rapid diversification of angiosperms. Recent technical advances in molecular biology and the accumulation of molecular phylogenetic data have provided evidence of the extant basal-most angiosperm, which is a sister to all other angiosperms. Although it is still arguable, most plant taxonomists agree that Amborella trichopoda Baill., a species (monotypic genus and monotypic family) distributed in New Caledonia, is a sister to all other extant angiosperms based on evidence from the following molecular approaches: 1) classical phylogenetic analyses based on multiple genes (or DNA regions), 2) analyses of a tree network of duplicated gene families, and 3) gene-structural evidence. As an alternative hypothesis with relatively minor evidence, some researchers have also suggested that Amborella and Nymphaeaceae form a clade that is a sister to all other angiosperms. Debate regarding the basal-most angiosperms is still ongoing and is currently one of the hot issues in plant evolutionary biology. We expect that sequencing the whole genome of Amborella as an evolutionary model plant, and subsequent studies based on this genome sequence, will provide information regarding the origin and rapid diversification of angiosperms, which is Darwin's so-called abominable mystery.

Temporal Data Mining Framework (시간 데이타마이닝 프레임워크)

  • Lee, Jun-Uk;Lee, Yong-Jun;Ryu, Geun-Ho
    • The KIPS Transactions: Part D / v.9D no.3 / pp.365-380 / 2002
  • Temporal data mining, the incorporation of temporal semantics into existing data mining techniques, refers to a set of techniques for discovering implicit and useful temporal knowledge from large quantities of temporal data. Temporal knowledge, expressible in the form of rules, is knowledge with temporal semantics and relationships, such as cyclic patterns, calendric patterns, trends, etc. There are many examples of temporal data from which useful temporal knowledge can be discovered, including patient histories, purchaser histories, and web logs. Many studies on data mining have been pursued, and some of them have addressed issues of temporal data mining for discovering temporal knowledge from temporal data, such as sequential patterns, similar time sequences, cyclic and temporal association rules, etc. However, all of these works treated the data in a database at best as data series in chronological order and did not consider the temporal semantics and temporal relationships contained in the data. To solve this problem, we propose a theoretical framework for temporal data mining. This paper surveys the work to date and explores the issues involved in temporal data mining. We then define a model for temporal data mining, suggest an SQL-like mining language able to express temporal mining tasks, and present the architecture of a temporal mining system.

A Data-driven Classifier for Motion Detection of Soldiers on the Battlefield using Recurrent Architectures and Hyperparameter Optimization (순환 아키텍쳐 및 하이퍼파라미터 최적화를 이용한 데이터 기반 군사 동작 판별 알고리즘)

  • Joonho Kim;Geonju Chae;Jaemin Park;Kyeong-Won Park
    • Journal of Intelligence and Information Systems / v.29 no.1 / pp.107-119 / 2023
  • Technology that recognizes a soldier's motion and movement status has recently attracted considerable attention as a combination of wearable technology and artificial intelligence, and is expected to upend the paradigm of troop management. The accuracy of state determination should be kept high to ensure the expected vital functions both in training situations (evaluating each individual's motion and providing solutions) and in combat situations (overall enhancement of troop management). However, when the input data is given as a time series or sequence, existing feedforward networks show overt limitations in maximizing classification performance. Since the human behavior data handled for military motion recognition (3-axis accelerations and 3-axis angular velocities) must be analyzed for its time-dependent characteristics, this study proposes a high-performance data-driven classifier that uses long short-term memory to identify the order dependence of the acquired data, learning to classify eight representative military operations (Sitting, Standing, Walking, Running, Ascending, Descending, Low Crawl, and High Crawl). Since the accuracy is highly dependent on a network's learning conditions and variables, manual adjustment may neither be cost-effective nor guarantee optimal results during learning. Therefore, in this study, we optimized the hyperparameters using Bayesian optimization to maximize generalization performance. As a result, the final architecture reduced the error rate by 62.56% compared to the existing network with a similar number of learnable parameters, with a final accuracy of 98.39% across the various military operations.