• Title/Summary/Keyword: word length (단어길이)

Search Result 147, Processing Time 0.026 seconds

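"Misrepresented facts and misinformation" breed misunderstanding - Pesticides are safe substances that lead toxicity testing among countless man-made substances; toxicity is a matter of comparing "strength, not presence or absence" (`사실오도$\cdot$잘못된 정보`가 오해 부른다 - 농약, 수많은 인공물질중 독성시험 선도하는 안전물질 독성은 그것의 '유무 아닌 강약'에 대한 비교)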

  • 복전수부
    • Agrochemical news magazine / v.20 no.5 s.152 / pp.24-27 / 1999
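  • For roughly 30 years, the author has been invited by organizations across the country to speak about "pesticides," answering a wide range of questions and receiving criticism along the way. Over that time, it has been painful to watch the word "pesticide" become a byword for "dangerous substance," in contexts ranging from human health to natural ecosystems, through an accumulation of information that is thoroughly irresponsible and at times provokes righteous indignation. This is surely the result of people who are not experts but pose as progressive intellectuals on the strength of half-baked knowledge, together with reporting that is far too one-sided and at times emotional. In any case, there is a tendency to suspect pesticides readily, and what may be overlooked as a result is deeply worrying. The author intends to set out, over two installments, such cases together with an accurate picture of pesticides.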


DOCST: Document frequency Oriented Clustering for Short Texts (가중치를 이용한 효과적인 항공 단문 군집 방법)

  • Kim, Jooyoung;Lee, Jimin;An, Soonhong;Lee, Hoonsuk
    • Proceedings of the Korea Information Processing Society Conference / 2018.05a / pp.331-334 / 2018
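  • Machine learning on text data, one of the representative forms of unstructured data, is used across many industries. NOTAMs are aviation messages generated by the thousands every day, and they are currently analyzed manually. Machine learning promises gains in work efficiency, but because the data are short texts mixed with abbreviations, conventional analysis is difficult. This study proposes a method for clustering documents when the dataset is not large, abbreviations are intermixed, and sentences are very short. By using LDA, which classifies documents by topic, and Word2Vec, which represents words in a k-dimensional vector space, documents can be clustered efficiently even in noisy short-text data.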

A Study on the Dense Vector Representation of Query-Passage for Open Domain Question Answering (오픈 도메인 질의응답을 위한 질문-구절의 밀집 벡터 표현 연구)

  • Minji Jung;Saebyeok Lee;Youngjune Kim;Cheolhun Heo;Chunghee Lee
    • Annual Conference on Human and Language Technology / 2022.10a / pp.115-121 / 2022
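  • Retrieving passages relevant to a question is necessary for the retrieval stage of open-domain question answering. Traditional methods retrieve passages using sparse vector representations based on term frequency-inverse document frequency (TF-IDF), an information retrieval technique. However, sparse vector representations not only produce long vectors but also have the weakness that they cannot retrieve words or tokens that do not appear in the question. Research on dense vector representations is addressing these weaknesses, but most of it has been trained on English datasets. This study therefore investigates dense vector representations trained on a Korean dataset, introduces several negative sample extraction methods, and compares the performance of the transfer-learned models. It also runs and compares experiments that add a re-ranking interaction layer, previously used for dense retrieval in the dialogue response selection task. Because training a dense vector representation model is a challenging task, a variety of further attempts will likely be needed.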
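
The sparse TF-IDF retrieval that the abstract above contrasts with dense retrieval can be sketched as follows. This is a minimal illustration, not the authors' system: the whitespace tokenizer, the `log(n / df) + 1` IDF variant, the two toy passages, and the names `tfidf_vectors` and `score` are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(passages):
    """Return one sparse TF-IDF vector (a dict keyed by token) per passage."""
    tokenized = [p.lower().split() for p in passages]
    n = len(tokenized)
    # Document frequency: in how many passages each token occurs.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # One common IDF variant (assumed here); rarer tokens get larger weights.
    idf = {tok: math.log(n / df[tok]) + 1.0 for tok in df}
    vectors = [
        {tok: tf * idf[tok] for tok, tf in Counter(toks).items()}
        for toks in tokenized
    ]
    return vectors, idf


def score(query, vector, idf):
    """Dot product between the query's TF-IDF vector and a passage vector."""
    q = Counter(query.lower().split())
    return sum(tf * idf.get(tok, 0.0) * vector.get(tok, 0.0) for tok, tf in q.items())


# Hypothetical toy passages, used only to exercise the functions.
passages = [
    "the capital of france is paris",
    "seoul is the largest city in korea",
]
vectors, idf = tfidf_vectors(passages)
```

With these toy passages, `score("capital of france", vectors[0], idf)` is positive, while `score("metropolis korean peninsula", vectors[1], idf)` is exactly `0.0`: none of the query tokens occur literally in the passage, which is the vocabulary-mismatch weakness the abstract describes.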


Design of Automatic Document Classifier for IT documents based on SVM (SVM을 이용한 디렉토리 기반 기술정보 문서 자동 분류시스템 설계)

  • Kang, Yun-Hee;Park, Young-B.
    • Journal of IKEEE / v.8 no.2 s.15 / pp.186-194 / 2004
  • Due to the exponential growth of information on the internet, it is becoming difficult to find and organize relevant information. To reduce the heavy load of accessing information, automatic text classification that can handle enormous numbers of documents is necessary. In this paper, we describe the structure and implementation of a document classification system for web documents. We use an SVM as the document classification model, which is constructed from a training set and its representative terms in a directory. In our system, the SVM is trained and used for document classification with a word set extracted from information- and communication-related web documents. In addition, we use the vector-space model to represent characteristics based on TF-IDF, and the training data consist of positive and negative classes represented by a weighted characteristic set. Experiments show the categorization results and the correlation with vector length.


Korean isolated word recognizer using new time alignment method of speech signal (새로운 시간축 정규화 방법을 이용한 한국어 고립단어 인식기)

  • Nam, Myeong-U;Park, Gyu-Hong;No, Seung-Yong
    • Journal of the Institute of Electronics Engineers of Korea SP / v.38 no.5 / pp.567-575 / 2001
  • This paper proposes a new method for obtaining fixed-size parameters from voice signals of different lengths. The efficiency of a speech recognizer is determined by how it compares the similarity (the distance between patterns) of the parameters extracted from voice signals. However, variation in the voice signal and differences in speaking rate make it difficult to extract fixed-size parameters from the signal. The proposed method normalizes the parameters to a fixed size by applying the two-dimensional DCT (Discrete Cosine Transform) after representing them as a spectrogram. To validate the proposed method, parameters extracted from a 32-channel auditory filter bank (which estimates auditory nerve firing probabilities) are processed by the two-dimensional DCT and then used as input to a neural network. For comparison, we also used one conventional method that solves the time alignment problem. The results show better performance and faster recognition than the conventional method in both speaker-dependent and speaker-independent isolated word recognition.
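
The idea in the abstract above, reducing spectrograms of different durations to a fixed-size parameter block with a two-dimensional DCT, can be sketched roughly as below. This is only an illustration under stated assumptions: the 1/N scaling, the choice to keep a 4 x 4 low-order block, and the names `dct_1d` and `fixed_size_features` are hypothetical and not taken from the paper.

```python
import math


def dct_1d(x):
    """DCT-II of a sequence, scaled by 1/N so that coefficient 0 is the mean."""
    n = len(x)
    return [
        sum(v * math.cos(math.pi / n * (i + 0.5) * k) for i, v in enumerate(x)) / n
        for k in range(n)
    ]


def fixed_size_features(spectrogram, keep_time=4, keep_freq=4):
    """Map a (frames x channels) spectrogram to keep_time x keep_freq coefficients.

    Assumes the input has at least keep_time frames and keep_freq channels.
    """
    # DCT along the frequency axis of every frame.
    rows = [dct_1d(frame) for frame in spectrogram]
    # DCT along the time axis, only for the low-order frequency coefficients.
    cols = [dct_1d([row[f] for row in rows]) for f in range(keep_freq)]
    # Keep the low-order block, indexed as [time][frequency].
    return [[cols[f][t] for f in range(keep_freq)] for t in range(keep_time)]
```

An utterance of 20 frames and one of 35 frames both map to a 4 x 4 block, so a classifier with a fixed input size can consume either. Truncating to low-order coefficients discards fine detail; how many coefficients to keep is a tuning choice that this sketch does not address.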


Fabrication of semiconductor optical switch module using laser welding technique (반도체 광스위치 모듈의 제작 및 특성연구)

  • 강승구
    • Korean Journal of Optics and Photonics / v.10 no.1 / pp.73-79 / 1999
  • Semiconductor optical switch modules of 1$\times$2, 1$\times$4, and 4$\times$4 types for 1550 nm optical communication systems were fabricated using a laser welding technique and housed in a 30-pin butterfly package. For better coupling efficiency between the switch chip and the optical fiber, tapered fibers with lens radii of 10~15 $\mu$m were used, which provided up to 60% optical coupling efficiency. With the help of a new laser hammering process, we could recover the lost optical power almost completely, up to an average of 82% of the initially obtained power. The fabricated optical switch modules showed good thermal stability, with less than 5% degradation even after a 200-cycle thermal cycling test. The 2.5 Gbps optical transmission characteristics of the 4$\times$4 switch module showed sensitivities below -30 dB for all possible switching paths. The transmission penalties of the 1$\times$2 switch module at a BER of $10^{-10}$ were 0.6 dB and 0.7 dB for 50 km and 90 km optical fibers, respectively.


Digital Down Converter System improving the computational complexity (복잡도를 개선한 Digital Down Converter 시스템)

  • Moon, Ki-Tak;Hong, Moo-Hyun;Lee, Joung-Seok;Kim, Kyung-Seok
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.10 no.3 / pp.11-17 / 2010
  • DDC (Digital Down Conversion) technology that is stable, low-power, and low in computation is essential for implementing SDR (Software Defined Radio), which provides a flexible interface across multi-standard, multi-band, multi-service systems. A DDC consists of digital channel filters. Typical digital filters have the drawback of being vulnerable to overflow and rounding errors because of their finite word length. In this paper, we propose a DDC structure that overcomes this disadvantage. The WDF (Wave Digital Filter) structure is robust to the noise caused by rounding errors, so it is useful when the word length of the filter coefficients is short. In addition, because it is based on IIR filters, it needs fewer taps than FIR-based filters, which reduces the amount of computation. The proposed DDC structure, which uses a CIC (Cascaded Integrator Comb) filter, a WDF, and IFOP (Interpolated Fourth-Order Polynomials), was analyzed, and the results were confirmed by computer simulation.
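
The CIC stage named in the abstract above can be illustrated with a textbook CIC decimator. This is a generic sketch, not the structure proposed in the paper; the function name `cic_decimate` and its parameters `rate`, `stages`, and `delay` are hypothetical. It uses Python's unbounded integers, so it does not model the finite-word-length effects the abstract is concerned with.

```python
def cic_decimate(samples, rate, stages=3, delay=1):
    """Decimate integer samples by `rate` with an N-stage CIC filter."""
    integrators = [0] * stages
    comb_history = [[0] * delay for _ in range(stages)]
    output = []
    for i, x in enumerate(samples):
        # Integrator section runs at the input rate: y[n] = y[n-1] + x[n].
        for s in range(stages):
            integrators[s] += x
            x = integrators[s]
        # Keep only every `rate`-th integrator output.
        if (i + 1) % rate != 0:
            continue
        # Comb section runs at the reduced rate: y[n] = x[n] - x[n - delay].
        for s in range(stages):
            delayed = comb_history[s].pop(0)
            comb_history[s].append(x)
            x -= delayed
        output.append(x)
    return output
```

For a constant input of 1 with `rate=4` and `stages=3`, the output settles at `(rate * delay) ** stages = 64`. That DC gain is the reason register word length matters in a fixed-point implementation: each output needs about `stages * log2(rate * delay)` more bits than the input.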

Construction of Linearly Aligned Corpus Using Unsupervised Learning (자율 학습을 이용한 선형 정렬 말뭉치 구축)

  • Lee, Kong-Joo;Kim, Jae-Hoon
    • The KIPS Transactions:PartB / v.11B no.3 / pp.387-394 / 2004
  • In this paper, we propose a modified unsupervised linear alignment algorithm for building an aligned corpus. The original algorithm inserts null characters into both aligned strings (the source string and the target string) because the two strings differ in length. This can cause difficulties such as search space explosion for applications that use the aligned corpus with null characters, and it rules out several machine learning algorithms. To alleviate these difficulties, we modify the algorithm so that the aligned source strings contain no null characters. We show the usability of our approach by applying it to different areas such as Korean-English back-transliteration, English grapheme-phoneme conversion, and Korean morphological analysis.

Analysis of the Continuity of Reading Passages in the 5th and 6th Grade Elementary School English Textbooks Based on Readability (이독성을 통한 초등학교 5, 6학년 영어 교과서 읽기 지문의 연계성 분석)

  • Jang, Hankyeol;Lee, Je-Young
    • The Journal of the Korea Contents Association / v.22 no.6 / pp.116-124 / 2022
  • The purpose of this study is to examine the vertical and horizontal continuity between grades and publishers, respectively, by analyzing the readability of reading passages included in English textbooks for 5th and 6th grades of elementary school. In order to do so, a corpus was constructed with the reading passages contained in 10 textbooks, and the reading passages in each textbook were analyzed through Coh-Metrix. Also, it was examined whether there was a statistically significant difference between grades and publishers in readability through one-way ANOVA. The results are as follows. First, as a result of analyzing the difference in readability between publishers within the same grade, there was a statistically significant difference between fifth-grade textbooks in the L2 readability index. Second, as a result of analyzing the vertical continuity between grades within the publisher, the difficulty of textbook A was higher in grade 6 than grade 5 based on FRE and FKGL, which showed a statistically significant difference. On the other hand, when L2 readability was used as the standard, the difficulty of textbook B was lower in 6th grade than in 5th grade. This result seems to be because FRE and FKGL calculate readability based on sentence and word length, whereas L2 readability is based on content word overlap, word frequency, and syntactic similarity of sentences.
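
The abstract above notes that FRE and FKGL depend only on sentence length and word length. The standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas make that dependence explicit. The sketch below takes word, sentence, and syllable counts as inputs; how those counts are obtained (for example, by Coh-Metrix) is outside its scope, and the function names are chosen for the example.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: higher scores mean harder text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
```

Both formulas use only two ratios: words per sentence (sentence length) and syllables per word (word length). Neither looks at content word overlap, word frequency, or syntactic similarity, which is consistent with the abstract's explanation of why the two measures can disagree with L2 readability.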

Headword Finding System Using Document Expansion (문서 확장을 이용한 표제어 검색시스템)

  • Kim, Jae-Hoon;Kim, Hyung-Chul
    • Journal of Information Management / v.42 no.4 / pp.137-154 / 2011
  • A headword finding system is defined as an information retrieval system that uses a word gloss as a query. To implement such a system, we use the gloss as a document. A gloss is generally very short, which makes it very difficult to find the most appropriate headword for a given query. To alleviate this problem, we expand the document using the concept of query expansion from information retrieval. In this paper, we use two document expansion methods: gloss expansion and similar word expansion. The former inserts the glosses of words that appear in the document into a seed document. The latter inserts similar words into a seed document. We use a featureless clustering algorithm to obtain the similar words. The performance (r-inclusion rate) reaches almost 100% when the queries are word glosses and r is 16, and 66.9% when the queries are written by users themselves. Through several experiments, we have observed that document expansion is very useful for the headword finding system. In the future, new measures, including the r-inclusion rate proposed here, will be required for evaluating the performance of headword finding systems, and new evaluation sets will also be needed for objective assessment.
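
The abstract above reports an r-inclusion rate but does not define it. One plausible reading, assumed here and not confirmed by the source, is the fraction of queries whose correct headword appears among the top r retrieved headwords. Under that assumption, the measure could be computed as in this sketch; the function and argument names are hypothetical.

```python
def r_inclusion_rate(ranked_results, gold_headwords, r):
    """Fraction of queries whose gold headword is within the top-r results.

    ASSUMPTION: this definition is inferred from the metric's name, not
    taken from the paper.
    """
    hits = sum(
        1
        for ranked, gold in zip(ranked_results, gold_headwords)
        if gold in ranked[:r]
    )
    return hits / len(gold_headwords)
```

For example, with rankings `[["apple", "pear", "fig"], ["dog", "cat", "cow"]]` and gold headwords `["pear", "cow"]`, the rate is 0.5 at r = 2 and 1.0 at r = 3.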