• Title/Summary/Keyword: 문장 분할 (sentence segmentation)

Subject-Balanced Intelligent Text Summarization Scheme (주제 균형 지능형 텍스트 요약 기법)

  • Yun, Yeoil;Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.25 no.2 / pp.141-166 / 2019
  • Channels such as social media and SNS now generate enormous amounts of data, and the share of unstructured text data has grown geometrically. Because it is difficult to read all of this text, it is important to access it quickly and grasp its key points. To meet this need, many text summarization studies have been proposed for handling and using very large amounts of text. In particular, many recent methods apply machine learning and artificial intelligence algorithms to generate summaries objectively and effectively, an approach called "automatic summarization". However, almost all text summarization methods proposed to date build the summary around the frequency of contents in the original documents. Such summaries tend to leave out low-weight subjects that are mentioned less often in the original text. If a summary covers only the major subjects, bias arises and information is lost, so it is hard to identify every subject the documents contain. One way to avoid this bias is to summarize with the balance between topics in mind so that every subject can be identified, but an imbalance in the distribution of those subjects still remains. To keep the subjects in a summary balanced, it is necessary to consider the proportion of every subject in the original documents and also to allocate the subjects equal portions, so that sentences on minor subjects are sufficiently included. In this study, we propose a "subject-balanced" text summarization method that secures balance among all subjects and minimizes the omission of low-frequency subjects. For the subject-balanced summary, we use two summary evaluation metrics, "completeness" and "succinctness". Completeness means that the summary should fully cover the contents of the original documents, and succinctness means that the summary should contain minimal duplication within itself. The proposed method has three phases. The first phase constructs subject term dictionaries. Topic modeling is used to calculate topic-term weights, which indicate how strongly each term is related to each topic. From these weights, the terms highly related to each topic can be identified, and the subjects of the documents can be found from topics composed of terms with similar meanings. A few terms that represent each subject well are then selected; in this method they are called "seed terms". Because the seed terms are too few to describe each subject adequately, enough terms similar to them are needed for a well-constructed subject dictionary. Word2Vec is used for this word expansion, finding terms similar to the seed terms. Word vectors are created by Word2Vec modeling, and the similarity between all terms is derived from those vectors using cosine similarity. The higher the cosine similarity between two terms, the stronger the relationship between them is taken to be. Terms with high similarity to the seed terms of each subject are therefore selected, and after these expanded terms are filtered, the subject dictionary is complete. The second phase allocates a subject to every sentence in the original documents. To grasp the contents of the sentences, a frequency analysis is first conducted on the terms that make up the subject dictionaries. The TF-IDF weight of each subject is then calculated, which shows how much each sentence says about each subject. However, a TF-IDF weight has no upper bound, so the TF-IDF weights of every subject in each sentence are normalized to values between 0 and 1. Each sentence is then allocated to the subject with the maximum TF-IDF weight, which yields a sentence group for each subject. The last phase generates the summary. Sen2Vec is used to measure the similarity between subject sentences, and a similarity matrix is formed. By selecting sentences repeatedly, it is possible to generate a summary that fully covers the contents of the original documents while minimizing duplication within the summary itself. To evaluate the proposed method, 50,000 TripAdvisor reviews were used to construct the subject dictionaries and 23,087 reviews were used to generate summaries. A comparison between summaries from the proposed method and frequency-based summaries verified that the proposed method better retains the balance of all subjects that the documents originally have.
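
The dictionary expansion and subject allocation steps described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the toy word vectors, the similarity threshold of 0.8, the use of min-max scaling as the normalization, and all function names are assumptions made for the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_seed_terms(seed_terms, vectors, threshold=0.8):
    """Add every term whose similarity to at least one seed term reaches the threshold.

    The threshold value is an assumption; the abstract only says that
    terms with "high similarity" are selected.
    """
    known_seeds = [s for s in seed_terms if s in vectors]
    expanded = set(seed_terms)
    for term, vec in vectors.items():
        if any(cosine_similarity(vec, vectors[s]) >= threshold for s in known_seeds):
            expanded.add(term)
    return expanded


def min_max_normalize(weights):
    """Rescale weights to the range 0..1 (min-max scaling is an assumed choice)."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return [0.0 for _ in weights]
    return [(w - lo) / (hi - lo) for w in weights]


def allocate_subjects(subject_weights):
    """Assign each sentence to the subject with the largest normalized weight.

    subject_weights maps a subject name to a list with one TF-IDF weight
    per sentence. Ties go to the subject listed first.
    """
    normalized = {s: min_max_normalize(w) for s, w in subject_weights.items()}
    n_sentences = len(next(iter(normalized.values())))
    return [
        max(normalized, key=lambda s: normalized[s][i])
        for i in range(n_sentences)
    ]


# Hypothetical two-dimensional word vectors standing in for Word2Vec output.
toy_vectors = {
    "room": [1.0, 0.0],
    "bed": [0.9, 0.1],
    "food": [0.0, 1.0],
    "breakfast": [0.1, 0.9],
}
room_dictionary = expand_seed_terms(["room"], toy_vectors, threshold=0.8)

# Hypothetical per-sentence TF-IDF weights for two subjects and three sentences.
toy_weights = {"room": [3.0, 0.0, 1.0], "food": [0.5, 2.0, 0.0]}
allocation = allocate_subjects(toy_weights)
```

Here `room_dictionary` holds the seed term plus the terms close to it, and `allocation` lists one subject per sentence. The final phase (repeated sentence selection over a sentence similarity matrix) is not shown, since the abstract does not specify the selection rule.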

Outline History of Corporation Yudohoi(儒道會) via 『Cheongeumrok(晴陰錄)』 by Hong Chan-Yu: "Volume of Materials" (『청음록(晴陰錄)』으로 본 (사(社))유도회(儒道會) 약사(略史))

  • Chaung, hoo soo
    • (The)Study of the Eastern Classic / no.55 / pp.265-291 / 2014
  • Cheongeumrok is the journal that Gwonwoo(卷宇) Hong Chan-yu(1915-2005) kept from January 9, 1969 to January 14, 1982. He was personally involved in the foundation of the corporation Yudohoi and in all of its operations, which makes him the person most knowledgeable about its history. His Cheongeumrok therefore seems a suitable source for compiling that history. Cheongeumrok consists of 19 books in total, amounting to approximately 3,300 sheets of squared manuscript paper of 200 characters each. He wrote it in Chinese, sometimes following Korean word order. Many parts of the manuscript are written in a cursive hand, with many Chinese poems embedded throughout. This paper presents the major information related to the corporation Yudohoi extracted from the journal. 1. A meeting of promoters to commemorate the foundation of the corporation was held in November 1968, and the corporation was established in January 1969 after obtaining a permit from the Ministry of Culture and Communication (Permit No. of Ministry of Culture and Communication: Da(다)-2-3(Jongmu(宗務)1732.5)). 2. Its office moved from the original location on the 3rd floor of Wonnam Building, 133-1 Wonnam-dong, Jongro-gu, Seoul (currently Daekhak Pharmacy in front of Seoul National University Hospital) to Room 388 of Gwangjang Company, 4 Yeji-dong, Jongro-gu (office of Heungsan Social Gathering), then to the 2nd floor of KyungBo building, 21 Kyansu-dong, and then to the 3rd floor of Geongguk Building in Gyeongwoon-dong. 3. Its operational costs were covered by support from Seong Sang-yeong, the eldest son of Seong Jong-ho, the chairman of the board; later from Kim Won-tae and Gwon Tae-hun, the next chairmen of the board; and, since 1979, from Hong Chan-yu, a director. 4. His Confucian activities include participating in Seonggyungwan Seokjeonje (成均館 釋奠), joining in the erection of the Parijangseo(巴里長書) Monument and the publication of its commemorative poetry book, compiling the biographies (not completed) of Confucian patriotic martyrs for independence, and participating in the establishment of family rituals and regulations as a working member. 5. His Yudohoi had a dispute with Seonggyungwan and lost suits at the High Court in July 1975 and at the Supreme Court in February 1976. 6. There were discussions about unification with Seonggyungwan Yudohoi, but hardly any progress was made. 7. Yudohoi began to offer full-scale courses on Confucian and Chinese classics under the leadership of Director Hong Chan-yu in 1979, and they continue today. Its courses for scholarship students, including those for ordinary citizens, have a history of 29 years and 220 graduates.

Study on the meaning of Edi-curation in Trans-media era - Based on the comic(webtoon) and publishing content - (트랜스미디어 시대에서 에디큐레이션의 의미에 대한 연구 - 출판 및 만화 콘텐츠를 중심으로 -)

  • Park, Se-Hyeon
    • Cartoon and Animation Studies / s.44 / pp.235-261 / 2016
  • In the context of the Internet and digital media, media consumers use the same content on a variety of platforms. Content of various genres is thus converted into new content through processes of fusion, combination, transformation, differentiation, reproduction, and so on, on the basis of digital media. This is referred to as trans-media. Creating successful content in the trans-media era requires the work of Edi-curation. Edi-curation is the act of editing and adding meaning to the curation work of curators. In that sense, this paper analyzes the definition and meaning of Edi-curation for publishing and comic (webtoon) content in the trans-media era. The Edi-curation process changes the roles of consumers and producers of content in the digital media experience. In the process of Edi-curation, consumers (producers) readily become media producers (consumers), namely prosumers or produsers. The diversification of digital platforms and devices, the emergence of digital one-person (or SNS) media, and similar developments also call for Edi-curation of publishing and comic (webtoon) content in a variety of ways. Depending on the intention of media producers (or consumers), content gives birth to new content through processes of replication, montage, disassembly, dismantling, hypertext, compression, and reconstruction. The work of Edi-curation is significant in that it affects the way media producers work in the creative process as well as the way media consumers read content. In publishing content, Edi-curation takes the form of breaking the logical order of chapters or paragraphs, colloquial sentences, card news, and transformations that make use of video and media content. In comic (webtoon) content, we discuss the breaking of the cut (frame) and the various modifications of speech bubbles, onomatopoeia, and mimetic words.

Molecular biological studies on Heat-Shock Responses in Amoeba proteus: I. Detection of Heat-shock Proteins (아메바(Amoeba proteus)의 열충격 대응에 관한 분자생물학적 연구: 1. 열충격 대응 단백질의 탐색)

  • 홍혜경;최지영;안태인
    • The Korean Journal of Zoology / v.37 no.4 / pp.554-564 / 1994
  • To examine differences in the heat-shock response between the xD strain of Amoeba proteus, which harbors intracellular symbiotic bacteria, and its parental tD strain, amoebae were allowed to take up radioisotope-labeled amino acids through pinocytosis for 90 minutes in Ca2+-less Chalkley's solution, and the patterns of stress-response proteins newly synthesized under cold and heat stress were compared by one- and two-dimensional electrophoresis and autoradiography. In response to cold shock (10°C), both strains strongly expressed a 56.0 kDa, pI 6.0 protein, and in the xD strain, unlike the tD strain, expression of a 66.0 kDa, pI 5.5 protein stopped early in the cold shock. In response to heat shock (33°C), about ten proteins were newly synthesized in both strains. In tD amoebae the synthesis of these proteins proceeded gradually, whereas in xD amoebae the 66.0 kDa protein among them was rapidly synthesized as a heat-response protein. In addition, two-dimensional electrophoresis detected a number of proteins whose expression was promoted by heat shock. The detected heat-shock proteins of the amoeba could be classified by molecular weight into two of the hsp100 group, three of the hsp90 group, one each of the hsp70 and hsp60 groups, and four of the small csp group. Taken together, the two analyses show that synthesis of heat-shock proteins rose gradually in tD amoebae under cold and heat shock, whereas it occurred rapidly in the xD strain. These results suggest that the intracellular symbiotic bacteria of the amoeba have altered the host's heat-shock response mechanism.

The Speaker Recognition System using the Pitch Alteration (피치변경을 이용한 화자인식 시스템)

  • Jung JongSoon;Bae MyungJin
    • Proceedings of the Acoustical Society of Korea Conference / spring / pp.115-118 / 2002
  • Parameters used in a speaker recognition system should fully express the speaker's characteristics contained in speech. That is, a characteristic whose inter-speaker variance is larger than its intra-speaker variance is useful for distinguishing between speakers. To minimize errors between speakers, improved recognition technology is required in addition to distinguishing characteristics. Recent simulation results show that more accurate performance is obtained by using both dynamic characteristics and the constant characteristics that come from speaking habits. We therefore propose the following approach. Prosodic information is used as a characteristic vector of speech. The characteristic vector generally used in speaker recognition systems models spectrum information and performs well in noise-free conditions. However, the characteristic vector is distorted in noisy conditions, which reduces the recognition rate. In this paper, we alter the pitch contour, divided into segments, from which a dynamic characteristic can be estimated, and use it as a recognition characteristic. We confirmed through simulation that the dynamic characteristic is very robust in noisy conditions. We decide acceptance or rejection by comparing test patterns, and the recognition rate with the proposed algorithm is higher than with spectrum and prosodic information. In particular, the simulation shows that a stable recognition rate can be obtained in noisy conditions.

Effects of Forest Healing Programs Using School Forests on Language Acquisition and Ego-resilience of Multicultural Background Students (학교 숲을 활용한 산림치유프로그램 활동이 다문화배경 학생들의 언어습득 향상과 자아탄력성에 미치는 영향)

  • Jang, Cheoul-Soon;Shin, Chang-Seob;Jang, Byung-Soon;Sharif, Md. Omar
    • Korean Journal of Environment and Ecology / v.33 no.3 / pp.333-340 / 2019
  • As the number of students from multicultural backgrounds grows, interest in their education is also increasing. The purpose of this study is to investigate the effect of forest healing factors on the language ability and ego-resilience of students from multicultural families. We conducted an after-school forest healing program for ten male and ten female middle school students at a multicultural preparatory school located in ○○-dong in Cheongju, Chungnam Province. The experiment consisted of 12 weekly one-hour (60-minute) sessions from April 12, 2018 to June 26, 2018. The forest healing program is an activity that uses the various environmental factors of the forest to increase the immunity of the human body and restore physical and mental health. To determine the difference in ego-resilience before and after the program, we conducted a paired t-test using SPSS 18.0. The results showed that ego-resilience improved significantly in all sub-factors, including positive thinking ability, problem-solving ability, intimacy ability, emotional adjustment ability, and autonomic behavior ability (p<.001). The descriptive statistics for language ability showed improvement in writing errors, pronunciation errors, sentence errors, tense errors, and errors in research and connection. We expect that the results of this study can be used as basic data for improving the ego-resilience and language acquisition ability of middle-entry children and students from multicultural families.
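
The paired t-test mentioned in this abstract compares each student's score before and after the program. As a rough sketch of the statistic itself (the study used SPSS 18.0; the function name and the toy scores below are hypothetical and not data from the paper):

```python
import math


def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for paired samples.

    t = mean(d) / (sd(d) / sqrt(n)), where d = after - before for each
    subject and sd is the sample standard deviation (n - 1 denominator).
    """
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    n = len(before)
    if n < 2:
        raise ValueError("need at least two pairs")
    diffs = [a - b for a, b in zip(after, before)]
    mean_diff = sum(diffs) / n
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    if variance == 0:
        raise ValueError("all differences are identical; t is undefined")
    standard_error = math.sqrt(variance / n)
    return mean_diff / standard_error, n - 1


# Hypothetical ego-resilience scores for four students.
scores_before = [1.0, 2.0, 3.0, 4.0]
scores_after = [2.0, 4.0, 5.0, 8.0]
t_value, dof = paired_t_statistic(scores_before, scores_after)
```

The p-value then follows from the t distribution with `dof` degrees of freedom, which a statistics package would normally supply.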

A study on content strategy for long-term exposure of YouTube's 'Trending' (유튜브 '인기급상승' 장기 노출을 위한 콘텐츠 전략에 관한 연구)

  • Lee, Min-Young;Byun, Guk-Do;Choi, Sang-Hyun
    • Journal of the Korea Convergence Society / v.13 no.4 / pp.359-372 / 2022
  • This study aimed to derive a YouTube content strategy for long-term exposure on Trending by comparing the features of 20 channels with short-term and long-term exposure, using 'YouTube Trending' data from 2021. First, through Pearson's correlation analysis, we found that various factors such as 'the number of letters in the title or tags' are related to long-term exposure, and we set these as indices for comparing features. The results show that it is effective to 1) keep the 'video title' to about 40-45 letters without excessive special characters, 2) keep the 'video length' within 10 minutes, and 3) write a 'video description' of 2-3 sentences and add SNS information or include 3 key tags. It would also be more effective to set key tag pairs such as (먹방, mukbang) and (역대급, 레전드) derived through text mining. Through this, the channel will spread globally, bringing various advantages, and the results can be used as an indicator to evaluate the globality of the channel.
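
Pearson's correlation analysis, as used in this abstract, measures the linear relationship between a feature (for example title length) and exposure duration. A minimal sketch of the coefficient follows; the variable names and the toy numbers are hypothetical and are not taken from the study's data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)


# Hypothetical feature values: title length in letters and days on Trending.
title_lengths = [30, 35, 40, 45, 50]
days_on_trending = [2, 3, 5, 6, 4]
r = pearson_r(title_lengths, days_on_trending)
```

A value of `r` near 1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship.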

Personalized Speech Classification Scheme for the Smart Speaker Accessibility Improvement of the Speech-Impaired people (언어장애인의 스마트스피커 접근성 향상을 위한 개인화된 음성 분류 기법)

  • SeungKwon Lee;U-Jin Choe;Gwangil Jeon
    • Smart Media Journal / v.11 no.11 / pp.17-24 / 2022
  • With the spread of smart speakers based on voice recognition and deep learning technology, not only non-disabled people but also blind or physically handicapped people can easily control home appliances such as lights and TVs by voice through linked home network services. This has greatly improved their quality of life. However, speech-impaired people cannot use the services of a smart speaker because their pronunciation is inaccurate due to articulation or speech disorders. In this paper, we propose a personalized voice classification technique that lets speech-impaired people use some of the functions provided by the smart speaker. The goal is to increase the recognition rate and accuracy for sentences spoken by speech-impaired people, even with a small amount of data and a short learning time, so that the services provided by the smart speaker can actually be used. Data augmentation and the one-cycle learning rate optimization technique were applied while fine-tuning a ResNet18 model. In an experiment in which each of 30 smart speaker commands was recorded 10 times and the model was trained within 3 minutes, the speech classification recognition rate was about 95.2%.
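
The one-cycle learning rate technique mentioned in this abstract raises the learning rate to a peak and then lowers it over a single training run. The sketch below shows one common form of that idea with linear warm-up and linear decay. The abstract does not state which variant or which values were used, so the schedule shape, the parameter names, and the defaults here are all assumptions for illustration.

```python
def one_cycle_lr(step, total_steps, max_lr,
                 pct_start=0.3, div_factor=25.0, final_div_factor=1e4):
    """Learning rate at a given step of a linear one-cycle schedule.

    The rate starts at max_lr / div_factor, rises linearly to max_lr over
    the first pct_start fraction of training, then falls linearly to
    (max_lr / div_factor) / final_div_factor at the last step.
    All default values are illustrative assumptions.
    """
    if total_steps < 2:
        raise ValueError("total_steps must be at least 2")
    if not 0 <= step < total_steps:
        raise ValueError("step must be in the range 0..total_steps-1")
    initial_lr = max_lr / div_factor
    final_lr = initial_lr / final_div_factor
    last_step = total_steps - 1
    peak_step = pct_start * last_step
    if step <= peak_step:
        # Warm-up phase: interpolate from initial_lr up to max_lr.
        fraction = step / peak_step if peak_step > 0 else 1.0
        return initial_lr + fraction * (max_lr - initial_lr)
    # Decay phase: interpolate from max_lr down to final_lr.
    fraction = (step - peak_step) / (last_step - peak_step)
    return max_lr + fraction * (final_lr - max_lr)


# Learning rates for a hypothetical 11-step run with a peak of 1.0.
schedule = [one_cycle_lr(s, total_steps=11, max_lr=1.0) for s in range(11)]
```

With these settings the peak falls on step 3, which is 30% of the way through the run. A short schedule like this is one reason the technique suits settings with little data and limited training time.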

A Study on Speech Recognition Using the HM-Net Topology Design Algorithm Based on Decision Tree State-clustering (결정트리 상태 클러스터링에 의한 HM-Net 구조결정 알고리즘을 이용한 음성인식에 관한 연구)

  • 정현열;정호열;오세진;황철준;김범국
    • The Journal of the Acoustical Society of Korea / v.21 no.2 / pp.199-210 / 2002
  • In this paper, we study speech recognition using the HM-Net topology design algorithm based on decision tree state-clustering to improve the performance of acoustic models. Korean has many allophonic and grammatical rules compared to other languages, so we investigate the allophonic variations defined in Korean phonetics and construct a phoneme question set for the phonetic decision tree. The basic idea of the HM-Net topology design algorithm is that it keeps the basic structure of the SSS (Successive State Splitting) algorithm and splits again the states of pre-constructed context-dependent acoustic models. That is, it generates a phonetic decision tree using the phoneme question set for each state of the models, and iteratively trains the state sequences of the context-dependent acoustic models using the PDT-SSS (Phonetic Decision Tree-based SSS) algorithm. To verify the effectiveness of the algorithm, we carried out speech recognition experiments on 452 words from the Center for Korean Language Engineering (KLE452) and 200 sentences from an air flight reservation task (YNU200). The experimental results show that recognition accuracy improved progressively as the number of states changed after state splitting, in the phoneme, word, and continuous speech recognition experiments alike. We obtained average phoneme and word recognition accuracies of 71.5% and 99.2%, respectively, when the number of states was 2,000, and an average continuous speech recognition accuracy of 91.6% when the number of states was 800. We also carried out word recognition experiments using HTK (HMM Toolkit), which performs state tying, for comparison with the parameter sharing of the HM-Net topology design algorithm. In the word recognition experiments, the HM-Net topology design algorithm achieved an average of 4.0% higher recognition accuracy than the context-dependent acoustic models generated by HTK, which indicates its effectiveness.

The Effect of Color Filter on the Reading Ability in Teenager with Irlen-Syndrome (얼렌증후군에서 컬러필터가 읽기능력에 미치는 영향)

  • Lee, Dong-Joon;Leem, Hyun-Sung
    • Journal of Korean Ophthalmic Optics Society / v.18 no.2 / pp.125-136 / 2013
  • Purpose: The aim of this study was to investigate whether a color filter improves reading speed in teenagers with reading difficulty who were identified as having Meares-Irlen syndrome through a screening survey using the Meares-Irlen syndrome visual stress (MISViS) score. Methods: MISViS subjects were selected from those whose screening survey results were above the clinical criterion score of 2.13 (MISViS score). Reading speed was measured quickly and efficiently with a rate-of-reading test in which randomly ordered common words are read aloud for one minute. Each subject wore the filter of the lowest concentration in each of 15 color filter groups. Results: The MISViS scores of the MISViS group and the control group were 2.57 and 0.66, respectively. Reading speeds with filter and without filter in the MISViS group were 102.27 ± 27.86 wpm and 118.87 ± 26.99 wpm (p=0.001), respectively, and 132.93 ± 6.88 wpm and 133.43 ± 6.64 wpm (p=0.131) in the normal group. As for changes in errors with filter and without filter in the two groups, in the MISViS group skipping went from 0.25 ± 0.62 times to 0 times (p=0.191), errors went from 1.83 ± 1.69 times to 0.17 ± 0.38 times (p=0.004), and repetitions were 0. In the control group, skipping was 0 times, errors went from 0.21 ± 0.43 times to 0.07 ± 0.27 times (p=0.336), and repetitions went from 0.14 ± 0.36 times to 0 (p=0.165). Filters of the blue series were chosen most often in the MISViS group (40%), whereas subjects in the normal group were more likely to prefer gray filters (29%). Conclusions: This study showed that the MISViS score can serve as a meaningful screening measure for Irlen syndrome. It also found that wearing a suitable color filter was useful for improving reading-related learning in MISViS patients. The color filter for each MISViS subject must be selected carefully, since the suitable filter differs from person to person.