• Title/Summary/Keyword: data similarity

Automatic Classification and Vocabulary Analysis of Political Bias in News Articles by Using Subword Tokenization (부분 단어 토큰화 기법을 이용한 뉴스 기사 정치적 편향성 자동 분류 및 어휘 분석)

  • Cho, Dan Bi;Lee, Hyun Young;Jung, Won Sup;Kang, Seung Shik
    • KIPS Transactions on Software and Data Engineering / v.10 no.1 / pp.1-8 / 2021
  • News articles on politics tend to be polarized toward conservative or liberal positions, a characteristic known as political bias. We constructed a keyword-based dataset to classify the bias of news articles. Most embedding research represents a sentence as a sequence of morphemes. In our work, we expect the number of unknown tokens to decrease when sentences are composed of subwords segmented by a language model. We propose a document embedding model with subword tokenization and apply it to an SVM and a feedforward neural network to classify political bias. Compared with the document embedding model based on morphological analysis, the document embedding model with subwords showed the highest accuracy, 78.22%. We also confirmed that subword tokenization reduced the number of unknown tokens. Using the best-performing embedding model from the bias classification task, we extracted keywords associated with politicians. The bias of each keyword was verified by its average similarity to the vectors of politicians from each political tendency.
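
The keyword verification step described above can be sketched as follows. This is only an illustration: the abstract does not name the similarity measure, so cosine similarity is assumed here, and the function names and 3-dimensional vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_similarity(keyword_vec, politician_vecs):
    """Average similarity of one keyword to every politician vector in a camp."""
    return sum(cosine(keyword_vec, p) for p in politician_vecs) / len(politician_vecs)

# Hypothetical embeddings; real ones would come from the trained embedding model.
conservative = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0]]
liberal = [[0.0, 1.0, 0.0], [0.0, 0.8, 0.6]]
keyword = [1.0, 0.0, 0.0]

scores = {
    "conservative": mean_similarity(keyword, conservative),
    "liberal": mean_similarity(keyword, liberal),
}
# The keyword is assigned to the camp with the higher average similarity.
leaning = max(scores, key=scores.get)
```

With these toy vectors the keyword leans toward the first camp, since it is orthogonal to both vectors of the second.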

A study on the digitalization of 3D Pen (3D펜의 디지털화에 대한 연구)

  • Kim, Jong-Young;Jeon, Byung-Hoon
    • Journal of the Korea Academia-Industrial cooperation Society / v.22 no.6 / pp.583-590 / 2021
  • This paper studies the digitization of an analog 3D pen. The term digital implies features such as homeostasis, transformability, combinability, reproducibility, and convenience of storage. One device that combines these digital characteristics is the 3D printer, but its industrial use is limited by low productivity and by constraints on materials and physical characteristics. In particular, 3D printers need to become more accessible to users, because they currently demand expertise and skill in modeling software and printer operation. The 3D pen complements the 3D printer: it is highly portable and easy to use, but its output cannot be digitized. Therefore, to secure both digitization capability and ease of use, and to ensure the safety of printing materials whose hazards during printing are disputed, research problems and alternatives were derived by combining the pen with food, and digitization was demonstrated with a newly developed 3D pen. To digitize the 3D pen, a sensor in a structured device detects the motion of the analog 3D pen, and this motion is converted into 3D data (X-Y-Z coordinate values) through a spatial analysis algorithm. To validate this method, the similarity was confirmed by visualization using MeshLab version 1.3.4. This food pen is expected to be usable in youth education and senior healthcare programs in the future.

Isolation and identification of tick-borne pathogens in hard ticks collected in Daejeon (대전 주택가 산책로 진드기의 인수공통전염병 병원체 감염실태 조사)

  • Han, So-young;Sung, Sun-hye;Seo, Jin-woo;Kim, Jong-ho;Lee, Seok-ju;Yoo, Sang-sik
    • Korean Journal of Veterinary Service / v.44 no.2 / pp.93-102 / 2021
  • In this study, a total of 9,449 hard ticks were collected once a month from April to October 2020 from a neighborhood park in Daejeon by the flagging & dragging method and the CO2 manned trap method. The collected ticks were classified under a stereoscopic microscope according to the Yamaguti identification key and tested by molecular biological analysis for four pathogens (SFTSV, Anaplasma spp., Ehrlichia spp., Borrelia spp.). Haemaphysalis longicornis was the most frequently collected species in all areas of the five boroughs, at a rate of 82 to 96 percent. Adults were collected most often from May to July, nymphs from April to June, and larvae from August to October, at a rate of 78 to 98 percent. Among the pathogens, three cases of SFTSV were detected, showing a minimum infection rate (MIR) of 0.46%, while Anaplasma spp. and Ehrlichia spp. were detected in one case each, with an MIR of 0.15%, and Borrelia spp. was detected with an MIR of 0.46%. The detected SFTSV showed 99.9% homology with KF781490 detected in Cheongwon-gun, Chungbuk Province, Anaplasma spp. showed 99.0% homology with JN990105 detected in China, and Ehrlichia spp. showed 98.9% genetic similarity with U96436 isolated in the U.S. This study analyzes the distribution and pathogen infection rate of hard ticks in the Daejeon area and provides basic data for the prevention of hard tick-borne infectious diseases.
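
For readers unfamiliar with the minimum infection rate (MIR), the sketch below shows the usual calculation: positive pools divided by the number of ticks tested, on the assumption that each positive pool contains exactly one infected tick. The function name and the counts are hypothetical; they are not the study's pool data, which the abstract does not give.

```python
def minimum_infection_rate(positive_pools, ticks_tested, per=100):
    """MIR = positive pools / ticks tested, scaled to `per` (100 gives a percentage).

    Assumes exactly one infected tick in every positive pool, which is why
    the result is a lower bound on the true infection rate.
    """
    if ticks_tested <= 0:
        raise ValueError("ticks_tested must be positive")
    return positive_pools / ticks_tested * per

# Hypothetical example: 3 positive pools among 600 tested ticks.
example_mir = minimum_infection_rate(3, 600)
```

Here `example_mir` comes out at 0.5 percent for the made-up counts.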

COI-Based Genetic Structure of an Exotic Snapping Turtle Chelydra serpentina Imported to South Korea

  • Baek, Su Youn;Shin, ChoRong;Kim, Kyung Min;Choi, Eun-Hwa;Hwang, Jihye;Jun, Jumin;Park, Taeseo;Kil, Hyun Jong;Suk, Ho Young;Min, Mi-Sook;Park, Yoonseong;Lee, YoungSup;Hwang, Ui Wook
    • Animal Systematics, Evolution and Diversity / v.36 no.4 / pp.354-362 / 2020
  • The common snapping turtle Chelydra serpentina, native to North America, is internationally protected as an endangered species. Common snapping turtles are known to have been imported to South Korea as pets, and some abandoned individuals now live in the wild in South Korea's natural ecosystems. No genetic survey has yet been performed on the common snapping turtles imported to South Korea. Here, a 594 bp fragment of cytochrome c oxidase subunit I (COI) was sequenced for a total of 16 C. serpentina individuals: one found in nature, twelve legally imported individuals and their descendants, and three provided by the Kansas Herpetological Society, USA. The obtained data were combined with thirteen COI sequences of C. serpentina retrieved from NCBI GenBank for the subsequent population genetic analyses. The results showed five haplotypes with high sequence similarity (only three parsimony-informative sites). In the TCS and phylogenetic analyses, all the examined C. serpentina samples consistently formed a strongly supported monophyletic clade with those collected mostly from the state of Kansas, USA, indicating that the individuals imported to South Korea originated from central North America. In addition, amino acid changes and a high degree of nucleotide sequence difference were found between C. serpentina and C. rossignonii, together with some important morphological characters. The present results are expected to provide an important framework for the systematic management and control of exotic snapping turtles imported to and released into the wild in South Korea.
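
The count of parsimony-informative sites mentioned above can be illustrated with a small sketch. A site is parsimony-informative when at least two different character states each occur in at least two sequences. The function name and the toy alignment are hypothetical, and this simplified version treats gaps and ambiguity codes as ordinary characters.

```python
from collections import Counter

def parsimony_informative_sites(sequences):
    """Return 0-based alignment positions that are parsimony-informative."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    informative = []
    for pos, column in enumerate(zip(*sequences)):
        counts = Counter(column)
        # Keep only character states shared by two or more sequences.
        shared_states = [state for state, n in counts.items() if n >= 2]
        if len(shared_states) >= 2:
            informative.append(pos)
    return informative

# Hypothetical 5 bp alignment of four sequences (not the study's COI data).
aligned = ["ACGTA", "ACGTA", "ATGTC", "ATGCA"]
sites = parsimony_informative_sites(aligned)
```

In the toy alignment only the second column qualifies: the variable columns 4 and 5 each carry a state seen in a single sequence, so they are variable but not informative.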

Comparison of the Effects of Continuous Erosion Control Dams on Benthic Macroinvertebrate Communities Before and After the Rainy Season (연속적인 사방댐이 장마 전·후 저서성 대형무척추동물 군집에 미치는 영향 비교)

  • An, Chae Hui;Han, Jung Soo;Hyun, Jae Bin;Choi, Jun Kil;Lee, Hwang Goo
    • Ecology and Resilient Infrastructure / v.8 no.1 / pp.54-63 / 2021
  • This study aimed to investigate changes in benthic macroinvertebrate communities caused by erosion control dams, using data obtained from three erosion control dams in Wonju, Gangwon Province, before and after the rainy season. Surveys were conducted four times from March to September 2019, and survey points were selected along a reach where closed-type and open-type dams are installed in succession. A total of eight points from the upstream and downstream regions of each dam type were selected. The flow velocity at both the closed and open types increased, but the closed type exhibited a relatively higher flow velocity than the open type. The numbers of benthic macroinvertebrate species and individuals mostly decreased after the rainy season. A relatively large number of species and individuals were found upstream of the closed-type dam. An analysis of the Ephemeroptera-Plecoptera-Trichoptera (EPT) groups showed relatively reduced Ephemeroptera at the closed-type dam and reduced Trichoptera at the open-type dam. The results of a similarity analysis separated the periods before and after the rainy season. The open type showed relatively minimal changes before and after the rainy season.

Interpretation of Firing Temperature and Material Similarity for Potteries from Ancient Tombs in Songpa Area, Seoul (서울 송파 지역 고분 출토 토기의 재료학적 동질성 및 소성온도 해석)

  • Lee, Gyu Hye;Yun, Jung Hyun;Lee, Chan Hee
    • Conservation Science in Museum / v.28 / pp.17-34 / 2022
  • This study seeks to identify the material characteristics of earthenware excavated from the Bangi-dong Ancient Tomb No. 3 and the articulated stone-mound tomb of the Seokchon-dong ancient tombs in the Songpa region, and to analyze the homogeneity and the firing temperature of the materials used at each excavated site. The remains have been studied relatively recently, and the groups of tombs in which they were found demonstrate the transition of ancient Korean burial systems and, at the same time, provide important archaeological data about those in power at the time. The earthenware excavated from the two sites examined in the study was buried at different times, and it is assumed to have been made from weathered soil of similar gneiss, judging from the behavior of the compatible and incompatible elements and the weathering tendency found by examining the main components. In addition, the examination of the mineral composition and microstructure of the clay indicates that the earthenware from Seokchon-dong was fired at 950 degrees Celsius or lower at a relatively early stage. On the other hand, the earthenware from Bangi-dong Tomb No. 3 was confirmed to have experienced temperatures below 850 degrees Celsius and above 1,000 degrees Celsius. However, it is difficult to interpret the difference as the result of changes in firing temperature across eras. Comparing more diverse earthenware and ancient soils is expected to make it possible to interpret the changes in earthenware manufacturing techniques.

Artificial Intelligence for Assistance of Facial Expression Practice Using Emotion Classification (감정 분류를 이용한 표정 연습 보조 인공지능)

  • Dong-Kyu, Kim;So Hwa, Lee;Jae Hwan, Bong
    • The Journal of the Korea institute of electronic communication sciences / v.17 no.6 / pp.1137-1144 / 2022
  • In this study, an artificial intelligence (AI) was developed to assist users in practicing facial expressions for conveying emotions. The developed AI fed multimodal inputs consisting of sentences and facial images to deep neural networks (DNNs). The DNNs calculated the similarity between the emotion predicted from a sentence and the emotion predicted from a facial image. The user practiced facial expressions based on the situation described by a sentence, and the AI provided numerical feedback based on the similarity between the emotion predicted from the sentence and the emotion predicted from the facial expression. A ResNet34 network was trained on the public FER2013 dataset to predict emotions from facial images. To predict emotions from sentences, a KoBERT model was trained by transfer learning on the conversational speech dataset for emotion classification released publicly by AIHub. The DNN that predicts emotions from facial images demonstrated 65% accuracy, which is comparable to human emotion classification ability. The DNN that predicts emotions from sentences achieved 90% accuracy. The performance of the developed AI was evaluated through experiments in which an ordinary person participated and changed facial expressions.

Development of Block-based Code Generation and Recommendation Model Using Natural Language Processing Model (자연어 처리 모델을 활용한 블록 코드 생성 및 추천 모델 개발)

  • Jeon, In-seong;Song, Ki-Sang
    • Journal of The Korean Association of Information Education / v.26 no.3 / pp.197-207 / 2022
  • In this paper, we develop a machine learning-based block code generation and recommendation model to reduce the cognitive load of learners during coding education. Using a natural language processing model and fine-tuning, the model learns the blocks a learner has assembled in the block programming environment and then generates and recommends selectable blocks for the next step. To develop the model, the training dataset was produced by pre-processing 50 block codes from the popular block programming language website 'Entry'. After dividing the pre-processed blocks into training, validation, and test datasets, we developed models that generate block codes based on LSTM, Seq2Seq, and GPT-2. In the performance evaluation of the developed models, GPT-2 outperformed the LSTM and Seq2Seq models on the BLEU and ROUGE scores, which measure sentence similarity. The results generated by the GPT-2 model show that the BLEU and ROUGE scores were relatively similar except when the number of blocks was 1 or 17.
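
To give a feel for what BLEU and ROUGE measure, the sketch below computes only their simplest unigram ingredients: clipped unigram precision (the BLEU-1 core, without the brevity penalty) and unigram recall (ROUGE-1 recall). It is a simplified illustration, not the full metrics used in the paper, and the block names are hypothetical rather than actual Entry blocks.

```python
from collections import Counter

def unigram_overlap(candidate, reference):
    """Return (clipped unigram precision, unigram recall) for two token lists."""
    cand, ref = Counter(candidate), Counter(reference)
    # Counter intersection clips each token's count to the smaller of the two.
    overlap = sum((cand & ref).values())
    precision = overlap / len(candidate) if candidate else 0.0
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall

# Hypothetical block sequences: the reference program and a generated one.
reference = ["when_clicked", "move", "turn", "say"]
candidate = ["when_clicked", "move", "move", "say"]
precision, recall = unigram_overlap(candidate, reference)
```

The repeated `move` block is counted only once because of clipping, so three of the four generated blocks match and both values are 0.75.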

Classification of Diabetic Retinopathy using Mask R-CNN and Random Forest Method

  • Jung, Younghoon;Kim, Daewon
    • Journal of the Korea Society of Computer and Information / v.27 no.12 / pp.29-40 / 2022
  • In this paper, we studied a system that detects and analyzes the pathological features of diabetic retinopathy using Mask R-CNN and a Random Forest classifier. Mask R-CNN is a deep learning technique, and the combined system diagnoses diabetic retinopathy automatically. Diabetic retinopathy can be diagnosed from fundus images taken with special equipment, but brightness, color tone, and contrast may vary depending on the device. Research and development of an automatic diagnosis system using artificial intelligence can help ophthalmologists make medical judgments. This system detects pathological features such as microvascular perfusion and retinal hemorrhage using the Mask R-CNN technique. It also classifies the eye as normal or abnormal using a Random Forest classifier after pre-processing. To improve the detection performance of the Mask R-CNN algorithm, image augmentation was performed before training. Dice similarity coefficients and mean accuracy were used as evaluation indicators to measure detection accuracy. The Faster R-CNN method was used as a control group, and the Mask R-CNN method in this study achieved an average of 90% on the Dice coefficient and 91% on mean accuracy. When diabetic retinopathy was diagnosed by training a Random Forest classifier on the detected pathological symptoms, the accuracy was 99%.
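
The Dice similarity coefficient used as an evaluation indicator above is 2|A∩B| / (|A| + |B|) for a predicted region A and a ground-truth region B. The sketch below computes it on flattened binary masks; the function name and the six-pixel masks are hypothetical, and returning 1.0 when both masks are empty is a convention chosen here, not something stated in the abstract.

```python
def dice_coefficient(pred, truth):
    """Dice similarity between two equal-length binary masks (flattened pixels)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * intersection / total

# Hypothetical flattened masks: 1 marks a pixel labeled as a lesion.
pred_mask = [1, 1, 0, 0, 1, 0]
true_mask = [1, 0, 0, 0, 1, 1]
dice = dice_coefficient(pred_mask, true_mask)
```

Both masks mark three pixels and share two of them, so the coefficient is 4/6, roughly 0.67.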

Analysis of ICT Education Trends using Keyword Occurrence Frequency Analysis and CONCOR Technique (키워드 출현 빈도 분석과 CONCOR 기법을 이용한 ICT 교육 동향 분석)

  • Youngseok Lee
    • Journal of Industrial Convergence / v.21 no.1 / pp.187-192 / 2023
  • In this study, trends in ICT education were investigated by analyzing the occurrence frequency of keywords related to machine learning and by using the convergence of iterated correlations (CONCOR) technique. A total of 304 papers published from 2018 to the present on registered sites were retrieved from Google Scholar using "ICT education" as the keyword, and 60 papers pertaining to ICT education were selected based on a systematic literature review. Subsequently, keywords were extracted from the title and summary of each paper. For word frequency and indicator data, 49 keywords with high occurrence frequency were extracted by analyzing frequency with the term frequency-inverse document frequency (TF-IDF) technique from natural language processing, along with the co-occurrence frequency of words. The degree of relationship was verified by analyzing the connection structure and degree centrality between words, and clusters of similar words were derived via CONCOR analysis. First, "education," "research," "result," "utilization," and "analysis" were identified as the main keywords. Second, an N-gram network graph with "education" as the keyword showed that "curriculum" and "utilization" had the highest correlation. Third, a cluster analysis with "education" as the keyword formed five groups: "curriculum," "programming," "student," "improvement," and "information." These results indicate that analyzing and identifying trends in ICT education can inform the practical research that ICT education needs.
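
The TF-IDF weighting named above can be sketched in its plain textbook form: term frequency within a document multiplied by the logarithm of (number of documents / number of documents containing the term). The abstract does not say which TF-IDF variant or tool was used, so the unsmoothed formula, the function name, and the three toy keyword lists are all assumptions.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document (documents must be non-empty)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        term_counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        })
    return weights

# Hypothetical keyword lists standing in for paper titles and summaries.
docs = [
    ["ict", "education", "curriculum"],
    ["ict", "education", "programming"],
    ["ict", "student", "programming"],
]
weights = tf_idf(docs)
```

A term that appears in every document, such as "ict" here, gets weight zero, while a term unique to one document, such as "curriculum", gets the highest weight in that document.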