• Title/Summary/Keyword: Text Matching

Self Introduction Essay Classification Using Doc2Vec for Efficient Job Matching (Doc2Vec 모형에 기반한 자기소개서 분류 모형 구축 및 실험)

  • Kim, Young Soo;Moon, Hyun Sil;Kim, Jae Kyeong
    • Journal of Information Technology Services / v.19 no.1 / pp.103-112 / 2020
  • Job seekers make various efforts to find a good company, and companies attempt to recruit good people. Job searching through self-introduction essays is nowadays one of the most active processes. Companies spend time and money reviewing all of the numerous self-introduction essays of job seekers. Job seekers also worry about whether companies will accept their self-introduction essays. This research builds a classification model and conducts experiments to classify self-introduction essays as pass or fail using deep learning and decision tree techniques. Stratified sampling was applied to the real-world data to alleviate the imbalance between passed and failed self-introduction essays. Documents were embedded using the Doc2Vec method, developed from the existing Word2Vec, and classified using logistic regression. The decision tree model was chosen as a benchmark, and K-fold cross-validation was conducted for performance evaluation. Across several experiments, the area under the curve (AUC) of PV-DM is better than that of the other Doc2Vec models, i.e., PV-DBOW and Concatenate. Furthermore, PV-DM classifies passed essays as well as failed essays, while PV-DBOW cannot classify passed essays even though it classifies failed essays well. In addition, the classification performance of the logistic regression model embedded using PV-DM is better than that of the decision tree-based classification model. The experimental results imply that companies can reduce the cost of recruiting good job seekers. In addition, the suggested model can help job candidates pre-evaluate their self-introduction essays.

A Spelling Error Correction Model in Korean Using a Correction Dictionary and a Newspaper Corpus (교정사전과 신문기사 말뭉치를 이용한 한국어 철자 오류 교정 모델)

  • Lee, Se-Hee;Kim, Hark-Soo
    • The KIPS Transactions:PartB / v.16B no.5 / pp.427-434 / 2009
  • With the rapid evolution of the Internet and mobile environments, text containing spelling errors such as newly coined words and abbreviated words is widely used. These spelling errors make it difficult to develop NLP (natural language processing) applications because they decrease the readability of texts. To resolve this problem, we propose a spelling error correction model using a spelling error correction dictionary and a newspaper corpus. The proposed model has the advantage that the cost of data construction is not high, because it uses a newspaper corpus, which is easy to obtain, as a training corpus. In addition, the proposed model does not require additional external modules such as a morphological analyzer or a word-spacing error correction system, because it uses a simple string matching method based on a correction dictionary. In experiments with a newspaper corpus and a short message corpus collected from real mobile phones, the proposed model showed good performance across the evaluation measures (a miscorrection rate of 7.3%, an F1-measure of 97.3%, and a false positive rate of 1.1%).
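
The dictionary-based string matching mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the abstract does not say how the newspaper corpus is used, so that part is omitted, and the function name `correct_spelling`, the greedy longest-match strategy, and the English dictionary entries are all assumptions made for illustration.

```python
def correct_spelling(text, correction_dict):
    """Replace known misspellings by greedy longest-match string matching.

    Hypothetical helper: no morphological analysis or word spacing is used,
    only direct substring lookup against the correction dictionary.
    """
    # Try longer dictionary keys first so a long entry wins over its prefix.
    keys = sorted((k for k in correction_dict if k), key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(correction_dict[key])
                i += len(key)
                break
        else:
            # No dictionary entry starts here; copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


# Illustrative entries only; a real correction dictionary would be curated.
corrections = {"thx": "thanks", "gr8": "great", "2moro": "tomorrow"}
print(correct_spelling("thx, see you 2moro", corrections))
# prints "thanks, see you tomorrow"
```

Plain substring matching like this will also rewrite a key that happens to occur inside a correct word, which is one plausible reason a model of this kind would report a false positive rate.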

Program Plagiarism Detection based on X-treeDiff+ (X-treeDiff+ 기반의 프로그램 복제 탐지)

  • Lee, Suk-Kyoon
    • Journal of the Institute of Electronics Engineers of Korea CI / v.47 no.4 / pp.44-53 / 2010
  • Program plagiarism is a significant factor in reducing the quality of education in computer programming. In this paper, we propose a technique for identifying similar or identical programs in order to prevent students from recklessly copying their programming assignments. Existing approaches for identifying similar programs are mainly based on fingerprints or pattern matching for text documents. Unlike those approaches, we propose an approach based on program structure. Using parsing programs, we first transform programs into XML documents by representing syntactic components in the programs as elements in the XML document, then run X-tree Diff+, a change detection algorithm for XML documents, and produce an edit script as the change. Whether programs are similar or identical is decided by analyzing the edit scripts in terms of program plagiarism. Analysis of the edit scripts allows users to understand the process of conversion between two programs, so that they can make a qualitative judgement considering the characteristics of the programming assignment and the degree of plagiarism.

Semantic Video Retrieval Based On User Preference (사용자 선호도를 고려한 의미기반 비디오 검색)

  • Jung, Min-Young;Park, Sung-Han
    • Journal of the Institute of Electronics Engineers of Korea CI / v.46 no.4 / pp.127-133 / 2009
  • To ensure access to rapidly growing video collections, video indexing is becoming more and more essential. A video database should be built for fast searching and for extracting accurate features of video information, which has more complex characteristics. Moreover, a video indexing structure supports efficient retrieval of interesting content that reflects user preferences. In this paper, we propose a semantic video retrieval method based on user preference. Previous methods do not consider user preferences. Furthermore, conventional methods return results by simple text matching against the user's query, which does not support semantic search. To overcome these limitations, we develop a method for user preference analysis and present a method of video ontology construction for semantic retrieval. The simulation results show that the proposed algorithm performs better than previous methods in terms of semantic video retrieval based on user preferences.

A Study On Intelligent Robot Control Based On Voice Recognition For Smart FA (스마트 FA를 위한 음성인식 지능로봇제어에 관한 연구)

  • Sim, H.S.;Kim, M.S.;Choi, M.H.;Bae, H.Y.;Kim, H.J.;Kim, D.B.;Han, S.H.
    • Journal of the Korean Society of Industry Convergence / v.21 no.2 / pp.87-93 / 2018
  • This study proposes a new approach to implementing intelligent robot control based on voice recognition for smart factory automation. Since humans usually communicate with each other by voice, it is very convenient if voice is used to command humanoid robots or other types of robot systems. Much research has been performed on voice recognition systems for this purpose. The Hidden Markov Model is a robust statistical methodology for efficient voice recognition in noisy environments. It has been tested in a wide range of applications. Prediction by Partial Matching, a prediction approach traditionally applied to text compression and coding, is a finite-context statistical modeling technique that can predict the next characters based on the context; it has shown great potential for developing novel solutions to several language modeling problems in speech recognition. The reliability of voice recognition was illustrated by experiments on a humanoid robot with 26 joints, with application to the manufacturing process as the goal.
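
The finite-context idea behind Prediction by Partial Matching can be sketched as follows. This is a deliberately simplified illustration, not the method used in the paper: it only counts which character follows each context and backs off to shorter contexts when the longest one is unseen, leaving out the escape probabilities and arithmetic coding of full PPM. The function names and the `max_order` parameter are assumptions.

```python
from collections import Counter, defaultdict


def train_context_model(text, max_order=2):
    """Count which character follows each context of length 0..max_order."""
    counts = defaultdict(Counter)
    for i, ch in enumerate(text):
        for order in range(max_order + 1):
            if i - order >= 0:
                # The context is the `order` characters immediately before ch.
                counts[text[i - order:i]][ch] += 1
    return counts


def predict_next(counts, history, max_order=2):
    """Predict the next character from the longest context seen in training."""
    for order in range(min(max_order, len(history)), -1, -1):
        context = history[len(history) - order:]
        if context in counts:
            # Most frequent follower of this context.
            return counts[context].most_common(1)[0][0]
    return None  # the model was trained on empty text


model = train_context_model("abracadabra", max_order=2)
print(predict_next(model, "ab"))  # prints "r": "ab" is always followed by "r"
```

When the history ends in a context that never occurred in training, the loop falls back to shorter contexts and finally to overall character frequency, which is the "partial matching" behaviour in its simplest form.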

A Study on the Analysis of Intellectual Structure of Korean Veterinary Sciences (국내 수의과학 분야의 지적 구조 분석에 관한 연구)

  • Cho, Hyun-Yang
    • Journal of Information Management / v.43 no.2 / pp.43-66 / 2012
  • The purpose of this study is to examine the intellectual structure of the field of veterinary sciences in Korea using author profiling analysis (APA), a bibliometric approach. Three journals were selected on the basis of citation data, namely those exchanging the most citations with Korean Journal of Veterinary. Then the 50 authors who published the most articles in the selected journals during the given period were chosen. Similarity and dissimilarity among authors were analyzed by comparing co-word appearance patterns in article titles, abstracts, and keywords. The authors can be grouped into 11 minor clusters under 4 major clusters, depending on their interests within veterinary sciences in Korea. The subject of each cluster is determined by matching the keywords that represent the authors' research interests. As a result, it is possible to identify current research trends and the researcher network in the field of veterinary sciences.

Emotion-based Real-time Facial Expression Matching Dialogue System for Virtual Human (감정에 기반한 가상인간의 대화 및 표정 실시간 생성 시스템 구현)

  • Kim, Kirak;Yeon, Heeyeon;Eun, Taeyoung;Jung, Moonryul
    • Journal of the Korea Computer Graphics Society / v.28 no.3 / pp.23-29 / 2022
  • Virtual humans are implemented with dedicated modeling tools like Unity 3D Engine in virtual space (virtual reality, mixed reality, metaverse, etc.). Various human modeling tools have been introduced to implement virtual human-like appearance, voice, expression, and behavior similar to real people, and virtual humans implemented via these tools can communicate with users to some extent. However, most of the virtual humans so far have stayed unimodal using only text or speech. As AI technologies advance, the outdated machine-centered dialogue system is now changing to a human-centered, natural multi-modal system. By using several pre-trained networks, we implemented an emotion-based multi-modal dialogue system, which generates human-like utterances and displays appropriate facial expressions in real-time.

A Study on Constructing a Digital Archive System of the Modern Korean Christian Collections (근대 한국기독교 자료의 디지털 아카이브 시스템 구축에 관한 연구)

  • Yang, Ji-Ann
    • The Journal of the Korea Contents Association / v.22 no.8 / pp.681-691 / 2022
  • The purpose of this study is to construct a digital archive system by analyzing the collections of the Korean Christian Museum at S University, which holds a large number of materials related to Korean Christianity published in the modern period, from the time of Korea's enlightenment until liberation. To construct the digital archive system, indexes and metadata for the collection are compiled according to a pre-defined format. After the selected collection is digitized, a database is built using the metadata, and the actual system is divided into a web standard-based management system and a user service system. Also, a content-based search system is constructed, which provides the matching value of retrieval results in units of one character and an automatic search term completion function to enhance user convenience. As a result, collections in the museum whose original texts are difficult to access are digitized and provided so that they can be easily used, laying the foundation for the long-term development of humanities content and improving the accessibility and availability of the collections for both researchers and the public.

Effective Picture Search in Lifelog Management Systems using Bluetooth Devices (라이프로그 관리 시스템에서 블루투스 장치를 이용한 효과적인 사진 검색 방법)

  • Chung, Eun-Ho;Lee, Ki-Yong;Kim, Myoung-Ho
    • Journal of KIISE:Computing Practices and Letters / v.16 no.4 / pp.383-391 / 2010
  • A lifelog management system provides users with services to store, manage, and search their lifelogs. This paper proposes a fully automatic method of collecting real-world social contacts and a lifelog search engine that uses the collected social contact information as keywords. Wireless short-distance network devices in mobile phones are used to detect the social contacts of their users. A human-Bluetooth relationship matrix is built based on the frequency with which a person and a Bluetooth device are observed at the same time. Results show that with only 20% of the full social contact information from the observation times used for calculation, 90% of the human-Bluetooth relationships can be correctly acquired. A lifelog search engine that takes human names as keywords is suggested; it uses a vector information retrieval model to compare two vectors, a row of the human-Bluetooth matrix and a vector of the Bluetooth devices scanned while a lifelog was created. This search engine returns more lifelogs than an existing text-matching search engine and, unlike existing search engines, ranks the results.
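
The vector comparison described in this abstract can be sketched as a cosine-similarity ranking. The abstract only says that a vector information retrieval model is used, so the choice of cosine similarity is an assumption, as are the function names, the device ordering, and the sample numbers.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_lifelogs(person_row, lifelog_scans):
    """Rank lifelogs by similarity between a person's matrix row and each scan.

    person_row: one row of the human-Bluetooth matrix (co-observation weights).
    lifelog_scans: dict mapping a lifelog id to a 0/1 vector of the Bluetooth
    devices scanned when that lifelog was created. Both are hypothetical inputs.
    """
    scored = [(cosine(person_row, scan), log_id)
              for log_id, scan in lifelog_scans.items()]
    # Keep only lifelogs sharing at least one device with the person's row.
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [log_id for _, log_id in scored]


# Columns are three hypothetical Bluetooth devices d0, d1, d2.
alice_row = [0.9, 0.1, 0.0]
scans = {"photo1": [1, 0, 0], "photo2": [0, 1, 1], "photo3": [0, 0, 1]}
print(rank_lifelogs(alice_row, scans))  # prints ['photo1', 'photo2']
```

Because the score is graded, a photo taken near a device only weakly associated with the person is still returned, but lower in the list, which matches the abstract's point that the engine both returns more lifelogs and ranks them.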

A Study on the Dizziness of Huangdi's Internal Classic 《黃帝內經》 (《소문.영추(素問.靈樞)》에 나타난 현훈(眩暈)에 대한 연구(硏究))

  • Tark, Myoung-Rim;Kang, Na-Ru;Ko, Woo-Shin;Yoon, Hwa-Jung
    • The Journal of Korean Medicine Ophthalmology and Otolaryngology and Dermatology / v.24 no.1 / pp.142-170 / 2011
  • Objective: The purpose of this study is to investigate dizziness in Plain Questions 《素問》 and Miraculous Pivot 《靈樞》. Methods: We studied the original text paragraphs of the Internal Classic 《內經》 that mention dizziness, together with the analyses of Yang, Ma, Zhang, Wang, etc. We drew parallels between dizziness in the Internal Classic 《內經》 and matching diagnoses from Western medicine. Results: The results were as follows. 1. Dizziness in Ok Ki Jin Jiang Ron <玉機眞藏論> and Pyo Bon Byeong Jeon Ron <標本病傳論> was related to the liver and was similar to dizziness caused by tension, hypertension, anemia, cerebrovascular accident, etc. in Western medicine. 2. Dizziness in Ja Yeol <刺熱>, O Sa <五邪> and Hai Ron <海論> was related to the kidney and was similar to dizziness caused by aging and to peripheral vertigo concurrent with tinnitus and difficulty in hearing in Western medicine. 3. Dizziness in O Sa <五邪> was related to the heart (pericardium) and was similar to dizziness caused by cardiac output loss and to psychogenic dizziness in Western medicine. 4. In the Internal Classic 《內經》, the main etiology of dizziness was infirmity (虛), including insufficiency of Qi (氣) in the upper portion of the body (上氣不足), blood depletion (血枯), deficiency of the marrow reservoir (髓海不足), etc. 5. In Dae Hok Ron <大惑論>, the etiology and pathogenesis of dizziness were mentioned, and the dizziness was similar to dizziness caused by eye disorders, psychogenic dizziness, and central dizziness in Western medicine. 6. In the Internal Classic 《內經》, the meridian whose acupuncture points were used most for dizziness was the Bladder Meridian. Acupuncture points used in the treatment of dizziness were Ch'onju (天柱), Kollyun (崑崙), Taejo, Chok-t'ongkok (足通谷), etc. Conclusion: We identified the etiology, pathogenesis, and treatments of dizziness in the Internal Classic 《內經》. We further compared them with Western medicine to develop a better understanding of dizziness.