Search | Korea Science

Multi-Vector Document Embedding Using Semantic Decomposition of Complex Documents (복합 문서의 의미적 분해를 통한 다중 벡터 문서 임베딩 방법론)

Park, Jongin;Kim, Namgyu
- Journal of Intelligence and Information Systems
- v.25 no.3
- pp.19-41
- 2019
As demand for text data analysis grows rapidly, research and investment in text mining are being actively pursued not only in academia but also across many industries. Text mining is generally conducted in two steps. In the first step, the text of each collected document is tokenized and structured, converting the original document into a computer-readable form. In the second step, tasks such as document classification, clustering, and topic modeling are performed according to the purpose of the analysis. Until recently, text mining studies have focused on applications in the second step, such as document classification, clustering, and topic modeling. However, since it was found that the text structuring process substantially influences the quality of analysis results, various embedding methods have been actively studied to improve that quality by preserving the meaning of words and documents when text data is represented as vectors. Unlike structured data, which can be directly subjected to a variety of operations and traditional analysis techniques, unstructured text must first undergo a structuring task that transforms the original document into a form the computer can understand. Mapping arbitrary objects into a space of a specific dimension while preserving algebraic properties, in order to structure text data, is called "embedding." Recently, attempts have been made to embed not only words but also sentences, paragraphs, and entire documents from various perspectives. In particular, as demand for document embedding has increased rapidly, many algorithms have been developed to support it. Among them, doc2Vec, which extends word2Vec and embeds each document into a single vector, is the most widely used. However, traditional document embedding methods, represented by doc2Vec, generate a vector for each document from all of the words contained in that document. 
As a result, the document vector is affected not only by core words but also by miscellaneous words. Additionally, traditional document embedding schemes usually map each document to a single vector, which makes it difficult to represent a complex document covering multiple subjects accurately. In this paper, we propose a new multi-vector document embedding method to overcome these limitations of traditional document embedding. This study targets documents that explicitly separate body content from keywords. For documents without keywords, the method can be applied after keywords are extracted through various analysis methods; however, since keyword extraction is not the core subject of the proposed method, we describe how the method is applied to documents whose keywords are predefined in the text. The proposed method consists of (1) Parsing, (2) Word Embedding, (3) Keyword Vector Extraction, (4) Keyword Clustering, and (5) Multiple-Vector Generation. The process is as follows. First, all text in a document is tokenized, and each token is represented through word embedding as an N-dimensional real-valued vector. Next, to avoid the influence of miscellaneous words that limits traditional document embedding, the vectors corresponding to each document's keywords are extracted to form a keyword vector set for that document. Then, the keyword set of each document is clustered to identify the multiple subjects the document contains. Finally, a multi-vector is generated from the vectors of the keywords constituting each cluster. Experiments on 3,147 academic papers revealed that the traditional single-vector approach cannot properly map complex documents because of interference among subjects within each vector. 
With the proposed multi-vector method, we confirmed that complex documents can be vectorized more accurately by eliminating the interference among subjects.
https://doi.org/10.13088/jiis.2019.25.3.019
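Steps (3) to (5) of the pipeline above can be illustrated with a minimal, dependency-free Python sketch. It assumes word vectors are already available from step (2); the toy 2-dimensional `word_vectors` below are hypothetical, not trained. The clustering step uses plain k-means with a deterministic farthest-point initialisation, which is an assumption since the abstract does not name the clustering algorithm, and each cluster's mean keyword vector is taken as one of the document's vectors. `single_vector` is a hypothetical single-vector baseline (the mean of all keyword vectors), included only to show how two subjects blur together in one vector.

```python
def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(vectors, k):
    """Deterministic farthest-point initialisation (an assumption, not from the paper)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(squared_distance(v, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(vectors, k, iterations=20):
    """Plain k-means; returns the cluster index of each input vector."""
    centroids = init_centroids(vectors, k)
    labels = []
    for _ in range(iterations):
        # Assign every keyword vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c]))
                  for v in vectors]
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels


def multi_vector_embedding(keywords, word_vectors, k):
    """Steps (3)-(5): keyword vector extraction, keyword clustering, multi-vector generation."""
    # (3) Keep only the vectors of the document's keywords; other words are ignored.
    keyword_vectors = [word_vectors[w] for w in keywords if w in word_vectors]
    if not keyword_vectors:
        return []
    k = min(k, len(keyword_vectors))
    # (4) Cluster the keyword vectors so that each cluster approximates one subject.
    labels = kmeans(keyword_vectors, k)
    # (5) One document vector per non-empty cluster: the mean of its keyword vectors.
    return [mean_vector([v for v, label in zip(keyword_vectors, labels) if label == c])
            for c in range(k) if c in labels]


# Hypothetical 2-D word vectors for a paper that mixes two subjects.
word_vectors = {
    "embedding": [1.0, 0.1], "vector": [0.9, 0.0], "clustering": [0.95, 0.2],
    "mining": [0.1, 1.0], "safety": [0.0, 0.9], "ore": [0.2, 0.95],
}
keywords = ["embedding", "vector", "clustering", "mining", "safety", "ore"]

doc_vectors = multi_vector_embedding(keywords, word_vectors, k=2)
single_vector = mean_vector([word_vectors[w] for w in keywords])
print(doc_vectors)    # roughly [[0.95, 0.1], [0.1, 0.95]]: one vector per subject
print(single_vector)  # roughly [0.525, 0.525]: both subjects blurred together
```

The single averaged vector lands between the two subjects, while the per-cluster vectors each stay close to one subject, which is the kind of interference the multi-vector approach is meant to avoid.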

Korean Word Sense Disambiguation using Dictionary and Corpus (사전과 말뭉치를 이용한 한국어 단어 중의성 해소)

Jeong, Hanjo;Park, Byeonghwa
- Journal of Intelligence and Information Systems
- v.21 no.1
- pp.1-13
- 2015
As opinion mining in big data applications has gained attention, a great deal of research on unstructured data has been conducted. Many social media services on the Internet generate unstructured or semi-structured data every second, much of it written in the natural languages we use in daily life. Many words in human languages have multiple meanings or senses, which makes it very difficult for computers to extract useful information from these datasets. Traditional web search engines are usually based on keyword search and can therefore return results far from users' intentions. Although much progress has been made over the years in improving search engines to provide users with appropriate results, there is still much room for improvement. Word sense disambiguation can play a very important role in natural language processing and is considered one of the most difficult problems in the field. Major approaches to word sense disambiguation can be classified as knowledge-based, supervised corpus-based, and unsupervised corpus-based. This paper presents a method that automatically generates a corpus for word sense disambiguation by taking advantage of examples in existing dictionaries, avoiding expensive sense-tagging processes. The effectiveness of the method is evaluated with a Naïve Bayes model, a supervised learning algorithm, using the Korean Standard Unabridged Dictionary and the Sejong Corpus. The Korean Standard Unabridged Dictionary contains approximately 57,000 sentences, and the Sejong Corpus contains about 790,000 sentences tagged with both part-of-speech and senses. In the experiment, the Korean Standard Unabridged Dictionary and the Sejong Corpus were evaluated both combined and separately using cross-validation. Only nouns, the targets of word sense disambiguation, were selected. 
In addition, 93,522 word senses among 265,655 nouns, together with 56,914 sentences from related proverbs and examples, were combined into the corpus. The Sejong Corpus was easily merged with the Korean Standard Unabridged Dictionary because it is tagged with the sense indices defined by that dictionary. Sense vectors were formed after the merged corpus was created. Terms used in creating the sense vectors were added to the named entity dictionary of a Korean morphological analyzer. Using the extended named entity dictionary, term vectors were extracted from the input sentences. Given the extracted term vectors and the sense vector model built during pre-processing, sense tags were assigned to terms by vector space model-based word sense disambiguation. This study also shows the effectiveness of the corpus merged from examples in the Korean Standard Unabridged Dictionary and the Sejong Corpus: the experiment yields better precision and recall with the merged corpus. The study suggests that the method can practically enhance the performance of Internet search engines and help capture the meaning of a sentence more accurately in natural language processing tasks related to search engines, opinion mining, and text mining. The Naïve Bayes classifier used in this study is a supervised learning algorithm based on Bayes' theorem. It assumes that all attributes are conditionally independent given the sense. Although this assumption is unrealistic and ignores correlations between attributes, the Naïve Bayes classifier is widely used because of its simplicity, and in practice it is known to be very effective in many applications such as text classification and medical diagnosis. However, further research is needed to consider all possible combinations and/or partial combinations of all senses in a sentence. 
The effectiveness of word sense disambiguation may also be improved if rhetorical structures or morphological dependencies between words are analyzed through syntactic analysis.
https://doi.org/10.13088/jiis.2015.21.1.01
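The supervised step can be illustrated with a small Naïve Bayes sense classifier in Python. This is a sketch under stated assumptions, not the authors' implementation: input sentences are assumed to be already reduced to content words by a morphological analyzer, the toy training pairs for an ambiguous noun with the senses "ship", "pear", and "belly" (as in the Korean noun 배) are hypothetical, and add-one (Laplace) smoothing is an assumed choice. Dictionary examples and corpus sentences are simply concatenated, mirroring the idea of a merged training corpus.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """examples: list of (context_words, sense) pairs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)  # sense -> counts of context words
    vocabulary = set()
    for context, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(context)
        vocabulary.update(context)
    return sense_counts, word_counts, vocabulary


def disambiguate(context, model):
    """Return the sense maximising log P(sense) + sum of log P(word | sense)."""
    sense_counts, word_counts, vocabulary = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, count in sense_counts.items():
        score = math.log(count / total)
        denominator = sum(word_counts[sense].values()) + len(vocabulary)
        for word in context:
            if word in vocabulary:  # unseen words carry no evidence; skip them
                score += math.log((word_counts[sense][word] + 1) / denominator)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Hypothetical training data: dictionary example sentences plus sense-tagged corpus sentences.
dictionary_examples = [
    (["sea", "sail"], "ship"),
    (["fruit", "sweet"], "pear"),
    (["ache", "full"], "belly"),
]
corpus_examples = [
    (["port", "sail", "crew"], "ship"),
    (["fruit", "peel", "eat"], "pear"),
    (["eat", "full", "ache"], "belly"),
]
model = train_naive_bayes(dictionary_examples + corpus_examples)
print(disambiguate(["crew", "sea", "port"], model))  # ship
print(disambiguate(["eat", "full"], model))          # belly
```

Working in log space avoids underflow when a context contains many words; the independence assumption shows up as the plain sum over context words.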

Implementation of a walking-aid light with machine vision-based pedestrian signal detection (머신비전 기반 보행신호등 검출 기능을 갖는 보행등 구현)

Jihun Koo;Juseong Lee;Hongrae Cho;Ho-Myoung An
- The Journal of Korea Institute of Information, Electronics, and Communication Technology
- v.17 no.1
- pp.31-37
- 2024
In this study, we propose a machine vision-based pedestrian signal detection algorithm that operates efficiently even in environments with constrained computing resources. To minimize the impact of ambient lighting and address issues such as light glare, the algorithm sequentially applies HSV color space-based image processing, binarization, morphological operations, labeling, and other steps. In particular, the algorithm is kept relatively simple so that it runs smoothly in embedded system environments with limited computing resources, and it therefore operates reliably even where computing resources are low. Moreover, the proposed pedestrian signal system not only detects pedestrian signals but also incorporates IoT functionality, allowing wireless integration with a web server through which users can conveniently monitor and control the status of the signal system. The system was also successfully implemented to control 50 W LED pedestrian signals. The proposed system aims to provide rapid and efficient pedestrian signal detection and control in resource-constrained environments, with a view to its potential application on real roads. It is expected to contribute to safer and more intelligent traffic systems.
https://doi.org/10.17661/jkiiect.2024.17.1.31
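The processing chain named above (HSV conversion, binarization, morphological operations, labeling) can be sketched in dependency-free Python on a tiny synthetic frame. Everything specific here is an assumption for illustration, not taken from the paper: the frame is a list of RGB rows, the hue, saturation, and brightness thresholds are hypothetical, the morphological step is a 3×3 opening, labeling is 4-connected BFS, and the `min_area` rule deciding "walk", "stop", or "unknown" is a hypothetical `classify_signal` helper.

```python
import colorsys
from collections import deque


def binarize(image, hue_test, min_sat=0.4, min_val=0.4):
    """HSV thresholding: 1 where the hue matches and the pixel is saturated and bright."""
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            mask_row.append(1 if hue_test(h) and s >= min_sat and v >= min_val else 0)
        mask.append(mask_row)
    return mask


def morph(mask, rule):
    """Apply `rule` (all -> erosion, any -> dilation) over each 3x3 window; outside pixels are 0."""
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [mask[y + dy][x + dx] if 0 <= y + dy < height and 0 <= x + dx < width else 0
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = 1 if rule(window) else 0
    return out


def opening(mask):
    """Erosion then dilation: removes blobs too small to contain a full 3x3 square."""
    return morph(morph(mask, all), any)


def component_sizes(mask):
    """Label 4-connected components with BFS and return their pixel counts."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    sizes = []
    for y in range(height):
        for x in range(width):
            if mask[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue, size = deque([(y, x)]), 0
                while queue:
                    cy, cx = queue.popleft()
                    size += 1
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < height and 0 <= nx < width and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                sizes.append(size)
    return sizes


def classify_signal(image, min_area=9):
    """Hypothetical decision rule: a large enough green blob means walk, red means stop."""
    hue_tests = (
        ("walk", lambda h: 0.25 <= h <= 0.45),       # green
        ("stop", lambda h: h <= 0.05 or h >= 0.95),  # red wraps around hue 0
    )
    for state, hue_test in hue_tests:
        mask = opening(binarize(image, hue_test))
        if any(size >= min_area for size in component_sizes(mask)):
            return state
    return "unknown"


BLACK, GREEN, RED = (0, 0, 0), (0, 200, 0), (220, 0, 0)


def frame_with_block(color, size=8):
    """8x8 black frame with a 4x4 lit block in the middle."""
    frame = [[BLACK] * size for _ in range(size)]
    for y in range(2, 6):
        for x in range(2, 6):
            frame[y][x] = color
    return frame


green_frame = frame_with_block(GREEN)
green_frame[0][7] = GREEN  # isolated noise pixel, removed by the opening
print(classify_signal(green_frame))            # walk
print(classify_signal(frame_with_block(RED)))  # stop
```

Requiring high saturation and brightness is what discards white glare, which has near-zero saturation, and the opening removes isolated noise pixels before labeling, so only solid lamp regions are counted.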

The Meaning of Collective Relationships Becoming by Large-scale Interview Project - Focused on the media exhibition art <70mK> - (대규모 인터뷰 작업이 생성하는 집단적 관계성의 의미 - 미디어전시예술 <70mK>를 중심으로)

OH, Se Hyun
- Trans-
- v.7
- pp.19-48
- 2019
This study examines the meaning of the media exhibition work <70mK>, which aims to capture the topography of the collective consciousness of the Korean people through large-scale interviews. <70mK> edits interview footage of individuals into mosaic-like layouts and forms, producing video works that are shown as exhibitions. The subjects in the split frame display a continuity of differences that reveals their own thoughts and personalities. This is a synchronic and conscious collective typology in which the intrinsic nature of each individual is embodied in a simultaneous and holistic image. The interview images reveal each person's form as an actual being and convey his or her intrinsic nature as oral information. <70mK> constructs a new individuation by aesthetically structuring the forms and information of living individuals as extensions of a specific group. The individuals in the frame do not communicate with each other and look straight ahead; in this way the work conveys to visitors their relationships and personalities as a preindividual reality. The work is a repetitive arrangement and composition of the heterogeneity and difference that each individual shows, and a chain operation with a collective identity behind it. <70mK> assembles the direct images and sounds of individual interviewees into a new form of information transfer called the video art exhibition. In doing so, it renders metaphorically perceptible the meaning and process of transindividual relationships and of psychic and collective individuation. It is therefore an apt case for explaining Gilbert Simondon's thought on modern technology and individuation, together with his notions of becoming and relation in individuation. The exhibition space constructed by <70mK> is an aesthetic methodology that presents the psychic and collective meaning of a particular group of individuals and the relationships through which they are connected. 
Simondon, who studied the process of individuation and the meaning of becoming, was a philosopher who viewed the potential of modern technology positively. <70mK> is a new individual: an ethical reality structured and generated through the mediation of modern technological mechanisms and network behaviors. It is a case of an aesthetic and practical methodology showing how interviews function as 'transduction' in a process of individuation in which technology cooperates. The direct images and sounds of <70mK> form a system in which the information of living individuals is carried, amplified, accumulated, and transmitted. It is also a new individual in the form of a psychic and collective landscape. It is an exhibition artwork that has newly come into being through multiple individuations, and it represents transindividual meanings and processes. This media exhibition art of individuated metastable states leads to new relationships in which viewers perceive the same preindividual reality and experience affectivity. The exhibition space of <70mK> becomes a stage that prepares the actual possibility of a transindividual group beyond the representation of semantic function.

About the Multi-layered Communication of Princess Pari on the Webtoon Platform of Daum -Focusing on Analysis of Narrative Structure and Comments (Daum 웹툰 <바리공주>를 통해 본 고전 기반 웹툰 콘텐츠의 다층적 대화 양상 -서사구조와 댓글 분석을 중심으로)

Choe, Key-Sook
- Journal of Popular Narrative
- v.25 no.3
- pp.303-345
- 2019
This article analyzes the multi-layered communication in the webtoon Princess Pari, written and illustrated by Kim Naim and released on the Daum portal site, by examining its narrative structure and reader comments with qualitative and quantitative methods. The webtoon Princess Pari is structured in an omnibus style in which unit narratives are intermittently articulated, multi-lined, and interconnected. Pari's growth as a shaman and a romance narrative are structured as integrated narratives that link the unit narratives. The original classical shamanic narrative was used, through a preview, as a prehistory corresponding to the webtoon's prequel, and the writer restructured the narrative to overcome the gender asymmetry and patriarchal ideology of the original text. Viewers, in turn, create a conversational space through critical and reflective comments. According to a statistical analysis of sampled comments, the comment types can be classified as follows: appreciation and criticism of the content ≫ emotional response ≫ intuitive overall review ≫ knowledge and reflection ≫ comments on comments. In the creation and reception of the webtoon, a multi-layered dialogue has been at play between the classical and the modern, content and audience, and reception and creation. On the creation side, the writer used devices to fill gaps in the mythical symbols of the content. On the audience side, viewers formed a culture of sharing information, knowledge, and reflection about tradition, folklore, and culture through comments. This constitutes a dialogue between the classical and the modern through the webtoon. Viewers form sympathetic bonds, attempt hermeneutic coordination, supplement information, and seek a balanced perspective through controversial conversation. 
In addition, by commenting on attitudes, views, and perspectives, the commentators showed a behavioral pattern corresponding to meta-criticism in literature. The viewers' comments acted as feedback on the creation of the webtoon, so that the interplay of creation and reception itself influenced the production of its content. The webtoon Princess Pari, based on a Korean classical narrative, has thus been reorganized into 'moving and dynamic' content that elicits sensation, thought, criticism, and reflection through the formation of various dialogues.
https://doi.org/10.18856/jpn.2019.25.3.009

A Queriable XML Compression using Inferred Data Types (추론한 데이타 타입을 이용한 질의 가능 XML 압축)

;;Chung Chin-Wan
- Journal of KIISE:Databases
- v.32 no.4
- pp.441-451
- 2005
HTML is mostly stored in native file systems rather than in specialized repositories such as databases. Like HTML, XML, the standard for exchanging and representing data on the Internet, mostly resides on native file systems. However, since XML data is irregular and verbose, it wastes disk space and network bandwidth compared with regularly structured data. To overcome this inefficiency, research on XML data compression has been conducted. Among recently proposed XML compression techniques, some do not support querying compressed data, while others that do support it blindly encode data values with predefined encoding methods without considering the types of the values, which necessitates partial decompression when processing range queries. As a result, query performance on compressed XML data is degraded. This research therefore proposes an XML compression technique that supports direct and efficient evaluation of queries on compressed XML data. The technique uses an encoding method called dictionary encoding to encode each tag of the XML data, and applies encoding methods suited to the inferred type of each data value. The implementation and performance evaluation of the proposed technique show that the XML compressor efficiently compresses real-life XML data sets and achieves significant improvements in query performance on compressed XML data.
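The core idea, dictionary-encoding tags and choosing a value encoding from an inferred type so that range predicates can be checked without decompressing, can be illustrated with a small Python sketch built on the standard `xml.etree.ElementTree` parser. The concrete encodings are assumptions, not the paper's: values whose text all parses as integers are stored as integers, and all other values go through an order-preserving string dictionary (sorted distinct values), so comparing codes is equivalent to comparing the original strings. `compress` and `range_query` are hypothetical names.

```python
import bisect
import xml.etree.ElementTree as ET


def infer_type(values):
    """Return 'int' if every value parses as an integer, otherwise 'str'."""
    for value in values:
        try:
            int(value)
        except ValueError:
            return "str"
    return "int"


def compress(xml_text):
    """Dictionary-encode tags and encode each tag's values by their inferred type."""
    root = ET.fromstring(xml_text)
    tag_codes = {}   # tag name -> small integer code, in order of first appearance
    raw_values = {}  # tag code -> text values in document order
    for elem in root.iter():
        code = tag_codes.setdefault(elem.tag, len(tag_codes))
        text = (elem.text or "").strip()
        if text:
            raw_values.setdefault(code, []).append(text)
    containers = {}
    for code, values in raw_values.items():
        if infer_type(values) == "int":
            containers[code] = ("int", [int(v) for v in values], None)
        else:
            # Order-preserving dictionary: code order equals string order.
            dictionary = sorted(set(values))
            index = {v: i for i, v in enumerate(dictionary)}
            containers[code] = ("str", [index[v] for v in values], dictionary)
    return tag_codes, containers


def range_query(compressed, tag, low, high):
    """Positions of `tag` values within [low, high], compared in encoded form."""
    tag_codes, containers = compressed
    kind, encoded, dictionary = containers[tag_codes[tag]]
    if kind == "int":
        lo, hi = int(low), int(high)
    else:
        # Translate the string bounds into code bounds once.
        lo = bisect.bisect_left(dictionary, low)
        hi = bisect.bisect_right(dictionary, high) - 1
    return [i for i, value in enumerate(encoded) if lo <= value <= hi]


doc = """<books>
  <book><title>XML Basics</title><year>2001</year></book>
  <book><title>Query Processing</title><year>2005</year></book>
  <book><title>Compression</title><year>1998</year></book>
</books>"""
compressed = compress(doc)
print(compressed[0])                                # {'books': 0, 'book': 1, 'title': 2, 'year': 3}
print(range_query(compressed, "year", 2000, 2010))  # [0, 1]
print(range_query(compressed, "title", "A", "R"))   # [1, 2]
```

Because the string dictionary is sorted, a range predicate is translated into a code range once with `bisect`, and each stored value is then compared as an integer without being decoded.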

A Case Study on Instruction for Mathematically Gifted Children through The Application of Open-ended Problem Solving Tasks (개방형 과제를 활용한 수학 영재아 수업 사례 분석)

Park Hwa-Young;Kim Soo-Hwan
- Communications of Mathematical Education
- v.20 no.1 s.25
- pp.117-145
- 2006
Mathematically gifted children have a creative curiosity about novel tasks that derives from their natural mathematical talents, aptitudes, intellectual abilities, and creativity. More effort is needed to nurture the creative thinking of such children by letting them approach problem solving in various ways and make strategic attempts. From this perspective, it is desirable to select open-ended and atypical problems as tasks for educational programs for gifted children. In this paper, various types of open-ended problems were designed, and learning activities based on them were adapted for a class of gifted children. The characteristics of the children's mathematical thinking abilities and examples of their problem-solving strategies during the problem-solving process were then analyzed in order to derive suggestions for elementary-school classes for gifted children that use open-ended tasks. To this end, an open-ended task set of 24 inquiries was constructed, the teaching procedure was organized into three steps adapted from Renzulli's Enrichment Triad Model, and 24 class periods were conducted according to the teaching plan. For each subcategory of mathematical thinking ability (intuitive insight, information systematization, spatial formation/visualization, mathematical abstraction, mathematical reasoning, and reflective thinking), one class period was chosen and analyzed with respect to teaching, the learning process, and products. Based on this analysis, anticipated problem-solving examples and creative problem-solving examples were presented, and suggestions for teaching gifted children with open-ended tasks were derived by reflecting on the classes in terms of task characteristics, the teacher's role, impartiality, and accessibility. 
The case study showed that mathematics classes for gifted children using open-ended tasks satisfied the students' curiosity and were effective in providing varied mathematical thinking experiences and forming thinking habits through the establishment of atypical mathematical problem-solving strategies. This study is meaningful in that it presents gifted children's problem-solving procedures for open-ended problems and attempts a concrete, practical case study of classes for gifted children, whereas most studies on gifted education in this country focus on basic theory or quantitative research.

Development of Smart Mining Technology Level Diagnostics and Assessment Model for Mining Sites (광산 현장의 스마트 마이닝 기술 수준 진단평가 모델 개발)

Park, Sebeom;Choi, Yosoon
- Tunnel and Underground Space
- v.32 no.1
- pp.78-92
- 2022
In this study, we propose a diagnostics and assessment model that can evaluate the level of smart mining technology at mining sites in a systematic and structured way. To this end, maturity levels for smart mining were defined, and the detailed assessment items of the model were derived with reference to the smart factory diagnostics and assessment model (KS X 9001-3) used in the manufacturing industry. While retaining the structure of that model, its 46 detailed assessment items were modified to suit mining, resulting in a total of 29 detailed assessment items in the areas of promotion strategy, process, information system and automation, and performance. Based on these items, a questionnaire was designed to diagnose the level of smart mining technology, and the assessment was applied to domestic iron mines. The level of smart mining technology in the study area was found to be level 2, from which it could be inferred that it is about 40% lower than the average smart level of the general manufacturing industry. In addition, the developed model made it possible to identify the weak points of the mine at each stage of the introduction, operation, and advancement of smart mining, and to suggest directions for investment and improvement.
https://doi.org/10.7474/TUS.2022.32.1.078
