• Title/Summary/Keyword: Stemming algorithm


A Study on Rhythm Information Visualization Using Syllable of Digital Text (디지털 텍스트의 음절을 이용한 운율 정보 시각화에 관한 연구)

  • Park, Seon-Hee;Lee, Jae-Joong;Park, Jin-Wan
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2009.05a
    • /
    • pp.120-126
    • /
    • 2009
  • As the information age has advanced, the amount of digital text has grown rapidly, and visualization has increasingly been used to make sense of large text collections. Existing visual designs for digital text concentrate mainly on depicting subject words through stemming algorithms and word-frequency extraction, on highlighting textual meaning, and on showing connections between sentences; expression of the rhythm that conveys the sentiment of digital text has therefore been insufficient. The syllable is the phonemic unit that expresses rhythm most effectively: it is the most basic unit of pronunciation for words, phrases, and sentences, and rhythmic factors such as accent, intonation, and length are based on it. Sonority, the property most closely associated with definitions of the syllable, is produced by airflow from the lungs and the resulting acoustic energy. From this perspective, this study examines the phonological definition and characteristics of the syllable as a property of digital text and investigates a way to visualize rhythm through diagrams. In the experiment, digital text is converted into phonetic symbols, and rhythm information is visualized as images using the degree of sonority, which underlies rhythm in all languages, and the syllable structure of the digital text. By visualizing syllable information, the method provides syllable information for digital text and expresses its sentiment through diagrams, helping users understand it systematically. This study thus aims to make the rhythm of text easier to understand and to realize visualization of digital text.


A Fair MAC Algorithm under Capture Effect in IEEE 802.11 DCF-based WLANs (IEEE 802.11 무선랜에서 캡쳐 효과를 고려한 Fair MAC 알고리즘)

  • Jeong, Ji-Woong;Choi, Sun-Woong;Kim, Chong-Kwon
    • Journal of KIISE:Information Networking
    • /
    • v.37 no.5
    • /
    • pp.386-395
    • /
    • 2010
  • Widespread deployment of infrastructure WLANs has made Wi-Fi an integral part of today's Internet access technology. Despite its crucial role in end-to-end performance, past research has focused on MAC protocol enhancement, analysis, and simulation-based performance evaluation without sufficiently considering the misbehavior stemming from the capture effect. It is well known that the capture effect occurs frequently in wireless environments and causes throughput unfairness between nodes. In this paper, we propose a novel Fair MAC algorithm that achieves fairness even in a physically unfair environment. While satisfying fairness, the proposed algorithm maximizes system throughput. Extensive simulation results show that the proposed Fair MAC algorithm substantially improves fairness without reducing throughput.

A Study on Technology Forecasting based on Co-occurrence Network of Keyword in Multidisciplinary Journals (다학제 분야 학술지의 주제어 동시발생 네트워크를 활용한 기술예측 연구)

  • Kim, Hyunuk;Ahn, Sang-Jin;Jung, Woo-Sung
    • Journal of the Korean Operations Research and Management Science Society
    • /
    • v.40 no.4
    • /
    • pp.49-63
    • /
    • 2015
  • Keywords indexed in multidisciplinary journals show trends in science and technology innovation. Nature and Science were selected as the multidisciplinary journals for our analysis. In order to reduce the effect of keyword plurality, a stemming algorithm was applied. After this process, we fitted the growth curve of each keyword (stem) to the Bass model, a well-known model of diffusion processes. The Bass model is useful for expressing growth patterns because it assumes both innovative and imitative activities in the spread of an innovation. In addition, we constructed a keyword co-occurrence network and calculated network measures such as centrality indices and the local clustering coefficient. Based on these network metrics and the yearly frequency of each keyword, time series analysis was conducted to obtain statistical causality between the measures. In some cases, the local clustering coefficient appears to Granger-cause the yearly frequency of a keyword. We expect that the local clustering coefficient could be a supportive indicator of emerging science and technology.
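The abstract fits each keyword's growth to the Bass diffusion model. A minimal sketch of the Bass cumulative-adoption curve follows; the innovation/imitation coefficients below are hypothetical, since the paper's fitted per-keyword values are not given in the abstract:

```python
import math

def bass_cumulative(t, p, q, m=1.0):
    """Cumulative adoptions F(t) under the Bass diffusion model.

    p: coefficient of innovation, q: coefficient of imitation,
    m: market potential (eventual total, normalized to 1 here).
    """
    e = math.exp(-(p + q) * t)
    return m * (1.0 - e) / (1.0 + (q / p) * e)

# Hypothetical coefficients for one keyword's yearly-frequency growth.
p, q = 0.03, 0.38
curve = [bass_cumulative(t, p, q) for t in range(30)]
```

Starting from zero, the curve rises in the characteristic S-shape and saturates at the market potential, which is the pattern the study matches against keyword frequencies.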

Improving the Accuracy of a Heliocentric Potential (HCP) Prediction Model for the Aviation Radiation Dose

  • Hwang, Junga;Yoon, Kyoung-Won;Jo, Gyeongbok;Noh, Sung-Jun
    • Journal of Astronomy and Space Sciences
    • /
    • v.33 no.4
    • /
    • pp.279-285
    • /
    • 2016
  • The space radiation dose over air routes, including polar routes, should be carefully considered, especially when space weather shows sudden disturbances such as coronal mass ejections (CMEs), flares, and accompanying solar energetic particle events. We recently established a heliocentric potential (HCP) prediction model for real-time operation of the CARI-6 and CARI-6M programs. Specifically, the HCP value is used as a critical input to the CARI-6/6M programs, which estimate the aviation route dose based on the effective dose rate. The CARI-6/6M approach is the most widely used technique, and the programs can be obtained from the U.S. Federal Aviation Administration (FAA). However, HCP values are published with a one-month delay on the FAA official webpage, which makes it difficult to obtain real-time information on the aviation route dose. In order to overcome this critical limitation for space weather customers, we developed an HCP prediction model based on sunspot number variations (Hwang et al. 2015). In this paper, we focus on improvements to our HCP prediction model and update it with neutron monitoring data. We found that the most accurate method to derive the HCP value involves (1) real-time daily sunspot assessments, (2) prediction of the daily HCP by our algorithm, and (3) calculation of the resultant daily effective dose rate. We also derived an HCP prediction algorithm using ground neutron counts. With the compensation stemming from the use of ground neutron count data, the newly developed HCP prediction model shows improved accuracy.
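The three-step procedure above (daily sunspot assessment, HCP prediction, dose-rate calculation) hinges on a fitted relation between sunspot number and HCP. A minimal sketch of such a fit using ordinary least squares; the (sunspot, HCP) pairs are hypothetical, and the actual regression in Hwang et al. (2015) is not reproduced here:

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

# Hypothetical daily (sunspot number, HCP in MV) pairs.
data = [(20, 450), (60, 600), (100, 750), (140, 900)]
a, b = fit_linear([d[0] for d in data], [d[1] for d in data])
predicted_hcp = a * 80 + b  # predict HCP from today's sunspot number
```

A real implementation would also fold in the neutron-count compensation the paper describes; this sketch only shows the sunspot-driven step.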

Dual-mode Transmission Strategy for Blind Interference Alignment Scheme in MISO Broadcast Channels (MISO 브로드캐스트 채널에서의 블라인드 간섭 정렬 기법 기반 이중 전송 기법 설계)

  • Yang, Minho;Jang, Jinyoung;Kim, Dong Ku
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.38A no.12
    • /
    • pp.1102-1109
    • /
    • 2013
  • The blind interference alignment (BIA) scheme demonstrates a way of achieving interference alignment (IA) without channel state information at the transmitter (CSIT). While it shows superior performance in the high signal-to-noise ratio (SNR) regime, stemming from its maximal degrees-of-freedom (DoF) gain, the BIA scheme achieves inferior sum-rate performance in the low SNR regime. This paper proposes a dual-mode transmission strategy that switches between single-user (SU) SISO with receive-mode selection and the BIA scheme depending on the SNR range. First, we derive a closed-form achievable rate for each transmission mode. Second, we propose a low-complexity transmission-mode selection algorithm.
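The proposed strategy switches mode by comparing per-mode achievable rates at the operating SNR. A toy sketch of such a selection rule; the rate expressions below are illustrative stand-ins (a DoF pre-log gain with an SNR penalty for BIA), not the paper's closed-form rates:

```python
import math

def select_mode(snr_db):
    """Pick the transmission mode with the higher toy achievable rate.

    Assumed toy rates: SU-SISO gets log2(1+SNR); BIA is modeled with a
    4/3 DoF pre-log gain but a 3x noise penalty from symbol repetition.
    """
    snr = 10 ** (snr_db / 10)
    rate_su = math.log2(1 + snr)                  # single-user SISO
    rate_bia = (4 / 3) * math.log2(1 + snr / 3)   # 2-user MISO BIA (toy)
    return ("BIA", rate_bia) if rate_bia > rate_su else ("SU-SISO", rate_su)
```

Under these toy expressions the crossover behaves as the abstract describes: SU-SISO wins at low SNR, BIA at high SNR, thanks to its larger pre-log factor.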

A Full Digital Multipath Generator (완전 디지털 다중경로발생기)

  • 권성재
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.7 no.2
    • /
    • pp.74-81
    • /
    • 2002
  • In general, a multipath generator consists of a time-delay generator, a phase rotator, and an amplitude attenuator, and is implemented mostly in an analog manner. Analog, or partially analog, versions of a multipath generator are disadvantageous in that they may suffer from problems associated with component aging and adjustment, signal-fidelity degradation stemming from repeated A/D and D/A conversion, and the use of high frequencies to achieve fine (i.e., subsample fractional) time delays. This paper presents the design and implementation methodology of a fully digital multipath generator which can be used in performance evaluation of digital terrestrial television and communications receivers. In particular, an efficient practical method is proposed which can achieve both integer and fractional time delays simultaneously, without placing restrictions on the allowable system master-clock frequency. The proposed algorithm minimizes hardware implementation cost by relegating a fixed part of the computation involved to an IBM PC. The proposed multipath generator occupies only a single digital board, and experimental results are provided to corroborate the proposed implementation methodology.
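The key point of the abstract is realizing integer and fractional (subsample) time delays digitally. A minimal sketch using two-tap linear interpolation for the fractional part; this is a stand-in, since the paper's actual interpolation method is not specified in the abstract:

```python
def fractional_delay(signal, delay):
    """Delay a sampled signal by a possibly non-integer number of samples.

    The integer part shifts the sample index; the fractional part is
    realized here by two-tap linear interpolation between neighbours.
    """
    n_int = int(delay)
    frac = delay - n_int
    out = []
    for n in range(len(signal)):
        i = n - n_int
        a = signal[i] if 0 <= i < len(signal) else 0.0
        b = signal[i - 1] if 0 <= i - 1 < len(signal) else 0.0
        # blend the two neighbouring samples by the fractional offset
        out.append((1 - frac) * a + frac * b)
    return out

x = [0.0, 1.0, 0.0, 0.0]      # unit impulse at n = 1
y = fractional_delay(x, 1.5)  # impulse energy split across n = 2 and 3
```

A hardware implementation would typically use a longer polyphase FIR interpolator for better fidelity; the two-tap version only illustrates the integer-plus-fractional decomposition.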


Hierarchical Overlapping Clustering to Detect Complex Concepts (중복을 허용한 계층적 클러스터링에 의한 복합 개념 탐지 방법)

  • Hong, Su-Jeong;Choi, Joong-Min
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.1
    • /
    • pp.111-125
    • /
    • 2011
  • Clustering is a process of grouping similar or related documents into a cluster and assigning a meaningful concept to that cluster. By doing so, clustering facilitates fast and accurate search for relevant documents by narrowing the search to the collection of documents belonging to related clusters. Effective clustering requires techniques for identifying similar documents and grouping them into a cluster, and for discovering the concept that is most relevant to the cluster. One problem that often appears in this context is the detection of a complex concept that overlaps with several simple concepts at the same hierarchical level. Previous clustering methods were unable to identify and represent a complex concept that belongs to several different clusters at the same level in the concept hierarchy, and also could not validate the semantic hierarchical relationship between a complex concept and each of the simple concepts. In order to solve these problems, this paper proposes a new clustering method that identifies and represents complex concepts efficiently. We developed the Hierarchical Overlapping Clustering (HOC) algorithm, which modifies the traditional agglomerative hierarchical clustering algorithm to allow overlapping clusters at the same level in the concept hierarchy. The HOC algorithm represents the clustering result not by a tree but by a lattice in order to detect complex concepts. We developed a system that employs the HOC algorithm to carry out complex concept detection. This system operates in three phases: 1) preprocessing of documents, 2) clustering using the HOC algorithm, and 3) validation of the semantic hierarchical relationships among the concepts in the lattice obtained as a result of clustering. The preprocessing phase represents the documents as x-y coordinate values in a 2-dimensional space by considering the weights of terms appearing in the documents.
First, the documents go through a refinement process that applies stopword removal and stemming to extract index terms. Then, each index term is assigned a TF-IDF weight, and the x-y coordinate value for each document is determined by combining the TF-IDF values of the terms in it. The clustering phase uses the HOC algorithm, in which the similarity between documents is calculated using the Euclidean distance. Initially, a cluster is generated for each document by grouping the documents closest to it. Then, the distance between any two clusters is measured, and the closest clusters are merged into a new cluster. This process is repeated until the root cluster is generated. In the validation phase, feature selection is applied to validate the appropriateness of the cluster concepts built by the HOC algorithm, i.e., to check whether they have meaningful hierarchical relationships. Feature selection extracts key features from a document by identifying and assigning weights to the important and representative terms in the document. In order to select key features correctly, a method is needed to determine how much each term contributes to the class of the document. Among the several methods achieving this goal, this paper adopted the $\chi^2$ statistic, which measures the degree of dependency of a term t on a class c and represents the relationship between t and c as a numerical value. To demonstrate the effectiveness of the HOC algorithm, a series of performance evaluations was carried out using the well-known Reuters-21578 news collection. The results showed that the HOC algorithm greatly contributes to detecting and producing complex concepts by generating the concept hierarchy in a lattice structure.
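The validation phase scores the dependency between a term t and a class c with the chi-square statistic. A minimal sketch from the standard 2x2 contingency form, with hypothetical document counts:

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square dependency score of term t and class c from a 2x2 table.

    n11: docs in class c that contain t;  n10: docs outside c that contain t;
    n01: docs in c without t;             n00: docs outside c without t.
    """
    n = n11 + n10 + n01 + n00
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    return num / den if den else 0.0

# Hypothetical counts: a term concentrated in one class scores high,
# while a term spread proportionally across classes scores zero.
dependent = chi_square(40, 10, 10, 940)
independent = chi_square(5, 95, 45, 855)
```

Terms with high scores are kept as key features for the class, which is how the system checks whether a cluster's concept is semantically meaningful.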

Latent topics-based product reputation mining (잠재 토픽 기반의 제품 평판 마이닝)

  • Park, Sang-Min;On, Byung-Won
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.39-70
    • /
    • 2017
  • Data-driven analytics techniques have recently been applied to public surveys. Instead of simply gathering survey results or expert opinions to research the preference for a recently launched product, enterprises need a way to collect and analyze various types of online data and then accurately figure out customer preferences. In existing data-based survey methods, the sentiment lexicon for a particular domain is first constructed by domain experts, who judge the positive, neutral, or negative meanings of the words frequently used in the collected text documents. In order to research the preference for a particular product, the existing approach (1) collects review posts related to the product from several product review web sites; (2) extracts sentences (or phrases) from the collection after preprocessing steps such as stemming and stopword removal; (3) classifies the polarity (positive or negative) of each sentence (or phrase) based on the sentiment lexicon; and (4) estimates the positive and negative ratios of the product by dividing the numbers of positive and negative sentences (or phrases) by the total number of sentences (or phrases) in the collection. Furthermore, the existing approach automatically finds important sentences (or phrases) carrying positive or negative meaning for or against the product. As a motivating example, given a product like the Sonata made by Hyundai Motors, customers often want to see a summary note of the positive points in the 'car design' aspect as well as the negative points in the same aspect. They also want more useful information regarding other aspects such as 'car quality', 'car performance', and 'car service'. Such information will enable customers to make a good choice when they attempt to purchase brand-new vehicles.
In addition, automobile makers will be able to figure out the preference and positive/negative points for new models on the market, and in the near future the weak points of the models can be improved based on the sentiment analysis. For this, the existing approach computes the sentiment score of each sentence (or phrase) and then selects the top-k sentences (or phrases) with the highest positive and negative scores. However, the existing approach has several shortcomings and is limited in its application to real problems. Its main disadvantages are as follows: (1) The main aspects (e.g., car design, quality, performance, and service) of a product (e.g., the Hyundai Sonata) are not considered. With sentiment analysis that ignores aspects, only a summary note including the positive and negative ratios of the product and the top-k sentences (or phrases) with the highest sentiment scores over the entire corpus is reported to customers and car makers. This is not enough; the main aspects of the target product need to be considered in the sentiment analysis. (2) In general, since the same word has different meanings across different domains, a sentiment lexicon appropriate to each domain needs to be constructed. An efficient way to construct the sentiment lexicon per domain is required because sentiment lexicon construction is labor intensive and time consuming. To address the above problems, in this article we propose a novel product reputation mining algorithm that (1) extracts topics hidden in review documents written by customers; (2) mines main aspects based on the extracted topics; (3) measures the positive and negative ratios of the product using the aspects; and (4) presents a digest in which a few important sentences with positive and negative meanings are listed for each aspect. Unlike the existing approach, using hidden topics lets experts construct the sentiment lexicon easily and quickly.
Furthermore, by reinforcing topic semantics, we can improve the accuracy of product reputation mining more substantially than the existing approach. In the experiments, we collected a large set of review documents on domestic vehicles such as the K5, SM5, and Avante; measured the positive and negative ratios for the three cars; showed the top-k positive and negative summaries per aspect; and conducted statistical analysis. Our experimental results clearly show the effectiveness of the proposed method compared with the existing method.
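Steps (3)-(4) of the lexicon-based baseline described above reduce to counting phrase polarities against a sentiment lexicon. A minimal sketch with a toy lexicon and hypothetical review phrases:

```python
# Toy sentiment lexicon; a real one would be domain-specific and
# expert-built, as the abstract notes.
POSITIVE = {"stylish", "quiet", "comfortable"}
NEGATIVE = {"noisy", "expensive", "unreliable"}

def polarity(phrase):
    """Classify a phrase by counting lexicon hits in its words."""
    words = set(phrase.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "pos" if score > 0 else "neg" if score < 0 else "neutral"

# Hypothetical review phrases for one product.
reviews = ["stylish and comfortable design", "engine is noisy",
           "expensive to maintain", "quiet cabin"]
labels = [polarity(r) for r in reviews]
pos_ratio = labels.count("pos") / len(labels)
```

The proposed method extends this baseline by first mining latent topics so that the ratios and digests are reported per aspect rather than over the whole corpus.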

An Ontology Model for Public Service Export Platform (공공 서비스 수출 플랫폼을 위한 온톨로지 모형)

  • Lee, Gang-Won;Park, Sei-Kwon;Ryu, Seung-Wan;Shin, Dong-Cheon
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.1
    • /
    • pp.149-161
    • /
    • 2014
  • The export of domestic public services to overseas markets faces many potential obstacles, stemming from differences in export procedures, target services, and socio-economic environments. In order to alleviate these problems, a business incubation platform, as an open business ecosystem, can be a powerful instrument to support the decisions taken by participants and stakeholders. In this paper, we propose an ontology model and its implementation processes for a business incubation platform with an open and pervasive architecture to support public service exports. For the conceptual model of the platform ontology, export case studies are used for requirements analysis. The conceptual model shows the basic structure, with vocabulary and its meaning, the relationships between ontologies, and key attributes. For the implementation and testing of the ontology model, the logical structure is edited using the Protégé editor. The core engine of the business incubation platform is the simulator module, where the various contexts of export businesses should be captured, defined, and shared with other modules through ontologies. It is well known that an ontology, with which concepts and their relationships are represented using a shared vocabulary, is an efficient and effective tool for organizing meta-information to develop structural frameworks in a particular domain. The proposed model consists of five ontologies derived from a requirements survey of major stakeholders and their operational scenarios: service, requirements, environment, enterprise, and country. The service ontology contains several components that can find and categorize public services through a case analysis of public service export. Key attributes of the service ontology are composed of categories including objective, requirements, activity, and service.
The objective category, which has sub-attributes including operational body (organization) and user, acts as a reference to search and classify public services. The requirements category relates to the functional needs at a particular phase of system (service) design or operation. Sub-attributes of requirements are user, application, platform, architecture, and social overhead. The activity category represents business processes during the operation and maintenance phase. The activity category also has sub-attributes including facility, software, and project unit. The service category, with sub-attributes such as target, time, and place, acts as a reference to sort and classify the public services. The requirements ontology is derived from the basic and common components of public services and target countries. The key attributes of the requirements ontology are business, technology, and constraints. Business requirements represent the needs of processes and activities for public service export; technology represents the technological requirements for the operation of public services; and constraints represent the business law, regulations, or cultural characteristics of the target country. The environment ontology is derived from case studies of target countries for public service operation. Key attributes of the environment ontology are user, requirements, and activity. A user includes stakeholders in public services, from citizens to operators and managers; the requirements attribute represents the managerial and physical needs during operation; the activity attribute represents business processes in detail. The enterprise ontology is introduced from a previous study, and its attributes are activity, organization, strategy, marketing, and time. 
The country ontology is derived from the demographic and geopolitical analysis of the target country, and its key attributes are economy, social infrastructure, law, regulation, customs, population, location, and development strategies. The priority list of target services for a certain country and/or the priority list of target countries for a certain public service are generated by a matching algorithm. These lists are used as input seeds to simulate consortium partners and the government's policies and programs. In the simulation, the environmental differences between Korea and the target country can be customized through a gap analysis and work-flow optimization process. When the process gap between Korea and the target country is too large for a single corporation to cover, a consortium is considered as an alternative, and various alternatives are derived from the capability index of enterprises. For financial packages, a mix of various foreign aid funds can be simulated during this stage. It is expected that the proposed ontology model and the business incubation platform can be used by various participants in the public service export market. They could be especially beneficial to small and medium businesses that have relatively fewer resources and less experience with public service export. We also expect that the open and pervasive service architecture in a digital business ecosystem will help stakeholders find new opportunities through information sharing and collaboration on business processes.