• Title/Summary/Keyword: Information retrieval systems

An Analysis of IT Trends Using Tweet Data (트윗 데이터를 활용한 IT 트렌드 분석)

  • Yi, Jin Baek;Lee, Choong Kwon;Cha, Kyung Jin
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.143-159 / 2015
  • Predicting IT trends has long been an important subject for information systems research. IT trend prediction makes it possible to recognize emerging eras of innovation and to allocate budgets in preparation for rapidly changing technological trends. Toward the end of each year, various domestic and global organizations predict and announce IT trends for the following year. For example, Gartner predicts the top 10 IT trends for the coming year, and these predictions shape the basic assumptions of IT and industry leaders and organizations about technology and the future of IT; however, the accuracy of such reports is difficult to verify. Social media data can be a useful tool for verifying that accuracy. As social media services have gained in popularity, they have come to be used in a variety of ways, from posting about daily life to keeping up to date with news and trends. In recent years, social media activity in Korea has reached unprecedented levels: hundreds of millions of users now participate in online social networks and share their opinions and thoughts with colleagues and friends. In particular, Twitter is currently the major microblogging service; its tweets let users report their current thoughts and actions, comment on news, and engage in discussions. For an analysis of IT trends, we chose tweet data because it not only produces massive unstructured textual data in real time but also serves as an influential channel of opinion leadership on technology. Previous studies found that tweet data provides useful information and detects societal trends effectively, and that Twitter can track issues faster than other media such as newspapers. This study therefore investigates how frequently the predicted IT trends for the following year, as announced by public organizations, are mentioned on social network services like Twitter. IT trend predictions for 2013, announced near the end of 2012 by two domestic organizations, the National IT Industry Promotion Agency (NIPA) and the National Information Society Agency (NIA), were used as the basis for this research. The present study analyzes Twitter data generated in Seoul, Korea, against the predictions of the two organizations. Twitter data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, to process unrefined, unstructured data. To overcome these challenges, we used SAS IRS (Information Retrieval Studio), developed by SAS, to capture trends while processing big streams of Twitter data in real time. The system offers a framework for crawling, normalizing, analyzing, indexing, and searching tweet data. As a result, we crawled the entire Twitter sphere in the Seoul area and obtained 21,589 tweets from 2013, which we used to review how frequently the IT trend topics announced by the two organizations were mentioned by people in Seoul. The results show that most IT trends predicted by NIPA and NIA were frequently mentioned on Twitter, except for a few topics such as 'new types of security threat', 'green IT', and 'next-generation semiconductor'; since these topics are non-generalized compound terms, they may appear on Twitter under different wording.
To examine whether IT trend tweets from Korea are related to the following year's real-world IT trends, we compared Twitter's trending topics with those in Nara Market, Korea's nationwide web-based e-procurement system, which handles the whole procurement process of all public organizations in Korea. The correlation analysis shows that tweet frequencies for the IT trend topics predicted by NIPA and NIA are significantly correlated with the frequencies of those topics in project announcements on Nara Market in 2012 and 2013. The main contributions of our research are as follows: i) the IT topic predictions announced by NIPA and NIA can provide an effective guideline for IT professionals and researchers in Korea who are looking for verified IT topic trends for the following year; ii) researchers can use Twitter to obtain useful ideas for detecting and predicting dynamic trends in technological and social issues.
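
A minimal sketch of the frequency-correlation step described above, assuming illustrative topic names and hypothetical announcement counts; the study's actual pipeline ran on SAS IRS:

```python
# Count how often predicted IT trend topics appear in tweets, then correlate
# with topic frequencies from Nara Market project announcements.
# All data below is hypothetical; the paper used 21,589 tweets from Seoul.
from scipy.stats import pearsonr

topics = ["big data", "cloud", "mobile security"]           # illustrative topics
tweets = ["cloud services are everywhere now",
          "big data hype continues", "moving to the cloud"]

tweet_freq = [sum(topic in tw.lower() for tw in tweets) for topic in topics]
nara_freq = [12, 30, 7]    # hypothetical announcement counts per topic

r, p = pearsonr(tweet_freq, nara_freq)                      # Pearson correlation
print(f"r = {r:.2f}, p = {p:.3f}")
```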

Discussions on the Accessibility of School Library DLS Catalogue Records - Focused on Literary Collections - (학교도서관 DLS 목록의 자료 접근성에 대한 논의 - 문학 분야 장서를 중심으로 -)

  • Kang, Bong-Suk;Jung, Youngmi
    • Journal of Korean Library and Information Science Society / v.50 no.4 / pp.539-559 / 2019
  • One of the fundamental roles of libraries is to provide users with efficient and easy retrieval of materials. Various discussions have taken place, both domestically and abroad, on improving the accessibility of materials by category, user, and collection, and at the center of these discussions is the issue of improving classification and cataloging systems. However, few studies in this area deal with the accessibility of records in the DLS catalog, the central tool for accessing Korean school library materials. This study started from complaints by school library users about the difficulty of searching for and accessing books, especially literature. It is an exploratory study that attempts to identify problems by tracing the causes of these difficulties from multiple angles. For this study, we surveyed and analyzed the current status of school library collections, data registration in the school library support system DLS, the subject accessibility of the catalog records it produces, and the perceptions and opinions of school library professionals. As a result, school library collections were found to be highly concentrated in literature, and the catalog bibliographic records were insufficient to provide efficient access to these collections. In addition, the DLS search function was found to be somewhat lacking as a means of compensating for this. Surveys of teacher librarians and librarians also confirmed the problem, and assigning a rich subject index and search keywords emerged as the majority opinion on how to improve access to materials through school library catalogs. As this discussion continues, plans for improving access to school library materials will become more concrete through future user studies and new approaches to shelf classification.

Customized Configuration with Template and Options (맞춤구성을 위한 템플릿과 Option 기반의 추론)

  • 이현정;이재규
    • Journal of Intelligence and Information Systems / v.8 no.1 / pp.119-139 / 2002
  • In electronic catalogs, each item is represented as an independent unit, even though parts of an item can compose a higher level of functionality. The search over this kind of product database is therefore limited to retrieving the most similar standard commodities. However, many industrial products require configuring optional parts to fulfill a required specification. Since there are many paths to a required specification, we need a search system that works through a configuration process. Our system adopts a two-phase approach: the first phase finds the most similar template, and the second phase adjusts the template's specifications toward the required specification using a Constraint and Rule Satisfaction Problem approach. Because there is no guarantee that the most similar template leads to the most desirable configuration, the search system needs backtracking capability so that the search can stop at a satisfactory local optimum. This framework is applied to the configuration of computers and peripherals. Template-based reasoning is essentially case-based reasoning: the required specification is represented as a list of criteria and matched against product specifications to find the closest ones. To measure distance, we developed a thesaurus of values that can interpret the meaning of numbers, symbols, and words. The performance of the search-by-configuration algorithm is evaluated in terms of feasibility and admissibility.
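
A minimal sketch of the two-phase approach under simplifying assumptions: dictionary-valued specifications, a hypothetical per-option constraint check `ok`, and no backtracking (which the full system adds):

```python
# Phase 1: retrieve the most similar template (case-based retrieval).
# Phase 2: adjust unsatisfied criteria using compatible optional parts.
def similarity(spec, template):
    """Fraction of required criteria the template already satisfies."""
    return sum(template.get(k) == v for k, v in spec.items()) / len(spec)

def configure(spec, templates, options):
    base = max(templates, key=lambda t: similarity(spec, t))    # Phase 1
    config = dict(base)
    for key, want in spec.items():                              # Phase 2
        if config.get(key) != want:
            for opt in options.get(key, []):
                # Each option carries a constraint check (rule satisfaction)
                if opt["value"] == want and opt["ok"](config):
                    config[key] = want
                    break
    return config

templates = [{"cpu": "2GHz", "ram": "512MB"}, {"cpu": "1GHz", "ram": "256MB"}]
options = {"ram": [{"value": "1GB", "ok": lambda c: c["cpu"] == "2GHz"}]}
print(configure({"cpu": "2GHz", "ram": "1GB"}, templates, options))
```

A real configurator would backtrack to another template when Phase 2 cannot satisfy all constraints, stopping at a satisfactory local optimum as the abstract notes.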

Detecting Errors in POS-Tagged Corpus on XGBoost and Cross Validation (XGBoost와 교차검증을 이용한 품사부착말뭉치에서의 오류 탐지)

  • Choi, Min-Seok;Kim, Chang-Hyun;Park, Ho-Min;Cheon, Min-Ah;Yoon, Ho;Namgoong, Young;Kim, Jae-Kyun;Kim, Jae-Hoon
    • KIPS Transactions on Software and Data Engineering / v.9 no.7 / pp.221-228 / 2020
  • A part-of-speech (POS) tagged corpus is a collection of electronic text in which each word is annotated with its corresponding POS tag; such corpora are widely used as training data for natural language processing. Training data is generally assumed to be error-free, but in reality it contains various types of errors, which degrade the performance of systems trained on it. To alleviate this problem, we propose a novel method for detecting errors in an existing POS-tagged corpus using an XGBoost classifier and cross-validation. We first train the classifier of a POS tagger on the POS-tagged corpus, errors included, and then look for errors in that corpus via cross-validation. Because there is no training data labeled for POS-tagging errors, the classifier cannot detect errors directly; we therefore detect them by comparing the classifier's outputs (POS probabilities) against the annotated tags while adjusting hyperparameters. The hyperparameters are estimated on a small-scale error-tagged corpus, built by sampling text from the POS-tagged corpus and having experts mark up the POS errors. We use recall and precision, metrics widely used in information retrieval, for evaluation. Since not all detected errors can be checked manually, we validated the proposed method by comparing the distributions of the sample (the error-tagged corpus) and the population (the POS-tagged corpus). In the near future, we will apply the proposed method to a dependency-tree-tagged corpus and a semantic-role-tagged corpus.
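
A minimal sketch of the detection loop (not the authors' code), assuming a numeric token-feature matrix `X` and integer-encoded tags `y`; the confidence threshold stands in for the hyperparameters tuned on the expert-checked error corpus:

```python
# Flag tokens whose annotated POS tag receives low probability from an
# XGBoost classifier evaluated out-of-fold via cross-validation.
import numpy as np
import xgboost as xgb
from sklearn.model_selection import KFold

def detect_tag_errors(X, y, threshold=0.1, n_splits=5):
    """Return indices of tokens whose annotated tag looks suspicious.
    Assumes every tag id in y occurs in each training fold."""
    suspects = []
    for train_idx, test_idx in KFold(n_splits, shuffle=True, random_state=0).split(X):
        clf = xgb.XGBClassifier(n_estimators=200, max_depth=6)
        clf.fit(X[train_idx], y[train_idx])
        proba = clf.predict_proba(X[test_idx])             # P(tag | token features)
        col = {c: i for i, c in enumerate(clf.classes_)}   # tag id -> proba column
        for row, tok in enumerate(test_idx):
            p = proba[row, col[y[tok]]] if y[tok] in col else 0.0
            if p < threshold:    # low out-of-fold confidence in the annotated tag
                suspects.append(int(tok))
    return sorted(suspects)
```

Tokens flagged this way are candidates for expert review; recall and precision are then computed against the small error-tagged sample.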

Establishment of Risk Database and Development of Risk Classification System for NATM Tunnel (NATM 터널 공정리스크 데이터베이스 구축 및 리스크 분류체계 개발)

  • Kim, Hyunbee;Karunarathne, Batagalle Vinuri;Kim, ByungSoo
    • Korean Journal of Construction Engineering and Management / v.25 no.1 / pp.32-41 / 2024
  • In the construction industry, not only safety accidents but also various complex risks, such as construction delays, cost increases, and environmental pollution, occur, and management technologies are needed to address them. Among these, process risk management, which directly affects the project, lacks related information relative to its importance. This study develops an NATM tunnel process risk classification system to resolve the difficulty of retrieving risk information caused by each project using a different classification system. Risks were collected through a literature review and experience-mining techniques, and the database was built using natural language processing concepts. For the structure of the classification system, the existing WBS structure was adopted for data compatibility, and an RBS linked to the work types of the WBS was established. The result is a risk classification system that identifies risks by work type and intuitively reveals the characteristics and risk factors linked to each risk. Verifying the usability of the classification system showed it to be effective: risks and risk factors for each work type were easily identified from user keyword input. This study is expected to help prevent cost and schedule overruns by identifying risks by work type in advance, during the planning and design of NATM tunnels, and by supporting countermeasures suited to those factors.
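
A minimal sketch of the keyword-based retrieval that the usability check describes, with hypothetical work types and risk entries standing in for the constructed database:

```python
# WBS work types linked to RBS risk entries, searchable by keyword.
from dataclasses import dataclass, field

@dataclass
class Risk:
    rbs_code: str
    description: str
    factors: list = field(default_factory=list)

RISK_DB = {  # hypothetical entries; the real DB was mined from literature and experience
    "excavation": [Risk("R1.1", "face collapse during blasting",
                        ["weak rock mass", "over-excavation"])],
    "shotcrete":  [Risk("R2.3", "delayed strength development",
                        ["low curing temperature", "mix proportion error"])],
}

def search_risks(keyword):
    """Return (work type, risk) pairs whose text mentions the keyword."""
    kw = keyword.lower()
    return [(wt, r) for wt, risks in RISK_DB.items() for r in risks
            if kw in " ".join([wt, r.description, *r.factors]).lower()]

print(search_risks("collapse"))   # -> [('excavation', Risk(rbs_code='R1.1', ...))]
```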

Blind Rhythmic Source Separation (블라인드 방식의 리듬 음원 분리)

  • Kim, Min-Je;Yoo, Ji-Ho;Kang, Kyeong-Ok;Choi, Seung-Jin
    • The Journal of the Acoustical Society of Korea / v.28 no.8 / pp.697-705 / 2009
  • An unsupervised (blind) method is proposed for extracting rhythmic sources from commercial polyphonic music limited to a single channel. Commercial music signals usually provide no more than two channels, yet often contain multiple instruments, including singing voice. Therefore, instead of conventional modeling of mixing environments or statistical characteristics, other source-specific characteristics must be introduced to separate or extract sources in such underdetermined settings. In this paper, we concentrate on extracting rhythmic sources from mixtures containing other, harmonic sources. An extension of nonnegative matrix factorization (NMF) called nonnegative matrix partial co-factorization (NMPCF) is used to analyze relationships between the spectral and temporal properties of the input matrices. Moreover, the temporal repeatability of rhythmic sources is exploited as a rhythmic property common to segments of the input mixture signal. The proposed method achieves separation quality that is acceptable, though not superior to that of prior-knowledge-based drum source separation systems; however, it is more broadly applicable because it is blind, for example when no prior information is available or the target rhythmic source is irregular.
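
A minimal NMPCF sketch under simplifying assumptions (two segments, Euclidean cost, multiplicative updates): the two magnitude-spectrogram segments share a single rhythmic basis `Wr`, which encodes the temporal repeatability the method exploits, while each segment keeps its own harmonic basis:

```python
import numpy as np

def nmpcf(X1, X2, k_r=8, k_h=16, n_iter=200, eps=1e-9):
    """Partially co-factorize X1 ~ Wr@H1r + W1@H1 and X2 ~ Wr@H2r + W2@H2."""
    rng = np.random.default_rng(0)
    F = X1.shape[0]
    Wr = rng.random((F, k_r))                               # shared rhythmic basis
    W1, W2 = rng.random((F, k_h)), rng.random((F, k_h))     # per-segment harmonic bases
    H1r, H2r = rng.random((k_r, X1.shape[1])), rng.random((k_r, X2.shape[1]))
    H1, H2 = rng.random((k_h, X1.shape[1])), rng.random((k_h, X2.shape[1]))
    for _ in range(n_iter):
        V1, V2 = Wr @ H1r + W1 @ H1, Wr @ H2r + W2 @ H2     # current models
        # The shared basis is updated against both segments at once
        Wr *= (X1 @ H1r.T + X2 @ H2r.T) / (V1 @ H1r.T + V2 @ H2r.T + eps)
        H1r *= (Wr.T @ X1) / (Wr.T @ V1 + eps)
        H2r *= (Wr.T @ X2) / (Wr.T @ V2 + eps)
        W1 *= (X1 @ H1.T) / (V1 @ H1.T + eps)
        H1 *= (W1.T @ X1) / (W1.T @ V1 + eps)
        W2 *= (X2 @ H2.T) / (V2 @ H2.T + eps)
        H2 *= (W2.T @ X2) / (W2.T @ V2 + eps)
    return Wr, H1r, H2r        # Wr @ H1r approximates the rhythmic part of X1
```

For brevity the model estimates `V1` and `V2` are refreshed once per iteration rather than after every factor update.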

The Effects of e-Business on Business Performance - In the home-shopping industry - (e-비즈니스가 경영성과에 미치는 영향 -홈쇼핑을 중심으로-)

  • Kim, Sae-Jung;Ahn, Seon-Sook
    • Management & Information Systems Review / v.22 / pp.137-165 / 2007
  • It seems high time to increase productivity by adopting e-business to overcome challenges posed by external factors, including the appreciation of the Korean won, oil price hikes, and fierce global competition, and by domestic issues, represented by the disparities between large corporations and small and medium enterprises (SMEs), between metropolitan Seoul and local cities, and between export and domestic demand, all of which weaken future growth engines in the Korean economy. The globalization era demands innovative changes in business processes and industrial structure aimed at creating new value, and e-business is expected to play a core role in the sophistication of the Korean economy through new value and innovation. To examine business performance in e-business-adopting industries, this study analyzed the home shopping industry by looking closely at financial ratios, including the ratio of net profit to sales, the ratio of operating income to sales, the ratio of cost of sales to gross cost, the ratio of selling, general, and administrative (SG&A) expense to gross cost, and return on investment (ROI). The study used corporate financial statements as its main resource, calculating financial ratios through the Data Analysis, Retrieval and Transfer System (DART) of the Financial Supervisory Service, one of Korea's financial supervisory authorities. First, the trend analysis on the ratio of net profit to sales is as follows. CJ Home Shopping has registered a remarkable increase in its ratio of net profit to sales since 2002, while its competitors find it hard to catch up with CJ's performance. This is partly due to efficient management relative to CJ's capital. If the current trend continues, the front-runner will assume the largest market share. GS Home Shopping, on the other hand, despite the best-organized system and the largest capital among its peers, lacks efficiency in management. Second, the trend analysis on the ratio of operating income to sales is as follows. Both CJ Home Shopping and GS Home Shopping recorded similar growth trends until 2004. However, while CJ Home Shopping's operating income continued to increase in 2005, GS Home Shopping's operating income declined, widening the income gap with CJ Home Shopping. While CJ Home Shopping, with the largest market share in the home shopping industry, is engaged in aggressive marketing, GS Home Shopping, with its stability-driven management strategy, again falls behind CJ in the ratio of operating income to sales despite a favorable management environment, including its large capital. The companies in Group B were all established in 2001. NS Home Shopping was the first in Group B to turn a loss into a profit. Woori Home Shopping posted operating losses for three consecutive years and was finally sold to Lotte Group in 2007, but has since registered continuing increases in net income on sales. Third, the trend analysis on the ratio of cost of sales to gross cost is as follows. Since home shopping is a retail business, its cost of sales is much lower than in other types of business such as manufacturing. Within gross costs, comprising cost of sales, SG&A expense, and non-operating expense, the share of cost of sales has decreased remarkably since 2002.
Group B has also posted a notable decline in the same category since 2002. Fourth, the trend analysis on the ratio of SG&A expense to gross cost is as follows. Owing to the industry's characteristics, home shopping usually posts a high SG&A expense ratio; however, a share above 80% signals lax management and, at the same time, a net income on sales far lower than in other industries. Last but not least, the trend analysis on ROI is as follows. For CJ Home Shopping, the ROI curve resembles that of its investment in fixed assets; the company's ratio of fixed assets to operating income skyrocketed in 2004 and 2005. GS Home Shopping's fixed assets are not as large as CJ Home Shopping's. Competition in the home shopping industry is currently among CJ, GS, Hyundai, NS, and Woori Home Shopping, and all of them need to manage their costs more thoroughly. For the latecomers of Group B and other home shopping companies to advance further, current lax management should be reformed, particularly in SG&A expenses. Given that total sales in the Internet shopping sector were projected to exceed 20 trillion won by 2010, it is concluded that all participants in the home shopping industry should prioritize strategies for efficient management of costs and expenses over increasing revenues if they hope to keep growing after 2007.
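
A minimal sketch of the ratio calculations, using the cost-share interpretation given above and hypothetical statement figures; the study drew the actual figures from DART:

```python
# Financial ratios used in the trend analysis (hypothetical inputs,
# in units of 100 million KRW).
def financial_ratios(sales, net_profit, operating_income,
                     cost_of_sales, sga, non_operating, fixed_assets):
    gross_cost = cost_of_sales + sga + non_operating
    return {
        "net_profit_to_sales":       net_profit / sales,
        "operating_income_to_sales": operating_income / sales,
        "cost_of_sales_share":       cost_of_sales / gross_cost,
        "sga_share":                 sga / gross_cost,  # >0.8 read as lax management
        "roi":                       operating_income / fixed_assets,  # simplified ROI
    }

print(financial_ratios(sales=1000.0, net_profit=60.0, operating_income=90.0,
                       cost_of_sales=350.0, sga=520.0, non_operating=40.0,
                       fixed_assets=400.0))
```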

THE CURRENT STATUS OF BIOMEDICAL ENGINEERING IN THE USA

  • Webster, John G.
    • Proceedings of the KOSOMBE Conference / v.1992 no.05 / pp.27-47 / 1992
  • Engineers have developed new instruments that aid in diagnosis and therapy. Ultrasonic imaging has provided a nondamaging method of imaging internal organs: a complex transducer emits ultrasonic waves at many angles and reconstructs a map of internal anatomy as well as blood velocities in vessels. Fast computed tomography permits reconstruction of the three-dimensional anatomy and perfusion of the heart at 20-Hz rates. Positron emission tomography uses isotopes that produce positrons, which react with electrons to simultaneously emit two gamma rays in opposite directions; the region of origin is located using a ring of discrete scintillation detectors, each in electronic coincidence with an opposing detector. In magnetic resonance imaging, the patient is placed in a very strong magnetic field, and the precession of hydrogen atoms is perturbed by an interrogating field to yield two-dimensional images of soft tissue with exceptional clarity. As an alternative to radiology image processing, film archiving, and retrieval, picture archiving and communication systems (PACS) are being implemented: images from computed radiography, magnetic resonance imaging (MRI), nuclear medicine, and ultrasound are digitized, transmitted, and stored in computers for retrieval at distributed workstations. In electrical impedance tomography, electrodes are placed around the thorax, a 50-kHz current is injected between two electrodes, and voltages are measured on all other electrodes; a computer processes the data to yield an image of the resistivity of a two-dimensional slice of the thorax. During fetal monitoring, a corkscrew electrode is screwed into the fetal scalp to measure the fetal electrocardiogram, and correlations with uterine contractions yield information on the status of the fetus during delivery. To measure cardiac output by thermodilution, cold saline is injected into the right atrium, and a thermistor in the right pulmonary artery yields temperature measurements from which cardiac output can be calculated. In impedance cardiography, we measure the changes in electrical impedance as the heart ejects blood into the arteries; motion artifacts are large, so signal averaging is useful during monitoring. An intra-arterial blood gas monitoring system permits monitoring in real time: light is sent down optical fibers inserted into the radial artery, where it is absorbed by dyes that re-emit it at a different wavelength, and the emitted light travels back up the fibers to an external instrument that determines O2, CO2, and pH. Therapeutic devices include the electrosurgical unit, in which a high-frequency electric arc is drawn between the knife and the tissue; the arc cuts and the heat coagulates, preventing blood loss. Hyperthermia has demonstrated antitumor effects in patients in whom all conventional modes of therapy have failed; methods of raising tumor temperature include focused ultrasound, radio-frequency power delivered through needles, and microwaves. When the heart stops pumping, the defibrillator restores normal pumping: a brief, high-current pulse through the heart synchronizes all cardiac fibers to restore normal rhythm. When the cardiac rhythm is too slow, a cardiac pacemaker is implanted, and an electrode within the heart stimulates the cardiac muscle to contract at the normal rate. When the cardiac valves are narrowed or leak, an artificial valve is implanted; silicone rubber and Teflon are used for biocompatibility. Artificial hearts powered by pneumatic hoses have been implanted in humans.
However, the quality of life gradually degrades, and death ensues. When kidney stones develop, lithotripsy is used: a spark creates a pressure wave that is focused on the stone and fragments it, and the pieces pass out normally. When kidneys fail, the blood is cleansed by hemodialysis, in which urea passes through a porous membrane into a dialysate bath to lower its concentration in the blood. The blind can read by scanning the Optacon with their fingertips: a camera scans letters and converts them to an array of vibrating pins. The deaf can hear using a cochlear implant: a microphone detects sound and divides it into frequency bands, and 22 electrodes within the cochlea stimulate the acoustic nerve to provide sound patterns. For those who have lost muscle function in the limbs, researchers are implanting electrodes to stimulate the muscles; sensors in the legs and arms feed signals back to a computer that coordinates the stimulators to provide limb motion. For those with high spinal cord injury, a puff-and-sip switch can control a computer, permitting the disabled person to operate the computer and communicate with the outside world.
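
The thermodilution measurement above follows the Stewart-Hamilton relation; a minimal sketch with a synthetic dilution curve, where K stands in for the usual density/specific-heat correction factor and all figures are illustrative:

```python
import numpy as np

def cardiac_output(Tb, Ti, Vi, dT_curve, dt, K=1.0):
    """Tb: blood temperature (C), Ti: injectate temperature (C),
    Vi: injectate volume (L), dT_curve: thermistor temperature-drop samples,
    dt: sampling interval (s). Returns cardiac output in L/s."""
    area = float(np.sum(dT_curve) * dt)   # rectangle-rule area under the dilution curve
    return Vi * (Tb - Ti) * K / area

t = np.linspace(0.0, 20.0, 200)               # 20 s of thermistor samples
dT = 0.5 * np.exp(-((t - 6.0) / 3.0) ** 2)    # synthetic dilution curve
print(cardiac_output(37.0, 5.0, 0.01, dT, dt=t[1] - t[0]) * 60.0, "L/min")
```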

Design and Implementation of Multiple Filter Distributed Deduplication System Applying Cuckoo Filter Similarity (쿠쿠 필터 유사도를 적용한 다중 필터 분산 중복 제거 시스템 설계 및 구현)

  • Kim, Yeong-A;Kim, Gea-Hee;Kim, Hyun-Ju;Kim, Chang-Geun
    • Journal of Convergence for Information Technology / v.10 no.10 / pp.1-8 / 2020
  • As technologies based on the data generated by enterprise business activities have become key to business success in recent years, the need for techniques to store, manage, and retrieve such data has emerged. Existing big data platform systems must load large amounts of unstructured data generated in real time without delay, and must manage storage space efficiently by deduplicating redundant data across different storages. In this paper, we propose a multi-layer distributed data deduplication system that applies a similarity measure based on the Cuckoo filter technique, taking the characteristics of big data into account. Similarity between virtual machines is captured with Cuckoo hashing, individual storage nodes improve performance through deduplication efficiency, and a multi-layer Cuckoo filter is applied to reduce processing time. Experimental results show that the proposed method shortens processing time by 8.9% and increases the deduplication rate by 10.3%.
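
A minimal sketch of a basic single-layer Cuckoo filter with 1-byte fingerprints and partial-key relocation; the proposed system layers several such filters across distributed storage nodes:

```python
import hashlib, random

class CuckooFilter:
    """Each item's fingerprint lives in one of two candidate buckets."""
    def __init__(self, n_buckets=1024, bucket_size=4, max_kicks=500):
        self.n, self.size, self.max_kicks = n_buckets, bucket_size, max_kicks
        self.buckets = [[] for _ in range(n_buckets)]

    def _hash(self, data: bytes) -> int:
        return int.from_bytes(hashlib.md5(data).digest()[:4], "big") % self.n

    def _fingerprint(self, item: str) -> bytes:
        return hashlib.sha1(item.encode()).digest()[:1]    # 1-byte fingerprint

    def insert(self, item: str) -> bool:
        fp = self._fingerprint(item)
        i1 = self._hash(item.encode())
        i2 = (i1 ^ self._hash(fp)) % self.n                # partial-key cuckoo hashing
        for i in (i1, i2):
            if len(self.buckets[i]) < self.size:
                self.buckets[i].append(fp); return True
        i = random.choice((i1, i2))                        # evict and relocate
        for _ in range(self.max_kicks):
            j = random.randrange(len(self.buckets[i]))
            fp, self.buckets[i][j] = self.buckets[i][j], fp
            i = (i ^ self._hash(fp)) % self.n              # evictee's alternate bucket
            if len(self.buckets[i]) < self.size:
                self.buckets[i].append(fp); return True
        return False                                       # filter full

    def contains(self, item: str) -> bool:
        fp = self._fingerprint(item)
        i1 = self._hash(item.encode())
        i2 = (i1 ^ self._hash(fp)) % self.n
        return fp in self.buckets[i1] or fp in self.buckets[i2]

cf = CuckooFilter()
cf.insert("chunk-001")
print(cf.contains("chunk-001"), cf.contains("chunk-002"))  # True False (chunk-002 not inserted)
```

In a deduplication pipeline, an incoming chunk whose fingerprint is already present is treated as a likely duplicate and confirmed with a full hash comparison before being discarded, since the filter admits a small false-positive rate.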

Text Filtering using Iterative Boosting Algorithms (반복적 부스팅 학습을 이용한 문서 여과)

  • Hahn, Sang-Youn;Zang, Byoung-Tak
    • Journal of KIISE:Software and Applications / v.29 no.4 / pp.270-277 / 2002
  • Text filtering is the task of deciding whether a document is relevant to a specified topic. As the Internet and the Web become widespread and the number of documents delivered by e-mail grows explosively, the importance of text filtering increases as well. The aim of this paper is to improve the accuracy of text filtering systems by using machine learning techniques. We apply AdaBoost algorithms to the filtering task. An AdaBoost algorithm generates and combines a series of simple hypotheses, each of which decides the relevance of a document to a topic on the basis of whether the document includes a certain word. We begin with an existing AdaBoost algorithm whose weak hypotheses output 1 or -1. We then extend the algorithm to use weak hypotheses with real-valued outputs, a recently proposed variant that improves error reduction rates and final filtering performance. Next, we attempt further improvement by setting the initial weights randomly according to a continuous Poisson distribution, running AdaBoost, repeating these steps several times, and combining all the hypotheses learned. This mitigates the overfitting that may occur when learning from a small amount of data. Experiments were performed on the real document collections used in TREC-8, a well-established text retrieval contest; this dataset includes Financial Times articles from 1992 to 1994. The experimental results show that AdaBoost with real-valued hypotheses outperforms AdaBoost with binary-valued hypotheses, and that AdaBoost iterated with random weights further improves filtering accuracy. A comparison with all participants in the TREC-8 filtering task is also provided.
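
A minimal sketch of boosting with real-valued word stumps in the Schapire-Singer style, plus the iterated random-weight variant; the gamma draw stands in for the paper's continuous Poisson weights, and the smoothing constant and binary doc-term encoding are assumptions:

```python
import numpy as np

def boost(X, y, w, n_rounds=30, smooth=1e-3):
    """X: binary doc-term matrix, y in {-1, +1}, w: initial document weights."""
    w = w / w.sum()
    hyps = []                                     # (word j, c_present, c_absent)
    for _ in range(n_rounds):
        best_z, best = np.inf, None
        for j in range(X.shape[1]):
            mask = X[:, j] == 1
            outs, z = [], 0.0
            for block in (mask, ~mask):
                wp = w[block & (y == 1)].sum()    # weight of positives in block
                wm = w[block & (y == -1)].sum()   # weight of negatives in block
                outs.append(0.5 * np.log((wp + smooth) / (wm + smooth)))
                z += 2.0 * np.sqrt(wp * wm)       # normalizer to minimize
            if z < best_z:
                best_z, best = z, (j, outs[0], outs[1])
        j, cp, ca = best
        w = w * np.exp(-y * np.where(X[:, j] == 1, cp, ca))   # upweight mistakes
        w /= w.sum()
        hyps.append(best)
    return hyps

def score(hyps, X):
    return sum(np.where(X[:, j] == 1, cp, ca) for j, cp, ca in hyps)

def iterated_boost(X, y, n_runs=5, n_rounds=30, seed=0):
    """Re-run boosting from random initial weights (gamma-distributed here,
    standing in for the continuous Poisson) and pool all hypotheses."""
    rng = np.random.default_rng(seed)
    pooled = []
    for _ in range(n_runs):
        pooled += boost(X, y, rng.gamma(1.0, 1.0, size=len(y)), n_rounds)
    return pooled
```

A test document is then classified as relevant when `score(pooled, X_test)` is positive.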