Search | Korea Science

Design and Implementation of Web Crawler utilizing Unstructured data

Tanvir, Ahmed Md.;Chung, Mokdong
- Journal of Korea Multimedia Society / v.22 no.3 / pp.374-385 / 2019
A Web crawler is a program commonly used by search engines to discover new content on the internet. The use of crawlers has made the web easier for users. In this paper, we structure unstructured data in order to collect data from web pages. Our system can select the words near a given keyword across more than one document in an unstructured way; these neighboring words are collected for the keyword through word2vec. The system is designed to filter at the data acquisition level and for a large taxonomy. The main problem in text taxonomy is how to improve classification accuracy. To improve accuracy, we propose a new TF-IDF weighting method, modifying the TF algorithm to calculate the accuracy of unstructured data. Finally, our system proposes an efficient web page crawling algorithm, derived from TF-IDF and the RL Web search algorithm, to make searching for relevant information more efficient. This paper also examines how crawlers and crawling algorithms work in search engines for efficient information retrieval.
https://doi.org/10.9717/kmms.2019.22.3.374
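
As a point of reference for the abstract above, the following is a minimal sketch of textbook TF-IDF weighting. It is not the authors' modified weighting method, which the abstract does not specify; the function name `tf_idf` and the whitespace tokenization are assumptions made for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document using textbook TF-IDF.

    Assumption: documents are plain strings tokenized on whitespace.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # TF is the relative frequency in the document; IDF is log(N / df).
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

docs = [
    "web crawler crawls web pages",
    "search engine ranks pages",
    "crawler finds new pages",
]
weights = tf_idf(docs)
```

A term that appears in every document (here "pages") gets weight zero, while a term concentrated in one document (here "web") is weighted highest.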

Keywords and Spatial Based Indexing for Searching the Things on Web

Faheem, Muhammad R.;Anees, Tayyaba;Hussain, Muzammil
- KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.5 / pp.1489-1515 / 2022
The number of interconnected real-world devices such as sensors, actuators, and physical devices has increased with the advancement of technology. As a result, users face difficulties searching for the location of these devices, and the central issue is the findability of Things. In the WoT environment, keyword-based and geospatial searching approaches are used to locate these devices anywhere and on the web interface. A few static methods of indexing and ranking are discussed in the literature, but they are not suitable for finding devices dynamically. In this paper, the authors propose a mechanism for dynamic and efficient searching of devices. Indexing and ranking approaches can improve dynamic searching in different ways. The paper focuses on indexing to improve dynamic searching and indexes the Things Description in Solr. It presents the Things Description according to the W3C JSON-LD model along with the open-access APIs. Search efficiency can be analyzed with query response timings, and the accuracy of response timings is critical for search results. The authors therefore evaluate their approach by analyzing the search query response timings and the accuracy of the search results. The study uses three indexing approaches: keyword-based, spatial, and hybrid. Results indicate that response time and accuracy are better with the hybrid approach than with the keyword-based and spatial indexing approaches.
https://doi.org/10.3837/tiis.2022.05.005

Motion-based Fast Fractional Motion Estimation Scheme for H.264/AVC (움직임 예측을 이용한 고속 부화소 움직임 추정기)

Lee, Kwang-Woo;SunWoo, Myung-Hoon
- Journal of the Institute of Electronics Engineers of Korea SP / v.45 no.3 / pp.74-79 / 2008
In an H.264/AVC video encoder, motion estimation at fractional-pixel accuracy improves coding efficiency and image quality. However, it requires additional computation for fractional search and interpolation, so reducing the computational complexity of fractional search becomes more important. This paper proposes fast fractional search algorithms that combine the SASR (Simplified Adaptive Search Range) and the MSDSP (Mixed Small Diamond Search Pattern) with the predicted fractional motion vector. Compared with the full search and the prediction-based directional fractional pixel search, the proposed algorithms reduce the number of fractional search points by up to 93.2% and 81%, respectively, with a maximum PSNR loss of less than 0.04 dB. The proposed fast search algorithms are therefore well suited to mobile applications requiring low power and complexity.

Unbounded Binary Search Method for Fast-tracking Maximum Power Point of Photovoltaic Modules

Hong, Yohan;Kim, Yong Sin;Baek, Kwang-Hyun
- IEIE Transactions on Smart Processing and Computing / v.5 no.6 / pp.454-461 / 2016
A maximum power point tracking (MPPT) system with fast tracking time and high power efficiency is presented in this paper. The proposed MPPT system uses an unbounded binary search (UBS) algorithm that continuously tracks the maximum power point (MPP) with a binary system to follow the MPP under rapid-weather-change conditions. The proposed algorithm decides the correct direction for the MPPT system by comparing the previous power point with the present power point. Then, because the MPP is held fixed until the next MPP is found, the MPP voltage does not oscillate, which maximizes the overall power efficiency of the photovoltaic module. With these advantages, the proposed UBS is able to detect the MPP more effectively. The MPPT system is based on a boost converter with a micro-control unit that controls analog-to-digital converters and pulse width modulation. Analysis and experimental results show that the proposed UBS MPPT provides fast, accurate tracking with no oscillation when the weather changes rapidly and shading arises from various sources. The tracking time is reduced by 87.3% and 66.1% under dynamic-state and steady-state operation, respectively, compared with the conventional 7-bit perturb and observe technique.
https://doi.org/10.5573/IEIESPC.2016.5.6.454
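
The idea of choosing a direction by comparing the previous and present power points can be illustrated with a generic step-doubling and step-halving search on a single-peaked power curve. This is only a sketch in the spirit of unbounded (exponential) search, not the authors' UBS algorithm, whose details the abstract does not give; the function `track_peak`, its parameters, and the quadratic power curve are all assumptions.

```python
def track_peak(power, v_start, step=1.0, min_step=1e-3, max_iter=200):
    """Climb toward the peak of a single-peaked curve `power(v)`.

    Hypothetical sketch: the step doubles after an improving move and
    halves (with a direction reversal) after a non-improving one.
    """
    v = v_start
    p = power(v)
    direction = 1
    for _ in range(max_iter):
        if step < min_step:
            break
        v_next = v + direction * step
        p_next = power(v_next)
        if p_next > p:
            # Present power beats previous power: accept and speed up.
            v, p = v_next, p_next
            step *= 2
        else:
            # Power dropped: stay put, reverse direction, and slow down.
            direction = -direction
            step /= 2
    return v

def toy_curve(v):
    # Assumed power curve with its maximum at v = 17.3.
    return 100.0 - (v - 17.3) ** 2

v_peak = track_peak(toy_curve, v_start=0.0)
```

Because a non-improving move is never accepted, the operating point stays fixed until a better one is found, which mirrors the no-oscillation behaviour described in the abstract.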

Knowledge Structure for Cost Estimates Based on Standardized Cost Database (원가산정을 위한 표준분류체계 활용한 지식체계 개발)

Im, Haekyung;Kang, Namhee;Choi, Jaehyun
- Proceedings of the Korean Institute of Building Construction Conference / 2016.05a / pp.235-236 / 2016
The importance of construction management has been increasing because complex construction projects blend several different industries, depending on the characteristics of the construction. This research seeks a method to enhance the efficiency of cost management in construction projects and to meet the need for reusing accumulated construction information. A detailed estimation process and a methodology for using standard unit price information have been developed to strengthen the interoperability of cost information by utilizing a standard classification system. The concept of ontology is proposed as a method of connecting construction information based on a standard breakdown structure, in order to increase the connectivity of cost information in the construction project. Accordingly, a construction information knowledge framework is developed to improve the efficiency of the detailed estimation work process.

A Storage and Retrieval System for Structured SGML Documents using Grove (Grove를 이용한 구조적 SGML문서의 저장 및 검색)

Kim, Hak-Gyoon;Cho, Sung-Bae
- Journal of KIISE:Computing Practices and Letters / v.8 no.5 / pp.501-509 / 2002
SGML (ISO 8879) has proliferated as a way to support various document styles and to transfer documents between platforms. SGML documents carry logical structure information in addition to their contents. As SGML documents become widely used, there is an increasing need for a database storage and retrieval system that uses the logical structure of documents. However, traditional search engines using document indexes cannot exploit the logical structure. In this paper, we have developed an SGML document storage system that is DTD-independent and stores the document type and the document instance separately by using Grove, the document model for DSSSL and HyTime. We have used Object Store, an object-oriented DBMS, to store the structure information without any loss. We have also supported an index structure for search efficiency, as in a relational DBMS, and constructed an effective user interface that combines content-based search with structure-based search.

Real-time Graph Search for Space Exploration (공간 탐사를 위한 실시간 그래프 탐색)

Choi, Eun-Mi;Kim, In-Cheol
- Journal of Intelligence and Information Systems / v.11 no.1 / pp.153-167 / 2005
In this paper, we consider the problem of exploring unknown environments with a mobile robot or an autonomous character agent. Traditionally, research efforts to address the space exploration problem have focused on graph-based space representations and graph search algorithms. Recently EXPLORE, one of the most efficient search algorithms, has been discovered. It traverses at most $\min(mn, d^2+m)$ edges, where d is the deficiency of the graph, m is the number of edges, and n is the number of vertices. In this paper, we propose DFS-RTA* and DFS-PHA*, two real-time graph search algorithms for directing an autonomous agent to explore an unknown space. Like EXPLORE, both algorithms are built upon simple depth-first search (DFS). However, they adopt different real-time shortest path-finding methods, RTA* and PHA* respectively, for fast backtracking to the latest node. Through experiments using Unreal Tournament, a 3D online game environment, and KGBot, an intelligent character agent, we analyze the completeness and efficiency of the two algorithms.

A study on MPEG-7 descriptor combining method using borda count method (Borda count 방법을 이용한 다중 MPEG-7 서술자 조합에 관한 연구)

Eom, Min-Young;Choe, Yoon-Sik
- Journal of the Institute of Electronics Engineers of Korea SP / v.43 no.1 s.307 / pp.39-44 / 2006
In this paper, a method for synthesizing search result lists using the Borda count is proposed for still image retrieval based on MPEG-7 descriptors. MPEG-7 standardizes descriptors that extract feature information from media data. Because a single descriptor often lacks accuracy, using multiple descriptors is suggested to enhance retrieval efficiency. In this paper, retrieval efficiency is enhanced by combining the search results obtained from each descriptor. A newly calculated Borda count method is proposed for combining the search results. In contrast to the current frequency-compensated calculation, rank-aware frequency compensation is used to score an image in the database. This combining method is considered in a content-based image retrieval system with a relevance feedback algorithm that uses high-level information from the system user. In each relevance iteration step, the adaptive Borda count method is used to calculate the score of images.
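
For orientation, the classic Borda count for fusing ranked result lists can be sketched as follows. This is the standard form, not the rank-aware, frequency-compensated variant the abstract proposes; the descriptor names and image ids in the sample data are hypothetical.

```python
from collections import defaultdict

def borda_count(rankings):
    """Fuse ranked lists: an item at position i in a list of length n
    earns n - i points; items missing from a list earn nothing from it."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - position
    # Highest total first; ties are broken by item name for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))

# Hypothetical result lists, one per descriptor (best match first).
by_color = ["img_a", "img_b", "img_c"]
by_edge = ["img_b", "img_a", "img_c"]
by_texture = ["img_b", "img_c", "img_a"]

fused = borda_count([by_color, by_edge, by_texture])
```

Here `img_b` collects 8 points, `img_a` 6, and `img_c` 4, so the fused list ranks `img_b` first even though one descriptor preferred `img_a`.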

A Study on Design of Schema Integration based Biological Information Retrieval System (스키마 통합 기반 생명정보 검색시스템(BIRS) 설계에 관한 연구)

Han, Keon;Lee, Sang-Ho;Ahn, Bu-Young
- Journal of Information Management / v.40 no.1 / pp.217-234 / 2009
In a computer-based virtual lab, a bioscience researcher who wants to obtain bio information first uses a biodiversity-related database to retrieve information on the species, ecology, and distribution of an organism. The researcher also needs to access gene/protein databases such as GenBank or PDB to find information on the organism's genetic sequence and protein structure. Furthermore, the researcher should search for academic papers containing information on the organism so that the research is based on comprehensive and accurate information. This series of activities often undermines research efficiency, as it takes a lot of time and inconveniences researchers. To address this inconvenience, we analyzed various methods for integrated search and chose schema integration. In addition, we analyzed each database and extracted metadata for designing the schema integration. This paper introduces a biological information retrieval system (BIRS) using schema integration, and its interface, which will increase research efficiency for bioscience.
https://doi.org/10.1633/JIM.2009.40.1.217

Resolving the Ambiguities in Word Sense by using Automatic Keyword Network in Information Retrieval (정보검색에서의 어의 중의성 해소를 위한 자동 키워드망의 이용)

Kim, Jung-Sae;Jang, Duk-Sung
- The Transactions of the Korea Information Processing Society / v.7 no.12 / pp.3855-3865 / 2000
Automatic indexing is an essential part of a text retrieval system. However, it cannot by itself rank the appropriate texts at the top. Furthermore, it is even more difficult to keep inappropriate texts containing homonyms from being ranked at the top using automatic indexing alone. In this paper, we propose a two-level retrieval system to enhance retrieval efficiency, in which the Automatic Keyword Network (AKN) is used in the second-level process. The first-level search is carried out with an inverted index file generated by automatic indexing. The second-level search, in turn, exploits the AKN, which is based on the degree of association between terms. We have developed several formulas for rearranging the rank of texts in the second-level search and evaluated their effect on resolving word sense ambiguities.
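
The first-level search described above relies on an inverted index. A minimal sketch of building and querying one is shown below; the second-level AKN re-ranking is not reproduced because the abstract does not give its formulas. The function names, the whitespace tokenization, and the sample documents are assumptions.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term (boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical documents; "bank" is a homonym across documents 1 and 2.
docs = {
    1: "bank of the river",
    2: "bank loan interest",
    3: "river boat",
}
index = build_inverted_index(docs)
hits = search(index, "bank")
```

The query "bank" matches both the river document and the loan document, which is the kind of homonym ambiguity a second-level re-ranking step would need to resolve.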

Search Result 567, Processing Time 0.028 seconds
