http://dx.doi.org/10.11627/jkise.2018.41.4.189

Extracting Specific Information in Web Pages Using Machine Learning  

Lee, Joung-Yun (Industrial and Management Engineering, Incheon National University)
Kim, Jae-Gon (Industrial and Management Engineering, Incheon National University)
Publication Information
Journal of Korean Society of Industrial and Systems Engineering / v.41, no.4, 2018, pp. 189-195
Abstract
With the advent of the digital age, the production and distribution of web pages have exploded. Internet users frequently need to extract specific information from this vast collection of pages, but finding a particular piece of information across many web pages takes considerable time and effort. Commonly used search engines return web pages that contain the information a user is looking for, yet additional time and effort are still required to locate that information within the extensive search results. It is therefore necessary to develop algorithms that can automatically extract specific information from web pages. Every year, thousands of international conferences are held all over the world. Each conference has a website that provides general information such as the event dates, the venue, a greeting, the abstract submission deadline, and the registration dates. It is not easy for researchers to find the abstract submission deadline quickly because it is displayed in different formats from conference to conference and is frequently updated. This study focuses on extracting abstract submission deadlines from international conference websites. We use three machine learning models (SVM, decision trees, and artificial neural networks) to develop algorithms that extract the abstract submission deadline from an international conference website. The performance of the proposed algorithms is evaluated on 2,200 conference websites.
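The abstract does not state which features the classifiers use, so the following is only a minimal sketch of one plausible preprocessing step: finding candidate dates in page text and turning the surrounding line into a keyword feature vector that a model such as an SVM, a decision tree, or a neural network could then label as "abstract deadline" or "other date". The date pattern, the cue-word list, and the function name `candidate_features` are all assumptions made for illustration, not the authors' method.

```python
import re

# Hypothetical date pattern: matches "March 15, 2019" or "15 March 2019".
MONTHS = (
    r"(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)"
)
DATE_RE = re.compile(
    rf"\b(?:{MONTHS}\s+\d{{1,2}},?\s+\d{{4}}|\d{{1,2}}\s+{MONTHS}\s+\d{{4}})\b"
)

# Hypothetical cue words; the paper's actual feature set is not given
# in the abstract.
CUE_WORDS = ("abstract", "submission", "deadline", "registration", "notification")


def candidate_features(text):
    """Return (date_string, feature_vector) for every date found in text.

    Each feature is 1 if the corresponding cue word appears on the same
    line as the date, else 0.
    """
    rows = []
    for line in text.splitlines():
        lowered = line.lower()
        for match in DATE_RE.finditer(line):
            features = [int(word in lowered) for word in CUE_WORDS]
            rows.append((match.group(0), features))
    return rows


# Made-up page text, used only to exercise the function.
page = """Abstract submission deadline: March 15, 2019
Early registration closes on 30 April 2019
The conference takes place on June 10, 2019 in Incheon"""

rows = candidate_features(page)
for date, features in rows:
    print(date, features)
```

In a setup like this, each feature vector would be paired with a manually assigned label and passed to the chosen classifier for training. Line-level context is a deliberate simplification; real conference pages often place the label and the date in separate table cells or HTML elements.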
Keywords
Data Extraction; Machine Learning; SVM; Decision Tree; Neural Network;