http://dx.doi.org/10.7838/jsebs.2017.22.2.061

Academic Conference Categorization According to Subjects Using Topical Information Extraction from Conference Websites  

Lee, Sue Kyoung (Department of Industrial and Management Engineering, Incheon National University)
Kim, Kwanho (Department of Industrial and Management Engineering, Incheon National University)
Publication Information
The Journal of Society for e-Business Studies / v.22, no.2, 2017, pp. 61-77
Abstract
Recently, the amount of academic conference information on the Internet has increased rapidly, and automatic classification of this information according to research subject would enable researchers to find relevant conferences efficiently. Information provided by most conference listing services is limited to title, date, location, and website URL. Among these features, however, only the title contains topical words, which causes an information insufficiency problem. Therefore, we propose methods that aim to resolve this problem by utilizing web contents. Specifically, the proposed methods extract the main contents from an HTML document collected by using the website URL. Based on the similarity between the title of a conference and its main contents, topical keywords are selected to reinforce the important keywords among the main contents. Experiments conducted on a real-world dataset showed that using the additional information extracted from conference websites improves conference classification performance. We plan to further improve the accuracy of conference classification by considering the structure of websites.
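The keyword selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or the weighting scheme, so everything here is an assumption made for illustration: the function names `tokenize` and `topical_keywords`, the small stopword list, the use of exact term overlap with the title as the "similarity", and the `boost` factor are all hypothetical and are not the authors' method.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not the one used in the paper.
STOPWORDS = {"the", "a", "an", "of", "on", "and", "in", "for", "to", "is", "are", "we"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def topical_keywords(title, main_content, boost=2.0, top_k=5):
    """Rank main-content terms, boosting those that also occur in the title.

    Assumption: "similarity to the title" is approximated by exact term
    overlap; the paper may use a different (e.g. semantic) measure.
    """
    title_terms = set(tokenize(title))
    counts = Counter(tokenize(main_content))
    scores = {
        term: freq * (boost if term in title_terms else 1.0)
        for term, freq in counts.items()
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


title = "International Conference on Text Mining"
content = (
    "The conference welcomes papers on text mining, data mining, "
    "and machine learning. Submit papers by May."
)
result = topical_keywords(title, content)
print(result)
```

In this toy input, terms shared with the title ("mining", "conference", "text") receive a higher weight than equally frequent terms that appear only on the website, which is the reinforcement effect the abstract describes. The weighted terms could then serve as features for a text classifier.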
Keywords
Academic Conference Classification; Topical Information Extraction; Text Mining; Text Categorization; Web Contents Analysis;