Big Scholarly Data in CiteSeerX: Information Extraction from the Web

Abstract

We examine CiteSeerX, an intelligent system designed to automatically acquire and organize large-scale collections of scholarly documents from the World Wide Web. From the perspective of automatic information extraction and modes of alternative search, we review various functional aspects of this complex system with an eye towards ongoing and future research developments.

Publication
Proceedings of BigScholar at WWW