Big Scholarly Data in CiteSeerX: Information Extraction from the Web


We examine CiteSeerX, an intelligent system designed to automatically acquire and organize large-scale collections of scholarly documents from the World Wide Web. From the perspective of automatic information extraction and alternative modes of search, we examine various functional aspects of this complex system with an eye towards ongoing and future research developments.

Proceedings of BigScholar at WWW