CiteSeer was a public search engine and digital library for scientific and academic papers. It is often considered the first automated citation indexing system and a predecessor of academic search tools such as Google Scholar and Microsoft Academic Search. It was replaced by CiteSeerx, and all queries to CiteSeer are redirected to it. It was created by researchers Steve Lawrence, Kurt Bollacker and Lee Giles while they were at the NEC Research Institute (now NEC Labs), Princeton, New Jersey, USA. CiteSeer's goal was to actively crawl and harvest academic and scientific documents on the web and to use autonomous citation indexing to permit querying by citation or by document, ranking results by citation impact. After leaving NEC, it was hosted as CiteSeer.IST at the College of Information Sciences and Technology, The Pennsylvania State University, where it held over 700,000 documents, primarily in the fields of computer and information science and engineering.
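The idea of a citation index can be illustrated with a minimal sketch. This is not CiteSeer's actual implementation: the function names and the toy data are hypothetical, and "citation impact" is approximated here simply as the number of citing papers.

```python
from collections import defaultdict


def build_citation_index(references):
    """Invert {paper: [papers it cites]} into {cited paper: set of citing papers}."""
    cited_by = defaultdict(set)
    for paper, cited in references.items():
        for target in cited:
            cited_by[target].add(paper)
    return dict(cited_by)


def rank_by_citations(references):
    """Return all papers ordered by citation count (descending), ties by name."""
    cited_by = build_citation_index(references)
    papers = set(references) | set(cited_by)
    return sorted(papers, key=lambda p: (-len(cited_by.get(p, ())), p))


# Toy reference lists (hypothetical): paper -> papers it cites.
toy_references = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": [],
    "D": ["C", "B"],
}

print(build_citation_index(toy_references)["B"])  # papers that cite B
print(rank_by_citations(toy_references))          # most cited first
```

Querying "by citation" then amounts to looking up a paper in the inverted index, while ranking sorts papers by how many others cite them.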

CiteSeer freely provided Open Archives Initiative metadata for all indexed documents and, when possible, linked indexed documents to other sources of metadata such as DBLP and the ACM Portal.
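Open Archives Initiative metadata is typically harvested through OAI-PMH requests, which are plain HTTP queries carrying a `verb` parameter. The sketch below only builds such a request URL; the base URL is a placeholder assumption, not CiteSeer's actual endpoint, and the helper name is hypothetical.

```python
from urllib.parse import urlencode


def oai_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a given repository base URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional selective harvesting by set, as defined by OAI-PMH.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Placeholder endpoint (assumption): replace with a real repository's base URL.
print(oai_list_records_url("https://example.org/oai2"))
```

A harvester would fetch this URL and parse the returned XML, which in the `oai_dc` format carries Dublin Core fields such as title and creator.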

CiteSeer's goal was to improve the dissemination of and access to academic and scientific literature. As a non-profit service that could be freely used by anyone, it has been considered part of the open access movement, which is attempting to change academic and scientific publishing to allow greater access to scientific literature.

The name can be construed in at least two ways. As a pun, a 'sightseer' is a tourist who looks at the sights, so a 'cite seer' would be a researcher who looks at cited papers. Alternatively, a 'seer' is a prophet, so a 'cite seer' is a prophet of citations.

CiteSeer had not been comprehensively updated since 2005 due to limitations in its architecture design. It had a representative sampling of research documents in computer and information science but was limited in coverage because it only had access to papers that were publicly available, usually at an author's homepage, or those submitted by an author. To overcome these limitations, a modular and open source architecture for CiteSeer was designed.

The new version and design of CiteSeer can be found at the Next Generation CiteSeer (CiteSeerx) website. CiteSeer-like engines and archives usually harvest documents only from publicly available websites and do not crawl publisher websites. As a result, authors whose documents are freely available are more likely to be represented in the index.

Recent developments

Other CiteSeer engines

The CiteSeer model had been extended to cover academic documents in business with SmealSearch and in e-business with eBizSearch. However, these were not maintained by their sponsors. Older versions of both could once be found at BizSeer.IST but are no longer in service. For enhanced access and performance, similar versions of CiteSeer were supported at universities such as the Massachusetts Institute of Technology, the University of Zürich and the National University of Singapore. However, these versions of CiteSeer proved difficult to maintain and are no longer available.

Versions of CiteSeer have been or are available at the following links:

Other Seer-like search and repository systems have been built for chemistry (ChemXSeer) and for archaeology (ArchSeer). Another, BotSeer, was built for searching robots.txt files. All of these are built on the open source tool SeerSuite, which uses the open source indexer Lucene.

Next Generation CiteSeer (CiteSeerx)

CiteSeerX (stylized as CiteSeerx[1]) is a public search engine, digital library and repository for scientific and academic papers with a focus on computer and information science.[1] It is loosely based on the previous CiteSeer search engine and digital library and is built with a new open source infrastructure, SeerSuite, and new algorithms and their implementations. It was developed by researchers Dr. Isaac Councill and Dr. C. Lee Giles at the College of Information Sciences and Technology, Pennsylvania State University. It continues to support the goals outlined by CiteSeer: to actively crawl and harvest academic and scientific documents on the public web, and to use a citation index to permit querying by citation and ranking of documents by citation impact. Lee Giles, Prasenjit Mitra, Susan Gauch, Min-Yen Kan, Pradeep Teregowda, Juan Pablo Fernández Ramírez, Pucktada Treeratpituk, and Shuyi Zheng are or have been actively involved in its development. Recently, a table search feature was introduced.[2] CiteSeerX was funded by the National Science Foundation and Microsoft Research.

CiteSeerX continues to be rated as one of the world's top repositories and was rated number 1 in July 2010.[3] It currently has over 1.5 million documents with nearly 1.5 million unique authors and 30 million citations.

CiteSeerX also shares its software, data, databases and metadata with other researchers, currently by rsync.[4] Its new modular open source architecture and software (available on SourceForge) are built on Apache Solr and other Apache and open source tools, which allows it to serve as a testbed for new algorithms in document harvesting, ranking, indexing, and information extraction.

See also


References

  1. About CiteSeerX. URL accessed on 2010-05-07.
  2. The CiteSeerX Team. Pennsylvania State University. URL accessed on 2010-07-24.
  3. Ranking Web of World Repositories: Top 800 Repositories. Cybermetrics Lab. URL accessed on 2010-07-24.
  4. About CiteSeerX Metadata. Pennsylvania State University. URL accessed on 2010-07-24.

External links




This page uses Creative Commons Licensed content from Wikipedia.