Proposal overview (brief version)

This bot will create or amend up to ~10,000 pages corresponding to mammalian genes. Pages will be created in groups of ~100 to ensure page quality. Each new page will be seeded with content from public-domain databases, including the gene's symbol, description, function, genomic location, structure and identifiers. A new page will be created for any gene whose symbol, aliases, and title do not match an existing WP page (e.g., MMP9). Genes that do have such conflicts in the WP namespace will be flagged for manual integration (e.g., Apolipoprotein_E). More details are presented in User:ProteinBoxBot/Ideas. This bot is currently being designed and developed by AndrewGNF and JonSDSUGrad. The list of all pages which were created or edited with ProteinBoxBot content is shown here.
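The create-or-flag rule and the batching described above can be sketched as follows. This is a minimal illustration, not the bot's actual code: the function names, the `existing_titles` set, and the exact-match title comparison are all assumptions made for the example.

```python
def decide_action(symbol, aliases, title, existing_titles):
    """Decide whether a gene page can be created automatically.

    existing_titles is a hypothetical set of page titles already on the wiki.
    Returns ("create", []) when no candidate name is taken, otherwise
    ("flag", conflicts) so the gene can be integrated manually.
    """
    candidates = {symbol, title, *aliases}
    conflicts = sorted(candidates & set(existing_titles))
    if conflicts:
        return "flag", conflicts
    return "create", []


def batches(items, size=100):
    """Yield successive groups of at most `size` items (~100 in the proposal)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

For example, a gene whose title is already taken, such as Apolipoprotein_E, would come back as `"flag"` together with the conflicting title, so a human editor can merge the content by hand.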


Want an enhanced Protein box for your favorite gene? Feel free to add your request below and we'll bump it to the top of the list. If the target gene page already exists, please wikilink it!

ProteinBoxBot requests
Gene Symbol Entrez Gene ID Requestor PBB completion date
SCXB 642658 163.1.64.145 (talk) 11:24, 23 January 2008 (UTC)
RELN 5649 CopperKettle 30 October 2007
SERPINC1 462 K.murphy 17 November 2007
F10 2159 K.murphy 17 November 2007
TYR 7299 K.murphy 17 November 2007
FGF2 2247 K.murphy 2 November 2007
EPAS1 2034 Clockguy 17 November 2007
HIF3A 64344 Clockguy 17 November 2007
OPN4 94233 Clockguy 17 November 2007
USH1C 10083 Willow 17 November 2007
USH1G 124590 Willow 17 November 2007
USH2A 7399 Willow 17 November 2007
CLRN1 7401 Willow 17 November 2007
MYO7A 4647 Willow 17 November 2007
SLC4A7 9497 Willow 17 November 2007
CDH23 64072 Willow 17 November 2007
PCDH15 65217 Willow 17 November 2007
VLGR1 84059 Willow 17 November 2007
RXRB 6257 Boghog2 17 November 2007
RXRG 6258 Boghog2 17 November 2007
PPARD 9235 Boghog2 17 November 2007
ADRA1A 148 AndrewGNF 17 November 2007
ADRA1B 147 AndrewGNF 17 November 2007
ADRA1D 146 AndrewGNF 17 November 2007
ADRA2A 150 AndrewGNF 17 November 2007
ADRA2B 151 AndrewGNF 17 November 2007
ADRA2C 152 AndrewGNF 17 November 2007
ADRB1 153 AndrewGNF 17 November 2007
ADRB2 154 AndrewGNF 17 November 2007
ADRB3 155 AndrewGNF 17 November 2007
ADRBK1 156 AndrewGNF 17 November 2007
ADRBK2 157 AndrewGNF 17 November 2007
MAP2K7 5609 Lihmwiki done
MAP3K1 4214 Lihmwiki done
MAP2K5 5607 Lihmwiki done
Parkin (ligase) 5071 cmcnicoll
Ubiquitin carboxy-terminal hydrolase L1 7345 cmcnicoll done
ATP13A2 23400 cmcnicoll done
HtrA serine peptidase 2 27429 cmcnicoll done
DFNB31 25861 Willow done

Trial Run

A trial run for this bot was approved. The trial was completed and the log is here: User:ProteinBoxBot/PBB_Log_Wiki_Live_Run.

After making quite a few adjustments, a second trial run was completed and the log file is here: User:ProteinBoxBot/PBB_Log_Wiki_Live_Run3_Char_Fix

The bot was subsequently approved and granted bot status.

The eight pages created by the ProteinBoxBot in the trial are:


In addition, these pre-existing pages were supplemented with ProteinBoxBot content in a semi-automated edit:

APP APOE Androgen receptor BRCA1
Bcl-2 P21 P16 Beta-catenin
Epidermal growth factor receptor HER2/neu Estrogen receptor HLA-B
Insulin-like growth factor 1 Interleukin 10 IL1B Interleukin 6
Interleukin 8 CD29 PKC alpha Retinoblastoma protein
Src (gene) Tumor necrosis factor-alpha P53 (protein) Vascular endothelial growth factor
Caspase 3

The discussion of the ProteinBoxBot's trial run is archived at Wikipedia:Bots/Requests_for_approval/ProteinBoxBot.

Logic Flow

The following flowcharts describe the logic of Protein Box Bot:


Protein Box Bot logs its activities extensively.

Protein Box Bot does not always know the exact name of a protein page. This page has been created to help with that.

Protein Box Bot Quick Manual

When dealing with wiki pages, it is often difficult to determine automatically how and what to update, especially for a bot. A group of templates was therefore created to ensure that Protein Box Bot behaves appropriately and does not overwrite any information without permission. The templates provide update options and editing boundaries, and are described below:

Template: PBB_Controls (Required)

PBB_Controls does not display any information on the gene page; its sole purpose is to provide update options for PBB. PBB cannot update a gene page that is missing this template. (See the template page for further details.)
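The "required template" rule could be checked with something like the sketch below. It assumes the template is transcluded in the usual `{{PBB_Controls ...}}` form; the regular expression and the function name are illustrative and are not taken from the bot's source.

```python
import re

# Assumed transclusion form: "{{PBB_Controls" with optional whitespace,
# allowing a space in place of the underscore.
PBB_CONTROLS = re.compile(r"\{\{\s*PBB[_ ]Controls\b", re.IGNORECASE)


def has_pbb_controls(wikitext):
    """Return True if the page text appears to transclude PBB_Controls."""
    return PBB_CONTROLS.search(wikitext) is not None
```

A page for which this returns False would simply be left alone, matching the rule that PBB cannot update a gene page without the template.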

Template: PBB_Summary

This template contains the Entrez summary for the gene. If no summary is available, the template is left blank. It is suggested that a blank template be left on the gene page to provide a location for possible future summary updates. During an update, all information in this template is overwritten. See Template:PBB_Summary for more information.

Template: GNF_Protein_box

GNF_Protein_box is the core template updated by PBB. The majority of the information provided by PBB is placed in this template. While it is possible to exclude this template from a gene page, it is not recommended.
During an update, all information in the protein box is overwritten (even with blank values), with the exception of 'image' and 'image_source', which are carried over into the new box. Only if those fields are blank will the bot try to locate an image. Default image file names follow this format:

PBB_Protein_<protein symbol>_image.jpg

Where <protein symbol> is the actual symbol for the protein (such as PBB_Protein_AKT1_image.jpg).
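The overwrite rule and the default file name can be sketched as below. This is a hedged illustration only: it models a protein box as a plain dictionary of field values, and the function names, as well as the reading that an image lookup happens only when both fields are blank, are assumptions.

```python
# Fields that survive an update, per the description above.
CARRIED_OVER = ("image", "image_source")


def default_image_name(symbol):
    """Build the default image file name for a protein symbol."""
    return f"PBB_Protein_{symbol}_image.jpg"


def merge_protein_box(old_box, new_box):
    """Overwrite every field with the new values, blank ones included,
    but keep non-blank 'image' and 'image_source' from the old box."""
    merged = dict(new_box)
    for field in CARRIED_OVER:
        if old_box.get(field):
            merged[field] = old_box[field]
    return merged


def needs_image_lookup(box):
    """Assumption: the bot looks for an image only when both fields are blank."""
    return not box.get("image") and not box.get("image_source")
```

With these helpers, an existing hand-picked image stays in place across updates, while a page with no image falls back to the `PBB_Protein_<protein symbol>_image.jpg` pattern.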

Template: PBB_Further_reading

PBB_Further_reading is the template that PBB uses to store citation information. All entries within this template are overwritten when PBB does an update.

TAG: No Bots (Optional)

<!-- NO BOT EDITS -->



This tag causes the bot to skip the page without updating it. The tag is optional; the bot does not need it in order to operate.
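A check for this tag might look like the following sketch. It assumes an exact, case-sensitive match on the comment text shown above; the constant and function names are hypothetical.

```python
# The opt-out comment exactly as documented above.
NO_BOT_TAG = "<!-- NO BOT EDITS -->"


def should_skip(wikitext):
    """Return True when the page carries the opt-out comment."""
    return NO_BOT_TAG in wikitext
```

Because the match is exact, a variant such as a lowercase comment would not be recognised under this assumption.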

