Proposal overview (brief version)
This bot will create or amend up to ~10,000 pages corresponding to mammalian genes. Pages will be created in groups of ~100 to ensure page quality. Each new page will be seeded with content from public-domain databases. This content will include the gene's symbol, description, function, genomic location, structure, and identifiers. Pages will be created for genes that have no existing WP page under their symbol, aliases, or title (e.g., MMP9). Genes that do have such conflicts in the WP namespace will be flagged for manual integration (e.g., Apolipoprotein_E). More details are presented in User:ProteinBoxBot/Ideas. This bot is currently being designed and developed by AndrewGNF and JonSDSUGrad. The list of all pages which were created or edited with ProteinBoxBot content is shown here.
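The create-or-flag decision described above can be sketched roughly as follows. This is only an illustration, not the bot's actual code: the function names, the returned action strings, and the idea of passing in a set of existing page titles are all assumptions. The only wiki behaviour relied on is that underscores and spaces are interchangeable in page titles and that the first letter is capitalised.

```python
from typing import Iterable, List, Tuple


def normalize_title(title: str) -> str:
    """Normalise a title the way the wiki treats it: underscores equal
    spaces, and the first character is upper-cased."""
    cleaned = title.replace("_", " ").strip()
    return cleaned[:1].upper() + cleaned[1:]


def triage_gene(symbol: str,
                aliases: Iterable[str],
                title: str,
                existing_pages: Iterable[str]) -> Tuple[str, List[str]]:
    """Return ("create", []) when no candidate name is taken, otherwise
    ("flag_for_manual_integration", [conflicting names]).

    Hypothetical helper: the action strings and signature are assumptions.
    """
    existing = {normalize_title(page) for page in existing_pages}
    candidates = [symbol, *aliases, title]
    conflicts = [name for name in candidates
                 if normalize_title(name) in existing]
    action = "flag_for_manual_integration" if conflicts else "create"
    return action, conflicts
```

With a set of existing titles that contains "Apolipoprotein E", a gene titled Apolipoprotein_E would be flagged, while a gene such as MMP9 with no matching page would be created.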
Want an enhanced Protein box for your favorite gene? Feel free to add your request below and we'll bump it to the top of the list. If the target gene page already exists, please wikilink it!
|Gene Symbol||Entrez Gene ID||Requestor||PBB completion date|
|SCXB||642658||18.104.22.168 (talk) 11:24, 23 January 2008 (UTC)|
|RELN||5649||CopperKettle||30 October 2007|
|SERPINC1||462||K.murphy||17 November 2007|
|F10||2159||K.murphy||17 November 2007|
|TYR||7299||K.murphy||17 November 2007|
|FGF2||2247||K.murphy||2 November 2007|
|EPAS1||2034||Clockguy||17 November 2007|
|HIF3A||64344||Clockguy||17 November 2007|
|OPN4||94233||Clockguy||17 November 2007|
|USH1C||10083||Willow||17 November 2007|
|USH1G||124590||Willow||17 November 2007|
|USH2A||7399||Willow||17 November 2007|
|CLRN1||7401||Willow||17 November 2007|
|MYO7A||4647||Willow||17 November 2007|
|SLC4A7||9497||Willow||17 November 2007|
|CDH23||64072||Willow||17 November 2007|
|PCDH15||65217||Willow||17 November 2007|
|VLGR1||84059||Willow||17 November 2007|
|RXRB||6257||Boghog2||17 November 2007|
|RXRG||6258||Boghog2||17 November 2007|
|PPARD||9235||Boghog2||17 November 2007|
|ADRA1A||148||AndrewGNF||17 November 2007|
|ADRA1B||147||AndrewGNF||17 November 2007|
|ADRA1D||146||AndrewGNF||17 November 2007|
|ADRA2A||150||AndrewGNF||17 November 2007|
|ADRA2B||151||AndrewGNF||17 November 2007|
|ADRA2C||152||AndrewGNF||17 November 2007|
|ADRB1||153||AndrewGNF||17 November 2007|
|ADRB2||154||AndrewGNF||17 November 2007|
|ADRB3||155||AndrewGNF||17 November 2007|
|ADRBK1||156||AndrewGNF||17 November 2007|
|ADRBK2||157||AndrewGNF||17 November 2007|
|Ubiquitin carboxy-terminal hydrolase L1||7345||cmcnicoll||done|
|HtrA serine peptidase 2||27429||cmcnicoll||done|
After making quite a few adjustments, a second trial run was completed and the log file is here: User:ProteinBoxBot/PBB_Log_Wiki_Live_Run3_Char_Fix
The eight pages created by the ProteinBoxBot in the trial are:
In addition, these pre-existing pages were supplemented with ProteinBoxBot content in a semi-automated edit:
The discussion of the ProteinBoxBot's trial run is archived at Wikipedia:Bots/Requests_for_approval/ProteinBoxBot.
Logic Flow
Protein Box Bot logs its activities extensively.
- Bot Log File: User:ProteinBoxBot/PBB_Log_Index
Protein Box Bot does not always know the exact name of a protein page. The following directory page has been created to help with that.
- Bot Page Directory: User:ProteinBoxBot/Protein_Directory
Protein Box Bot Quick Manual
When dealing with wiki pages, it is often difficult to determine automatically how and what to update, especially for a bot. Therefore, a group of templates was created to ensure that Protein Box Bot behaves appropriately and does not overwrite any information without permission. The templates provide update options and editing boundaries, and are described below:
Template: PBB_Controls (Required)
PBB_Controls does not display any information on the gene page; its sole purpose is to provide update options for PBB. PBB cannot update a gene page that is missing this template. (See the template page for further details.)
Template: PBB_Summary
This template contains the Entrez summary for the gene. If no summary is available, the template is left blank. It is suggested that a blank template be left on the gene page to provide a location for possible future summary updates. During an update, all information in this template is overwritten. See Template:PBB_Summary for more information.
The GNF_Protein_Box is the core template updated by PBB. The majority of the information provided by PBB is placed in this template. While it is possible to exclude this template from a gene page, it is not recommended.
During an update, all information in the protein box is overwritten (even with blank values), with the exception of 'image' and 'image_source', which are carried over into the new box. Only if those fields are blank will the bot try to locate an image. Default image file names follow this format: PBB_Protein_<protein symbol>_image.jpg
Where <protein symbol> is the actual symbol for the protein (such as PBB_Protein_AKT1_image.jpg).
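The overwrite rule for the protein box can be illustrated with a small sketch. It is not the bot's implementation: the dictionary representation of the box, the function names, and the `image_exists` callback (a stand-in for however the bot actually looks for an image) are assumptions made for the example. The file-name pattern is inferred from the PBB_Protein_AKT1_image.jpg example.

```python
from typing import Callable, Dict

# Fields that the text above says are carried over from the old box.
PRESERVED_FIELDS = ("image", "image_source")


def default_image_name(symbol: str) -> str:
    """Default image file name, e.g. PBB_Protein_AKT1_image.jpg."""
    return f"PBB_Protein_{symbol}_image.jpg"


def merge_protein_box(old: Dict[str, str],
                      new: Dict[str, str],
                      symbol: str,
                      image_exists: Callable[[str], bool] = lambda name: False
                      ) -> Dict[str, str]:
    """Build the updated box: new values always win (even blank ones),
    except that 'image' and 'image_source' come from the old box.

    `image_exists` is a hypothetical lookup; the default finds nothing.
    """
    merged = dict(new)  # every other field is overwritten, blanks included
    for field in PRESERVED_FIELDS:
        merged[field] = old.get(field, "")
    # Only when both preserved fields are blank is a default image tried.
    if not any(merged[field].strip() for field in PRESERVED_FIELDS):
        candidate = default_image_name(symbol)
        if image_exists(candidate):
            merged["image"] = candidate
    return merged
```

In this sketch a hand-picked image survives every update, while a page with no image picks up the default file only if that file can be found.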
PBB_Further_reading is the template that PBB uses to store citation information. All entries within this template are overwritten when PBB does an update.
TAG: No Bots (Optional)
<!-- NO BOT EDITS -->
This tag causes the bot to skip the page entirely. It is optional: the bot does not need it to operate, and it should be added only to pages that the bot must not update.
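Taken together, the PBB_Controls requirement and the no-bots tag amount to a simple gate that a page must pass before it is updated. A minimal sketch of such a gate is shown below. The function name and the regular expressions are assumptions; in particular, the tolerance for extra whitespace and letter case is a guess, and the real bot may match these markers more strictly.

```python
import re

# Assumed matching rules: whitespace- and case-tolerant (not confirmed).
NO_BOT_TAG = re.compile(r"<!--\s*NO BOT EDITS\s*-->", re.IGNORECASE)
CONTROLS_TEMPLATE = re.compile(r"\{\{\s*PBB[_ ]Controls\b", re.IGNORECASE)


def may_update(wikitext: str) -> bool:
    """Return True only if the page opts in and has not opted out.

    Hypothetical helper: a page with the NO BOT EDITS comment is always
    skipped; otherwise the PBB_Controls template must be present.
    """
    if NO_BOT_TAG.search(wikitext):
        return False
    return CONTROLS_TEMPLATE.search(wikitext) is not None
```

Checking the opt-out tag first means that it wins even on a page that still carries PBB_Controls.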