Frequently Asked Questions: Assembly Releases and Versions
List of UCSC genome releases
How do UCSC's release numbers correspond to those of other organizations, such as
NCBI?
The first release of an assembly is named using the first three letters of the organism's genus
followed by the first three letters of its species, in the format gggSss#. Subsequent assemblies
increment the number. Assemblies predating the 2003 introduction of the six-letter naming system
were given two-letter names in a similar gs# format; human assemblies are named hg# for "human genome".
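The six-letter rule can be sketched in a few lines of Python. The function name `ucsc_db_name` is hypothetical (it is not a UCSC tool), and the sketch covers only the gggSss# convention, not the older two-letter names such as hg# or mm#. Some names in the table below (caePb2, for example) do not follow the rule exactly, so treat this as an illustration rather than a lookup method.

```python
def ucsc_db_name(genus: str, species: str, version: int) -> str:
    """Build a gggSss# style name (illustrative helper, not a UCSC API)."""
    if len(genus) < 3 or len(species) < 3:
        raise ValueError("genus and species need at least three letters each")
    if version < 1:
        raise ValueError("version numbers start at 1")
    # First three letters of the genus in lowercase, then the first three
    # letters of the species with an initial capital, then the version.
    return genus[:3].lower() + species[:3].capitalize() + str(version)


print(ucsc_db_name("Pan", "troglodytes", 5))   # panTro5
print(ucsc_db_name("Felis", "catus", 8))       # felCat8
```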
SPECIES | UCSC VERSION | RELEASE DATE | RELEASE NAME | STATUS |
MAMMALS | | | | |
Human | hg38 | Dec. 2013 | Genome Reference Consortium GRCh38 | Available |
| hg19 | Feb. 2009 | Genome Reference Consortium GRCh37 | Available |
| hg18 | Mar. 2006 | NCBI Build 36.1 | Available |
| hg17 | May 2004 | NCBI Build 35 | Available |
| hg16 | Jul. 2003 | NCBI Build 34 | Available |
| hg15 | Apr. 2003 | NCBI Build 33 | Archived |
| hg13 | Nov. 2002 | NCBI Build 31 | Archived |
| hg12 | Jun. 2002 | NCBI Build 30 | Archived |
| hg11 | Apr. 2002 | NCBI Build 29 | Archived (data only) |
| hg10 | Dec. 2001 | NCBI Build 28 | Archived (data only) |
| hg8 | Aug. 2001 | UCSC-assembled | Archived (data only) |
| hg7 | Apr. 2001 | UCSC-assembled | Archived (data only) |
| hg6 | Dec. 2000 | UCSC-assembled | Archived (data only) |
| hg5 | Oct. 2000 | UCSC-assembled | Archived (data only) |
| hg4 | Sep. 2000 | UCSC-assembled | Archived (data only) |
| hg3 | Jul. 2000 | UCSC-assembled | Archived (data only) |
| hg2 | Jun. 2000 | UCSC-assembled | Archived (data only) |
| hg1 | May 2000 | UCSC-assembled | Archived (data only) |
Alpaca | vicPac2 | Mar. 2013 | Broad Institute Vicugna_pacos-2.0.1 | Available |
| vicPac1 | Jul. 2008 | Broad Institute VicPac1.0 | Available |
Armadillo | dasNov3 | Dec. 2011 | Broad Institute DasNov3 | Available |
Baboon | papHam1 | Nov. 2008 | Baylor College of Medicine HGSC Pham_1.0 | Available |
| papAnu2 | Mar. 2012 | Baylor College of Medicine Panu_2.0 | Available |
Bonobo | panPan1 | May 2012 | Max-Planck Institute panpan1 | Available |
Brown kiwi | aptMan1 | Jun. 2015 | Max-Planck Institute for Evolutionary Anthropology AptMant0 | Available |
Bushbaby | otoGar3 | Mar. 2011 | Broad Institute OtoGar3 | Available |
Cat | felCat8 | Nov. 2014 | ICGSC Felis_catus_8.0 | Available |
| felCat5 | Sep. 2011 | ICGSC Felis_catus-6.2 | Available |
| felCat4 | Dec. 2008 | NHGRI catChrV17e | Available |
| felCat3 | Mar. 2006 | Broad Institute Release 3 | Available |
Chimp | panTro5 | May 2016 | CGSC Build 3.0 | Available |
| panTro4 | Feb. 2011 | CGSC Build 2.1.4 | Available |
| panTro3 | Oct. 2010 | CGSC Build 2.1.3 | Available |
| panTro2 | Mar. 2006 | CGSC Build 2.1 | Available |
| panTro1 | Nov. 2003 | CGSC Build 1.1 | Available |
Chinese hamster | criGri1 | Jul. 2013 | Beijing Genomics Institute-Shenzhen C_griseus_v1.0 | Available |
Chinese pangolin | manPen1 | Aug. 2014 | Washington University (WashU) M_pentadactyla-1.1.1 | Available |
Cow | bosTau8 | Jun. 2014 | University of Maryland v3.1.1 | Available |
| bosTau7 | Oct. 2011 | Baylor College of Medicine HGSC Btau_4.6.1 | Available |
| bosTau6 | Nov. 2009 | University of Maryland v3.1 | Available |
| bosTau4 | Oct. 2007 | Baylor College of Medicine HGSC Btau_4.0 | Available |
| bosTau3 | Aug. 2006 | Baylor College of Medicine HGSC Btau_3.1 | Available |
| bosTau2 | Mar. 2005 | Baylor College of Medicine HGSC Btau_2.0 | Available |
| bosTau1 | Sep. 2004 | Baylor College of Medicine HGSC Btau_1.0 | Archived |
Crab-eating macaque | macFas5 | Jun. 2013 | Washington University Macaca_fascicularis_5.0 | Available |
Dog | canFam3 | Sep. 2011 | Broad Institute v3.1 | Available |
| canFam2 | May 2005 | Broad Institute v2.0 | Available |
| canFam1 | Jul. 2004 | Broad Institute v1.0 | Available |
Dolphin | turTru2 | Oct. 2011 | Baylor College of Medicine Ttru_1.4 | Available |
Elephant | loxAfr3 | Jul. 2009 | Broad Institute LoxAfr3 | Available |
Ferret | musFur1 | Apr. 2011 | Ferret Genome Sequencing Consortium MusPutFur1.0 | Available |
Gibbon | nomLeu3 | Oct. 2012 | Gibbon Genome Sequencing Consortium Nleu3.0 | Available |
| nomLeu2 | Jun. 2011 | Gibbon Genome Sequencing Consortium Nleu1.1 | Available |
| nomLeu1 | Jan. 2010 | Gibbon Genome Sequencing Consortium Nleu1.0 | Available |
Golden eagle | aquChr2 | Oct. 2014 | University of Washington aquChr2-1.0.2 | Available |
Gorilla | gorGor5 | Mar. 2016 | University of Washington GSMRT3 | Available |
| gorGor4 | Dec. 2014 | Wellcome Trust Sanger Institute gorGor4 | Available |
| gorGor3 | May 2011 | Wellcome Trust Sanger Institute gorGor3.1 | Available |
Green Monkey | chlSab2 | Mar. 2014 | Vervet Genomics Consortium 1.1 | Available |
Guinea pig | cavPor3 | Feb. 2008 | Broad Institute cavPor3 | Available |
Hedgehog | eriEur2 | May 2012 | Broad Institute EriEur2.0 | Available |
| eriEur1 | Jun. 2006 | Broad Institute Draft_v1 | Available |
Horse | equCab2 | Sep. 2007 | Broad Institute EquCab2 | Available |
| equCab1 | Jan. 2007 | Broad Institute EquCab1 | Available |
Kangaroo rat | dipOrd1 | Jul. 2008 | Baylor/Broad Institute DipOrd1.0 | Available |
Malayan flying lemur | galVar1 | Jul. 2014 | WashU G_variegatus-3.0.2 | Available |
Manatee | triMan1 | Oct. 2011 | Broad Institute TriManLat1.0 | Available |
Marmoset | calJac3 | Mar. 2009 | WUSTL Callithrix_jacchus-v3.2 | Available |
| calJac1 | Jun. 2007 | WUSTL Callithrix_jacchus-v2.0.2 | Available |
Megabat | pteVam1 | Jul. 2008 | Broad Institute Ptevap1.0 | Available |
Microbat | myoLuc2 | Jul. 2010 | Broad Institute MyoLuc2.0 | Available |
Minke whale | balAcu1 | Oct. 2013 | KORDI BalAcu1.0 | Available |
Mouse | mm10 | Dec. 2011 | Genome Reference Consortium GRCm38 | Available |
| mm9 | Jul. 2007 | NCBI Build 37 | Available |
| mm8 | Feb. 2006 | NCBI Build 36 | Available |
| mm7 | Aug. 2005 | NCBI Build 35 | Available |
| mm6 | Mar. 2005 | NCBI Build 34 | Archived |
| mm5 | May 2004 | NCBI Build 33 | Archived |
| mm4 | Oct. 2003 | NCBI Build 32 | Archived |
| mm3 | Feb. 2003 | NCBI Build 30 | Archived |
| mm2 | Feb. 2002 | MGSCv3 | Archived |
| mm1 | Nov. 2001 | MGSCv2 | Archived (data only) |
Mouse lemur | micMur2 | May 2015 | Baylor/Broad Institute Mmur_2.0 | Available |
| micMur1 | Jul. 2007 | Broad Institute MicMur1.0 | Available |
Naked mole-rat | hetGla2 | Jan. 2012 | Broad Institute HetGla_female_1.0 | Available |
| hetGla1 | Jul. 2011 | Beijing Genomics Institute HetGla_1.0 | Available |
Opossum | monDom5 | Oct. 2006 | Broad Institute release MonDom5 | Available |
| monDom4 | Jan. 2006 | Broad Institute release MonDom4 | Available |
| monDom1 | Oct. 2004 | Broad Institute release MonDom1 | Available |
Orangutan | ponAbe2 | Jul. 2007 | WUSTL Pongo_abelii-2.0.2 | Available |
Panda | ailMel1 | Dec. 2009 | BGI-Shenzhen AilMel 1.0 | Available |
Pig | susScr3 | Aug. 2011 | Swine Genome Sequencing Consortium Sscrofa10.2 | Available |
| susScr2 | Nov. 2009 | Swine Genome Sequencing Consortium Sscrofa9.2 | Available |
Pika | ochPri3 | May 2012 | Broad Institute OchPri3.0 | Available |
| ochPri2 | Jul. 2008 | Broad Institute OchPri2 | Available |
Platypus | ornAna2 | Feb. 2007 | WUSTL v5.0.1 | Available |
| ornAna1 | Mar. 2007 | WUSTL v5.0.1 | Available |
Proboscis Monkey | nasLar1 | Nov. 2014 | Proboscis Monkey Functional Genome Consortium Charlie1.0 | Available |
Rabbit | oryCun2 | Apr. 2009 | Broad Institute release OryCun2 | Available |
Rat | rn6 | Jul. 2014 | RGSC Rnor_6.0 | Available |
| rn5 | Mar. 2012 | RGSC Rnor_5.0 | Available |
| rn4 | Nov. 2004 | Baylor College of Medicine HGSC v3.4 | Available |
| rn3 | Jun. 2003 | Baylor College of Medicine HGSC v3.1 | Available |
| rn2 | Jan. 2003 | Baylor College of Medicine HGSC v2.1 | Archived |
| rn1 | Nov. 2002 | Baylor College of Medicine HGSC v1.0 | Archived |
Rhesus | rheMac8 | Nov. 2015 | Baylor College of Medicine HGSC Mmul_8.0.1 | Available |
| rheMac3 | Oct. 2010 | Beijing Genomics Institute CR_1.0 | Available |
| rheMac2 | Jan. 2006 | Baylor College of Medicine HGSC v1.0 Mmul_051212 | Available |
| rheMac1 | Jan. 2005 | Baylor College of Medicine HGSC Mmul_0.1 | Archived |
Rock hyrax | proCap1 | Jul. 2008 | Baylor College of Medicine HGSC Procap1.0 | Available |
Sheep | oviAri3 | Aug. 2012 | ISGC Oar_v3.1 | Available |
| oviAri1 | Feb. 2010 | ISGC Ovis aries 1.0 | Available |
Shrew | sorAra2 | Aug. 2008 | Broad Institute SorAra2.0 | Available |
| sorAra1 | Jun. 2006 | Broad Institute SorAra1.0 | Available |
Sloth | choHof1 | Jul. 2008 | Broad Institute ChoHof1.0 | Available |
Squirrel | speTri2 | Nov. 2011 | Broad Institute SpeTri2.0 | Available |
Squirrel monkey | saiBol1 | Oct. 2011 | Broad Institute SaiBol1.0 | Available |
Tarsier | tarSyr2 | Sep. 2013 | WashU Tarsius_syrichta-2.0.1 | Available |
| tarSyr1 | Aug. 2008 | WUSTL/Broad Institute Tarsyr1.0 | Available |
Tasmanian devil | sarHar1 | Feb. 2011 | Wellcome Trust Sanger Institute Devil_refv7.0 | Available |
Tenrec | echTel2 | Nov. 2012 | Broad Institute EchTel2.0 | Available |
| echTel1 | Jul. 2005 | Broad Institute echTel1 | Available |
Tree shrew | tupBel1 | Dec. 2006 | Broad Institute Tupbel1.0 | Available |
Wallaby | macEug2 | Sep. 2009 | Tammar Wallaby Genome Sequencing Consortium Meug_1.1 | Available |
White rhinoceros | cerSim1 | May 2012 | Broad Institute CerSimSim1.0 | Available |
| | | | |
VERTEBRATES | | | | |
American alligator | allMis1 | Aug. 2012 | Int. Crocodilian Genomes Working Group allMis0.2 | Available |
Atlantic cod | gadMor1 | May 2010 | Genofisk GadMor_May2010 | Available |
Budgerigar | melUnd1 | Sep. 2011 | WUSTL v6.3 | Available |
Chicken | galGal5 | Dec. 2015 | ICGC Gallus-gallus-5.0 | Available |
| galGal4 | Nov. 2011 | ICGC Gallus-gallus-4.0 | Available |
| galGal3 | May 2006 | WUSTL Gallus-gallus-2.1 | Available |
| galGal2 | Feb. 2004 | WUSTL Gallus-gallus-1.0 | Available |
Coelacanth | latCha1 | Aug. 2011 | Broad Institute LatCha1 | Available |
Elephant shark | calMil1 | Dec. 2013 | IMCB Callorhinchus_milii_6.1.3 | Available |
Fugu | fr3 | Oct. 2011 | JGI v5.0 | Available |
| fr2 | Oct. 2004 | JGI v4.0 | Available |
| fr1 | Aug. 2002 | JGI v3.0 | Available |
Lamprey | petMar2 | Sep. 2010 | WUGSC 7.0 | Available |
| petMar1 | Mar. 2007 | WUSTL v3.0 | Available |
Lizard | anoCar2 | May 2010 | Broad Institute AnoCar2 | Available |
| anoCar1 | Feb. 2007 | Broad Institute AnoCar1 | Available |
Medaka | oryLat2 | Oct. 2005 | NIG v1.0 | Available |
Medium ground finch | geoFor1 | Apr. 2012 | BGI GeoFor_1.0 / NCBI 13302 | Available |
Nile tilapia | oreNil2 | Jan. 2011 | Broad Institute Release OreNil1.1 | Available |
Painted turtle | chrPic1 | Dec. 2011 | IPTGSC Chrysemys_picta_bellii-3.0.1 | Available |
Stickleback | gasAcu1 | Feb. 2006 | Broad Institute Release 1.0 | Available |
Tetraodon | tetNig2 | Mar. 2007 | Genoscope v7 | Available |
| tetNig1 | Feb. 2004 | Genoscope v7 | Available |
Tibetan frog | nanPar1 | Mar. 2015 | Beijing Genomics Institute BGI_ZX_2015 | Available |
Turkey | melGal5 | Nov. 2014 | Turkey Genome Consortium v5.0 | Available |
| melGal1 | Dec. 2009 | Turkey Genome Consortium v2.01 | Available |
X. tropicalis | xenTro7 | Sep. 2012 | JGI v.7.0 | Available |
| xenTro3 | Nov. 2009 | JGI v.4.2 | Available |
| xenTro2 | Aug. 2005 | JGI v.4.1 | Available |
| xenTro1 | Oct. 2004 | JGI v.3.0 | Available |
Zebra finch | taeGut2 | Feb. 2013 | WashU taeGut324 | Available |
| taeGut1 | Jul. 2008 | WUSTL v3.2.4 | Available |
Zebrafish | danRer10 | Sep. 2014 | Genome Reference Consortium GRCz10 | Available |
| danRer7 | Jul. 2010 | Sanger Institute Zv9 | Available |
| danRer6 | Dec. 2008 | Sanger Institute Zv8 | Available |
| danRer5 | Jul. 2007 | Sanger Institute Zv7 | Available |
| danRer4 | Mar. 2006 | Sanger Institute Zv6 | Available |
| danRer3 | May 2005 | Sanger Institute Zv5 | Available |
| danRer2 | Jun. 2004 | Sanger Institute Zv4 | Archived |
| danRer1 | Nov. 2003 | Sanger Institute Zv3 | Archived |
| | | | |
DEUTEROSTOMES | | | | |
C. intestinalis | ci3 | Apr. 2011 | Kyoto KH | Available |
| ci2 | Mar. 2005 | JGI v2.0 | Available |
| ci1 | Dec. 2002 | JGI v1.0 | Available |
Lancelet | braFlo1 | Mar. 2006 | JGI v1.0 | Available |
S. purpuratus | strPur2 | Sep. 2006 | Baylor College of Medicine HGSC v. Spur 2.1 | Available |
| strPur1 | Apr. 2005 | Baylor College of Medicine HGSC v. Spur_0.5 | Available |
| | | | |
INSECTS | | | | |
A. mellifera | apiMel2 | Jan. 2005 | Baylor College of Medicine HGSC v.Amel_2.0 | Available |
| apiMel1 | Jul. 2004 | Baylor College of Medicine HGSC v.Amel_1.2 | Available |
A. gambiae | anoGam1 | Feb. 2003 | IAGP v.MOZ2 | Available |
D. ananassae | droAna2 | Aug. 2005 | Agencourt Arachne release | Available |
| droAna1 | Jul. 2004 | TIGR Celera release | Available |
D. erecta | droEre1 | Aug. 2005 | Agencourt Arachne release | Available |
D. grimshawi | droGri1 | Aug. 2005 | Agencourt Arachne release | Available |
D. melanogaster | dm6 | Aug. 2014 | BDGP Release 6 + ISO1 MT | Available |
| dm3 | Apr. 2006 | BDGP Release 5 | Available |
| dm2 | Apr. 2004 | BDGP Release 4 | Available |
| dm1 | Jan. 2003 | BDGP Release 3 | Available |
D. mojavensis | droMoj2 | Aug. 2005 | Agencourt Arachne release | Available |
| droMoj1 | Aug. 2004 | Agencourt Arachne release | Available |
D. persimilis | droPer1 | Oct. 2005 | Broad Institute release | Available |
D. pseudoobscura | dp3 | Nov. 2004 | FlyBase Release 1.0 | Available |
| dp2 | Aug. 2003 | Baylor College of Medicine HGSC Freeze 1 | Available |
D. sechellia | droSec1 | Oct. 2005 | Broad Institute Release 1.0 | Available |
D. simulans | droSim1 | Apr. 2005 | WUSTL Release 1.0 | Available |
D. virilis | droVir2 | Aug. 2005 | Agencourt Arachne release | Available |
| droVir1 | Jul. 2004 | Agencourt Arachne release | Available |
D. yakuba | droYak2 | Nov. 2005 | WUSTL Release 2.0 | Available |
| droYak1 | Apr. 2004 | WUSTL Release 1.0 | Available |
| | | | |
NEMATODES | | | | |
C. brenneri | caePb2 | Feb. 2008 | WUSTL 6.0.1 | Available |
| caePb1 | Jan. 2007 | WUSTL 4.0 | Available |
C. briggsae | cb3 | Jan. 2007 | WUSTL Cb3 | Available |
| cb1 | Jul. 2002 | WormBase v. cb25.agp8 | Available |
C. elegans | ce11 | Feb. 2013 | C. elegans Sequencing Consortium WBcel235 | Available |
| ce10 | Oct. 2010 | WormBase v. WS220 | Available |
| ce6 | May 2008 | WormBase v. WS190 | Available |
| ce4 | Jan. 2007 | WormBase v. WS170 | Available |
| ce2 | Mar. 2004 | WormBase v. WS120 | Available |
| ce1 | May 2003 | WormBase v. WS100 | Archived |
C. japonica | caeJap1 | Mar. 2008 | WUSTL 3.0.2 | Available |
C. remanei | caeRem3 | May 2007 | WUSTL 15.0.1 | Available |
| caeRem2 | Mar. 2006 | WUSTL 1.0 | Available |
P. pacificus | priPac1 | Feb. 2007 | WUSTL 5.0 | Available |
| | | | |
OTHER | | | | |
Sea Hare | aplCal1 | Sep. 2008 | Broad Release Aplcal2.0 | Available |
Yeast | sacCer3 | Apr. 2011 | SGD April 2011 sequence | Available |
| sacCer2 | Jun. 2008 | SGD June 2008 sequence | Available |
| sacCer1 | Oct. 2003 | SGD 1 Oct 2003 sequence | Available |
| | | | |
VIRUSES | | | | |
Ebola Virus | eboVir3 | Jun. 2014 | Sierra Leone 2014 (G3683/KM034562.1) | Available |
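Continuation rows in the table leave the species cell blank, so a script that reads it needs to carry the species name forward. The following is a minimal Python sketch, assuming the rows are available as pipe-delimited text in the layout shown above. The `Release` type and the `parse_releases` function are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple


class Release(NamedTuple):
    species: str
    ucsc_version: str
    release_date: str
    release_name: str
    status: str


def parse_releases(text: str) -> List[Release]:
    """Parse pipe-delimited rows, filling blank species cells from the row above."""
    releases = []
    current_species = ""
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Spacer rows and group headings such as "MAMMALS" have no version cell.
        if len(cells) < 5 or not cells[1]:
            continue
        # Skip a column-header row if one is present.
        if cells[0].upper() == "SPECIES":
            continue
        if cells[0]:
            current_species = cells[0]
        releases.append(Release(current_species, *cells[1:5]))
    return releases


sample = """\
MAMMALS | | | | |
Human | hg38 | Dec. 2013 | Genome Reference Consortium GRCh38 | Available |
| hg19 | Feb. 2009 | Genome Reference Consortium GRCh37 | Available |
| | | | |
Mouse | mm10 | Dec. 2011 | Genome Reference Consortium GRCm38 | Available |
"""

for release in parse_releases(sample):
    print(release.species, release.ucsc_version, release.status)
```

Each returned record pairs a UCSC version with its species, so hg19 is reported under Human even though its own row omits the species name.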
Initial assembly release dates
When will the next assembly be out?
UCSC does not produce its own genome assemblies, but instead obtains them from standard sources.
For example, the human assembly is obtained from NCBI. Because of this, you can expect us to
release a new version of a genome soon after the assembling organization has released the version.
A new assembly release initially consists of the genome sequence and a small set of aligned
annotation tracks. Additional annotation tracks are added as they are obtained or generated. Bulk
downloads of the data are typically available in the first week after the assembly is released in
the browser.
Data sources - UCSC assemblies
Where does UCSC obtain the assembly and annotation data displayed in the Genome
Browser?
All the assembly data displayed in the UCSC Genome Browser are obtained from external sequencing
centers. To determine the data source and version for a given assembly, see the assembly's
description on the Genome Browser Gateway page or the
List of UCSC Genome Releases.
The annotations accompanying an assembly are obtained from a variety of sources. The UCSC Genome
Bioinformatics Group generates several of the tracks; the remainder are contributed by collaborators
at other sites. Each track has an associated description page that credits the authors of the
annotation.
For detailed information about the individuals and organizations who contributed to a specific
assembly, see the Credits page.
Comparison of UCSC and NCBI human assemblies
How do the human assemblies displayed in the UCSC Genome Browser differ from the NCBI human
assemblies?
Recent human assemblies displayed in the Genome Browser (hg10 and higher) are identical to the
NCBI assemblies.
Differences between UCSC and NCBI mouse assemblies
Is the mouse genome assembly displayed in the UCSC Genome Browser the same as the one on the
NCBI website?
The mouse genome assemblies featured in the UCSC Genome Browser are the same as those on the NCBI
web site with one difference: the UCSC versions contain only the reference strain data (C57BL/6J).
NCBI provides data for several additional strains in their builds.
Accessing older assembly versions
I need to access an older version of a genome assembly that's no longer listed in the Genome
Browser menu. What should I do?
In addition to the assembly versions currently available in the Genome Browser, you can access the
data for older assemblies of the browser through our
Downloads page.
Frequency of GenBank data updates
How frequently does UCSC update its databases with new data from GenBank?
Daily and weekly incremental updates of mRNA, RefSeq, and EST data are in place for several of the
more recent Genome Browser assemblies. Assemblies that are not on an incremental update schedule
are updated whenever we load a new assembly or make a major revision to a table.
Data are updated on the following schedule:
- Native and xeno mRNA and RefSeq tracks: updated daily for human and mouse assemblies; updated
  approximately weekly for all other organisms
- EST data: updated weekly on Saturday morning
- Downloadable data files: updated weekly on Saturday morning
- Outdated sequences: removed once per quarter
Mirror sites are not required to use an incremental update process, and should not experience
problems as a result of these updates.
Coordinate changes between assemblies
I noticed that the chromosomal coordinates for a particular gene that I'm looking at have
changed since the last time I used your browser. What happened?
Users often confuse one assembly with another, so it is important to know which assembly you are
looking at. Within the Genome Browser display,
assemblies are labeled by organism and date. To look up the corresponding UCSC database name or
NCBI build number, use the release table.
UCSC database labels are of the form hg#, panTro#, etc. The letters designate the
organism, e.g. hg for human genome or panTro for Pan troglodytes. The
number denotes the UCSC assembly version for that organism. For example, ce1 refers to the first
UCSC assembly of the C. elegans genome.
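A label of this form can be split into its organism prefix and version number with a short regular expression. This is a minimal Python sketch; the function name `split_db_name` is hypothetical and the pattern assumes the label is nothing but letters followed by digits.

```python
import re
from typing import Tuple

# Letters (organism prefix) followed by digits (assembly version), e.g. "panTro5".
_DB_NAME = re.compile(r"([A-Za-z]+)(\d+)")


def split_db_name(db: str) -> Tuple[str, int]:
    """Split a UCSC database label into (organism prefix, version number)."""
    match = _DB_NAME.fullmatch(db)
    if match is None:
        raise ValueError(f"not a recognizable database label: {db!r}")
    return match.group(1), int(match.group(2))


print(split_db_name("hg19"))      # ('hg', 19)
print(split_db_name("panTro5"))   # ('panTro', 5)
print(split_db_name("ce1"))       # ('ce', 1)
```

Comparing the numeric part as an integer, rather than comparing labels as strings, keeps danRer10 ordered after danRer7.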
The coordinates of your favorite gene in one assembly may not be the same as those in the next
release of the assembly unless the gene happens to lie on a completely sequenced and unrevised
chromosome. For information on integrating data from one assembly into another, see the
Converting positions between assembly versions section.
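To illustrate why coordinates shift, the toy Python sketch below maps positions through a list of aligned blocks, which is the general idea behind block-based coordinate conversion. The block list, the sizes, and the function name `convert_position` are invented for illustration and do not describe any real pair of assemblies or any UCSC tool.

```python
from typing import List, Optional, Tuple

# Each block is (old_start, new_start, length): a stretch of sequence that is
# identical in both assemblies. Hypothetical data: the new assembly has 500
# extra bases inserted (for example, a closed gap) after old position 1000.
BLOCKS: List[Tuple[int, int, int]] = [
    (0, 0, 1000),
    (1000, 1500, 4000),
]


def convert_position(pos: int, blocks: List[Tuple[int, int, int]]) -> Optional[int]:
    """Map a 0-based position from the old assembly to the new one.

    Returns None when the position falls outside every aligned block.
    """
    for old_start, new_start, length in blocks:
        if old_start <= pos < old_start + length:
            return new_start + (pos - old_start)
    return None


print(convert_position(250, BLOCKS))    # 250: upstream of the change, unchanged
print(convert_position(1200, BLOCKS))   # 1700: shifted by the 500 inserted bases
print(convert_position(6000, BLOCKS))   # None: not covered by any block
```

A gene upstream of the revised region keeps its coordinates, while everything downstream moves, which matches the behavior described above for unrevised versus revised chromosomes.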
Converting positions between assembly versions
I've been researching a specific area of the human genome on the current assembly, and now
you've just released a new version. Is there an easy way to locate my area of interest on the new
assembly?
See the section on converting coordinates for
information on assembly migration tools.
Missing annotation tracks
Why is my favorite annotation track missing from your latest release?
The initial release of a new genome assembly typically contains a small subset of core annotation
tracks. New tracks are added as they are generated. In many cases, our annotation tracks are
contributed by scientists not affiliated with UCSC who must first obtain the sequence, repeatmasked
data, etc. before they can produce their tracks. If you have need of an annotation that has not
appeared on an assembly within a month or so of its release, feel free to send an inquiry to
genome@soe.ucsc.edu.
Messages sent to this address will be posted to the moderated genome mailing list, which is
archived on a SEARCHABLE, PUBLIC Google Groups forum.
What next with the human genome?
Now that the human genome is "finished", will there be any more releases?
Rest assured that work will continue. There will be updates to the assembly over the next
several years. This has been the case for all other finished (i.e. essentially complete) genome
assemblies as gaps are closed. For example, the C. elegans genome has been
"finished" for several years, but small bits of sequence are still being added and
corrections are being made. NCBI will continue to coordinate the human genome assemblies in
collaboration with the individual chromosome coordinators, and UCSC will continue to QC the assembly
in conjunction with NCBI (and, to a lesser extent, Ensembl). UCSC, NCBI, Ensembl, and others will
display the new releases on their sites as they become available.
Mouse strain used for mouse genome sequence
What strain of mouse was used for the Mus musculus genome?
C57BL/6J.
UniProt (Swiss-Prot/TrEMBL) display changes
What has UCSC done to accommodate the changes to display IDs recently introduced by UniProt
(aka Swiss-Prot/TrEMBL)?
Here is a detailed description of the database changes we have made to accommodate the UniProt
changes. If you are using the proteinID field in our knownGene table or the
Swiss-Prot/TrEMBL display ID for indexing or cross-referencing other data, we strongly suggest you
transition to the UniProt accession number. These changes will also affect anyone who is mirroring
our site.
- The latest UniProt Knowledgebase (Release 46.0, Feb. 1st, 2005) was parsed and the results were
  stored in a newly created database, sp050201.
- A corresponding database, proteins050201, was constructed based on data in
  sp050201 and other protein data sources.
- Two new symbolic database pointers, uniProt and proteome, have been created to
  point to the two new databases mentioned above. Some parts of our programs use the data in these
  two databases.
    uniProt ---> sp050201
    proteome ---> proteins050201
- The existing protein symbolic database pointers, swissProt and proteins, remain
  unchanged. Some parts of our programs still use these two pointers and the data in their
  associated protein databases.
    swissProt ---> sp041115
    proteins ---> proteins041115
- Two new tables, spOldNew and uniProtAlias, have been added to the proteome
  database.
  The spOldNew table contains three columns:
  - acc -- primary accession number
  - oldDisplayId -- old display ID
  - newDisplayId -- new display ID
  The uniProtAlias table contains four columns:
  - acc -- UniProt accession number
  - alias -- alias (could be acc, old and new display IDs, etc.)
  - aliasSrc -- source of the alias type
  - aliasSrcDate -- date of the source data
  The aliases include primary accessions, secondary accessions, new display IDs, old display IDs,
  and old display IDs corresponding to new secondary accessions.
- Three new functions have been added to kent/src/hg/spDb.c:
    char *oldSpDisplayId(char *newSpDisplayId);
    /* Convert from new Swiss-Prot display ID to old display ID */
    char *newSpDisplayId(char *oldSpDisplayId);
    /* Convert from old Swiss-Prot display ID to new display ID */
    char *uniProtFindPrimAcc(char *id);
    /* Return primary accession given an alias. */
  The uniProtFindPrimAcc() function is enabled by the new uniProtAlias table.
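The kind of lookup that uniProtFindPrimAcc() performs can be approximated as a single query against the uniProtAlias table. The Python sketch below uses an in-memory SQLite table as a stand-in for the real database, with the four columns listed above. The function name `find_primary_acc` is hypothetical, every row value (including the aliasSrc labels) is a placeholder rather than real UniProt data, and the actual C implementation in spDb.c may work differently.

```python
import sqlite3
from typing import Optional


def find_primary_acc(conn: sqlite3.Connection, alias: str) -> Optional[str]:
    """Return the accession recorded for an alias, or None if it is unknown."""
    row = conn.execute(
        "SELECT acc FROM uniProtAlias WHERE alias = ? LIMIT 1", (alias,)
    ).fetchone()
    return row[0] if row else None


# In-memory stand-in for the uniProtAlias table. All rows are placeholders,
# not real UniProt records.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE uniProtAlias (acc TEXT, alias TEXT, aliasSrc TEXT, aliasSrcDate TEXT)"
)
conn.executemany(
    "INSERT INTO uniProtAlias VALUES (?, ?, ?, ?)",
    [
        ("X00001", "X00001", "acc", "2005-02-01"),
        ("X00001", "OLD_EXAMPLE", "oldDisplayId", "2005-02-01"),
        ("X00001", "NEW_EXAMPLE", "newDisplayId", "2005-02-01"),
    ],
)

print(find_primary_acc(conn, "OLD_EXAMPLE"))  # X00001
print(find_primary_acc(conn, "UNKNOWN"))      # None
```

Because old display IDs, new display IDs, and the accession itself all appear as aliases, cross-references keyed on any of them resolve to the same accession.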
We anticipate additional changes down the road and may eventually merge the two sets of protein DB
pointers into one set.
Currently, the proteinID field of the knownGene table for existing genome releases (hg15,
hg16, hg17, mm3, mm4, mm5, rn2, and rn3) uses old Swiss-Prot/TrEMBL display IDs (pre-1 Feb. '05).
In the future, we may change this field to show the UniProt accession number. Should we choose not
to change the content of the proteinID field, we may consider adding a new field,
uniProtAcc.
If you have any questions about these changes and their impact on your work, please email us at
genome@soe.ucsc.edu. Mirror sites may send questions to
genome-mirror@soe.ucsc.edu.
Messages sent to these addresses will be posted to the
moderated mailing lists, which are archived on a SEARCHABLE, PUBLIC Google Groups forum.