<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
	<id>tag:old.nabble.com,2006:forum-11537</id>
	<title>Nabble - Bio.net - Genbankb</title>
	<updated>2009-10-19T13:38:18Z</updated>
	<link rel="self" type="application/atom+xml" href="http://old.nabble.com/Bio.net---Genbankb-f11537.xml" />
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Bio.net---Genbankb-f11537.html" />
	<subtitle type="html">GENBANK-BB/bionet.molbio.genbank</subtitle>
	
<entry>
	<id>tag:old.nabble.com,2006:post-25966393</id>
	<title>GenBank Release 174.0 Now Available</title>
	<published>2009-10-19T13:38:18Z</published>
	<updated>2009-10-19T13:38:18Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html">Greetings GenBank Users,
&lt;br&gt;&lt;br&gt;&amp;nbsp; GenBank Release 174.0 is now available via FTP from the National
&lt;br&gt;Center for Biotechnology Information (NCBI):
&lt;br&gt;&lt;br&gt;&amp;nbsp; Ftp Site &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Directory &amp;nbsp; Contents
&lt;br&gt;&amp;nbsp; ---------------- &amp;nbsp; --------- &amp;nbsp; ---------------------------------------
&lt;br&gt;&amp;nbsp; ftp.ncbi.nih.gov &amp;nbsp; genbank &amp;nbsp; &amp;nbsp; GenBank Release 174.0 flatfiles
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;ncbi-asn1 &amp;nbsp; ASN.1 data used to create Release 174.0
&lt;br&gt;&lt;br&gt;&amp;nbsp; Close-of-data for GenBank 174.0 occurred on 10/16/2009. Uncompressed,
&lt;br&gt;the Release 174.0 flatfiles require roughly 416 GB (sequence files only)
&lt;br&gt;or 445 GB (including the 'short directory', 'index' and the *.txt
&lt;br&gt;files). The ASN.1 data require approximately 376 GB.
&lt;br&gt;&lt;br&gt;Recent statistics for non-WGS, non-CON sequences:
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release &amp;nbsp;Date &amp;nbsp; &amp;nbsp; &amp;nbsp;Base Pairs &amp;nbsp; &amp;nbsp;Entries
&lt;br&gt;&lt;br&gt;&amp;nbsp; 173 &amp;nbsp; &amp;nbsp; &amp;nbsp;Aug 2009 &amp;nbsp;106533156756 &amp;nbsp;108431692
&lt;br&gt;&amp;nbsp; 174 &amp;nbsp; &amp;nbsp; &amp;nbsp;Oct 2009 &amp;nbsp;108560236506 &amp;nbsp;110946879
&lt;br&gt;&lt;br&gt;Recent statistics for WGS sequences:
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release &amp;nbsp;Date &amp;nbsp; &amp;nbsp; &amp;nbsp;Base Pairs &amp;nbsp; &amp;nbsp;Entries
&lt;br&gt;&lt;br&gt;&amp;nbsp; 173 &amp;nbsp; &amp;nbsp; &amp;nbsp;Aug 2009 &amp;nbsp;148165117763 &amp;nbsp;48443067
&lt;br&gt;&amp;nbsp; 174 &amp;nbsp; &amp;nbsp; &amp;nbsp;Oct 2009 &amp;nbsp;149348923035 &amp;nbsp;48119301
&lt;br&gt;&lt;br&gt;&amp;nbsp; During the 56 days between the close dates for GenBank Releases 173.0
&lt;br&gt;and 174.0, the non-WGS/non-CON portion of GenBank grew by 2,027,079,750
&lt;br&gt;basepairs and by 2,515,187 sequence records. During that same period,
&lt;br&gt;467,858 records were updated. An average of about 53,269 non-WGS/non-CON
&lt;br&gt;records were added and/or updated per day.
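&lt;br&gt;&lt;br&gt;&amp;nbsp; As a cross-check, the growth figures above can be recomputed from the
&lt;br&gt;statistics table. The following is only an illustrative Python sketch,
&lt;br&gt;not part of the release processing: the variable names are hypothetical,
&lt;br&gt;and the 08/21/2009 close date for Release 173.0 is taken from that
&lt;br&gt;release's announcement.
&lt;pre&gt;
```python
from datetime import date

# Figures copied from the non-WGS, non-CON statistics table above.
BASES_173, ENTRIES_173 = 106533156756, 108431692
BASES_174, ENTRIES_174 = 108560236506, 110946879
UPDATED_RECORDS = 467858  # records updated between the two closes

# Close-of-data dates for Releases 173.0 and 174.0.
days = (date(2009, 10, 16) - date(2009, 8, 21)).days

new_bases = BASES_174 - BASES_173
new_records = ENTRIES_174 - ENTRIES_173
# Records added and/or updated per day, rounded to the nearest whole record.
per_day = round((new_records + UPDATED_RECORDS) / days)

print(days, new_bases, new_records, per_day)
```
&lt;/pre&gt;
&lt;br&gt;Run as-is, this prints 56 2027079750 2515187 53269, which agrees with the
&lt;br&gt;figures quoted above.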
&lt;br&gt;&lt;br&gt;&amp;nbsp; Between releases 173.0 and 174.0, the WGS component of GenBank grew by
&lt;br&gt;1,183,805,272 basepairs, while the number of sequence records decreased
&lt;br&gt;by 323,766 (due to the re-assembly of some WGS projects into fewer, but
&lt;br&gt;larger, records). 
&lt;br&gt;&lt;br&gt;&amp;nbsp; For additional release information, see the README files in either of
&lt;br&gt;the directories mentioned above, and the release notes (gbrel.txt) in
&lt;br&gt;the genbank directory. Sections 1.3 and 1.4 of the release notes
&lt;br&gt;(Changes in Release 174.0 and Upcoming Changes) have been appended
&lt;br&gt;below for your convenience.
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ** Important Notes **
&lt;br&gt;&lt;br&gt;* &amp;nbsp;GenBank 'index' files are now provided without any EST content, and
&lt;br&gt;&amp;nbsp; &amp;nbsp;without most GSS content. See Section 1.3.4 of the release notes for
&lt;br&gt;&amp;nbsp; &amp;nbsp;further details.
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;NCBI is considering ceasing support for the index files, so we
&lt;br&gt;&amp;nbsp; &amp;nbsp;encourage affected users to review that section and provide feedback.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release 174.0 data, and subsequent updates, are available now via
&lt;br&gt;NCBI's Entrez and Blast services.
&lt;br&gt;&lt;br&gt;&amp;nbsp; As a general guideline, we suggest first transferring the GenBank
&lt;br&gt;release notes (gbrel.txt) whenever a release is being obtained. Check
&lt;br&gt;to make sure that the date and release number in the header of the
&lt;br&gt;release notes are current (eg: October 15 2009, 174.0). If they are
&lt;br&gt;not, interrupt the remaining transfers and then request assistance from
&lt;br&gt;the NCBI Service Desk.
&lt;br&gt;&lt;br&gt;&amp;nbsp; A comprehensive check of the headers of all release files after your
&lt;br&gt;transfers are complete is also suggested. Here's how one might go about
&lt;br&gt;this on a unix platform, using csh/tcsh :
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; set files = `ls gb*.*`
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; foreach i ($files)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; head -10 $i | grep Release
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; end
&lt;br&gt;&lt;br&gt;Or, if the files are compressed, perhaps:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; gzcat $i | head -10 | grep Release
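&lt;br&gt;&lt;br&gt;&amp;nbsp; For sites without csh, the same header check can be sketched in Python.
&lt;br&gt;This is a minimal sketch under stated assumptions, not an NCBI-provided
&lt;br&gt;tool: the function names release_lines and check_release are
&lt;br&gt;hypothetical, compressed files are assumed to end in .gz, and the
&lt;br&gt;release number is assumed to appear within the first ten lines, as in
&lt;br&gt;the csh example.
&lt;pre&gt;
```python
import glob
import gzip


def release_lines(path, max_lines=10):
    """Return the lines among the first max_lines of path that mention 'Release'."""
    # Assumption: compressed files end in .gz; everything else is plain text.
    opener = gzip.open if path.endswith(".gz") else open
    found = []
    with opener(path, "rt", encoding="ascii", errors="replace") as handle:
        for number, line in enumerate(handle):
            if number == max_lines:
                break
            if "Release" in line:
                found.append(line.strip())
    return found


def check_release(directory=".", expected="174.0"):
    """Map each gb*.* file to True when its header names the expected release."""
    results = {}
    for path in sorted(glob.glob(directory + "/gb*.*")):
        lines = release_lines(path)
        results[path] = any(expected in line for line in lines)
    return results


if __name__ == "__main__":
    for path, ok in check_release().items():
        print("ok " if ok else "BAD", path)
```
&lt;/pre&gt;
&lt;br&gt;Any file reported as BAD either lacks a Release line in its header or
&lt;br&gt;names a different release, and is worth transferring again.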
&lt;br&gt;&lt;br&gt;&amp;nbsp; If you encounter problems while ftp'ing or uncompressing Release
&lt;br&gt;174.0, please send email outlining your difficulties to:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=25966393&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;info@...&lt;/a&gt;
&lt;br&gt;&lt;br&gt;Mark Cavanaugh, Michael Kimelman, Ilya Dondoshansky, Sergey Zhdanov
&lt;br&gt;GenBank
&lt;br&gt;NCBI/NLM/NIH/HHS
&lt;br&gt;&lt;br&gt;1.3 Important Changes in Release 174.0
&lt;br&gt;&lt;br&gt;1.3.1 Organizational changes
&lt;br&gt;&lt;br&gt;The total number of sequence data files increased by 31 with this
&lt;br&gt;release:
&lt;br&gt;&lt;br&gt;&amp;nbsp; - the CON division is now composed of 132 files (+1)
&lt;br&gt;&amp;nbsp; - the ENV division is now composed of &amp;nbsp;22 files (+5)
&lt;br&gt;&amp;nbsp; - the EST division is now composed of 892 files (+6)
&lt;br&gt;&amp;nbsp; - the GSS division is now composed of 350 files (+10)
&lt;br&gt;&amp;nbsp; - the INV division is now composed of &amp;nbsp;19 files (+1)
&lt;br&gt;&amp;nbsp; - the PAT division is now composed of &amp;nbsp;85 files (+6)
&lt;br&gt;&amp;nbsp; - the VRL division is now composed of &amp;nbsp;13 files (+1)
&lt;br&gt;&amp;nbsp; - the VRT division is now composed of &amp;nbsp;20 files (+1)
&lt;br&gt;&lt;br&gt;The total number of 'index' files increased by 3 with this release:
&lt;br&gt;&lt;br&gt;&amp;nbsp; - the AUT (author) index is now composed of 69 files (+3)
&lt;br&gt;&lt;br&gt;1.3.2 New class of /exception value
&lt;br&gt;&lt;br&gt;&amp;nbsp; As of this October 2009 release, a new class of /exception is
&lt;br&gt;available for use on coding region features:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;/exception=&amp;quot;annotated by transcript or proteomic data&amp;quot;
&lt;br&gt;&lt;br&gt;&amp;nbsp; This exception can be used if: a) the protein sequence (presented via
&lt;br&gt;a coding region's /translation qualifier) differs from the conceptual
&lt;br&gt;translation; b) the quality of the DNA sequencing is high; and c) there
&lt;br&gt;is evidence at the transcript or proteome level that the presented
&lt;br&gt;protein *is* actually expressed by the organism.
&lt;br&gt;&lt;br&gt;&amp;nbsp; An inference qualifier of type &amp;quot;similar to&amp;quot; should be used in
&lt;br&gt;conjunction with this new type of exception, to indicate the supporting
&lt;br&gt;EST/cDNA/protein sequence.
&lt;br&gt;&lt;br&gt;&amp;nbsp; An update to the definition of the /exception qualifier which
&lt;br&gt;incorporates the new class will be provided via the GenBank newsgroup
&lt;br&gt;within a few weeks.
&lt;br&gt;&lt;br&gt;1.3.3 /haplogroup qualifier introduced
&lt;br&gt;&lt;br&gt;&amp;nbsp; A haplotype is a combination of alleles at multiple loci that are
&lt;br&gt;transmitted together on the same chromosome. A haplogroup is a group of
&lt;br&gt;similar haplotypes that share a common ancestor with a single nucleotide
&lt;br&gt;polymorphism mutation. The majority of submitters of complete human
&lt;br&gt;mitochondrial genomes provide information about their haplogroup rather
&lt;br&gt;than their haplotype. Stable mtDNA polymorphic variants clustered
&lt;br&gt;together in specific combination form a haplogroup.
&lt;br&gt;&lt;br&gt;&amp;nbsp; To accommodate this need, a new /haplogroup qualifier has been
&lt;br&gt;introduced as of this October 2009 GenBank Release.
&lt;br&gt;&lt;br&gt;&amp;nbsp; A formal definition of /haplogroup will be provided via the GenBank
&lt;br&gt;newsgroup within a few weeks.
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;1.3.4 Changes in the content of index files
&lt;br&gt;&lt;br&gt;&amp;nbsp; As described in the GB 153 release notes, the 'index' files which
&lt;br&gt;accompany GenBank releases (see Section 3.3) are considered to be a
&lt;br&gt;legacy data product by NCBI, generated mostly for historical reasons.
&lt;br&gt;FTP statistics of January 2005 seem to support this: the index files
&lt;br&gt;were transferred only half as frequently as the files of sequence
&lt;br&gt;records. The inherent inefficiencies of the index file format also lead
&lt;br&gt;us to suspect that they have little serious use by the user community,
&lt;br&gt;particularly for EST and GSS records.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The software that generated the index file products received little
&lt;br&gt;attention over the years, and finally reached its limitations in
&lt;br&gt;February 2006 (Release 152.0). The required multi-server queries which
&lt;br&gt;obtained and sorted many millions of rows of terms from several
&lt;br&gt;different databases simply outgrew the capacity of the hardware used
&lt;br&gt;for GenBank Release generation.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Our short-term solution is to cease generating some index-file content
&lt;br&gt;for all EST sequence records, and for GSS sequence records that
&lt;br&gt;originate via direct submission to NCBI.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The three gbacc*.idx index files continue to reflect the entirety of
&lt;br&gt;the release, including all EST and GSS records; however, the file
&lt;br&gt;contents are unsorted.
&lt;br&gt;&lt;br&gt;&amp;nbsp; These 'solutions' are really just stop-gaps, and we will likely pursue
&lt;br&gt;one of two options:
&lt;br&gt;&lt;br&gt;a) Cease support of the 'index' file products altogether.
&lt;br&gt;&lt;br&gt;b) Provide new products that present some of the most useful data from
&lt;br&gt;&amp;nbsp; &amp;nbsp;the legacy 'index' files, and cease support for other types of index
&lt;br&gt;&amp;nbsp; &amp;nbsp;data.
&lt;br&gt;&lt;br&gt;&amp;nbsp; If you are a user of the 'index' files associated with GenBank
&lt;br&gt;releases, we encourage you to make your wishes known, either via the
&lt;br&gt;GenBank newsgroup, or via email to NCBI's Service Desk:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=25966393&amp;i=1&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;info@...&lt;/a&gt;
&lt;br&gt;&lt;br&gt;&amp;nbsp; Our apologies for any inconvenience that these changes may cause.
&lt;br&gt;&lt;br&gt;1.3.5 GSS File Header Problem
&lt;br&gt;&lt;br&gt;&amp;nbsp; GSS sequences at GenBank are maintained in two different systems,
&lt;br&gt;depending on their origin, and the dumps from those systems occur in
&lt;br&gt;parallel. Because the second dump (for example) has no prior knowledge
&lt;br&gt;of exactly how many GSS files will be dumped by the first, it does not
&lt;br&gt;know how to number its own output files.
&lt;br&gt;&lt;br&gt;&amp;nbsp; There is thus a discrepancy between the filenames and file headers for
&lt;br&gt;seventy-eight of the GSS flatfiles in Release 174.0. Consider
&lt;br&gt;gbgss273.seq :
&lt;br&gt;&lt;br&gt;GBGSS1.SEQ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Genetic Sequence Data Bank
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;October 15 2009
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; NCBI-GenBank Flat File Release 174.0
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;GSS Sequences (Part 1)
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;87177 loci, &amp;nbsp; &amp;nbsp;64203737 bases, from &amp;nbsp; &amp;nbsp;87177 reported sequences
&lt;br&gt;&lt;br&gt;&amp;nbsp; Here, the filename and part number in the header are &amp;quot;1&amp;quot;, though the
&lt;br&gt;file has been renamed as &amp;quot;273&amp;quot; based on the number of files dumped from
&lt;br&gt;the other system. We hope to resolve this discrepancy at some point, but
&lt;br&gt;the priority is certainly much lower than many other tasks.
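&lt;br&gt;&lt;br&gt;&amp;nbsp; A downloaded file can be tested for this discrepancy by comparing its
&lt;br&gt;name with the first token of its header. The sketch below is illustrative
&lt;br&gt;Python only; header_mismatch is a hypothetical helper, and it assumes
&lt;br&gt;that the first header line begins with the file name, as in the example
&lt;br&gt;above.
&lt;pre&gt;
```python
import re


def header_mismatch(filename, first_line):
    """Return True when the name in a flatfile header differs from the filename.

    Assumption: the first header line starts with the upper-case file name
    (e.g. 'GBGSS1.SEQ'), optionally preceded by stray whitespace.
    """
    match = re.match(r"\s*(\S+)", first_line)
    if match is None:
        return True  # an empty header line cannot confirm the name
    return match.group(1).lower() != filename.lower()


header = "GBGSS1.SEQ          Genetic Sequence Data Bank"
print(header_mismatch("gbgss273.seq", header))  # True: header says part 1
print(header_mismatch("gbgss1.seq", header))    # False: names agree
```
&lt;/pre&gt;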
&lt;br&gt;&lt;br&gt;1.4 Upcoming Changes
&lt;br&gt;&lt;br&gt;1.4.1 /artificial_location qualifier introduced
&lt;br&gt;&lt;br&gt;&amp;nbsp; A new qualifier, intended for use in limited genome-scale annotation
&lt;br&gt;contexts, will be introduced as of GenBank Release 175.0 in December
&lt;br&gt;2009:
&lt;br&gt;&lt;br&gt;Qualifier	/artificial_location
&lt;br&gt;&lt;br&gt;Definition	indicates that location of the CDS or mRNA is modified
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; to adjust for the presence of a frameshift or internal stop
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; codon and not because of biological processing between the
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; regions. This is expected to be used only for genome-scale
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; annotation, either because a heterogeneous population was
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; sequenced, or because the feature is in a region of
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; low-quality sequence.
&lt;br&gt;&lt;br&gt;1.4.2 New /pseudogene and /non_functional qualifiers
&lt;br&gt;&lt;br&gt;&amp;nbsp; The GenBank 173.0 release notes described an anticipated conversion
&lt;br&gt;of the /pseudo qualifier to /non_functional, based on the results of
&lt;br&gt;the May 2009 INSDC annual meeting:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;quot;Because the term &amp;quot;pseudo&amp;quot; is often assumed to mean 'pseudogene',
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;the /pseudo qualifier will be renamed as /non_functional, to
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;better reflect its actual usage in the sequence databases.&amp;quot;
&lt;br&gt;&lt;br&gt;&amp;nbsp; During follow-up discussions, the INSDC members decided that existing
&lt;br&gt;uses of /pseudo can include both of the possible meanings of the term,
&lt;br&gt;and that a more conservative course would be to introduce two new
&lt;br&gt;qualifiers:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; /pseudogene
&lt;br&gt;&amp;nbsp; &amp;nbsp; /non_functional
&lt;br&gt;&lt;br&gt;&amp;nbsp; Sequence submission tools will be updated to utilize these, and
&lt;br&gt;the ambiguous /pseudo qualifier will be deprecated. If it is 
&lt;br&gt;possible, existing instances of /pseudo would then be converted
&lt;br&gt;to one of the two new qualifiers.
&lt;br&gt;&lt;br&gt;&amp;nbsp; /pseudogene and /non_functional will become legal for the Feature
&lt;br&gt;Table as of the April 2010 GenBank Release.
&lt;br&gt;&lt;br&gt;&lt;br&gt;_______________________________________________
&lt;br&gt;Genbankb mailing list
&lt;br&gt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=25966393&amp;i=2&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;Genbankb@...&lt;/a&gt;
&lt;br&gt;&lt;a href=&quot;http://www.bio.net/biomail/listinfo/genbankb&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.bio.net/biomail/listinfo/genbankb&lt;/a&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/GenBank-Release-174.0-Now-Available-tp25966393p25966393.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-25960518</id>
	<title>GenBank 174.0 Close-Of-Data</title>
	<published>2009-10-19T08:28:49Z</published>
	<updated>2009-10-19T08:28:49Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html">Greetings GenBank Users,
&lt;br&gt;&lt;br&gt;Close-of-data for the upcoming GenBank Release 174.0 occurred on
&lt;br&gt;Friday October 16 2009 at approximately 1:30am EDT.
&lt;br&gt;&lt;br&gt;The subsequently generated GenBank Incremental Update files
&lt;br&gt;nc1016.aso, nc1016.flat, etc. contain data through the close.
&lt;br&gt;&lt;br&gt;Note: Release processing often does not begin until sometime during
&lt;br&gt;business hours on the close date. As a result, a number of sequence
&lt;br&gt;records processed *after* 1:30am are likely to be present in the
&lt;br&gt;GenBank 174.0 release files, even though they are &amp;quot;post-close&amp;quot;.
&lt;br&gt;&lt;br&gt;Similarly, the first GenBank Incremental Update that is generated
&lt;br&gt;after the close date is likely to contain a number of sequence
&lt;br&gt;records that are unchanged, compared to their appearance in the
&lt;br&gt;release files.
&lt;br&gt;&lt;br&gt;We expect to make the GenBank 174.0 data files available sometime
&lt;br&gt;later today.
&lt;br&gt;&lt;br&gt;Our apologies for the lack of advance notice about the close date.
&lt;br&gt;&lt;br&gt;Mark Cavanaugh
&lt;br&gt;GenBank
&lt;br&gt;NCBI/NLM/NIH/HHS
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/GenBank-174.0-Close-Of-Data-tp25960518p25960518.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-25165309</id>
	<title>GenBank Release 173.0 Now Available</title>
	<published>2009-08-26T19:24:45Z</published>
	<updated>2009-08-26T19:24:45Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html">Greetings GenBank Users,
&lt;br&gt;&lt;br&gt;&amp;nbsp; GenBank Release 173.0 is now available via FTP from the National
&lt;br&gt;Center for Biotechnology Information (NCBI):
&lt;br&gt;&lt;br&gt;&amp;nbsp; Ftp Site &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Directory &amp;nbsp; Contents
&lt;br&gt;&amp;nbsp; ---------------- &amp;nbsp; --------- &amp;nbsp; ---------------------------------------
&lt;br&gt;&amp;nbsp; ftp.ncbi.nih.gov &amp;nbsp; genbank &amp;nbsp; &amp;nbsp; GenBank Release 173.0 flatfiles
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;ncbi-asn1 &amp;nbsp; ASN.1 data used to create Release 173.0
&lt;br&gt;&lt;br&gt;&amp;nbsp; Close-of-data for GenBank 173.0 occurred on 08/21/2009. Uncompressed,
&lt;br&gt;the Release 173.0 flatfiles require roughly 408 GB (sequence files only)
&lt;br&gt;or 437 GB (including the 'short directory', 'index' and the *.txt
&lt;br&gt;files). The ASN.1 data require approximately 370 GB.
&lt;br&gt;&lt;br&gt;Recent statistics for non-WGS, non-CON sequences:
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release &amp;nbsp;Date &amp;nbsp; &amp;nbsp; &amp;nbsp; Base Pairs &amp;nbsp; Entries
&lt;br&gt;&lt;br&gt;&amp;nbsp; 172 &amp;nbsp; &amp;nbsp; &amp;nbsp;Jun 2009 &amp;nbsp;105277306080 &amp;nbsp;106073709
&lt;br&gt;&amp;nbsp; 173 &amp;nbsp; &amp;nbsp; &amp;nbsp;Aug 2009 &amp;nbsp;106533156756 &amp;nbsp; 108431692
&lt;br&gt;&lt;br&gt;Recent statistics for WGS sequences:
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release &amp;nbsp;Date &amp;nbsp; &amp;nbsp; &amp;nbsp; Base Pairs &amp;nbsp; Entries
&lt;br&gt;&lt;br&gt;&amp;nbsp; 172 &amp;nbsp; &amp;nbsp; &amp;nbsp;Jun 2009 &amp;nbsp;145959997864 &amp;nbsp;49063546
&lt;br&gt;&amp;nbsp; 173 &amp;nbsp; &amp;nbsp; &amp;nbsp;Aug 2009 &amp;nbsp;148165117763 &amp;nbsp;48443067
&lt;br&gt;&lt;br&gt;&amp;nbsp; During the 72 days between the close dates for GenBank Releases 172.0
&lt;br&gt;and 173.0, the non-WGS/non-CON portion of GenBank grew by 1,255,850,676
&lt;br&gt;basepairs and by 2,357,983 sequence records. During that same period,
&lt;br&gt;654,396 records were updated. An average of about 41,838
&lt;br&gt;non-WGS/non-CON records were added and/or updated per day.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Between releases 172.0 and 173.0, the WGS component of GenBank grew
&lt;br&gt;by 2,205,119,899 basepairs, while the number of sequence records decreased
&lt;br&gt;by 620,479 (due to some WGS projects being re-assembled into fewer,
&lt;br&gt;but larger, records). 
&lt;br&gt;&lt;br&gt;&amp;nbsp; For additional release information, see the README files in either of
&lt;br&gt;the directories mentioned above, and the release notes (gbrel.txt) in
&lt;br&gt;the genbank directory. Sections 1.3 and 1.4 of the release notes
&lt;br&gt;(Changes in Release 173.0 and Upcoming Changes) have been appended
&lt;br&gt;below for your convenience.
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ** Important Notes **
&lt;br&gt;&lt;br&gt;* &amp;nbsp;GenBank 'index' files are now provided without any EST content, and
&lt;br&gt;&amp;nbsp; &amp;nbsp;without most GSS content. See Section 1.3.4 of the release notes for
&lt;br&gt;&amp;nbsp; &amp;nbsp;further details.
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;NCBI is considering ceasing support for the index files, so we
&lt;br&gt;&amp;nbsp; &amp;nbsp;encourage affected users to review that section and provide feedback.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release 173.0 data, and subsequent updates, are available now via
&lt;br&gt;NCBI's Entrez and Blast services.
&lt;br&gt;&lt;br&gt;&amp;nbsp; As a general guideline, we suggest first transferring the GenBank
&lt;br&gt;release notes (gbrel.txt) whenever a release is being obtained. Check
&lt;br&gt;to make sure that the date and release number in the header of the
&lt;br&gt;release notes are current (eg: August 15 2009, 173.0). If they are
&lt;br&gt;not, interrupt the remaining transfers and then request assistance from
&lt;br&gt;the NCBI Service Desk.
&lt;br&gt;&lt;br&gt;&amp;nbsp; A comprehensive check of the headers of all release files after your
&lt;br&gt;transfers are complete is also suggested. Here's how one might go about
&lt;br&gt;this on a unix platform, using csh/tcsh :
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; set files = `ls gb*.*`
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; foreach i ($files)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; head -10 $i | grep Release
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; end
&lt;br&gt;&lt;br&gt;Or, if the files are compressed, perhaps:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; gzcat $i | head -10 | grep Release
&lt;br&gt;&lt;br&gt;&amp;nbsp; If you encounter problems while ftp'ing or uncompressing Release
&lt;br&gt;173.0, please send email outlining your difficulties to:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=25165309&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;info@...&lt;/a&gt;
&lt;br&gt;&lt;br&gt;Mark Cavanaugh, Michael Kimelman, Ilya Dondoshansky, Sergey Zhdanov
&lt;br&gt;GenBank
&lt;br&gt;NCBI/NLM/NIH/HHS
&lt;br&gt;&lt;br&gt;&lt;br&gt;1.3 Important Changes in Release 173.0
&lt;br&gt;&lt;br&gt;1.3.1 Organizational changes
&lt;br&gt;&lt;br&gt;The total number of sequence data files increased by 28 with this release:
&lt;br&gt;&lt;br&gt;&amp;nbsp; - the BCT division is now composed of &amp;nbsp;48 files (+3)
&lt;br&gt;&amp;nbsp; - the ENV division is now composed of &amp;nbsp;17 files (+1)
&lt;br&gt;&amp;nbsp; - the EST division is now composed of 886 files (+11)
&lt;br&gt;&amp;nbsp; - the GSS division is now composed of 340 files (+3)
&lt;br&gt;&amp;nbsp; - the HTC division is now composed of &amp;nbsp;15 files (+1)
&lt;br&gt;&amp;nbsp; - the HTG division is now composed of 134 files (+1)
&lt;br&gt;&amp;nbsp; - the PAT division is now composed of &amp;nbsp;79 files (+6)
&lt;br&gt;&amp;nbsp; - the PLN division is now composed of &amp;nbsp;38 files (-1)
&lt;br&gt;&amp;nbsp; - the PRI division is now composed of &amp;nbsp;40 files (+1)
&lt;br&gt;&amp;nbsp; - the TSA division is now composed of &amp;nbsp; 2 files (+1)
&lt;br&gt;&amp;nbsp; - the VRT division is now composed of &amp;nbsp;19 files (+1)
&lt;br&gt;&lt;br&gt;Note: The decline in the number of PLN division files is due to the fact that
&lt;br&gt;twelve records representing the chromosomes of Oryza sativa Japonica Group
&lt;br&gt;have been converted to a CON-division representation. See section 1.3.2 for
&lt;br&gt;more details.
&lt;br&gt;&lt;br&gt;The total number of 'index' files increased by 1 with this release:
&lt;br&gt;&lt;br&gt;&amp;nbsp; - the AUT (author) index is now composed of 66 files (+1)
&lt;br&gt;&lt;br&gt;1.3.2 Twelve Oryza sativa Japonica Group records moved to CON division.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Prior to Release 173.0, the sequence records for 12 Oryza sativa Japonica
&lt;br&gt;Group chromosomes had been 'traditional' records, totalling over 382 Mbp of
&lt;br&gt;sequence data.
&lt;br&gt;&lt;br&gt;&amp;nbsp; However, on August 7 2009, the records were converted to a CON-division
&lt;br&gt;representation, with CONTIG-line join() statements that describe how the
&lt;br&gt;chromosomes are constructed from underlying PAC and fosmid sequences. The
&lt;br&gt;records involved are:
&lt;br&gt;&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; AP008207 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;45064769 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; linear &amp;nbsp; CON 07-AUG-2009
&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; AP008208 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;36823111 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; linear &amp;nbsp; CON 07-AUG-2009
&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; AP008209 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;37257345 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; linear &amp;nbsp; CON 07-AUG-2009
&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; AP008210 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;35863200 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; linear &amp;nbsp; CON 07-AUG-2009
&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; AP008211 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;30039014 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; linear &amp;nbsp; CON 07-AUG-2009
&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; AP008212 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;32124789 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; linear &amp;nbsp; CON 07-AUG-2009
&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; AP008213 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;30357780 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; linear &amp;nbsp; CON 07-AUG-2009
&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; AP008214 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;28530027 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; linear &amp;nbsp; CON 07-AUG-2009
&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; AP008215 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;23843360 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; linear &amp;nbsp; CON 07-AUG-2009
&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; AP008216 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;23661561 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; linear &amp;nbsp; CON 07-AUG-2009
&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; AP008217 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;30828668 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; linear &amp;nbsp; CON 07-AUG-2009
&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; AP008218 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;27757321 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; linear &amp;nbsp; CON 07-AUG-2009
&lt;br&gt;&lt;br&gt;&amp;nbsp; Due to this change, these chromosomal records are now present in file
&lt;br&gt;gbcon1.seq of the CON division. In addition, the total number of Oryza sativa
&lt;br&gt;Japonica Group basepairs listed in the table in Section 2.2.7 has declined
&lt;br&gt;by 370,461,808 bases, compared to Release 172.0. The table intentionally
&lt;br&gt;excludes CON-division records, to avoid double-counting of underlying sequence
&lt;br&gt;records and any larger-scale objects that are constructed from them.
&lt;br&gt;&lt;br&gt;1.3.3 File header problem for EST and GSS files
&lt;br&gt;&lt;br&gt;&amp;nbsp; A new method of generating the EST and GSS sequence files has been 
&lt;br&gt;developed, which has reduced the time required to generate a GenBank
&lt;br&gt;release by one day. However, a minor problem in the formatting of the
&lt;br&gt;header of the sequence files was inadvertently introduced : a leading 
&lt;br&gt;space exists before the filename on the very first line. For example:
&lt;br&gt;&lt;br&gt;&amp;nbsp;GBGSS100.SEQ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Genetic Sequence Data Bank &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;August 15 2009 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;
&lt;br&gt;It should be:
&lt;br&gt;&lt;br&gt;GBGSS100.SEQ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Genetic Sequence Data Bank &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;August 15 2009 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;
&lt;br&gt;&lt;br&gt;The problem affects all EST files and most GSS files. We had hoped to
&lt;br&gt;repair this formatting issue for Release 173.0, but the code changes
&lt;br&gt;which were expected to fix the problem did not perform correctly. It
&lt;br&gt;is hoped that this issue will be resolved for Release 174.0.
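&lt;br&gt;&lt;br&gt;&amp;nbsp; For parsers that read the first header line, one possible workaround is
&lt;br&gt;to ignore leading whitespace when extracting the filename. The Python sketch
&lt;br&gt;below is an illustration only and is not part of any NCBI tooling; the function
&lt;br&gt;name header_filename is hypothetical.

```python
def header_filename(first_line):
    """Return the filename token from a flatfile's first header line.

    str.split() with no arguments discards leading whitespace, so the
    stray leading space described above does not change the result.
    """
    fields = first_line.split()
    return fields[0] if fields else None


if __name__ == "__main__":
    affected = " GBGSS100.SEQ          Genetic Sequence Data Bank"
    intended = "GBGSS100.SEQ          Genetic Sequence Data Bank"
    # Both forms of the header line yield the same filename token.
    print(header_filename(affected) == header_filename(intended))
```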
&lt;br&gt;&lt;br&gt;1.3.4 Changes in the content of index files
&lt;br&gt;&lt;br&gt;&amp;nbsp; As described in the GB 153 release notes, the 'index' files which accompany
&lt;br&gt;GenBank releases (see Section 3.3) are considered to be a legacy data product by
&lt;br&gt;NCBI, generated mostly for historical reasons. FTP statistics of January 2005
&lt;br&gt;seem to support this: the index files were transferred only half as frequently as
&lt;br&gt;the files of sequence records. The inherent inefficiencies of the index file
&lt;br&gt;format also lead us to suspect that they have little serious use by the user
&lt;br&gt;community, particularly for EST and GSS records.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The software that generated the index file products received little
&lt;br&gt;attention over the years, and finally reached its limitations in
&lt;br&gt;February 2006 (Release 152.0). The required multi-server queries which
&lt;br&gt;obtained and sorted many millions of rows of terms from several different
&lt;br&gt;databases simply outgrew the capacity of the hardware used for GenBank
&lt;br&gt;Release generation.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Our short-term solution is to cease generating some index-file content
&lt;br&gt;for all EST sequence records, and for GSS sequence records that originate
&lt;br&gt;via direct submission to NCBI.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The three gbacc*.idx index files continue to reflect the entirety of the
&lt;br&gt;release, including all EST and GSS records; however, the file contents are
&lt;br&gt;unsorted.
&lt;br&gt;&lt;br&gt;&amp;nbsp; These 'solutions' are really just stop-gaps, and we will likely pursue
&lt;br&gt;one of two options:
&lt;br&gt;&lt;br&gt;a) Cease support of the 'index' file products altogether.
&lt;br&gt;&lt;br&gt;b) Provide new products that present some of the most useful data from
&lt;br&gt;&amp;nbsp; &amp;nbsp;the legacy 'index' files, and cease support for other types of index data.
&lt;br&gt;&lt;br&gt;&amp;nbsp; If you are a user of the 'index' files associated with GenBank releases, we
&lt;br&gt;encourage you to make your wishes known, either via the GenBank newsgroup,
&lt;br&gt;or via email to NCBI's Service Desk:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=25165309&amp;i=1&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;info@...&lt;/a&gt;
&lt;br&gt;&lt;br&gt;&amp;nbsp; Our apologies for any inconvenience that these changes may cause.
&lt;br&gt;&lt;br&gt;1.3.5 GSS File Header Problem
&lt;br&gt;&lt;br&gt;&amp;nbsp; GSS sequences at GenBank are maintained in two different systems, depending
&lt;br&gt;on their origin, and the dumps from those systems occur in parallel. Because
&lt;br&gt;the second dump (for example) has no prior knowledge of exactly how many GSS
&lt;br&gt;files will be dumped by the first, it does not know how to number its own
&lt;br&gt;output files.
&lt;br&gt;&lt;br&gt;&amp;nbsp; There is thus a discrepancy between the filenames and file headers for
&lt;br&gt;seventy-two of the GSS flatfiles in Release 173.0. Consider gbgss269.seq:
&lt;br&gt;&lt;br&gt;GBGSS1.SEQ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Genetic Sequence Data Bank
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; August 15 2009
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; NCBI-GenBank Flat File Release 173.0
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;GSS Sequences (Part 1)
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;87186 loci, &amp;nbsp; &amp;nbsp;64231458 bases, from &amp;nbsp; &amp;nbsp;87186 reported sequences
&lt;br&gt;&lt;br&gt;&amp;nbsp; Here, the filename and part number in the header are &amp;quot;1&amp;quot;, though the file
&lt;br&gt;has been renamed as &amp;quot;269&amp;quot; based on the number of files dumped from the other
&lt;br&gt;system. &amp;nbsp;We hope to resolve this discrepancy at some point, but the priority
&lt;br&gt;is certainly much lower than many other tasks.
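&lt;br&gt;&lt;br&gt;&amp;nbsp; Users who validate transfers by comparing each filename with its header
&lt;br&gt;may want to flag such mismatches rather than treat them as corruption. The
&lt;br&gt;Python sketch below is an illustration only, under the assumption that the
&lt;br&gt;first whitespace-delimited token of the first header line is the filename; the
&lt;br&gt;function name header_matches_filename is hypothetical.

```python
import os


def header_matches_filename(path, first_line):
    """Report whether the header's filename token agrees with the file's name.

    The comparison is case-insensitive because headers are upper case
    (GBGSS1.SEQ) while the files on disk are lower case (gbgss269.seq).
    """
    fields = first_line.split()
    header_name = fields[0].lower() if fields else ""
    return header_name == os.path.basename(path).lower()


if __name__ == "__main__":
    # The example from the text: the header says GBGSS1.SEQ, the file is gbgss269.seq.
    print(header_matches_filename("gbgss269.seq",
                                  "GBGSS1.SEQ          Genetic Sequence Data Bank"))
```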
&lt;br&gt;&lt;br&gt;1.4 Upcoming Changes
&lt;br&gt;&lt;br&gt;1.4.1 New class of /exception value
&lt;br&gt;&lt;br&gt;&amp;nbsp; As of October 2009, a new class of /exception will be available for use
&lt;br&gt;on coding region features:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;/exception=&amp;quot;annotated by transcript or proteomic data&amp;quot;
&lt;br&gt;&lt;br&gt;&amp;nbsp; This exception will be used for situations in which: the protein sequence
&lt;br&gt;(presented via the coding region's /translation qualifier) differs from the
&lt;br&gt;conceptual translation; the quality of the DNA sequencing is high; and there
&lt;br&gt;is evidence at the transcript or proteome level that the presented protein
&lt;br&gt;*is* actually expressed by the organism.
&lt;br&gt;&lt;br&gt;&amp;nbsp; An inference qualifier of type &amp;quot;similar to&amp;quot; should be used in conjunction
&lt;br&gt;with this new type of exception, to indicate the supporting EST/cDNA/protein
&lt;br&gt;sequence.
&lt;br&gt;&lt;br&gt;&amp;nbsp; An updated definition of /exception, with examples for this new type
&lt;br&gt;of exception value, will be provided via the GenBank newsgroup prior to
&lt;br&gt;the October release.
&lt;br&gt;&lt;br&gt;1.4.2 /haplogroup qualifier introduced
&lt;br&gt;&lt;br&gt;&amp;nbsp; A haplotype is a combination of alleles at multiple loci that are transmitted
&lt;br&gt;together on the same chromosome. A haplogroup is a group of similar haplotypes
&lt;br&gt;that share a common ancestor with a single nucleotide polymorphism mutation.
&lt;br&gt;The majority of submitters of complete human mitochondrial genomes provide
&lt;br&gt;information about their haplogroup rather than their haplotype. Stable mtDNA
&lt;br&gt;polymorphic variants clustered together in specific combination form a haplogroup. 
&lt;br&gt;&lt;br&gt;&amp;nbsp; To accommodate this need, a new /haplogroup qualifier will be introduced as
&lt;br&gt;of GenBank Release 174.0 in October 2009.
&lt;br&gt;&lt;br&gt;&amp;nbsp; A formal definition of /haplogroup was not available as of the creation
&lt;br&gt;of these release notes, so it will be provided via the GenBank newsgroup
&lt;br&gt;prior to the October release.
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;1.4.3 /artificial_location qualifier introduced
&lt;br&gt;&lt;br&gt;&amp;nbsp; A new qualifier, intended for use in limited genome-scale annotation
&lt;br&gt;contexts, will be introduced as of GenBank Release 175.0 in December 2009:
&lt;br&gt;&lt;br&gt;Qualifier	/artificial_location
&lt;br&gt;&lt;br&gt;Definition	indicates that location of the CDS or mRNA is modified to
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; adjust for the presence of a frameshift or internal stop
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; codon and not because of biological processing between the
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; regions. &amp;nbsp;This is expected to be used only for genome-scale
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; annotation, either because a heterogeneous population was
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; sequenced, or because the feature is in a region of
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; low-quality sequence.
&lt;br&gt;&lt;br&gt;1.4.4 /pseudo qualifier renamed as /non_functional
&lt;br&gt;&lt;br&gt;&amp;nbsp; Because the term &amp;quot;pseudo&amp;quot; is often assumed to mean &amp;quot;pseudogene&amp;quot;, the
&lt;br&gt;/pseudo qualifier will be renamed as /non_functional, to better reflect
&lt;br&gt;its actual usage in the sequence databases. This change will take place
&lt;br&gt;as of the April 2010 GenBank Release.
&lt;br&gt;&lt;br&gt;_______________________________________________
&lt;br&gt;Genbankb mailing list
&lt;br&gt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=25165309&amp;i=2&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;Genbankb@...&lt;/a&gt;
&lt;br&gt;&lt;a href=&quot;http://www.bio.net/biomail/listinfo/genbankb&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.bio.net/biomail/listinfo/genbankb&lt;/a&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/GenBank-Release-173.0-Now-Available-tp25165309p25165309.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-25141773</id>
	<title>GenBank 173.0 Close-Of-Data</title>
	<published>2009-08-25T10:32:47Z</published>
	<updated>2009-08-25T10:32:47Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html">Greetings GenBank Users,
&lt;br&gt;&lt;br&gt;Close-of-data for the upcoming GenBank Release 173.0 occurred on
&lt;br&gt;Friday August 21 2009 at approximately 1:30am EDT.
&lt;br&gt;&lt;br&gt;The subsequently generated GenBank Incremental Update files
&lt;br&gt;nc0821.aso, nc0821.flat, etc. contain data through the close.
&lt;br&gt;&lt;br&gt;Note: Release processing often does not begin until sometime during
&lt;br&gt;business hours on the close date. As a result, a number of sequence
&lt;br&gt;records processed *after* 1:30am are likely to be present in the
&lt;br&gt;GenBank 173.0 release files, even though they are &amp;quot;post-close&amp;quot;.
&lt;br&gt;&lt;br&gt;Similarly, the first GenBank Incremental Update that is generated
&lt;br&gt;after the close date is likely to contain a number of sequence
&lt;br&gt;records that are unchanged, compared to their appearance in the
&lt;br&gt;release files.
&lt;br&gt;&lt;br&gt;We expect to make the GenBank 173.0 data files available sometime
&lt;br&gt;tomorrow.
&lt;br&gt;&lt;br&gt;Our apologies for the lack of advance notice about the close date,
&lt;br&gt;and for delays that have affected the delivery of the GB 173 product.
&lt;br&gt;&lt;br&gt;Mark Cavanaugh
&lt;br&gt;GenBank
&lt;br&gt;NCBI/NLM/NIH/HHS
&lt;br&gt;&lt;br&gt;&lt;br&gt;_______________________________________________
&lt;br&gt;Genbankb mailing list
&lt;br&gt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=25141773&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;Genbankb@...&lt;/a&gt;
&lt;br&gt;&lt;a href=&quot;http://www.bio.net/biomail/listinfo/genbankb&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.bio.net/biomail/listinfo/genbankb&lt;/a&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/GenBank-173.0-Close-Of-Data-tp25141773p25141773.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-24059657</id>
	<title>GenBank Updates : Three day outage for GenBank Incremental Updates : 0613-0615</title>
	<published>2009-06-16T10:24:05Z</published>
	<updated>2009-06-16T10:24:05Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html">Due to an error in a configuration file, attempts to generate the 0613,
&lt;br&gt;0614, and 0615 GenBank Incremental Update (GIU) products failed.
&lt;br&gt;&lt;br&gt;This problem was resolved on Tuesday June 16 at approximately 1:00pm EDT,
&lt;br&gt;and a set of 0616 data files was made available at the NCBI FTP site at
&lt;br&gt;approximately 1:19pm EDT.
&lt;br&gt;&lt;br&gt;The 0616 GIU contains all GenBank records new/modified since 1:33am EDT
&lt;br&gt;on June 12.
&lt;br&gt;&lt;br&gt;Our apologies for the inconvenience that this outage caused.
&lt;br&gt;&lt;br&gt;Mark Cavanaugh
&lt;br&gt;GenBank
&lt;br&gt;NCBI/NLM/NIH/HHS
&lt;br&gt;&lt;br&gt;&lt;br&gt;_______________________________________________
&lt;br&gt;Genbankb mailing list
&lt;br&gt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=24059657&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;Genbankb@...&lt;/a&gt;
&lt;br&gt;&lt;a href=&quot;http://www.bio.net/biomail/listinfo/genbankb&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.bio.net/biomail/listinfo/genbankb&lt;/a&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/GenBank-Updates-%3A-Three-day-outage-for-GenBank-Incremental-Updates-%3A-0613-0615-tp24059657p24059657.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-24005069</id>
	<title>GenBank Release 172.0 Now Available</title>
	<published>2009-06-12T12:41:53Z</published>
	<updated>2009-06-12T12:41:53Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html">Greetings GenBank Users,
&lt;br&gt;&lt;br&gt;&amp;nbsp; GenBank Release 172.0 is now available via FTP from the National
&lt;br&gt;Center for Biotechnology Information (NCBI):
&lt;br&gt;&lt;br&gt;&amp;nbsp; Ftp Site &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Directory &amp;nbsp; Contents
&lt;br&gt;&amp;nbsp; ---------------- &amp;nbsp; --------- &amp;nbsp; ---------------------------------------
&lt;br&gt;&amp;nbsp; ftp.ncbi.nih.gov &amp;nbsp; genbank &amp;nbsp; &amp;nbsp; GenBank Release 172.0 flatfiles
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;ncbi-asn1 &amp;nbsp; ASN.1 data used to create Release 172.0
&lt;br&gt;&lt;br&gt;&amp;nbsp; Close-of-data for GenBank 172.0 occurred on 06/10/2009. Uncompressed,
&lt;br&gt;the Release 172.0 flatfiles require roughly 403 GB (sequence files only)
&lt;br&gt;or 431 GB (including the 'short directory', 'index' and the *.txt
&lt;br&gt;files). The ASN.1 data require approximately 366 GB.
&lt;br&gt;&lt;br&gt;Recent statistics for non-WGS, non-CON sequences:
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release &amp;nbsp;Date &amp;nbsp; &amp;nbsp; &amp;nbsp; Base Pairs &amp;nbsp; Entries
&lt;br&gt;&lt;br&gt;&amp;nbsp; 171 &amp;nbsp; &amp;nbsp; &amp;nbsp;Apr 2009 &amp;nbsp;102980268709 &amp;nbsp;103335421
&lt;br&gt;&amp;nbsp; 172 &amp;nbsp; &amp;nbsp; &amp;nbsp;Jun 2009 &amp;nbsp;105277306080 &amp;nbsp;106073709
&lt;br&gt;&lt;br&gt;Recent statistics for WGS sequences:
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release &amp;nbsp;Date &amp;nbsp; &amp;nbsp; &amp;nbsp; Base Pairs &amp;nbsp; Entries
&lt;br&gt;&lt;br&gt;&amp;nbsp; 171 &amp;nbsp; &amp;nbsp; &amp;nbsp;Apr 2009 &amp;nbsp;144522542010 &amp;nbsp;48948309
&lt;br&gt;&amp;nbsp; 172 &amp;nbsp; &amp;nbsp; &amp;nbsp;Jun 2009 &amp;nbsp;145959997864 &amp;nbsp;49063546
&lt;br&gt;&lt;br&gt;&amp;nbsp; During the 60 days between the close dates for GenBank Releases 171.0
&lt;br&gt;and 172.0, the non-WGS/non-CON portion of GenBank grew by 2,297,037,371
&lt;br&gt;basepairs and by 2,738,288 sequence records. During that same period,
&lt;br&gt;3,680,844 records were updated. An average of about 106,985
&lt;br&gt;non-WGS/non-CON records were added and/or updated per day.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Between releases 171.0 and 172.0, the WGS component of GenBank grew
&lt;br&gt;by 1,437,455,854 basepairs and by 115,237 sequence records.
&lt;br&gt;&lt;br&gt;&amp;nbsp; For additional release information, see the README files in either of
&lt;br&gt;the directories mentioned above, and the release notes (gbrel.txt) in
&lt;br&gt;the genbank directory. Sections 1.3 and 1.4 of the release notes
&lt;br&gt;(Changes in Release 172.0 and Upcoming Changes) have been appended
&lt;br&gt;below for your convenience.
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ** Important Notes **
&lt;br&gt;&lt;br&gt;* &amp;nbsp;GenBank 'index' files are now provided without any EST content, and
&lt;br&gt;&amp;nbsp; &amp;nbsp;without most GSS content. See Section 1.3.4 of the release notes for
&lt;br&gt;&amp;nbsp; &amp;nbsp;further details.
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;NCBI is considering ceasing support for the index files, so we
&lt;br&gt;&amp;nbsp; &amp;nbsp;encourage affected users to review that section and provide feedback.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release 172.0 data, and subsequent updates, are available now via
&lt;br&gt;NCBI's Entrez and Blast services.
&lt;br&gt;&lt;br&gt;&amp;nbsp; As a general guideline, we suggest first transferring the GenBank
&lt;br&gt;release notes (gbrel.txt) whenever a release is being obtained. Check
&lt;br&gt;to make sure that the date and release number in the header of the
&lt;br&gt;release notes are current (e.g., June 15 2009, 172.0). If they are
&lt;br&gt;not, interrupt the remaining transfers and then request assistance from
&lt;br&gt;the NCBI Service Desk.
&lt;br&gt;&lt;br&gt;&amp;nbsp; A comprehensive check of the headers of all release files after your
&lt;br&gt;transfers are complete is also suggested. Here's how one might go about
&lt;br&gt;this on a unix platform, using csh/tcsh :
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; set files = `ls gb*.*`
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; foreach i ($files)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; head -10 $i | grep Release
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; end
&lt;br&gt;&lt;br&gt;Or, if the files are compressed, perhaps:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; gzcat $i | head -10 | grep Release
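&lt;br&gt;&lt;br&gt;&amp;nbsp; On platforms without csh or gzcat, the same check might be sketched in
&lt;br&gt;Python. This is an illustration only, not an NCBI-provided script; the function
&lt;br&gt;name release_lines is hypothetical, and it assumes that compressed files carry
&lt;br&gt;a .gz suffix.

```python
import glob
import gzip
from itertools import islice


def release_lines(path, max_lines=10):
    """Return the lines among the first max_lines of path that mention 'Release'.

    Files ending in .gz are read through gzip; all others are read as plain text.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="ascii", errors="replace") as handle:
        head = islice(handle, max_lines)
        return [line.rstrip("\n") for line in head if "Release" in line]


if __name__ == "__main__":
    # Mirrors the csh loop above: inspect every gb*.* file in the current directory.
    for name in sorted(glob.glob("gb*.*")):
        for line in release_lines(name):
            print(name + ": " + line.strip())
```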
&lt;br&gt;&lt;br&gt;&amp;nbsp; If you encounter problems while ftp'ing or uncompressing Release
&lt;br&gt;172.0, please send email outlining your difficulties to:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=24005069&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;info@...&lt;/a&gt;
&lt;br&gt;&lt;br&gt;Mark Cavanaugh, Michael Kimelman, Ilya Dondoshansky, Sergey Zhdanov
&lt;br&gt;GenBank
&lt;br&gt;NCBI/NLM/NIH/HHS
&lt;br&gt;&lt;br&gt;1.3 Important Changes in Release 172.0
&lt;br&gt;&lt;br&gt;1.3.1 PROJECT linetype has been replaced by DBLINK
&lt;br&gt;&lt;br&gt;&amp;nbsp; The DBLINK linetype was introduced as of the February 2009
&lt;br&gt;GenBank Release 170.0, to accommodate links to Project IDs and
&lt;br&gt;the NCBI Trace Assembly Archive, and new types of links that
&lt;br&gt;will arise in the future.
&lt;br&gt;&lt;br&gt;DBLINK co-existed with its predecessor linetype (PROJECT) for GenBank
&lt;br&gt;releases 170.0 and 171.0. With Release 172.0, however, the PROJECT
&lt;br&gt;line has been completely removed, as this record illustrates:
&lt;br&gt;&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; CP000964 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 5641239 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; circular BCT 24-SEP-2008
&lt;br&gt;DEFINITION &amp;nbsp;Klebsiella pneumoniae 342, complete genome.
&lt;br&gt;ACCESSION &amp;nbsp; CP000964
&lt;br&gt;VERSION &amp;nbsp; &amp;nbsp; CP000964.1 &amp;nbsp;GI:206564770
&lt;br&gt;DBLINK &amp;nbsp; &amp;nbsp; &amp;nbsp;Project:28471
&lt;br&gt;&lt;br&gt;1.3.2 Organizational changes
&lt;br&gt;&lt;br&gt;The total number of sequence data files increased by 36 with this
&lt;br&gt;release:
&lt;br&gt;&lt;br&gt;&amp;nbsp; - the BCT division is now composed of &amp;nbsp;45 files (+5)
&lt;br&gt;&amp;nbsp; - the ENV division is now composed of &amp;nbsp;16 files (+3)
&lt;br&gt;&amp;nbsp; - the EST division is now composed of 875 files (+15)
&lt;br&gt;&amp;nbsp; - the GSS division is now composed of 337 files (+2)
&lt;br&gt;&amp;nbsp; - the INV division is now composed of &amp;nbsp;18 files (+3)
&lt;br&gt;&amp;nbsp; - the PAT division is now composed of &amp;nbsp;73 files (+6)
&lt;br&gt;&amp;nbsp; - the PLN division is now composed of &amp;nbsp;39 files (+1)
&lt;br&gt;&amp;nbsp; - the VRL division is now composed of &amp;nbsp;12 files (+1)
&lt;br&gt;&lt;br&gt;The total number of 'index' files increased by 2 with this release:
&lt;br&gt;&lt;br&gt;&amp;nbsp; - the JOU (journal) index is now composed of 7 files (+1)
&lt;br&gt;&amp;nbsp; - the KEY (keyword) index is now composed of 4 files (+1)
&lt;br&gt;&lt;br&gt;1.3.3 File header problem for EST and GSS files
&lt;br&gt;&lt;br&gt;&amp;nbsp; A new method of generating the EST and GSS sequence files has been 
&lt;br&gt;developed, which has reduced the time required to generate a GenBank
&lt;br&gt;release by one day. However, a minor problem in the formatting of the
&lt;br&gt;header of the sequence files was inadvertently introduced : a leading 
&lt;br&gt;space exists before the filename on the very first line. For example:
&lt;br&gt;&lt;br&gt;&amp;nbsp;GBGSS100.SEQ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Genetic Sequence Data Bank
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;June 15 2009
&lt;br&gt;&lt;br&gt;It should be:
&lt;br&gt;&lt;br&gt;GBGSS100.SEQ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Genetic Sequence Data Bank
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;June 15 2009
&lt;br&gt;&lt;br&gt;&lt;br&gt;The problem affects all EST files and most GSS files. We had hoped to
&lt;br&gt;repair this formatting issue for Release 172.0, but the code changes
&lt;br&gt;just missed the cut-off for release generation. The problem will
&lt;br&gt;definitely be resolved for Release 173.0.
&lt;br&gt;&lt;br&gt;1.3.4 Changes in the content of index files
&lt;br&gt;&lt;br&gt;&amp;nbsp; As described in the GB 153 release notes, the 'index' files which accompany
&lt;br&gt;GenBank releases (see Section 3.3) are considered to be a legacy data product by
&lt;br&gt;NCBI, generated mostly for historical reasons. FTP statistics of January 2005
&lt;br&gt;seem to support this: the index files were transferred only half as frequently as
&lt;br&gt;the files of sequence records. The inherent inefficiencies of the index file
&lt;br&gt;format also lead us to suspect that they have little serious use by the user
&lt;br&gt;community, particularly for EST and GSS records.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The software that generated the index file products received little
&lt;br&gt;attention over the years, and finally reached its limitations in
&lt;br&gt;February 2006 (Release 152.0). The required multi-server queries which
&lt;br&gt;obtained and sorted many millions of rows of terms from several different
&lt;br&gt;databases simply outgrew the capacity of the hardware used for GenBank
&lt;br&gt;Release generation.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Our short-term solution is to cease generating some index-file content
&lt;br&gt;for all EST sequence records, and for GSS sequence records that originate
&lt;br&gt;via direct submission to NCBI.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The three gbacc*.idx index files continue to reflect the entirety of the
&lt;br&gt;release, including all EST and GSS records; however, the file contents are
&lt;br&gt;unsorted.
&lt;br&gt;&lt;br&gt;&amp;nbsp; These 'solutions' are really just stop-gaps, and we will likely pursue
&lt;br&gt;one of two options:
&lt;br&gt;&lt;br&gt;a) Cease support of the 'index' file products altogether.
&lt;br&gt;&lt;br&gt;b) Provide new products that present some of the most useful data from
&lt;br&gt;&amp;nbsp; &amp;nbsp;the legacy 'index' files, and cease support for other types of index data.
&lt;br&gt;&lt;br&gt;&amp;nbsp; If you are a user of the 'index' files associated with GenBank releases, we
&lt;br&gt;encourage you to make your wishes known, either via the GenBank newsgroup,
&lt;br&gt;or via email to NCBI's Service Desk:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=24005069&amp;i=1&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;info@...&lt;/a&gt;
&lt;br&gt;&lt;br&gt;&amp;nbsp; Our apologies for any inconvenience that these changes may cause.
&lt;br&gt;&lt;br&gt;1.3.5 GSS File Header Problem
&lt;br&gt;&lt;br&gt;&amp;nbsp; GSS sequences at GenBank are maintained in two different systems, depending
&lt;br&gt;on their origin, and the dumps from those systems occur in parallel. Because
&lt;br&gt;the second dump (for example) has no prior knowledge of exactly how many GSS
&lt;br&gt;files will be dumped by the first, it does not know how to number its own
&lt;br&gt;output files.
&lt;br&gt;&lt;br&gt;&amp;nbsp; There is thus a discrepancy between the filenames and file headers for
&lt;br&gt;seventy-two of the GSS flatfiles in Release 172.0. Consider gbgss266.seq:
&lt;br&gt;&lt;br&gt;GBGSS1.SEQ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Genetic Sequence Data Bank
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; June 15 2009
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; NCBI-GenBank Flat File Release 172.0
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;GSS Sequences (Part 1)
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;87198 loci, &amp;nbsp; &amp;nbsp;64267715 bases, from &amp;nbsp; &amp;nbsp;87198 reported sequences
&lt;br&gt;&lt;br&gt;&amp;nbsp; Here, the filename and part number in the header are &amp;quot;1&amp;quot;, though the file
&lt;br&gt;has been renamed as &amp;quot;266&amp;quot; based on the number of files dumped from the other
&lt;br&gt;system. &amp;nbsp;We hope to resolve this discrepancy at some point, but the priority
&lt;br&gt;is certainly much lower than many other tasks.
&lt;br&gt;&lt;br&gt;1.4 Upcoming Changes
&lt;br&gt;&lt;br&gt;1.4.1 Qualifier changes from INSDC 2009
&lt;br&gt;&lt;br&gt;&amp;nbsp; Several qualifier changes for the Feature Table were agreed to at the
&lt;br&gt;annual INSDC meeting in May 2009. Complete details and implementation
&lt;br&gt;timelines will be made available in the August GenBank Release Notes.
&lt;br&gt;In the meantime, here is an early preview of the changes that were
&lt;br&gt;approved:
&lt;br&gt;&lt;br&gt;New value for /exception:
&lt;br&gt;&lt;br&gt;&amp;nbsp; /exception=&amp;quot;annotated by transcript or proteomic data&amp;quot;
&lt;br&gt;&lt;br&gt;/pseudo qualifier to be re-named as /non_functional
&lt;br&gt;&lt;br&gt;&amp;nbsp; Because the term &amp;quot;pseudo&amp;quot; is often equated with &amp;quot;pseudogene&amp;quot;, the
&lt;br&gt;&amp;nbsp; /pseudo qualifier will be renamed as /non_functional, to better
&lt;br&gt;&amp;nbsp; reflect its actual usage.
&lt;br&gt;&lt;br&gt;New /haplogroup qualifier defined for the source feature
&lt;br&gt;&lt;br&gt;&lt;br&gt;_______________________________________________
&lt;br&gt;Genbankb mailing list
&lt;br&gt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=24005069&amp;i=2&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;Genbankb@...&lt;/a&gt;
&lt;br&gt;&lt;a href=&quot;http://www.bio.net/biomail/listinfo/genbankb&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.bio.net/biomail/listinfo/genbankb&lt;/a&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/GenBank-Release-172.0-Now-Available-tp24005069p24005069.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-23997960</id>
	<title>GenBank 172.0 Close-Of-Data</title>
	<published>2009-06-11T14:45:19Z</published>
	<updated>2009-06-11T14:45:19Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html">Greetings GenBank Users,
&lt;br&gt;&lt;br&gt;Close-of-data for the upcoming GenBank Release 172.0 occurred on
&lt;br&gt;Wednesday June 10 2009 at approximately 1:30am EDT.
&lt;br&gt;&lt;br&gt;The subsequently generated GenBank Incremental Update files
&lt;br&gt;nc0610.aso, nc0610.flat, etc. contain data through the close.
&lt;br&gt;&lt;br&gt;Note: Release processing often does not begin until sometime during
&lt;br&gt;business hours on the close date. As a result, a number of sequence
&lt;br&gt;records processed *after* 1:30am are likely to be present in the
&lt;br&gt;GenBank 172.0 release files, even though they are &amp;quot;post-close&amp;quot;.
&lt;br&gt;&lt;br&gt;Similarly, the first GenBank Incremental Update that is generated
&lt;br&gt;after the close date is likely to contain a number of sequence
&lt;br&gt;records that are unchanged, compared to their appearance in the
&lt;br&gt;release files.
&lt;br&gt;&lt;br&gt;We expect to make the GenBank 172.0 data files available sometime
&lt;br&gt;tomorrow.
&lt;br&gt;&lt;br&gt;Our apologies for the lack of advance notice about the close date.
&lt;br&gt;&lt;br&gt;Mark Cavanaugh
&lt;br&gt;GenBank
&lt;br&gt;NCBI/NLM/NIH/HHS
&lt;br&gt;&lt;br&gt;&lt;br&gt;_______________________________________________
&lt;br&gt;Genbankb mailing list
&lt;br&gt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=23997960&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;Genbankb@...&lt;/a&gt;
&lt;br&gt;&lt;a href=&quot;http://www.bio.net/biomail/listinfo/genbankb&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.bio.net/biomail/listinfo/genbankb&lt;/a&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/GenBank-172.0-Close-Of-Data-tp23997960p23997960.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-23050447</id>
	<title>GenBank Release 171.0 Now Available</title>
	<published>2009-04-14T16:14:06Z</published>
	<updated>2009-04-14T16:14:06Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html">Greetings GenBank Users,
&lt;br&gt;&lt;br&gt;&amp;nbsp; GenBank Release 171.0 is now available via FTP from the National
&lt;br&gt;Center for Biotechnology Information (NCBI):
&lt;br&gt;&lt;br&gt;&amp;nbsp; Ftp Site &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Directory &amp;nbsp; Contents
&lt;br&gt;&amp;nbsp; ---------------- &amp;nbsp; --------- &amp;nbsp; ---------------------------------------
&lt;br&gt;&amp;nbsp; ftp.ncbi.nih.gov &amp;nbsp; genbank &amp;nbsp; &amp;nbsp; GenBank Release 171.0 flatfiles
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;ncbi-asn1 &amp;nbsp; ASN.1 data used to create Release 171.0
&lt;br&gt;&lt;br&gt;&amp;nbsp; Close-of-data for GenBank 171.0 occurred on 04/10/2009. Uncompressed,
&lt;br&gt;the Release 171.0 flatfiles require roughly 395 GB (sequence files only)
&lt;br&gt;or 422 GB (including the 'short directory', 'index' and the *.txt
&lt;br&gt;files). The ASN.1 data require approximately 360 GB.
&lt;br&gt;&lt;br&gt;Recent statistics for non-WGS, non-CON sequences:
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release &amp;nbsp;Date &amp;nbsp; &amp;nbsp; &amp;nbsp; Base Pairs &amp;nbsp; Entries
&lt;br&gt;&lt;br&gt;&amp;nbsp; 170 &amp;nbsp; &amp;nbsp; &amp;nbsp;Feb 2009 &amp;nbsp;101467270308 &amp;nbsp;101815678
&lt;br&gt;&amp;nbsp; 171 &amp;nbsp; &amp;nbsp; &amp;nbsp;Apr 2009 &amp;nbsp;102980268709 &amp;nbsp;103335421
&lt;br&gt;&lt;br&gt;Recent statistics for WGS sequences:
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release &amp;nbsp;Date &amp;nbsp; &amp;nbsp; &amp;nbsp; Base Pairs &amp;nbsp; Entries
&lt;br&gt;&lt;br&gt;&amp;nbsp; 170 &amp;nbsp; &amp;nbsp; &amp;nbsp;Feb 2009 &amp;nbsp;143797800446 &amp;nbsp;49036947
&lt;br&gt;&amp;nbsp; 171 &amp;nbsp; &amp;nbsp; &amp;nbsp;Apr 2009 &amp;nbsp;144522542010 &amp;nbsp;48948309
&lt;br&gt;&lt;br&gt;&amp;nbsp; During the 56 days between the close dates for GenBank Releases 170.0
&lt;br&gt;and 171.0, the non-WGS/non-CON portion of GenBank grew by 1,512,998,401
&lt;br&gt;basepairs and by 1,519,743 sequence records. During that same period,
&lt;br&gt;1,040,778 records were updated. An average of about 45,723
&lt;br&gt;non-WGS/non-CON records were added and/or updated per day.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Between releases 170.0 and 171.0, the WGS component of GenBank grew
&lt;br&gt;by 724,741,564 basepairs and the number of records **decreased** by
&lt;br&gt;88,638. A decrease in the overall number of WGS records can
&lt;br&gt;occasionally occur, as a result of genome re-assemblies which yield
&lt;br&gt;larger (but fewer) records, and due to the submission of completed
&lt;br&gt;genomes which supersede WGS projects.
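&lt;br&gt;&lt;br&gt;&amp;nbsp; The growth figures quoted above can be re-derived from the two statistics tables. The following is a minimal Python sketch, assuming close dates of 02/13/2009 (Release 170.0) and 04/10/2009 (Release 171.0). The variable names are illustrative only, and the daily average is rounded down on the assumption that this is how the figure of about 45,723 was obtained:

```python
from datetime import date

# (base pairs, entries) per release, copied from the statistics tables above.
non_wgs = {170: (101467270308, 101815678), 171: (102980268709, 103335421)}
wgs = {170: (143797800446, 49036947), 171: (144522542010, 48948309)}
updated_records = 1040778  # records updated between the two close dates

# Days between the two close dates.
days = (date(2009, 4, 10) - date(2009, 2, 13)).days

# Growth of the non-WGS/non-CON portion.
base_growth = non_wgs[171][0] - non_wgs[170][0]
record_growth = non_wgs[171][1] - non_wgs[170][1]

# Records added and/or updated per day (assumption: rounded down).
per_day = (record_growth + updated_records) // days

# Change in the WGS component; a negative record change means fewer records.
wgs_base_growth = wgs[171][0] - wgs[170][0]
wgs_record_change = wgs[171][1] - wgs[170][1]

print(days, base_growth, record_growth, per_day)
print(wgs_base_growth, wgs_record_change)
```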
&lt;br&gt;&lt;br&gt;&amp;nbsp; For additional release information, see the README files in either of
&lt;br&gt;the directories mentioned above, and the release notes (gbrel.txt) in
&lt;br&gt;the genbank directory. Sections 1.3 and 1.4 of the release notes
&lt;br&gt;(Changes in Release 171.0 and Upcoming Changes) have been appended
&lt;br&gt;below for your convenience.
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ** Important Notes **
&lt;br&gt;&lt;br&gt;* &amp;nbsp;This is the final release for which the PROJECT linetype
&lt;br&gt;&amp;nbsp; &amp;nbsp;will be present in GenBank flatfiles. The new DBLINK linetype
&lt;br&gt;&amp;nbsp; &amp;nbsp;replaces PROJECT. Post-171.0 GenBank Update files will
&lt;br&gt;&amp;nbsp; &amp;nbsp;have only the new DBLINK linetype within about two weeks,
&lt;br&gt;&amp;nbsp; &amp;nbsp;and the same will be true of all future complete releases.
&lt;br&gt;&amp;nbsp; &amp;nbsp;See Section 1.3.1 for more information.
&lt;br&gt;&lt;br&gt;* &amp;nbsp;GenBank 'index' files are now provided without any EST content, and
&lt;br&gt;&amp;nbsp; &amp;nbsp;without most GSS content. See Section 1.3.5 of the release notes for
&lt;br&gt;&amp;nbsp; &amp;nbsp;further details.
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;NCBI is considering ceasing support for the index files, so we
&lt;br&gt;&amp;nbsp; &amp;nbsp;encourage affected users to review that section and provide feedback.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release 171.0 data, and subsequent updates, are available now via
&lt;br&gt;NCBI's Entrez and Blast services.
&lt;br&gt;&lt;br&gt;&amp;nbsp; As a general guideline, we suggest first transferring the GenBank
&lt;br&gt;release notes (gbrel.txt) whenever a release is being obtained. Check
&lt;br&gt;to make sure that the date and release number in the header of the
&lt;br&gt;release notes are current (e.g., April 15 2009, 171.0). If they are
&lt;br&gt;not, interrupt the remaining transfers and then request assistance from
&lt;br&gt;the NCBI Service Desk.
&lt;br&gt;&lt;br&gt;&amp;nbsp; A comprehensive check of the headers of all release files after your
&lt;br&gt;transfers are complete is also suggested. Here's how one might go about
&lt;br&gt;this on a Unix platform, using csh/tcsh:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; set files = `ls gb*.*`
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; foreach i ($files)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; head -10 $i | grep Release
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; end
&lt;br&gt;&lt;br&gt;Or, if the files are compressed, perhaps:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; gzcat $i | head -10 | grep Release
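&lt;br&gt;&lt;br&gt;&amp;nbsp; The same check can be sketched in Python for sites without csh. This is only an illustration under stated assumptions: compressed files are assumed to end in .gz, and the function names release_lines and check_release are hypothetical, not part of any NCBI tool:

```python
import gzip
from pathlib import Path


def release_lines(path, max_lines=10):
    """Return the header lines (first max_lines) that mention 'Release'."""
    # Assumption: compressed files end in .gz; anything else is plain text.
    opener = gzip.open if str(path).endswith(".gz") else open
    found = []
    with opener(path, "rt", encoding="ascii", errors="replace") as handle:
        for number, line in enumerate(handle):
            if number == max_lines:
                break
            if "Release" in line:
                found.append(line.strip())
    return found


def check_release(directory, expected="171.0"):
    """Map each gb*.* file name to True when its header names the expected release."""
    results = {}
    for path in sorted(Path(directory).glob("gb*.*")):
        results[path.name] = any(expected in line for line in release_lines(path))
    return results
```

&lt;br&gt;&lt;br&gt;&amp;nbsp; Calling check_release on the download directory returns one entry per file, so any file reported as False can be transferred again.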
&lt;br&gt;&lt;br&gt;&amp;nbsp; If you encounter problems while ftp'ing or uncompressing Release
&lt;br&gt;171.0, please send email outlining your difficulties to:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=23050447&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;info@...&lt;/a&gt;
&lt;br&gt;&lt;br&gt;Mark Cavanaugh, Michael Kimelman, Ilya Dondoshansky, Sergey Zhdanov
&lt;br&gt;GenBank
&lt;br&gt;NCBI/NLM/NIH/HHS
&lt;br&gt;&lt;br&gt;&lt;br&gt;1.3 Important Changes in Release 171.0
&lt;br&gt;&lt;br&gt;1.3.1 PROJECT linetype to be replaced by DBLINK (April 2009)
&lt;br&gt;&lt;br&gt;&amp;nbsp; The new DBLINK linetype was introduced as of the February 2009
&lt;br&gt;GenBank Release 170.0.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Genome Project IDs and Trace Assembly Archive IDs are now presented
&lt;br&gt;via DBLINK, in conjunction with the legacy PROJECT linetype, as this
&lt;br&gt;mock-up for CP000964 illustrates:
&lt;br&gt;&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; CP000964 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 5641239 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; circular BCT
&lt;br&gt;24-SEP-2008
&lt;br&gt;DEFINITION &amp;nbsp;Klebsiella pneumoniae 342, complete genome.
&lt;br&gt;ACCESSION &amp;nbsp; CP000964
&lt;br&gt;VERSION &amp;nbsp; &amp;nbsp; CP000964.1 &amp;nbsp;GI:206564770
&lt;br&gt;PROJECT &amp;nbsp; &amp;nbsp; GenomeProject:28471
&lt;br&gt;DBLINK &amp;nbsp; &amp;nbsp; &amp;nbsp;Project:28471
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Trace Assembly Archive:123456
&lt;br&gt;....
&lt;br&gt;COMMENT &amp;nbsp; &amp;nbsp; The source for the DNA and/or cells is: &amp;nbsp;Professor Eric W.
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Triplett, Chair, Department of Microbiology and Cell Science,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Institute of Food and Agricultural Sciences, University of Florida,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; P.O. Box 110700, Gainesville, FL 32611-0700, &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=23050447&amp;i=1&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;ewt@...&lt;/a&gt;.
&lt;br&gt;&lt;br&gt;PROJECT and DBLINK have co-existed for GenBank releases 170.0 and 171.0.
&lt;br&gt;But subsequent to this April release, the PROJECT line will be removed
&lt;br&gt;from the flatfile format. In its final state, the above mock-up for
&lt;br&gt;CP000964 becomes:
&lt;br&gt;&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; CP000964 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 5641239 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; circular BCT 24-SEP-2008
&lt;br&gt;DEFINITION &amp;nbsp;Klebsiella pneumoniae 342, complete genome.
&lt;br&gt;ACCESSION &amp;nbsp; CP000964
&lt;br&gt;VERSION &amp;nbsp; &amp;nbsp; CP000964.1 &amp;nbsp;GI:206564770
&lt;br&gt;DBLINK &amp;nbsp; &amp;nbsp; &amp;nbsp;Project:28471
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Trace Assembly Archive:123456
&lt;br&gt;....
&lt;br&gt;COMMENT &amp;nbsp; &amp;nbsp; The source for the DNA and/or cells is: &amp;nbsp;Professor Eric W.
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Triplett, Chair, Department of Microbiology and Cell Science,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Institute of Food and Agricultural Sciences, University of Florida,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; P.O. Box 110700, Gainesville, FL 32611-0700, &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=23050447&amp;i=2&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;ewt@...&lt;/a&gt;.
&lt;br&gt;&lt;br&gt;&amp;nbsp; In summary: The PROJECT linetype will cease to be displayed in
&lt;br&gt;post-171.0 GenBank Update data products within the next two weeks. The
&lt;br&gt;same will be true of all future complete GenBank releases.
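&lt;br&gt;&lt;br&gt;&amp;nbsp; Parsers that read the PROJECT line may need to read DBLINK instead. The following is a minimal Python sketch, not an official parser. It assumes the keyword occupies the first 12 columns of each flatfile line and that DBLINK continuation lines are blank in those columns, as in the mock-ups above. The function name parse_links and the result key for the legacy line are hypothetical:

```python
def parse_links(record_text):
    """Collect DBLINK cross-references, plus any legacy PROJECT value."""
    links = {}
    in_dblink = False
    for line in record_text.splitlines():
        # Assumption: keyword in columns 1-12, value from column 13 onward.
        keyword, value = line[:12].strip(), line[12:].strip()
        if keyword == "DBLINK":
            in_dblink = True
        elif keyword:
            # Any other keyword ends the DBLINK block.
            in_dblink = False
            if keyword == "PROJECT":
                links["PROJECT (legacy)"] = value
            continue
        # First DBLINK line or one of its continuation lines.
        if in_dblink and ":" in value:
            name, _, identifier = value.partition(":")
            links[name.strip()] = identifier.strip()
    return links


# Rows taken from the CP000964 mock-up above, padded to 12-column keywords.
rows = [
    ("VERSION", "CP000964.1  GI:206564770"),
    ("PROJECT", "GenomeProject:28471"),
    ("DBLINK", "Project:28471"),
    ("", "Trace Assembly Archive:123456"),
    ("COMMENT", "The source for the DNA and/or cells is: Professor Eric W."),
]
sample = "\n".join(keyword.ljust(12) + value for keyword, value in rows)
print(parse_links(sample))
```

&lt;br&gt;&lt;br&gt;&amp;nbsp; Because the legacy value is stored under a separate key, the same function handles records with or without a PROJECT line.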
&lt;br&gt;&lt;br&gt;1.3.2 Organizational changes
&lt;br&gt;&lt;br&gt;The total number of sequence data files increased by 46 with this
&lt;br&gt;release:
&lt;br&gt;&lt;br&gt;&amp;nbsp; - the BCT division is now composed of &amp;nbsp;40 files (+2)
&lt;br&gt;&amp;nbsp; - the CON division is now composed of 131 files (+3)
&lt;br&gt;&amp;nbsp; - the ENV division is now composed of &amp;nbsp;13 files (+1)
&lt;br&gt;&amp;nbsp; - the EST division is now composed of 860 files (+22)
&lt;br&gt;&amp;nbsp; - the GSS division is now composed of 335 files (+13)
&lt;br&gt;&amp;nbsp; - the INV division is now composed of &amp;nbsp;15 files (+1)
&lt;br&gt;&amp;nbsp; - the MAM division is now composed of &amp;nbsp; 5 files (+1)
&lt;br&gt;&amp;nbsp; - the PAT division is now composed of &amp;nbsp;67 files (+2)
&lt;br&gt;&amp;nbsp; - the PLN division is now composed of &amp;nbsp;38 files (+1)
&lt;br&gt;&lt;br&gt;1.3.3 CON-division records for 'segmented sets' restored.
&lt;br&gt;&lt;br&gt;&amp;nbsp; A previously overlooked problem in Release 170.0 processing resulted
&lt;br&gt;in the exclusion of roughly 14,000 CON-division entries for a type of
&lt;br&gt;record referred to as a 'segmented set'. Segmented sets consist of small
&lt;br&gt;sequence fragments of an incompletely sequenced molecule, packaged
&lt;br&gt;together, with a top-level sequence that specifies the order of the
&lt;br&gt;underlying fragments. That top-level sequence can be displayed as a CON
&lt;br&gt;division record. AH000819 is an example:
&lt;br&gt;&lt;br&gt;ASN.1 for AH000819:
&lt;br&gt;&lt;br&gt;&amp;nbsp;
&lt;br&gt;&lt;a href=&quot;http://www.ncbi.nlm.nih.gov/nuccore/405204?report=asn1&amp;log$=seqview&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.ncbi.nlm.nih.gov/nuccore/405204?report=asn1&amp;log$=seqview&lt;/a&gt;&lt;br&gt;&lt;br&gt;Flatfile view of the nine sequenced fragments:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;a href=&quot;http://www.ncbi.nlm.nih.gov/nuccore/405204&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.ncbi.nlm.nih.gov/nuccore/405204&lt;/a&gt;&lt;br&gt;&lt;br&gt;These (largely legacy) CON-division records have been restored in
&lt;br&gt;Release 171.0 and can be found in gbcon131.seq.
&lt;br&gt;&lt;br&gt;1.3.4 File header problem for EST and GSS files
&lt;br&gt;&lt;br&gt;&amp;nbsp; A new method of generating the EST and GSS sequence files has been 
&lt;br&gt;developed, which has reduced the time required to generate a GenBank
&lt;br&gt;release by one day. However, a minor problem in the formatting of the
&lt;br&gt;header of the sequence files was inadvertently introduced: a leading
&lt;br&gt;space exists before the filename on the very first line. For example:
&lt;br&gt;&lt;br&gt;&amp;nbsp;GBGSS100.SEQ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Genetic Sequence Data Bank
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;April 15 2009
&lt;br&gt;&lt;br&gt;&lt;br&gt;It should be:
&lt;br&gt;&lt;br&gt;GBGSS100.SEQ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Genetic Sequence Data Bank
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;April 15 2009
&lt;br&gt;&lt;br&gt;&lt;br&gt;The problem affects all EST files and most GSS files. We doubt that it
&lt;br&gt;will cause significant problems for users; however, the problem will be
&lt;br&gt;corrected for our next release.
&lt;br&gt;&lt;br&gt;1.3.5 Changes in the content of index files
&lt;br&gt;&lt;br&gt;&amp;nbsp; As described in the GB 153 release notes, the 'index' files which
&lt;br&gt;accompany GenBank releases (see Section 3.3) are considered to be a
&lt;br&gt;legacy data product by NCBI, generated mostly for historical reasons. FTP
&lt;br&gt;statistics of January 2005 seem to support this: the index files were
&lt;br&gt;transferred only half as frequently as the files of sequence records. The
&lt;br&gt;inherent inefficiencies of the index file format also lead us to suspect
&lt;br&gt;that they have little serious use by the user community, particularly for
&lt;br&gt;EST and GSS records.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The software that generated the index file products received little
&lt;br&gt;attention over the years, and finally reached its limitations in
&lt;br&gt;February 2006 (Release 152.0). The required multi-server queries which
&lt;br&gt;obtained and sorted many millions of rows of terms from several different
&lt;br&gt;databases simply outgrew the capacity of the hardware used for GenBank
&lt;br&gt;Release generation.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Our short-term solution is to cease generating some index-file content
&lt;br&gt;for all EST sequence records, and for GSS sequence records that originate
&lt;br&gt;via direct submission to NCBI.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The three gbacc*.idx index files continue to reflect the entirety of
&lt;br&gt;the release, including all EST and GSS records; however, the file
&lt;br&gt;contents are unsorted.
&lt;br&gt;&lt;br&gt;&amp;nbsp; These 'solutions' are really just stop-gaps, and we will likely pursue
&lt;br&gt;one of two options:
&lt;br&gt;&lt;br&gt;a) Cease support of the 'index' file products altogether.
&lt;br&gt;&lt;br&gt;b) Provide new products that present some of the most useful data from
&lt;br&gt;&amp;nbsp; &amp;nbsp;the legacy 'index' files, and cease support for other types of index
&lt;br&gt;&amp;nbsp; &amp;nbsp;data.
&lt;br&gt;&lt;br&gt;&amp;nbsp; If you are a user of the 'index' files associated with GenBank
&lt;br&gt;releases, we encourage you to make your wishes known, either via the
&lt;br&gt;GenBank newsgroup, or via email to NCBI's Service Desk:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=23050447&amp;i=3&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;info@...&lt;/a&gt;
&lt;br&gt;&lt;br&gt;&amp;nbsp; Our apologies for any inconvenience that these changes may cause.
&lt;br&gt;&lt;br&gt;1.3.6 GSS File Header Problem
&lt;br&gt;&lt;br&gt;&amp;nbsp; GSS sequences at GenBank are maintained in two different systems,
&lt;br&gt;depending on their origin, and the dumps from those systems occur in
&lt;br&gt;parallel. Because the second dump (for example) has no prior knowledge of
&lt;br&gt;exactly how many GSS files will be dumped by the first, it does not know
&lt;br&gt;how to number its own output files.
&lt;br&gt;&lt;br&gt;&amp;nbsp; There is thus a discrepancy between the filenames and file headers for
&lt;br&gt;seventy-two of the GSS flatfiles in Release 171.0. Consider gbgss264.seq:
&lt;br&gt;&lt;br&gt;GBGSS1.SEQ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Genetic Sequence Data Bank
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;April 15 2009
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; NCBI-GenBank Flat File Release 171.0
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;GSS Sequences (Part 1)
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;87206 loci, &amp;nbsp; &amp;nbsp;64290178 bases, from &amp;nbsp; &amp;nbsp;87206 reported sequences
&lt;br&gt;&lt;br&gt;&amp;nbsp; Here, the filename and part number in the header are &amp;quot;1&amp;quot;, though the
&lt;br&gt;file has been renamed as &amp;quot;254&amp;quot; based on the number of files dumped from
&lt;br&gt;the other system. We will work to resolve this discrepancy in future
&lt;br&gt;releases, but the priority is certainly much lower than many other tasks.
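&lt;br&gt;&lt;br&gt;&amp;nbsp; Users whose software compares header names with filenames may want to list the affected files. The following is a minimal Python sketch under stated assumptions: the header name is taken to be the first whitespace-delimited field of the first line (which also tolerates the leading space described in Section 1.3.4), and the function names header_name and header_mismatches are hypothetical:

```python
from pathlib import Path


def header_name(first_line):
    """Return the file name from a header's first line, ignoring leading spaces."""
    fields = first_line.split()
    return fields[0].lower() if fields else ""


def header_mismatches(directory, pattern="gbgss*.seq"):
    """List (file name, header name) pairs where the two disagree."""
    mismatches = []
    for path in sorted(Path(directory).glob(pattern)):
        with open(path, "rt", encoding="ascii", errors="replace") as handle:
            name = header_name(handle.readline())
        # Header names are upper case; file names on the FTP site are lower case.
        if name != path.name.lower():
            mismatches.append((path.name, name))
    return mismatches
```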
&lt;br&gt;&lt;br&gt;1.4 Upcoming Changes
&lt;br&gt;&lt;br&gt;&amp;nbsp; There are no scheduled format changes for GenBank.
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/GenBank-Release-171.0-Now-Available-tp23050447p23050447.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-23050446</id>
	<title>GenBank 171.0 Close-Of-Data</title>
	<published>2009-04-14T09:43:10Z</published>
	<updated>2009-04-14T09:43:10Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html">Greetings GenBank Users,
&lt;br&gt;&lt;br&gt;Close-of-data for the upcoming GenBank Release 171.0 occurred on
&lt;br&gt;Friday April 10 2009 at approximately 1:30am EDT.
&lt;br&gt;&lt;br&gt;The subsequently generated GenBank Incremental Update files
&lt;br&gt;nc0410.aso, nc0410.flat, etc. contain data through the close.
&lt;br&gt;&lt;br&gt;Note: Release processing often does not begin until sometime during
&lt;br&gt;business hours on the close date. As a result, a number of sequence
&lt;br&gt;records processed *after* 1:30am are likely to be present in the
&lt;br&gt;GenBank 171.0 release files, even though they are &amp;quot;post-close&amp;quot;.
&lt;br&gt;&lt;br&gt;Similarly, the first GenBank Incremental Update that is generated
&lt;br&gt;after the close date is likely to contain a number of sequence
&lt;br&gt;records that are unchanged, compared to their appearance in the
&lt;br&gt;release files.
&lt;br&gt;&lt;br&gt;We expect to make the GenBank 171.0 data files available later
&lt;br&gt;today.
&lt;br&gt;&lt;br&gt;Our apologies for the lack of advance notice about the close date.
&lt;br&gt;&lt;br&gt;Mark Cavanaugh
&lt;br&gt;GenBank
&lt;br&gt;NCBI/NLM/NIH/HHS
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/GenBank-171.0-Close-Of-Data-tp23050446p23050446.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-22787495</id>
	<title>RE: enquire</title>
	<published>2009-03-30T09:30:47Z</published>
	<updated>2009-03-30T09:30:47Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html">Dear Yang Jiang,
&lt;br&gt;&lt;br&gt;There are many data products at the NCBI FTP site, generated by
&lt;br&gt;many different NCBI groups. 
&lt;br&gt;&lt;br&gt;The RefSeq/UniProt mapping file that you mention is not a
&lt;br&gt;GenBank-related product, so it might be better to send your
&lt;br&gt;inquiry to the NCBI Help Desk:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=22787495&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;info@...&lt;/a&gt;
&lt;br&gt;&lt;br&gt;I hope this helps.
&lt;br&gt;&lt;br&gt;Regards,
&lt;br&gt;&lt;br&gt;Mark Cavanaugh
&lt;br&gt;GenBank
&lt;br&gt;NCBI/NLM/NIH/HHS
&lt;br&gt;&lt;br&gt;&lt;br&gt;&amp;gt; -----Original Message-----
&lt;br&gt;&amp;gt; From: jybackup [mailto:&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=22787495&amp;i=1&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;jybackup@...&lt;/a&gt;]
&lt;br&gt;&amp;gt; Sent: Friday, March 27, 2009 2:41 AM
&lt;br&gt;&amp;gt; To: &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=22787495&amp;i=2&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;genbankb@...&lt;/a&gt;
&lt;br&gt;&amp;gt; Subject: [Genbank-bb] enquire
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; Dear administrator,
&lt;br&gt;&amp;gt; I am not very clear about the meaning of the word &amp;quot;identical&amp;quot;; could
&lt;br&gt;&amp;gt; you tell me? On the NCBI FTP site, there is a document which maps NCBI
&lt;br&gt;&amp;gt; RefSeq accession numbers to UniProt accessions. It says that they have
&lt;br&gt;&amp;gt; the identical taxid and sequence, but the length in NCBI doesn't equal
&lt;br&gt;&amp;gt; UniProt's. Can you tell me why? Does &amp;quot;identical&amp;quot; mean &amp;quot;equal to&amp;quot;?
&lt;br&gt;&amp;gt; Waiting for your reply.
&lt;br&gt;&amp;gt; best wishes
&lt;br&gt;&amp;gt; yang jiang
&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;br&gt;_______________________________________________
&lt;br&gt;Genbankb mailing list
&lt;br&gt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=22787495&amp;i=3&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;Genbankb@...&lt;/a&gt;
&lt;br&gt;&lt;a href=&quot;http://www.bio.net/biomail/listinfo/genbankb&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.bio.net/biomail/listinfo/genbankb&lt;/a&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/enquire-tp22753509p22787495.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-22753509</id>
	<title>enquire</title>
	<published>2009-03-26T23:41:17Z</published>
	<updated>2009-03-26T23:41:17Z</updated>
	<author>
		<name>jybackup</name>
	</author>
	<content type="html">&lt;div&gt;Dear administrator,&lt;br&gt;i was not very clearly about the meaning of the word &quot;identical&quot;,could you tell me? in NCBI ftp,there is a document which map the ncbi refseq accno to uniprot acc, it refers that it get the identical taxid and sequence,but the length of ncbi doesn't equal to UNIPROT'S.can you tell me why.identical means equal to?waiting for your reply.&lt;br&gt;best wishes&lt;br&gt;yang jiang&lt;br&gt;&lt;br&gt;&lt;/div&gt;&lt;br&gt;&lt;!-- footer --&gt;&lt;br&gt;&lt;span title=&quot;neteasefooter&quot; /&gt;&lt;hr /&gt;
&lt;a href=&quot;http://www.yeah.net&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;网易邮箱，中国第一大电子邮件服务商&lt;/a&gt;
&lt;/span&gt;&lt;br /&gt;_______________________________________________
&lt;br&gt;Genbankb mailing list
&lt;br&gt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=22753509&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;Genbankb@...&lt;/a&gt;
&lt;br&gt;&lt;a href=&quot;http://www.bio.net/biomail/listinfo/genbankb&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.bio.net/biomail/listinfo/genbankb&lt;/a&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/enquire-tp22753509p22753509.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-22091999</id>
	<title>GenBank Release 170.0 Now Available</title>
	<published>2009-02-18T17:03:07Z</published>
	<updated>2009-02-18T17:03:07Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html">Greetings GenBank Users,
&lt;br&gt;&lt;br&gt;&amp;nbsp; GenBank Release 170.0 is now available via FTP from the National
&lt;br&gt;Center for Biotechnology Information (NCBI):
&lt;br&gt;&lt;br&gt;&amp;nbsp; Ftp Site &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Directory &amp;nbsp; Contents
&lt;br&gt;&amp;nbsp; ---------------- &amp;nbsp; --------- &amp;nbsp; ---------------------------------------
&lt;br&gt;&amp;nbsp; ftp.ncbi.nih.gov &amp;nbsp; genbank &amp;nbsp; &amp;nbsp; GenBank Release 170.0 flatfiles
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;ncbi-asn1 &amp;nbsp; ASN.1 data used to create Release 170.0
&lt;br&gt;&lt;br&gt;&amp;nbsp; Close-of-data for GenBank 170.0 occurred on 02/13/2009. Uncompressed,
&lt;br&gt;the Release 170.0 flatfiles require roughly 390 GB (sequence files only)
&lt;br&gt;or 417 GB (including the 'short directory', 'index' and the *.txt
&lt;br&gt;files). The ASN.1 data require approximately 356 GB.
&lt;br&gt;&lt;br&gt;Recent statistics for non-WGS, non-CON sequences:
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release &amp;nbsp;Date &amp;nbsp; &amp;nbsp; &amp;nbsp; Base Pairs &amp;nbsp; Entries
&lt;br&gt;&lt;br&gt;&amp;nbsp; 169 &amp;nbsp; &amp;nbsp; &amp;nbsp;Dec 2008 &amp;nbsp; 99116431942 &amp;nbsp; 98868465
&lt;br&gt;&amp;nbsp; 170 &amp;nbsp; &amp;nbsp; &amp;nbsp;Feb 2009 &amp;nbsp;101467270308 &amp;nbsp;101815678
&lt;br&gt;&lt;br&gt;Recent statistics for WGS sequences:
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release &amp;nbsp;Date &amp;nbsp; &amp;nbsp; &amp;nbsp; Base Pairs &amp;nbsp; Entries
&lt;br&gt;&lt;br&gt;&amp;nbsp; 169 &amp;nbsp; &amp;nbsp; &amp;nbsp;Dec 2008 &amp;nbsp;141374971004 &amp;nbsp;48394838
&lt;br&gt;&amp;nbsp; 170 &amp;nbsp; &amp;nbsp; &amp;nbsp;Feb 2009 &amp;nbsp;143797800446 &amp;nbsp;49036947
&lt;br&gt;&lt;br&gt;&amp;nbsp; During the 64 days between the close dates for GenBank Releases 169.0
&lt;br&gt;and 170.0, the non-WGS/non-CON portion of GenBank grew by 2,350,838,366
&lt;br&gt;basepairs and by 2,947,213 sequence records. During that same period,
&lt;br&gt;1,318,594 records were updated. An average of about 66,653
&lt;br&gt;non-WGS/non-CON records were added and/or updated per day.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Between releases 169.0 and 170.0, the WGS component of GenBank grew by
&lt;br&gt;2,422,829,442 basepairs and by 642,109 records.
&lt;br&gt;&lt;br&gt;&amp;nbsp; For additional release information, see the README files in either of
&lt;br&gt;the directories mentioned above, and the release notes (gbrel.txt) in
&lt;br&gt;the genbank directory. Sections 1.3 and 1.4 of the release notes
&lt;br&gt;(Changes in Release 170.0 and Upcoming Changes) have been appended
&lt;br&gt;below for your convenience.
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ** Important Notes **
&lt;br&gt;&lt;br&gt;* &amp;nbsp;A new linetype (DBLINK) has been implemented with this February
&lt;br&gt;&amp;nbsp; &amp;nbsp;2009 release. See Section 1.3.2 for information. Records with the
&lt;br&gt;&amp;nbsp; &amp;nbsp;DBLINK line will begin to appear via the GenBank Updates shortly after
&lt;br&gt;&amp;nbsp; &amp;nbsp;February 18, 2009.
&lt;br&gt;&lt;br&gt;* &amp;nbsp;GenBank 'index' files are now provided without any EST content, and
&lt;br&gt;&amp;nbsp; &amp;nbsp;without most GSS content. See Section 1.3.3 of the release notes for
&lt;br&gt;&amp;nbsp; &amp;nbsp;further details.
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;NCBI is considering ceasing support for the index files, so we
&lt;br&gt;&amp;nbsp; &amp;nbsp;encourage affected users to review that section and provide feedback.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release 170.0 data, and subsequent updates, are available now via
&lt;br&gt;NCBI's Entrez and Blast services.
&lt;br&gt;&lt;br&gt;&amp;nbsp; As a general guideline, we suggest first transferring the GenBank
&lt;br&gt;release notes (gbrel.txt) whenever a release is being obtained. Check
&lt;br&gt;to make sure that the date and release number in the header of the
&lt;br&gt;release notes are current (e.g., February 15 2009, 170.0). If they are
&lt;br&gt;not, interrupt the remaining transfers and then request assistance from
&lt;br&gt;the NCBI Service Desk.
&lt;br&gt;&lt;br&gt;&amp;nbsp; A comprehensive check of the headers of all release files after your
&lt;br&gt;transfers are complete is also suggested. Here's how one might go about
&lt;br&gt;this on a Unix platform, using csh/tcsh:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; set files = `ls gb*.*`
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; foreach i ($files)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; head -10 $i | grep Release
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; end
&lt;br&gt;&lt;br&gt;Or, if the files are compressed, perhaps:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; gzcat $i | head -10 | grep Release
&lt;br&gt;&lt;br&gt;&amp;nbsp; If you encounter problems while ftp'ing or uncompressing Release
&lt;br&gt;170.0, please send email outlining your difficulties to:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=22091999&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;info@...&lt;/a&gt;
&lt;br&gt;&lt;br&gt;Mark Cavanaugh, Michael Kimelman, Ilya Dondoshansky
&lt;br&gt;GenBank
&lt;br&gt;NCBI/NLM/NIH/HHS
&lt;br&gt;&lt;br&gt;&lt;br&gt;1.3.1 Organizational changes
&lt;br&gt;&lt;br&gt;The total number of sequence data files increased by 34 with this
&lt;br&gt;release:
&lt;br&gt;&lt;br&gt;&amp;nbsp; - the BCT division is now composed of &amp;nbsp;38 files (+4)
&lt;br&gt;&amp;nbsp; - the ENV division is now composed of &amp;nbsp;12 files (+1)
&lt;br&gt;&amp;nbsp; - the EST division is now composed of 838 files (+16)
&lt;br&gt;&amp;nbsp; - the GSS division is now composed of 322 files (+3)
&lt;br&gt;&amp;nbsp; - the HTG division is now composed of 133 files (-1)
&lt;br&gt;&amp;nbsp; - the PAT division is now composed of &amp;nbsp;65 files (+10)
&lt;br&gt;&amp;nbsp; - the VRT division is now composed of &amp;nbsp;18 files (+1)
&lt;br&gt;&lt;br&gt;&amp;nbsp; Note that the HTG division has one fewer file than previously.
&lt;br&gt;Occasional decreases like this are now possible, given a new method of
&lt;br&gt;flatfile processing that was adopted in December 2008. Essentially,
&lt;br&gt;records within a particular sequence file (e.g., gbhtg134.seq) can become
&lt;br&gt;absorbed in a different file, depending on the overall distribution of the
&lt;br&gt;number of records in the files that make up a division. Such decreases
&lt;br&gt;are reviewed, to ensure that they do not indicate a loss of release
&lt;br&gt;content.
&lt;br&gt;&lt;br&gt;In addition, the total number of index files increased by 3 with this
&lt;br&gt;release:
&lt;br&gt;&lt;br&gt;&amp;nbsp; - the AUT index is now composed of 65 files (+3)
&lt;br&gt;&lt;br&gt;1.3.2 New DBLINK linetype legal as of February 2009.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The PROJECT linetype allows a sequence record to be linked to
&lt;br&gt;information about the sequencing project that generated the data which
&lt;br&gt;ultimately resulted in the record's submission to the International
&lt;br&gt;Nucleotide Sequence Database (INSD; see &lt;a href=&quot;http://www.insdc.org&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.insdc.org&lt;/a&gt;).
&lt;br&gt;&lt;br&gt;&amp;nbsp; This complete bacterial GenBank record illustrates the use of the
&lt;br&gt;PROJECT line:
&lt;br&gt;&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; CP000964 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 5641239 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; circular BCT 24-SEP-2008
&lt;br&gt;DEFINITION &amp;nbsp;Klebsiella pneumoniae 342, complete genome.
&lt;br&gt;ACCESSION &amp;nbsp; CP000964
&lt;br&gt;VERSION &amp;nbsp; &amp;nbsp; CP000964.1 &amp;nbsp;GI:206564770
&lt;br&gt;PROJECT &amp;nbsp; &amp;nbsp; GenomeProject:28471
&lt;br&gt;&lt;br&gt;&amp;nbsp; When viewed on the web in NCBI's Entrez:Nucleotide, the record's
&lt;br&gt;project identifier (28471) links to an entry in the Genome Project
&lt;br&gt;Database (GPDB):
&lt;br&gt;&lt;br&gt;&amp;nbsp; &lt;a href=&quot;http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&amp;cmd=Retrieve&amp;dopt=Overview&amp;uid=28471&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&amp;cmd=Retrieve&amp;dopt=Overview&amp;uid=28471&lt;/a&gt;
&lt;br&gt;&lt;br&gt;where information about the sequencing center, the bacterium, and other
&lt;br&gt;GenBank records (eg, plasmids) associated with the sequencing project
&lt;br&gt;can be found.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Since the introduction of PROJECT, the scope of the &amp;quot;Genome Project&amp;quot;
&lt;br&gt;Database has expanded, to include projects that are not necessarily
&lt;br&gt;targeted to the sequencing of a complete genome.
&lt;br&gt;&lt;br&gt;&amp;nbsp; In addition, there can be other resources which underlie an INSD
&lt;br&gt;sequence record, such as the Trace Assembly Archive at the NCBI:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &lt;a href=&quot;http://www.ncbi.nlm.nih.gov/Traces/assembly/assmbrowser.cgi?cmd=show&amp;f=tree&amp;m=main&amp;s=tree&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.ncbi.nlm.nih.gov/Traces/assembly/assmbrowser.cgi?cmd=show&amp;f=tree&amp;m=main&amp;s=tree&lt;/a&gt;
&lt;br&gt;&lt;br&gt;&amp;nbsp; Because of the expanded scope of the GPDB, and because we anticipate a
&lt;br&gt;need to link to more resources than just the GPDB, the PROJECT linetype is
&lt;br&gt;going to be replaced by a new linetype:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;DBLINK
&lt;br&gt;&lt;br&gt;&amp;nbsp; With this release, the new DBLINK linetype is now legal for GenBank
&lt;br&gt;sequence records, and it will begin to appear in GenBank Update files,
&lt;br&gt;soon after GenBank 170.0 is made available.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The Genome Project ID and the Trace Assembly Archive ID will be
&lt;br&gt;presented via DBLINK, and the existing PROJECT line will continue to be
&lt;br&gt;displayed, as illustrated in the mock-up of CP000964 below:
&lt;br&gt;&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; CP000964 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 5641239 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; circular BCT 24-SEP-2008
&lt;br&gt;DEFINITION &amp;nbsp;Klebsiella pneumoniae 342, complete genome.
&lt;br&gt;ACCESSION &amp;nbsp; CP000964
&lt;br&gt;VERSION &amp;nbsp; &amp;nbsp; CP000964.1 &amp;nbsp;GI:206564770
&lt;br&gt;PROJECT &amp;nbsp; &amp;nbsp; GenomeProject:28471
&lt;br&gt;DBLINK &amp;nbsp; &amp;nbsp; &amp;nbsp;Project:28471
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Trace Assembly Archive:123456
&lt;br&gt;....
&lt;br&gt;COMMENT &amp;nbsp; &amp;nbsp; The source for the DNA and/or cells is: &amp;nbsp;Professor Eric W.
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Triplett, Chair, Department of Microbiology and Cell Science,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Institute of Food and Agricultural Sciences, University of Florida,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; P.O. Box 110700, Gainesville, FL 32611-0700, &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=22091999&amp;i=1&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;ewt@...&lt;/a&gt;.
&lt;br&gt;&lt;br&gt;Note: Use of the Trace Assembly Archive is still in its early stages, so
&lt;br&gt;only a few records are expected to have that type of DBLINK in the short
&lt;br&gt;term.
&lt;br&gt;&lt;br&gt;&amp;nbsp; For those who process sequence data in NCBI's ASN.1 format: The
&lt;br&gt;underlying representation for (Genome) Project IDs will remain unchanged.
&lt;br&gt;There will be no changes to the ASN.1 User-object that is used to store
&lt;br&gt;them:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; user {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; type
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; str &amp;quot;GenomeProjectsDB&amp;quot; ,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; data {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; label
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; str &amp;quot;ProjectID&amp;quot; ,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; data
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; int 28471 } ,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; label
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; str &amp;quot;ParentID&amp;quot; ,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; data
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; int 0 } } } ,
&lt;br&gt;&lt;br&gt;&amp;nbsp; However, to support linkages to other resources, such as the Trace
&lt;br&gt;Assembly Archive, a new &amp;quot;DBLink&amp;quot; User-object will be introduced:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; user {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; type
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; str &amp;quot;DBLink&amp;quot; ,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; data {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; label
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; str &amp;quot;Trace Assembly Archive&amp;quot; ,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; data
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ints { 123456 } } } }
&lt;br&gt;&lt;br&gt;&amp;nbsp; As new types of linkages are established, they will be added to
&lt;br&gt;the DBLink User-object, and displayed via the DBLINK linetype in
&lt;br&gt;the GenBank flatfile format. 
&lt;br&gt;&lt;br&gt;&amp;nbsp; There is a possibility that the GenomeProjectsDB User-object
&lt;br&gt;might someday be incorporated into the new DBLink User-object.
&lt;br&gt;But at the moment, there are no firm plans to do so.
&lt;br&gt;&lt;br&gt;1.3.3 Changes in the content of index files
&lt;br&gt;&lt;br&gt;&amp;nbsp; As described in the GB 153 release notes, the 'index' files which accompany
&lt;br&gt;GenBank releases (see Section 3.3) are considered to be a legacy data product by
&lt;br&gt;NCBI, generated mostly for historical reasons. FTP statistics of January 2005
&lt;br&gt;seem to support this: the index files were transferred only half as frequently as
&lt;br&gt;the files of sequence records. The inherent inefficiencies of the index file
&lt;br&gt;format also lead us to suspect that they have little serious use by the user
&lt;br&gt;community, particularly for EST and GSS records.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The software that generated the index file products received little
&lt;br&gt;attention over the years, and finally reached its limitations in
&lt;br&gt;February 2006 (Release 152.0). The required multi-server queries which
&lt;br&gt;obtained and sorted many millions of rows of terms from several different
&lt;br&gt;databases simply outgrew the capacity of the hardware used for GenBank
&lt;br&gt;Release generation.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Our short-term solution is to cease generating some index-file content
&lt;br&gt;for all EST sequence records, and for GSS sequence records that originate
&lt;br&gt;via direct submission to NCBI.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The three gbacc*.idx index files continue to reflect the entirety of
&lt;br&gt;the release, including all EST and GSS records; however, the file contents
&lt;br&gt;are unsorted.
&lt;br&gt;&lt;br&gt;&amp;nbsp; These 'solutions' are really just stop-gaps, and we will likely pursue
&lt;br&gt;one of two options:
&lt;br&gt;&lt;br&gt;a) Cease support of the 'index' file products altogether.
&lt;br&gt;&lt;br&gt;b) Provide new products that present some of the most useful data from
&lt;br&gt;&amp;nbsp; &amp;nbsp;the legacy 'index' files, and cease support for other types of index
&lt;br&gt;&amp;nbsp; &amp;nbsp;data.
&lt;br&gt;&lt;br&gt;&amp;nbsp; If you are a user of the 'index' files associated with GenBank
&lt;br&gt;releases, we encourage you to make your wishes known, either via the
&lt;br&gt;GenBank newsgroup, or via email to NCBI's Service Desk:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=22091999&amp;i=2&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;info@...&lt;/a&gt;
&lt;br&gt;&lt;br&gt;&amp;nbsp; Our apologies for any inconvenience that these changes may cause.
&lt;br&gt;&lt;br&gt;1.3.4 GSS File Header Problem
&lt;br&gt;&lt;br&gt;&amp;nbsp; GSS sequences at GenBank are maintained in two different systems,
&lt;br&gt;depending on their origin, and the dumps from those systems occur in
&lt;br&gt;parallel. Because the second dump (for example) has no prior knowledge of
&lt;br&gt;exactly how many GSS files will be dumped by the first, it does not know
&lt;br&gt;how to number its own output files.
&lt;br&gt;&lt;br&gt;&amp;nbsp; There is thus a discrepancy between the filenames and file headers for
&lt;br&gt;sixty-nine of the GSS flatfiles in Release 170.0. Consider gbgss254.seq:
&lt;br&gt;&lt;br&gt;GBGSS1.SEQ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Genetic Sequence Data Bank
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; February 15 2009
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; NCBI-GenBank Flat File Release 170.0
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;GSS Sequences (Part 1)
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;87215 loci, &amp;nbsp; &amp;nbsp;64322450 bases, from &amp;nbsp; &amp;nbsp;87215 reported sequences
&lt;br&gt;&lt;br&gt;&amp;nbsp; Here, the filename and part number in the header are both numbered
&lt;br&gt;&amp;quot;1&amp;quot;, though the file has been renamed as &amp;quot;254&amp;quot; based on the number of
&lt;br&gt;files dumped from the other system. We will work to resolve this
&lt;br&gt;discrepancy in future releases, but the priority is certainly much lower
&lt;br&gt;than many other tasks.
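&lt;br&gt;&lt;br&gt;&amp;nbsp; Users who validate headers against filenames may want to detect this
&lt;br&gt;discrepancy rather than treat it as a transfer error. The following Python
&lt;br&gt;sketch assumes that the first header line always begins with the upper-case
&lt;br&gt;file name, as in the gbgss254.seq example; the function name and the
&lt;br&gt;pattern are illustrative only.
&lt;pre&gt;
```python
import re

# Assumed header shape, based on the gbgss254.seq example above: the first
# header line starts with the upper-case file name, e.g. "GBGSS1.SEQ".
HEADER_NAME = re.compile(r"^(GB[A-Z]+)(\d+)\.SEQ\b")


def header_mismatch(filename, first_header_line):
    """Return (number in file name, number in header) if they differ, else None."""
    name_match = HEADER_NAME.match(filename.upper())
    head_match = HEADER_NAME.match(first_header_line.strip())
    if name_match is None or head_match is None:
        return None  # not a numbered sequence file, or an unexpected header
    if name_match.group(1) != head_match.group(1):
        return None  # different divisions; outside the scope of this check
    name_number = int(name_match.group(2))
    head_number = int(head_match.group(2))
    return None if name_number == head_number else (name_number, head_number)
```
&lt;/pre&gt;
&lt;br&gt;For the example above, the file name gbgss254.seq and the header line
&lt;br&gt;beginning with GBGSS1.SEQ would be reported as the pair (254, 1).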
&lt;br&gt;&lt;br&gt;1.4 Upcoming Changes
&lt;br&gt;&lt;br&gt;1.4.1 PROJECT linetype to be replaced by DBLINK (April 2009)
&lt;br&gt;&lt;br&gt;&amp;nbsp; The new DBLINK linetype has been introduced as of the February 2009
&lt;br&gt;GenBank Release 170.0.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Genome Project IDs and Trace Assembly Archive IDs can now be presented
&lt;br&gt;via DBLINK, in conjunction with the legacy PROJECT linetype:
&lt;br&gt;&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; CP000964 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 5641239 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; circular BCT 24-SEP-2008
&lt;br&gt;DEFINITION &amp;nbsp;Klebsiella pneumoniae 342, complete genome.
&lt;br&gt;ACCESSION &amp;nbsp; CP000964
&lt;br&gt;VERSION &amp;nbsp; &amp;nbsp; CP000964.1 &amp;nbsp;GI:206564770
&lt;br&gt;PROJECT &amp;nbsp; &amp;nbsp; GenomeProject:28471
&lt;br&gt;DBLINK &amp;nbsp; &amp;nbsp; &amp;nbsp;Project:28471
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Trace Assembly Archive:123456
&lt;br&gt;....
&lt;br&gt;COMMENT &amp;nbsp; &amp;nbsp; The source for the DNA and/or cells is: &amp;nbsp;Professor Eric W.
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Triplett, Chair, Department of Microbiology and Cell Science,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Institute of Food and Agricultural Sciences, University of Florida,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; P.O. Box 110700, Gainesville, FL 32611-0700, &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=22091999&amp;i=3&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;ewt@...&lt;/a&gt;.
&lt;br&gt;&lt;br&gt;PROJECT and DBLINK will co-exist for one GenBank release, until Release
&lt;br&gt;171.0 (April 15, 2009), at which point the PROJECT line will be removed
&lt;br&gt;from the flatfile format. In its final state, the above mock-up for
&lt;br&gt;CP000964 becomes:
&lt;br&gt;&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; CP000964 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 5641239 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; circular BCT 24-SEP-2008
&lt;br&gt;DEFINITION &amp;nbsp;Klebsiella pneumoniae 342, complete genome.
&lt;br&gt;ACCESSION &amp;nbsp; CP000964
&lt;br&gt;VERSION &amp;nbsp; &amp;nbsp; CP000964.1 &amp;nbsp;GI:206564770
&lt;br&gt;DBLINK &amp;nbsp; &amp;nbsp; &amp;nbsp;Project:28471
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Trace Assembly Archive:123456
&lt;br&gt;....
&lt;br&gt;COMMENT &amp;nbsp; &amp;nbsp; The source for the DNA and/or cells is: &amp;nbsp;Professor Eric W.
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Triplett, Chair, Department of Microbiology and Cell Science,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Institute of Food and Agricultural Sciences, University of Florida,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; P.O. Box 110700, Gainesville, FL 32611-0700, &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=22091999&amp;i=4&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;ewt@...&lt;/a&gt;.
&lt;br&gt;&lt;br&gt;&amp;nbsp; In summary: The PROJECT linetype will be replaced by DBLINK as of
&lt;br&gt;GenBank Release 171.0 in April 2009.
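&lt;br&gt;&lt;br&gt;&amp;nbsp; Flatfile parsers that currently read PROJECT may need to accept both
&lt;br&gt;linetypes during the transition. The following Python sketch shows one
&lt;br&gt;possible approach. It assumes the layout of the mock-ups above (a keyword
&lt;br&gt;in the first 12 columns, continuation lines that begin with spaces); the
&lt;br&gt;function name and the returned dictionary are hypothetical.
&lt;pre&gt;
```python
def project_links(record_lines):
    """Collect PROJECT and DBLINK values from the header lines of one record.

    Assumes the flatfile layout shown in the mock-ups: a keyword in the first
    12 columns, and continuation lines that begin with spaces.
    """
    links = {}
    keyword = None
    for line in record_lines:
        # A non-blank keyword field starts a new linetype; a blank one
        # continues the previous linetype.
        if line[:12].strip():
            keyword = line[:12].strip()
        value = line[12:].strip()
        if keyword in ("PROJECT", "DBLINK") and ":" in value:
            database, identifier = value.split(":", 1)
            links.setdefault(database.strip(), []).append(identifier.strip())
    return links
```
&lt;/pre&gt;
&lt;br&gt;Keying the result by database name means that a record with only DBLINK
&lt;br&gt;lines and a record with both linetypes can be handled by the same code.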
&lt;br&gt;&lt;br&gt;&lt;br&gt;_______________________________________________
&lt;br&gt;Genbankb mailing list
&lt;br&gt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=22091999&amp;i=5&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;Genbankb@...&lt;/a&gt;
&lt;br&gt;&lt;a href=&quot;http://www.bio.net/biomail/listinfo/genbankb&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.bio.net/biomail/listinfo/genbankb&lt;/a&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/GenBank-Release-170.0-Now-Available-tp22091999p22091999.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-22084375</id>
	<title>GenBank 170.0 Close-Of-Data</title>
	<published>2009-02-18T09:43:51Z</published>
	<updated>2009-02-18T09:43:51Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html">Greetings GenBank Users,
&lt;br&gt;&lt;br&gt;Close-of-data for the upcoming GenBank Release 170.0 occurred on
&lt;br&gt;Friday February 13 2009 at approximately 1:30am EST.
&lt;br&gt;&lt;br&gt;The subsequently generated GenBank Incremental Update files
&lt;br&gt;nc0213.aso, nc0213.flat, etc. contain data through the close.
&lt;br&gt;&lt;br&gt;Note: Release processing often does not begin until sometime during
&lt;br&gt;business hours on the close date. As a result, a number of sequence
&lt;br&gt;records processed *after* 1:30am are likely to be present in the
&lt;br&gt;GenBank 170.0 release files, even though they are &amp;quot;post-close&amp;quot; .
&lt;br&gt;&lt;br&gt;Similarly, the first GenBank Incremental Update that is generated
&lt;br&gt;after the close date is likely to contain a number of sequence
&lt;br&gt;records that are unchanged, compared to their appearance in the
&lt;br&gt;release files.
&lt;br&gt;&lt;br&gt;We expect to make the GenBank 170.0 data files available later
&lt;br&gt;today.
&lt;br&gt;&lt;br&gt;Our apologies for the lack of advance notice about the close date.
&lt;br&gt;&lt;br&gt;Mark Cavanaugh
&lt;br&gt;GenBank
&lt;br&gt;NCBI/NLM/NIH/HHS
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/GenBank-170.0-Close-Of-Data-tp22084375p22084375.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-21255026</id>
	<title>GenBank WGS projects : WGS-master records provided as of January 12</title>
	<published>2009-01-02T08:51:53Z</published>
	<updated>2009-01-02T08:51:53Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html">Greetings GenBank Users,
&lt;br&gt;&lt;br&gt;Starting on January 12th of 2009, a new type of data file will be
&lt;br&gt;made available for GenBank WGS (Whole Genome Shotgun) projects,
&lt;br&gt;in the WGS areas of our FTP site.
&lt;br&gt;&lt;br&gt;Since their inception in 2002, WGS projects have had an associated
&lt;br&gt;'WGS-master' record, which summarizes the content of a project. Here
&lt;br&gt;is a link to the master for project ABRT (Philippine tarsier) :
&lt;br&gt;&lt;br&gt;&lt;a href=&quot;http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&amp;id=203287470&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&amp;id=203287470&lt;/a&gt;&lt;br&gt;&lt;br&gt;And here is an excerpt from that master record:
&lt;br&gt;&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; ABRT010000000 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;1201173 rc &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; linear &amp;nbsp; PRI 18-NOV-2008
&lt;br&gt;DEFINITION &amp;nbsp;Tarsius syrichta, whole genome shotgun sequence.
&lt;br&gt;ACCESSION &amp;nbsp; ABRT000000000
&lt;br&gt;VERSION &amp;nbsp; &amp;nbsp; ABRT000000000.1 &amp;nbsp;GI:203287470
&lt;br&gt;PROJECT &amp;nbsp; &amp;nbsp; GenomeProject:20339
&lt;br&gt;KEYWORDS &amp;nbsp; &amp;nbsp;WGS.
&lt;br&gt;SOURCE &amp;nbsp; &amp;nbsp; &amp;nbsp;Tarsius syrichta (Philippine tarsier)
&lt;br&gt;....
&lt;br&gt;WGS &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ABRT010000001-ABRT011201173
&lt;br&gt;WGS_SCAFLD &amp;nbsp;GG299110-GG500513
&lt;br&gt;//
&lt;br&gt;&lt;br&gt;&lt;br&gt;This flatfile representation of the ABRT WGS-master does *not*
&lt;br&gt;conform to the specifications for normal GenBank flatfiles.
&lt;br&gt;For example:
&lt;br&gt;&lt;br&gt;- It has neither sequence data nor a CONTIG join() statement.
&lt;br&gt;&lt;br&gt;- The 'rc' (record count) value on the LOCUS line represents the
&lt;br&gt;&amp;nbsp; number of sequence-overlap contig records in the project, rather
&lt;br&gt;&amp;nbsp; than a basepair count.
&lt;br&gt;&lt;br&gt;- Undocumented linetypes 'WGS' and 'WGS_SCAFLD' exist, which 
&lt;br&gt;&amp;nbsp; provide the ranges of accession numbers for the 1,201,173
&lt;br&gt;&amp;nbsp; sequence-overlap contig sequences in the project, and for
&lt;br&gt;&amp;nbsp; the 201,404 CON-division records that have been constructed
&lt;br&gt;&amp;nbsp; from the ABRT01 contigs.
&lt;br&gt;&lt;br&gt;Nonetheless, a WGS-master record has utility because it provides
&lt;br&gt;an overview of many important characteristics of a WGS project,
&lt;br&gt;in a simple and concise way.
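&lt;br&gt;&lt;br&gt;The record counts quoted above can be checked against the accession
&lt;br&gt;ranges on the WGS and WGS_SCAFLD lines. The following Python sketch assumes
&lt;br&gt;that both ends of a range share a letter prefix followed by digits, as in
&lt;br&gt;the ABRT example; the function name is illustrative.
&lt;pre&gt;
```python
import re

# Assumed accession shape: upper-case letters followed by digits.
ACCESSION = re.compile(r"^([A-Z]+)(\d+)$")


def range_size(accession_range):
    """Count the accessions in a 'FIRST-LAST' range such as 'GG299110-GG500513'."""
    first, last = accession_range.split("-")
    first_match = ACCESSION.match(first)
    last_match = ACCESSION.match(last)
    if first_match is None or last_match is None:
        raise ValueError("unrecognised accession: " + accession_range)
    if first_match.group(1) != last_match.group(1):
        raise ValueError("prefixes differ: " + accession_range)
    # Both ends are inclusive, hence the + 1.
    return int(last_match.group(2)) - int(first_match.group(2)) + 1
```
&lt;/pre&gt;
&lt;br&gt;Applied to the excerpt, the WGS range yields 1201173, which matches the
&lt;br&gt;'rc' value on the LOCUS line, and the WGS_SCAFLD range yields 201404.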
&lt;br&gt;&lt;br&gt;The ASN.1 version of WGS-master records will be placed in:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ftp://ftp.ncbi.nih.gov/ncbi-asn1/wgs
&lt;br&gt;&lt;br&gt;and the file naming convention will be:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; wgs.XXXX.mstr.bse.gz
&lt;br&gt;&lt;br&gt;These files will contain a gzip-compressed, binary ASN.1 Seq-entry
&lt;br&gt;value. 'XXXX' represents a four-character WGS Project Code, such as
&lt;br&gt;ABYH.
&lt;br&gt;&lt;br&gt;The GenBank flatfile representation of WGS-master records will be
&lt;br&gt;placed in:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ftp://ftp.ncbi.nih.gov/genbank/wgs
&lt;br&gt;&lt;br&gt;and the file naming convention will be:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; wgs.XXXX.mstr.gbff.gz
&lt;br&gt;&lt;br&gt;Here is an example of the filenames that one would encounter for
&lt;br&gt;the ABYH project in the /genbank/wgs area, as of January 12:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; wgs.ABYH.1.gbff.gz
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; wgs.ABYH.1.gnp.gz
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; wgs.ABYH.1.qscore.gz
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; wgs.ABYH.mstr.gbff.gz
&lt;br&gt;&lt;br&gt;If you process the GenBank flatfile representation of WGS projects,
&lt;br&gt;and you are *not* interested in WGS-masters, you may need to add
&lt;br&gt;a filtration step to remove the master files from automated FTP
&lt;br&gt;transfers (due to similarities in filename patterns).
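&lt;br&gt;&lt;br&gt;Such a filtration step could be as simple as the following Python
&lt;br&gt;sketch. It assumes that project codes are four upper-case letters, as in
&lt;br&gt;the ABYH and ABRT examples; the pattern and the function name are
&lt;br&gt;illustrative and may need adjusting for a particular transfer script.
&lt;pre&gt;
```python
import re

# Pattern taken from the naming conventions above (wgs.XXXX.mstr.gbff.gz and
# wgs.XXXX.mstr.bse.gz), assuming XXXX is four upper-case letters.
MASTER_FILE = re.compile(r"^wgs\.[A-Z]{4}\.mstr\.(gbff|bse)\.gz$")


def without_masters(filenames):
    """Return the file names that are not WGS-master files."""
    return [name for name in filenames if MASTER_FILE.match(name) is None]
```
&lt;/pre&gt;
&lt;br&gt;For the ABYH listing above, only wgs.ABYH.mstr.gbff.gz would be dropped.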
&lt;br&gt;&lt;br&gt;Mark Cavanaugh
&lt;br&gt;GenBank
&lt;br&gt;NCBI/NLM/NIH/HHS
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/GenBank-WGS-projects-%3A-WGS-master-records-provided-as-of-January-12-tp21255026p21255026.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-21031273</id>
	<title>GenBank Release 169.0 Now Available</title>
	<published>2008-12-16T01:17:17Z</published>
	<updated>2008-12-16T01:17:17Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html">Greetings GenBank Users,
&lt;br&gt;&lt;br&gt;&amp;nbsp; GenBank Release 169.0 is now available via FTP from the National
&lt;br&gt;Center for Biotechnology Information (NCBI):
&lt;br&gt;&lt;br&gt;&amp;nbsp; Ftp Site &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Directory &amp;nbsp; Contents
&lt;br&gt;&amp;nbsp; ---------------- &amp;nbsp; --------- &amp;nbsp; ---------------------------------------
&lt;br&gt;&amp;nbsp; ftp.ncbi.nih.gov &amp;nbsp; genbank &amp;nbsp; &amp;nbsp; GenBank Release 169.0 flatfiles
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;ncbi-asn1 &amp;nbsp; ASN.1 data used to create Release 169.0
&lt;br&gt;&lt;br&gt;&amp;nbsp; Close-of-data for GenBank 169.0 occurred on 12/11/2008. Uncompressed,
&lt;br&gt;the Release 169.0 flatfiles require roughly 381 GB (sequence files only)
&lt;br&gt;or 407 GB (including the 'short directory', 'index' and the *.txt
&lt;br&gt;files). The ASN.1 data require approximately 349 GB.
&lt;br&gt;&lt;br&gt;Recent statistics for non-WGS, non-CON sequences:
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release &amp;nbsp;Date &amp;nbsp; &amp;nbsp; &amp;nbsp; Base Pairs &amp;nbsp; Entries
&lt;br&gt;&lt;br&gt;&amp;nbsp; 168 &amp;nbsp; &amp;nbsp; &amp;nbsp;Oct 2008 &amp;nbsp; 97381682336 &amp;nbsp;96400790
&lt;br&gt;&amp;nbsp; 169 &amp;nbsp; &amp;nbsp; &amp;nbsp;Dec 2008 &amp;nbsp; 99116431942 &amp;nbsp;98868465
&lt;br&gt;&lt;br&gt;Recent statistics for WGS sequences:
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release &amp;nbsp;Date &amp;nbsp; &amp;nbsp; &amp;nbsp; Base Pairs &amp;nbsp; Entries
&lt;br&gt;&lt;br&gt;&amp;nbsp; 168 &amp;nbsp; &amp;nbsp; &amp;nbsp;Oct 2008 &amp;nbsp;136085973423 &amp;nbsp;46108952
&lt;br&gt;&amp;nbsp; 169 &amp;nbsp; &amp;nbsp; &amp;nbsp;Dec 2008 &amp;nbsp;141374971004 &amp;nbsp;48394838
&lt;br&gt;&lt;br&gt;&amp;nbsp; During the 45 days between the close dates for GenBank Releases 168.0
&lt;br&gt;and 169.0, the non-WGS/non-CON portion of GenBank grew by 1,734,749,606
&lt;br&gt;basepairs and by 2,467,675 sequence records. During that same period,
&lt;br&gt;4,183,486 records were updated. An average of about 147,803 non-WGS/non-CON
&lt;br&gt;records were added and/or updated per day.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Between releases 168.0 and 169.0, the WGS component of GenBank grew by
&lt;br&gt;5,288,997,581 basepairs and by 2,285,886 records.
&lt;br&gt;&lt;br&gt;&amp;nbsp; For additional release information, see the README files in either of
&lt;br&gt;the directories mentioned above, and the release notes (gbrel.txt) in
&lt;br&gt;the genbank directory. Sections 1.3 and 1.4 of the release notes
&lt;br&gt;(Changes in Release 169.0 and Upcoming Changes) have been appended
&lt;br&gt;below for your convenience.
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ** Important Notes **
&lt;br&gt;&lt;br&gt;* &amp;nbsp;A new linetype ( DBLINK ) will be implemented as of the February 2009
&lt;br&gt;&amp;nbsp; &amp;nbsp;release. See Section 1.4.1 for information.
&lt;br&gt;&lt;br&gt;* &amp;nbsp;GenBank 'index' files are now provided without any EST content, and
&lt;br&gt;&amp;nbsp; &amp;nbsp;without most GSS content. See Section 1.3.2 of the release notes for
&lt;br&gt;&amp;nbsp; &amp;nbsp;further details.
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;NCBI is considering ceasing support for the index files, so we
&lt;br&gt;&amp;nbsp; &amp;nbsp;encourage affected users to review that section and provide feedback.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release 169.0 data, and subsequent updates, are available now via
&lt;br&gt;NCBI's Entrez and Blast services.
&lt;br&gt;&lt;br&gt;&amp;nbsp; As a general guideline, we suggest first transferring the GenBank
&lt;br&gt;release notes (gbrel.txt) whenever a release is being obtained. Check
&lt;br&gt;to make sure that the date and release number in the header of the
&lt;br&gt;release notes are current (eg: December 15 2008, 169.0). If they are
&lt;br&gt;not, interrupt the remaining transfers and then request assistance from
&lt;br&gt;the NCBI Service Desk.
&lt;br&gt;&lt;br&gt;&amp;nbsp; A comprehensive check of the headers of all release files after your
&lt;br&gt;transfers are complete is also suggested. Here's how one might go about
&lt;br&gt;this on a unix platform, using csh/tcsh :
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; set files = `ls gb*.*`
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; foreach i ($files)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; head -10 $i | grep Release
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; end
&lt;br&gt;&lt;br&gt;Or, if the files are compressed, perhaps:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; gzcat $i | head -10 | grep Release
&lt;br&gt;&lt;br&gt;&amp;nbsp; If you encounter problems while ftp'ing or uncompressing Release
&lt;br&gt;169.0, please send email outlining your difficulties to:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=21031273&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;info@...&lt;/a&gt;
&lt;br&gt;&lt;br&gt;Mark Cavanaugh, Michael Kimelman, Ilya Dondoshansky
&lt;br&gt;GenBank
&lt;br&gt;NCBI/NLM/NIH/HHS
&lt;br&gt;&lt;br&gt;&lt;br&gt;1.3 Important Changes in Release 169.0
&lt;br&gt;&lt;br&gt;1.3.1 Organizational changes
&lt;br&gt;&lt;br&gt;&amp;nbsp; As described in a post to the GenBank Newsgroup on December 11, 2008, the
&lt;br&gt;number of sequence data files associated with this GenBank release has 
&lt;br&gt;increased by a larger-than-usual amount, due to changes in the way that data
&lt;br&gt;are processed and stored at NCBI. Thus the per-division increases described
&lt;br&gt;below reflect both the storage-related changes *and* actual database growth.
&lt;br&gt;&lt;br&gt;&amp;nbsp; In addition, GenBank divisions which consist of a single data file now
&lt;br&gt;include a number in their names:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;gbtsa.seq -&amp;gt; gbtsa1.seq
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;gbuna.seq -&amp;gt; gbuna1.seq
&lt;br&gt;&lt;br&gt;The total number of sequence data files increased by 94 with this release:
&lt;br&gt;&lt;br&gt;&amp;nbsp; - the BCT division is now composed of &amp;nbsp;34 files (+2)
&lt;br&gt;&amp;nbsp; - the CON division is now composed of 128 files (+25)
&lt;br&gt;&amp;nbsp; - the ENV division is now composed of &amp;nbsp;11 files (+1)
&lt;br&gt;&amp;nbsp; - the EST division is now composed of 822 files (+20)
&lt;br&gt;&amp;nbsp; - the GSS division is now composed of 319 files (+10)
&lt;br&gt;&amp;nbsp; - the HTC division is now composed of &amp;nbsp;13 files (+1)
&lt;br&gt;&amp;nbsp; - the HTG division is now composed of 134 files (+12)
&lt;br&gt;&amp;nbsp; - the INV division is now composed of &amp;nbsp;14 files (+1)
&lt;br&gt;&amp;nbsp; - the PAT division is now composed of &amp;nbsp;55 files (+8)
&lt;br&gt;&amp;nbsp; - the PLN division is now composed of &amp;nbsp;37 files (+5)
&lt;br&gt;&amp;nbsp; - the PRI division is now composed of &amp;nbsp;39 files (+3)
&lt;br&gt;&amp;nbsp; - the ROD division is now composed of &amp;nbsp;28 files (+2)
&lt;br&gt;&amp;nbsp; - the STS division is now composed of &amp;nbsp;20 files (+2)
&lt;br&gt;&amp;nbsp; - the VRL division is now composed of &amp;nbsp;11 files (+1)
&lt;br&gt;&amp;nbsp; - the VRT division is now composed of &amp;nbsp;17 files (+1)
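&lt;br&gt;&lt;br&gt;&amp;nbsp; As a quick consistency check, the per-division increments listed above can be summed and compared with the stated total of 94. The Python sketch below only copies the numbers from the list; the variable names are illustrative and not part of any NCBI tool.
&lt;pre&gt;
```python
# Per-division increases in sequence data files, copied from the list above.
increases = {
    "BCT": 2, "CON": 25, "ENV": 1, "EST": 20, "GSS": 10,
    "HTC": 1, "HTG": 12, "INV": 1, "PAT": 8, "PLN": 5,
    "PRI": 3, "ROD": 2, "STS": 2, "VRL": 1, "VRT": 1,
}

# The sum should agree with the total quoted in the release notes.
total_increase = sum(increases.values())
print(total_increase)  # 94
```
&lt;/pre&gt;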
&lt;br&gt;&lt;br&gt;In addition, the total number of index files increased by 3 with this release:
&lt;br&gt;&lt;br&gt;&amp;nbsp; - the AUT index is now composed of 62 files (+3)
&lt;br&gt;&lt;br&gt;1.3.2 Changes in the content of index files
&lt;br&gt;&lt;br&gt;&amp;nbsp; As described in the GB 153 release notes, the 'index' files which accompany
&lt;br&gt;GenBank releases (see Section 3.3) are considered to be a legacy data product by
&lt;br&gt;NCBI, generated mostly for historical reasons. FTP statistics of January 2005
&lt;br&gt;seem to support this: the index files were transferred only half as frequently as
&lt;br&gt;the files of sequence records. The inherent inefficiencies of the index file
&lt;br&gt;format also lead us to suspect that they have little serious use by the user
&lt;br&gt;community, particularly for EST and GSS records.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The software that generated the index file products received little
&lt;br&gt;attention over the years, and finally reached its limitations in
&lt;br&gt;February 2006 (Release 152.0). The required multi-server queries which
&lt;br&gt;obtained and sorted many millions of rows of terms from several different
&lt;br&gt;databases simply outgrew the capacity of the hardware used for GenBank
&lt;br&gt;Release generation.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Our short-term solution is to cease generating some index-file content
&lt;br&gt;for all EST sequence records, and for GSS sequence records that originate
&lt;br&gt;via direct submission to NCBI.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The three gbacc*.idx index files continue to reflect the entirety of the
&lt;br&gt;release, including all EST and GSS records; however, the file contents are
&lt;br&gt;unsorted.
&lt;br&gt;&lt;br&gt;&amp;nbsp; These 'solutions' are really just stop-gaps, and we will likely pursue
&lt;br&gt;one of two options:
&lt;br&gt;&lt;br&gt;a) Cease support of the 'index' file products altogether.
&lt;br&gt;&lt;br&gt;b) Provide new products that present some of the most useful data from
&lt;br&gt;&amp;nbsp; &amp;nbsp;the legacy 'index' files, and cease support for other types of index data.
&lt;br&gt;&lt;br&gt;&amp;nbsp; If you are a user of the 'index' files associated with GenBank releases, we
&lt;br&gt;encourage you to make your wishes known, either via the GenBank newsgroup,
&lt;br&gt;or via email to NCBI's Service Desk:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=21031273&amp;i=1&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;info@...&lt;/a&gt;
&lt;br&gt;&lt;br&gt;&amp;nbsp; Our apologies for any inconvenience that these changes may cause.
&lt;br&gt;&lt;br&gt;1.3.3 GSS File Header Problem
&lt;br&gt;&lt;br&gt;&amp;nbsp; GSS sequences at GenBank are maintained in two different systems, depending
&lt;br&gt;on their origin, and the dumps from those systems occur in parallel. Because
&lt;br&gt;the second dump (for example) has no prior knowledge of exactly how many GSS
&lt;br&gt;files will be dumped by the first, it does not know how to number its own
&lt;br&gt;output files.
&lt;br&gt;&lt;br&gt;&amp;nbsp; There is thus a discrepancy between the filenames and file headers for
&lt;br&gt;sixty-nine of the GSS flatfiles in Release 169.0. Consider gbgss251.seq :
&lt;br&gt;&lt;br&gt;GBGSS1.SEQ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Genetic Sequence Data Bank
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; December 15 2008
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; NCBI-GenBank Flat File Release 169.0
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;GSS Sequences (Part 1)
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;87209 loci, &amp;nbsp; &amp;nbsp;64341123 bases, from &amp;nbsp; &amp;nbsp;87209 reported sequences
&lt;br&gt;&lt;br&gt;&amp;nbsp; Here, the filename and part number in the header both use &amp;quot;1&amp;quot;, though the file
&lt;br&gt;has been renamed as &amp;quot;251&amp;quot; based on the number of files dumped from the other
&lt;br&gt;system. &amp;nbsp;We will work to resolve this discrepancy in future releases, but the
&lt;br&gt;priority is certainly much lower than many other tasks.
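&lt;br&gt;&lt;br&gt;&amp;nbsp; Users who validate file headers after a transfer may want to detect this mismatch explicitly. The following Python sketch compares the file name embedded in the first header line with the actual file name. The function name header_name_matches is hypothetical, and the sketch assumes that the first whitespace-delimited token of the header is the file name, as in the gbgss251.seq example above.
&lt;pre&gt;
```python
import os


def header_name_matches(path, first_header_line):
    """Return True when the file name embedded in the first header line
    (for example GBGSS1.SEQ) agrees with the actual file name."""
    fields = first_header_line.split()
    if not fields:
        return False
    # Header names are upper case, file names on the FTP site are lower case.
    return fields[0].lower() == os.path.basename(path).lower()


header = "GBGSS1.SEQ          Genetic Sequence Data Bank"
print(header_name_matches("gbgss251.seq", header))  # False
print(header_name_matches("gbgss1.seq", header))    # True
```
&lt;/pre&gt;
&lt;br&gt;A False result for a GSS file in this release is therefore expected for the affected files and does not by itself indicate a corrupted transfer.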
&lt;br&gt;&lt;br&gt;1.4 Upcoming Changes
&lt;br&gt;&lt;br&gt;1.4.1 PROJECT linetype to be replaced by DBLINK (April 2009)
&lt;br&gt;&lt;br&gt;&amp;nbsp; The PROJECT linetype allows a sequence record to be linked to information
&lt;br&gt;about the sequencing project that generated the data which ultimately
&lt;br&gt;resulted in the record's submission to the International Nucleotide Sequence
&lt;br&gt;Database ( INSD; see &lt;a href=&quot;http://www.insdc.org&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.insdc.org&lt;/a&gt;&amp;nbsp;) .
&lt;br&gt;&lt;br&gt;&amp;nbsp; This complete bacterial GenBank record illustrates the use of the PROJECT
&lt;br&gt;line:
&lt;br&gt;&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; CP000964 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 5641239 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; circular BCT 24-SEP-2008
&lt;br&gt;DEFINITION &amp;nbsp;Klebsiella pneumoniae 342, complete genome.
&lt;br&gt;ACCESSION &amp;nbsp; CP000964
&lt;br&gt;VERSION &amp;nbsp; &amp;nbsp; CP000964.1 &amp;nbsp;GI:206564770
&lt;br&gt;PROJECT &amp;nbsp; &amp;nbsp; GenomeProject:28471
&lt;br&gt;&lt;br&gt;&amp;nbsp; When viewed on the web in NCBI's Entrez:Nucleotide, the record's project
&lt;br&gt;identifier (28471) links to an entry in the Genome Project Database (GPDB) :
&lt;br&gt;&lt;br&gt;&amp;nbsp; &lt;a href=&quot;http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&amp;cmd=Retrieve&amp;dopt=Overview&amp;uid=28471&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&amp;cmd=Retrieve&amp;dopt=Overview&amp;uid=28471&lt;/a&gt;&lt;br&gt;&lt;br&gt;where information about the sequencing center, the bacterium, and other
&lt;br&gt;GenBank records (eg, plasmids) associated with the sequencing project
&lt;br&gt;can be found.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Since the introduction of PROJECT, the scope of the &amp;quot;Genome&amp;quot; Project
&lt;br&gt;Database has expanded, to include projects that are not necessarily targeted
&lt;br&gt;to the sequencing of a complete genome.
&lt;br&gt;&lt;br&gt;&amp;nbsp; In addition, there can be other resources which underlie an INSD sequence
&lt;br&gt;record, such as the Trace Assembly Archive at the NCBI:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &lt;a href=&quot;http://www.ncbi.nlm.nih.gov/Traces/assembly/assmbrowser.cgi?cmd=show&amp;f=tree&amp;m=main&amp;s=tree&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.ncbi.nlm.nih.gov/Traces/assembly/assmbrowser.cgi?cmd=show&amp;f=tree&amp;m=main&amp;s=tree&lt;/a&gt;&lt;br&gt;&lt;br&gt;&amp;nbsp; Because of the expanded scope of the GPDB, and because we anticipate a need
&lt;br&gt;to link to more resources than just the GPDB, the PROJECT linetype is going to
&lt;br&gt;be replaced by a new linetype:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;DBLINK
&lt;br&gt;&lt;br&gt;&amp;nbsp; Modifications to linetypes can be disruptive, so the switch to DBLINK will occur
&lt;br&gt;in several stages. As of October 2008, links to the NCBI Trace Assembly Archive are
&lt;br&gt;displayed via a line of text in the COMMENT section of sequence records. Here is a
&lt;br&gt;mock-up, based on CP000964, to illustrate that initial change:
&lt;br&gt;&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; CP000964 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 5641239 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; circular BCT 24-SEP-2008
&lt;br&gt;DEFINITION &amp;nbsp;Klebsiella pneumoniae 342, complete genome.
&lt;br&gt;ACCESSION &amp;nbsp; CP000964
&lt;br&gt;VERSION &amp;nbsp; &amp;nbsp; CP000964.1 &amp;nbsp;GI:206564770
&lt;br&gt;PROJECT &amp;nbsp; &amp;nbsp; GenomeProject:28471
&lt;br&gt;....
&lt;br&gt;COMMENT &amp;nbsp; &amp;nbsp; Trace Assembly Archive:123456
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; The source for the DNA and/or cells is: &amp;nbsp;Professor Eric W.
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Triplett, Chair, Department of Microbiology and Cell Science,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Institute of Food and Agricultural Sciences, University of Florida,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; P.O. Box 110700, Gainesville, FL 32611-0700, &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=21031273&amp;i=2&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;ewt@...&lt;/a&gt;.
&lt;br&gt;&lt;br&gt;Note: Use of the Trace Assembly Archive is still in its early stages, so only
&lt;br&gt;a few records are expected to have these links in the short term.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The new DBLINK linetype will then be introduced with GenBank Release 170.0 ,
&lt;br&gt;on or near February 15, 2009 .
&lt;br&gt;&lt;br&gt;&amp;nbsp; The Genome Project ID and the Trace Assembly Archive ID will be presented
&lt;br&gt;via DBLINK, and the existing PROJECT line will continue to be displayed:
&lt;br&gt;&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; CP000964 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 5641239 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; circular BCT 24-SEP-2008
&lt;br&gt;DEFINITION &amp;nbsp;Klebsiella pneumoniae 342, complete genome.
&lt;br&gt;ACCESSION &amp;nbsp; CP000964
&lt;br&gt;VERSION &amp;nbsp; &amp;nbsp; CP000964.1 &amp;nbsp;GI:206564770
&lt;br&gt;PROJECT &amp;nbsp; &amp;nbsp; GenomeProject:28471
&lt;br&gt;DBLINK &amp;nbsp; &amp;nbsp; &amp;nbsp;Project:28471
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Trace Assembly Archive:123456
&lt;br&gt;....
&lt;br&gt;COMMENT &amp;nbsp; &amp;nbsp; The source for the DNA and/or cells is: &amp;nbsp;Professor Eric W.
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Triplett, Chair, Department of Microbiology and Cell Science,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Institute of Food and Agricultural Sciences, University of Florida,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; P.O. Box 110700, Gainesville, FL 32611-0700, &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=21031273&amp;i=3&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;ewt@...&lt;/a&gt;.
&lt;br&gt;&lt;br&gt;PROJECT and DBLINK will co-exist for one GenBank release, until Release 171.0
&lt;br&gt;(April 15, 2009), at which point the PROJECT line will be removed. In its final
&lt;br&gt;state, our mock-up for CP000964 becomes:
&lt;br&gt;&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; CP000964 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 5641239 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; circular BCT 24-SEP-2008
&lt;br&gt;DEFINITION &amp;nbsp;Klebsiella pneumoniae 342, complete genome.
&lt;br&gt;ACCESSION &amp;nbsp; CP000964
&lt;br&gt;VERSION &amp;nbsp; &amp;nbsp; CP000964.1 &amp;nbsp;GI:206564770
&lt;br&gt;DBLINK &amp;nbsp; &amp;nbsp; &amp;nbsp;Project:28471
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Trace Assembly Archive:123456
&lt;br&gt;....
&lt;br&gt;COMMENT &amp;nbsp; &amp;nbsp; The source for the DNA and/or cells is: &amp;nbsp;Professor Eric W.
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Triplett, Chair, Department of Microbiology and Cell Science,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Institute of Food and Agricultural Sciences, University of Florida,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; P.O. Box 110700, Gainesville, FL 32611-0700, &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=21031273&amp;i=4&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;ewt@...&lt;/a&gt;.
&lt;br&gt;&lt;br&gt;&amp;nbsp; In summary: The PROJECT linetype will be replaced by DBLINK as of GenBank
&lt;br&gt;Release 171.0 in April 2009.
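&lt;br&gt;&lt;br&gt;&amp;nbsp; Parsers that currently read the PROJECT line may need to accept both linetypes during the transition. The Python sketch below is one possible approach, not an NCBI-provided parser: the function name project_links is hypothetical, and it assumes the layout seen in the mock-ups above, with the keyword in the first 12 columns and DBLINK continuation lines leaving those columns blank. It does not handle the interim COMMENT-based display of Trace Assembly Archive links.
&lt;pre&gt;
```python
def project_links(record_text):
    """Collect cross-references from PROJECT and DBLINK lines.

    Assumes the keyword occupies the first 12 columns and that DBLINK
    continuation lines leave those columns blank.
    """
    links = {}
    in_dblink = False
    for line in record_text.splitlines():
        keyword = line[:12].strip()
        value = line[12:].strip()
        if keyword == "PROJECT":
            in_dblink = False
            # Legacy form: "GenomeProject:28471"
            links.setdefault("Project", value.partition(":")[2].strip())
        elif keyword == "DBLINK" or (in_dblink and not keyword and value):
            in_dblink = True
            name, _, identifier = value.partition(":")
            links[name.strip()] = identifier.strip()
        else:
            in_dblink = False
    return links


sample = "\n".join([
    "ACCESSION   CP000964",
    "VERSION     CP000964.1  GI:206564770",
    "PROJECT     GenomeProject:28471",
    "DBLINK      Project:28471",
    "            Trace Assembly Archive:123456",
    "....",
])
print(project_links(sample))
```
&lt;/pre&gt;
&lt;br&gt;For the sample above this yields a Project entry of 28471 and a Trace Assembly Archive entry of 123456 (the placeholder ID used in the mock-up). A record that carries only the legacy PROJECT line yields just the Project entry.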
&lt;br&gt;&lt;br&gt;&amp;nbsp; For those who process sequence data in NCBI's ASN.1 format: The
&lt;br&gt;underlying representation for (Genome) Project IDs will remain unchanged.
&lt;br&gt;There will be no changes to the ASN.1 User-object that is used to store them:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; user {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; type
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; str &amp;quot;GenomeProjectsDB&amp;quot; ,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; data {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; label
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; str &amp;quot;ProjectID&amp;quot; ,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; data
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; int 28471 } ,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; label
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; str &amp;quot;ParentID&amp;quot; ,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; data
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; int 0 } } } ,
&lt;br&gt;&lt;br&gt;&amp;nbsp; However, to support linkages to other resources, such as the Trace
&lt;br&gt;Assembly Archive, a new &amp;quot;DBLink&amp;quot; User-object will be introduced:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; user {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; type
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; str &amp;quot;DBLink&amp;quot; ,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; data {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; label
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; str &amp;quot;Trace Assembly Archive&amp;quot; ,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; data
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ints { 123456 } } } }
&lt;br&gt;&lt;br&gt;&amp;nbsp; As new types of linkages are established, they will be added to
&lt;br&gt;the DBLink User-object, and displayed via the DBLINK linetype in
&lt;br&gt;the GenBank flatfile format. 
&lt;br&gt;&lt;br&gt;&amp;nbsp; There is a possibility that the GenomeProjectsDB User-object
&lt;br&gt;might someday be incorporated into the new DBLink User-object.
&lt;br&gt;But at the moment, there are no firm plans to do so.
&lt;br&gt;&lt;br&gt;_______________________________________________
&lt;br&gt;Genbankb mailing list
&lt;br&gt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=21031273&amp;i=5&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;Genbankb@...&lt;/a&gt;
&lt;br&gt;&lt;a href=&quot;http://www.bio.net/biomail/listinfo/genbankb&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.bio.net/biomail/listinfo/genbankb&lt;/a&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/GenBank-Release-169.0-Now-Available-tp21031273p21031273.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-20968781</id>
	<title>GenBank 169.0 Close-of-Data</title>
	<published>2008-12-11T16:00:19Z</published>
	<updated>2008-12-11T16:00:19Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html">Greetings GenBank Users,
&lt;br&gt;&lt;br&gt;Close-of-data for the upcoming GenBank Release 169.0 occurred on
&lt;br&gt;Thursday December 11 at approximately 1:30am EST.
&lt;br&gt;&lt;br&gt;The subsequently generated GenBank Incremental Update files
&lt;br&gt;nc1211.aso, nc1211.flat, etc. contain data through the close.
&lt;br&gt;&lt;br&gt;Note: Release processing often does not begin until sometime during
&lt;br&gt;business hours on the close date. As a result, a number of sequence
&lt;br&gt;records processed *after* 1:30am are likely to be present in the
&lt;br&gt;GenBank 169.0 release files, even though they are &amp;quot;post-close&amp;quot; .
&lt;br&gt;&lt;br&gt;Similarly, the first GenBank Incremental Update that is generated
&lt;br&gt;after the close date is likely to contain a number of sequence
&lt;br&gt;records that are unchanged, compared to their appearance in the
&lt;br&gt;release files.
&lt;br&gt;&lt;br&gt;Our apologies for the lack of advance notice about the close date.
&lt;br&gt;&lt;br&gt;Mark Cavanaugh
&lt;br&gt;GenBank
&lt;br&gt;NCBI/NLM/NIH/HHS
&lt;br&gt;&lt;br&gt;&lt;br&gt;_______________________________________________
&lt;br&gt;Genbankb mailing list
&lt;br&gt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=20968781&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;Genbankb@...&lt;/a&gt;
&lt;br&gt;&lt;a href=&quot;http://www.bio.net/biomail/listinfo/genbankb&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.bio.net/biomail/listinfo/genbankb&lt;/a&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/GenBank-169.0-Close-of-Data-tp20968781p20968781.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-20968780</id>
	<title>GenBank : Increase in number of release files planned</title>
	<published>2008-12-11T15:57:09Z</published>
	<updated>2008-12-11T15:57:09Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html">Greetings GenBank Users,
&lt;br&gt;&lt;br&gt;Due to changes in the method by which the flatfile representations
&lt;br&gt;of GenBank sequence records are stored at NCBI, there will be
&lt;br&gt;about a 10% increase in the overall number of files in GenBank 
&lt;br&gt;Release 169.0, compared to Release 168.0 in October. This will be in 
&lt;br&gt;addition to the normal increase in files caused by the addition of 
&lt;br&gt;new records.
&lt;br&gt;&lt;br&gt;The sizes of the files in each division will also vary more than
&lt;br&gt;previously (although most files will still be in the 250MB to 350MB
&lt;br&gt;size range).
&lt;br&gt;&lt;br&gt;These changes are the costs of an improvement in overall flatfile
&lt;br&gt;processing. Among the benefits: large-scale changes which affect 
&lt;br&gt;many records (flatfile format changes; changes to widely
&lt;br&gt;used organism names) can now be handled much more quickly, and
&lt;br&gt;the preparation of certain GB release files now requires much
&lt;br&gt;less time.
&lt;br&gt;&lt;br&gt;These changes, and others that are underway, should ultimately 
&lt;br&gt;translate into a much faster release generation process in 2009.
&lt;br&gt;&lt;br&gt;Mark Cavanaugh
&lt;br&gt;GenBank
&lt;br&gt;NCBI/NLM/NIH/HHS
&lt;br&gt;&lt;br&gt;&lt;br&gt;_______________________________________________
&lt;br&gt;Genbankb mailing list
&lt;br&gt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=20968780&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;Genbankb@...&lt;/a&gt;
&lt;br&gt;&lt;a href=&quot;http://www.bio.net/biomail/listinfo/genbankb&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.bio.net/biomail/listinfo/genbankb&lt;/a&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/GenBank-%3A-Increase-in-number-of-release-files-planned-tp20968780p20968780.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-20257851</id>
	<title>GenBank Release 168.0 Now Available</title>
	<published>2008-10-30T16:57:15Z</published>
	<updated>2008-10-30T16:57:15Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html">Greetings GenBank Users,
&lt;br&gt;&lt;br&gt;&amp;nbsp; GenBank Release 168.0 is now available via FTP from the National
&lt;br&gt;Center for Biotechnology Information (NCBI):
&lt;br&gt;&lt;br&gt;&amp;nbsp; Ftp Site &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Directory &amp;nbsp; Contents
&lt;br&gt;&amp;nbsp; ---------------- &amp;nbsp; --------- &amp;nbsp; ---------------------------------------
&lt;br&gt;&amp;nbsp; ftp.ncbi.nih.gov &amp;nbsp; genbank &amp;nbsp; &amp;nbsp; GenBank Release 168.0 flatfiles
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;ncbi-asn1 &amp;nbsp; ASN.1 data used to create Release 168.0
&lt;br&gt;&lt;br&gt;&amp;nbsp; Close-of-data for GenBank 168.0 occurred on 10/27/2008. Uncompressed,
&lt;br&gt;the Release 168.0 flatfiles require roughly 371 GB (sequence files only)
&lt;br&gt;or 396 GB (including the 'short directory', 'index' and the *.txt
&lt;br&gt;files). The ASN.1 data require approximately 338 GB.
&lt;br&gt;&lt;br&gt;Recent statistics for non-WGS, non-CON sequences:
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release &amp;nbsp;Date &amp;nbsp; &amp;nbsp; &amp;nbsp; Base Pairs &amp;nbsp; Entries
&lt;br&gt;&lt;br&gt;&amp;nbsp; 167 &amp;nbsp; &amp;nbsp; &amp;nbsp;Aug 2008 &amp;nbsp; 95033791652 &amp;nbsp;92748599
&lt;br&gt;&amp;nbsp; 168 &amp;nbsp; &amp;nbsp; &amp;nbsp;Oct 2008 &amp;nbsp; 97381682336 &amp;nbsp;96400790
&lt;br&gt;&lt;br&gt;Recent statistics for WGS sequences:
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release &amp;nbsp;Date &amp;nbsp; &amp;nbsp; &amp;nbsp; Base Pairs &amp;nbsp; Entries
&lt;br&gt;&lt;br&gt;&amp;nbsp; 167 &amp;nbsp; &amp;nbsp; &amp;nbsp;Aug 2008 &amp;nbsp;118593509342 &amp;nbsp;40214247
&lt;br&gt;&amp;nbsp; 168 &amp;nbsp; &amp;nbsp; &amp;nbsp;Oct 2008 &amp;nbsp;136085973423 &amp;nbsp;46108952
&lt;br&gt;&lt;br&gt;&amp;nbsp; During the 69 days between the close dates for GenBank Releases 167.0
&lt;br&gt;and 168.0, the non-WGS/non-CON portion of GenBank grew by 2,347,890,684
&lt;br&gt;basepairs and by 3,652,191 sequence records. During that same period,
&lt;br&gt;1,111,311 records were updated. An average of about 69,036 non-WGS/non-CON
&lt;br&gt;records were added and/or updated per day.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Between releases 167.0 and 168.0, the WGS component of GenBank grew by
&lt;br&gt;17,492,464,081 basepairs and by 5,894,705 records.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The combined WGS/non-WGS single-release increase of 19.84 Gbp for
&lt;br&gt;Release 168.0 is the largest that GenBank has experienced, to date.
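&lt;br&gt;&lt;br&gt;&amp;nbsp; The growth figures quoted above follow directly from the two statistics tables. The Python sketch below reproduces the arithmetic using only numbers copied from this announcement; the variable names are illustrative.
&lt;pre&gt;
```python
# (base pairs, entries) per release, copied from the statistics tables above.
non_wgs = {"167": (95033791652, 92748599), "168": (97381682336, 96400790)}
wgs = {"167": (118593509342, 40214247), "168": (136085973423, 46108952)}
updated_records = 1111311
days_between_closes = 69

non_wgs_bp = non_wgs["168"][0] - non_wgs["167"][0]
non_wgs_entries = non_wgs["168"][1] - non_wgs["167"][1]
wgs_bp = wgs["168"][0] - wgs["167"][0]
wgs_entries = wgs["168"][1] - wgs["167"][1]

# Combined single-release growth in Gbp, and records added or updated per day.
combined_gbp = (non_wgs_bp + wgs_bp) / 1e9
per_day = (non_wgs_entries + updated_records) / days_between_closes

print(non_wgs_bp, non_wgs_entries)  # 2347890684 3652191
print(wgs_bp, wgs_entries)          # 17492464081 5894705
print(round(combined_gbp, 2))       # 19.84
print(round(per_day))               # 69036
```
&lt;/pre&gt;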
&lt;br&gt;&lt;br&gt;&amp;nbsp; For additional release information, see the README files in either of
&lt;br&gt;the directories mentioned above, and the release notes (gbrel.txt) in
&lt;br&gt;the genbank directory. Sections 1.3 and 1.4 of the release notes
&lt;br&gt;(Changes in Release 168.0 and Upcoming Changes) have been appended
&lt;br&gt;below for your convenience.
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ** Important Notes **
&lt;br&gt;&lt;br&gt;* &amp;nbsp;A number of changes have been implemented for the October 2008
&lt;br&gt;&amp;nbsp; &amp;nbsp;GenBank Release. Please see Section 1.3 for a complete list.
&lt;br&gt;&lt;br&gt;* &amp;nbsp;GenBank 'index' files are now provided without any EST content, and
&lt;br&gt;&amp;nbsp; &amp;nbsp;without most GSS content. See Section 1.3.12 of the release notes for
&lt;br&gt;&amp;nbsp; &amp;nbsp;further details.
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;NCBI is considering ceasing support for the index files, so we
&lt;br&gt;&amp;nbsp; &amp;nbsp;encourage affected users to review that section and provide feedback.
&lt;br&gt;&lt;br&gt;* &amp;nbsp;A new linetype ( DBLINK ) will be implemented as of the February 2009
&lt;br&gt;&amp;nbsp; &amp;nbsp;release. See Section 1.4.1 for information.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release 168.0 data, and subsequent updates, are available now via
&lt;br&gt;NCBI's Entrez and Blast services.
&lt;br&gt;&lt;br&gt;&amp;nbsp; As a general guideline, we suggest first transferring the GenBank
&lt;br&gt;release notes (gbrel.txt) whenever a release is being obtained. Check to
&lt;br&gt;make sure that the date and release number in the header of the release
&lt;br&gt;notes are current (eg: October 15 2008, 168.0). If they are not, interrupt
&lt;br&gt;the remaining transfers and then request assistance from the NCBI Service
&lt;br&gt;Desk.
&lt;br&gt;&lt;br&gt;&amp;nbsp; A comprehensive check of the headers of all release files after your
&lt;br&gt;transfers are complete is also suggested. Here's how one might go about
&lt;br&gt;this on a unix platform, using csh/tcsh :
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; set files = `ls gb*.*`
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; foreach i ($files)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; head -10 $i | grep Release
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; end
&lt;br&gt;&lt;br&gt;Or, if the files are compressed, perhaps:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; gzcat $i | head -10 | grep Release
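&lt;br&gt;&lt;br&gt;&amp;nbsp; For sites without csh/tcsh, roughly the same check can be sketched in Python. This is an illustrative equivalent of the shell loop above, not an NCBI-supplied script: the function name release_lines is hypothetical, and it assumes that compressed files carry a .gz suffix.
&lt;pre&gt;
```python
import glob
import gzip
import os


def release_lines(directory, max_lines=10):
    """Map each gb*.* file name to the header lines that mention 'Release'.

    Only the first max_lines lines of each file are read, mirroring
    'head -10 | grep Release'. Files ending in .gz are read via gzip.
    """
    found = {}
    for path in sorted(glob.glob(os.path.join(directory, "gb*.*"))):
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", errors="replace") as handle:
            head = [next(handle, "") for _ in range(max_lines)]
        found[os.path.basename(path)] = [
            line.strip() for line in head if "Release" in line
        ]
    return found
```
&lt;/pre&gt;
&lt;br&gt;Calling release_lines(".") in the download directory returns one entry per file; any file whose list is empty, or whose release number differs from 168.0, deserves a closer look.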
&lt;br&gt;&lt;br&gt;&amp;nbsp; If you encounter problems while ftp'ing or uncompressing Release
&lt;br&gt;168.0, please send email outlining your difficulties to:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=20257851&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;info@...&lt;/a&gt;
&lt;br&gt;&lt;br&gt;Mark Cavanaugh, Michael Kimelman, Ilya Dondoshansky
&lt;br&gt;GenBank
&lt;br&gt;NCBI/NLM/NIH/HHS
&lt;br&gt;&lt;br&gt;&lt;br&gt;1.3 Important Changes in Release 168.0
&lt;br&gt;&lt;br&gt;1.3.1 Organizational changes
&lt;br&gt;&lt;br&gt;&amp;nbsp; The total number of sequence data files increased by 60 with this release:
&lt;br&gt;&lt;br&gt;&amp;nbsp; - the BCT division is now composed of &amp;nbsp;32 files (+2)
&lt;br&gt;&amp;nbsp; - the CON division is now composed of 103 files (+6)
&lt;br&gt;&amp;nbsp; - the EST division is now composed of 802 files (+40)
&lt;br&gt;&amp;nbsp; - the GSS division is now composed of 309 files (+3)
&lt;br&gt;&amp;nbsp; - the HTG division is now composed of 122 files (+2)
&lt;br&gt;&amp;nbsp; - the PAT division is now composed of &amp;nbsp;47 files (+1)
&lt;br&gt;&amp;nbsp; - the PLN division is now composed of &amp;nbsp;32 files (+2)
&lt;br&gt;&amp;nbsp; - the STS division is now composed of &amp;nbsp;18 files (+4)
&lt;br&gt;&lt;br&gt;&amp;nbsp; The total number of index files increased by 1 with this release:
&lt;br&gt;&lt;br&gt;&amp;nbsp; - the JOU index is now composed of 6 files (+1)
&lt;br&gt;&lt;br&gt;1.3.2 Changes related to ncRNA features, /ncRNA_class, and /mol_type
&lt;br&gt;&lt;br&gt;&amp;nbsp; The list of allowed values for the /ncRNA_class qualifier, which is
&lt;br&gt;mandatory for all ncRNA features, has been expanded to include:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /ncRNA_class=&amp;quot;ribozyme&amp;quot;
&lt;br&gt;&lt;br&gt;Non-coding RNAs which are not yet in the INSDC's controlled vocabulary:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;a href=&quot;http://www.insdc.org/page.php?page=rna_vocab&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.insdc.org/page.php?page=rna_vocab&lt;/a&gt;&lt;br&gt;&lt;br&gt;previously required /ncRNA_class=&amp;quot;other&amp;quot; plus an accompanying /note
&lt;br&gt;qualifier to describe the nature of the ncRNA. This requirement has
&lt;br&gt;been changed, such that *either* a /product or a /note qualifier must
&lt;br&gt;accompany &amp;quot;other&amp;quot; ncRNA features.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The list of allowed /mol_type qualifiers for the source feature
&lt;br&gt;currently includes:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /mol_type=&amp;quot;snoRNA&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /mol_type=&amp;quot;snRNA&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /mol_type=&amp;quot;scRNA&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /mol_type=&amp;quot;tmRNA&amp;quot;
&lt;br&gt;&lt;br&gt;All of these molecule types have been collapsed into a single value:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /mol_type=&amp;quot;transcribed RNA&amp;quot;
&lt;br&gt;&lt;br&gt;Sequence records which represent one of these four types of molecules
&lt;br&gt;will thus have:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; an ncRNA feature with /ncRNA_class of &amp;quot;snoRNA&amp;quot;, &amp;quot;scRNA&amp;quot; or &amp;quot;snRNA&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; a source feature with /mol_type of &amp;quot;transcribed RNA&amp;quot;
&lt;br&gt;or
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; a tmRNA feature
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; a source feature with /mol_type of &amp;quot;transcribed RNA&amp;quot;
&lt;br&gt;&lt;br&gt;All of these changes take effect with this October 2008 release.
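&lt;br&gt;&lt;br&gt;&amp;nbsp; For software that still keys on the four legacy /mol_type values, the new representation described above can be summarized as a small lookup. The Python sketch below is illustrative only; the function name updated_annotation and the tuple layout are assumptions, not part of the INSDC feature table specification.
&lt;pre&gt;
```python
# Legacy /mol_type values that this release collapses into one value.
COLLAPSED_MOL_TYPES = {"snoRNA", "snRNA", "scRNA", "tmRNA"}


def updated_annotation(old_mol_type):
    """Return (feature, ncRNA_class, mol_type) for a legacy /mol_type value."""
    if old_mol_type not in COLLAPSED_MOL_TYPES:
        raise ValueError("not one of the four collapsed molecule types")
    if old_mol_type == "tmRNA":
        # tmRNA keeps its own feature and has no /ncRNA_class qualifier.
        return ("tmRNA", None, "transcribed RNA")
    return ("ncRNA", old_mol_type, "transcribed RNA")


print(updated_annotation("snoRNA"))  # ('ncRNA', 'snoRNA', 'transcribed RNA')
print(updated_annotation("tmRNA"))   # ('tmRNA', None, 'transcribed RNA')
```
&lt;/pre&gt;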
&lt;br&gt;&lt;br&gt;1.3.3 Merging the satellite and repeat_unit features into repeat_region
&lt;br&gt;&lt;br&gt;&amp;nbsp; Satellites, minisatellites and microsatellites are composed of repetitive
&lt;br&gt;units of DNA, with a variety of lengths and repeat patterns. With the
&lt;br&gt;addition of a new qualifier (/satellite), the satellite and repeat_unit
&lt;br&gt;features are now represented by the repeat_region feature.
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;Qualifier &amp;nbsp; &amp;nbsp; &amp;nbsp; /satellite=
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;Definition &amp;nbsp; &amp;nbsp; &amp;nbsp;identifier for satellite DNA marker; many tandem repeats
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;(identical or related) of a short basic repeating unit; many
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;have a base composition or other property different from the
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;genome average that allows them to be separated from the bulk
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;genomic DNA;
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;Value format &amp;nbsp; &amp;nbsp;&amp;quot;&amp;lt;satellite_type&amp;gt;[:&amp;lt;class&amp;gt;][ &amp;lt;identifier&amp;gt;]&amp;quot; 
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;where satellite_type is one of the following 
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;quot;satellite&amp;quot;, &amp;quot;microsatellite&amp;quot;, &amp;quot;minisatellite&amp;quot;
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;Example &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /satellite=&amp;quot;satellite: S1a&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;/satellite=&amp;quot;satellite: alpha&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;/satellite=&amp;quot;satellite: gamma III&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;/satellite=&amp;quot;microsatellite: DC130&amp;quot;
&lt;br&gt;&lt;br&gt;&amp;nbsp; As of this October 2008 GenBank release, all satellite and repeat_unit
&lt;br&gt;features have been transformed into repeat_region features with an
&lt;br&gt;appropriate /satellite qualifier.
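&lt;br&gt;&lt;br&gt;&amp;nbsp; For readers who parse feature tables, the following Python sketch
&lt;br&gt;shows one way the /satellite value format could be split into its parts.
&lt;br&gt;It is not NCBI code: the function name parse_satellite and the tuple it
&lt;br&gt;returns are illustrative assumptions, and treating a space directly
&lt;br&gt;after the colon as an empty class (as in the examples above) is only one
&lt;br&gt;possible reading of the format.

```python
SATELLITE_TYPES = ("satellite", "microsatellite", "minisatellite")


def parse_satellite(value):
    """Split a /satellite value into (satellite_type, class, identifier).

    Hypothetical helper; parts that are absent are returned as None.
    """
    head, colon, rest = value.partition(":")
    if colon:
        sat_type = head.strip()
        # Text directly after the colon, up to the first space, is the class;
        # whatever follows that space is taken as the identifier.
        sat_class, _, identifier = rest.partition(" ")
    else:
        # No class given: the type is followed by an optional identifier.
        sat_type, _, identifier = value.strip().partition(" ")
        sat_class = ""
    if sat_type not in SATELLITE_TYPES:
        raise ValueError("unknown satellite type: " + repr(sat_type))
    return sat_type, sat_class.strip() or None, identifier.strip() or None


print(parse_satellite("satellite: gamma III"))
print(parse_satellite("microsatellite: DC130"))
```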
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;1.3.4 New /gene_synonym qualifier
&lt;br&gt;&lt;br&gt;&amp;nbsp; Gene symbols are presented via the /gene qualifier. When synonymous or
&lt;br&gt;alternative gene symbols are available, they have often been presented
&lt;br&gt;via multiple /gene qualifiers.
&lt;br&gt;&lt;br&gt;&amp;nbsp; To distinguish what might be an approved or official gene symbol from
&lt;br&gt;its synonyms or alternatives, a new /gene_synonym qualifier has been
&lt;br&gt;introduced for GenBank Release 168.0.
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;Qualifier &amp;nbsp; &amp;nbsp; &amp;nbsp; /gene_synonym=
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;Definition &amp;nbsp; &amp;nbsp; &amp;nbsp;synonymous or alternative symbol for a gene corresponding to
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;a sequence region
&lt;br&gt;&amp;nbsp; &amp;nbsp;Value format &amp;nbsp; &amp;nbsp;&amp;quot;text&amp;quot;
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;Examples &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;/gene=&amp;quot;CF&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;/gene=&amp;quot;ABCC7&amp;quot;
&lt;br&gt;&lt;br&gt;1.3.5 New /mating_type qualifier
&lt;br&gt;&lt;br&gt;&amp;nbsp; Because the /sex qualifier has a free-text value format, it has been
&lt;br&gt;inappropriately used for certain organisms, such as bacteria, fungi,
&lt;br&gt;and some insects and worms. In such cases, a more appropriate term would
&lt;br&gt;be 'mating type'.
&lt;br&gt;&lt;br&gt;&amp;nbsp; A new qualifier has been made available for non-sexual reproductive
&lt;br&gt;strategies as of October 2008:
&lt;br&gt;&lt;br&gt;Qualifier &amp;nbsp; &amp;nbsp; &amp;nbsp; /mating_type=
&lt;br&gt;Definition &amp;nbsp; &amp;nbsp; &amp;nbsp;mating type of the organism from which the sequence was
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; obtained; mating type is used for prokaryotes, and for
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; eukaryotes that undergo meiosis without sexually dimorphic
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; gametes
&lt;br&gt;Value format &amp;nbsp; &amp;nbsp;&amp;quot;text&amp;quot;
&lt;br&gt;Examples &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;/mating_type=&amp;quot;MAT-1&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /mating_type=&amp;quot;plus&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /mating_type=&amp;quot;-&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /mating_type=&amp;quot;odd&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /mating_type=&amp;quot;even&amp;quot;
&lt;br&gt;Comment &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /mating_type=&amp;quot;male&amp;quot; and /mating_type=&amp;quot;female&amp;quot; are
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; valid in the prokaryotes, but not in the eukaryotes;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; for more information, see the entry for /sex.
&lt;br&gt;&lt;br&gt;In light of the above, the definition for the /sex qualifier has been
&lt;br&gt;refined:
&lt;br&gt;&lt;br&gt;Qualifier &amp;nbsp; &amp;nbsp; &amp;nbsp; /sex=
&lt;br&gt;Definition &amp;nbsp; &amp;nbsp; &amp;nbsp;sex of the organism from which the sequence was obtained;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; sex is used for eukaryotic organisms that undergo meiosis
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; and have sexually dimorphic gametes
&lt;br&gt;Value format &amp;nbsp; &amp;nbsp;&amp;quot;text&amp;quot;
&lt;br&gt;Examples &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;/sex=&amp;quot;female&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /sex=&amp;quot;male&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /sex=&amp;quot;hermaphrodite&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /sex=&amp;quot;unisexual&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /sex=&amp;quot;bisexual&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /sex=&amp;quot;asexual&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /sex=&amp;quot;monoecious&amp;quot; [or monecious]
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /sex=&amp;quot;dioecious&amp;quot; [or diecious]
&lt;br&gt;Comment &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /sex should be used (instead of /mating_type)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; in the Metazoa, Embryophyta, Rhodophyta &amp; Phaeophyceae;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /mating_type should be used (instead of /sex)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; in the Bacteria, Archaea &amp; Fungi;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; neither /sex nor /mating_type should be used
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; in the viruses;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; outside of the taxa listed above, /mating_type
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; should be used unless the value of the qualifier
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; is taken from the vocabulary given in the examples
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; above
&lt;br&gt;&lt;br&gt;&amp;nbsp; Records which inappropriately used the /sex qualifier have been
&lt;br&gt;updated to use the new /mating_type qualifier.
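&lt;br&gt;&lt;br&gt;&amp;nbsp; The taxon rules in the /sex comment can be read as a small decision
&lt;br&gt;procedure. The Python sketch below is one hypothetical reading of them,
&lt;br&gt;not NCBI software: the function name choose_qualifier, the lineage
&lt;br&gt;argument (a list of taxon names for the source organism) and the use of
&lt;br&gt;the name 'Viruses' for the viral lineage are assumptions made purely
&lt;br&gt;for illustration.

```python
SEX_TAXA = {"Metazoa", "Embryophyta", "Rhodophyta", "Phaeophyceae"}
MATING_TYPE_TAXA = {"Bacteria", "Archaea", "Fungi"}
SEX_VOCABULARY = {
    "female", "male", "hermaphrodite", "unisexual", "bisexual",
    "asexual", "monoecious", "monecious", "dioecious", "diecious",
}


def choose_qualifier(lineage, value):
    """Return "sex", "mating_type", or None for a source organism."""
    taxa = set(lineage)
    if "Viruses" in taxa:
        # Neither qualifier should be used in the viruses.
        return None
    if taxa.intersection(SEX_TAXA):
        return "sex"
    if taxa.intersection(MATING_TYPE_TAXA):
        return "mating_type"
    # Outside the listed taxa: /mating_type, unless the value comes from
    # the vocabulary given in the /sex examples.
    return "sex" if value.lower() in SEX_VOCABULARY else "mating_type"


print(choose_qualifier(["Eukaryota", "Fungi", "Ascomycota"], "MAT-1"))
print(choose_qualifier(["Eukaryota", "Metazoa", "Chordata"], "female"))
```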
&lt;br&gt;&lt;br&gt;1.3.6 Renaming of /specific_host as /host
&lt;br&gt;&lt;br&gt;&amp;nbsp; The /specific_host qualifier has been renamed as /host for Release
&lt;br&gt;168.0.
&lt;br&gt;&lt;br&gt;From the Feature Table document:
&lt;br&gt;&lt;br&gt;Qualifier &amp;nbsp; &amp;nbsp; &amp;nbsp; /host=
&lt;br&gt;Definition &amp;nbsp; &amp;nbsp; &amp;nbsp;natural (as opposed to laboratory) host to the organism from
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; which sequenced molecule was obtained
&lt;br&gt;Value format &amp;nbsp; &amp;nbsp;&amp;quot;text&amp;quot;
&lt;br&gt;Example &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /host=&amp;quot;Homo sapiens&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /host=&amp;quot;Homo sapiens 12 year old girl&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /host=&amp;quot;Rhizobium NGR234&amp;quot;
&lt;br&gt;&lt;br&gt;In contrast:
&lt;br&gt;&lt;br&gt;Qualifier &amp;nbsp; &amp;nbsp; &amp;nbsp; /lab_host=
&lt;br&gt;Definition &amp;nbsp; &amp;nbsp; &amp;nbsp;scientific name of the laboratory host used to propagate the
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; source organism from which the sequenced molecule was
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; obtained
&lt;br&gt;Value format &amp;nbsp; &amp;nbsp;&amp;quot;text&amp;quot;
&lt;br&gt;Example &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /lab_host=&amp;quot;Gallus gallus&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /lab_host=&amp;quot;Gallus gallus embryo&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /lab_host=&amp;quot;Escherichia coli strain DH5 alpha&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /lab_host=&amp;quot;Homo sapiens HeLa cells&amp;quot;
&lt;br&gt;Comment &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; the full binomial scientific name of the host organism should
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; be used when known; extra conditional information relating to
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; the host may also be included
&lt;br&gt;&lt;br&gt;1.3.7 New value for /organelle
&lt;br&gt;&lt;br&gt;&amp;nbsp; As of October 2008, the list of allowed values for /organelle has been
&lt;br&gt;expanded to include:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /organelle=&amp;quot;chromatophore&amp;quot;
&lt;br&gt;&lt;br&gt;1.3.8 Modification to value format for /frequency
&lt;br&gt;&lt;br&gt;&amp;nbsp; As of October 2008, the definition of /frequency has been expanded to
&lt;br&gt;accommodate both the fraction of a population carrying a variation,
&lt;br&gt;expressed as a decimal value, and the number of observed instances vs.
&lt;br&gt;the total number of sequenced isolates:
&lt;br&gt;&lt;br&gt;Qualifier &amp;nbsp; &amp;nbsp; &amp;nbsp; /frequency=
&lt;br&gt;Definition &amp;nbsp; &amp;nbsp; &amp;nbsp;frequency of the occurrence of a feature
&lt;br&gt;Value format &amp;nbsp; &amp;nbsp;text representing the proportion of a population carrying the
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; feature expressed as a fraction
&lt;br&gt;Example &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /frequency=&amp;quot;23/108&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /frequency=&amp;quot;1 in 12&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /frequency=&amp;quot;.85&amp;quot;
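&lt;br&gt;&lt;br&gt;&amp;nbsp; Since /frequency is free text, software that wants a number has to
&lt;br&gt;normalize it. The Python sketch below handles only the three notations
&lt;br&gt;shown in the examples above; the function name frequency_as_fraction is
&lt;br&gt;hypothetical, and other spellings that may occur in real records are
&lt;br&gt;not covered.

```python
from decimal import Decimal
from fractions import Fraction


def frequency_as_fraction(value):
    """Convert a /frequency value such as "23/108", "1 in 12" or ".85"."""
    text = value.strip()
    # Observed instances vs. total, written either "n/m" or "n in m".
    for separator in ("/", " in "):
        if separator in text:
            numerator, denominator = text.split(separator, 1)
            return Fraction(int(numerator), int(denominator))
    # Otherwise assume a decimal proportion of the population.
    return Fraction(Decimal(text))


for example in ("23/108", "1 in 12", ".85"):
    print(example, float(frequency_as_fraction(example)))
```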
&lt;br&gt;&lt;br&gt;1.3.9 /cons_splice qualifier removed
&lt;br&gt;&lt;br&gt;&amp;nbsp; The /cons_splice qualifier has almost no usage within the sequence
&lt;br&gt;database. In addition, it does not account for the variation in splice
&lt;br&gt;signals that might be used by different classes of introns. So this
&lt;br&gt;qualifier has been removed from sequence records, and the Feature Table
&lt;br&gt;document, as of Release 168.0.
&lt;br&gt;&lt;br&gt;1.3.10 /virion qualifier removed
&lt;br&gt;&lt;br&gt;&amp;nbsp; The intent of /virion was to indicate that a sequenced molecule
&lt;br&gt;originates from an encapsidated viral particle (as opposed to the
&lt;br&gt;proviral form of a virus, integrated into the host's genome). Viral
&lt;br&gt;sequences derived from a blood sample taken from an infected organism
&lt;br&gt;might be flagged with /virion, if it is believed that the sample
&lt;br&gt;contained viral particles.
&lt;br&gt;&lt;br&gt;&amp;nbsp; However, a review of the database revealed that /virion was not
&lt;br&gt;used consistently, and furthermore, submitters are often unable to
&lt;br&gt;conclusively state that a virus sequence derives from the encapsidated
&lt;br&gt;form. So the /virion qualifier has been removed from sequence records,
&lt;br&gt;and the Feature Table document, as of Release 168.0.
&lt;br&gt;&lt;br&gt;1.3.11 Updated value format for /exception
&lt;br&gt;&lt;br&gt;&amp;nbsp; Only three values for the /exception qualifier have been approved
&lt;br&gt;for use by the INSDC:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;quot;rearrangement required for product&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;quot;RNA editing&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;quot;reasons given in citation&amp;quot;
&lt;br&gt;&lt;br&gt;However, the definition of /exception in the Feature Table document
&lt;br&gt;does not indicate that the contents of /exception are controlled.
&lt;br&gt;This oversight has been corrected, and the definition of the qualifier
&lt;br&gt;is now:
&lt;br&gt;&lt;br&gt;Qualifier &amp;nbsp; &amp;nbsp; &amp;nbsp; /exception=
&lt;br&gt;Definition &amp;nbsp; &amp;nbsp; &amp;nbsp;indicates that the coding region cannot be translated using
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; standard biological rules
&lt;br&gt;Value format &amp;nbsp; &amp;nbsp;&amp;quot;RNA editing&amp;quot;, &amp;quot;reasons given in citation&amp;quot;,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;quot;rearrangement required for product&amp;quot;
&lt;br&gt;Example &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /exception=&amp;quot;RNA editing&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /exception=&amp;quot;reasons given in citation&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /exception=&amp;quot;rearrangement required for product&amp;quot;
&lt;br&gt;Comment &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; only to be used to describe biological mechanisms such
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; as RNA editing; where the exception cannot easily be described
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; a published citation must be referred to; protein translation of
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /exception CDS will be different from the according conceptual
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; translation;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; - must not be used where transl_except would be adequate,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; e.g. in case of stop codon completion use:
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /transl_except=(pos:6883,aa:TERM)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /note=&amp;quot;TAA stop codon is completed by addition of 3' A residues to
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; mRNA&amp;quot;.
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; - must not be used for ribosomal slippage, instead use join operator,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; e.g.: CDS &amp;nbsp; join(486..1784,1787..4810)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /note=&amp;quot;ribosomal slip on tttt sequence at 1784..1787&amp;quot;
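&lt;br&gt;&lt;br&gt;&amp;nbsp; Because /exception is now a controlled vocabulary, a validator can
&lt;br&gt;check it with a simple membership test. The following Python sketch is
&lt;br&gt;an illustration only; the names ALLOWED_EXCEPTIONS and
&lt;br&gt;is_valid_exception are hypothetical, and the three strings are the ones
&lt;br&gt;listed in this section.

```python
ALLOWED_EXCEPTIONS = frozenset([
    "RNA editing",
    "reasons given in citation",
    "rearrangement required for product",
])


def is_valid_exception(value):
    """Return True if value is one of the three approved strings."""
    return value in ALLOWED_EXCEPTIONS


print(is_valid_exception("RNA editing"))
print(is_valid_exception("ribosomal slippage"))
```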
&lt;br&gt;&lt;br&gt;1.3.12 Changes in the content of index files
&lt;br&gt;&lt;br&gt;&amp;nbsp; As described in the GB 153 release notes, the 'index' files which
&lt;br&gt;accompany GenBank releases (see Section 3.3) are considered to be a
&lt;br&gt;legacy data product by NCBI, generated mostly for historical reasons.
&lt;br&gt;FTP statistics of January 2005 seem to support this: the index files
&lt;br&gt;were transferred only half as frequently as the files of sequence
&lt;br&gt;records. The inherent inefficiencies of the index file format also lead
&lt;br&gt;us to suspect that they have little serious use by the user community,
&lt;br&gt;particularly for EST and GSS records.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The software that generated the index file products received little
&lt;br&gt;attention over the years, and finally reached its limitations in
&lt;br&gt;February 2006 (Release 152.0). The required multi-server queries which
&lt;br&gt;obtained and sorted many millions of rows of terms from several
&lt;br&gt;different databases simply outgrew the capacity of the hardware used
&lt;br&gt;for GenBank Release generation.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Our short-term solution is to cease generating some index-file content
&lt;br&gt;for all EST sequence records, and for GSS sequence records that
&lt;br&gt;originate via direct submission to NCBI.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The three gbacc*.idx index files continue to reflect the entirety of
&lt;br&gt;the release, including all EST and GSS records; however, the file
&lt;br&gt;contents are unsorted.
&lt;br&gt;&lt;br&gt;&amp;nbsp; These 'solutions' are really just stop-gaps, and we will likely pursue
&lt;br&gt;one of two options:
&lt;br&gt;&lt;br&gt;a) Cease support of the 'index' file products altogether.
&lt;br&gt;&lt;br&gt;b) Provide new products that present some of the most useful data from
&lt;br&gt;&amp;nbsp; &amp;nbsp;the legacy 'index' files, and cease support for other types of index
&lt;br&gt;&amp;nbsp; &amp;nbsp;data.
&lt;br&gt;&lt;br&gt;&amp;nbsp; If you are a user of the 'index' files associated with GenBank
&lt;br&gt;releases, we encourage you to make your wishes known, either via the
&lt;br&gt;GenBank newsgroup, or via email to NCBI's Service Desk:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=20257851&amp;i=1&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;info@...&lt;/a&gt;
&lt;br&gt;&lt;br&gt;&amp;nbsp; Our apologies for any inconvenience that these changes may cause.
&lt;br&gt;&lt;br&gt;1.3.13 GSS File Header Problem
&lt;br&gt;&lt;br&gt;&amp;nbsp; GSS sequences at GenBank are maintained in two different systems,
&lt;br&gt;depending on their origin, and the dumps from those systems occur in
&lt;br&gt;parallel. Because the second dump (for example) has no prior knowledge
&lt;br&gt;of exactly how many GSS files will be dumped by the first, it does not
&lt;br&gt;know how to number its own output files.
&lt;br&gt;&lt;br&gt;&amp;nbsp; There is thus a discrepancy between the filenames and file headers for
&lt;br&gt;sixty of the GSS flatfiles in Release 168.0. Consider gbgss250.seq:
&lt;br&gt;&lt;br&gt;GBGSS1.SEQ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Genetic Sequence Data Bank
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; October 15 2008
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; NCBI-GenBank Flat File Release 168.0
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;GSS Sequences (Part 1)
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;87217 loci, &amp;nbsp; &amp;nbsp;64373883 bases, from &amp;nbsp; &amp;nbsp;87217 reported sequences
&lt;br&gt;&lt;br&gt;&amp;nbsp; Here, the file number in both the header's filename and its part
&lt;br&gt;number is &amp;quot;1&amp;quot;, though the file has been renamed as &amp;quot;250&amp;quot; based on the
&lt;br&gt;number of files dumped from the other system. We will work to resolve
&lt;br&gt;this discrepancy in future releases, but the priority is certainly much
&lt;br&gt;lower than many other tasks.
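&lt;br&gt;&lt;br&gt;&amp;nbsp; Users whose scripts cross-check flatfile headers against filenames
&lt;br&gt;may want to detect this discrepancy rather than fail on it. The Python
&lt;br&gt;sketch below is a hypothetical check, not an NCBI tool: it assumes the
&lt;br&gt;first whitespace-separated token of the header's first line is the file
&lt;br&gt;name, as in the GBGSS1.SEQ example above, and the function name
&lt;br&gt;header_matches_filename is invented for illustration.

```python
import os


def header_matches_filename(path, first_line):
    """Compare the name in a flatfile's first header line with its filename."""
    fields = first_line.split()
    if not fields:
        # A blank first line cannot match any filename.
        return False
    return fields[0].lower() == os.path.basename(path).lower()


header = "GBGSS1.SEQ          Genetic Sequence Data Bank"
print(header_matches_filename("gbgss250.seq", header))
print(header_matches_filename("gbgss1.seq", header))
```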
&lt;br&gt;&lt;br&gt;1.4 Upcoming Changes
&lt;br&gt;&lt;br&gt;1.4.1 PROJECT linetype to be replaced by DBLINK (April 2009)
&lt;br&gt;&lt;br&gt;&amp;nbsp; The PROJECT linetype allows a sequence record to be linked to
&lt;br&gt;information about the sequencing project that generated the data which
&lt;br&gt;ultimately resulted in the record's submission to the International
&lt;br&gt;Nucleotide Sequence Database (INSD; see &lt;a href=&quot;http://www.insdc.org&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.insdc.org&lt;/a&gt;).
&lt;br&gt;&lt;br&gt;&amp;nbsp; This complete bacterial GenBank record illustrates the use of the
&lt;br&gt;PROJECT line:
&lt;br&gt;&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; CP000964 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 5641239 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; circular BCT 24-SEP-2008
&lt;br&gt;DEFINITION &amp;nbsp;Klebsiella pneumoniae 342, complete genome.
&lt;br&gt;ACCESSION &amp;nbsp; CP000964
&lt;br&gt;VERSION &amp;nbsp; &amp;nbsp; CP000964.1 &amp;nbsp;GI:206564770
&lt;br&gt;PROJECT &amp;nbsp; &amp;nbsp; GenomeProject:28471
&lt;br&gt;&lt;br&gt;&amp;nbsp; When viewed on the web in NCBI's Entrez:Nucleotide, the record's
&lt;br&gt;project identifier (28471) links to an entry in the Genome Project
&lt;br&gt;Database (GPDB):
&lt;br&gt;&lt;br&gt;&amp;nbsp; &lt;a href=&quot;http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&amp;cmd=Retrieve&amp;dopt=Overview&amp;uid=28471&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&amp;cmd=Retrieve&amp;dopt=Overview&amp;uid=28471&lt;/a&gt;
&lt;br&gt;&lt;br&gt;where information about the sequencing center, the bacterium, and other
&lt;br&gt;GenBank records (e.g., plasmids) associated with the sequencing project
&lt;br&gt;can be found.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Since the introduction of PROJECT, the scope of the &amp;quot;Genome&amp;quot; Project
&lt;br&gt;Database has expanded, to include projects that are not necessarily
&lt;br&gt;targeted to the sequencing of a complete genome.
&lt;br&gt;&lt;br&gt;&amp;nbsp; In addition, there can be other resources which underlie an INSD
&lt;br&gt;sequence record, such as the Trace Assembly Archive at the NCBI:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &lt;a href=&quot;http://www.ncbi.nlm.nih.gov/Traces/assembly/assmbrowser.cgi?cmd=show&amp;f=tree&amp;m=main&amp;s=tree&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.ncbi.nlm.nih.gov/Traces/assembly/assmbrowser.cgi?cmd=show&amp;f=tree&amp;m=main&amp;s=tree&lt;/a&gt;
&lt;br&gt;&lt;br&gt;&amp;nbsp; Because of the expanded scope of the GPDB, and because we anticipate a
&lt;br&gt;need to link to more resources than just the GPDB, the PROJECT linetype
&lt;br&gt;is going to be replaced by a new linetype:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;DBLINK
&lt;br&gt;&lt;br&gt;&amp;nbsp; Modifications to linetypes can be disruptive, so the switch to DBLINK
&lt;br&gt;will occur in several stages. Starting in October 2008, links to the
&lt;br&gt;NCBI Trace Assembly Archive will be supported via a line of text in the
&lt;br&gt;COMMENT section of sequence records. Here is a mock-up, based on
&lt;br&gt;CP000964, to illustrate this change:
&lt;br&gt;&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; CP000964 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 5641239 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; circular BCT 24-SEP-2008
&lt;br&gt;DEFINITION &amp;nbsp;Klebsiella pneumoniae 342, complete genome.
&lt;br&gt;ACCESSION &amp;nbsp; CP000964
&lt;br&gt;VERSION &amp;nbsp; &amp;nbsp; CP000964.1 &amp;nbsp;GI:206564770
&lt;br&gt;PROJECT &amp;nbsp; &amp;nbsp; GenomeProject:28471
&lt;br&gt;....
&lt;br&gt;COMMENT &amp;nbsp; &amp;nbsp; Trace Assembly Archive:123456
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; The source for the DNA and/or cells is: &amp;nbsp;Professor Eric W.
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Triplett, Chair, Department of Microbiology and Cell Science,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Institute of Food and Agricultural Sciences, University of Florida,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; P.O. Box 110700, Gainesville, FL 32611-0700, &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=20257851&amp;i=2&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;ewt@...&lt;/a&gt;.
&lt;br&gt;&lt;br&gt;Note: Use of the Trace Assembly Archive is still in its early stages, so
&lt;br&gt;only a few records are expected to have these links in the short term.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The new DBLINK linetype will be introduced as of GenBank Release 170.0,
&lt;br&gt;on or near February 15, 2009.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The Genome Project ID and the Trace Assembly Archive ID will be
&lt;br&gt;presented via DBLINK, and the existing PROJECT line will continue to be
&lt;br&gt;displayed:
&lt;br&gt;&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; CP000964 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 5641239 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; circular BCT 24-SEP-2008
&lt;br&gt;DEFINITION &amp;nbsp;Klebsiella pneumoniae 342, complete genome.
&lt;br&gt;ACCESSION &amp;nbsp; CP000964
&lt;br&gt;VERSION &amp;nbsp; &amp;nbsp; CP000964.1 &amp;nbsp;GI:206564770
&lt;br&gt;PROJECT &amp;nbsp; &amp;nbsp; GenomeProject:28471
&lt;br&gt;DBLINK &amp;nbsp; &amp;nbsp; &amp;nbsp;Project:28471
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Trace Assembly Archive:123456
&lt;br&gt;....
&lt;br&gt;COMMENT &amp;nbsp; &amp;nbsp; The source for the DNA and/or cells is: &amp;nbsp;Professor Eric W.
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Triplett, Chair, Department of Microbiology and Cell Science,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Institute of Food and Agricultural Sciences, University of Florida,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; P.O. Box 110700, Gainesville, FL 32611-0700, &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=20257851&amp;i=3&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;ewt@...&lt;/a&gt;.
&lt;br&gt;&lt;br&gt;PROJECT and DBLINK will co-exist for one GenBank release, until Release
&lt;br&gt;171.0 (April 15, 2009), at which point the PROJECT line will be removed.
&lt;br&gt;In its final state, our mock-up for CP000964 becomes:
&lt;br&gt;&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; CP000964 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 5641239 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; circular BCT 24-SEP-2008
&lt;br&gt;DEFINITION &amp;nbsp;Klebsiella pneumoniae 342, complete genome.
&lt;br&gt;ACCESSION &amp;nbsp; CP000964
&lt;br&gt;VERSION &amp;nbsp; &amp;nbsp; CP000964.1 &amp;nbsp;GI:206564770
&lt;br&gt;DBLINK &amp;nbsp; &amp;nbsp; &amp;nbsp;Project:28471
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Trace Assembly Archive:123456
&lt;br&gt;....
&lt;br&gt;COMMENT &amp;nbsp; &amp;nbsp; The source for the DNA and/or cells is: &amp;nbsp;Professor Eric W.
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Triplett, Chair, Department of Microbiology and Cell Science,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Institute of Food and Agricultural Sciences, University of Florida,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; P.O. Box 110700, Gainesville, FL 32611-0700, &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=20257851&amp;i=4&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;ewt@...&lt;/a&gt;.
&lt;br&gt;&lt;br&gt;&amp;nbsp; In summary: The PROJECT linetype will be replaced by DBLINK as of
&lt;br&gt;Release 171.0 in April 2009.
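&lt;br&gt;&lt;br&gt;&amp;nbsp; Flatfile parsers may need to cope with all three stages of this
&lt;br&gt;transition. The Python sketch below is one hedged way to do so, based
&lt;br&gt;only on the mock-ups above: it assumes the usual flatfile layout in
&lt;br&gt;which the keyword occupies the first 12 columns, and it treats
&lt;br&gt;'GenomeProject' on a PROJECT line as equivalent to 'Project' on a DBLINK
&lt;br&gt;line. The function name collect_links is hypothetical.

```python
def collect_links(record_lines):
    """Gather link IDs from PROJECT and DBLINK lines of one flatfile record."""
    links = {}
    in_dblink = False
    for line in record_lines:
        keyword, body = line[:12].strip(), line[12:].strip()
        if keyword == "DBLINK":
            in_dblink = True
        elif keyword:
            # Any other keyword ends a run of DBLINK continuation lines.
            in_dblink = False
        if keyword != "PROJECT" and not in_dblink:
            continue
        name, _, ids = body.partition(":")
        if not ids:
            continue
        if name == "GenomeProject":
            # Assumption: same ID space as the DBLINK "Project" entry.
            name = "Project"
        entries = links.setdefault(name, [])
        for item in ids.split(","):
            item = item.strip()
            if item and item not in entries:
                entries.append(item)
    return links


record = [
    "LOCUS       CP000964             5641239 bp    DNA     circular BCT 24-SEP-2008",
    "PROJECT     GenomeProject:28471",
    "DBLINK      Project:28471",
    "            Trace Assembly Archive:123456",
    "COMMENT     The source for the DNA and/or cells is:  Professor Eric W.",
]
print(collect_links(record))
```

&lt;br&gt;&lt;br&gt;&amp;nbsp; Merging the two linetypes into one dictionary means the same call
&lt;br&gt;works whether a record carries PROJECT only, both lines, or DBLINK only.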
&lt;br&gt;&lt;br&gt;&amp;nbsp; For those who process sequence data in NCBI's ASN.1 format: The
&lt;br&gt;underlying representation for (Genome) Project IDs will remain
&lt;br&gt;unchanged. There will be no changes to the ASN.1 User-object that is
&lt;br&gt;used to store them:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; user {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; type
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; str &amp;quot;GenomeProjectsDB&amp;quot; ,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; data {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; label
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; str &amp;quot;ProjectID&amp;quot; ,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; data
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; int 28471 } ,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; label
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; str &amp;quot;ParentID&amp;quot; ,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; data
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; int 0 } } } ,
&lt;br&gt;&lt;br&gt;&amp;nbsp; However, to support linkages to other resources, such as the Trace
&lt;br&gt;Assembly Archive, a new &amp;quot;DBLink&amp;quot; User-object will be introduced:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; user {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; type
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; str &amp;quot;DBLink&amp;quot; ,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; data {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; label
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; str &amp;quot;Trace Assembly Archive&amp;quot; ,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; data
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ints { 123456 } } } }
&lt;br&gt;&lt;br&gt;&amp;nbsp; As new types of linkages are established, they will be added to
&lt;br&gt;the DBLink User-object, and displayed via the DBLINK linetype in
&lt;br&gt;the GenBank flatfile format. 
&lt;br&gt;&lt;br&gt;&amp;nbsp; There is a possibility that the GenomeProjectsDB User-object
&lt;br&gt;might someday be incorporated into the new DBLink User-object.
&lt;br&gt;But at the moment, there are no firm plans to do so.
&lt;br&gt;&lt;br&gt;&lt;br&gt;_______________________________________________
&lt;br&gt;Genbankb mailing list
&lt;br&gt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=20257851&amp;i=5&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;Genbankb@...&lt;/a&gt;
&lt;br&gt;&lt;a href=&quot;http://www.bio.net/biomail/listinfo/genbankb&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.bio.net/biomail/listinfo/genbankb&lt;/a&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/GenBank-Release-168.0-Now-Available-tp20257851p20257851.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-20256275</id>
	<title>GenBank 168.0 Close-of-Data</title>
	<published>2008-10-30T14:54:12Z</published>
	<updated>2008-10-30T14:54:12Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html">Greetings GenBank Users,
&lt;br&gt;&lt;br&gt;Close-of-data for the upcoming GenBank Release 168.0 occurred on
&lt;br&gt;Monday October 27 at approximately 1:30am EDT.
&lt;br&gt;&lt;br&gt;The subsequently generated GenBank Incremental Update files
&lt;br&gt;nc1027.aso, nc1027.flat, etc. contain data through the close.
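&lt;br&gt;&lt;br&gt;For scripts that fetch these files, a minimal sketch of how the
&lt;br&gt;file names could be derived from the close date. The "nc" + MMDD
&lt;br&gt;pattern is an assumption inferred from the nc1027 example above, and
&lt;br&gt;giu_file_names is a hypothetical helper, not an NCBI tool:
&lt;pre&gt;
```python
from datetime import date

# Hypothetical helper: the "nc" + MMDD pattern is inferred from the
# nc1027 example in this post, not taken from NCBI documentation.
def giu_file_names(day, suffixes=("aso", "flat")):
    """Build the expected GIU file names for one calendar day."""
    base = "nc" + day.strftime("%m%d")
    return [base + "." + suffix for suffix in suffixes]

print(giu_file_names(date(2008, 10, 27)))  # ['nc1027.aso', 'nc1027.flat']
```
&lt;/pre&gt;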
&lt;br&gt;&lt;br&gt;Note: Release processing often does not begin until sometime during
&lt;br&gt;business hours on the close date. As a result, a number of sequence
&lt;br&gt;records processed *after* 1:30am are likely to be present in the
&lt;br&gt;GenBank 168.0 release files, even though they are &amp;quot;post-close&amp;quot; .
&lt;br&gt;&lt;br&gt;Similarly, the first GenBank Incremental Update that is generated
&lt;br&gt;after the close date is likely to contain a number of sequence
&lt;br&gt;records that are unchanged, compared to their appearance in the
&lt;br&gt;release files.
&lt;br&gt;&lt;br&gt;Our apologies for the lack of advance notice about the close date.
&lt;br&gt;&lt;br&gt;Mark Cavanaugh
&lt;br&gt;GenBank
&lt;br&gt;NCBI/NLM/NIH/HHS
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/GenBank-168.0-Close-of-Data-tp20256275p20256275.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-20256235</id>
	<title>GenBank 168.0 Status Report</title>
	<published>2008-10-30T14:49:45Z</published>
	<updated>2008-10-30T14:49:45Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html">Dear GenBank Users,
&lt;br&gt;&lt;br&gt;October's GenBank Release 168.0 has been significantly delayed
&lt;br&gt;by a multitude of factors, most of which were related to the 
&lt;br&gt;Feature Table and other changes scheduled for implementation
&lt;br&gt;this month. An unusually large number of records were affected,
&lt;br&gt;which caused processing difficulties for one of our systems.
&lt;br&gt;&lt;br&gt;The difficulties were finally resolved as of Monday October 27,
&lt;br&gt;and we anticipate that the release files will be made available
&lt;br&gt;later this evening (Thursday, October 30).
&lt;br&gt;&lt;br&gt;Our apologies for both the two-week delay, and for the lack of 
&lt;br&gt;prior status reports.
&lt;br&gt;&lt;br&gt;Mark Cavanaugh
&lt;br&gt;GenBank
&lt;br&gt;NCBI/NLM/NIH/HHS
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/GenBank-168.0-Status-Report-tp20256235p20256235.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19696383</id>
	<title>Re-Send : PROJECT linetype to be replaced by DBLINK</title>
	<published>2008-09-26T14:03:42Z</published>
	<updated>2008-09-26T14:03:42Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html">[This listserv seems to impose a fairly short line-wrap for
&lt;br&gt;&amp;nbsp;text messages, which made my previous post difficult to
&lt;br&gt;&amp;nbsp;read. Hence this re-send, with shorter line lengths, where
&lt;br&gt;&amp;nbsp;possible.]
&lt;br&gt;&lt;br&gt;Greetings GenBank Users,
&lt;br&gt;&lt;br&gt;The PROJECT linetype allows a sequence record to be linked to
&lt;br&gt;information about the sequencing project that generated the data
&lt;br&gt;which ultimately resulted in the record's submission to the
&lt;br&gt;International Nucleotide Sequence Database ( INSD; see
&lt;br&gt;&lt;a href=&quot;http://www.insdc.org&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.insdc.org&lt;/a&gt;&amp;nbsp;).
&lt;br&gt;&lt;br&gt;This complete bacterial GenBank record illustrates the use of
&lt;br&gt;the PROJECT line:
&lt;br&gt;&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; CP000964 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 5641239 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; circular BCT 24-SEP-2008
&lt;br&gt;DEFINITION &amp;nbsp;Klebsiella pneumoniae 342, complete genome.
&lt;br&gt;ACCESSION &amp;nbsp; CP000964
&lt;br&gt;VERSION &amp;nbsp; &amp;nbsp; CP000964.1 &amp;nbsp;GI:206564770
&lt;br&gt;PROJECT &amp;nbsp; &amp;nbsp; GenomeProject:28471
&lt;br&gt;&lt;br&gt;When viewed on the web in NCBI's Entrez:Nucleotide, the record's
&lt;br&gt;project identifier (28471) links to an entry in the Genome Project
&lt;br&gt;Database (GPDB) :
&lt;br&gt;&lt;br&gt;&amp;nbsp;
&lt;br&gt;&lt;a href=&quot;http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&amp;cmd=Retrieve&amp;dopt=Overview&amp;uid=28471&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&amp;cmd=Retrieve&amp;dopt=Overview&amp;uid=28471&lt;/a&gt;
&lt;br&gt;&lt;br&gt;where information about the sequencing center, the bacterium, and
&lt;br&gt;other GenBank records (eg, plasmids) associated with the sequencing
&lt;br&gt;project can be found.
&lt;br&gt;&lt;br&gt;Since the introduction of PROJECT, the scope of the &amp;quot;Genome&amp;quot; Project
&lt;br&gt;Database has expanded, to include projects that are not necessarily
&lt;br&gt;targeted to the sequencing of a complete genome.
&lt;br&gt;&lt;br&gt;In addition, there can be other resources which underlie an INSD
&lt;br&gt;sequence record, such as the Trace Assembly Archive at the NCBI:
&lt;br&gt;&lt;br&gt;&amp;nbsp;
&lt;br&gt;&lt;a href=&quot;http://www.ncbi.nlm.nih.gov/Traces/assembly/assmbrowser.cgi?cmd=show&amp;f=tree&amp;m=main&amp;s=tree&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.ncbi.nlm.nih.gov/Traces/assembly/assmbrowser.cgi?cmd=show&amp;f=tree&amp;m=main&amp;s=tree&lt;/a&gt;
&lt;br&gt;&lt;br&gt;Because of the expanded scope of the GPDB, and because we
&lt;br&gt;anticipate a need to link to more resources than just the GPDB,
&lt;br&gt;the PROJECT linetype is going to be replaced by a new linetype:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;DBLINK
&lt;br&gt;&lt;br&gt;Further details about this change, and its timetable, follow.
&lt;br&gt;&lt;br&gt;Mark Cavanaugh
&lt;br&gt;GenBank
&lt;br&gt;NCBI/NLM/NIH/HHS
&lt;br&gt;&lt;br&gt;=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
&lt;br&gt;&lt;br&gt;Modifications to linetypes can be disruptive, so the switch to
&lt;br&gt;DBLINK will occur in several stages.
&lt;br&gt;&lt;br&gt;Starting in October 2008, links to the NCBI Trace Assembly Archive
&lt;br&gt;will be supported via a line of text in the COMMENT section of
&lt;br&gt;sequence records.
&lt;br&gt;&lt;br&gt;Here is a mock-up, based on CP000964, to illustrate this change:
&lt;br&gt;&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; CP000964 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 5641239 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; circular BCT 24-SEP-2008
&lt;br&gt;DEFINITION &amp;nbsp;Klebsiella pneumoniae 342, complete genome.
&lt;br&gt;ACCESSION &amp;nbsp; CP000964
&lt;br&gt;VERSION &amp;nbsp; &amp;nbsp; CP000964.1 &amp;nbsp;GI:206564770
&lt;br&gt;PROJECT &amp;nbsp; &amp;nbsp; GenomeProject:28471
&lt;br&gt;....
&lt;br&gt;COMMENT &amp;nbsp; &amp;nbsp; Trace Assembly Archive:123456
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; The source for the DNA and/or cells is: &amp;nbsp;Professor Eric W.
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Triplett, Chair, Department of Microbiology and Cell Science,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Institute of Food and Agricultural Sciences, University of Florida,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; P.O. Box 110700, Gainesville, FL 32611-0700, &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=19696383&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;ewt@...&lt;/a&gt;.
&lt;br&gt;&lt;br&gt;Note: Use of the Trace Assembly Archive is still in its early
&lt;br&gt;stages, so only a few records are expected to have these links in
&lt;br&gt;the short term.
&lt;br&gt;&lt;br&gt;=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
&lt;br&gt;&lt;br&gt;The new DBLINK linetype will be introduced as of GenBank Release
&lt;br&gt;170.0 (February 15, 2009) .
&lt;br&gt;&lt;br&gt;The Genome Project ID and the Trace Assembly Archive ID will be
&lt;br&gt;presented via DBLINK, and the existing PROJECT line will continue
&lt;br&gt;to be displayed:
&lt;br&gt;&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; CP000964 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 5641239 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; circular BCT 24-SEP-2008
&lt;br&gt;DEFINITION &amp;nbsp;Klebsiella pneumoniae 342, complete genome.
&lt;br&gt;ACCESSION &amp;nbsp; CP000964
&lt;br&gt;VERSION &amp;nbsp; &amp;nbsp; CP000964.1 &amp;nbsp;GI:206564770
&lt;br&gt;PROJECT &amp;nbsp; &amp;nbsp; GenomeProject:28471
&lt;br&gt;DBLINK &amp;nbsp; &amp;nbsp; &amp;nbsp;Project:28471
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Trace Assembly Archive:123456
&lt;br&gt;....
&lt;br&gt;COMMENT &amp;nbsp; &amp;nbsp; The source for the DNA and/or cells is: &amp;nbsp;Professor Eric W.
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Triplett, Chair, Department of Microbiology and Cell Science,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Institute of Food and Agricultural Sciences, University of Florida,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; P.O. Box 110700, Gainesville, FL 32611-0700, &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=19696383&amp;i=1&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;ewt@...&lt;/a&gt;.
&lt;br&gt;&lt;br&gt;=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
&lt;br&gt;&lt;br&gt;PROJECT and DBLINK will co-exist for one GenBank release, until
&lt;br&gt;Release 171.0 (April 15, 2009), at which point the PROJECT line
&lt;br&gt;will be removed.
&lt;br&gt;&lt;br&gt;In its final state, our mock-up for CP000964 becomes:
&lt;br&gt;&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; CP000964 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 5641239 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; circular BCT 24-SEP-2008
&lt;br&gt;DEFINITION &amp;nbsp;Klebsiella pneumoniae 342, complete genome.
&lt;br&gt;ACCESSION &amp;nbsp; CP000964
&lt;br&gt;VERSION &amp;nbsp; &amp;nbsp; CP000964.1 &amp;nbsp;GI:206564770
&lt;br&gt;DBLINK &amp;nbsp; &amp;nbsp; &amp;nbsp;Project:28471
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Trace Assembly Archive:123456
&lt;br&gt;....
&lt;br&gt;COMMENT &amp;nbsp; &amp;nbsp; The source for the DNA and/or cells is: &amp;nbsp;Professor Eric W.
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Triplett, Chair, Department of Microbiology and Cell Science,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Institute of Food and Agricultural Sciences, University of Florida,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; P.O. Box 110700, Gainesville, FL 32611-0700, &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=19696383&amp;i=2&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;ewt@...&lt;/a&gt;.
&lt;br&gt;&lt;br&gt;In summary:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;PROJECT -&amp;gt; DBLINK
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;'GenomeProject' -&amp;gt; 'Project'
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;Additional linkages, such as Trace Assembly, added to DBLINK
&lt;br&gt;&amp;nbsp; &amp;nbsp;as-needed
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;The PROJECT line will be removed as of April 15 2009.
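&lt;br&gt;&lt;br&gt;For those who parse flatfile headers with their own scripts, here
&lt;br&gt;is a minimal sketch of a reader that tolerates both the PROJECT and
&lt;br&gt;DBLINK lines during the transition. It is not NCBI code:
&lt;br&gt;parse_project_links is a hypothetical helper, it assumes the usual
&lt;br&gt;12-column keyword field of the flatfile format, and it does not
&lt;br&gt;handle the interim COMMENT-based Trace Assembly Archive line.
&lt;pre&gt;
```python
def parse_project_links(header_text):
    """Collect links from PROJECT and DBLINK lines of a flatfile header.

    Returns a dict mapping link name to a list of ID strings. The old
    'GenomeProject' name is normalized to the new 'Project' name.
    Assumes keywords occupy the first 12 columns of each line.
    """
    links = {}
    in_dblink = False
    for line in header_text.splitlines():
        keyword = line[:12].strip()
        value = line[12:].strip()
        if keyword == "PROJECT":
            in_dblink = False
        elif keyword == "DBLINK":
            in_dblink = True
        elif keyword == "" and in_dblink and value:
            pass  # continuation line of a multi-line DBLINK block
        else:
            in_dblink = False
            continue
        name, sep, ident = value.partition(":")
        if not sep:
            continue
        if name == "GenomeProject":
            name = "Project"
        ids = links.setdefault(name, [])
        if ident not in ids:
            ids.append(ident)
    return links


# Header lines as they would appear while PROJECT and DBLINK co-exist.
header = (
    "VERSION     CP000964.1  GI:206564770\n"
    "PROJECT     GenomeProject:28471\n"
    "DBLINK      Project:28471\n"
    "            Trace Assembly Archive:123456\n"
)
print(parse_project_links(header))
```
&lt;/pre&gt;
&lt;br&gt;Normalizing 'GenomeProject' to 'Project' means the same lookup key
&lt;br&gt;works before, during, and after the co-existence period.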
&lt;br&gt;&lt;br&gt;=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
&lt;br&gt;&lt;br&gt;For those who process sequence data in NCBI's ASN.1 format:
&lt;br&gt;&lt;br&gt;The underlying representation for (Genome) Project IDs will remain
&lt;br&gt;unchanged; there will be no changes to the ASN.1 User-object that 
&lt;br&gt;is used to store them:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; user {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; type
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; str &amp;quot;GenomeProjectsDB&amp;quot; ,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; data {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; label
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; str &amp;quot;ProjectID&amp;quot; ,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; data
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; int 28471 } ,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; label
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; str &amp;quot;ParentID&amp;quot; ,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; data
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; int 0 } } } ,
&lt;br&gt;&lt;br&gt;However, to support linkages to other resources, like the Trace
&lt;br&gt;Assembly Archive, a new &amp;quot;DBLink&amp;quot; User-object will be introduced:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; user {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; type
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; str &amp;quot;DBLink&amp;quot; ,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; data {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; label
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; str &amp;quot;Trace Assembly Archive&amp;quot; ,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; data
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ints { 123456 } } } }
&lt;br&gt;&lt;br&gt;As new types of linkages are established, they will be added to
&lt;br&gt;the DBLink User-object, and displayed via the DBLINK linetype in
&lt;br&gt;the GenBank flatfile format. 
&lt;br&gt;&lt;br&gt;There is a possibility that the GenomeProjectsDB User-object
&lt;br&gt;might someday be incorporated into the new DBLink User-object.
&lt;br&gt;But at the moment, there are no firm plans to do so.
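&lt;br&gt;&lt;br&gt;As an illustration only, the labelled IDs in the two User-objects
&lt;br&gt;above could be pulled out of the ASN.1 text form with a regular
&lt;br&gt;expression. This is a rough sketch, not a real ASN.1 parser and not
&lt;br&gt;part of any NCBI toolkit; extract_user_object_ids is a hypothetical
&lt;br&gt;name, and the pattern assumes the exact print layout shown in this
&lt;br&gt;message.
&lt;pre&gt;
```python
import re

# Matches one labelled field in the text form shown in this post, e.g.
#   label str "ProjectID" , data int 28471
#   label str "Trace Assembly Archive" , data ints { 123456 }
FIELD = re.compile(
    r'label\s+str\s+"([^"]+)"\s*,\s*data\s+'
    r'(?:int\s+(\d+)|ints\s*\{([^}]*)\})'
)


def extract_user_object_ids(asn1_text):
    """Return a dict of label to list of integer IDs found in the text."""
    found = {}
    for label, single, many in FIELD.findall(asn1_text):
        if single:
            values = [int(single)]
        else:
            values = [int(v) for v in many.replace(",", " ").split()]
        found.setdefault(label, []).extend(values)
    return found


sample = '''
user {
  type
    str "DBLink" ,
  data {
    {
      label
        str "Trace Assembly Archive" ,
      data
        ints { 123456 } } } }
'''
print(extract_user_object_ids(sample))
```
&lt;/pre&gt;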
&lt;br&gt;&lt;br&gt;=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Re-Send-%3A-PROJECT-linetype-to-be-replaced-by-DBLINK-tp19696383p19696383.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19696046</id>
	<title>PROJECT linetype to be replaced by DBLINK</title>
	<published>2008-09-26T13:41:11Z</published>
	<updated>2008-09-26T13:41:11Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html">Greetings GenBank Users,
&lt;br&gt;&lt;br&gt;The PROJECT linetype allows a sequence record to be linked to
&lt;br&gt;information about the sequencing project that generated the data
&lt;br&gt;which ultimately resulted in the record's submission to the
&lt;br&gt;International Nucleotide Sequence Database ( INSD : &lt;a href=&quot;http://www.insdc.org&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.insdc.org&lt;/a&gt;&amp;nbsp;).
&lt;br&gt;&lt;br&gt;This complete bacterial GenBank record illustrates the use of PROJECT:
&lt;br&gt;&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; CP000964 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 5641239 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; circular BCT 24-SEP-2008
&lt;br&gt;DEFINITION &amp;nbsp;Klebsiella pneumoniae 342, complete genome.
&lt;br&gt;ACCESSION &amp;nbsp; CP000964
&lt;br&gt;VERSION &amp;nbsp; &amp;nbsp; CP000964.1 &amp;nbsp;GI:206564770
&lt;br&gt;PROJECT &amp;nbsp; &amp;nbsp; GenomeProject:28471
&lt;br&gt;&lt;br&gt;When viewed on the web in NCBI's Entrez:Nucleotide, the record's project
&lt;br&gt;identifier (28471) links to an entry in the Genome Project Database
&lt;br&gt;(GPDB) :
&lt;br&gt;&lt;br&gt;&amp;nbsp;
&lt;br&gt;&lt;a href=&quot;http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&amp;cmd=Retrieve&amp;dopt=Overview&amp;uid=28471&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&amp;cmd=Retrieve&amp;dopt=Overview&amp;uid=28471&lt;/a&gt;
&lt;br&gt;&lt;br&gt;where information about the sequencing center, the bacterium, and other
&lt;br&gt;GenBank records (eg, plasmids) associated with the sequencing project
&lt;br&gt;can be obtained.
&lt;br&gt;&lt;br&gt;Since the introduction of PROJECT, the scope of the &amp;quot;Genome&amp;quot; Project
&lt;br&gt;Database has expanded, to include projects that are not necessarily
&lt;br&gt;targeted to the sequencing of a complete genome.
&lt;br&gt;&lt;br&gt;In addition, there can be other resources which underlie an INSD
&lt;br&gt;sequence record, such as the Trace Assembly Archive at the NCBI:
&lt;br&gt;&lt;br&gt;&amp;nbsp;
&lt;br&gt;&lt;a href=&quot;http://www.ncbi.nlm.nih.gov/Traces/assembly/assmbrowser.cgi?cmd=show&amp;f=tree&amp;m=main&amp;s=tree&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.ncbi.nlm.nih.gov/Traces/assembly/assmbrowser.cgi?cmd=show&amp;f=tree&amp;m=main&amp;s=tree&lt;/a&gt;
&lt;br&gt;&lt;br&gt;Because of the expanded scope of the GPDB, and because we anticipate a
&lt;br&gt;need to link to more resources than just the GPDB, the PROJECT linetype is
&lt;br&gt;going to be replaced by a new linetype:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;DBLINK
&lt;br&gt;&lt;br&gt;Further details about this change, and its timetable, follow.
&lt;br&gt;&lt;br&gt;Mark Cavanaugh
&lt;br&gt;GenBank
&lt;br&gt;NCBI/NLM/NIH/HHS
&lt;br&gt;&lt;br&gt;=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
&lt;br&gt;&lt;br&gt;Modifications to linetypes can be disruptive, so the switch to DBLINK
&lt;br&gt;will occur in several stages.
&lt;br&gt;&lt;br&gt;Starting in October 2008, links to the NCBI Trace Assembly Archive will
&lt;br&gt;be supported via a line of text in the COMMENT section of sequence records.
&lt;br&gt;&lt;br&gt;Here is a mock-up, based on CP000964, which illustrates this change:
&lt;br&gt;&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; CP000964 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 5641239 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; circular BCT 24-SEP-2008
&lt;br&gt;DEFINITION &amp;nbsp;Klebsiella pneumoniae 342, complete genome.
&lt;br&gt;ACCESSION &amp;nbsp; CP000964
&lt;br&gt;VERSION &amp;nbsp; &amp;nbsp; CP000964.1 &amp;nbsp;GI:206564770
&lt;br&gt;PROJECT &amp;nbsp; &amp;nbsp; GenomeProject:28471
&lt;br&gt;....
&lt;br&gt;COMMENT &amp;nbsp; &amp;nbsp; Trace Assembly Archive:123456
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; The source for the DNA and/or cells is: &amp;nbsp;Professor Eric W.
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Triplett, Chair, Department of Microbiology and Cell Science,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Institute of Food and Agricultural Sciences, University of Florida,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; P.O. Box 110700, Gainesville, FL 32611-0700, &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=19696046&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;ewt@...&lt;/a&gt;.
&lt;br&gt;&lt;br&gt;Note: Use of the Trace Assembly Archive is still in its early stages, so
&lt;br&gt;only a few records are expected to have these links in the short term.
&lt;br&gt;&lt;br&gt;=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
&lt;br&gt;&lt;br&gt;The new DBLINK linetype will be introduced as of GenBank Release 170.0
&lt;br&gt;(February 15, 2009).
&lt;br&gt;&lt;br&gt;The Genome Project ID and the Trace Assembly Archive ID will be presented
&lt;br&gt;via DBLINK, and the existing PROJECT line will continue to be displayed:
&lt;br&gt;&lt;br&gt;&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; CP000964 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 5641239 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; circular BCT 24-SEP-2008
&lt;br&gt;DEFINITION &amp;nbsp;Klebsiella pneumoniae 342, complete genome.
&lt;br&gt;ACCESSION &amp;nbsp; CP000964
&lt;br&gt;VERSION &amp;nbsp; &amp;nbsp; CP000964.1 &amp;nbsp;GI:206564770
&lt;br&gt;PROJECT &amp;nbsp; &amp;nbsp; GenomeProject:28471
&lt;br&gt;DBLINK &amp;nbsp; &amp;nbsp; &amp;nbsp;Project:28471
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Trace Assembly Archive:123456
&lt;br&gt;....
&lt;br&gt;COMMENT &amp;nbsp; &amp;nbsp; The source for the DNA and/or cells is: &amp;nbsp;Professor Eric W.
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Triplett, Chair, Department of Microbiology and Cell Science,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Institute of Food and Agricultural Sciences, University of Florida,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; P.O. Box 110700, Gainesville, FL 32611-0700, &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=19696046&amp;i=1&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;ewt@...&lt;/a&gt;.
&lt;br&gt;&lt;br&gt;=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
&lt;br&gt;&lt;br&gt;PROJECT and DBLINK will co-exist for one GenBank release, until Release
&lt;br&gt;171.0 (April 15, 2009), at which point the PROJECT line will be removed.
&lt;br&gt;&lt;br&gt;In its final state, our mock-up for CP000964 becomes:
&lt;br&gt;&lt;br&gt;LOCUS &amp;nbsp; &amp;nbsp; &amp;nbsp; CP000964 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 5641239 bp &amp;nbsp; &amp;nbsp;DNA &amp;nbsp; &amp;nbsp; circular BCT 24-SEP-2008
&lt;br&gt;DEFINITION &amp;nbsp;Klebsiella pneumoniae 342, complete genome.
&lt;br&gt;ACCESSION &amp;nbsp; CP000964
&lt;br&gt;VERSION &amp;nbsp; &amp;nbsp; CP000964.1 &amp;nbsp;GI:206564770
&lt;br&gt;DBLINK &amp;nbsp; &amp;nbsp; &amp;nbsp;Project:28471
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Trace Assembly Archive:123456
&lt;br&gt;....
&lt;br&gt;COMMENT &amp;nbsp; &amp;nbsp; The source for the DNA and/or cells is: &amp;nbsp;Professor Eric W.
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Triplett, Chair, Department of Microbiology and Cell Science,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Institute of Food and Agricultural Sciences, University of Florida,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; P.O. Box 110700, Gainesville, FL 32611-0700, &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=19696046&amp;i=2&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;ewt@...&lt;/a&gt;.
&lt;br&gt;&lt;br&gt;In summary:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;PROJECT -&amp;gt; DBLINK
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;'GenomeProject' -&amp;gt; 'Project'
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;Additional linkages, such as Trace Assembly, will be added to
&lt;br&gt;&amp;nbsp; &amp;nbsp;DBLINK as-needed
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;The PROJECT line will be removed as of April 15 2009.
&lt;br&gt;&lt;br&gt;=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
&lt;br&gt;&lt;br&gt;For those who process sequence data in NCBI's ASN.1 format:
&lt;br&gt;&lt;br&gt;The underlying representation for (Genome) Project IDs will remain
&lt;br&gt;unchanged; there will be no changes to the ASN.1 User-object that 
&lt;br&gt;is used to store them:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; user {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; type
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; str &amp;quot;GenomeProjectsDB&amp;quot; ,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; data {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; label
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; str &amp;quot;ProjectID&amp;quot; ,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; data
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; int 28471 } ,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; label
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; str &amp;quot;ParentID&amp;quot; ,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; data
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; int 0 } } } ,
&lt;br&gt;&lt;br&gt;However, to support linkages to other resources, like the Trace
&lt;br&gt;Assembly Archive, a new &amp;quot;DBLink&amp;quot; User-object will be introduced:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; user {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; type
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; str &amp;quot;DBLink&amp;quot; ,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; data {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; label
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; str &amp;quot;Trace Assembly Archive&amp;quot; ,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; data
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ints { 123456 } } } }
&lt;br&gt;&lt;br&gt;As new types of linkages are established, they will be added to
&lt;br&gt;the DBLink User-object, and displayed via the DBLINK linetype in
&lt;br&gt;the GenBank flatfile format. 
&lt;br&gt;&lt;br&gt;There is a possibility that the GenomeProjectsDB User-object
&lt;br&gt;might someday be incorporated into the new DBLink User-object.
&lt;br&gt;But at the moment, there are no firm plans to do so.
&lt;br&gt;&lt;br&gt;=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/PROJECT-linetype-to-be-replaced-by-DBLINK-tp19696046p19696046.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19458434</id>
	<title>GenBank Update Problem : 0912 : Incorrect files between 04:30am and 10:08am</title>
	<published>2008-09-12T08:35:21Z</published>
	<updated>2008-09-12T08:35:21Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html">Dear GenBank Users,
&lt;br&gt;&lt;br&gt;Processing for the GenBank Incremental Update (GIU) and for GenBank WGS
&lt;br&gt;data products was moved to new hardware on Thursday, September 11. 
&lt;br&gt;&lt;br&gt;Unfortunately, some configuration files that were used during previous
&lt;br&gt;tests of the new hardware were *not* updated with the files from the
&lt;br&gt;production system.
&lt;br&gt;&lt;br&gt;This led to the creation of unnecessarily large GIU files on September 12
&lt;br&gt;(nc0912), containing records that date back to (at least) August 10th.
&lt;br&gt;&lt;br&gt;The affected 0912 GIU files had these timestamps and sizes:
&lt;br&gt;&lt;br&gt;-rw-r--r-- &amp;nbsp; 1 gbupdate gbproces 30892117 Sep 12 04:44 con_nc.0912.flat.gz
&lt;br&gt;-rw-r--r-- &amp;nbsp; 1 gbupdate gbproces 533951491 Sep 12 04:31 nc0912.flat.gz
&lt;br&gt;-rw-r--r-- &amp;nbsp; 1 gbupdate gbproces 317692100 Sep 12 04:37 nc0912.fsa_nt.gz
&lt;br&gt;-rw-r--r-- &amp;nbsp; 1 gbupdate gbproces 34925179 Sep 12 04:10 nc0912.fsa.gz
&lt;br&gt;-rw-r--r-- &amp;nbsp; 1 gbupdate gbproces 60205075 Sep 12 04:10 nc0912.gnp.gz
&lt;br&gt;-rw-r--r-- &amp;nbsp; 1 gbupdate gbproces 92824768 Sep 12 04:14 nc0912.qscore.gz
&lt;br&gt;&lt;br&gt;-rw-r--r-- &amp;nbsp; 1 gbupdate gbproces 33507308 Sep 12 04:44 con_nc.0912.aso.gz
&lt;br&gt;-rw-r--r-- &amp;nbsp; 1 gbupdate gbproces 423510078 Sep 12 04:16 nc0912.aso.gz
&lt;br&gt;&lt;br&gt;Note that the uncompressed size of nc0912.flat.gz is over 2.5 GB :
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;compressed &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;uncompressed &amp;nbsp;ratio uncompressed_name
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 533951491 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;2652522424 &amp;nbsp;79.9% nc0912.flat
&lt;br&gt;&lt;br&gt;=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
&lt;br&gt;&lt;br&gt;This problem was discovered on the morning of September 12. The incorrect
&lt;br&gt;GIU files were removed, a new GIU run was started, and this yielded
&lt;br&gt;corrected 0912 update products at about 10:00am :
&lt;br&gt;&lt;br&gt;-rw-r--r-- &amp;nbsp; 1 gbupdate gbproces 55850950 Sep 12 10:08 nc0912.flat.gz
&lt;br&gt;-rw-r--r-- &amp;nbsp; 1 gbupdate gbproces 36800819 Sep 12 10:08 nc0912.fsa_nt.gz
&lt;br&gt;-rw-r--r-- &amp;nbsp; 1 gbupdate gbproces 1919440 Sep 12 10:08 nc0912.fsa.gz
&lt;br&gt;-rw-r--r-- &amp;nbsp; 1 gbupdate gbproces 3169378 Sep 12 10:08 nc0912.gnp.gz
&lt;br&gt;-rw-r--r-- &amp;nbsp; 1 gbupdate gbproces 2254192 Sep 12 10:08 nc0912.qscore.gz
&lt;br&gt;&lt;br&gt;-rw-r--r-- &amp;nbsp; 1 gbupdate gbproces 38159306 Sep 12 10:08 nc0912.aso.gz
&lt;br&gt;&lt;br&gt;Note that the uncompressed size of the corrected nc0912.flat.gz GIU
&lt;br&gt;is only a tenth of the size of the incorrect version:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;compressed &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;uncompressed &amp;nbsp;ratio uncompressed_name
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;55850950 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 252578593 &amp;nbsp;77.9% nc0912.flat
&lt;br&gt;&lt;br&gt;Note also that there are no CON-division GIU products for 0912 .
&lt;br&gt;&lt;br&gt;=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
&lt;br&gt;&lt;br&gt;The invalid 0912 GIU products were available via FTP for approximately
&lt;br&gt;six hours. If you transferred them between 4:00am ET and 10:08am ET,
&lt;br&gt;please check their sizes to see if you need to obtain new, corrected,
&lt;br&gt;smaller versions of the files.
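&lt;br&gt;&lt;br&gt;As an illustration only (this is not an NCBI tool), the size check could be
&lt;br&gt;sketched in Python as follows. The byte counts are the ones listed above for
&lt;br&gt;the incorrect files; the function name and the directory layout are assumptions.

```python
import os

# Compressed sizes (bytes) of the incorrect 0912 files, copied from the listing above.
INCORRECT_SIZES = {
    "con_nc.0912.flat.gz": 30892117,
    "nc0912.flat.gz": 533951491,
    "nc0912.fsa_nt.gz": 317692100,
    "nc0912.fsa.gz": 34925179,
    "nc0912.gnp.gz": 60205075,
    "nc0912.qscore.gz": 92824768,
    "con_nc.0912.aso.gz": 33507308,
    "nc0912.aso.gz": 423510078,
}


def files_to_refetch(directory="."):
    """Return names of local files whose size equals that of an incorrect 0912 file."""
    stale = []
    for name, bad_size in sorted(INCORRECT_SIZES.items()):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getsize(path) == bad_size:
            stale.append(name)
    return stale


if __name__ == "__main__":
    for name in files_to_refetch():
        print(name, "matches the size of the incorrect version")
```

&lt;br&gt;Any con_nc.0912 file reported by this sketch has no corrected counterpart,
&lt;br&gt;since no CON-division GIU products exist for 0912.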
&lt;br&gt;&lt;br&gt;=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
&lt;br&gt;&lt;br&gt;Fortunately, the effect on our WGS project files was minimal :
&lt;br&gt;the data files for a single project, CABB, were unnecessarily refreshed.
&lt;br&gt;&lt;br&gt;=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
&lt;br&gt;&lt;br&gt;Our apologies for the inconvenience that this error has caused.
&lt;br&gt;&lt;br&gt;Mark Cavanaugh
&lt;br&gt;GenBank
&lt;br&gt;NCBI/NLM/NIH/HHS
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/GenBank-Update-Problem-%3A-0912-%3A-Incorrect-files-between-04%3A30am-and-10%3A08am-tp19458434p19458434.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19116947</id>
	<title>GenBank Release 167.0 Now Available</title>
	<published>2008-08-22T16:12:51Z</published>
	<updated>2008-08-22T16:12:51Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html">[Apologies, the initial announcement was interpreted as an attachment.
&lt;br&gt;&amp;nbsp;Re-sending because the listserv might not handle it gracefully.]
&lt;br&gt;&lt;br&gt;Greetings GenBank Users,
&lt;br&gt;&lt;br&gt;&amp;nbsp; GenBank Release 167.0 is now available via FTP from the National
&lt;br&gt;Center for Biotechnology Information (NCBI):
&lt;br&gt;&lt;br&gt;&amp;nbsp; Ftp Site &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Directory &amp;nbsp; Contents
&lt;br&gt;&amp;nbsp; ---------------- &amp;nbsp; --------- &amp;nbsp; ---------------------------------------
&lt;br&gt;&amp;nbsp; ftp.ncbi.nih.gov &amp;nbsp; genbank &amp;nbsp; &amp;nbsp; GenBank Release 167.0 flatfiles
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;ncbi-asn1 &amp;nbsp; ASN.1 data used to create Release 167.0
&lt;br&gt;&lt;br&gt;&amp;nbsp; Close-of-data for GenBank 167.0 occurred on 08/19/2008. Uncompressed, the
&lt;br&gt;Release 167.0 flatfiles require roughly 357 GB (sequence files only)
&lt;br&gt;or 381 GB (including the 'short directory', 'index' and the *.txt files). 
&lt;br&gt;The ASN.1 data require approximately 326 GB.
&lt;br&gt;&lt;br&gt;Recent statistics for non-WGS, non-CON sequences:
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release &amp;nbsp;Date &amp;nbsp; &amp;nbsp; &amp;nbsp; Base Pairs &amp;nbsp; Entries
&lt;br&gt;&lt;br&gt;&amp;nbsp; 166 &amp;nbsp; &amp;nbsp; &amp;nbsp;Jun 2008 &amp;nbsp; 92008611867 &amp;nbsp;88554578
&lt;br&gt;&amp;nbsp; 167 &amp;nbsp; &amp;nbsp; &amp;nbsp;Aug 2008 &amp;nbsp; 95033791652 &amp;nbsp;92748599
&lt;br&gt;&lt;br&gt;Recent statistics for WGS sequences:
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release &amp;nbsp;Date &amp;nbsp; &amp;nbsp; &amp;nbsp; Base Pairs &amp;nbsp; Entries
&lt;br&gt;&lt;br&gt;&amp;nbsp; 166 &amp;nbsp; &amp;nbsp; &amp;nbsp;Jun 2008 &amp;nbsp;113639291344 &amp;nbsp;39163548
&lt;br&gt;&amp;nbsp; 167 &amp;nbsp; &amp;nbsp; &amp;nbsp;Aug 2008 &amp;nbsp;118593509342 &amp;nbsp;40214247
&lt;br&gt;&lt;br&gt;&amp;nbsp; During the 69 days between the close dates for GenBank Releases 166.0 and
&lt;br&gt;167.0, the non-WGS/non-CON portion of GenBank grew by 3,025,179,785 basepairs
&lt;br&gt;and by 4,194,021 sequence records. During that same period, 939,305 records
&lt;br&gt;were updated. An average of about 74,396 non-WGS/non-CON records were added
&lt;br&gt;and/or updated per day.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Between releases 166.0 and 167.0, the WGS component of GenBank grew by
&lt;br&gt;4,954,217,998 basepairs and by 1,050,699 sequence records.
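&lt;br&gt;&lt;br&gt;&amp;nbsp; For readers who wish to reproduce the growth figures, here is a small Python
&lt;br&gt;sketch of the arithmetic. The totals are copied from the statistics tables above;
&lt;br&gt;the variable names are illustrative only.

```python
# Totals copied from the statistics tables above (non-WGS, non-CON).
BASES_166, ENTRIES_166 = 92008611867, 88554578
BASES_167, ENTRIES_167 = 95033791652, 92748599
# WGS totals from the second table.
WGS_BASES_166, WGS_ENTRIES_166 = 113639291344, 39163548
WGS_BASES_167, WGS_ENTRIES_167 = 118593509342, 40214247

DAYS_BETWEEN_CLOSES = 69
RECORDS_UPDATED = 939305

added_bases = BASES_167 - BASES_166
added_records = ENTRIES_167 - ENTRIES_166
# Records added and/or updated per day, as quoted in the announcement.
per_day = (added_records + RECORDS_UPDATED) / DAYS_BETWEEN_CLOSES

wgs_added_bases = WGS_BASES_167 - WGS_BASES_166
wgs_added_records = WGS_ENTRIES_167 - WGS_ENTRIES_166

print(added_bases, added_records, round(per_day))
print(wgs_added_bases, wgs_added_records)
```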
&lt;br&gt;&lt;br&gt;&amp;nbsp; For additional release information, see the README files in either of
&lt;br&gt;the directories mentioned above, and the release notes (gbrel.txt) in
&lt;br&gt;the genbank directory. Sections 1.3 and 1.4 of the release notes
&lt;br&gt;(Changes in Release 167.0 and Upcoming Changes) have been appended
&lt;br&gt;below for your convenience.
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ** Important Notes **
&lt;br&gt;&lt;br&gt;* &amp;nbsp;Support for the single, comprehensive protein FASTA file which accompanies
&lt;br&gt;&amp;nbsp; &amp;nbsp;GenBank releases ceased as of the June 2008 release. See Section 1.3.3 of
&lt;br&gt;&amp;nbsp; &amp;nbsp;the release notes for details.
&lt;br&gt;&lt;br&gt;* &amp;nbsp;A number of changes are expected for October's GenBank Release 168.0 .
&lt;br&gt;&amp;nbsp; &amp;nbsp;Please see Section 1.4 for a complete list.
&lt;br&gt;&lt;br&gt;* &amp;nbsp;GenBank 'index' files are now provided without any EST content, and without
&lt;br&gt;&amp;nbsp; &amp;nbsp;most GSS content. See Section 1.3.4 of the release notes for further details.
&lt;br&gt;&amp;nbsp; &amp;nbsp;NCBI is considering ceasing support for the index files, so we encourage
&lt;br&gt;&amp;nbsp; &amp;nbsp;affected users to review that section and provide feedback.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release 167.0 data, and subsequent updates, are available now via
&lt;br&gt;NCBI's Entrez and Blast services.
&lt;br&gt;&lt;br&gt;&amp;nbsp; As a general guideline, we suggest first transferring the GenBank release
&lt;br&gt;notes (gbrel.txt) whenever a release is being obtained. Check to make sure
&lt;br&gt;that the date and release number in the header of the release notes are
&lt;br&gt;current (eg: August 15 2008, 167.0). If they are not, interrupt the
&lt;br&gt;remaining transfers and then request assistance from the NCBI Service Desk.
&lt;br&gt;&lt;br&gt;&amp;nbsp; A comprehensive check of the headers of all release files after your
&lt;br&gt;transfers are complete is also suggested. Here's how one might go about
&lt;br&gt;this on a unix platform, using csh/tcsh :
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; set files = `ls gb*.*`
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; foreach i ($files)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; head -10 $i | grep Release
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; end
&lt;br&gt;&lt;br&gt;Or, if the files are compressed, perhaps:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; gzcat $i | head -10 | grep Release
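&lt;br&gt;&lt;br&gt;For sites without csh/tcsh, roughly the same header check might be written in
&lt;br&gt;Python. This is only a sketch: the function name is hypothetical, and it assumes
&lt;br&gt;the release files are in the current directory, either plain or gzip-compressed.

```python
import glob
import gzip
import itertools


def release_lines(path, max_lines=10):
    """Return the lines mentioning 'Release' among the first max_lines of a file."""
    # Compressed files are read through gzip; everything else as plain text.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="ascii", errors="replace") as handle:
        head = itertools.islice(handle, max_lines)
        return [line.rstrip("\n") for line in head if "Release" in line]


if __name__ == "__main__":
    for path in sorted(glob.glob("gb*.*")):
        print(path, release_lines(path))
```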
&lt;br&gt;&lt;br&gt;&amp;nbsp; If you encounter problems while ftp'ing or uncompressing Release
&lt;br&gt;167.0, please send email outlining your difficulties to:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=19116947&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;info@...&lt;/a&gt;
&lt;br&gt;&lt;br&gt;Mark Cavanaugh, Vladimir Alekseyev, Michael Kimelman
&lt;br&gt;GenBank
&lt;br&gt;NCBI/NLM/NIH/HHS
&lt;br&gt;&lt;br&gt;&lt;br&gt;1.3 Important Changes in Release 167.0
&lt;br&gt;&lt;br&gt;1.3.1 Announcements for upcoming changes absent from Release 166.0
&lt;br&gt;&lt;br&gt;&amp;nbsp; An oversight during GenBank Release 166.0 processing resulted in the
&lt;br&gt;exclusion of several announcements for changes that will be implemented
&lt;br&gt;as of GenBank Release 168.0 in October 2008 (see section 1.4, below).
&lt;br&gt;&lt;br&gt;&amp;nbsp; This means that only two months advance notice can be provided for those
&lt;br&gt;changes, rather than the customary four months. Our apologies for this
&lt;br&gt;oversight and any inconvenience that it may cause.
&lt;br&gt;&lt;br&gt;1.3.2 Organizational changes
&lt;br&gt;&lt;br&gt;&amp;nbsp; The total number of sequence data files increased by 58 with this release:
&lt;br&gt;&lt;br&gt;&amp;nbsp; - the BCT division is now comprised of &amp;nbsp;30 files (+1)
&lt;br&gt;&amp;nbsp; - the CON division is now comprised of &amp;nbsp;97 files (+6)
&lt;br&gt;&amp;nbsp; - the ENV division is now comprised of &amp;nbsp;10 files (+1)
&lt;br&gt;&amp;nbsp; - the EST division is now comprised of 762 files (+24)
&lt;br&gt;&amp;nbsp; - the GSS division is now comprised of 306 files (+16)
&lt;br&gt;&amp;nbsp; - the HTG division is now comprised of 120 files (+3)
&lt;br&gt;&amp;nbsp; - the INV division is now comprised of &amp;nbsp;13 files (+1)
&lt;br&gt;&amp;nbsp; - the PAT division is now comprised of &amp;nbsp;46 files (+4)
&lt;br&gt;&amp;nbsp; - the PLN division is now comprised of &amp;nbsp;30 files (+1)
&lt;br&gt;&amp;nbsp; - the VRL division is now comprised of &amp;nbsp;10 files (+1)
&lt;br&gt;&amp;nbsp; - the VRT division is now comprised of &amp;nbsp;16 files (+1)
&lt;br&gt;&lt;br&gt;&amp;nbsp; The total number of index files increased by 5 with this release:
&lt;br&gt;&lt;br&gt;&amp;nbsp; - the AUT index is now comprised of &amp;nbsp;59 files &amp;nbsp;(+5)
&lt;br&gt;&lt;br&gt;1.3.3 Comprehensive protein FASTA file has been discontinued
&lt;br&gt;&lt;br&gt;&amp;nbsp; 'Divisional' protein FASTA files :
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ftp://ftp.ncbi.nih.gov/ncbi-asn1/gbXXX.fsa_aa.gz
&lt;br&gt;&lt;br&gt;where 'XXX' represents an alphanumeric GenBank division code (such as
&lt;br&gt;pri10) were made available starting with GenBank Release 164.0 . Given
&lt;br&gt;their availability, support for the single, comprehensive protein FASTA
&lt;br&gt;file:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ftp://ftp.ncbi.nih.gov/genbank/relNNN.fsa_aa.gz
&lt;br&gt;&lt;br&gt;(where 'NNN' represents a three-digit GenBank release number) was
&lt;br&gt;discontinued as of GenBank Release 166.0 in June 2008. The size of the
&lt;br&gt;comprehensive file had exceeded 4GB, which was unmanageable for many users.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The final comprehensive protein FASTA file was: rel166.fsa_aa .
&lt;br&gt;&lt;br&gt;1.3.4 Changes in the content of index files
&lt;br&gt;&lt;br&gt;&amp;nbsp; As described in the GB 153 release notes, the 'index' files which accompany
&lt;br&gt;GenBank releases (see Section 3.3) are considered to be a legacy data product by
&lt;br&gt;NCBI, generated mostly for historical reasons. FTP statistics of January 2005
&lt;br&gt;seem to support this: the index files were transferred only half as frequently as
&lt;br&gt;the files of sequence records. The inherent inefficiencies of the index file
&lt;br&gt;format also lead us to suspect that they have little serious use by the user
&lt;br&gt;community, particularly for EST and GSS records.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The software that generated the index file products received little
&lt;br&gt;attention over the years, and finally reached its limitations in
&lt;br&gt;February 2006 (Release 152.0). The required multi-server queries which
&lt;br&gt;obtained and sorted many millions of rows of terms from several different
&lt;br&gt;databases simply outgrew the capacity of the hardware used for GenBank
&lt;br&gt;Release generation.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Our short-term solution is to cease generating some index-file content
&lt;br&gt;for all EST sequence records, and for GSS sequence records that originate
&lt;br&gt;via direct submission to NCBI.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The three gbacc*.idx index files continue to reflect the entirety of the
&lt;br&gt;release, including all EST and GSS records; however, the file contents are
&lt;br&gt;unsorted.
&lt;br&gt;&lt;br&gt;&amp;nbsp; These 'solutions' are really just stop-gaps, and we will likely pursue
&lt;br&gt;one of two options:
&lt;br&gt;&lt;br&gt;a) Cease support of the 'index' file products altogether.
&lt;br&gt;&lt;br&gt;b) Provide new products that present some of the most useful data from
&lt;br&gt;&amp;nbsp; &amp;nbsp;the legacy 'index' files, and cease support for other types of index data.
&lt;br&gt;&lt;br&gt;&amp;nbsp; If you are a user of the 'index' files associated with GenBank releases, we
&lt;br&gt;encourage you to make your wishes known, either via the GenBank newsgroup,
&lt;br&gt;or via email to NCBI's Service Desk:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=19116947&amp;i=1&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;info@...&lt;/a&gt;
&lt;br&gt;&lt;br&gt;&amp;nbsp; Our apologies for any inconvenience that these changes may cause.
&lt;br&gt;&lt;br&gt;1.3.5 GSS File Header Problem
&lt;br&gt;&lt;br&gt;&amp;nbsp; GSS sequences at GenBank are maintained in two different systems, depending
&lt;br&gt;on their origin, and the dumps from those systems occur in parallel. Because
&lt;br&gt;the second dump (for example) has no prior knowledge of exactly how many GSS
&lt;br&gt;files will be dumped by the first, it does not know how to number its own
&lt;br&gt;output files.
&lt;br&gt;&lt;br&gt;&amp;nbsp; There is thus a discrepancy between the filenames and file headers for
&lt;br&gt;fifty-nine of the GSS flatfiles in Release 167.0. Consider gbgss248.seq :
&lt;br&gt;&lt;br&gt;GBGSS1.SEQ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Genetic Sequence Data Bank
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;August 15 2008
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; NCBI-GenBank Flat File Release 167.0
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;GSS Sequences (Part 1)
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;87184 loci, &amp;nbsp; &amp;nbsp;64446495 bases, from &amp;nbsp; &amp;nbsp;87184 reported sequences
&lt;br&gt;&lt;br&gt;&amp;nbsp; Here, the filename and the part number in the header both show &amp;quot;1&amp;quot;, though the
&lt;br&gt;file has been renamed as &amp;quot;248&amp;quot; based on the number of files dumped from the other
&lt;br&gt;system. &amp;nbsp;We will work to resolve this discrepancy in future releases, but the
&lt;br&gt;priority is certainly much lower than many other tasks.
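&lt;br&gt;&lt;br&gt;&amp;nbsp; Users who want to find the affected files locally could compare the name on
&lt;br&gt;the first header line with the actual file name. A minimal Python sketch, assuming
&lt;br&gt;uncompressed flatfiles whose first line begins with the name, as in the example
&lt;br&gt;above (the function name is hypothetical):

```python
import os


def header_matches_filename(path):
    """Compare the name on the first header line with the actual file name."""
    with open(path, "rt", encoding="ascii", errors="replace") as handle:
        first_line = handle.readline()
    fields = first_line.split()
    # The header carries the name in upper case, e.g. GBGSS1.SEQ.
    header_name = fields[0].lower() if fields else ""
    return header_name == os.path.basename(path).lower()
```

&lt;br&gt;For a file named gbgss248.seq whose header begins with GBGSS1.SEQ, this
&lt;br&gt;returns False.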
&lt;br&gt;&lt;br&gt;1.4 Upcoming Changes
&lt;br&gt;&lt;br&gt;1.4.1 Changes related to ncRNA features, /ncRNA_class, and /mol_type
&lt;br&gt;&lt;br&gt;&amp;nbsp; The list of allowed values for the /ncRNA_class qualifier, which is
&lt;br&gt;mandatory for all ncRNA features, will be expanded to include:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /ncRNA_class=&amp;quot;ribozyme&amp;quot;
&lt;br&gt;&lt;br&gt;Non-coding RNAs which are not yet in the INSDC's controlled vocabulary:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;a href=&quot;http://www.insdc.org/page.php?page=rna_vocab&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.insdc.org/page.php?page=rna_vocab&lt;/a&gt;&lt;br&gt;&lt;br&gt;previously required /ncRNA_class=&amp;quot;other&amp;quot; plus an accompanying /note
&lt;br&gt;qualifier to describe the nature of the ncRNA. This requirement will
&lt;br&gt;be changed, such that *either* a /product or a /note qualifier must
&lt;br&gt;accompany &amp;quot;other&amp;quot; ncRNA features.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The list of allowed /mol_type qualifiers for the source feature
&lt;br&gt;currently includes:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /mol_type=&amp;quot;snoRNA&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /mol_type=&amp;quot;snRNA&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /mol_type=&amp;quot;scRNA&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /mol_type=&amp;quot;tmRNA&amp;quot;
&lt;br&gt;&lt;br&gt;All of these molecule types will be collapsed into a single value:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /mol_type=&amp;quot;transcribed RNA&amp;quot;
&lt;br&gt;&lt;br&gt;Sequence records which represent one of these four types of molecules
&lt;br&gt;will thus have:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; an ncRNA feature with /ncRNA_class of &amp;quot;snoRNA&amp;quot;, &amp;quot;scRNA&amp;quot; or &amp;quot;snRNA&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; a source feature with /mol_type of &amp;quot;transcribed RNA&amp;quot;
&lt;br&gt;or
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; a tmRNA feature
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; a source feature with /mol_type of &amp;quot;transcribed RNA&amp;quot;
&lt;br&gt;&lt;br&gt;All of these changes will take effect as of Release 168.0 in October 2008.
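&lt;br&gt;&lt;br&gt;Software that stores /mol_type values could apply the collapse with a simple
&lt;br&gt;mapping. The Python sketch below is illustrative only; the names are hypothetical,
&lt;br&gt;and the four old values are those listed above.

```python
# Source-feature /mol_type values that are to be collapsed, per the text above.
COLLAPSED_MOL_TYPES = {"snoRNA", "snRNA", "scRNA", "tmRNA"}
NEW_MOL_TYPE = "transcribed RNA"


def updated_mol_type(mol_type):
    """Return the post-change /mol_type value for a source feature."""
    return NEW_MOL_TYPE if mol_type in COLLAPSED_MOL_TYPES else mol_type
```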
&lt;br&gt;&lt;br&gt;1.4.2 Merging the satellite and repeat_unit features into repeat_region
&lt;br&gt;&lt;br&gt;&amp;nbsp; Satellites, minisatellites and microsatellites are comprised of repetitive
&lt;br&gt;units of DNA, with a variety of lengths and repeat patterns. With the 
&lt;br&gt;addition of a new qualifier (/satellite), the satellite and repeat_unit
&lt;br&gt;features can be represented by the repeat_region feature.
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;Qualifier &amp;nbsp; &amp;nbsp; &amp;nbsp; /satellite=
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;Definition &amp;nbsp; &amp;nbsp; &amp;nbsp;identifier for satellite DNA marker; many tandem repeats 
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;(identical or related) of a short basic repeating unit; many 
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;have a base composition or other property different from the 
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;genome average that allows them to be separated from the bulk 
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;genomic DNA;
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;Value format &amp;nbsp; &amp;nbsp;&amp;quot;&amp;lt;satellite_type&amp;gt;[:&amp;lt;class&amp;gt;][ &amp;lt;identifier&amp;gt;]&amp;quot; 
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;where satellite_type is one of the following 
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;quot;satellite&amp;quot;, &amp;quot;microsatellite&amp;quot;, &amp;quot;minisatellite&amp;quot;
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;Example &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /satellite=&amp;quot;satellite: S1a&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;/satellite=&amp;quot;satellite: alpha&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;/satellite=&amp;quot;satellite: gamma III&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;/satellite=&amp;quot;microsatellite: DC130&amp;quot;
&lt;br&gt;&lt;br&gt;&amp;nbsp; As of the October 2008 GenBank release, all satellite features will be 
&lt;br&gt;transformed into repeat_region features with /satellite qualifiers of
&lt;br&gt;type &amp;quot;satellite&amp;quot;, and all repeat_unit features will be transformed into
&lt;br&gt;repeat_region features.
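&lt;br&gt;&lt;br&gt;&amp;nbsp; A parser for the new value might start by separating the satellite type from
&lt;br&gt;whatever follows it. The Python sketch below is an assumption-laden illustration:
&lt;br&gt;the function name is hypothetical, and the class and identifier are left unsplit.

```python
SATELLITE_TYPES = ("satellite", "microsatellite", "minisatellite")


def split_satellite(value):
    """Split a /satellite value into (satellite_type, remainder).

    The remainder holds whatever follows the type (class and/or identifier);
    it is returned unparsed because the examples above do not show how the
    two are told apart.
    """
    head, _, rest = value.partition(":")
    satellite_type = head.split(" ", 1)[0]
    if satellite_type not in SATELLITE_TYPES:
        raise ValueError("unknown satellite type: " + repr(value))
    if not rest:
        # No colon present: anything after the type is treated as the remainder.
        rest = head[len(satellite_type):]
    return satellite_type, rest.strip()
```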
&lt;br&gt;&amp;nbsp; 
&lt;br&gt;1.4.3 New /gene_synonym qualifier
&lt;br&gt;&lt;br&gt;&amp;nbsp; Gene symbols are presented via the /gene qualifier. When synonymous or 
&lt;br&gt;alternative gene symbols are available, they have often been presented via
&lt;br&gt;multiple /gene qualifiers.
&lt;br&gt;&lt;br&gt;&amp;nbsp; To distinguish what might be an approved or official gene symbol from its
&lt;br&gt;synonyms or alternatives, a new /gene_synonym qualifier will be introduced
&lt;br&gt;for GenBank Release 168.0 .
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;Qualifier &amp;nbsp; &amp;nbsp; &amp;nbsp; /gene_synonym=
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;Definition &amp;nbsp; &amp;nbsp; &amp;nbsp;synonymous or alternative symbol for a gene corresponding to
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;a sequence region
&lt;br&gt;&amp;nbsp; &amp;nbsp;Value format &amp;nbsp; &amp;nbsp;&amp;quot;text&amp;quot;
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;Examples &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;/gene=&amp;quot;CF&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;/gene=&amp;quot;ABCC7&amp;quot;
&lt;br&gt;&lt;br&gt;1.4.4 New /mating_type qualifier
&lt;br&gt;&lt;br&gt;&amp;nbsp; Because the /sex qualifier has a free-text value format, it has been
&lt;br&gt;inappropriately utilized for certain organisms, such as bacteria, fungi,
&lt;br&gt;and some insects and worms. In such cases, a more appropriate term would
&lt;br&gt;be 'mating type'.
&lt;br&gt;&lt;br&gt;In October 2008, a new qualifier will be made available for non-sexual
&lt;br&gt;reproductive strategies:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;Qualifier &amp;nbsp; &amp;nbsp; &amp;nbsp; /mating_type=
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;Definition &amp;nbsp; &amp;nbsp; &amp;nbsp;mating type of the organism from which the sequence was obtained
&lt;br&gt;&amp;nbsp; &amp;nbsp;Value format &amp;nbsp; &amp;nbsp;&amp;quot;text&amp;quot;
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;Example &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /mating_type=&amp;quot;Mating type A&amp;quot;
&lt;br&gt;&lt;br&gt;1.4.5 Renaming of /specific_host as /host
&lt;br&gt;&lt;br&gt;&amp;nbsp; The definition of /specific_host in the Feature Table document is as follows:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;Qualifier &amp;nbsp; &amp;nbsp; &amp;nbsp; /specific_host=
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;Definition &amp;nbsp; &amp;nbsp; &amp;nbsp;natural host from which the sequence was obtained
&lt;br&gt;&amp;nbsp; &amp;nbsp;Value format &amp;nbsp; &amp;nbsp;&amp;quot;text&amp;quot;
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;Example &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /specific_host=&amp;quot;Rhizobium NGR234&amp;quot;
&lt;br&gt;&lt;br&gt;&amp;nbsp; The usage of /specific_host, and particularly the distinction between it and
&lt;br&gt;/lab_host, is not made clear with this definition. So the qualifier is going
&lt;br&gt;to be renamed and redefined, starting with Release 168.0 in October 2008 :
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;Qualifier &amp;nbsp; &amp;nbsp; &amp;nbsp; /host=
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;Definition &amp;nbsp; &amp;nbsp; &amp;nbsp;Natural (as opposed to laboratory) host to the organism from
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;which the sequenced molecule was obtained
&lt;br&gt;&amp;nbsp; &amp;nbsp;Value format &amp;nbsp; &amp;nbsp;&amp;quot;text&amp;quot;
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;Example &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /host=&amp;quot;Homo sapiens&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;/host=&amp;quot;Homo sapiens 12 year old girl&amp;quot;
&lt;br&gt;&lt;br&gt;1.4.6 New value for /organelle
&lt;br&gt;&lt;br&gt;&amp;nbsp; In October 2008, the list of allowed values for /organelle will be expanded
&lt;br&gt;to include:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /organelle=&amp;quot;chromatophore&amp;quot;
&lt;br&gt;&lt;br&gt;1.4.7 Modification to value format for /frequency
&lt;br&gt;&lt;br&gt;&amp;nbsp; The current definition of /frequency is :
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;Qualifier &amp;nbsp; &amp;nbsp; &amp;nbsp; /frequency=
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;Definition &amp;nbsp; &amp;nbsp; &amp;nbsp;frequency of the occurrence of a feature
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;Value format &amp;nbsp; &amp;nbsp;text representing the fraction of population carrying the
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;variation expressed as a decimal fraction
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;Example &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /frequency=&amp;quot;.85&amp;quot;
&lt;br&gt;&lt;br&gt;Although this format is appropriate and useful in some contexts (for 
&lt;br&gt;example, within a global population of individuals), it does not convey
&lt;br&gt;the number of sequences that might have been included in a variation
&lt;br&gt;study. As of Release 168.0, the format of /frequency will be expanded:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;Value format &amp;nbsp; &amp;nbsp;text representing the fraction of population carrying the
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;variation expressed as a decimal fraction, or the number of
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;observed instances vs the total number of sequenced isolates
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;Example &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; /frequency=&amp;quot;.85&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;/frequency=&amp;quot;23/108&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;/frequency=&amp;quot;1 in 12&amp;quot;
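&lt;br&gt;&lt;br&gt;Parsers that currently expect a decimal fraction will need to accept the new
&lt;br&gt;forms. A minimal Python sketch, assuming the three shapes shown in the examples
&lt;br&gt;above are the only ones encountered (the function name is hypothetical):

```python
from fractions import Fraction


def parse_frequency(value):
    """Convert a /frequency value such as '.85', '23/108' or '1 in 12' to a Fraction."""
    text = value.strip()
    if " in " in text:
        # Form: observed instances "in" total, e.g. '1 in 12'.
        observed, total = text.split(" in ", 1)
        return Fraction(int(observed), int(total))
    if "/" in text:
        # Form: observed/total, e.g. '23/108'.
        observed, total = text.split("/", 1)
        return Fraction(int(observed), int(total))
    return Fraction(text)  # decimal fraction, e.g. '.85'
```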
&lt;br&gt;&lt;br&gt;1.4.8 /cons_splice qualifier to be removed
&lt;br&gt;&lt;br&gt;&amp;nbsp; The /cons_splice qualifier has almost no usage within the sequence
&lt;br&gt;database. In addition, it does not account for the variation in splice
&lt;br&gt;signals that might be used by different classes of introns. So this
&lt;br&gt;qualifier will be removed from sequence records, and the Feature Table
&lt;br&gt;document, as of Release 168.0 in October 2008.
&lt;br&gt;&lt;br&gt;1.4.9 /virion qualifier to be removed
&lt;br&gt;&lt;br&gt;&amp;nbsp; The intent of /virion is to indicate that a sequenced molecule
&lt;br&gt;is from an encapsidated viral particle (as opposed to the proviral
&lt;br&gt;form of a virus, integrated into the host's genome). Viral sequences
&lt;br&gt;derived from a blood sample taken from an infected organism might
&lt;br&gt;typically be flagged with /virion, if it is believed that the sample
&lt;br&gt;contained viral particles.
&lt;br&gt;&lt;br&gt;&amp;nbsp; However, a review of the database reveals that this qualifier is
&lt;br&gt;not used consistently, and furthermore, submitters often are unable
&lt;br&gt;to conclusively state that a virus sequence derives from the 
&lt;br&gt;encapsidated form. So the /virion qualifier will be removed from 
&lt;br&gt;sequence records, and the Feature Table document, as of Release 168.0
&lt;br&gt;in October 2008.
&lt;br&gt;&lt;br&gt;1.4.10 Updated value format for /exception
&lt;br&gt;&lt;br&gt;&amp;nbsp; Only three values for the /exception qualifier have been approved
&lt;br&gt;for use by the INSDC : 
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;quot;rearrangement required for product&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;quot;RNA editing&amp;quot;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;quot;reasons given in citation&amp;quot;
&lt;br&gt;&lt;br&gt;However, the definition of /exception in the Feature Table document
&lt;br&gt;does not indicate that the contents of /exception are controlled.
&lt;br&gt;This oversight will be corrected in October 2008.
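&lt;br&gt;&lt;br&gt;A validator could treat the three approved values as a controlled list. The
&lt;br&gt;Python sketch below is illustrative; the names are hypothetical, and exact,
&lt;br&gt;case-sensitive matching is an assumption.

```python
# The three /exception values listed above as approved by the INSDC.
APPROVED_EXCEPTIONS = (
    "rearrangement required for product",
    "RNA editing",
    "reasons given in citation",
)


def is_approved_exception(value):
    """Return True if value is one of the three approved /exception strings."""
    return value in APPROVED_EXCEPTIONS
```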
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/GenBank-Release-167.0-Now-Available-tp19116947p19116947.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19116836</id>
	<title>GenBank Release 167.0 Now Available</title>
	<published>2008-08-22T16:00:06Z</published>
	<updated>2008-08-22T16:00:06Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html"> &lt;br /&gt;_______________________________________________
&lt;br&gt;Genbankb mailing list
&lt;br&gt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=19116836&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;Genbankb@...&lt;/a&gt;
&lt;br&gt;&lt;a href=&quot;http://www.bio.net/biomail/listinfo/genbankb&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.bio.net/biomail/listinfo/genbankb&lt;/a&gt;&lt;div class=&quot;small&quot;&gt;&lt;br/&gt;&lt;img src=&quot;http://old.nabble.com/images/icon_attachment.gif&quot; &gt; &lt;strong&gt;attachment0&lt;/strong&gt; (15K) &lt;a href=&quot;http://old.nabble.com/attachment/19116836/0/attachment0&quot; target=&quot;_top&quot;&gt;Download Attachment&lt;/a&gt;&lt;/div&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/GenBank-Release-167.0-Now-Available-tp19116836p19116836.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19094830</id>
	<title>GenBank 167.0 Close-of-Data</title>
	<published>2008-08-21T11:58:21Z</published>
	<updated>2008-08-21T11:58:21Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html">Greetings GenBank Users,
&lt;br&gt;&lt;br&gt;Close-of-data for the upcoming GenBank Release 167.0 occurred on
&lt;br&gt;Tuesday August 19 at approximately 1:30am ET.
&lt;br&gt;&lt;br&gt;The subsequently generated GenBank Incremental Update files
&lt;br&gt;nc0819.aso, nc0819.flat, etc. contain data through the close.
&lt;br&gt;&lt;br&gt;Note: Release processing often does not begin until sometime during
&lt;br&gt;business hours on the close date. As a result, a number of sequence
&lt;br&gt;records processed *after* 1:30am are likely to be present in the
&lt;br&gt;GenBank 167.0 release files, even though they are &amp;quot;post-close&amp;quot;.
&lt;br&gt;&lt;br&gt;Similarly, the first GenBank Incremental Update that is generated
&lt;br&gt;after the close date is likely to contain a number of sequence
&lt;br&gt;records that are unchanged, compared to their appearance in the
&lt;br&gt;release files.
&lt;br&gt;&lt;br&gt;Our apologies for the lack of advance notice about the close date.
&lt;br&gt;&lt;br&gt;Mark Cavanaugh
&lt;br&gt;GenBank
&lt;br&gt;NCBI/NLM/NIH/HHS
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/GenBank-167.0-Close-of-Data-tp19094830p19094830.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-17841838</id>
	<title>Re: GenBank Release 166.0 Now Available</title>
	<published>2008-06-14T08:19:14Z</published>
	<updated>2008-06-14T08:19:14Z</updated>
	<author>
		<name>Francis Ouellette-2</name>
	</author>
	<content type="html">&amp;gt; From: Mark Cavanaugh &amp;lt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=17841838&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;cavanaug@...&lt;/a&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;nbsp;GenBank has surpassed the 200-billion basepair threshold, with a total
&lt;br&gt;&amp;gt; of 205,647,903,211 bases as of this June 2008 release.
&lt;br&gt;&lt;br&gt;Wow 200 billion nucleotides! I remember when we celebrated 1 Billion nucleotides!
&lt;br&gt;&lt;br&gt;Good work all the folks at the NCBI - good way to celebrate 25 years!
&lt;br&gt;&lt;br&gt;Cheers,
&lt;br&gt;&lt;br&gt;f.
&lt;br&gt;&lt;br&gt;--
&lt;br&gt;B.F. Francis Ouellette &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;a href=&quot;http://oicr.on.ca/research/ouellette/&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://oicr.on.ca/research/ouellette/&lt;/a&gt;
&lt;br&gt;We are hiring a &amp;quot;Manager, Web Development&amp;quot;: &lt;a href=&quot;http://tinyurl.com/4dgyuk&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://tinyurl.com/4dgyuk&lt;/a&gt;
&lt;br&gt;NextGen sequencing Bioinformatics Workshop: &lt;a href=&quot;http://tinyurl.com/29ulqj&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://tinyurl.com/29ulqj&lt;/a&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/GenBank-Release-166.0-Now-Available-tp17835257p17841838.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-17835257</id>
	<title>GenBank Release 166.0 Now Available</title>
	<published>2008-06-13T20:03:16Z</published>
	<updated>2008-06-13T20:03:16Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html">[Apologies for the double-post : The contents of my original post 
&lt;br&gt;&amp;nbsp;about Release 166.0 availability will be interpreted as an
&lt;br&gt;&amp;nbsp;attachment under some circumstances...]
&lt;br&gt;&lt;br&gt;&lt;br&gt;Greetings GenBank Users,
&lt;br&gt;&lt;br&gt;&amp;nbsp; GenBank Release 166.0 is now available via FTP from the National
&lt;br&gt;Center for Biotechnology Information (NCBI):
&lt;br&gt;&lt;br&gt;&amp;nbsp; Ftp Site &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Directory &amp;nbsp; Contents
&lt;br&gt;&amp;nbsp; ---------------- &amp;nbsp; --------- &amp;nbsp; ---------------------------------------
&lt;br&gt;&amp;nbsp; ftp.ncbi.nih.gov &amp;nbsp; genbank &amp;nbsp; &amp;nbsp; GenBank Release 166.0 flatfiles
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;ncbi-asn1 &amp;nbsp; ASN.1 data used to create Release 166.0
&lt;br&gt;&lt;br&gt;&amp;nbsp; Close-of-data for GenBank 166.0 occurred on 06/11/2008. Uncompressed, the
&lt;br&gt;Release 166.0 flatfiles require roughly 343 GB (sequence files only)
&lt;br&gt;or 366 GB (including the 'short directory', 'index' and the *.txt files). 
&lt;br&gt;The ASN.1 data require approximately 314 GB.
&lt;br&gt;&lt;br&gt;Recent statistics for non-WGS, non-CON sequences:
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release &amp;nbsp;Date &amp;nbsp; &amp;nbsp; &amp;nbsp; Base Pairs &amp;nbsp; Entries
&lt;br&gt;&lt;br&gt;&amp;nbsp; 165 &amp;nbsp; &amp;nbsp; &amp;nbsp;Apr 2008 &amp;nbsp; 89172350468 &amp;nbsp;85500730
&lt;br&gt;&amp;nbsp; 166 &amp;nbsp; &amp;nbsp; &amp;nbsp;Jun 2008 &amp;nbsp; 92008611867 &amp;nbsp;88554578
&lt;br&gt;&lt;br&gt;Recent statistics for WGS sequences:
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release &amp;nbsp;Date &amp;nbsp; &amp;nbsp; &amp;nbsp; Base Pairs &amp;nbsp; Entries
&lt;br&gt;&lt;br&gt;&amp;nbsp; 165 &amp;nbsp; &amp;nbsp; &amp;nbsp;Apr 2008 &amp;nbsp;110500961400 &amp;nbsp;26931049
&lt;br&gt;&amp;nbsp; 166 &amp;nbsp; &amp;nbsp; &amp;nbsp;Jun 2008 &amp;nbsp;113639291344 &amp;nbsp;39163548
&lt;br&gt;&lt;br&gt;&amp;nbsp; During the 57 days between the close dates for GenBank Releases 165.0 and
&lt;br&gt;166.0, the non-WGS/non-CON portion of GenBank grew by 2,836,261,399 basepairs
&lt;br&gt;and by 3,053,848 sequence records. During that same period, 376,881 records
&lt;br&gt;were updated. An average of about 60,188 non-WGS/non-CON records were added
&lt;br&gt;and/or updated per day.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Between releases 165.0 and 166.0, the WGS component of GenBank grew by
&lt;br&gt;3,138,329,944 basepairs and by 12,232,499 records.
&lt;br&gt;&lt;br&gt;&amp;nbsp; GenBank has surpassed the 200-billion basepair threshold, with a total
&lt;br&gt;of 205,647,903,211 bases as of this June 2008 release.
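&lt;br&gt;&lt;br&gt;&amp;nbsp; The growth figures above can be re-derived from the two statistics
&lt;br&gt;tables. The Python sketch below is only an illustration for readers who want
&lt;br&gt;to check the arithmetic; the variable names are assumptions made for this
&lt;br&gt;example and are not part of any NCBI tool.
&lt;pre&gt;
```python
# Figures copied from the Release 165/166 statistics in this announcement.
non_wgs_bases = {165: 89172350468, 166: 92008611867}
non_wgs_entries = {165: 85500730, 166: 88554578}
wgs_bases = {165: 110500961400, 166: 113639291344}
updated_records = 376881     # records updated between the two close dates
days_between_closes = 57

# Growth of the non-WGS/non-CON portion between the two releases.
added_bases = non_wgs_bases[166] - non_wgs_bases[165]
added_records = non_wgs_entries[166] - non_wgs_entries[165]

# Average daily activity counts both new and updated records.
per_day = (added_records + updated_records) / days_between_closes

# The overall total combines the non-WGS/non-CON and WGS components.
total_bases = non_wgs_bases[166] + wgs_bases[166]

print(f"{added_bases:,} basepairs added")
print(f"{added_records:,} records added")
print(f"about {per_day:,.0f} records added and/or updated per day")
print(f"{total_bases:,} bases in total")
```
&lt;/pre&gt;
&lt;br&gt;The printed values should agree with the 2,836,261,399 basepairs, 3,053,848
&lt;br&gt;records, roughly 60,188 records per day and 205,647,903,211 total bases
&lt;br&gt;quoted above.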
&lt;br&gt;&lt;br&gt;&amp;nbsp; For additional release information, see the README files in either of
&lt;br&gt;the directories mentioned above, and the release notes (gbrel.txt) in
&lt;br&gt;the genbank directory. Sections 1.3 and 1.4 of the release notes
&lt;br&gt;(Changes in Release 166.0 and Upcoming Changes) have been appended
&lt;br&gt;below.
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ** Important Notes **
&lt;br&gt;&lt;br&gt;* &amp;nbsp;A new GenBank division appears with Release 166.0 : the Transcriptome Shotgun
&lt;br&gt;&amp;nbsp; &amp;nbsp;Assembly, or TSA, division. Please see Section 1.3.2 of the release notes for
&lt;br&gt;&amp;nbsp; &amp;nbsp;more information about TSA and the records that it contains.
&lt;br&gt;&lt;br&gt;* &amp;nbsp;Support for the single, comprehensive protein FASTA file which accompanies
&lt;br&gt;&amp;nbsp; &amp;nbsp;GenBank releases ceases with this June 2008 release. See Section 1.4.1 of
&lt;br&gt;&amp;nbsp; &amp;nbsp;the release notes for details.
&lt;br&gt;&lt;br&gt;* &amp;nbsp;GenBank 'index' files are now provided without any EST content, and without
&lt;br&gt;&amp;nbsp; &amp;nbsp;most GSS content. See Section 1.3.3 of the release notes for further details.
&lt;br&gt;&amp;nbsp; &amp;nbsp;NCBI is considering ceasing support for the index files, so we encourage
&lt;br&gt;&amp;nbsp; &amp;nbsp;affected users to review that section and provide feedback.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release 166.0 data, and subsequent updates, are available now via
&lt;br&gt;NCBI's Entrez and Blast services.
&lt;br&gt;&lt;br&gt;&amp;nbsp; As a general guideline, we suggest first transferring the GenBank release
&lt;br&gt;notes (gbrel.txt) whenever a release is being obtained. Check to make sure
&lt;br&gt;that the date and release number in the header of the release notes are
&lt;br&gt;current (eg: June 15 2008, 166.0). If they are not, interrupt the
&lt;br&gt;remaining transfers and then request assistance from the NCBI Service Desk.
&lt;br&gt;&lt;br&gt;&amp;nbsp; A comprehensive check of the headers of all release files after your
&lt;br&gt;transfers are complete is also suggested. Here's how one might go about
&lt;br&gt;this on a unix platform, using csh/tcsh :
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; set files = `ls gb*.*`
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; foreach i ($files)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; head -10 $i | grep Release
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; end
&lt;br&gt;&lt;br&gt;Or, if the files are compressed, perhaps:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; gzcat $i | head -10 | grep Release
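&lt;br&gt;&lt;br&gt;&amp;nbsp; Where csh is not available, the same header check might be sketched
&lt;br&gt;in Python. This is a minimal illustration, not an NCBI-supplied script: the
&lt;br&gt;function names header_mentions_release and stale_files are hypothetical, and
&lt;br&gt;it assumes that the text &amp;quot;Release NNN.N&amp;quot; appears within the first ten
&lt;br&gt;lines of every gb*.* file, compressed or not.
&lt;pre&gt;
```python
import gzip
from pathlib import Path


def header_mentions_release(path, release, max_lines=10):
    """Return True if one of the first max_lines lines names the release."""
    # Assumption: files ending in .gz are gzip-compressed text.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="ascii", errors="replace") as handle:
        for _, line in zip(range(max_lines), handle):
            if "Release " + release in line:
                return True
    return False


def stale_files(directory, release):
    """List the gb*.* files whose header does not mention the release."""
    return sorted(
        p.name
        for p in Path(directory).glob("gb*.*")
        if p.is_file() and not header_mentions_release(p, release)
    )
```
&lt;/pre&gt;
&lt;br&gt;Calling stale_files(&amp;quot;.&amp;quot;, &amp;quot;166.0&amp;quot;) in the download directory would then
&lt;br&gt;list any file that may need to be transferred again; an empty list suggests
&lt;br&gt;that every header matched.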
&lt;br&gt;&lt;br&gt;&amp;nbsp; If you encounter problems while ftp'ing or uncompressing Release
&lt;br&gt;166.0, please send email outlining your difficulties to:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=17835257&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;info@...&lt;/a&gt;
&lt;br&gt;&lt;br&gt;Mark Cavanaugh, Vladimir Alekseyev, Michael Kimelman
&lt;br&gt;GenBank
&lt;br&gt;NCBI/NLM/NIH/HHS
&lt;br&gt;&lt;br&gt;&lt;br&gt;1.3 Important Changes in Release 166.0
&lt;br&gt;&lt;br&gt;1.3.1 Organizational changes
&lt;br&gt;&lt;br&gt;&amp;nbsp; The total number of sequence data files increased by 47 with this release:
&lt;br&gt;&lt;br&gt;&amp;nbsp; - the TSA division is now comprised of &amp;nbsp; 1 file &amp;nbsp;(+1)
&lt;br&gt;&amp;nbsp; - the BCT division is now comprised of &amp;nbsp;29 files (+1)
&lt;br&gt;&amp;nbsp; - the CON division is now comprised of &amp;nbsp;91 files (+6)
&lt;br&gt;&amp;nbsp; - the EST division is now comprised of 738 files (+25)
&lt;br&gt;&amp;nbsp; - the ENV division is now comprised of &amp;nbsp; 9 files (+1)
&lt;br&gt;&amp;nbsp; - the GSS division is now comprised of 290 files (+5)
&lt;br&gt;&amp;nbsp; - the HTG division is now comprised of 117 files (+3)
&lt;br&gt;&amp;nbsp; - the PAT division is now comprised of &amp;nbsp;42 files (+4)
&lt;br&gt;&amp;nbsp; - the PRI division is now comprised of &amp;nbsp;36 files (+1)
&lt;br&gt;&lt;br&gt;&amp;nbsp; The total number of index files increased by 4 with this release:
&lt;br&gt;&lt;br&gt;&amp;nbsp; - the AUT index is now comprised of &amp;nbsp;54 files &amp;nbsp;(+4)
&lt;br&gt;&lt;br&gt;1.3.2 New file for the Transcriptome Shotgun Assembly (TSA) division
&lt;br&gt;&lt;br&gt;&amp;nbsp; As announced with the previous release, Release 166.0 contains a new
&lt;br&gt;divisional file (gbtsa.seq) for Transcriptome Shotgun Assembly (TSA)
&lt;br&gt;mRNA sequences.
&lt;br&gt;&lt;br&gt;&amp;nbsp; TSA sequences are shotgun assemblies of primary sequences deposited in
&lt;br&gt;dbEST, the Trace Archive (TA) or the Short-Read Archive (SRA). &amp;nbsp;Keywords
&lt;br&gt;&amp;quot;TSA&amp;quot; and &amp;quot;Transcriptome Shotgun Assembly&amp;quot; are present on all TSA
&lt;br&gt;records, in addition to a division code value of &amp;quot;TSA&amp;quot; on the LOCUS line.
&lt;br&gt;&lt;br&gt;&amp;nbsp; No format changes (new or changed line types, features, or qualifiers)
&lt;br&gt;are anticipated for this new class of GenBank record.
&lt;br&gt;&lt;br&gt;&amp;nbsp; However, note that TSA records make use of the same PRIMARY block that
&lt;br&gt;is utilized for Third-Party Annotation (TPA) records. The PRIMARY block
&lt;br&gt;contains references to the underlying reads/transcripts that were assembled
&lt;br&gt;to construct a TSA record.
&lt;br&gt;&lt;br&gt;&amp;nbsp; It might be helpful to review the content of TSA record EZ000001 and its
&lt;br&gt;use of the PRIMARY block:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;a href=&quot;http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&amp;id=189498984&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&amp;id=189498984&lt;/a&gt;&lt;br&gt;&lt;br&gt;Requirements for the new Transcriptome Shotgun Assembly division include:
&lt;br&gt;&lt;br&gt;1. Submission of primary transcript sequence data to dbEST, the Trace Archive,
&lt;br&gt;&amp;nbsp; &amp;nbsp;or the Short-Read archive (SRA). &amp;nbsp;
&lt;br&gt;&lt;br&gt;2. Registration of an associated transcriptome project with the International
&lt;br&gt;&amp;nbsp; &amp;nbsp;Nucleotide Sequence Database Collaboration (INSDC).
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;For information about submitting projects via NCBI/GenBank, see:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;a href=&quot;http://www.ncbi.nlm.nih.gov/genomes/mpfsubmission.cgi&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.ncbi.nlm.nih.gov/genomes/mpfsubmission.cgi&lt;/a&gt;&lt;br&gt;&lt;br&gt;3. Submission of TSA sequence records to GenBank, including an assembly file
&lt;br&gt;&amp;nbsp; &amp;nbsp;(.ace format)
&lt;br&gt;&lt;br&gt;Note that TSA records and the primary transcript sequences that they are
&lt;br&gt;built from must be provided by the same submitter or collaborative group.
&lt;br&gt;&lt;br&gt;1.3.3 Changes in the content of index files
&lt;br&gt;&lt;br&gt;&amp;nbsp; As described in the GB 153 release notes, the 'index' files which accompany
&lt;br&gt;GenBank releases (see Section 3.3) are considered to be a legacy data product by
&lt;br&gt;NCBI, generated mostly for historical reasons. FTP statistics of January 2005
&lt;br&gt;seem to support this: the index files were transferred only half as frequently as
&lt;br&gt;the files of sequence records. The inherent inefficiencies of the index file
&lt;br&gt;format also lead us to suspect that they have little serious use by the user
&lt;br&gt;community, particularly for EST and GSS records.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The software that generated the index file products received little
&lt;br&gt;attention over the years, and finally reached its limitations in
&lt;br&gt;February 2006 (Release 152.0). The required multi-server queries which
&lt;br&gt;obtained and sorted many millions of rows of terms from several different
&lt;br&gt;databases simply outgrew the capacity of the hardware used for GenBank
&lt;br&gt;Release generation.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Our short-term solution is to cease generating some index-file content
&lt;br&gt;for all EST sequence records, and for GSS sequence records that originate
&lt;br&gt;via direct submission to NCBI.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The three gbacc*.idx index files continue to reflect the entirety of the
&lt;br&gt;release, including all EST and GSS records; however, the file contents are
&lt;br&gt;unsorted.
&lt;br&gt;&lt;br&gt;&amp;nbsp; These 'solutions' are really just stop-gaps, and we will likely pursue
&lt;br&gt;one of two options:
&lt;br&gt;&lt;br&gt;a) Cease support of the 'index' file products altogether.
&lt;br&gt;&lt;br&gt;b) Provide new products that present some of the most useful data from
&lt;br&gt;&amp;nbsp; &amp;nbsp;the legacy 'index' files, and cease support for other types of index data.
&lt;br&gt;&lt;br&gt;&amp;nbsp; If you are a user of the 'index' files associated with GenBank releases, we
&lt;br&gt;encourage you to make your wishes known, either via the GenBank newsgroup,
&lt;br&gt;or via email to NCBI's Service Desk:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=17835257&amp;i=1&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;info@...&lt;/a&gt;
&lt;br&gt;&lt;br&gt;&amp;nbsp; Our apologies for any inconvenience that these changes may cause.
&lt;br&gt;&lt;br&gt;1.3.4 GSS File Header Problem
&lt;br&gt;&lt;br&gt;&amp;nbsp; GSS sequences at GenBank are maintained in two different systems, depending
&lt;br&gt;on their origin, and the dumps from those systems occur in parallel. Because
&lt;br&gt;the second dump (for example) has no prior knowledge of exactly how many GSS
&lt;br&gt;files will be dumped by the first, it does not know how to number its own
&lt;br&gt;output files.
&lt;br&gt;&lt;br&gt;&amp;nbsp; There is thus a discrepancy between the filenames and file headers for
&lt;br&gt;fifty-nine of the GSS flatfiles in Release 166.0. Consider gbgss232.seq :
&lt;br&gt;&lt;br&gt;GBGSS1.SEQ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Genetic Sequence Data Bank
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; June 15 2008
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; NCBI-GenBank Flat File Release 166.0
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;GSS Sequences (Part 1)
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;87182 loci, &amp;nbsp; &amp;nbsp;64465152 bases, from &amp;nbsp; &amp;nbsp;87182 reported sequences
&lt;br&gt;&lt;br&gt;&amp;nbsp; Here, the filename and part number in the header are &amp;quot;1&amp;quot;, though the file
&lt;br&gt;has been renamed as &amp;quot;232&amp;quot; based on the number of files dumped from the other
&lt;br&gt;system. &amp;nbsp;We will work to resolve this discrepancy in future releases, but the
&lt;br&gt;priority is certainly much lower than many other tasks.
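&lt;br&gt;&lt;br&gt;&amp;nbsp; Users whose scripts rely on the header matching the filename may wish
&lt;br&gt;to detect the affected files. The Python sketch below shows one possible
&lt;br&gt;comparison; header_name_matches is a hypothetical helper, and it assumes the
&lt;br&gt;uncompressed filename is compared against the first whitespace-separated
&lt;br&gt;field of the first header line.
&lt;pre&gt;
```python
from pathlib import PurePath


def header_name_matches(path, first_header_line):
    """Compare a flatfile's name with the name printed in its header."""
    fields = first_header_line.split()
    if not fields:
        return False
    # Header names are upper case (GBGSS1.SEQ); filenames are lower case.
    return fields[0].upper() == PurePath(path).name.upper()


# The gbgss232.seq example from this section: the header still says GBGSS1.SEQ.
print(header_name_matches(
    "gbgss232.seq",
    "GBGSS1.SEQ          Genetic Sequence Data Bank",
))
```
&lt;/pre&gt;
&lt;br&gt;For the example above the comparison reports a mismatch (False), which is the
&lt;br&gt;expected outcome for the fifty-nine affected GSS files.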
&lt;br&gt;&lt;br&gt;1.4 Upcoming Changes
&lt;br&gt;&lt;br&gt;1.4.1 Comprehensive protein FASTA file to be discontinued
&lt;br&gt;&lt;br&gt;&amp;nbsp; Given the availability of divisional protein FASTA files as of GenBank
&lt;br&gt;Release 164.0, support for the single, large, comprehensive protein FASTA
&lt;br&gt;file:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ftp://ftp.ncbi.nih.gov/genbank/relNNN.fsa_aa.gz
&lt;br&gt;&lt;br&gt;(where 'NNN' represents a three-digit GenBank release number) will be
&lt;br&gt;discontinued after GenBank Release 166.0 in June of 2008. The size
&lt;br&gt;of this file has grown to exceed 4GB, which is unmanageable for many users.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Users are advised to make plans to utilize the new divisional files by
&lt;br&gt;August of 2008. The divisional protein FASTA files are located at:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ftp://ftp.ncbi.nih.gov/ncbi-asn1/gbXXX.fsa_aa.gz
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/GenBank-Release-166.0-Now-Available-tp17835257p17835257.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-17835180</id>
	<title>GenBank Release 166.0 Now Available</title>
	<published>2008-06-13T19:51:40Z</published>
	<updated>2008-06-13T19:51:40Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html"> &lt;br /&gt;_______________________________________________
&lt;br&gt;Genbankb mailing list
&lt;br&gt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=17835180&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;Genbankb@...&lt;/a&gt;
&lt;br&gt;&lt;a href=&quot;http://www.bio.net/biomail/listinfo/genbankb&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.bio.net/biomail/listinfo/genbankb&lt;/a&gt;&lt;div class=&quot;small&quot;&gt;&lt;br/&gt;&lt;img src=&quot;http://old.nabble.com/images/icon_attachment.gif&quot; &gt; &lt;strong&gt;attachment0&lt;/strong&gt; (9K) &lt;a href=&quot;http://old.nabble.com/attachment/17835180/0/attachment0&quot; target=&quot;_top&quot;&gt;Download Attachment&lt;/a&gt;&lt;/div&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/GenBank-Release-166.0-Now-Available-tp17835180p17835180.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-17803367</id>
	<title>GenBank 166.0 Close-of-Data</title>
	<published>2008-06-12T08:51:59Z</published>
	<updated>2008-06-12T08:51:59Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html">Greetings GenBank Users,
&lt;br&gt;&lt;br&gt;Close-of-data for the upcoming GenBank Release 166.0 occurred on
&lt;br&gt;Wednesday June 11 at approximately 1:30am ET.
&lt;br&gt;&lt;br&gt;The subsequently generated GenBank Incremental Update files
&lt;br&gt;nc0611.aso, nc0611.flat, etc. contain data through the close.
&lt;br&gt;&lt;br&gt;Note: Release processing often does not begin until sometime during
&lt;br&gt;business hours on the close date. As a result, a number of sequence
&lt;br&gt;records processed *after* 1:30am are likely to be present in the
&lt;br&gt;GenBank 166.0 release files, even though they are &amp;quot;post-close&amp;quot;.
&lt;br&gt;&lt;br&gt;Similarly, the first GenBank Incremental Update that is generated
&lt;br&gt;after the close date is likely to contain a number of sequence
&lt;br&gt;records that are unchanged, compared to their appearance in the
&lt;br&gt;release files.
&lt;br&gt;&lt;br&gt;Our apologies for the lack of advance notice about the close date.
&lt;br&gt;&lt;br&gt;Mark Cavanaugh
&lt;br&gt;GenBank
&lt;br&gt;NCBI/NLM/NIH/HHS
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/GenBank-166.0-Close-of-Data-tp17803367p17803367.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-16775864</id>
	<title>GenBank Release 165.0 Now Available</title>
	<published>2008-04-18T13:52:46Z</published>
	<updated>2008-04-18T13:52:46Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html">Greetings GenBank Users,
&lt;br&gt;&lt;br&gt;&amp;nbsp; GenBank Release 165.0 is now available via FTP from the National
&lt;br&gt;Center for Biotechnology Information (NCBI):
&lt;br&gt;&lt;br&gt;&amp;nbsp; Ftp Site &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Directory &amp;nbsp; Contents
&lt;br&gt;&amp;nbsp; ---------------- &amp;nbsp; --------- &amp;nbsp; ---------------------------------------
&lt;br&gt;&amp;nbsp; ftp.ncbi.nih.gov &amp;nbsp; genbank &amp;nbsp; &amp;nbsp; GenBank Release 165.0 flatfiles
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;ncbi-asn1 &amp;nbsp; ASN.1 data used to create Release 165.0
&lt;br&gt;&lt;br&gt;&amp;nbsp; Close-of-data for GenBank 165.0 occurred on 04/15/2008. Uncompressed, the
&lt;br&gt;Release 165.0 flatfiles require roughly 332 GB (sequence files only)
&lt;br&gt;or 353 GB (including the 'short directory', 'index' and the *.txt files). 
&lt;br&gt;The ASN.1 data require approximately 305 GB.
&lt;br&gt;&lt;br&gt;Recent statistics for non-WGS, non-CON sequences:
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release &amp;nbsp;Date &amp;nbsp; &amp;nbsp; &amp;nbsp; Base Pairs &amp;nbsp; Entries
&lt;br&gt;&lt;br&gt;&amp;nbsp; 164 &amp;nbsp; &amp;nbsp; &amp;nbsp;Feb 2008 &amp;nbsp; 85759586764 &amp;nbsp;82853685
&lt;br&gt;&amp;nbsp; 165 &amp;nbsp; &amp;nbsp; &amp;nbsp;Apr 2008 &amp;nbsp; 89172350468 &amp;nbsp;85500730
&lt;br&gt;&lt;br&gt;Recent statistics for WGS sequences:
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release &amp;nbsp;Date &amp;nbsp; &amp;nbsp; &amp;nbsp; Base Pairs &amp;nbsp; Entries
&lt;br&gt;&lt;br&gt;&amp;nbsp; 164 &amp;nbsp; &amp;nbsp; &amp;nbsp;Feb 2008 &amp;nbsp;108635736141 &amp;nbsp;27439206
&lt;br&gt;&amp;nbsp; 165 &amp;nbsp; &amp;nbsp; &amp;nbsp;Apr 2008 &amp;nbsp;110500961400 &amp;nbsp;26931049
&lt;br&gt;&lt;br&gt;&amp;nbsp; During the 62 days between the close dates for GenBank Releases 164.0 and
&lt;br&gt;165.0, the non-WGS/non-CON portion of GenBank grew by 3,412,763,704 basepairs
&lt;br&gt;and by 2,647,045 sequence records. During that same period, 1,590,201 records
&lt;br&gt;were updated. An average of about 68,340 non-WGS/non-CON records were added
&lt;br&gt;and/or updated per day.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Between releases 164.0 and 165.0, the WGS component of GenBank grew by
&lt;br&gt;1,865,225,259 basepairs. The number of records decreased by 508,157 due to
&lt;br&gt;the re-assembly of WGS project AAKN, into far fewer (but larger) contigs.
&lt;br&gt;&lt;br&gt;&amp;nbsp; For additional release information, see the README files in either of
&lt;br&gt;the directories mentioned above, and the release notes (gbrel.txt) in
&lt;br&gt;the genbank directory. Sections 1.3 and 1.4 of the release notes
&lt;br&gt;(Changes in Release 165.0 and Upcoming Changes) have been appended
&lt;br&gt;below.
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ** Important Notes **
&lt;br&gt;&lt;br&gt;* &amp;nbsp;A new GenBank division has become legal with this April's Release 165.0 :
&lt;br&gt;&amp;nbsp; &amp;nbsp;the Transcriptome Shotgun Assembly, or TSA, division. Please see
&lt;br&gt;&amp;nbsp; &amp;nbsp;Section 1.3.2 of the release notes for more information about TSA and the
&lt;br&gt;&amp;nbsp; &amp;nbsp;records that it will contain.
&lt;br&gt;&lt;br&gt;* &amp;nbsp;Support for the single, comprehensive protein FASTA file which accompanies
&lt;br&gt;&amp;nbsp; &amp;nbsp;GenBank releases will cease with the June 2008 release. See Section
&lt;br&gt;&amp;nbsp; &amp;nbsp;1.4.1 of the release notes for details.
&lt;br&gt;&lt;br&gt;* &amp;nbsp;GenBank 'index' files are now provided without any EST content, and without
&lt;br&gt;&amp;nbsp; &amp;nbsp;most GSS content. See Section 1.3.3 of the release notes for further details.
&lt;br&gt;&amp;nbsp; &amp;nbsp;NCBI is considering ceasing support for the index files, so we encourage
&lt;br&gt;&amp;nbsp; &amp;nbsp;affected users to review that section and provide feedback.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release 165.0 data, and subsequent updates, are available now via
&lt;br&gt;NCBI's Entrez and Blast services.
&lt;br&gt;&lt;br&gt;&amp;nbsp; As a general guideline, we suggest first transferring the GenBank release
&lt;br&gt;notes (gbrel.txt) whenever a release is being obtained. Check to make sure
&lt;br&gt;that the date and release number in the header of the release notes are
&lt;br&gt;current (eg: April 15 2008, 165.0). If they are not, interrupt the
&lt;br&gt;remaining transfers and then request assistance from the NCBI Service Desk.
&lt;br&gt;&lt;br&gt;&amp;nbsp; A comprehensive check of the headers of all release files after your
&lt;br&gt;transfers are complete is also suggested. Here's how one might go about
&lt;br&gt;this on a unix platform, using csh/tcsh :
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; set files = `ls gb*.*`
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; foreach i ($files)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; head -10 $i | grep Release
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; end
&lt;br&gt;&lt;br&gt;Or, if the files are compressed, perhaps:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; gzcat $i | head -10 | grep Release
&lt;br&gt;&lt;br&gt;&amp;nbsp; If you encounter problems while ftp'ing or uncompressing Release
&lt;br&gt;165.0, please send email outlining your difficulties to:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=16775864&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;info@...&lt;/a&gt;
&lt;br&gt;&lt;br&gt;Mark Cavanaugh, Vladimir Alekseyev, Michael Kimelman
&lt;br&gt;GenBank
&lt;br&gt;NCBI/NLM/NIH/HHS
&lt;br&gt;&lt;br&gt;&lt;br&gt;1.3 Important Changes in Release 165.0
&lt;br&gt;&lt;br&gt;1.3.1 Organizational changes
&lt;br&gt;&lt;br&gt;&amp;nbsp; The total number of sequence data files increased by 41 with this release:
&lt;br&gt;&lt;br&gt;&amp;nbsp; - the BCT division is now comprised of &amp;nbsp;28 files (+2)
&lt;br&gt;&amp;nbsp; - the EST division is now comprised of 713 files (+19)
&lt;br&gt;&amp;nbsp; - the GSS division is now comprised of 285 files (+8)
&lt;br&gt;&amp;nbsp; - the HTG division is now comprised of 114 files (+7)
&lt;br&gt;&amp;nbsp; - the PAT division is now comprised of &amp;nbsp;38 files (+3)
&lt;br&gt;&amp;nbsp; - the PLN division is now comprised of &amp;nbsp;29 files (+1)
&lt;br&gt;&amp;nbsp; - the VRL division is now comprised of &amp;nbsp; 9 files (+1)
&lt;br&gt;&lt;br&gt;1.3.2 New Transcriptome Shotgun Assembly (TSA) division now legal
&lt;br&gt;&lt;br&gt;&amp;nbsp; A new GenBank division for assembled mRNA sequences, Transcriptome Shotgun
&lt;br&gt;Assembly (TSA), can now appear in GenBank releases, as of this April 2008
&lt;br&gt;Release 165.0. The date of first appearance of a TSA record will depend
&lt;br&gt;on the status of TSA submission processing, but it *is* expected that they will
&lt;br&gt;begin to appear in the GenBank Incremental Updates (GIU) within the next
&lt;br&gt;month, and that Release 166.0 will include a divisional TSA file.
&lt;br&gt;&lt;br&gt;Files in this new division will have filenames of:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; gbtsaNN.aso.gz	(ASN.1 format)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; gbtsaNN.seq.gz	(GenBank flatfile format)
&lt;br&gt;&lt;br&gt;where 'NN' represents an integer file-number within the TSA division.
&lt;br&gt;&lt;br&gt;&amp;nbsp; TSA sequences are shotgun assemblies of primary sequences deposited in
&lt;br&gt;dbEST, the Trace Archive (TA) or the Short-Read Archive (SRA). &amp;nbsp;Keywords
&lt;br&gt;&amp;quot;TSA&amp;quot; and &amp;quot;Transcriptome Shotgun Assembly&amp;quot; are present on all TSA
&lt;br&gt;records, in addition to a division code value of &amp;quot;TSA&amp;quot; on the LOCUS line.
&lt;br&gt;&lt;br&gt;&amp;nbsp; No format changes (new or changed line types, features, or qualifiers)
&lt;br&gt;are anticipated for this new class of GenBank record.
&lt;br&gt;&lt;br&gt;&amp;nbsp; However, note that TSA records make use of the same PRIMARY block that
&lt;br&gt;is utilized for Third-Party Annotation (TPA) records. The PRIMARY block
&lt;br&gt;contains references to the underlying reads/transcripts that were assembled
&lt;br&gt;to construct a TSA record.
&lt;br&gt;&lt;br&gt;&amp;nbsp; It might be helpful to review Third Party Annotation record BK005658,
&lt;br&gt;which provides a good example of PRIMARY block usage:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;a href=&quot;http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&amp;id=83843278&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&amp;id=83843278&lt;/a&gt;&lt;br&gt;&lt;br&gt;Requirements for the new Transcriptome Shotgun Assembly division include:
&lt;br&gt;&lt;br&gt;1. Submission of primary transcript sequence data to dbEST, the Trace Archive,
&lt;br&gt;&amp;nbsp; &amp;nbsp;or the Short-Read Archive (SRA). &amp;nbsp;
&lt;br&gt;&lt;br&gt;2. Registration of an associated transcriptome project with the International
&lt;br&gt;&amp;nbsp; &amp;nbsp;Nucleotide Sequence Database Collaboration (INSDC).
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;For information about submitting projects via NCBI/GenBank, see:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;a href=&quot;http://www.ncbi.nlm.nih.gov/genomes/mpfsubmission.cgi&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.ncbi.nlm.nih.gov/genomes/mpfsubmission.cgi&lt;/a&gt;&lt;br&gt;&lt;br&gt;3. Submission of TSA sequence records to GenBank, including an assembly file
&lt;br&gt;&amp;nbsp; &amp;nbsp;(.ace format).
&lt;br&gt;&lt;br&gt;Note that TSA records and the primary transcript sequences that they are
&lt;br&gt;built from must be provided by the same submitter or collaborative group.
&lt;br&gt;&lt;br&gt;Examples of TSA records and more information about how to submit them
&lt;br&gt;will be provided in future editions of these release notes, and via the
&lt;br&gt;GenBank newsgroup.
&lt;br&gt;&lt;br&gt;1.3.3 Changes in the content of index files
&lt;br&gt;&lt;br&gt;&amp;nbsp; As described in the GB 153 release notes, the 'index' files which accompany
&lt;br&gt;GenBank releases (see Section 3.3) are considered to be a legacy data product by
&lt;br&gt;NCBI, generated mostly for historical reasons. FTP statistics of January 2005
&lt;br&gt;seem to support this: the index files were transferred only half as frequently as
&lt;br&gt;the files of sequence records. The inherent inefficiencies of the index file
&lt;br&gt;format also lead us to suspect that they have little serious use by the user
&lt;br&gt;community, particularly for EST and GSS records.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The software that generated the index file products received little
&lt;br&gt;attention over the years, and finally reached its limitations in
&lt;br&gt;February 2006 (Release 152.0). The required multi-server queries which
&lt;br&gt;obtained and sorted many millions of rows of terms from several different
&lt;br&gt;databases simply outgrew the capacity of the hardware used for GenBank
&lt;br&gt;Release generation.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Our short-term solution is to cease generating some index-file content
&lt;br&gt;for all EST sequence records, and for GSS sequence records that originate
&lt;br&gt;via direct submission to NCBI.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The three gbacc*.idx index files continue to reflect the entirety of the
&lt;br&gt;release, including all EST and GSS records; however, the file contents are
&lt;br&gt;unsorted.
&lt;br&gt;&lt;br&gt;&amp;nbsp; These 'solutions' are really just stop-gaps, and we will likely pursue
&lt;br&gt;one of two options:
&lt;br&gt;&lt;br&gt;a) Cease support of the 'index' file products altogether.
&lt;br&gt;&lt;br&gt;b) Provide new products that present some of the most useful data from
&lt;br&gt;&amp;nbsp; &amp;nbsp;the legacy 'index' files, and cease support for other types of index data.
&lt;br&gt;&lt;br&gt;&amp;nbsp; If you are a user of the 'index' files associated with GenBank releases, we
&lt;br&gt;encourage you to make your wishes known, either via the GenBank newsgroup,
&lt;br&gt;or via email to NCBI's Service Desk:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=16775864&amp;i=1&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;info@...&lt;/a&gt;
&lt;br&gt;&lt;br&gt;&amp;nbsp; Our apologies for any inconvenience that these changes may cause.
&lt;br&gt;&lt;br&gt;1.3.4 GSS File Header Problem
&lt;br&gt;&lt;br&gt;&amp;nbsp; GSS sequences at GenBank are maintained in two different systems, depending
&lt;br&gt;on their origin, and the dumps from those systems occur in parallel. Because
&lt;br&gt;the second dump (for example) has no prior knowledge of exactly how many GSS
&lt;br&gt;files will be dumped by the first, it does not know how to number its own
&lt;br&gt;output files.
&lt;br&gt;&lt;br&gt;&amp;nbsp; There is thus a discrepancy between the filenames and file headers for
&lt;br&gt;fifty-four of the GSS flatfiles in Release 165.0. Consider gbgss232.seq :
&lt;br&gt;&lt;br&gt;GBGSS1.SEQ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Genetic Sequence Data Bank
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;April 15 2008
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; NCBI-GenBank Flat File Release 165.0
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;GSS Sequences (Part 1)
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;87177 loci, &amp;nbsp; &amp;nbsp;64476488 bases, from &amp;nbsp; &amp;nbsp;87177 reported sequences
&lt;br&gt;&lt;br&gt;&amp;nbsp; Here, the header gives both the filename and the part number as &amp;quot;1&amp;quot;, though the file
&lt;br&gt;has been renamed as &amp;quot;232&amp;quot; based on the number of files dumped from the other
&lt;br&gt;system. &amp;nbsp;We will work to resolve this discrepancy in future releases, but the
&lt;br&gt;priority is certainly much lower than many other tasks.
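&lt;br&gt;&lt;br&gt;&amp;nbsp; Users who validate headers against filenames may want to flag this
&lt;br&gt;known discrepancy rather than treat it as a transfer error. A minimal Python
&lt;br&gt;sketch follows; the function name and the regular expression are assumptions
&lt;br&gt;based only on the header example shown above.
&lt;pre&gt;
```python
import os
import re


def header_mismatch(path, first_header_line):
    """Return (name_in_header, name_on_disk) when they differ, else None.

    None is also returned when the header line is not recognized.
    """
    # Assumption: the first header line starts with the upper-case file name,
    # as in the 'GBGSS1.SEQ   Genetic Sequence Data Bank' example above.
    match = re.match(r"\s*(GB[A-Z]+\d*\.SEQ)\b", first_header_line)
    if match is None:
        return None
    in_header = match.group(1).lower()
    on_disk = os.path.basename(path).lower()
    if on_disk.endswith(".gz"):
        on_disk = on_disk[:-3]  # compare against the uncompressed name
    if in_header == on_disk:
        return None
    return in_header, on_disk


print(header_mismatch("gbgss232.seq", "GBGSS1.SEQ   Genetic Sequence Data Bank"))
```
&lt;/pre&gt;
&lt;br&gt;For the gbgss232.seq example, the call reports the pair ('gbgss1.seq', 'gbgss232.seq').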
&lt;br&gt;&lt;br&gt;1.4 Upcoming Changes
&lt;br&gt;&lt;br&gt;1.4.1 Comprehensive protein FASTA file to be discontinued
&lt;br&gt;&lt;br&gt;&amp;nbsp; With the availability of divisional protein FASTA files as of GenBank
&lt;br&gt;Release 164.0, support for the single, large, comprehensive protein FASTA
&lt;br&gt;file:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ftp://ftp.ncbi.nih.gov/genbank/relNNN.fsa_aa.gz
&lt;br&gt;&lt;br&gt;(where 'NNN' represents a three-digit GenBank release number) will be
&lt;br&gt;discontinued after GenBank Release 166.0 in June of 2008. The size
&lt;br&gt;of this file has grown to exceed 4GB, which is unmanageable for many users.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Users are advised to make plans to utilize the new divisional files by
&lt;br&gt;August of 2008. If this timetable poses problems, please let us know at the
&lt;br&gt;NCBI Service Desk:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=16775864&amp;i=2&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;info@...&lt;/a&gt;
&lt;br&gt;&lt;br&gt;_______________________________________________
&lt;br&gt;Genbankb mailing list
&lt;br&gt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=16775864&amp;i=3&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;Genbankb@...&lt;/a&gt;
&lt;br&gt;&lt;a href=&quot;http://www.bio.net/biomail/listinfo/genbankb&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.bio.net/biomail/listinfo/genbankb&lt;/a&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/GenBank-Release-165.0-Now-Aailable-tp16775864p16775864.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-16753325</id>
	<title>GenBank 165.0 Close-of-Data</title>
	<published>2008-04-17T11:51:18Z</published>
	<updated>2008-04-17T11:51:18Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html">Greetings GenBank Users,
&lt;br&gt;&lt;br&gt;Close-of-data for the upcoming GenBank Release 165.0 occurred on
&lt;br&gt;Monday April 14 at approximately 8:00pm EDT.
&lt;br&gt;&lt;br&gt;The subsequently generated GenBank Incremental Update files
&lt;br&gt;nc0415.aso, nc0415.flat, etc. contain data through the close.
&lt;br&gt;&lt;br&gt;Our apologies for the lack of advance notice about the close date.
&lt;br&gt;&lt;br&gt;Mark Cavanaugh
&lt;br&gt;GenBank
&lt;br&gt;NCBI/NLM/NIH/HHS
&lt;br&gt;&lt;br&gt;&lt;br&gt;_______________________________________________
&lt;br&gt;Genbankb mailing list
&lt;br&gt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=16753325&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;Genbankb@...&lt;/a&gt;
&lt;br&gt;&lt;a href=&quot;http://www.bio.net/biomail/listinfo/genbankb&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.bio.net/biomail/listinfo/genbankb&lt;/a&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/GenBank-165.0-Close-of-Data-tp16753325p16753325.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-15560690</id>
	<title>GenBank Release 164.0 Now Available</title>
	<published>2008-02-18T22:17:39Z</published>
	<updated>2008-02-18T22:17:39Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html">Greetings GenBank Users,
&lt;br&gt;&lt;br&gt;&amp;nbsp; GenBank Release 164.0 is now available via FTP from the National
&lt;br&gt;Center for Biotechnology Information (NCBI):
&lt;br&gt;&lt;br&gt;&amp;nbsp; Ftp Site &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Directory &amp;nbsp; Contents
&lt;br&gt;&amp;nbsp; ---------------- &amp;nbsp; --------- &amp;nbsp; ---------------------------------------
&lt;br&gt;&amp;nbsp; ftp.ncbi.nih.gov &amp;nbsp; genbank &amp;nbsp; &amp;nbsp; GenBank Release 164.0 flatfiles
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;ncbi-asn1 &amp;nbsp; ASN.1 data used to create Release 164.0
&lt;br&gt;&lt;br&gt;&amp;nbsp; Close-of-data for GenBank 164.0 occurred on 02/12/2008. Uncompressed, the
&lt;br&gt;Release 164.0 flatfiles require roughly 321 GB (sequence files only)
&lt;br&gt;or 342 GB (including the 'short directory', 'index' and the *.txt files). 
&lt;br&gt;The ASN.1 data require approximately 295 GB.
&lt;br&gt;&lt;br&gt;Recent statistics for non-WGS, non-CON sequences:
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release &amp;nbsp;Date &amp;nbsp; &amp;nbsp; &amp;nbsp; Base Pairs &amp;nbsp; Entries
&lt;br&gt;&lt;br&gt;&amp;nbsp; 163 &amp;nbsp; &amp;nbsp; &amp;nbsp;Dec 2007 &amp;nbsp; 83874179730 &amp;nbsp;80388382
&lt;br&gt;&amp;nbsp; 164 &amp;nbsp; &amp;nbsp; &amp;nbsp;Feb 2008 &amp;nbsp; 85759586764 &amp;nbsp;82853685
&lt;br&gt;&lt;br&gt;Recent statistics for WGS sequences:
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release &amp;nbsp;Date &amp;nbsp; &amp;nbsp; &amp;nbsp; Base Pairs &amp;nbsp; Entries
&lt;br&gt;&lt;br&gt;&amp;nbsp; 163 &amp;nbsp; &amp;nbsp; &amp;nbsp;Dec 2007 &amp;nbsp;106505691578 &amp;nbsp;26177471
&lt;br&gt;&amp;nbsp; 164 &amp;nbsp; &amp;nbsp; &amp;nbsp;Feb 2008 &amp;nbsp;108635736141 &amp;nbsp;27439206
&lt;br&gt;&lt;br&gt;&amp;nbsp; During the 56 days between the close dates for GenBank Releases 163.0 and
&lt;br&gt;164.0, the non-WGS/non-CON portion of GenBank grew by 1,885,407,034 basepairs
&lt;br&gt;and by 2,465,303 sequence records. During that same period, 1,750,703 records
&lt;br&gt;were updated. An average of about 75,286 non-WGS/non-CON records were added
&lt;br&gt;and/or updated per day.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Between releases 163.0 and 164.0, the WGS component of GenBank grew by
&lt;br&gt;2,130,044,563 basepairs and by 1,261,735 sequence records.
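&lt;br&gt;&lt;br&gt;&amp;nbsp; The growth figures above can be re-derived from the two statistics
&lt;br&gt;tables. The short Python sketch below does so; the variable and function names
&lt;br&gt;are illustrative only, and the numbers are copied from this announcement.
&lt;pre&gt;
```python
# Figures copied from the statistics tables and the paragraphs above:
# release number mapped to (base pairs, entries).
non_wgs = {163: (83874179730, 80388382), 164: (85759586764, 82853685)}
wgs = {163: (106505691578, 26177471), 164: (108635736141, 27439206)}
updated_records = 1750703
days_between_closes = 56


def growth(table, old, new):
    """Return (base pairs added, records added) between two releases."""
    return table[new][0] - table[old][0], table[new][1] - table[old][1]


bases_added, records_added = growth(non_wgs, 163, 164)
# Daily average counts both newly added and updated records.
per_day = round((records_added + updated_records) / days_between_closes)

print(bases_added, records_added, per_day)   # 1885407034 2465303 75286
print(growth(wgs, 163, 164))                 # (2130044563, 1261735)
```
&lt;/pre&gt;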
&lt;br&gt;&lt;br&gt;&amp;nbsp; For additional release information, see the README files in either of
&lt;br&gt;the directories mentioned above, and the release notes (gbrel.txt) in
&lt;br&gt;the genbank directory. Sections 1.3 and 1.4 of the release notes
&lt;br&gt;(Changes in Release 164.0 and Upcoming Changes) have been appended
&lt;br&gt;below.
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ** Important Notes **
&lt;br&gt;&lt;br&gt;&amp;nbsp; A new GenBank division will become legal as of Release 165.0 in April 2008:
&lt;br&gt;the Transcriptome Shotgun Assembly, or TSA, division. Please see Section 1.4.1
&lt;br&gt;of the release notes for more information about TSA and the records that it 
&lt;br&gt;will contain.
&lt;br&gt;&lt;br&gt;&amp;nbsp; GenBank 'index' files are now provided without any EST content, and without
&lt;br&gt;most GSS content. See Section 1.3.3 of the release notes for further details.
&lt;br&gt;NCBI is considering ceasing support for the index files, so we encourage
&lt;br&gt;affected users to review that section and provide feedback.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Release 164.0 data, and subsequent updates, are available now via
&lt;br&gt;NCBI's Entrez and Blast services.
&lt;br&gt;&lt;br&gt;&amp;nbsp; As a general guideline, we suggest first transferring the GenBank release
&lt;br&gt;notes (gbrel.txt) whenever a release is being obtained. Check to make sure
&lt;br&gt;that the date and release number in the header of the release notes are
&lt;br&gt;current (e.g., February 15 2008, 164.0). If they are not, interrupt the
&lt;br&gt;remaining transfers and then request assistance from the NCBI Service Desk.
&lt;br&gt;&lt;br&gt;&amp;nbsp; A comprehensive check of the headers of all release files after your
&lt;br&gt;transfers are complete is also suggested. Here's how one might go about
&lt;br&gt;this on a unix platform, using csh/tcsh :
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; set files = `ls gb*.*`
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; foreach i ($files)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; head -10 $i | grep Release
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; end
&lt;br&gt;&lt;br&gt;Or, if the files are compressed, perhaps:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; gzcat $i | head -10 | grep Release
&lt;br&gt;&lt;br&gt;&amp;nbsp; If you encounter problems while ftp'ing or uncompressing Release
&lt;br&gt;164.0, please send email outlining your difficulties to:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=15560690&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;info@...&lt;/a&gt;
&lt;br&gt;&lt;br&gt;Mark Cavanaugh, Vladimir Alekseyev, Michael Kimelman
&lt;br&gt;GenBank
&lt;br&gt;NCBI/NLM/NIH/HHS
&lt;br&gt;&lt;br&gt;&lt;br&gt;1.3 Important Changes in Release 164.0
&lt;br&gt;&lt;br&gt;1.3.1 Organizational changes
&lt;br&gt;&lt;br&gt;&amp;nbsp; The total number of sequence data files increased by 10 with this release:
&lt;br&gt;&lt;br&gt;&amp;nbsp; - the CON division is now comprised of &amp;nbsp;85 files (+1)
&lt;br&gt;&amp;nbsp; - the ENV division is now comprised of &amp;nbsp; 8 files (+1)
&lt;br&gt;&amp;nbsp; - the HTC division is now comprised of &amp;nbsp;13 files (+1)
&lt;br&gt;&amp;nbsp; - the EST division is now comprised of 694 files (+19)
&lt;br&gt;&amp;nbsp; - the GSS division is now comprised of 277 files (-18) &amp;nbsp; (see Note below)
&lt;br&gt;&amp;nbsp; - the HTG division is now comprised of 107 files (+2)
&lt;br&gt;&amp;nbsp; - the INV division is now comprised of &amp;nbsp;12 files (+1)
&lt;br&gt;&amp;nbsp; - the PAT division is now comprised of &amp;nbsp;35 files (+1)
&lt;br&gt;&amp;nbsp; - the PLN division is now comprised of &amp;nbsp;28 files (+1)
&lt;br&gt;&amp;nbsp; - the PRI division is now comprised of &amp;nbsp;35 files (+1)
&lt;br&gt;&lt;br&gt;&amp;nbsp; The total number of index files increased by 2 with this release:
&lt;br&gt;&lt;br&gt;&amp;nbsp; - the AUT (AUTHOR Name) index is now comprised of 50 files (+2)
&lt;br&gt;&lt;br&gt;&amp;nbsp; NOTE:
&lt;br&gt;&lt;br&gt;&amp;nbsp; A configuration setting that determines the average size for many
&lt;br&gt;&amp;nbsp; of the GSS division GenBank flatfiles was mistakenly changed for
&lt;br&gt;&amp;nbsp; Release 163.0. This resulted in a filesize decrease for a large
&lt;br&gt;&amp;nbsp; number of GSS flatfiles, from 230 MB to 210 MB. Consequently, the
&lt;br&gt;&amp;nbsp; total number of GSS flatfiles underwent an artificially-large
&lt;br&gt;&amp;nbsp; increase of 26 files.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The configuration setting was restored for GenBank Release 164.0.
&lt;br&gt;&amp;nbsp; As a result, there is now an apparent net decrease of 18 GSS files.
&lt;br&gt;&amp;nbsp; Our apologies for any confusion that this may have caused.
&lt;br&gt;&amp;nbsp; &amp;nbsp;
&lt;br&gt;1.3.2 Divisional protein FASTA files now available
&lt;br&gt;&lt;br&gt;&amp;nbsp; Individual protein FASTA data files are now being made available for
&lt;br&gt;GenBank releases, in the ASN.1 area of the NCBI FTP site:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ftp://ftp.ncbi.nih.gov/ncbi-asn1/protein_fasta
&lt;br&gt;&lt;br&gt;Each protein FASTA file reflects the protein data content of the ASN.1
&lt;br&gt;data file bearing the same division code in its name. For example, these
&lt;br&gt;two &amp;quot;pri12&amp;quot; divisional files:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; gbpri12.aso.gz
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; gbpri12.fsa_aa.gz
&lt;br&gt;&lt;br&gt;are 'equivalent', in that the proteins annotated on the DNA sequences
&lt;br&gt;of gbpri12.aso are all present in gbpri12.fsa_aa.gz . For further
&lt;br&gt;information, please see this README:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ftp://ftp.ncbi.nih.gov/ncbi-asn1/protein_fasta/README.protein_fasta
&lt;br&gt;&lt;br&gt;&amp;nbsp; These divisional files are a replacement (see Section 1.4.2) for the single
&lt;br&gt;protein FASTA file that has been provided in conjunction with GenBank releases:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ftp://ftp.ncbi.nih.gov/genbank/relNNN.fsa_aa.gz
&lt;br&gt;&lt;br&gt;where 'NNN' represents a three-digit GenBank release number.
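&lt;br&gt;&lt;br&gt;&amp;nbsp; Scripts that already iterate over the ASN.1 files can derive the name
&lt;br&gt;of the matching protein FASTA file from the pairing described above. A small
&lt;br&gt;Python sketch follows; the function name and BASE_URL constant are
&lt;br&gt;hypothetical, and the '.aso.gz' to '.fsa_aa.gz' rule is inferred from the
&lt;br&gt;&amp;quot;pri12&amp;quot; example rather than from the README.
&lt;pre&gt;
```python
def protein_fasta_name(asn1_name):
    """Derive the divisional protein FASTA file name from an ASN.1 file name."""
    suffix = ".aso.gz"
    if not asn1_name.endswith(suffix):
        raise ValueError("expected a name ending in " + suffix)
    # Swap the ASN.1 suffix for the protein FASTA suffix.
    return asn1_name[: -len(suffix)] + ".fsa_aa.gz"


BASE_URL = "ftp://ftp.ncbi.nih.gov/ncbi-asn1/protein_fasta/"
print(BASE_URL + protein_fasta_name("gbpri12.aso.gz"))
```
&lt;/pre&gt;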
&lt;br&gt;&lt;br&gt;1.3.3 Changes in the content of index files
&lt;br&gt;&lt;br&gt;&amp;nbsp; As described in the GB 153 release notes, the 'index' files which accompany
&lt;br&gt;GenBank releases (see Section 3.3) are considered to be a legacy data product by
&lt;br&gt;NCBI, generated mostly for historical reasons. FTP statistics of January 2005
&lt;br&gt;seem to support this: the index files were transferred only half as frequently as
&lt;br&gt;the files of sequence records. The inherent inefficiencies of the index file
&lt;br&gt;format also lead us to suspect that they have little serious use by the user
&lt;br&gt;community, particularly for EST and GSS records.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The software that generated the index file products received little
&lt;br&gt;attention over the years, and finally reached its limitations in
&lt;br&gt;February 2006 (Release 152.0). The required multi-server queries which
&lt;br&gt;obtained and sorted many millions of rows of terms from several different
&lt;br&gt;databases simply outgrew the capacity of the hardware used for GenBank
&lt;br&gt;Release generation.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Our short-term solution is to cease generating some index-file content
&lt;br&gt;for all EST sequence records, and for GSS sequence records that originate
&lt;br&gt;via direct submission to NCBI.
&lt;br&gt;&lt;br&gt;&amp;nbsp; The three gbacc*.idx index files continue to reflect the entirety of the
&lt;br&gt;release, including all EST and GSS records; however, the file contents are
&lt;br&gt;unsorted.
&lt;br&gt;&lt;br&gt;&amp;nbsp; These 'solutions' are really just stop-gaps, and we will likely pursue
&lt;br&gt;one of two options:
&lt;br&gt;&lt;br&gt;a) Cease support of the 'index' file products altogether.
&lt;br&gt;&lt;br&gt;b) Provide new products that present some of the most useful data from
&lt;br&gt;&amp;nbsp; &amp;nbsp;the legacy 'index' files, and cease support for other types of index data.
&lt;br&gt;&lt;br&gt;&amp;nbsp; If you are a user of the 'index' files associated with GenBank releases, we
&lt;br&gt;encourage you to make your wishes known, either via the GenBank newsgroup,
&lt;br&gt;or via email to NCBI's Service Desk:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=15560690&amp;i=1&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;info@...&lt;/a&gt;
&lt;br&gt;&lt;br&gt;&amp;nbsp; Our apologies for any inconvenience that these changes may cause.
&lt;br&gt;&lt;br&gt;1.3.4 GSS File Header Problem
&lt;br&gt;&lt;br&gt;&amp;nbsp; GSS sequences at GenBank are maintained in two different systems, depending
&lt;br&gt;on their origin, and the dumps from those systems occur in parallel. Because
&lt;br&gt;the second dump (for example) has no prior knowledge of exactly how many GSS
&lt;br&gt;files will be dumped by the first, it does not know how to number its own
&lt;br&gt;output files.
&lt;br&gt;&lt;br&gt;&amp;nbsp; There is thus a discrepancy between the filenames and file headers for
&lt;br&gt;fifty-one of the GSS flatfiles in Release 164.0. Consider gbgss227.seq :
&lt;br&gt;&lt;br&gt;&lt;br&gt;GBGSS1.SEQ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Genetic Sequence Data Bank
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; February 15 2008
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; NCBI-GenBank Flat File Release 164.0
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;GSS Sequences (Part 1)
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;86927 loci, &amp;nbsp; &amp;nbsp;64285938 bases, from &amp;nbsp; &amp;nbsp;86927 reported sequences
&lt;br&gt;&lt;br&gt;&amp;nbsp; Here, the header gives both the filename and the part number as &amp;quot;1&amp;quot;, though the file
&lt;br&gt;has been renamed as &amp;quot;227&amp;quot; based on the number of files dumped from the other
&lt;br&gt;system. &amp;nbsp;We will work to resolve this discrepancy in future releases, but the
&lt;br&gt;priority is certainly much lower than many other tasks.
&lt;br&gt;&lt;br&gt;1.4 Upcoming Changes
&lt;br&gt;&lt;br&gt;1.4.1 New GenBank TSA division: Transcriptome Shotgun Assembly
&lt;br&gt;&lt;br&gt;&amp;nbsp; A new GenBank division for assembled mRNA sequences, Transcriptome Shotgun
&lt;br&gt;Assembly (TSA), will be included in GenBank releases on or after Release 165.0
&lt;br&gt;in April of 2008.
&lt;br&gt;&lt;br&gt;Files in this new division will have filenames of:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; gbtsaNN.aso.gz &amp;nbsp;(ASN.1 format)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; gbtsaNN.seq.gz &amp;nbsp;(GenBank flatfile format)
&lt;br&gt;&lt;br&gt;where 'NN' represents an integer file-number within the TSA division.
&lt;br&gt;&lt;br&gt;&amp;nbsp; TSA sequences are shotgun assemblies of primary sequences deposited in
&lt;br&gt;dbEST, the Trace Archive (TA) or the Short-Read Archive (SRA). &amp;nbsp;Keywords
&lt;br&gt;&amp;quot;TSA&amp;quot; and &amp;quot;Transcriptome Shotgun Assembly&amp;quot; will be present for all TSA
&lt;br&gt;records, in addition to a division code value of &amp;quot;TSA&amp;quot; on the LOCUS line.
&lt;br&gt;&lt;br&gt;&amp;nbsp; No format changes (new or changed line types, features, or qualifiers)
&lt;br&gt;are anticipated for this new class of GenBank record.
&lt;br&gt;&lt;br&gt;&amp;nbsp; TSA records make use of the same PRIMARY block that is utilized for
&lt;br&gt;Third-Party Annotation (TPA) records. The PRIMARY block will contain
&lt;br&gt;references to the underlying reads/transcripts that were assembled to
&lt;br&gt;construct the TSA record.
&lt;br&gt;&lt;br&gt;&amp;nbsp; It might be helpful to review Third Party Annotation record BK005658,
&lt;br&gt;which provides a good example of PRIMARY block usage:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;a href=&quot;http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&amp;id=83843278&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&amp;id=83843278&lt;/a&gt;&lt;br&gt;&lt;br&gt;Requirements for the new Transcriptome Shotgun Assembly division include:
&lt;br&gt;&lt;br&gt;1. Submission of primary transcript sequence data to dbEST, the Trace Archive,
&lt;br&gt;&amp;nbsp; &amp;nbsp;or the Short-Read Archive (SRA). &amp;nbsp;
&lt;br&gt;&lt;br&gt;2. Registration of an associated transcriptome project with the International
&lt;br&gt;&amp;nbsp; &amp;nbsp;Nucleotide Sequence Database Collaboration (INSDC).
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp;For information about submitting projects via NCBI/GenBank, see:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;a href=&quot;http://www.ncbi.nlm.nih.gov/genomes/mpfsubmission.cgi&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.ncbi.nlm.nih.gov/genomes/mpfsubmission.cgi&lt;/a&gt;&lt;br&gt;&lt;br&gt;3. Submission of TSA sequence records to GenBank, including an assembly file
&lt;br&gt;&amp;nbsp; &amp;nbsp;(.ace format).
&lt;br&gt;&lt;br&gt;Note that TSA records and the primary transcript sequences that they are
&lt;br&gt;built from must be provided by the same submitter or collaborative group.
&lt;br&gt;&lt;br&gt;Examples of TSA records and more information about how to submit them
&lt;br&gt;will be provided in future editions of these release notes, and via the
&lt;br&gt;GenBank newsgroup.
&lt;br&gt;&lt;br&gt;1.4.2 Comprehensive protein FASTA file to be discontinued
&lt;br&gt;&lt;br&gt;&amp;nbsp; With the availability of divisional protein FASTA files as of GenBank
&lt;br&gt;Release 164.0, support for the single, large, comprehensive protein FASTA
&lt;br&gt;file:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ftp://ftp.ncbi.nih.gov/genbank/relNNN.fsa_aa.gz
&lt;br&gt;&lt;br&gt;(where 'NNN' represents a three-digit GenBank release number) will be
&lt;br&gt;discontinued after GenBank Release 166.0 in June of 2008. The size
&lt;br&gt;of this file has grown to exceed 4GB, which is unmanageable for many users.
&lt;br&gt;&lt;br&gt;&amp;nbsp; Users are advised to make plans to utilize the new divisional files by
&lt;br&gt;August of 2008. If this timetable poses problems, please let us know at the
&lt;br&gt;NCBI Service Desk:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=15560690&amp;i=2&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;info@...&lt;/a&gt;
&lt;br&gt;&lt;br&gt;_______________________________________________
&lt;br&gt;Genbankb mailing list
&lt;br&gt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=15560690&amp;i=3&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;Genbankb@...&lt;/a&gt;
&lt;br&gt;&lt;a href=&quot;http://www.bio.net/biomail/listinfo/genbankb&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.bio.net/biomail/listinfo/genbankb&lt;/a&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/GenBank-Release-164.0-Now-Aailable-tp15560690p15560690.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-15506336</id>
	<title>Protein FASTA files for GenBank releases : new location and file convention</title>
	<published>2008-02-15T09:34:02Z</published>
	<updated>2008-02-15T09:34:02Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html">Prior to February 2008, a FASTA product for protein sequences from
&lt;br&gt;coding regions annotated on the DNA sequences in GenBank was
&lt;br&gt;provided as a single large file:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ftp://ftp.ncbi.nih.gov/genbank/relNNN.fsa_aa.gz
&lt;br&gt;&lt;br&gt;where 'NNN' represents a 3-digit GenBank release number.
&lt;br&gt;&lt;br&gt;The uncompressed size of this file has grown to exceed 4GB,
&lt;br&gt;which is unmanageable for many users. So as of GenBank Release
&lt;br&gt;164.0, individual protein FASTA files will be provided on
&lt;br&gt;a per-division basis, in a new subdirectory of the NCBI FTP site:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ftp://ftp.ncbi.nih.gov/ncbi-asn1/protein_fasta
&lt;br&gt;&lt;br&gt;One such file would be named:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; gbpri1.fsa_aa.gz
&lt;br&gt;&lt;br&gt;Further information will be available via a README file in the
&lt;br&gt;new ncbi-asn1/protein_fasta directory, when the Release 164.0
&lt;br&gt;files are installed (possibly by Saturday February 16).
&lt;br&gt;&lt;br&gt;Note that the location of the protein FASTA files is within the
&lt;br&gt;/ncbi-asn1 area, not the /genbank area. Since the protein FASTA
&lt;br&gt;files have a 1-to-1 correspondence with NCBI's ASN.1 files, this
&lt;br&gt;is a more natural location for them.
&lt;br&gt;&lt;br&gt;[In fact, the quality-score data files currently located
&lt;br&gt;&amp;nbsp;in /genbank/quality_scores would *also* be located more 
&lt;br&gt;&amp;nbsp;naturally in the /ncbi-asn1 area. They may be relocated
&lt;br&gt;&amp;nbsp;at a future date.]
&lt;br&gt;&lt;br&gt;The old single-file protein FASTA product will be supported
&lt;br&gt;for two more GenBank releases, through Release 166.0 in June
&lt;br&gt;of 2008. But after that release, the relNNN.fsa_aa.gz file
&lt;br&gt;will be discontinued.
&lt;br&gt;&lt;br&gt;Mark Cavanaugh
&lt;br&gt;GenBank
&lt;br&gt;NCBI/NLM/NIH/HHS
&lt;br&gt;&lt;br&gt;_______________________________________________
&lt;br&gt;Genbankb mailing list
&lt;br&gt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=15506336&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;Genbankb@...&lt;/a&gt;
&lt;br&gt;&lt;a href=&quot;http://www.bio.net/biomail/listinfo/genbankb&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.bio.net/biomail/listinfo/genbankb&lt;/a&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Protein-FASTA-files-for-GenBank-releases-%3A-new-location-and-file-convention-tp15506336p15506336.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-15484488</id>
	<title>GenBank 164.0 Close-of-Data</title>
	<published>2008-02-14T08:56:36Z</published>
	<updated>2008-02-14T08:56:36Z</updated>
	<author>
		<name>Cavanaugh, Mark (NIH/NLM/NCBI) [E]</name>
	</author>
	<content type="html">Greetings GenBank Users,
&lt;br&gt;&lt;br&gt;Close-of-data for the upcoming GenBank Release 164.0 occurred on
&lt;br&gt;Tuesday February 12 at approximately 1:30am EST.
&lt;br&gt;&lt;br&gt;The subsequently generated GenBank Incremental Update files
&lt;br&gt;nc0212.aso, nc0212.flat, etc. contain data through the close.
&lt;br&gt;&lt;br&gt;Note: Release processing often does not begin until sometime during
&lt;br&gt;business hours on the close date. As a result, a number of sequence
&lt;br&gt;records processed *after* 1:30am are likely to be present in the
&lt;br&gt;GenBank 164.0 release files, even though they are &amp;quot;post-close&amp;quot;.
&lt;br&gt;&lt;br&gt;Similarly, the first GenBank Incremental Update that is generated
&lt;br&gt;after the close date is likely to contain a number of sequence
&lt;br&gt;records that are unchanged, compared to their appearance in the
&lt;br&gt;release files.
&lt;br&gt;&lt;br&gt;Our apologies for the lack of advance notice about the close date.
&lt;br&gt;&lt;br&gt;Mark Cavanaugh
&lt;br&gt;GenBank
&lt;br&gt;NCBI/NLM/NIH/HHS
&lt;br&gt;&lt;br&gt;_______________________________________________
&lt;br&gt;Genbankb mailing list
&lt;br&gt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=15484488&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;Genbankb@...&lt;/a&gt;
&lt;br&gt;&lt;a href=&quot;http://www.bio.net/biomail/listinfo/genbankb&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.bio.net/biomail/listinfo/genbankb&lt;/a&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/GenBank-164.0-Close-of-Data-tp15484488p15484488.html" />
</entry>

</feed>
