Spider dragline silk is a proteinaceous fiber whose impressive physical properties make it attractive for use in advanced materials. The fiber is composed of two proteins (the spidroins MaSp1 and MaSp2), each of which contains a large central repeat array flanked by non-repetitive N- and C-terminal domains. The repeat arrays appear to be largely responsible for the fiber's tensile properties, suggesting that the N- and C-terminal domains may instead be involved in self-assembly. We recently isolated the MaSp1 and MaSp2 N-terminal domains from Nephila clavipes and have incorporated them into mini-silk genes for expression in transgenic systems. Current efforts focus on developing expression vectors that carry a removable affinity tag, allowing scalable protein purification.
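One way to picture a mini-silk construct is as the spidroin domain architecture (N-terminal domain, a repeat array with fewer units, C-terminal domain) with a removable affinity tag in front. The sketch below models that layout at the protein level and simulates tag removal. It is illustrative only: the segment strings are placeholders rather than real MaSp1/MaSp2 sequences, the repeat count is arbitrary, and the choice of a hexahistidine (His6) tag followed by a TEV protease site is our assumption, since the abstract does not name the tag or the cleavage strategy.

```python
from dataclasses import dataclass

# ASSUMPTION: His6 tag plus TEV protease site; the source does not specify the tag.
HIS_TAG = "HHHHHH"    # hexahistidine affinity tag
TEV_SITE = "ENLYFQG"  # TEV recognition site; the protease cuts between Q and G


@dataclass
class MiniSpidroin:
    """Hypothetical mini-spidroin; all sequence fields are PLACEHOLDERS."""
    n_term: str     # N-terminal domain (placeholder, not a real MaSp sequence)
    repeat: str     # one unit of the central repeat array (placeholder)
    n_repeats: int  # number of repeat units kept in the mini-silk construct
    c_term: str     # C-terminal domain (placeholder)

    def core(self) -> str:
        """Untagged protein: NTD + repeat array + CTD."""
        return self.n_term + self.repeat * self.n_repeats + self.c_term

    def tagged(self) -> str:
        """Core protein preceded by the His6 tag and a TEV cleavage site."""
        return HIS_TAG + TEV_SITE + self.core()


def tev_cleave(seq: str) -> str:
    """Simulate TEV cleavage at the first TEV site and return the released protein.

    The scission falls after the 'Q', so the 'G' of the site stays attached
    to the released protein.
    """
    idx = seq.find(TEV_SITE)
    if idx == -1:
        raise ValueError("no TEV site found")
    return seq[idx + len(TEV_SITE) - 1:]


if __name__ == "__main__":
    construct = MiniSpidroin(n_term="NTD", repeat="RPT", n_repeats=4, c_term="CTD")
    print(construct.tagged())              # HHHHHHENLYFQGNTDRPTRPTRPTRPTCTD
    print(tev_cleave(construct.tagged()))  # GNTDRPTRPTRPTRPTCTD
```

After cleavage a single glycine from the recognition site remains at the N-terminus of the released protein, a property of this protease that would need to be considered if such a tag were used with a real construct.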
Cot-based sequence discovery is a powerful means by which both low-copy and repetitive sequences can be selectively and efficiently fractionated, cloned, and characterized. Based on the results of a Cot analysis, hydroxyapatite chromatography was used to fractionate sorghum (Sorghum bicolor) genomic DNA into highly repetitive (HR), moderately repetitive (MR), and single/low-copy (SL) sequence components, which were subsequently cloned to produce HRCot, MRCot, and SLCot genomic libraries. Filter hybridization (blotting) and sequence analysis both show that the HRCot library is enriched in sequences typically present in high copy number (e.g., retroelements, rDNA, centromeric repeats), the SLCot library is enriched in low-copy sequences (e.g., genes and "nonrepetitive ESTs"), and the MRCot library contains sequences of moderate redundancy. The Cot analysis suggests that the sorghum genome is approximately 700 Mb (in agreement with previous estimates) and that the HR, MR, and SL components comprise 15%, 41%, and 24% of sorghum DNA, respectively. Unlike previously described techniques for sequencing the low-copy components of genomes, sequencing of Cot components is independent of expression and methylation patterns, which vary widely among DNA elements, developmental stages, and taxa. High-throughput sequencing of Cot clones may be a means of "capturing" the sequence complexity of eukaryotic genomes with unprecedented efficiency.
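Cot is the product of the initial concentration of single-stranded DNA (C0, in moles of nucleotide per liter) and the reassociation time (t, in seconds). Under idealized second-order kinetics, the fraction still single-stranded is C/C0 = 1 / (1 + k·C0·t), so a component is half reassociated at Cot1/2 = 1/k, and sequences present in more copies reassociate at lower Cot. The sketch below uses that idealized equation to show why sequential hydroxyapatite fractionation at increasing Cot cutoffs separates repetitive from low-copy DNA. The component fractions, Cot1/2 values, and cutoffs are hypothetical numbers chosen for illustration, not the sorghum values, and the model ignores real-world deviations from ideal kinetics.

```python
from typing import Dict, List, Tuple


def fraction_single_stranded(cot: float, cot_half: float) -> float:
    """Idealized second-order kinetics: C/C0 = 1 / (1 + Cot / Cot1/2)."""
    return 1.0 / (1.0 + cot / cot_half)


def fractionate(components: Dict[str, Tuple[float, float]],
                cutoffs: List[float]) -> List[Dict[str, float]]:
    """Split total DNA into Cot windows.

    components maps a name to (fraction of genome, Cot1/2).
    Entry i of the result gives, per component, the share of total DNA that
    becomes double-stranded (and would bind hydroxyapatite) between the
    previous cutoff and cutoffs[i]. The final entry is DNA still
    single-stranded after the highest cutoff.
    """
    windows = []
    prev = 0.0
    for cutoff in cutoffs:
        windows.append({
            name: frac * (fraction_single_stranded(prev, ch)
                          - fraction_single_stranded(cutoff, ch))
            for name, (frac, ch) in components.items()
        })
        prev = cutoff
    # Residue: never reassociated within the simulated Cot range.
    windows.append({name: frac * fraction_single_stranded(prev, ch)
                    for name, (frac, ch) in components.items()})
    return windows


# HYPOTHETICAL three-component genome: (fraction of genome, Cot1/2 in M*s).
GENOME = {
    "high-copy": (0.2, 0.01),
    "moderate-copy": (0.4, 10.0),
    "low-copy": (0.4, 1000.0),
}
CUTOFFS = [1.0, 100.0]  # hypothetical Cot cutoffs in M*s

if __name__ == "__main__":
    for i, window in enumerate(fractionate(GENOME, CUTOFFS)):
        shares = ", ".join(f"{k}={v:.3f}" for k, v in window.items())
        print(f"fraction {i}: {shares}")
```

In this toy model the first window is dominated by the high-copy component, the second by the moderate-copy component, and the single-stranded residue by the low-copy component, which mirrors the HR/MR/SL logic described above.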