CROSS-REFERENCE TO RELATED APPLICATION(S)
This patent application claims the benefit of priority of U.S. application Ser. No. 61/219,610, filed Jun. 23, 2009, which application is herein incorporated by reference.
STATEMENT OF GOVERNMENT SUPPORT
This invention was made with government support under Grant #U54 GM074958 awarded by the National Institute of General Medical Sciences, Protein Structure Initiative program. The government has certain rights in the invention.
The NAD(P)H-dependent carbonyl reductases catalyze reduction of a variety of endogenous and xenobiotic carbonyl compounds, including biologically and pharmacologically active substrates (Forrest et al., Chem. Biol. Interact., 129, 21-40 (2000)). There is considerable interest in the use of carbonyl reductases in the pharmaceutical and fine chemicals industries for the production of chiral alcohols, which are important building blocks for the synthesis of chirally-pure compounds, e.g., pharmaceutical agents (Panke et al., Curr. Opin. Biotechnol., 15, 272-279 (2004); Schmid et al., Nature, 409, 258-268 (2001); and Schoemaker et al., Science, 299, 1694-1697 (2003)). For the production of such chiral auxiliaries from their corresponding prochiral ketones, carbonyl reductases have advantages over chemo-catalysts in terms of their high chemo-, enantio-, and regioselectivities. These features make stereospecific carbonyl reductases very useful from both scientific and industrial perspectives (Kroutil et al., Curr. Opin. Chem. Biol., 8, 120-126 (2004)). However, the range of current applications for stereospecific carbonyl reductases remains modest. This can be attributed to several limitations, including the stereospecificity and availability of enzymes. In addition, research on molecular mechanisms of oxidoreductases is still in its infancy. Further, most enzymes that can catalyze asymmetric reductions generally follow Prelog's rule in terms of stereochemical outcomes (Bradshaw et al., J. Org. Chem., 57, 1526-1532 (1992); Ernst et al., Appl. Microbiol. Biotechnol., 66, 629-634 (2005); Niefind et al., J. Mol. Biol., 327, 317-328 (2003); Prelog, Pure Appl. Chem., 9, 119-130 (1964)). Enzymes with anti-Prelog stereospecificity are quite rare, and only a few have been isolated and characterized in purified form (De Wildeman et al., Acc. Chem. Res., 40, 1260-1266 (2007)). Accordingly, stereospecific carbonyl reductases are needed.
In particular, stereospecific carbonyl reductases with anti-Prelog stereospecificity are needed.
SUMMARY OF CERTAIN EMBODIMENTS OF THE INVENTION
Accordingly, as described herein, three stereospecific carbonyl reductase genes (scr1, scr2, and scr3) from C. parapsilosis have been discovered. These genes have been cloned and expressed, and the encoded proteins purified to homogeneity and confirmed to function as stereospecific carbonyl reductases (SCR1, SCR2, and SCR3). These stereospecific carbonyl reductases have anti-Prelog selectivity and convert 2-hydroxyacetophenone to (S)-1-phenyl-1,2-ethanediol (PED). These oxidoreductases have specificities that are useful for fine biochemical synthesis.
Application of biocatalysis in the synthesis of chiral molecules is one of the greenest technologies for the replacement of chemical routes. This is due to the environmentally benign reaction conditions of biocatalysis and its unparalleled chemo-, regio- and stereoselectivities. The newly identified stereospecific carbonyl reductases (SCRs) showed high catalytic activities for producing (S)-1-phenyl-1,2-ethanediol (PED) from 2-hydroxyacetophenone with NADPH as the coenzyme. The enzymes from this cluster are carbonyl reductases with novel anti-Prelog stereoselectivity. Of the enzymes encoded in the gene cluster, SCR1 and SCR3 exhibited distinct specificities for acetophenone derivatives and chloro-substituted 2-hydroxyacetophenones, and especially high activities towards ethyl 4-chloro-3-oxobutyrate, which affords ethyl 4-chloro-3-hydroxybutyrate, a precursor of the chiral side chain in the synthesis of atorvastatin (Lipitor®) and rosuvastatin, e.g., rosuvastatin calcium (Crestor®).
BRIEF DESCRIPTION OF THE FIGURES
FIG. 1. Map of contig005802 of Candida parapsilosis genome including the four open reading frames, scr1, scr2, scr3, and cpadh.
FIG. 2. Amino acid sequence alignment of CPADH (GenBank accession number DQ675534; SEQ ID NO:1), SCR1 (GenBank accession number FJ939565; SEQ ID NO:4), SCR2 (GenBank accession number FJ939563; SEQ ID NO:3), and SCR3 (GenBank accession number FJ939564; SEQ ID NO:2) from C. parapsilosis. Gaps in the aligned sequences are indicated by dashes. Identical amino acid residues are enclosed in boxes. The conserved sequences of the cofactor-binding motif Gly-X-X-X-Gly-X-Gly (SEQ ID NO:9) and the catalytic tetrad of Asn-Ser-Tyr-Lys (SEQ ID NO:10) in the majority of SDRs are marked with arrows.
FIG. 3. Analysis of the overexpression of SCR1, SCR2, and SCR3. The proteins were separated on a 12% SDS-polyacrylamide gel and stained with Coomassie Brilliant Blue G-250. Lane 1, total protein for SCR1; Lane 2, soluble fraction for SCR1; Lane 3, total protein for SCR2; Lane 4, soluble fraction for SCR2; Lane 5, total protein for SCR3; Lane 6, soluble fraction for SCR3; Lane 7, molecular mass standard.
FIG. 4. SDS-PAGE analysis of purified enzymes. The purified proteins were resolved by SDS-PAGE on a 12% polyacrylamide gel and stained with Coomassie Brilliant Blue G-250. Lane 1, molecular mass standard; Lane 2, purified SCR1; Lane 3, purified SCR2; Lane 4, purified SCR3.
FIG. 5. pH dependence of SCR1, SCR2, and SCR3 catalyzing 2-hydroxyacetophenone reduction. The enzyme activities of SCR1 (squares), SCR2 (triangles), and SCR3 (circles) were measured in 0.1 M acetate buffer (pH 4.0 to 6.0) or 0.1 M sodium phosphate buffer (pH 6.0 to 8.0) or 0.1 M Tris-HCl buffer (pH 8.0 to 8.5) with 2-hydroxyacetophenone as the substrate and NADPH as the cofactor. Maximal enzyme activity observed was set as 100% relative activity for each enzyme.
FIG. 6A-6E. Asymmetric reduction of 2-hydroxyacetophenone (2-HAP) to 1-phenyl-1,2-ethanediol (PED) enantiomer by SCR1, SCR2, and SCR3, respectively. (6A) Standard sample of (R)-PED. (6B) Standard sample of (S)-PED. (6C) SCR1 catalyzed asymmetric reduction of 2-HAP. (6D) SCR2 catalyzed asymmetric reduction of 2-HAP. (6E) SCR3 catalyzed asymmetric reduction of 2-HAP.
FIG. 7A-7D. Substrate specificity of SCR1 and SCR3. The enzyme activities of SCR1 (open bars) and SCR3 (shaded bars) (7A) to various substrates (7B-7D) were measured as described herein. Maximal enzyme activity observed was set as 100% relative activity for the enzymes to various substrates.
Certain embodiments of the present invention provide a purified polypeptide, the sequence of which comprises an amino acid sequence that has at least 70% identity to a Candida parapsilosis stereospecific carbonyl reductase, wherein the polypeptide has carbonyl reductase activity and does not comprise SEQ ID NO:1.
In certain embodiments, the amino acid sequence has at least 70% identity to at least one of the Candida parapsilosis stereospecific carbonyl reductases represented by SEQ ID NO:2, SEQ ID NO:3 or SEQ ID NO:4.
In certain embodiments, the amino acid sequence has at least 70% identity to SEQ ID NO:2.
In certain embodiments, the amino acid sequence has at least 70% identity to SEQ ID NO:3.
In certain embodiments, the amino acid sequence has at least 70% identity to SEQ ID NO:4.
In certain embodiments, the amino acid sequence has at least 75% identity to the Candida parapsilosis stereospecific carbonyl reductase (e.g., to at least one of SEQ ID NO:2, SEQ ID NO:3 or SEQ ID NO:4).
In certain embodiments, the amino acid sequence has at least 80% identity to the Candida parapsilosis stereospecific carbonyl reductase (e.g., to at least one of SEQ ID NO:2, SEQ ID NO:3 or SEQ ID NO:4).
In certain embodiments, the amino acid sequence has at least 85% identity to the Candida parapsilosis stereospecific carbonyl reductase (e.g., to at least one of SEQ ID NO:2, SEQ ID NO:3 or SEQ ID NO:4).
In certain embodiments, the amino acid sequence has at least 90% identity to the Candida parapsilosis stereospecific carbonyl reductase (e.g., to at least one of SEQ ID NO:2, SEQ ID NO:3 or SEQ ID NO:4).
In certain embodiments, the amino acid sequence has at least 95% identity to the Candida parapsilosis stereospecific carbonyl reductase (e.g., to at least one of SEQ ID NO:2, SEQ ID NO:3 or SEQ ID NO:4).
In certain embodiments, the amino acid sequence has at least 99% identity to the Candida parapsilosis stereospecific carbonyl reductase (e.g., to at least one of SEQ ID NO:2, SEQ ID NO:3 or SEQ ID NO:4).
In certain embodiments, the amino acid sequence comprises SEQ ID NO:9, SEQ ID NO:10 or SEQ ID NO:11.
In certain embodiments, the amino acid sequence comprises SEQ ID NO:9, SEQ ID NO:10 and SEQ ID NO:11.
In certain embodiments, the amino acid sequence comprises SEQ ID NO:2.
In certain embodiments, the amino acid sequence comprises SEQ ID NO:3.
In certain embodiments, the amino acid sequence comprises SEQ ID NO:4.
In certain embodiments, the sequence of the polypeptide consists essentially of, or consists of, SEQ ID NO:2, SEQ ID NO:3 or SEQ ID NO:4.
In certain embodiments, the carbonyl reductase activity of the polypeptide is NADPH-dependent.
In certain embodiments, the polypeptide is an anti-Prelog-type stereospecific carbonyl reductase.
Certain embodiments of the present invention provide a composition comprising the polypeptide as described herein.
Certain embodiments of the present invention provide an isolated nucleic acid sequence comprising a sequence that encodes a polypeptide described herein.
In certain embodiments, the sequence comprises SEQ ID NO:6 or a degenerate variant of SEQ ID NO:6.
In certain embodiments, the sequence comprises SEQ ID NO:7 or a degenerate variant of SEQ ID NO:7.
In certain embodiments, the sequence comprises SEQ ID NO:8 or a degenerate variant of SEQ ID NO:8.
In certain embodiments, the sequence encodes SEQ ID NO:2.
In certain embodiments, the sequence encodes SEQ ID NO:3.
In certain embodiments, the sequence encodes SEQ ID NO:4.
Certain embodiments of the present invention provide an expression vector comprising an expression cassette operably linked to a nucleic acid molecule as described herein.
Certain embodiments of the present invention provide a host cell comprising a vector as described herein.
Certain embodiments of the present invention provide a method of reducing a carbonyl substrate, comprising contacting the substrate with a polypeptide described herein, or a composition described herein, in conditions suitable to catalyze the reduction of the carbonyl substrate. As used herein, a “carbonyl substrate” is a substrate that comprises at least one carbonyl group, such as a compound that comprises an α-ketoester, a β-ketoester, an aryl ketone or an aliphatic ketone (see, e.g., FIG. 7). The polypeptide having carbonyl reductase activity reduces a carbonyl group of the carbonyl substrate.
In certain embodiments, the reduction takes place in the presence of a coenzyme.
In certain embodiments, the coenzyme is NADPH.
In certain embodiments, the carbonyl substrate comprises an α-ketoester, a β-ketoester, an aryl ketone or an aliphatic ketone.
In certain embodiments, the carbonyl substrate comprises an α-ketoester.
In certain embodiments, the α-ketoester is methyl pyruvate, methyl phenylglyoxylate, ethyl pyruvate or ethyl benzoylformate.
In certain embodiments, the carbonyl substrate comprises a β-ketoester.
In certain embodiments, the β-ketoester is ethyl trifluoroacetoacetate, methyl acetoacetate, methyl 3-oxovalerate, methyl 4-fluorobenzoylacetate, ethyl acetoacetate, ethyl 3-oxovalerate, ethyl 4-chloroacetoacetate, ethyl benzoylacetate, or ethyl 3,4-dimethoxybenzoylacetate.
In certain embodiments, the carbonyl substrate comprises an aryl ketone.
In certain embodiments, the aryl ketone is 2-hydroxyacetophenone, or a derivative thereof.
In certain embodiments, the aryl ketone is 2′-chloro-2-hydroxyacetophenone, 3′-chloro-2-hydroxyacetophenone, 4′-chloro-2-hydroxyacetophenone or 4′-methoxy-2-hydroxyacetophenone.
In certain embodiments, the carbonyl substrate comprises an aliphatic ketone.
In certain embodiments, the aliphatic ketone is 2-butanone, 2-pentanone, 2-hexanone, 2-heptanone, 2-octanone or 2-methyl-3-pentanone.
In certain embodiments, the carbonyl substrate is ethyl 4-chloro-3-oxobutyrate.
In certain embodiments, the reduction takes place at pH ranging from 5.0 to 6.0 (e.g., at about 5.0, 5.5 or 6.0).
Described herein is a new gene cluster of enantioselective oxidoreductases with unusual stereospecificity in C. parapsilosis. It was confirmed that these genes code for three unique stereospecific carbonyl reductases through cloning, expression, and purification of the corresponding gene products, and verification of enantiomer configuration of the enzymatic products of asymmetric reduction of prochiral carbonyl groups of multiple substrates. SCR1, SCR2, and SCR3 all exhibit a novel anti-Prelog stereospecificity in reducing prochiral carbonyl groups; e.g., forming (S)-1-phenyl-1,2-ethanediol from the corresponding ketone substrate, 2-hydroxyacetophenone. The enzymes are, however, distinct in their catalytic properties, including their pH dependency and substrate specificity spectrum.
According to catalytic properties and primary structure information, stereospecific oxidoreductases, including alcohol dehydrogenases and carbonyl reductases, are mainly classified into three different groups: the zinc-dependent alcohol dehydrogenases, the short-chain dehydrogenases/reductases (SDRs), and the aldo-keto reductases (AKRs) (Kamitori et al., J. Mol. Biol., 352, 551-558 (2005); Reid and Fewson, Crit. Rev. Microbiol., 20, 13-56 (1994)). The SCRs share sequence motifs characteristic of the SDR superfamily, including the cofactor-binding motif Gly-X-X-X-Gly-X-Gly (X denotes any amino acid; SEQ ID NO:9), the catalytic triad of Ser-Tyr-Lys (SEQ ID NO:11), and also the extended tetrad of Asn-Ser-Tyr-Lys (SEQ ID NO:10) observed in the majority of SDRs (Filling et al., J. Biol. Chem., 277, 25677-25684 (2002)). In addition, the SCRs also have the conserved sequence motifs of secondary structural elements and key positions for assignment of coenzyme specificity of the cP2 subfamily in classic SDRs, except that the conserved basic residue K/R responsible for binding the phosphate group in NADPH is replaced by the weakly basic residue H (Kallberg et al., Eur. J. Biochem., 269, 4409-4417 (2002)). These highly-conserved, characteristic sequence motifs indicate that the SCRs belong to the cP2 subfamily of the classical SDR superfamily, one of the three NADPH-dependent subfamilies (Kallberg et al., Eur. J. Biochem., 269, 4409-4417 (2002)).
Oxidoreductases perform a wide variety of asymmetric reductions, differing in stereospecificity and substrate specificity, and have been used for producing optically active alcohols from various prochiral ketones, ketoacids, and ketoesters. The SCRs catalyze (S)-specific reduction of 2-hydroxyacetophenone, an anti-Prelog type reaction (Manzocchi et al., J. Org. Chem., 53, 4405-4407 (1988); Prelog, Pure Appl. Chem., 9, 119-130 (1964)). Therefore, these new enzymes complement the stereospecific oxidoreductases described to date for catalysis of the reduction of prochiral carbonyl compounds to the corresponding optically pure alcohols with anti-Prelog stereopreference. Additionally, the finding of stereospecific carbonyl reductases from the same host provides insight into the reaction mechanism of C. parapsilosis whole-cell mediated stereoinversion, involving the oxidation step of (R)-PED to the intermediate (2-hydroxyacetophenone) and the reduction step of the intermediate to (S)-PED (Gruber et al., Adv. Synth. Catal., 348, 1789-1805 (2006); Nie et al., Org. Process Res. Dev., 8, 246-251 (2004); Nie et al., Appl. Environ. Microbiol., 73, 3759-3764 (2007); Voss et al., Angew. Chem. Int. Ed., 47, 741-745 (2008); Voss et al., J. Am. Chem. Soc., 130, 13969-13972 (2008)). It is worth noting that SCR1 catalyzes the reduction of a broad spectrum of ketones, including aryl ketones, aliphatic ketones, and α- and β-ketoesters, and shows its highest substrate specificity towards ethyl 4-chloro-3-oxobutyrate, a precursor for the synthesis of an important pharmaceutical intermediate. Therefore, the newly discovered stereospecific carbonyl reductases will be useful enzymes with application potential.
The discovery of novel stereospecific carbonyl reductases of anti-Prelog selectivity further demonstrates the diversity of stereospecific oxidoreductases in microorganisms. Such enzymes provide a basis for elucidating the molecular mechanisms of enzyme-mediated asymmetric reactions involving stereo-recognition between proteins and chiral molecules, and mechanisms of electron transfer between functional groups of chiral molecules and key amino acid residues in enzymes. Apart from their unique value in studies of mechanisms of stereospecific oxidoreduction reactions, these novel carbonyl reductases of anti-Prelog stereopreference have multiple potential uses in industrial applications to produce chiral alcohols useful as intermediates in fine chemical synthesis.
In some embodiments of the invention, the carbonyl reductase can catalyze asymmetric reduction of 2-hydroxyacetophenone into (S)-1-phenyl-1,2-ethanediol (PED) (Nie et al., Appl. Environ. Microbiol., 73, 3759-3764 (2007)), a versatile chiral building block for the synthesis of pharmaceuticals, agrochemicals, and liquid crystals. PED is also a precursor for the production of chiral biphosphines and a chiral initiator for stereoselective polymerization (Iwasaki et al., Org. Lett., 1, 969-972 (1999); Liese et al., Biotechnol. Bioeng., 51, 544-550 (1996)).
In some embodiments, the carbonyl reductase can catalyze the reduction of a compound that comprises an aryl ketone, an aliphatic ketone, an α-ketoester, or a β-ketoester. In some embodiments, the carbonyl reductase catalyzes the reduction of an aryl ketone. In some embodiments, the carbonyl reductase catalyzes the reduction of an aliphatic ketone. In some embodiments, the carbonyl reductase catalyzes the reduction of an α-ketoester. In some embodiments, the carbonyl reductase catalyzes the reduction of a β-ketoester.
The term “nucleic acid” refers to deoxyribonucleotides or ribonucleotides and polymers thereof in either single or double stranded form, made of monomers (nucleotides) containing a sugar, phosphate and a base that is either a purine or pyrimidine. Unless specifically limited, the term encompasses known analogs of natural nucleotides that have similar binding properties as the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides. Unless otherwise indicated, a particular nucleic acid sequence also encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions) and complementary sequences, as well as the sequence explicitly indicated. Specifically, degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues.
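By way of non-limiting illustration, the effect of degenerate codon substitutions can be sketched in Python as follows. The sketch assumes the standard genetic code (an organism that uses an alternative code would need a different table), and the two short nucleotide sequences and the function names are hypothetical; they are not SEQ ID NOs:6-8 or fragments thereof.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumption:
# the standard code applies; "*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate a DNA coding sequence, ignoring any incomplete trailing codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

def count_degenerate_variants(peptide):
    """Number of distinct coding sequences that encode the given peptide."""
    total = 1
    for aa in peptide:
        total *= sum(1 for a in CODON_TABLE.values() if a == aa)
    return total

# Two hypothetical sequences that differ at third (and, for Ser, other) codon
# positions yet encode the same peptide.
print(translate("GGTTCTTATAAA"))          # GSYK
print(translate("GGCAGCTACAAG"))          # GSYK
print(count_degenerate_variants("GSYK"))  # 96
```

Both sequences encode the same four residues, which is the sense in which one is a degenerate variant of the other.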
The term “nucleotide sequence” refers to a polymer of DNA or RNA which can be single-stranded or double-stranded, optionally containing synthetic, non-natural or altered nucleotide bases capable of incorporation into DNA or RNA polymers. The terms “nucleic acid,” “nucleic acid molecule,” and “polynucleotide” are used interchangeably.
Certain embodiments of the invention encompass compositions that comprise isolated or substantially purified nucleic acid. In the context of the present invention, an “isolated” or “purified” DNA molecule or RNA molecule is a DNA molecule or RNA molecule that exists apart from its native environment and is therefore not a product of nature. An isolated DNA molecule or RNA molecule may exist in a purified form or may exist in a non-native environment such as, for example, a transgenic host cell. For example, an “isolated” or “purified” nucleic acid molecule is substantially free of other cellular material or culture medium when produced by recombinant techniques, or substantially free of chemical precursors or other chemicals when chemically synthesized. In one embodiment, an “isolated” nucleic acid is free of sequences that naturally flank the nucleic acid (i.e., sequences located at the 5′ and 3′ ends of the nucleic acid) in the genomic DNA of the organism from which the nucleic acid is derived.
The following terms are used to describe the sequence relationships between two or more nucleic acids or polynucleotides: (a) “reference sequence,” (b) “comparison window,” (c) “sequence identity,” (d) “percentage of sequence identity,” and (e) “substantial identity.”
(a) As used herein, “reference sequence” is a defined sequence used as a basis for sequence comparison. A reference sequence may be a subset or the entirety of a specified sequence; for example, as a segment of a full-length cDNA or gene sequence, or the complete cDNA or gene sequence.
(b) As used herein, “comparison window” makes reference to a contiguous and specified segment of a polynucleotide sequence, wherein the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. Generally, the comparison window is at least 20 contiguous nucleotides in length, and optionally can be 30, 40, 50, 100, or longer. Those of skill in the art understand that to avoid a high similarity to a reference sequence due to inclusion of gaps in the polynucleotide sequence a gap penalty is typically introduced and is subtracted from the number of matches.
Methods of alignment of sequences for comparison are well-known in the art. Thus, the determination of percent identity between any two sequences can be accomplished using a mathematical algorithm. Non-limiting examples of such mathematical algorithms are the algorithm of Myers and Miller (Myers and Miller, CABIOS, 4, 11 (1988)); the local homology algorithm of Smith et al. (Smith et al., Adv. Appl. Math., 2, 482 (1981)); the homology alignment algorithm of Needleman and Wunsch (Needleman and Wunsch, JMB, 48, 443 (1970)); the search-for-similarity-method of Pearson and Lipman (Pearson and Lipman, Proc. Natl. Acad. Sci. USA, 85, 2444 (1988)); the algorithm of Karlin and Altschul (Karlin and Altschul, Proc. Natl. Acad. Sci. USA, 87, 2264 (1990)), modified as in Karlin and Altschul (Karlin and Altschul, Proc. Natl. Acad. Sci. USA, 90, 5873 (1993)).
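By way of non-limiting illustration, a global alignment in the manner of Needleman and Wunsch can be sketched as a short dynamic program in Python. The scoring values used here (match +1, mismatch -1, linear gap penalty -1) and the function name are assumptions chosen for illustration; they are not the default parameters of any program named herein, and production software uses substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]

# Hypothetical sequences for illustration only.
aligned_a, aligned_b, total = needleman_wunsch("GATTACA", "GATACA")
print(aligned_a)
print(aligned_b)
print(total)
```

Where several alignments share the optimal score, the traceback returns one of them; which one depends on the order in which the three moves are tested.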
Computer implementations of these mathematical algorithms can be utilized for comparison of sequences to determine sequence identity. Such implementations include, but are not limited to: CLUSTAL in the PC/Gene program (available from Intelligenetics, Mountain View, Calif.); the ALIGN program and GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package (available from Genetics Computer Group (GCG), 575 Science Drive, Madison, Wis., USA). Alignments using these programs can be performed using the default parameters. The CLUSTAL program is well described by Higgins et al. (Higgins et al., CABIOS, 5, 151 (1989)); Corpet et al. (Corpet et al., Nucl. Acids Res., 16, 10881 (1988)); Huang et al. (Huang et al., CABIOS, 8, 155 (1992)); and Pearson et al. (Pearson et al., Meth. Mol. Biol., 24, 307 (1994)). The ALIGN program is based on the algorithm of Myers and Miller, supra. The BLAST programs of Altschul et al. (Altschul et al., JMB, 215, 403 (1990)) are based on the algorithm of Karlin and Altschul supra.
Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information. In addition to calculating percent sequence identity, the BLAST algorithm also performs a statistical analysis of the similarity between two sequences. To obtain gapped alignments for comparison purposes, Gapped BLAST can be utilized. Alternatively, PSI-BLAST can be used to perform an iterated search that detects distant relationships between molecules. When utilizing BLAST, Gapped BLAST, PSI-BLAST, the default parameters of the respective programs (e.g., BLASTN for nucleotide sequences, BLASTX for proteins) can be used.
For purposes of the present invention, comparison of nucleotide sequences for determination of percent sequence identity to another sequence may be made using the BlastN program (version 1.4.7 or later) with its default parameters or any equivalent program. By “equivalent program” is intended any sequence comparison program that, for any two sequences in question, generates an alignment having identical nucleotide or amino acid residue matches and an identical percent sequence identity when compared to the corresponding alignment generated by the program.
(c) As used herein, “sequence identity” or “identity” in the context of two nucleic acid or polypeptide sequences makes reference to a specified percentage of residues in the two sequences that are the same when aligned for maximum correspondence over a specified comparison window, as measured by sequence comparison algorithms. When percentage of sequence identity is used in reference to proteins it is recognized that residue positions which are not identical often differ by conservative amino acid substitutions, where amino acid residues are substituted for other amino acid residues with similar chemical properties (e.g., charge or hydrophobicity) and therefore may not change the functional properties of the molecule. When sequences differ in conservative substitutions, the percent sequence identity may be adjusted upwards to correct for the conservative nature of the substitution. Sequences that differ by such conservative substitutions are said to have “sequence similarity” or “similarity.” Means for making this adjustment are well known to those of skill in the art. Typically this involves scoring a conservative substitution as a partial rather than a full mismatch, thereby increasing the percentage sequence identity. Thus, for example, where an identical amino acid is given a score of 1 and a non-conservative substitution is given a score of zero, a conservative substitution is given a score between zero and 1. The scoring of conservative substitutions is calculated, e.g., as implemented in the program PC/GENE (Intelligenetics, Mountain View, Calif.).
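By way of non-limiting illustration, the upward adjustment for conservative substitutions can be sketched in Python as follows. The grouping of chemically similar residues and the partial score of 0.5 are assumptions made for this example only; they are not the scoring scheme of PC/GENE or of any other program named herein, and the input is assumed to be two ungapped, equal-length aligned sequences.

```python
# Hypothetical grouping of chemically similar residues (assumption for illustration).
CONSERVATIVE_GROUPS = [set("ILVM"), set("FYW"), set("KRH"),
                       set("DE"), set("ST"), set("NQ")]

def position_score(x, y, partial=0.5):
    """1 for identical residues, `partial` for a conservative substitution, else 0."""
    if x == y:
        return 1.0
    if any(x in group and y in group for group in CONSERVATIVE_GROUPS):
        return partial
    return 0.0

def identity_and_similarity(seq_a, seq_b):
    """Return (percent identity, similarity-adjusted percentage)."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(seq_a)
    identical = sum(1 for x, y in zip(seq_a, seq_b) if x == y)
    adjusted = sum(position_score(x, y) for x, y in zip(seq_a, seq_b))
    return 100.0 * identical / n, 100.0 * adjusted / n

# Hypothetical peptides: two identical positions and three conservative substitutions.
print(identity_and_similarity("GASKY", "GATRF"))  # (40.0, 70.0)
```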
(d) As used herein, “percentage of sequence identity” means the value determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison, and multiplying the result by 100 to yield the percentage of sequence identity.
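By way of non-limiting illustration, the calculation just described (matched positions divided by the total number of positions in the comparison window, multiplied by 100) can be sketched in Python as follows. The function name is hypothetical, the two sequences are assumed to be already aligned to equal length with "-" marking gaps, and the example sequences are not taken from the sequence listing.

```python
def percent_identity(aligned_ref, aligned_test):
    """Percentage of window positions at which both sequences carry the same residue."""
    if len(aligned_ref) != len(aligned_test) or not aligned_ref:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    # A gap never counts as a match, but gapped positions remain in the denominator.
    matches = sum(1 for r, t in zip(aligned_ref, aligned_test)
                  if r == t and r != "-")
    return 100.0 * matches / len(aligned_ref)

# Hypothetical alignment: six matched positions in a seven-position window.
value = percent_identity("GATTACA", "GA-TACA")
print(round(value, 1))   # 85.7
print(value >= 70.0)     # True
```

The final comparison shows how a computed value would be checked against a stated threshold such as at least 70% identity.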
(e)(i) The term “substantial identity” of polynucleotide sequences means that a polynucleotide comprises a sequence that has at least 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, or 94%, or even at least 95%, 96%, 97%, 98%, or 99% sequence identity, compared to a reference sequence using one of the alignment programs described using standard parameters. One of skill in the art will recognize that these values can be appropriately adjusted to determine corresponding identity of proteins encoded by two nucleotide sequences by taking into account codon degeneracy, amino acid similarity, reading frame positioning, and the like. Substantial identity of amino acid sequences for these purposes normally means sequence identity of at least 70%, 80%, 90%, or even at least 95%.
Another indication that nucleotide sequences are substantially identical is if two molecules hybridize to each other under stringent conditions. Generally, stringent conditions are selected to be about 5° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. However, stringent conditions encompass temperatures in the range of about 1° C. to about 20° C., depending upon the desired degree of stringency as otherwise qualified herein. Nucleic acids that do not hybridize to each other under stringent conditions are still substantially identical if the polypeptides they encode are substantially identical. This may occur, e.g., when a copy of a nucleic acid is created using the maximum codon degeneracy permitted by the genetic code. One indication that two nucleic acid sequences are substantially identical is when the polypeptide encoded by the first nucleic acid is immunologically cross reactive with the polypeptide encoded by the second nucleic acid.
(e)(ii) The term “substantial identity” in the context of a peptide indicates that a peptide comprises an amino acid sequence with at least 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, or 94%, or even 95%, 96%, 97%, 98% or 99%, sequence identity to a reference sequence over a specified comparison window. In certain embodiments, optimal alignment is conducted using the homology alignment algorithm of Needleman and Wunsch (Needleman and Wunsch, JMB, 48, 443 (1970)). An indication that two peptide sequences are substantially identical is that one peptide is immunologically reactive with antibodies raised against the second peptide. Thus, a peptide is substantially identical to a second peptide, for example, where the two peptides differ only by a conservative substitution. Thus, certain embodiments of the invention provide amino acid sequences that are substantially identical to the amino acid sequences described herein.
For sequence comparison, typically one sequence acts as a reference sequence to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are input into a computer, subsequence coordinates are designated if necessary, and sequence algorithm program parameters are designated. The sequence comparison algorithm then calculates the percent sequence identity for the test sequence(s) relative to the reference sequence, based on the designated program parameters.
As noted above, another indication that two nucleic acid sequences are substantially identical is that the two molecules hybridize to each other under stringent conditions. The phrase “hybridizing specifically to” refers to the binding, duplexing, or hybridizing of a molecule only to a particular nucleotide sequence under stringent conditions when that sequence is present in a complex mixture (e.g., total cellular) DNA or RNA. “Bind(s) substantially” refers to complementary hybridization between a probe nucleic acid and a target nucleic acid and embraces minor mismatches that can be accommodated by reducing the stringency of the hybridization media to achieve the desired detection of the target nucleic acid sequence.
“Stringent hybridization conditions” and “stringent hybridization wash conditions” in the context of nucleic acid hybridization experiments such as Southern and Northern hybridizations are sequence dependent, and are different under different environmental parameters. Longer sequences hybridize specifically at higher temperatures. The thermal melting point (Tm) is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Specificity is typically the function of post-hybridization washes, the critical factors being the ionic strength and temperature of the final wash solution. For DNA-DNA hybrids, the Tm can be approximated from the equation of Meinkoth and Wahl (1984): Tm = 81.5° C. + 16.6 (log M) + 0.41 (% GC) − 0.61 (% form) − 500/L; where M is the molarity of monovalent cations, % GC is the percentage of guanosine and cytosine nucleotides in the DNA, % form is the percentage of formamide in the hybridization solution, and L is the length of the hybrid in base pairs. Tm is reduced by about 1° C. for each 1% of mismatching; thus, Tm, hybridization, and/or wash conditions can be adjusted to hybridize to sequences of the desired identity. For example, if sequences with >90% identity are sought, the Tm can be decreased 10° C. Generally, stringent conditions are selected to be about 5° C. lower than the Tm for the specific sequence and its complement at a defined ionic strength and pH. However, severely stringent conditions can utilize a hybridization and/or wash at 1, 2, 3, or 4° C. lower than the Tm; moderately stringent conditions can utilize a hybridization and/or wash at 6, 7, 8, 9, or 10° C. lower than the Tm; low stringency conditions can utilize a hybridization and/or wash at 11, 12, 13, 14, 15, or 20° C. lower than the Tm.
Using the equation, hybridization and wash compositions, and desired temperature, those of ordinary skill will understand that variations in the stringency of hybridization and/or wash solutions are inherently described. If the desired degree of mismatching results in a temperature of less than 45° C. (aqueous solution) or 32° C. (formamide solution), the SSC concentration is increased so that a higher temperature can be used. Generally, highly stringent hybridization and wash conditions are selected to be about 5° C. lower than the Tm for the specific sequence at a defined ionic strength and pH.
An example of highly stringent wash conditions is 0.15 M NaCl at 72° C. for about 15 minutes. An example of stringent wash conditions is a 0.2×SSC wash at 65° C. for 15 minutes. Often, a high stringency wash is preceded by a low stringency wash to remove background probe signal. An example of a medium stringency wash for a duplex of, e.g., more than 100 nucleotides, is 1×SSC at 45° C. for 15 minutes. Stringent conditions typically involve a salt concentration of less than about 1.5 M, e.g., about 0.01 to 1.0 M Na ion concentration (or other salts), at pH 7.0 to 8.3, and a temperature of at least about 30° C. for short probes (e.g., about 10 to 50 nucleotides) and at least about 60° C. for long probes (e.g., >50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide. In general, a signal-to-noise ratio of at least 2× that observed for an unrelated probe in the particular hybridization assay indicates detection of a specific hybridization. Nucleic acids that do not hybridize to each other under stringent conditions are still substantially identical if the proteins that they encode are substantially identical. This occurs, e.g., when a copy of a nucleic acid is created using the maximum codon degeneracy permitted by the genetic code.
Very stringent conditions are selected to be equal to the Tm for a particular probe. An example of stringent conditions for hybridization of complementary nucleic acids that have more than 100 complementary residues on a filter in a Southern or Northern blot is hybridization in 50% formamide, 1 M NaCl, 1% SDS (sodium dodecyl sulphate) at 37° C., and a wash in 0.1×SSC at 60 to 65° C. Exemplary low stringency conditions include hybridization with a buffer solution of 30 to 35% formamide, 1 M NaCl, 1% SDS at 37° C., and a wash in 1× to 2×SSC (20×SSC=3.0 M NaCl/0.3 M trisodium citrate) at 50 to 55° C. Exemplary moderate stringency conditions include hybridization in 40 to 45% formamide, 1 M NaCl, 1% SDS at 37° C., and a wash in 0.5× to 1×SSC at 55 to 60° C.
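Because the wash conditions above are expressed as multiples of SSC, it can be useful to convert an SSC strength into molar concentrations. The following is a minimal sketch based only on the stated definition 20×SSC=3.0 M NaCl/0.3 M trisodium citrate. The function name ssc_composition is hypothetical. The total sodium ion figure assumes complete dissociation, with three sodium ions contributed per trisodium citrate.

```python
# Composition of 20x SSC as stated in the text.
NACL_20X_M = 3.0      # mol/L NaCl
CITRATE_20X_M = 0.3   # mol/L trisodium citrate


def ssc_composition(fold):
    """Return molar concentrations for a given SSC strength (e.g., 0.2 for 0.2x).

    Assumption: total Na+ = NaCl + 3 * trisodium citrate (full dissociation).
    """
    if fold < 0:
        raise ValueError("fold must be non-negative")
    scale = fold / 20.0
    nacl = NACL_20X_M * scale
    citrate = CITRATE_20X_M * scale
    return {
        "nacl_M": nacl,
        "citrate_M": citrate,
        "sodium_M": nacl + 3.0 * citrate,
    }


# Wash strengths recited above: 0.1x, 0.2x, 0.5x, 1x, and 2x SSC.
for fold in (0.1, 0.2, 0.5, 1.0, 2.0):
    c = ssc_composition(fold)
    print(f"{fold}x SSC: NaCl {c['nacl_M']:.4f} M, Na+ {c['sodium_M']:.4f} M")
```

Under this conversion, 1×SSC corresponds to 0.15 M NaCl. Whether the NaCl molarity or the total sodium ion concentration is used as M in the Tm equation is a choice left to the practitioner and is not specified here.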