REFERENCE TO SEQUENCE LISTING
This application contains a Sequence Listing in computer readable form, which is incorporated herein by reference.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to alpha-amylase variants, polynucleotides encoding the variants, methods of producing the variants, and methods of using the variants.
2. Description of Related Art
Alpha-amylases (alpha-1,4-glucan-4-glucanohydrolases, E.C. 3.2.1.1) constitute a group of enzymes, which catalyze the hydrolysis of starch and other linear and branched 1,4-glucosidic oligo- and polysaccharides.
Alpha-amylases are used commercially for a variety of purposes such as in the initial stages of starch processing (e.g., liquefaction); in wet milling processes; and in alcohol production from carbohydrate sources. They are also used as cleaning agents or adjuncts in detergent matrices; in the textile industry for starch desizing; in baking applications; in the beverage industry; in oil fields in drilling processes; in recycling processes, e.g., for de-inking paper; and in animal feed.
One of the first bacterial alpha-amylases to be used was an alpha-amylase from B. licheniformis, also known as Termamyl™, which has been extensively characterized and whose crystal structure has been determined. Alkaline amylases, such as the alpha-amylase derived from Bacillus sp. strains NCIB 12289, NCIB 12512, NCIB 12513, and DSM 9375 (disclosed in WO 95/26397), form a particular group of alpha-amylases that are useful in detergents. Many of these known bacterial amylases have been modified in order to improve their functionality in a particular application.
Termamyl™ and many highly efficient alpha-amylases require calcium for activity. The crystal structure of Termamyl™ shows that three calcium atoms are bound to the alpha-amylase structure coordinated by negatively charged amino acid residues. This requirement for calcium is a disadvantage in applications where strong chelating compounds are present, such as in detergents or during ethanol production from whole grains, where the plant material comprises a large amount of natural chelators such as phytate.
Calcium-insensitive amylases are known, e.g., the alpha-amylases disclosed in EP 1022334 and WO 03/083054, and a Bacillus circulans alpha-amylase having the sequence disclosed in UNIPROT:Q03657.
It would therefore be beneficial to provide alpha-amylases with reduced calcium sensitivity.
SUMMARY OF THE INVENTION
The present invention provides alpha-amylase variants comprising an A-domain of a calcium-sensitive alpha-amylase, a B-domain which has at least 55% and less than 100% sequence identity with the B-domain of SEQ ID NO: 13, and a C-domain of a calcium-sensitive alpha-amylase.
The present invention also relates to isolated polynucleotides encoding an alpha-amylase variant, nucleic acid constructs, vectors, and host cells comprising the polynucleotides, and methods of producing a variant of a parent alpha-amylase.
The present invention also relates to the use of the variants in starch processing (e.g., liquefaction); wet milling processes; alcohol production from carbohydrate sources; detergents; dishwashing compositions; starch desizing in the textile industry; baking applications; the beverage industry; oil fields in drilling processes; recycling processes, e.g., for de-inking paper, and animal feed.
BRIEF DESCRIPTION OF THE FIGURES
FIG. 1 shows an alignment of SEQ ID NOS: 1-16, 29 and 30.
DETAILED DESCRIPTION OF THE INVENTION
The present invention provides alpha-amylase variants comprising an A-domain of a calcium-sensitive alpha-amylase, a B-domain which has at least 50% and less than 100% sequence identity with the B-domain of SEQ ID NO: 13, and a C-domain of a calcium-sensitive alpha-amylase.
A, B and C-Domains: The structure of alpha-amylases comprises three distinct domains A, B and C, see, e.g., Machius et al., 1995, J. Mol. Biol. 246: 545-559. The term “domain” means a region of a polypeptide that in itself forms a distinct and independent substructure of the whole molecule. Alpha-amylases consist of a beta/alpha-8 barrel harboring the active site, which is denoted the A-domain, a rather long loop between the beta-sheet 3 and alpha-helix 3, which is denoted the B-domain, and a C-domain and in some cases also a carbohydrate binding domain (e.g., WO 2005/001064; Machius et al., supra).
The domains of an alpha-amylase can be determined by structure analysis such as by using crystallographic techniques. An alternative method for determining the domains of an alpha-amylase is by sequence alignment of the amino acid sequence of the alpha-amylase with another alpha-amylase for which the domains have been determined. The sequence that aligns with, e.g., the B-domain sequence in the alpha-amylase for which the B-domain has been determined can be considered the B-domain for the given alpha-amylase.
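The alignment-based approach can be illustrated with a minimal Python sketch. It assumes that a pairwise alignment has already been produced by some alignment program and that the domain boundaries in the reference are known as 1-based, inclusive residue numbers. The function name map_domain, its parameters, and the toy sequences are hypothetical and serve only as illustration.

```python
def map_domain(aligned_ref: str, aligned_query: str,
               ref_start: int, ref_end: int, gap: str = "-") -> str:
    """Return the query residues aligned with reference positions
    ref_start..ref_end (1-based, inclusive, ungapped reference numbering)."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    if not 1 <= ref_start <= ref_end:
        raise ValueError("invalid reference range")

    ref_pos = 0  # number of reference residues seen so far
    residues = []
    for r, q in zip(aligned_ref, aligned_query):
        if r != gap:
            ref_pos += 1
            inside = ref_start <= ref_pos <= ref_end
        else:
            # Query insertion relative to the reference: keep it only if it
            # lies strictly between two reference residues of the domain.
            inside = ref_start <= ref_pos < ref_end
        if inside and q != gap:
            residues.append(q)
    return "".join(residues)


# Toy alignment (not a real alpha-amylase sequence).
print(map_domain("ACDEFG-HIK", "ACD-FGYHLK", 3, 7))
```

In this toy case the reference positions 3 to 7 correspond to the query segment DFGYH, including the residue inserted in the query relative to the reference.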
Allelic variant: The term “allelic variant” means any of two or more alternative forms of a gene occupying the same chromosomal locus. Allelic variation arises naturally through mutation, and may result in polymorphism within populations. Gene mutations can be silent (no change in the encoded polypeptide) or may encode polypeptides having altered amino acid sequences. An allelic variant of a polypeptide is a polypeptide encoded by an allelic variant of a gene.
Alpha-amylases (alpha-1,4-glucan-4-glucanohydrolases, E.C. 3.2.1.1) are a group of enzymes, which catalyze the hydrolysis of starch and other linear and branched 1,4-glucosidic oligo- and polysaccharides.
Calcium-insensitive amylase means an alpha-amylase that does not require the presence of calcium for optimal activity and/or for maintaining the active conformation/structure.
Calcium-sensitive amylase means an alpha-amylase that requires the presence of calcium to retain its structure and/or to have full enzymatic activity. For some calcium-sensitive amylases it has been shown that they contain a calcium atom coordinated to acidic amino acid residues in the active conformation. A large number of calcium-sensitive alpha-amylases are known and have been used industrially because of their beneficial properties. Calcium-sensitive alpha-amylases are generally sensitive towards conditions that lead to loss of the calcium atom coordinated in their structure such as detergent compositions and fuel mass.
Calcium sensitivity is determined by incubating an alpha-amylase in the presence of a strong chelator and analyzing the impact of this incubation on the activity or stability of the alpha-amylase. A calcium-sensitive alpha-amylase will be less stable in the presence of a chelator or lose a major part or all of its activity during incubation whereas a calcium-insensitive alpha-amylase will not lose all of its activity or will lose only a minor part of the activity during incubation. Chelator strength may be evaluated using methods known in the art such as the methods disclosed in Nielsen et al., 2003, Anal. Biochem. 314: 227-234; and Nagarajan and Paine, 1984, J. Am. Oil Chem. Soc. 61(9): 1475-1478. Examples of strong chelators that may be used for such an assay are EGTA (ethylene glycol tetraacetic acid), EDTA (ethylene diamine tetraacetic acid), DTPA (diethylene triamine pentaacetic acid), DTMPA (diethylene triamine-penta-methylene phosphonic acid) and HEDP (1-hydroxyethan-1,1-diylbis(phosphonic acid)). Other strong chelators may be used to determine the calcium sensitivity of an alpha-amylase.
Coding sequence: The term “coding sequence” means a polynucleotide, which directly specifies the amino acid sequence of its polypeptide product. The boundaries of the coding sequence are generally determined by an open reading frame, which usually begins with the ATG start codon or alternative start codons such as GTG and TTG and ends with a stop codon such as TAA, TAG, and TGA. The coding sequence may be a DNA, cDNA, synthetic, or recombinant polynucleotide.
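As a rough illustration of the boundaries described above, the following Python sketch tests whether a DNA string has the form of an open reading frame. The function name looks_like_orf is hypothetical, and the check is only a heuristic: it uses only the start and stop codons named above, whereas the definition says a coding sequence "usually" begins with such a start codon.

```python
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def looks_like_orf(dna: str) -> bool:
    """Heuristic check: start codon first, stop codon last, no stop in between."""
    seq = dna.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    has_internal_stop = any(c in STOP_CODONS for c in codons[1:-1])
    return (codons[0] in START_CODONS
            and codons[-1] in STOP_CODONS
            and not has_internal_stop)


print(looks_like_orf("ATGAAATAA"))     # start, one codon, stop
print(looks_like_orf("ATGTAAAAATAA"))  # contains an internal stop codon
```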
Control sequence: The term “control sequence” means all components necessary for the expression of a polynucleotide encoding a variant of the present invention. Each control sequence may be native or foreign to the polynucleotide encoding the variant or native or foreign to each other. Such control sequences include, but are not limited to, a leader, polyadenylation sequence, propeptide sequence, promoter, signal peptide sequence, and transcription terminator. At a minimum, the control sequences include a promoter, and transcriptional and translational stop signals. The control sequences may be provided with linkers for the purpose of introducing specific restriction sites facilitating ligation of the control sequences with the coding region of the polynucleotide encoding a variant.
Expression: The term “expression” includes any step involved in the production of the polypeptide including, but not limited to, transcription, post-transcriptional modification, translation, post-translational modification, and secretion.
Expression vector: The term “expression vector” means a linear or circular DNA molecule that comprises a polynucleotide encoding a polypeptide of the present invention and is operably linked to additional nucleotides that provide for its expression.
Host cell: The term “host cell” means any cell type that is susceptible to transformation, transfection, transduction, and the like with a nucleic acid construct or expression vector comprising a polynucleotide of the present invention. The term “host cell” encompasses any progeny of a parent cell that is not identical to the parent cell due to mutations that occur during replication.
Improved property: The term “improved property” means a characteristic associated with a variant that is improved compared to other alpha-amylases. Such improved properties include, but are not limited to, altered temperature-dependent activity profile, thermostability, pH activity, pH stability, substrate specificity, product specificity, and chemical stability.
Isolated variant: The terms “isolated” and “purified” mean a polypeptide or polynucleotide that is removed from at least one component with which it is naturally associated. For example, a variant may be at least 1% pure, e.g., at least 5% pure, at least 10% pure, at least 20% pure, at least 40% pure, at least 60% pure, at least 80% pure, and at least 90% pure, as determined by SDS-PAGE and a polynucleotide may be at least 1% pure, e.g., at least 5% pure, at least 10% pure, at least 20% pure, at least 40% pure, at least 60% pure, at least 80% pure, at least 90% pure, and at least 95% pure, as determined by agarose electrophoresis.
Mature polypeptide: The term “mature polypeptide” means a polypeptide in its final form following translation and any post-translational modifications, such as N-terminal processing, C-terminal truncation, glycosylation, phosphorylation, etc. It is known in the art that a host cell may produce a mixture of two or more different mature polypeptides (i.e., with a different C-terminal and/or N-terminal amino acid) expressed by the same polynucleotide.
Mature polypeptide coding sequence: The term “mature polypeptide coding sequence” means a nucleotide sequence that encodes a mature polypeptide having alpha-amylase activity.
Nucleic acid construct: The term “nucleic acid construct” means a nucleic acid molecule, either single- or double-stranded, which is isolated from a naturally occurring gene or is modified to contain segments of nucleic acids in a manner that would not otherwise exist in nature or which is synthetic. The term nucleic acid construct is synonymous with the term “expression cassette” when the nucleic acid construct contains the control sequences required for expression of a coding sequence.
Operably linked: The term “operably linked” means a configuration in which a control sequence is placed at an appropriate position relative to the coding sequence of the polynucleotide sequence such that the control sequence directs the expression of the coding sequence of a polypeptide.
Parent: The term “parent” alpha-amylase means an alpha-amylase to which an alteration is made to produce a variant of the present invention. The parent may be a naturally occurring (wild-type) polypeptide, or a variant thereof, prepared by any suitable means. For instance, the parent polypeptide may be a variant of a naturally occurring polypeptide which has a modified or altered amino acid sequence. A parent may also be an allelic variant.
Polypeptide fragment: The term “polypeptide fragment” means a polypeptide having one or more (several) amino acids deleted from the amino and/or carboxyl terminus of a mature polypeptide; wherein the fragment has alpha-amylase activity. In one aspect, a fragment contains at least 481 amino acid residues, e.g., at least 483, at least 486, and at least 493 amino acid residues.
Sequence identity: The relatedness between two amino acid sequences or between two nucleotide sequences is described by the parameter “sequence identity”.
For purposes of the present invention, the degree of sequence identity between two amino acid sequences is determined using the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970, J. Mol. Biol. 48: 443-453) as implemented in the Needle program of the EMBOSS package (EMBOSS: The European Molecular Biology Open Software Suite, Rice et al., 2000, Trends Genet. 16: 276-277), preferably version 3.0.0 or later. The optional parameters used are gap open penalty of 10, gap extension penalty of 0.5, and the EBLOSUM62 (EMBOSS version of BLOSUM62) substitution matrix. The output of Needle labeled “longest identity” (obtained using the -nobrief option) is used as the percent identity and is calculated as follows:
(Identical Residues×100)/(Length of Alignment−Total Number of Gaps in Alignment)
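The calculation can be illustrated with a minimal Python sketch. It does not perform the Needleman-Wunsch alignment itself. It assumes that the two gapped sequences have already been produced (for example by Needle with the parameters given above) and that each gap character counts as one gap. The function name percent_identity and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """(identical residues x 100) / (alignment length - total number of gaps)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    identical = 0
    gaps = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            gaps += 1          # column containing a gap in either sequence
        elif a == b:
            identical += 1     # identical residues in an ungapped column

    ungapped_columns = len(aligned_a) - gaps
    if ungapped_columns == 0:
        raise ValueError("alignment contains no ungapped columns")
    return identical * 100.0 / ungapped_columns


# Toy alignment: 9 columns, 2 gaps, 6 identical residues -> 6 x 100 / 7.
print(round(percent_identity("MKT-AYIAK", "MKTLAY-AR"), 2))
```

The same function applies unchanged to two aligned deoxyribonucleotide sequences, since the formula differs only in the kind of symbol that is compared.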
For purposes of the present invention, the degree of sequence identity between two deoxyribonucleotide sequences is determined using the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970, supra) as implemented in the Needle program of the EMBOSS package (EMBOSS: The European Molecular Biology Open Software Suite, Rice et al., 2000, supra), preferably version 3.0.0 or later. The optional parameters used are gap open penalty of 10, gap extension penalty of 0.5, and the EDNAFULL (EMBOSS version of NCBI NUC4.4) substitution matrix. The output of Needle labeled “longest identity” (obtained using the -nobrief option) is used as the percent identity and is calculated as follows:
(Identical Deoxyribonucleotides×100)/(Length of Alignment−Total Number of Gaps in Alignment)
Subsequence: The term “subsequence” means a polynucleotide sequence having one or more (several) nucleotides deleted from the 5′ and/or 3′ end of a mature polypeptide coding sequence; wherein the subsequence encodes a polypeptide fragment having alpha-amylase activity.
Variant: The term “variant” means a polypeptide having alpha-amylase activity comprising an alteration, i.e., a substitution, insertion, and/or deletion, of one or more (several) amino acid residues at one or more (several) positions. A substitution means a replacement of an amino acid occupying a position with a different amino acid; a deletion means removal of an amino acid occupying a position; and an insertion means adding 1-5 amino acids adjacent to and following an amino acid occupying a position.
Wild-Type: The term “wild-type” means an alpha-amylase expressed by a naturally occurring microorganism, such as a bacterium, yeast, or filamentous fungus found in nature.
Conventions for Designation of Variants
For purposes of the present invention, unless otherwise indicated, the hybrid polypeptide disclosed in SEQ ID NO: 27 (which has the sequence of amino acids 1-104 of Bacillus stearothermophilus alpha-amylase (SEQ ID NO: 4), followed by amino acids 103-208 of Bacillus circulans alpha-amylase (SEQ ID NO: 13), followed by amino acids 211-515 of Bacillus stearothermophilus alpha-amylase (SEQ ID NO: 4)) is used to determine the corresponding amino acid residue in another alpha-amylase. The amino acid sequence of another alpha-amylase is aligned with the mature polypeptide disclosed in SEQ ID NO: 27, and based on the alignment, the amino acid position number corresponding to any amino acid residue in the mature polypeptide disclosed in SEQ ID NO: 27 can be determined using the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970, J. Mol. Biol. 48: 443-453) as implemented in the Needle program of the EMBOSS package (EMBOSS: The European Molecular Biology Open Software Suite, Rice et al., 2000, Trends Genet. 16: 276-277), preferably version 3.0.0 or later.
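The composition of the hybrid reference polypeptide can be illustrated with a short Python sketch that joins 1-based, inclusive residue ranges. The function name assemble_hybrid is hypothetical, and the sequences used are single-letter placeholders whose lengths are assumptions chosen only so that the ranges are valid; they are not the sequences of SEQ ID NO: 4 or SEQ ID NO: 13.

```python
from typing import List, Tuple


def assemble_hybrid(segments: List[Tuple[str, int, int]]) -> str:
    """Concatenate residues start..end (1-based, inclusive) of each sequence."""
    parts = []
    for seq, start, end in segments:
        if not 1 <= start <= end <= len(seq):
            raise ValueError(f"range {start}-{end} is outside the sequence")
        parts.append(seq[start - 1:end])
    return "".join(parts)


# Placeholder sequences (assumed lengths, not real amylase sequences).
seq4_placeholder = "A" * 515    # stands in for SEQ ID NO: 4
seq13_placeholder = "C" * 300   # stands in for SEQ ID NO: 13

hybrid = assemble_hybrid([
    (seq4_placeholder, 1, 104),     # first segment
    (seq13_placeholder, 103, 208),  # B-domain segment
    (seq4_placeholder, 211, 515),   # last segment
])
print(len(hybrid))  # 104 + 106 + 305 residues
```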
Identification of the corresponding amino acid residue in another alpha-amylase can be confirmed by an alignment of multiple polypeptide sequences using “ClustalW” (Larkin et al., 2007, Bioinformatics 23: 2947-2948).
When the other enzyme has diverged from the mature polypeptide of SEQ ID NO: 27 such that traditional sequence-based comparison fails to detect their relationship (Lindahl and Elofsson, 2000, J. Mol. Biol. 295: 613-615), other pairwise sequence comparison algorithms can be used. Greater sensitivity in sequence-based searching can be attained using search programs that utilize probabilistic representations of polypeptide families (profiles) to search databases. For example, the PSI-BLAST program generates profiles through an iterative database search process and is capable of detecting remote homologs (Altschul et al., 1997, Nucleic Acids Res. 25: 3389-3402). Even greater sensitivity can be achieved if the family or superfamily for the polypeptide has one or more (several) representatives in the protein structure databases. Programs such as GenTHREADER (Jones, 1999, J. Mol. Biol. 287: 797-815; McGuffin and Jones, 2003, Bioinformatics 19: 874-881) utilize information from a variety of sources (PSI-BLAST, secondary structure prediction, structural alignment profiles, and solvation potentials) as input to a neural network that predicts the structural fold for a query sequence. Similarly, the method of Gough et al., 2000, J. Mol. Biol. 313: 903-919, can be used to align a sequence of unknown structure with the superfamily models present in the SCOP database. These alignments can in turn be used to generate homology models for the polypeptide, and such models can be assessed for accuracy using a variety of tools developed for that purpose.
For proteins of known structure, several tools and resources are available for retrieving and generating structural alignments. For example, the SCOP superfamilies of proteins have been structurally aligned, and those alignments are accessible and downloadable. Two or more protein structures can be aligned using a variety of algorithms such as the distance alignment matrix (Holm and Sander, 1998, Proteins 33: 88-96) or combinatorial extension (Shindyalov and Bourne, 1998, Protein Eng. 11: 739-747), and implementations of these algorithms can additionally be utilized to query structure databases with a structure of interest in order to discover possible structural homologs (e.g., Holm and Park, 2000, Bioinformatics 16: 566-567). These structural alignments can be used to predict the structurally and functionally corresponding amino acid residues in proteins within the same structural superfamily. This information, along with information derived from homology modeling and profile searches, can be used to predict which residues to mutate when moving mutations of interest from one protein to a close or remote homolog.
In describing the alpha-amylase variants of the present invention, the nomenclature described below is adapted for ease of reference. In all cases, the accepted IUPAC single letter or triple letter amino acid abbreviation is employed.
For an amino acid substitution, the following nomenclature is used: original amino acid, position, substituted amino acid. Accordingly, the substitution of threonine with alanine at position 226 is designated as “Thr226Ala” or “T226A”. Multiple mutations are separated by addition marks (“+”), e.g., “Gly205Arg+Ser411Phe” or “G205R+S411F”, representing mutations at positions 205 and 411 substituting glycine (G) with arginine (R), and serine (S) with phenylalanine (F), respectively.
For an amino acid deletion, the following nomenclature is used: original amino acid, position, *. Accordingly, the deletion of glycine at position 195 is designated as “Gly195*” or “G195*”. Multiple deletions are separated by addition marks (“+”), e.g., “Gly195*+Ser411*” or “G195*+S411*”.
For an amino acid insertion, the following nomenclature is used: original amino acid, position, original amino acid, new inserted amino acid. Accordingly, the insertion of lysine after glycine at position 195 is designated “Gly195GlyLys” or “G195GK”. Multiple insertions of amino acids are designated [Original amino acid, position, original amino acid, new inserted amino acid #1, new inserted amino acid #2; etc.]. For example, the insertion of lysine and alanine after glycine at position 195 is indicated as “Gly195GlyLysAla” or “G195GKA”.
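The single-letter form of the substitution, deletion, and insertion nomenclature can be read mechanically, as the following Python sketch illustrates. The names Alteration, parse_alteration, and parse_variant are hypothetical, the sketch handles only the single-letter abbreviations, and it is offered as one possible reading of the conventions rather than a definitive parser.

```python
import re
from typing import List, NamedTuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_TOKEN = re.compile(
    rf"^(?P<orig>[{AMINO_ACIDS}])(?P<pos>\d+)(?P<rest>\*|[{AMINO_ACIDS}]+)$"
)


class Alteration(NamedTuple):
    kind: str       # "substitution", "deletion", or "insertion"
    original: str   # amino acid originally occupying the position
    position: int
    new: str        # substituted or inserted residue(s); empty for a deletion


def parse_alteration(token: str) -> Alteration:
    match = _TOKEN.match(token.strip())
    if match is None:
        raise ValueError(f"unrecognized designation: {token!r}")
    orig, pos, rest = match["orig"], int(match["pos"]), match["rest"]
    if rest == "*":
        return Alteration("deletion", orig, pos, "")
    if len(rest) == 1:
        if rest == orig:
            raise ValueError("a substitution must change the amino acid")
        return Alteration("substitution", orig, pos, rest)
    if rest[0] == orig:
        # Original residue repeated, followed by the inserted residue(s).
        return Alteration("insertion", orig, pos, rest[1:])
    raise ValueError(f"unrecognized designation: {token!r}")


def parse_variant(designation: str) -> List[Alteration]:
    """Split on the addition mark and parse each alteration."""
    return [parse_alteration(t) for t in designation.split("+")]


print(parse_variant("G205R+S411F"))
print(parse_alteration("G195*"))
print(parse_alteration("G195GKA"))
```

For a multiple insertion such as “G195GKA”, the sketch returns the inserted residues in their order of insertion (“KA”).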
In such cases the inserted amino acid residue(s) are numbered by the addition of lower case letters to the position number of the amino acid residue preceding the inserted amino acid residue(s). In the above example the sequence would thus be:
195 195a 195b