CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a divisional application of U.S. non-provisional application Ser. No. 12/886,016, filed Sep. 20, 2010, which claims priority to U.S. non-provisional application Ser. No. 11/124,367, filed on May 9, 2005, which claims priority to U.S. provisional application Ser. No. 60/599,554, filed on Aug. 9, 2004, U.S. provisional application Ser. No. 60/582,609, filed on Jun. 24, 2004 and U.S. provisional application Ser. No. 60/568,846, filed on May 7, 2004, the contents of each of which are hereby incorporated by reference in their entirety into this application.
FIELD OF THE INVENTION
The present invention is in the field of fibrosis diagnosis and therapy and in particular liver fibrosis diagnosis and therapy, and more particularly, liver fibrosis associated with hepatitis C virus (HCV) infection. More specifically, the present invention relates to specific single nucleotide polymorphisms (SNPs) in the human genome, and their association with liver fibrosis and related pathologies. Based on differences in allele frequencies in the patient population with advanced or bridging fibrosis/cirrhosis relative to individuals with no or minimal fibrosis, the naturally-occurring SNPs disclosed herein can be used as targets for the design of diagnostic reagents and the development of therapeutic agents, as well as for disease association and linkage analysis. In particular, the SNPs of the present invention are useful for identifying an individual who is at an increased or decreased risk of developing liver fibrosis and for early detection of the disease, for providing clinically important information for the prevention and/or treatment of liver fibrosis, and for screening and selecting therapeutic agents. The SNPs disclosed herein are also useful for human identification applications. Methods, assays, kits, and reagents for detecting the presence of these polymorphisms and their encoded products are provided.
BACKGROUND OF THE INVENTION
Fibrosis is a quantitative and qualitative change in the extracellular matrix that surrounds cells as a response to tissue injury. The trauma that generates fibrosis is varied and includes radiological trauma (e.g., x-ray, gamma ray), chemical trauma (e.g., radicals, ethanol, phenols), viral infection, and physical trauma. Fibrosis encompasses pathological conditions in a variety of tissues such as pulmonary fibrosis, retroperitoneal fibrosis, epidural fibrosis, congenital fibrosis, focal fibrosis, muscle fibrosis, massive fibrosis, radiation fibrosis (e.g., radiation-induced lung fibrosis), liver fibrosis, and cardiac fibrosis.
Liver Fibrosis in HCV-Infected Subjects
HCV affects about 4 million people in the United States and more than 170 million people worldwide. Approximately 85% of the infected individuals develop chronic hepatitis, and up to 20% progress to bridging fibrosis/cirrhosis, which is end-stage severe liver fibrosis and is generally irreversible (Lauer et al. 2001, N Eng J Med 345: 41-52). HCV infection is the major cause of cirrhosis and hepatocellular carcinoma (HCC), and accounts for one third of liver transplantations. The interval between infection and the development of cirrhosis may exceed 30 years but varies widely among individuals. Based on fibrosis progression rate, chronic HCV patients can be roughly divided into three groups (Poynard et al., 1997, Lancet 349: 825-832): rapid, median, and slow fibrosers.
Previous studies have indicated that host factors may play a role in the progression of fibrosis, and these include age at infection, duration of infection, alcohol consumption, and gender. However, these host factors account for only 17%-29% of the variability in fibrosis progression (Poynard et al., 1997, Lancet 349: 825-832; Wright et al., Gut. 2003, 52(4):574-9). Neither viral load nor viral genotype has shown significant correlation with fibrosis progression (Poynard et al., 1997, Lancet 349: 825-832). Thus, other factors, such as host genetic factors, are likely to play an important role in determining the rate of fibrosis progression.
Recent studies suggest that some genetic polymorphisms influence the progression of fibrosis in patients with HCV infection (Powell et al. Hepatology 31(4): 828-33, 2000), autoimmune chronic cholestasis (Tanaka et al. J. Infec. Dis. 187:1822-5, 2003), alcohol-induced liver diseases (Yamauchi et al., J. Hepatology 23(5):519-23, 1995), and nonalcoholic fatty liver diseases (Bernard et al. Diabetologia 2000, 43(8):995-9). However, none of these genetic polymorphisms have been integrated into clinical practice for various reasons (Bataller et al., Hepatology. 2003, 37(3):493-503). For example, limitations in study design, such as small study populations, lack of replication sample sets, and lack of proper control groups have contributed to contradictory results; an example being the conflicting results reported on the role of mutations in the hemochromatosis gene (HFE) on fibrosis progression in HCV-infected patients (Smith et al., Hepatology. 1998, 27(6):1695-9; Thorburn et al., Gut. 2002, 50(2):248-52).
Currently, there is no diagnostic test that can identify patients who are predisposed to developing liver damage from chronic HCV infection, despite the large variability in fibrosis progression rate among HCV patients. Furthermore, diagnosis of fibrosis stage (early, middle or late) and monitoring of fibrosis progression is currently accomplished by liver biopsy, which is invasive, painful, and costly, and generally must be performed multiple times to assess fibrosis status. The discovery of genetic markers which are useful in identifying HCV-infected individuals who are at increased risk for advancing from early stage fibrosis to cirrhosis and/or HCC may lead to, for example, better therapeutic strategies, economic models, and health care policy decisions.
The genomes of all organisms undergo spontaneous mutation in the course of their continuing evolution, generating variant forms of progenitor genetic sequences (Gusella, Ann. Rev. Biochem. 55, 831-854 (1986)). A variant form may confer an evolutionary advantage or disadvantage relative to a progenitor form or may be neutral. In some instances, a variant form confers an evolutionary advantage to the species and is eventually incorporated into the DNA of many or most members of the species and effectively becomes the progenitor form. Additionally, the effects of a variant form may be both beneficial and detrimental, depending on the circumstances. For example, a heterozygous sickle cell mutation confers resistance to malaria, but a homozygous sickle cell mutation is usually lethal. In many cases, both progenitor and variant forms survive and co-exist in a species population. The coexistence of multiple forms of a genetic sequence gives rise to genetic polymorphisms, including SNPs.
Approximately 90% of all polymorphisms in the human genome are SNPs. SNPs are single base positions in DNA at which different alleles, or alternative nucleotides, exist in a population. The SNP position (interchangeably referred to herein as SNP, SNP site, SNP locus, SNP marker, or marker) is usually preceded by and followed by highly conserved sequences of the allele (e.g., sequences that vary in less than 1/100 or 1/1000 members of the populations). An individual may be homozygous or heterozygous for an allele at each SNP position. A SNP can, in some instances, be referred to as a “cSNP” to denote that the nucleotide sequence containing the SNP is an amino acid coding sequence.
A SNP may arise from a substitution of one nucleotide for another at the polymorphic site. Substitutions can be transitions or transversions. A transition is the replacement of one purine nucleotide by another purine nucleotide, or one pyrimidine by another pyrimidine. A transversion is the replacement of a purine by a pyrimidine, or vice versa. A SNP may also be a single base insertion or deletion variant referred to as an “indel” (Weber et al., “Human diallelic insertion/deletion polymorphisms”, Am J Hum Genet 2002 October; 71(4):854-62).
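By way of non-limiting illustration only, the distinction between transitions and transversions described above can be sketched in a few lines of Python. The function name classify_substitution is a hypothetical name chosen for this example and is not part of the disclosed methods.

```python
# Illustrative sketch only; names are hypothetical and not part of the disclosure.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("each allele must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("alleles are identical; no substitution")
    # Purine<->purine or pyrimidine<->pyrimidine is a transition;
    # any change between the two classes is a transversion.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"
```

For example, an A/G or C/T SNP would be reported as a transition, whereas an A/C or G/T SNP would be reported as a transversion. Single-base insertion/deletion variants fall outside this sketch.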
A synonymous codon change, or silent mutation/SNP (terms such as “SNP”, “polymorphism”, “mutation”, “mutant”, “variation”, and “variant” are used herein interchangeably), is one that does not result in a change of amino acid due to the degeneracy of the genetic code. A substitution that changes a codon coding for one amino acid to a codon coding for a different amino acid (i.e., a non-synonymous codon change) is referred to as a missense mutation. A nonsense mutation results in a type of non-synonymous codon change in which a stop codon is formed, thereby leading to premature termination of a polypeptide chain and a truncated protein. A read-through mutation is another type of non-synonymous codon change that causes the destruction of a stop codon, thereby resulting in an extended polypeptide product. While SNPs can be bi-, tri-, or tetra-allelic, the vast majority of the SNPs are bi-allelic, and are thus often referred to as “bi-allelic markers”, or “di-allelic markers”.
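Solely for purposes of illustration, the four categories of codon change described above can be distinguished programmatically as sketched below. The sketch assumes the standard genetic code and DNA-alphabet codons; the names CODON_TABLE and classify_codon_change are hypothetical and chosen only for this example.

```python
# Illustrative sketch only; assumes the standard genetic code (stop codon = "*").
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Codons are enumerated in TCAG order with the third base varying fastest.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify the effect of replacing ref_codon with alt_codon."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"      # same amino acid (or stop codon to stop codon)
    if alt_aa == "*":
        return "nonsense"        # new stop codon, truncated product
    if ref_aa == "*":
        return "read-through"    # stop codon destroyed, extended product
    return "missense"            # different amino acid
```

In this sketch a GAG to GTG change (glutamate to valine) is classified as missense, while a TGG to TGA change is classified as nonsense. Codons containing ambiguity codes are not handled.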
As used herein, references to SNPs and SNP genotypes include individual SNPs and/or haplotypes, which are groups of SNPs that are generally inherited together. Haplotypes can have stronger correlations with diseases or other phenotypic effects compared with individual SNPs, and therefore may provide increased diagnostic accuracy in some cases (Stephens et al. Science 293, 489-493, 20 Jul. 2001).
Causative SNPs are those SNPs that produce alterations in gene expression or in the expression, structure, and/or function of a gene product, and therefore are most predictive of a possible clinical phenotype. One such class includes SNPs falling within regions of genes encoding a polypeptide product, i.e. cSNPs. These SNPs may result in an alteration of the amino acid sequence of the polypeptide product (i.e., non-synonymous codon changes) and give rise to the expression of a defective or other variant protein. Furthermore, in the case of nonsense mutations, a SNP may lead to premature termination of a polypeptide product. Such variant products can result in a pathological condition, e.g., genetic disease. Examples of genetic diseases that can be caused by a SNP within a coding sequence include sickle cell anemia and cystic fibrosis.
Causative SNPs do not necessarily have to occur in coding regions; causative SNPs can occur in, for example, any genetic region that can ultimately affect the expression, structure, and/or activity of the protein encoded by a nucleic acid. Such genetic regions include, for example, those involved in transcription, such as SNPs in transcription factor binding domains, SNPs in promoter regions, in areas involved in transcript processing, such as SNPs at intron-exon boundaries that may cause defective splicing, or SNPs in mRNA processing signal sequences such as polyadenylation signal regions. Some SNPs that are not causative SNPs nevertheless are in close association with, and therefore segregate with, a disease-causing sequence. In this situation, the presence of a SNP correlates with the presence of, or predisposition to, or an increased risk in developing the disease. These SNPs, although not causative, are nonetheless also useful for diagnostics, disease predisposition screening, and other uses.
An association study of a SNP and a specific disorder involves determining the presence or frequency of the SNP allele in biological samples from individuals with the disorder of interest, such as liver fibrosis and related pathologies, and comparing the information to that of controls (i.e., individuals who do not have the disorder; controls may also be referred to as “healthy” or “normal” individuals) who are preferably of similar age and race. The appropriate selection of patients and controls is important to the success of SNP association studies. Therefore, a pool of individuals with well-characterized phenotypes is extremely desirable.
A SNP may be screened in diseased tissue samples or any biological sample obtained from a diseased individual, and compared to control samples, and selected for its increased (or decreased) occurrence in a specific pathological condition, such as pathologies related to liver fibrosis, increased or decreased risk of developing bridging fibrosis/cirrhosis, and progression of liver fibrosis. Once a statistically significant association is established between one or more SNP(s) and a pathological condition (or other phenotype) of interest, then the region around the SNP can optionally be thoroughly screened to identify the causative genetic locus/sequence(s) (e.g., causative SNP/mutation, gene, regulatory region, etc.) that influences the pathological condition or phenotype. Association studies may be conducted within the general population and are not limited to studies performed on related individuals in affected families (linkage studies).
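As a non-limiting sketch of the comparison underlying such an association study, the following Python example derives allele counts from genotype counts in cases and controls and computes allele frequencies, an allelic odds ratio, and a one-degree-of-freedom chi-square test on the 2x2 allele table. The choice of test, the function names, and the genotype counts shown in the usage note are assumptions made for illustration only; they are not data from, or the statistical method of, the present disclosure.

```python
# Illustrative sketch only; the statistic chosen here is an assumption.
import math


def allele_counts(genotype_counts: dict, allele: str) -> tuple:
    """Return (copies of allele, copies of all other alleles)."""
    with_allele = 0
    total = 0
    for genotype, n in genotype_counts.items():
        with_allele += genotype.count(allele) * n
        total += len(genotype) * n
    return with_allele, total - with_allele


def allelic_association(cases: dict, controls: dict, allele: str) -> dict:
    """Compare the frequency of one allele between cases and controls."""
    a, b = allele_counts(cases, allele)      # cases: allele, other
    c, d = allele_counts(controls, allele)   # controls: allele, other
    n = a + b + c + d
    # Assumes every cell of the 2x2 table is non-zero.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return {
        "case_freq": a / (a + b),
        "control_freq": c / (c + d),
        "odds_ratio": (a * d) / (b * c),
        "chi2": chi2,
        # Survival function of chi-square with 1 degree of freedom.
        "p_value": math.erfc(math.sqrt(chi2 / 2.0)),
    }
```

A call such as allelic_association({"AA": 10, "AG": 40, "GG": 50}, {"AA": 30, "AG": 50, "GG": 20}, "G"), using made-up counts, returns the frequency of the G allele in each group together with the odds ratio and p-value. In practice, corrections for multiple testing and for covariates such as age and race would also be considered.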
Clinical trials have shown that patient response to treatment with pharmaceuticals is often heterogeneous. There is a continuing need to improve pharmaceutical agent design and therapy. In that regard, SNPs can be used to identify patients most suited to therapy with particular pharmaceutical agents (this is often termed “pharmacogenomics”). Similarly, SNPs can be used to exclude patients from certain treatment due to the patient's increased likelihood of developing toxic side effects or their likelihood of not responding to the treatment. Pharmacogenomics can also be used in pharmaceutical research to assist the drug development and selection process. (Linder et al. (1997), Clinical Chemistry, 43, 254; Marshall (1997), Nature Biotechnology, 15, 1249; International Patent Application WO 97/40462, Spectra Biomedical; and Schafer et al. (1998), Nature Biotechnology, 16: 3).
SUMMARY OF THE INVENTION
The present invention relates to the identification of novel SNPs, unique combinations of such SNPs, and haplotypes of SNPs that are associated with liver fibrosis and in particular the increased or decreased risk of developing bridging fibrosis/cirrhosis, and the rate of progression of liver fibrosis. The polymorphisms disclosed herein are directly useful as targets for the design of diagnostic reagents and the development of therapeutic agents for use in the diagnosis and treatment of liver fibrosis and related pathologies.
Based on the identification of SNPs associated with liver fibrosis, the present invention also provides methods of detecting these variants as well as the design and preparation of detection reagents needed to accomplish this task. The invention specifically provides, for example, novel SNPs in genetic sequences involved in liver fibrosis and related pathologies, isolated nucleic acid molecules (including, for example, DNA and RNA molecules) containing these SNPs, variant proteins encoded by nucleic acid molecules containing such SNPs, antibodies to the encoded variant proteins, computer-based and data storage systems containing the novel SNP information, methods of detecting these SNPs in a test sample, methods of identifying individuals who have an altered (i.e., increased or decreased) risk of developing liver fibrosis based on the presence or absence of one or more particular nucleotides (alleles) at one or more SNP sites disclosed herein or the detection of one or more encoded variant products (e.g., variant mRNA transcripts or variant proteins), methods of identifying individuals who are more or less likely to respond to a treatment (or more or less likely to experience undesirable side effects from a treatment, etc.), methods of screening for compounds useful in the treatment of a disorder associated with a variant gene/protein, compounds identified by these methods, methods of treating disorders mediated by a variant gene/protein, methods of using the novel SNPs of the present invention for human identification, etc.
In Tables 1-2, the present invention provides gene information, transcript sequences (SEQ ID NOS:1-261), encoded amino acid sequences (SEQ ID NOS:262-522), genomic sequences (SEQ ID NOS:4999-5321), transcript-based context sequences (SEQ ID NOS:523-4998) and genomic-based context sequences (SEQ ID NOS:5322-34256) that contain the SNPs of the present invention, and extensive SNP information that includes observed alleles, allele frequencies, populations/ethnic groups in which alleles have been observed, information about the type of SNP and corresponding functional effect, and, for cSNPs, information about the encoded polypeptide product. The transcript sequences (SEQ ID NOS:1-261), amino acid sequences (SEQ ID NOS:262-522), genomic sequences (SEQ ID NOS:4999-5321), transcript-based SNP context sequences (SEQ ID NOS: 523-4998), and genomic-based SNP context sequences (SEQ ID NOS:5322-34256) are also provided in the Sequence Listing.
In a specific embodiment of the present invention, SNPs that occur naturally in the human genome are provided as isolated nucleic acid molecules. These SNPs are associated with liver fibrosis and related pathologies. In particular the SNPs are associated with either an increased or decreased risk of developing bridging fibrosis/cirrhosis and affect the rate of progression of liver fibrosis. As such, they can have a variety of uses in the diagnosis and/or treatment of liver fibrosis and related pathologies. One aspect of the present invention relates to an isolated nucleic acid molecule comprising a nucleotide sequence in which at least one nucleotide is a SNP disclosed in Tables 3 and/or 4. In an alternative embodiment, a nucleic acid of the invention is an amplified polynucleotide, which is produced by amplification of a SNP-containing nucleic acid template. In another embodiment, the invention provides for a variant protein that is encoded by a nucleic acid molecule containing a SNP disclosed herein.
In yet another embodiment of the invention, a reagent for detecting a SNP in the context of its naturally-occurring flanking nucleotide sequences (which can be, e.g., either DNA or mRNA) is provided. In particular, such a reagent may be in the form of, for example, a hybridization probe or an amplification primer that is useful in the specific detection of a SNP of interest. In an alternative embodiment, a protein detection reagent is used to detect a variant protein that is encoded by a nucleic acid molecule containing a SNP disclosed herein. A preferred embodiment of a protein detection reagent is an antibody or an antigen-reactive antibody fragment.
Various embodiments of the invention also provide kits comprising SNP detection reagents, and methods for detecting the SNPs disclosed herein by employing detection reagents. In a specific embodiment, the present invention provides for a method of identifying an individual having an increased or decreased risk of developing liver fibrosis by detecting the presence or absence of one or more SNP alleles disclosed herein. In another embodiment, a method for diagnosis of liver fibrosis and related pathologies by detecting the presence or absence of one or more SNP alleles disclosed herein is provided.
The nucleic acid molecules of the invention can be inserted in an expression vector, such as to produce a variant protein in a host cell. Thus, the present invention also provides for a vector comprising a SNP-containing nucleic acid molecule, genetically-engineered host cells containing the vector, and methods for expressing a recombinant variant protein using such host cells. In another specific embodiment, the host cells, SNP-containing nucleic acid molecules, and/or variant proteins can be used as targets in a method for screening and identifying therapeutic agents or pharmaceutical compounds useful in the treatment of liver fibrosis and related pathologies.
An aspect of this invention is a method for treating liver fibrosis in a human subject wherein said human subject harbors a SNP, gene, transcript, and/or encoded protein identified in Tables 1-2, which method comprises administering to said human subject a therapeutically or prophylactically effective amount of one or more agents counteracting the effects of the disease, such as by inhibiting (or stimulating) the activity of the gene, transcript, and/or encoded protein identified in Tables 1-2.
Another aspect of this invention is a method for identifying an agent useful in therapeutically or prophylactically treating liver fibrosis and related pathologies in a human subject wherein said human subject harbors a SNP, gene, transcript, and/or encoded protein identified in Tables 1-2, which method comprises contacting the gene, transcript, or encoded protein with a candidate agent under conditions suitable to allow formation of a binding complex between the gene, transcript, or encoded protein and the candidate agent and detecting the formation of the binding complex, wherein the presence of the complex identifies said agent.
Another aspect of this invention is a method for treating liver fibrosis and related pathologies in a human subject, which method comprises:
(i) determining that said human subject harbors a SNP, gene, transcript, and/or encoded protein identified in Tables 1-2, and
(ii) administering to said subject a therapeutically or prophylactically effective amount of one or more agents counteracting the effects of the disease.
Many other uses and advantages of the present invention will be apparent to those skilled in the art upon review of the detailed description of the preferred embodiments herein. Solely for clarity of discussion, the invention is described in the sections below by way of non-limiting examples.
Description of the Files Contained on the CD-R Named CL001519CDR
The CD-R named CL001519CDR contains the following five text (ASCII) files:
1) File SEQLIST_1519DIV2.txt provides the Sequence Listing. The Sequence Listing provides the transcript sequences (SEQ ID NOS:1-261) and protein sequences (SEQ ID NOS:262-522) as shown in Table 1, and genomic sequences (SEQ ID NOS:4999-5321) as shown in Table 2, for each liver fibrosis-associated gene that contains one or more SNPs of the present invention. Also provided in the Sequence Listing are context sequences flanking each SNP, including both transcript-based context sequences as shown in Table 1 (SEQ ID NOS:523-4998) and genomic-based context sequences as shown in Table 2 (SEQ ID NOS:5322-34256). The context sequences generally provide 100 bp upstream (5′) and 100 bp downstream (3′) of each SNP, with the SNP in the middle of the context sequence, for a total of 200 bp of context sequence surrounding each SNP.
2) File TABLE1_1519DIV.txt provides Table 1. File TABLE1_1519DIV.txt is 268 KB in size, and was created on Jul. 21, 2010.
3) File TABLE2_1519DIV.txt provides Table 2. File TABLE2_1519DIV.txt is 249 KB in size, and was created on Jul. 21, 2010.
4) File TABLE3_1519.txt provides Table 3. File TABLE3_1519.txt is 21 KB in size, and was created on May 5, 2005.
5) File TABLE4_1519.txt provides Table 4. File TABLE4_1519.txt is 28 KB in size, and was created on May 5, 2005.
The material contained on the CD-R labeled CL001519CDR is hereby incorporated by reference pursuant to 37 CFR 1.77(b)(4).
The patent application contains a lengthy table section. A copy of the table is available in electronic form from the USPTO web site. An electronic copy of the table will also be available from the USPTO upon request and payment of the fee set forth in 37 CFR 1.19(b)(3).
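By way of non-limiting example, the layout of the context sequences described for the Sequence Listing (generally 100 bp on each side of the SNP, with the SNP in the middle) can be sketched as follows. The function name snp_context and the decision to truncate a flank that would run past the end of the parent sequence are assumptions made for this illustration.

```python
# Illustrative sketch only; truncation at sequence ends is an assumption.
def snp_context(sequence: str, snp_index: int, flank: int = 100) -> tuple:
    """Return (upstream, snp_base, downstream) around a 0-based SNP index."""
    if not 0 <= snp_index < len(sequence):
        raise IndexError("snp_index lies outside the sequence")
    start = max(0, snp_index - flank)
    upstream = sequence[start:snp_index]                         # 5' flank
    downstream = sequence[snp_index + 1:snp_index + 1 + flank]   # 3' flank
    return upstream, sequence[snp_index], downstream
```

With the default flank of 100, a SNP located well inside its parent sequence yields 200 bp of surrounding context plus the SNP position itself.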
Description of Table 1 and Table 2
Table 1 and Table 2 (both provided on the CD-R) disclose the SNP and associated gene/transcript/protein information of the present invention. For each gene, Table 1 and Table 2 each provide a header containing gene/transcript/protein information, followed by a transcript and protein sequence (in Table 1) or genomic sequence (in Table 2), and then SNP information regarding each SNP found in that gene/transcript.
NOTE: SNPs may be included in both Table 1 and Table 2; Table 1 presents the SNPs relative to their transcript sequences and encoded protein sequences, whereas Table 2 presents the SNPs relative to their genomic sequences (in some instances Table 2 may also include, after the last gene sequence, genomic sequences of one or more intergenic regions, as well as SNP context sequences and other SNP information for any SNPs that lie within these intergenic regions). SNPs can readily be cross-referenced between Tables based on their hCV (or, in some instances, hDV) identification numbers.
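Purely as an illustrative sketch, cross-referencing SNP entries between Table 1 and Table 2 by a shared identifier can be expressed as a simple keyed lookup. The record layout (dictionaries with a "snp_id" key) and the function names are hypothetical; the actual file format of the Tables is not assumed here, and parsing of the Table files is outside the scope of the sketch.

```python
# Illustrative sketch only; the record layout is hypothetical.
from collections import defaultdict


def index_by_snp_id(records: list) -> dict:
    """Group records by their SNP identifier (e.g., an hCV or hDV number)."""
    index = defaultdict(list)
    for record in records:
        index[record["snp_id"]].append(record)
    return index


def cross_reference(table1_records: list, table2_records: list) -> dict:
    """Map each SNP identifier found in both tables to its records in each."""
    t1 = index_by_snp_id(table1_records)
    t2 = index_by_snp_id(table2_records)
    return {snp_id: (t1[snp_id], t2[snp_id]) for snp_id in t1.keys() & t2.keys()}
```

Records are grouped into lists because a single SNP may appear under more than one transcript entry in Table 1.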
The gene/transcript/protein information includes:
a gene number (1 through n, where n=the total number of genes in the Table)
Celera hCG and UID internal identification numbers for the gene
Celera hCT and UID internal identification numbers for the transcript (Table 1 only)
a public Genbank accession number (e.g., RefSeq NM number) for the transcript (Table 1 only)
Celera hCP and UID internal identification numbers for the protein encoded by the hCT transcript (Table 1 only)
a public Genbank accession number (e.g., RefSeq NP number) for the protein (Table 1 only)
an art-known gene symbol
an art-known gene/protein name
Celera genomic axis position (indicating start nucleotide position-stop nucleotide position)
the chromosome number of the chromosome on which the gene is located
an OMIM (Online Mendelian Inheritance in Man; Johns Hopkins University/NCBI) public reference number for obtaining further information regarding the medical significance of each gene
alternative gene/protein name(s) and/or symbol(s) in the OMIM entry
NOTE: Due to the presence of alternative splice forms, multiple transcript/protein entries can be provided for a single gene entry in Table 1; i.e., for a single Gene Number, multiple entries may be provided in series that differ in their transcript/protein information and sequences.
Following the gene/transcript/protein information is a transcript sequence and protein sequence (in Table 1), or a genomic sequence (in Table 2), for each gene, as follows:
transcript sequence (Table 1 only) (corresponding to SEQ ID NOS:1-261 of the Sequence Listing), with SNPs identified by their IUB codes (transcript sequences can include 5′ UTR, protein coding, and 3′ UTR regions). (NOTE: If there are differences between the nucleotide sequence of the hCT transcript and the corresponding public transcript sequence identified by the Genbank accession number, the hCT transcript sequence (and encoded protein) is provided, unless the public sequence is a RefSeq transcript sequence identified by an NM number, in which case the RefSeq NM transcript sequence (and encoded protein) is provided. However, whether the hCT transcript or RefSeq NM transcript is used as the transcript sequence, the disclosed SNPs are represented by their IUB codes within the transcript.)
the encoded protein sequence (Table 1 only) (corresponding to SEQ ID NOS:262-522 of the Sequence Listing)
the genomic sequence of the gene (Table 2 only), including 6 kb on each side of the gene boundaries (i.e., 6 kb on the 5′ side of the gene plus 6 kb on the 3′ side of the gene) (corresponding to SEQ ID NOS:4999-5321 of the Sequence Listing).
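As a non-limiting illustration of how SNPs represented by IUB codes within a sequence might be located, the following sketch scans a nucleotide string for IUB ambiguity codes and reports the alleles each code denotes. The names IUB_CODES and find_iub_sites and the short example sequence in the usage note are hypothetical; positions are reported as 0-based indices.

```python
# Illustrative sketch only; names and the 0-based convention are assumptions.
IUB_CODES = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_iub_sites(sequence: str) -> list:
    """Return (index, code, alleles) for each ambiguity code in the sequence."""
    return [
        (index, base, IUB_CODES[base])
        for index, base in enumerate(sequence.upper())
        if base in IUB_CODES
    ]
```

Applied to a made-up sequence such as "ATGRCCYTA", the sketch reports an A/G site at index 3 and a C/T site at index 6.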