The present invention relates to a novel form of the core+1 protein of Hepatitis C virus (HCV), designated shorter form core+1 protein. The invention also provides methods for detecting infection by Hepatitis C virus in biological samples, methods of screening compounds which interact with viral propagation in HCV-infected cells and advantageously decrease, inhibit or prevent viral propagation, or of screening compounds impacting on the expression of the shorter form core+1 protein, and uses of these compounds for the preparation of compositions useful for their antiviral activities. The invention also proposes to use the shorter form core+1 protein of the invention to derive immunogenic compositions for protection against HCV infection or against its consequences.
Hepatitis C is a viral infection of the liver which was referred to as “non A, non B hepatitis” (NANBH) until the causative agent was identified. Hepatitis C virus is one of the viruses (A, B, C, D and E) which together account for the majority of cases of viral hepatitis. Hepatitis C virus was first identified in 1989 (Choo et al. 1989) and defined as a common cause of liver disease, with an estimated 170 million infected people worldwide. Hepatitis C virus (HCV) infection affects the liver, causing hepatitis, i.e., an inflammation of the liver. Between 75 and 85% of persons infected with HCV progress to chronic infection, and approximately 20% of these cases develop complications of chronic hepatitis C, including cirrhosis of the liver or hepatocellular carcinoma, after 20 years of infection (Di Bisceglie 2000). The currently recommended treatment for HCV infection is a combination of interferon and ribavirin; however, this treatment is not effective in all cases, and liver transplantation is indicated in hepatitis C-related end-stage liver disease. At present, no vaccine is available to prevent HCV infection; therefore, all precautions to avoid infection must be taken.
HCV is a positive-sense (+), single-stranded, enveloped RNA virus of the Hepacivirus genus within the Flaviviridae family. The viral genome is approximately 10 kb in length and encodes a 3011-amino-acid polyprotein precursor. The HCV genome has a large single open reading frame (ORF) coding for a unique polyprotein, which is co- and post-translationally processed by cellular and viral proteases into three structural proteins, i.e., core, E1 and E2, and at least six non-structural proteins, NS2, NS3, NS4A, NS4B, NS5A and NS5B (Houghton 1996 and Reed et al. 2000).
Initiation of translation of the HCV genome is controlled by an internal ribosome entry site (IRES) located mainly within the 5′ non-coding region of the viral RNA, between nucleotides 42 and 341 or 356, the 3′ limit being controversial. The core protein, which forms the viral nucleocapsid, is predicted to be 191 amino acids in length and to have a molecular mass of 23 kDa (p23). Further processing of p23 produces the mature core protein (p21), consisting of 173 to 182 amino acids. It has previously been reported that a protein with a molecular weight of about 17 kDa is also expressed from the core protein-coding sequence of some HCV isolates, both in vitro and in vivo, e.g. in E. coli cells. This additional HCV polypeptide of 16/17 kDa (p16/p17), consisting of at most 160 amino acids, is encoded by the open reading frame that overlaps the core gene in the +1 frame (core+1 ORF) and is synthesized in vitro as a result of a +1 ribosomal frameshift during translation.
This 16/17 kDa polypeptide is named ARFP (Alternative Reading Frame Protein), F (Frameshift protein) or core+1, according to the location of this novel protein. The ARFP/F/core+1 protein is synthesized in vitro from the initiator codon of the polyprotein sequence, followed by a +1 ribosomal frameshift operating in the region of core codons 8-14 (Xu et al. 2001, Varaklioti et al. 2002).
More recently, the expression of the core+1 protein coding sequence has been assayed in mammalian cells, i.e. in vivo, in order to investigate the biological importance of the core+1 protein. It has been shown that, in rabbit reticulocyte lysates (in vitro), the core+1 protein is synthesized from the core+1 ORF of the HCV-1 isolate but is not detected from the core+1 ORF of the HCV-1a(H) isolate (Varaklioti et al. 2002). It should be recalled that the HCV-1 and HCV-1a(H) isolates of HCV, although belonging to the same genotype, have different sequences at the frameshift site located in codons 8-14 of HCV-1. The difference consists especially in the absence, in the HCV-1a(H) sequence, of the stretch of 10 consecutive A residues found at the putative frameshift site. To gain insight into the mechanisms of core+1 protein expression, the inventors studied this expression in vivo.
The results disclosed in the present invention indicate that, unlike in the in vitro expression studies, both the HCV-1 and HCV-1a(H) core coding sequences efficiently allow expression of the core+1 ORF in transfected mammalian cells. The transfection and expression experiments carried out in mammalian cells have also enabled the present inventors to establish that in vivo expression of the core+1 ORF is associated with the synthesis of a new protein whose expression follows an alternative translation initiation mechanism, different from the mechanism identified for the in vitro expression of core+1 protein. This alternative mechanism directs the synthesis of a shorter form of core+1 protein in vivo.
Sequences of particular HCV-1 and HCV-1a(H) isolates have been disclosed in GenBank under accession numbers M62321 and M67463, respectively.
Viruses, which are subject to genome size constraints, have developed different strategies to expand their coding capacity, such as ribosomal frameshifting or internal translational initiation. Ribosomal frameshifting allows the ribosome to bypass a termination codon that it would otherwise have encountered, and instead produces a protein with an extra amino acid sequence at its C-terminal end. In ribosomal frameshifting, a directed change of the translational reading frame therefore allows the synthesis of a single protein from two or more overlapping genes. Internal translational initiation consists in bypassing an upstream initiator codon by various mechanisms, including leaky scanning, ribosome shunting and internal ribosome entry sites. A mechanism of this kind appears to be used for in vivo expression of the shorter form core+1 protein.
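The difference between the two strategies can be illustrated with a minimal Python sketch. The sequence `TOY` is an invented 19-nucleotide string, not an HCV sequence, and the function names are hypothetical: `translate` reads codons from a given start until a stop codon, while `translate_with_plus1_shift` reads a chosen number of codons in the initial frame and then skips one nucleotide, mimicking a +1 ribosomal frameshift. Initiating directly at an internal ATG of the +1 frame yields a shorter product that shares its C-terminal residues with the frameshift product.

```python
from itertools import product

# Standard genetic code, generated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str, start: int) -> str:
    """Translate from 0-based index `start` until a stop codon or the end of `seq`."""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


def translate_with_plus1_shift(seq: str, start: int, codons_before_shift: int) -> str:
    """Read `codons_before_shift` codons in the initial frame, then slip +1 nucleotide."""
    head = []
    i = start
    for _ in range(codons_before_shift):
        if i + 3 > len(seq):
            return "".join(head)
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            return "".join(head)
        head.append(aa)
        i += 3
    # Skipping one nucleotide moves the ribosome into the +1 frame.
    return "".join(head) + translate(seq, i + 1)


# Invented toy sequence (not HCV): ATG at index 0 (frame 0), ATG at index 4 (+1 frame).
TOY = "ATGCATGCCTAAGGAGTGA"

print(translate(TOY, 0))                      # frame 0 product: MHA
print(translate_with_plus1_shift(TOY, 0, 2))  # frameshift product: MHPKE
print(translate(TOY, 4))                      # internal initiation product: MPKE
```

The sketch models only the reading-frame logic; it does not represent the cellular mechanisms (leaky scanning, shunting, IRES) that determine which codon is actually used.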
The invention thus provides a new protein of the HCV life cycle, designated shorter form core+1 protein, which can be obtained by in vivo expression of the core+1 coding sequence or ORF, especially in mammalian cells.
The invention also relates to nucleic acid sequences encoding said shorter form core+1 protein.
The invention also provides methods for detecting, in a biological sample from an individual, the presence or absence of the shorter form core+1 protein, the presence of which gives evidence of Hepatitis C virus infection.
The invention also provides the use of the shorter form core+1 protein of the invention in an immunogenic composition. An immunogenic composition of the invention may advantageously be prepared in order to elicit a CTL response against HCV infection in a patient.
The shorter form core+1 HCV protein may also be used in the preparation of therapeutic compositions intended to act on the consequences of HCV infection, especially when persistent infection occurs.
The invention also provides means for screening compounds, especially compounds having antiviral activity resulting from their interaction with the in vivo expression of the core+1 ORF that directs translation of the shorter form core+1 protein. Among the several advantages of the present methods, these screening methods are suitable for routine high-throughput screening of compounds capable of interacting with viral propagation and with control of the viral life cycle, especially compounds capable of inhibiting or preventing viral propagation.
Moreover, the invention also provides for the use of compounds capable of interacting with viral propagation and with control of the viral life cycle, especially compounds capable of inhibiting or preventing viral propagation, advantageously as a result of their capacity to interact with expression of the shorter form core+1 protein in HCV-infected cells, for the preparation of a drug for the treatment of disorders induced by or associated with Hepatitis C virus infection.
A first object of the invention is thus a shorter form core+1 protein of HCV which is the product of translation of a coding sequence consisting of all or part of the nucleotide sequence extending from nucleotide 598 to nucleotide 920 within the core+1 ORF of HCV represented on FIG. 3B.
In a particular embodiment, the shorter form core+1 protein is encoded by a nucleotide sequence having a translation initiation codon (ATG) at position 598, or by a nucleotide sequence having an ATG at position 606, of the HCV core+1 coding sequence.
In a particular embodiment, the shorter form core+1 protein is encoded by:
(i) a nucleotide sequence extending from nucleotide 598 to nucleotide 826 of the sequence represented on FIG. 3B; or
(ii) a nucleotide sequence extending from nucleotide 598 to nucleotide 897 of the sequence represented on FIG. 3B; or
(iii) a nucleotide sequence extending from nucleotide 606 to nucleotide 826 of the sequence represented on FIG. 3B; or
(iv) a nucleotide sequence extending from nucleotide 606 to nucleotide 897 of the sequence represented on FIG. 3B; or
(v) a nucleotide sequence extending from nucleotide 606 to nucleotide 920 of the sequence represented on FIG. 3B.
As used herein, the expressions “shorter form core+1 protein” and “in vivo core+1 protein” refer to the Hepatitis C virus proteins obtainable in vivo, in cells infected with HCV or in cells transfected with a DNA construct comprising the core coding sequence or the core+1 ORF. In particular, a predominant shorter form of core+1 is produced in vivo; it is smaller than the 16/17 kDa core+1 product synthesized in vitro, with a predicted molecular weight of less than 10 kDa. Furthermore, the shorter form core+1 protein is not encoded by the region containing the first 10 consecutive A residues of the core coding sequence. These A residues are located at codons 8-11 (nucleotides 364-373) of the HCV-1 genome and are of great importance for the expression of the core+1 ORF. This difference in molecular weight accounts for the term “shorter form core+1 protein”.
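Stretches of consecutive A residues such as the one described above can be located with a simple regular-expression scan. The sketch below is illustrative only: the function name `find_a_runs` and the toy sequences are hypothetical, and positions are reported 1-based with an optional offset (`first_position`) so that a sequence numbered like the HCV-1 genome could be passed in.

```python
import re


def find_a_runs(seq: str, min_len: int = 10, first_position: int = 1) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) coordinates of runs of >= min_len A residues."""
    seq = seq.upper()
    pattern = re.compile("A{%d,}" % min_len)
    return [(m.start() + first_position, m.end() - 1 + first_position)
            for m in pattern.finditer(seq)]


# Toy example: a run of exactly 10 A residues flanked by other bases.
toy = "GC" + "A" * 10 + "GC"
print(find_a_runs(toy))  # [(3, 12)]
```

With `first_position=364`, a string of ten A residues is reported as spanning positions 364 to 373, the coordinates given for this stretch in HCV-1.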
As used herein, the expression “core+1 ORF” refers to the nucleotide sequence as represented on FIG. 3B of the present application, which is comprised within the “core coding sequence” of HCV. This core+1 ORF begins at nucleotide 342 with the translation initiation codon and extends up to the nucleotide at position 920 (U.S. Ser. No. 09/644,987) in the sequence illustrated on FIG. 3B.
It is pointed out that the shorter form core+1 protein is encoded by the core+1 ORF or by the core coding sequence when these nucleotide sequences are expressed in vivo.
The invention further relates to a shorter form core+1 protein of HCV which is obtainable in vivo by expression of the core+1 open reading frame (ORF) contained in the nucleotide sequence extending from the nucleotide at position 342 to the nucleotide at position 920, preferably to the nucleotide at position 826, of the nucleotide sequence represented on FIG. 3B, and whose calculated molecular weight is less than 10 kDa.
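The order of magnitude of this molecular weight can be checked with simple arithmetic. The sketch below assumes that a coordinate range is inclusive, counts only complete codons, and uses an average residue mass of about 110 Da, a common rule-of-thumb approximation rather than a calculation from the actual amino-acid composition. Whether the FIG. 3B coordinates include the stop codon is not stated here, which changes the count by at most one residue; the function names are hypothetical.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average, not composition-specific


def codon_count(start: int, end: int) -> int:
    """Number of complete codons in the inclusive nucleotide range [start, end]."""
    return (end - start + 1) // 3


def estimated_mass_kda(start: int, end: int) -> float:
    """Rough mass estimate in kDa, treating every complete codon as one residue."""
    return codon_count(start, end) * AVERAGE_RESIDUE_MASS_DA / 1000.0


# Preferred shorter form: nucleotide 598 to nucleotide 826 of FIG. 3B.
print(codon_count(598, 826))                   # 76
print(round(estimated_mass_kda(598, 826), 2))  # 8.36
```

By this rough estimate the preferred form falls below 10 kDa; the exact value depends on the amino-acid composition of the isolate considered.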
It is emphasized that the shorter form core+1 protein is obtainable in vivo independently of the expression of the HCV polyprotein and also independently of the expression of core+1 protein. Its expression in vivo uses the same frame as the one used for core+1 expression in the core+1 ORF, but does not involve the ribosomal frameshift mechanism required for in vitro expression of core+1.
In another embodiment, the shorter form core+1 protein is the expression product of the core+1 ORF in mammalian cells.
In a preferred embodiment, the shorter form core+1 protein is recognized by sera from patients infected with HCV. Likewise, circulating anti-core+1 antibodies have been detected in HCV-infected individuals, suggesting that this protein is produced during natural HCV infection.
In a preferred embodiment, the shorter form core+1 protein comprises the amino acid sequence extending from amino acid residue corresponding to nucleotide 598 to amino acid residue corresponding to nucleotide 826, or to nucleotide 897 or to nucleotide 920 of the sequence represented on FIG. 3B. In another preferred embodiment, the shorter form core+1 protein comprises the amino acid sequence extending from amino acid residue corresponding to nucleotide 606 to amino acid residue corresponding to nucleotide 826, or to nucleotide 897 or to nucleotide 920 of the sequence represented on FIG. 3B.
The start and/or stop codons disclosed for the shorter form core+1 protein may vary depending on the HCV isolate considered. The above positions of start and stop codons are given with respect to the amino-acid sequence of FIG. 3. Although the shorter form core+1 protein ending with the codon corresponding to nucleotide 826 can be regarded as the preferred form of said protein, the longer sequences given above may be encoded simultaneously or alternatively.
The invention further concerns peptides contained within the shorter form core+1 protein, especially peptides useful as epitopes. For the purposes of the present invention, the term “epitope”, when referring to a peptide, is to be understood as an antigenic determinant or the immunologically active region of said peptide, i.e. the portion of the immunogenic peptide which is specifically bound by an antibody or a T-cell receptor (TCR). An epitope on a peptide antigen may involve elements of the primary, secondary, tertiary and even quaternary structure of the peptide, and contains at least three residues. The present invention provides a particular peptide of interest, useful as an epitope, having the following sequence:
This peptide of interest comprises the amino-acid sequence extending from the amino-acid residue corresponding to nucleotide 749 to the amino-acid residue corresponding to nucleotide 793, or to nucleotide 796, in the sequence of FIG. 3B.
Variants of this peptide, such as those obtained by deletions, additions or substitutions of amino acids in the peptide, are also encompassed by the present invention and can be obtained by methods known in the art, as long as these variants can elicit antibodies or can immunologically react with antibodies directed against the above sequence.
Examples of variants of this peptide of interest encompassed by the present invention are illustrated below, according to FIG. 8:
NH2-T-X-R-S-S-A-P-L-L-E-A-L-P-G-P-COOH where X=F or S;
NH2-T-Y-X-S-S-A-P-L-L-E-A-L-P-G-P-COOH where X=L, P or R;
NH2-T-Y-R-S-X-A-P-L-L-E-A-L-P-G-P-COOH where X=L;
NH2-T-Y-R-S-S-X-P-L-L-E-A-L-P-G-P-COOH where X=V;
NH2-T-Y-R-S-S-A-P-X-L-E-A-L-P-G-P-COOH where X=P or R;
NH2-T-Y-R-S-S-A-P-L-X-E-A-L-P-G-P-COOH where X=S or W;
NH2-T-Y-R-S-S-A-P-L-L-X-A-L-P-G-P-COOH where X=G, V, A or E;
NH2-T-Y-R-S-S-A-P-L-L-E-A-X-P-G-P-COOH where X=S;
NH2-T-Y-R-S-S-A-P-L-L-E-A-L-X-G-P-COOH where X=Q;
NH2-T-Y-R-S-S-A-P-L-L-E-A-L-P-X-P-COOH where X=E;
NH2-T-Y-R-S-S-A-P-L-L-E-A-L-P-G-X-COOH where X=L or H;
NH2-T-Y-R-S-S-A-P-L-L-E-A-L-P-G-P-X-COOH where X=C, W or S.
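Each variant line above is a template in which a single position X takes one of the listed residues. A minimal Python sketch that expands the templates into explicit one-letter sequences, keeping the residues in the same left-to-right order as in the list, could look as follows; the data structure and the function name `expand_template` are illustrative choices.

```python
# Templates and allowed residues transcribed from the variant list (FIG. 8).
VARIANT_TEMPLATES = [
    ("T-X-R-S-S-A-P-L-L-E-A-L-P-G-P", "FS"),
    ("T-Y-X-S-S-A-P-L-L-E-A-L-P-G-P", "LPR"),
    ("T-Y-R-S-X-A-P-L-L-E-A-L-P-G-P", "L"),
    ("T-Y-R-S-S-X-P-L-L-E-A-L-P-G-P", "V"),
    ("T-Y-R-S-S-A-P-X-L-E-A-L-P-G-P", "PR"),
    ("T-Y-R-S-S-A-P-L-X-E-A-L-P-G-P", "SW"),
    ("T-Y-R-S-S-A-P-L-L-X-A-L-P-G-P", "GVAE"),
    ("T-Y-R-S-S-A-P-L-L-E-A-X-P-G-P", "S"),
    ("T-Y-R-S-S-A-P-L-L-E-A-L-X-G-P", "Q"),
    ("T-Y-R-S-S-A-P-L-L-E-A-L-P-X-P", "E"),
    ("T-Y-R-S-S-A-P-L-L-E-A-L-P-G-X", "LH"),
    ("T-Y-R-S-S-A-P-L-L-E-A-L-P-G-P-X", "CWS"),
]


def expand_template(template: str, residues: str) -> list[str]:
    """Replace the single 'X' in a hyphen-separated template with each allowed residue."""
    positions = template.split("-")
    return ["".join(r if aa == "X" else aa for aa in positions) for r in residues]


variants = [v for template, residues in VARIANT_TEMPLATES
            for v in expand_template(template, residues)]
print(len(variants))  # 23
print(variants[:2])   # ['TFRSSAPLLEALPGP', 'TSRSSAPLLEALPGP']
```

Deduplicating with `set(variants)` may be worthwhile, since two listed substitutions (R at the third position and E at the tenth) reproduce the residue already present at that position in the other templates.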
Such peptides are of particular interest for the preparation of antibodies, either polyclonal or monoclonal.
The translation initiation codon of the shorter form core+1 protein may vary depending on the HCV isolate. Some isolates contain two ATG codons, both of which may be used for synthesis of the shorter form core+1 protein, whereas other isolates contain only one ATG for this protein.
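Candidate initiation codons of this kind can be listed by scanning a chosen reading frame. The sketch below is hypothetical (the function name, parameters and toy sequence are illustrative and not taken from any HCV isolate): it reports 1-based positions of ATG codons whose first nucleotide lies in the requested frame, counted from the first nucleotide of the input, and accepts RNA (U) or DNA (T) input.

```python
def in_frame_atg_positions(seq: str, frame: int, first_position: int = 1) -> list[int]:
    """1-based positions of ATG codons in `frame` (0, 1 or 2) of `seq`."""
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    seq = seq.upper().replace("U", "T")
    return [first_position + i
            for i in range(frame, len(seq) - 2, 3)
            if seq[i:i + 3] == "ATG"]


toy = "CAUGAUGUAA"  # invented RNA with two AUG codons in frame 1
print(in_frame_atg_positions(toy, frame=1))  # [2, 5]
print(in_frame_atg_positions(toy, frame=0))  # []
```

A sequence with a single in-frame ATG would return a single position; such a scan does not predict which ATG is actually used in cells.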
Various shorter form core+1 proteins can, for example, be derived from the alignment of the protein encoded by the sequence of FIG. 3B with the amino-acid sequences disclosed in FIG. 8, which correspond to the proteins expressed by variant isolates.
The invention also concerns a mosaic of proteins encoded by the above-defined core coding sequence of HCV. Such a mosaic contains at least two proteins selected from core protein, core+1 protein, shorter form core+1 protein or their derivatives, including derivatives encoded by said sequence and involving a further frameshift mechanism in the 3′ terminal part of the core coding sequence.
These compositions of proteins can comprise proteins from the same isolate or from different HCV isolates.
The invention also relates to a nucleotide sequence consisting of a fragment of the nucleotide sequence extending from nucleotide 342 to nucleotide 920 represented on FIG. 3B, which fragment is capable of encoding a shorter form core+1 protein of HCV when transfected into mammalian cells under expression conditions.
More specifically, it is shown that the nucleotide sequence encoding a shorter form core+1 protein comprises a nucleotide sequence extending from nucleotide 598 or from nucleotide 606 to nucleotide 826 within the core+1 coding sequence of FIG. 3B.
In a specific embodiment, the nucleotide sequence encoding a shorter form core+1 protein is chosen from:
(i) a nucleotide sequence extending from nucleotide 606 to nucleotide 826 of the sequence represented on FIG. 3B;
(ii) a nucleotide sequence extending from nucleotide 606 to nucleotide 897 of the sequence represented on FIG. 3B;
(iii) a nucleotide sequence extending from nucleotide 606 to nucleotide 920 of the sequence represented on FIG. 3B;
(iv) a nucleotide sequence extending from nucleotide 598 to nucleotide 826 of the sequence represented on FIG. 3B;
(v) a nucleotide sequence extending from nucleotide 598 to nucleotide 897 of the sequence represented on FIG. 3B;
(vi) a nucleotide sequence extending from nucleotide 598 to nucleotide 920 of the sequence represented on FIG. 3B;
(vii) a fragment of sequence (i), (ii), (iii), (iv), (v) or (vi) which is capable of encoding, in mammalian cells, a shorter form core+1 protein as defined above, or an epitope thereof.
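The coordinate ranges (i) to (vi) can be handled uniformly by a helper that slices a sequence numbered from 1, as on FIG. 3B. The FIG. 3B sequence itself is not reproduced in this text, so the example below uses a placeholder string of the same length; `RANGES`, `extract` and `placeholder` are illustrative names, and the ranges are assumed to be inclusive.

```python
# Ranges (i) to (vi), as inclusive 1-based coordinates on FIG. 3B.
RANGES = {
    "i": (606, 826),
    "ii": (606, 897),
    "iii": (606, 920),
    "iv": (598, 826),
    "v": (598, 897),
    "vi": (598, 920),
}


def extract(seq: str, start: int, end: int, first_position: int = 1) -> str:
    """Return the inclusive range [start, end] of a sequence numbered from first_position."""
    if not first_position <= start <= end < first_position + len(seq):
        raise ValueError(f"range {start}-{end} lies outside the sequence")
    return seq[start - first_position:end - first_position + 1]


placeholder = "N" * 920  # stand-in for the FIG. 3B sequence (positions 1 to 920)
for label, (start, end) in RANGES.items():
    print(label, len(extract(placeholder, start, end)))
```

Replacing `placeholder` with the actual FIG. 3B sequence would yield the corresponding nucleotide fragments directly.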
The invention also provides variant nucleotide sequences derived from different isolates, which encode the shorter form core+1 proteins illustrated on FIG. 8.
The invention thus provides a nucleotide sequence comprising a Hepatitis C virus core protein coding sequence which is derived from the nucleotide sequence represented on FIG. 3B as a result of one or more mutations selected from the following: