gnomAD annotated search
Wed, 13 Nov 2019 21:50:00 +0000
We've introduced a new way to browse and filter exome variants using annotations provided by gnomAD. The existing search, which is based on the pipeline-provided annotations, is unchanged and can still be used in the usual way. The new search looks at the same data but presents it differently.
There are several differences between the two search mechanisms. The primary difference is that the pipeline annotated search is never updated: it shows the annotations that were current at the time each sample was analyzed. The gnomAD annotated search, on the other hand, is updated quarterly with every new release of gnomAD, across all previously analyzed patients. By using the latest frequencies and gene names, we can go back and review cases that were previously unsolved.
Other differences to note: The pipeline annotated search displays ONE variant linked to ONE transcript. The gnomAD annotated search displays transcripts (not variants), so one variant can have MULTIPLE transcripts, and the search results may contain multiple rows for the same variant, one per transcript. The search results are also tabbed differently. The pipeline annotated search currently shows a single tab for all search results, whereas the gnomAD annotated search splits the results into 2 tabs: the first tab displays variants that have a link to a gnomAD record, and the second tab lists variants that have no link to gnomAD.
By default, the gnomAD annotated search assumes the assembly build is GRCh38. Users who need to go back and look at previously analyzed patients should remember to switch the assembly to GRCh37. Several new filters were introduced, including LoF (loss of function), Homozygous count, Canonical, and gnomAD consequence.
Structural variants (SVs) are not supported in the gnomAD annotated search. To search SVs, please continue to use the pipeline annotated search. The new gnomAD search is not meant to replace the pipeline search but to complement it with fresh information for the variants that gnomAD covers; gnomAD's coverage is far lower than the pipeline's. We highly recommend using both. For a demonstration video, please follow this link: https://youtu.be/iGRiP8hmet8
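To make the "multiple rows per variant" and "two tabs" behaviour concrete, here is a minimal sketch. It is not the portal's actual code: the row layout, the field names (`variant`, `transcript`, `gnomad_af`) and the identifiers are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical transcript-level rows; every name and value here is illustrative.
rows = [
    {"variant": "1-12345-A-G", "transcript": "TX1", "gnomad_af": 0.0002},
    {"variant": "1-12345-A-G", "transcript": "TX2", "gnomad_af": 0.0002},
    {"variant": "2-67890-C-T", "transcript": "TX3", "gnomad_af": None},
]


def split_into_tabs(rows):
    """Return (rows linked to a gnomAD record, rows with no gnomAD link)."""
    linked = [r for r in rows if r["gnomad_af"] is not None]
    unlinked = [r for r in rows if r["gnomad_af"] is None]
    return linked, unlinked


def transcripts_by_variant(rows):
    """Group transcript rows under the variant they belong to."""
    grouped = defaultdict(list)
    for r in rows:
        grouped[r["variant"]].append(r["transcript"])
    return dict(grouped)


linked, unlinked = split_into_tabs(rows)
```

In this toy data the first variant appears twice in the "linked" tab, once per transcript, which is the same effect described for the gnomAD annotated search results.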
Exomiser analysis available
Wed, 04 Sep 2019 04:41:00 +0000
What is Exomiser? Taken from the website: "The Exomiser is a tool that finds potential disease-causing variants from whole-exome/genome data. Starting from a VCF file and a set of phenotypes encoded using the Human Phenotype Ontology (HPO) it will annotate, filter and prioritise likely causative variants. The program does this based on user-defined criteria such as a variant's predicted pathogenicity, frequency of occurrence in a population and also how closely the given phenotype matches the known phenotype of diseased genes from human and model organism data." Our tests indicate that Exomiser ranks the correct gene in the top spot (#1) 52% of the time, in the top 5 78% of the time, and in the top 10 91% of the time. How to use it: download the attachment, unzip/decompress it, and open the file ending in HTML or TSV. The genes are listed from highest ranking downwards. Currently we run Exomiser with default settings, but samples can be re-analyzed with more specific criteria to produce better results. What's important is that the 'clinical diagnosis' is captured in our Patient database so that Exomiser can do the phenotype+genotype analysis. For newly imported results, Exomiser output is available for download through the web portal in the 'datasets' section, provided that the clinical diagnosis is available.
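For readers wondering how "top 1 / top 5 / top 10" figures of this kind are typically computed, here is a minimal sketch. It is not our evaluation code, and the ranks below are invented purely to show the arithmetic: each number is the position at which the known causal gene appeared in one solved case.

```python
def top_k_rate(ranks, k):
    """Fraction of cases in which the correct gene was ranked at position k or better."""
    if not ranks:
        raise ValueError("at least one ranked case is required")
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


# Invented example: rank of the correct gene in five hypothetical solved cases.
example_ranks = [1, 3, 1, 12, 7]

rates = {k: top_k_rate(example_ranks, k) for k in (1, 5, 10)}
```

The rates can only grow as k grows, which is why the top 10 figure is always at least as high as the top 5 figure.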
GRCh38 assembly now supported
Thu, 18 Jul 2019 00:27:00 +0000
We have finally made the switch to the latest GRCh38 assembly for bioinformatics analysis. The search now includes the option of choosing between assembly versions GRCh38 and the older GRCh37, so users can still browse our database for both assemblies. We have compared the two assemblies for some previously analyzed samples and have noticed differences in pathogenicity scores and in the inclusion/exclusion of variants.
Data storage and compression using CRAM
Thu, 22 Nov 2018 21:43:00 +0000
As our database expands in scale, managing and storing large genomic sequence data has become a challenge, particularly with the large BAM files. As sequencing costs fall, we are anticipating a tsunami of data as researchers switch from exome to whole genome sequencing. In preparation, we've taken early steps to further compress our BAMs using CRAM compression (lossless). Our database system manages its disk space autonomously: if our allocated disk space reaches a threshold of 80%, it automatically converts the oldest BAMs to CRAMs and archives them to tape storage, making way for newer datasets. Our testing has shown that the CRAM compression format saves roughly 30% in storage and will provide significant cost savings. Users who wish to access an archived BAM can click a button in our web interface, and the system will automatically restore the CRAM from tape and convert it back to BAM.
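The threshold rule can be sketched as follows. This is only an illustration of the selection logic under simple assumptions, not the production system: the `BamFile` record, the function name and the sizes are hypothetical, and the actual conversion and tape transfer are left out.

```python
from dataclasses import dataclass


@dataclass
class BamFile:
    name: str
    size_gb: float
    age_days: int


def select_for_archive(files, used_gb, capacity_gb, threshold=0.80):
    """Pick the oldest BAMs until disk usage drops below the threshold."""
    selected = []
    # Oldest files first.
    for f in sorted(files, key=lambda f: f.age_days, reverse=True):
        if used_gb / capacity_gb < threshold:
            break
        selected.append(f)
        used_gb -= f.size_gb  # space freed once the file has moved to tape
    return selected


# Hypothetical files and disk figures.
files = [
    BamFile("a.bam", 100, 400),
    BamFile("b.bam", 100, 30),
    BamFile("c.bam", 100, 200),
]
to_archive = select_for_archive(files, used_gb=850, capacity_gb=1000)
```

With 850 GB of 1000 GB in use, only the oldest file is selected, because archiving it already brings usage down to 75%.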
Search filter: Clinical diagnosis and provisional variants
Fri, 12 Oct 2018 03:12:00 +0000
We've added 2 new search filters for Clinical diagnosis and Provisional variants. By combining these 2 filters, we can find all patients that fall under the same disease category and look for provisional variants they may have in common. This is useful in a research context where we may be able to find a pattern of variants that is influential in disease pathogenesis. Both the Clinical diagnosis and the Provisional variant information are pulled in from our related Patient Database. Therefore, for these filters to be useful, the values must first be specified in the Patient database prior to use in the search.
Predictive relatedness, sex and ancestry reports
Fri, 31 Aug 2018 04:23:00 +0000
We've recently added a new step in our pipeline that uses the genetic data to predict the degree of relatedness, sex and ancestry. This is particularly useful as a quality check to spot potential sample mix-ups, poor DNA quality, contamination, errors in the patient details provided, etc. In the event of a possible error, users are automatically notified by email, with the reports attached for further investigation. We are currently running the reports retrospectively for all of our previous datasets and have already found some data entry errors. In such cases, we may want to rerun the pipeline analysis, as such errors can affect the variant prioritization. These reports are also available as downloads in the 'datasets' section.
Prediction filtering can be separated by logical OR/AND
Fri, 31 Aug 2018 04:14:00 +0000
Previously, when combining filters on predictions and scores, the search always joined the filters with a logical AND. We've now changed this so that users can specify the logical operator (AND/OR) between the prediction and score filters. For example, you can ask for all the variants that have PolyPhen prediction 'probably damaging' OR ClinVar prediction 'pathogenic' in a single query. Previously, you would have had to run a separate search for each of PolyPhen and ClinVar.
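The difference between the two operators can be illustrated with a small sketch. This is not the search implementation: the variant records, field names and the `matches` helper are hypothetical.

```python
def matches(variant, filters, operator="AND"):
    """Check a variant against prediction filters joined by AND or OR."""
    checks = [variant.get(field) == expected for field, expected in filters.items()]
    if operator == "AND":
        return all(checks)
    if operator == "OR":
        return any(checks)
    raise ValueError(f"unsupported operator: {operator!r}")


# Hypothetical variants with two prediction fields.
variants = [
    {"id": "v1", "polyphen": "probably damaging", "clinvar": "benign"},
    {"id": "v2", "polyphen": "benign", "clinvar": "pathogenic"},
    {"id": "v3", "polyphen": "probably damaging", "clinvar": "pathogenic"},
    {"id": "v4", "polyphen": "benign", "clinvar": "benign"},
]
filters = {"polyphen": "probably damaging", "clinvar": "pathogenic"}

either = [v["id"] for v in variants if matches(v, filters, "OR")]
both = [v["id"] for v in variants if matches(v, filters, "AND")]
```

Here the OR query returns v1, v2 and v3 in one pass, whereas the AND query returns only v3.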
Search profiles - New filters
Mon, 23 Jul 2018 03:24:00 +0000
Previously, only gene lists were supported in the Search Profile feature. We've now added support for storing lists of mutation types (exon, splice, missense, nonsense), ExAC frequency and gnomAD frequency.
Improved structural variation prioritisation
Tue, 26 Jun 2018 04:59:00 +0000
Matt Field has made significant improvements to the prioritisation of structural variants (SVs), and we've updated our database to reflect those changes. They include a combined report for both SV callers, prioritisation of SVs where exons are most likely to be impacted, a max length filter applied to most SV types, and a flag for whether an event is novel or known. These changes dramatically reduced the number of high priority SVs from >3000 to around 90, with 449 medium priority SVs. Please note that we do not retrospectively re-analyse and update the SV reports for any previous records; the changes only affect new data.
Handling control samples
Tue, 26 Jun 2018 04:44:00 +0000
We don't necessarily want to see variants from our control samples in the database, but we still want to be able to download their VCFs and do SNP validation to ensure we don't have sample mix-ups. We've created a separate page listing the control samples and their corresponding VCFs for download.
Automated archiving of BAM files
Tue, 26 Jun 2018 04:41:00 +0000
Keeping BAM files available for download is a real challenge, and we face constant pressure to free up disk space as more projects come onboard. We've come up with a way to automatically archive BAM files that are older than 1 year to tape storage without any human intervention.
Health reports: GWAS
Tue, 26 Jun 2018 04:20:00 +0000
In addition to ClinVar and SNPedia, we've recently added the GWAS Catalog to the health reports, based on the rsNumbers for a patient. GWAS are particularly useful in a research context: they compare variant frequencies in an affected population against a control (healthy) population, using statistical analysis to establish a hypothetical link between variants and disease traits. In the health report under GWAS, we've added the following columns: disease traits, studies, risk allele, initial sample size, replication sample size, p-value and risk allele frequency. False positives (a false association between variant and disease) are not uncommon in GWAS because of uncontrolled biases, so it's important to consider whether any replication studies were done, as these give more confidence in the hypothesized association.
Health reports: Clinvar & Snpedia
Tue, 22 May 2018 04:05:00 +0000
We've added a new feature where users can generate health reports, downloadable in Excel format, from multiple data sources including ClinVar and SNPedia, based on the patient's rsNumbers/variants and genotype. The health reports indicate the patient's risk factor associated with a particular disease/trait. A report can take up to 20 minutes to generate, after which an email is sent with the health report attached. Magnitude is a subjective measure of interest ranging from 0 to 10; the higher the number, the more significant. A magnitude score of 2 or higher is probably worth investigating. A magnitude score of 4 or higher is definitely worth investigating. More info at: https://www.snpedia.com/index.php/Magnitude.
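As a small illustration of the magnitude guidance above, the two cut-offs can be written as a simple triage function. The function name and the wording of the lowest category are our own assumptions for this sketch; only the cut-offs of 2 and 4 and the 0 to 10 range come from the text.

```python
def magnitude_note(magnitude):
    """Map a SNPedia magnitude (0 to 10) to the guidance given in this post."""
    if not 0 <= magnitude <= 10:
        raise ValueError("magnitude is expected to be between 0 and 10")
    if magnitude >= 4:
        return "definitely worth investigating"
    if magnitude >= 2:
        return "probably worth investigating"
    # Assumed wording: the post does not name a category below 2.
    return "below the suggested cut-off"
```

For example, `magnitude_note(2.5)` falls in the "probably worth investigating" band.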
Excluding variants from search based on Patient study codes
Tue, 17 Apr 2018 01:48:00 +0000
When doing our own variant analysis, we often seek variants that are shared between affected individuals, and we already provide this capability through the 'shared' filter. We recently added a new filter that takes this search one step further by removing variants found in unaffected individuals (usually from the same family). There is a new textbox called 'Exclude variants' where users can add patient study codes; the variants found in these individuals are then excluded from the variants found in the other individuals, all in a single search operation. Keep in mind that each person carries thousands of variants, so filtering in this way can be quite slow if no other filters are applied. We recommend applying as many filters as possible to narrow the search before using this functionality.
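Conceptually, the 'shared' and 'Exclude variants' filters together amount to a set intersection followed by a set difference. The sketch below shows that idea only; the study codes, variant labels and function name are hypothetical, and the real search runs against the database rather than in-memory sets.

```python
def shared_minus_excluded(variants_by_patient, include_codes, exclude_codes):
    """Variants shared by all included patients and absent from all excluded ones."""
    if not include_codes:
        return set()
    shared = set.intersection(*(set(variants_by_patient[c]) for c in include_codes))
    for code in exclude_codes:
        shared -= set(variants_by_patient[code])
    return shared


# Hypothetical family: two affected members and one unaffected member.
variants_by_patient = {
    "FAM1_01": {"chr1:100A>G", "chr2:200C>T", "chr3:300G>A"},
    "FAM1_02": {"chr1:100A>G", "chr2:200C>T"},
    "FAM1_03": {"chr2:200C>T", "chr5:500T>C"},
}
candidates = shared_minus_excluded(
    variants_by_patient,
    include_codes=["FAM1_01", "FAM1_02"],
    exclude_codes=["FAM1_03"],
)
```

The two affected members share two variants, and the one also carried by the unaffected member is removed, leaving a single candidate.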
GnomAD ethnic frequencies exportable
Tue, 17 Apr 2018 01:15:00 +0000
We've added a new option for users to export gnomAD ethnic frequencies to Excel. The export includes South Asian, East Asian, African American, Jewish, non-Finnish European, Finnish and other minor allele frequencies (MAF). It's optional because we don't actually store the gnomAD frequencies in our database and have to fetch them from elsewhere, which makes the export slower, especially when exporting thousands of variants. It's best to filter as much as you can before enabling this option.
Affected statuses: Database vs Pipeline
Tue, 03 Apr 2018 22:55:00 +0000
We recently introduced a new filter called 'Pipeline affected status'. This is not to be confused with the other 'Affected status' or 'Disease status' filter, which is taken from our Patient Database. The 'Pipeline affected status' differs in that you can reconfigure the pipeline to use an affected status different from the one set in the database, in order to produce different cohort reports. This is useful when the affected status applies to multiple phenotypes or diagnoses and you want to do repeated cohort analysis under different conditions.
Phenotype to Genotype based variant searching
Thu, 15 Feb 2018 04:22:00 +0000
Expand your variant search based on known phenotype-genotype relationships. This filter only works if you have specified patient IDs in the filters. The phenotypes collected from the specified patients are used to query OMIM for gene relationships. A new tab called 'Phenotype-Genotype' is displayed in the results, showing the relationships between phenotypes and genes. This only works well for patients that have a good number of phenotypes captured in our databases.
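The lookup described above can be pictured as a two-step mapping: patient to phenotypes, then phenotype to genes. The sketch below uses plain dictionaries as stand-ins; it does not call OMIM, and the patient IDs, phenotype labels and gene names are placeholders.

```python
def genes_for_patients(patient_ids, phenotypes_by_patient, genes_by_phenotype):
    """Collect phenotype-to-gene relationships for the selected patients."""
    relationships = {}
    for pid in patient_ids:
        for phenotype in phenotypes_by_patient.get(pid, []):
            genes = genes_by_phenotype.get(phenotype, set())
            if genes:
                relationships.setdefault(phenotype, set()).update(genes)
    return relationships


# Placeholder data standing in for the Patient database and an OMIM-style lookup.
phenotypes_by_patient = {
    "P001": ["phenotype_1", "phenotype_2"],
    "P002": ["phenotype_2", "phenotype_3"],
}
genes_by_phenotype = {
    "phenotype_1": {"GENE_A"},
    "phenotype_2": {"GENE_B", "GENE_C"},
}
relationships = genes_for_patients(
    ["P001", "P002"], phenotypes_by_patient, genes_by_phenotype
)
```

A phenotype with no known gene relationship, such as `phenotype_3` here, contributes nothing, which is one reason the feature works best for patients with many phenotypes recorded.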
RS number filter
Thu, 15 Feb 2018 04:21:00 +0000
Users can now search by rsNumbers in our search filters.
Variants from cohort reports are now included
Thu, 15 Feb 2018 04:20:00 +0000
Previously, only the variants from the SNV, INDEL and SV reports were included in our database. We've recently rebuilt our database to include all variants found in the cohort report, even the questionable ones of poor quality, because the pipeline pedigree analysis found some suggestion of an inheritance pattern for them. This means there are more variants for you to browse than there were before.
Gene interactions - Genes don't work in isolation, and your gene lists shouldn't either
Wed, 20 Dec 2017 03:56:00 +0000
Genes don't work in isolation, and your gene lists shouldn't either. Researchers will often have a list of known genes to look for when prioritizing variants based on the patient's clinical diagnosis. But what should you do if no candidate variants can be found based solely on your gene list? There are many approaches, but one option is to expand the gene list based on known gene interactions and pathways. We rely on the highly curated BioGRID database to expand the gene list to include the network of genes known to interact, either directly or through protein-protein interactions. To use this feature, tick the new 'Gene interactions' checkbox to expand your gene-based search in this way.
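As a rough picture of what "expanding a gene list" means, the sketch below adds every direct interaction partner of the genes you started with. It is a simplified, one-hop illustration under our own assumptions: the interaction pairs and gene names are placeholders, and it does not query BioGRID.

```python
def expand_gene_list(genes, interactions):
    """Add the direct interaction partners of each gene in the list (one hop)."""
    seed = set(genes)
    expanded = set(seed)
    for gene_a, gene_b in interactions:
        if gene_a in seed:
            expanded.add(gene_b)
        if gene_b in seed:
            expanded.add(gene_a)
    return expanded


# Placeholder interaction pairs; not real BioGRID records.
interactions = [
    ("GENE_A", "GENE_B"),
    ("GENE_B", "GENE_C"),
    ("GENE_D", "GENE_A"),
    ("GENE_E", "GENE_F"),
]
expanded = expand_gene_list(["GENE_A"], interactions)
```

Starting from `GENE_A` alone, the expanded list gains `GENE_B` and `GENE_D` but not `GENE_C`, which is two hops away in this toy network.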
Search profiles
Wed, 20 Dec 2017 03:47:00 +0000
Users can now create their own search profiles to store commonly used search filters, so the same options don't have to be chosen again for every search. One example is to include your gene lists in a search profile. Search profiles are associated with the user only and are not shared.
BAI - BAM index files downloadable
Wed, 20 Dec 2017 03:45:00 +0000
The BAM index files, known as BAI files, are now available for download along with the BAM file. This is particularly useful when using IGV on your desktop.
New search filters: gnomAD frequency and INDEL ExAC frequencies
Wed, 20 Dec 2017 03:18:00 +0000
The bioinformatics pipeline has been updated to include gnomAD frequencies and INDEL ExAC frequencies. Any new data generated from Dec 2017 onwards will have these new fields. However, none of the previously analyzed datasets will have them; they will have to be reanalyzed if you want these fields populated. To go along with the new fields, we've added a gnomAD frequency filter to our search page.
Exon coverage search
Tue, 12 Dec 2017 21:41:00 +0000
The sequencing and alignment process isn't perfect, and the pipeline analysis often leaves regions of poor coverage. Previously, we made the coverage reports available for download as part of our datasets as 'exonReports'. We've taken this a step further by allowing users to search through these coverage reports by gene, patient ID and coverage type (NO_COVERAGE, POOR_COVERAGE, PARTIAL_COVERAGE). To use this new feature, choose 'Search exon coverage' in the menus. Furthermore, we've added a new tab that displays exon coverage alongside the variant search results. The tab will only have results if users search by Patient ID and Gene. This way users can browse variants and coverage results side by side, giving a broader view of the quality of the variants being presented. This will be particularly useful for difficult-to-diagnose patients in whom no causal variants have been identified, where potentially disease-causing variants may lie hidden in uncovered regions of the genome.
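The kind of lookup this search performs can be sketched as a simple filter over coverage records. The record layout, field names and sample values below are hypothetical; only the three coverage types come from the reports themselves.

```python
COVERAGE_TYPES = {"NO_COVERAGE", "POOR_COVERAGE", "PARTIAL_COVERAGE"}


def search_exon_coverage(records, gene=None, patient_id=None, coverage_type=None):
    """Filter coverage records by any combination of gene, patient ID and coverage type."""
    if coverage_type is not None and coverage_type not in COVERAGE_TYPES:
        raise ValueError(f"unknown coverage type: {coverage_type!r}")
    return [
        r
        for r in records
        if (gene is None or r["gene"] == gene)
        and (patient_id is None or r["patient_id"] == patient_id)
        and (coverage_type is None or r["coverage"] == coverage_type)
    ]


# Hypothetical coverage records.
records = [
    {"patient_id": "P001", "gene": "GENE_A", "exon": 3, "coverage": "NO_COVERAGE"},
    {"patient_id": "P001", "gene": "GENE_A", "exon": 7, "coverage": "PARTIAL_COVERAGE"},
    {"patient_id": "P001", "gene": "GENE_B", "exon": 1, "coverage": "POOR_COVERAGE"},
    {"patient_id": "P002", "gene": "GENE_A", "exon": 3, "coverage": "NO_COVERAGE"},
]
hits = search_exon_coverage(records, gene="GENE_A", patient_id="P001")
```

Searching by patient and gene, as the coverage tab requires, returns the two problem exons of `GENE_A` for patient `P001`.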
CACPIC Frequencies exportable
Tue, 12 Dec 2017 21:31:00 +0000
Chinese frequencies, calculated from our healthy Chinese controls, are now exportable to Excel as an optional column. We've made it optional because these frequencies are calculated at runtime during the export process and can delay the completion of the export. If you are not interested in the CACPIC frequencies, leave the checkbox unticked.