Centre for Bioinformation Science
The Centre for Bioinformation Science (CBiS) forms a bridge between two areas of major strength at ANU: the mathematical and biological sciences. CBiS brings together researchers with backgrounds in mathematics, statistics and quantitative biology with the goal of developing a conceptual architecture for an information-based, integrative approach to complex biological systems. CBiS is now two years old and approaching maturity. With the appointments of Matthew Wakefield and Dan Kortschak we are approaching our staffing quota. Both are joint appointments with the Research School of Biological Sciences, and they underpin our commitment to establishing research links across the ANU campus. Substantial interdisciplinary research activities are now underway, as detailed below; they include work on sequence analysis, genetic epidemiology, protein folding, comparative genomics, molecular evolution, and statistical analysis of microarray data.
A feature of the work of the Centre has been the extent of our collaborations outside the Centre. We now have well-developed collaborations with researchers in CMHR, JCSMR, NCEPH, RSBS, RSC, RSISE, MSI and RSPhysSE, some of whom hold joint appointments with CBiS. CBiS has worked with ANUSF and APAC, and on the development of teaching resources with BaMBi and the Department of Mathematics. These cross-campus collaborations provide a nucleus for the development of further links.
We have also been heavily involved with outside organizations, as detailed below, and with the organization of outside activities. A highlight was the extremely successful workshop on the design and analysis of microarray/chip experiments organized by Sue Wilson in association with the Australian Statistical Conference.
Professors and Co-Directors:
S Easteal, BSc (St Andrews), PhD (Griffith)
S Wilson, BSc (Sydney), PhD
Fellow:
G Huttley, BSc (Hons I) (Macquarie), PhD (Univ California, Riverside,
USA)
Research Fellows:
H Booth, BSc (Adelaide), PhD (New England)
Y Fang, BSc, MSc (Jilin University, China), PhD (Uni of Massachusetts,
USA)
A Isaev, MSc, PhD (Moscow State University)
J Maindonald, MSc (N.Z.)
M Wakefield, BSc (Hon) (Melbourne), PhD (La Trobe) (from April)
Postdoctoral Fellows:
R Turakulov, MSc, PhD (Moscow State University)
D Kortschak, BSc (Hons I) (Adelaide), PhD (Adelaide) (from September)
Head of Information Technology:
DL Diedrich, BS (Mich State), PhD (Penn State) (until September)
Scientific Programmers:
C Lawrence, BSc (Hons 1), MA (Univ Delaware, USA), GradDipCompSci
(UNSW)
A Butterfield, BSc (Hons 1) (Uni Canterbury NZ)
E Lang
S Ohms, MBChB (Auckland), MEngSci (Auckland), PhD (Auckland), MSc
(UNSW) (from October)
CBiS Board Meeting
H Booth
Research has continued in the area of sequence analysis. A paper by
Booth, Maindonald, Wilson and Gready (JCSMR) has been submitted for
publication. Booth presented the POZITIVE algorithm in a poster
at PSB2002 (Hawaii, January 2002). This drew international interest,
and Booth was invited to give a seminar on the method at NCBI
in Washington DC in April 2002. The Singapore-based company HeliXense
also expressed interest in a hardwired version of the algorithm as
part of the COGENT package. Together with Huttley and Isaev, Booth
has submitted an ARC Linkage project application in the second round
of applications for 2003.
With Ole Nielsen from the Advanced Computation and Modelling (ACM) Research Group, Booth, Kortschak, Lawrence and Wakefield are installing the POZITIVE software and efficient versions of Smith-Waterman and BLAST (using sparse structures on the multiple processors set up in the CBiS cluster and also on APAC). The aim is to make large-scale sequence search problems viable on the CBiS/APAC clusters, in preparation for some of the collaborations in development (see Collaborative Research Ventures). Discussions have been taking place (CBiS, ACM, APAC, RSISE) on setting up a collaborative consultancy within CBiS for ANU researchers who lack ready access to multiple processors, or the knowledge base needed to run large-scale problems on them.
In another investigation, a CBiS team collaborated with Nielsen on a longstanding problem in sequence analysis known as the "longest common subsequence" (LCS) problem. A paper on this topic (authored by Booth, McNamara, Nielsen and Wilson) has been submitted to RECOMB2003, to be held in Berlin in April 2003. The work was also presented by Booth in Newcastle in October 2002 at a one-day Bioinformatics Workshop, held as part of the Australian Mathematical Society's annual meeting.
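For readers unfamiliar with the LCS problem, the textbook dynamic-programming solution can be sketched as follows. This is a minimal illustration in Python and is not the method of the submitted paper, whose approach is not described in this report; the function name `lcs` and the example sequences are assumptions chosen for illustration.

```python
from typing import Tuple


def lcs(a: str, b: str) -> Tuple[int, str]:
    """Return the length of a longest common subsequence of a and b,
    together with one such subsequence (hypothetical helper)."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Trace back from the bottom-right corner to recover one subsequence.
    chars = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            chars.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return table[m][n], "".join(reversed(chars))
```

For example, `lcs("AGGTAB", "GXTXAYB")` reports a length of 4. The table costs time and memory proportional to the product of the two sequence lengths, which is why the problem becomes demanding at genomic scale.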
S Easteal
Easteal has continued to direct research aimed at understanding the
basis of a number of genetic disorders, and of the patterns of normal
genetic variation in human populations. A particular focus is the
genetic basis of personality variation, which is associated with a
predisposition to common forms of mental illness involving depression
and anxiety. This work is done in collaboration with the NHMRC Centre
for Mental Health Research. Through a secondment to the Menzies Centre
for Population Health Research in Hobart, Tasmania, Easteal is developing
opportunities to investigate complex genetic traits by taking advantage
of unique aspects of the Tasmanian population.
Y Fang
I am building a mathematical model of protein folding. This model
mimics the geometric features of the native structures of globular
proteins, and also poses a challenging mathematical problem.
I am also involved in a statistical project investigating the
long-range interactions between different parts of a protein.
G Huttley
Comparative genomics techniques are central to exploiting genome sequence
data. The power of comparative methods derives from the operation of
natural selection over millennia, with the effect that functionally
important regions of sequences change slowly relative to less functionally
significant regions. A rich vein of information, largely untapped,
lies in the pattern of differences between sequences. Dr Alexander
Isaev and Dr Gavin Huttley have developed a model of evolutionary
dependence that can measure how the probability of substitution at
one residue is influenced by the residue present at another position.
The measurement of evolutionary dependence between residues will reflect
their functional interdependence. Accordingly, this model should inform
our understanding of the 3D structure of proteins or RNA, and (potentially)
the interaction networks between molecules.
A major bottleneck in the development of new methods in comparative genomics is the time taken to develop the software. New approaches, such as the dependence model, have computational overlap with conventional techniques but, because no generalised modelling software exists, they must be completely implemented. Further challenges to maximising the utility of a new method are: accommodating the large volume of genomic data, taking advantage of parallel computing hardware, performing run checkpointing, and providing output accessible to biologist end users. Many researchers, both within CBiS and globally, are faced with similar tasks.
A team led by Dr Gavin Huttley from CBiS is working to construct the COmparative GENomics Toolkit (COGENT), which aims to reduce these inefficiencies. The contributors are (in alphabetical order): from CBiS, Dr Hilary Booth, Mr Andrew Butterfield, Dr Gavin Huttley, Dr Alexander Isaev, Ms Cath Lawrence, Mr John Maindonald, and Prof Sue Wilson; and from MSI, Dr Ole Nielsen. COGENT consists of modules for evolutionary modelling (including the dependence model), phylogenetic reconstruction and homology searches (the POZITIVE algorithm of Dr Booth and colleagues), plus multiple alignment and visualisation tools. The implementation of COGENT also has features that take advantage of high-performance computing resources (e.g. run checkpointing and parallelisation).
Dr Huttley is presently discussing potential collaborative ventures with a Singapore bioinformatics company, HeliXense. One outcome of these discussions is the installation of their gRNA product (valued at $150,000) on the CBiS cluster. This product provides a range of useful functionality to CBiS researchers, including consistent searches across disparate biological databases, distributed computing tools, integration with common analysis methods for a range of problems (e.g. expression array analysis, comparative genomics), and automated database warehousing of analysis results.
A Isaev
Pure Mathematics: A complete classification of complex n-dimensional
manifolds admitting effective actions of SU_n was completed.
I started working on the problem of describing proper holomorphic
mappings between Reinhardt domains in C^n.
Bioinformatics: I continued working on models of dependent evolution of DNA and protein sequences. Implementation is now in progress, and I helped resolve many issues arising in the implementation process.
D Kortschak
Kortschak has been involved in the annotation and analysis of
an EST collection derived from the coral Acropora millepora,
in collaboration with Dr David Miller (Comparative Genomics Centre,
James Cook University) and Prof Robert Saint (Research School of Biological
Sciences, ANU). The EST collection has also been used to generate EST
spotted microarrays for developmental and environmental studies on
this organism. It is also expected that the microarray experiments
will provide a data pool for use by Maindonald. Kortschak
has also contributed some early developmental code to the open source
bioinformatics project BioPerl, for the handling of taxonomic relationships.
J Maindonald
Design of experiments, especially for cDNA microarray data.
Approaches to the analysis of cDNA microarray data; variance estimation
for the denominators of t and F-tests.
Statistical software tools for use with cDNA microarray data.
Use of the R system and associated packages for the analysis of microarray
data.
Educational and training issues for laboratory scientists who work
with microarray data.
Estimation of between-slide variance, for use in the denominator of t-statistics and F-statistics, is a major issue for the statistical analysis of cDNA microarray data. On the one hand, estimates that are based on individual gene variation are highly affected by sampling variation when the number of slides per treatment is small. On the other hand, systematic changes in variance with average intensity, with print tip and with print tip order complicate any attempt to combine variance information from multiple genes.
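The trade-off described above can be illustrated with a small sketch. It compares an ordinary per-gene t-statistic with one whose variance is shrunk toward the average variance across genes. This is an assumption-laden illustration, not the approach developed in the Centre: the function name `shrunken_t`, the weighting parameter `prior_df` and the simple average used as the common variance are all hypothetical, and the sketch deliberately ignores the intensity and print-tip effects mentioned above.

```python
from math import sqrt
from statistics import mean, variance


def shrunken_t(log_ratios, prior_df=4.0):
    """log_ratios: one row per gene, one log-ratio per slide (same length).

    Returns (t_gene, t_shrunk): one-sample t-statistics using the per-gene
    variance, and using a variance shrunk toward the across-gene average.
    prior_df is a hypothetical weight given to the common variance.
    """
    n = len(log_ratios[0])          # slides per gene
    df = n - 1                      # degrees of freedom of each gene variance
    means = [mean(row) for row in log_ratios]
    s2 = [variance(row) for row in log_ratios]   # sample variances (n - 1)
    s2_common = mean(s2)            # crude pooled value across all genes

    # Weighted average of the gene-specific and common variance estimates.
    s2_shrunk = [(df * v + prior_df * s2_common) / (df + prior_df) for v in s2]

    def t_stat(m, v):
        # Undefined when the variance estimate is zero.
        return m / sqrt(v / n) if v > 0 else float("nan")

    t_gene = [t_stat(m, v) for m, v in zip(means, s2)]
    t_shrunk = [t_stat(m, v) for m, v in zip(means, s2_shrunk)]
    return t_gene, t_shrunk
```

With only three slides per gene, a gene whose sample variance happens to be small receives a large ordinary t-statistic; the shrunken version damps this, at the cost of assuming that genes share a common variance, which is exactly the assumption that intensity and print-tip effects call into question.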
Courses run jointly with BioLateral Ltd have made clear that institutions moving into the use of microarrays need to invest substantially in training staff in statistical experimental design and data analysis. Most of those who are working with such data have little or no statistical support, and are often working with inadequate computing tools. The BioLateral courses, although too short to be as effective as is desirable, have made a useful contribution to training in this area.
R Turakulov
Establishment of genotyping assays for population studies of polymorphisms
in the Apolipoprotein E (APOE) and Dopamine Beta Hydroxylase (DBH)
genes.
Biological theories of the personality trait of psychoticism (P) have
proposed roles for low dopamine-beta-hydroxylase and high testosterone.
It was therefore predicted that polymorphisms of the DBH and
androgen receptor genes would be associated with P. Data were taken
from a community survey of Australian adults who completed personality
questionnaires and gave a DNA sample. 896 participants were genotyped
for the dopamine-beta-hydroxylase C-1021T polymorphism. For this purpose
an original, inexpensive allele-specific genotyping assay was developed
and compared with a previously published RFLP method. Our study failed
to find support for the predicted association of DBH polymorphisms
with P, despite the large sample size. However, we cannot rule out
that there are very small associations, or that other polymorphisms
of these genes have associations with P.
APOE polymorphism has been shown to influence plasma cholesterol levels.
A number of surveys have shown associations between APOE gene polymorphisms
and Alzheimer's disease, cardiovascular diseases, longevity and other
traits. To study this polymorphism a high-throughput TaqMan® assay was
established. This modern assay allows precise genotyping of large samples
in a very short time. Over 3000 individuals were genotyped for the PATH
Thru Life Project. Samples from other surveys can now be genotyped
at a rate of hundreds per day.
A second activity involved the analysis of genome-scale SNP genotyping
data.
Genetic strategies for understanding genotype-phenotype relationships
in human disease depend on assumptions about the architecture of variation
in the human genome. We have analysed patterns of SNP variation at
>5,000 SNPs for samples of African-American, Asian and Caucasian
populations available from The SNP Consortium Ltd (TSC)
[http://snp.cshl.org/].
This analysis had two main outcomes. 1. An evaluation of
the number of SNPs required to reliably assign individuals to the
population from which they are derived. Our results show that fewer
than 100 loci are required to identify, with high confidence, the
population source of individuals in these samples. 2. Identification
of loci with unusually high levels of variation among populations.
A list of these loci has been generated. They are candidates for the action
of diversifying natural selection and they are highly informative
indicators of the population affinity of individuals.
M Wakefield
A major undertaking for the second half of 2002 has been the development
of a consortium to promote and develop opportunities for full genome
sequencing of a kangaroo. The kangaroo genome is a powerful resource
for comparative genomics and has the potential to identify important
conserved regulatory elements with greater efficiency than comparisons
of other species with human. The Centre for Bioinformation Science will
play a central role in the production and analysis of kangaroo genome
data. To further the aim of complete genome sequencing, the consortium
has applied for an ARC Centre of Excellence and is writing a white paper
for the US National Institutes of Health, National Human Genome Research
Institute (NIH NHGRI). This process led to the production in 2002 of
a BAC library funded by the NHGRI.
S Wilson
Pittelkow and Wilson have been developing useful exploratory approaches
to the analysis of (preprocessed) microarray data. These techniques
are based on graphical representations of the data, examining both
the genes and the chips, either separately, or together, and are variants
of statistical ordination techniques. Based on experiments in which
data are simulated in accord with current biological understanding
of microarray data, they have developed the Chip-plot, the
Gene-plot and the GE-biplot as useful exploratory tools. The approach
has been applied to two publicly available data sets that are being
used as benchmarks, namely the Alon et al colon data set and
the Golub et al leukemia data set. Their techniques have been found
to perform at least as well as other, more complex approaches to
these data sets. Further, Pittelkow and Wilson
have examined cell level data from a publicly available Affymetrix
Latin Square design experiment. They have shown that hyperbolic response
curves fit the data for Perfect Match (PM) probes (that are designed
to detect genes) reasonably well. For the Mismatch (MM) probes (that
are designed to detect non-specific hybridisation) the fit is not
quite as good, and is at a lower level. Examining the effects of these
findings on the current functions of PM and MM that are used as summary
measures, their conclusion is that the PM values (alone) should be
used as summary measures.
Identifying the genetic basis of human phenotypes, especially complex
diseases and disorders, is a major challenge that is reliant on statistical
procedures. The commonly used statistical models assume that a single
gene is largely responsible for individual disease risk. Such models
have produced few reproducible results. Wilson has therefore been
developing multi-gene models, and in particular examining the power
of data analyses associated with the most commonly used family design,
namely affected sib pairs.
Publications
Bassett ML, Wilson SR, Cavanaugh JA (2002). Penetrance of HFE-related
hemochromatosis in perspective. Hepatol 36, 500-503.
Cavanaugh JA, Wilson SR, Bassett ML (2002). Genetic testing for HFE
haemochromatosis in Australia: the value of testing relatives of simple
heterozygotes. J Gastroent Hepatol 17, 800-803.
Chistiakov D, Savostanov K, Turakulov R, Petunina N, Balabolkin M, Nosikov V (2002). Further studies of genetic susceptibility to Graves' disease in a Russian population. Med Sci Monit 8, CR180-4.
Fang Y and Hwang J-F (2002). When is a minimal surface a minimal graph? Pacific Journal of Mathematics 207, 359-376.
Hales S, de Wet N, Maindonald J and Woodward A (2002). Potential effect of population and climate changes on global distribution of dengue fever: an empirical model. Lancet. Published online August, 2002.
Hofer SM, Christensen H, Mackinnon AJ, Korten AE, Jorm AF, Henderson
AS and Easteal S (2002). Change in cognitive functioning associated
with ApoE genotype in a community sample of older adults. Psychology
and Aging 17, 194-208.
Isaev AV and Kruzhilin NG (2002). Effective actions of the unitary
group on complex manifolds. Canad J Math 54, 1254-1279.
Jorm AF, Prior M, Sanson A, Smart D, Zhang Y, Tan S and Easteal S (2002). Lack of association of a single-nucleotide polymorphism of the mu-opioid receptor gene with anxiety-related traits: Results from a cross-sectional study of adults and a longitudinal study of children. American Journal of Medical Genetics. Neuropsychiatric Genetics 114, 659-664.
Maindonald J, Pittelkow Y and Wilson S (2002). Some considerations for the design of microarray experiments. CBiS Technical report, 22pp
Murtagh LJ, Whiley M, Wilson SR, Tran H, and Bassett ML (2002). Unsaturated
iron binding capacity and transferrin saturation are equally reliable
in detection of HFE-hemochromatosis. Am J Gastroenterol 97, 2093-2099.
Prichard Z, Jorm AF, Prior M, Sanson A, Smart D, Zhang Y, Huttley
G, Easteal S (2002). Association of polymorphisms of the estrogen
receptor gene with anxiety-related traits in children and adolescents:
a longitudinal study. Am J Med Genet 114, 169-76.
Turakulov R, Chistiakov D (2002). Comparison of DNA and RNA based real-time PCR assays for quantitative detection of chromosomal translocations. Biotech Lett 24, 1709-1714.
Wakefield MJ, Graves JAM and Disteche CM (2002). Identification of a marsupial homologue of the Xist gene. Biochem Cell Biol 80, 390.
Wilson SR (2002). Assortative mating. In: Biostatistical Genetics
and Genetic Epidemiology, Eds Elston R, Olson J and Palmer L,
pp 34-36.
Wilson SR (2002). Modern biometry. In: Knowledge for Sustainable
Development - An Insight into the Encyclopedia of Life Support Systems,
UNESCO Publishing-Eolss Publishers, Paris, France, Oxford, UK.
Wilson SR (2002). On hypothesis testing for data from different family
study designs. Human Heredity 53, 55-58.
Wilson SR and Huttley G (2002). Non-replicability of disease gene
results: A modelling perspective. In: Advances in Statistics, Combinatorics
and Related Areas. Eds C Gulati, Y-X Lin, S Mishra and J Rayner.
World Scientific, Singapore.
Students
PhD students currently enrolled:
Y Pittelkow, BA DipEd (Macquarie), Grad Dip Stats (Uni Canberra),
MSc, Dip Arts
Honours students currently enrolled:
S McNamara (Sequence Analysis)
Collaborative Research Ventures
Dr H Booth
Dr O Nielsen, (Mathematical Sciences Institute, ANU & APAC)
Software design and implementation, sequence analysis.
Dr M Wakefield (& Dr O Nielsen)
Applications of software and parallel processing to RNA structure.
Prof P Hall (Mathematical Sciences Institute, ANU) & Dr Mike Waterman
(University of Southern California, LA)
Finer estimates on the distribution of k-word matches between two
random sequences.
Dr D Kortschak & C Lawrence (CBiS)
Parallelization of BLAST on the CBiS cluster.
Dr A Smola (Research School of Information Science and Engineering,
ANU)
Applications of support vector machines to bioinformatics.
Prof N Dixon (Research School of Chemistry, ANU)
Searching protein structure databases for potential experimental
targets.
Dr M Wakefield, Prof J Graves (Comparative Genomics Group, Research
School of Biological Sciences, ANU)
Bioinformatics of the Kangaroo Genome Project.
Prof S Easteal
Prof A Jorm, (Social Psychiatry Research Unit, ANU), Prof M Prior,
(Psychology Department, Melbourne University)
Genetic basis of common mental disorders associated with anxiety
and depression.
Assoc Prof K North, (Children's Hospital, Sydney)
Genetic basis of elite athletic performance.
Dr Y Fang
Prof J-F Hwang, (Institute of Mathematics, Academia Sinica, Taipei,
Taiwan)
Dr W Kaplan, (Garvan Institute, Sydney)
Dr G Huttley
Prof J Graves, (Comparative Genomics Group, Research School of Biological
Sciences, ANU)
The Kangaroo Genome Project.
Prof L Nunney, (Dept. Biology, University of California, Riverside)
Evolution of DNA repair genes, and population genetics of tumour
suppressor genes.
Prof M Ragan, (Institute for Molecular Bioscience, University
of Queensland)
Exploring the roots to the tree of life.
Dr Frances Shannon, (Division of Molecular Bioscience, John Curtin
School of Medical Research, ANU)
Identifying potential control regions in T cell costimulatory receptors.
Dr A Isaev
NG Kruzhilin (Steklov Institute, Moscow).
To classify all connected n-dimensional complex manifolds that
admit an effective action of the special unitary group SU_n
by biholomorphic transformations.
Mr J Maindonald
Dr J Braun, (Department of Statistics and Actuarial Science, University
of Western Ontario, Canada)
Writing of a monograph, based around the use of the R statistical
system, aimed at researchers in statistical application areas.
(with Prof S Wilson and Y Pittelkow)
Dr M Gardiner-Garden, (The Garvan Institute for Medical Research)
Use of gene expression information in survival analysis for patients
with ovarian cancer.
Dr M Hegland, Dr O Nielsen and Z Din, (ANU Mathematical Sciences
Institute)
Development of a system for facilitating the processing, on a
computer cluster, of highly parallel tasks such as are common in the
analysis of microarray data.
Dr L Smith (NCEPH) and Dr H Booth (Demography, RSSS)
Methods for mortality projection.
Dr M Boot, (Faculty of Economics and Commerce)
19th century wage rates in the English textile industry.
Dr M Wakefield
Prof C Disteche, (University of Washington, Seattle, USA)
Marsupial X Chromosome Inactivation and BRCA1.
Prof M Renfree, (University of Melbourne); Prof D Cooper, (Macquarie
University); Prof J Mattick, (Australian Genome Research Facility);
Prof T Speed, (Walter and Eliza Hall Institute); Prof J Boore, (DOE
Joint Genome Institute, California, USA); Dr R Wilson, (Washington
University Genome Centre, St Louis, USA); Prof J Graves, (RSBS ANU)
The Kangaroo Genome Project.
Prof S Wilson
Dr J Cavanaugh and Dr P Pavli, (The Canberra Hospital)
Determining the genetic basis of Crohn's disease.
Dr M Bassett and Dr J Cavanaugh, (The Canberra Hospital)
Understanding the hereditary basis of haemochromatosis.
Service to Outside Organizations
S Easteal
Editor, Molecular Biology and Evolution
Advisory Board, Sydney University Biological Informatics and Technology
Centre
Research Committee, Australian Institute of Sport
Advisory Committee, NHMRC Centre for Mental Health Research
Member of the Council of Scientific Editors
NHMRC Genetics Project Grant Review Panel
Councillor, Society for Molecular Biology and Evolution
Organizing Committee, Lorne Genome Conference
Organizing Committee, XIX International Congress of Genetics
G Huttley
Member of the program committee for the 1st Asia-Pacific Bioinformatics
Conference.
Member of the program committee for the 1st International Workshop
on Biological Data Management.
A Isaev
Associate Editor, Journal of Mathematical Analysis and Applications
J Maindonald
Topic Editor for Statistical Computing, Biometrics Section, Encyclopedia
of Life Support Systems
Co-editor, Computing section, 2nd edn, Wiley Encyclopedia of Biostatistics
Courses on Design and Analysis of Microarray Data, in collaboration
with Dr Tim Littlejohn and Peter Maxwell of BioLateral Ltd and with
Jess Mar from the University of Queensland Mathematics Department.
(Five courses were held in 2002 - three in Sydney and one each in
Townsville and Melbourne.)
S Wilson
Associate Editor, Annals of Human Genetics;
Associate Editor, Computational Statistics & Data Analysis;
Member, Editorial Board, Statistical Methods in Medical Research;
Representative, Statistical Society of Australia, Australian Foundation
for Science;
Member, Conference Advisory Committee, IBS;
Editor, Biometrics Section, Encyclopedia of Life Support Systems;
Member, Institute of Mathematical Statistics (IMS) Committee on Memorials;
Member, IMS Nominations Committee;
Member, Editorial Committee for the 6th edition of the ISI's Dictionary
of Statistical Terms;
Member, ASC16: 16th Australian Statistical Conference Committee;
Member, ISI's Mahalanobis Committee;
NHMRC Program Grant Review Panel
Outside Grants and Awards
H Booth
International Society for Computational Biology (ISCB)
Travel fellowship
S Wilson
Large ARC Grant - 5 years - total $385,000
Statistical Advances in the Post-Genome Era
Visitors
Prof D Balding, Department of Epidemiology and Public Health, Imperial
College London, UK
Prof M Waterman, University of Southern California, USA
Workshop
Design and Analysis of Microarray/Chip Experiments
(12 July; organised by S Wilson, in conjunction with the 16th biennial
Australian Statistical Conference).