Average customer rating:
|
Statistical Methods in Bioinformatics: An Introduction (Statistics for Biology and Health)
Warren J. Ewens and Gregory Grant Manufacturer: Springer ProductGroup: Book Binding: Hardcover ASIN: 0387400826 |
Book Description
Advances in computers and biotechnology have had a profound impact on biomedical research, and as a result complex data sets can now be generated to address extremely complex biological questions. Correspondingly, advances in the statistical methods necessary to analyze such data are following closely behind the advances in data generation methods. The statistical methods required by bioinformatics present many new and difficult problems for the research community.
This book provides an introduction to some of these new methods. The main biological topics treated include sequence analysis, BLAST, microarray analysis, gene finding, and the analysis of evolutionary processes. The main statistical techniques covered include hypothesis testing and estimation, Poisson processes, Markov models and Hidden Markov models, and multiple testing methods.
The second edition features new chapters on microarray analysis and on statistical inference, including a discussion of ANOVA, and discussions of the statistical theory of motifs and methods based on the hypergeometric distribution. Much material has been clarified and reorganized.
The book is written so as to appeal to biologists and computer scientists who wish to know more about the statistical methods of the field, as well as to trained statisticians who wish to become involved with bioinformatics. The earlier chapters introduce the concepts of probability and statistics at an elementary level, but with an emphasis on material relevant to later chapters and often not covered in standard introductory texts. Later chapters should be immediately accessible to the trained statistician. Sufficient mathematical background consists of introductory courses in calculus and linear algebra. The basic biological concepts that are used are explained, or can be understood from the context, and standard mathematical concepts are summarized in an Appendix. Problems are provided at the end of each chapter allowing the reader to develop aspects of the theory outlined in the main text.
Warren J. Ewens holds the Christopher H. Brown Distinguished Professorship at the University of Pennsylvania. He is the author of two books, Population Genetics and Mathematical Population Genetics. He is a senior editor of Annals of Human Genetics and has served on the editorial boards of Theoretical Population Biology, GENETICS, Proceedings of the Royal Society B and SIAM Journal in Mathematical Biology. He is a fellow of the Royal Society and the Australian Academy of Science.
Gregory R. Grant is a senior bioinformatics researcher in the University of Pennsylvania Computational Biology and Informatics Laboratory. He obtained his Ph.D. in number theory from the University of Maryland in 1995 and his Masters in Computer Science from the University of Pennsylvania in 1999.
Comments on the First Edition. "This book would be an ideal text for a postgraduate course…[and] is equally well suited to individual study…. I would recommend the book highly" (Biometrics). "Ewens and Grant have given us a very welcome introduction to what is behind those pretty [graphical user] interfaces" (Naturwissenschaften). "The authors do an excellent job of presenting the essence of the material without getting bogged down in mathematical details" (Journal of the American Statistical Association). "The authors have restructured classical material to a great extent and the new organization of the different topics is one of the outstanding services of the book" (Metrika).
Customer Reviews:
Most Elegant Account of Bioinformatics.......2004-11-27
Average customer rating:
|
Microarray Bioinformatics
Dov Stekel Manufacturer: Cambridge University Press ProductGroup: Book Binding: Paperback ASIN: 052152587X |
Book Description
DNA microarrays have revolutionized molecular biology and are becoming a standard tool in the field. Dov Stekel's book is a comprehensive guide to the mathematics, statistics, and computing required to use microarrays successfully. Unlike traditional molecular biology, the successful use of DNA microarrays requires the application of statistics and computing to design the arrays and experiments, and to analyze and manage the data. This book is written for researchers, clinicians, and laboratory managers.
Download Description
This book is a comprehensive guide to all of the mathematics, statistics and computing you will need to successfully operate DNA microarray experiments. It is written for researchers, clinicians, laboratory heads and managers, from both biology and bioinformatics backgrounds, who work with, or who intend to work with microarrays. The book covers all aspects of microarray bioinformatics, giving you the tools to design arrays and experiments, to analyze your data, and to share your results with your organisation or with the international community. There are chapters covering sequence databases, oligonucleotide design, experimental design, image processing, normalisation, identifying differentially expressed genes, clustering, classification and data standards. The book is based on the highly successful Microarray Bioinformatics course at Oxford University, and therefore is ideally suited for teaching the subject at postgraduate or professional level.
Customer Reviews:
Great Introduction to Microarray Analysis.......2006-05-12
Neat little book on microarrays.......2006-03-24
If you are new to microarray, get this book........2005-05-16
an intro. for biologists.......2004-09-08
A Good Book for Microarray Bioinformatics.......2004-01-04
Average customer rating:
|
Statistical Methods in Bioinformatics
Warren J. Ewens and Gregory R. Grant Manufacturer: Springer ProductGroup: Book Binding: Hardcover ASIN: 0387952292 |
Book Description
Advances in computers and biotechnology have had an immense impact on the biomedical fields, with broad consequences for humanity. Correspondingly, new areas of probability and statistics are being developed specifically to meet the needs of this area. There is now a necessity for a text that introduces probability and statistics in the bioinformatics context. This book also describes some of the main statistical applications in the field, including BLAST, gene finding, and evolutionary inference, much of which has not yet been summarized in an introductory textbook format. This book grew out of a need to teach bioinformatics to graduate students at the University of Pennsylvania. At the same time, however, it is organized to appeal to a wider audience. In particular it should appeal to any biologist or computer scientist who wants to know more about the statistical methods of the field, as well as to a trained statistician who wishes to become involved in bioinformatics. The earlier chapters introduce the concepts of probability and statistics at an elementary level, and will be accessible to students who have only had introductory calculus and linear algebra. Later chapters are immediately accessible to the trained statistician. Only a basic understanding of biological concepts is assumed, and all concepts are explained when used or can be understood from the context. Several chapters contain material independent of that in other chapters, so that the reader interested in certain areas can proceed directly to those areas.
Warren Ewens is Professor of Biology at the University of Pennsylvania. He is the author of two books, Population Genetics and Mathematical Population Genetics, and has served on the editorial boards of Theoretical Population Biology, GENETICS, Proceedings of the Royal Society B and SIAM Journal in Mathematical Biology. He was recently awarded the Gold Medal of the Australian Statistical Society and elected as Fellow of the Royal Society. His research interests are in evolutionary population genetics, linkage analysis for human diseases, and bioinformatics.
Gregory Grant is a bioinformatics researcher at the University of Pennsylvania in the Computational Biology and Informatics Laboratory (CBIL), where he has been since 1998. In 1995 he received a Ph.D. in Mathematics from the University of Maryland and in 1999 a Masters in Computer Science from the University of Pennsylvania. His research interests are in bioinformatics in general and in particular in the statistical analysis of gene expression data and significance testing methods for IBD-mapping.
Customer Reviews:
Misleading title!.......2004-12-12
Great all-around review of probability .......2004-08-17
Disappointing overview.......2003-11-12
A topic such as the two-sample t-statistic is scattered throughout the book, with the main part not even cited in the index!
Unfortunately there are not a lot of books in the field of Statistics in Bioinformatics. However, I would recommend "The Elements of Statistical Learning" (Hastie et al.) for classifiers etc. (Duda and Hart's classic is also good). I would recommend "Biostatistical Analysis" by Zar for general coverage, and Terry Speed's "Stat Labs: Mathematical Statistics ...", which is not comprehensive but has good lab examples with associated statistical analysis.
Pretty good overview.......2002-09-19
Chapter one begins, appropriately, with an introduction to probability theory, opening with discrete probability distributions of one variable. The Bernoulli, binomial, uniform, geometric, generalized geometric, and Poisson distributions are discussed. The authors point out the use of geometric-like distributions in the BLAST application. They also caution the reader as to the difference between the mean and the average of a random variable. They then move on to consider continuous distributions, discussing briefly the uniform, Normal, exponential, gamma, and beta distributions. Moment-generating functions are also introduced, and they prove a "convexity" theorem for these functions that is important in the BLAST application. The authors also introduce the relative entropy and generalized support statistics, the latter also being used in BLAST.
The next chapter is an overview of probability theory in many random variables. The results in chapter one are discussed in this context, and the authors give an interesting application to the sequencing of EST libraries. The authors also point out that the variance of the maximum of a collection of random variables is finite as the number of variables increases, a fact that is used quite often in bioinformatics. Transformations of random variables are also discussed, with the goal of showing how these can be used to find the density function of a single random variable, this also being important in BLAST.
The most important subject of the book begins in chapter 3, wherein the authors introduce statistical inference. They begin with a very brief discussion of the differences between the frequentist and Bayesian approaches to statistical inference and then move on to classical hypothesis testing and nonparametric tests. This chapter is of great value to those readers, for example biologists and would-be bioinformaticists, who are approaching statistics for the first time.
Chapter 4 introduces concepts that are of utmost importance in probabilistic computational biology, namely Markov chains. The discussion in this chapter sets up the strategies used in the next chapter on analyzing a single DNA sequence and a later chapter on hidden Markov models. Shotgun sequencing is discussed as a tool to determine an actual DNA sequence, and the authors discuss the probabilistic issues that arise in the reconstruction of long DNA sequences from shorter sequences. Missing in this chapter is a mathematical analysis of the advantages/disadvantages between shotgun and whole genome sequencing strategies.
Chapter 6 then generalizes the analysis of chapter 5 to multiple DNA and protein sequences. It is here that one begins to talk about alignments between sequences, which bring about some very subtle mathematical problems in computational biology. The computational complexity of the (global) alignment problem entails the use of softer techniques, such as dynamic programming, which is discussed in this chapter. The (local) alignment problem is also discussed in some detail, using the linear gap model. The alignment problem and the issues with scoring for protein sequences are also discussed in detail. The reader first encounters the famous PAM and BLOSUM matrices in this chapter. The authors do not discuss any connections with the protein folding problem, unfortunately.
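For readers unfamiliar with the dynamic-programming idea mentioned in this review, here is a minimal sketch of a global alignment score with a linear gap penalty. It is not taken from the book: the function name and the match, mismatch, and gap scores are assumptions chosen for illustration, and a real protein alignment would use a substitution matrix such as PAM or BLOSUM in place of the simple match/mismatch rule.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score of the best global alignment of strings a and b.

    Illustrative scoring only: +1 match, -1 mismatch, -2 per gap
    position (a linear gap penalty). These values are assumptions.
    """
    # prev[j] holds the best score for aligning the processed prefix of a
    # with the first j characters of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + pair,  # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            ))
        prev = curr
    return prev[-1]


identical = global_alignment_score("GATTACA", "GATTACA")
one_gap = global_alignment_score("ACGT", "AGT")
```

Only two rows of the table are kept at a time, so memory grows with the length of one sequence while the running time grows with the product of the two lengths.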
The next chapter introduces the basic probability theory behind the BLAST algorithm, namely random walks. They do so with emphasis on moment generating functions, which might be a little abstract for the biologist reader.
The authors return to statistical estimation and hypothesis testing in chapter 8, with maximum likelihood and fixed sample size tests discussed in some detail. Again connecting with the BLAST algorithm, the sequential probability ratio test is treated.
The authors finally get down to the BLAST algorithm in chapter 9, using an older version of the software (1.4). The connection of the algorithm with random walks and how to assign scores is immediately apparent, as is the ability of BLAST to do database queries against a chosen sequence. The algorithm is compared with the sequential analysis discussed in the last chapter.
The authors return to Markov chains in chapter 10, and give some numerical examples. In addition, they treat the important topic of Markov chain Monte Carlo via the Hastings-Metropolis algorithm, Gibbs sampling, and simulated annealing. An application of simulated annealing to the double digest problem is described. The authors also spend a little time discussing continuous-time Markov chains.
Hidden Markov models are finally discussed in chapter 11. These have been the most effective tools in sequence analysis and the authors give a nice overview of their construction and properties in this chapter. The Pfam package is discussed as a software implementation of HMMs for determining protein domains. Unfortunately, they do not discuss the excellent package HMMER for implementing HMMs in sequence analysis.
Chapter 12 discusses computationally intensive methods in classical inference. One of these methods, the bootstrap procedure, which is used for large sample sizes, is described. The bootstrap is used to estimate confidence intervals in situations where there is not enough information to employ classical methods, and the authors detail a method using quantiles to estimate the confidence interval for the standard deviation of the expression intensity of a gene. This is followed by a return to the multiple testing problem of chapter 3 in the context of the data analysis of expression arrays.
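The quantile-based bootstrap interval described above can be sketched roughly as follows. This is a generic percentile bootstrap, not necessarily the exact procedure in the book; the function name, the number of resamples, and the intensity values are all made up for illustration.

```python
import random
import statistics


def bootstrap_sd_interval(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the standard deviation."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    estimates = []
    for _ in range(n_resamples):
        # Resample the data with replacement and recompute the statistic.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistics.stdev(resample))
    estimates.sort()
    # Read the interval endpoints off the empirical quantiles.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Hypothetical expression intensities for one gene (illustrative numbers only).
intensities = [7.1, 6.8, 7.4, 8.0, 6.5, 7.9, 7.2, 6.9, 7.6, 7.3]
lower, upper = bootstrap_sd_interval(intensities)
```

The interval comes directly from the sorted resampled estimates, so no distributional assumption about the intensities is needed, at the cost of recomputing the statistic many times.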
I did not read the last two chapters on evolutionary models and phylogenetic tree estimation so I will omit their review.
guide into the right direction.......2001-09-06
This book is the first exception I know of. It builds, and rests on, solid foundations of genetic stochastic processes and still goes all the way to real-life problems. Let me illustrate this by means of an example, rather than enumerating all the topics in the book.
Chap. 14, entitled `phylogenetic tree estimation' (as opposed to the more common term `phylogenetic tree reconstruction' - not without reason, I presume) builds on, and is firmly interlaced with, Chap. 13 about `evolutionary models', which systematizes the zoo (if not jungle) of substitution models in both discrete and continuous time. On this basis, the overview of tree-building methods makes a lot of sense. Even better, it does not stop here, but presents an application (to real sequence data), followed by a careful analysis of where the various methods agree, and where - and maybe why - they disagree. This way, it clears away some common misconceptions; in particular, it presents a careful analysis of what bootstrap does and what it does not in this context. The chapter closes with a discussion of unresolved problems (like inhomogeneity of substitution rates), and methods and possible pitfalls related to testing of nested and non-nested hypotheses in tree estimation.
The book is written in an informal style without being imprecise, which makes it pleasant reading. It is particularly suitable for teaching at a high level. This is enhanced by realistic (and even real-life) examples that furnish the text, as well as carefully chosen exercises at the end of each chapter.
Certainly, this first edition of `Statistical Methods in Bioinformatics' cannot be the last word in this fast-moving field. But it is an excellent guide into the `right' direction.
Average customer rating: |
New Directions in Statistical Physics: Econophysics, Bioinformatics, and Pattern Recognition
Manufacturer: Springer ProductGroup: Book Binding: Hardcover ASIN: 3540431829 |
Book Description
Statistical physics addresses the study and understanding of systems with many degrees of freedom. As such it has a rich and varied history, with applications to thermodynamics, magnetic phase transitions, and order/disorder transformations, to name just a few. However, the tools of statistical physics can be profitably used to investigate any system with a large number of components. Thus, recent years have seen these methods applied in many unexpected directions, three of which are the main focus of this volume. These applications have been remarkably successful and have enriched the financial, biological, and engineering literature. Although reported in the physics literature, the results tend to be scattered and the underlying unity of the field overlooked. This book provides a unique insight into the latest breakthroughs in a consistent manner, at a level accessible to undergraduates, yet with enough attention to the theory and computation to satisfy the professional researcher.
Average customer rating: |
Computational and Statistical Methods in Bioinformatics
Xue-wen Chen, George C. Tseng, Xinkun Wang, and Ya Zhang Manufacturer: Chapman & Hall/CRC ProductGroup: Book Binding: Paperback ASIN: 1420070541 |
Average customer rating:
|
Hidden Markov Models for Bioinformatics (Computational Biology)
T. Koski Manufacturer: Springer ProductGroup: Book Binding: Hardcover ASIN: 1402001355 |
Book Description
The purpose of this book is to give a thorough and systematic introduction to probabilistic modeling in bioinformatics. The book contains a mathematically strict and extensive presentation of the kind of probabilistic models that have turned out to be useful in genome analysis. Questions of parametric inference, selection between model families, and various architectures are treated. Several examples are given of known architectures (e.g., profile HMM) used in genome analysis.
Customer Reviews:
Written by a mathematician for mathematicians.......2004-03-11
I wanted a book with a mathematical sophistication similar to Durbin's book, but this book is way more than that. On the other hand, I showed this book to a mathematics graduate student and she said this book is perfect for her. So I guess this book is written by a mathematician only for mathematicians.
Good material, but you really have to want it........2003-10-10
This additional depth of coverage may go beyond many readers' needs. It is very helpful, though, for people who need more than the usual algorithms. By giving the background in such detail, a persistent reader can follow to a certain point, then create modifications with a clear idea of where the new algorithm actually comes from.
Regarding the current practice of HMM usage, I found it a bit thin. Widely-known tools based on HMMs are mentioned only occasionally and in passing, and HMM-based alignment is discussed only briefly. Well, this book isn't for the tool user. Perhaps more important, I found scant mention of scoring with respect to some background probability model ("null" model, as it's called here).
My one real complaint, and this is truly minor, is the quality of illustration. The line-drawings look like Word pictures - not necessarily a bad thing, if done well. These aren't particularly professional-looking, though, and oddly stretched or squashed in many cases. Still, they're readable enough and make all the needed points.
A lesser point, and not the author's fault, is the editorial implication that this book introduces probabilistic models in general. It does not. This is strictly about HMMs, not Bayesian nets, bootstrap techniques, or any of the dozens of other probabilistic models used in bioinformatics. That is not a flaw of the book, just a flaw in how it's represented.
If you are dedicated to becoming an expert in HMM construction and application, you must have this book. It's a bit much, though, for people who just want the results that HMMs give.
Primarily for bio-mathematicians.......2003-07-01
Some of the highlights of the book include:
1. An overview of the probability theory to be used in the book. The material is fairly standard, including a review of continuous and discrete random variables from the measure-theoretic point of view, i.e. the author introduces them via a probability space, that is, a set with its sigma field and a probability measure on this field. The weight matrix, or "profile" as it is sometimes called, is defined, this having many applications in bioinformatics. Bayesian learning is also discussed, and the author introduces what he calls the "missing information principle", which is fundamental to the probabilistic modeling of biological sequences. Applications of probability theory to DNA analysis are discussed, including shotgun assembly and the distribution of fragment lengths from restriction digests. A collection of interesting exercises is included at the end of the chapter, particularly the one on the null model for pairwise alignments.
2. An introduction to information theory and the relative entropy or "Kullback distance", the latter of which is used to learn sequence models from data. The author defines the mutual information between two probability distributions and the entropy, and calculates the latter for random DNA. He also proves some of the Shannon source coding theorems, one being the convergence to the entropy for independent, identically distributed random variables. The Kullback distance is then defined as a distance between probability distributions, with the caution that it is not a metric because of lack of symmetry.
3. The overview of probabilistic learning theory, where 'learning from data' is defined as the process of inferring a general principle from observations of instances.
4. The very detailed treatment of the EM algorithm, including the discussion of a model for fragments with motifs.
5. The discussion of alignment and scoring, especially that of global similarity. Local alignment is treated in the exercises.
6. The discussion of the learning of Markov chains via Bayesian modeling applied to a training sequence via a family of Markov models. Frame dependent Markov chains are discussed in the context of Markovian models for DNA sequences.
7. The discussion of influence diagrams and nonstandard hidden Markov models, in particular the excellent diagrams drawn to illustrate the main properties. An excellent discussion is given of an "HMM with duration" in the context of the functional units of a eukaryotic gene. This is important in the GeneMark:hmm software available.
8. The treatment of motif-based HMM, in particular the discussion of the approximate common substring problem.
9. The discussion of the "quasi-stationary" property of some chains and the connection with the "Yaglom limit".
10. The treatment of Derin's formula for the smoothing posterior probability of a standard HMM. The author shows in detail that the probability of a finite length emitted sequence conditioned on a state sequence of the HMM depends only on a subsequence of the state sequence.
11. The treatment of the lumping of Markov chains, i.e. the question as to whether a function of a Markov chain is another Markov chain.
12. The very detailed treatment of the Forward-Backward algorithm and the Viterbi algorithm.
13. The discussion of the learning problem via the quasi-log likelihood function for HMM.
14. The discussion of the limit points for the Baum-Welch algorithm. Since the Baum-Welch algorithm deals with iterations of a map, its convergence can be proved by finding the fixed points of this map. These fixed points are in fact the stationary points of the likelihood function and can be related to the convergence of the algorithm via the Zangwill theory of algorithms. Unfortunately the author does not give the details of the Zangwill theory, but instead delegates it to the references (via an exercise).
The Zangwill theory can be discussed in the context of nonlinear programming, with generalizations of it occurring in the field of nonlinear functional analysis. It might be interesting to investigate whether the properties of hidden Markov models, especially their rigorous statistical properties, can all be discussed in the context of nonlinear functional analysis.
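The caution in highlight 2, that the Kullback distance is not a metric because it lacks symmetry, is easy to check numerically. The sketch below is only an illustration: the function name and the two base-composition vectors are invented, and the distance is computed in bits.

```python
import math


def kullback_distance(p, q):
    """Relative entropy D(p || q) in bits for two discrete distributions.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical base frequencies over (A, C, G, T), for illustration only.
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.4, 0.1, 0.1, 0.4]

d_pq = kullback_distance(skewed, uniform)
d_qp = kullback_distance(uniform, skewed)
```

The two directions give different values, so swapping the arguments changes the answer. The distance from a distribution to itself is zero, as expected.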
Average customer rating: |
Medical Data Analysis: Third International Symposium, ISMDA 2002, Rome, Italy, October 8-11, 2002, Proceedings (Lecture Notes in Computer Science)
Manufacturer: Springer ProductGroup: Book Binding: Paperback ASIN: 3540000445 |
Book Description
This book constitutes the refereed proceedings of the Third International Symposium on Medical Data Analysis, ISMDA 2002, held in Rome, Italy, in October 2002. The 23 revised full papers presented were carefully reviewed and selected for inclusion in the book. The papers are organized in topical sections on data mining and decision support systems, medical informatics and modeling, time series analysis, and medical imaging.
Average customer rating: |
Statistical Advances in the Biomedical Sciences: Clinical Trials, Epidemiology, Survival Analysis, and Bioinformatics (Wiley Series in Probability and Statistics)
Manufacturer: Wiley-Interscience ProductGroup: Book Binding: Hardcover ASIN: 0471947539 |
Book Description
A number of eminent experts on Clinical Trials, Epidemiology, Survival Analysis, and Genomics/Proteomics have contributed 30 carefully prepared and peer-reviewed articles to this book. Within the four sections, the articles have been organized so as to make the thematic transition between them as smooth as possible. A structural uniformity is maintained across all the chapters, each starting with an introduction that discusses the general concepts and describes the biomedical problem under focus.
Average customer rating: |
Statistical Bioinformatics: For Biomedical And Life Science Researchers (Methods of Biochemical Analysis)
Jae K. Lee Manufacturer: John Wiley & Sons Inc ProductGroup: Book Binding: Paperback ASIN: 0471692727 |
Average customer rating: |
Wavelet methods and statistical applications: Network security and bioinformatics : (Dissertation)
Deukwoo Kwon Manufacturer: ProQuest Information and Learning ProductGroup: Book Binding: Digital ASIN: B000F6I6IC Release Date: 2006-03-28 |