Rebecca W. Doerge is the Trent and Judith Anderson Distinguished Professor of Statistics at Purdue University and is the President’s Fellow on Big Data and Simulation. She joined Purdue in 1995 and holds a joint appointment between the Colleges of Agriculture (Department of Agronomy) and Science (Department of Statistics). Doerge’s research program is focused on statistical bioinformatics, a component of bioinformatics that brings together many scientific disciplines into one arena to ask, answer and disseminate biologically interesting information in the quest to understand the ultimate function of DNA and epigenomic associations. Doerge has been the recipient of several awards at Purdue, including the Teaching for Tomorrow Award, the University Scholar Award and the Provost’s Award for Outstanding Graduate Faculty. She is an elected fellow of the American Statistical Association, an elected fellow of the American Association for the Advancement of Science, and a fellow of the Committee on Institutional Cooperation. She obtained her Ph.D. in statistics from NC State under the direction of Bruce Weir, and was a postdoctoral fellow with Gary Churchill in the Department of Biometrics and Plant Breeding at Cornell University.
What do you remember most about your time as a statistics graduate student at NC State?
I moved from Salt Lake City, Utah, to Raleigh. So, my most prominent memory is the lack of mountains and the number of trees! With respect to graduate school in statistics, my most significant memory is the necessary adjustment that I needed to make when moving from mathematical theory to statistics. In the Department of Statistics at NC State, the learning atmosphere was different, the classes were much larger, and there were so many faculty working on so many different areas and applications. I moved to NC State specifically to work with Bruce Weir, and to study statistics and quantitative genetics. Studying across disciplinary boundaries is another memory. I was an early adopter (pre-bioinformatics), and not all faculty and students in statistics understood or supported interdisciplinary research. I spent a lot of time convincing people that there was a strong future in the marriage of statistics, genetics and computing.
Why did you decide to minor in genetics and do research in statistical bioinformatics? Did you take genetics classes as an undergraduate?
Interestingly and surprisingly, I took no genetics or biology classes as an undergraduate; I studied theoretical mathematics at the University of Utah from 1982 to 1986. During my master’s work in statistics in the Department of Mathematics at Utah from 1986 to 1988, I became interested in computing, which led to applications in human genetics. This was the beginning of my interest in genetics, heredity and the technologies employed to investigate the genetic code for the purpose of producing data that required data analysis. The year following the completion of my master’s, I worked for a human genetics research group at the University of Utah Medical School, analyzing data. This experience was life-changing and a key motivator to continue on to a Ph.D. I realized the fun I was having and how much I had to learn about statistics, computing and genetics. When I moved to NC State, I transitioned from human genetics applications to agricultural applications. I like to joke that unlike humans, plants stay where you put them, eat what you tell them, and mate with whom they are told.
Is there anything you wished you had learned or done as a student that would have helped you later in your career?
In retrospect, I wish I had had more mentoring. I am first-generation-educated, so honestly, at times, I had no idea what I was doing. I wish someone had told me to be patient with my education (i.e., slow down), and that it would all be okay. One more year of graduate school would have greatly benefited both my education and maturity. Often, students are in such a rush to complete their Ph.D. that they fail to gain maturity in doing research (time spent reading, talking and meeting people in your field). At the end of my Ph.D., I took a postdoctoral position at Cornell University, which was the gift of time and maturity I needed to move forward with an academic career. I am still very grateful for my time at Cornell.
What kinds of methods in your research do you use most frequently?
Historically, the majority of the analyses we perform, and the novel statistical methods that we develop, are based on general linear models and/or machine learning applications. We rely heavily on resampling techniques to assess the statistical significance of our results. Toward this end, our work is highly computational as we work in high-dimensional data spaces that are very complex. Currently, we are connecting mathematical topology with the concept of statistical analysis of data produced from shapes, images and high-throughput phenotyping in agriculture.
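Editor's illustration: the resampling idea mentioned above can be sketched with a simple permutation test. This is only a minimal sketch of one common resampling technique, not the method used by Doerge's group. The function name `permutation_p_value`, its parameters and the sample values are hypothetical and chosen for illustration.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Hypothetical helper for illustration: group labels are shuffled
    repeatedly to build a null distribution for the test statistic.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign observations to groups at random
        stat = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if stat >= observed:
            extreme += 1

    # Add one to numerator and denominator so the estimate is never zero.
    return (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Made-up measurements for two hypothetical groups.
    treated = [10, 11, 12, 13, 14, 15]
    control = [0, 1, 2, 3, 4, 5]
    print(permutation_p_value(treated, control, n_permutations=2000))
```

The appeal of this kind of test is that the null distribution comes from the data themselves rather than from a distributional assumption, which is one reason resampling is computationally demanding in high-dimensional settings.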
What are you most excited about in your research? Where do you see your research going in the next 5-10 years?
When people ask me what I do in my research program, my typical response is that my research group chases technology. We enjoy the position of being one of the first research groups in the world to figure out the statistical issues involved in new data collection to investigate biological data.
Thinking about technology in 5-10 years is almost impossible; everything is moving so quickly. Hypothetically, I can imagine that when a child is born, that person’s DNA and epigenome are both immediately understood (i.e., sequenced, or whatever word will be used in 5-10 years) and recorded in that person’s own personal data system. Maybe, if we stretch our minds, we can even imagine a nano-device implanted for the purpose of recording/assessing all activity (what we eat, drink, feel, etc.) for the purpose of maintaining the human body as we do for, say, a car. Similar to taking your car to the shop for diagnostics of its internal memory cards of performance, humans will have doctors’ visits or maybe at-home monitoring devices that will identify and diagnose medical and health issues. We can also imagine having the same information for our food sources, and being able to match food source information with personalized nutritional and health needs.
What are some reasons why graduate students in statistics should do research in statistical bioinformatics? What career opportunities are available to them?
The first reason is that this science is so cool and fun! The future for statistical bioinformatics is extremely bright. As I just described, the amount of data that are, and will continue to be, collected is overwhelming. Statisticians with interdisciplinary training in statistics, genetics/genomics/epigenomics, and computing have great potential to participate in research teams that are solving issues of global and societal impact. While about half of my Ph.D. students choose academic careers, the other half are employed in pharmaceuticals, medical clinics and national labs, as well as very well-known companies in both San Francisco and Seattle. The statistics, communication and computing skills learned during statistical bioinformatics training are easily transferable to many other areas of science and industry.
If graduate students in statistics are interested in doing research in statistical bioinformatics, what advice would you give them in order to be best prepared for the field?
First, in addition to strong training in foundational statistics, I advise taking as much mathematics as possible. The ability to think abstractly is essential, and quite freeing when dealing with high-dimensional complex data spaces. Second, learn to program in a programming language (C, C++, Python, etc.). All statisticians need to be able to help themselves, think algorithmically and not be so reliant on packaged data analysis software. Third, take courses in, and read, foundational biology and genetics. Finally, learn to communicate to non-specialists. Unless you can explain — verbally and in writing — what you have done, why it is important and how your approach is advantageous to your collaborators, no one is going to believe you.
How related is your work to areas such as precision medicine, targeted therapy and dynamic treatment regimes? Do you collaborate with other researchers in those fields?
(See my previous answer about the next 5-10 years.) My research group works in the areas of human breast cancer and nutrition, as well as prostate cancer. The goal for both of these large projects is early diagnosis and drug development.
What do you think has made you successful in your career and what career advice would you give to other professionals in statistics?
My desire to make a difference in the world, to leave the world a better place for future generations, is my main motivator. Whether it is training the next generation of students, or working together with teams of people in science, my success is the result of very hard work, good communication skills and knowing a lot of people who are equally passionate about their science. As far as advice, don’t get comfortable. Push yourself to change, to grow and to say yes to opportunities that scare you. And, finally, when in doubt, be generous.