The scientific community is experiencing an explosion of data: Tens of millions of grant proposals, research articles, white papers and patents are produced each year. But, more than simply a change in scale, science has grown in complexity and specialization, making it more difficult for evaluators to judge the promise and progress of research investments. Hidden among these challenges, however, is also an unprecedented opportunity to understand the underlying conditions of success. This work is fueling the rapid expansion of the multidisciplinary field known as science of science.
The widespread impact of science of science lies in its ability to provide leaders with relevant, actionable data on the kinds of challenges decision-makers face today.
Thanks to remarkable developments in data science, network science, machine learning and artificial intelligence, we are able to leverage powerful new tools and techniques to make sense of this proliferation of data. Together, they tell a complex yet meaningful story about scientific relationships and dependencies, as well as how scientific progress emerges.
What are the fundamental mechanisms that make up a successful career in science? To what degree can we foresee the future impact of a scientist? How likely is a scientist to make a future breakthrough? Our goal at CSSI is to develop a set of highly reproducible patterns that provide quantitative answers to these questions.
Our team discovered that the breakthroughs of a career occur randomly in the sequence of work. The random impact rule allowed us to uncover a unique individual parameter Q that governs the impact of individual scientists. By separating a scientist’s innate research quality from luck, the Q parameter not only systematically distinguishes high-impact scientists from their peers, but it also plays a primary role in funding and identifying high-impact scientists.
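One way to picture the separation of quality from luck is a multiplicative sketch in which each paper's impact is the product of a stable individual factor Q and a random per-paper luck term. The lognormal luck distribution, the function names and every parameter value below are illustrative assumptions, not the team's published implementation.

```python
import math
import random


def simulate_career(q, n_papers, mu_p=0.0, sigma_p=1.0, rng=None):
    """Draw paper impacts as q * luck, with lognormal luck (an assumption)."""
    rng = rng or random.Random()
    return [q * rng.lognormvariate(mu_p, sigma_p) for _ in range(n_papers)]


def estimate_q(impacts, mu_p=0.0):
    """Estimate Q as exp(mean(log impact) - mu_p), ignoring zero-impact papers."""
    logs = [math.log(c) for c in impacts if c > 0]
    if not logs:
        raise ValueError("need at least one paper with positive impact")
    return math.exp(sum(logs) / len(logs) - mu_p)


def peak_position(impacts):
    """Relative position (0 to 1) of the highest-impact paper in the sequence."""
    if len(impacts) < 2:
        raise ValueError("need at least two papers")
    return impacts.index(max(impacts)) / (len(impacts) - 1)


rng = random.Random(42)

# A long synthetic career: the estimate should recover the Q used to generate it.
career = simulate_career(q=5.0, n_papers=20000, rng=rng)
q_hat = estimate_q(career)

# If the peak is equally likely anywhere in the sequence, its average
# relative position over many synthetic careers should be close to 0.5.
positions = [peak_position(simulate_career(5.0, 30, rng=rng)) for _ in range(2000)]
mean_position = sum(positions) / len(positions)
```

In this sketch, averaging the logarithm of impact cancels the luck term across many papers, which is why a single stable number per scientist can be recovered from a noisy record.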
The CSSI team’s recent groundbreaking research also shows that individual careers, regardless of industry, are characterized by hot streaks over the course of a five-year period. We began by understanding the lifecycle of creativity and collecting large-scale career histories of individual artists, film directors and scientists. We found that across all three domains, impactful and successful work occurred in sequence. The team quantitatively probed this phenomenon and determined it is remarkably universal: Hot streaks are ubiquitous yet usually unique across different careers. Hot streaks emerge randomly within an individual’s sequence of work and are temporally localized and unassociated with any detectable change in productivity. Our research shows that because works produced during hot streaks garner substantially more impact, the uncovered hot streaks fundamentally drive the collective impact of an individual, and ignoring this leads us to systematically overestimate or underestimate the future impact of a career. These results not only deepen our quantitative understanding of patterns that govern individual ingenuity and success, but also may have implications for identifying and nurturing individuals whose work will have lasting impact.
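The idea of a temporally localized hot streak can be sketched as a step model: impact sits at a baseline level, rises to a higher level for one contiguous run of works, then returns to baseline. The brute-force least-squares fit below is a simplified stand-in for illustration only; the function name, the `min_len` parameter and the sample data are hypothetical.

```python
def fit_hot_streak(log_impacts, min_len=2):
    """Find the contiguous window whose elevated mean best explains the data.

    Returns a dict with the window [start, end), the baseline mean and the
    hot-streak mean, or None if no window has a higher mean than the rest.
    """
    n = len(log_impacts)
    prefix = [0.0]
    for x in log_impacts:
        prefix.append(prefix[-1] + x)
    total = prefix[-1]
    total_sq = sum(x * x for x in log_impacts)

    best = None
    for start in range(n):
        for end in range(start + min_len, n + 1):
            n_in = end - start
            n_out = n - n_in
            if n_out == 0:
                continue  # a streak must leave some baseline works outside it
            s_in = prefix[end] - prefix[start]
            s_out = total - s_in
            mean_in, mean_out = s_in / n_in, s_out / n_out
            if mean_in <= mean_out:
                continue  # only windows with elevated impact count as hot
            # Sum of squared errors of the two-level step model.
            sse = total_sq - s_in ** 2 / n_in - s_out ** 2 / n_out
            if best is None or sse < best["sse"]:
                best = {"sse": sse, "start": start, "end": end,
                        "baseline": mean_out, "hot": mean_in}
    return best


# Hypothetical log-impact sequence with an elevated run in positions 3 to 5.
sequence = [0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0]
streak = fit_hot_streak(sequence)
```

The sketch fits impact only and never looks at how many works were produced per year, in keeping with the finding that hot streaks are not tied to a detectable change in productivity.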
Throughout history, a relatively small number of exceptional scientists have had a disproportionate and lasting impact on science and society. What factors differentiate these from other scientists with ostensibly comparable performance? To tackle this question, the CSSI team must leverage the massive datasets collected on accomplishments and impact. We seek to develop a quantitative framework capable of unveiling the phenomena that canonize a few outstanding individuals. We are, however, cognizant of the great heterogeneity among individuals, and of how common-sense narratives of genius can strengthen biases that keep scientists from reaching their full potential of influence and impact. This research has the potential to reshape our understanding of the factors that govern the emergence of geniuses. Large-scale data can provide a new lens into gender and associated biases that could be strategically reversed to unleash hidden, underutilized talent for science.
The majority of science today is carried out by teams, with roughly 90% of all publications written by multiple authors. This points to the undeniable fact that teams are the fundamental engines of game-changing discoveries that impact societies and economies. However, this collaboration also introduces a new set of unique challenges surrounding team communication and coordination.
What are the factors that help or hinder the effectiveness of teams? Why do teams reward some members while others struggle to gain recognition for their contributions?
Through a series of papers, CSSI showed that large teams dominate the production of high-impact science across all times and disciplines. Our new work, however, reveals that the success of funding large teams is not guaranteed. Therefore, we seek to use data to generate a set of highly reproducible principles underlying successful teams, both large and small. The resulting insights would enable a mechanistic approach for identifying, assembling and nurturing the kind of risky, disruptive and high-impact teams that are best equipped for producing tomorrow’s breakthroughs.
One of the most universal trends in science and technology today is the expansion of large teams; meanwhile the development and support of solitary researchers and small teams seems to be diminishing. This shift led the CSSI team to question if the work produced by large teams differs from small teams, and if so, how it differs. We analyzed more than 65 million papers, patents and software products produced between 1954 and 2014. Our findings demonstrate that small teams tend to disrupt science and technology with new ideas and opportunities, whereas larger teams tend to develop existing ones. Work from larger teams builds on more recent and popular developments, and attention to their work comes immediately. By contrast, smaller teams search more deeply into the past, resulting in more disruptive work that has a longer range of success. These results demonstrate that both small and large teams are essential to a flourishing ecology of science and technology, and suggest that, to achieve this, science policies should aim to support a diversity of team sizes.
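Disruption can be quantified in several ways. The sketch below uses one index found in the literature, under the assumption that a work is disruptive when later works cite it without citing its references, and developmental when later works cite both. The data layout and all names are hypothetical.

```python
def disruption_index(focal, references, later_citations):
    """Return (n_i - n_j) / (n_i + n_j + n_k) for one focal work.

    n_i: later works citing the focal work but none of its references
    n_j: later works citing both the focal work and its references
    n_k: later works citing the references but not the focal work
    `later_citations` maps each later work's id to the ids it cites.
    """
    refs = set(references)
    n_i = n_j = n_k = 0
    for cited in later_citations.values():
        cited = set(cited)
        cites_focal = focal in cited
        cites_refs = bool(refs & cited)
        if cites_focal and not cites_refs:
            n_i += 1
        elif cites_focal and cites_refs:
            n_j += 1
        elif cites_refs:
            n_k += 1
    total = n_i + n_j + n_k
    return (n_i - n_j) / total if total else 0.0


# Hypothetical toy citation data around a focal work "F" that cites R1 and R2.
later = {
    "A": {"F"},         # cites only the focal work
    "B": {"F"},         # cites only the focal work
    "C": {"F", "R1"},   # cites the focal work and one of its references
    "D": {"R2"},        # bypasses the focal work
    "E": {"X"},         # unrelated, ignored
}
score = disruption_index("F", ["R1", "R2"], later)
```

Values near 1 indicate work that eclipses what it built on, and values near -1 indicate work that consolidates it. Here `n_i = 2`, `n_j = 1` and `n_k = 1`, so the toy score is 0.25.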
The importance of teams raises fundamental questions about the nature of team-based creativity, team assembly and credit assignment within teams. How do team outputs relate to team members’ individual productivity? The CSSI team believes that analyzing individual input and team output using millions of papers and patents could directly inform team organization, while also providing a new evaluation tool for organizing and evaluating team-produced science. We plan to build on a credit allocation algorithm developed to precisely decipher and predict credit share among team members. We plan to extend aggregate measures of scientific fitness to groups in order to generate a measure of collective intelligence. This will allow us to test whether teams can produce knowledge with greater impact than individuals.
CSSI seeks to establish an empirically grounded, quantitative understanding of why, how and when teams fail, which could serve as critical input for modeling studies that inform science policy and funding decisions. We will focus on team successes and failures, and study teams in pursuit of NIH grants and startup ventures to discover empirical evidence underlying team outcome uncertainty, which could substantially improve our understanding of the inner workings and design of high-impact, robust and disruptive teams. We will use findings from this task to seed large-scale online experiments that isolate the causal mechanisms driving team success.
Of the many tangible measures of scientific impact, one stands out: citations. They are an essential component of scientific communication by helping to uphold intellectual honesty, avoid plagiarism, attribute prior ideas to the correct sources, etc. But astonishingly, we estimate that only half of all papers ever published have been cited. This led the CSSI team to wonder: Why are most scientific discoveries rarely cited, while a few achieve runaway success? What determines which ideas are canonized and which are forgotten? Are there time-tested recipes for high-impact scientific ideas?
We began by deriving a mechanistic model for citations, allowing us to collapse the citation histories of papers from different disciplines into a single, universal curve. Our results unveiled a single fitness parameter that governs the long-term impact of an idea and demonstrate how, despite the countless factors affecting impact, simple patterns govern the emergence of exceptional scientific ideas. We also discovered that the highest-impact science is primarily grounded in conventional combinations of prior work, while simultaneously featuring unusual combinations. These two important findings have opened the door to new ways of predicting scientific breakthroughs at an increased level of accuracy and robustness.
While the rapid growth of research and data continues its upward trend, the CSSI team looks at why some papers seem to have an inherent “fitness” that can be interpreted as the scientific community’s response to the research. This predictability comes directly from a mechanistic model for citation dynamics of individual papers, allowing us to collapse the citation histories of papers from different journals and disciplines into a single curve, indicating that all papers tend to follow the same universal temporal pattern. These observed patterns not only help us uncover basic mechanisms that govern scientific impact, but also offer reliable measures of influence that may have potential policy implications.
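The collapse of citation histories onto one curve can be illustrated with a minimal sketch. It assumes a form in which cumulative citations grow as m * (exp(fitness * Phi((ln t - mu) / sigma)) - 1), where Phi is the standard normal cumulative distribution and mu and sigma describe how quickly attention to a paper ages. That functional form, the constant m = 30 and the two sample papers are assumptions made for illustration.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cumulative_citations(t, fitness, mu, sigma, m=30.0):
    """Citations accumulated by time t > 0 under the assumed model."""
    return m * (math.exp(fitness * normal_cdf((math.log(t) - mu) / sigma)) - 1.0)


def ultimate_impact(fitness, m=30.0):
    """Long-run citation total, which depends on fitness alone."""
    return m * (math.exp(fitness) - 1.0)


def rescale(t, citations, fitness, mu, sigma, m=30.0):
    """Map (t, citations) to rescaled coordinates shared by all papers."""
    t_tilde = (math.log(t) - mu) / sigma
    c_tilde = math.log(1.0 + citations / m) / fitness
    return t_tilde, c_tilde


# Two hypothetical papers: (fitness, mu, sigma).
papers = {"paper_a": (1.0, 1.5, 0.8), "paper_b": (2.5, 2.0, 1.2)}

collapsed = []
for fitness, mu, sigma in papers.values():
    for year in (1, 5, 20):
        c = cumulative_citations(year, fitness, mu, sigma)
        collapsed.append(rescale(year, c, fitness, mu, sigma))
```

After rescaling, every point from both papers lies on the same curve, `c_tilde = normal_cdf(t_tilde)`, even though their raw citation histories differ. The long-run total in `ultimate_impact` depends only on the fitness value, which is the sense in which a single parameter governs long-term impact in this sketch.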
Scientific research has long been used to enhance the development of innovation and practical application in the marketplace. To better understand this relationship, the CSSI team developed a new metric to understand the role distance/time plays between patents and scientific research. The value of this distance metric lies in its ability to provide a quantitative framework for the link between science and technology and the individuals or institutions behind the work. Thanks to this new metric, we were able to study 4.8 million patents and 32 million research articles with results that found that 80% of cited research articles link forward to a future patent, while 61% of patents link backward to prior research articles. This connection typically stands two to four degrees apart. We also found that universities tended to cite their own research directly in their patents (in other words, a distance of 1), but the distance was greater for companies, suggesting that companies rely on outsiders for their foundational research.
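One way such a distance could be computed is a breadth-first search outward from the patents that cite research articles directly. The sketch below assumes that a distance of 1 means a direct patent-to-paper citation and that each additional patent-to-patent citation adds 1; the input layout and the function name are hypothetical.

```python
from collections import deque


def patent_distances(patent_cites_papers, patent_cites_patents):
    """Return each patent's minimum citation distance to the research literature.

    patent_cites_papers:  patent id -> iterable of paper ids it cites
    patent_cites_patents: patent id -> iterable of patent ids it cites
    Patents with no citation path to any paper are left out of the result.
    """
    # Distance 1: patents that cite at least one paper directly.
    dist = {p: 1 for p, papers in patent_cites_papers.items() if papers}

    # Reverse the patent-to-patent edges so the search can move from a
    # cited patent to the patents that cite it.
    cited_by = {}
    for citing, cited_list in patent_cites_patents.items():
        for cited in cited_list:
            cited_by.setdefault(cited, []).append(citing)

    queue = deque(dist)
    while queue:
        current = queue.popleft()
        for citing in cited_by.get(current, []):
            if citing not in dist:
                dist[citing] = dist[current] + 1
                queue.append(citing)
    return dist


# Hypothetical toy data: P1 cites a paper, P2 cites P1, P3 cites P2.
papers_cited = {"P1": ["paper_x"], "P2": [], "P3": [], "P4": []}
patents_cited = {"P2": ["P1"], "P3": ["P2"], "P4": []}
distances = patent_distances(papers_cited, patents_cited)
```

The same search can be run on the paper side, starting from papers that a patent cites directly, to assign a distance to each research article.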
Computational identification of promising new discoveries and technologies is central to informed decision-making. At CSSI, we plan to apply and develop advanced ML and NLP techniques to identify, extract, classify and integrate large amounts of disjointed, structured and unstructured data sources. Thanks to our access to the full text of all papers through Elsevier and all funded/unfunded grant proposals in the NIH database, we will be able to go beyond treating papers as fundamental units of ideas. This deep learning will enable us to build a much richer representation of ideas and their adjacent possibilities.
The birth and growth of new fields depends on the foundational strength of academic disciplines. Traditionally, each field has been treated as an isolated entity, with its own citation patterns and funding sources. The CSSI team is working to build a rich network of interdisciplinary dependencies reflecting the flow of ideas at the intersection of science and technology. We will model the growth of fields as a function of both their own intrinsic behavior and their role within this network. This will deepen our understanding of the forces shaping academic growth and allow us to identify how things like new discoveries or funding reductions spread through science. It will also allow us to identify the emergence of new fields with the potential for transformative growth and impact.
It is becoming increasingly important to study the possibilities, capabilities and impact of artificial intelligence and machine learning. Therefore, we seek to understand whether this research and the fields that study social and societal trends are keeping pace with each other, as well as how these technologies can improve the way we innovate.
Rapid advances in artificial intelligence and automation technologies have the potential to significantly disrupt labor markets. While AI and automation can augment the productivity of some workers, they can replace the work done by others and will likely transform almost all occupations at least to some degree. Rising automation is happening in a period of growing economic inequality, raising fears of mass technological unemployment and a renewed call for policy efforts to address the consequences of technological change. The CSSI team will explore the barriers that inhibit scientists from measuring the effects of AI and automation on the future of work.
Recent studies have documented that central findings in many peer-reviewed publications cannot be reproduced, highlighting a critical need to understand and predict the uncertainties and robustness of ideas. Our preliminary work suggests that the robustness of ideas depends on the social fabric in which new ideas are woven: Centralized communities seem to perpetuate claims that are less likely to replicate even when widely agreed upon, whereas decentralized communities involve more independent teams and use more diverse methodologies, generating more robust, replicable results. Building on this, we hypothesize that AI, combined with human judgment, can predict the robustness of a scientific idea at a level neither human nor machine could achieve alone.
Today’s AI has implications for the future of work, the stock market, medicine, transportation, the future of warfare and the governance of society, making it increasingly important to study the social and societal implications of artificial intelligence adoption. On one hand, AI adoption has the potential to reduce human error and human bias. For example, AI systems have helped judges reach more equitable bail decisions; AI systems can assess the safety of neighborhoods from images; and AI systems can improve hiring decisions for board directors while reducing gender bias. On the other hand, recent examples suggest that AI technologies can be deployed without understanding the social biases they possess or the social questions they raise. Consider the recent reports of racial bias in facial recognition software, the ethical dilemmas of autonomous vehicles and income inequality from computer-driven automation. These examples highlight the diversity of today’s AI technology and the breadth of its application, an observation leading some to characterize AI as a general-purpose technology. As AI becomes increasingly widespread, researchers and policymakers must balance the positive and negative implications of AI adoption. Therefore, we will continue to ask how tightly the social sciences and cutting-edge machine intelligence research are connected.
As the science of science and innovation continues to grow, our team is committed to exploring the depth of opportunities that big data and technology can offer us. Looking ahead, we are curious to study how other nascent frontiers such as blockchain and advanced AI will impact society, and thus how the CSSI team can conduct meaningful tests and extrapolate insights from these evolving disciplines.
We welcome collaborators of all kinds who are interested in joining or supporting our work.