Toward AI-Guided Regenerative Medicine
SingularityNET is applying an integrated AI approach to build an automated regenerative medicine system.
This article was co-written by Ben Goertzel & Mike Duncan
Regenerative Medicine offers possibilities that could greatly enhance the human condition and massively reduce suffering in living beings. Scientists have developed stem cells that can be changed into other cell types, a process called cellular differentiation. Through such guided differentiation, humanity can actualize the promises of Regenerative Medicine: growing organs, curing aging, and more.
There are various protocols for carrying out such cellular differentiation. However, creating these protocols is extremely complex. An appropriately designed automated system will be able to craft and adjust custom protocols for successful stem cell differentiation.
SingularityNET is applying an integrated AI approach to build such an automated system. OpenCog’s MOSES evolutionary program learning algorithm and the PLN probabilistic logical inference framework will be applied in learning stem cell differentiation protocols from experimental data and curated background knowledge.
The various components of the system will be implemented within the SingularityNET framework as independent Agents. This will not only allow for the reconfiguration of component tools but also for experimentation with other Agents of SingularityNET — some of which may help to further enhance and evolve the initially implemented system.
Therefore, this project plans not only to create an automated system for custom differentiation protocols but also to explore the possibilities of mixing such a specialized intelligence with the general capacities of other Agents on SingularityNET. Throughout 2018, this project will be used as a case study to improve the various designs, features, and capacities of SingularityNET.
The development of this project on a decentralized framework is a step towards democratizing medicine and ensuring that the fruits of Regenerative Medicine will not be reserved only for a privileged few. A project of this scale requires participation by a variety of researchers and developers, and we welcome everyone to participate in actualizing the potentials of Regenerative Medicine for the betterment of all the sentient beings of the planet.
1.0 Possibilities of Regenerative Medicine
To understand what Regenerative Medicine offers, it helps to understand what Induced Pluripotent Stem Cells (iPSCs) are.
Induced Pluripotent Stem Cells (iPSCs) can be thought of as Master Cells: they have the capacity to change from one cell type to another.
The process by which iPSCs change their cell type is referred to as cellular differentiation. Although other types of stem cells can also be differentiated into other cells, the discovery of iPSCs has allowed for an ethical way to utilize the benefits of ‘guided’ differentiation.
The possibilities of Regenerative Medicine are, therefore, just as broad as the underlying potential of the cells being used. For example: the iPSCs can be made to differentiate into beta islet cells to treat diabetes, blood cells to create new cells free from cancer for a leukemia patient, or neurons to treat neurological diseases.
If tissues and organs of a patient have degenerated to an extent that they cannot be repaired, Regenerative Medicine holds the possibility of growing tissues and organs in the laboratory and transplanting them into the body. As the cells used to grow the organs are from the patient, the possibility exists to end the occurrence of organ transplant rejection and to eliminate the shortage of organs available for transplants.
In fact, scientists see Regenerative Medicine playing a definitive role in the radical extension of human health span by repairing the cellular damage caused by aging.
We believe AI can play a crucial role in fully actualizing what Regenerative Medicine has to offer, and that it is crucial for such transformative medical technologies to be democratized so that they are not controlled by a handful of governments or pharmaceutical companies.
2.0 Differentiation Protocols and AI
As one might imagine, regenerating youthful tissues and organs inside living human bodies is not a simple matter — the specifics depend on exactly what types of cells and body systems one is working with, as well as on specifics of the patient involved. Biomedical researchers are working through these difficulties, but it is clear that progress could be accelerated significantly via appropriate application of AI.
The core technology required to make regenerative medicine work is the controlled differentiation of a patient’s cells to repair or replace compromised tissues and organs.
However, the creation of protocols to enable this differentiation involves many complexities, including various dependencies of the protocols on properties that vary from patient to patient.
Therefore, to allow for widespread clinical application of this technology, complex and dynamic protocols that allow for successful differentiation of cells are required. Such protocols will control graft production or in situ regeneration based on patient genotype and phenotypes.
Given the diversity of patient genotypes and phenotypes, it seems plain that making this kind of technology widely accessible will likely require significant automation.
An appropriately designed automated system will be able to craft and adjust custom protocols for iPSC differentiation.
In order to do this, such a system would need to organize the vast amounts of relevant existing experimental data and curated knowledge. Based on this knowledge, it would propose predictive models for development and in-silico testing of differentiation and trans-differentiation based cell and tissue culture protocols.
Toward this end, we are experimenting with using AI tools from the OpenCog framework for automated learning of regenerative medicine protocols.
The core AI tools we are exploring for this sort of application are OpenCog’s MOSES evolutionary program learning algorithm, and OpenCog’s PLN probabilistic logical inference framework — tools that have both been applied previously in the biomedical domain[i] [ii] [iii], but not in any application of such complexity.
At every step of the regenerative medicine treatment development pipeline, there are significant and urgent opportunities for automated knowledge discovery. Furthermore, integration tools will be needed to assist scientists and clinical researchers.
And so we are exploring how to connect these OpenCog AI tools to other bioinformatics tools and datasets within the SingularityNET framework. Such a connection will allow for two benefits:
- It will supply a highly configurable and adaptable decentralized framework for regenerative medicine applications and related biomedical informatics tasks.
- It will ensure that the data and results obtained via this work are handled in a democratic and participatory way.
So this project, in a nutshell, involves:
Application of evolutionary program learning and probabilistic logical inference to learning stem cell differentiation and somatic trans-differentiation protocols from experimental data and curated background knowledge.
This is not a small task, and we are still in the early stages. But we believe this is the kind of thing that must be done to really get the most out of regenerative medicine, and we believe that both OpenCog and the SingularityNET framework are highly appropriate tools for pursuing this work.
3.0 The First Steps and The End Goal
To make fully flexible and personalized regenerative medicine possible, we believe what is fundamentally needed is to create an integrated AI-based system for the automated creation of protocols for controlled cell differentiation in a regenerative medicine context.
In its advanced version, such a system would include:
- A continuously updated and integrated knowledge-base of public and customer proprietary pan-omic data sets and curated databases and ontologies
- Symbolic and numerical models of cell and tissue development, inferred from knowledge-base and generated de novo from experimental data
- Programmatic representations of experimental cell and tissue culture protocols
- Integration of automated inference and in-silico simulation experimentation for evaluating existing evidence regarding researcher-supplied hypotheses about the potential results of candidate protocols, and the dynamics of cell and tissue development
- Automated generation of novel candidate protocols, and hypotheses regarding the dynamics of cell and tissue development
- Application of multiple integrated machine learning techniques to evaluate experimental results and infer symbolic and numerical dynamics of experimental systems
- Integration of inferred knowledge from experimental findings into knowledge-base and model formulation
- Generation of proposals for experiments to generate missing knowledge
As a first step toward an ambitious system of this nature, what we are aiming at is using a novel integration of evolutionary program learning and probabilistic inference to learn new protocols based on data regarding cell and tissue biology plus data about existing protocols and their results in different contexts.
The plan is to first demonstrate the ability of this system to generate novel and useful results, and then extend the system gradually in the direction of the broader vision.
3.1 A Simple Test Problem
To get started in demonstrating the ability of our proposed AI system in generating novel and useful results, we are exploring some simple test problems.
Consider, for instance, a case where:
- An iPSC to liver cell protocol is known[iv]
- A human fibroblast to brown adipocyte protocol is known[v]
- However, an iPSC or somatic cell to functional hematopoietic precursor/stem cell protocol is not known[vi]
So the challenge we are faced with involves:
Beginning with the first two protocols — plus related knowledge about cell and tissue development — and then inferring educated guesses as to protocols of the third type.
A variety of data types may be used to approach this challenge, for example: cell morphology imaging, immunocytochemical profiles, imaging of cells revealing tagged proteins or western blots, transcriptome profiling, and more.
4.0 Programmatic Formulation of Stem Cell Differentiation Protocols
One of the first steps towards automated learning of stem cell differentiation protocols is to represent such protocols in a precise language that is easily analyzed and manipulated by automated learning systems.
Keeping that in mind, we have developed a scheme for representing protocols as small programs in functional programming languages such as LISP. This is valuable in practice because many AI tools at our disposal, including OpenCog’s MOSES evolutionary learning algorithm[vii] and PLN probabilistic logic[viii] framework, are able to learn, manipulate and analyze small programs of this nature.
To illustrate the concept, we give here a concrete example: beginning with some auxiliary functions and ending up with the programmatic formulation of a simple protocol.
A LISP program for combining substrates and chemical differentiation factors (dfs) into a medium:
(define myMedium (list (substrate hE-cad-Fc) (growthFactors (list dfA dfB dfC))))
A program to combine cell lines and vectors for transcription factor inducers (TFi) and miRNA inducers (mRNAi) into a starting cell line:
(define myStartCells (apply (list TFiX mRNAiY) cellLine3))
A program to define incubation protocol steps as functions of cells, additional factors, and incubation variables that return new cells:
(define (DifStep1 cells medium) (incubate cells medium))
(define (DifFinal cells medium) (incubate cells (add dfQ medium)))
A program to define measurement operations on cells that return a truth value depending on a panel of cell culture measurements (the output is a fuzzy truth value indicating the fraction of cells differentiated successfully):
(measure someCells myPanel) ; returns, for example, 0.3
(define (DifCheck cells medium panel)
  (if (> (measure cells panel) 0.7)
      (incubate cells (add dfP medium))
      (incubate (apply mRNAiZ cells) medium)))
Building on the above functions, we can formulate in LISP an example protocol, defined as a sequence of steps with branch points determined by measurements of intermediate cell types:
(define (myProtocol cells medium) (DifFinal (DifCheck (DifStep1 cells medium) medium myPanel) medium))
The following code puts it all together and checks the results:
(measure (myProtocol myStartCells myMedium) resultPanel)
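For readers more at home in Python, the same idea can be sketched as ordinary functions over immutable records. This is only an illustrative translation under our own assumptions: the class names `Medium` and `Cells`, the `history` field, and the way a measurement panel is passed in as a callable are all hypothetical and are not part of OpenCog or of any existing protocol library. Only the factor and vector names are taken from the LISP example above.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple


@dataclass(frozen=True)
class Medium:
    substrate: str
    factors: FrozenSet[str]

    def add(self, factor: str) -> "Medium":
        """Return a new medium with one extra differentiation factor."""
        return Medium(self.substrate, self.factors | {factor})


@dataclass(frozen=True)
class Cells:
    line: str
    history: Tuple[tuple, ...] = ()  # hypothetical log of steps applied so far


# A panel is modeled as a callable returning the fraction of cells
# judged to be differentiated (a fuzzy truth value in [0, 1]).
Measure = Callable[[Cells], float]


def incubate(cells: Cells, medium: Medium) -> Cells:
    step = ("incubate", tuple(sorted(medium.factors)))
    return Cells(cells.line, cells.history + (step,))


def apply_vector(vector: str, cells: Cells) -> Cells:
    return Cells(cells.line, cells.history + (("apply", vector),))


def dif_step1(cells: Cells, medium: Medium) -> Cells:
    return incubate(cells, medium)


def dif_check(cells: Cells, medium: Medium, measure: Measure,
              threshold: float = 0.7) -> Cells:
    # Branch point: the next step depends on an intermediate measurement.
    if measure(cells) > threshold:
        return incubate(cells, medium.add("dfP"))
    return incubate(apply_vector("mRNAiZ", cells), medium)


def dif_final(cells: Cells, medium: Medium) -> Cells:
    return incubate(cells, medium.add("dfQ"))


def my_protocol(cells: Cells, medium: Medium, measure: Measure) -> Cells:
    return dif_final(dif_check(dif_step1(cells, medium), medium, measure), medium)


my_medium = Medium("hE-cad-Fc", frozenset({"dfA", "dfB", "dfC"}))
my_start_cells = apply_vector("mRNAiY", apply_vector("TFiX", Cells("cellLine3")))

# Two made-up measurement outcomes exercise both branches of the protocol.
low = my_protocol(my_start_cells, my_medium, lambda cells: 0.3)
high = my_protocol(my_start_cells, my_medium, lambda cells: 0.9)
```

Because every step returns a new value rather than mutating its input, a protocol is just a composition of small functions, which is the property that makes it easy for a learning system to rearrange steps.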
4.1 An Integrated AI Approach to Protocol Learning
The research we are exploring in this project involves the automated learning of protocols like the one presented in Section 4.0 (or more complex examples along similar lines).
This research can be approached via the integration of two AI techniques, both of which are currently implemented in the OpenCog AI framework: evolutionary program learning, and probabilistic logical inference.
In prior work we have explored the power of this combination for speculative inference in bioscience[ix], but the current application is different in that it involves practical laboratory procedures rather than inference of causal hypotheses from data.
The MOSES evolutionary program learning tool is able to learn LISP programs of the format exemplified in Section 4.0 above. It iteratively evaluates a population of candidate programs and then generates new candidates via mutation, combination, and probabilistic modeling-and-generation. All of these operations act on the programs with the most successful evaluations.
However, like all evolutionary algorithms, MOSES requires a fitness function: a method of evaluating the likely success of each program (in this case, each protocol) in its population.
Evaluating each candidate protocol in the evolving population via actual biological experimentation is not viable at present: it would be too slow given the large number of candidates that need to be evaluated before something even remotely useful is found. There are two viable sources of fitness estimates in this context: simulation modeling, and probabilistic inference. The most successful approach will probably involve using the two sources together.
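To make the role of the fitness function concrete, here is a minimal sketch of a generic evolutionary loop whose fitness blends a simulation-based estimate with an inference-based one. It is not MOSES, which is considerably more sophisticated; it is a plain mutate-and-recombine loop with elitism. The flat list-of-factors encoding, the equal weighting, and both toy estimators are assumptions made purely for illustration.

```python
import random
from typing import Callable, List

Candidate = List[str]
FACTORS = ["dfA", "dfB", "dfC", "dfP", "dfQ"]  # names reused from Section 4.0


def combined_fitness(candidate: Candidate,
                     simulate: Callable[[Candidate], float],
                     infer: Callable[[Candidate], float],
                     weight: float = 0.5) -> float:
    """Blend two fitness estimates; the 50/50 weighting is an assumption."""
    return weight * simulate(candidate) + (1.0 - weight) * infer(candidate)


def mutate(candidate: Candidate, rng: random.Random) -> Candidate:
    child = list(candidate)
    child[rng.randrange(len(child))] = rng.choice(FACTORS)
    return child


def crossover(a: Candidate, b: Candidate, rng: random.Random) -> Candidate:
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(fitness: Callable[[Candidate], float], length: int = 4,
           pop_size: int = 20, generations: int = 30, seed: int = 0) -> Candidate:
    rng = random.Random(seed)
    population = [[rng.choice(FACTORS) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 4]  # elitism: best candidates survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=fitness)


# Toy stand-ins for the two fitness sources; a real system would call a
# simulation engine and an inference engine here.
TARGET = ["dfA", "dfB", "dfP", "dfQ"]


def toy_simulation(candidate: Candidate) -> float:
    return sum(x == t for x, t in zip(candidate, TARGET)) / len(TARGET)


def toy_inference(candidate: Candidate) -> float:
    return len(set(candidate) & set(TARGET)) / len(set(TARGET))


def fit(candidate: Candidate) -> float:
    return combined_fitness(candidate, toy_simulation, toy_inference)


best = evolve(fit)
```

Because the best candidates are carried over unchanged each generation, the best fitness in the population never decreases, whichever estimators are plugged in.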
Probabilistic inference, for this sort of problem, is likely to be most successful if it centers on analogical reasoning.
For instance, in the test problem outlined in Section 3.1, Probabilistic Logic Networks (PLN) inference could use the known iPSC to liver cell and human fibroblast to brown adipocyte protocols as analogies from which to reason about potential iPSC or somatic cell to functional hematopoietic precursor/stem cell protocols.
As such analogies are relatively subtle, PLN will need to use significant background knowledge regarding the underlying cells and tissues and their dynamics to carry out its reasoning.
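As a very rough illustration of analogy-based fitness estimation, the sketch below scores a candidate protocol by a similarity-weighted average over known protocols. This is a crude stand-in and not PLN: real PLN inference works over rich background knowledge with its own truth-value formulas. The feature names (`f1`, `f2`, `f3`), the success values, and the use of Jaccard similarity are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Set overlap in [0, 1]; two empty sets count as having no overlap."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass(frozen=True)
class KnownProtocol:
    target_features: FrozenSet[str]  # hypothetical descriptors of the target cell type
    factors: FrozenSet[str]          # differentiation factors the protocol uses
    success: float                   # observed fraction of cells differentiated


def analogical_estimate(candidate_factors: FrozenSet[str],
                        target_features: FrozenSet[str],
                        known: List[KnownProtocol]) -> Tuple[float, float]:
    """Return (strength, evidence): a weighted success estimate and total weight."""
    weights = [
        jaccard(target_features, k.target_features)
        * jaccard(candidate_factors, k.factors)
        for k in known
    ]
    evidence = sum(weights)
    if evidence == 0.0:
        return 0.0, 0.0  # no usable analogy among the known protocols
    strength = sum(w * k.success for w, k in zip(weights, known)) / evidence
    return strength, evidence


known_protocols = [
    KnownProtocol(frozenset({"f1", "f2"}), frozenset({"dfA", "dfB"}), 0.8),
    KnownProtocol(frozenset({"f3"}), frozenset({"dfC"}), 0.2),
]

strength, evidence = analogical_estimate(
    frozenset({"dfA"}), frozenset({"f1"}), known_protocols
)
```

Returning the total weight alongside the estimate keeps track of how much analogical support there actually was, so a downstream fitness function can discount estimates that rest on weak analogies.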
4.2 SingularityNET Implementation
To implement the integrated AI approach to protocol learning within the SingularityNET framework, we will create separate agents embodying:
- MOSES evolutionary program learning — configured for learning stem cell differentiation procedures.
- PLN logical inference — configured for reasoning about genetic procedures, and supplied with a knowledge-base of relevant information.
- A Simulation Engine — configured for detailed numerical simulation of stem cell growth.
- Query Interfaces — to multiple relevant data containers, including generic data regarding stem cell biology and related multi-omic information, and specific data involving patients whose stem cells and whose reactions to stem cell therapies have been studied.
- Tools for Preprocessing and Normalizing Datasets — often implemented in R and then wrapped in SingularityNET interfaces.
The advantage of implementing this kind of workflow in a highly general framework like SingularityNET is that the component tools can then be easily reconfigured for use in different ways in different experimental applications. This ability to reconfigure component tools is highly valuable when one is doing exploratory research — and also in general when one is operating in a domain where the data, techniques and knowledge in use are so rapidly evolving and improving.
In the modularly designed SingularityNET, all the component tools are wrapped in APIs that allow them to communicate and exist as independent Agents. In such a modular structure it becomes straightforward to swap out the agent serving any one role in the process (say, the simulation engine) for another agent capable of serving the same role.
Because researchers and developers can replace particular Agents in SingularityNET with their own, many of them can collaborate on the work, even if some do not fully understand the whole picture in which their code or data is being utilized.
For instance, a third party developer may only be working toward creating a highly effective Simulation Engine and may create such an Agent on SingularityNET. If this new Agent is really effective at running numerical simulations of stem cell growth, it may come to be used as the dominant Simulation Engine by the other Agents working on this project. The modular structure of the SingularityNET framework will hence allow the overall automated system to improve and evolve by utilizing other Agents created by third party developers.
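The swap-ability described above can be pictured with a small interface sketch. This does not use or imitate the actual SingularityNET API. It is an in-process stand-in in which a hypothetical `AgentRegistry` maps role names to objects and a hypothetical `SimulationEngine` interface defines what any simulation agent must provide. Both engines return made-up numbers.

```python
from typing import Dict, Protocol, Sequence


class SimulationEngine(Protocol):
    """Hypothetical role interface: anything with this method can fill the role."""

    def estimate_success(self, steps: Sequence[str]) -> float:
        ...


class BaselineEngine:
    def estimate_success(self, steps: Sequence[str]) -> float:
        return 0.5  # placeholder: knows nothing, always answers 0.5


class ThirdPartyEngine:
    def estimate_success(self, steps: Sequence[str]) -> float:
        return min(1.0, len(steps) / 10.0)  # placeholder scoring rule


class AgentRegistry:
    """Maps a role name to whichever agent currently serves that role."""

    def __init__(self) -> None:
        self._agents: Dict[str, SimulationEngine] = {}

    def register(self, role: str, agent: SimulationEngine) -> None:
        self._agents[role] = agent

    def get(self, role: str) -> SimulationEngine:
        return self._agents[role]


def score_protocol(registry: AgentRegistry, steps: Sequence[str]) -> float:
    # The caller only knows the role name, not which implementation answers.
    return registry.get("simulation").estimate_success(steps)


registry = AgentRegistry()
registry.register("simulation", BaselineEngine())
before = score_protocol(registry, ["dfA", "dfB", "dfQ"])

# A third party supplies a better engine; no calling code has to change.
registry.register("simulation", ThirdPartyEngine())
after = score_protocol(registry, ["dfA", "dfB", "dfQ"])
```

The design point is that `score_protocol` depends only on the role, so a contributor can improve one component without needing to understand the rest of the workflow.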
As the SingularityNET platform is gradually developed throughout 2018, we will use this integrated AI approach to protocol learning as a case study to help us refine various aspects of the overall design of the network, such as the “API of APIs” for communication between Agents and the integration of third party data-preprocessing tools, simulation tools and large-scale numerical datasets with SingularityNET.
The actual progress of this work will evolve based on the nature of the early results obtained. But for purposes of understanding and planning, it is worthwhile even at this early stage to sketch out our envisioned direction for development:
- Curate appropriate gene, protein, and cell level knowledge-base, within OpenCog’s Atomspace.
- Refine a symbolic/boolean model representation of cell type defining gene regulatory networks and associated protocols (in the vein of the LISP examples given in Section 4.0).
- Configure MOSES and PLN together to enable automated learning of transcription factor-based differentiation protocols, as outlined earlier.
- Integrate the combination of tools into the SingularityNET framework.
- Begin pursuing experimental verification of results obtained via this automated learning.
- Continue experimental verification of results
- Addition of small molecule representation to Atomspace
- Begin integration of numerical modeling of cell development into the framework; this enables simulation-modeling-based fitness evaluation of candidate protocols (complementing inferential fitness evaluation)
- Development of knowledge representation for cell morphology and extra-cellular environment to model tissue development
- Expansion of protocol inference to tissue level systems
- Expansion of protocol inference to trans-differentiation systems
- Begin development of knowledge based process monitoring for automated real-time control of clinical product production
- Refine and optimize the embedding of the toolset in SingularityNET based on user feedback
- Complete integration of numerical modeling of cell differentiation and tissue development/regeneration
- Demonstrate automated control of clinically relevant cell product production
- Expansion of knowledge representation for organ level system modeling
- Expansion of protocol inference to organ system development
- Expand knowledge based process monitoring for automated real-time control of tissue and organ production
- Begin integration of automated high-throughput cell culture system for closed loop hypothesis inference, experimental validation, knowledge-base update and automated knowledge generation system.
6.0 Future Horizons
There is a reasonably long road to walk from here. Bringing this vision to reality will require the cooperation of a significant community of researchers and developers spanning bioinformatics, AI, software engineering, simulation modeling and biological and medical lab work.
Yet there is nothing in this project that’s miraculous or mysterious — just a step-by-step configuration, combination and application of known algorithmic tools to datasets that laboratories and clinics are now collecting. By bringing together an appropriate variety of tools within an adequately flexible framework and carrying out judicious experimentation, the science-fictional dreams of using AI to grow human organs on demand or of restoring youth to aging human body tissues can be systematically realized.
Observers of the high-level progress of the AI field may note that none of what we’ve described here requires anywhere near full-on human-level Artificial General Intelligence — rather, “merely” a careful combination of various narrow-AI techniques.
However, it’s also fascinating to think about what might result if one connected an AI system capable of guiding stem cell differentiation as described here, with an AI possessing basic human-like commonsense reasoning such as SingularityNET is working toward in its close collaboration with its co-founding partner firm Hanson Robotics.
Putting commonsense understanding together with this sort of specialized intelligence could well result in radical scientific creativity of a type never seen before: not merely creation of new or newly personalized stem cell differentiation protocols, but invention of whole new types of therapy, based on the understanding of the specifics of Regenerative Medicine in the context of the whole human organism and its embedding in a larger matrix including physics, chemistry and everyday human social life.
A mature SingularityNET will provide an excellent platform for experimenting with these broader integrations, as well as the narrower sorts of application-specific integrations we have elaborated here.
The relevance of the decentralized nature of the SingularityNET platform in this context is worthy of mention. Our work on Regenerative Medicine is driven by scientific and humanitarian rather than political goals, but one can never fully separate these domains of life.
The ability to restore aging bodies to youth will be a powerful gift, and it is important that society bestows it on everyone who wants it, rather than just an elite who can afford it. These technologies must be implemented in a way that naturally drives them toward wide dissemination and aggressive cost reduction.
The analysis of human stem cell data from human populations is something that can provide the bearer of the analytic results with considerable practical value in various regards. It is desirable that this value is held to a significant degree by the contributors of the data, and overall by humanity at large, rather than being stored in data silos and cordoned off for exploitation by a handful of governments or pharmaceutical firms.
This drive towards democratization is one of the main drivers behind the recent flourishing of blockchain-based medical data projects (such as SingularityNET’s partner project Shivom), and the performance of the AI-based regenerative medicine research described here within a decentralized framework constitutes one more step towards the democratization of medicine.
We believe in the application of AI for the good of all sentient beings, and in a participatory and adaptive way — and these general principles, if they are to be more than empty verbiage, need to be manifested in the implementation and rollout of practical AI like the regenerative medicine applications described here.
[i] Looks, Moshe, Ben Goertzel, Lucio de Souza Coelho, Mauricio Mudado, and Cassio Pennachin, “Clustering Gene Expression Data via Mining Ensembles of Classification Rules Evolved Using MOSES”, Genetic and Evolutionary Computation Conference (GECCO), 2007.
[ii] Looks, Moshe, Ben Goertzel, Lucio de Souza Coelho, Mauricio Mudado, and Cassio Pennachin, “Understanding Microarray Data through Applying Competent Program Evolution”, Genetic and Evolutionary Computation Conference (GECCO), 2007.
[iii] Ben Goertzel, Nil Geisweiller, Eddie Monroe, Mike Duncan, Selamawit Yilma, Meseret Dastaw, Misgana Bayetta, Amen Belayneh, Matthew Iklé, Gino Yu, “Speculative Scientific Inference via Synergetic Combination of Probabilistic Logic and Evolutionary Pattern Recognition”, Proceedings of the 8th International Conference on Artificial General Intelligence, July 22–25, 2015.
[iv] Si-Tayeb K, Noto FK, et al. Highly Efficient Generation of Human Hepatocyte–like Cells from Induced Pluripotent Stem Cells. Hepatology. 2010 January ; 51(1): 297–305. doi:10.1002/hep.23354.
[v] Takeda Y, Harada Y, et al. Direct conversion of human fibroblasts to brown adipocytes by small chemical compounds. Scientific Reports; 7: 4304. doi:10.1038/s41598-017-04665-x
[vi] Daniel MG, Lemischka IR, Moore K. Converting cell fates: generating hematopoietic stem cells de novo via transcription factor reprogramming. Ann N Y Acad Sci. 2016 April ; 1370(1): 24–35. doi:10.1111/nyas.12989.
[vii] Looks M. Competent program evolution. PhD thesis, Washington University, St. Louis, MO, 2006.
[viii] Goertzel B, Ikle M, et al. Probabilistic Logic Networks: A Comprehensive Framework for Uncertain Inference. Springer Publishing Co, Inc. 2008. ISBN: 0387768718, 9780387768717.
[ix] See reference iii above