Integration between HighRes Biosolutions’ Cellario lab automation software and the TetraScience Platform provides cloud-based data management and unlocks data science
Authors:
Kai Wang, Delivery Team Lead, TetraScience
Spin Wang, CEO and Co-Founder, TetraScience
Ira Hoffman, CEO, HighRes Biosolutions
Integrating the HighRes Biosolutions Cellario lab automation system with the TetraScience Data Integration Platform to optimize data management and data flows in the cloud
HighRes Biosolutions designs and builds innovative laboratory automation systems, dynamic scheduling software, and lab automation instruments. Cellario, the industry’s state-of-the-art lab automation software, enables instrument and robotics scheduling in the lab. Lab automation systems and software generate massive volumes of R&D data that can accelerate therapeutic discovery.
The integration between HighRes Biosolutions Cellario software and TetraScience will automate and streamline the collection of R&D data generated from Cellario into downstream data science applications and other informatics applications such as ELN and LIMS.
Cellario, with its coupled API layers, is designed to support a wide range of upstream and downstream data integration requirements. Cellario’s RESTful API can be used to integrate with any other software platform. In this case, TetraScience is leveraging Cellario’s publisher/subscriber event APIs to receive data events. In addition to standard data events that every reader creates, end users can easily customize the data stream by using scripts to create data events.
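The publisher/subscriber pattern described above can be illustrated with a minimal in-process sketch. It is not Cellario's actual API: the `EventBus` class, the event type names, and the event fields are all hypothetical and exist only to show how a subscriber receives the event types it registered for, whether they are standard reader events or script-generated custom ones.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, object]
Handler = Callable[[Event], None]


class EventBus:
    """Tiny in-process stand-in for a publisher/subscriber event API (hypothetical)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        """Register a handler for one event type."""
        self._handlers[event_type].append(handler)

    def publish(self, event: Event) -> int:
        """Deliver the event to every subscriber of its type; return how many received it."""
        handlers = self._handlers.get(str(event["type"]), [])
        for handler in handlers:
            handler(event)
        return len(handlers)


received: List[Event] = []
bus = EventBus()

# The subscriber only asks for plate-read events (event name is an assumption).
bus.subscribe("plate_read_complete", received.append)

# A standard reader event, then a script-generated custom event nobody subscribed to.
bus.publish({"type": "plate_read_complete", "plate": "P-001", "file": "P-001.csv"})
bus.publish({"type": "custom_qc_flag", "plate": "P-001", "flag": "low_signal"})
```

In this sketch only the first event reaches the subscriber. A real integration would receive events over the network rather than in-process, but the subscription logic follows the same shape.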
Data is centralized in the TetraScience Data Integration Platform and harmonized into the Intermediate Data Schema (IDS)-JSON, which is a structured, vendor-neutral format. Once R&D data is harmonized, it can be queried directly through a web API or SQL, and can be further transformed into any format needed. Cellario-produced data is now accessible to your favorite data science tools. It can also be combined with other R&D data that customers store in the TetraScience platform for further analysis.
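To make "queryable in SQL" concrete, here is a small sketch that loads harmonized JSON documents into an in-memory SQLite table and aggregates across plates. The document shape (`plate`, `instrument`, `wells`) is a simplified assumption, not the real IDS schema, and SQLite stands in for whatever query engine the platform exposes.

```python
import sqlite3

# Hypothetical, simplified harmonized documents; the real IDS schema is not shown here.
documents = [
    {
        "plate": "P-001",
        "instrument": "reader-1",
        "wells": [{"well": "A1", "value": 0.12}, {"well": "A2", "value": 0.98}],
    },
    {
        "plate": "P-002",
        "instrument": "reader-2",
        "wells": [{"well": "A1", "value": 0.45}],
    },
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (plate TEXT, instrument TEXT, well TEXT, value REAL)"
)

# Flatten each nested document into one row per well.
for doc in documents:
    for well in doc["wells"]:
        conn.execute(
            "INSERT INTO readings VALUES (?, ?, ?, ?)",
            (doc["plate"], doc["instrument"], well["well"], well["value"]),
        )

# Because every document shares one structure, a single query spans all instruments.
rows = conn.execute(
    "SELECT plate, COUNT(*), AVG(value) FROM readings GROUP BY plate ORDER BY plate"
).fetchall()
```

The point of the sketch is that a common, vendor-neutral structure lets a single query cover data from different instruments without per-vendor parsing at query time.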
How the integration works
Step 1: Configuration
Our connector was developed in collaboration with HighRes Biosolutions as part of our integration. You simply configure the connection to the Cellario software in the TetraScience platform web interface. There is no need to kick off a multi-month customization project, write code from scratch, and then spend even more effort and money maintaining the connection [1].
The product roadmap for this integration includes more features in future releases, such as filtering by event or data type and selecting data within a specified time frame. These features are based on use cases crowdsourced from life sciences companies within the TetraScience Network. Continuous platform innovations, such as new features and capabilities, are made available to customers regularly.
IMAGE: Configuration of HighRes Biosolutions Cellario software to the TetraScience Data Integration Platform
Step 2: Collect RAW Files and Attach Metadata
The first step after configuration is collecting the RAW files. The files are automatically extracted by our Cellario connector and uploaded into the TetraScience data lake. The connector also collects and attaches important metadata to the files. Metadata and tags are customizable, and often include information about the order, request, plate, Cellario protocol, and other relevant context. Tagged metadata provides the context needed to integrate data with ELN/LIMS and to perform advanced data science and analytics. Providing sufficient context is one of the foundations of the FAIR data principles.
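As a rough illustration of attaching metadata and tags to a collected file, the sketch below builds a record for one file. The `FileRecord` type, the `attach_context` helper, and the field values are hypothetical; the connector's actual data model is not described in this article.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FileRecord:
    """A collected RAW file plus the context attached to it (hypothetical shape)."""

    name: str
    metadata: Dict[str, str] = field(default_factory=dict)
    tags: List[str] = field(default_factory=list)


def attach_context(name: str, metadata: Dict[str, str], tags: List[str]) -> FileRecord:
    """Drop empty metadata values and de-duplicate tags before attaching them."""
    clean_metadata = {key: value for key, value in metadata.items() if value}
    unique_tags = sorted(set(tags))
    return FileRecord(name=name, metadata=clean_metadata, tags=unique_tags)


record = attach_context(
    "P-001.csv",
    # Keys mirror the kinds of context named above: order, request, plate, protocol.
    {"order": "ORD-42", "request": "", "plate": "P-001", "protocol": "elisa-v2"},
    ["cellario", "plate-reader", "cellario"],
)
```

With this kind of record, a downstream ELN or LIMS integration can look files up by order or plate instead of relying on file names.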
IMAGE: Metadata attached to Cellario extracted files
IMAGE: Select Cellario Metadata & Tags
Step 3: Data Engineering + Data Science in the Cloud
After the automatic data collection process, a data pipeline parses the data into TetraScience IDS-JSON format. The data is now harmonized in the cloud-native data lake. This data engineering has two immediate benefits: 1) your data is queryable, which means scientists and data scientists can find it, and 2) once your data is accessible, queryable, and in a common format like JSON, it can be imported into a wide range of data science tools to discover actionable insights.
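A parsing step of this kind might look like the following sketch, which turns a raw plate reader CSV export into a structured JSON document. The CSV columns and the output fields are assumptions for illustration; real pipelines and the real IDS schema will differ.

```python
import csv
import io
import json
from typing import Dict, List

# Hypothetical raw export from a plate reader.
RAW = "well,absorbance\nA1,0.12\nA2,0.98\nB1,0.45\n"


def parse_plate_csv(raw_text: str, plate_id: str) -> Dict[str, object]:
    """Parse raw CSV text into a structured document with typed values."""
    reader = csv.DictReader(io.StringIO(raw_text))
    wells: List[Dict[str, object]] = [
        {"well": row["well"], "absorbance": float(row["absorbance"])}
        for row in reader
    ]
    return {"plate": plate_id, "measurement": "absorbance", "wells": wells}


doc = parse_plate_csv(RAW, "P-001")

# Serialize to JSON so that any downstream tool can load the same structure.
as_json = json.dumps(doc, sort_keys=True)
```

Converting strings to typed values at parse time is what allows later steps to filter and aggregate the data without re-reading the vendor file.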
IMAGE: Visualization of HighRes Biosolutions lab automation systems usage
IMAGE: Visualization of data produced during a plate reader run
This process enables data and workflows that are accessible and scalable in a secure, cloud-native environment.
Step 4: Inline Cloud-based Data Analysis and DOE
This integration establishes a closed loop between lab automation and Design of Experiment (DOE) software. There are also APIs to place orders and to manage automation systems and Cellario-controlled instruments and devices.
IMAGE: Diagram of Design of Experiment (DOE)
To take it one step further, you can introduce cloud-based in-line analysis and calculation, using interactive data science tools like Jupyter Notebook. The lab automation system now benefits from in-line cloud computation. You can also leverage historical data sets, which contribute to the model and inform the next point to search in the parameter space.
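The closed loop can be sketched as a cycle of "propose a point, run it, record the result, propose again". The example below uses a deliberately simple greedy heuristic (test the untested candidate nearest the best result so far) and a simulated assay in place of a real automation run. It is not a DOE algorithm from either vendor, and every name in it is hypothetical.

```python
from typing import Callable, List, Tuple

Observation = Tuple[float, float]  # (parameter value, measured response)


def propose_next(history: List[Observation], candidates: List[float]) -> float:
    """Pick the untested candidate closest to the best point observed so far."""
    tested = {x for x, _ in history}
    untested = [c for c in candidates if c not in tested]
    if not untested:
        raise ValueError("all candidates have been tested")
    best_x = max(history, key=lambda obs: obs[1])[0]
    # Ties in distance are broken by preferring the smaller parameter value.
    return min(untested, key=lambda c: (abs(c - best_x), c))


def run_closed_loop(
    experiment: Callable[[float], float],
    history: List[Observation],
    candidates: List[float],
    rounds: int,
) -> List[Observation]:
    """Alternate between proposing a point and running it, growing the history."""
    history = list(history)
    for _ in range(rounds):
        x = propose_next(history, candidates)
        history.append((x, experiment(x)))
    return history


def simulated_assay(x: float) -> float:
    """Stand-in for a real automation run; the response peaks at x = 6."""
    return -((x - 6.0) ** 2)


candidates = [float(v) for v in range(0, 11)]
# Historical data seeds the loop, as described above.
seed = [(0.0, simulated_assay(0.0)), (10.0, simulated_assay(10.0))]
result = run_closed_loop(simulated_assay, seed, candidates, rounds=6)
best = max(result, key=lambda obs: obs[1])
```

In practice the proposal step would be a statistical or machine learning model running in a notebook, and the experiment step would be an order placed on the automation system. The loop structure stays the same.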
IMAGE: Diagram of closed-loop experiment design
Summary
The combined TetraScience and HighRes Biosolutions workflow, control, and orchestration solution is designed to help scientists unlock discoveries in life sciences faster. The manual effort associated with data integration, and the human error that comes with it, is virtually eliminated, which supports high data integrity and consistency. Leveraging an enterprise-grade, cloud-native platform enables actionable insights across the entire discovery and development process.