Leverage Cloud-Based Data Management and Unlock Data Science by Integrating Cellario Whole Lab Automation Software and TetraScience Tetra Data Platform
HighRes Biosolutions designs and builds innovative laboratory automation systems, dynamic scheduling software, and lab automation instruments. Our Cellario whole lab automation software schedules instruments and automated workflows across the lab and works especially well with our Nucleus automation infrastructure systems. Lab automation systems and software generate massive volumes of research and development data that can accelerate therapeutic discovery.
TetraScience accelerates scientific discovery and development through cloud-based software. Its vendor-neutral, open Tetra Data Platform collects, centralizes, and harmonizes scientific data.
Streamline the collection of data
Integrating Cellario with the Tetra Data Platform automates and streamlines the flow of data generated by a Cellario-powered Nucleus automation infrastructure into downstream data science applications and other informatics applications, such as electronic lab notebooks (ELN) and laboratory information management systems (LIMS).
Cellario, with its coupled application programming interface (API) layers, is designed to support a wide range of upstream and downstream data integration requirements. Cellario’s representational state transfer (RESTful) API can be used to integrate with virtually any other software platform.
In this integration, TetraScience uses Cellario’s publisher/subscriber (pub/sub) event APIs to receive data events. In addition to the standard data events that every reader creates, end users can customize the data stream by writing scripts that create their own data events.
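To illustrate the publisher/subscriber pattern described above, here is a minimal in-process sketch. It is not Cellario’s event API: the `EventBus` and `DataEvent` classes, the event type names, and the payload fields are all hypothetical, chosen only to show how a subscriber receives both standard reader events and custom script-generated events.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class DataEvent:
    # Hypothetical event shape; not Cellario's actual event schema.
    event_type: str
    source: str
    payload: Dict[str, Any] = field(default_factory=dict)


Handler = Callable[[DataEvent], None]


class EventBus:
    """Minimal in-process publisher/subscriber broker."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        # Register a handler for one event type.
        self._subscribers[event_type].append(handler)

    def publish(self, event: DataEvent) -> int:
        # Deliver the event to every handler for its type; return how many ran.
        handlers = self._subscribers.get(event.event_type, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


bus = EventBus()
received: List[DataEvent] = []
# A downstream consumer (standing in for a data platform) subscribes to two types.
bus.subscribe("reader.file_ready", received.append)
bus.subscribe("script.custom", received.append)

# A "standard" event emitted when a reader produces a file...
bus.publish(DataEvent("reader.file_ready", "plate_reader_1", {"path": "run42.raw"}))
# ...and a custom event created by a user script.
bus.publish(DataEvent("script.custom", "user_script", {"note": "QC passed"}))
```

The design choice worth noting is decoupling: the publisher does not need to know who is listening, so new subscribers can be added without changing the instrument-side code.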
Data is centralized in the Tetra Data Platform and harmonized into Intermediate Data Schema (IDS) JSON, a structured, vendor-neutral format. Once harmonized, the data can be queried directly through a web API or with structured query language (SQL), and it can be further transformed into any format needed.
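As a rough illustration of why a harmonized JSON format makes SQL querying straightforward, the sketch below flattens a few JSON documents into a relational table and runs an ordinary aggregate query. The field names (`plate.barcode`, `measurement.well`, `measurement.value`) are hypothetical and do not reflect the actual IDS schema, and an in-memory SQLite database stands in for the platform’s query engine.

```python
import json
import sqlite3

# Hypothetical harmonized documents; field names are illustrative, not the real IDS schema.
ids_documents = [
    '{"plate": {"barcode": "P001"}, "measurement": {"well": "A1", "value": 0.52}}',
    '{"plate": {"barcode": "P001"}, "measurement": {"well": "A2", "value": 0.61}}',
    '{"plate": {"barcode": "P002"}, "measurement": {"well": "A1", "value": 0.47}}',
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (barcode TEXT, well TEXT, value REAL)")

# Flatten each JSON document into one relational row.
for raw in ids_documents:
    doc = json.loads(raw)
    conn.execute(
        "INSERT INTO measurements VALUES (?, ?, ?)",
        (doc["plate"]["barcode"], doc["measurement"]["well"], doc["measurement"]["value"]),
    )

# Standard SQL over the harmonized data: average value per plate.
rows = conn.execute(
    "SELECT barcode, AVG(value) FROM measurements GROUP BY barcode ORDER BY barcode"
).fetchall()
```

Because every document shares the same structure, the flattening step is a single loop; with vendor-specific files, each format would need its own parser before any query could run.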
Cellario-produced data is now accessible to your favorite data science tools. It can also be combined with other research and development data that customers store in the TetraScience platform for further analysis.
Integration Step 1: Configuration
TetraScience and HighRes Biosolutions developed the connector together as part of the integration. Simply configure the connection to the Cellario software in the TetraScience platform web interface. There’s no need to invest in a multi-month customization project, write code from scratch, or spend additional effort and money maintaining the connection.
In future releases, the TetraScience platform will include features such as filtering by event or data type and selecting data within a specified timeframe. These features are based on use cases from life science companies within the TetraScience Network. Continuous innovations like these are regularly made available to customers.
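The planned filtering features boil down to two conditions: an event’s type must be in an allowed set, and its timestamp must fall inside a time window. The following sketch shows that logic in isolation; it is an assumption about the general idea, not the platform’s implementation, and the `Event` class and event type names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Event:
    # Hypothetical event record: a type label and when it occurred.
    event_type: str
    timestamp: datetime


def select_events(
    events: Iterable[Event],
    types: Optional[Set[str]] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> List[Event]:
    """Return events whose type is in `types` (if given) and whose timestamp
    lies in the half-open window [start, end); either bound may be omitted."""
    selected = []
    for event in events:
        if types is not None and event.event_type not in types:
            continue
        if start is not None and event.timestamp < start:
            continue
        if end is not None and event.timestamp >= end:
            continue
        selected.append(event)
    return selected


events = [
    Event("plate_read", datetime(2020, 7, 1, 9, 0)),
    Event("liquid_transfer", datetime(2020, 7, 1, 10, 0)),
    Event("plate_read", datetime(2020, 7, 2, 9, 0)),
]
# Keep only plate reads that happened on July 1, 2020.
july_first_reads = select_events(
    events,
    types={"plate_read"},
    start=datetime(2020, 7, 1),
    end=datetime(2020, 7, 2),
)
```

Using a half-open window avoids counting an event twice when consecutive windows share a boundary.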
Integration Step 2: Collect RAW Files and Attach Metadata
RAW files contain complete, uncompressed image information. The Cellario connector automatically extracts these files and uploads them into the TetraScience data lake.
A data lake is a centralized repository that stores data in its native format for future processing and use. Unlike a data warehouse, which stores only structured data (such as easily searchable text and numbers), a data lake can store both structured data and unstructured data (such as video, audio, and other rich media that are difficult to search).
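The key property of a data lake, storing files byte-for-byte in their native format, can be shown with a toy local stand-in. This sketch is purely illustrative: the `LocalDataLake` class and its content-hash naming scheme are assumptions made for the example, not how the TetraScience data lake is built.

```python
import hashlib
import tempfile
from pathlib import Path


class LocalDataLake:
    """Toy stand-in for a data lake: stores bytes unchanged under a key."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, data: bytes, suffix: str) -> str:
        # Name the object by its SHA-256 content hash plus the original extension,
        # so identical files map to the same key. No parsing or conversion happens.
        key = hashlib.sha256(data).hexdigest() + suffix
        (self.root / key).write_bytes(data)
        return key

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


lake = LocalDataLake(Path(tempfile.mkdtemp()) / "lake")
raw_key = lake.put(b"\x00\x01binary-image-bytes", ".raw")  # unstructured (binary image data)
csv_key = lake.put(b"well,value\nA1,0.52\n", ".csv")       # structured (tabular text)
```

Both kinds of data go in through the same call, which is the contrast with a warehouse: structure is imposed later, when the data is parsed, rather than at write time.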
Powerful context to integrate data
The connector also collects important metadata and attaches it to the files. Metadata and tags are customizable and often include information about the order, request, microplate, Cellario protocol, and other relevant details. Tagging files with metadata provides powerful context for integrating data with ELN and LIMS and for performing advanced data science and analytics.
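One common way to attach metadata to a file is a JSON "sidecar" record that travels with it. The sketch below builds such a record and rejects it if required tags are missing. The tag names (`order_id`, `request_id`, `plate_barcode`, `protocol`) are hypothetical placeholders loosely mirroring the order, request, microplate, and protocol information mentioned above; they are not the connector’s actual field names.

```python
import json
from typing import Dict

# Hypothetical required tags, loosely mirroring order, request, microplate, and protocol.
REQUIRED_TAGS = ("order_id", "request_id", "plate_barcode", "protocol")


def build_metadata(file_name: str, tags: Dict[str, str]) -> str:
    """Return a JSON sidecar record for `file_name`, failing if required tags are missing."""
    missing = [tag for tag in REQUIRED_TAGS if tag not in tags]
    if missing:
        raise ValueError(f"missing metadata tags: {missing}")
    record = {"file": file_name, "tags": tags}
    # sort_keys makes the output stable, which simplifies diffing and deduplication.
    return json.dumps(record, sort_keys=True)


sidecar = build_metadata(
    "run42.raw",
    {
        "order_id": "ORD-001",
        "request_id": "REQ-17",
        "plate_barcode": "P001",
        "protocol": "ELISA_v2",
        "operator": "jdoe",  # extra, customer-defined tags are allowed
    },
)
```

Validating required tags at collection time is what makes the data findable later: a file without a plate barcode cannot be joined to LIMS records, however well it is stored.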
Sufficient context is foundational to the FAIR data principles. These four principles, as detailed by the GO FAIR initiative, call for data that is findable, accessible, interoperable, and reusable.
Integration Step 3: Data Engineering and Data Science in the Cloud
After automatic data collection, a data pipeline parses the data into Tetra Data, a universally adoptable data model that is liquid, actionable, harmonized, and adherent to FAIR principles. The harmonized data now resides in the cloud-native data lake.
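To make the parsing step concrete, the sketch below converts a small, vendor-style text export into a structured JSON document. Both the raw export layout and the output schema (including the `example-plate-reader/v1` label) are invented for illustration; a real pipeline would target the actual Tetra Data schema for each instrument.

```python
import csv
import io
import json

# Hypothetical raw export: a metadata line, a header line, then one row per well.
RAW_EXPORT = """Plate,P001
Well,Absorbance
A1,0.52
A2,0.61
"""


def parse_reader_export(text: str) -> dict:
    """Parse the hypothetical export above into a harmonized, JSON-ready dict."""
    rows = list(csv.reader(io.StringIO(text)))
    barcode = rows[0][1]
    # rows[1] is the column header; every following non-empty row is a measurement.
    results = [{"well": row[0], "value": float(row[1])} for row in rows[2:] if row]
    return {
        "schema": "example-plate-reader/v1",  # hypothetical schema identifier
        "plate": {"barcode": barcode},
        "results": results,
    }


harmonized = json.dumps(parse_reader_export(RAW_EXPORT), indent=2)
```

Converting values to numbers and giving fields explicit names during parsing is what later lets tools consume the data without knowing anything about the original instrument format.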
Two immediate benefits of this data engineering are:
- Data is searchable, so scientists and data scientists can find it.
- Data is accessible and stored in a common format such as JavaScript Object Notation (JSON). This means it can be imported into a myriad of data science tools to uncover actionable insights.
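Because the harmonized data is plain JSON, even the Python standard library can consume it directly. The sketch below loads a hypothetical harmonized document and computes a mean and a coefficient of variation (CV) across wells; the document structure is an assumption carried over for illustration, not the actual Tetra Data layout.

```python
import json
import statistics

# Hypothetical harmonized document; structure is illustrative only.
harmonized_json = (
    '{"plate": {"barcode": "P001"}, "results": ['
    '{"well": "A1", "value": 0.50}, '
    '{"well": "A2", "value": 0.60}, '
    '{"well": "A3", "value": 0.70}]}'
)

doc = json.loads(harmonized_json)
values = [result["value"] for result in doc["results"]]

mean = statistics.mean(values)
# Coefficient of variation in percent, using the sample standard deviation.
cv_percent = 100 * statistics.stdev(values) / mean
```

The same document could just as easily be loaded into a dataframe library or a notebook; the point is that no instrument-specific parsing is needed at analysis time.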
This process makes data and workflows accessible and scalable within a secure, cloud-native environment.
Integration Step 4: Inline Cloud-Based Data Analysis and Design of Experiment
This integration step closes the loop between lab automation and design of experiment (DOE) software. APIs are available to place sample orders and to manage automation systems such as Nucleus, as well as any devices controlled by Cellario.
On top of this, you can introduce cloud-based inline analysis and calculation using interactive data science tools such as Jupyter Notebook from the non-profit, open-source Project Jupyter. The automation system then benefits from inline cloud computation.
You can also feed historical data sets into the model, which then helps determine the next point to explore in the parameter space.
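The closed loop described in this step (historical data informs a model, the model proposes the next experiment, the result is added to the history) can be sketched in a few lines. This is a deliberately simple heuristic, an inverse-distance-weighted estimate plus an exploration bonus, chosen only to show the loop’s structure; it is not the algorithm of any particular DOE package. The `simulated_assay` function is a hypothetical stand-in for placing a sample order and collecting its result.

```python
from typing import Callable, List, Sequence, Tuple

Observation = Tuple[float, float]  # (parameter value, measured response)


def idw_predict(x: float, history: Sequence[Observation], power: float = 2.0) -> float:
    """Inverse-distance-weighted estimate of the response at x."""
    num = den = 0.0
    for xi, yi in history:
        d = abs(x - xi)
        if d == 0:
            return yi  # exact match: reuse the observed value
        w = 1.0 / d ** power
        num += w * yi
        den += w
    return num / den


def suggest_next(history: Sequence[Observation], candidates: Sequence[float],
                 kappa: float = 0.5) -> float:
    """Pick the unsampled candidate with the best predicted response plus a
    bonus proportional to its distance from the nearest existing sample."""
    sampled = {xi for xi, _ in history}

    def score(x: float) -> float:
        nearest = min(abs(x - xi) for xi, _ in history)
        return idw_predict(x, history) + kappa * nearest

    unsampled = [c for c in candidates if c not in sampled]
    return max(unsampled, key=score)


def run_closed_loop(experiment: Callable[[float], float],
                    history: Sequence[Observation],
                    candidates: Sequence[float], rounds: int) -> List[Observation]:
    results = list(history)
    for _ in range(rounds):
        x = suggest_next(results, candidates)
        # Stand-in for placing a sample order and collecting the measured result.
        results.append((x, experiment(x)))
    return results


def simulated_assay(x: float) -> float:
    # Hypothetical response surface with its optimum at x = 6.
    return -(x - 6.0) ** 2


historical = [(0.0, -36.0), (10.0, -16.0)]
candidates = [float(c) for c in range(11)]
campaign = run_closed_loop(simulated_assay, historical, candidates, rounds=3)
```

The `kappa` parameter trades off exploiting regions that already look good against exploring regions with no nearby data; a production DOE tool would typically use a statistical model with explicit uncertainty instead.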
Summary
TetraScience and HighRes Biosolutions offer a best-in-class workflow, control, and orchestration solution that empowers scientists to make life science discoveries faster than ever before.
The manual effort associated with data integration, along with the risk of human error, is virtually eliminated, supporting high data integrity and consistency. Leveraging an enterprise-grade, cloud-native platform enables actionable insights throughout the entire discovery and development process.
Revision: BL-DIG-200730-01_RevC