Skip to main content

Connecting EHR to BigQuery for Healthcare Data Analytics - Low Code Approach

At Trillo, we always lean towards a declarative, serverless design. This yields a low-code application. The net result is, in 300 lines of code, we can connect an on-premise Electronic Health Record (EHR)/ Electronic Medical Record (EMR) systems to BigQuery or Cloud SQL and build a data warehouse. Next, by connecting it to DataStudio using custom connectors, you can build an end-to-end data analytics application in less than a week. The custom DataStudio connectors provide security and access control over the data. Your solution is guaranteed to be HIPAA compliant. You can bypass several cloud technologies and achieve your business goals in a short time and at a low cost.

End-to-End Pipeline - Integrated Experience

The following is a brief description of the end-to-end pipeline. Trillo Workbench provides integrated experience from data ingestion, modeling, scripting (serverless functions), orchestration, scheduling, audit logs, alerts, etc.

Data Ingestion

EHR/EMR data is ingested by bulk API, connecting to the database over a secure link (VPN) or SFTP. Trillo Workbench handles each form of data ingestion transparently.

Soring Data in BigQuery or CloudSQL

Once the data is ingested, it can be processed by Trillo Workbench in bulk or one record at a time. It then batch writes data either to BigQuery or CloudSQL. Trillo Workbench handles several tasks transparently and using directives provided by the user using a UI. These include:

  1. Creation of Database Schemas in CloudSQL and BigQuery (optional for BigQuery).
  2. Processing rows and executing an ETL logic.
  3. Efficiently bulk writes of data.
  4. De-duplication of rows (required in BigQuery to avoid DML operations)

Data Transformation (ETL)

Trillo Workbench supports the following techniques for the ETL.

  1. Use declarative mapping (specified using UI).
  2. Execute script or code per row.
  3. Execute SQL statements (including procedure) to create views and materialized views.

Orchestration

As seen above, the entire process consists of multiple steps. Trillo Workbench handles orchestration of steps including recovery, throttling, scheduling, audit logs, alerts, etc.

Monitoring

Trillo Workbench manages scheduled workflows, records execution so you can monitor and troubleshoot. If a workflow fails, it can send alerts.

Audit Logs

Trillo Workbench logs activities of the workflow and each step of processing (this is a summarized view and in addition to Google Stackdriver logs).

Visualization

Google’s DataStudio can be used for visualization. Or, any other tool such as Looker, Tableau can be connected to the data warehouse or materialized views (data marts). Trillo provides custom data connectors for BigQuery, CloudSQL to handle security, access control, and implementation of complex queries on the server as an API call.

Conclusion

Trillo Workbench provides an integrated experience of building data warehouses and data analytics applications on the Google Cloud. We chose EHR/EMR (healthcare domain) to demonstrate the security and compliance (HIPAA) aspects of the platform and the Google Cloud. But it can be used generically to build a data pipeline for any other domain.

trillo

Author trillo

More posts by trillo