It is no secret that there is an abundance of data tools today to help companies derive value from their data and use it more effectively throughout their business. Data teams have adopted these tools in an attempt to empower business and product teams to explore the data themselves and use it in their decision-making process. However, the most time-consuming and critical step in extracting value from data is structuring it in such a way that it can be used by downstream applications and teams.

This step is so time-consuming because data is collected by many different sources across a business. Studies suggest organisations with 100 to 499 employees use an average of 47 SaaS applications, while those with more than 1000 employees use more than 140. It is then the job of the data/engineering team to understand:

  • What data is collected by each of these various sources?
  • How is this data structured and how do I define the relationships between these various sources?
  • How should I structure this data so that it can be used effectively by downstream applications and teams?
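
To make these three questions concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the two sources (a CRM and a billing tool), the field names, and the helper functions are hypothetical and are not taken from any Cerebrium package. In a dbt package this logic would normally be written as SQL models; Python is used here only to keep the example self-contained.

```python
# Hypothetical records from two SaaS sources.
# All source and field names below are invented for illustration.
crm_contacts = [
    {"Email": "Ana@Example.com", "company": "Acme", "owner": "sam"},
    {"Email": "bo@example.com", "company": "Globex", "owner": "kim"},
]
billing_invoices = [
    {"customer_email": "ana@example.com", "amount_cents": 12000},
    {"customer_email": "ana@example.com", "amount_cents": 8000},
    {"customer_email": "cy@example.com", "amount_cents": 5000},
]


def normalise_email(value):
    """Lower-case and trim an email so both sources share one join key."""
    return value.strip().lower()


def build_customer_table(contacts, invoices):
    """Join contacts to invoices on email and total the revenue per contact."""
    # Aggregate invoice amounts per normalised email address.
    revenue_by_email = {}
    for invoice in invoices:
        key = normalise_email(invoice["customer_email"])
        revenue_by_email[key] = revenue_by_email.get(key, 0) + invoice["amount_cents"]

    # Emit one downstream-friendly row per CRM contact.
    rows = []
    for contact in contacts:
        key = normalise_email(contact["Email"])
        rows.append({
            "email": key,
            "company": contact["company"],
            # Contacts with no invoices get zero revenue rather than being dropped.
            "total_revenue_cents": revenue_by_email.get(key, 0),
        })
    return rows


customer_table = build_customer_table(crm_contacts, billing_invoices)
for row in customer_table:
    print(row)
```

Even in this toy case, each question shows up: the two sources name and format the same field differently, the relationship between them has to be chosen (here, a normalised email), and the output is shaped for whoever consumes it next.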

In addition to the above, driving the effective use of your data requires domain knowledge specific to the various departments as well as familiarity with the tooling used by each team, both of which most data professionals lack. This is why we believe there needs to be a collective effort from the data community to solve this issue. Therefore, we at Cerebrium are open-sourcing our dbt packages for common Airbyte connectors to encourage innovation in analytics engineering through collaboration from the data community.

By open-sourcing our dbt packages we hope to achieve four things:

  1. Decreased time to value

    dbt packages allow companies to get access to structured data almost instantaneously, bypassing the three-step process we outlined above. They also provide a starting point that companies can build on, adding their own business rules so that teams can use the data downstream.

  2. Reduced errors

    Issues in data can be attributed to many factors, but two of the main ones are misunderstood business logic and incorrect SQL queries. Creating a community around analytics engineering empowers engineers to help resolve errors in both code and business logic, which benefits the community at large.

  3. Data enrichment across data sources

    In most cases, data is more valuable in abundance. With the rise in the number of SaaS tools in a business, there are multiple sources of data that can help increase confidence in making a certain decision. By following an open-source approach we hope to see dbt packages created around combining multiple sources of data.

  4. Increased awareness around data-driven solutions to common business problems

    With the community's input on the structuring of data, many companies will be able to speak to their thought process, the structure that downstream tools need, and the problem they are solving by following this approach. We hope this will make businesses more aware of the ways in which they can use data to effect change in their business.

So, what happens next? We have open-sourced our packages here and are releasing updates weekly. We have also started a Slack community where we discuss all things analytics engineering and have created a form for you to suggest the next data sources we should release as a dbt package. In order to contribute, let us know which data source you would like to add and we will create the repository which you can fork.

At Cerebrium, our vision has always been to make data analytics and machine learning more accessible. We believe that creating a community-driven approach to analytics engineering will lead to a new standard in data structures. We hope that data scientists and machine learning engineers can take advantage of this and create new and innovative solutions for businesses. Don't hesitate to experiment and share your projects. We'd love to see what you do!


Author

Michael Louis

Michael Louis
