How to Manage Access to BigQuery Analytics Hub sharing Authorized Views

Samet Karadag
Google Cloud - Community
6 min readOct 4, 2023

--

In this blog post, we will explore how to effectively manage access to your Analytics Hub and authorized views in Google BigQuery. We will use a sample project structure and authorization flow to demonstrate best practices. Let’s dive in.

Project Structure

Imagine an organization, “Organization A,” with the following project structure:

  1. Ingestion Project: Responsible for ingesting data into the staging environment.
  2. Curated Project: Curates data from the ingestion project to a curation zone.
  3. DataExchange Project: Builds authorized views on top of curated tables for controlled data access.

We will discuss 2 different ways to restrict access. 1st with Session_user and authorized views, 2nd with BQ row level and column level access policies

OPTION 1 Authorization Flow with SESSION_USER()

In our example, we’ll use SESSION_USER() for Fine-Grained Access Control (FGAC). We want to create an authorized view that restricts data based on the user's identity. Here's a simplified example:

-- Sample Authorized View: vauthorizations
CREATE OR REPLACE VIEW fgac.vauthorizations AS
SELECT user_id, company, origin, columna_excl_, columnb_excl, columna_incl, columnb_incl
FROM `ec-mvp.fgac.authorizations`
LEFT JOIN UNNEST(data_provider) AS origin
LEFT JOIN UNNEST(columna_excl) AS columna_excl_
LEFT JOIN UNNEST(bo_country_excl) AS columnb_excl
LEFT JOIN UNNEST(ISIN_country_incl) AS columna_incl
LEFT JOIN UNNEST(bo_country_incl) AS columnb_incl
WHERE user_id = SESSION_USER() AND CURRENT_DATE() BETWEEN valid_from AND valid_to;

In this view, we’re using SESSION_USER() to filter data based on the user's identity and other conditions.

Next, we create an authorized view that utilizes the previously defined view:

-- Sample Authorized View: DataShare
CREATE OR REPLACE VIEW `data-exchange.dataset.DataShare` AS
SELECT t.* FROM `curated.curated_table` t
JOIN `fgac.vauthorizations` a
ON t.origin = a.origin
AND IFNULL(columna = a.columna_incl, TRUE)
AND IFNULL(columna != a.columna_excl , TRUE)
AND IFNULL(columnb = a.columnb_incl, TRUE)
AND IFNULL(columnb != a.columnb_excl, TRUE)
WHERE a.user_id = SESSION_USER();

In this view, we join the curated data with the authorization rules defined in vauthorizations, ensuring that users can only access data authorized for their identity.

Required IAM Grants and Security Layers

Now, let’s discuss the required IAM (Identity and Access Management) grants and security layers to ensure a secure and controlled environment:

  1. Project-Level IAM Permissions: Ensure that the appropriate individuals or groups have the necessary IAM roles at the project level. This includes roles like BigQuery Data Editor and BigQuery User to allow users to access and modify datasets.
  2. Dataset-Level Permissions: For each dataset, grant access to users or groups based on their roles and responsibilities. Use the BigQuery Data Viewer role for read-only access and BigQuery Data Editor for those who need to make changes.
  3. Authorized Views: Control access to authorized views like vauthorizations and DataShare by managing the permissions at the view level. Grant access only to the users or groups who should be able to query these views.
  4. Fine-Grained Access Control: Leverage SESSION_USER() and other conditions within your authorized views to enforce FGAC. This ensures that users only see data relevant to their identity.

Process Summary

  1. Define project structure with clear roles and responsibilities.
  2. Create authorized views like vauthorizations using SESSION_USER() for FGAC.
  3. Build authorized views like DataShare that rely on the previous view to enforce access control.
  4. Assign IAM roles at the project and dataset levels for data access.
  5. Secure authorized views by managing permissions at the view level.

By following these steps and leveraging SESSION_USER() for FGAC, you can effectively manage access to your Analytics Hub and ensure data security and compliance within your BigQuery environment.

Analytics Hub

You need to give the subscriber role to the subscriber users from other organizations or projects. So the subscriber can add the shared dataset into their GCP project. After that whoever has BigQuery user and viewer roles within the subscriber project can query the data.

Security Layers

1) IAM Authorizations:

  • The Security Admin assigns IAM roles and permissions at the project, dataset, and table levels. They ensure that engineers, analysts, and administrators have appropriate access.

2) Authorized View Permissions:

  • The DataHub Owner / Data Owner manages permissions on authorized views within the ingestion and curated schemas, ensuring that only authorized data_exchange views can query data.

3) FGAC Session_User() and Authorization Tables:

  • The Data Provider Security Admin is responsible for defining and maintaining FGAC rules for their data. They interact with GCS ACLs and authorization table to control fine-grained access.

4) Data Sharing Hub Subscription Permission:

  • The DataHub Owner grants subscription permissions to users in investor projects, allowing them to subscribe to shared datasets. This can be the same persona as the EC Security Admin.

5) IAM Controls on Subscriber Projects:

  • The Subscriber Security Admin manages IAM controls on subscriber projects, granting access to datasets and tables for users/groups in their organization.

6) Looker Authorizations:

  • Looker administrators define and manage user access to dashboards and reports. This is typically handled by Looker admins.

7) GCS ACLs (Google Cloud Storage Access Control Lists):

  • GCS ACLs are managed by data owners or administrators who control access to GCS buckets where data is ingested.

8) BigQuery/DataCatalog Policies for Row and Column-Based Access:

  • Data Owners define policies for fine-grained access control at the BigQuery and DataCatalog levels. This includes specifying which users or groups can access specific rows or columns.

9) BigQuery Dynamic Data Masking:

  • Data Owners may specify which columns should be dynamically masked for specific users or groups. This can be achieved through BigQuery’s dynamic data masking features.

Access to CMEK (Customer Managed Encryption Keys):

  • Data Provider Security Admins may also be responsible for managing access to CMEK, ensuring that encryption keys are properly controlled and used for data protection.

Security Personas and Their Interactions:

  1. Data Provider Security Admin: Responsible for defining FGAC rules, managing CMEK access, and specifying dynamic data masking rules.
  2. Data Platform Owner Security Admin: Manages IAM authorizations at the project level and defines BigQuery/DataCatalog policies.
  3. DataHub Owner: Manages permissions for subscriptions, authorized views, and oversees access control.
  4. Subscriber Security Admin: Controls access to shared data within their organization by managing IAM controls on subscriber projects.
  5. Looker Admin: Manages user authorizations within Looker for accessing dashboards.
  6. Data Owners/Administrators: Control Bigquery dataset/table/row/column level access and GCS ACLs for data stored in Google Cloud Storage.

Some possible errors and causes:

Error message: Access Denied: Table curated.table: User does not have permission to query table curated.table, or perhaps it does not exist in location x.

OPTION 2 Restricting Access With Row Level Policies

Row access policies are defined within bigquery.

Example SQL to create row access policy:

create row access policy fgac on dataset.tablename
grant to (‘group:consumer1-grp@samet.joonix.net’)
filter using (columnname like ‘somevalue%’ and column2 = ‘somevalue’ and (column3 IN (‘value’,’value2')))
(Here you can find more information)

Row level acccess patterns can be managed at group level for simplicity, then analytics hub subscribers should be added to the group and they can query the rows that they are allowed to see.

Column level policies are defined within the Bigquery console and it uses Data Catalog underneath.

In this case if a user/group that needs access to restricted/masked columns that should be granted Fine-Grained Reader/Masked Reader roles. Otherwise they will get an authorization error.

Similar to row policies, better is to manage consumption patterns via groups and assigning subscribers to the corresponding groups based on access needs/permissions.

--

--