
Samet Karadag

139 Followers


Published in

Google Cloud - Community

·Apr 20

How to Submit Spark Serverless Jobs, Manage Quota and Capture Errors

Today, Dataproc Serverless is the most modern way to run your Spark jobs on GCP. …

Dataproc

4 min read



Feb 23

Dataproc Serverless java.lang.NoClassDefFoundError: scala/Serializable Error

Spark runtime version 2.0 became the default version as of Jan 2023. If you do not set the version in your gcloud command, you may now get an error like the one below: java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider com.google.cloud.spark.bigquery.BigQueryRelationProvider could not be instantiated at java.base/java.util.ServiceLoader.fail(ServiceLoader.java:586)…

Dataproc

1 min read



Published in

Google Cloud - Community

·Nov 11, 2022

Dynamically Handle SCD2 Merges in BigQuery using Composer

This blog post shows how to create parametric/dynamic pipelines that handle SCD2 updates automatically for any table in BigQuery using Composer and Airflow. Slowly Changing Dimension (SCD) Type 2 is a common technique used within data warehouses to handle changes in the dimension tables. If you…

Bigquery

3 min read



Published in

Google Cloud - Community

·Sep 4, 2022

Conditionally Initial or Delta Load BQ with Composer

A typical data replication job consists of two phases: the initial load and replicating changes. It can be time-consuming to build a separate pipeline for each table for initial and incremental loads. It is better to build a dynamic pipeline driven by metadata. In this post you can find how…

Bigquery

2 min read



Published in

Google Cloud - Community

·Aug 25, 2022

Dynamically Load Data to any BigQuery Table from GCS

How would you load hundreds of tables from GCS to BigQuery? You would probably use a Python script or an ETL tool like Dataflow or Cloud Data Fusion. In this post, you will find how you can load any BQ table from GCS with Composer and one dynamic DAG: Before getting into…

Bigquery

3 min read



Published in

Google Cloud - Community

·May 25, 2022

Connect Oracle to BigQuery using DB Link

Background: I searched for an easy, end-to-end, working guide to configuring Oracle Database Gateway with the Simba BigQuery ODBC driver. However, I could not find a complete guide, so I decided to write this one. Oracle has a cool feature called Heterogeneous Services (or Oracle Database Gateway) to connect…

Oracle

10 min read



Published in

Google Cloud - Community

·Jan 31, 2022

Easily deploy Trino on Dataproc with init action script

Do you want to deploy Trino on Dataproc easily? Are you searching for a Trino initialization script? Here it is: download “trino.sh” from GitHub and upload it to your GCS bucket. The script is at https://github.com/sametkaradag/initialization-actions/blob/master/trino/trino.sh in my fork of https://github.com/GoogleCloudDataproc/initialization-actions (the pull request was still pending as of writing this post).

Bigquery

2 min read



Published in

Google Cloud - Community

·Jan 10, 2022

A way to generate sample dataset in BigQuery

Do you need to fill a BigQuery table with a random set of numbers, dates…? Cool, read on. BigQuery has a handy GENERATE_ARRAY function, which takes the range bounds as inputs and generates an array (a repeated field). GENERATE_DATE_ARRAY is the equivalent for generating date arrays. The UNNEST operator can be used to…

Bigquery

3 min read



Published in

Google Cloud - Community

·Dec 23, 2020

How to create CloudSQL Mysql instance with terraform

Prerequisites: You need Terraform. Refer to https://learn.hashicorp.com/tutorials/terraform/install-cli for installing Terraform.
1- Create tfvar file:
#cat terraform.tfvars
project = "project-id"
region = "europe-west4"
2- Create tf file and put MySQL database related information in it, such as instance, database name and root password:
#cat main.tf
variable "project" {}
variable "region" {}
…

Gcp

2 min read



Published in

Google Cloud - Community

·Nov 19, 2020

How to deduplicate rows in a BigQuery table

Duplicate data can sometimes cause wrong aggregates or results. You will probably need to remove those duplicate rows before doing any aggregation, join or calculation. There are various ways to deal with duplicate data, and this post shows one method for handling duplicate keys/columns/rows…

Bigquery

2 min read



Strategic Cloud Engineer @ Google Cloud
