Google Sheets (v3) | Stitch Documentation

Google Sheets extraction is supported by Stitch
This integration is powered by Singer's Google Sheets tap and certified by Stitch. Check out and contribute to the repo on GitHub.

For support, contact Stitch support.

Google Sheets integration summary

Stitch’s Google Sheets integration replicates data using the Google Sheets v4 API. Refer to the Schema section for a list of objects available for replication.

Stitch’s Google Sheets integration will generate tables containing data related to metadata and the individual sheets within a spreadsheet.

Note: There are a few limitations:

Currently, the Google Sheets integration replicates one spreadsheet at a time. To replicate another spreadsheet, you will need to create another Google Sheets integration in Stitch.
The IMPORTRANGE() function in Google Sheets isn’t currently supported. This integration identifies new and updated data using a spreadsheet’s last updated_at value, which isn’t updated when data arrives through IMPORTRANGE().

Google Sheets feature snapshot

A high-level look at Stitch's Google Sheets (v3) integration, including release status, useful links, and the features supported in Stitch.

STITCH
Release status	Deprecated on February 26, 2020	Supported by	Stitch
Stitch plan	Standard	API availability	Not available
Singer GitHub repository	singer-io/tap-google-sheets
REPLICATION SETTINGS
Anchor Scheduling	Supported	Advanced Scheduling	Supported
Table-level reset	Unsupported	Configurable Replication Methods	Unsupported
DATA SELECTION
Table selection	Supported	Column selection	Supported
Select all	Supported
TRANSPARENCY
Extraction Logs	Supported	Loading Reports	Supported

Connecting Google Sheets

Google Sheets setup requirements

To set up Google Sheets in Stitch, you need:

A spreadsheet in your Google Drive.
A header row with unique column values in the first row of every sheet you want to replicate. If a sheet has additional header rows below the first row, its data may not be replicated correctly, because headers that aren’t in the first row may be extracted as column data.
A full row of data in the second row of every sheet you want to replicate. Data must begin in the second row of the sheet. Values in this row may not be NULL or issues will arise during Extraction.

Step 1: Obtain your spreadsheet ID

Go to Google Sheets and log into the Google account associated with the spreadsheet you are looking to integrate.
Open the spreadsheet that you want to use in the integration.
The Spreadsheet ID is part of the spreadsheet’s URL: it is the string between /d/ and /edit. Keep this readily available to continue with the integration.
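If you need to pull the Spreadsheet ID out of many URLs, the lookup can be scripted. The following is a minimal Python sketch, not part of Stitch. It assumes the usual URL shape https://docs.google.com/spreadsheets/d/<Spreadsheet ID>/edit, and the function name extract_spreadsheet_id is hypothetical.

```python
import re

# Assumption: the ID is the path segment that follows "/spreadsheets/d/"
# and consists of letters, digits, hyphens, and underscores.
_SPREADSHEET_ID = re.compile(r"/spreadsheets/d/([A-Za-z0-9_-]+)")


def extract_spreadsheet_id(url: str) -> str:
    """Return the Spreadsheet ID embedded in a Google Sheets URL."""
    match = _SPREADSHEET_ID.search(url)
    if match is None:
        raise ValueError(f"No spreadsheet ID found in URL: {url!r}")
    return match.group(1)
```

For example, calling extract_spreadsheet_id on a URL ending in /spreadsheets/d/abc123/edit#gid=0 returns abc123, which is the value to paste into the Spreadsheet ID field in Step 2.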

Step 2: Add Google Sheets as a Stitch data source

Sign into your Stitch account.
On the Stitch Dashboard page, click the Add Integration button.
Click the Google Sheets icon.
Enter a name for the integration. This is the name that will display on the Stitch Dashboard for the integration; it’ll also be used to create the schema in your destination.

For example, the name “Stitch Google Sheets” would create a schema called stitch_google_sheets in the destination. Note: Schema names cannot be changed after you save the integration.
In the Spreadsheet ID field, enter the Spreadsheet ID you obtained in the previous step. Note: To integrate another spreadsheet, you’ll need to repeat these steps with another Google Sheets integration.
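To preview the schema name an integration name is likely to produce, the naming example above can be approximated in Python. This is only a sketch: the exact normalization rules Stitch applies are not described here, so the rule below (lowercase, with runs of non-alphanumeric characters replaced by a single underscore) is an assumption that happens to reproduce the documented example. The function name is hypothetical.

```python
import re


def schema_name_from_integration_name(name: str) -> str:
    """Approximate the destination schema name for an integration name.

    Assumption: lowercase the name and collapse every run of characters
    other than a-z and 0-9 into one underscore. Stitch's actual rules
    may differ for punctuation and non-ASCII characters.
    """
    lowered = name.strip().lower()
    return re.sub(r"[^a-z0-9]+", "_", lowered).strip("_")
```

With this rule, the name Stitch Google Sheets maps to stitch_google_sheets, matching the example above. Because schema names cannot be changed after saving, it can be worth checking the result before creating the integration.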

Step 3: Define the historical replication start date

The Sync Historical Data setting defines the starting date for your Google Sheets integration. This means that data equal to or newer than this date will be replicated to your data warehouse.

Change this setting if you want to replicate data beyond Google Sheets’ default setting of 1 year. For a detailed look at historical replication jobs, check out the Syncing Historical SaaS Data guide.

Step 4: Create a replication schedule

Replication schedules affect the time Extraction begins, not the time at which data finishes loading. Refer to the Replication Scheduling documentation for more information.

In the Replication Frequency section, you’ll create the integration’s replication schedule. An integration’s replication schedule determines how often Stitch runs a replication job, and the time that job begins.

Google Sheets integrations support the following replication scheduling methods:

Replication Frequency
Anchor Scheduling
Advanced Scheduling using Cron (Advanced or Premium plans only)

To keep your row usage low, consider setting the integration to replicate less frequently. See the Understanding and Reducing Your Row Usage guide for tips on reducing your usage.

Step 5: Authorize Stitch

Next, you’ll be prompted to log into your Google account and approve Stitch’s access to your Google Sheets data. Note that we will only ever read your data.
Select the See all your Google Sheets spreadsheets access option.
Click Continue.

Step 6: Set objects to replicate

Is an object missing or not replicating? Verify that the object meets the requirements for selection and replication.

The last step is to select the tables and columns you want to replicate. Learn about the available tables for this integration.

Note: If a replication job is currently in progress, new selections won’t be used until the next job starts.

For Google Sheets integrations, you can select:

Individual tables and columns
All tables and columns

To select individual tables and columns:

In the integration’s Tables to Replicate tab, locate a table you want to replicate.
To track a table, click the checkbox next to the table’s name. A blue checkmark means the table is set to replicate.
To track a column, click the checkbox next to the column’s name. A blue checkmark means the column is set to replicate.
Repeat this process for all the tables and columns you want to replicate.
When finished, click the Finalize Your Selections button at the bottom of the screen to save your selections.

To select all tables and columns:

Important: Using the Select All feature will overwrite any previous selections. However, selections aren’t final until Finalize Your Selections is clicked. Clicking Cancel will restore your previous selections. Refer to the Select All guide for more info about this feature.

Click into the integration from the Stitch Dashboard page.
Click the Tables to Replicate tab.
In the list of tables, click the box next to the Table Names column.
In the menu that displays, click Track all Tables and Fields.
Click the Finalize Your Selections button at the bottom of the page to save your data selections.

Initial and historical replication jobs

After you finish setting up Google Sheets, its Sync Status may show as Pending either on the Stitch Dashboard or in the Integration Details page.

For a new integration, a Pending status indicates that Stitch is in the process of scheduling the initial replication job for the integration. This may take some time to complete.

Initial replication jobs with Anchor Scheduling

If using Anchor Scheduling, an initial replication job may not kick off immediately. This depends on the selected Replication Frequency and Anchor Time. Refer to the Anchor Scheduling documentation for more information.

Free historical data loads

The first seven days of replication, beginning when data is first replicated, are free. Rows replicated from the new integration during this time won’t count towards your quota. Stitch offers this as a way of testing new integrations, measuring usage, and ensuring historical data volumes don’t quickly consume your quota.

Replication will continue after the seven days are over. If you’re no longer interested in this source, be sure to pause or delete the integration to prevent unwanted usage.

Google Sheets replication

In this section:

Details about Extraction, including object discovery and selecting data for replication
Details about how data replicated from Google Sheets is loaded into a destination

Extraction

For every table set to replicate, Stitch will perform the following during Extraction:

Discover table schemas and type discovered columns
Select records for replication

Discovery

During Discovery, Stitch will:

Determine table schemas
Type the data in discovered columns

Determining table schemas

At the start of each replication job, Stitch will check the sheet’s header row and first data row (the second row in the sheet) for data.

To be detected and properly replicated, every sheet set to replicate must have:

Column headers with unique values in the first row. If there are duplicate column names, Stitch will skip the sheet and surface a duplicate column name error.

For example: Two columns in the header row can’t be named customer_id. Uniqueness must not rely on case. While customer_id and Customer_ID may be unique due to case differences, this may still cause errors during extraction and loading. For this reason, column names must be completely unique.
A full row of data in the second row. If any column in this row is empty but has a format (currency or datetime for example), the type will be determined using the format. If a cell is empty and has no format, the column type will be set to string by default.

If the sheet doesn’t contain a header row and a second row of data, Stitch will skip the sheet and surface an empty sheet message during extraction.
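Before connecting a sheet, the requirements above can be checked locally. The following Python sketch is an illustration only and does not reproduce Stitch's own discovery code. It assumes the sheet has already been read into a list of rows, and the function name check_sheet and the message wording are hypothetical.

```python
from typing import List, Optional


def check_sheet(rows: List[List[Optional[str]]]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    if len(rows) < 2:
        return ["empty sheet: a header row and a first data row are both required"]

    header, first_data = rows[0], rows[1]
    problems: List[str] = []

    # Headers must be unique, and uniqueness must not rely on case.
    seen = {}
    for index, name in enumerate(header):
        key = (name or "").strip().lower()
        if not key:
            problems.append(f"column {index + 1}: header is blank")
        elif key in seen:
            problems.append(
                f"column {index + 1}: header {name!r} duplicates column {seen[key] + 1}"
            )
        else:
            seen[key] = index

    # The second row should hold a value for every header column.
    for index in range(len(header)):
        value = first_data[index] if index < len(first_data) else None
        if value is None or str(value).strip() == "":
            problems.append(f"column {index + 1}: first data row is empty")

    return problems
```

In this sketch an empty cell in the first data row is reported as a problem. Per the rules above, Stitch can still type such a column from its format or default it to string, so treat that message as a warning rather than a guaranteed failure.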

Data typing

To determine data types, Stitch will analyze the first two rows of each sheet included in object discovery.

If a column contains non-standard boolean language, Stitch will intentionally coerce those values into booleans. The following values are expected to be replicated as True:

YES/yes
Y/y
1
true (the string “true” prefixed with a tick [`])

The following values are expected to be replicated as False:

NO/no
N/n
0
false (the string “false” prefixed with a tick [`])
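The two lists above can be expressed as a small lookup. This Python sketch is illustrative and is not taken from the tap's source. It matches only the exact spellings listed, so mixed-case values such as Yes are treated as unrecognized, which is an assumption. The name coerce_boolean is hypothetical.

```python
from typing import Optional

# Exact spellings from the lists above; other casings are an open question.
TRUE_VALUES = {"YES", "yes", "Y", "y", "1", "true"}
FALSE_VALUES = {"NO", "no", "N", "n", "0", "false"}


def coerce_boolean(value: str) -> Optional[bool]:
    """Map non-standard boolean language to True or False.

    Returns None when the value is not one of the listed spellings.
    """
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    return None
```

Returning None rather than raising lets a caller fall through to other type checks when the value is not boolean language.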

If a column has been specified as a STRING, Stitch will attempt to parse the value as a string, unless the column contains non-standard boolean language. If this fails, the column will be loaded as a nullable STRING.

For all other columns, Stitch will perform the following to determine the column’s data type:

Check the format of the column and parse the value based on that format.
If that fails, attempt to parse the value as a BOOLEAN value
If that fails, attempt to parse the value as an INTEGER
If that fails, attempt to parse the value as a DATE-TIME value
If that fails, attempt to parse the value as a DATE value
If that fails, attempt to parse the value as a TIME value
If that fails, type the column as a STRING
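The fallback order above can be sketched in Python as a chain of attempts. This is a simplified illustration under several assumptions: the first step (checking the cell's format) is omitted because plain strings carry no format metadata, the date and time patterns are hypothetical examples, and the function name infer_type is not part of Stitch or the tap.

```python
import re
from datetime import datetime

# Boolean spellings listed earlier in this section.
BOOLEAN_WORDS = {"YES", "yes", "Y", "y", "1", "true", "NO", "no", "N", "n", "0", "false"}

# Assumed patterns for illustration; the formats Stitch accepts may differ.
TEMPORAL_FORMATS = (
    ("%Y-%m-%d %H:%M:%S", "DATE-TIME"),
    ("%Y-%m-%d", "DATE"),
    ("%H:%M:%S", "TIME"),
)


def infer_type(value: str) -> str:
    """Return the first type in the documented order that the value fits."""
    text = value.strip()
    if text in BOOLEAN_WORDS:
        return "BOOLEAN"
    if re.fullmatch(r"[+-]?[0-9]+", text):
        return "INTEGER"
    for pattern, label in TEMPORAL_FORMATS:
        try:
            datetime.strptime(text, pattern)
            return label
        except ValueError:
            continue
    return "STRING"
```

Because the boolean check runs before the integer check, the values 1 and 0 are typed as BOOLEAN in this sketch, which follows from the order of the steps above.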

Data replication

After discovery is completed, Stitch will move on to extracting data from the sheets set to replicate.

While data from Google Sheets integrations is replicated using Key-based Incremental Replication, the behavior for this integration differs subtly from other integrations.

The table below compares Key-based Incremental Replication and Replication Key behavior for Google Sheets to that of other integrations.

	Google Sheets	Other integrations
What's replicated during a replication job?	The entire contents of a modified spreadsheet. This includes all sheets in the spreadsheet that are set to replicate, regardless of whether they have been modified.	Only new or updated rows in a table.
What's used as a Replication Key?	The time a spreadsheet is modified.	A column or columns in a table.
Are Replication Keys inclusive?	No. Only spreadsheets with a modification timestamp value greater than the last saved bookmark are replicated.	Yes. Rows with a Replication Key value greater than or equal to the last saved bookmark are replicated.
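The difference in inclusivity in the last row of the table comes down to a strict versus non-strict comparison against the bookmark. The Python sketch below illustrates that distinction only. The function names are hypothetical, and it does not show how Stitch stores or updates bookmarks.

```python
from datetime import datetime


def spreadsheet_needs_replication(modified_time: datetime, bookmark: datetime) -> bool:
    """Google Sheets: exclusive. Replicate only if modified after the bookmark."""
    return modified_time > bookmark


def row_needs_replication(replication_key_value: datetime, bookmark: datetime) -> bool:
    """Other integrations: inclusive. Replicate rows at or after the bookmark."""
    return replication_key_value >= bookmark
```

When the modification time equals the saved bookmark, the first function returns False and the second returns True. For Google Sheets this means an unmodified spreadsheet is not extracted again, while a modified one is extracted in full.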

To reduce row usage, consider scheduling the integration to replicate less frequently.

Loading

For every sheet you set to replicate, Stitch will create a table in your destination. These tables will contain the columns you select for replication, along with some system columns created by Stitch. Refer to the sample table in the next section for an example.

Google Sheets table reference

Schemas and versioning

Schemas and naming conventions can change from version to version, so we recommend verifying your integration’s version before continuing.

The schema and info displayed below is for version 3 of this integration.

Table and column names in your destination

Depending on your destination, table and column names may not appear as they are outlined below.

For example: Object names are lowercased in Redshift (CusTomERs > customers), while case is maintained in PostgreSQL destinations (CusTomERs > CusTomERs). Refer to the Loading Guide for your destination for more info.
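As a rough illustration of the casing example above, the Python sketch below maps a source object name to how it would appear in the two destinations mentioned. Only the Redshift and PostgreSQL behaviors stated above are covered. The destination labels and the function name are hypothetical, and other destinations may apply further rules described in their Loading Guides.

```python
def destination_object_name(name: str, destination: str) -> str:
    """Return the object name as it would appear in the given destination."""
    if destination == "redshift":
        # Object names are lowercased in Redshift.
        return name.lower()
    if destination == "postgresql":
        # Case is maintained in PostgreSQL destinations.
        return name
    raise ValueError(f"No casing rule known for destination: {destination!r}")
```

For the name CusTomERs, this returns customers for redshift and CusTomERs for postgresql.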

file_metadata

The file_metadata table contains metadata about the spreadsheet defined in the integration’s settings.

Replication Method	Key-based Incremental
Primary Key	id
Replication Key	modifiedTime
Useful links	Google Sheets documentation, file_metadata schema on GitHub, Google Sheets API method

file_metadata table schema

createdTime

DATE-TIME

driveId

STRING

lastModifyingUser

OBJECT

Fields nested in lastModifyingUser:

displayName

STRING

emailAddress

STRING

kind

STRING

modifiedTime

DATE-TIME

name

STRING

teamDriveId

STRING

version

INTEGER

sheet_metadata

The sheet_metadata table contains metadata about the sheets within the spreadsheet defined in the integration’s settings.

Replication Method	Full Table
Primary Key	sheetId
Useful links	Google Sheets documentation, sheet_metadata schema on GitHub, Google Sheets API method

sheet_metadata table schema

columns

ARRAY

Fields nested in columns:

columnIndex

INTEGER

columnLetter

STRING

columnName

STRING

columnSkipped

BOOLEAN

columnType

STRING

format

STRING

type

ARRAY

gridProperties

OBJECT

Fields nested in gridProperties:

columnCount

INTEGER

frozenColumnCount

INTEGER

frozenRowCount

INTEGER

rowCount

INTEGER

index

INTEGER

sheetId

INTEGER

sheetType

STRING

sheetUrl

STRING

spreadsheetId

STRING

title

STRING

sheets_loaded

The sheets_loaded table contains metadata about individual sheets loaded to your destination.

Replication Method	Full Table
Primary Keys	sheetId, spreadsheetId, loadDate
Useful links	Google Sheets documentation, sheets_loaded schema on GitHub, Google Sheets API method

sheets_loaded table schema

lastRowNumber

INTEGER

loadDate

DATE-TIME

sheetId

INTEGER

spreadsheetId

STRING

title

STRING

spreadsheet_metadata

The spreadsheet_metadata table contains metadata about the spreadsheet defined in the integration’s settings.

Replication Method	Full Table
Primary Key	spreadsheetId
Useful links	Google Sheets documentation, spreadsheet_metadata schema on GitHub, Google Sheets API method

spreadsheet_metadata table schema

properties

OBJECT

Fields nested in properties:

autoRecalc

STRING

locale

STRING

timeZone

STRING

title

STRING

spreadsheetId

STRING

spreadsheetUrl

STRING
