Commit c5b2220

Merge pull request #7888 from segmentio/DOC-1225
warehouse faqs docs edits
2 parents a3b6688 + 8364e91 commit c5b2220

1 file changed

+36
-36
lines changed
src/connections/storage/warehouses/faq.md

Lines changed: 36 additions & 36 deletions
Original file line number | Diff line number | Diff line change
@@ -9,13 +9,13 @@ Yes. Customers on Segment's [Business plan](https://segment.com/pricing) can cho
99

1010
Selective Sync helps manage the data Segment sends to each warehouse, allowing you to sync different sets of data from the same source to different warehouses.
1111

12-
When you disable a source, Segment no longer syncs data from that source. The historical data from the source remains in your warehouse, even after you disable a source. When you re-enable a source, Segment will automatically sync all events since the last successful data warehouse sync.
12+
When you disable a source, Segment no longer syncs data from that source. The historical data from the source remains in your warehouse, even after you disable a source. When you re-enable a source, Segment automatically syncs all events since the last successful data warehouse sync.
1313

14-
When you disable and then re-enable a collection or a property, Segment does not automatically backfill the events since the last successful sync. The only data in the first sync following the re-enabling of a collection or property is any data generated after you re-enabled the collection or property. To recover any data generated while a collection or property was disabled, please reach out to [[email protected]](mailto:[email protected]).
14+
When you disable and then re-enable a collection or a property, Segment doesn't automatically backfill the events since the last successful sync. The only data in the first sync following the re-enabling of a collection or property is any data generated after you re-enabled the collection or property. To recover any data generated while a collection or property was disabled, please reach out to [[email protected]](mailto:[email protected]).
1515

1616
You can also use the [Integration Object](/docs/guides/filtering-data/#filtering-with-the-integrations-object) to control whether or not data is sent to a specific warehouse.
1717

18-
### Don't send data to any Warehouse
18+
### Code to not send data to any warehouse
1919

2020
```js
2121
integrations: {
@@ -26,7 +26,7 @@ integrations: {
2626
}
2727
```
2828

29-
### Send data to all Warehouses
29+
### Code to send data to all warehouses
3030

3131
```js
3232
integrations: {
@@ -37,7 +37,7 @@ integrations: {
3737
}
3838
```
3939

40-
### Send data to specific Warehouses
40+
### Code to send data to specific warehouses
4141

4242
```js
4343
integrations: {
@@ -48,30 +48,30 @@ integrations: {
4848
}
4949
```
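
Review note: the hunks above elide the bodies of the three `integrations` snippets, so the sketch below shows one way the warehouse portion of such an object could be assembled. The `Warehouses`, `all`, and `warehouseIds` keys are recalled from Segment's filtering documentation and are not visible in this diff, and `buildWarehouseIntegrations` and the sample warehouse IDs are hypothetical. Treat it as an illustration to check against the full `faq.md`, not as the file's actual contents.

```js
// Hypothetical helper: builds the warehouse part of an `integrations` object.
// Assumption: `Warehouses.all` and `Warehouses.warehouseIds` are the keys
// Segment reads; neither appears in the diff hunks above.
function buildWarehouseIntegrations(target) {
  if (target === "none") {
    // Don't send this call to any warehouse.
    return { Warehouses: { all: false } };
  }
  if (target === "all") {
    // Send this call to every connected warehouse.
    return { Warehouses: { all: true } };
  }
  if (Array.isArray(target) && target.length > 0) {
    // Send this call only to the listed warehouse IDs (copied defensively).
    return { Warehouses: { warehouseIds: [...target] } };
  }
  throw new TypeError(
    'target must be "none", "all", or a non-empty array of warehouse IDs'
  );
}

// Example: restrict one event to two placeholder warehouse IDs.
const payload = {
  event: "Signed Up",
  integrations: buildWarehouseIntegrations(["warehouse_id_1", "warehouse_id_2"]),
};
console.log(JSON.stringify(payload.integrations));
```

The helper returns only the `Warehouses` key, so any other destination flags in the `integrations` object stay under the caller's control.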
5050

51-
## Can we add, tweak, or delete some of the tables?
51+
## Can I add, tweak, or delete some of the tables?
5252

53-
You have full admin access to your Segment Warehouse. However, don't tweak or delete Segment generated tables, as this may cause problems for the systems that upload new data.
53+
You have full admin access to your Segment warehouse. However, don't tweak or delete Segment generated tables, as this may cause problems for the systems that upload new data.
5454

5555
If you want to join across additional datasets, feel free to create and upload additional tables.
5656

57-
## Can we transform or clean up old data to new formats or specs?
57+
## Can I transform or clean up old data to new formats or specs?
5858

5959
This is a common question if the data you're collecting has evolved over time. For example, if you used to track the event `Signup` but now track `Signed Up`, you'd probably like to merge those two tables to make querying simple and understandable.
6060

61-
Segment does not have a way to update the event data in the context of your warehouse to retroactively merge the tables created from changed events. Instead, you can create a "materialized" view of the unioned events. This is supported in [Redshift](https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_VIEW.html), [Postgres](https://www.postgresql.org/docs/9.3/rules-materializedviews.html), [Snowflake](https://docs.snowflake.net/manuals/sql-reference/sql/create-view.html), and others, but may not be available in _all_ warehouses.
61+
Segment doesn't have a way to update the event data in the context of your warehouse to retroactively merge the tables created from changed events. Instead, you can create a *materialized* view of the unioned events. This is supported in [Redshift](https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_VIEW.html){:target="_blank”}, [Postgres](https://www.postgresql.org/docs/9.3/rules-materializedviews.html){:target="_blank”}, [Snowflake](https://docs.snowflake.net/manuals/sql-reference/sql/create-view.html){:target="_blank”}, and others, but may not be available in _all_ warehouses.
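
Review note: the unioned view described in this line could be generated with a small script along these lines. This is a sketch under assumptions: `buildUnionViewSql` is a hypothetical helper, the schema, table, and column names (`my_source`, `signup`, `signed_up`, `id`, `user_id`, `received_at`) are placeholders, and it emits a plain `CREATE VIEW`. Materialized-view syntax differs between warehouses, so adapt the statement to the one you use.

```js
// Hypothetical helper: renders a CREATE VIEW statement that unions several
// event tables over a shared list of columns.
function buildUnionViewSql({ schema, viewName, tables, columns }) {
  // Only allow simple lowercase identifiers, so nothing needs quoting.
  const ident = /^[a-z_][a-z0-9_]*$/;
  for (const name of [schema, viewName, ...tables, ...columns]) {
    if (!ident.test(name)) {
      throw new Error(`unsafe identifier: ${name}`);
    }
  }
  if (tables.length < 2) {
    throw new Error("need at least two tables to union");
  }
  const cols = columns.join(", ");
  const selects = tables.map((t) => `SELECT ${cols} FROM ${schema}.${t}`);
  return `CREATE VIEW ${schema}.${viewName} AS\n${selects.join("\nUNION ALL\n")};`;
}

// Example: merge the old `Signup` and new `Signed Up` event tables.
const sql = buildUnionViewSql({
  schema: "my_source",
  viewName: "all_signups",
  tables: ["signup", "signed_up"],
  columns: ["id", "user_id", "received_at"],
});
console.log(sql);
```

Listing the columns explicitly keeps the `UNION ALL` valid even when the two event tables have picked up different property columns over time.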
6262

6363
Protocols customers can also use [Transformations](/docs/protocols/transform/) to change events at the source, which applies to all cloud-mode destinations (destinations that receive data from the Segment servers) _including_ your data warehouse. Protocols Transformations offer an excellent way to quickly resolve implementation mistakes and help transition events to a Segment spec.
6464

65-
> **Note**: Transformations are currently limited to event, property and trait name changes, and do **not** apply to historical data.
65+
> **Note**: Transformations are currently limited to event, property and trait name changes, and **don't** apply to historical data.
6666
6767
## Can I change the data type of a column in the warehouse?
6868

6969
Yes. Data types are initially set up in your warehouse based on the first value that comes in from a source, but you can request data type changes by reaching out to [Segment support](https://app.segment.com/workspaces?contact=1){:target="_blank”} for assistance.
7070

71-
Keep in mind that Segment only uses [general data types](/docs/connections/storage/warehouses/schema/#schema-evolution-and-compatibility){:target="_blank”} when loading data in your warehouse. Therefore, some of the common scenarios are:
72-
- Changing data type from `timestamp` to `varchar`
73-
- Changing data type from `integer` to `float`
74-
- Changing data type from `boolean` to `varchar`
71+
Keep in mind that Segment only uses [general data types](/docs/connections/storage/warehouses/schema/#schema-evolution-and-compatibility){:target="_blank”} when loading data in your warehouse. Therefore, some of the common scenarios are changing the data type from:
72+
- `timestamp` to `varchar`
73+
- `integer` to `float`
74+
- `boolean` to `varchar`
7575

7676
More granular changes (such as the examples below) wouldn’t normally be handled by the Support team, thus they often need to be made within the warehouse itself:
7777
- Expanding data type `varchar(256)` to `varchar(2048)`
@@ -91,28 +91,28 @@ Your source slug can be found in the URL when you're looking at the source desti
9191
`https://segment.com/[my-workspace]/sources/[my-source-slug]/overview`
9292

9393

94-
## How do I find my warehouse id?
94+
## How do I find my warehouse ID?
9595

96-
Your warehouse id appears in the URL when you look at the [warehouse destinations page](https://app.segment.com/goto-my-workspace/warehouses/). The URL structure looks like this:
96+
Your warehouse ID appears in the URL when you look at the [warehouse destinations page](https://app.segment.com/goto-my-workspace/warehouses/). The URL structure looks like this:
9797

9898
`app.segment.com/[my-workspace]/warehouses/[my-warehouse-id]/overview`
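
Review note: the URL structure above can be parsed programmatically. The following is a minimal sketch, assuming the path always follows the `[my-workspace]/warehouses/[my-warehouse-id]` layout shown. `extractWarehouseId` and the sample workspace and ID values are hypothetical.

```js
// Hypothetical helper: pulls the warehouse ID out of a Segment app URL shaped
// like app.segment.com/[my-workspace]/warehouses/[my-warehouse-id]/overview
function extractWarehouseId(url) {
  // Accept URLs with or without a scheme, as in the example above.
  const withScheme = /^[a-z]+:\/\//i.test(url) ? url : `https://${url}`;
  const segments = new URL(withScheme).pathname.split("/").filter(Boolean);
  // Expected layout: [workspace, "warehouses", warehouseId, ...]
  if (segments.indexOf("warehouses") !== 1 || segments.length < 3) {
    return null;
  }
  return segments[2];
}

// Example with placeholder workspace and warehouse ID values.
console.log(extractWarehouseId("app.segment.com/acme/warehouses/abc123/overview"));
```

The function returns `null` rather than throwing when the path has no ID segment, which covers the warehouse list page itself.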
9999

100100

101-
## How fresh is the data in Segment Warehouses?
101+
## How fresh is the data in the Segment warehouses?
102102

103-
Data is available in Warehouses within 24-48 hours, depending on your tier's sync frequency. For more information about sync frequency by tier, see [Sync Frequency](/docs/connections/storage/warehouses/warehouse-syncs/#sync-frequency).
103+
Data is available in warehouses within 24-48 hours, depending on your tier's sync frequency. For more information about sync frequency by tier, see [Sync Frequency](/docs/connections/storage/warehouses/warehouse-syncs/#sync-frequency).
104104

105105
Real-time loading of the data into Segment Warehouses would cause significant performance degradation at query time. To optimize for your query speed, reliability, and robustness, Segment guarantees that your data will be available in your warehouse within 24 hours. The underlying datastore has a subtle tradeoff between data freshness, robustness, and query speed. For the best experience, Segment needs to balance all three of these.
106106

107107
## What if I want to add custom data to my warehouse?
108108

109-
You can freely load data into your Segment Warehouse to join against your source data tables.
109+
You can freely load data into your Segment warehouse to join against your source data tables.
110110

111-
The only restriction when loading your own data into your connected warehouse is that you should not add or remove tables within schemas generated by Segment for your sources. Those tables have a naming scheme of `<source-slug>.<table>` and should only be modified by Segment. Arbitrarily deleting columns from these tables may result in mismatches upon load.
111+
The only restriction when loading your own data into your connected warehouse is that you should not add or remove tables within schemas generated by Segment for your sources. Those tables have a naming scheme of `<source-slug>.<table>` and should only be modified by Segment. Deleting columns from these tables may result in mismatches upon load.
112112

113113
If you want to insert custom data into your warehouse, create new schemas that are not associated with an existing source, since these may be deleted upon a reload of the Segment data in the cluster.
114114

115-
Segment recommends scripting any sort of additions of data you might have to warehouse, so that you aren't doing one-off tasks that can be hard to recover from in the future in the case of hardware failure.
115+
Segment recommends scripting any sort of additions of data you might have to your warehouse, so that you aren't doing one-off tasks that can be hard to recover from in the future in the case of hardware failure.
116116

117117
## Which IPs should I allowlist?
118118

@@ -127,39 +127,39 @@ Users with workspaces in the EU must allowlist `3.251.148.96/29`.
127127

128128
Segment loads up to two months of your historical data when you connect a warehouse.
129129

130-
For full historical backfills you'll need to be a Segment Business plan customer. If you'd like to learn more about our Business plan and all the features that come with it, [check out our pricing page](https://segment.com/pricing).
130+
For full historical backfills you'll need to be a Segment Business plan customer. If you'd like to learn more about our Business plan and all the features that come with it, [check out Segment's pricing page](https://segment.com/pricing).
131131

132132
## What do you recommend for Postgres: Amazon or Heroku?
133133

134-
Heroku's simple set up and administration process make it a great option to get up and running quickly.
134+
Heroku's simple setup and administration process make it a great option to get up and running quickly.
135135

136136
Amazon's service has some more powerful features and will be more cost-effective for most cases. However, first time users of Amazon Web Services (AWS) will likely need to spend some time with the documentation to get set up properly.
137137

138138
## How do I prevent a source from syncing to some or all warehouses?
139139

140-
When you create a new source, the source syncs to all warehouse(s) in the workspace by default. You can prevent the source from syncing to some or all warehouses in the workspace in two ways:
140+
When you create a new source, the source syncs to all warehouses in the workspace by default. You can prevent the source from syncing to some or all warehouses in the workspace in two ways:
141141

142-
- **Segment app**: When you add a source from the Workspace Overview page, deselect the warehouse(s) you don't want the source to sync to as part of the "Add Source" process. All warehouses are automatically selected by default.
143-
- **Public API**: Send a request to the [Update Warehouse](https://docs.segmentapis.com/tag/Warehouses#operation/updateWarehouse) endpoint to update the settings for the warehouse(s) you want to prevent from syncing.
142+
- **Segment app**: When you add a source from the Workspace Overview page, deselect the warehouse(s) you don't want the source to sync to as part of the *Add Source* process. All warehouses are automatically selected by default.
143+
- **Public API**: Send a request to the [Update Warehouse](https://docs.segmentapis.com/tag/Warehouses#operation/updateWarehouse){:target="_blank”} endpoint to update the settings for the warehouse(s) you want to prevent from syncing.
144144

145145
After a source is created, you can enable or disable a warehouse sync within the Warehouse Settings page.
146146

147147
## Can I be notified when warehouse syncs fail?
148148

149-
If you enabled activity notifications for your storage destination, you'll receive notifications in the Segment app for the fifth and 20th consecutive warehouse failures for all incoming data. Segment does not track failures on a per connection ('source<>warehouse') basis. Segment's notification structure also identifies global issues encountered when connecting to your warehouse, like bad credentials or being completely inaccessible to Segment.
149+
If you enabled activity notifications for your storage destination, you'll receive notifications in the Segment app for the 5th and 20th consecutive warehouse failures for all incoming data. Segment doesn't track failures on a per connection (`source<>warehouse`) basis. Segment's notification structure also identifies global issues encountered when connecting to your warehouse, like bad credentials or being completely inaccessible to Segment.
150150

151151
To sign up for warehouse sync notifications:
152152
1. Open the Segment app.
153-
2. Go to **Settings** > **User Preferences**.
154-
3. In the Activity Notifications section, select **Storage Destinations**.
153+
2. Go to **Settings > User Preferences**.
154+
3. In the **Activity Notifications** section, select **Storage Destinations**.
155155
4. Enable **Storage Destination Sync Failed**.
156156

157157
## How is the data formatted in my warehouse?
158158

159-
Data in your warehouse is formatted into **schemas**, which involve a detailed description of database elements (tables, views, indexes, synonyms, etc.)
159+
Data in your warehouse is formatted into **schemas**, which involve a detailed description of database elements (like tables, views, indexes, synonyms)
160160
and the relationships that exist between elements. Segment's schemas use the following template: <br/>`<source>.<collection>.<property>`, for example,
161-
`segment_engineering.tracks.user_id`, where source refers to the source or project name (segment_engineering), collection refers to the event (tracks),
162-
and the property refers to the data being collected (user_id). **Note:** It is not possible to have different sources feed data into the same schema in your warehouse. While setting up a new schema, you cannot use a duplicate schema name.
161+
`segment_engineering.tracks.user_id`, where source refers to the source or project name (`segment_engineering`), collection refers to the event (`tracks`),
162+
and the property refers to the data being collected (`user_id`). **Note**: It's not possible to have different sources feed data into the same schema in your warehouse. While setting up a new schema, you can't use a duplicate schema name.
163163

164164
Schema data for Segment warehouses is represented in snake case.
165165

@@ -183,9 +183,9 @@ To change the name of your schema without disruptions:
183183
4. Disable the **Sync Data** toggle and click **Save Settings**.
184184
5. Select **Connections** and click **Sources**.
185185
6. Select a source that syncs data with your warehouse from your list of sources, and select **Settings**.
186-
7. Select **SQL Settings** and update the "Schema Name" field with the new name for your schema and click **Save Changes.**
187-
> **Note**: This will set the schema name for all existing and future destinations. The new name must be lowercase and may include underscores.
188-
8. Repeat steps six and seven until you rename all sources that sync data to your warehouse.
186+
7. Select **SQL Settings** and update the **Schema Name** field with the new name for your schema and click **Save Changes**.
187+
> **Note**: This sets the schema name for all existing and future destinations. The new name must be lowercase and may include underscores.
188+
8. Repeat steps 6 and 7 until you rename all sources that sync data to your warehouse.
189189
9. Open the third-party host of your database, and rename the schema.
190190
10. Open the Segment app, select **Connections** and click **Destinations**.
191191
11. Select the warehouse you disabled syncs for from the list of destinations.
@@ -194,7 +194,7 @@ To change the name of your schema without disruptions:
194194

195195
## Can I selectively filter data/events sent to my warehouse based on a property?
196196

197-
At the moment, there isn't a way to selectively filter events that are sent to the warehouse. The warehouse connector works quite differently from our streaming destinations and only has the [selective sync](/docs/connections/storage/warehouses/warehouse-syncs/#warehouse-selective-sync) functionality that allows you to enable/disable specific properties or events.
197+
At the moment, there isn't a way to selectively filter events that are sent to the warehouse. The warehouse connector works differently from the streaming destinations and only has the [selective sync](/docs/connections/storage/warehouses/warehouse-syncs/#warehouse-selective-sync) functionality that allows you to enable or disable specific properties or events.
198198

199199
## Can data from multiple sources be synced to the same database schema?
200200
It's not possible for different sources to sync data directly to the same schema in your warehouse. When setting up a new schema within the Segment UI, you can't use a schema name that's already in use by another source. Segment recommends syncing the data separately and then joining it downstream in your warehouse.
