BigQuery DataFrames provides a Pythonic DataFrame and machine learning (ML) API
powered by the BigQuery engine.

* ``bigframes.pandas`` provides a pandas-compatible API for analytics.
* ``bigframes.ml`` provides a scikit-learn-like API for ML.

Documentation
-------------

* `BigQuery DataFrames sample notebooks <https://github.com/googleapis/python-bigquery-dataframes/tree/main/notebooks>`_
* `BigQuery DataFrames API reference <https://cloud.google.com/python/docs/reference/bigframes/latest>`_
* `BigQuery documentation <https://cloud.google.com/bigquery/docs/>`_


Quickstart
----------

Prerequisites
^^^^^^^^^^^^^

* Install the ``bigframes`` package.
* Create a Google Cloud project and billing account.
* When running locally, authenticate with application default credentials. See
  the `gcloud auth application-default login
  <https://cloud.google.com/sdk/gcloud/reference/auth/application-default/login>`_
  reference.

Code sample
^^^^^^^^^^^

Import ``bigframes.pandas`` for a pandas-like interface. The ``read_gbq``
method accepts either a fully-qualified table ID or a SQL query.

.. code-block:: python

    import bigframes.pandas as bpd

    df1 = bpd.read_gbq("project.dataset.table")
    df2 = bpd.read_gbq("SELECT a, b, c FROM `project.dataset.table`")

* `More code samples <https://github.com/googleapis/python-bigquery-dataframes/tree/main/samples/snippets>`_

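To get a feel for the ``bigframes.ml`` side, here is a minimal sketch of
training a linear regression model with ``bigframes.ml.linear_model`` (the
table ID and the feature and label column names ``a``, ``b``, and ``c`` are
placeholders for your own schema):

.. code-block:: python

    import bigframes.pandas as bpd
    from bigframes.ml.linear_model import LinearRegression

    df = bpd.read_gbq("project.dataset.table")

    # Split the DataFrame into features and a label; the column names
    # here are placeholders.
    X = df[["a", "b"]]
    y = df[["c"]]

    # fit() trains a BigQuery ML model; the data stays in BigQuery.
    model = LinearRegression()
    model.fit(X, y)

    # predict() returns another BigQuery DataFrames DataFrame.
    predictions = model.predict(X)
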

Locations
---------

BigQuery DataFrames uses a
`BigQuery session <https://cloud.google.com/bigquery/docs/sessions-intro>`_
internally to manage metadata on the service side. This session is tied to a
`location <https://cloud.google.com/bigquery/docs/locations>`_.
BigQuery DataFrames uses the US multi-region as the default location, but you
can use ``session_options.location`` to set a different location. Every query
in a session is executed in the location where the session was created.

If you want to reset the location of the created DataFrame or Series objects,
you can reset the session by executing ``bigframes.pandas.reset_session()``.
After that, you can set ``bigframes.pandas.options.bigquery.location`` to
specify another location.

``read_gbq()`` requires you to specify a location if the dataset you are
querying is not in the US multi-region. If you try to read a table from another
location without doing so, you get a ``NotFound`` exception.
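
As a minimal sketch of working with a dataset outside the US multi-region
(the region and table ID below are placeholders):

.. code-block:: python

    import bigframes.pandas as bpd

    # Start from a fresh session so that the location can be changed.
    bpd.reset_session()

    # Every query in the new session runs in this location.
    bpd.options.bigquery.location = "europe-west3"

    # Reading a table stored in that location now succeeds instead of
    # raising a NotFound exception.
    df = bpd.read_gbq("project.eu_dataset.table")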


ML locations
------------

``bigframes.ml`` supports the same locations as BigQuery ML. BigQuery ML model
prediction and other ML functions are supported in all BigQuery regions. Support
for model training varies by region. For more information, see
`BigQuery ML locations <https://cloud.google.com/bigquery/docs/locations#bqml-loc>`_.


Data types
----------

BigQuery DataFrames supports the following numpy and pandas dtypes:

* ``numpy.dtype("O")``
* ``pandas.BooleanDtype()``
* ``pandas.Float64Dtype()``
* ``pandas.Int64Dtype()``
* ``pandas.StringDtype(storage="pyarrow")``
* ``pandas.ArrowDtype(pa.date32())``
* ``pandas.ArrowDtype(pa.time64("us"))``
* ``pandas.ArrowDtype(pa.timestamp("us"))``
* ``pandas.ArrowDtype(pa.timestamp("us", tz="UTC"))``

BigQuery DataFrames doesn’t support the following BigQuery data types:

* ``ARRAY``
* ``NUMERIC``
* ``BIGNUMERIC``
* ``INTERVAL``
* ``STRUCT``
* ``JSON``

All other BigQuery data types display as the object type.
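
One way to see these mappings in practice is to inspect ``.dtypes`` on a
DataFrame; a minimal sketch (the table ID is a placeholder, and the exact
output depends on the table's schema):

.. code-block:: python

    import bigframes.pandas as bpd

    df = bpd.read_gbq("project.dataset.table")

    # Each column surfaces as one of the dtypes listed above, for example
    # INT64 as pandas.Int64Dtype() and STRING as
    # pandas.StringDtype(storage="pyarrow").
    print(df.dtypes)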


Remote functions
----------------

BigQuery DataFrames gives you the ability to turn your custom scalar functions
into `BigQuery remote functions
<https://cloud.google.com/bigquery/docs/remote-functions>`_. Creating a remote
function in BigQuery DataFrames creates a BigQuery remote function, a `BigQuery
connection
<https://cloud.google.com/bigquery/docs/create-cloud-resource-connection>`_,
and a `Cloud Functions (2nd gen) function
<https://cloud.google.com/functions/docs/concepts/overview>`_.
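
As a rough sketch of what this looks like in code (the decorator arguments
shown here, along with the dataset and connection names, are illustrative
assumptions rather than a definitive signature):

.. code-block:: python

    import bigframes.pandas as bpd

    # Creating the remote function deploys a Cloud Functions (2nd gen)
    # function and registers it with BigQuery through a connection; the
    # dataset and connection names below are placeholders.
    @bpd.remote_function(
        [float],
        float,
        dataset="my_dataset",
        bigquery_connection="my-connection",
    )
    def squared(x: float) -> float:
        return x * x

    df = bpd.read_gbq("project.dataset.table")

    # The function is applied server side through the remote function.
    df["b_squared"] = df["b"].apply(squared)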

BigQuery connections are created in the same location as the BigQuery
DataFrames session, using the name you provide in the custom function
definition. To view and manage connections, do the following:

1. Go to `BigQuery Studio <https://console.cloud.google.com/bigquery>`__.
2. Select the project in which you created the remote function.
3. In the Explorer pane, expand that project and then expand External connections.

BigQuery remote functions are created in the dataset you specify, or
in a dataset with the name ``bigframes_temp_location``, where *location* is
the location used by the BigQuery DataFrames session. For example,
``bigframes_temp_us_central1``. To view and manage remote functions, do
the following:

1. Go to `BigQuery Studio <https://console.cloud.google.com/bigquery>`__.
2. Select the project in which you created the remote function.
3. In the Explorer pane, expand that project, expand the dataset in which you
   created the remote function, and then expand Routines.

To view and manage Cloud Functions functions, use the
`Functions <https://console.cloud.google.com/functions/list?env=gen2>`_
page and use the project picker to select the project in which you
created the function. For easy identification, the names of the functions
created by BigQuery DataFrames are prefixed by ``bigframes-``.

**Requirements**

BigQuery DataFrames uses the ``gcloud`` command-line interface internally,
so you must run ``gcloud auth login`` before using remote functions.

To use BigQuery DataFrames remote functions, you must enable the following APIs:

* The BigQuery API (bigquery.googleapis.com)
* The BigQuery Connection API (bigqueryconnection.googleapis.com)
* The Cloud Functions API (cloudfunctions.googleapis.com)
* The Cloud Run API (run.googleapis.com)
* The Artifact Registry API (artifactregistry.googleapis.com)
* The Cloud Build API (cloudbuild.googleapis.com)
* The Cloud Resource Manager API (cloudresourcemanager.googleapis.com)

To use BigQuery DataFrames remote functions, you must be granted the
following IAM roles:

* BigQuery Data Editor (roles/bigquery.dataEditor)
* BigQuery Connection Admin (roles/bigquery.connectionAdmin)
* Cloud Functions Developer (roles/cloudfunctions.developer)
* Service Account User (roles/iam.serviceAccountUser)
* Storage Object Viewer (roles/storage.objectViewer)
* Project IAM Admin (roles/resourcemanager.projectIamAdmin)

**Limitations**

* Remote functions take about 90 seconds to become available when you first create them.
* Trivial changes in the notebook, such as inserting a new cell or renaming a variable,
  might cause the remote function to be re-created, even if these changes are unrelated
  to the remote function code.
* BigQuery DataFrames does not differentiate any personal data you include in the remote
  function code. The remote function code is serialized as an opaque box to deploy it as a
  Cloud Functions function.
* The Cloud Functions (2nd gen) functions, BigQuery connections, and BigQuery remote
  functions created by BigQuery DataFrames persist in Google Cloud. If you don’t want to
  keep these resources, you must delete them separately using an appropriate Cloud Functions
  or BigQuery interface.
* A project can have up to 1,000 Cloud Functions (2nd gen) functions at a time. See the
  `Cloud Functions quotas <https://cloud.google.com/functions/quotas>`_ page for all the limits.


Quotas and limits
-----------------

BigQuery DataFrames is subject to `BigQuery quotas
<https://cloud.google.com/bigquery/quotas>`_, including limits on hardware,
software, and network components.


Session termination
-------------------

Each BigQuery DataFrames DataFrame or Series object is tied to a BigQuery
DataFrames session, which is in turn based on a BigQuery session. BigQuery
sessions
`auto-terminate <https://cloud.google.com/bigquery/docs/sessions-terminating#auto-terminate_a_session>`_;
when this happens, you can’t use previously
created DataFrame or Series objects and must re-create them using a new
BigQuery DataFrames session. You can do this by running
``bigframes.pandas.reset_session()`` and then re-running the BigQuery
DataFrames expressions.
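
A minimal sketch of that recovery path (the table ID is a placeholder):

.. code-block:: python

    import bigframes.pandas as bpd

    df = bpd.read_gbq("project.dataset.table")

    # ...later, once the underlying BigQuery session has auto-terminated,
    # operations on `df` fail. Reset and re-create the objects:
    bpd.reset_session()
    df = bpd.read_gbq("project.dataset.table")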


Data processing location
------------------------

BigQuery DataFrames is designed for scale, which it achieves by keeping data
and processing on the BigQuery service. However, you can bring data into the
memory of your client machine by calling ``.to_pandas()`` on a DataFrame or
Series object. If you choose to do this, the memory limitation of your client
machine applies.
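
For example (the table ID is a placeholder):

.. code-block:: python

    import bigframes.pandas as bpd

    df = bpd.read_gbq("project.dataset.table")

    # Downloads the results into an ordinary in-memory pandas DataFrame;
    # from this point on, your client machine's memory limits apply.
    pandas_df = df.to_pandas()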


License
-------

BigQuery DataFrames is distributed with the `Apache-2.0 license
<https://github.com/googleapis/python-bigquery-dataframes/blob/main/LICENSE>`_.

It also contains code derived from the following third-party packages:

* `Ibis <https://ibis-project.org/>`_
* `pandas <https://pandas.pydata.org/>`_
* `Python <https://www.python.org/>`_
* `scikit-learn <https://scikit-learn.org/>`_
* `XGBoost <https://xgboost.readthedocs.io/en/stable/>`_

For details, see the `third_party
<https://github.com/googleapis/python-bigquery-dataframes/tree/main/third_party/bigframes_vendored>`_
directory.


Contact Us
----------

For further help and to provide feedback, you can email us at
`[email protected] <https://mail.google.com/mail/?view=cm&fs=1&tf=1&[email protected]>`_.