问题描述
首先, 用于在生成简单数据后将数据存储在Google Cloud Platform bigQuery表中的代码。 导入并使用了Apache-Beam库。 Runner使用了Google Cloud Platform Dataflow。
这里的代码。
from apache_beam.options.pipeline_options import PipelineOptions
import apache_beam as beam
pipeline_options = PipelineOptions(
project='project-id',runner='runner',temp_location='bucket-location'
)
def pardo_dofn_methods(test=None):
import apache_beam as beam
class testFunction(beam.DoFn):
def process(self,element):
result = element.split(',')
testing = {'test_column': result[0],'test_column2': result[1],'test_column3': result[2],'test_column4': result[3]}
return [testing]
def finish(self):
print('finish')
with beam.Pipeline(options=pipeline_options) as pipeline:
results = (
pipeline
| 'Generating data' >> beam.Create([
'test1,test2,test3,test4'
'test5,test6,test7,test8'
])
| beam.ParDo(testFunction())
| beam.io.WriteToBigQuery(
'project-id:bigQuery-dataset.table-name',schema='test_column:STRING,test_column2:STRING,test_column3:STRING,test_column4:STRING',create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE
)
)
pardo_dofn_methods()
运行时效果很好。 但是,有两个警告:
BeamDeprecationWarning: options is deprecated since First stable release. References to <pipeline>.options will not be supported
experiments = p.options.view_as(DebugOptions).experiments or []
BeamDeprecationWarning: BigQuerySink is deprecated since 2.11.0. Use WriteToBigQuery instead.
kms_key=self.kms_key))
我不知道为什么会有警告。 谢谢。
解决方法
暂无找到可以解决该程序问题的有效方法,小编努力寻找整理中!
如果你已经找到好的解决方法,欢迎将解决方案带上本链接一起发送给小编。
小编邮箱:dio#foxmail.com (将#修改为@)