Add support for S3 storage #146

@awsazuser

Does Cobrix support S3 file systems?
I am getting a "java.lang.IllegalArgumentException: Wrong FS" error when loading the copybook and data file from an AWS S3 bucket.

Code:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("Spark-Cobol").getOrCreate()
import spark.implicits._
import za.co.absa.cobrix.spark.cobol.source

// Read the data file from S3 using a copybook that is also stored in S3.
val df = spark.read
  .format("za.co.absa.cobrix.spark.cobol.source")
  .option("copybooks", "s3://xxxx/tesfile.cbl")
  .load("s3://xxxx/sourcedata/DATAFILE0100")

df.printSchema
df.show()

Error:

java.lang.IllegalArgumentException: Wrong FS: s3://xxxx/tesfile.cbl, expected: hdfs://ip-xxx-xx-xx-85.ec2.internal:8020
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:653)
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:194)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:106)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1430)
at za.co.absa.cobrix.spark.cobol.source.parameters.CobolParametersValidator$.za$co$absa$cobrix$spark$cobol$source$parameters$CobolParametersValidator$$validatePath$1(CobolParametersValidator.scala:71)
at za.co.absa.cobrix.spark.cobol.source.parameters.CobolParametersValidator$$anonfun$validateOrThrow$2.apply(CobolParametersValidator.scala:94)
at za.co.absa.cobrix.spark.cobol.source.parameters.CobolParametersValidator$$anonfun$validateOrThrow$2.apply(CobolParametersValidator.scala:93)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at za.co.absa.cobrix.spark.cobol.source.parameters.CobolParametersValidator$.validateOrThrow(CobolParametersValidator.scala:93)
at za.co.absa.cobrix.spark.cobol.source.DefaultSource.createRelation(DefaultSource.scala:52)
at za.co.absa.cobrix.spark.cobol.source.DefaultSource.createRelation(DefaultSource.scala:48)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:307)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:156)
... 160 elided
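
The trace suggests that the path check in CobolParametersValidator calls FileSystem.exists on an HDFS file system object (the cluster default) while passing it an s3:// path, which is what FileSystem.checkPath rejects. A minimal sketch of a check that avoids this, assuming the validation only needs to test whether the path exists, resolves the file system from the path's own URI instead. The object and method names below (PathCheckSketch, pathExists) are hypothetical and not part of Cobrix; only the Hadoop calls (Path.getFileSystem, FileSystem.exists) are real API.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper, not Cobrix code.
object PathCheckSketch {
  // Resolves the FileSystem from the scheme of the path itself
  // (s3://, hdfs://, file://, ...) rather than from fs.defaultFS,
  // so the path and the file system always match.
  def pathExists(pathStr: String, conf: Configuration): Boolean = {
    val path = new Path(pathStr)
    val fs: FileSystem = path.getFileSystem(conf)
    fs.exists(path)
  }
}
```

With a Spark session available, this could be called as PathCheckSketch.pathExists("s3://xxxx/tesfile.cbl", spark.sparkContext.hadoopConfiguration). This assumes the cluster has a file system implementation registered for the s3:// scheme; without one, the lookup itself fails with a different error.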

Labels: accepted, enhancement, help wanted
