Revisit S3 s3a/s3n support in recent Spark/Hadoop versions and update recipes accordingly.
Description
Preparation:
Fresh install of Spark 2.4.4. Attempt an s3a call through spark-shell. Add the Hadoop/AWS JARs until it works. Record the errors for inclusion in the updated recipe.
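A minimal sketch of the smoke test, assuming the pyspark shell (where `spark` is predefined) rather than the Scala spark-shell. The bucket and key are placeholders, and `s3a_uri` and `smoke_test` are hypothetical helper names, not part of any recipe or library.

```python
# Minimal s3a smoke test for the pyspark shell (assumption: Python rather
# than the Scala spark-shell). Helper names here are hypothetical.


def s3a_uri(bucket: str, key: str) -> str:
    """Build an s3a:// URI from a bucket name and an object key."""
    return f"s3a://{bucket}/{key.lstrip('/')}"


def smoke_test(spark, uri: str, n: int = 5) -> list:
    """Read up to n lines of a text object through the S3A connector.

    A missing or mismatched JAR surfaces here as an exception; its stack
    trace is what should be recorded for the recipe.
    """
    if not uri.startswith("s3a://"):
        raise ValueError(f"expected an s3a:// URI, got {uri!r}")
    rows = spark.read.text(uri).limit(n).collect()
    return [row.value for row in rows]


# In the pyspark shell, where `spark` already exists
# ("my-bucket" and the key are placeholders):
#   smoke_test(spark, s3a_uri("my-bucket", "path/to/file.txt"))
```

Each failed attempt raises before any rows come back, so the exception text can be copied straight into the recipe's troubleshooting notes.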
configuring-s3:
Update Synopsis to say that s3n is deprecated.
Update the “very much a work in progress” wording in “Introducing Amazon S3”.
Remove “s3n” from the example in the Limitations section.
Update the s3n details in the Access Control section.
Update the recommended s3a versions in the Access Control section.
Remove “Configuring Your Bucket for s3n” section.
Remove “Configuring Your Bucket for Both” section.
Update link to AmazonS3 Hadoop page.
Update Change Log.
using-s3:
Update Synopsis to say that s3n is deprecated, and improve s3a detail.
Sync “S3 Support in Spark” with updated text from configuring-s3 recipe.
Remove s3n examples in “Testing the Protocol”.
Remove any OBE errors from “Testing the Protocol”.
Update link to AmazonS3 Hadoop page.
Update Change Log.
Apache Hadoop ships with a connector to S3 called "S3A", with the URL prefix "s3a:"; its previous connectors, "s3" and "s3n", are deprecated and/or deleted in recent Hadoop versions.
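When updating examples, s3n paths can usually be moved to S3A by changing only the scheme, since both connectors read ordinary S3 objects. The sketch below (`to_s3a` is a hypothetical helper) does that and refuses `s3://` paths, because Apache Hadoop's original "s3" connector was a block filesystem with its own storage format, so a plain rename would not point at the same data.

```python
from urllib.parse import urlsplit, urlunsplit


def to_s3a(uri: str) -> str:
    """Rewrite an s3n:// URI to s3a://, leaving s3a:// URIs unchanged.

    Only the scheme changes: s3n and s3a both address plain S3 objects.
    s3:// is rejected because Hadoop's original "s3" connector stored
    data in its own block format, so renaming would not find the data.
    """
    parts = urlsplit(uri)  # urlsplit lower-cases the scheme
    if parts.scheme == "s3a":
        return uri
    if parts.scheme == "s3n":
        # Keep netloc (bucket), path, query and fragment; swap the scheme.
        return urlunsplit(("s3a",) + tuple(parts)[1:])
    raise ValueError(f"no mechanical s3a equivalent for {uri!r}")
```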
The S3A connector is implemented in the hadoop-aws JAR. If it is not on the classpath: stack trace.
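One way to confirm the connector is present before launching a job is to look for the `org.apache.hadoop.fs.s3a.S3AFileSystem` class inside the JARs Spark loads. A sketch, assuming those JARs sit in a single directory such as `$SPARK_HOME/jars`; `jars_providing_s3a` is a hypothetical name.

```python
import zipfile
from pathlib import Path

# Class file that a JAR must contain to provide the S3A connector.
S3A_CLASS_ENTRY = "org/apache/hadoop/fs/s3a/S3AFileSystem.class"


def jars_providing_s3a(jar_dir) -> list:
    """Return names of the JARs in jar_dir that contain S3AFileSystem."""
    hits = []
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if S3A_CLASS_ENTRY in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            continue  # not a readable JAR; skip it
    return hits


# Example (assumes SPARK_HOME is set):
#   import os
#   jars_providing_s3a(os.path.join(os.environ["SPARK_HOME"], "jars"))
```

An empty result means hadoop-aws is not in that directory; it may still be supplied at launch with `--jars` or `--packages`.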
Do not attempt to mix a "hadoop-aws" JAR with Hadoop artifacts from a different version. They must all come from exactly the same release. Otherwise: stack trace.
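The same-release rule can be checked mechanically from JAR file names. This sketch assumes Maven's `<artifactId>-<version>.jar` naming and only looks at artifacts whose names start with `hadoop-`; `hadoop_versions` and `check_consistent` are hypothetical helpers.

```python
import re
from collections import defaultdict

# Matches e.g. "hadoop-common-2.7.3.jar" -> ("hadoop-common", "2.7.3").
# Assumption: JARs follow Maven's <artifactId>-<version>.jar naming.
HADOOP_JAR = re.compile(r"^(hadoop-[a-z0-9-]+?)-(\d+\.\d+\.\d+[\w.-]*)\.jar$")


def hadoop_versions(jar_names) -> dict:
    """Map each Hadoop version found to the Hadoop artifacts carrying it."""
    by_version = defaultdict(list)
    for name in jar_names:
        match = HADOOP_JAR.match(name)
        if match:
            artifact, version = match.groups()
            by_version[version].append(artifact)
    return dict(by_version)


def check_consistent(jar_names):
    """Return the single Hadoop version in use, or raise if several are mixed."""
    versions = hadoop_versions(jar_names)
    if len(versions) > 1:
        raise RuntimeError(f"mixed Hadoop versions on classpath: {versions}")
    return next(iter(versions), None)  # None when no Hadoop JARs are listed
```

Feeding it `os.listdir()` of Spark's JAR directory plus any extra JARs added at launch gives a quick pre-flight check before the first s3a call.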
The S3A connector depends on the AWS SDK JARs. If they are not on the classpath: stack trace.
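A similar file-name check can show whether any AWS SDK JAR is present at all, and which versions. The name patterns below (`aws-java-sdk-<version>.jar`, `aws-java-sdk-<module>-<version>.jar`, `aws-java-sdk-bundle-<version>.jar`) are an assumption about how the SDK was deployed; `aws_sdk_jars` and `sdk_problems` are hypothetical.

```python
import re

# Assumed naming: aws-java-sdk-<version>.jar, aws-java-sdk-<module>-<version>.jar
# (e.g. -core, -s3) or aws-java-sdk-bundle-<version>.jar.
AWS_SDK_JAR = re.compile(r"^aws-java-sdk(?:-[a-z0-9]+)*-(\d+\.\d+\.\d+)\.jar$")


def aws_sdk_jars(jar_names) -> dict:
    """Map each AWS Java SDK JAR name in jar_names to its version string."""
    found = {}
    for name in jar_names:
        match = AWS_SDK_JAR.match(name)
        if match:
            found[name] = match.group(1)
    return found


def sdk_problems(jar_names) -> list:
    """Describe obvious SDK problems: none present, or several versions."""
    versions = set(aws_sdk_jars(jar_names).values())
    if not versions:
        return ["no AWS SDK JAR found"]
    if len(versions) > 1:
        return [f"multiple AWS SDK versions: {sorted(versions)}"]
    return []
```

A clean result only says one SDK version is present; whether it is the version hadoop-aws expects still has to be checked against its Maven dependencies.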
Do not attempt to use an Amazon S3 SDK JAR different from the one the Hadoop version was built with. Otherwise: stack trace highly likely.
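Rather than copying SDK JARs by hand, `spark-shell` and `spark-submit` can fetch hadoop-aws with `--packages`, which also resolves its transitive Maven dependencies, so the AWS SDK version declared by that hadoop-aws release should come along with it. A sketch that builds such a command line, assuming the version passed in matches the Hadoop JARs Spark already ships with; `spark_shell_cmd` and `render` are hypothetical helpers.

```python
import shlex


def spark_shell_cmd(hadoop_version: str, extra_args=()) -> list:
    """Build a spark-shell command line that fetches hadoop-aws with --packages.

    hadoop_version must match the Hadoop JARs Spark already ships with.
    --packages resolves hadoop-aws's transitive Maven dependencies too,
    which is how the AWS SDK it was built against gets pulled in.
    """
    coordinate = f"org.apache.hadoop:hadoop-aws:{hadoop_version}"
    return ["spark-shell", "--packages", coordinate, *extra_args]


def render(cmd: list) -> str:
    """Quote each argument so the command can be pasted into a POSIX shell."""
    return " ".join(shlex.quote(arg) for arg in cmd)


# Example (the version is an assumption; check $SPARK_HOME/jars first):
#   print(render(spark_shell_cmd("2.7.3")))
```

Resolution happens at launch, so the machine running the shell needs access to a Maven repository.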
The normative list of dependencies for a specific version of the hadoop-aws JAR is stored in Maven, and can be viewed on mvnrepository.
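The same dependency list can be read from the hadoop-aws POM itself, which Maven Central publishes under the standard layout path `org/apache/hadoop/hadoop-aws/<version>/hadoop-aws-<version>.pom`. A sketch of extracting the direct dependencies from a downloaded POM, assuming the usual Maven 4.0.0 namespace; `declared_dependencies` is hypothetical. Versions managed by a parent POM come back as `None`, in which case the resolved version shown on mvnrepository is the one to use.

```python
import xml.etree.ElementTree as ET

# Standard Maven POM namespace (assumption: the POM declares it).
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def _child_text(element, tag):
    """Return the stripped text of element's <tag> child, or None."""
    child = element.find(f"m:{tag}", POM_NS)
    if child is None or child.text is None:
        return None
    return child.text.strip()


def declared_dependencies(pom_xml: str) -> list:
    """List (groupId, artifactId, version, scope) for each direct dependency.

    Only <project><dependencies> is read, not <dependencyManagement>;
    version is None when a parent POM manages it.
    """
    root = ET.fromstring(pom_xml)
    return [
        tuple(_child_text(dep, tag) for tag in ("groupId", "artifactId", "version", "scope"))
        for dep in root.findall("m:dependencies/m:dependency", POM_NS)
    ]
```

Filtering the result for `groupId == "com.amazonaws"` narrows it to the SDK artifacts the Hadoop release expects.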