Revisit S3 s3a/s3n support in recent Spark/Hadoop versions and update recipes accordingly.

Description

Preparation:

  • Fresh install of Spark 2.4.4. Attempt an s3a call through the Spark shell. Add the Hadoop/AWS JARs until it works. Record the errors for inclusion in the updated recipe.
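One way to run the preparation step without copying JARs by hand is to let spark-shell resolve hadoop-aws and its AWS SDK dependency from Maven with `--packages`. The sketch below only builds the command line. It assumes the Spark 2.4.4 download pre-built for Hadoop 2.7, which is assumed here to bundle Hadoop 2.7.3; check the `hadoop-*.jar` files under `$SPARK_HOME/jars` for the actual version. The helper name `spark_shell_s3a_argv` is hypothetical.

```python
import shlex
from typing import Dict, List, Optional


def spark_shell_s3a_argv(hadoop_version: str,
                         conf: Optional[Dict[str, str]] = None) -> List[str]:
    """Build a spark-shell command line that pulls hadoop-aws from Maven.

    hadoop_version must equal the version of the hadoop-*.jar files already
    bundled with Spark; Maven then resolves the AWS SDK that hadoop-aws
    declares as its dependency.
    """
    argv = ["spark-shell", "--packages",
            f"org.apache.hadoop:hadoop-aws:{hadoop_version}"]
    # Each entry becomes a "--conf key=value" pair. Spark copies keys that
    # start with "spark.hadoop." into the Hadoop configuration.
    for key, value in sorted((conf or {}).items()):
        argv += ["--conf", f"{key}={value}"]
    return argv


if __name__ == "__main__":
    # Assumption: Spark 2.4.4 "pre-built for Hadoop 2.7" bundles Hadoop 2.7.3.
    argv = spark_shell_s3a_argv("2.7.3")
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Inside the resulting shell, a read such as `spark.read.textFile("s3a://example-bucket/some/key").count()` (bucket and key are placeholders) exercises the connector; any stack trace it produces can be recorded for the updated recipe.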

configuring-s3:

  • Update Synopsis to say that s3n is deprecated.

  • Update the “very much a work in progress” text in “Introducing Amazon S3”.

  • Remove “s3n” from the example in the Limitations section.

  • Update s3n data in the Access Control section.

  • Update s3a recommended versions in Access Control section.

  • Remove “Configuring Your Bucket for s3n” section.

  • Remove “Configuring Your Bucket for Both” section.

  • Update link to AmazonS3 Hadoop page.

  • Update Change Log.

using-s3:

  • Update Synopsis to say that s3n is deprecated, and improve s3a detail.

  • Sync “S3 Support in Spark” with updated text from configuring-s3 recipe.

  • Remove s3n examples in “Testing the Protocol”.

  • Remove any errors that are now OBE (overtaken by events) from “Testing the Protocol”.

  • Update link to AmazonS3 Hadoop page.

  • Update Change Log.

 

Activity

Brian Uri
October 20, 2019 at 8:53 PM

Commits:
04b29b9

Brian Uri
October 20, 2019 at 7:12 PM

https://cwiki.apache.org/confluence/display/HADOOP2/AmazonS3

Apache Hadoop ships with a connector to S3 called "S3A", with the URL prefix "s3a:"; its previous connectors, "s3" and "s3n", are deprecated and/or deleted from recent Hadoop versions.
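Because "s3n" is deprecated, existing paths and settings need to move to "s3a". A minimal sketch, assuming the only changes needed are the URL scheme and the two static-credential properties (`fs.s3n.awsAccessKeyId` and `fs.s3n.awsSecretAccessKey` become `fs.s3a.access.key` and `fs.s3a.secret.key`); the helper names are hypothetical. It deliberately leaves `s3://` URLs alone, since the old Hadoop "s3" connector stored data as blocks rather than as plain objects, so it is not a simple rename.

```python
from typing import Dict
from urllib.parse import urlsplit, urlunsplit

# Hadoop property names for static credentials: s3n name -> s3a name.
S3N_TO_S3A_KEYS = {
    "fs.s3n.awsAccessKeyId": "fs.s3a.access.key",
    "fs.s3n.awsSecretAccessKey": "fs.s3a.secret.key",
}


def to_s3a_url(url: str) -> str:
    """Rewrite an s3n:// URL to s3a://; other schemes are returned unchanged."""
    parts = urlsplit(url)
    if parts.scheme == "s3n":
        parts = parts._replace(scheme="s3a")
    return urlunsplit(parts)


def migrate_conf(conf: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of conf with s3n credential keys renamed to s3a keys."""
    return {S3N_TO_S3A_KEYS.get(key, key): value for key, value in conf.items()}


if __name__ == "__main__":
    print(to_s3a_url("s3n://example-bucket/data/part-00000"))
    print(migrate_conf({"fs.s3n.awsAccessKeyId": "EXAMPLE"}))
```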

  • The S3A connector is implemented in the hadoop-aws JAR. If it is not on the classpath: stack trace.

  • Do not attempt to mix a "hadoop-aws" version with other hadoop artifacts from different versions. They must be from exactly the same release. Otherwise: stack trace.

  • The S3A connector depends on AWS SDK JARs. If they are not on the classpath: stack trace.

  • Do not attempt to use an Amazon S3 SDK JAR different from the one the Hadoop version was built with. Otherwise: stack trace highly likely.

  • The normative list of dependencies of a specific version of the hadoop-aws JAR is stored in Maven and can be viewed on mvnrepository.
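The classpath rules in the list above can be checked mechanically before launching Spark. The sketch below is a hypothetical helper (`check_s3a_jars` is not part of Spark or Hadoop) that inspects JAR file names only. It assumes the usual `name-version.jar` naming and that the JARs sit in `$SPARK_HOME/jars`. It confirms that hadoop-aws is present, that all `hadoop-*` JARs share one version, and that some `aws-java-sdk*` JAR is present; it does not verify that the SDK version is the one hadoop-aws was built with, which still has to be compared against the Maven dependency listing.

```python
import os
import re
from typing import Dict, Iterable, List

# Matches e.g. "hadoop-common-2.7.3.jar" or "hadoop-common-2.7.3-tests.jar",
# capturing the artifact name and the numeric release version.
HADOOP_JAR = re.compile(
    r"^(hadoop-[a-z0-9-]+?)-(\d+(?:\.\d+)+)(?:-[A-Za-z0-9.]+)?\.jar$")


def check_s3a_jars(jar_names: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    names = list(jar_names)
    versions: Dict[str, str] = {}
    for name in names:
        match = HADOOP_JAR.match(name)
        if match:
            versions[match.group(1)] = match.group(2)

    problems = []
    # Rule: hadoop-aws must be on the classpath.
    if "hadoop-aws" not in versions:
        problems.append("hadoop-aws JAR not found")
    # Rule: all Hadoop artifacts must come from exactly the same release.
    distinct = sorted(set(versions.values()))
    if len(distinct) > 1:
        problems.append("mixed Hadoop versions: " + ", ".join(distinct))
    # Rule: the AWS SDK must be on the classpath (version not checked here).
    if not any(name.startswith("aws-java-sdk") for name in names):
        problems.append("no aws-java-sdk JAR found")
    return problems


if __name__ == "__main__":
    # Assumption: JARs live in $SPARK_HOME/jars, as in Spark binary downloads.
    jars_dir = os.path.join(os.environ.get("SPARK_HOME", "."), "jars")
    found = os.listdir(jars_dir) if os.path.isdir(jars_dir) else []
    for problem in check_s3a_jars(found):
        print("problem:", problem)
```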

 

Done

Details

Created January 22, 2019 at 1:34 AM
Updated October 20, 2019 at 8:53 PM
Resolved October 20, 2019 at 8:53 PM