How to configure Apache Hadoop/Spark to use SWIFT
Hadoop has supported OpenStack Swift since version 2.3.0, so you can access Swift object storage directly without needing to create an HDFS filesystem.
To configure Apache Hadoop or Apache Spark to use Swift, define a new service in your core-site.xml
file. The relevant section looks like this:
<configuration>
  <property>
    <name>fs.swift.impl</name>
    <value>org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem</value>
  </property>
  <property>
    <name>fs.swift.service.ScienceCloud.auth.url</name>
    <value>https://cloud.s3it.uzh.ch:5000/v2.0/tokens</value>
  </property>
  <property>
    <name>fs.swift.service.ScienceCloud.auth.endpoint.prefix</name>
    <value>/AUTH_</value>
  </property>
  <property>
    <name>fs.swift.service.ScienceCloud.http.port</name>
    <value>8080</value>
  </property>
  <property>
    <name>fs.swift.service.ScienceCloud.region</name>
    <value>RegionOne</value>
  </property>
  <property>
    <name>fs.swift.service.ScienceCloud.public</name>
    <value>false</value>
  </property>
  <property>
    <name>fs.swift.service.ScienceCloud.tenant</name>
    <value>SCIENCECLOUD-PROJECT-NAME</value>
  </property>
  <property>
    <name>fs.swift.service.ScienceCloud.username</name>
    <value>UZH-SHORTNAME</value>
  </property>
  <property>
    <name>fs.swift.service.ScienceCloud.password</name>
    <value>UZH-WEBPASS</value>
  </property>
</configuration>
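With this configuration in place, access can be checked from the command line using the standard hadoop fs commands. This is a minimal sketch; the container name MyData and the file names are illustrative, so substitute a container that actually exists in your project:

```shell
# List the contents of the Swift container "MyData"
# (illustrative name; use a container that exists in your project)
hadoop fs -ls swift://MyData.ScienceCloud/

# Copy a local file into the container
hadoop fs -put ./results.csv swift://MyData.ScienceCloud/results.csv

# Copy an object back out of the container
hadoop fs -get swift://MyData.ScienceCloud/results.csv ./results-copy.csv
```

If these commands fail with a "No FileSystem for scheme: swift" error, the hadoop-openstack JAR is most likely missing from the classpath.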
To access a container named MyData from within Hadoop/Spark you can then use a URL of the form
swift://MyData.ScienceCloud/objectname
Note that the ScienceCloud part of the URL must match the service name used in the fs.swift.service.* property keys above.
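From Spark, such URLs can be read like any other Hadoop-supported path. The following is a minimal PySpark sketch, assuming a container MyData containing a text object logs.txt (both names are illustrative) and the Swift properties shown above; Spark also accepts them programmatically via spark.hadoop.* settings:

```python
from pyspark.sql import SparkSession

# Swift settings from core-site.xml can alternatively be supplied
# here with the spark.hadoop.* prefix.
spark = (SparkSession.builder
         .appName("swift-example")
         .config("spark.hadoop.fs.swift.impl",
                 "org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem")
         .getOrCreate())

# Read an object from the container "MyData" (illustrative names)
lines = spark.sparkContext.textFile("swift://MyData.ScienceCloud/logs.txt")
print(lines.count())  # number of lines in the object
```

For this to work, the hadoop-openstack JAR must be available to Spark, e.g. via the --jars or --packages option of spark-submit.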