Slide 1

Connect to a running Hadoop cluster

Slide 2

Connect to a running Hadoop cluster - Cluster status

Slide 3

Graphical HDFS file manager

Slide 4

Upload local files to HDFS

Slide 5

Upload local files to HDFS - Support for multi-file upload

Slide 6

Upload local files to HDFS - Non-blocking with progress notification

Slide 7

Upload local files to HDFS - Success/failure notification

Slide 8

Download files from HDFS - Directories as compressed archives

Slide 9

Download files from HDFS - Directories as compressed archives (2)

Slide 10

Download files from HDFS - Directories as compressed archives (3)

Slide 11

Move HDFS files/directories via Drag&Drop

Slide 12

Move HDFS files/directories via Drag&Drop (2)

Slide 13

File Viewer - Drag&Drop files/directories to file viewer

Slide 14

File Viewer - Tabular file preview

Slide 15

File Viewer - Chunk-wise browsing through a set of HDFS files, similar to Hadoop's NameNode WebUI

Slide 16

File Viewer - Data source definition with simple metadata-operations

Slide 17

File Viewer - Data source definition with simple metadata-operations (2)

Slide 18

File Viewer - Data source definition with simple metadata-operations (3)

Slide 19

File Viewer - Data source definition with simple metadata-operations (4)

Slide 20

Entity Resolution workflow definition - Input data

Slide 21

Entity Resolution workflow definition - Input data (2)

Slide 22

Entity Resolution workflow definition - Mapping of (relevant) attributes

Slide 23

Entity Resolution workflow definition - Define output directory via Drag&Drop from HDFS file manager

Slide 24

Entity Resolution workflow definition - Define output directory via Drag&Drop from HDFS file manager (2)

Slide 25

Entity Resolution workflow definition - Blocking strategy (Cartesian, Standard, PPJoin+, Sorted Neighborhood)

Slide 26

Entity Resolution workflow definition - Blocking strategy (Cartesian, Standard, PPJoin+, Sorted Neighborhood) (2)
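
Of the listed blocking strategies, Sorted Neighborhood is the easiest to illustrate in a few lines. The following is a minimal single-machine sketch, not the tool's MapReduce implementation: records are sorted by a blocking key and only records that fall into the same sliding window are compared. The function name `sorted_neighborhood_pairs` and the `window` parameter are hypothetical.

```python
def sorted_neighborhood_pairs(records, key, window=3):
    """Return candidate pairs of records that lie within `window`
    positions of each other after sorting by the blocking key.

    Illustrative sketch only; names and defaults are assumptions.
    """
    ordered = sorted(records, key=key)
    pairs = []
    for i, left in enumerate(ordered):
        # Compare each record with the next (window - 1) records.
        for right in ordered[i + 1:i + window]:
            pairs.append((left, right))
    return pairs


if __name__ == "__main__":
    names = ["b", "a", "d", "c"]
    print(sorted_neighborhood_pairs(names, key=lambda r: r, window=2))
```

With a window of size w over n records, roughly n * (w - 1) pairs are compared instead of the n * (n - 1) / 2 pairs of the Cartesian strategy.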

Slide 27

Entity Resolution workflow definition - Selection of the Blocking key generation function

Slide 28

Entity Resolution workflow definition - Tokenizer to transform the attribute value(s) of an entity into a set of blocking keys
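
A tokenizer-based blocking key function could look roughly like the sketch below. It assumes whitespace tokenization and lowercasing, which the slide does not specify; the function name `token_blocking_keys` is hypothetical.

```python
def token_blocking_keys(*attribute_values):
    """Turn one or more attribute values of an entity into a set of
    blocking keys, one key per distinct lowercase token.

    Assumption: tokens are separated by whitespace.
    """
    keys = set()
    for value in attribute_values:
        keys.update(token.lower() for token in value.split())
    return keys


if __name__ == "__main__":
    print(sorted(token_blocking_keys("Canon EOS 5D", "Digital Camera")))
```

Because an entity receives several keys, it can end up in several blocks and is compared with every entity that shares at least one token.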

Slide 29

Entity Resolution workflow definition - Prefix-based blocking key generation
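
Prefix-based key generation can be sketched as follows: the first few characters of a normalized attribute value serve as the blocking key, and entities with the same key land in the same block. The names `prefix_blocking_key` and `build_blocks`, the default prefix length, and the normalization (trim and lowercase) are assumptions made for illustration.

```python
from collections import defaultdict


def prefix_blocking_key(value, length=3):
    """Return the first `length` characters of the normalized value."""
    return value.strip().lower()[:length]


def build_blocks(values, length=3):
    """Group attribute values by their prefix blocking key."""
    blocks = defaultdict(list)
    for value in values:
        blocks[prefix_blocking_key(value, length)].append(value)
    return dict(blocks)


if __name__ == "__main__":
    print(build_blocks(["Canon EOS", "canon PowerShot", "Nikon D90"]))
```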

Slide 30

Entity Resolution workflow definition - Multiple blocking key generation functions, each operating on different attribute(s)

Slide 31

Entity Resolution workflow definition - Threshold-based match decision based on multiple similarity measures
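
A threshold-based match decision over several similarity measures might be sketched as below. The choice of measures (token Jaccard and character-trigram Jaccard), the weighted-average combination, and the default threshold are assumptions for illustration; the slide does not state which measures or combination the tool uses.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def trigrams(text):
    """Set of lowercase character trigrams of a string."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}


def is_match(left, right, threshold=0.8, weights=(0.5, 0.5)):
    """Classify a pair as a match if the weighted average of the
    similarity measures reaches the threshold (assumed combination)."""
    token_sim = jaccard(left.lower().split(), right.lower().split())
    trigram_sim = jaccard(trigrams(left), trigrams(right))
    score = weights[0] * token_sim + weights[1] * trigram_sim
    return score >= threshold


if __name__ == "__main__":
    print(is_match("Canon EOS 5D", "canon eos 5d"))
    print(is_match("Canon EOS 5D", "Nikon D90"))
```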

Slide 32

Entity Resolution workflow definition - Learning-based match decision based on trained WEKA-classifiers

Slide 33

Entity Resolution workflow definition - Optional match quality evaluation

Slide 34

Entity Resolution workflow definition - Workflow validation

Slide 35

Workflow Execution - Workflow translation into MapReduce jobs (IDF-index, Classifier, BDM, Similarity computation, Match quality), submission, and progress monitoring

Slide 36

Workflow Execution - Links to JobTracker WebUI

Slide 37

Workflow Execution - Links to JobTracker WebUI (2)

Slide 40

Running workflow - Simultaneous handling of multiple workflows

Slide 41

Running workflow - Workflows connecting to different clusters are executed in parallel, workflows connecting to the same cluster are executed in FIFO order
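
The scheduling policy described on this slide (parallel across clusters, FIFO within a cluster) can be approximated with one single-worker executor per cluster. This is a sketch under that assumption, not the tool's actual scheduler; the class name `WorkflowScheduler` and its methods are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class WorkflowScheduler:
    """Run workflows for different clusters in parallel and workflows
    for the same cluster one after another, in submission order."""

    def __init__(self):
        self._executors = {}
        self._lock = threading.Lock()

    def submit(self, cluster, workflow, *args):
        """Queue a callable workflow for the given cluster."""
        with self._lock:
            executor = self._executors.get(cluster)
            if executor is None:
                # A single worker thread processes its queue in FIFO order.
                executor = ThreadPoolExecutor(max_workers=1)
                self._executors[cluster] = executor
        return executor.submit(workflow, *args)

    def shutdown(self):
        """Wait until all queued workflows have finished."""
        for executor in self._executors.values():
            executor.shutdown(wait=True)


if __name__ == "__main__":
    log = []
    scheduler = WorkflowScheduler()
    for i in range(3):
        scheduler.submit("cluster-a", log.append, ("cluster-a", i))
        scheduler.submit("cluster-b", log.append, ("cluster-b", i))
    scheduler.shutdown()
    print(log)
```

The interleaving between the two clusters varies from run to run, but the entries for each individual cluster always appear in submission order.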

Slide 38

Workflow Execution - Workflow successfully executed

Slide 39

Workflow Execution - Resulting match quality
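
Match quality in entity resolution is commonly reported as precision, recall, and F-measure against a gold standard of known matches. The sketch below assumes those three metrics and treats each match as an unordered pair of entity identifiers; the function name `match_quality` is hypothetical, and the slide does not state which metrics the tool reports.

```python
def match_quality(predicted_pairs, gold_pairs):
    """Return (precision, recall, f_measure) of predicted match pairs
    relative to a gold standard. Pairs are unordered."""
    predicted = {frozenset(pair) for pair in predicted_pairs}
    gold = {frozenset(pair) for pair in gold_pairs}
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    predicted = [("a", "b"), ("c", "d")]
    gold = [("b", "a"), ("e", "f")]
    print(match_quality(predicted, gold))
```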

Slide 46

EC2 - Launch a new Hadoop cluster on EC2 and set up SOCKS proxy server

Slide 47

EC2 - Launch a new Hadoop cluster on EC2 and set up SOCKS proxy server (2)

Slide 48

EC2 - Launch a new Hadoop cluster on EC2 and set up SOCKS proxy server (3)

Slide 49

EC2 - Optional EC2 cluster shutdown on disconnect

Slide 50

EC2 - Optional EC2 cluster shutdown on disconnect (2)

Slide 51

EC2 - Connect to a running Hadoop cluster on EC2 and set up SOCKS proxy server

Slide 52

EC2 - Connect to a running Hadoop cluster on EC2 and set up SOCKS proxy server (2)