Connect to a running Hadoop cluster
Connect to a running Hadoop cluster - Cluster status
Graphical HDFS file manager
Upload local files to HDFS
Upload local files to HDFS - Support for multi-file upload
Upload local files to HDFS - Non-blocking with progress notification
Upload local files to HDFS - Success/failure notification
Download files from HDFS - Directories as compressed archives
Download files from HDFS - Directories as compressed archives (2)
Download files from HDFS - Directories as compressed archives (3)
Move HDFS files/directories via Drag&Drop
Move HDFS files/directories via Drag&Drop (2)
File Viewer - Drag&Drop files/directories to file viewer
File Viewer - Tabular file preview
File Viewer - Chunk-wise browsing through a set of HDFS files, similar to Hadoop's Namenode WebUI
File Viewer - Data source definition with simple metadata-operations
File Viewer - Data source definition with simple metadata-operations (2)
File Viewer - Data source definition with simple metadata-operations (3)
File Viewer - Data source definition with simple metadata-operations (4)
Entity Resolution workflow definition - Input data
Entity Resolution workflow definition - Input data (2)
Entity Resolution workflow definition - Mapping of (relevant) attributes
Entity Resolution workflow definition - Define output directory via Drag&Drop from HDFS file manager
Entity Resolution workflow definition - Define output directory via Drag&Drop from HDFS file manager (2)
Entity Resolution workflow definition - Blocking strategy (Cartesian, Standard, PPJoin+, Sorted Neighborhood)
Entity Resolution workflow definition - Blocking strategy (Cartesian, Standard, PPJoin+, Sorted Neighborhood) (2)
Entity Resolution workflow definition - Selection of the Blocking key generation function
Entity Resolution workflow definition - Tokenizer to transform the attribute value(s) of an entity into a set of blocking keys
Entity Resolution workflow definition - Prefix-based blocking key generation
Entity Resolution workflow definition - Multiple blocking key generation functions, each operating on different attribute(s)
Entity Resolution workflow definition - Threshold-based match decision based on multiple similarity measures
Entity Resolution workflow definition - Learning-based match decision based on trained WEKA-classifiers
Entity Resolution workflow definition - Optional match quality evaluation
Entity Resolution workflow definition - Workflow validation
Workflow Execution - Workflow translation into MapReduce jobs (IDF-index, Classifier, BDM, Similarity computation, Match quality), submission, and progress monitoring
Workflow Execution - Links to jobtracker WebUI
Workflow Execution - Links to jobtracker WebUI (2)
Running workflow - Simultaneous handling of multiple workflows
Running workflow - Workflows connecting to different clusters are executed in parallel; workflows connecting to the same cluster are executed in FIFO order
Workflow Execution - Workflow successfully executed
Workflow Execution - Resulting match quality
EC2 - Launch a new Hadoop cluster on EC2 and set up SOCKS proxy server
EC2 - Launch a new Hadoop cluster on EC2 and set up SOCKS proxy server (2)
EC2 - Launch a new Hadoop cluster on EC2 and set up SOCKS proxy server (3)
EC2 - Optional EC2 cluster shutdown on disconnect
EC2 - Optional EC2 cluster shutdown on disconnect (2)
EC2 - Connect to a running Hadoop cluster on EC2 and set up SOCKS proxy server
EC2 - Connect to a running Hadoop cluster on EC2 and set up SOCKS proxy server (2)
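
The blocking key generation options (prefix-based, tokenizer) and the threshold-based match decision listed above can be illustrated with a small sketch. This is not Dedoop's code: the function names, the Jaccard measure, the averaging of similarities, and the sample records are all assumptions chosen for illustration only.

```python
# Illustrative sketch only; every name here is hypothetical, not part of Dedoop.

def prefix_blocking_key(value, length=3):
    """Prefix-based key: the lowercased first `length` characters of the value."""
    return value.strip().lower()[:length]


def token_blocking_keys(value):
    """Tokenizer-style keys: one blocking key per whitespace-separated token."""
    return set(value.lower().split())


def jaccard(a, b):
    """Token-set Jaccard similarity, used here as an example similarity measure."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def is_match(e1, e2, measures, threshold):
    """Assumed decision rule: average the per-attribute similarities
    and declare a match if the average reaches the threshold."""
    scores = [sim(e1[attr], e2[attr]) for attr, sim in measures]
    return sum(scores) / len(scores) >= threshold


# Two hypothetical product records that should be compared.
a = {"title": "Canon EOS 600D camera", "brand": "Canon"}
b = {"title": "Canon EOS 600D", "brand": "canon"}

# Both records share the prefix key "can", so they fall into the same block.
same_block = prefix_blocking_key(a["title"]) == prefix_blocking_key(b["title"])

# Only entities within the same block are compared against the threshold.
measures = [("title", jaccard), ("brand", jaccard)]
matched = same_block and is_match(a, b, measures, threshold=0.8)
```

With a tokenizer, an entity receives several blocking keys and is therefore placed in several blocks, which increases the number of comparisons but reduces the chance of missing a match whose prefix differs.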
For more information, see the VLDB 2012 Demo paper and the project website
Developed by Lars Kolb
Student assistants: Axel Fischer, Ziad Sehili, and Sergej Sintschilin
Dedoop is available for download.