Data Wrangler

Posted on Nov 6, 2013 at 12:04 PM

The Wrangler research project is complete, and the software is no longer actively supported. The team behind Wrangler has moved on to work on a commercial venture, Trifacta.

The Wrangler Team

Introducing Script Export

Posted on Mar 7, 2011 at 02:42 PM

Some of you may have noticed a new feature: you can now export your transformation script as code! Script export is a useful option for handling large data sets: first transform a sample of your data in the Wrangler interface, then run the resulting script on the full data set.

Wrangler currently supports output scripts in two languages: Python (for data-crunching on the back end) and JavaScript (should you want to transform in the browser, or using node.js). To run either, you'll also need to download the corresponding Wrangler runtime. Though your mileage may vary, we've been able to quickly wrangle files with millions of rows using exported scripts.

To run exported Python code, install the Wrangler runtime via easy_install datawrangler or download it here.

To run exported JavaScript, download the JS runtime here.
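
To give a feel for the sample-then-run workflow, here is a minimal sketch in plain Python. It does not use the Wrangler runtime or reproduce the format of an exported script; the helpers split_column, drop_column, and run_script are hypothetical names made up for this illustration. The idea is only that a script is an ordered list of row transformations, worked out on a sample and then streamed over the full file one row at a time.

```python
import csv
import io


def split_column(index, sep):
    """Hypothetical step: split the cell at `index` on `sep` into several cells."""
    def step(row):
        return row[:index] + row[index].split(sep) + row[index + 1:]
    return step


def drop_column(index):
    """Hypothetical step: remove the cell at `index`."""
    def step(row):
        return row[:index] + row[index + 1:]
    return step


def run_script(steps, infile, outfile):
    """Apply every step, in order, to each CSV row, streaming row by row."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    for row in reader:
        for step in steps:
            row = step(row)
        writer.writerow(row)


# Steps worked out on a small sample; the same list can then be run on the full file.
script = [split_column(0, "-"), drop_column(2)]

demo_in = io.StringIO("a-b,x,1\nc-d,y,2\n")
demo_out = io.StringIO()
run_script(script, demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

Because rows are processed one at a time, memory use stays flat regardless of file size. For a real file, you would pass open file handles to run_script in place of the in-memory buffers.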

Happy Wrangling!
The Wrangler Team

Welcome!

Posted on Feb 8, 2011 at 01:48 PM

We launched Wrangler just a few days ago, and are excited to have already received so many inquiries. Thanks to everyone who has shared their comments and feature requests! We'd like to briefly share some of our thoughts and respond to a few of the most common questions.

My data is BIG, can I use Wrangler? And can I export my script?
For our alpha launch, we are releasing Wrangler as a client-side JavaScript application. Obviously, this puts some heavy limitations on the size of the input data. In the future we plan to introduce a backend component that communicates with the front-end to greatly improve scalability.
In the meantime, we will soon augment the current Wrangler UI to enable export of data transformation scripts (for example, as Python or Hadoop scripts). So one strategy will be to paste a sub-sample of your data into Wrangler, and then use the resulting script on your full data set.
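
If your file is too large to paste, one way to pull out a representative sub-sample is reservoir sampling. The sketch below is an illustration only, not part of Wrangler: the function name sample_lines and its parameters are our own assumptions, and it assumes the first line of the file is a header you want to keep.

```python
import random


def sample_lines(lines, k, seed=0):
    """Return the header line plus up to k data lines chosen uniformly at random.

    Uses reservoir sampling, so the input is read once and never held in
    memory in full. The sampled lines are not guaranteed to keep file order.
    """
    rng = random.Random(seed)
    it = iter(lines)
    header = next(it, None)
    if header is None:
        return []  # empty input
    reservoir = []
    for i, line in enumerate(it):
        if i < k:
            reservoir.append(line)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = line  # replace with probability k / (i + 1)
    return [header] + reservoir
```

For example, calling sample_lines on an open file handle with k=200 returns a list of 201 lines (header included) that can be joined and pasted into Wrangler.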

What are your plans for releasing the software?
At this initial stage we are simply making the web application available. By doing so, we hope to improve the tool with user feedback prior to our next release. We intend to eventually release the system as open-source software.

My data is private, can I use Wrangler?
During this initial experimental phase, we're interested in learning how Wrangler is being used, and we're hoping you will be willing to help. To that end, we are logging transformation steps and their initiating interactions (column header clicks, text selection ranges). We neither transmit nor store your full pasted data set. However, data elements referenced within transformation steps (column names and selected ranges of text) are included in our log. In the future we plan to release the software so you can run it standalone on your own machine.

Can I learn more about data cleaning? Are there other tools to explore?
Yes! Data Wrangler builds on ideas published over ten years ago in the prescient Potter's Wheel system, as well as great work in the area of "programming by demonstration". And Wrangler is hardly the only kid on the block – you might also be interested in David Huynh and Co.'s work on Google Refine.

What else is coming down the pipeline?
We have some research surprises cooking in the laboratory. For example, we are working on more advanced visualizations to help further explore and clean the data once it has been "wrangled" into a tabular format.

Come back to this page from time to time for the latest updates!

Thanks!
The Wrangler Team (Sean, Ravi, Phil, Andreas, Joe, and Jeff)
