I often have to iterate over a collection and perform some remote or long-running task on each member of the collection.
Threaded Collections is a package for iterating through collections over multiple threads. With large collections, it can sometimes be more efficient to process the items in parallel, provided that they have no interdependencies and don't need to be processed in a specific order.
Usage:
```ruby
require "threaded_collections"

threadcount = 2
arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
tps = ThreadedCollectionProcessor.new(arr)
tps.process(2) do |thread_id, item|
  puts "Thread #{thread_id} processed item: #{item}"
  sleep 1
end
```
I abstracted this pattern from a web services client that posted items from a collection, but
each request took a second to process. The remote service had plenty of threads available, so
I parallelized the task with this pattern.
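For readers curious what the general pattern looks like, here is a minimal sketch using only Ruby's core `Thread` and `Queue` classes. It is not the gem's actual implementation; `process_in_threads` is a hypothetical helper name, and the block signature simply mirrors the usage example above.

```ruby
# Hypothetical helper illustrating the pattern; not part of the gem.
def process_in_threads(items, thread_count)
  # Load every item into a thread-safe queue shared by all workers.
  queue = Queue.new
  items.each { |item| queue << item }

  workers = Array.new(thread_count) do |thread_id|
    Thread.new do
      loop do
        item = begin
          queue.pop(true) # non-blocking pop; raises ThreadError when empty
        rescue ThreadError
          break # queue drained, so this worker is done
        end
        yield thread_id, item
      end
    end
  end

  # Block until every worker has finished.
  workers.each(&:join)
end

process_in_threads([1, 2, 3, 4, 5, 6], 2) do |thread_id, item|
  sleep 0.1 # stand-in for a slow remote request
  puts "Thread #{thread_id} processed item: #{item}"
end
```

Because the workers pull from a shared queue, a slow item only holds up one thread, and the order in which items finish is not guaranteed.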
I have no plans to break the interface, but I do plan to make two major enhancements:
- Make it possible to mix this functionality into the Ruby iterators, so you don’t have to create the ThreadedCollectionProcessor.
- Make it work with fibers and processes in addition to threads.
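To give a rough idea of what the first enhancement could look like, here is a speculative sketch of a mixin. The module name `ThreadedEach` and the method `threaded_each` are made up for illustration and do not exist in the gem; the eventual interface may well differ.

```ruby
# Hypothetical mixin; names are illustrative and not part of the gem.
module ThreadedEach
  # Yields (thread_id, item) for each element, spread across thread_count threads.
  def threaded_each(thread_count = 2)
    queue = Queue.new
    each { |item| queue << item }

    workers = Array.new(thread_count) do |thread_id|
      Thread.new do
        loop do
          item = begin
            queue.pop(true) # raises ThreadError once the queue is empty
          rescue ThreadError
            break
          end
          yield thread_id, item
        end
      end
    end

    workers.each(&:join)
    self # return the receiver, like Array#each does
  end
end

# Mix the method into Array so no processor object is needed.
Array.include(ThreadedEach)

[1, 2, 3, 4].threaded_each(2) do |thread_id, item|
  puts "Thread #{thread_id} processed item: #{item}"
end
```

Including the module in `Array` keeps the sketch simple; any class that defines `each` could include it the same way.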
If you want to check it out, you can get the source from the threadedcollections GitHub site.
To install the gem without git:
```bash
sudo gem install peteonrails-threaded-collections --source=http://gems.github.com
```