
« Newer Snippets
Older Snippets »
3 total  XML / RSS feed 

Scrape torrents on btjunkie

# download all .torrent links on the btjunkie frontpage
#
# more fun with mechanize @
# http://tramchase.com/scrape-myspace-youtube-torrents-for-fun-and-profit

require 'rubygems'
require 'mechanize'
require 'fileutils'

agent = WWW::Mechanize.new
agent.get("http://btjunkie.org/")
links = agent.page.search('.tor_details tr a')
hrefs = links.map { |m| m['href'] }.select { |u| u =~ /\.torrent$/ } # just links ending in .torrent
FileUtils.mkdir_p('btjunkie-torrents') # keep it neat
hrefs.each { |torrent|
  filename = "btjunkie-torrents/#{torrent.split('/')[-2]}" # second-to-last path segment
  puts "Saving #{torrent} as #{filename}"
  agent.get(torrent).save_as(filename)
}
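The href filter and filename derivation above can be exercised on their own; the sample links below are made up to mimic btjunkie's link shapes:

```ruby
# made-up hrefs standing in for the frontpage links
links = [
  '/browse/Audio',
  'http://dl.btjunkie.org/torrent/Some-Name/437abc/download.torrent',
  '/faq.html'
]
hrefs = links.select { |u| u =~ /\.torrent$/ } # just links ending in .torrent

# the second-to-last path segment becomes the local filename
filenames = hrefs.map { |t| "btjunkie-torrents/#{t.split('/')[-2]}" }
puts filenames # => btjunkie-torrents/437abc
```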

Scrape MySpace friend thumbnails

# fetch all img's from a myspace profile's .friendSpace div
# more @ http://tramchase.com/scrape-myspace-youtube-torrents-for-fun-and-profit

require 'rubygems'
require 'mechanize'
require 'fileutils'

agent = WWW::Mechanize.new
agent.get("http://myspace.com/graffitiresearchlab")
links = agent.page.search('.friendSpace img') # found w/ firebug
FileUtils.mkdir_p 'myspace-images' # make the images dir
links.each_with_index { |link, index| 
  url = link['src']
  puts "Saving thumbnail #{url}"
  agent.get(url).save_as("myspace-images/top_friend#{index}_#{File.basename url}")
}
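The save path pairs a friend index with `File.basename` of the image src; on its own, with a made-up src URL:

```ruby
# made-up thumbnail src, shaped like a typical CDN image URL
url = 'http://images.myspacecdn.com/profile01/100/s_tiny.jpg'
index = 3
path = "myspace-images/top_friend#{index}_#{File.basename url}"
puts path # => myspace-images/top_friend3_s_tiny.jpg
```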

Scrape YouTube thumbnails

# fun with mechanize
# more @ http://tramchase.com/scrape-myspace-youtube-torrents-for-fun-and-profit

require 'rubygems'
require 'mechanize'
require 'hpricot'
require 'fileutils'

agent = WWW::Mechanize.new
url = "http://gdata.youtube.com/feeds/api/standardfeeds/most_viewed" # all time
page = agent.get(url)
# parse again w/ Hpricot for some XML convenience
doc = Hpricot.parse(page.body)
# pp (doc/:entry) # like "search"; cool division overload
images = (doc/'media:thumbnail') # use strings instead of symbols for namespaces
FileUtils.mkdir_p 'youtube-images' # make the images dir
urls = images.map { |i| i[:url] }
urls.each_with_index do |file,index|
  puts "Saving image #{file}"
  agent.get(file).save_as("youtube-images/vid#{index}_#{File.basename file}")
end
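Hpricot is long unmaintained; the same media:thumbnail extraction can be sketched with Ruby's bundled REXML instead. The feed XML here is a cut-down, made-up stand-in for the real GData response:

```ruby
require 'rexml/document'

# trimmed stand-in for the most_viewed feed body
xml = <<-XML
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:media="http://search.yahoo.com/mrss/">
  <entry>
    <media:group>
      <media:thumbnail url="http://img.youtube.com/vi/abc123/0.jpg"/>
      <media:thumbnail url="http://img.youtube.com/vi/abc123/1.jpg"/>
    </media:group>
  </entry>
</feed>
XML

doc = REXML::Document.new(xml)
# the media: prefix is declared in the document, so the prefixed XPath resolves
urls = REXML::XPath.match(doc, '//media:thumbnail').map { |t| t.attributes['url'] }
puts urls
```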