Category Archives: Code

Treating CSV files like tables in Ruby

Want to do some quick-and-dirty work in irb on data in CSV format? With the arrayfields gem, you can access array fields by name. I’ve mixed some code into the CSV library that uses the header row to supply the field names for arrayfields, e.g.:

employees.csv
id,First Name,Last Name,Username
1,Andrew,Filer,afiler
2,Ulysses,Sername,username

irb> require "csvtables"
irb> t = CSV::table('employees.csv')
irb> t.first.last_name
=> "Filer"

The code:

require 'arrayfields'
require 'csv'

class CSV
	def self.table(filename, mode='r')
		open(filename, mode).to_table
	end
end

class CSV::Reader
	def to_table
		struct = Array.struct self.first.map { |v| v.downcase.gsub(/\s/, '_').to_sym }
		self.map { |row| struct.new row }
	end
end
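CSV::Reader is a Ruby 1.8 class, so the code above targets that version. On later Rubies, something similar can be sketched with the standard csv library alone, with no arrayfields. The helper name csv_to_structs is made up for this example, and it builds plain Struct rows rather than arrayfields arrays:

```ruby
require 'csv'

# Hypothetical helper (not part of the csv library): parse CSV text and
# return one Struct per row, with members named after the header row.
def csv_to_structs(text)
  # header_converters: :symbol turns "Last Name" into :last_name
  table = CSV.parse(text, headers: true, header_converters: :symbol)
  row_type = Struct.new(*table.headers)
  table.map { |row| row_type.new(*row.fields) }
end

employees = csv_to_structs(<<~EMPLOYEES)
  id,First Name,Last Name,Username
  1,Andrew,Filer,afiler
  2,Ulysses,Sername,username
EMPLOYEES

puts employees.first.last_name
```

Values stay strings here; passing converters: :numeric to CSV.parse would turn the id column into integers.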

Ruby: including class methods as if they were a module

Ruby’s File class has many public class methods named after their shell command counterparts, like chmod, chown, basename, stat, etc. These are very convenient when writing a shell script in Ruby, but typing “File.chown” isn’t as convenient as just “chown”. Now, if File were a module, I could include it and I’d be done. Many file-related methods are available in the module FileUtils (part of the standard library; require 'fileutils') as well as FileTest. Methods like basename and stat are available in neither, so I thought a quick solution might be to also include File’s class methods. There doesn’t seem to be any straightforward way to do this, perhaps because File is one of the few classes with so many class methods that don’t come from a module. So, a quick, dirty, Ruby metaprogramming solution:

def include_class_methods(klass)
  eval <<-EOF
    class << self
      klass = eval('#{klass.name}')
      klass.singleton_methods.each do |sym|
        define_method(sym) do |*args|
          klass.send sym, *args
        end
      end
    end
  EOF
end

Note: this doesn't seem to work in Ruby 1.9, as File.singleton_methods is empty. One possible replacement for File.singleton_methods is (File.public_methods - File.public_instance_methods - File.ancestors.reject { |x| x == File }.map { |x| x.methods }.flatten). Yeesh.
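For comparison, here is an eval-free sketch of the same idea. It is not the original code: it assumes a Ruby that has define_singleton_method and on which File.singleton_methods(false) actually lists File's class methods (which, per the note above, was not the case on the 1.9 build tested at the time). The optional target parameter is an addition for this example.

```ruby
# Eval-free variant of include_class_methods (a sketch under the assumptions
# stated above). target defaults to the caller's self, which is the
# top-level "main" object when called from a script.
def include_class_methods(klass, target = self)
  # singleton_methods(false) lists only methods defined directly on klass
  klass.singleton_methods(false).each do |name|
    target.define_singleton_method(name) do |*args, &block|
      klass.public_send(name, *args, &block)
    end
  end
  target
end

include_class_methods(File)
puts basename('/usr/local/share/man/html9/net-use.9.html') # forwarded to File.basename
```

Forwarding through public_send keeps private class methods private, and passing the block along lets block-taking class methods keep working.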

libmdb-ruby

Out there, on computers all across the globe, Microsoft Access databases abound, with new data going in, but with little change to their structure (perhaps the last change was for “the Y2K problem”). Many of these, in machine shops, local museums, and little league offices, will exist for billions of years, perhaps even until the heat death of the universe. I’d like to change that. The first step in this process is a set of Ruby bindings for the mdbtools C library, which I’ve creatively titled libmdb-ruby [github].

html2csv

Does what it says on the tin. Do “gem install nokogiri” or “sudo apt-get install libnokogiri-ruby” if necessary.

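#!/usr/bin/env ruby
# Print every table in the given HTML files as CSV, with a blank line after each table.

require 'rubygems'
require 'nokogiri'
require 'csv'

def main(f)
  # File.open with a block closes the file once the document is parsed
  doc = File.open(f) { |io| Nokogiri::HTML.parse(io) }
  doc.search('table').each do |t|
    t.search('tr').each do |tr|
      # Header (th) and data (td) cells both become CSV fields
      puts CSV.generate_line(tr.xpath('th|td').collect { |td| td.text })
    end
    puts
  end
end

ARGV.each { |f| main(f) }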

Can your own web pages at home!

Want to bundle a web page into a single file, without a _files directory and without resorting to the not-supported-everywhere .mht (IE, Opera) or .webarchive (Safari) formats? Use pagecan! I developed pagecan so I can return converted documents on doc.mar.cx as a single file.

pagecan takes the URL of an HTML document, grabs all resources referenced by “src” attributes, and bundles the page and the encoded resources into a single file using the data URI scheme. pagecan is written in Ruby and uses the Nokogiri parser (you can install the gem with gem install nokogiri, or the Debian package with sudo apt-get install libnokogiri-ruby).

Usage: pagecan url [file | -]

If ‘-’ or no file is given, output is sent to stdout. pagecan has been tested only with HTTP URLs, but as it uses Ruby open-uri, other URIs and local files may work.
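To illustrate the data URI trick, here is a rough sketch. It is not pagecan's actual code: it uses a regular expression instead of Nokogiri to stay dependency-free, and the helper names data_uri and inline_src, as well as the fetcher block that returns the bytes and MIME type, are made up for this example.

```ruby
# Hypothetical helper: encode raw bytes as a base64 data URI.
def data_uri(bytes, mime_type)
  # pack('m0') is base64 without line breaks
  "data:#{mime_type};base64,#{[bytes].pack('m0')}"
end

# Hypothetical helper: rewrite every src="..." attribute in the HTML text.
# The block receives the original URL and must return [bytes, mime_type].
# A regex is a crude stand-in for a real HTML parser such as Nokogiri.
def inline_src(html)
  html.gsub(/\bsrc="([^"]+)"/) do
    bytes, mime_type = yield(Regexp.last_match(1))
    %(src="#{data_uri(bytes, mime_type)}")
  end
end

# Stub fetcher for demonstration; a real one would download the resource.
page = '<p><img src="logo.png"></p>'
puts inline_src(page) { |_url| ['not really a PNG', 'image/png'] }
```

A real fetcher would resolve relative URLs against the page URL and take the MIME type from the HTTP response.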

pagecan on github

Unicode weather forecasts

Want your weather forecast in one or two Unicode characters? Go to weather.mar.cx (for location detection by IP) or add the city name to the end, like http://weather.mar.cx/Paris,_TX or http://weather.mar.cx/Paris,_France.

Instant document conversions

Want to squeeze a text file out of a Word document you found online, or need a CSV from an Excel file? Use doc.mar.cx! For example,

http://doc.mar.cx/http://www.ieee.org/documents/IEEECopyrightForm.doc

This will give you an HTML version. If you’d like a different output type, insert that type’s extension in front of the URL. For example, for a plain-text version:

http://doc.mar.cx/txt/http://www.ieee.org/documents/IEEECopyrightForm.doc
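
If you build these URLs from a script, the pattern can be captured in a few lines. The helper name doc_mar_cx_url is hypothetical, and the layout is inferred only from the two example URLs above:

```ruby
# Hypothetical helper: build a doc.mar.cx conversion URL.
# With no format the service returns HTML; otherwise the output type's
# extension (e.g. 'txt') goes in front of the source URL.
def doc_mar_cx_url(source_url, format = nil)
  parts = ['http://doc.mar.cx']
  parts << format if format
  parts << source_url
  parts.join('/')
end

source = 'http://www.ieee.org/documents/IEEECopyrightForm.doc'
puts doc_mar_cx_url(source)
puts doc_mar_cx_url(source, 'txt')
```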

PDF, HTML, text, CSV, XLS, and DOC output formats are supported on the relevant data types. I’ll soon be adding ImageMagick support to convert from zillions of image formats, and conversions to/from .SHP shapefiles, KML files and other geodata should also be supported soon.

Want to know what input document types are supported? Just try the link. If it works, then that document type is supported. If it doesn’t work, then that document type isn’t supported.

More finger features

My finger gateway now supports much more of the Internet. It supports some sites specifically, like Facebook (try finger cdc@facebook.com@finger.afiler.com), but it also supports sites that have per-user RSS feeds linked to from the page at sitename.com/username (e.g. finger afiler@flickr.com@finger.afiler.com). It also supports queries on sites that have RSS feeds linked from their main page (e.g. finger afiler.com@finger.afiler.com).

More finger feature suggestions are welcome!
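
If you don't have a finger client handy, the protocol is simple enough to sketch in Ruby. For a target like user@site@gateway, the client connects to the last host on port 79 and sends everything before the final “@” as the query, followed by CRLF. The method names below are made up for this example, and nothing here is the gateway's own code:

```ruby
require 'socket'

# Split "user@site@gateway" at the LAST "@". The part before it is the
# query sent over the wire; the part after it is the host to connect to.
# A target without any "@" yields an empty query and the whole string as host.
def split_finger_target(target)
  query, _separator, host = target.rpartition('@')
  [query, host]
end

# Minimal finger client: send the query line, then read until the server
# closes the connection.
def finger(target, port = 79)
  query, host = split_finger_target(target)
  TCPSocket.open(host, port) do |sock|
    sock.write("#{query}\r\n")
    sock.read
  end
end

# Example (needs network access):
#   puts finger('afiler@flickr.com@finger.afiler.com')
```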

man pages in Windows

At work, I’ve found myself SSHing into this Windows web server (thanks to Cygwin) often enough that I start to just think of it as another Linux server. And while Cygwin allows you to run Windows commands from a bash prompt, Windows commands don’t come with man pages (just /?). Microsoft has an A-Z List of Windows commands online, but sometimes I’d just prefer to stick to the command prompt. Conveniently, the documentation renders well in a text-based browser. To make these show up as man pages, you just need to have a text-based web browser installed, and have the html files named as the commands in a particular “chapter” of the manual. I picked chapter 9, as that’s not generally assigned.

apt-cyg install wget links # You probably don't have apt-cyg installed,
# so grab that or just use Cygwin's setup.exe instead
# to ensure wget and links are installed
ln -s /usr/bin/links /usr/bin/lynx # Man expects lynx for html pages
mkdir -p /usr/local/share/man/html9
cd /usr/local/share/man/html9
wget -O- 'http://technet.microsoft.com/en-us/library/cc772390(WS.10).aspx' |
grep -Po 'ctl00_MTCS_main_ctl.+href="\K([^"]+)(?:.+>)([^>]+)(?=</a>)' |
sed -r 's/^([^"]+).+>([^>]+)$/\1 \2/' |
while read url name
do
  # "Net Computer" becomes "net-computer"
  name=`echo ${name// /-} | tr '[:upper:]' '[:lower:]'`
  wget -O "$name.9.html" "$url"
done

That will get you man pages for all the commands in that A-Z list. For man pages on subcommands like “net computer”, type “man net-computer”. If you look at that list you’ll notice “net computer” but no “net use” or any of the other usual commands — of course, many of the net commands are well-documented through “net help”. If you really want to be unixy, you can dump those net help pages out to the manual too. Since they’re not formatted, you’ll want to put them in the cat9 directory instead of the html9 directory, and drop the .html extensions.

mkdir -p /usr/local/share/man/cat9
cd /usr/local/share/man/cat9
for cmd in `net 2>&1 | grep '|' | sed 's/^NET//;s/[^A-Z]/ /g' | tr '[:upper:]' '[:lower:]'` ; do net help $cmd > net-$cmd.9 ; done

I don’t know of any comprehensive list of commands besides the A-Z list and the net commands list. But to create individual man pages, you can do something like:

wget -O sqlcmd.9.html 'http://msdn.microsoft.com/en-us/library/ms162773(d=printer).aspx'

This will give you a man page for sqlcmd, the command-line client for SQL Server. If you wanted to grab all the man pages for the net subcommands (instead of using the results from “net help”), do

for x in accounts computer continue file group help helpmsg localgroup name pause print send share session start statistics stop time use user view ; do wget -O net-$x.9.html "http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/net_$x.mspx" ; done
wget -O net-config.9.html 'http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/net_config_server.mspx'

If you’re looking for a little more Ubuntu/Debianism in Windows, try Richard’s apt-get update; apt-get dist-upgrade for Windows. Got any more URLs for man pages? Please post a comment and share!

finger-twitter gateway

A new service from afiler.com! Just finger “<twitteruser>@twitter.com@finger.afiler.com”. More services possibly coming soon.

$ finger fakeapstylebook@twitter.com@finger.afiler.com
[finger.afiler.com]
Login: FakeAPStylebook Name: Fake AP Stylebook
Bio: Style tips for proper writing. contact: fakeapstylebook at gmail dot com. No submissions, please. All material copyright The Bureau Chiefs, LLC.
Location:
Web: http://www.thebureauchiefs.com
Apr 22 16:00: For an international audience, spell the pop star's name as "KeUSDha."
Apr 22 11:30: Do not reference The Oxford English Dictionary. We speak American.
...
Apr 16 07:00: It's "for all intents and purposes." "Intensive Purposes" is the hot new medical drama from CBS.
Apr 15 16:00: Be sure not to confuse "aural" and "oral." The former is very uncomfortable.
Apr 15 14:17: Bureau Chiefs Poll: Who would you choose to perform at your son’s bar mitzvah? http://bit.ly/ds489i