• dottie v0.01

    From Delbert@VERT/DELBERTS to Angus Mcleod on Wednesday, October 10, 2007 07:12:00
    Hey Angus,
    I rewrote dottie using HTML::Parser and HTML::TableContentParser.

    It's not done of course, but the basic functionality is there. Whadaya
    think?

    #!/usr/bin/perl -w

    # "Bring in the milk..."
    use LWP::Simple;
    use HTML::TableContentParser;
    use strict;

    # Globals
    our $last_name;
    our $URL = shift;
    our %bbs_list;

    # Default values
    our %default = (
        "Port"           => "23",
        "ConnectionType" => "Modem",
    );

    # Fetch page and put all the tables into an array
    my $tables = get_tables($URL);

    # Select which table(s) we want to parse
    my $table = $tables->[0]; # Should be a command line arg

    # Loop through the rows and build hash
    foreach my $r (@{ $table->{rows} }) {
        my ($name, $addr, $port, $conn) = parse_table_row($r, $URL);
        next unless $name; # If we got nodda, we got nodda
        push @{$bbs_list{$name}}, [$addr, $port, $conn];
    }

    # Sort by name and unique the dups
    # Still need to separate the sorting from the printing
    foreach my $bbs (sort { lc($a) cmp lc($b) } keys %bbs_list) {
        my $dup = undef;
        if (scalar(@{$bbs_list{$bbs}})>1) { $dup = 1 }
        foreach (@{$bbs_list{$bbs}}) {
            # Print formatted data
            printf
                "[%s]\n".
                "Address=%s\n".
                "Port=%s\n".
                "ConnectionType=%s\n\n",
                (defined $dup ? sprintf "%s %s", $bbs, $dup++ : $bbs),
                # Need to add code here to NOT print "Port" if it's "serial"
                @{$_};
        }
    }

    exit 0;

    # Some day we might need to deal with
    # more than one table per page
    sub get_tables {
        my $URL = shift;
        # LWP::Simple's get() returns undef on failure and does not set $!
        my $page = get($URL);
        die "Uh-oh! Can NOT fetch from \"$URL\"\n" unless defined $page;
        my $tcp = HTML::TableContentParser->new;
        return $tcp->parse($page);
    }

    # Using HTML::Parser and HTML::TableContentParser
    # did away with all of the page-specific parsing
    # except for the cell regexes below
    # Eventually specific cell parsing could be put
    # in an external table for flexibility
    sub parse_table_row {
        my ($row, $URL) = @_;
        my ($name, $addr, $port, $conn, $bbs_addr);
        # extract cells
        $name = $row->{cells}[0]{data};
        $bbs_addr = $row->{cells}[4]{data};

        # List assignment leaves $name undef when the match fails,
        # instead of picking up a stale $1
        ($name) = $name =~ m{^.*href=.*">(.+)</a.*$}i;
        # What I need here is to return with name=null
        # if we don't have a name at this point. Bad row,
        # no need to go further. Gets ignored above.

        # One way to deal with the phone number/modem
        if ($bbs_addr =~ m{((\d\d\d).*(\d\d\d).*(\d\d\d\d))}) {
            $addr = $1;
            $port = "serial";
        }
        else {
            # One list assignment captures scheme, host and optional port;
            # [\w.-] drops the literal "|" the old character class allowed
            ($conn, $addr, $port) = $bbs_addr =~
                m{<a href=(.+)://([\w.-]+)(?::(\d+))?><.*}i;
        }

        if ($name) { $last_name=$name }
        return ($last_name, $addr,
            $port ? $port : $default{"Port"},
            $conn ? $conn : $default{"ConnectionType"});
    }


    -j-



    ---
    ■ Synchronet ■ Delbert's Place BBS | telnet://delberts.audizar.com
  • From Angus McLeod@VERT/ANJO to Delbert on Wednesday, October 10, 2007 16:10:00
    Re: dottie v0.01
    By: Delbert to Angus Mcleod on Wed Oct 10 2007 07:12:00

    Hey Angus,
    I rewrote dottie using HTML::Parser and HTML::TableContentParser.

    It's not done of course, but the basic functionality is there. Whadaya think?

    Not tried it but it looks good.

    printf
    "[%s]\n".
    "Address=%s\n".
    "Port=%s\n".
    "ConnectionType=%s\n\n",
    (defined $dup ? sprintf "%s %s", $bbs, $dup++ : $bbs),
    # Need to add code here to NOT print "Port" if it's "serial"
    @{$_};

    Separate printf()s so the Port can be optional? Or construct $format on
    the fly, leaving out Port=%s\n if port is 'serial', and finally doing a
    single printf $format with or without a Port argument?
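
    A minimal sketch of the second suggestion, building the format on the fly. The helper name format_entry and the sample hosts are made up for illustration; the field names and the 'serial' placeholder are the ones the script already uses.

```perl
use strict;
use warnings;

# Build one phonebook entry. The Port line is left out of the format
# (and of the argument list) when the port is the placeholder 'serial'.
sub format_entry {
    my ($label, $addr, $port, $conn) = @_;

    my $format = "[%s]\nAddress=%s\n";
    my @args   = ($label, $addr);

    if ($port ne 'serial') {
        $format .= "Port=%s\n";
        push @args, $port;
    }

    $format .= "ConnectionType=%s\n\n";
    push @args, $conn;

    return sprintf $format, @args;
}

# Illustrative data only
print format_entry('Some BBS',    'bbs.example.com', 23,       'telnet');
print format_entry('Dial-up BBS', '555-555-1234',    'serial', 'Modem');
```

    Returning the string instead of printing it keeps the formatting separate from the output, so the caller decides where the text goes.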

    ---
    Playing: "Quantum Theory" by "Jarvis Cocker" from the "Jarvis" album.
    ■ Synchronet ■ Programatically generated on The ANJO BBS
  • From Delbert@VERT/DELBERTS to Angus McLeod on Wednesday, October 10, 2007 21:06:00
    Angus McLeod wrote to Delbert <=-

    Re: dottie v0.01
    By: Delbert to Angus Mcleod on Wed Oct 10 2007 07:12:00

    Hey Angus,
    I rewrote dottie using HTML::Parser and HTML::TableContentParser.

    It's not done of course, but the basic functionality is there. Whadaya think?

    Not tried it but it looks good.

    printf
    "[%s]\n".
    "Address=%s\n".
    "Port=%s\n".
    "ConnectionType=%s\n\n",
    (defined $dup ? sprintf "%s %s", $bbs, $dup++ : $bbs),
    # Need to add code here to NOT print "Port" if it's "serial"
    @{$_};

    Separate printf()s so the Port can be optional? Or construct $format
    on the fly, leaving out Port=%s\n if port is 'serial', and finally
    doing a single printf $format with or without a Port argument?

    I think a separate printing function so format can be adjustable from
    the command line.

    The other thing is where to do the dup mangling. I really think it
    should be part of the sort function so that dup or no_dup can be
    selectable at that point in the program.
    But if that's the case, I'm not sure about loading up an
    array like @keynames at that point because the names in the hash have to
    be mangled as well to be able to call them out later by name.

    How about a snippet to get me pointed in the right direction for a
    routine that will load @keynames with the new mangled names, as well as
    change the names in the original hash at the same time?

    Or do we just make a new hash AND a new names array at that point while
    looping through them to get the sort done?
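
    One possible shape for such a routine, as a sketch only. The name mangle_dups, the sample data, and the choice to build a new hash rather than rename keys in place are all assumptions; the data layout (name => list of [addr, port, conn]) is the one %bbs_list uses.

```perl
use strict;
use warnings;

# Walk the names in sorted order and give every entry a unique label.
# Returns a reference to the ordered label list and a reference to a new
# hash keyed by those labels; the original hash is left untouched.
sub mangle_dups {
    my ($bbs_list) = @_;    # hash ref: name => [ [addr, port, conn], ... ]
    my (@keynames, %mangled);

    foreach my $name (sort { lc($a) cmp lc($b) } keys %$bbs_list) {
        my @entries = @{ $bbs_list->{$name} };
        my $n = 0;
        foreach my $entry (@entries) {
            $n++;
            # Number the label only when the name occurs more than once
            my $label = @entries > 1 ? "$name $n" : $name;
            push @keynames, $label;
            $mangled{$label} = $entry;
        }
    }
    return (\@keynames, \%mangled);
}

# Made-up sample data in the same shape as %bbs_list
my %bbs_list = (
    'Zeta BBS'  => [ [ 'zeta.example.com',  23,       'telnet' ] ],
    'alpha bbs' => [ [ 'alpha.example.com', 23,       'telnet' ],
                     [ '555-555-1234',      'serial', 'Modem'  ] ],
);

my ($keynames, $mangled) = mangle_dups(\%bbs_list);
print "$_ => $mangled->{$_}[0]\n" for @$keynames;
```

    Because the labels are generated while walking the sorted names, @$keynames comes out already in print order and every label is a valid key in %$mangled.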


    -j-

    ... MultiMail, the new multi-platform, multi-format offline reader!
    --- MultiMail/Linux v0.49
  • From Angus McLeod@VERT/ANJO to Delbert on Thursday, October 11, 2007 00:40:00
    Re: Re: dottie v0.01
    By: Delbert to Angus McLeod on Wed Oct 10 2007 21:06:00

    Separate printf()s so the Port can be optional? Or construct $format on the fly, leaving out Port=%s\n if port is 'serial', and finally doing a single printf $format with or without a Port argument?

    I think a separate printing function so format can be adjustable from
    the command line.

    Or maybe something with 'format STDOUT', etc?

    The other thing is where to do the dup mangling. I really think it
    should be part of the sort function so that dup or no_dup can be
    selectable at that point in the program.

    I think of the program as two phases. One that collects the data and one
    that presents it. The collection phase should do as little data mangling
    or reduction as possible, so that the presentation phase has more options.

    If you separate sorting out as an interim phase (rather than a part of the
    presentation loop) then you *could* do it there, but I prefer to keep the
    data as complete and intact as possible for as long as possible, so as not
    to limit the capability of current and/or future presentation routines.

    But if that's the case, I'm not sure about loading up an array like @keynames at that point because the names in the hash have to be
    mangled as well to be able to call them out later by name.

    That is why I'd prefer to do the mangling *after* the sorting.

    Or do we just make a new hash AND a new names array at that point while looping through them to get the sort done?

    You could do that -- take the collected data and build a rationalized structure representing the data as you wish to print it. But I'd rather
    not go that road if it can be avoided.

    ---
    Amarok: 14,994 tracks from 1,177 albums by 598 artists, but none playing.
  • From Delbert@VERT/DELBERTS to Angus McLeod on Thursday, October 11, 2007 15:34:00
    Angus McLeod wrote to Delbert <=-

    Re: Re: dottie v0.01
    By: Delbert to Angus McLeod on Wed Oct 10 2007 21:06:00

    Separate printf()s so the Port can be optional? Or construct $format on the fly, leaving out Port=%s\n if port is 'serial', and finally doing a single printf $format with or without a Port argument?

    I think a separate printing function so format can be adjustable from
    the command line.

    Or maybe something with 'format STDOUT', etc?

    I might understand what you mean here with a bit more study. ;)

    That is why I'd prefer to do the mangling *after* the sorting.

    Or do we just make a new hash AND a new names array at that point while looping through them to get the sort done?

    You could do that -- take the collected data and build a rationalized structure representing the data as you wish to print it. But I'd
    rather not go that road if it can be avoided.

    So I put sort into its own sub, and then have it build and return
    @keynames and %sorted... But that was before I read your reply.

    But I can just break the dup mangler part off into its own sub and have
    it return a de-duped hash or array. I see your point about keeping the
    original hash as is, and then formatting and mangling as needed. I'll
    try to stick to that approach as I cobble along.

    That really calls for pretty much all of the rest of the stuff going
    into separate subs too then, doesn't it? Then one can just call out which
    functions one wants to include in the processing and/or outputting of the
    data, yes?
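
    A rough sketch of how "calling out which functions to include" might look: each step lives in its own sub, the subs sit in a hash, and a list of names picks which ones run. The stage names (by_name, no_serial), run_stages, and the flat [name, addr, port, conn] record layout are assumptions made for this example, not part of dottie.

```perl
use strict;
use warnings;

# Hypothetical stages. Each takes a list of [name, addr, port, conn]
# records and returns a (possibly shorter or reordered) list.
my %stage = (
    by_name   => sub { return sort { lc($a->[0]) cmp lc($b->[0]) } @_ },
    no_serial => sub { return grep { $_->[2] ne 'serial' } @_ },
);

# Run the named stages in the order given.
sub run_stages {
    my ($wanted, @records) = @_;
    foreach my $name (@$wanted) {
        my $code = $stage{$name} or die "Unknown stage '$name'\n";
        @records = $code->(@records);
    }
    return @records;
}

# Illustrative data only
my @records = (
    [ 'Zeta BBS',  'zeta.example.com', 23,       'telnet' ],
    [ 'alpha bbs', '555-555-1234',     'serial', 'Modem'  ],
    [ 'Beta BBS',  'beta.example.com', 2323,     'telnet' ],
);

# The stage list could come from @ARGV instead of being hard-coded
print "$_->[0]\n" for run_stages([ 'no_serial', 'by_name' ], @records);
```

    Adding a new filter or output step then means adding one entry to %stage, without touching the code that collects the data.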

    I'll work on getting the rest detangled. Meanwhile, here's the latest
    kludge:


    #!/usr/bin/perl -w

    # "Bring in the milk..."
    use LWP::Simple;
    use HTML::TableContentParser;
    use strict;

    # Globals
    # these will get passed from func to func eventually
    our $last_name;
    our $URL = shift;

    # Default values
    our %default = (
        "Port"           => "23",
        "ConnectionType" => "Modem",
    );

    # Fetch page and put all the tables into an array
    my $tables = get_tables($URL);

    # Select which table(s) we want to parse
    my $table = $tables->[0]; # Should be a command line arg

    # Loop through the rows and build hash
    my %bbs_list;
    foreach my $r (@{ $table->{rows} }) {
        my ($name, $addr, $port, $conn) = parse_table_row($r, $URL);
        next unless $name; # If we got nodda, we got nodda
        push @{$bbs_list{$name}}, [$addr, $port, $conn];
    }

    sort_it(%bbs_list);

    exit 0;

    # Sort by name and unique the dups
    # return @keylist and %sorted
    # Still need to add a dup mangle yes/no
    # AND add sort spec vars
    # AND make de-dup its own sub
    # AND make format/print its own func er... I mean sub
    sub sort_it {
        my %bbs_list=@_;
        my @keylist;
        my %sorted;
        foreach my $bbs (sort { lc($a) cmp lc($b) } keys %bbs_list) {
            my $dup = undef;
            if (scalar(@{$bbs_list{$bbs}})>1) { $dup = 1 }
            foreach (@{$bbs_list{$bbs}}) {
                push @keylist,
                    (defined $dup ? sprintf "%s %s", $bbs, $dup++ : $bbs);
                # WHAT's the real syntax for this?
                push @{$sorted{$keylist[$#keylist]}}, @{$_};
                # Print formatted data
                printf
                    "[%s]\n".
                    "Address=%s\n".
                    "Port=%s\n".
                    "ConnectionType=%s\n\n",
                    $keylist[$#keylist], @{$sorted{$keylist[$#keylist]}};
            }
        }
        return (@keylist, %sorted);
    } # not sure yet how to receive returned multiple arrays/hashes.

    # Some day we might need to deal with
    # more than one table per page
    sub get_tables {
        my $URL = shift;
        # LWP::Simple's get() returns undef on failure and does not set $!
        my $page = get($URL);
        die "Uh-oh! Can NOT fetch from \"$URL\"\n" unless defined $page;
        my $tcp = HTML::TableContentParser->new;
        return $tcp->parse($page);
    }

    # Using HTML::Parser and HTML::TableContentParser
    # did away with all of the page-specific parsing
    # except for the cell regexes below
    # Eventually specific cell parsing could be put
    # in an external table for flexibility
    sub parse_table_row {
        my ($row, $URL) = @_;
        my ($name, $addr, $port, $conn, $bbs_addr);
        # extract cells
        $name = $row->{cells}[0]{data};
        $bbs_addr = $row->{cells}[4]{data};

        # List assignment leaves $name undef when the match fails,
        # instead of picking up a stale $1
        ($name) = $name =~ m{^.*href=.*">(.+)</a.*$}i;
        # What I need here is to return with name=undef
        # if we don't have a name at this point. Bad row,
        # no need to go further. Gets ignored above.

        # One way to deal with the phone number/modem
        # now a little stricter
        if ($bbs_addr =~ m{((\d\d\d).(\d\d\d).(\d\d\d\d))}) {
            $addr = $1;
            $port = "serial";
        } # what's the perlly shorthand for if/else?
        else {
            # One list assignment captures scheme, host and optional port;
            # [\w.-] drops the literal "|" the old character class allowed
            ($conn, $addr, $port) = $bbs_addr =~
                m{<a href=(.+)://([\w.-]+)(?::(\d+))?><.*}i;
        }

        if ($name) { $last_name=$name }
        return ($last_name, $addr,
            $port ? $port : $default{"Port"},
            $conn ? $conn : $default{"ConnectionType"});
    }
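
    On the open question in the sort_it comment (how to receive more than one array or hash from a sub): Perl flattens everything in a return list, so the usual approach is to return references. A minimal sketch with made-up data; return_flat and return_refs are illustrative names only.

```perl
use strict;
use warnings;

# Flattened: the array and the hash are merged into one list on return.
sub return_flat {
    my @keylist = qw(alpha beta);
    my %sorted  = (alpha => 1, beta => 2);
    return (@keylist, %sorted);
}

# References: two scalars come back, one per structure.
sub return_refs {
    my @keylist = qw(alpha beta);
    my %sorted  = (alpha => 1, beta => 2);
    return (\@keylist, \%sorted);
}

my @everything = return_flat();          # six elements, boundary lost
my ($keylist, $sorted) = return_refs();  # array ref and hash ref

print scalar(@everything), " flattened elements\n";
print "$_ => $sorted->{$_}\n" for @$keylist;
```

    The caller dereferences with @$keylist and $sorted->{...}; the same trick works in the other direction for passing a hash into a sub without copying it.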


    -j-

  • From Delbert@VERT/DELBERTS to Angus McLeod on Friday, October 12, 2007 00:30:00
    Re: Re: dottie v0.02
    By: Delbert to Angus McLeod on Thu Oct 11 2007 15:34:00

    Angus McLeod wrote to Delbert <=-

    Or do we just make a new hash AND a new names array at that point while looping through them to get the sort done?

    You could do that -- take the collected data and build a rationalized structure representing the data as you wish to print it. But I'd rather not go that road if it can be avoided.

    Is this more what you were getting at?

    print_it(my @keylist = sort_it(%bbs_list));
    #                              ^^^^^^^^^
    # Sending in whole global hash for now
    # but could be a presort filter
    exit 0;

    # Takes in hash, returns sorted array of keys
    # Needs vars for different sort options
    sub sort_it {
        my %bbs_list = @_;
        my @keylist = (sort { lc($a) cmp lc($b) } keys %bbs_list);
        return (@keylist);
    }

    # So I guess de-duping maybe makes the most sense when
    # considered as a formatting method in the output function?
    # Still needs vars and structure for formatting methods
    sub print_it {
        foreach my $bbs (@_) {
            my $dup = undef;
            if (scalar(@{$bbs_list{$bbs}})>1) { $dup = 1 }
            foreach (@{$bbs_list{$bbs}}) {
                # Print formatted data
                printf
                    "[%s]\n".
                    "Address=%s\n".
                    "Port=%s\n".
                    "ConnectionType=%s\n\n",
                    (defined $dup ? sprintf "%s %s", $bbs, $dup++ : $bbs),
                    # Need to add code here to NOT print "Port" if it's "serial"
                    @{$_};
            }
        }
    }

    -j-



  • From Delbert@VERT/DELBERTS to Angus McLeod on Friday, October 12, 2007 02:07:00
    Re: Re: dottie v0.02
    By: Delbert to Angus McLeod on Fri Oct 12 2007 00:30:28

    Hey look, v0.03 already.
    This works pretty good now. Needs some bounds checking and stuff
    to make it less alpha, and my coding style looks like I was raised
    on ASM and BASIC, but it's a start.

    #!/usr/bin/perl -w

    # dottie.pl v0.03 - fetches sbbs list and outputs syncterm phonebook

    # "Bring in the milk..."
    use LWP::Simple;
    use HTML::TableContentParser;
    use strict;

    # Globals
    our $last_name;
    our $URL = shift;
    our %bbs_list;

    # Default values
    our %default = (
        "Port"           => "23",
        "ConnectionType" => "Modem",
    );

    # Fetch page and put all the tables into an array
    my $tables = get_tables($URL);

    # Select which table(s) we want to parse
    my $table = $tables->[0]; # Should be a command line arg

    # Loop through the rows and build hash
    foreach my $r (@{ $table->{rows} }) {
        my ($name, $addr, $port, $conn) = parse_table_row($r, $URL);
        next unless $name; # If we got nodda, we got nodda
        push @{$bbs_list{$name}}, [$addr, $port, $conn];
    }

    print_it(my @keylist = sort_it(%bbs_list));
    #        {         }           {       }
    #        This could be a       And this could be
    #        postsort filter       presort filter
    exit 0;

    # Takes in hash, returns sorted array of keys
    # Needs vars for different sort options
    sub sort_it {
        my %bbs_list = @_;
        my @keylist = (sort { lc($a) cmp lc($b) } keys %bbs_list);
        return (@keylist);
    }

    # So I guess de-duping maybe makes the most sense when
    # considered as a formatting method in the output function?
    # Still needs vars and structure for formatting methods
    sub print_it {
        foreach my $bbs (@_) {
            my $dup = undef;
            if (scalar(@{$bbs_list{$bbs}})>1) { $dup = 1 }
            foreach (@{$bbs_list{$bbs}}) {
                # I guess these can be dealt with this way
                my ( $addr, $port, $conn ) = @{$_};
                printf
                    "[%s]\n".
                    "Address=%s\n".
                    "%s".
                    "ConnectionType=%s\n\n",
                    (defined $dup ? sprintf "%s %s", $bbs, $dup++ : $bbs),
                    $addr,
                    # Port line uses "=" like the other keys; omitted for "serial"
                    ($port eq "serial" ? "" : sprintf "Port=%s\n", $port),
                    $conn;
            }
        }
    }

    # Return array of tables
    sub get_tables {
        my $URL = shift;
        # LWP::Simple's get() returns undef on failure and does not set $!
        my $page = get($URL);
        die "Uh-oh! Can NOT fetch from \"$URL\"\n" unless defined $page;
        my $tcp = HTML::TableContentParser->new;
        return $tcp->parse($page);
    }

    # Eventually specific cell parsing could be put
    # in an external table for flexibility
    sub parse_table_row {
        my ($row, $URL) = @_;
        my ($name, $addr, $port, $conn, $bbs_addr);

        # Extract cells
        # Need command line vars to say which cells
        $name = $row->{cells}[0]{data};
        return unless $name;
        $bbs_addr = $row->{cells}[4]{data};

        # List assignment leaves $name undef when the match fails,
        # instead of picking up a stale $1
        ($name) = $name =~ m{^.*href=.*">(.+)</a.*$}i;

        # One way to deal with the phone number/modem
        if ($bbs_addr =~ m{((\d\d\d).(\d\d\d).(\d\d\d\d))}) {
            $addr = $1;
            $port = "serial";
        } # what's the perlly shorthand for if/else?
        else {
            # One list assignment captures scheme, host and optional port;
            # [\w.-] drops the literal "|" the old character class allowed
            ($conn, $addr, $port) = $bbs_addr =~
                m{<a href=(.+)://([\w.-]+)(?::(\d+))?><.*}i;
        }

        if ($name) { $last_name=$name }
        return ($last_name, $addr,
            $port ? $port : $default{"Port"},
            $conn ? $conn : $default{"ConnectionType"});
    }
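
    On the "perlly shorthand for if/else" comment: the conditional operator (?:), which the return statement already uses, also works with lists. A sketch under assumptions: parse_address is a hypothetical helper, and the sample cell markup (an unquoted href) is only what the script's own regex implies.

```perl
use strict;
use warnings;

# Classify one address cell and return (conn, addr, port).
# A phone number gets port 'serial'; a link yields scheme, host and
# optional port. Anything missing comes back as undef, so the caller
# can apply its defaults.
sub parse_address {
    my ($cell) = @_;
    my ($conn, $addr, $port) =
        $cell =~ m{(\d{3}.\d{3}.\d{4})}
            ? (undef, $1, 'serial')
            : $cell =~ m{<a href=(\w+)://([\w.-]+)(?::(\d+))?>}i;
    return ($conn, $addr, $port);
}

# Illustrative cells only
for my $cell ('555-555-1234',
              '<a href=telnet://bbs.example.com:2323>bbs.example.com</a>') {
    my ($conn, $addr, $port) = parse_address($cell);
    printf "%s %s %s\n", $conn ? $conn : 'Modem', $addr, $port ? $port : 23;
}
```

    In list context a match returns its captures directly, so the second branch fills all three variables without touching $1 or $5 afterwards. Whether this reads better than the plain if/else is a matter of taste.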

    -j-



  • From Angus McLeod@VERT/ANJO to Delbert on Friday, October 12, 2007 01:50:00
    Re: Re: dottie v0.02
    By: Delbert to Angus McLeod on Thu Oct 11 2007 15:34:00

    Or maybe something with 'format STDOUT', etc?

    I might understand what you mean here with a bit more study. ;)

    perldoc -f select
    perldoc -f format
    perldoc -f write

    And of course,

    perldoc perlform

    :-)

    Although, in practice this feature probably won't help the program much.
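
    For a taste of what those perldoc pages describe, here is a minimal sketch using formline, the built-in that format and write are built on, so the result can be captured in a string. The swrite helper follows the example given in perlform; the picture and the sample data are made up.

```perl
use strict;
use warnings;

# swrite, as in the perlform documentation: like sprintf, but the first
# argument is a format picture rather than a printf-style format.
sub swrite {
    my ($picture, @args) = @_;
    $^A = "";                    # clear the format accumulator
    formline($picture, @args);   # render the picture into $^A
    my $out = $^A;
    $^A = "";
    return $out;
}

# '@<<<' is a left-justified field, '@>>>' a right-justified one;
# the field width is the number of characters in the picture.
my $picture = '@<<<<<<<<<<<<<< @<<<<<<<<<<<<<<<<<<< @>>>>' . "\n";

print swrite($picture, 'Some BBS',    'bbs.example.com', 23);
print swrite($picture, 'Dial-up BBS', '555-555-1234',    'n/a');
```

    Pictures are aimed at fixed-width columnar reports, which is probably why they add little for a Key=Value phonebook file.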

    ---
    Playing: "Midnight Sun" by "Badfinger" from the "Magic Christian Music" album.
  • From Angus McLeod@VERT/ANJO to Delbert on Friday, October 12, 2007 08:50:00
    Re: Re: dottie v0.02
    By: Delbert to Angus McLeod on Fri Oct 12 2007 00:30:00

    You could do that -- take the collected data and build a rationalized structure representing the data as you wish to print it. But I'd rather not go that road if it can be avoided.

    Is this more what you were getting at?

    print_it(my @keylist = sort_it(%bbs_list));
    #                              ^^^^^^^^^
    # Sending in whole global hash for now
    # but could be a presort filter

    Yes, pretty much.

    But let me say, you seem to be spending a lot of time on a program that already does what it needs to!

    ---
    Playing: "Heavy Load" by "Free" from the "Fire & Water" album.
  • From Delbert@VERT/DELBERTS to Angus McLeod on Saturday, October 13, 2007 02:12:00
    Re: Re: dottie v0.02
    By: Angus McLeod to Delbert on Fri Oct 12 2007 08:50:00

    But let me say, you seem to be spending a lot of time on a program that already does what it needs to!

    Not much time spent. And very educational.

    -j-


