• Nodediffs & db:s

    From Jesper Sörensen@2:204/255 to Jan Vermeulen on Mon Jan 6 22:34:13 2003
    The problem is how do you know what data to add, update or delete?

    Diff-files tell you which records to add, which to replace, which
    to delete and which to leave alone.

    Well, yes, but the "records" are lines (nodes, hosts, regions, comments,
    errors etc.) in the text file nodelist, and the diff is useless if you
    don't have the original file. You don't know if the record to delete is a
    node (and its full node number if so).

    The diffs don't say "add 2:204/255..."; they say "copy (ignore) 17
    lines", "delete 3 lines", "add the following 4 lines" and so on, so
    you still need to have the original nodelist to be able to resolve
    the diff into something useful. :-(

    You are expected to work on complete records that have been
    arranged in a given order -- what are you planning to do to the data
    that you can't locate your records anymore?

    I'm importing the data into a db table and use the Fidonet address as the
    primary key (but it's also indexed on e.g. sysop name & location for
    faster searches). I'm perfectly able to locate the data, just not via the
    line number the node had in the original nodelist (which would be a pita
    to work with since it's likely to change every week).

    I'm sure it would be possible to make the table an exact copy of the
    nodelist file (with line numbers and everything) but it would be a very
    complex thing to work with and most db designers would probably start
    crying if they saw it. It's simpler to truncate the table and reload the
    entire nodelist from the updated file.
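
    The truncate-and-reload approach described above could look roughly like
    the following Python sketch using the standard sqlite3 module. The table
    name "nodes", its three columns and the function reload_nodelist are
    illustrative assumptions, not the actual schema used here. SQLite has no
    TRUNCATE statement, so an unqualified DELETE stands in for it.

```python
import sqlite3


def reload_nodelist(conn, nodes):
    """Replace the table contents with `nodes`.

    `nodes` is an iterable of (address, sysop, location) tuples.
    The schema is a hypothetical example, not the poster's real one.
    """
    # Fidonet address (e.g. '2:204/255') is the primary key.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS nodes ("
        " address TEXT PRIMARY KEY,"
        " sysop TEXT,"
        " location TEXT)"
    )
    # Secondary indexes for searches by sysop name and location.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_nodes_sysop ON nodes (sysop)")
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_nodes_location ON nodes (location)"
    )
    # The connection as a context manager commits on success and rolls
    # back on an exception, so the delete and the inserts go together.
    with conn:
        conn.execute("DELETE FROM nodes")
        conn.executemany(
            "INSERT INTO nodes (address, sysop, location) VALUES (?, ?, ?)",
            nodes,
        )
```

    Calling reload_nodelist once per weekly nodelist keeps the table keyed by
    address and never depends on line positions in the text file.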

    Do you want to scatter your data all over the place without
    creating an index file or what?

    Of course not.

    If you know of a magic way to convert the nodediffs into SQL
    insert/update/delete commands please tell me how, but if you would try
    this yourself I'm sure you'd see the problem.

    Jesper,
    yeppe@enjoy.cc
    ---
    * Origin: Singularity/2 - Swedish Internet Backbone (2:204/255)
  • From Jan Vermeulen@2:280/100 to Jesper Sörensen on Tue Jan 7 01:37:34 2003
    Quoting Jesper Sörensen on Mon 6 Jan 2003 22:34 to Jan Vermeulen:

    The problem is how do you know what data to add, update or delete?

    Diff-files tell you which records to add, which to replace, which
    to delete and which to leave alone.

    Well, yes, but the "records" are lines (nodes, hosts, regions,
    comments, errors etc.) in the text file nodelist, and the diff is
    useless if you don't have the original file. You don't know if the
    record to delete is a node (and its full node number if so).

    Have you ever done serious work on nodelists?

    If your diff says D5 you just delete those 5 lines. It is not necessary to go in savvy mode and salvage a record.

    And yes, of course, you will need to place each node between brackets as in
    <record> this is a text line </record>.

    The diffs don't say "add 2:204/255..."; they say "copy (ignore) 17
    lines", "delete 3 lines", "add the following 4 lines" and so on, so
    you still need to have the original nodelist to be able to resolve
    the diff into something useful. :-(

    You are expected to work on complete records that have been
    arranged in a given order -- what are you planning to do to the data
    that you can't locate your records anymore?

    I'm importing the data into a db table and use the Fidonet address
    as the primary key (but it's also indexed on e.g. sysop name &
    location for faster searches). I'm perfectly able to locate the
    data, just not via the line number the node had in the original
    nodelist (which would be a pita to work with since it's likely to
    change every week).

    The nodelist entries have no line numbers either. Their line number is their offset in lines from the start of the list, prolog included.

    You get a new diff each week to update last week's nodelist. As long as
    you have kept your records in order (and why shouldn't you?), you move
    them to an auxiliary base record by record, inserting the new nodes as
    per the diff telling you Axx, copying where it says Cxx and skipping
    where it says Dxx. Then kill the old file and rename the new one to the
    old one's name.
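
    For illustration, the line-based procedure just described can be sketched
    in Python as below. This is a simplified sketch: it assumes the diff is
    already split into lines and consists only of Axx, Cxx and Dxx commands
    (with the xx added lines following each A command). Header-line
    verification and checksum handling are deliberately left out, and the
    function name apply_nodediff is made up for this example.

```python
def apply_nodediff(old_lines, diff_lines):
    """Build the new nodelist from the old one plus a list of diff lines.

    Commands (simplified): 'A<n>' add the next n diff lines,
    'C<n>' copy n lines from the old list, 'D<n>' skip n old lines.
    """
    new_lines = []
    old_pos = 0   # current position in the old nodelist
    diff_pos = 0  # current position in the diff

    while diff_pos < len(diff_lines):
        command = diff_lines[diff_pos]
        diff_pos += 1
        op, count = command[0], int(command[1:])

        if op == "A":
            # The lines to add are carried in the diff itself.
            new_lines.extend(diff_lines[diff_pos:diff_pos + count])
            diff_pos += count
        elif op == "C":
            # Unchanged lines come from the old nodelist.
            new_lines.extend(old_lines[old_pos:old_pos + count])
            old_pos += count
        elif op == "D":
            # Deleted lines are simply skipped in the old nodelist.
            old_pos += count
        else:
            raise ValueError(f"unknown diff command: {command!r}")

    return new_lines
```

    Note that the D command only carries a count: which records disappear is
    known only by reading the old list at that position.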

    Easy, ain't it? And that was invented only eighteen years ago. It's just
    come of age...

    I'm sure it would be possible to make the table an exact copy of
    the nodelist file (with line numbers and everything) but it would
    be a very complex thing to work with and most db designers would
    probably start crying if they saw it. It's simpler to truncate the
    table and reload the entire nodelist from the updated file.

    You could, but you don't have to.

    Do you want to scatter your data all over the place without
    creating an index file or what?

    Of course not.

    I thought so ;-)

    If you know of a magic way to convert the nodediffs into SQL
    insert/update/delete commands please tell me how, but if you would
    try this yourself I'm sure you'd see the problem.

    I'm not going to look backwards, Jesper. All I can add here is to advise you to do some serious low-level programming. It's refreshing.

    Do you know there is even elegance in writing assembler?


    -=<[ JV ]>=-


    * Origin: The Poor Man's Workstation -- Wormerveer NL (2:280/100)
  • From Scott Little@3:712/848 to Jan Vermeulen on Tue Jan 7 19:53:50 2003
    [ 07 Jan 03 01:37, Jan Vermeulen wrote to Jesper Sörensen ]

    You get a new diff each week to update last week's nodelist. As
    long as you have kept your records in order (and why shouldn't you?),
    you move them to an auxiliary base record by record, inserting the new
    nodes as per the diff telling you Axx, copying where it says Cxx and
    skipping where it says Dxx. Then kill the old file and rename the new
    one to the old one's name.

    That's extra complication that wouldn't be necessary if updates specified
    the node. Line-based updates also mean taking the table offline (totally
    unacceptable) to do a complete copy/rename, using transactions (may not
    be available), or a post-diff comparison of the old and new tables to
    determine what's changed and update the live table accordingly
    (*shudder*).
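
    The post-diff comparison mentioned above can be sketched as a comparison
    of two dictionaries keyed by node address. This is only a minimal
    illustration: the function name plan_changes and the idea of holding both
    versions in memory as dicts are assumptions made for the example, not
    something any existing tool is known to do.

```python
def plan_changes(old, new):
    """Compare two {address: record} mappings.

    Returns (inserts, updates, deletes):
      inserts - records whose address exists only in `new`
      updates - records whose address is in both but whose data differs
      deletes - addresses that exist only in `old`
    """
    inserts = {addr: rec for addr, rec in new.items() if addr not in old}
    updates = {
        addr: rec
        for addr, rec in new.items()
        if addr in old and old[addr] != rec
    }
    deletes = [addr for addr in old if addr not in new]
    return inserts, updates, deletes
```

    Each of the three results maps directly onto one kind of SQL statement
    (insert, update, delete) against the live table.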


    -- Scott Little [fidonet#3:712/848 / sysgod@sysgod.org]

    --- FMail/Win32 1.60+
    * Origin: Cyberia: All your msgbase are belong to us! (3:712/848)
  • From Jesper Sörensen@2:204/255 to Jan Vermeulen on Tue Jan 7 12:44:34 2003
    Have you ever done serious work on nodelists?

    That depends on what you mean by serious work. I've written several
    nodelist processing tools, including a flag checker, a CC utility (to
    "mail bomb" all downlinks of a *C) and some half-working MakeNL clones in
    both C++ and Java (maybe I have some Pascal code left too?). The most
    recent work I did was to write some simple scripts in Awk and Perl to
    convert the nodelist into SQL and XML so I think I know what I need to
    know about nodelists. Do I pass?

    The nodelist entries have no line numbers either. Their line
    number is their offset in lines from the start of the list, prolog included.

    That's what I meant.

    You get a new diff each week to update last week's nodelist. As
    long as you have kept your records in order (and why shouldn't you?),
    you move them to an auxiliary base record by record, inserting the new
    nodes as per the diff telling you Axx, copying where it says Cxx and
    skipping where it says Dxx. Then kill the old file and rename the new
    one to the old one's name.

    It would be interesting to see you actually implementing something like that with SQL. Do that before you tell me how easy it is.

    Easy, ain't it? And that was invented only eighteen years
    ago. It's just come of age...

    I'm not saying that it's difficult if you're only working with files, but I'm not.

    If you know of a magic way to convert the nodediffs into SQL
    insert/update/delete commands please tell me how, but if you would
    try this yourself I'm sure you'd see the problem.

    I'm not going to look backwards, Jesper. All I can add here is to advise you to do some serious low-level programming. It's refreshing.

    Low-level processing of diffs is super simple but that's not what I want
    to do. I want to update the nodelist in my db which I'm using from my
    Fidonet client right now.

    Do you know there is even elegance in writing assembler?

    Almost all kinds of programming have their elegance (well, maybe not VB ;-). I've written some assembler (for Motorola 68k and Intel 8088 processors) but that was 10-15 years ago. It has its charm but it's not very suitable for anything "bigger" if you ask me.

    The fact that I nowadays mainly use Java doesn't mean I don't know
    anything about lower level languages. I use Java because I like it and
    because it's very suitable for the kind of software I'm currently
    developing, not because it's the only language I know.

    Jesper,
    yeppe@enjoy.cc
    ---
    * Origin: Singularity/2 - Swedish Internet Backbone (2:204/255)
  • From Jan Vermeulen@2:280/100 to Scott Little on Tue Jan 7 12:48:40 2003
    Quoting Scott Little on Tue 7 Jan 2003 19:53 to Jan Vermeulen:

    You get a new diff each week to update last week's nodelist. As
    long as you have kept your records in order (and why shouldn't you?),
    you move them to an auxiliary base record by record, inserting the new
    nodes as per the diff telling you Axx, copying where it says Cxx and
    skipping where it says Dxx. Then kill the old file and rename the new
    one to the old one's name.

    That's extra complication that wouldn't be necessary if updates
    specified the node. Line-based updates also mean taking the table
    offline (totally unacceptable) to do a complete copy/rename, using
    transactions (may not be available), or a post-diff comparison of
    the old and new tables to determine what's changed and update the
    live table accordingly (*shudder*).

    I've never said you should do that on line. You can prepare all your
    files in a separate operation and then swap them. Should take no more
    than a few seconds on the kind of system you apparently use...
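
    As a rough file-level illustration of the prepare-then-swap idea, the
    Python sketch below writes the new list to a temporary file and then
    renames it over the old one. The function name swap_in_new_file is
    hypothetical. The sketch relies on os.replace, which is atomic on POSIX
    systems when both paths are on the same filesystem; it says nothing about
    how a database table would be swapped.

```python
import os
import tempfile


def swap_in_new_file(path, new_lines):
    """Write new_lines to a temp file, then rename it over `path`."""
    # Create the temp file next to the target so both are on the same
    # filesystem and the final rename does not have to copy data.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.writelines(line + "\n" for line in new_lines)
        # Readers see either the complete old file or the complete new one.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```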


    -=<[ JV ]>=-


    * Origin: The Poor Man's Workstation -- Wormerveer NL (2:280/100)
  • From Scott Little@3:712/848 to Jan Vermeulen on Wed Jan 8 04:31:38 2003
    [ 07 Jan 03 12:48, Jan Vermeulen wrote to Scott Little ]

    I've never said you should do that on line. You can prepare all
    your files in a separate operation and then swap them. Should take no
    more than a few seconds on the kind of system you apparently use...

    Downtime of any kind, even a few seconds, is totally unacceptable. That's just
    a complete cop-out and poor design.


    -- Scott Little [fidonet#3:712/848 / sysgod@sysgod.org]

    --- FMail/Win32 1.60+
    * Origin: Cyberia: All your msgbase are belong to us! (3:712/848)
  • From Jan Vermeulen@2:280/100 to Jesper Sörensen on Tue Jan 7 21:47:16 2003
    Quoting Jesper Sörensen on Tue 7 Jan 2003 12:44 to Jan Vermeulen:

    Do I pass?

    You're not entirely hopeless ;-)

    It would be interesting to see you actually implementing something
    like that with SQL. Do that before you tell me how easy it is.

    You are the one wanting to unsimplify the problem, not me.

    Easy, ain't it? And that was invented only eighteen years
    ago. It's just come of age...

    I'm not saying that it's difficult if you're only working with
    files, but I'm not.

    I am, and so far I'm satisfied with what I get.

    I'm putting an end to this thread now, Jesper. What you and the others
    want to do is far too early. First of all we'll need to eliminate a small
    number of bugs in the current nodelist operation and I intend to work on
    it from now on.

    It's been fun so far and I have no doubt that we'll meet again later.

    -=<[ JV ]>=-


    * Origin: The Poor Man's Workstation -- Wormerveer NL (2:280/100)
  • From Jan Vermeulen@2:280/100 to Scott Little on Tue Jan 7 21:55:00 2003
    Quoting Scott Little on Wed 8 Jan 2003 4:31 to Jan Vermeulen:

    I've never said you should do that on line. You can prepare all
    your files in a separate operation and then swap them. Should take no
    more than a few seconds on the kind of system you apparently use...

    Downtime of any kind, even a few seconds, is totally unacceptable.
    That's just a complete cop-out and poor design.

    I'm going back to the immediate problems. I've already spoiled too much
    time in this thread.

    TTUL.

    -=<[ JV ]>=-


    * Origin: The Poor Man's Workstation -- Wormerveer NL (2:280/100)