• long lines in a file

    From Michael Preslar@1:275/112 to All on Thu Dec 15 16:15:32 2005
    I've recently learned of a small, well, snafu with Pascal...

    Some example code:

    var tf:text;
        s:string;

    begin
      assign(tf,'main.ans');
      reset(tf);
      while not eof(tf) do
      begin
        readln(tf,s);
        ansi_write_line(s);
      end;
      close(tf);
    end.

    Now here's the snafu. Say the first line in the file has 418 characters. For that matter, it could be any length, as long as it's greater than 255 chars. Pascal will read the first 255 chars, do the ansi_write_line, and then move to the second line in the file, instead of reading the rest of the characters on the first line before moving on.
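
    To see the truncation in isolation, here is a minimal sketch. It assumes Turbo Pascal or Free Pascal; the scratch file name trunc.tmp is made up for the demo, and string[255] is spelled out so the result does not depend on compiler string settings.

```pascal
program TruncDemo;
{ Sketch only: 'trunc.tmp' is a made-up scratch file name. }
var
  tf: text;
  s: string[255];   { same capacity as Turbo Pascal's plain "string" }
  i: integer;
begin
  { Build a test file: one 418-character line, then a short second line. }
  assign(tf, 'trunc.tmp');
  rewrite(tf);
  for i := 1 to 418 do write(tf, 'x');
  writeln(tf);
  writeln(tf, 'second line');
  close(tf);

  reset(tf);
  readln(tf, s);    { fills s, then skips the rest of the line }
  writeln('first readln returned ', length(s), ' characters');
  readln(tf, s);
  writeln('second readln returned: ', s);
  close(tf);
  erase(tf);
end.
```

    The first readln reports 255 characters and the second one returns "second line", so characters 256 to 418 of the first line are never seen.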

    So the question is:

    What would you guys suggest as the best way to handle such situations? blockread() then parse 255 characters at a time?
    --- SBBSecho 2.11-Win32
    * Origin: mount_z.synchro.net - Home of Lord/* (1:275/112)
  • From Sean Dennis@1:18/200 to Michael Preslar on Fri Dec 16 15:44:38 2005
    Michael,

    *** Quoting Michael Preslar from a message to All ***

    What would you guys suggest as the best way to handle such
    situations? blockread() then parse 255 characters at a time?

    ...or maybe you could write a procedure to parse the ANSI codes "on the fly" so
    that way you can just read everything in one character at a time, but parse the
    ANSI as it comes in.
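
    A rough sketch of that idea, assuming only CSI-style sequences (ESC [ ... final byte) matter. feed_char and the scratch file stream.tmp are hypothetical names, and a real routine would act on each completed sequence instead of just counting it.

```pascal
program AnsiStream;
{ Sketch only: feed_char and 'stream.tmp' are made up for illustration. }
type
  TState = (stText, stEsc, stCsi);
var
  tf: text;
  ch: char;
  state: TState;
  seq: string[80];      { parameter bytes of the sequence being collected }
  seqCount: integer;

{ Consume one character; the state survives between calls. }
procedure feed_char(c: char);
begin
  case state of
    stText:
      if c = #27 then state := stEsc
      else write(c);
    stEsc:
      if c = '[' then
      begin
        seq := '';
        state := stCsi;
      end
      else state := stText;      { not a CSI sequence: drop it }
    stCsi:
      if (c >= '@') and (c <= '~') then
      begin
        { final byte: seq + c is one complete sequence, act on it here }
        inc(seqCount);
        state := stText;
      end
      else if length(seq) < 80 then seq := seq + c;
  end;
end;

begin
  { Build a small test file containing one colour sequence. }
  assign(tf, 'stream.tmp');
  rewrite(tf);
  writeln(tf, 'plain ', #27, '[1;31m', 'red');
  close(tf);

  state := stText;
  seqCount := 0;
  reset(tf);
  while not eof(tf) do
    if eoln(tf) then
    begin
      readln(tf);      { step over the line ending }
      writeln;
    end
    else
    begin
      read(tf, ch);
      feed_char(ch);
    end;
  close(tf);
  erase(tf);
  writeln('sequences seen: ', seqCount);
end.
```

    Because the parser keeps its state between calls, a sequence cannot be cut in half no matter how long the line is. Text files are buffered by the runtime, and SetTextBuf can enlarge that buffer if per-character reads turn out to be slow.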

    I don't know much about blockread, but hope this might help somehow.

    Later,
    Sean

    // sean@outpostbbs.net | http://outpostbbs.net | ICQ: 19965647

    --- Telegard/2 v3.09.g2-sp4/mL
    * Origin: Outpost BBS - outpostbbs.darktech.org (1:18/200)
  • From Scott Little@3:712/848 to Michael Preslar on Sat Dec 17 08:27:54 2005
    [ 15 Dec 05 16:15, Michael Preslar wrote to All ]


    What would you guys suggest as the best way to handle such situations? blockread() then parse 255 characters at a time?

    I assume ansi_write_line is doing something with the ANSI strings (local emulation, e.g.), therefore you'll need to either pass the entire line in one call, or tokenise (?) the input first. Otherwise you'll end up breaking ANSI sequences into parts that won't be interpreted properly across calls.

    As long as you have all the source, it'd be easiest to modify ansi_write_line to accept a buffer instead of a string, since it probably already does all its own ANSI parsing. Rename it ansi_write_buff and make ansi_write_line a frontend (convert string to buff).
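
    One way that split might look. This is a sketch under assumptions: the real ansi_write_line is not shown in the thread, so ansi_write_buff here only echoes its bytes where the real one would parse ANSI, and whether ansi_write_line ends the line itself is a guess.

```pascal
program BuffFrontend;
{ Sketch only: the bodies are stand-ins, not the original routines. }
type
  TChars = array[0..32767] of char;   { overlay used to index the raw buffer }

{ Back end: takes any block of memory plus its length in bytes
  (up to 32768 bytes in this sketch). }
procedure ansi_write_buff(var buf; len: word);
var
  i: word;
begin
  for i := 1 to len do
    write(TChars(buf)[i - 1]);        { the real routine would parse ANSI here }
end;

{ Front end: keeps the old string interface for existing callers. }
procedure ansi_write_line(s: string);
begin
  if length(s) > 0 then
    ansi_write_buff(s[1], length(s));
  writeln;                            { assumption: the original ends the line }
end;

var
  big: array[0..417] of char;
  i: integer;
begin
  for i := 0 to 417 do big[i] := 'x';
  ansi_write_buff(big, sizeof(big));  { 418 bytes in a single call }
  writeln;
  ansi_write_line('short line');
end.
```

    Existing callers keep using ansi_write_line unchanged, while the file reader can hand over lines longer than 255 characters through ansi_write_buff.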

    So yes, you're going to have to blockread, but you'll have to loop it to return a complete buffer for each line.
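
    A sketch of such a loop, with the caveat that every name and size in it is made up for illustration: it reads 2048-byte chunks with blockread and assembles one complete line at a time, assuming lines end in LF (optionally preceded by CR) and are no longer than MaxLine.

```pascal
program LongLines;
{ Sketch only: names, sizes and 'long.tmp' are assumptions. }
const
  ChunkSize = 2048;
  MaxLine   = 8192;      { assumed upper bound; longer lines are cut off }
var
  f: file;
  chunk: array[0..ChunkSize - 1] of char;
  chunkLen, chunkPos: word;
  line: array[0..MaxLine - 1] of char;
  lineLen: word;

{ Hand out the next byte of the file, refilling the chunk when it is used up. }
function next_char(var c: char): boolean;
begin
  if chunkPos >= chunkLen then
  begin
    blockread(f, chunk, ChunkSize, chunkLen);   { chunkLen = bytes actually read }
    chunkPos := 0;
  end;
  if chunkLen = 0 then
    next_char := false
  else
  begin
    c := chunk[chunkPos];
    inc(chunkPos);
    next_char := true;
  end;
end;

{ Collect bytes up to the next LF into line; false once the file is exhausted. }
function read_long_line: boolean;
var
  c: char;
  got, done: boolean;
begin
  lineLen := 0;
  got := false;
  done := false;
  repeat
    if next_char(c) then
    begin
      got := true;
      if c = #10 then
        done := true
      else if (c <> #13) and (lineLen < MaxLine) then
      begin
        line[lineLen] := c;
        inc(lineLen);
      end;
    end
    else
      done := true;
  until done;
  read_long_line := got;
end;

var
  tf: text;
  i: integer;
begin
  { Build a test file: one 418-character line, then a short one. }
  assign(tf, 'long.tmp');
  rewrite(tf);
  for i := 1 to 418 do write(tf, 'x');
  writeln(tf);
  writeln(tf, 'second line');
  close(tf);

  assign(f, 'long.tmp');
  reset(f, 1);               { record size 1, so counts are in bytes }
  chunkLen := 0;
  chunkPos := 0;
  while read_long_line do
    writeln('line of ', lineLen, ' characters');  { hand line/lineLen to the ANSI routine here }
  close(f);
  erase(f);
end.
```

    With the test file above this reports a line of 418 characters followed by one of 11, so nothing past column 255 is lost.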


    -- Scott Little [fidonet#3:712/848 / sysgod@sysgod.org]

    --- GoldED+/W32-MINGW 1.1.5-b20051022
    * Origin: Cyberia: 100% Grade "A" mansteak baby. (3:712/848)
  • From Michael Preslar@1:275/112 to Sean Dennis on Fri Dec 16 16:46:02 2005
    Re: Re: long lines in a file
    By: Sean Dennis to Michael Preslar on Fri Dec 16 2005 03:44 pm

    What would you guys suggest as the best way to handle such
    situations? blockread() then parse 255 characters at a time?

    ...or maybe you could write a procedure to parse the ANSI codes "on the fly" that way you can just read everything in one character at a time, but parse ANSI as it comes in.

    I don't know much about blockread, but hope this might help somehow.

    Character by character would be waaaay too slow.

    I'm thinking that I'll have to:

    repeat
    blockread(f,buf,2048);
    s = copy(buf,1,255);
    ansi_write_line(s);
    until eof(f);

    or something.
    --- SBBSecho 2.11-Win32
    * Origin: mount_z.synchro.net - Home of Lord/* (1:275/112)
  • From Sean Dennis@1:18/200 to Michael Preslar on Fri Dec 16 21:09:18 2005
    Michael,

    *** Quoting Michael Preslar from a message to Sean Dennis ***

    repeat blockread(f,buf,2048); s = copy(buf,1,255); ansi_write_line(s); until eof(f);

    That sounds good and a lot faster than my suggestion. :)

    Later,
    Sean

    --- Telegard/2 v3.09.g2-sp4/mL
    * Origin: Outpost BBS - outpostbbs.darktech.org (1:18/200)
  • From Björn Felten@2:203/2 to Sean Dennis on Sat Dec 17 17:11:53 2005
    repeat blockread(f,buf,2048); s = copy(buf,1,255);
    ansi_write_line(s); until eof(f);

    That sounds good and a lot faster than my suggestion. :)

    I think it sounds really bad. Whenever a blockread of 2048 is performed, the file pointer is moved 2048 bytes ahead, meaning that his procedure will read only 255 bytes out of every 2048-byte batch. A nice way to reduce a big text file by almost 90%, but probably not what was intended.
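
    The file pointer movement can be watched directly with FilePos. A small sketch, assuming Turbo Pascal or Free Pascal; the scratch file pos.tmp and its 5000-byte size are made up for the demo.

```pascal
program FilePosDemo;
{ Sketch only: 'pos.tmp' and the 5000-byte size are assumptions. }
var
  f: file;
  buf: array[0..2047] of char;
  got: word;
  i: integer;
begin
  { Create a 5000-byte scratch file. }
  assign(f, 'pos.tmp');
  rewrite(f, 1);
  for i := 0 to 2047 do buf[i] := 'x';
  blockwrite(f, buf, 2048);
  blockwrite(f, buf, 2048);
  blockwrite(f, buf, 904);
  close(f);

  reset(f, 1);               { record size 1, so counts are in bytes }
  repeat
    { the fourth parameter receives the number of bytes actually read }
    blockread(f, buf, sizeof(buf), got);
    writeln('read ', got, ' bytes, file pointer now at ', filepos(f));
  until got = 0;
  close(f);
  erase(f);
end.
```

    Each call advances the pointer by the number of bytes it delivered, so whatever part of buf is not processed before the next blockread is gone.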

    No, I think he'll be better off with one of the procedures I've published in the SWAG. I don't remember right now the exact name of it, but I'm sure it can be easily found. Should anyone be really interested, I can dig it up and give the proper link to it...

    ---
    * Origin: news://felten.yi.org (2:203/2)
  • From Sean Dennis@1:18/200 to Björn Felten on Sat Dec 17 15:23:22 2005
    Bjorn,

    *** Quoting Björn Felten from a message to Sean Dennis ***

    I think it sounds really bad. Whenever a blockread of 2048 is

    Well, what do I know...I don't know much about big files yet. :)

    I did purchase "Advanced Turbo Pascal" though...online, for $3. It covers all sorts of good stuff (sorting, searching, stacks, queues, linked lists, binary trees, dynamic allocations, expression parsing, simulation, assembly language interfacing, efficiency, porting and debugging). A very good (and a little dry) read for me.

    No, I think he'll be better off with one of the procedures I've published in the SWAG. I don't remember right now the exact name of it, but I'm sure it can be easily found. Should anyone be really interested, I can dig it up and give the proper link to it...

    I've seen your name pop up in there from time to time...I have it here, but haven't gotten everything installed on my personal machine yet for programming.

    Later,
    Sean

    --- Telegard/2 v3.09.g2-sp4/mL
    * Origin: Outpost BBS - outpostbbs.darktech.org (1:18/200)
  • From Roelof Beverdam@2:280/5218 to Michael Preslar on Sat Dec 17 09:25:20 2005
    Hello Michael,

    What would you guys suggest as the best way to handle such
    situations? blockread() then parse 255 characters at a time?
    [cut]
    Character by character would be waaaay too slow.

    Either this or a *properly implemented* use of BlockRead might do the trick, if you are lucky...

    Basically: without further knowledge your problem is NOT solvable. The code in your initial posting implies the use of a compiler that handles strings of indefinite length. Apparently it was written for a different compiler than the Turbo Pascal (or clone) you are using.

    repeat
    blockread(f,buf,2048);
    s = copy(buf,1,255);
    ansi_write_line(s);
    until eof(f);

    Won't work this way. You read 2K of stuff and write only about an eighth of it. You "forget" even more data than your initial code would. And embedded newlines are passed to ansi_write_line(); the original code dropped the trailing newline (internal behaviour of readln()) and passed only the text of the line.

    To begin with, you would build a second loop around the copy() function and ansi_write_line() procedure, moving all 2K of data in chunks of a quarter K at a time to ansi_write_line().
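
    That second loop might be sketched like this. Assumptions: buf is an array of char, so Move is used where the post says copy(); the ansi_write below is a stand-in that only counts bytes; and slice.tmp is a made-up scratch file.

```pascal
program SliceChunks;
{ Sketch only: ansi_write is a counting stand-in, 'slice.tmp' is made up. }
var
  tf: text;
  f: file;
  buf: array[0..2047] of char;
  got, ofs, n: word;
  s: string[255];
  i: integer;
  passed: longint;

{ Stand-in for the real output routine: just counts what it is given. }
procedure ansi_write(t: string);
begin
  passed := passed + length(t);
end;

begin
  { Build a 3000-byte test file. }
  assign(tf, 'slice.tmp');
  rewrite(tf);
  for i := 1 to 3000 do write(tf, 'x');
  close(tf);

  passed := 0;
  assign(f, 'slice.tmp');
  reset(f, 1);                     { record size 1, so counts are in bytes }
  repeat
    blockread(f, buf, sizeof(buf), got);
    ofs := 0;
    while ofs < got do             { inner loop: walk the whole chunk }
    begin
      n := got - ofs;
      if n > 255 then n := 255;    { at most one short string per call }
      s[0] := chr(n);              { set the string length directly }
      move(buf[ofs], s[1], n);
      ansi_write(s);
      ofs := ofs + n;
    end;
  until got = 0;
  close(f);
  erase(f);
  writeln(passed, ' bytes handed to ansi_write');
end.
```

    All 3000 bytes reach the output routine, but the slices fall at arbitrary offsets, so line endings and ANSI sequences can land in the middle of a slice or straddle two of them.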

    If ansi_write_line() is the only procedure available to process your data, you're simply stuck! The problem cannot be solved, whatever trick you may come up with. Only another compiler and/or the use of PChar strings would solve it easily. If Turbo Pascal is the compiler you have, you might investigate the ansi_write_xxx() portion of your code. Just as Standard Pascal has both a ReadLn() and a Read() procedure to read data into a string (and Turbo Pascal conforms to this standard), you should have (and probably do have!) an ansi_write() as well as an ansi_write_line() procedure.

    Assuming this, you may modify your code to:

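    { needs AtEof, AtEoln: boolean declared next to tf and s }
    begin
      assign(tf,'main.ans');
      reset(tf);
      AtEof := eof(tf);
      while not AtEof do
      begin
        read(tf,s);              { at most 255 chars, stops before the line end }
        AtEof := eof(tf);
        AtEoln := eoln(tf);
        ansi_write(s);
        if AtEoln then
        begin
          { step over the line end, then finish the output line }
          if not AtEof then begin readln(tf); AtEof := eof(tf) end;
          ansi_write_line('')
        end
      end;
      close(tf)
    end.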

    But beware: this assumes there is no need to hand over a full line of data to ansi_write() or ansi_write_line() in a single call, and that the line may be split over several calls. If the ansi_xxx routines don't accept this, there is no solution at all.

    Except choosing a proper compiler, of course... <VBG>

    Cheers,
    Roelof Beverdam

    --- Dutchie V3.10.11
    * Origin: The Beaver's Nest (2:280/5218)