• Committing file changes

    From MIKE RUSKAI@1:3603/140 to CORIDON HENSHAW on Sun Aug 13 10:49:00 2000
    Some senseless babbling from Coridon Henshaw to All
    on 08-11-00 23:38 about Committing file changes...

    What's the proper way to 'checkpoint' an open file so as to ensure
    that the file's control structures are consistent on disk? I would
    have thought that calling fflush() after every file write would be
    sufficient, but a recent trap proved that calling fflush() after file
    writes was no protection against CHKDSK truncating the file well
    before the last write. I suppose I could close and reopen the file
    after every update, but I was hoping to find a more elegant solution.
    Any ideas?

    I had thought that fflush() would call DosResetBuffer(), but it would seem
    that at least some compilers only flush the CRT's buffers.

    So, what you'd need to do is get an OS/2 handle to the file, and use DosResetBuffer() directly. If you can't get a handle, you can also use
    that API to flush all open files for the current process.
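
    As a rough sketch of that in C (assuming the CRT's fileno() maps straight
    to the underlying OS/2 file handle, which is worth verifying for your
    particular compiler):

        #define INCL_DOSFILEMGR
        #define INCL_DOSERRORS
        #include <os2.h>
        #include <stdio.h>

        /* Flush the CRT's buffers, then ask the kernel to commit the file. */
        int checkpoint(FILE *fp)
        {
            APIRET rc;

            if (fflush(fp) != 0)                     /* CRT buffers -> kernel */
                return -1;
            rc = DosResetBuffer((HFILE)fileno(fp));  /* kernel buffers -> disk */
            return (rc == NO_ERROR) ? 0 : -1;
        }

    If you can't recover the handle at all, flushing every file the current
    process has open is done by passing a handle value of -1, if I remember
    the toolkit docs correctly.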

    That should solve your problem.

    Mike Ruskai
    thannymeister@yahoo.com


    ... Any problem can be solved by shooting the right person.

    ___ Blue Wave/QWK v2.20
    --- Platinum Xpress/Win/Wildcat5! v3.0pr3
    * Origin: Get all your fido mail here.. www.docsplace.org (1:3603/140)
  • From Herbert Rosenau@2:2476/493 to Coridon Henshaw on Sun Aug 13 12:41:35 2000
    Am 11.08.00 23:38 schrieb Coridon Henshaw

    What's the proper way to 'checkpoint' an open file so as to ensure that the
    file's control structures are consistent on disk? I would have thought that
    calling fflush() after every file write would be sufficient, but a recent
    trap proved that calling fflush() after file writes was no protection
    against CHKDSK truncating the file well before the last write. I suppose I
    could close and reopen the file after every update, but I was hoping to
    find a more elegant solution. Any ideas?

    Don't use the C runtime! Use DosOpen(....OPEN_FLAGS_WRITE_THROUGH....) and other Dos... APIs instead.

    The C runtime can't handle OS/2 specific flags.

    fflush() does nothing more than flush the runtime buffers - not the filesystem/device ones. This means the runtime doesn't always write directly to disk. And even when it does (e.g. on fflush()), the driver itself doesn't write straight to disk, because the filesystem cache (cache.exe/diskcache) holds the data.

    If a file is closed, the filesystem may delay the last write until the disk has been idle for a time.

    The only way YOU can guarantee that the file is written directly to disk is to use DosOpen and set that flag. This disables the cache for the given file, so every DosWrite goes directly to disk.
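
    A bare-bones sketch of such a DosOpen() call (the file name and the
    share/action flags here are just one reasonable choice, not gospel):

        #define INCL_DOSFILEMGR
        #define INCL_DOSERRORS
        #include <os2.h>
        #include <string.h>

        /* Open with write-through so every DosWrite() goes straight to disk. */
        int write_one_record(void)
        {
            HFILE  hf;
            ULONG  ulAction, cbWritten;
            char   rec[] = "one committed record\r\n";
            APIRET rc;

            rc = DosOpen((PSZ)"newsdb.dat", &hf, &ulAction,
                         0L,                        /* initial size         */
                         FILE_NORMAL,               /* attributes           */
                         OPEN_ACTION_OPEN_IF_EXISTS | OPEN_ACTION_CREATE_IF_NEW,
                         OPEN_ACCESS_READWRITE | OPEN_SHARE_DENYWRITE |
                         OPEN_FLAGS_WRITE_THROUGH,  /* bypass the FS cache  */
                         NULL);
            if (rc != NO_ERROR)
                return -1;

            rc = DosWrite(hf, rec, (ULONG)strlen(rec), &cbWritten);
            DosClose(hf);
            return (rc == NO_ERROR) ? 0 : -1;
        }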

    But absolutely nothing can assure you that the file is clean if an improper reboot
    occurs. The only thing you can do is keep the window small - AND make a clean shutdown.

    --- Sqed/32 1.15/development 1122:
    * Origin: Lieber arm und gesund, als reich und beim Bund (2:2476/493)
  • From mark lewis@1:3634/12 to Coridon Henshaw on Sun Aug 13 03:05:54 2000
    What's the proper way to 'checkpoint' an open file so as to
    ensure that the file's control structures are consistent on
    disk? I would have thought that calling fflush() after every
    file write would be sufficient,

    is there a COMMIT instruction?

    but a recent trap proved that
    calling fflush() after file writes was no protection against
    CHKDSK truncating the file well before the last write. I
    suppose I could close and reopen the file after every update,
    but I was hoping to find a more elegant solution. Any ideas?

    sounds like it was caught up in the cache subsystem and not written to disk in time...

    )\/(ark


    * Origin: (1:3634/12)
  • From Coridon Henshaw@1:250/820 to All on Fri Aug 11 16:38:06 2000
    What's the proper way to 'checkpoint' an open file so as to ensure that the file's control structures are consistent on disk? I would have thought that calling fflush() after every file write would be sufficient, but a recent trap proved that calling fflush() after file writes was no protection against CHKDSK
    truncating the file well before the last write. I suppose I could close and reopen the file after every update, but I was hoping to find a more elegant solution. Any ideas?

    --- GoldED/2 3.0.1
    * Origin: Life sucks and then you croak. (1:250/820)
  • From David Noon@2:257/609.5 to Coridon Henshaw on Mon Aug 14 13:23:44 2000
    Hi Coridon,

    Replying to a message of Coridon Henshaw to All:

    What's the proper way to 'checkpoint' an open file so as to ensure
    that the file's control structures are consistent on disk? I would
    have thought that calling fflush() after every file write would be sufficient, but a recent trap proved that calling fflush() after file writes was no protection against CHKDSK truncating the file well
    before the last write. I suppose I could close and reopen the file
    after every update, but I was hoping to find a more elegant solution.

    How low do you want to go?

    Firstly, you should not be doing buffered I/O if your updates must be committed
    immediately, so you should not use fopen() and fwrite() without a setbuf() call
    to suppress buffer allocation. Better yet, you should consider using open() and
    write() instead, and use the UNIX-like unbuffered I/O routines. If you want to be a real fundamentalist, you should use DosOpen() and DosWrite() without risking the CRTL tampering with your data flow.
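
    For illustration only - the file name is made up, and note that this
    removes the CRT's buffering, not the filesystem cache:

        #include <stdio.h>
        #include <fcntl.h>
        #include <io.h>     /* open()/write()/close() on the OS/2 compilers */

        void unbuffered_demo(void)
        {
            /* Option 1: keep stdio but suppress its buffer. */
            FILE *fp = fopen("articles.db", "r+b");
            if (fp != NULL)
            {
                setbuf(fp, NULL);   /* every fwrite() now reaches the kernel */
                fclose(fp);
            }

            /* Option 2: skip stdio and use the low-level routines. */
            {
                int fd = open("articles.db", O_RDWR | O_BINARY);
                if (fd != -1)
                {
                    write(fd, "record", 6);   /* no CRT buffering at all */
                    close(fd);
                }
            }
        }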

    Moreover, if your data resources are critically important then you should be handling any traps that occur in your program and cleaning up the critical data
    resources in an orderly manner. This is far and away the most professional approach to the situation. About the only things you can't handle are kernel level traps and power outages.
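
    A minimal sketch of the idea, using plain ANSI signal handling (a real OS/2
    trap handler would go through the exception management APIs, and calling
    stdio from inside a handler is a simplification):

        #include <signal.h>
        #include <stdio.h>
        #include <stdlib.h>

        static FILE *g_db;                 /* the long-lived database file */

        static void bail_out(int sig)
        {
            (void)sig;
            if (g_db != NULL)
                fclose(g_db);              /* formal close: buffers flushed and
                                              the directory entry updated     */
            exit(EXIT_FAILURE);
        }

        int main(void)
        {
            g_db = fopen("articles.db", "r+b");

            signal(SIGSEGV, bail_out);     /* application-level traps */
            signal(SIGFPE,  bail_out);
            signal(SIGTERM, bail_out);

            /* ... normal processing ... */
            return 0;
        }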

    In your situation, I would have used the second facility before considering any
    intermediate commits.

    Another consideration is choice of language. The more robust languages have RTLs that perform a formal close on all files, even when the application code fails to handle an error. This is because their RTLs catch all unhandled application errors, and they keep track of all files opened through the RTL. Hence, either PL/I or COBOL would have been a far better choice of language than C/C++, if you are not rolling your own trap handler. The ultimate moral is: code your own trap handler, or use a language that provides one built-in.

    Regards

    Dave
    <Team PL/I>

    --- FleetStreet 1.25.1
    * Origin: My other computer is an IBM S/390 (2:257/609.5)
  • From Coridon Henshaw@1:250/820 to Herbert Rosenau on Mon Aug 14 05:54:14 2000
    On Sunday August 13 2000 at 19:41, Herbert Rosenau wrote to Coridon Henshaw:

    Don't use the C runtime! Use DosOpen(....OPEN_FLAGS_WRITE_THROUGH....) and other Dos... APIs instead.

    Well, erm, yes, but that would mean re-inventing much of the C filesystem RTL on every platform I intend to port my project to. Following Mike Billow's suggestion to call DosResetBuffer (albeit through a small wrapper) is far better suited to the problem at hand.
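
    Roughly the sort of wrapper I have in mind (sketch only - the __OS2__ guard
    is whatever your compiler defines for OS/2 builds, and other builds would
    fall back to fsync()):

        #ifdef __OS2__
          #define INCL_DOSFILEMGR
          #define INCL_DOSERRORS
          #include <os2.h>
        #else
          #include <unistd.h>
        #endif
        #include <stdio.h>

        /* Commit everything written to fp so far, as far as the platform allows. */
        int db_commit(FILE *fp)
        {
            if (fflush(fp) != 0)                /* CRT buffers first */
                return -1;
        #ifdef __OS2__
            return (DosResetBuffer((HFILE)fileno(fp)) == NO_ERROR) ? 0 : -1;
        #else
            return fsync(fileno(fp));           /* POSIX fallback */
        #endif
        }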

    --- GoldED/2 3.0.1
    * Origin: Life sucks and then you croak. (1:250/820)
  • From Coridon Henshaw@1:250/820 to MIKE RUSKAI on Mon Aug 14 07:19:02 2000
    On Sunday August 13 2000 at 17:49, MIKE RUSKAI wrote to CORIDON HENSHAW:

    So, what you'd need to do is get an OS/2 handle to the file, and use DosResetBuffer() directly. If you can't get a handle, you can also
    use that API to flush all open files for the current process.

    This sounds like it'd work, although I'm not about to push the Little Red Button to test the fix. :-)

    Thanks.

    --- GoldED/2 3.0.1
    * Origin: Life sucks and then you croak. (1:250/820)
  • From Coridon Henshaw@1:250/820 to David Noon on Mon Aug 14 16:33:20 2000
    On Monday August 14 2000 at 20:23, David Noon wrote to Coridon Henshaw:

    How low do you want to go?

    I'm building an open-source databasing offline Usenet news system, basically along the lines of standard Fidonet message tossers and readers, except designed from the ground up for Usenet news. As I intend the system to be portable, I'd like to keep the number of platform-specific API calls to an absolute minimum.

    Incidentally, if anyone would like to take a look at an alpha copy of NewsDB, or would like to contribute to the project, drop me a message.

    Firstly, you should not be doing buffered I/O if your updates must be
    committed immediately, so you should not use fopen() and fwrite() without a
    setbuf() call to suppress buffer allocation. Better yet, you should consider
    using open() and write() instead, and use the UNIX-like unbuffered I/O
    routines.

    I'm concerned that disabling buffering entirely is going to hurt performance very badly as my application does lots of short (4 to 256 byte) I/O calls. Relying on the disk cache to handle this kind of load seems a bit wasteful.

    Moreover, if your data resources are critically important then you should be
    handling any traps that occur in your program and cleaning up the critical
    data resources in an orderly manner. This is far and away the most
    professional approach to the situation. About the only things you can't
    handle are kernel level traps and power outages.

    The problem I ran into was that the kernel trapped (for reasons unrelated to this project) a few hours after I wrote an article into the article database. Since the database was still open (I leave the article reader running 24x7), the
    file system structures were inconsistent enough that CHKDSK truncated the database well before its proper end point. As you say, catching exceptions wouldn't help much here.

    My database format and engine implementations are robust enough to cope with applications dying unexpectedly without finishing write operations; they're not
    robust enough to handle boot-up CHKDSK removing 80Kb of data from the end of a 100Kb file.

    In your situation, I would have used the second facility before considering any intermediate commits.

    It's not intermediate commits I need: what I need is some way to flush out write operations made to files which might be open for days or weeks at a time.

    --- GoldED/2 3.0.1
    * Origin: Life sucks and then you croak. (1:250/820)
  • From David Noon@2:257/609.5 to Coridon Henshaw on Sun Aug 20 01:00:14 2000
    Hi Coridon,

    Replying to a message of Coridon Henshaw to David Noon:

    I'm building an open-source databasing offline Usenet news system, basically along the lines of standard Fidonet message tossers and
    readers, except designed from the ground up for Usenet news. As I
    intend the system to be portable, I'd like to keep the number of platform-specific API calls to an absolute minimum.

    That poses some difficulties. A combination of safety, performance and platform-independence is a big ask. I would tend to compromise that last one before I compromised the first two.

    Firstly, you should not be doing buffered I/O if your updates must be
    committed immediately, so you should not use fopen() and fwrite()
    without a setbuf() call to suppress buffer allocation. Better yet,
    you should consider using open() and write() instead, and use the
    UNIX-like unbuffered I/O routines.

    I'm concerned that disabling buffering entirely is going to hurt performance very badly as my application does lots of short (4 to 256 byte) IO calls. Relying on the disk cache to handle this kind of
    load seems a bit wasteful.

    Since 4-to-256 bytes does not constitute a typical Usenet article, those would not be your logical syncpoints. You should be physically writing the data to disk at your syncpoints and only at your syncpoints.

    Moreover, if your data resources are critically important then you
    should be handling any traps that occur in your program and cleaning
    up the critical data resources in an orderly manner. This is far and
    away the most professional approach to the situation. About the only
    things you can't handle are kernel level traps and power outages.

    The problem I ran into was that the kernel trapped (for reasons
    unrelated to this project) a few hours after I wrote an article into
    the article database. Since the database was still open (I leave the article reader running 24x7), the file system structures were
    inconsistent enough that CHKDSK truncated the database well before
    its proper end point. As you say, catching exceptions wouldn't help
    much here.

    The flip side is that kernel traps are far less frequent than application traps, especially during development of the application. If your data integrity
    is critical you should not only be handling any exceptions that arise, but you should be rolling back to your most recent syncpoint when an error does arise.

    My database format and engine implementations are robust enough to
    cope with applications dying unexpectedly without finishing write operations; they're not robust enough to handle boot-up CHKDSK
    removing 80Kb of data from the end of a 100Kb file.

    So you do have a syncpoint architecture, then?

    In your situation, I would have used the second facility before
    considering any intermediate commits.

    It's not intermediate commits I need: what I need is some way to flush
    out write operations made to files which might be open for days or
    weeks at a time.

    That's what an intermediate commit is.

    The way industrial-strength database management systems work [since at least the days of IMS/360, over 30 years ago] is that an application would have defined within it points in its execution where a logical unit of work was complete and the state of the data on disk should be synchronized with the state of the data in memory; this is how the term "syncpoint" arose, and the processing between syncpoints became known as a transaction. The process of writing the changes in data to disk became known as committing the changes. The SQL statement that performs this operation under DB2, Oracle, Sybase and other RDBMSs is COMMIT.

    These RDBMSs also have another statement, coded as ROLLBACK. This backs out a partially complete unit of work when an error condition has arisen. The upshot is that the content of the database on disk can be assured to conform to the data model the application is supposed to support. It does not mean that every byte of input has been captured; it means, instead, that the data structures on
    disk are consistent with some design.

    This seems to me to be the type of activity you really want to perform. One
    of your problems is that your input stream is not persistent, as it would be
    a socket connected to an NNTP server [if I read your design correctly, and
    assume you are coding from the ground up]. This means that you need to be
    able to restart a failed instance of the application, resuming from its most
    recent successful syncpoint. The usual method to deal with this is to use a
    log or journal file that keeps track of "in flight" transactions; the
    journal is where your I/O remains unbuffered. If your NNTP server allows you
    to re-fetch articles -- and most do -- you can keep your journal in RAM or
    on a RAMDISK; this prevents performance hits for doing short I/O's.
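
    A sketch of what one journal write could look like (the record format is
    invented for the example, and db_commit() stands for whatever flush call
    you settle on, e.g. the fflush()-plus-DosResetBuffer() wrapper discussed
    elsewhere in this thread):

        #include <stdio.h>
        #include <string.h>

        extern int db_commit(FILE *fp);   /* whatever flush call you settle on */

        /* Append one "in flight" record to the journal and force it out
           before the main database is touched. */
        int journal_begin(FILE *jrn, const char *msgid, long db_offset)
        {
            char rec[256];

            sprintf(rec, "BEGIN %.200s @%ld\n", msgid, db_offset);
            if (fwrite(rec, 1, strlen(rec), jrn) != strlen(rec))
                return -1;
            return db_commit(jrn);   /* journal hits the disk before the DB */
        }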

    This design and implementation seem like a lot of work, and I suppose they are.
    But some old timers were doing this on machines with only 128KiB of RAM when I was in high school, so a modern PC should handle it easily. To save yourself a lot of coding, you might care to use a commercial DBMS; a copy of DB2 UDB Personal Developer Edition can be had free for the download, or on CD for the price of the medium and shipping. Start at:
    http://www.software.ibm.com/data/db2/
    and follow the links to the download areas, or ask Indelible Blue about CD prices.

    Using a multi-platform commercial product will provide you with platform independence, as well as safety. It is the simplest and most robust approach unless you are prepared either to do a lot of coding or compromise on the safety of your application.

    Regards

    Dave
    <Team PL/I>

    --- FleetStreet 1.25.1
    * Origin: My other computer is an IBM S/390 (2:257/609.5)
  • From David Noon@2:257/609.5 to George White on Sun Aug 20 01:58:00 2000
    Hi George,

    Replying to a message of George White to Coridon Henshaw:

    It's not intermediate commits I need: what I need is some way to
    flush out write operations made to files which might be open for
    days or weeks at a time

    The only reliable way I know of is _NOT_ to keep the files open but to open and close them as needed. It is the _only_ way I know which is guaranteed to update the directory information (Inode under *NIX) so
    that a chkdsk won't cause you that sort of grief. In a similar
    situation I ended up opening and closing the file during normal
    operation to ensure the on-disk information and structures were
    updated. Originally I opened the file on start-up and kept it open.

    This is true when one is keeping things simple, such as using sequential file structures.

    A genuine database [and that is what Coridon claims he is coding] does not restrict itself to simple file structures. The usual approach is to allocate and pre-format a suitably large area of disk, known as a tablespace in DB2, and
    then maintain database-specific structural data within that. The pre-format operation finishes by closing the physical file, thus ensuring the underlying file system has recorded the number and size of all disk extents allocated to the file. The DBMS is then free to "suballocate" the disk space as and how it sees fit. It also takes on the responsibility to ensure the consistency of the database's content.
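
    Sketched in OS/2 terms (the size and file name are arbitrary), the
    pre-format step might look like:

        #define INCL_DOSFILEMGR
        #define INCL_DOSERRORS
        #include <os2.h>

        /* Pre-format step: reserve the whole area up front and close the file
           so the filesystem records its extents; the engine then reopens it
           and suballocates the space itself. */
        int preformat_tablespace(void)
        {
            HFILE  hf;
            ULONG  ulAction;
            APIRET rc;

            rc = DosOpen((PSZ)"newsdb.tbs", &hf, &ulAction,
                         0L, FILE_NORMAL,
                         OPEN_ACTION_CREATE_IF_NEW | OPEN_ACTION_OPEN_IF_EXISTS,
                         OPEN_ACCESS_READWRITE | OPEN_SHARE_DENYREADWRITE,
                         NULL);
            if (rc != NO_ERROR)
                return -1;

            rc = DosSetFileSize(hf, 16UL * 1024UL * 1024UL);  /* 16 MB area */
            DosClose(hf);             /* commits the size/extent information */
            return (rc == NO_ERROR) ? 0 : -1;
        }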

    We will see how Coridon implements such a database system.

    Regards

    Dave
    <Team PL/I>

    --- FleetStreet 1.25.1
    * Origin: My other computer is an IBM S/390 (2:257/609.5)
  • From George White@2:257/609.6 to Coridon Henshaw on Sat Aug 19 14:59:07 2000
    Hi Coridon,

    On 14-Aug-00, Coridon Henshaw wrote to David Noon:

    <snip>

    Moreover, if your data resources are critically important then
    you should be handling any traps that occur in your program and
    cleaning up the critical data resources in an orderly manner.
    This is far and away the most professional approach to the
    situation. About the only things you can't handle are kernel
    level traps and power outages.

    The problem I ran into was that the kernel trapped (for reasons
    unrelated to this project) a few hours after I wrote an article
    into the article database. Since the database was still open (I
    leave the article reader running 24x7), the file system structures
    were inconsistent enough that CHKDSK truncated the database well
    before its proper end point. As you say, catching exceptions
    wouldn't help much here

    My database format and engine implementations are robust enough to
    cope with applications dying unexpectedly without finishing write operations; they're not robust enough to handle boot-up CHKDSK
    removing 80Kb of data from the end of a 100Kb file

    In your situation, I would have used the second facility before
    considering any intermediate commits.

    It's not intermediate commits I need: what I need is some way to
    flush out write operations made to files which might be open for
    days or weeks at a time

    The only reliable way I know of is _NOT_ to keep the files open but to
    open and close them as needed. It is the _only_ way I know which is
    guaranteed to update the directory information (Inode under *NIX) so
    that a chkdsk won't cause you that sort of grief. In a similar
    situation I ended up opening and closing the file during normal
    operation to ensure the on-disk information and structures were
    updated. Originally I opened the file on start-up and kept it open.
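
    In code, the pattern is simply this (the file name and record format are
    placeholders):

        #include <stdio.h>

        /* Open, update, close on every operation, so the on-disk directory
           information is never stale for long. */
        int append_record(const char *rec, size_t len)
        {
            FILE  *fp = fopen("articles.db", "ab");
            size_t written;

            if (fp == NULL)
                return -1;
            written = fwrite(rec, 1, len, fp);
            if (fclose(fp) != 0 || written != len)
                return -1;
            return 0;
        }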


    George

    --- Terminate 5.00/Pro
    * Origin: A country point under OS/2 (2:257/609.6)
  • From George White@2:257/609.6 to David Noon on Mon Aug 21 01:09:07 2000
    Hi David,

    On 20-Aug-00, David Noon wrote to George White:

    Replying to a message of George White to Coridon Henshaw:

    It's not intermediate commits I need: what I need is some way to
    flush out write operations made to files which might be open for
    days or weeks at a time

    The only reliable way I know of is _NOT_ to keep the files open
    but to open and close them as needed. It is the _only_ way I know
    which is guaranteed to update the directory information (Inode
    under *NIX) so that a chkdsk won't cause you that sort of grief.
    In a similar situation I ended up opening and closing the file
    during normal operation to ensure the on-disk information and
    structures were updated. Originally I opened the file on start-up
    and kept it open.

    This is true when one is keeping things simple, such as using
    sequential file structures

    Which is how Coridon appears to be doing things at present.

    A genuine database [and that is what Coridon claims he is coding]
    does not restrict itself to simple file structures. The usual
    approach is to allocate and pre-format a suitably large area of
    disk, known as a tablespace in DB2, and then maintain
    database-specific structural data within that. The pre-format
    operation finishes by closing the physical file, thus ensuring the underlying file system has recorded the number and size of all
    disk extents allocated to the file. The DBMS is then free to
    "suballocate" the disk space as and how it sees fit. It also takes
    on the responsibility to ensure the consistency of the database's
    content

    From Coridon's description of his problem after the kernel trap, he is
    not working that way, but adding variable length records to the file.
    That of course means that the normal file operations can leave the
    file in an inconsistent state. Certainly pre-allocating the data space
    and working within it means that the file should never have to be
    closed. In my experience some of the file caching on the PC platform
    does not seem to handle the situation where a particular part of the
    data structures (sector or cluster in the underlying file system) is
    repeatedly written to and read from with the file kept open; opening
    and closing the file seems to overcome the problem. I've never put in
    the work to confirm this suspicion, just found a way to get reliable
    operation and got on with coding other things - I didn't have the
    time when it arose, and now that the project is history I don't have
    any inclination to look into it...

    We will see how Coridon implements such a database system.

    Like you, I'm watching with interest.

    George

    --- Terminate 5.00/Pro
    * Origin: A country point under OS/2 (2:257/609.6)
  • From Coridon Henshaw@1:250/820 to David Noon on Mon Aug 21 17:09:54 2000
    On Sunday August 20 2000 at 08:00, David Noon wrote to Coridon Henshaw:

    Since 4-to-256 bytes does not constitute a typical Usenet article, those would not be your logical syncpoints. You should be physically writing the data to disk at your syncpoints and only at your syncpoints.

    I break up articles into 251-byte chunks and write the chunks as a linked
    list. Since the database will reuse article chunks which have been freed as
    a result of article expiry, the article linked lists need not be sequential.
    As such, when the DB engine writes an article, it writes 251 bytes, reads
    and updates the five-byte control structure, then seeks to the next block.
    This process continues until the entire article is written. It's not really
    possible to break up these writes without giving up the linked list
    structure, and with it, either the ability to rapidly grow the database, or
    the ability of the DB to reuse existing space as articles are expired.
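
    To give a rough idea of the layout (the split of the five control bytes
    shown here is only illustrative, not the actual NewsDB format):

        #include <stdio.h>

        #define CHUNK_DATA 251

        struct chunk {
            unsigned char data[CHUNK_DATA]; /* 251 bytes of article text      */
            unsigned char used;             /* data bytes valid in this chunk */
            unsigned char next[4];          /* block number of the next chunk,
                                               0 = end of the article         */
        };                                  /* 256 bytes on disk              */

        /* Write one chunk at the given block number. */
        int write_chunk(FILE *db, unsigned long blockno, const struct chunk *c)
        {
            if (fseek(db, (long)(blockno * sizeof(struct chunk)), SEEK_SET) != 0)
                return -1;
            return (fwrite(c, sizeof(struct chunk), 1, db) == 1) ? 0 : -1;
        }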

    My database format and engine implementations are robust enough to
    cope with applications dying unexpectedly without finishing write
    operations; they're not robust enough to handle boot-up CHKDSK
    removing 80Kb of data from the end of a 100Kb file.

    So you do have a syncpoint architecture, then?

    <snip>

    While I appreciate your comments, what you suggest is vast overkill for my application. The NewsDB engine isn't sophisticated enough to support syncpoints or rollback. Think along the lines of Squish MSGAPI rather than DB2: NewsDB is basically Squish-for-Usenet. My intention is to produce a lightweight multiuser offline news system so small groups of users (1-25) can read Usenet offline without needing to install a full news server. As a lightweight alternative to a local news server, NewsDB doesn't need the overhead of a fully-fledged SQL engine.

    NewsDB is decidedly not intended for mission critical environments; surviving Anything and Everything isn't part of the design requirements. Rather, my intention is to contain common errors to the extent that they can be repaired by automated repair tools.

    This seems to me to be the type of activity you really want to perform. One
    of your problems is that your input stream is not persistent, as it would be
    a socket connected to an NNTP server [if I read your design correctly, and
    assume you are coding from the ground up]. This means that you need to be
    able to restart a failed instance of the application, resuming from its most
    recent successful syncpoint. The usual method to deal with this is to use a
    log or journal file that keeps track of "in flight" transactions; the
    journal is where your I/O remains unbuffered. If your NNTP server allows you
    to re-fetch articles -- and most do -- you can keep your journal in RAM or
    on a RAMDISK; this prevents performance hits for doing short I/O's.

    Just to clarify things: NewsDB isn't a single application. It's an RFC-based message base format similar in purpose to Squish and JAM. I'm writing an access library (NewsDBLib) to work with the NewsDB format. I'm also writing two applications (an importer and a reader) which use NewsDBLib.

    At the moment, none of these applications download news. The importer reads SOUP packets from disk just so I can avoid messing with NNTP. Reading from disk also gives me the flexibility to, at a later date, import other packet formats such as UUCP news and FTN PKT.

    --- GoldED/2 3.0.1
    * Origin: Life sucks and then you croak. (1:250/820)