• Paper page to view file

    From Mike Luther@1:117/3001 to All on Wed Jan 30 14:54:32 2008
    Thoughts please?

    I have a client who 'inherited' about 4000 client files averaging about 50 pages of paper per file. These are all text based records and do not seem to favor any kind of OCR scanning to convert them to text/data files of any kind.
    It appears that all must be scanned visually, then saved in black and white as images of the pages. The file images look like they will have to be targeted for hard drive storage, the names, indexing and ability to organize the file names not being key issues at the moment.

    I'm not familiar with this in OS/2. Outside people are suggesting that the final storage 'format' for these images should be either .JPG or .PDF files. For research and study, I have a fully paid-for PMView tool set, the latest Lotus Smart Suite for OS/2 with all the latest fixpacks applied, and the free public release of TrueSpec Graphis Pro. I also have access to the SANE/TAME products, which I have never installed to date.

    Although I'm not the party to be saddled with this project (PROJECT!), I still need to know a couple of things. For test purposes I have both an HP 1120 and one other HP combination printer and scanner, the drivers for which I think I have as part of the latest OS/2 MCP2 device drivers. But even with that done, USB installed and supported, I guess, by SANE/TAME or whatever, some other thoughts need answers before I even start to learn more about this.

    As best I can tell, from looking at things like this, a black and white image of handwritten and/or typed notes on each page, at a 300 DPI resolution, would amount to about a megabyte per page as an image. That noted, in a format such as .JPG, from what I think I see, it might be about 240K per page as a file of that image. True, looking at a .PDF file of a page of text and minor imaging, I see about 9K to 10K per page as a file of that type.

    But what, on the average, might I be looking at for file storage of captured images of, say, 200,000 such pages? Unless my mental math is wrong, at even 30,000K per page that is some 6 terabytes of disk space.
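
    The arithmetic above can be sketched in a few lines of Python. This is only a rough check, not a measurement: the US Letter page size (8.5 x 11 inches), the decimal units (1K = 1,000 bytes), and the function names are assumptions for illustration. The per-page figures are the ones quoted in the message.

```python
# Rough storage estimate for the scanned pages.
# Assumptions: US Letter pages, 1K = 1,000 bytes, hypothetical helper names.

def raw_bilevel_bytes(width_in, height_in, dpi):
    """Uncompressed size in bytes of a 1-bit-per-pixel scan."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = (width_px + 7) // 8  # each row padded up to a whole byte
    return row_bytes * height_px

def total_bytes(pages, bytes_per_page):
    """Total storage for a number of equally sized pages."""
    return pages * bytes_per_page

PAGES = 4000 * 50  # 4000 files averaging 50 pages each

page_raw = raw_bilevel_bytes(8.5, 11, 300)  # about a megabyte

per_page_sizes = {
    "raw 1-bit image at 300 DPI": page_raw,
    "JPG at about 240K": 240_000,
    "PDF at about 10K": 10_000,
    "30,000K per page": 30_000_000,
}

for label, size in per_page_sizes.items():
    gigabytes = total_bytes(PAGES, size) / 1e9
    print(f"{label}: {gigabytes:,.0f} GB total")
```

    Under these assumptions the 6 terabyte total only follows from the 30,000K figure. At about a megabyte per page, the total comes out in the low hundreds of gigabytes.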

    Suggestions as to how one looks at this kind of thing in OS/2? Knowing what type of programs are already available as cited above. And so on?

    Thanks!


    Sleep well; OS/2's still awake! ;)

    Mike @ 1:117/3001

    --- Maximus/2 3.01
    * Origin: Ziplog Public Port (1:117/3001)
  • From Bob Ackley@1:300/3 to Mike Luther on Thu Jan 31 07:24:34 2008
    Replying to a message of Mike Luther to All:

    Thoughts please?

    <big snip>

    But what, on the average, might I be looking at for file storage of captured images of this, say 200,000 such pages? Unless my mental
    math is wrong at even 30,000K per page that is some 6 terabytes of
    disk space.

    A computer screen full of characters is 1,920 bytes; a page is about 2.5 computer
    screens, or roughly 5K bytes. However, the vast majority of text documents are
    not full pages of characters; there's a lot of whitespace (excluding spaces between
    words) that doesn't have to be stored. At very roughly 5KB/page, 200,000 pages
    would be about a billion bytes, or about a gig - somewhat less than your estimate. <g>
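
    As a quick check of that estimate, here is a minimal sketch. It assumes the 1,920-byte screen is an 80 x 24 character display (the message gives only the byte count) and that each character takes one byte. The variable names are illustrative.

```python
# Plain-text storage estimate, one byte per character.
# Assumption: a 1,920-byte screen is 80 columns by 24 rows.
SCREEN_CHARS = 80 * 24
SCREENS_PER_PAGE = 2.5
PAGES = 200_000

page_bytes = int(SCREEN_CHARS * SCREENS_PER_PAGE)
total = page_bytes * PAGES

print(f"{page_bytes:,} bytes per page")
print(f"{total:,} bytes total, about {total / 1e9:.2f} GB")
```

    This counts a completely full page, so real documents with whitespace trimmed would come in lower.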

    Note that when you're scanning in text documents you don't need to store them as
    24-bit color images, plain ol' black and white will do - and it requires about 1/24 the
    space.
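
    The 1/24 figure follows directly from the bit depth when the images are uncompressed. A small sketch, assuming a US Letter page at 300 DPI (the page size and the function name are assumptions, not something stated in the thread):

```python
# Compare uncompressed sizes of one page at 24-bit color and 1-bit black and white.
# Assumptions: 8.5 x 11 inch page, 300 DPI, no row padding or file headers.

def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed image size in bytes for the given bit depth."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8

color = raw_page_bytes(8.5, 11, 300, 24)
bw = raw_page_bytes(8.5, 11, 300, 1)

print(f"24-bit color: {color:,} bytes")
print(f"1-bit b/w:    {bw:,} bytes")
print(f"ratio:        {color / bw:.0f} to 1")
```

    Compressed formats will change both numbers, so the ratio only holds for raw bitmaps.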

    I have a W98 box I use for scanning documents (in b/w, altho' it's a color scanner).
    I scan in a BMP image and use Xerox's Textbridge to convert that to an ASCII file.
    When I get enough ASCII files I burn them onto a CD and transfer them to my main
    OS/2 box.

    Suggestions as to how one looks at this kind of thing in OS/2?
    Knowing what type of programs are already available as cited above.
    And so on?

    --- FleetStreet 1.19+
    * Origin: Bob's Boneyard, Emerson, Iowa (1:300/3)
  • From Herbert Rosenau@2:2476/493 to Mike Luther on Sat Feb 2 02:57:25 2008
    On 30.01.08 14:54 Mike Luther wrote:

    Thoughts please?

    I have a client who 'inherited' about 4000 client files averaging
    about 50 pages of paper per file. These are all text based
    records and do not seem to favor any kind of OCR scanning to
    convert them to text/data files of any kind. It appears that all
    must be scanned visually, then saved in black and white as images
    of the pages. The file images look like they will have to be
    targeted for hard drive storage, the names, indexing and ability
    to organize the file names not being key issues at the moment.

    If they really are on paper, then simply scan them using TAME.

    TAME lets you store the scans in any format you like: BMP, TIFF (much smaller than BMP), GIF, or whatever else you prefer.

    If the pages are only black and white, simply scan in b/w to save a lot of space.

    Look in GFD or HOBBES for one of the OCR tools available for OS/2, or grab the old TextBridge (DOS) to OCR them.

    TAME is a high quality scanner tool. It works with nearly all SCSI scanners, some old parallel port ones, and even some (though by no means all) USB scanners. It is a graphical tool on top of SANE. Give it a try and play with its settings to get the best (which does NOT necessarily mean the highest!) resolution.

    Nothing fits your scanning needs better than TAME, even under other OSes.

    --- Sqed/32 1.15/development 312:
    * Origin: das Wetter macht mich ganz fertig... (2:2476/493)