• Dupeloops

    From Ward Dossche@2:292/854 to All on Sun Jun 17 13:34:18 2018
    Is there another BBBS or Mystic system "in the loop"?

    \%/@rd

    --- D'Bridge 3.99 SR28
    * Origin: Resist-Insist-Persist-Enlist / onwardtogether.org (2:292/854)
  • From Roger Nelson@1:3828/7 to Ward Dossche on Sun Jun 17 17:52:25 2018
    On Sun Jun-17-2018 13:34, Ward Dossche (2:292/854) wrote to All:

    Is there another BBBS or Mystic system "in the loop"?

    My guess is it's between 261/38 and 229/426. It may even be 261/38. I'm getting a lot of dupes destined for the COOKING echo, but D'Bridge catches most
    of them and tosses them into the BADECHO folder. I have 236 dupes since December, 2017. The totality of the ones before that I erased. There had to be thousands of them.


    Regards,

    Roger
    --- timEd/386 1.10.y2k+ Mathilda May
    * Origin: NCS BBS - Houma, LoUiSiAna - (1:3828/7)
  • From John McCoy@1:249/400 to Ward Dossche on Sun Jun 17 19:53:47 2018
    On 06/17/18, Ward Dossche said the following...

    Is there another BBBS or Mystic system "in the loop"?

    Possibly. They're all marked with @RESCANNED kludges.

    --- Mystic BBS v1.12 A39 2018/04/21 (Linux/64)
    * Origin: Subcarrier BBS (1:249/400)
  • From Wilfred van Velzen@2:280/464 to Ward Dossche on Mon Jun 18 08:38:19 2018
    Hi Ward,

    On 2018-06-17 13:34:18, you wrote to All:

    Is there another BBBS or Mystic system "in the loop"?

    The answer was in the messages:

    @RESCANNED 2:240/1120
    ...
    @PATH: ... 240/1120 5832 280/464

    And it was deliberate (see the NODES.024 area). Thorsten asked for a rescan, and thought it was a good test of the dupe detection system of fidonet, so didn't take any extra measures.

    On my system most were detected as too old. A lot were detected as dupes. But a
    few were already "rotated" from my dupes database and escaped detection:

    ---------- Sun 2018-06-17 10:50:49, FMail-lnx64-2.1.0.18-Beta20170815 - Toss Summary

    Board Area name                                          #Msgs Dupes
    ----- -------------------------------------------------- ----- -----
      200 Bad messages (Too old)                              1469
    [...]
    ----- -------------------------------------------------- ----- -----
       13                                                     1585   525

    Received from node       #Msgs Dupes Sec V
    ------------------------ ----- ----- -----
    2:240/5832                 116   525     0
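
    How a message can "rotate" out of a dupes database and then slip through can be sketched as below. This is only an illustration, not FMail's actual implementation: the DupeHistory class, its max_entries limit, and the keys are all hypothetical.

```python
from collections import OrderedDict


class DupeHistory:
    """Bounded record of recently seen message keys (hypothetical helper)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._seen = OrderedDict()  # insertion order doubles as age order

    def is_dupe(self, key):
        """Return True if key is still remembered; otherwise record it."""
        if key in self._seen:
            return True
        self._seen[key] = None
        if len(self._seen) > self.max_entries:
            self._seen.popitem(last=False)  # rotate out the oldest key
        return False


history = DupeHistory(max_entries=3)

# Four new messages arrive; the database only has room for three keys.
first_pass = [history.is_dupe(key) for key in ["a", "b", "c", "d"]]

# A rescan sends "d" and "a" again: "d" is caught, "a" has been rotated out.
rescanned = {key: history.is_dupe(key) for key in ["d", "a"]}
```

    In this sketch the second copy of "a" is accepted as new, which is the kind of escape described above. A larger history only pushes the cut-off further back.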



    Bye, Wilfred.

    --- FMail-lnx64 2.1.0.18-B20170815
    * Origin: FMail development HQ (2:280/464)
  • From mark lewis@1:3634/12.73 to Roger Nelson on Mon Jun 18 08:33:56 2018

    On 2018 Jun 17 17:52:24, you wrote to Ward Dossche:

    I'm getting a lot of dupes destined for the COOKING echo, but D'Bridge catches most of them and tosses them into the BADECHO folder. I have
    236 dupes since December, 2017. The totality of the ones before that
    I erased. There had to be thousands of them.

    there will always be dupes flowing in today's fidonet as long as fidoweb style connections are being used... by that i mean having redundant connections for the same echo... we will not get away from dupes because of that and we cannot depend on the appearance of dupes to point out loop problems...

    )\/(ark

    Always Mount a Scratch Monkey
    Do you manage your own servers? If you are not running an IDS/IPS yer doin' it wrong...
    ... I intend to stuff my turkeys until it kills me!
    ---
    * Origin: (1:3634/12.73)
  • From mark lewis@1:3634/12.73 to John McCoy on Mon Jun 18 08:41:20 2018

    On 2018 Jun 17 19:53:46, you wrote to Ward Dossche:

    Is there another BBBS or Mystic system "in the loop"?

    Possibly. They're all marked with @RESCANNED kludges.

    that's a different problem... messages so marked should not be packaged and sent to other links... mail tossers should skip sending those to other systems when they arrive after a rescan has been requested... the rescan control line will tell you where they were rescanned from but there's nothing that tells who
    initiated the rescan other than possibly the path line...

    )\/(ark

    Always Mount a Scratch Monkey
    Do you manage your own servers? If you are not running an IDS/IPS yer doin' it wrong...
    ... Nothing happens to you that hasn't already happened to someone else!
    ---
    * Origin: (1:3634/12.73)
  • From John McCoy@1:249/400 to mark lewis on Tue Jun 19 04:02:48 2018
    On 06/18/18, mark lewis said the following...

    that's a different problem... messages so marked should not be packaged and sent to other links...

    It's more that the last one was also triggered by a rescan. Not specifically where it was rescanned from.

    --- Mystic BBS v1.12 A39 2018/04/21 (Linux/64)
    * Origin: Subcarrier BBS (1:249/400)
  • From mark lewis@1:3634/12.73 to John McCoy on Tue Jun 19 09:34:44 2018

    On 2018 Jun 19 04:02:48, you wrote to me:

    that's a different problem... messages so marked should not be
    packaged and sent to other links...

    It's more that the last one was also triggered by a rescan. Not specifically where it was rescanned from.

    my point is specifically that messages with a ^aRESCANNED control line should not be passed on to other links... ever... that will stop them from triggering what looks like a regurge or "dupe dump"... they will be different than the original message because of the ^aRESCANNED control line so they will not be caught by most dupe detection techniques... that's the real problem...
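
    A minimal sketch of that rule, assuming control lines start with Ctrl-A (0x01) and message lines end with CR. The function names, addresses, and MSGID serial are made up for illustration and are not taken from any real tosser.

```python
SOH = "\x01"  # FTN control lines ("kludges") start with Ctrl-A


def is_rescanned(message_text):
    """True if any control line in the message is a RESCANNED kludge."""
    for line in message_text.split("\r"):
        if line.startswith(SOH + "RESCANNED"):
            return True
    return False


def links_to_forward(message_text, downlinks):
    """Return the links this message would be exported to (hypothetical)."""
    if is_rescanned(message_text):
        return []  # toss it locally, never pass it on
    return list(downlinks)


# Made-up example messages; the addresses and serial are illustrative only.
original = SOH + "MSGID: 9:999/1 0000abcd\rHello\r"
rescanned = (
    SOH + "MSGID: 9:999/1 0000abcd\r"
    + SOH + "RESCANNED 9:999/1\r"
    "Hello\r"
)

normal_targets = links_to_forward(original, ["9:999/2", "9:999/3"])
rescan_targets = links_to_forward(rescanned, ["9:999/2", "9:999/3"])
```

    With a check like this the rescanned copy stops at the system that asked for it, so downstream dupe detection never has to deal with it.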

    )\/(ark

    Always Mount a Scratch Monkey
    Do you manage your own servers? If you are not running an IDS/IPS yer doin' it wrong...
    ... What is your damage, Heather?
    ---
    * Origin: (1:3634/12.73)
  • From Rob Swindell to mark lewis on Tue Jun 19 11:31:09 2018
    Re: Dupeloops
    By: mark lewis to John McCoy on Tue Jun 19 2018 09:34 am


    On 2018 Jun 19 04:02:48, you wrote to me:

    that's a different problem... messages so marked should not be
    packaged and sent to other links...

    It's more that the last one was also triggered by a rescan. Not specifically where it was rescanned from.

    my point is specifically that messages with a ^aRESCANNED control line should not be passed on to other links... ever... that will stop them from triggering what looks like a regurge or "dupe dump"... they will be different than the original message because of the ^aRESCANNED control line so they will not be caught by most dupe detection techniques... that's the real problem...

    Is that true? Synchronet/SBBSecho uses 2 methods of dupe message detection:

    1. Message-ID (in the case of FTN, that's everything between "\1MSGID: " and
    the CR) - the Message-ID doesn't change when messages are re-scanned
    2. Message body text (not including kludge/control lines, paths/seen-bys,
    and tear/tag/origin lines)

    Rescanned messages would (should) be caught as dupes just fine.
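
    The two checks can be sketched roughly as follows. This is not SBBSecho's code: the helper names are hypothetical, the tear/tag/origin matching is deliberately crude, CRC-32 merely stands in for whatever hash is used, and the sample messages are invented.

```python
import zlib

SOH = "\x01"  # prefix of FTN kludge/control lines
MSGID_PREFIX = SOH + "MSGID: "


def msgid_of(message_text):
    """Everything between the MSGID prefix and the CR, or None if absent."""
    for line in message_text.split("\r"):
        if line.startswith(MSGID_PREFIX):
            return line[len(MSGID_PREFIX):]
    return None


def body_only(message_text):
    """Drop kludges (incl. PATH), SEEN-BYs and tear/tag/origin lines."""
    kept = []
    for line in message_text.split("\r"):
        if line.startswith(SOH) or line.startswith("SEEN-BY:"):
            continue
        if line == "---" or line.startswith("--- "):  # tear line
            continue
        if line.startswith("... "):  # tag line (crude heuristic)
            continue
        if line.lstrip().startswith("* Origin:"):
            continue
        kept.append(line)
    return "\r".join(kept).strip()


def dupe_keys(message_text):
    """Return (message-id, body hash); a match on either marks a dupe."""
    body = body_only(message_text).encode("utf-8")
    return msgid_of(message_text), zlib.crc32(body)


original = (
    SOH + "MSGID: 9:999/1 0000abcd\r"
    "Hello all\r"
    "--- ExampleTosser 1.0\r"
    " * Origin: Example BBS (9:999/1)\r"
    "SEEN-BY: 999/1 2\r"
    + SOH + "PATH: 999/1\r"
)
rescanned = (
    SOH + "MSGID: 9:999/1 0000abcd\r"
    + SOH + "RESCANNED 9:999/1\r"
    "Hello all\r"
    "--- ExampleTosser 1.0\r"
    " * Origin: Example BBS (9:999/1)\r"
    "SEEN-BY: 999/1 2 3\r"
    + SOH + "PATH: 999/1 2\r"
)
```

    Both sample messages yield the same pair of keys even though the rescanned copy carries an extra control line and a longer path.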

    digital man

    This Is Spinal Tap quote #36:
    Bobbi Flekman: Money talks, and bullshit walks.
    Norco, CA WX: 74.4°F, 56.0% humidity, 2 mph E wind, 0.00 inches rain/24hrs
  • From mark lewis@1:3634/12.73 to Rob Swindell on Tue Jun 19 15:20:42 2018

    On 2018 Jun 19 11:31:08, you wrote to me:

    my point is specifically that messages with a ^aRESCANNED control line
    should not be passed on to other links... ever... that will stop them
    from triggering what looks like a regurge or "dupe dump"... they will
    be different than the original message because of the ^aRESCANNED
    control line so they will not be caught by most dupe detection
    techniques... that's the real problem...

    Is that true?

    in numerous cases, yes... but, if i want a rescan of an area that had damaged data files and i'm trying to recover the last year's messages, why should the rescanned messages be sent on to any other system? mine is the only one that wants or needs them... why should other linked systems have to do the additional work? if we just don't send ^aRESCANNED messages on to other systems, no other systems would be bothered...

    Synchronet/SBBSecho uses 2 methods of dupe message detection:

    1. Message-ID (in the case of FTN, that's everything between "\1MSGID: " and
    the CR) - the Message-ID doesn't change when messages are re-scanned
    2. Message body text (not including kludge/control lines, paths/seen-bys,
    and tear/tag/origin lines)

    Rescanned messages would (should) be caught as dupes just fine.

    that looks ok but not everyone goes that route with their dupe detection code...

    i've seen the second one cause systems to only see, for example, the first monthly posting of something and they never see it again in any of the following months... then it is purged out of their message base and they don't have it any more and don't receive it either... maybe it is echo rules... maybe
    it is a monthly PSA...

    )\/(ark

    Always Mount a Scratch Monkey
    Do you manage your own servers? If you are not running an IDS/IPS yer doin' it wrong...
    ... Every artist is a cannibal, every poet a thief.
    ---
    * Origin: (1:3634/12.73)
  • From Rob Swindell to mark lewis on Tue Jun 19 14:26:36 2018
    Re: Dupeloops
    By: mark lewis to Rob Swindell on Tue Jun 19 2018 03:20 pm


    On 2018 Jun 19 11:31:08, you wrote to me:

    my point is specifically that messages with a ^aRESCANNED control line
    should not be passed on to other links... ever... that will stop them
    from triggering what looks like a regurge or "dupe dump"... they will
    be different than the original message because of the ^aRESCANNED
    control line so they will not be caught by most dupe detection
    techniques... that's the real problem...

    Is that true?

    in numerous cases, yes... but, if i want a rescan of an area that had damaged data files and i'm trying to recover the last year's messages, why should the rescanned messages be sent on to any other system? mine is the only one that wants or needs them... why should other linked systems have to do the additional work? if we just don't send ^aRESCANNED messages on to other systems, no other systems would be bothered...

    I don't dispute that rescanned messages shouldn't be forwarded to downlinks and I just committed a change to SBBSecho to that effect.

    Synchronet/SBBSecho uses 2 methods of dupe message detection:

    1. Message-ID (in the case of FTN, that's everything between "\1MSGID: " and
    the CR) - the Message-ID doesn't change when messages are re-scanned
    2. Message body text (not including kludge/control lines, paths/seen-bys,
    and tear/tag/origin lines)

    Rescanned messages would (should) be caught as dupes just fine.

    that looks ok but not everyone goes that route with their dupe detection code...

    i've seen the second one cause systems to only see, for example, the first monthly posting of something and they never see it again in any of the following months... then it is purged out of their message base and they don't have it any more and don't receive it either... maybe it is echo rules... maybe
    it is a monthly PSA...

    And if it's a duplicate, it's a duplicate. That's why auto-posters should (?) put timestamps or other unique data in their message body if they really want to avoid being ignored as dupes. But including metadata (control lines) in the dupe detection seems like a bad approach. If a message takes a different path, it'll have different metadata, but it's still a dupe (and often that's how dupes arrive, via a different path than the original).
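
    A small sketch of the auto-poster point, assuming a body-only hash is what decides whether a posting is a dupe. The "Posted:" line, the rules text, and the helper names are invented for the example.

```python
import hashlib
from datetime import date


def body_hash(body):
    """Hash of the body text only, with no control lines involved."""
    return hashlib.md5(body.encode("utf-8")).hexdigest()


def stamped(body, posted_on):
    """Prefix the body with a date line so each posting is unique."""
    return "Posted: " + posted_on.isoformat() + "\r" + body


RULES = "Echo rules: stay on topic.\r"

# Same text posted twice: identical body hash, so the second is a dupe.
june_plain = body_hash(RULES)
july_plain = body_hash(RULES)

# Same text with a date in the body: the hashes differ.
june_stamped = body_hash(stamped(RULES, date(2018, 6, 1)))
july_stamped = body_hash(stamped(RULES, date(2018, 7, 1)))
```

    The unstamped monthly copies hash identically, while the stamped ones do not, so only the stamped postings get past a body-text check every month.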

    digital man

    This Is Spinal Tap quote #43:
    I feel my role in the band is ... kind of like lukewarm water.
    Norco, CA WX: 80.4°F, 47.0% humidity, 12 mph ENE wind, 0.00 inches rain/24hrs
  • From John McCoy@1:249/400 to mark lewis on Tue Jun 19 17:04:14 2018
    On 06/19/18, mark lewis said the following...

    my point is specifically that messages with a ^aRESCANNED control line should not be passed on to other links... ever...

    I realize that. I think we're on the same page and I'm just being too terse.

    --- Mystic BBS v1.12 A39 2018/04/21 (Linux/64)
    * Origin: Subcarrier BBS (1:249/400)
  • From mark lewis@1:3634/12.73 to Rob Swindell on Tue Jun 19 19:10:18 2018
    On 2018 Jun 19 14:26:36, you wrote to me:

    in numerous cases, yes... but, if i want a rescan of an area that had
    damaged data files and i'm trying to recover the last year's messages,
    why should the rescanned messages be sent on to any other system? mine
    is the only one that wants or needs them... why should other linked
    systems have to do the additional work? if we just don't send
    ^aRESCANNED messages on to other systems, no other systems would be
    bothered...

    I don't dispute that rescanned messages shouldn't be forwarded to
    downlinks and I just committed a change to SBBSecho to that effect.

    that is so very cool, man! <3

    i've seen the second one cause systems to only see, for example, the
    first monthly posting of something and they never see it again in any
    of the following months... then it is purged out of their message base
    and they don't have it any more and don't receive it either... maybe
    it is echo rules... maybe it is a monthly PSA...

    And if it's a duplicate, it's a duplicate. That's why auto-posters should (?)
    put timestamps or other unique data in their message body if they really want to avoid being ignored as dupes.

    there is that... it is something i have considered adding to my automated postings but haven't...

    But including metadata (control lines) in the dupe detection seems
    like a bad approach. If a message takes a different path, it'll have different metadata, but it's still a dupe (and often that's how dupes arrive, via a different path than the original).

    AFAIK, seenbys and paths are not included in most dupe detection schemes... other non-changing control lines are fine to be included... one of the problems
    comes when some systems sort those control lines on messages they are passing along... we don't see so much of that like we did at one time ;)

    )\/(ark

    Always Mount a Scratch Monkey
    Do you manage your own servers? If you are not running an IDS/IPS yer doin' it wrong...
    ... Impossibility is an excuse before the law.
    ---
    * Origin: (1:3634/12.73)
  • From mark lewis@1:3634/12.73 to John McCoy on Tue Jun 19 19:14:04 2018
    On 2018 Jun 19 17:04:14, you wrote to me:

    my point is specifically that messages with a ^aRESCANNED control
    line should not be passed on to other links... ever...

    I realize that. I think we're on the same page and I'm just being too terse.

    :) it might also have been a lack of c0ffee, for me, too ;)

    )\/(ark

    Always Mount a Scratch Monkey
    Do you manage your own servers? If you are not running an IDS/IPS yer doin' it wrong...
    ... God told me it's none of your business. - Jimmy Swaggart
    ---
    * Origin: (1:3634/12.73)
  • From Rob Swindell to mark lewis on Tue Jun 19 22:43:24 2018
    Re: Dupeloops
    By: mark lewis to Rob Swindell on Tue Jun 19 2018 07:10 pm

    AFAIK, seenbys and paths are not included in most dupe detection schemes... other non-changing control lines are fine to be included... one of the problems
    comes when some systems sort those control lines on messages they are passing along... we don't see so much of that like we did at one time ;)

    So some metadata is included in the data that is hashed for dupe detection and some is not? Are you sure about that? Anyway, duplicate Message-IDs *should* be caught by any FTN software written or updated in the past 20 years.

    digital man

    Synchronet "Real Fact" #76:
    Michael Swindell still has the "Synchronet Blimp" in his possession.
    Norco, CA WX: 62.9°F, 83.0% humidity, 3 mph ESE wind, 0.00 inches rain/24hrs
  • From mark lewis@1:3634/12.73 to Rob Swindell on Wed Jun 20 08:08:24 2018

    On 2018 Jun 19 22:43:24, you wrote to me:

    AFAIK, seenbys and paths are not included in most dupe detection
    schemes... other non-changing control lines are fine to be included...
    one of the problems comes when some systems sort those control lines on
    messages they are passing along... we don't see so much of that like we
    did at one time ;)

    So some metadata is included in the data that is hashed for dupe
    detection and some is not?

    yes...

    Are you sure about that?

    yes... in fact, and i don't recall who pointed this out to me back in the '90s,
    dbridge does exactly this in a manner of speaking... it takes the whole message
    header plus X bytes immediately following the message header and uses all of that as at least part of the checksum calculation... this was pointed out to me
    when i was working on my posting tool and was adding MSGID support to it...

    i was using a library and just letting it do its thing... some of my test posts
    were reported as dupes when they clearly weren't... IIRC, they were detected as
    dupes because they were posted within the same second... it turned out that my MSGID was somewhere in the middle of the control lines at the beginning of the message body and only my dbridge using testers were seeing this... someone pointed out this thing about dbridge also using X bytes from the beginning of the message body in addition to the message header so i moved my posting tool's
    MSGID to the top of the list and no more dupes were detected by those dbridge systems...
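
    the effect can be sketched like this... it only assumes what was described above, a checksum over the message header plus the first X bytes after it... the real X, the real checksum and the real header layout are not known here, so X = 32, CRC-32, the header string and the control lines below are all made up...

```python
import zlib

SOH = "\x01"
X = 32  # assumed window size; the real value is not given in the thread


def header_plus_prefix_checksum(header, body, window=X):
    """Checksum over the header and only the first `window` body bytes."""
    return zlib.crc32((header + body[:window]).encode("latin-1"))


# Two test posts written within the same second share an identical header.
HEADER = "Sysop|All|test post|19 Jun 18  09:34:44"

# Other control lines; together they are longer than X bytes.
OTHER_KLUDGES = (
    SOH + "CHRS: CP437 2\r"
    + SOH + "TZUTC: -0400\r"
    + SOH + "PID: hypothetical-poster 0.1\r"
)


def build_body(serial, msgid_first):
    """Build a message body with MSGID either first or after other kludges."""
    msgid = SOH + "MSGID: 9:999/1 " + serial + "\r"
    kludges = msgid + OTHER_KLUDGES if msgid_first else OTHER_KLUDGES + msgid
    return kludges + "test body\r"


# MSGID buried after the other kludges: it falls outside the window.
late_a = header_plus_prefix_checksum(HEADER, build_body("00000001", False))
late_b = header_plus_prefix_checksum(HEADER, build_body("00000002", False))

# MSGID at the top: the differing serial is inside the window.
first_a = header_plus_prefix_checksum(HEADER, build_body("00000001", True))
first_b = header_plus_prefix_checksum(HEADER, build_body("00000002", True))
```

    with the MSGID in the middle the two posts get the same checksum and look like dupes... with the MSGID first they don't...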

    i don't know what other systems do... there's only a very few that provide this
    information... SBBS is one of them... when i was testing Mystic, there was some
    discussion about dupe detection as james worked to try to figure out the best method he liked... i have used fastecho here for decades but i don't know what data it uses for its checksums... i do know it uses two checksums, though... i know this because i was being nosy one day and looking at FE's dupe database file (one for all message areas) with a hex viewer and noticed that groups of bytes were repeated all throughout the file... i asked about this and was told i found a bug... basically, FE has two checksums that it uses for each message and both are supposed to be stored in the database... what i found was that only one was being used and written to both fields... toby fixed that problem right quick... i just don't know what data is used to calculate them...

    back in the day, dupe detection formulas were not really shared around... maybe
    a couple of developers talking amongst themselves would tell each other what they were doing but this information was not published where everyone could find it... it was more or less black majik to a point...

    Anyway, duplicate Message-IDs *should* be caught by any FTN software written or updated in the past 20 years.

    true... and we still have some systems that don't do MSGID at all so other methods must be used on them...

    )\/(ark

    Always Mount a Scratch Monkey
    Do you manage your own servers? If you are not running an IDS/IPS yer doin' it wrong...
    ... WANTED: assistant to magician in beheading illusion. Blue Cross & salary.
    ---
    * Origin: (1:3634/12.73)
  • From Rob Swindell to mark lewis on Wed Jun 20 11:44:26 2018
    Re: Dupeloops
    By: mark lewis to Rob Swindell on Wed Jun 20 2018 08:08 am


    On 2018 Jun 19 22:43:24, you wrote to me:

    AFAIK, seenbys and paths are not included in most dupe detection
    schemes... other non-changing control lines are fine to be included...
    one of the problems comes when some systems sort those control lines on
    messages they are passing along... we don't see so much of that like we
    did at one time ;)

    So some metadata is included in the data that is hashed for dupe detection and some is not?

    yes...

    Are you sure about that?

    yes... in fact, and i don't recall who pointed this out to me back in the '90s,
    dbridge does exactly this in a manner of speaking... it takes the whole message
    header plus X bytes immediately following the message header and uses all of that as at least part of the checksum calculation... this was pointed out to me
    when i was working on my posting tool and was adding MSGID support to it...

    i was using a library and just letting it do its thing... some of my test posts
    were reported as dupes when they clearly weren't... IIRC, they were detected as
    dupes because they were posted within the same second... it turned out that my MSGID was somewhere in the middle of the control lines at the beginning of the message body and only my dbridge using testers were seeing this... someone pointed out this thing about dbridge also using X bytes from the beginning of the message body in addition to the message header so i moved my posting tool's
    MSGID to the top of the list and no more dupes were detected by those dbridge systems...

    i don't know what other systems do... there's only a very few that provide this
    information... SBBS is one of them... when i was testing Mystic, there was some
    discussion about dupe detection as james worked to try to figure out the best method he liked... i have used fastecho here for decades but i don't know what data it uses for its checksums... i do know it uses two checksums, though... i know this because i was being nosy one day and looking at FE's dupe database file (one for all message areas) with a hex viewer and noticed that groups of bytes were repeated all throughout the file... i asked about this and was told i found a bug... basically, FE has two checksums that it uses for each message and both are supposed to be stored in the database... what i found was that only one was being used and written to both fields... toby fixed that problem right quick... i just don't know what data is used to calculate them...

    back in the day, dupe detection formulas were not really shared around... maybe
    a couple of developers talking amongst themselves would tell each other what they were doing but this information was not published where everyone could find it... it was more or less black majik to a point...

    To complete the discussion, Synchronet (smblib) actually uses multiple methods of body text dupe detection:

    1. A "legacy" CRC-32 hash of the body text, excluding any metadata, like FTN
    control lines and excluding any trailing white-space or control-characters
    2. A tuple of hashes (MD5 digest, CRC-32, and CRC-16) and length (char count)
    of the body text excluding any metadata and *all* white-space characters

    These, in addition to duplicate Internet (RFC-822) compliant Message-ID and FTN-compliant Message-ID checks.
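
    As a rough illustration of those two body-text methods (not smblib's actual code), here is a sketch that assumes the body has already had its metadata stripped. The function names are hypothetical, and the CRC-16 variant (CCITT via binascii.crc_hqx) is an assumption, since the variant isn't stated above.

```python
import binascii
import hashlib
import zlib


def legacy_crc32(body):
    """CRC-32 of the body minus trailing white-space/control characters."""
    end = len(body)
    while end > 0 and body[end - 1] <= 0x20:  # space or control byte
        end -= 1
    return zlib.crc32(body[:end])


def body_fingerprint(body):
    """(MD5, CRC-32, CRC-16, length) of the body with all white-space removed."""
    squeezed = b"".join(body.split())  # bytes.split() drops ASCII white-space
    return (
        hashlib.md5(squeezed).hexdigest(),
        zlib.crc32(squeezed),
        binascii.crc_hqx(squeezed, 0),  # CRC-16/CCITT, an assumed variant
        len(squeezed),
    )


# Re-wrapped or re-spaced copies of the same text get the same fingerprint.
wrapped_a = body_fingerprint(b"Hello  world\r")
wrapped_b = body_fingerprint(b"Hello\rworld\r\r")

# Trailing line endings and spaces do not change the legacy hash.
trailing_a = legacy_crc32(b"Hello world\r\n ")
trailing_b = legacy_crc32(b"Hello world")
```

    Comparing the whole tuple rather than a single hash makes an accidental collision between two different bodies much less likely.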

    No black majik here. :-)

    digital man

    Synchronet "Real Fact" #64:
    Synchronet PCMS (introduced w/v2.0) is Programmable Command and Menu Structure.
    Norco, CA WX: 77.6°F, 57.0% humidity, 8 mph ENE wind, 0.00 inches rain/24hrs