(19) United States
(12) Patent Application Publication
     Roman et al.
(10) Pub. No.: US 2006/0129523 A1
(43) Pub. Date: Jun. 15, 2006

(54) DETECTION OF OBSCURED COPYING USING KNOWN TRANSLATIONS FILES AND
     OTHER OPERATIONAL DATA

(76) Inventors: Kendyl Allen Roman, Sunnyvale, CA (US);
     Paul Raposo, San Francisco, CA (US)

     Correspondence Address:
     KENDYL A ROMAN
     730 BARTEY COURT
     SUNNYVALE, CA 94087 (US)

(21) Appl. No.: 11/299,529

(22) Filed: Dec. 12, 2005

     Related U.S. Application Data

(60) Provisional application No. 60/635,908, filed on Dec. 10, 2004.
     Provisional application No. 60/635,562, filed on Dec. 13, 2004.

     Publication Classification

(51) Int. Cl.
     G06F 7/30 (2006.01)
(52) U.S. Cl. .......... 707/1

(57) ABSTRACT
Systems and methods that automatically compare sets of files to
determine what has been copied, even when sophisticated techniques for
hiding or obscuring the copying have been employed. The file compare
system comprises a file compare program that uses various operational
data and user interface options to detect illicit copying, highlight
and align matching lines, and produce a formatted report. A known
translations file is used to match translated tokens. Other operational
data files specify rules that the file compare program then uses to
improve its results. The generated report contains statistics and full
disclosures of the known translations used and the other methods used
in creating the exhibits. The system includes a bulk compare program
that automatically detects likely file pairings and candidates for
validation as known translations, which can be used on iterative runs.
The user is given full control over the final output, and the system
automatically reformats the reports and recalculates the statistics for
a consistent and accurate final presentation.
`
`Instacart, Ex. 1035
Patent Application Publication   Jun. 15, 2006   Sheet 1 of 27   US 2006/0129523 A1

[Fig. 1: block diagram. Box labels: User Interface Options; File A; File Compare; Formatted Report; Operational Data. Reference numerals: 100, 110, 140, 150, 160, 180.]

Sheet 2 of 27

#include <stdio.h>

// The quick brown fox jumped over the lazy dog. How many tries did it take?
void tries(int initialFoxJumpHeight, int dogHeight, int jumpIncrement)
{
    int jumpHeight = initialFoxJumpHeight;
    int numTries = 0;

    while (jumpHeight < dogHeight)
    {
        jumpHeight += jumpIncrement;
        numTries++;
    }
    printf("Number of tries: %d\n", numTries);
}

File: jump.c

Fig. 2A

Sheet 3 of 27

#include <stdio.h>

// A fast auburn wolf leaped above a passive canine. How many attempts did it take?
void attempts(int startWolfLeapHeight, int canineHeight, int leapIncrement)
{
    int leapHeight = startWolfLeapHeight;
    int numberOfAttempts = 0;
    while (leapHeight < canineHeight)
    {
        leapHeight += leapIncrement;
        numberOfAttempts++;
    }
    printf("Number of attempts: %d\n", numberOfAttempts);
}

File: leap.c

Fig. 2B

Sheet 4 of 27

[Fig. 2C: table; contents not legible in this copy. Reference numerals: 2300, 2300a, 2300b, 2310, 2338, 2340.]

Sheet 5 of 27

[Fig. 2D-1: formatted report, page 1 of 2 of Exhibit 2D. It shows jump.c and leap.c side by side with numbered lines, a statistics block that includes a "Filtered" entry, a confidentiality legend, and the start of the list of translation equivalents reproduced below. The listing text and statistics values are not legible in this copy. Reference numerals: 2400a, 2400b, 2402, 2404, 2406, 2410, 2430, 2432, 2436, 2438, 2450, 2452.]

The following translation equivalents were found and used in highlighting this file:

tries=attempts
brown=auburn
dog=canine
dogHeight=canineHeight
quick=fast
jumpHeight=leapHeight
jumpIncrement=leapIncrement
jumped=leaped
numTries=numberOfAttempts

Sheet 6 of 27

[Fig. 2D-2: formatted report, page 2 of 2 of Exhibit 2D: the rest of the translation list, a formatting note, and the confidentiality legend. Reference numerals: 2400, 2400a, 2400b, 2402, 2404, 2406, 2408, 2410, 2450, 2460.]

lazy=passive
initialFoxJumpHeight=startWolfLeapHeight
fox=wolf

Note: During formatting tabs are converted to four spaces and all lines longer than 53 characters are wrapped. All wrapped lines are denoted with a [character not legible] character at the beginning of the line; however, highlighting is based on the full line prior to formatting.

Sheet 7 of 27

[Fig. 3A: flowchart. Step labels, in order: Read File A; Read File B; Read Operational Data Files; Compare Files (see Fig. 3B); Calculate Similarities; decision "Similarity > Threshold?" (a No branch is legible); Output Reports (see Fig. 3D). Reference numerals run from 3100 to 3134.]

Sheet 8 of 27

[Fig. 3B: flowchart. Legible labels: decision "More Lines in File B?"; Look Back & Identify Out of Order Matches; Find Next Match; Mark Matching Lines; decision "Matches Found?"; Mark Pending Lines of Files A and B; Final Look Back & Identify Out of Order; Do Remaining Lines of File A. Yes and No branches are marked, and one step carries the cross-reference "See Fig 3C". Reference numerals: 3200, 3208, 3210, 3216, 3218, 3226, 3228, 3230, 3232, 3237.]

Sheet 9 of 27

[Fig. 3C: flowchart. Legible labels: Increment Offsets & Block Sizes; decision "Offset > Start of File A"; Get & Tokenize Next Line of File B; Determine Significant Tokens; Get & Tokenize Previous Lines of Both Files; decision "Do Tokens Match?"; decision "Any Significant?"; Adjust Both Offsets & Block Sizes; Increment Block Sizes; Get & Tokenize Next Lines of Both Files; Get & Tokenize Next Line of File A; decision "Any Tokens Match?". Yes and No branches are marked. Reference numerals run from 3300 to 3378.]

Sheet 10 of 27

[Fig. 3D: flowchart. Step labels, in order: Start; Append Stats Line to Stats File; Open Output Files; Output Formatted Headers; Output Formatted File A Body; Output Formatted File B Body; Output Compare Statistics; Close Files; Finish. Reference numerals run from 3400 to 3432.]

Sheet 11 of 27

[Fig. 4: block diagram. Box labels: User Interface Options; File Compare; Formatted Report; File A; File B; Statistics; New Possible Translations; Translation Used; Filtered Translations; Known Translations; Suspected Translations; Exclusions; Obscured Lines; Language Specific; Language Keywords. Legible reference numerals: 110, 120, 150, 160, 162, 166, 400, 442, 448, 452, 454, 456, 458, 464, 470, 472, 482.]

Sheet 12 of 27

#include <stdio.h>

// The quick brown fox jumped over the lazy dog. How many tries did it take?
void tries(int initialFoxJumpHeight, int dogHeight,
           int jumpIncrement)
{
    int jumpHeight = initialFoxJumpHeight;
    int numTries = 0;

    while (jumpHeight < dogHeight)
    {
        jumpHeight += jumpIncrement;
        numTries++;
    }
    printf("Number of tries: %d\n", numTries);

    // Verify jump
    while (numTries > 0)
    {
        jumpHeight -= jumpIncrement;
        numTries--;
    }
    if (jumpHeight == initialFoxJumpHeight)
    {
        printf(" - Verified \n");
    }
}

File: jumpVerify.c

Fig. 5A

Sheet 13 of 27

#!/usr/local/bin/perl5

# A fast auburn wolf leaped above a passive canine. How many attempts did it take?
sub Attempts ($startWolfLeapHeight, $canineHeight, $leapIncr)
{
    $leapHeight = $startWolfLeapHeight;
    $numberOfAttempts = 0;

    while ($leapHeight < $canineHeight) // MvP
    { // MvP
    // MvP
    // MvP
    // MvP
        $leapHeight += $leapIncr;
        $numberOfAttempts++;
    }

    printf("Number of Attempts: %d\n", $numberOfAttempts);

    // Confirm leap
    for (; $numberOfAttempts > 0; $numberOfAttempts--)
    {
        $leapHeight -= $leapIncr;
    }
    print " - Verified \n" if ($leapHeight == $startWolfLeapHeight);
}

File: leapConfirm.pl

Fig. 5B

Sheet 14 of 27

[Fig. 5C: table titled "Known Translations"; rows not legible in this copy. Reference numerals: 5300, 5300a, 5300b, 5340.]

Sheet 15 of 27

[Fig. 5D: table titled "Suspected Translations" (5400)]

Original Words      Translation Equivalents
Verify (5410a)      Confirm (5410b)

[Other reference numerals on the sheet: 5400a, 5400b, 5410, 5412.]

Sheet 16 of 27

[Fig. 5E: table titled "Exclusions" (5500)]

5510a: \s*\/\/\sMvP\s*$     5510b: // MvP comment at the end of a line
5512a: int(sp)              5512b: int (sp) anywhere on the line

[The pattern at 5510a is only partly legible and is reconstructed here from its description at 5510b. Other reference numerals on the sheet: 5500a, 5500b, 5510, 5512.]

Sheet 17 of 27

[Fig. 5F: table titled "Obscured Lines" with column groups Block A and Block B; cell values not legible in this copy. Reference numerals: 5600, 5600a to 5600e, 5610, 5610a to 5610e, 5612.]

Sheet 18 of 27

[Fig. 5G-1: formatted report, page 1 of 2 of Exhibit 5D. It shows jumpVerify.c and leapConfirm.pl side by side with numbered lines, a statistics block with copied, obscured, and filtered entries, a confidentiality legend, and the heading "The following translation equivalents were found and used in highlighting this file:". The listing text and statistics values are not legible in this copy. Reference numerals: 2400, 2400a, 2400b, 2402, 2404, 2406, 2408, 2410, 2430, 2432, 2434, 2436, 2438.]

Sheet 19 of 27

[Fig. 5G-2: formatted report, page 2 of 2 of Exhibit 5D (jumpVerify.c and leapConfirm.pl): the translation list, a formatting note, the list of ignored tokens, and the confidentiality legend. Reference numerals: 2400, 2400a, 2400b, 2402, 2404, 2406, 2408, 2410, 2450, 2452, 2460, 5768, 5770, 5772, 5774.]

dogHeight=$canineHeight
jumpHeight=$leapHeight
jumpIncrement=$leapIncr
numTries=$numberOfAttempts
initialFoxJumpHeight=$startWolfLeapHeight
The=A
tries=Attempts
Verify=Confirm
the=a
over=above
tries=attempts
dog=canine
quick=fast
jump=leap
jumped=leaped
lazy=passive
void=sub
fox=wolf

Note: During formatting tabs are converted to four spaces and all lines longer than 53 characters are wrapped. All wrapped lines are denoted with a [character not legible] character at the beginning of the line; however, highlighting is based on the full line prior to formatting.

Also the following tokens were ignored during comparison of these files:

int (sp) anywhere on the line
// MvP comment at the end of a line

Sheet 20 of 27

[Fig. 6: block diagram. Legible labels: Bulk User Interface Options; Bulk Statistics; Possible Translations; "100 or 400". Reference numerals: 600, 610, 630, 652, 668, 680.]

Sheet 21 of 27

[Fig. 7: table titled "File Pair Combinations" (700)]

710   A1   B1
712   A1   B2
714   A1   B3
716   A2   B1
718   A2   B2
720   A2   B3
722   A3   B1
724   A3   B2
726   A3   B3
728   A4   B1
730   A4   B2
732   A4   B3

[Other reference numerals on the sheet: 700a, 700b, 710a, 710b, 740, 742, 744, 746.]

Sheet 22 of 27

[Fig. 8: flowchart. Legible labels: Perform Bulk Compare; Analyze Statistics; Expert Review: Select Known Translations, Determine File Pairing; a decision with Yes and No branches (label not legible); Perform File Compare. Legible reference numerals: 800, 810, 816, 818, 820, 822, 824, 826, 830, 832, 834, 850, 860.]

Sheet 23 of 27

[Fig. 9: flowchart. Step labels, in order: Perform File Compare; Manually Modify Markup; Reformat and Recalculate Statistics; Finish. Reference numerals: 834, 900, 902, 906, 908, 910, 912, 914, 916.]

Sheet 24 of 27

[Fig. 10: block diagram. Box labels: File A Listing; File B Listing; Formatted Listing A (see Fig. 11); Formatted Listing B (see Fig. 12). Reference numerals: 150, 150a, 150b, 1000, 1010.]

Sheet 25 of 27

[Fig. 11: formatted listing of one file on its own: Exhibit 2D-A, jump.c, page 1 of 1, with numbered lines and a confidentiality legend. The listing text is not legible in this copy; Fig. 2A shows the contents of jump.c. Legible reference numerals: 1100, 1100a, 1102, 1104.]

Sheet 26 of 27

[Fig. 12: formatted listing of one file on its own: leap.c, page 1 of 1, with numbered lines and a confidentiality legend. The exhibit label is only partly legible ("2D-"). The listing text is not legible in this copy; Fig. 2B shows the contents of leap.c. Reference numerals as they appear in this copy: 1100a, 1102, 1104, 1106.]

Sheet 27 of 27

[Fig. 13: flowchart. Legible step labels, in order: Parse Compare File & Calculate Statistics; Output File A Listing; Output File B Listing; Output Compare File with Updated Stats. The sheet also carries the cross-references "See Fig 11" and "See Fig 12". Legible reference numerals: 1300, 1304, 1306, 1308, 1310, 1314, 1316, 1318, 1320.]

DETECTION OF OBSCURED COPYING USING KNOWN TRANSLATIONS FILES AND OTHER
OPERATIONAL DATA

RELATED APPLICATIONS

[0001] This application claims priority under 35 U.S.C. § 119(e) of the
co-pending U.S. provisional application Ser. No. 60/635,908, filed Dec.
10, 2004, entitled “DETECTION OF OBSCURED COPYING USING KNOWN
TRANSLATIONS FILES AND OTHER OPERATIONAL DATA,” which is hereby
incorporated by reference.

[0002] This application claims priority under 35 U.S.C. § 119(e) of the
co-pending U.S. provisional application Ser. No. 60/635,562, filed Dec.
11, 2004, entitled “DETECTION OF OBSCURED COPYING USING KNOWN
TRANSLATIONS FILES AND OTHER OPERATIONAL DATA,” which is hereby
incorporated by reference.
`
`BACKGROUND FIELD OF THE INVENTION
[0003] This invention relates to systems and methods for comparing
files to detect the use of copied information, and more particularly to
such systems and methods that detect copying where the copying has been
obscured by various techniques.
`
BACKGROUND THE PROBLEM

[0004] We are in the midst of the Information Age. More and more people
make their living as information workers. The technologies fueling the
Information Age are still being developed at an intense rate. For
example, during the last few decades there has been unprecedented
development and growth in the use of the Internet. The Internet
information space known as the World Wide Web has become a significant
tool for communications, commerce, research, and education. Almost all
of this information is stored electronically in computer files, which
can be easily copied, transferred anywhere in the world, and modified.
At the same time, many have made extreme efforts to share in the
fortunes to be made in this new era of computer-based information and
communication. Some of this has been evidenced by the “irrational
exuberance” of the Internet boom.

[0005] Unfortunately, the ease of access to information and the ease
with which information can be copied and modified, combined with both
personal and corporate greed, have led to what appear to be
unprecedented levels of illegal copying of copyrighted materials,
including the computer programs that run on the computers of the
Information Age and the information found on the World Wide Web. This
illegal copying has led to numerous lawsuits claiming Federal copyright
infringement and both Federal and state trade secret misappropriation.
Significant trade secret theft can also lead to criminal prosecution.
[0006] At the same time, computer equipment has become more powerful and
has increased in storage capacity, in both primary memory (RAM) and
secondary storage (disk and tape drives). Computer programs, likewise,
have grown in size and complexity. Some software projects comprise tens
of thousands of source code files, collectively containing millions of
lines of code. The source version control systems for those projects
may contain billions of lines of code. The version control systems may
also include other types of media, including design documents, database
schemas, graphics files, and other data, all subject to copyright and
trade secret protection.

[0007] The courts are interested in the literal copying and use of the
literal lines of code that make up these computer programs. Copyright
extends to translations of the original work as well. Trade secrets can
be copied without copying the literal lines of code. Literal copying
and literal translation are direct evidence of copying. The courts have
also said, “Where there is no direct evidence of copying, a plaintiff
may establish an inference of copying by showing (1) access to the
allegedly-infringed work by the defendant(s) and (2) a substantial
similarity between the two works at issue.” In determining substantial
similarity, the first step is to filter out those elements that are not
protectable, namely those which are not original to the copyright
holder or which required minimal creativity.
[0008] Also, the courts have recognized that a significant portion of
the work and creative effort of developing computer programs is found
in tasks not limited to the actual writing of the lines of source code,
but including many layers of abstract design. This work includes
understanding customer and system requirements, designing external
interfaces, designing internal interfaces, architecting the structure
of the system and individual modules, developing abstract algorithms,
coding, integration, testing, bug fixing, and maintenance. Because of
this, the courts have recognized copying of the non-literal aspects of
the computer program as well.

[0009] Because of the highly technical nature of computer programming,
the courts rely on technical experts to shed light on what was copied,
whether the copying was allowable, and whether the copying was
substantial. The courts have provided various guidelines for
determining non-literal copying. One guideline is to analyze the
sequence, structure, and organization of the computer program. More
recently, the courts have been adopting an “abstraction-filtration-
comparison” test. In this test, first the computer program is broken
down into layers of abstraction; second, the elements that are not
protected are filtered out; and third, the remaining elements are
compared against the allegedly infringing work (at each of the levels
of abstraction). The courts have been interested in the literal lines
of code as well as more abstract aspects of the computer program, such
as the algorithms, the parameter lists, the modules or files that make
up each program, the database architecture, and the system level
architecture.

[0010] The similarities at each of these levels can be shown by
creating side-by-side listings of the copied materials. The various
aspects of the comparison can be indicated with various types of
formatting.

[0011] In trade secret cases, information that was general knowledge (as
opposed to specific knowledge) or which is readily ascertainable must
also be filtered.
[0012] However, in order to prepare the side-by-side listings, the
expert must first determine which pairs of files from the respective
works to compare. Once a pair of files with some level of copying has
been found, the literal and non-literal aspects of the copying must be
indicated in some manner. This can be done manually using a word
processor, such as the Microsoft Word brand or FrameMaker brand word
processors. However, when there are tens of thousands of files and
millions of lines of code, it becomes almost impossible for an expert
or group of experts to accurately find all instances of copying and to
properly apply the filtering and formatting required for presentation
to the judge and jury. Further, to qualify as a technical expert, the
individual must have recognized experience and expertise in computer
science, as well as the ability to present the information, testify,
and overcome the challenges and rigors of the courtroom. Qualified
individuals, who are at the peak of their careers and are in high
demand, earn relatively high hourly compensation. A typical case may
require hundreds or thousands of hours of analysis and exhibit
preparation. The cost of doing the work manually can be prohibitive.
Further, the volume of work can be difficult to perform error free. Any
errors in the analysis or presentation can be used to challenge the
reliability of the evidence and the credibility of the expert witness.
BACKGROUND PRIOR ART

[0013] Software developers are aware of a number of code comparison
tools associated with their development environment. For example, the
UNIX brand development environment has long had a utility known as
“diff”, which compares lines of files for exact matching. The diff
utility produces output that indicates which blocks of lines are
identical, which blocks of lines have been added, and which blocks of
lines have been deleted. It is typical for an integrated development
environment (IDE), such as the Microsoft Developer Studio brand,
Microsoft SourceSafe brand, Metrowerks CodeWarrior brand, or Apple
Xcode brand IDEs, to include a file compare utility. There are also
stand-alone programs such as the WinDiff brand or Helios Software
Solutions TextPad brand file compare programs. Many of these programs
provide the same comparison features as the original Unix brand diff
utility. Some of these show lines added, changed, and deleted with
colored highlighting. Some include a graphical user interface that
aligns identically matching lines of code in a side-by-side format that
can be scrolled in a window.

[0014] However, all of these diff-like programs are limited in detecting
illegal copying because they only report lines that match exactly.
Small, insignificant changes can easily be made to each copied line,
and these diff-like programs will then report that no lines are
identical, giving a false indication that there is no copying.
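
The limitation described in the preceding paragraph can be illustrated with a minimal sketch in C. The function names `lines_identical` and `lines_match_ignoring_space` are hypothetical and are not taken from the application; the second function is only one assumed way of tolerating an insignificant change (added spaces), not the comparison method of the described system.

```c
#include <ctype.h>
#include <string.h>

/* Exact comparison, as a diff-like tool would perform it. */
int lines_identical(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Hypothetical looser comparison: ignore all whitespace in both lines. */
int lines_match_ignoring_space(const char *a, const char *b)
{
    for (;;) {
        while (isspace((unsigned char)*a)) a++;
        while (isspace((unsigned char)*b)) b++;
        if (*a != *b) return 0;      /* first non-space difference */
        if (*a == '\0') return 1;    /* both lines ended together */
        a++;
        b++;
    }
}
```

With these definitions, `numTries++;` and `numTries ++ ;` are reported as different by the exact comparison but as matching by the looser one. Ignoring every space is deliberately crude (it would also treat `int x` and `intx` as equal), so a practical tool would more likely compare tokens than raw characters.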
[0015] Editing programs, such as Microsoft Word and those found in the
various IDEs, have a feature that allows all the occurrences of a
certain word or phrase to be changed (or translated) to a different
word or phrase. For example, every occurrence of “dog” could be
translated to “canine”. This is known as “Change All” or “global
query/replace”. Software developers can easily generate a list of the
important names (or identifiers) in a computer program. Software
developers with nefarious intent can easily develop a list of
substitute words for each of those identifiers, and change every
important name wherever it occurs throughout a set of copied files. In
a matter of minutes the computer can make millions of changes to tens
of thousands of files. The program would still be structured and behave
identically even though none of the important lines of code would match
identically.

[0016] These diff-like programs cannot detect such global changes.
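
To make the effect of such a global change concrete, the following is a minimal sketch in C of a whole-word "Change All" applied to one line. The function `change_all` and its parameters are hypothetical names chosen for illustration; real editors implement this feature in their own ways.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Characters that can be part of an identifier-like word. */
static int is_word_char(int c) { return isalnum(c) || c == '_'; }

/* Copy src into dst (capacity size), replacing each whole-word occurrence
   of `from` with `to`. Returns the number of replacements made.
   Hypothetical helper, shown only to illustrate the obscuring step. */
size_t change_all(const char *src, const char *from, const char *to,
                  char *dst, size_t size)
{
    size_t flen = strlen(from), tlen = strlen(to), out = 0, hits = 0;
    const char *s = src;

    if (size == 0) return 0;
    while (*s != '\0') {
        int at_word_start = (s == src) || !is_word_char((unsigned char)s[-1]);
        if (flen > 0 && at_word_start && strncmp(s, from, flen) == 0 &&
            !is_word_char((unsigned char)s[flen])) {
            if (out + tlen >= size) break;   /* no room left: stop copying */
            memcpy(dst + out, to, tlen);
            out += tlen;
            s += flen;
            hits++;
        } else {
            if (out + 1 >= size) break;
            dst[out++] = *s++;
        }
    }
    dst[out] = '\0';
    return hits;
}
```

Applied to `while (jumpHeight < dogHeight)` with `dogHeight` changed to `canineHeight`, the result no longer matches the original line exactly, although the structure of the line is untouched.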
`
[0017] Further, the diff program algorithm is limited in how it handles
reordering. If a block of code is copied but moved out of order, the
diff program may fail to detect the identical lines simply because they
have been rearranged within the file.
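
The reordering problem can be sketched as follows. Both functions are hypothetical illustrations: `count_forward_matches` is a simplified greedy forward scan standing in for an order-sensitive comparison (it is not the actual diff algorithm), and `count_any_order_matches` is an assumed alternative that also looks back at earlier lines.

```c
#include <stddef.h>
#include <string.h>

/* Forward-only scan: each line of b must be found in a at a position
   after the previous match. Lines moved earlier are missed. */
size_t count_forward_matches(const char *const a[], size_t na,
                             const char *const b[], size_t nb)
{
    size_t start = 0, matched = 0, i, j;
    for (j = 0; j < nb; j++) {
        for (i = start; i < na; i++) {
            if (strcmp(a[i], b[j]) == 0) {
                matched++;
                start = i + 1;
                break;
            }
        }
    }
    return matched;
}

/* Look-back scan: a line of b may match any not-yet-used line of a. */
size_t count_any_order_matches(const char *const a[], size_t na,
                               const char *const b[], size_t nb)
{
    unsigned char used[256] = {0};
    size_t matched = 0, i, j;
    if (na > sizeof used) na = sizeof used;   /* size limit of this sketch */
    for (j = 0; j < nb; j++) {
        for (i = 0; i < na; i++) {
            if (!used[i] && strcmp(a[i], b[j]) == 0) {
                used[i] = 1;
                matched++;
                break;
            }
        }
    }
    return matched;
}
```

If file A holds three distinct lines and file B holds the same lines with the last one moved to the top, the forward scan matches the moved line and then finds nothing after it, while the look-back scan matches all three.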
[0018] A software developer with nefarious intent can easily defeat the
illegal copying detection capabilities of programs such as diff.

BACKGROUND MORE SOPHISTICATED COPYING

[0019] A software developer who is attempting to copy a set of source
code, and has some understanding that they cannot literally copy the
source code without detection, can employ various techniques to avoid
literal copying that can easily be detected, while still effectively
copying the source code. To avoid being caught, an illicit copier can
employ more sophisticated techniques to hide or obscure the evidence of
their illegal copying.

[0020] As discussed above, the easiest approach is to simply use an
editor to make global changes throughout the code to identifiers such
as variable and method names. This makes it difficult for conventional
comparison programs to detect the copying.

[0021] Another approach is to add spaces, tabs, carriage returns, words,
or comments that don’t change the essential function of the code, but
will defeat diff-like programs.

[0022] Another approach is to reorder the code so that the sections work
the same but have been moved around to avoid side-by-side comparison.

[0023] Another approach is to re-write the same algorithms in a
different language, for example, translating from C to Visual Basic,
from C to C++, from Basic to C++, and so forth.

[0024] Another approach is to rewrite every line of code using different
but equivalent programming constructs. This makes individual
line-by-line comparison impossible because the equivalent elements may
be split across non-contiguous lines.
`
BACKGROUND MY EARLIER TESTING

[0025] I conceived of a basic technique to overcome and detect some of
these techniques, such as the global change of important identifiers. I
developed custom file compare test programs that read two files and
broke the words and symbols of the files into individual elements
called tokens. As I manually compared the files, I added special
instructions and data into each different custom test program to
reverse the global changes that had been made by the illicit copier.
These programs also output a report where the two programs were
presented side-by-side with line numbers. When these early test
programs were successful in identifying translated lines of code, the
lines were lined up (or aligned) side-by-side by inserting extra blank
lines. Lines of code that had been literally copied or translated were
shown in red and underlined. The lines were numbered with the original
line numbers. Lines that were too long were truncated (cut off) so that
the lines would still match up.
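
The token-based idea in the preceding paragraph can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the author's test programs: the `struct translation` table, the tokenizer `next_token`, and `lines_match_translated` are all hypothetical names, and the sketch only handles one-for-one token substitution on a single pair of lines.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical known-translation pair: a word in file A and its
   stand-in in file B. */
struct translation {
    const char *original;
    const char *equivalent;
};

/* Read the next token from *p into buf (size >= 2): a run of
   letters, digits, and underscores, or a single symbol.
   Returns 0 at the end of the line. Long tokens are truncated. */
int next_token(const char **p, char *buf, size_t size)
{
    const char *s = *p;
    size_t n = 0;

    while (isspace((unsigned char)*s)) s++;
    if (*s == '\0') { *p = s; return 0; }
    if (isalnum((unsigned char)*s) || *s == '_') {
        while (isalnum((unsigned char)*s) || *s == '_') {
            if (n + 1 < size) buf[n++] = *s;
            s++;
        }
    } else {
        buf[n++] = *s++;
    }
    buf[n] = '\0';
    *p = s;
    return 1;
}

/* Two tokens match if identical or listed as a known translation pair. */
int tokens_match(const char *a, const char *b,
                 const struct translation *known, size_t count)
{
    size_t i;
    if (strcmp(a, b) == 0) return 1;
    for (i = 0; i < count; i++)
        if (strcmp(a, known[i].original) == 0 &&
            strcmp(b, known[i].equivalent) == 0)
            return 1;
    return 0;
}

/* Returns 1 if every token of line_a matches the token at the same
   position in line_b, taking the known translations into account. */
int lines_match_translated(const char *line_a, const char *line_b,
                           const struct translation *known, size_t count)
{
    char ta[64], tb[64];
    for (;;) {
        int more_a = next_token(&line_a, ta, sizeof ta);
        int more_b = next_token(&line_b, tb, sizeof tb);
        if (!more_a || !more_b) return more_a == more_b;
        if (!tokens_match(ta, tb, known, count)) return 0;
    }
}
```

Given the pairs jumpHeight/leapHeight and dogHeight/canineHeight from the sample files, the line `while (jumpHeight < dogHeight)` matches `while (leapHeight < canineHeight)`. Keeping the pairs in a data table rather than in the program text is a design choice of this sketch.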
[0026] While these situation-specific test programs validated this basic
approach, and saved a significant amount of time preparing exhibits
that could be edited by hand for completeness, it was clear that I had
not yet developed a complete solution that would meet the needs of
general use over a wide range of situations.

[0027] One problem was that the translation rules and terms were built
into each custom program. This required changes to the program each
time a new rule or a new matching pair of translation equivalents was
found. The required repeated modification of the program resulted in
multiple versions and constant changing of the program.

[0028] Another problem was that each project required its own custom
program, so that the program could never be finished. Another problem
was maintaining a growing set of custom programs. It was difficult to
fix software defects or to add general enhancements. A fix to one
custom program might break another custom program that had a different
set of features.

[0029] Further, testing with a broader range of test cases revealed that
many techniques for hiding illicit copying were still not covered by
these simple test programs. For example, a situation where the illicit
copier added carriage returns, words, or comments that didn't change
the essential function of the code still defeated my early test
programs. Also, some programming environments include unique numbers on
every line in a file. The simple act of copying the contents of a file
into another file will cause every line to no longer match because of
the unique numbers.
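
The unique-number problem can be illustrated with a short C sketch that excludes a leading number from each line before comparing. The helper names `skip_line_number` and `lines_match_without_numbers` are hypothetical, and the sketch assumes the number is a run of decimal digits at the start of the line, which the text above does not specify.

```c
#include <ctype.h>
#include <string.h>

/* Return a pointer past an optional leading run of digits and the
   blanks that follow it. Lines without a leading number are returned
   unchanged. Hypothetical helper for illustration only. */
const char *skip_line_number(const char *line)
{
    const char *s = line;
    while (isspace((unsigned char)*s)) s++;
    if (!isdigit((unsigned char)*s)) return line;   /* no number present */
    while (isdigit((unsigned char)*s)) s++;
    while (isspace((unsigned char)*s)) s++;
    return s;
}

/* Compare two lines after excluding their leading numbers. */
int lines_match_without_numbers(const char *a, const char *b)
{
    return strcmp(skip_line_number(a), skip_line_number(b)) == 0;
}
```

Two lines that differ only in their leading numbers, such as `000120 MOVE A TO B.` and `004510 MOVE A TO B.`, then compare as equal. A line that legitimately begins with a numeric literal would also be trimmed, so an exclusion rule of this kind would need to be chosen per project.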
[0030] In some situations, subsets of files appearing in the same
projects were found to have been translated using different
translations for the same words. My early test programs could not
handle multiple translations of the same words.

[0031] Also, the process of finding pairs of files to be compared was
still a time-consuming manual process.

[0032] Further, once I produced a side-by-side listing with marking
showing the lines that were copied, it was necessary to filter out, for
example, lines that were in the public domain or which were generally
known. In some cases, an employee of one of the parties may be the best
domain expert to review what should be filtered versus what would be
proprietary or trade secret information. However, that person is often
prevented by protective orders from seeing both sides of the
comparison. There is a need to prepare marked up listings of either
side of a side-by-side comparison that are identical in markup and
presentation to the side-by-side listings but which contain only the
code from one of the parties.
`
BACKGROUND SOLUTION NEEDED

[0033] What is needed is a comprehensive system that will automatically:

[0034] (a) find and mark literal copying

[0035] (b) find and mark literal translation

[0036] (c) filter material that should be filtered

[0037] (d) identify copied material that has been filtered

[0038] (e) calculate statistics on total lines, lines copied, lines
obscured, lines filtered, and percentages

[0039] (f) identify translations that have been used

[0040] (g) identify copying even when the code was translated from one
programming language to another

[0041] (h) identify copying even when words and comments have been
changed without changing the essential function of the code

[0042] (i) provide a mechanism to identify copying even when carriage
returns were added

[0043] (j) provide a mechanism to exclude portions of each line prior to
comparing the more meaningful portions (e.g. exclude the unique number
on each line)

[0044] (k) determine which pairs of files should be compared

[0045] (l) skip pairs of files that have little or no similarity so that
those that do have similarity can be presented sooner and with fewer
resources

[0046] (m) identify possible translations that might not yet have become
known

[0047] (n) apply customized rules based on the observed technique for
obscuring copying

[0048] (o
