(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)

(19) World Intellectual Property Organization
International Bureau

(43) International Publication Date: 31 January 2002 (31.01.2002)

(10) International Publication Number: WO 02/08948 A2

(51) International Patent Classification: G06F 17/00

(21) International Application Number: PCT/US01/23631

(22) International Filing Date: 23 July 2001 (23.07.2001)

(25) Filing Language: English

(26) Publication Language: English

(30) Priority Data:
60/221,394    24 July 2000 (24.07.2000)    US
60/221,843    28 July 2000 (28.07.2000)    US
60/222,373    31 July 2000 (31.07.2000)    US
60/271,908    27 February 2001 (27.02.2001)    US
60/291,728    17 May 2001 (17.05.2001)    US

(71) Applicant (for all designated States except US): VIVCOM, INC. [US/US]; 4180 Wallis Ct., Palo Alto, CA 94306 (US).

(72) Inventors; and
(75) Inventors/Applicants (for US only): SULL, Sanghoon [KR/KR]; Gaeop 4-cha Woosung Apt. 8-402, DoGop-Dong, KangNam-Ku, Seoul, 135-270 (KR). KIM, Hyeokman [KR/KR]; A-Nam Apt. 101-308, MyungRyun-Dong, Jong-Ro-Ku, Seoul, 110-521 (KR). CHOI, Hyungseok [KR/KR]; HyunDai Apt. 103-104, SsangMoon 4-Dong, Dobong-Ku, Seoul, 132-034 (KR). CHUNG, Min, Gyo [KR/KR]; DaeWon Apt. 806-901, GumGok-Dong, PunDang-Ku, SungNam City, Kyonggi, 463-480 (KR). YOON, Ja-Cheon [KR/KR]; SangRok Soo Apt. 204-303, IlWonBon-Dong, KangNam-Ku, Seoul, 135-947 (KR). OH, Jeongtaek [KR/KR]; DaeRim Apt. 207-2104, ChungGye-dong, NoWon-gu, Seoul, 139-220 (KR). LEE, Sangwook [KR/KR]; 102-801 Oksu Heights Apt., 100 Oksu Dong, Sundong-Ku, Seoul, 133-100 (KR). SONG, S., Moon-Ho [KR/KR]; Yongsan-gu Ichon-Dong 402, Gangchon Apt. 102-702, Seoul, 133-100 (KR). KIM, Jung, Rim [KR/KR]; Lotte Apt. 108-1701, Kuro-Dong,

(54) Title: SYSTEM AND METHOD FOR INDEXING, SEARCHING, IDENTIFYING, AND EDITING PORTIONS OF ELECTRONIC MULTIMEDIA FILES

[Continued on next page]

[Front-page drawing: List of Multimedia Bookmarks (222), with Positional Information and Content Information fields (reference numerals 224, 226)]

(57) Abstract: A method and system are provided for tagging, indexing, searching, retrieving, manipulating, and editing video images on a wide area network such as the Internet. A first set of methods is provided for enabling users to add bookmarks to multimedia files, such as movies, and audio files, such as music. The multimedia bookmark facilitates the searching of portions or segments of multimedia files, particularly when used in conjunction with a search engine. Additional methods are provided that reformat a video image for use on a variety of devices that have a wide range of resolutions by selecting some material (in the case of smaller resolutions) or more material (in the case of larger resolutions) from the same multimedia file. Still more methods are provided for interrogating images that contain textual information (in graphical form) so that the text may be copied to a tag or bookmark that can itself be indexed and searched to facilitate later retrieval via a search engine.

Amazon v. Audio Pod
US Patent 10,805,111
Amazon EX-1060

Kuro-gu, Seoul, 152-055 (KR). LEE, Keansub [KR/KR]; 972-2 Pyokjokgol Jugong Apt. 836-1701, Yongtong-Dong, Paldal-gu, Suwon City, Kyonggi, 463-060 (KR). CHUN, Seong, Soo [KR/KR]; Dusan Apt. 425-1402, Imae-dong, Pundang-gu, Songnam City, Kyonggi, 463-060 (KR). OH, Sangwook [KR/KR]; 609-42 Yongdam2-Dong, Cheju City, Cheju, 690-042 (KR). KIM, Yunam [KR/KR]; 2529-3daeYu JanRah Mansion 302, Nollyun-Dong, CheJu City, Cheju, 690-180 (KR).

(74) Agents: CHICHESTER, Ronald, L. et al.; Baker Botts L.L.P., One Shell Plaza, 910 Louisiana, Houston, TX 77002 (US).

(81) Designated States (national): AE, AG, AL, AM, AT, AT (utility model), AU, AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CO, CR, CU, CZ, DE (utility model), DK (utility model), DM, DZ, EC, EE (utility model), ES, FI (utility model), GB, GD, GE, GH, GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR (utility model), KZ, LC, LK, LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, MW, MX, MZ, NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK (utility model), SL, TJ, TM, TR, TT, TZ, UA, UG, US, UZ, VN, YU, ZA, ZW.

(84) Designated States (regional): ARIPO patent (GH, GM, KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, ZW), Eurasian patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European patent (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE, TR), OAPI patent (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, ML, MR, NE, SN, TD, TG).

Published:
without international search report and to be republished upon receipt of that report

For two-letter codes and other abbreviations, refer to the "Guidance Notes on Codes and Abbreviations" appearing at the beginning of each regular issue of the PCT Gazette.

WO 02/08948    PCT/US01/23631

SYSTEM AND METHOD FOR INDEXING, SEARCHING, IDENTIFYING,
AND EDITING PORTIONS OF ELECTRONIC MULTIMEDIA FILES

Background of the Invention

Field of the Invention

The present invention relates generally to marking multimedia files. More specifically, the present invention relates to applying or inserting tags into multimedia files for indexing and searching, as well as for editing portions of multimedia files, all to facilitate the storing, searching, and retrieving of the multimedia information.

Background of the Related Art

1. Multimedia Bookmarks

With the phenomenal growth of the Internet, the amount of multimedia content that can be accessed by the public has virtually exploded. There are occasions when a user who once accessed particular multimedia content needs or desires to access the content again at a later time, possibly at or from a different place. For example, in the case of a data interruption due to a poor network condition, the user may be required to access the content again. In another case, a user who once viewed multimedia content at work may want to continue viewing the content at home. Most users would want to resume accessing the content from the point where they had left off. Moreover, subsequent access may be initiated by a different user in an exchange of information between users. Unfortunately, multimedia content is represented in a streaming file format, so a user has to view the file from the beginning in order to find the exact point where the first user left off.

In order to save the time involved in browsing the data from the beginning, the concept of a bookmark may be used. A conventional bookmark marks a document, such as a static web page, for later retrieval by saving a link (address) to the document. For example, Internet browsers support a bookmark facility by saving an address, called a Uniform Resource Identifier (URI), to a particular file. Internet Explorer, manufactured by the Microsoft Corporation of Redmond, Washington, uses the term “favorite” to describe a similar concept.

Conventional bookmarks, however, store only information related to the location of a file, such as a directory name with a file name, a Uniform Resource Locator (URL), or a URI. The files referred to by conventional bookmarks are treated in the same way regardless of the data formats used to store the content. Typically, a simple link is used for multimedia content as well. For example, to link to a multimedia content file through the Internet, a URI is used. Each time the file is revisited using the bookmark, the multimedia content associated with the bookmark is always played from the beginning.

Figure 1 illustrates a list 108 of conventional bookmarks 110, each comprising positional information 112 and a title 114. The positional information 112 of a conventional bookmark is composed of a URI as well as a bookmarked position 106. The bookmarked position is a relative time or byte position measured from the beginning of the multimedia content. The title 114 can be specified by a user or delivered with the content, and it is typically used to help the user easily recognize the bookmarked URI in the bookmark list 108. In the case of a conventional bookmark without a bookmarked position, when a user wants to replay the specified multimedia file, the file is played from its beginning each time, regardless of how much of the file the user has already viewed. The user has no choice but to note the last accessed position on a memo and to move manually to the last stopped point. If the multimedia file is viewed by streaming, the user must go through repeated buffering to find the last accessed position, thus wasting much time. Even for a conventional bookmark with a bookmarked position, the same problem occurs when the multimedia content is delivered as a live broadcast, since the bookmarked position within the multimedia content is not usually available, as well as when the user wants to replay one of the variations of the bookmarked multimedia content.
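
The conventional bookmark of Figure 1 can be pictured as a small record holding a URI, an optional bookmarked position, and a title. The following Python sketch is illustrative only: the class name `ConventionalBookmark`, its fields, and the choice of seconds (rather than bytes) as the position unit are assumptions. It shows the limitation described above: without a stored position, the only possible resume point is the beginning of the file.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConventionalBookmark:
    """Hypothetical model of a conventional bookmark (Figure 1)."""
    uri: str                            # location of the multimedia file
    title: str                          # user-supplied or delivered with the content
    position_s: Optional[float] = None  # bookmarked position, seconds from the start

    def resume_position(self) -> float:
        # Without a bookmarked position, playback can only restart at 0.
        return self.position_s if self.position_s is not None else 0.0


plain = ConventionalBookmark("http://example.com/clip.asf", "Evening news")
timed = ConventionalBookmark("http://example.com/clip.asf", "Evening news", 754.0)
print(plain.resume_position(), timed.resume_position())  # 0.0 754.0
```

Even `timed` stores only a position within one particular file. It carries no content information, so it cannot locate the same scene in a live broadcast or in another variation of the content.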

Further, conventional bookmarks do not provide a convenient way of switching between different data formats. Multimedia content may be generated and stored in a variety of formats. For example, video may be stored in formats such as MPEG, ASF, RM, MOV, and AVI, and audio may be stored in formats such as MID, MP3, and WAV. There may be occasions when a user wants to switch playback of the content from one format to another. Since different data formats produced from the same multimedia content are often encoded independently, the same segment is stored at different temporal positions within the different formats. Because conventional bookmarks have no facility to store any content information, users have no choice but to review the multimedia content from the beginning and to search manually for the last-accessed segment within the content.

Time information may be incorporated into a bookmark to return to the last-accessed segment within the multimedia content. The use of time information alone, however, fails to return to exactly the same segment at a later time, for the following reasons.

If a bookmark incorporating time information was used to save the last-accessed segment during a preview broadcast of multimedia content, the bookmark information would not be valid for returning to the last-accessed segment during a regular full-version broadcast. Similarly, if a bookmark incorporating time information was used to save the last-accessed segment during a real-time broadcast, the bookmark would not be effective during later access, because the later available version may have been edited or a time code was not available during the real-time broadcast.

In many video and audio archiving systems, several differently compressed files, called "variations," may be produced from a single source multimedia content. Many web-casting sites provide multiple streaming files for a single video content, with different bandwidths for each video format. For example, CNN.com provides five different streaming videos for a single video content, including two streaming videos with bandwidths of 28.8 kbps and 80 kbps, both encoded in Microsoft's Advanced Streaming Format (ASF). CNN.com also provides video in the RM streaming format of RealNetworks, Inc. of Seattle, Washington, and a streaming video with smart bandwidth encoded in Apple Computer, Inc.’s QuickTime streaming format (MOV). In this case, the five video files may start and end at different time points from the viewpoint of the source video content, since each variation may be produced by an independent encoding process with different values chosen for encoding format, bandwidth, resolution, and so on. This results in mismatched time points, because a specific time point of the source video content may be presented as different media time points in the five video files.

When a multimedia bookmark is used, these positional mismatches cause mis-positioned playback. Consider a simple case in which one makes a multimedia bookmark on a master file of a multimedia content (for example, video encoded in a given format) and tries to play another variation (for example, video encoded in a different format) from the bookmarked position. If the two variations do not start at the same position of the source content, playback will not start at the bookmarked position. Instead, playback will start at a position that is temporally shifted by the difference between the start positions of the two variations.
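
The shift can be made concrete with a little arithmetic. Let s_m and s_v be the source-content times at which the master file and the other variation begin, and let t_m be the bookmarked media time in the master. The bookmarked instant is s_m + t_m in source time, so the matching media time in the variation is t_v = t_m + s_m - s_v, and seeking naively to t_m lands s_v - s_m seconds away from it. The sketch below assumes the start offsets are known (for example, from metadata); the function name is hypothetical.

```python
def map_to_variation(t_master: float, start_master: float,
                     start_variation: float) -> float:
    """Map a bookmarked media time in the master file to the media time of
    the same source-content instant in another variation.

    start_master and start_variation are the source-content times (seconds)
    at which each file begins; they are assumed to be known.
    """
    source_time = start_master + t_master        # instant in the source content
    t_variation = source_time - start_variation  # same instant on the variation's clock
    if t_variation < 0:
        raise ValueError("bookmarked instant precedes the start of this variation")
    return t_variation


# The master starts 2.0 s into the source; the variation starts 5.5 s in.
print(map_to_variation(120.0, start_master=2.0, start_variation=5.5))  # 116.5
```

Seeking directly to 120.0 s in the variation would land at source time 125.5 s, 3.5 s past the bookmarked instant, which is exactly the difference between the two start positions.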

An entire multimedia presentation is often lengthy, and there are frequent occasions when the presentation is interrupted, voluntarily or forcibly, before it finishes. Examples include a user who starts playing a video at work, leaves the office, and desires to continue watching the video at home, or a user who is forced to stop watching the video and log out due to a system shutdown. It is thus necessary to save the termination position of the multimedia file in persistent storage in order to return directly to the point of termination without a time-consuming playback of the multimedia file from the beginning.

The interrupted presentation of the multimedia file will usually resume exactly at the previously saved termination position. In some cases, however, it is desirable to begin playback of the multimedia file a certain time before the termination point, since such rewinding can help refresh the user’s memory.
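
A resume point that rewinds slightly before the saved termination position, without going past either end of the file, can be computed as follows. This is a sketch: the 10-second default rewind and the function name are illustrative assumptions, not values from the text, and persisting the termination position itself (for example, in a file or database) is outside its scope.

```python
from typing import Optional


def resume_point(terminated_at_s: float, rewind_s: float = 10.0,
                 duration_s: Optional[float] = None) -> float:
    """Return where playback should restart: rewind_s seconds before the saved
    termination point, clamped to the start (and, if known, the end) of the file."""
    if rewind_s < 0:
        raise ValueError("rewind_s must be non-negative")
    start = max(0.0, terminated_at_s - rewind_s)
    if duration_s is not None:
        start = min(start, duration_s)
    return start


print(resume_point(95.0))       # 85.0
print(resume_point(4.0))        # 0.0 (cannot rewind past the beginning)
print(resume_point(95.0, 0.0))  # 95.0 (resume exactly where playback stopped)
```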

In the prior art, the EPG (Electronic Program Guide) has played a crucial role as a provider of TV programming information. The EPG facilitates a user’s efforts to search for TV programs that he or she wants to view. However, the EPG’s two-dimensional presentation (channels vs. time slots) becomes cumbersome as terrestrial, cable, and satellite systems send out thousands of programs through hundreds of channels. Navigating through a large table of rows and columns in order to search for desired programs is frustrating.

One of the features provided by recent set-top boxes (STBs) is personal video recording (PVR), which allows simultaneous recording and playback. Such an STB usually contains a digital video encoder/decoder based on an international digital video compression standard such as MPEG-1/2, as well as large local storage for the digitally compressed video data. Some recent STBs also allow connection to the Internet. Thus, STB users can experience new services such as time-shifting and web-enhanced television (TV).

However, there still exist some problems for PVR-enabled STBs. The first problem is that even the latest STBs alone cannot fully satisfy users’ ever-increasing desire for diverse functionalities. The STBs now on the market are very limited in terms of computing power and memory, so it is not easy to execute most CPU- and memory-intensive applications. For example, people who are bored with plain playback of recorded video may desire more advanced features such as video browsing/summary and search. All of these features require metadata for the recorded video. Metadata are usually data describing content, such as the title, genre, and summary of a television program. Metadata also include audiovisual characteristic data, such as raw image data corresponding to a specific frame of the video stream. Some of the description is structured around "segments" that represent spatial, temporal, or spatio-temporal components of the audio-visual content. In the case of video content, a segment may be a single frame, a single shot consisting of successive frames, or a group of several successive shots. Each segment may be described by some elementary semantic information using text. The segment is referenced by the metadata using media locators such as frame numbers or time codes. However, the generation of such video metadata usually requires intensive computation and a human operator’s help, so, practically speaking, it is not feasible to generate the metadata in a current STB. One possible solution to this problem is to generate the metadata in a server connected to the STB and to deliver it to the STB via a network. In this scenario, however, it is essential to know the start position of the recorded video with respect to the video stream used to generate the metadata in the server/content provider, in order to match the temporal positions referenced by the metadata to positions in the recorded video.

The second problem is related to the discrepancy between two time instants: the time instant at which the STB starts recording the user-requested TV program, and the time instant at which the TV program is actually broadcast. Suppose, for instance, that a user initiated a PVR request for a TV program scheduled to go on the air at 11:30 AM, but the actual broadcast time is 11:31 AM. In this case, when the user wants to play the recorded program, the user has to watch an unwanted one-minute segment at the beginning of the recorded video. This time mismatch could inconvenience a user who wants to view only the requested program. However, the time mismatch problem can be solved by using metadata delivered from the server, for example, reference frames/segments representing the beginning of the TV program. The exact location of the TV program can then be easily found by simply matching the reference frames with all the recorded frames for the program.
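
Matching reference frames against the recorded frames amounts to a sliding-window search for the best-aligned position. In this sketch each frame is assumed to be already reduced to a short feature vector (for instance, a downsampled grayscale thumbnail), and the mean absolute difference is used as the frame distance; both choices, and all names, are illustrative assumptions rather than the method of the text.

```python
from typing import List, Sequence

Frame = Sequence[float]  # a frame reduced to a small feature vector (assumption)


def frame_distance(a: Frame, b: Frame) -> float:
    """Mean absolute difference between two equally sized feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def find_program_start(recorded: List[Frame], reference: List[Frame]) -> int:
    """Return the index in `recorded` at which the reference frames (the
    program's opening frames, delivered as metadata) match best."""
    if not reference or len(reference) > len(recorded):
        raise ValueError("need 1..len(recorded) reference frames")
    best_index, best_cost = 0, float("inf")
    for i in range(len(recorded) - len(reference) + 1):
        cost = sum(frame_distance(recorded[i + k], ref)
                   for k, ref in enumerate(reference))
        if cost < best_cost:
            best_index, best_cost = i, cost
    return best_index


# Three frames of unwanted lead-in, then the program's opening frames.
recorded = [[0, 0], [0, 0], [0, 0], [1, 2], [3, 4], [5, 6]]
reference = [[1, 2], [3, 4]]
print(find_program_start(recorded, reference))  # 3
```

The returned frame index converts to a time offset by dividing by the frame rate; at 30 frames per second, the one-minute lead-in of the 11:30/11:31 example would appear as an index of 1800. An exhaustive scan like this costs time proportional to the recording length times the number of reference frames.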

2. Search

The rapid expansion of the World Wide Web (WWW) and mobile communications has also brought great interest in efficient multimedia data search, browsing, and management. Content-based image retrieval (CBIR) is a powerful concept for finding images based on image contents, and content-based image search and browsing have been tested using many CBIR systems. See, M. Flickner, Harpreet Sawhney, Wayne Niblack, Jonathan Ashley, Q. Huang, Byron Dom, Monika Gorkani, Jim Hafner, Denis Lee, Dragutin Petkovic, David Steele and Peter Yanker, "Query by image and video content: The QBIC system," IEEE Computer, Vol. 28, No. 9, pp. 23-32, Sept., 1995; Carson, Chad et al., "Region-Based Image Querying [Blobworld]," Workshop on Content-Based Access of Image and Video Libraries, Puerto Rico, Jun., 1997; J. R. Smith and S. Chang, "Visually searching the web for content," IEEE Multimedia Magazine, Vol. 4, No. 3, pp. 12-20, Summer 1997, also Columbia U. CU/CTR Technical Report 459-96-25; A. Pentland, R. W. Picard and S. Sclaroff, "Photobook: tools for content-based manipulation of image databases," in Proc. of SPIE Conf. on Storage and Retrieval for Image and Video Databases II, No. 2185, pp. 34-47, San Jose, CA, Feb., 1994; J. R. Bach, C. Fuller, A. Guppy, A. Hampapur, B. Horowitz, R. Humphrey, R. C. Jain and C. Shu, "Virage image search engine: an open framework for image management," Symposium on Electronic Imaging: Science and Technology, Storage & Retrieval for Image and Video Databases IV, IS&T/SPIE’96, Feb., 1996; J. R. Smith and S. Chang, "VisualSEEk: A Fully Automated Content-Based Image Query System," ACM Multimedia Conference, Boston, MA, Nov. 1996; Jing Huang, S. Ravi Kumar, Mandar Mitra, Wei-Jing Zhu and Ramin Zabih, "Image Indexing Using Color Correlograms," in IEEE Conference on Computer Vision and Pattern Recognition, pp. 762-768, Jun., 1997; and Simone Santini and Ramesh Jain, "The 'El Nino' Image Database System," in International Conference on Multimedia Computing and Systems, pp. 524-529, Jun., 1999.

Currently, most content-based image search engines rely on low-level image features such as color, texture, and shape. While high-level image descriptors are potentially more intuitive for common users, the derivation of high-level descriptors is still in its experimental stages in the field of computer vision and requires complex vision processing. On the other hand, despite their efficiency and ease of implementation, the main disadvantage of low-level image features is that they are perceptually non-intuitive for both expert and non-expert users and, therefore, do not normally represent users' intent effectively. Furthermore, they are highly sensitive to small amounts of image variation in feature shape, size, position, orientation, brightness, and color. Perceptually similar images are often highly dissimilar in terms of low-level image features. Searches made by low-level features are often unsuccessful, and it usually takes many trials to find images satisfactory to a user.
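
The sensitivity of low-level features can be seen with one of the simplest of them, an intensity histogram compared by histogram intersection. This sketch uses a grayscale histogram in place of a color histogram to keep it short; the bin count and pixel values are illustrative assumptions. Brightening every pixel by a small amount leaves the image perceptually almost unchanged, yet moves every pixel into a different bin.

```python
from typing import List, Sequence


def gray_histogram(pixels: Sequence[int], bins: int = 8) -> List[float]:
    """Normalized histogram of 8-bit intensity values."""
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // 256, bins - 1)] += 1  # clamp values above 255
    return [c / len(pixels) for c in counts]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """1.0 for identical normalized histograms, 0.0 for no overlap."""
    return sum(min(a, b) for a, b in zip(h1, h2))


image = [30, 30, 90, 90, 150, 150, 210, 210]
brighter = [p + 20 for p in image]  # same scene, slightly brighter

h, hb = gray_histogram(image), gray_histogram(brighter)
print(histogram_intersection(h, h))   # 1.0
print(histogram_intersection(h, hb))  # 0.0
```

A perceptually trivial change drives the similarity from 1.0 to 0.0, which is the kind of failure that makes low-level searches take many trials.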

Efforts have been made to overcome the limitations of low-level features. Relevance feedback is a popular idea for incorporating a user’s perceptual feedback into image search. See, Y. Rui, T. Huang, and S. Mehrotra, "A relevance feedback architecture in content-based multimedia information retrieval systems," in IEEE Workshop on Content-based Access of Image and Video Libraries, Puerto Rico, pp. 82-89, Jun., 1997; Yong Rui, Thomas S. Huang, Michael Ortega, and Sharad Mehrotra, "Relevance Feedback: A Power Tool in Interactive Content-Based Image Retrieval," in IEEE Trans. on Circuits and Systems for Video Technology, Special Issue on Segmentation, Description, and Retrieval of Video Content, Vol. 8, No. 5, pp. 644-655, Sept., 1998; G. Aggarwal, P. Dubey, S. Ghosal, A. Kulshreshtha, and A. Sarkar, "iPURE: perceptual and user-friendly retrieval of images," in Proc. of IEEE International Conference on Multimedia and Expo, Vol. 2, pp. 693-696, Jul., 2000; Ye Lu, Chunhui Hu, Xingquan Zhu, HongJiang Zhang and Qiang Yang, "A unified framework for semantics and feature based relevance feedback in image retrieval systems," in Proc. of ACM International Conference on Multimedia, pp. 31-37, Oct., 2000; H. Muller, W. Muller, S. Marchand-Maillet, and T. Pun, "Strategies for positive and negative relevance feedback in image retrieval," in Proc. of IEEE Conference on Pattern Recognition, Vol. 1, pp. 1043-1046, Sept., 2000; S. Aksoy, R. M. Haralick, F. A. Cheikh, and M. Gabbouj, "A weighted distance approach to relevance feedback," in Proc. of IEEE Conference on Pattern Recognition, Vol. 4, pp. 812-815, Sept., 2000; I. J. Cox, M. L. Miller, T. P. Minka, T. V. Papathomas, and P. N. Yianilos, "The Bayesian image retrieval system, PicHunter: theory, implementation, and psychophysical experiments," in IEEE Transactions on Image Processing, Vol. 9, pp. 20-37, Jan., 2000; P. Muneesawang and Guan Ling, "Multi-resolution-histogram indexing and relevance feedback learning for image retrieval," in Proc. of IEEE International Conference on Image Processing, Vol. 2, pp. 526-529, Jan., 2001. A user can manually establish relevance between a query and retrieved images, and the relevant images can be used to refine the query. When the refinement is made by adjusting a set of low-level feature weights, however, the user’s intent is still represented by low-level features, and their basic limitations still remain.
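
One simple way to adjust low-level feature weights from relevance feedback is to weight each feature dimension inversely to its spread among the images the user marked relevant, on the presumption that dimensions on which the relevant images agree are the ones that matter. The sketch below implements that heuristic as a generic illustration, not as the specific algorithm of any reference cited above; the feature values, the `eps` guard, and all names are assumptions. The refined query is still a weighted distance over low-level features, which is the limitation noted in the text.

```python
import math
import statistics
from typing import List, Sequence


def feedback_weights(relevant: List[Sequence[float]],
                     eps: float = 1e-6) -> List[float]:
    """Weight each dimension by 1 / (standard deviation over relevant images),
    then normalize the weights to sum to 1."""
    dims = len(relevant[0])
    raw = [1.0 / (statistics.pstdev([v[d] for v in relevant]) + eps)
           for d in range(dims)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_distance(a: Sequence[float], b: Sequence[float],
                      weights: Sequence[float]) -> float:
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


# The user marked three images relevant; they agree on feature 0 only.
relevant = [[0.9, 0.1], [0.9, 0.8], [0.9, 0.5]]
w = feedback_weights(relevant)

query = [0.9, 0.5]
a, b = [0.9, 0.0], [0.5, 0.5]
equal = [0.5, 0.5]
print(weighted_distance(query, a, equal) > weighted_distance(query, b, equal))  # True
print(weighted_distance(query, a, w) < weighted_distance(query, b, w))          # True
```

Without feedback, image `b` ranks closer to the query; after reweighting, image `a`, which shares the feature the relevant images agree on, ranks first.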

Several approaches have been made to integrate human perceptual responses and low-level features in image retrieval. One notable approach is to adjust the distance attributes of an image’s features based on human perceptual input. See, Simone Santini and Ramesh Jain, "The 'El Nino' Image Database System," in International Conference on Multimedia Computing and Systems, pp. 524-529, Jun., 1999. Another approach, called "blob world," combines low-level features to derive slightly higher-level descriptions and presents the "blobs" of grouped features to a user to provide a better understanding of feature characteristics. See, Carson, Chad et al., "Region-Based Image Querying [Blobworld]," Workshop on Content-Based Access of Image and Video Libraries, Puerto Rico, Jun., 1997. While those schemes successfully reflect a user’s intent to some degree, it remains to be seen how grouping of features or feature distance modification can achieve perceptual relevance in image retrieval. A more traditional computer vision approach, deriving high-level object descriptors based on generic object recognition, has been presented for image retrieval. See, David A. Forsyth and Margaret Fleck, "Body Plans," in IEEE Conference on Computer Vision and Pattern Recognition, pp. 678-683, Jun., 1997. Due to its limited feasibility for general image objects and its complex processing, its utility is still restricted.

With the rapid proliferation of large image/video databases, there has been an increasing demand for effective methods to search such databases automatically by their content. For a query image/video clip given by a user, these methods search the databases for the images/videos that are most similar to the query. In other words, the goal of image/video search is to find the best matches to the query image/video in the database.
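
In its simplest form, query-by-example search is a nearest-neighbor problem: represent the query and every database item by a feature vector and return the k items at the smallest distance. The exhaustive scan below is a baseline sketch; the feature vectors, item names, and Euclidean distance are illustrative assumptions (and `math.dist` requires Python 3.8 or later). Its cost grows linearly with the database size, which is what the indexing schemes discussed next try to avoid.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def best_matches(query: Sequence[float], database: Dict[str, Sequence[float]],
                 k: int = 3) -> List[Tuple[float, str]]:
    """Return the k (distance, item_id) pairs closest to the query, by linear scan."""
    return heapq.nsmallest(
        k, ((math.dist(query, vec), item_id) for item_id, vec in database.items()))


database = {
    "beach.jpg": [0.90, 0.10],
    "forest.jpg": [0.10, 0.90],
    "desert.jpg": [0.80, 0.30],
}
for dist, item in best_matches([0.85, 0.15], database, k=2):
    print(item, round(dist, 3))
```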

Several approaches have been taken toward the development of fast, effective multimedia search methods. Milanese et al. utilized hierarchical clustering to organize an image database into visually similar groupings. See, R. Milanese, D. Squire, and T. Pun, "Correspondence analysis and hierarchical indexing for content-based image retrieval," in Proc. IEEE Int. Conf. Image Processing, Vol. 3, Lausanne, Switzerland, pp. 859-862, Sept., 1996. Zhang and Zhong provided a hierarchical self-organizing map (HSOM) method to organize an image database into a two-dimensional grid. See, H. J. Zhang and D. Zhong, "A scheme for visual feature based image indexing," in Proc. SPIE/IS&T Conf. Storage Retrieval Image Video Database III, Vol. 2420, pp. 36-46, San Jose, CA, Feb., 1995. However, a weakness of HSOM is that it is generally too computationally expensive to apply to a large multimedia database. In addition, there are other well-known solutions using the Voronoi diagram, the Kd-tree, and the R-tree. See, J. Bentley, "Multidimensional binary search trees used for associative searching," Comm. of the ACM, Vol. 18, No. 9, pp. 509-517, 1975; S. Brin, "Near neighbor search in large metric spaces," in Proc. 21st Conf. on Very Large Databases (VLDB’95), Zurich, Switzerland, pp. 574-584, 1995. However, it is also known that those approaches are not adequate for high-dimensional feature vector spaces; they are useful only in low-dimensional feature spaces.
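
The Kd-tree cited above (Bentley, 1975) splits the feature space on one coordinate per tree level and, during nearest-neighbor search, prunes subtrees that cannot contain a closer point. The compact Python sketch below follows that textbook structure (median split, cycling axes); the node layout and names are illustrative choices. The pruning test `abs(diff) < best[0]` is also where the high-dimensional weakness shows up: with many dimensions the test seldom rules out a subtree, and the search degenerates toward visiting most of the tree.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point
    axis: int                       # coordinate this node splits on
    left: Optional["Node"] = None   # points with smaller-or-equal coordinate
    right: Optional["Node"] = None  # points with larger-or-equal coordinate


def build(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Build a kd-tree by splitting at the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis,
                build(pts[:mid], depth + 1),
                build(pts[mid + 1:], depth + 1))


def nearest(node: Optional[Node], target: Point,
            best: Optional[Tuple[float, Point]] = None) -> Optional[Tuple[float, Point]]:
    """Return (distance, point) of the stored point closest to target."""
    if node is None:
        return best
    d = math.dist(node.point, target)
    if best is None or d < best[0]:
        best = (d, node.point)
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # The far side can only hold a closer point if the splitting plane is
    # nearer to the target than the best match found so far.
    if abs(diff) < best[0]:
        best = nearest(far, target, best)
    return best


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(points)
print(nearest(tree, (9, 2)))
```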

Peer to Peer Searching

Peer-to-Peer (P2P) is a class of applications that make the most of previously unused resources (for example, storage, content, and/or CPU cycles) available on the peers at the edges of networks. P2P computing allows the peers to share resources and services, to aggregate CPU cycles, or to chat with each other by direct exchange. Two of the more popular implementations of P2P computing are Napster and Gnutella. Napster has its peers register files with a broker and uses the broker to search for files to copy. The broker plays the role of the server in a client-server model to facilitate the interaction between the peers. Gnutella has peers register files with network neighbors and searches the P2P network for files to copy. Since this model does not require a centralized broker, Gnutella is considered to be a true P2P system.
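
The difference between the two search models can be sketched in a few lines. In the Napster-style model a single broker keeps an index of which peer registered which file, so a search is one lookup; in the Gnutella-style model a query is forwarded from neighbor to neighbor until a hop limit (time-to-live, TTL) runs out. The sketch is a simplification under stated assumptions: the peer graph, file names, and TTL values are hypothetical, and the real protocols' message formats are not modeled.

```python
from collections import deque
from typing import Dict, List, Set


def broker_search(index: Dict[str, Set[str]], filename: str) -> Set[str]:
    """Napster-style: peers registered their files with a central broker."""
    return index.get(filename, set())


def flooding_search(neighbors: Dict[str, List[str]], shared: Dict[str, Set[str]],
                    origin: str, filename: str, ttl: int = 3) -> Set[str]:
    """Gnutella-style: forward the query to neighbors, hop by hop, until the
    time-to-live is exhausted; each peer handles a given query only once."""
    hits: Set[str] = set()
    seen = {origin}
    queue = deque([(origin, ttl)])
    while queue:
        peer, hops_left = queue.popleft()
        if peer != origin and filename in shared.get(peer, set()):
            hits.add(peer)
        if hops_left == 0:
            continue
        for nb in neighbors.get(peer, []):
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, hops_left - 1))
    return hits


# A chain of peers A-B-C-D-E; C and E share the requested file.
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
shared = {"C": {"song.mp3"}, "E": {"song.mp3"}}
print(sorted(flooding_search(neighbors, shared, "A", "song.mp3", ttl=2)))  # ['C']
print(sorted(flooding_search(neighbors, shared, "A", "song.mp3", ttl=4)))  # ['C', 'E']
print(sorted(broker_search({"song.mp3": {"C", "E"}}, "song.mp3")))         # ['C', 'E']
```

The hop limit bounds the traffic a single query generates, at the cost of missing files held by peers beyond the horizon, as with peer E when the TTL is 2.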

3. Editing

In the prior art, video files were edited with video editing software by copying several segments of the input videos and pasting them into an output video. The prior art method, however, confronts two major problems, described below.

The first problem of the prior art method is that it requires additional storage for the new version of an edited video file. Conventional video editing software generally uses the original input video file to create an edited video. In most cases, editors with a large database of videos edit those videos to create a new one, and storage is wasted on duplicated portions of the video. The second problem with the prior art method is that whole new metadata have to be generated for a newly created video. If the metadata are not edited in accordance with the editing of the video, the metadata may not accurately reflect the content, even if metadata for the specific segments of the input video have already been constructed. Because considerable effort is required to create the metadata of videos, it is desirable to reuse existing metadata efficiently, if possible.
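
One way to avoid both problems, sketched here purely as an illustration, is to represent an edited video as an ordered list of references into existing files instead of copying frames, and to translate existing segment metadata onto the edited timeline. The `Clip` record, the half-open frame ranges, and the function names are hypothetical; this is an assumption-laden sketch, not a description of any particular product or of the claimed invention.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Clip:
    source: str  # identifier of an existing video file (hypothetical)
    start: int   # first source frame, inclusive
    end: int     # last source frame, exclusive

    @property
    def length(self) -> int:
        return self.end - self.start


def locate(edit_list: List[Clip], out_frame: int) -> Tuple[str, int]:
    """Map a frame of the edited (virtual) video to (source file, source frame)."""
    if out_frame < 0:
        raise IndexError("negative frame number")
    offset = out_frame
    for clip in edit_list:
        if offset < clip.length:
            return clip.source, clip.start + offset
        offset -= clip.length
    raise IndexError("frame beyond the end of the edited video")


def remap_segment(edit_list: List[Clip], source: str,
                  seg_start: int, seg_end: int) -> List[Tuple[int, int]]:
    """Translate an existing metadata segment [seg_start, seg_end) of `source`
    into the output-timeline ranges where it appears in the edit."""
    ranges, pos = [], 0
    for clip in edit_list:
        lo, hi = max(clip.start, seg_start), min(clip.end, seg_end)
        if clip.source == source and lo < hi:
            ranges.append((pos + lo - clip.start, pos + hi - clip.start))
        pos += clip.length
    return ranges


edit = [Clip("news.mpg", 100, 200), Clip("sports.mpg", 0, 50)]
print(locate(edit, 30))                           # ('news.mpg', 130)
print(locate(edit, 120))                          # ('sports.mpg', 20)
print(remap_segment(edit, "news.mpg", 150, 300))  # [(50, 100)]
```

Under these assumptions the edited video occupies only the size of its edit list, and an annotation already attached to frames 150 to 300 of news.mpg is reused as output frames 50 to 100 without being regenerated.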

Metadata of a video segment contain textual information, such as time information (for example, the starting frame number and duration, or the starting frame number and the finishing frame number), title, keyword, and annotation, as well as image information such as the key frame of a segment. The metadata of segments can form a hierarchical structure in which a larger segment contains smaller segments. Because it is hard to store both the video and its metadata in a single file, video metadata are stored separately as a metafile or in a database management system (DBMS).

If metadata having a hierarchical structure are used, browsing a whole video, searching for a segment using the keyword and annotation of each segment, and using the key frames of each segment for a visual summary of the video are all supported. Such metadata also support not only simple playback but also playback and repeated playback of a specific segment. Therefore, the use of hierarchically structured metadata is becoming popular.
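
A hierarchical segment structure of this kind maps naturally onto a tree. The sketch below assumes a minimal segment record (title, frame range, keywords, children; a key-frame field could be added the same way) and shows two of the uses listed above: browsing the whole structure as an outline, and searching segments by keyword to obtain the frame ranges for segment playback. Field and function names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set, Tuple


@dataclass
class Segment:
    title: str
    start_frame: int
    end_frame: int
    keywords: Set[str] = field(default_factory=set)
    children: List["Segment"] = field(default_factory=list)  # contained sub-segments


def walk(seg: Segment, depth: int = 0) -> Iterator[Tuple[int, Segment]]:
    """Depth-first traversal yielding (depth, segment)."""
    yield depth, seg
    for child in seg.children:
        yield from walk(child, depth + 1)


def search(root: Segment, keyword: str) -> List[Segment]:
    """All segments, at any level, annotated with the keyword."""
    return [seg for _, seg in walk(root) if keyword in seg.keywords]


game = Segment("Match", 0, 9000, children=[
    Segment("First half", 0, 4500, children=[
        Segment("Opening goal", 1200, 1500, {"goal"})]),
    Segment("Second half", 4500, 9000, children=[
        Segment("Equalizer", 7000, 7300, {"goal", "header"})]),
])

for depth, seg in walk(game):      # browsing: an indented outline
    print("  " * depth + f"{seg.title} [{seg.start_frame}-{seg.end_frame})")
for seg in search(game, "goal"):   # search: frame ranges to play back
    print(seg.title, seg.start_frame, seg.end_frame)
```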

4. Transcoding

With the advance of information technology, such as the popularity of the Internet, multimedia presentation has proliferated into an ever-increasing variety of media, including wireless media. Multimedia data are accessed by an ever-increasing variety of devices, such as hand-held computers (HHCs), personal digital assistants (PDAs), and smart cellular phones. There is a need for accessing multimedia content in a universal fashion from a wide variety of devices. See, J. R. Smith, R. Mohan and C. Li, "Transcoding Internet Content for Heterogeneous Client Devices," in Proc. ISCAS, Monterey, California, 1998.
`
Several approaches have been taken to enable such universal multimedia access (UMA) effectively. One data representation, the InfoPyramid, is a framework for aggregating the individual components of multimedia content with content descriptions, together with methods and rules for handling the content and content descriptions. See, C. Li, R. Mohan and J. R. Smith, "Multimedia Content Description in the InfoPyramid," in Proc. IEEE Intern. Conf. on Acoustics, Speech and Signal Processing, May 1998. The InfoPyramid describes content in different modalities, at different resolutions, and at multiple abstractions. A transcoding tool then dynamically selects from the InfoPyramid the resolutions or modalities that best meet the client's capabilities.
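The selection step can be illustrated with a small sketch. It is not the published InfoPyramid algorithm: the `Variant` and `ClientCaps` types, the numeric `fidelity` score, and the rule "choose the highest-fidelity variant whose modality, width, and size the client can handle" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Variant:
    """One stored version of the content: a modality at a given resolution (hypothetical)."""
    modality: str      # e.g. "video", "image", "text" (illustrative labels)
    width: int         # horizontal resolution in pixels (0 for text)
    size_bytes: int
    fidelity: float    # hypothetical 0..1 score; higher means closer to the original


@dataclass(frozen=True)
class ClientCaps:
    """Capabilities reported by a client device (hypothetical)."""
    modalities: FrozenSet[str]
    max_width: int
    max_bytes: int


def select_variant(pyramid: List[Variant], caps: ClientCaps) -> Optional[Variant]:
    """Pick the highest-fidelity variant the client can render within its limits, or None."""
    feasible = [
        v for v in pyramid
        if v.modality in caps.modalities
        and v.width <= caps.max_width
        and v.size_bytes <= caps.max_bytes
    ]
    return max(feasible, key=lambda v: v.fidelity, default=None)
```

Given variants `Variant("video", 640, 2_000_000, 1.0)`, `Variant("image", 640, 80_000, 0.6)`, `Variant("image", 160, 10_000, 0.4)`, and `Variant("text", 0, 500, 0.1)`, a PDA limited to images and text, 240 pixels, and 50,000 bytes receives the 160-pixel image, since it outranks the text summary; a desktop client would receive the full video.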
`
`
J. R. Smith proposed the notion of an importance value for each region of an image as a hint for reducing the overall data size, in bits, of the transcoded image. See, J. R. Smith, R. Mohan and C. Li, "Content-based Transcoding of Images in the Internet," in Proc. IEEE Intern. Conf. on Image Processing, Oct. 1998; S. Paek and J. R. Smith, "Detecting Image Purpose in World-Wide Web Documents," in Proc. SPIE/IS&T Photonics West, Document Recognition, Jan. 1998. The importance value describes the relative importance of a region/block in the image presentation compared with the other regions. The value ranges from 0 to 1, where 1 denotes the most important region and 0 the least important.
`
For example, regions of high importance are compressed with a lower compression factor than the rest of the image, while the other parts of the image are first blurred and then compressed with a higher compression factor in order to reduce the overall data size of the compressed image.
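The region-importance scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the cited papers: raw per-region scores are min-max rescaled into the 0-to-1 range, a hypothetical threshold of 0.5 separates high-importance regions from the rest, and the compression factors 4 and 20 are arbitrary placeholders. The sketch only plans how each region is treated and estimates the output size; it performs no actual blurring or encoding.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RegionPlan:
    blur: bool                  # blur the region before compressing it
    compression_factor: float   # original size / compressed size (higher = smaller output)


def normalize_importance(raw: Dict[str, float]) -> Dict[str, float]:
    """Min-max rescale raw region scores so the top region is 1 and the lowest is 0."""
    if not raw:
        return {}
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:  # every region equally important: treat all as most important
        return {region: 1.0 for region in raw}
    return {region: (score - lo) / (hi - lo) for region, score in raw.items()}


def plan_regions(importance: Dict[str, float], threshold: float = 0.5,
                 low_factor: float = 4.0, high_factor: float = 20.0) -> Dict[str, RegionPlan]:
    """Important regions get light compression; the rest are blurred, then compressed harder."""
    return {
        region: RegionPlan(blur=False, compression_factor=low_factor)
        if value >= threshold
        else RegionPlan(blur=True, compression_factor=high_factor)
        for region, value in importance.items()
    }


def estimated_bytes(raw_bytes: Dict[str, int], plan: Dict[str, RegionPlan]) -> int:
    """Rough output size: each region's raw size divided by its compression factor."""
    return round(sum(raw_bytes[r] / plan[r].compression_factor for r in plan))
```

With raw scores of 9, 5, and 1 for a face, a caption, and the background, the normalized importances are 1.0, 0.5, and 0.0. If those regions occupy 40,000, 20,000, and 160,000 raw bytes, the face and caption keep a compression factor of 4 while the background is blurred and compressed at 20, for an estimated 23,000 bytes instead of 55,000 when everything is compressed at factor 4.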
`
When an image is transmitted to a variety of client devices with different display siz
