RE: [flalug] Powerpoint image extraction

From: Eben King (eben1@tampabay.rr.com)
Date: Fri Jul 15 2005 - 17:48:10 EDT


On Fri, 15 Jul 2005, peter osmar wrote:

> >From: Eben King <eben1@tampabay.rr.com>
> >Reply-To: flalug@nks.net
> >To: "Florida Linux Users' Group" <flalug@nks.net>
> >Subject: [flalug] Powerpoint image extraction
> >Date: Fri, 15 Jul 2005 12:05:03 -0400 (EDT)
> >
> >So a correspondent sent me some images, encapsulated in a .PPT file.
> >(Naked JPEGs would have worked fine, but nooooo...) Is there any way, using
> >od, grep, dd, or maybe some tool I don't have yet, to get them out? I can
> >view it in Windows (I suppose Windows-in-VMware too) using MS*spit*'s free
> >"Powerpoint Viewer", but I can't do squat with it.
> >
> Have you tried OpenOffice? I have opened a couple of PowerPoint files in the
> OOo.org presentation program. Worth a try. Pete

Yeah, I got them out that way. Thanks. Also, I found a way that does not
involve OpenOffice. I wrote a scriptlet:

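# Run file(1) on the tail of the .ppt starting at every byte offset,
# and keep only the offsets where it reports something other than "data".
for skipcount in $(seq 2 1032700); do
  printf '%s ' "$skipcount"
  dd if=filename.ppt skip="$skipcount" bs=1 2>/dev/null | file -
done | grep -v ': *data$' > /tmp/file2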

(1032700 is a nice round number a little less than the file size in bytes)
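The upper bound doesn't have to be typed in by hand. A minimal sketch, assuming a POSIX wc; file_size is a name made up for this example:

```shell
#!/bin/sh
# Hypothetical helper: print the size of the file named in $1, in bytes.
# Some wc implementations pad the count with spaces, so strip them.
file_size() {
  wc -c < "$1" | tr -d ' '
}
```

The loop header could then read: for skipcount in $(seq 2 "$(file_size filename.ppt)"); do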

In /tmp/file2, there is lots of stuff like

70942 standard input: LZH compressed data, original name >¿þªø?Å_¯¾Ëp~ßsmßþ?5ŸNœÿ
71008 standard input: Sendmail frozen configuration - version þ?5ol÷þŸžqXÿ
71042 standard input: DBase 3 data file with memo(s) (16772103 records)

and most of it is total BS. But among the dreck, I found

537 standard input: JPEG image data, JFIF standard 1.01, resolution (DPI), 96 x 96
and
49958 standard input: JPEG image data, JFIF standard 1.01, resolution (DPI), 96 x 96

(so far). If I do

dd if=filename.ppt skip=537 bs=1 | djpeg | xv -

there is one of the images from the file! I can do the same with the other
offset. Now, I don't know if they were JPEGs originally, or if that's just
what PowerPoint uses. And it's slow. Very slow. And (as you can see) it's
subject to misidentification by "file". Also, for all I know filename.ppt
could be fragmented internally (a la "fast save" in Word), in which case I'd
never retrieve the images this way. But it's a possibility for users who
don't have OO, _do_ have lots of time, and won't get too upset about missing
images.
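A quicker route might be to skip "file" altogether and look for the JPEG start-of-image bytes (FF D8 FF) in a single pass. What follows is only a sketch, assuming a POSIX od, awk and tail; jpeg_offsets and carve_from are names made up for this example. A hit is just a candidate, since those three bytes can turn up by chance, and this does nothing about internal fragmentation.

```shell
#!/bin/sh
# Hypothetical helper: print the byte offset (counting from 0) of every
# FF D8 FF sequence in the file named in $1.
jpeg_offsets() {
  # od dumps one hex byte per field; awk remembers the previous two bytes.
  od -An -v -tx1 "$1" | awk '{
    for (i = 1; i <= NF; i++) {
      if (p2 == "ff" && p1 == "d8" && $i == "ff") print n - 2
      p2 = p1; p1 = $i; n++
    }
  }'
}

# Hypothetical helper: write everything in file $1 from byte offset $2
# onward to stdout. tail -c +N counts from 1, hence the +1.
carve_from() {
  tail -c +"$(( $2 + 1 ))" "$1"
}
```

If the images really do start at 537 and 49958, those offsets should be among the candidates printed by "jpeg_offsets filename.ppt", and "carve_from filename.ppt 537 | djpeg | xv -" would stand in for the dd pipeline.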

-- 
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?           [TOFU := text oben,
A: Top-posting.                                       followup unten]
Q: What is the most annoying thing on usenet?        -- Daniel Jensen


