PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly
Posted: Thu Jan 05, 2017 4:24 pm
Joined: Thu Jan 05, 2017 4:08 pm
Posts: 1
Hello,

I'm converting .tiff files to PDF. A document can have, for example, 1300 pages, and I combine them into a single PDF.

All is going well and the job finishes in 10 minutes, but I would like to reduce the output size: the 1300 .tiff files total 140 MB, while the resulting PDF is 240 MB.

I use version 1.50.4000.0.

I have tried all the options below, but there is no change.

Code:
s_document.Options.UseFlateDecoderForJpegImages = PdfUseFlateDecoderForJpegImages.Automatic;
s_document.Options.FlateEncodeMode = PdfFlateEncodeMode.BestCompression;
s_document.Options.EnableCcittCompressionForBilevelImages = true;
s_document.Options.CompressContentStreams = true;
s_document.Options.NoCompression = false;


I have also tried compressing each .tiff to JPEG first and then sending the stream to the PDF, but the final size is even bigger and it consumes an enormous amount of RAM.

Code:
            // Needs: System.Drawing.Imaging, System.IO, System.Linq, PdfSharp.Drawing, PdfSharp.Pdf
            // Pick the built-in JPEG encoder and set its quality to 50.
            ImageCodecInfo codecInfo = ImageCodecInfo.GetImageEncoders()
                    .FirstOrDefault(r => r.CodecName.ToUpperInvariant().Contains("JPEG"));

            var parameters = new EncoderParameters(1);
            parameters.Param[0] = new EncoderParameter(System.Drawing.Imaging.Encoder.Quality, 50L);

            foreach (var file in filePaths)
            {
                // Dispose the per-page objects; leaving them undisposed is a likely
                // contributor to the high RAM usage.
                using (System.Drawing.Image imageSys = System.Drawing.Image.FromFile(file))
                using (MemoryStream streamJPG = new MemoryStream())
                {
                    imageSys.Save(streamJPG, codecInfo, parameters);
                    streamJPG.Position = 0; // rewind before reading the JPEG back
                    using (XImage image = XImage.FromStream(streamJPG))
                    {
                        // Size the page before creating the XGraphics object.
                        PdfPage page = s_document.AddPage();
                        page.Width = image.PointWidth;
                        page.Height = image.PointHeight;
                        using (XGraphics gfx = XGraphics.FromPdfPage(page))
                        {
                            gfx.DrawImage(image, 0, 0);
                        }
                    }
                }
            }
            s_document.Save(@"c:\DEV\docNoTiff.pdf");


Do you have any ideas about how I could reduce the size?

I have also tried the DevExpress conversion plugin: with no compression, the size is 245 MB and the conversion takes 20 minutes.
With JPEG compression set to high quality, the size is 160 MB and the conversion takes 35 minutes, with nearly no visible loss of quality.
I have millions of documents to convert, so time is important.

Kind regards
Geoffrey


Posted: Fri Jan 06, 2017 6:48 pm
PDFsharp Expert
Joined: Sat Mar 14, 2015 10:15 am
Posts: 909
Location: CCAA
Hi!
ukanoldai wrote:
I have tried all the options below, but there is no change.
I don't believe that.
"PdfFlateEncodeMode.BestCompression" should make a difference, but only in the small single-digit percent range.

PDFsharp stores TIFF images using lossless compression. Do not expect miracles.
If you downscale the TIFF images (say, to 80% or 75% of the original size), you should see a big difference in file size, but with a loss of quality.
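
As a rough illustration of the downscaling idea, here is a minimal sketch that uses only System.Drawing. The class name ImageScaler and the method Scale are hypothetical names chosen for this example; nothing here is part of PDFsharp. The resolution is scaled by the same factor so that the physical size of the image (pixels divided by DPI) stays the same.

```csharp
using System;
using System.Drawing;
using System.Drawing.Drawing2D;

static class ImageScaler
{
    // Returns a new bitmap whose pixel dimensions are `factor` times the source's.
    // The caller owns the returned bitmap and must dispose it.
    public static Bitmap Scale(Image source, double factor)
    {
        int width = Math.Max(1, (int)Math.Round(source.Width * factor));
        int height = Math.Max(1, (int)Math.Round(source.Height * factor));

        var result = new Bitmap(width, height);

        // Scale the DPI too, so pixels / DPI (the physical size) is unchanged.
        result.SetResolution(
            (float)(source.HorizontalResolution * factor),
            (float)(source.VerticalResolution * factor));

        using (Graphics g = Graphics.FromImage(result))
        {
            g.InterpolationMode = InterpolationMode.HighQualityBicubic;
            g.DrawImage(source, new Rectangle(0, 0, width, height));
        }
        return result;
    }
}
```

Note that the result is a 32-bit color bitmap. For bilevel (black and white) scans this may work against you, because such a bitmap no longer qualifies for bilevel compression, so measure the output size before adopting it.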

Do you use NuGet packages?
If you use the PDFsharp source code, make sure to make all tests with a Release build.

_________________
Best regards
Thomas
(Freelance Software Developer with several years of MigraDoc/PDFsharp experience)


Posted: Mon Jan 09, 2017 2:14 pm
Joined: Tue Aug 02, 2016 9:56 am
Posts: 40
Location: Amsterdam, The Netherlands
If, as it looks like, PdfSharp converts TIFFs that use JPEG compression to lossless compression, you're lucky that the files only increase from 140MB to 240MB; I would have expected more.

What you could do is to convert the TIFFs to JPEG files and then add those to the PDF, because PdfSharp will keep the image data in JPEG format. However, I know of no tool that converts TIFF to JPEG while avoiding generation loss.

However, something much easier that you can do is to use the tiff2pdf tool from libtiff ( http://libtiff.maptools.org/ ), which can do a lossless conversion directly from TIFF to PDF. You will need to first put the TIFFs together into one huge multi-page TIFF using tiffcp, also from libtiff. If necessary, you could then use PdfSharp to edit the PDF for any additional changes you need.
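
To show what the two-step libtiff route could look like when driven from .NET, here is a minimal sketch. It assumes tiffcp and tiff2pdf are installed and on the PATH. The class LibTiffPipeline and its methods are hypothetical names for this example. The only tool syntax relied on is that tiffcp takes the source files followed by the destination file, and that tiff2pdf writes to the file given with -o.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

static class LibTiffPipeline
{
    static string Quote(string path) => "\"" + path + "\"";

    // tiffcp: source files first, destination file last.
    public static string TiffcpArguments(IEnumerable<string> pages, string combinedTiff) =>
        string.Join(" ", pages.Concat(new[] { combinedTiff }).Select(Quote));

    // tiff2pdf: -o names the output PDF, followed by the input TIFF.
    public static string Tiff2PdfArguments(string combinedTiff, string outputPdf) =>
        "-o " + Quote(outputPdf) + " " + Quote(combinedTiff);

    // Runs a tool and throws if it reports a non-zero exit code.
    static void Run(string tool, string arguments)
    {
        var info = new ProcessStartInfo(tool, arguments) { UseShellExecute = false };
        using (Process process = Process.Start(info))
        {
            process.WaitForExit();
            if (process.ExitCode != 0)
                throw new InvalidOperationException(tool + " failed with exit code " + process.ExitCode);
        }
    }

    // Step 1: merge the single-page TIFFs. Step 2: convert the result to PDF.
    public static void Convert(IEnumerable<string> pages, string combinedTiff, string outputPdf)
    {
        Run("tiffcp", TiffcpArguments(pages, combinedTiff));
        Run("tiff2pdf", Tiff2PdfArguments(combinedTiff, outputPdf));
    }
}
```

With well over a thousand input files, the tiffcp command line may exceed the operating system's length limit, so merging in batches may be necessary.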

_________________
Gerben Vos
Developer


Posted: Wed May 10, 2017 2:41 pm
Joined: Wed May 10, 2017 2:35 pm
Posts: 8
Or use the free ImageProcessor NuGet package to pre-process JPGs like this:
Code:
using System.IO;
using ImageProcessor;
using ImageProcessor.Imaging.Formats;

private static void CompressImage(string filename)
{
    // Read the file and re-encode it as JPEG at quality 50 (no resizing is done).
    byte[] photoBytes = File.ReadAllBytes(filename);
    ISupportedImageFormat format = new JpegFormat { Quality = 50 };

    // Write the result next to the source file, prefixed with "new_".
    string directory = Path.GetDirectoryName(filename) ?? "";
    string target = Path.Combine(directory, "new_" + Path.GetFileName(filename));

    using (MemoryStream inStream = new MemoryStream(photoBytes))
    using (ImageFactory imageFactory = new ImageFactory())
        imageFactory.Load(inStream).Format(format).Save(target);
}


Last edited by phirewind on Thu May 11, 2017 3:22 pm, edited 1 time in total.

Posted: Wed May 10, 2017 2:46 pm
Joined: Tue Aug 02, 2016 9:56 am
Posts: 40
Location: Amsterdam, The Netherlands
phirewind wrote:
Or use the free ImageProcessor NuGet package to pre-process JPGs like this:
That is possible, but note that you will lose image quality because you are uncompressing and re-compressing with JPEG compression.

_________________
Gerben Vos
Developer


Posted: Wed May 10, 2017 3:02 pm
Joined: Wed May 10, 2017 2:35 pm
Posts: 8
Yes, and that factor is best weighed against your document content. For artwork or resolution-sensitive images it would not be a suitable solution; however, I am working with scanned paper documents, and even at 50% quality the artifacts introduced are negligible for this purpose.


Posted: Wed May 10, 2017 3:10 pm
Joined: Tue Aug 02, 2016 9:56 am
Posts: 40
Location: Amsterdam, The Netherlands
phirewind wrote:
Yes, and that factor is best weighed against your document content. For artwork or resolution-sensitive images it would not be a suitable solution; however, I am working with scanned paper documents, and even at 50% quality the artifacts introduced are negligible for this purpose.
Yes, but note that you should not do that with TIFFs that are already JPEG-compressed and so already have some artifacts. Recompressing will make them worse. But now that I read it again, it looks like the original poster's solution involved TIFFs with another compression. Those would be okay to compress to JPEG, with the caveats you mention.
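
One way to tell the two cases apart before recompressing is to read the Compression tag (tag 259) from the TIFF's first image directory. The following is a minimal sketch that parses classic TIFF headers by hand; it does not handle BigTIFF and needs a seekable stream. TiffInspector and its methods are hypothetical names for this example. In the TIFF format, compression values 6 (old-style JPEG) and 7 (JPEG) indicate JPEG data.

```csharp
using System;
using System.IO;

static class TiffInspector
{
    // Returns the Compression tag (259) of the first image in a classic TIFF,
    // or 1 (uncompressed, the TIFF default) if the tag is absent.
    public static int ReadCompression(Stream tiff)
    {
        var header = new byte[8];
        ReadExactly(tiff, header);

        bool little = header[0] == (byte)'I' && header[1] == (byte)'I';
        bool big = header[0] == (byte)'M' && header[1] == (byte)'M';
        if ((!little && !big) || U16(header, 2, little) != 42)
            throw new InvalidDataException("Not a classic TIFF file.");

        // Bytes 4..7 hold the offset of the first image file directory (IFD).
        tiff.Position = U32(header, 4, little);

        var countBytes = new byte[2];
        ReadExactly(tiff, countBytes);
        int entryCount = U16(countBytes, 0, little);

        // Each IFD entry is 12 bytes: tag (2), type (2), count (4), value (4).
        var entry = new byte[12];
        for (int i = 0; i < entryCount; i++)
        {
            ReadExactly(tiff, entry);
            if (U16(entry, 0, little) == 259)
                return U16(entry, 8, little); // SHORT value sits at the start of the value field
        }
        return 1;
    }

    public static bool IsJpegCompressed(int compression) => compression == 6 || compression == 7;

    static int U16(byte[] b, int o, bool little) =>
        little ? b[o] | (b[o + 1] << 8) : (b[o] << 8) | b[o + 1];

    static uint U32(byte[] b, int o, bool little) => little
        ? (uint)b[o] | ((uint)b[o + 1] << 8) | ((uint)b[o + 2] << 16) | ((uint)b[o + 3] << 24)
        : ((uint)b[o] << 24) | ((uint)b[o + 1] << 16) | ((uint)b[o + 2] << 8) | (uint)b[o + 3];

    static void ReadExactly(Stream s, byte[] buffer)
    {
        int read = 0;
        while (read < buffer.Length)
        {
            int n = s.Read(buffer, read, buffer.Length - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }
    }
}
```

Files for which IsJpegCompressed returns true are the ones that should not be put through another lossy JPEG pass.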

_________________
Gerben Vos
Developer


Posted: Mon Apr 20, 2020 3:18 am
Joined: Mon Apr 20, 2020 3:12 am
Posts: 1
I used the code below to compress a PDF file:

Code:
            foreach (PdfPage page in document.Pages)
            {
                PdfDictionary resources = page.Elements.GetDictionary("/Resources");
                if (resources != null)
                {
                    PdfDictionary xObjects = resources.Elements.GetDictionary("/XObject");
                    if (xObjects != null)
                    {
                        ICollection<PdfItem> items = xObjects.Elements.Values;
                        foreach (PdfItem item in items)
                        {
                            if (item is PdfReference reference)
                            {
                                if (reference.Value is PdfDictionary xObject && xObject.Elements.GetString("/Subtype") == "/Image")
                                {
                                    byte[] stream = xObject.Stream.Value;
                                    int width = xObject.Elements.GetInteger(PdfImage.Keys.Width);
                                    int height = xObject.Elements.GetInteger(PdfImage.Keys.Height);

                                    using (MemoryStream inStream = new MemoryStream(stream))
                                    {
                                        using (MemoryStream outStream = new MemoryStream())
                                        {
                                            using (ImageFactory imageFactory = new ImageFactory())
                                            {
                                                imageFactory.Load(inStream).Format(new JpegFormat { Quality = 50 }).Resize(new System.Drawing.Size(width, height)).Resolution(96, 96).Save(outStream);
                                            }

                                            xObject.Stream.Value = outStream.ToArray();
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }


You need to add a reference to ImageProcessor and add these usings:

Code:
using PdfSharp.Pdf;
using PdfSharp.Pdf.Advanced;
using ImageProcessor;
using ImageProcessor.Imaging.Formats;


Posted: Mon Apr 20, 2020 8:02 am
PDFsharp Guru
Joined: Mon Oct 16, 2006 8:16 am
Posts: 3095
Location: Cologne, Germany
Hi!
FeiShengWu wrote:
I used the code below to compress a PDF file

Thanks for the feedback.

I didn't try your code, but I guess it only works for images that use only the DCTDecode (JPEG) filter. Extra code is required to also support DCT images that are additionally Flate-encoded and to skip non-DCT images.
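
The three cases can be told apart from the image's /Filter entry, which in a PDF is either a single name or an array of names listed in decoding order. Below is a minimal sketch of that decision only. It works on a plain list of filter names and leaves out how they are read from the PDF object. RecompressPlan and ImageFilterCheck are hypothetical names for this example, not PDFsharp types.

```csharp
using System.Collections.Generic;

enum RecompressPlan
{
    ReencodeJpeg,            // stream is plain JPEG data
    InflateThenReencodeJpeg, // stream is JPEG data wrapped in Flate
    Skip                     // anything else: leave the image untouched
}

static class ImageFilterCheck
{
    // `filters` holds the names from the /Filter entry in decoding order,
    // for example { "/FlateDecode", "/DCTDecode" }.
    public static RecompressPlan Classify(IReadOnlyList<string> filters)
    {
        if (filters.Count == 1 && filters[0] == "/DCTDecode")
            return RecompressPlan.ReencodeJpeg;

        if (filters.Count == 2 && filters[0] == "/FlateDecode" && filters[1] == "/DCTDecode")
            return RecompressPlan.InflateThenReencodeJpeg;

        return RecompressPlan.Skip;
    }
}
```

For the Flate-wrapped case, the stream would have to be inflated before it is handed to the JPEG re-encoder, and the /Filter entry would have to be updated afterwards to match what is actually stored.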

_________________
Regards
Thomas Hoevel
PDFsharp Team

