PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly

All times are UTC


Posted: Wed Oct 24, 2012 8:18 pm

Joined: Wed Oct 24, 2012 7:00 pm
Posts: 2
Location: Porto Alegre, Brazil
I am using this method to get the total number of pages in a PDF file:

Code:
public static Int32 CountPages(String filename)
{
    using(PdfSharp.Pdf.PdfDocument inputDocument = PdfSharp.Pdf.IO.PdfReader.Open(filename, PdfSharp.Pdf.IO.PdfDocumentOpenMode.InformationOnly))
    {
        return inputDocument.PageCount;
    }
}


The "InformationOnly" parameter works fine. This is the first free library I have tested that can count pages of large PDF files (>300MB).

But when I run my application, memory usage increases at the first line of the method and does not go down after the using block, and within a few seconds my application throws an OutOfMemoryException (in another part of my app).

So I looked at the Dispose method of PdfDocument and found this:

Code:
public void Dispose()
{
  Dispose(true);
  //GC.SuppressFinalize(this);
}
void Dispose(bool disposing)
{
  if (this.state != DocumentState.Disposed)
  {
    if (disposing)
    {
      // Dispose managed resources.
    }
    //PdfDocument.Gob.DetatchDocument(Handle);
  }
  this.state = DocumentState.Disposed;
}
(PdfDocument.cs lines 151..168)

It appears that it is not disposing anything. So I debugged the Open() method of the PdfReader class and saw memory increasing in this loop:

Code:
// Read all indirect objects
for (int idx = 0; idx < count; idx++)
{
    PdfReference iref = irefs[idx];
    if (iref.Value == null)
    {
        try
        {
            Debug.Assert(document.irefTable.Contains(iref.ObjectID));
            PdfObject pdfObject = parser.ReadObject(null, iref.ObjectID, false);
            Debug.Assert(pdfObject.Reference == iref);
            pdfObject.Reference = iref;
            Debug.Assert(pdfObject.Reference.Value != null, "something got wrong");
        }
        catch (Exception ex)
        {
            Debug.WriteLine(ex.Message);
        }
    }
    else
    {
        Debug.Assert(document.irefTable.Contains(iref.ObjectID));
        iref.GetType();
    }
    // Set maximum object number
    document.irefTable.maxObjectNumber = Math.Max(document.irefTable.maxObjectNumber, iref.ObjectNumber);
}
(PdfReader.cs lines 346..372)

It is not clear to me which object is retaining data in memory.

Does anyone know how to correctly dispose of a PdfDocument?
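
One way to narrow down whether the document itself is being kept alive is to hold only a weak reference to it and force a collection. The following is a minimal sketch, not part of PDFsharp: RetentionProbe and IsCollectedAfterDispose are hypothetical names, and it only uses standard .NET calls (WeakReference, GC.Collect, GC.WaitForPendingFinalizers).

Code:
```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical helper (not part of PDFsharp): checks whether an object
// becomes collectable once it has been disposed and is no longer referenced.
public static class RetentionProbe
{
    // Create and dispose the object in a separate, non-inlined frame so that
    // no local variable in the caller keeps it alive during the collection.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference CreateAndDispose(Func<IDisposable> factory)
    {
        IDisposable instance = factory();
        instance.Dispose();
        return new WeakReference(instance);
    }

    // Returns true if the object was reclaimed by a forced collection.
    public static bool IsCollectedAfterDispose(Func<IDisposable> factory)
    {
        WeakReference weak = CreateAndDispose(factory);
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        return !weak.IsAlive;
    }
}
```

It could be called as RetentionProbe.IsCollectedAfterDispose(() => PdfSharp.Pdf.IO.PdfReader.Open(filename, PdfSharp.Pdf.IO.PdfDocumentOpenMode.InformationOnly)). If it returns false, something outside the method still references the document. If it returns true, the document is reclaimable and the memory growth is more likely a matter of when the garbage collector runs.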


Posted: Thu Nov 08, 2012 1:47 pm

Joined: Wed Oct 24, 2012 7:00 pm
Posts: 2
Location: Porto Alegre, Brazil
After some days trying to figure out this problem, I found a workaround for our application.

First, I had to change the PdfDocument class to set all private members to null in Dispose and recompile the library:

Code:
void Dispose(bool disposing)
{
    if (this.state != DocumentState.Disposed)
    {
        if (disposing)
        {
            // Dispose managed resources.
            this.info = null;
            this.pages = null;
            this.fontTable = null;
            this.catalog = null;
            this.trailer = null;
            this.iref = null;
            this.irefTable = null;
        }
        //PdfDocument.Gob.DetatchDocument(Handle);
    }
    this.state = DocumentState.Disposed;
}


And, following Thomas Hoevel's comment on this post, I call the GC after the read operation:

Code:
public static Int32 CountPages(String filename)
{
    try
    {
        using(PdfSharp.Pdf.PdfDocument inputDocument = PdfSharp.Pdf.IO.PdfReader.Open(filename, PdfSharp.Pdf.IO.PdfDocumentOpenMode.InformationOnly))
        {
            return inputDocument.PageCount;
        }
    }
    finally
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
    }
}


It is a workaround and not a definitive fix. I have noticed that in some cases the memory is not freed after calling the garbage collector, but my application can "survive" without out-of-memory exceptions while reading a sequence of large PDF files (tested with a sequence of 10 files of 300MB each). If I don't change the Dispose method or don't call the GC, the out-of-memory exceptions are raised when reading the third or fourth file.
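
To see how much memory stays in use between files, the batch can be wrapped in a small loop that records the managed heap size after each file. This is only a sketch under assumptions: BatchPageCounter and CountAll are hypothetical names, the page counter is passed in as a delegate (for example the CountPages method above), and GC.GetTotalMemory reports managed memory only, so it will not show unmanaged allocations.

Code:
```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of PDFsharp): counts pages for a list of
// files and reports the managed heap size after each one.
public static class BatchPageCounter
{
    public static Dictionary<string, int> CountAll(
        IEnumerable<string> filenames,
        Func<string, int> countPages,
        Action<string, long> report)
    {
        var pageCounts = new Dictionary<string, int>();
        foreach (string filename in filenames)
        {
            pageCounts[filename] = countPages(filename);

            // Passing true forces a full collection before measuring, so the
            // value approximates what is still reachable after this file.
            long managedBytes = GC.GetTotalMemory(true);
            report(filename, managedBytes);
        }
        return pageCounts;
    }
}
```

For example: BatchPageCounter.CountAll(files, CountPages, (name, bytes) => Console.WriteLine(name + ": " + bytes + " bytes")). A value that keeps growing from file to file would suggest that something is still holding on to earlier documents.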

Thanks for maintaining this fantastic library and making it freely available.

