PDFsharp & MigraDoc Foundation

PdfDocument memory leaking

Moderator: Stefan Lange

rkawano

Post subject: PdfDocument memory leaking

Posted: Wed Oct 24, 2012 8:18 pm

Joined: Wed Oct 24, 2012 7:00 pm
Posts: 2
Location: Porto Alegre, Brazil

I am using this method to get the total number of pages in a PDF file:

Code:

public static Int32 CountPages(String filename)
{
    using(PdfSharp.Pdf.PdfDocument inputDocument = PdfSharp.Pdf.IO.PdfReader.Open(filename, PdfSharp.Pdf.IO.PdfDocumentOpenMode.InformationOnly))
    {
        return inputDocument.PageCount;
    }
}

The "InformationOnly" parameter works fine. This is the first free library I have tested that can count the pages of large PDF files (> 300 MB).

But when I run my application, memory usage increases at the first line of the method and does not go down after the using block, and within a few seconds my application throws an OutOfMemoryException (in another part of the app).

So I looked at the Dispose method of PdfDocument and found this:
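
To put a number on the growth described above, a rough sketch like the following could compare the managed heap size before and after a call. MemoryProbe and MeasureRetainedBytes are hypothetical names, not part of PDFsharp; GC.GetTotalMemory(true) forces a full collection first, so the difference approximates what is still referenced afterwards. It only sees the managed heap, not unmanaged allocations.

```csharp
using System;

// Hypothetical helper, not part of PDFsharp.
public static class MemoryProbe
{
    // Returns the change in managed heap size (in bytes) across 'action',
    // measured after a forced full collection on both sides.
    public static long MeasureRetainedBytes(Action action)
    {
        long before = GC.GetTotalMemory(true);
        action();
        long after = GC.GetTotalMemory(true);
        return after - before;
    }
}
```

It could be called as MemoryProbe.MeasureRetainedBytes(() => CountPages(path)), where path is whatever large file is being tested. A result that stays in the hundreds of megabytes would suggest the document graph is still reachable after the using block.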

Code:

public void Dispose()
{
  Dispose(true);
  //GC.SuppressFinalize(this);
}
void Dispose(bool disposing)
{
  if (this.state != DocumentState.Disposed)
  {
    if (disposing)
    {
      // Dispose managed resources.
    }
    //PdfDocument.Gob.DetatchDocument(Handle);
  }
  this.state = DocumentState.Disposed;
}

(PdfDocument.cs lines 151..168)

It appears that it is not disposing of anything. So I debugged the Open() method of the PdfReader class and saw memory increasing in this loop:

Code:

// Read all indirect objects
for (int idx = 0; idx < count; idx++)
{
    PdfReference iref = irefs[idx];
    if (iref.Value == null)
    {
        try
        {
            Debug.Assert(document.irefTable.Contains(iref.ObjectID));
            PdfObject pdfObject = parser.ReadObject(null, iref.ObjectID, false);
            Debug.Assert(pdfObject.Reference == iref);
            pdfObject.Reference = iref;
            Debug.Assert(pdfObject.Reference.Value != null, "something got wrong");
        }
        catch (Exception ex)
        {
            Debug.WriteLine(ex.Message);
        }
    }
    else
    {
        Debug.Assert(document.irefTable.Contains(iref.ObjectID));
        iref.GetType();
    }
    // Set maximum object number
    document.irefTable.maxObjectNumber = Math.Max(document.irefTable.maxObjectNumber, iref.ObjectNumber);
}

(PdfReader.cs lines 346..372)

It is not clear to me which object is retaining data in memory.

Does anyone know how to correctly dispose of a PdfDocument?

rkawano

Post subject: Re: PdfDocument memory leaking

Posted: Thu Nov 08, 2012 1:47 pm

After some days of trying to figure out this problem, I found a workaround for our application.

First, I had to change the PdfDocument class so that Dispose sets all private members to null, and then recompile the library:

Code:

void Dispose(bool disposing)
{
    if (this.state != DocumentState.Disposed)
    {
        if (disposing)
        {
            // Dispose managed resources.
            this.info = null;
            this.pages = null;
            this.fontTable = null;
            this.catalog = null;
            this.trailer = null;
            this.iref = null;
            this.irefTable = null;
        }
        //PdfDocument.Gob.DetatchDocument(Handle);
    }
    this.state = DocumentState.Disposed;
}

And, following Thomas Hoevel's comment on this post, I need to force a garbage collection after the read operation:

Code:

public static Int32 CountPages(String filename)
{
    try
    {
        using(PdfSharp.Pdf.PdfDocument inputDocument = PdfSharp.Pdf.IO.PdfReader.Open(filename, PdfSharp.Pdf.IO.PdfDocumentOpenMode.InformationOnly))
        {
            return inputDocument.PageCount;
        }
    }
    finally
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
    }
}

This is a workaround, not a definitive fix. I have noticed that in some cases the memory is not freed even after forcing a garbage collection, but my application can now "survive" without out-of-memory exceptions while reading a sequence of large PDF files (tested with a sequence of 10 files of 300 MB each). If I do not change the Dispose method, or do not call the GC, the exceptions are raised when reading the third or fourth file.
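
If the same try/finally pattern is needed in more than one place, it could be factored into a small generic wrapper. This is only a sketch: GcHelper and RunThenCollect are hypothetical names, and the second GC.Collect() call is an addition not present in CountPages above (it is there to reclaim objects that only became collectable once their finalizers had run).

```csharp
using System;

// Hypothetical helper, not part of PDFsharp.
public static class GcHelper
{
    // Runs 'work', then forces a full collection whether or not it threw.
    public static T RunThenCollect<T>(Func<T> work)
    {
        try
        {
            return work();
        }
        finally
        {
            GC.Collect();
            GC.WaitForPendingFinalizers();
            // Assumption: a second pass picks up objects freed by finalizers.
            GC.Collect();
        }
    }
}
```

The using block would go inside the lambda passed to RunThenCollect, so the document is disposed before the collection runs. Forcing collections is expensive, so this probably only makes sense around operations on very large files.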

Thanks for maintaining this fantastic library freely.
