PDFsharp & MigraDoc Foundation
http://forum.pdfsharp.de/

Split By Size
http://forum.pdfsharp.de/viewtopic.php?f=2&t=1480

Author:  wilson [ Thu Dec 16, 2010 12:46 pm ]
Post subject:  Split By Size

Hi,

We are merging 20 or so PDF files into one bigger file before we email it out.

However, we might need to split the file if it is too big to be received by the email recipients.

Say they can only receive file attachments (PDFs in our case) of up to 4 MB: how could we split the massive file into pieces of at most 4 MB each?

Thanks for the help..
Wilson

Author:  Thomas Hoevel [ Thu Dec 16, 2010 1:26 pm ]
Post subject:  Re: Split By Size

Hi!

A better approach (IMHO): stop merging before file size exceeds 4 MB.

I think a joined PDF file won't be bigger than the sum of the individual PDF files it consists of.

Author:  jackylui [ Fri Dec 17, 2010 7:04 am ]
Post subject:  Re: Split By Size

I read the post and tried it with the code below:
Dim outputDocument As PdfDocument = New PdfDocument()
Dim inputDocument As PdfDocument = PdfReader.Open(dtDocu.Rows(j).Item("FilePath").ToString.Trim & dtDocu.Rows(j).Item("FilesName").ToString.Trim, PdfDocumentOpenMode.Import)
' Iterate pages
Dim count As Integer = inputDocument.PageCount
For k As Integer = 0 To count - 1
    ' Get the page from the external document...
    Dim page As PdfPage = inputDocument.Pages(k)
    ' ...and add it to the output document.
    outputDocument.AddPage(page)
Next
I tried to read outputDocument.FileSize (the output document has several pages), but it is zero. How do I get the expected file size of outputDocument?
Attachments: screen1.png, screen.png

Author:  Remis [ Sun Dec 19, 2010 9:56 pm ]
Post subject:  Re: Split By Size

@wilson

I think you should stop spamming :) (IMHO your PDFs seem too big to email)

Author:  () => true [ Fri Dec 24, 2010 9:11 am ]
Post subject:  Re: Split By Size

jackylui wrote:
How do I get the expected file size of outputDocument?

I don't know.
I would sum up the sizes of the input files and stop before this sum exceeds the allowed value.

The resulting combined file (using Release build) shouldn't be bigger than the sum of the combined files.
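The "sum up the input sizes" idea can be sketched in C# roughly as follows. This is only an illustration: MergePlanner, PlanBatches and PlanBatchesFromDisk are made-up names, not part of PDFsharp, and the sketch relies on the assumption stated in this thread that a merged file is no larger than the sum of its inputs. It only decides which files go together; the actual merging is left to PDFsharp.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of PDFsharp): plans which input files to merge together.
static class MergePlanner
{
    // Walks the files in order and starts a new batch whenever adding the next
    // file would push the summed size of the current batch above maxBytes.
    // A single file larger than maxBytes ends up alone in its own batch.
    public static List<List<string>> PlanBatches(
        IEnumerable<KeyValuePair<string, long>> files, long maxBytes)
    {
        var batches = new List<List<string>>();
        var current = new List<string>();
        long currentBytes = 0;

        foreach (KeyValuePair<string, long> file in files)
        {
            if (current.Count > 0 && currentBytes + file.Value > maxBytes)
            {
                batches.Add(current);
                current = new List<string>();
                currentBytes = 0;
            }
            current.Add(file.Key);
            currentBytes += file.Value;
        }

        if (current.Count > 0)
            batches.Add(current);
        return batches;
    }

    // Convenience wrapper that reads each file's size from disk.
    public static List<List<string>> PlanBatchesFromDisk(
        IEnumerable<string> paths, long maxBytes)
    {
        var sized = new List<KeyValuePair<string, long>>();
        foreach (string path in paths)
            sized.Add(new KeyValuePair<string, long>(path, new FileInfo(path).Length));
        return PlanBatches(sized, maxBytes);
    }
}
```

For a 4 MB limit one would call MergePlanner.PlanBatchesFromDisk(paths, 4L * 1024 * 1024) and merge each returned batch into its own output file. Leaving some headroom below the real limit is probably wise, since email encoding adds overhead to attachments.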

Author:  spottedmahn [ Tue Jan 11, 2011 8:02 pm ]
Post subject:  Re: Split By Size

jackylui wrote:
I tried to read outputDocument.FileSize (the output document has several pages), but it is zero. How do I get the expected file size of outputDocument?


It appears as though the FileSize property is only set when calling PdfReader.Open(), where it gets its value from the stream.Length property.

One hack I've come up with is to update the FileSize property in the Save method in PdfDocument.

I sure wish the FileSize property was consistently updated.

We have a similar problem we're trying to solve: take one large PDF and split it into manageable files that can be emailed. For example, we want to take a 50 MB PDF and split it into ten 5 MB files.

Another hack I've thought about is estimating an average size per page (file size divided by page count) and splitting by page count, but this is flawed: if some of the pages are really large, the resulting file sizes could vary greatly.

Is anybody versed enough in the internals of PDFsharp to provide some ideas on how to modify the code base so that the FileSize property is updated when pages/objects are added or removed?


Thanks,
Mike DePouw

Author:  spottedmahn [ Tue Jan 11, 2011 8:37 pm ]
Post subject:  Re: Split By Size

Another hack I've come up with is to call PdfDocument.Save and pass it a MemoryStream, then check the Length of the MemoryStream to determine the current size. This is very inefficient but it works.

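//pseudo code
pdfDoc = new PdfDocument
foreach page in pdfToSplit.Pages
    add page to pdfDoc
    write pdfDoc to a MemoryStream
    check the Length of the MemoryStream
    if larger than desired
        remove the page just added from pdfDoc
        write pdfDoc to disk
        create a new pdfDoc and add the page to it
write the last pdfDoc to disk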
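
The pseudo code above might be fleshed out in C# along these lines. Treat it as an untested sketch: PdfSplitter, SplitBySize, outputPrefix and maxBytes are made-up names, and it assumes that PdfDocument.Save(Stream, Boolean) can be called repeatedly on a document that is still being built, which is what this post reports as working. Rather than removing the page that overshoots the limit, it keeps the bytes of the previous snapshot and writes those to disk.

```csharp
using System.IO;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

// Hypothetical helper (not part of PDFsharp).
static class PdfSplitter
{
    // Saves the document to memory and returns the resulting bytes.
    static byte[] Serialize(PdfDocument document)
    {
        using (var buffer = new MemoryStream())
        {
            document.Save(buffer, false); // false: do not close the stream
            return buffer.ToArray();
        }
    }

    // Splits inputPath into files named outputPrefix1.pdf, outputPrefix2.pdf, ...
    // and returns the number of files written. A single page that is larger
    // than maxBytes cannot be split and is written as an oversized file.
    public static int SplitBySize(string inputPath, string outputPrefix, long maxBytes)
    {
        PdfDocument input = PdfReader.Open(inputPath, PdfDocumentOpenMode.Import);
        PdfDocument current = new PdfDocument();
        byte[] pending = null; // bytes of the current chunk as of the previous page
        int filesWritten = 0;

        for (int i = 0; i < input.PageCount; i++)
        {
            current.AddPage(input.Pages[i]);
            byte[] snapshot = Serialize(current);

            if (snapshot.Length > maxBytes && pending != null)
            {
                // This page pushed the chunk over the limit: write the previous
                // snapshot and start a new chunk that begins with this page.
                filesWritten++;
                File.WriteAllBytes(outputPrefix + filesWritten + ".pdf", pending);
                current = new PdfDocument();
                current.AddPage(input.Pages[i]);
                snapshot = Serialize(current);
            }
            pending = snapshot;
        }

        if (pending != null)
        {
            filesWritten++;
            File.WriteAllBytes(outputPrefix + filesWritten + ".pdf", pending);
        }
        return filesWritten;
    }
}
```

Because the whole chunk is serialized again after every added page, the cost grows quickly with the number of pages per chunk, which matches the "very inefficient" caveat above.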

Author:  Thomas Hoevel [ Wed Jan 12, 2011 9:35 am ]
Post subject:  Re: Split By Size

spottedmahn wrote:
For example, we want to take a 50 MB PDF and split it into ten 5 MB files.

Images, fonts, and other resources can be used on several pages.
When splitting files, you might get several copies of those resources in the split files.

So splitting 50 MB will give 11 or 12 files of 5 MB each rather than 10.

Author:  spottedmahn [ Wed Jan 12, 2011 5:45 pm ]
Post subject:  Re: Split By Size

Hi Thomas,

Thanks for the reply. That's interesting to know, thanks for the info.

Do you have any input on how to determine the file size of a PdfDocument object? As it stands right now, I'm using the Save method, passing in a MemoryStream, and then checking the length of the MemoryStream object. Is there a better approach than that?

Thanks,
Mike D.

All times are UTC