I posted this on the forum but nobody seems to know the solution:

I have a zip file that is several GB in size, and one of the files inside of it is several GB in size. When it comes time to read the 5+GB file from inside the zip file, it fails with the following error:

  File "...\zipfile.py", line 491, in read
    bytes = self.fp.read(zinfo.compress_size)
OverflowError: long int too large to convert to int

Note: all the other smaller files up to that point come out just fine.

Here's the code:
------------------
    import zipfile
    import re

    dataObj = zipfile.ZipFile("zip.zip","r")

    for i in dataObj.namelist():
        print i+" -- >="+str(dataObj.getinfo(i).compress_size /1024 / 1024)+"MB"
        if(i[-1] == "/"):
            print "Directory -- won't extract"
        else:
            fileName = re.split(r".*/",i,0)[1]
            fileData = dataObj.read(i)

There have been one or more posts about 2GB limits with the zipfile module, as well as a bug report. Also, older zip formats have a 4GB limit. However, I can't say for sure what the problem is. Does anyone know if my code is wrong or if there is a problem with Python itself?

If Python has a bug in it, is there any alternative library that I can use? (It must be free source: BSD, MIT, Public Domain, Python license; not copyleft/*GPL.) If not, is there any similarly licensed code in another language (such as C++, etc.) that I can use?

--
> > I posted this on the forum but nobody seems to know the solution:
> >
> > I have a zip file that is several GB in size, and one of the files inside of it is several GB in size. When it comes time to read the 5+GB file from inside the zip file, it fails with the following error:
> > File "...\zipfile.py", line 491, in read
> >   bytes = self.fp.read(zinfo.compress_size)
> > OverflowError: long int too large to convert to int
... then you have the source and you can have a go at fixing it!

Try editing zipfile.py and getting it to print out some debug info, and see if you can fix the problem. When you have done, submit the patch to the python bug tracker and you'll get that nice glow from helping others! Remember, python is open source and is made by *us* for *us* :-)

If you need help fixing zipfile.py then you'd probably be better off asking on python-dev.

--
Nick Craig-Wood <nick[at]craig-wood.com>

--
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> OverflowError: long int too large to convert to int
>
> However it would seem nuts that zipfile is trying to read > 2GB into
> memory at once!
Perhaps, but that's what the read(name) method does - returns a string containing the contents of the selected file. So I think this runs into a basic issue of the maximum length of Python strings (at least in 32bit builds, not sure about 64bit) as much as it does an issue with the zipfile module. Of course, the fact that the only "read" method zipfile has is to return the entire file as a string might be considered a design flaw.

For the OP: if you know you are going to be dealing with very large files, you might want to implement your own individual file extraction, since I'm guessing you don't actually need all 5+GB of the problematic file loaded into memory in a single I/O operation, particularly if you're just going to write it out again, which is what your original forum code was doing.

I'd probably suggest just using the getinfo(name) method to get the ZipInfo object for the file in question, then process the appropriate section of the zip file directly. E.g., just seek to the proper offset, then read the data incrementally up to the full size from the ZipInfo compress_size attribute. If the files are compressed, you can incrementally hand their data to the decompressor prior to other processing.

E.g., instead of your original:

    fileData = dataObj.read(i)
    fileHndl = file(fileName,"wb")
    fileHndl.write(fileData)
    fileHndl.close()

something like (untested; also needs "import zlib" and "import struct"):

    chunk = 65536 # I/O chunk size

    fileHndl = file(fileName,"wb")

    zinfo = dataObj.getinfo(i)
    compressed = (zinfo.compress_type == zipfile.ZIP_DEFLATED)
    if compressed:
        dc = zlib.decompressobj(-15) # raw deflate stream, no zlib header

    # The local header is 30 fixed bytes followed by the file name and
    # extra field, whose lengths are stored in its last four bytes.
    dataObj.fp.seek(zinfo.header_offset)
    nameLen, extraLen = struct.unpack("<HH", dataObj.fp.read(30)[26:30])
    dataObj.fp.seek(nameLen + extraLen, 1)

    remain = zinfo.compress_size
    while remain:
        bytes = dataObj.fp.read(min(remain, chunk))
        remain -= len(bytes)
        if compressed:
            bytes = dc.decompress(bytes)
        fileHndl.write(bytes)
    if compressed:
        bytes = dc.decompress('Z') + dc.flush()
        if bytes:
            fileHndl.write(bytes)

    fileHndl.close()

Note the above assumes you are only reading from the zip file, as it doesn't maintain the current read() method invariant of leaving the file pointer position unchanged, but you could add that too. You could also verify the file CRC along the way if you wanted to.

Might be even better if you turned the above into a generator, perhaps as a new method on a local ZipFile subclass. Use the above as a read_gen method with the write() calls replaced with "yield bytes", and your outer code could look like:

    fileHndl = file(fileName,"wb")
    for bytes in dataObj.read_gen(i):
        fileHndl.write(bytes)
    fileHndl.close()

-- David

--
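For anyone reading this on a current Python 3, the chunked approach described above can be sketched with the standard library alone, since ZipFile.open() returns a file-like object that decompresses as it is read. This is only a minimal sketch, not a tested replacement for the code in the thread: read_gen borrows the name suggested above but is written here as a plain function, and extract_streaming and CHUNK_SIZE are hypothetical names chosen for illustration.

```python
import zipfile

CHUNK_SIZE = 65536  # bytes per read; an assumed default, tune as needed


def read_gen(zip_file, member_name, chunk_size=CHUNK_SIZE):
    """Yield the decompressed contents of one archive member in chunks."""
    # ZipFile.open() gives a file-like object that decompresses on demand,
    # so only one chunk is held in memory at a time.
    with zip_file.open(member_name) as member:
        while True:
            data = member.read(chunk_size)
            if not data:
                break
            yield data


def extract_streaming(zip_path, member_name, out_path, chunk_size=CHUNK_SIZE):
    """Write one member to out_path chunk by chunk; return bytes written."""
    written = 0
    with zipfile.ZipFile(zip_path, "r") as zf, open(out_path, "wb") as out:
        for data in read_gen(zf, member_name, chunk_size):
            out.write(data)
            written += len(data)
    return written
```

The outer loop then mirrors the one above: call extract_streaming("zip.zip", name, fileName) for each entry in namelist() that does not end in "/".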
Related article:
http://www.gossamer-threads.com/lists/python/python/585691