How to estimate the size of your scanning project

One of the challenges of planning a scanning project is figuring out how many pages you need scanned. Here are some loose guidelines for estimating how many pages you have.

Loose pages: 150-180 pages per inch

Banker box, letter size, 15 inches long: 2,400-3,000 pages

Banker box, legal size, 10x12x15: 2,000-2,250 pages

Banker box, letter/legal, 24 inches long: 4,500-5,000 pages

File cabinet drawer, 24 inches long: up to 4,000 pages
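If you have a mix of boxes and loose stacks, the guidelines above can be added up in a few lines of Python. This is only a rough sketch: the container names and the function name are made up for illustration, the page ranges are the ones listed above, and the file cabinet drawer is left out because only an upper limit is given for it.

```python
# Page ranges (low, high) per container, copied from the guidelines above.
# The dictionary keys are hypothetical labels chosen for this sketch.
PAGES_PER_CONTAINER = {
    "banker_box_letter_15in": (2400, 3000),
    "banker_box_legal_15in": (2000, 2250),
    "banker_box_24in": (4500, 5000),
}

# Loose pages are estimated per inch of stack height.
LOOSE_PAGES_PER_INCH = (150, 180)


def estimate_pages(containers, loose_inches=0):
    """Return a (low, high) page estimate for a scanning project.

    containers maps a container label to how many of them you have.
    loose_inches is the total height of loose paper, in inches.
    """
    low = loose_inches * LOOSE_PAGES_PER_INCH[0]
    high = loose_inches * LOOSE_PAGES_PER_INCH[1]
    for label, count in containers.items():
        box_low, box_high = PAGES_PER_CONTAINER[label]
        low += count * box_low
        high += count * box_high
    return low, high


if __name__ == "__main__":
    # Ten letter-size banker boxes plus a 2 inch stack of loose pages.
    print(estimate_pages({"banker_box_letter_15in": 10}, loose_inches=2))
```

The result is a range rather than a single number, which matches how loose these guidelines are.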

How many discs will it be on?

One of the common questions is how many discs it will take.

8,000-9,000 pages per CD

56,000 pages per DVD
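Turning a page count into a disc count is a round-up division. The sketch below assumes the low end of the CD range (8,000 pages) to stay on the safe side; the function and constant names are made up for illustration.

```python
import math

# Pages per disc from the figures above. For CDs the low end of the
# 8,000-9,000 range is used so the estimate errs on the high side.
PAGES_PER_CD = 8000
PAGES_PER_DVD = 56000


def discs_needed(pages, pages_per_disc):
    """Return how many discs are needed to hold the given number of pages."""
    return math.ceil(pages / pages_per_disc)


if __name__ == "__main__":
    print(discs_needed(30000, PAGES_PER_CD))   # CDs for a 30,000 page project
    print(discs_needed(30000, PAGES_PER_DVD))  # DVDs for the same project
```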

Posted in Uncategorized

How Docu Arc LLC protects your files and data.

We take protecting your files and data seriously. We do this by limiting the number of people that have access to the documents from the time we pick them up to the time they are returned or destroyed.

The scanned images are handled only by the employees responsible for scanning and indexing your files. Once the scanning is done, the data is delivered to you in the agreed-upon manner. This can be by hand-delivered CD/DVDs or other media. In high-security situations, files can be transported on an encrypted drive, with part of the encryption key sent in advance to ensure that the data cannot be recovered until it reaches its destination. The data is then transferred to other media for your use.
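To illustrate the idea of sending part of a key in advance, here is one common way to split a key into two parts so that neither part alone reveals anything about it. This is a generic XOR-splitting sketch in Python, not a description of the exact procedure we use; the function names are made up for illustration.

```python
import secrets


def split_key(key):
    """Split a key (bytes) into two parts; both are needed to rebuild it."""
    # A random pad the same length as the key is the first part.
    pad = secrets.token_bytes(len(key))
    # The key XORed with the pad is the second part.
    masked = bytes(k ^ p for k, p in zip(key, pad))
    return pad, masked


def join_key(part_a, part_b):
    """Rebuild the original key by XORing the two parts together."""
    return bytes(a ^ b for a, b in zip(part_a, part_b))


if __name__ == "__main__":
    key = secrets.token_bytes(32)          # stand-in for a real drive key
    advance_part, travel_part = split_key(key)
    print(join_key(advance_part, travel_part) == key)
```

One part can be sent ahead while the other travels separately, so a drive intercepted in transit cannot be unlocked with the part that travels with it.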

After delivery, all scrap media is securely erased or destroyed. The backup copy of the data is then securely erased from the encrypted hard drive after the agreed hold time. If we are maintaining backups for an extended time, they are placed on removable encrypted hard drives and stored offline in a fire safe. Encryption keys are kept in two separate locations.

Posted in Document, Scanning, Security, Uncategorized

How to get a directory listing into a spreadsheet.

Why would you want all the files in a directory in a spreadsheet? There are a number of reasons; here are a few.

  1. You need to import files into a management program.
  2. You need to extract data from the file names.
  3. You need to create a spreadsheet that contains the original file name and the new name you want to give each file (I’ll cover this in another posting).


To get started, we will need a list of all the files in the directory you want to work on. This is very easy to do at the command line, or it can be done with a batch file run from the directory window. I’ll cover how to do it from the directory window.

Step 1) Navigate to the directory you want to make the file list from.

Step 2) Right click on any white space inside that window, scroll down to New, and then left click on Text Document.

Step 3) Name your new text document (for example, File_List.txt).

Step 4) Double click on the new file (it should open in Notepad).

Step 5) Type in the following line: Dir /b >List.csv

Step 6) Click on the File menu and click on Save.

Step 7) Click on the File menu, click on Save As, and save the file as File_List.bat

Step 8) Close Notepad.

Step 9) Double click the new file File_List.bat

Step 10) Find the new file that was created, called List.csv.

Step 11) Right click on the file, scroll to Open With, and click on your spreadsheet program.

This will cause it to open an import window. You will see a lot of options, but to get the file names in we just need to make sure of a couple of things.

Step 12) Set up the import. Set the delimiter to Tab, then click on the column that says Standard, click on the drop-down box above it, and change it to Text.

Step 13) Click Import.

Step 14) Save in your spreadsheet’s file format.


Now you have a spreadsheet of all the file names in that directory. You can now use the spreadsheet’s tools to extract data from the file names to create other fields, or to fill in what the new file names are going to be if you’re doing bulk renaming. To keep things simple, use _ instead of spaces in the new file names. It makes creating the batch file much easier.
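If you would rather not create a batch file, the same kind of list can be produced with a short Python script. This is a sketch under a few assumptions: Python is installed on the machine, the function name is made up for illustration, and unlike the batch file it leaves its own output file out of the list and may include hidden files.

```python
import csv
from pathlib import Path


def write_file_list(directory, output_name="List.csv"):
    """Write the names of everything in a directory to a one-column CSV.

    Returns the sorted list of names that were written.
    """
    folder = Path(directory)
    # Collect every entry name, skipping the output file itself.
    names = sorted(p.name for p in folder.iterdir() if p.name != output_name)
    with open(folder / output_name, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for name in names:
            # One file name per row; names containing commas get quoted.
            writer.writerow([name])
    return names


if __name__ == "__main__":
    # "." means the directory the script is run from.
    print(len(write_file_list(".")), "names written to List.csv")
```

Because the csv module quotes names that contain commas, each file name stays in a single column when the list is opened in a spreadsheet with comma as the delimiter.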



Posted in Document, How To, Scanning

Securing our WordPress blog. Just an extension of our approach to data security.

Securing online assets such as a blog can be a tricky thing. One has to balance access for readers, commenters, and contributors against security. Our blog is no different. I have to screen all the comments and make sure our readers are protected from malicious links in our comment sections. From day one I saw hackers trying to log into our blog as well as post malicious links in the comments. I am going to go over the steps I used to secure our WordPress blog.


  1. Use passwords over 14 characters.
  2. Review server logs 3 times a day.
  3. Use a WordPress plug-in to screen all comments, and then review all comments.
  4. Use a login security plug-in to force all passwords to be over 14 characters.
  5. Use a plug-in that limits log-in attempts to 4, with increasing lockouts and a 36-hour ban after 16 failed attempts.
  6. Install a CAPTCHA plug-in to prevent automated log-in attempts.
  7. If you have access to the web server’s files, such as .htaccess, you can block IP addresses or ranges of IP addresses.
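To show how the lockout rule in step 5 behaves, here is a small Python sketch of that kind of policy. The plug-in does this work for you; this is not its code. The 4 attempts, the 16 failures, and the 36-hour ban come from the list above, while the 20-minute starting lockout and the doubling are assumptions made only for illustration.

```python
ATTEMPTS_PER_LOCKOUT = 4     # from step 5: lock out after every 4 failures
BAN_AFTER_FAILURES = 16      # from step 5: ban after 16 failed attempts
BAN_HOURS = 36               # from step 5: length of the ban
BASE_LOCKOUT_MINUTES = 20    # assumption: length of the first lockout


def lockout_minutes(failed_attempts):
    """Return how long to lock out an address after its latest failure.

    Returns 0 when the latest failure does not trigger a lockout.
    """
    if failed_attempts >= BAN_AFTER_FAILURES:
        return BAN_HOURS * 60
    if failed_attempts == 0 or failed_attempts % ATTEMPTS_PER_LOCKOUT != 0:
        return 0
    # Assumption: each lockout is twice as long as the one before it.
    lockouts_so_far = failed_attempts // ATTEMPTS_PER_LOCKOUT
    return BASE_LOCKOUT_MINUTES * 2 ** (lockouts_so_far - 1)


if __name__ == "__main__":
    for failures in (3, 4, 8, 12, 16):
        print(failures, "failures ->", lockout_minutes(failures), "minutes")
```

The point of the increasing lockouts is that an automated attack gets slower with every batch of guesses, while a person who mistypes a password once or twice is not affected.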


This may seem like a lot, but in reality it is rather simple to do with WordPress. To generate the passwords that I use as administrator, I use a password generator, and to test my password I used a password strength checker. In fact, my password is estimated to take 1.07 hundred million trillion trillion years to crack, and with all the additional security features it is even harder.
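If you want to generate a long random password yourself, Python’s standard secrets module can do it. This is a minimal sketch, not the generator I use; the function name and the default length of 20 are made up for illustration, and the only rule taken from this post is that passwords must be over 14 characters.

```python
import secrets
import string

# Letters, digits, and punctuation give a large pool of characters.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length=20):
    """Return a random password; length must be over 14 characters."""
    if length <= 14:
        raise ValueError("password must be over 14 characters")
    # secrets.choice draws from a cryptographically secure random source.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(generate_password())
```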


UPDATE: Our blog was under attack 24 hours a day for 2 weeks and we had no security breach. Not being content to rely on what has worked so far, I am working with my web host to further restrict access to log-in pages.



Posted in Security

Notes from the Junk email

Email Spoofing.

Beware of emails that ask you to copy an IP address and paste it into your web browser.

There have been many scams that use an email such as “You have a postcard from [blank].” The purpose of these types of emails is to get you to copy and paste an IP address into your browser so that the senders can then attack your computer with key loggers, viruses, or other malicious software.

Below appears to be the latest version of the email. Note that they misspelled “temporary” in “temporary password.” I have changed numbers in this email to obscure any IDs that might belong to an innocent person. Notice that again they use an IP address for you to type in or click on. I cannot stress this enough: DO NOT DO THAT. If you think that an email like this is legitimate, go to the website you signed up on and log in from there.

New Member,

We are glad you joined Web Cooking.

Membership Number: 84887294xxxx37
Your Login ID: userxxxx
Temorary Password: ihxxx

Your temporary Login Info will expire in 24 hours. Please login and
change it.

This link will allow you to securely change your login info:


Thank You,
Welcome Department
Web Cooking
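The warning sign in emails like the one above, a link that points at a bare IP address instead of a site name, is easy to check for automatically. Here is a small Python sketch that flags such links in a block of text. The function name is made up for illustration and the address in the example is a reserved documentation address, not the one from the actual email.

```python
import ipaddress
import re
from urllib.parse import urlparse

# Rough pattern for http and https links in plain text.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def find_ip_links(text):
    """Return the links in text whose host is a raw IP address."""
    flagged = []
    for url in URL_PATTERN.findall(text):
        try:
            host = urlparse(url).hostname
            # ip_address raises ValueError for anything that is not an IP.
            ipaddress.ip_address(host or "")
        except ValueError:
            continue
        flagged.append(url)
    return flagged


if __name__ == "__main__":
    sample = (
        "Change your login info at http://203.0.113.7/login.php "
        "or visit https://example.com/account"
    )
    print(find_ip_links(sample))
```

A check like this only catches one trick. A link to a normal-looking site name can be just as malicious, so the advice above still applies: go to the website yourself and log in from there.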

Posted in Email, Security, SPAM

Authors hit hurdles trying to get their previous work converted to e-book formats.

Book publishers tend to impose methodologies and practices that are over 100 years old, even though modern technology has made these practices obsolete. They did not retain the electronic files, even into the desktop publishing era. Even as storage space became cheaper, few people saw any benefit in retaining the files after printing. Even today, some publishers do not retain digital copies after printing.

E-books do not fit the 100+ year old model, so many publishers are actively hostile to e-books. Few publishers will admit it, but they will actively try to discourage authors from converting previously published works to e-book formats. So when an author approaches their publisher to get previous works, especially backlist titles, released as e-books, it is left to the author to produce the electronic files.

If there is no usable electronic file for the book, the author is left with two possible solutions. One is to pay a service to retype the book in a word processor, which can be rather expensive. The other is to have the book scanned and OCR’ed (optical character recognition). Scanning and OCR are services that Docu Arc can provide.

This is what Docu Arc will do: remove the book from its binding, scan the pages, and apply OCR to the images; correct the OCR results that the program flags; and correct zones misidentified by the software so that images it missed are captured for the MS Word file. Once this extended OCR process is complete, we output a searchable image PDF and an MS Word file (.DOC). Although the .DOC file is relatively close to the original book, it is highly recommended that the author or editor review the file for format and style issues.

Posted in OCR (Optical Character Recognition), Scanning

What is OCR? OCR stands for optical character recognition.

OCR makes it possible to make searchable PDFs of your scanned documents. It also allows you to get your documents converted into text documents. I would like to note that OCR is not 100% accurate; there will be formatting and conversion errors apparent in conversions to text. In searchable PDFs, these errors are not apparent, since you are looking at the image and not the searchable text beneath the image.

Posted in Document, Scanning

If you can’t get to your office can you get to your files?

Not being able to get to the office to get to an important file can be rather frustrating. So don’t give your competition the upper hand: plan ahead.

Setting up a Virtual Private Network (VPN) between your office and your laptop would allow you to access your network resources at your office, including the documents stored on your office’s server. And since it is a VPN, all data is encrypted in transit.

If you have no server at the office, you could use cloud storage. Your documents would be hosted on a secure server in a data center. This would allow you and your co-workers to access the documents anywhere there is an internet connection. You would need to select a plan that has a secure connection between your cloud storage and your remote computer.

Another option for viewing business documents remotely is a remote access service. Such services typically work with any internet-connected computer, Mac or PC, and many also have applications for iOS and Android. This kind of service allows you to access programs and documents as if you were at your computer at work, and everything is encrypted between your work computer and your remote computing device.

So plan for the day that you can’t get to the office. Have a service ready to go and your critical documents imaged before the day comes when getting to the office is not possible or just too dangerous.


Posted in Document, Scanning

Reasons for Scanning

Paper has been king for business records for a long time. In that time we have battled lost pages, damaged paper, paper dust, insects, and misfiled folders. Paper records also take up a lot of room to store, and twice as much if you have to keep backups of your documents. Document scanning offers many advantages over paper.

An image of a document does not degrade with time. With proper backups, it can last indefinitely. If a scanned document’s file is deleted or damaged, all you have to do is copy the image from the backup copy. Once a file is scanned, there is no worry about a page being misplaced, since the file is on the computer and pages cannot fall out or get lost. Also, multiple people can view a file at one time.

File cabinets are bulky and take up a lot of room. Scanned files, on the other hand, easily fit on one or more hard drives. And with data storage prices going down while drive sizes are going up, it is much cheaper to have multiple backups of your documents than a hard-copy backup of 40 filing cabinets. 40 filing cabinets can easily fit on a 10 GB flash drive; imagine how many documents can be stored on a 1 TB hard drive.

Locating a file once it has been scanned is just as easy as locating it in a filing cabinet, except you do not have to leave your desk, and if you can’t find a file you can do keyword searches to locate it. With searchable PDFs you have even more search options.

And once you have found the file, you can share one page or the whole file with another person by email. Best of all, you still have the document stored intact at your place.

Either through a VPN or the many cloud storage options available, you can have your documents available to you anywhere and at any time.
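To put rough numbers on how much a drive can hold, here is a small Python sketch. It assumes about 80 KB per scanned page, which is roughly what the disc estimates earlier on this blog work out to if a CD holds about 700 MB. Both of those figures are assumptions, and real file sizes depend on the scan settings. The function name is made up for illustration.

```python
def pages_that_fit(capacity_gb, kb_per_page=80):
    """Return roughly how many scanned pages fit on a drive.

    capacity_gb is the drive size in gigabytes (decimal units).
    kb_per_page is the assumed average size of one scanned page.
    """
    capacity_kb = capacity_gb * 1_000_000  # 1 GB = 1,000,000 KB
    return int(capacity_kb // kb_per_page)


if __name__ == "__main__":
    print(pages_that_fit(10))    # a 10 GB flash drive
    print(pages_that_fit(1000))  # a 1 TB hard drive
```

Change kb_per_page to match your own scans to see how the totals move.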

Posted in Document, Large Format, Scanning

Document Scanning

Document scanning is the conversion of printed paper to images stored on computer media as PDF, TIFF, or JPEG. There are three aspects to document scanning: document preparation, scanning, and indexing.

In order for documents to be scanned, they have to be released from their bindings, such as staples, paper clips, and binders. While the documents are being prepared, they are also examined for folded pages and low-contrast originals.

During the actual scanning the images will be monitored and pages will be rescanned as needed. Software is used to deskew and rotate images.

While documents are being scanned, the indexing process begins. The purpose of indexing is to allow the user to find the scanned document on the disk or computer. For simple projects, the image file is named the same as the physical folder. This allows the image file of a folder to be located much the same way as it would be located in a filing cabinet. For more complicated projects, a spreadsheet containing the file name and data extracted from the documents is created. This file can then be used to load the documents into a document management system at a later date.
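An index spreadsheet of this kind is just a table with one row per image file. The Python sketch below builds one as CSV text. The column names (client, year) and the sample file name are made up for illustration; a real project would use whatever fields the document management system needs.

```python
import csv
import io


def build_index(records, field_names):
    """Return CSV text with a file_name column plus the given fields.

    records is a list of dictionaries, one per scanned file.
    """
    buffer = io.StringIO()
    columns = ["file_name"] + list(field_names)
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for record in records:
        # Each row ties an image file to the data pulled from its pages.
        writer.writerow(record)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [
        {"file_name": "Smith_John.pdf", "client": "Smith, John", "year": "2012"},
    ]
    print(build_index(rows, ["client", "year"]))
```

The resulting text can be saved with a .csv extension and opened in a spreadsheet or imported into a document management system that accepts CSV files.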

Posted in Document, Scanning, Uncategorized