Technical specifications for digitisation
This section provides guidance on determining the technical specifications for digitisation projects. It highlights the importance of selecting the right parameters, such as image resolution and file formats, to ensure quality and consistency.
Software and hardware for digitisation often allow for variable parameters, including:
- image resolution
- output file formats
- colour resolution or bit depth
- compression
- colour management.
These parameters significantly impact the quality and size of digital images. Establishing technical specifications before beginning a back-capture digitisation project is critical to ensure:
- consistency
- legibility
- quality.
Well-defined specifications help ensure that images are suitable for their intended purpose and inform other decisions, such as equipment selection.
Incorrect specifications can result in illegible or insufficient images that fail to capture essential details. This diminishes the value of your digitisation project and can negatively impact business processes dependent on these images.
Setting technical specifications
The primary objective is to create legible, high-quality digital images suitable for their intended purpose and usable for the required time frame. For images with long-term or archival value, ensure they withstand time and multiple migrations.
General rule
Use the highest technical specifications that your organisation can realistically support, especially for records that:
- are required as State archives
- may replace originals as State archives (for records created or received after 1 January 1980)
- serve as evidence of business activity if originals are destroyed.
High-quality master images maximise the return on digitisation efforts and support long-term accessibility.
Determining the right specifications
The appropriate specifications depend on project needs:
- For short-term use, lower-quality images may suffice.
- For archival records, create a high-quality 'master' image and lower-quality derivatives for access.
Consult guidelines like the National Library of Australia's Digital capture and image creation for advice on specifications for masters and web delivery.
Factors to consider
Analyse these factors to determine suitable specifications:
- criticality of records (for example, use in court)
- essential characteristics to reproduce
- whether original paper records will be destroyed
- the record version to serve as the official business record
- retention and accessibility periods
- current and future uses of images.
For instance:
- A project digitising high-risk records for long-term use retained rigorous specifications to ensure usability.
- A project digitising short-term records for web delivery used less rigorous specifications due to stakeholder needs for quick access over low bandwidth connections.
Consultation for State archives
If considering reduced specifications for State archives, consult Museums of History NSW.
Key considerations
When defining technical specifications:
- Seek accessible, product-independent technical expertise.
- Ensure ongoing maintenance and migration of digital images.
- Avoid expedient decisions that increase organisational risk.
File size
File size is a valid consideration but should not override the need for:
- legible images
- fitness for purpose
- retention of essential characteristics.
If resources cannot meet these requirements, reconsider the digitisation program's viability.
For more guidance on the elements of technical specifications, see the Getty Research Institute's Introduction to Imaging.
Specifications
What are file formats?
File formats encode information into a form which is able to be processed and used by specific combinations of hardware and software.
The following table describes some categories of file formats:
File format category | Description |
---|---|
Raster | Also known as bit-mapped formats. Images take the form of a grid or matrix with each picture element (pixel) having a unique location and independent colour value. Examples are TIFF, JPG/JPEG, GIF and PNG. |
Vector | Also known as object-oriented formats. Based on a set of mathematical instructions typically used by drawing programs to construct an image. Not relevant to digitisation, which uses raster formats. |
Encoding | Also known as metafiles which may contain either vector or raster images. Such formats enable the contents to be consistently displayed and used across different computer programs and operating systems. Typically, they support internal metadata and multi-page images and enable security management. Examples include Adobe PDF and TIFF. |
How do you determine the right file format for your digitisation project?
The following table describes some factors to consider when determining the file format required:
Factors to consider | For example |
---|---|
Creation of the best quality possible | Organisations should create the best quality image possible given their available resources and the purpose of the images. |
The way digital images will be delivered | If good quality masters are too large for some modes of delivery, your organisation can consider creating derivatives in non-archival file formats, for example, JPEG (while keeping the masters at better quality). |
The format’s support by hardware and software platforms | Some hardware and software support only certain file formats. This is changing: the trend towards interoperability and compatibility means that many file formats are now supported by a range of hardware and software platforms. Formats with this broad support are preferable. |
Whether the file format is proprietary | If digital images are held in proprietary formats, they are at risk of becoming obsolete if the vendors go out of business, or of becoming unreadable if the relationship with the vendor changes. This could particularly be a problem if the records are required long term or as State archives. Where possible, choose open formats that are widely used and have published technical specifications available in the public domain. |
The format’s ability to be read by a plug-in | If the specific production software is not available to all users, plug-ins may be used for viewing digital images. |
The format’s use of embedded objects or links | Formats should not contain embedded objects or link out to external objects beyond the specific version of the format. |
The format's ability to capture automated metadata or to support colour requirements may be other factors for consideration.
For example:
- TIFF allows a range of automated metadata capture, whereas PDF can be more limited.
- 36–48 bit RGB colour will require a format like TIFF or PNG to support it.
What is resolution and how is it quantified?
Resolution measures the ability to capture detail in the original work. It is commonly quantified in pixels per inch (ppi): the number of pixels captured for each inch of the original. The same unit is used to describe the resolution of digital displays. A higher ppi captures finer detail and produces a clearer image.
Note: Dots per inch (dpi) is often used interchangeably with ppi but refers specifically to printer resolution.
How to determine the right resolution for your digitisation project
Choosing the correct resolution is essential before starting digitisation, as resolution cannot be increased after the process. If a higher resolution is needed, the record must be re-digitised.
The table below outlines factors to consider when selecting resolution:
Factor | Details |
---|---|
The nature of the records | Photographs and detailed images require much higher resolution than text-based documents. |
Usage of the digital images | For enlargements or images requiring fine detail for viewing and printing, use higher resolution. For reduced images, use lower resolution. |
File size | Higher ppi creates more detailed images but also larger file sizes. Test and analyse these trade-offs to ensure the selected resolution suits your needs. File size alone should not determine resolution. |
Note: consider the optical resolution of your capture device when selecting a capture resolution. Exceeding the device's optical resolution can degrade image quality.
For more examples of how resolution affects image quality, see the Getty Research Institute's guidance.
What is bit depth?
Bit depth refers to the number of bits used to describe the colour of each pixel in an image. It can range from 1 bit to 48 bits, with greater bit depth allowing for a wider range of colours or shades of grey.
Here’s a breakdown of common bit depths:
- 1 bit (black and white or line art): Only black and white pixels.
- Greyscale: Black and white, plus a range of intermediate greys (8 bits per pixel).
- 8 bit colour: A palette of 256 colours.
- 24 bit colour: 8 bits each for the red, green, and blue components, providing a much larger palette of colours.
- 36-48 bit RGB colour: An extended colour space, resulting in a larger file size and requiring specific formats like TIFF or PNG.
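The palette sizes in this list follow directly from the bit depth: a pixel described by n bits can take 2 to the power of n distinct values. A minimal Python sketch of that arithmetic (the function name `distinct_values` is illustrative only):

```python
def distinct_values(bit_depth: int) -> int:
    """Return how many distinct colours or grey shades one pixel can encode."""
    return 2 ** bit_depth


# 1 bit gives 2 values (black or white), 8 bit gives 256, 24 bit gives 16,777,216.
for bits in (1, 8, 24):
    print(f"{bits} bit: {distinct_values(bits):,} values")
```

This is why 8 bit greyscale offers 256 shades of grey and 8 bit colour offers a palette of 256 colours.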
How do you determine the right bit depth?
When selecting the bit depth for a digitisation project, consider these factors:
Factor | Example |
---|---|
The nature of the records to be digitised | 1 bit depth is suitable for black and white text documents, while higher bit depths are needed for documents with greyscale or colour. |
File size | File size is influenced by both the resolution (number of pixels) and the bit depth (colour depth of each pixel). A higher bit depth increases the file size. |
For example, uncompressed file sizes for an A4 page at different bit depths and resolutions:
Colour depth | Resolution (PPI) | Total bits | Uncompressed file size (MB) |
---|---|---|---|
1 bit bi-tonal | 300 | 8,700,867 | 1.04 |
1 bit bi-tonal | 600 | 34,803,468 | 4.15 |
8 bit grey or colour | 300 | 69,606,936 | 8.30 |
8 bit grey or colour | 600 | 278,427,744 | 33.19 |
24 bit colour | 300 | 208,820,808 | 24.89 |
24 bit colour | 600 | 835,283,232 | 99.57 |
Note: Capturing a record with a lower bit depth than recommended may result in a noticeably different image. Using a higher bit depth than necessary, such as 24-bit colour for a black and white text document, increases the file size without improving image quality.
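The uncompressed sizes in the table above come from multiplying the number of pixels by the bit depth. The following Python sketch shows the calculation. It assumes an A4 page of 8.27 × 11.69 inches and 1 MB = 1,048,576 bytes; neither is stated in this guidance, but both reproduce the 300 ppi rows of the table. The function name is illustrative.

```python
A4_INCHES = (8.27, 11.69)  # assumption: A4 (210 mm x 297 mm) expressed in inches


def uncompressed_size(ppi, bit_depth, page_inches=A4_INCHES):
    """Return (total_bits, size_in_mb) for an uncompressed scan of one page."""
    width_px = round(page_inches[0] * ppi)
    height_px = round(page_inches[1] * ppi)
    total_bits = width_px * height_px * bit_depth
    size_mb = total_bits / 8 / (1024 * 1024)  # assumption: 1 MB = 1,048,576 bytes
    return total_bits, size_mb


bits, mb = uncompressed_size(ppi=300, bit_depth=24)
print(f"{bits:,} bits, {mb:.2f} MB")  # 208,820,808 bits, 24.89 MB
```

Doubling the resolution quadruples the pixel count, so a 600 ppi scan is about four times the size of a 300 ppi scan at the same bit depth.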
What is compression?
Compression reduces the size of a digital image to facilitate storage or transmission. It can be categorised into two types:
- Lossy: Removes data during compression, meaning some information is lost (irreversible).
- Lossless: No data is lost, and the original image can be exactly recreated when decompressed.
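The difference between the two types can be illustrated with a short Python sketch. It uses the standard library's `zlib` module as an example of lossless compression, and simple quantisation (discarding the low bits of each pixel) as a stand-in for lossy compression. The synthetic 64 × 64 greyscale gradient and the quantisation step are assumptions made for illustration; real lossy formats such as JPEG work differently.

```python
import zlib

# Hypothetical 64 x 64 8-bit greyscale gradient, stored as raw bytes.
pixels = bytes((x + y) % 256 for y in range(64) for x in range(64))

# Lossless: decompressing returns exactly the original bytes.
compressed = zlib.compress(pixels)
restored = zlib.decompress(compressed)
print("lossless round trip identical:", restored == pixels)
print("bytes before and after compression:", len(pixels), len(compressed))

# Lossy stand-in: discarding the low 4 bits of each pixel cannot be undone.
quantised = bytes(p & 0xF0 for p in pixels)
print("lossy result identical:", quantised == pixels)
```

The lossless round trip reproduces every pixel, while the quantised data no longer matches the original and the discarded detail cannot be recovered.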
How do you determine the right compression for your project?
Compression choice depends on how the digital images will be used:
Scenario | Recommended compression |
---|---|
Records are to be State archives or long-term | Lossless compression or no compression to maintain image accuracy. |
Records where original paper records will be retained | Choose an appropriate compression method based on the record's nature and intended use, ensuring any loss is minimal and doesn't affect the image significantly. |
Example: JPEG (lossy) compression is useful for reducing file sizes of photographic images, but not as effective for text or simple graphics.
What is colour management?
Colour management ensures consistent colour representation across different devices (e.g., monitors and printers), using standards like the ICC colour management system. It helps maintain the accuracy of colours as they appear on various output devices.
Halftones
Halftones are used in printing to create varying shades of grey by adjusting the size of spots printed with one colour. When digitising such images, dithering can simulate halftones. However, low-resolution scans may not capture the detail effectively, and halftones can produce unwanted interference patterns known as moiré.
Watermarks
Watermarks, annotations, or highlights can be problematic if captured at 1-bit depth, as they might obscure text. Higher bit depths or using a black background behind the document can improve the visibility of such features.
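The effect can be illustrated with a simple threshold conversion to 1 bit. This is a sketch only: the threshold of 128 and the sample pixel values are assumptions, and scanning software may use more sophisticated conversion methods.

```python
THRESHOLD = 128  # assumption: values below this become black (0), the rest white (1)


def to_bitonal(row, threshold=THRESHOLD):
    """Convert a row of 8-bit grey values (0 = black, 255 = white) to 1-bit values."""
    return [0 if value < threshold else 1 for value in row]


# Hypothetical scan lines: black text (0) on white paper (255),
# then the same text under a dark highlight (grey value 100).
plain = [255, 0, 255, 0, 255]
highlighted = [100, 0, 100, 0, 100]

print(to_bitonal(plain))        # [1, 0, 1, 0, 1]  text remains distinct
print(to_bitonal(highlighted))  # [0, 0, 0, 0, 0]  highlight turns black and hides the text
```

At 8 bit greyscale the highlight (100) and the text (0) remain distinguishable, which is why a higher bit depth preserves such features.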
Resources
Technical specifications
Question | Yes | No |
---|---|---|
Are there documented technical specifications for the digitisation project? | | |
Have the recommended technical specifications (in Appendix 1) been adopted? | | |
If not, has the organisation conducted an analysis to determine if the technical specifications: | | |
- are fit for purpose | | |
- enable the capture of the essential characteristics of the original paper records | | |
- enable the retention of the digital images for as long as required? | | |
If records are required as State archives has the organisation contacted Museums of History NSW about the proposed digitisation? | | |
Recommended technical specifications for digitisation in this Appendix were designed by Archives New Zealand. The highest technical specifications possible and supportable should be selected.
If your organisation chooses to vary these technical specifications, it should assess all relevant factors and document the assessment along with the reasons for choosing alternative specifications. The primary considerations should always be to ensure:
- the legibility of the digital image
- the reproduction of the original records' essential characteristics
- that the image is fit for purpose.
Document type | Resolution* | Bit Depth | File Format | Compression |
---|---|---|---|---|
Text only, black and white | Minimum 300ppi | 1 bit (bi-tonal) | TIFF, PDF/A† containing TIFF or JPEG 2000‡ | Lossless compression |
Documents with watermarks, grey shading, grey graphics | Minimum 600ppi | 8 bit greyscale | TIFF, JPEG 2000, PDF/A containing TIFF or JPEG 2000 | Lossless compression |
Documents with discrete colour used in text or diagrams | Minimum 600ppi | Minimum 8 bit colour | TIFF, JPEG 2000, PDF/A containing TIFF or JPEG 2000 | Lossless compression |
Black and white photographs | Sufficient to provide >3000 pixels across the long dimension | 8 bit greyscale | TIFF, JPEG 2000, PDF/A containing TIFF or JPEG 2000 | Lossless compression |
Colour photographs | Sufficient to provide >3000 pixels across the long dimension | 24 bit colour | TIFF, JPEG 2000, PDF/A containing TIFF or JPEG 2000 | Lossless compression |
Black and white negatives | Sufficient to provide >3000 pixels across the long dimension | 8 bit greyscale or 24 bit colour | TIFF, JPEG 2000, PDF/A containing TIFF or JPEG 2000 | Lossless compression |
Colour negatives and transparencies | Sufficient to provide >3000 pixels across the long dimension | 24 bit colour | TIFF, JPEG 2000, PDF/A containing TIFF or JPEG 2000 | Lossless compression |
*The scale/ratio for resolution here is 1:1.
† PDF/A is a constrained version of PDF version 1.4 with various proprietary fonts and formats removed, issued as ISO 19005-1:2005.
‡ JPEG 2000 is defined in ISO/IEC 15444-1:2000.
Note regarding resolution for photographs, negatives and transparencies
For photographs, negatives, and transparencies, the required resolution will vary according to the size of the photograph or negative. In these cases, measure the longest side of the photograph in inches then calculate the required resolution by dividing 3000 by the length of that long side.
For example:
If you have a photograph that is 5 inches by 8 inches, then 8 inches is the longest side. 3000 divided by 8 = minimum 375ppi.
As rough rules:
- For photographs with a longest side measuring 15 inches or greater, use at least 200ppi.
- Between 10 and 15 inches, use at least 300ppi.
- Between 5 and 10 inches, use at least 600ppi.
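The calculation described above can be written as a short Python function. This is a sketch: the function name is illustrative, and rounding the result up to a whole ppi value is an assumption rather than part of this guidance.

```python
import math

TARGET_PIXELS = 3000  # pixels required across the longest side


def minimum_ppi(width_in: float, height_in: float) -> int:
    """Return the minimum capture resolution for an item measured in inches."""
    longest_side = max(width_in, height_in)
    # Round up so the capture never falls short of the target pixel count.
    return math.ceil(TARGET_PIXELS / longest_side)


print(minimum_ppi(5, 8))    # 375, matching the worked example
print(minimum_ppi(10, 15))  # 200
print(minimum_ppi(8, 10))   # 300
```

The rough rules are the same calculation at the boundary sizes: 3000 divided by 15, 10 and 5 inches gives 200, 300 and 600 ppi respectively.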
The National Library of Australia’s Image capture guidelines may also help you to determine a suitable ppi.