SimpleMagic

Version 1.17 – December 2021

This package provides simple magic-number features that simulate the Unix file(1) command to determine the type of a file, or of a block of bytes, from its content. It has an internal set of magic number information, or it can process the magic files from the local Unix system configuration.

To get started quickly using SimpleMagic, see section Start Using Quickly. There is also a PDF version of this documentation.

Gray Watson http://256stuff.com/gray/


1. Start Using Quickly

To use SimpleMagic you need to do the following steps. For more information, see section Using SimpleMagic.

  1. Download SimpleMagic from the SimpleMagic release page. See section Downloading Jar.
  2. Optionally load in the magic entries from local file(s). By default, if you construct a ContentInfoUtil instance with the default constructor, it will load the internal magic entries file. See section How To Load Magic Entries.
  3. Use the ContentInfoUtil class to get content-types for files or byte[]:
     
    ContentInfoUtil util = new ContentInfoUtil();
    ContentInfo info = util.findMatch("/tmp/upload.tmp");
    // or   ContentInfo info = util.findMatch(inputStream);
    // or   ContentInfo info = util.findMatch(contentByteArray);
    // display content type information
    if (info == null) {
       System.out.println("Unknown content-type");
    } else {
       // other information in ContentInfo type
       System.out.println("Content-type is: " + info.getName());
    }
    

If the findMatch(...) method does not recognize the content, it returns null. If it matches one of the entries, it returns a ContentInfo object describing the match, such as its name, mime-type, and associated file extensions. See section Content Information.

For somewhat more extensive instructions, see section Using SimpleMagic.


2. Using SimpleMagic


2.1 Downloading Jar

To get started with SimpleMagic, you will need to download the jar file. The SimpleMagic release page is the default repository, but the jars are also available from the Maven Central repository.

The code works with Java 6 or later.


2.2 How To Load Magic Entries

The library uses magic byte information to find and determine details about arbitrary blocks of bytes. By default, SimpleMagic has a built-in version of a magic file that was copied from a CentOS Linux system. It contains ~2400 magic file entries describing a number of different file types. It also has an additional ~6600 lines which provide more details about the detected content types.

The magic entries are relatively complex but in general look something like the following. The configuration line says to look at the start of the file for the string "GIF8". If it is there then the file is "GIF image data".

 
0       string          GIF8            GIF image data
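To make the entry above concrete, the following is a minimal sketch of how a single "string" entry could be parsed and tested against a block of bytes. It is not SimpleMagic's implementation: the class MagicEntrySketch and its methods are hypothetical names used only for illustration, and it assumes a plain decimal offset and handles only the string type.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative only: a tiny matcher for "offset string value description" entries. */
final class MagicEntrySketch {
    final int offset;
    final byte[] expected;
    final String description;

    MagicEntrySketch(int offset, String expected, String description) {
        this.offset = offset;
        this.expected = expected.getBytes(StandardCharsets.ISO_8859_1);
        this.description = description;
    }

    /** Parses a line such as "0  string  GIF8  GIF image data" (string type only). */
    static MagicEntrySketch parse(String line) {
        // split into at most 4 fields so the description keeps its inner spaces
        String[] parts = line.trim().split("\\s+", 4);
        if (parts.length < 4 || !parts[1].equals("string")) {
            throw new IllegalArgumentException("unsupported entry: " + line);
        }
        return new MagicEntrySketch(Integer.parseInt(parts[0]), parts[2], parts[3]);
    }

    /** Returns the description if the bytes at the offset equal the expected string, else null. */
    String match(byte[] data) {
        if (data.length < offset + expected.length) {
            return null; // not enough bytes to compare
        }
        for (int i = 0; i < expected.length; i++) {
            if (data[offset + i] != expected[i]) {
                return null;
            }
        }
        return description;
    }

    public static void main(String[] args) {
        MagicEntrySketch entry = parse("0       string          GIF8            GIF image data");
        byte[] gifHeader = "GIF89a".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(entry.match(gifHeader));              // GIF image data
        System.out.println(entry.match(new byte[] { 1, 2, 3 })); // null
    }
}
```

Real magic files support many more types, offsets, and operators than this sketch covers.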

If you do not want to use the internal magic definitions, you can also construct the ContentInfoUtil class with a file or directory to have it parse and use another definition file.

 
ContentInfoUtil util = new ContentInfoUtil("/etc/magic");
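When pointing the library at another definition file, it can help to know roughly what such a file contains. In the magic(5) format, lines starting with "#" are comments, lines starting with ">" continue the preceding entry with further tests, and lines starting with "!:" attach extra information such as a mime-type. The sketch below counts top-level entries and detail lines in such a file. The class MagicLineClassifierSketch and the sample text are hypothetical, and this is not how SimpleMagic itself parses the file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Illustrative only: classifies the lines of a magic(5)-style definition file. */
final class MagicLineClassifierSketch {
    int topLevelEntries;
    int detailLines;

    static MagicLineClassifierSketch classify(Reader source) throws IOException {
        MagicLineClassifierSketch counts = new MagicLineClassifierSketch();
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty() || line.startsWith("#")) {
                continue; // blank line or comment
            }
            if (line.startsWith(">") || line.startsWith("!:")) {
                counts.detailLines++; // continuation test or extra information
            } else {
                counts.topLevelEntries++; // a new entry starting at level 0
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // hypothetical sample content, not copied from any real magic file
        String sample = "# GIF images\n"
                + "0\tstring\tGIF8\tGIF image data\n"
                + "!:mime\timage/gif\n"
                + ">4\tstring\t7a\t\\b, version 8%s\n"
                + "\n"
                + "0\tstring\t%PDF-\tPDF document\n";
        MagicLineClassifierSketch counts = classify(new StringReader(sample));
        System.out.println(counts.topLevelEntries + " entries, " + counts.detailLines + " detail lines");
    }
}
```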

WARNING: although we’ve tried to support different types of magic entries, there are local per-OS variations that may not be supported.


2.3 How To Find The Content Info

Once you have loaded the magic entry information into your ContentInfoUtil, you can use the utility class to find the content info of files, byte arrays, or InputStreams. The base method gets the content info from a byte[].

 
byte[] uploadedBytes = ...;
ContentInfo info = util.findMatch(uploadedBytes);

You can also get the content type of a file which is read with a FileInputStream:

 
ContentInfo info = util.findMatch("/tmp/uploadedFile.tmp");
// File uploadedFile = ...
// ContentInfo info = util.findMatch(uploadedFile);

If you have an InputStream, you can also use it directly:

 
InputStream inputStream = ...
ContentInfo info = util.findMatch(inputStream);

If you want to process a stream of bytes as the bytes are being read, you can use the ContentInfoInputStreamWrapper utility class. This takes an InputStream which it wraps and delegates to. After you have read the bytes through the wrapper, you can call the findMatch() method to get its content information.

 
HttpServletRequest request = ...
ContentInfoInputStreamWrapper inputStream
   = new ContentInfoInputStreamWrapper(request.getInputStream());
// read in the file from the http request, ...
// after we have read it in, we can get its content-info 
ContentInfo info = inputStream.findMatch();
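The wrap-and-delegate idea can be sketched with a plain java.io.FilterInputStream that remembers the first bytes passing through it, so they can be inspected once the caller has finished reading. This is only an illustration of the pattern, not the source of ContentInfoInputStreamWrapper. The class HeadCapturingStreamSketch, its limit parameter, and capturedHead() are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Illustrative only: delegates all reads and keeps a copy of the first 'limit' bytes. */
final class HeadCapturingStreamSketch extends FilterInputStream {
    private final ByteArrayOutputStream head = new ByteArrayOutputStream();
    private final int limit;

    HeadCapturingStreamSketch(InputStream delegate, int limit) {
        super(delegate);
        this.limit = limit;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0 && head.size() < limit) {
            head.write(b);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            // copy only as many bytes as still fit under the limit
            int keep = Math.min(n, limit - head.size());
            if (keep > 0) {
                head.write(buf, off, keep);
            }
        }
        return n;
    }

    @Override
    public boolean markSupported() {
        return false; // avoids capturing the same bytes twice after a reset
    }

    /** Bytes seen so far, up to the limit; these could then be handed to a matcher. */
    byte[] capturedHead() {
        return head.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = { 1, 2, 3, 4, 5, 6, 7, 8 };
        HeadCapturingStreamSketch in =
            new HeadCapturingStreamSketch(new ByteArrayInputStream(data), 4);
        byte[] buf = new byte[3];
        while (in.read(buf) != -1) {
            // the application would consume the data here
        }
        System.out.println(in.capturedHead().length); // 4
    }
}
```

Bytes passed over with skip() are not captured in this sketch.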

For the file and stream versions, the first 10 kilobytes of the data is read and processed.
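As a rough illustration of reading only the start of a stream, the sketch below reads at most a fixed number of bytes and stops early at end of stream. It assumes that "10 kilobytes" means 10 * 1024 bytes, which the text does not state. HeadReaderSketch and readHead are hypothetical names, not part of the SimpleMagic API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Illustrative only: reads up to maxBytes from the start of a stream. */
final class HeadReaderSketch {
    // assumption: 10 kilobytes taken as 10 * 1024 bytes
    static final int MAX_BYTES = 10 * 1024;

    static byte[] readHead(InputStream in, int maxBytes) throws IOException {
        byte[] buffer = new byte[maxBytes];
        int total = 0;
        // a single read() may return fewer bytes than requested, so loop
        while (total < maxBytes) {
            int n = in.read(buffer, total, maxBytes - total);
            if (n < 0) {
                break; // end of stream before the limit was reached
            }
            total += n;
        }
        return Arrays.copyOf(buffer, total);
    }

    public static void main(String[] args) throws IOException {
        byte[] large = new byte[50000];
        byte[] small = new byte[100];
        System.out.println(readHead(new ByteArrayInputStream(large), MAX_BYTES).length); // 10240
        System.out.println(readHead(new ByteArrayInputStream(small), MAX_BYTES).length); // 100
    }
}
```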

There is also a long internal list of file types copied from the Apache list. Not all of the file types in this list have associated magic number information. However, with the list you can look up an entry by mime-type or by file extension and get the associated information.

You can use the internal list to look up by file extension:

 
// find details about files with .pdf extension
ContentInfo pdfInfo =
   ContentInfoUtil.findExtensionMatch("file.pdf");
// you can even just pass in the extension name
ContentInfo docInfo =
   ContentInfoUtil.findExtensionMatch("DOC");
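The two calls above accept either a file name or a bare extension. One way such a lookup could work is sketched below with a plain map: take whatever follows the last dot (or the whole string if there is no dot), lower-case it, and look it up. The case-insensitive handling is an assumption suggested by the "DOC" example. The class ExtensionLookupSketch and its two-entry table are hypothetical and stand in for the library's much longer internal list.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative only: a tiny extension-to-mime-type table. */
final class ExtensionLookupSketch {
    private static final Map<String, String> MIME_BY_EXTENSION = new HashMap<String, String>();
    static {
        MIME_BY_EXTENSION.put("pdf", "application/pdf");
        MIME_BY_EXTENSION.put("doc", "application/msword");
    }

    /** Accepts "file.pdf" or just "pdf"; returns null when the extension is unknown. */
    static String findByExtension(String nameOrExtension) {
        int dot = nameOrExtension.lastIndexOf('.');
        String extension = (dot >= 0) ? nameOrExtension.substring(dot + 1) : nameOrExtension;
        return MIME_BY_EXTENSION.get(extension.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(findByExtension("file.pdf"));  // application/pdf
        System.out.println(findByExtension("DOC"));       // application/msword
        System.out.println(findByExtension("notes.xyz")); // null
    }
}
```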

Or you can look up by mime-type:

 
// find details about this mime-type
ContentInfo info =
   ContentInfoUtil.findMimeTypeMatch("image/vnd.dwg");

Some internal entries provide more information than others. This list is a work in progress. Please submit improvements and edits as necessary.


2.4 Content Information

If the findMatch(...) method does not recognize the content, it returns null. If it matches one of the entries, it returns a ContentInfo object that describes the match, including its name and, where available, the mime-type and associated file extensions.


2.5 Using With Maven

To use SimpleMagic with Maven, include the following dependency in your ‘pom.xml’ file:

 
<dependency>
	<groupId>com.j256.simplemagic</groupId>
	<artifactId>simplemagic</artifactId>
	<version>1.17</version>
</dependency>

3. Open Source License

This document is part of the SimpleMagic project.

Copyright 2021, Gray Watson

Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted, provided that this permission notice appear in all copies.

THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

The author may be contacted via the SimpleMagic home page.


Index of Concepts

Index Entry                     Section

/
/etc/magic                      2.2 How To Load Magic Entries

A
alternative magic files         2.2 How To Load Magic Entries
author                          SimpleMagic

B
byte array content              2.3 How To Find The Content Info

C
ContentInfoInputStreamWrapper   2.3 How To Find The Content Info
ContentInfoUtil                 1. Start Using Quickly

D
default magic entries           2.2 How To Load Magic Entries
delegate to input stream        2.3 How To Find The Content Info
downloading the jars            2.1 Downloading Jar

E
extensions                      2.3 How To Find The Content Info
extensions                      2.4 Content Information

F
file content                    2.3 How To Find The Content Info
file extensions                 2.3 How To Find The Content Info
file extensions                 2.4 Content Information

G
getting started                 1. Start Using Quickly

H
how to download the jars        2.1 Downloading Jar
how to get started              1. Start Using Quickly
how to use                      2. Using SimpleMagic

I
input stream content            2.3 How To Find The Content Info
input stream wrapper            2.3 How To Find The Content Info
introduction                    SimpleMagic

L
license                         3. Open Source License
loading magic entries           2.2 How To Load Magic Entries

M
magic files                     2.2 How To Load Magic Entries
Maven, use with                 2.5 Using With Maven
mime-type                       2.3 How To Find The Content Info
mime-type                       2.4 Content Information

O
open source license             3. Open Source License

P
pom.xml dependency              2.5 Using With Maven

Q
quick start                     1. Start Using Quickly

S
sample magic definition         2.2 How To Load Magic Entries
simple magic                    SimpleMagic
system magic entries            2.2 How To Load Magic Entries

U
using SimpleMagic               2. Using SimpleMagic

W
where to get new jars           2.1 Downloading Jar
wrapped input stream            2.3 How To Find The Content Info

About This Document

This document was generated by Gray Watson on December 29, 2021 using texi2html 1.82.
