Scanning and Formatting
Scanning
Objects of type Scanner are useful for breaking down formatted input into tokens and translating individual tokens according to their data type.
Breaking Input into Tokens
By default, a scanner uses white space to separate tokens. (White space characters include blanks, tabs, and line terminators. For the full list, refer to the documentation for Character.isWhitespace.) To see how scanning works, let's look at ScanXan, a program that reads the individual words in xanadu.txt and prints them out, one per line.
import java.io.*;
import java.util.Scanner;

public class ScanXan {
    public static void main(String[] args) throws IOException {
        Scanner s = null;

        try {
            s = new Scanner(new BufferedReader(new FileReader("xanadu.txt")));

            while (s.hasNext()) {
                System.out.println(s.next());
            }
        } finally {
            if (s != null) {
                s.close();
            }
        }
    }
}
Notice that ScanXan invokes Scanner's close method when it is done with the scanner object. Even though a scanner is not a stream, you need to close it to indicate that you're done with its underlying stream.
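Because Scanner implements Closeable, the same cleanup can also be written with a try-with-resources statement (Java 7 and later). The following is a minimal sketch rather than part of the original example: the class name ScanWords and the countWords helper are hypothetical, and the scanner reads an in-memory string instead of xanadu.txt so that the sketch runs without an input file.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Scanner;

class ScanWords {
    // Hypothetical helper: counts the whitespace-separated tokens in text.
    static int countWords(String text) {
        // The scanner, and the reader it wraps, is closed automatically
        // when the try block exits, even if an exception is thrown.
        try (Scanner s = new Scanner(new BufferedReader(new StringReader(text)))) {
            int count = 0;
            while (s.hasNext()) {
                s.next();
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) {
        System.out.println(countWords("In Xanadu did Kubla Khan")); // prints 5
    }
}
```

With this form there is no need for the explicit null check and finally block.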
The output of ScanXan looks like this:
In
Xanadu
did
Kubla
Khan
A
stately
pleasure-dome
...
To use a different token separator, invoke useDelimiter(), specifying a regular expression. For example, suppose you wanted the token separator to be a comma, optionally followed by white space. You would invoke:
s.useDelimiter(",\\s*");
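To see the effect of this delimiter, here is a small sketch that scans a string rather than a file. The class name ScanCommas, the tokens helper, and the sample input are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScanCommas {
    // Hypothetical helper: splits text on a comma followed by optional white space.
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner s = new Scanner(text)) {
            s.useDelimiter(",\\s*");
            while (s.hasNext()) {
                result.add(s.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints apples, pears, plums, and figs, one per line.
        for (String token : tokens("apples, pears,plums,   figs")) {
            System.out.println(token);
        }
    }
}
```

Note that once the delimiter is replaced, white space on its own no longer separates tokens: the input "red wine, bread" yields the two tokens "red wine" and "bread".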
Translating Individual Tokens
The ScanXan example treats all input tokens as simple String values. Scanner also supports tokens for all of the Java language's primitive types (except for char), as well as BigInteger and BigDecimal. Also, numeric values can use thousands separators. Thus, in a US locale, Scanner correctly reads the string "32,767" as representing an integer value.
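As a quick illustration, the sketch below reads "32,767" as an int under the US locale. The class name ScanLocale and the readInt helper are hypothetical. The second call assumes Locale.GERMANY, where the period rather than the comma is the grouping separator.

```java
import java.util.Locale;
import java.util.Scanner;

class ScanLocale {
    // Hypothetical helper: reads the first token of text as an int,
    // interpreting separators according to the given locale.
    static int readInt(String text, Locale locale) {
        try (Scanner s = new Scanner(text)) {
            s.useLocale(locale);
            return s.nextInt();
        }
    }

    public static void main(String[] args) {
        // The comma is the thousands separator in the US locale.
        System.out.println(readInt("32,767", Locale.US)); // prints 32767

        // Assumption: in the German locale the period plays that role.
        System.out.println(readInt("32.767", Locale.GERMANY));
    }
}
```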
We have to mention the locale because thousands separators and decimal symbols are locale specific. The ScanSum example that follows would therefore not work correctly in all locales if it didn't specify that the scanner should use the US locale. That's not something you usually have to worry about, because your input data usually comes from sources that use the same locale as you do. But this example is part of the Java Tutorial and gets distributed all over the world.
The ScanSum example reads a list of double values and adds them up. Here's the source:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Locale;
import java.util.Scanner;

public class ScanSum {
    public static void main(String[] args) throws IOException {
        Scanner s = null;
        double sum = 0;

        try {
            s = new Scanner(new BufferedReader(new FileReader("usnumbers.txt")));
            s.useLocale(Locale.US);

            while (s.hasNext()) {
                if (s.hasNextDouble()) {
                    sum += s.nextDouble();
                } else {
                    // Skip any token that is not a number.
                    s.next();
                }
            }
        } finally {
            // s is still null if the file could not be opened.
            if (s != null) {
                s.close();
            }
        }

        System.out.println(sum);
    }
}
And here's the sample input file, usnumbers.txt:
8.5
32,767
3.14159
1,000,000.1
The output string is "1032778.74159". The period will be a different character in some locales, because System.out is a PrintStream object, and that class doesn't provide a way to override the default locale. We could override the locale for the whole program, or we could just use formatting, as described in the next topic, Formatting.
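As a rough sketch of the formatting route, the format method has an overload that takes an explicit Locale, so the decimal symbol no longer depends on the default locale. The class name SumFormat and the render helper are hypothetical, and the five-digit precision is chosen only to match the sample data.

```java
import java.util.Locale;

class SumFormat {
    // Hypothetical helper: renders a sum with five decimal places,
    // using the decimal symbol of the given locale.
    static String render(double sum, Locale locale) {
        return String.format(locale, "%.5f", sum);
    }

    public static void main(String[] args) {
        double sum = 8.5 + 32767 + 3.14159 + 1000000.1;

        // PrintStream.format also accepts a Locale as its first argument.
        System.out.format(Locale.US, "%.5f%n", sum);      // prints 1032778.74159

        // The same value with a comma as the decimal symbol.
        System.out.println(render(sum, Locale.GERMANY));  // prints 1032778,74159
    }
}
```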
Formatting
Stream objects that implement formatting are instances of either PrintWriter, a character stream class, or PrintStream, a byte stream class.
Note: The only PrintStream objects you are likely to need are System.out and System.err. (See I/O from the Command Line for more on these objects.) When you need to create a formatted output stream, instantiate PrintWriter, not PrintStream.
Like all byte and character stream objects, instances of PrintStream and PrintWriter implement a standard set of write methods for simple byte and character output. In addition, both PrintStream and PrintWriter implement the same set of methods for converting internal data into formatted output. Two levels of formatting are provided:
- print and println format individual values in a standard way.
- format formats almost any number of values based on a format string, with many options for precise formatting.
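The sketch below uses both levels on a PrintWriter that you create yourself, as opposed to System.out. It is illustrative only: the class name PrintWriterDemo and the describe helper are hypothetical, and the output goes to an in-memory StringWriter so that the result can be inspected as a string.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Locale;

class PrintWriterDemo {
    // Hypothetical helper: writes a description of sqrt(i) into a buffer.
    static String describe(int i) {
        StringWriter buffer = new StringWriter();
        try (PrintWriter out = new PrintWriter(buffer)) {
            // First level: print converts each value in a standard way.
            out.print("sqrt(" + i + ") = ");
            out.print(Math.sqrt(i));
            // Second level: format applies a format string.
            out.format(Locale.US, " or about %.3f", Math.sqrt(i));
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        // prints: sqrt(2) = 1.4142135623730951 or about 1.414
        System.out.println(describe(2));
    }
}
```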
The print and println Methods
Invoking print or println outputs a single value after converting the value using the appropriate toString method. We can see this in the Root example:
public class Root {
    public static void main(String[] args) {
        int i = 2;
        double r = Math.sqrt(i);

        System.out.print("The square root of ");
        System.out.print(i);
        System.out.print(" is ");
        System.out.print(r);
        System.out.println(".");

        i = 5;
        r = Math.sqrt(i);
        System.out.println("The square root of " + i + " is " + r + ".");
    }
}
Here is the output of Root:
The square root of 2 is 1.4142135623730951.
The square root of 5 is 2.23606797749979.
The i and r variables are formatted twice: the first time using code in an overload of print, the second time by conversion code automatically generated by the Java compiler, which also utilizes toString. You can format any value this way, but you don't have much control over the results.
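The same two paths apply to objects of your own classes. In the sketch below, the Point class, the PrintObject class, and both helpers are hypothetical. Each path ends up calling Point's toString method, so both produce the same text.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

class PrintObject {
    // Hypothetical value class with its own toString method.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public String toString() {
            return "(" + x + ", " + y + ")";
        }
    }

    // Path 1: the print(Object) overload performs the conversion.
    static String viaPrint(Point p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(bytes);
        out.print("Point: ");
        out.print(p);
        out.flush();
        return bytes.toString();
    }

    // Path 2: string concatenation performs the conversion.
    static String viaConcatenation(Point p) {
        return "Point: " + p;
    }

    public static void main(String[] args) {
        Point p = new Point(3, 4);
        System.out.println(viaPrint(p));          // prints Point: (3, 4)
        System.out.println(viaConcatenation(p));  // prints Point: (3, 4)
    }
}
```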
The format Method
The format method formats multiple arguments based on a format string. The format string consists of static text embedded with format specifiers; except for the format specifiers, the format string is output unchanged.
Format strings support many features. In this tutorial, we'll just cover some basics. For a complete description, see format string syntax in the API specification.
The Root2 example formats two values with a single format invocation:
public class Root2 {
    public static void main(String[] args) {
        int i = 2;
        double r = Math.sqrt(i);

        System.out.format("The square root of %d is %f.%n", i, r);
    }
}
Here is the output:
The square root of 2 is 1.414214.
Like the three used in this example, all format specifiers begin with a % and end with a 1- or 2-character conversion that specifies the kind of formatted output being generated. The three conversions used here are:
- d formats an integer value as a decimal value.
- f formats a floating point value as a decimal value.
- n outputs a platform-specific line terminator.
Here are some other conversions:
- x formats an integer as a hexadecimal value.
- s formats any value as a string.
- tB formats an integer as a locale-specific month name.
There are many other conversions.
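Here is a small sketch of the x, s, and tB conversions. The class name Conversions and its helpers are hypothetical. For tB the sketch passes a Calendar instead of an integer time value, which the conversion also accepts, and it fixes the locale to Locale.US so that the month name is predictable.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

class Conversions {
    // %x: integer as hexadecimal.
    static String hex(int value) {
        return String.format("%x", value);
    }

    // %s: any value as a string.
    static String text(Object value) {
        return String.format("%s", value);
    }

    // %tB: month name for the given locale (US English here).
    static String monthName(Calendar date) {
        return String.format(Locale.US, "%tB", date);
    }

    public static void main(String[] args) {
        System.out.println(hex(255));    // prints ff
        System.out.println(text(true));  // prints true
        Calendar date = new GregorianCalendar(2024, Calendar.JANUARY, 15);
        System.out.println(monthName(date)); // prints January
    }
}
```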
Note:
Except for %% and %n, all format specifiers must match an argument. If they don't, an exception is thrown.
In the Java programming language, the \n escape always generates the linefeed character (\u000A). Don't use \n unless you specifically want a linefeed character. To get the correct line separator for the local platform, use %n.
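Both points in the note can be checked with a short sketch. The class name FormatNotes and its helper methods are hypothetical. The exception caught here, MissingFormatArgumentException, is the one thrown when a specifier has no corresponding argument.

```java
import java.util.MissingFormatArgumentException;

class FormatNotes {
    // Returns true if formatting fails because a specifier has no argument.
    static boolean missingArgumentThrows() {
        try {
            String.format("%d and %d", 1); // the second %d has no argument
            return false;
        } catch (MissingFormatArgumentException e) {
            return true;
        }
    }

    // %% and %n consume no arguments.
    static String percentSign() {
        return String.format("100%%");
    }

    // %n expands to the platform's line separator.
    static boolean percentNIsLineSeparator() {
        return String.format("%n").equals(System.lineSeparator());
    }

    public static void main(String[] args) {
        System.out.println(missingArgumentThrows());   // prints true
        System.out.println(percentSign());             // prints 100%
        System.out.println(percentNIsLineSeparator()); // prints true
    }
}
```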
In addition to the conversion, a format specifier can contain several additional elements that further customize the formatted output. Here's an example, Format, that uses every possible kind of element.
public class Format {
    public static void main(String[] args) {
        System.out.format("%f, %1$+020.10f %n", Math.PI);
    }
}
Here's the output:
3.141593, +00000003.1415926536
The additional elements are all optional. The following figure shows how the longer specifier breaks down into elements.
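Figure: Elements of a Format Specifier. In %1$+020.10f the elements are, in order: % (begins the specifier), 1$ (argument index), +0 (flags), 20 (width), .10 (precision), and f (conversion).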
The elements must appear in the order shown. Working from the right, the optional elements are:
- Precision. For floating point values, this is the mathematical precision of the formatted value. For s and other general conversions, this is the maximum width of the formatted value; the value is right-truncated if necessary.
- Width. The minimum width of the formatted value; the value is padded if necessary. By default the value is left-padded with blanks.
- Flags specify additional formatting options. In the Format example, the + flag specifies that the number should always be formatted with a sign, and the 0 flag specifies that 0 is the padding character. Other flags include - (pad on the right) and , (format number with locale-specific thousands separators). Note that some flags cannot be used with certain other flags or with certain conversions.
- The Argument Index allows you to explicitly match a designated argument. You can also specify < to match the same argument as the previous specifier. Thus the example could have said: System.out.format("%f, %<+020.10f %n", Math.PI);
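The sketch below exercises each optional element on its own. The class name SpecifierElements and the us helper are hypothetical. The helper pins the locale to Locale.US so that the thousands separator and decimal symbol are predictable, and the square brackets are there only to make the padding visible.

```java
import java.util.Locale;

class SpecifierElements {
    // Hypothetical helper: formats with the US locale for predictable symbols.
    static String us(String format, Object... args) {
        return String.format(Locale.US, format, args);
    }

    public static void main(String[] args) {
        // Width: left-padded with blanks by default.
        System.out.println(us("[%8s]", "abc"));        // prints [     abc]
        // The - flag pads on the right instead.
        System.out.println(us("[%-8s]", "abc"));       // prints [abc     ]
        // Precision on a general conversion truncates on the right.
        System.out.println(us("[%.3s]", "abcdef"));    // prints [abc]
        // The , flag adds locale-specific thousands separators.
        System.out.println(us("%,d", 1000000));        // prints 1,000,000
        // The + and 0 flags with width 8 and precision 2.
        System.out.println(us("%+08.2f", 3.14159));    // prints +0003.14
        // Argument indexes select arguments explicitly.
        System.out.println(us("%2$s %1$s", "world", "hello")); // prints hello world
        // < reuses the argument of the previous specifier.
        System.out.println(us("%f, %<+020.10f", Math.PI));
        // prints 3.141593, +00000003.1415926536
    }
}
```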