Introducing Julia/Working with text files

Reading from files

The standard approach for getting information from a text file is to use the open(), read(), and close() functions.

Open

To read text from a file, first obtain a file handle:

f = open("sherlock-holmes.txt")

f is now Julia's connection to the file on disk. When you have finished with the file, close the connection with:

close(f)

In general, the recommended way to work with files in Julia is to wrap any file-processing code in a do block:

open("sherlock-holmes.txt") do file
    # do stuff with the open file
end

The open file is closed automatically when this block finishes. For more about do blocks, see Controlling the flow.
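To make the automatic closing concrete, here is a minimal sketch that compares the do-block form with a hand-written try/finally. The scratch file and the names path, handle, firstline, and sameline are assumptions made for illustration; they are not part of the original example.

```julia
# Hypothetical scratch file, so the sketch does not need sherlock-holmes.txt
path = tempname()
write(path, "first line\nsecond line\n")

# Manual version: close the handle in a finally clause
f = open(path)
firstline = try
    readline(f)
finally
    close(f)            # runs even if readline() throws
end

# do-block version; `handle` exists only so we can inspect the stream afterwards
handle = Ref{IOStream}()
sameline = open(path) do io
    handle[] = io
    readline(io)
end

println(firstline == sameline)   # both forms read the same line
println(isopen(handle[]))        # reports whether the stream is still open
```

The do-block form is shorter and cannot forget the close(), which is why it is usually preferred.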

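Because variables created inside the block are local to it, return any processed information you want to keep as the value of the block: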

totaltime, totallines = open("sherlock-holmes.txt") do f
    linecounter = 0
    timetaken = @elapsed for l in eachline(f)
        linecounter += 1
    end
    (timetaken, linecounter)
end
julia> totaltime, totallines
(0.004484679, 76803)

Reading a whole file at once

You can read the entire contents of an open file at once with read():

julia> s = read(f, String)

This stores the contents of the file in s. Here is the same thing using a do block:

s = open("sherlock-holmes.txt") do file
    read(file, String)
end

You can use readlines() to read the whole file as an array, with one element for each line:

julia> f = open("sherlock-holmes.txt");

julia> lines = readlines(f)
76803-element Array{String,1}:
"THE ADVENTURES OF SHERLOCK HOLMES by SIR ARTHUR CONAN DOYLE"
""
"   I. A Scandal in Bohemia"
"  II. The Red-headed League"
...
"Holmes, rather to my disappointment, manifested no further"
"interest in her when once she had ceased to be the centre of one"
"of his problems, and she is now the head of a private school at"
"Walsall, where I believe that she has met with considerable success."
julia> close(f)
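Both read() and readlines() also accept a file name directly, which saves the explicit open() and close(). The following is a small sketch using a hypothetical scratch file (path and its contents are assumptions, not the Sherlock Holmes text):

```julia
path = tempname()                       # hypothetical scratch file
write(path, "line one\nline two\nline three\n")

s = read(path, String)                  # whole file as one string
lines = readlines(path)                 # whole file as a vector of lines

println(length(s))
println(lines)
```

Passing the file name is convenient for one-off reads; keep the open() form when you need to do several things with the same stream.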

Now you can step through the lines one at a time:

counter = 1
for l in lines
   println("$counter $l")
   counter += 1
end
1 THE ADVENTURES OF SHERLOCK HOLMES by SIR ARTHUR CONAN DOYLE
2
3    I. A Scandal in Bohemia
4   II. The Red-headed League
5  III. A Case of Identity
6   IV. The Boscombe Valley Mystery
...
12638 interest in her when once she had ceased to be the centre of one
12639 of his problems, and she is now the head of a private school at
12640 Walsall, where I believe that she has met with considerable success.

There's a better way to do this: see enumerate(), below.

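You might find the chomp() function useful: it removes a trailing newline from a string. (readlines() and eachline() already drop line endings unless you pass keep=true.)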
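As a rough illustration of where chomp() fits, the sketch below reads a hypothetical two-line file with Windows-style line endings, once with the default settings and once with keep=true. The file contents and variable names are assumptions chosen for the example.

```julia
path = tempname()                        # hypothetical scratch file
write(path, "alpha\r\nbeta\r\n")

stripped = readlines(path)               # default: line endings removed
kept     = readlines(path, keep=true)    # line endings retained
chomped  = chomp.(kept)                  # remove one trailing newline from each

println(stripped)
println(kept)
println(chomped == stripped)
```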


Line by line

The eachline() function turns a source into an iterator. This allows you to process a file a line at a time:

open("sherlock-holmes.txt") do file
    for ln in eachline(file)
        println("$(length(ln)), $(ln)")
    end
end
59, THE ADVENTURES OF SHERLOCK HOLMES by SIR ARTHUR CONAN DOYLE
0,
26,    I. A Scandal in Bohemia
27,   II. The Red-headed League
24,  III. A Case of Identity
33,   IV. The Boscombe Valley Mystery
…
60, the island of Mauritius. As to Miss Violet Hunter, my friend
58, Holmes, rather to my disappointment, manifested no further
64, interest in her when once she had ceased to be the centre of one
63, of his problems, and she is now the head of a private school at
68, Walsall, where I believe that she has met with considerable success.
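Because eachline() gives you an iterator rather than an array, you can feed it straight into functions that consume iterators without holding the whole file in memory. Here is a minimal sketch, assuming a small hypothetical file; eachline() is given the file name directly, which also works.

```julia
path = tempname()                        # hypothetical scratch file
write(path, "# title\n\nfirst paragraph\nsecond paragraph\n")

# Lazily skip empty lines, then find the length of the longest remaining line
nonblank = Iterators.filter(!isempty, eachline(path))
longest = maximum(length, nonblank)

# Count the lines that contain a given word
matches = count(ln -> occursin("paragraph", ln), eachline(path))

println(longest)
println(matches)
```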

Another approach is to keep reading until you reach the end of the file. You might want to keep track of which line you're on:

 open("sherlock-holmes.txt") do f
   line = 1
   while !eof(f)
     x = readline(f)
     println("$line $x")
     line += 1
   end
 end

A better approach is to use enumerate() on an iterable object, which gives you the line numbers as well:

open("sherlock-holmes.txt") do f
    for (i, line) in enumerate(eachline(f))
        println(i, ": ", line)
    end
end

If you have a specific function that you want to call on a file, you can use this alternative syntax:

function shout(f::IOStream)
    return uppercase(read(f, String))
end
julia> shoutversion = open(shout, "sherlock-holmes.txt");
julia> shoutversion[30237:30400]
"ELEMENTARY PROBLEMS. LET HIM, ON MEETING A\nFELLOW-MORTAL, LEARN AT A GLANCE TO DISTINGUISH THE HISTORY OF THE\nMAN, AND THE TRADE OR  PROFESSION TO WHICH HE BELONGS. "

This opens the file, runs the shout() function on it, then closes the file again, assigning the processed contents to the variable.
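The function-first form and the do-block form are two spellings of the same call. The sketch below shows them side by side; wordcount() and the scratch file are hypothetical names introduced only for this illustration.

```julia
# Hypothetical helper: count whitespace-separated words in a stream
wordcount(io::IO) = length(split(read(io, String)))

path = tempname()                        # hypothetical scratch file
write(path, "to be or not to be\n")

n1 = open(wordcount, path)               # function passed as the first argument

n2 = open(path) do io                    # the same call written as a do block
    wordcount(io)
end

println(n1 == n2)
```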

You can use the DelimitedFiles.readdlm() function to read lines delimited with certain characters, such as data files, arrays stored as text files, and tables. Older versions of the DataFrames package also provided a readtable() function for reading data into a table; for that job, see the CSV.jl package.

Working with paths and filenames

These functions will be useful for working with filenames:

  • cd(path) changes the current directory
  • readdir(path) returns a list of the contents of a named directory, or of the current directory if no path is given
  • abspath(path) adds the current directory's path to a filename to make an absolute pathname
  • joinpath(str, str, ...) assembles a pathname from pieces
  • isdir(path) tells you whether the path is a directory
  • splitdir(path) - split a path into a tuple of the directory name and file name.
  • splitdrive(path) - on Windows, split a path into the drive letter part and the path part. On Unix systems, the first component is always the empty string.
  • splitext(path) - if the last component of a path contains a dot, split the path into everything before the dot and everything including and after the dot. Otherwise, return a tuple of the argument unmodified and the empty string.
  • expanduser(path) - replace a tilde character at the start of a path with the current user's home directory.
  • normpath(path) - normalize a path, removing "." and ".." entries.
  • realpath(path) - canonicalize a path by expanding symbolic links and removing "." and ".." entries.
  • homedir() - current user's home directory.
  • dirname(path) - get the directory part of a path.
  • basename(path) - get the file name part of a path.
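A few of these functions in action, as a minimal sketch. The path pieces are made up for the example and no files are touched; joinpath() is used throughout so that the separators are right on any platform.

```julia
# Hypothetical path, assembled from pieces
p = joinpath("projects", "julia", "notes.tar.gz")

dir, file = splitdir(p)          # directory part and file name part
stem, ext = splitext(file)       # splits at the last dot only

println(dir)
println(file)
println(stem, " + ", ext)
println(dirname(p) == dir, " ", basename(p) == file)

# normpath() removes "." and ".." entries
println(normpath(joinpath("a", "..", "b", ".", "c.txt")))

# abspath() prefixes the current directory
println(isabspath(abspath(p)))
```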

To work on a restricted selection of files in a directory, use filter() and an anonymous function to filter the file names and just keep the ones you want. (filter() is more of a fishing net or sieve, rather than a coffee filter, in that it catches what you want to keep.)

for f in filter(x -> endswith(x, "jl"), readdir())
    println(f)
end

Astro.jl
calendar.jl
constants.jl
coordinates.jl
...
pseudoscience.jl
riseset.jl
sidereal.jl
sun.jl
utils.jl
vsop87d.jl

If you want to match a group of files using a regular expression, then use occursin(). Let's look for files with ".jpg" or ".png" suffixes (remembering to escape the "."):

for f in filter(x -> occursin(r"(?i)\.(jpg|png)$", x), readdir())
    println(f)
end
034571172750.jpg
034571172750.png
51ZN2sCNfVL._SS400_.jpg
51bU7lucOJL._SL500_AA300_.jpg
Voronoy.jpg
kblue.png
korange.png
penrose.jpg
r-home-id-r4.png
wave.jpg

To examine a file hierarchy, use walkdir(), which lets you work through a directory, and examine the files in each directory in turn.
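Here is a short sketch of walkdir() in use. It builds a small throwaway directory tree (the directory and file names are assumptions for the example) and collects every file ending in ".jl", with paths relative to the starting directory.

```julia
# Build a hypothetical directory tree in a temporary location
root = mktempdir()
mkpath(joinpath(root, "src", "utils"))
touch(joinpath(root, "README.md"))
touch(joinpath(root, "src", "main.jl"))
touch(joinpath(root, "src", "utils", "helpers.jl"))

# walkdir() yields a (directory, subdirectories, files) tuple for each directory
jlfiles = String[]
for (dir, subdirs, files) in walkdir(root)
    for file in files
        if endswith(file, ".jl")
            push!(jlfiles, relpath(joinpath(dir, file), root))
        end
    end
end
sort!(jlfiles)

println(jlfiles)
```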

File information

If you want information about a specific file, use stat("pathname"), and then use one of the fields to find out the information. Here's how to get all the information and the field names listed for a file "i":

s = stat("i")
for n in fieldnames(typeof(s))
    println(n, ": ", getfield(s, n))
end
device: 16777219
inode: 2955324
mode: 16877
nlink: 943
uid: 502
gid: 20
rdev: 0
size: 32062
blksize: 4096
blocks: 0
mtime: 1.409769933e9
ctime: 1.409769933e9

You can access these fields via a 'stat' structure:

julia> s = stat("Untitled1.ipynb")
StatStruct(mode=100644, size=64424)
julia> s.ctime
1.446649269e9

and you can also use some of them directly:

julia> ctime("Untitled2.ipynb")
1.446649269e9

although not size, which you read from the structure (the filesize() function returns the same number):

julia> s.size
64424
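The times in a stat structure are Unix timestamps, which are hard to read as they stand. As a minimal sketch, and assuming a hypothetical scratch file, the following reads the size in two ways and converts the modification time to a DateTime (in UTC) with unix2datetime() from the Dates standard library:

```julia
using Dates

path = tempname()                        # hypothetical scratch file
write(path, "hello")

info = stat(path)
println(info.size)                       # size in bytes, from the structure
println(filesize(path))                  # the same number, asked for directly
println(isfile(path))

modified = unix2datetime(mtime(path))    # seconds since 1970 -> DateTime (UTC)
println(modified)
```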

To work on specific files that meet conditions — all IPython files modified after a certain date, for example — you could use something like this:

using Dates
function output_file(path)
    println(stat(path).size, ": ", path)
end 

for afile in filter!(f -> endswith(f, "ipynb") && (mtime(f) > Dates.datetime2unix(DateTime("2015-11-03T09:00"))),
    readdir())
    output_file(realpath(afile))
end

Interacting with the file system

The cp(), mv(), rm(), and touch() functions have the same names and functions as their Unix shell counterparts.
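A quick sketch of these four functions, run inside a temporary directory so that nothing of yours is touched. The file names a.txt, b.txt, and c.txt are assumptions for the example.

```julia
dir = mktempdir()                        # throwaway working directory

a = joinpath(dir, "a.txt")
b = joinpath(dir, "b.txt")
c = joinpath(dir, "c.txt")

touch(a)                                 # create an empty file
cp(a, b)                                 # copy a.txt to b.txt
mv(b, c)                                 # rename b.txt to c.txt
rm(a)                                    # delete a.txt

remaining = readdir(dir)
println(remaining)
```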

To convert filenames to pathnames, use abspath(). You can map this over a list of files in a directory:

julia> map(abspath, readdir())
67-element Array{String,1}:
"/Users/me/.CFUserTextEncoding"
"/Users/me/.DS_Store"
"/Users/me/.Trash"
"/Users/me/.Xauthority"
"/Users/me/.ahbbighrc"
"/Users/me/.apdisk"
"/Users/me/.atom"
...

To restrict the list to filenames that contain a particular substring, use an anonymous function inside filter() — something like this:

julia> filter(x -> occursin("re", x), map(abspath, readdir()))
4-element Array{String,1}:
"/Users/me/.DS_Store"
"/Users/me/.gitignore"
"/Users/me/.hgignore_global"
"/Users/me/Pictures"
...

To restrict the list to regular expression matches, try this:

julia> filter(x -> occursin(r"recur.*\.jl", x), map(abspath, readdir()))
2-element Array{String,1}:
 "/Users/me/julia/recursive-directory-scan.jl"
 "/Users/me/julia/recursive-text.jl"

Writing to files

To write to a text file, open it using the "w" flag and make sure that you have permission to create the file in the specified directory:

open("/tmp/t.txt", "w") do f
    write(f, "A, B, C, D\n")
end
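Note that the "w" flag empties an existing file before writing. To add to the end of a file instead, open it with the "a" flag. A minimal sketch, using a hypothetical scratch file rather than /tmp/t.txt:

```julia
path = tempname()                        # hypothetical scratch file

open(path, "w") do f                     # "w": create, or truncate if it exists
    write(f, "A, B, C, D\n")
end

open(path, "a") do f                     # "a": append to the existing contents
    println(f, "1, 2, 3, 4")             # println() adds the newline for you
end
appended = read(path, String)

open(path, "w") do f                     # "w" again: earlier contents are lost
    write(f, "fresh start\n")
end
overwritten = read(path, String)

print(appended)
print(overwritten)
```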

Here's how to write 20 lines of 4 random numbers between 1 and 10, separated by commas:

function fourrandom()
    return rand(1:10,4)
end

open("/tmp/t.txt", "w") do f
    for i in 1:20
        n1, n2, n3, n4 = fourrandom()
        write(f, "$n1, $n2, $n3, $n4 \n")
    end
end
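If the number of values per line might change, join() avoids naming each one. This is only a sketch of the same idea as above, written to a hypothetical scratch file; it also reads the lines back to check their shape.

```julia
path = tempname()                        # hypothetical scratch file

open(path, "w") do f
    for _ in 1:20
        # join() puts ", " between the four random numbers
        println(f, join(rand(1:10, 4), ", "))
    end
end

rows = readlines(path)
println(length(rows))
println(first(rows))
```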

A quicker alternative to this is to use the DelimitedFiles.writedlm() function, described next:

using DelimitedFiles
writedlm("/tmp/test.txt", rand(1:10, 20, 4), ", ")


Writing and reading arrays to and from a file

The DelimitedFiles standard library provides two convenient functions, writedlm() and readdlm(). These let you write an array or collection to a file and read it back again.

writedlm() writes the contents of an object to a text file, and readdlm() reads the data from a file into an array:

julia> numbers = rand(5,5)
5x5 Array{Float64,2}:
0.913583  0.312291  0.0855798  0.0592331  0.371789
0.13747   0.422435  0.295057   0.736044   0.763928
0.360894  0.434373  0.870768   0.469624   0.268495
0.620462  0.456771  0.258094   0.646355   0.275826
0.497492  0.854383  0.171938   0.870345   0.783558

julia> writedlm("/tmp/test.txt", numbers)

You can see the file using the shell (type a semicolon ";" to switch):

shell> cat "/tmp/test.txt"
.9135833328830523	.3122905420350348	.08557977218948465	.0592330821115965	.3717889559226475
.13747015238054083	.42243494637594203	.29505701073304524	.7360443978397753	.7639280496847236
.36089432672073607	.43437288984307787	.870767989032692	.4696243851552686	.26849468736154325
.6204624598015906	.4567706404666232	.25809436255988105	.6463554854347682	.27582613759302377
.4974916625466639	.8543829989347014	.17193814498701587	.8703447748713236	.783557793485824

The elements are separated by tabs unless you specify another delimiter. Here, a colon is used to delimit the numbers:

julia> writedlm("/tmp/test.txt", rand(1:6, 10, 10), ":")
shell> cat "/tmp/test.txt"
3:3:3:2:3:2:6:2:3:5
3:1:2:1:5:6:6:1:3:6
5:2:3:1:4:4:4:3:4:1
3:2:1:3:3:1:1:1:5:6
4:2:4:4:4:2:3:5:1:6
6:6:4:1:6:6:3:4:5:4
2:1:3:1:4:1:5:4:6:6
4:4:6:4:6:6:1:4:2:3
1:4:4:1:1:1:5:6:5:6
2:4:4:3:6:6:1:1:5:5

To read in data from a text file, you can use readdlm().

julia> numbers = rand(5,5)
5x5 Array{Float64,2}:
0.862955  0.00827944  0.811526  0.854526  0.747977
0.661742  0.535057    0.186404  0.592903  0.758013
0.800939  0.949748    0.86552   0.113001  0.0849006
0.691113  0.0184901   0.170052  0.421047  0.374274
0.536154  0.48647     0.926233  0.683502  0.116988
julia> writedlm("/tmp/test.txt", numbers)

julia> numbers = readdlm("/tmp/test.txt")
5x5 Array{Float64,2}:
0.862955  0.00827944  0.811526  0.854526  0.747977
0.661742  0.535057    0.186404  0.592903  0.758013
0.800939  0.949748    0.86552   0.113001  0.0849006
0.691113  0.0184901   0.170052  0.421047  0.374274
0.536154  0.48647     0.926233  0.683502  0.116988
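If the file uses a delimiter other than whitespace, pass it to readdlm() as well, and optionally the element type you want back. A minimal round-trip sketch, assuming a hypothetical scratch file and a colon as the delimiter:

```julia
using DelimitedFiles

path = tempname()                        # hypothetical scratch file
original = [1 2 3; 4 5 6]

writedlm(path, original, ':')            # write with ':' between elements

asfloat = readdlm(path, ':')             # numeric data is read as Float64 by default
asint   = readdlm(path, ':', Int)        # ask for integers explicitly

println(eltype(asfloat))
println(asint == original)
```

Without the element type, whole numbers come back as floating-point values, so ask for Int when the distinction matters.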

There are also a number of Julia packages specifically designed for reading and writing data to files, including DataFrames.jl and CSV.jl. Look through the Julia package directory for these and more.
