Introducing Julia/Dictionaries and sets


Dictionaries

Many of the functions introduced so far have been shown working on arrays (and tuples). Arrays are just one type of collection; Julia has others.

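A simple look-up table is a useful way of organizing many types of data: given a single piece of information, such as a number, string, or symbol, called the key, what is the corresponding data value? For this purpose, Julia provides the Dictionary object, called Dict for short. It's an "associative collection" because it associates keys with values.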

Creating dictionaries

You can create a simple dictionary using the following syntax:

julia> dict = Dict("a" => 1, "b" => 2, "c" => 3)
Dict{String,Int64} with 3 entries:
 "c" => 3
 "b" => 2
 "a" => 1

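dict is now a dictionary. The keys are "a", "b", and "c", and the corresponding values are 1, 2, and 3. The => operator builds a Pair: writing "a" => 1 is equivalent to calling Pair("a", 1). In a dictionary, the keys are always unique; you can't have two keys with the same name.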

If you know the types of the keys and values in advance, you can (and probably should) specify them after the Dict keyword, in curly braces:

julia> dict = Dict{String,Integer}("a"=>1, "b" => 2)
Dict{String,Integer} with 2 entries:
 "b" => 2
 "a" => 1

You can also create dictionaries using the comprehension (generator) syntax:

julia> dict = Dict(string(i) => sind(i) for i = 0:5:360)
Dict{String,Float64} with 73 entries:
 "320" => -0.642788
 "65"  => 0.906308
 "155" => 0.422618
 ⋮     => ⋮
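As a further illustration, a generator can also carry a condition, so that only some of the candidate pairs end up in the dictionary. The sketch below is an assumed example (the name squares is not from the text above); it keeps only the even numbers from 1 to 10.

```julia
# Keep only the even numbers and map each one to its square.
squares = Dict(n => n^2 for n in 1:10 if iseven(n))

println(length(squares))   # 5
println(squares[4])        # 16
println(haskey(squares, 3))   # false: odd numbers were filtered out
```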

Use the following syntax to create a typed empty dictionary:

julia> dict = Dict{String,Int64}()
Dict{String,Int64} with 0 entries

Or you can omit the types, and get an untyped dictionary:

julia> dict = Dict()
Dict{Any,Any} with 0 entries
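The practical difference between a typed and an untyped dictionary shows up when you store values. The following is a minimal sketch with made-up names (ages, "alice", and so on): a typed dictionary converts a value to the declared type where it can, and refuses a value that can't be converted.

```julia
# A dictionary with declared key and value types (names are illustrative).
ages = Dict{String,Int64}("alice" => 31)

# A value that converts exactly to Int64 is converted on assignment.
ages["bob"] = 42.0          # stored as the integer 42

# A value that cannot be converted raises an error instead of being stored.
rejected = try
    ages["carol"] = "unknown"
    false
catch err
    err isa MethodError
end

println(ages["bob"])              # 42
println(rejected)                 # true
println(haskey(ages, "carol"))    # false
```

An untyped Dict() would have accepted all three values without complaint, which is convenient but gives Julia less information to work with.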

It's sometimes useful to create dictionary entries using a for loop:

files = ["a.txt", "b.txt", "c.txt"]
fvars = Dict()
for (n, f) in enumerate(files)
   fvars["x_$(n)"] = f
end

This is one way you could create a set of 'variables' stored in a dictionary:

julia> fvars
Dict{Any,Any} with 3 entries:
 "x_1" => "a.txt"
 "x_2" => "b.txt"
 "x_3" => "c.txt"

Looking things up

To get a value, if you have the key:

julia> dict = Dict("a" => 1, "b" => 2, "c" => 3, "d" => 4, "e" => 5)

julia> dict["a"]
1

That's how it works if the keys are strings. Or, if the keys are symbols:

julia> symdict = Dict(:x => 1, :y => 3, :z => 6)
Dict{Symbol,Int64} with 3 entries:
 :z => 6
 :x => 1
 :y => 3

julia> symdict[:x]
1

Or if the keys are integers:

julia> intdict = Dict(1 => "one", 2 => "two", 3  => "three")
Dict{Int64,String} with 3 entries:
 2 => "two"
 3 => "three"
 1 => "one"

julia> intdict[2]
"two"

You can instead use the get() function, and provide a fail-safe default value if there's no value for that particular key:

julia> dict = Dict("a" => 1, "b" => 2, "c" => 3, "d" => 4, "e" => 5)

julia> get(dict, "a", 0)
1

julia> get(dict, "Z", 0)
0

The default value isn't optional. Used this way, get() always takes three arguments, so if you leave the default out you'll get a MethodError, whether or not the key exists:

julia> get(dict, "Z")
ERROR: MethodError: no method matching get(::Dict{String,Int64}, ::String)
Closest candidates are:
 get(::Dict{K,V}, ::Any, ::Any) where {K, V} at dict.jl:508
 get(::Base.EnvDict, ::AbstractString, ::Any) at env.jl:77

If you don't want get() to provide a default value, use a try...catch block:

try
    dict["Z"]
catch error
    if isa(error, KeyError)
        println("sorry, I couldn't find anything")
    end
end

sorry, I couldn't find anything
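If you'd rather avoid exceptions altogether, one possible approach (a sketch, not the only way) is to pass nothing as the default to get() and test the result. This assumes nothing is never stored as a real value in the dictionary.

```julia
dict = Dict("a" => 1, "b" => 2, "c" => 3)

# `nothing` cannot be confused with any integer stored in this dictionary,
# so it works as a "not found" marker.
result = get(dict, "Z", nothing)

if result === nothing
    println("sorry, I couldn't find anything")
else
    println(result)
end
```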

To change a value assigned to an existing key (or assign a value to a hitherto unseen key):

julia> dict["a"] = 10
10

Keys

Keys must be unique for a dictionary. There's always only one key called a in this dictionary, so when you assign a value to a key that already exists, you're not creating a new one, just modifying an existing one.

To see if the dictionary contains a key, use haskey():

julia> haskey(dict, "Z")
false

To check for the existence of a key/value pair:

julia> in(("b" => 2), dict)
true
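The two tests answer slightly different questions, as this small sketch with an assumed two-entry dictionary tries to show: haskey() looks only at the key, while in() with a Pair needs the key and the value to match.

```julia
dict = Dict("a" => 1, "b" => 2)

println(haskey(dict, "a"))      # true: the key exists
println(("a" => 1) in dict)     # true: this exact key/value pair exists
println(("a" => 99) in dict)    # false: the key exists, but with another value
println("a" in keys(dict))      # true: same question as haskey(dict, "a")
println(2 in values(dict))      # true: some key maps to the value 2
```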

To add a new key and value to a dictionary, use this:

julia> dict["d"] = 4
4

You can delete a key from the dictionary, using delete!():

julia> delete!(dict, "d")
Dict{String,Int64} with 4 entries:
 "c" => 3
 "e" => 5
 "b" => 2
 "a" => 10

You'll notice that the dictionary doesn't seem to be sorted in any way — at least, the keys are in no particular order. This is due to the way they're stored, and you can't sort them in place. (But see Sorting, below.)

To get all keys, use the keys() function:

julia> dict = Dict("a" => 1, "b" => 2, "c" => 3, "d" => 4, "e" => 5);
julia> keys(dict)
Base.KeySet for a Dict{String,Int64} with 5 entries. Keys:
 "c"
 "e"
 "b"
 "a"
 "d"

The result is an iterator that has just one job: to iterate through a dictionary key by key:

julia> collect(keys(dict))
5-element Array{String,1}:
"c"
"e"
"b"
"a"
"d"

julia> [uppercase(key) for key in keys(dict)]
5-element Array{String,1}:
"C"
"E"
"B"
"A"
"D"

This uses the list comprehension form ([ new-element for loop-variable in iterator ]) and each new element is collected into an array. An alternative would be:

julia> map(uppercase, collect(keys(dict)))
5-element Array{String,1}:
"C"
"E"
"B"
"A"
"D"

Values

To retrieve all the values, use the values() function:

julia> values(dict)
Base.ValueIterator for a Dict{String,Int64} with 5 entries. Values:
 3
 5
 2
 1
 4

If you want to go through a dictionary and process each key/value, you can make use of the fact that dictionaries themselves are iterable objects:

julia> for kv in dict
   println(kv)
end

"c"=>3
"e"=>5
"b"=>2
"a"=>1
"d"=>4

where kv is a Pair containing each key and its value in turn.

Or you could do:

julia> for k in keys(dict)
          println(k, " ==> ", dict[k])
       end

c ==> 3
e ==> 5
b ==> 2
a ==> 1
d ==> 4

Even better, you can use a key/value tuple to simplify the iteration even more:

julia> for (key, value) in dict
           println(key, " ==> ", value)
       end

c ==> 3
e ==> 5
b ==> 2
a ==> 1
d ==> 4
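The same (key, value) form is handy whenever you build something new from a dictionary. As a small sketch (the name inverted is made up, and it assumes that no two keys share a value), here's how you might swap keys and values:

```julia
dict = Dict("a" => 1, "b" => 2, "c" => 3)

# Destructure each pair into key and value, then store them the other way round.
inverted = Dict{Int64,String}()
for (key, value) in dict
    inverted[value] = key
end

println(inverted[2])   # b
```

If two keys did share a value, the one visited last would silently win, so this is only safe when the values are known to be unique.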

Here's another example:

for tuple in Dict("1"=>"Hydrogen", "2"=>"Helium", "3"=>"Lithium")
    println("Element $(tuple[1]) is $(tuple[2])")
end

Element 1 is Hydrogen
Element 2 is Helium
Element 3 is Lithium

(Notice the string interpolation operator, $. This allows you to use a variable's name in a string and get the variable's value when the string is printed. You can include any Julia expression in a string using $().)

Sorting a dictionary

Because dictionaries don't store the keys in any particular order, you might want to output the dictionary to a sorted array to obtain the items in order:

julia> dict = Dict("a" => 1, "b" => 2, "c" => 3, "d" => 4, "e" => 5, "f" => 6)
Dict{String,Int64} with 6 entries:
 "f" => 6
 "c" => 3
 "e" => 5
 "b" => 2
 "a" => 1
 "d" => 4

julia> for key in sort(collect(keys(dict)))
   println("$key => $(dict[key])")
end
a => 1
b => 2
c => 3
d => 4
e => 5
f => 6
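A variation on the same idea, sketched here with a small made-up dictionary, is to collect the key/value Pairs themselves and sort them by key. That way you don't need to look each value up again inside the loop.

```julia
dict = Dict("c" => 3, "a" => 1, "b" => 2)

# collect() gives an array of Pairs; `by = first` sorts them by key.
sorted_pairs = sort(collect(dict), by = first)

for (key, value) in sorted_pairs
    println("$key => $value")
end
```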

If you really need to have a dictionary that remains sorted all the time, you can use the SortedDict data type from the DataStructures.jl package (after having installed it).

julia> import DataStructures
julia> dict = DataStructures.SortedDict("b" => 2, "c" => 3, "d" => 4, "e" => 5, "f" => 6)
DataStructures.SortedDict{String,Int64,Base.Order.ForwardOrdering} with 5 entries:
 "b" => 2
 "c" => 3
 "d" => 4
 "e" => 5
 "f" => 6

julia> dict["a"] = 1
1

julia> dict
DataStructures.SortedDict{String,Int64,Base.Order.ForwardOrdering} with 6 entries:
 "a" => 1
 "b" => 2
 "c" => 3
 "d" => 4
 "e" => 5
 "f" => 6

Example: counting words

A simple application of a dictionary is to count how many times each word appears in a piece of text. Each word is a key, and the value of the key is the number of times that word appears in the text.

Let's count the words in the Sherlock Holmes stories. I've downloaded the text from the excellent Project Gutenberg and stored it in a file "sherlock-holmes-canon.txt". To create a list of words, we'll read the file line by line, split each line using a regular expression, and convert every word to lowercase. (There are probably faster methods.)

julia> f = open("sherlock-holmes-canon.txt")
julia> wordlist = String[]
julia> for line in eachline(f)
   words = split(line, r"\W")
   map(w -> push!(wordlist, lowercase(w)), words)
end
julia> filter!(!isempty, wordlist)
julia> close(f)

wordlist is now an array of nearly 700,000 words:

julia> wordlist[1:20]
20-element Array{String,1}:
"the"
"complete"
"sherlock"
"holmes"
"arthur"
"conan"
"doyle"
"table"
"of"
"contents"
"a"
"study"
"in"
"scarlet"
"the"
"sign"
"of"
"the"
"four"
"the"

To store the words and the word counts, we'll create a dictionary:

julia> wordcounts = Dict{String,Int64}()
Dict{String,Int64} with 0 entries

To build the dictionary, loop through the list of words, and use get() to look up the current tally, if any. If the word has already been seen, the count can be increased. If the word hasn't been seen before, the fall-back third argument of get() ensures that the absence doesn't cause an error, and 1 is stored instead.

for word in wordlist
    wordcounts[word]=get(wordcounts, word, 0) + 1
end
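To try the same pattern without downloading anything, here's a self-contained sketch that counts the words in a short made-up sentence. The names text, words, and counts are just for illustration.

```julia
text = "the cat and the hat and the bat"

# Split on runs of non-word characters and drop any empty pieces.
words = filter(!isempty, split(lowercase(text), r"\W+"))

counts = Dict{String,Int64}()
for word in words
    # get() returns 0 for a word that has not been seen yet.
    counts[word] = get(counts, word, 0) + 1
end

println(counts["the"])   # 3
println(counts["and"])   # 2
```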

Now you can look up words in the wordcounts dictionary and find out how many times they appear:

julia> wordcounts["watson"]
1040

julia> wordcounts["holmes"]
3057

julia> wordcounts["sherlock"]
415

julia> wordcounts["lestrade"]
244

Dictionaries aren't sorted, but you can use the collect() and keys() functions on the dictionary to collect the keys and then sort them. In a loop you can work through the dictionary in alphabetical order:

for i in sort(collect(keys(wordcounts)))
  println("$i, $(wordcounts[i])")
end
 000, 5
 1, 8
 10, 7
 100, 4
 1000, 9
 104, 1
 109, 1
 10s, 2
 10th, 1
 11, 9
 1100, 1
 117, 2
 117th, 2
 11th, 1
 12, 2
 120, 2
 126b, 3
 ⋮           
 zamba, 2
 zeal, 5
 zealand, 3
 zealous, 3
 zenith, 1
 zeppelin, 1
 zero, 2
 zest, 3
 zig, 1
 zigzag, 3
 zigzagged, 1
 zinc, 3
 zion, 2
 zoo, 1
 zoology, 2
 zu, 1
 zum, 2
 â, 41
 ã, 4

But how do you find out the most common words? One way is to use collect() to convert the dictionary to an array of key/value Pairs, and then to sort the array by looking at the last value (the count) of each Pair:

julia> sort(collect(wordcounts), by = pair -> last(pair), rev=true)
19171-element Array{Pair{String,Int64},1}:
"the"=>36244
"and"=>17593
"i"=>17357
"of"=>16779
"to"=>16041
"a"=>15848
"that"=>11506
⋮
"enrage"=>1
"smuggled"=>1
"lounges"=>1
"devotes"=>1
"reverberated"=>1
"munitions"=>1
"graybeard"=>1

To see only the top 20 words:

julia> sort(collect(wordcounts), by = pair -> last(pair), rev=true)[1:20]
20-element Array{Pair{String,Int64},1}:
"the"=>36244
"and"=>17593
"i"=>17357
"of"=>16779
"to"=>16041
"a"=>15848
"that"=>11506
"it"=>11101
"in"=>10766
"he"=>10366
"was"=>9844
"you"=>9688
"his"=>7836
"is"=>6650
"had"=>6057
"have"=>5532
"my"=>5293
"with"=>5256
"as"=>4755
"for"=>4713
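If you need this kind of ranking more than once, it may be worth wrapping it in a function. The helper below, top_entries, is a hypothetical name invented for this sketch; it also guards against asking for more entries than the dictionary holds.

```julia
# Hypothetical helper: return the `n` entries with the highest counts.
function top_entries(counts::Dict{String,Int64}, n::Integer)
    ranked = sort(collect(counts), by = last, rev = true)
    return ranked[1:min(n, length(ranked))]
end

sample = Dict("the" => 5, "and" => 3, "zoo" => 1, "of" => 4)

for (word, tally) in top_entries(sample, 2)
    println("$word: $tally")
end
```

With the sample data this prints the entries for "the" and "of", the two words with the highest counts.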

In a similar way, you can use the filter() function to find, for example, all words that start with "k" and occur fewer than four times:

julia> filter(pair -> startswith(first(pair), "k") && last(pair) < 4, collect(wordcounts))
73-element Array{Pair{String,Int64},1}:
"keg"=>1
"klux"=>2
"knifing"=>1
"keening"=>1
"kansas"=>3
⋮
"kaiser"=>1
"kidnap"=>2
"keswick"=>1
"kings"=>2
"kratides"=>3
"ken"=>2
"kindliness"=>2
"klan"=>2
"keepsake"=>1
"kindled"=>2
"kit"=>2
"kicking"=>1
"kramm"=>2
"knob"=>1

More complex structures

A dictionary can hold many different types of values. Here for example is a dictionary where the keys are strings and the values are arrays of arrays of points (assuming that the Point type has been defined already). For example, this could be used to store graphical shapes describing the letters of the alphabet (some of which have two or more loops):

julia> p = Dict{String, Array{Array}}()
Dict{String,Array{Array{T,N},N}}
    
julia> p["a"] = Array[[Point(0,0), Point(1,1)], [Point(34, 23), Point(5,6)]]
2-element Array{Array{T,N},1}:
 [Point(0.0,0.0), Point(1.0,1.0)]
 [Point(34.0,23.0), Point(5.0,6.0)]
   
julia> push!(p["a"], [Point(34.0,23.0), Point(5.0,6.0)])
3-element Array{Array{T,N},1}:
 [Point(0.0,0.0), Point(1.0,1.0)]
 [Point(34.0,23.0), Point(5.0,6.0)]
 [Point(34.0,23.0), Point(5.0,6.0)]

Or create a dictionary with some already-known values:

julia> d = Dict("shape1" => Array[[Point(0,0), Point(-20,57)], [Point(34, -23), Point(-10,12)]])
Dict{String,Array{Array{T,N},1}} with 1 entry:
 "shape1" => Array [ [ Point(0.0,0.0), Point(-20.0,57.0)], [Point(34.0,-23.0), Point(-10.0,12.0) ] ]

Add another array to the first one:

julia> push!(d["shape1"], [Point(-124.0, 37.0), Point(25.0,32.0)])
3-element Array{Array{T,N},1}:
 [Point(0.0,0.0), Point(-20.0,57.0)]
 [Point(34.0,-23.0), Point(-10.0,12.0)]
 [Point(-124.0,37.0), Point(25.0,32.0)]
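The examples in this section assume a Point type that isn't shown. Here's one possible definition, purely as an assumption for illustration (the field names x and y are made up), together with a more tightly typed version of the same nested structure:

```julia
# Assumed definition: the text does not show how Point is declared.
struct Point
    x::Float64
    y::Float64
end

# Each key maps to a list of loops; each loop is a vector of points.
shapes = Dict{String,Vector{Vector{Point}}}()
shapes["a"] = [[Point(0, 0), Point(1, 1)], [Point(34, 23), Point(5, 6)]]
push!(shapes["a"], [Point(-124, 37), Point(25, 32)])

println(length(shapes["a"]))    # 3
println(shapes["a"][3][1].x)    # -124.0
```

Declaring the value type as Vector{Vector{Point}} rather than Array{Array} is a design choice: it tells Julia exactly what each value contains, at the cost of some flexibility.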

Sets

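A set is a collection of elements, rather like an array or a dictionary, but with no duplicated elements.

A set differs from other types of collection in two important ways:

In a set, you can have only one of each element.

In a set, the order of the elements isn't important.

(An array, by contrast, can hold multiple copies of the same element, and it remembers their order.)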

You can create an empty set using the Set constructor function:

julia> colors = Set()
Set(Any[])

As elsewhere in Julia, you can specify the type:

julia> primes = Set{Int64}()
Set(Int64[])

You can create and fill a set in one go:

julia> colors = Set{String}(["red","green","blue","yellow"])
Set(String["yellow","blue","green","red"])

Or you can let Julia "guess the type":

julia> colors = Set(["red","green","blue","yellow"])
Set(String["yellow","blue","green","red"])

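Quite a few of the functions that work with arrays also work with sets. Adding elements to a set, for example, is a bit like adding elements to an array. You can use push!():

julia> push!(colors, "black") 
Set(String["yellow","blue","green","black","red"])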

But you can't use pushfirst!(), because that works only for things that have a concept of "first", like arrays.

What happens if you try to add something to the set that's already there? Absolutely nothing. You don't get a copy added, because it's a set, not an array, and sets don't store repeated elements.
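You can check this for yourself by watching the size of the set. This is a small sketch using a three-color set:

```julia
colors = Set(["red", "green", "blue"])

push!(colors, "green")     # already present, so the set is unchanged
println(length(colors))    # 3

push!(colors, "black")     # a new element, so the set grows
println(length(colors))    # 4
```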

To see if something is in the set, you can use in():

julia> in("green", colors)
true

There are some standard operations you can do with sets, namely finding their union, intersection, and difference, with the functions union(), intersect(), and setdiff():

julia> rainbow = Set(["red","orange","yellow","green","blue","indigo","violet"])
Set(String["indigo","yellow","orange","blue","violet","green","red"])

The union of two sets is the set of everything that is in one or the other sets. The result is another set – so you can't have two "yellow"s here, even though we've got a "yellow" in each set:

julia> union(colors, rainbow)
Set(String["indigo","yellow","orange","blue","violet","green","black","red"])

The intersection of two sets is the set that contains every element that belongs to both sets:

julia> intersect(colors, rainbow)
Set(String["yellow","blue","green","red"])

The difference between two sets is the set of elements that are in the first set, but not in the second. This time, the order in which you supply the sets matters. The setdiff() function finds the elements that are in the first set, colors, but not in the second set, rainbow:

julia> setdiff(colors, rainbow)
Set(String["black"])
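Because sets print their elements in no particular order, it can be easier to check these operations by comparing against the set you expect. The following sketch uses two small sets of integers (the names evens and primes are made up for the example):

```julia
evens  = Set([2, 4, 6, 8])
primes = Set([2, 3, 5, 7])

println(union(evens, primes) == Set([2, 3, 4, 5, 6, 7, 8]))   # true
println(intersect(evens, primes) == Set([2]))                 # true

# setdiff() is not symmetric: swapping the arguments changes the result.
println(setdiff(evens, primes) == Set([4, 6, 8]))             # true
println(setdiff(primes, evens) == Set([3, 5, 7]))             # true
```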

Other functions

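The functions that work on arrays and sets also sometimes work on dictionaries and other collections. For example, some of the set operations can be applied to dictionaries, not just to sets and arrays: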

julia> d1 = Dict(1=>"a", 2 => "b")
Dict{Int64,String} with 2 entries:
  2 => "b"
  1 => "a"
 
julia> d2 = Dict(2 => "b", 3 =>"c", 4 => "d")
Dict{Int64,String} with 3 entries:
  4 => "d"
  2 => "b"
  3 => "c"

julia> union(d1, d2)
4-element Array{Pair{Int64,String},1}:
 2=>"b"
 1=>"a"
 4=>"d"
 3=>"c"

julia> intersect(d1, d2)
1-element Array{Pair{Int64,String},1}:
 2=>"b"
 
julia> setdiff(d1, d2)
1-element Array{Pair{Int64,String},1}:
 1=>"a"

Notice that the results are returned as arrays of Pairs, not as dictionaries.

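Functions such as filter() and collect(), which we've already seen used on arrays, also work on dictionaries. The function you pass to filter() receives each key/value Pair as a single argument. (In Julia 1.x, map() doesn't accept a dictionary directly; apply it to collect(d) instead.)

julia> filter(p -> first(p) == 1, d1)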
Dict{Int64,String} with 1 entry:
  1 => "a"

There's a merge() function, which can merge two dictionaries:

julia> merge(d1, d2)
Dict{Int64,String} with 4 entries:
  4 => "d"
  2 => "b"
  3 => "c"
  1 => "a"
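In the example above, the shared key 2 has the same value in both dictionaries, so it doesn't show what happens when they disagree. The sketch below uses made-up settings dictionaries to illustrate that the value from the last dictionary is the one that's kept:

```julia
defaults  = Dict("color" => "blue", "size" => "medium")
overrides = Dict("size" => "large", "shape" => "round")

# When a key appears in both, the value from the last dictionary is kept.
settings = merge(defaults, overrides)

println(settings["size"])    # large
println(length(settings))    # 3
println(defaults["size"])    # medium: merge() returns a new dictionary
```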

The findmin() function can find the minimum value in a dictionary, and returns the value together with its key:

julia> d1 = Dict(:a => 1, :b => 2, :c => 0)
Dict{Symbol,Int64} with 3 entries:
 :a => 1
 :b => 2
 :c => 0

julia> findmin(d1)
(0, :c)
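The companion function findmax() works the same way. Since both return a (value, key) tuple, you can unpack the result straight into two variables, as in this sketch (the variable names are only for illustration):

```julia
scores = Dict(:a => 1, :b => 2, :c => 0)

# findmin() and findmax() each return a (value, key) tuple for a dictionary.
lowest, lowest_key = findmin(scores)
highest, highest_key = findmax(scores)

println("$lowest_key has the lowest value: $lowest")      # c has the lowest value: 0
println("$highest_key has the highest value: $highest")   # b has the highest value: 2
```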
