Introducing Julia/Dictionaries and sets

«	Introducing Julia Dictionaries and sets	»
Functions	Introducing Julia Dictionaries and sets	Strings and characters

字典 Dict

到目前为止，介绍的许多函数都是在数组(和元组)上工作的。数组只是集合的一种类型，Julia 还有其他的集合类型。

简单的查找表是组织多种类型数据的有用方法：给定单个信息 (例如称为键的数字、字符串或符号)，对应的数据值是什么？为此，Julia提供了 Dictionary 对象，简称为 Dict。它是一个“关联集合”，因为它将键与值相关联。

创建字典

可以使用以下语法创建简单字典：

julia> dict = Dict("a" => 1, "b" => 2, "c" => 3)
Dict{String,Int64} with 3 entries:
 "c" => 3
 "b" => 2
 "a" => 1

dict 现在是字典了。键是“a”、“b”和“c”，对应的值是1、2和3。操作符 => 称为 Pair 函数。在字典中，键总是唯一的 - 不能有两个同名的键。

如果预先知道键和值的类型，则可以(而且很可能应该) 在 Dict 关键字后用大括号指定它们：

julia> dict = Dict{String,Integer}("a"=>1, "b" => 2)
Dict{String,Integer} with 2 entries:
 "b" => 2
 "a" => 1

还可以使用推导 complementsions 语法创建词典:

julia> dict = Dict(string(i) => sind(i) for i = 0:5:360)
Dict{String,Float64} with 73 entries:
 "320" => -0.642788
 "65"  => 0.906308
 "155" => 0.422618
 ⋮     => ⋮

使用以下语法创建类型化的空字典：

julia> dict = Dict{String,Int64}()
Dict{String,Int64} with 0 entries

or you can omit the types, and get an untyped dictionary:

julia> dict = Dict()
Dict{Any,Any} with 0 entries

It's sometimes useful to create dictionary entries using a for loop:

files = ["a.txt", "b.txt", "c.txt"]
fvars = Dict()
for (n, f) in enumerate(files)
   fvars["x_$(n)"] = f
end

This is one way you could create a set of 'variables' stored in a dictionary:

julia> fvars
Dict{Any,Any} with 3 entries:
 "x_1" => "a.txt"
 "x_2" => "b.txt"
 "x_3" => "c.txt"

查看内容

To get a value, if you have the key:

julia> dict = Dict("a" => 1, "b" => 2, "c" => 3, "d" => 4, "e" => 5)

julia> dict["a"]
1

if the keys are strings. Or, if the keys are symbols:

julia> symdict = Dict(:x => 1, :y => 3, :z => 6)
Dict{Symbol,Int64} with 3 entries:
 :z => 6
 :x => 1
 :y => 3

julia> symdict[:x]
1

Or if the keys are integers:

julia> intdict = Dict(1 => "one", 2 => "two", 3  => "three")
Dict{Int64,String} with 3 entries:
 2 => "two"
 3 => "three"
 1 => "one"

julia> intdict[2]
"two"

You can instead use the get() function, and provide a fail-safe default value if there's no value for that particular key:

julia> dict = Dict("a" => 1, "b" => 2, "c" => 3, "d" => 4, "e" => 5)

julia> get(dict, "a", 0)
1

julia> get(dict, "Z", 0)
0

If you don't use a default value as a safety precaution like this, you'll get an error if there's no key:

julia> get(dict, "Z")
ERROR: MethodError: no method matching get(::Dict{String,Int64}, ::String)
Closest candidates are:
 get(::Dict{K,V}, ::Any, ::Any) where {K, V} at dict.jl:508
 get(::Base.EnvDict, ::AbstractString, ::Any) at env.jl:77

If you don't want get() to provide a default value, use a try...catch block:

try
    dict["Z"]
    catch error
       if isa(error, KeyError)
           println("sorry, I couldn't find anything")
       end
end

sorry, I couldn't find anything

To change a value assigned to an existing key (or assign a value to a hitherto unseen key):

julia> dict["a"] = 10
10

Keys

Keys must be unique for a dictionary. There's always only one key called a in this dictionary, so when you assign a value to a key that already exists, you're not creating a new one, just modifying an existing one.

To see if the dictionary contains a key, use haskey():

julia> haskey(dict, "Z")
false

To check for the existence of a key/value pair:

julia> in(("b" => 2), dict)
true

To add a new key and value to a dictionary, use this:

julia> dict["d"] = 4
4

You can delete a key from the dictionary, using delete!():

julia> delete!(dict, "d")
Dict{String,Int64} with 4 entries:
 "c" => 3
 "e" => 5
 "b" => 2
 "a" => 1

You'll notice that the dictionary doesn't seem to be sorted in any way — at least, the keys are in no particular order. This is due to the way they're stored, and you can't sort them in place. (But see Sorting, below.)

To get all keys, use the keys() function:

julia> dict = Dict("a" => 1, "b" => 2, "c" => 3, "d" => 4, "e" => 5);
julia> keys(dict)
Base.KeySet for a Dict{String,Int64} with 5 entries. Keys:
 "c"
 "e"
 "b"
 "a"
 "d"

The result is an iterator that has just one job: to iterate through a dictionary key by key:

julia> collect(keys(dict))
5-element Array{String,1}:
"c"
"e"
"b"
"a"
"d"

julia> [uppercase(key) for key in keys(dict)]
5-element Array{Any,1}:
"C"
"E"
"B"
"A"
"D"

This uses the list comprehension form ([ new-element for loop-variable in iterator ]) and each new element is collected into an array. An alternative would be:

julia> map(uppercase, collect(keys(dict)))
5-element Array{String,1}:
"C"
"E"
"B"
"A"
"D"

Values

To retrieve all the values, use the values() function:

julia> values(dict)
Base.ValueIterator for a Dict{String,Int64} with 5 entries. Values:
 3
 5
 2
 1
 4

If you want to go through a dictionary and process each key/value, you can make use the fact that dictionaries themselves are iterable objects:

julia> for kv in dict
   println(kv)
end

"c"=>3
"e"=>5
"b"=>2
"a"=>1
"d"=>4

where kv is a tuple containing each key/value pair in turn.

Or you could do:

julia> for k in keys(dict)
          println(k, " ==> ", dict[k])
       end

c ==> 3
e ==> 5
b ==> 2
a ==> 1
d ==> 4

Even better, you can use a key/value tuple to simplify the iteration even more:

julia> for (key, value) in dict
           println(key, " ==> ", value)
       end

c ==> 3
e ==> 5
b ==> 2
a ==> 1
d ==> 4

Here's another example:

for tuple in Dict("1"=>"Hydrogen", "2"=>"Helium", "3"=>"Lithium")
    println("Element $(tuple[1]) is $(tuple[2])")
end

Element 1 is Hydrogen
Element 2 is Helium
Element 3 is Lithium

(Notice the string interpolation operator, $. This allows you to use a variable's name in a string and get the variable's value when the string is printed. You can include any Julia expression in a string using $().)

字典排序

Because dictionaries don't store the keys in any particular order, you might want to output the dictionary to a sorted array to obtain the items in order:

julia> dict = Dict("a" => 1, "b" => 2, "c" => 3, "d" => 4, "e" => 5, "f" => 6)
Dict{String,Int64} with 6 entries:
 "f" => 6
 "c" => 3
 "e" => 5
 "b" => 2
 "a" => 1
 "d" => 4

julia> for key in sort(collect(keys(dict)))
   println("$key => $(dict[key])")
end
a => 1
b => 2
c => 3
d => 4
e => 5
f => 6

If you really need to have a dictionary that remains sorted all the time, you can use the SortedDict data type from the DataStructures.jl package (after having installed it).

julia> import DataStructures
julia> dict = DataStructures.SortedDict("b" => 2, "c" => 3, "d" => 4, "e" => 5, "f" => 6)
DataStructures.SortedDict{String,Int64,Base.Order.ForwardOrdering} with 5 entries:
 "b" => 2
 "c" => 3
 "d" => 4
 "e" => 5
 "f" => 6

julia> dict["a"] = 1
1

julia> dict
DataStructures.SortedDict{String,Int64,Base.Order.ForwardOrdering} with 6 entries:
 "a" => 1
 "b" => 2
 "c" => 3
 "d" => 4
 "e" => 5
 "f" => 6

例子：统计单词

A simple application of a dictionary is to count how many times each word appears in a piece of text. Each word is a key, and the value of the key is the number of times that word appears in the text.

Let's count the words in the Sherlock Holmes stories. I've downloaded the text from the excellent Project Gutenberg and stored them in a file "sherlock-holmes-canon.txt". To create a list of words from the loaded text in canon, we'll split the text using a regular expression, and convert every word to lowercase. (There are probably faster methods.)

julia> f = open("sherlock-holmes-canon.txt")
julia> wordlist = String[]
julia> for line in eachline(f)
   words = split(line, r"\W")
   map(w -> push!(wordlist, lowercase(w)), words)
end
julia> filter!(!isempty, wordlist)
julia> close(f)

wordlist is now an array of nearly 700,000 words:

julia> wordlist[1:20]
20-element Array{String,1}:
"THE"     
"COMPLETE"
"SHERLOCK"
"HOLMES"  
"Arthur"  
"Conan"   
"Doyle"   
"Table"   
"of"      
"contents"
"A"       
"Study"   
"In"      
"Scarlet" 
"The"     
"Sign"    
"of"      
"the"     
"Four"    
"The"

To store the words and the word counts, we'll create a dictionary:

julia> wordcounts = Dict{String,Int64}()
Dict{String,Int64} with 0 entries

To build the dictionary, loop through the list of words, and use get() to look up the current tally, if any. If the word has already been seen, the count can be increased. If the word hasn't been seen before, the fall-back third argument of get() ensures that the absence doesn't cause an error, and 1 is stored instead.

for word in wordlist
    wordcounts[word]=get(wordcounts, word, 0) + 1
end

Now you can look up words in the wordcounts dictionary and find out how many times they appear:

julia> wordcounts["watson"]
1040

julia> wordcounts["holmes"]
3057

julia> wordcounts["sherlock"]
415

julia> wordcounts["lestrade"]
244

Dictionaries aren't sorted, but you can use the collect() and keys() functions on the dictionary to collect the keys and then sort them. In a loop you can work through the dictionary in alphabetical order:

for i in sort(collect(keys(wordcounts)))
  println("$i, $(wordcounts[i])")
end
 000, 5
 1, 8
 10, 7
 100, 4
 1000, 9
 104, 1
 109, 1
 10s, 2
 10th, 1
 11, 9
 1100, 1
 117, 2
 117th, 2
 11th, 1
 12, 2
 120, 2
 126b, 3
 ⋮           
 zamba, 2
 zeal, 5
 zealand, 3
 zealous, 3
 zenith, 1
 zeppelin, 1
 zero, 2
 zest, 3
 zig, 1
 zigzag, 3
 zigzagged, 1
 zinc, 3
 zion, 2
 zoo, 1
 zoology, 2
 zu, 1
 zum, 2
 â, 41
 ã, 4

But how do you find out the most common words? One way is to use collect() to convert the dictionary to an array of tuples, and then to sort the array by looking at the last value of each tuple:

julia> sort(collect(wordcounts), by = tuple -> last(tuple), rev=true)
19171-element Array{Pair{String,Int64},1}:
("the",36244)     
("and",17593)     
("i",17357)       
("of",16779)      
("to",16041)      
("a",15848)       
("that",11506)   
⋮                 
("enrage",1)      
("smuggled",1)    
("lounges",1)     
("devotes",1)     
("reverberated",1)
("munitions",1)   
("graybeard",1)

To see only the top 20 words:

julia> sort(collect(wordcounts), by = tuple -> last(tuple), rev=true)[1:20]
20-element Array{Pair{String,Int64},1}:
("the",36244) 
("and",17593) 
("i",17357)   
("of",16779)  
("to",16041)  
("a",15848)   
("that",11506)
("it",11101)  
("in",10766)  
("he",10366)  
("was",9844)  
("you",9688)  
("his",7836)  
("is",6650)   
("had",6057)  
("have",5532) 
("my",5293)   
("with",5256) 
("as",4755)   
("for",4713)

In a similar way, you can use the filter() function to find, for example, all words that start with "k" and occur less than four times:

julia> filter(tuple -> startswith(first(tuple), "k") && last(tuple) < 4, collect(wordcounts))
73-element Array{Pair{String,Int64},1}:
("keg",1)
("klux",2)
("knifing",1)
("keening",1)
("kansas",3)
⋮
("kaiser",1)
("kidnap",2)
("keswick",1)
("kings",2)
("kratides",3)
("ken",2)
("kindliness",2)
("klan",2)
("keepsake",1)
("kindled",2)
("kit",2)
("kicking",1)
("kramm",2)
("knob",1)

更加复杂的结构

A dictionary can hold many different types of values. Here for example is a dictionary where the keys are strings and the values are arrays of arrays of points (assuming that the Point type has been defined already). For example, this could be used to store graphical shapes describing the letters of the alphabet (some of which have two or more loops):

julia> p = Dict{String, Array{Array}}()
Dict{String,Array{Array{T,N},N}}
    
julia> p["a"] = Array[[Point(0,0), Point(1,1)], [Point(34, 23), Point(5,6)]]
2-element Array{Array{T,N},1}:
 [Point(0.0,0.0), Point(1.0,1.0)]
 [Point(34.0,23.0), Point(5.0,6.0)]
   
julia> push!(p["a"], [Point(34.0,23.0), Point(5.0,6.0)])
3-element Array{Array{T,N},1}:
 [Point(0.0,0.0), Point(1.0,1.0)]
 [Point(34.0,23.0), Point(5.0,6.0)]
 [Point(34.0,23.0), Point(5.0,6.0)]

Or create a dictionary with some already-known values:

julia> d = Dict("shape1" => Array [ [ Point(0,0), Point(-20,57)], [Point(34, -23), Point(-10,12) ] ])
Dict{String,Array{Array{T,N},1}} with 1 entry:
 "shape1" => Array [ [ Point(0.0,0.0), Point(-20.0,57.0)], [Point(34.0,-23.0), Point(-10.0,12.0) ] ]

Add another array to the first one:

julia> push!(d["shape1"], [Point(-124.0, 37.0), Point(25.0,32.0)])
3-element Array{Array{T,N},1}:
 [Point(0.0,0.0), Point(-20.0,57.0)]
 [Point(34.0,-23.0), Point(-10.0,12.0)]
 [Point(-124.0,37.0), Point(25.0,32.0)]

集合 Set

Set 是元素的集合，就像是一个没有重复元素的字典或数组。

Set 和其他类型的集合有两个不同之处：

Set 中每个元素只能有一份

元素的顺序不重要

（而数组对同一个元素可以有多份，并且是有序的）

你可以通过使用 Set 构造函数创建一个空的集合：

julia> colors = Set()
Set{Any}({})

和 Julia 的其他地方一样，你可以指定类型：

julia> primes = Set{Int64}()
Set(Int64)[]

可以一次操作创建和填充 Set：

julia> colors = Set{String}(["red","green","blue","yellow"])
Set(String["yellow","blue","green","red"])

或者你可以让 Julia “猜出类型”：

julia> colors = Set(["red","green","blue","yellow"])
Set{String}({"yellow","blue","green","red"})

相当一部分处理数组的函数也可以用于处理集合。例如，将元素添加到集合类似于将元素添加到数组。您可以使用 push!() ：

julia> push!(colors, "black") 
Set{String}({"yellow","blue","green","black","red"})

But you can't use pushfirst!(), because that works only for things that have a concept of "first", like arrays.

What happens if you try to add something to the set that's already there? Absolutely nothing. You don't get a copy added, because it's a set, not an array, and sets don't store repeated elements.

To see if something is in the set, you can use in():

julia> in("green", colors)
true

There are some standard operations you can do with sets, namely find their union, intersection, and difference, with the functions, union(), intersect(), and setdiff():

julia> rainbow = Set(["red","orange","yellow","green","blue","indigo","violet"])
Set(String["indigo","yellow","orange","blue","violet","green","red"])

The union of two sets is the set of everything that is in one or the other sets. The result is another set – so you can't have two "yellow"s here, even though we've got a "yellow" in each set:

julia> union(colors, rainbow)
Set(String["indigo","yellow","orange","blue","violet","green","black","red"])

The intersection of two sets is the set that contains every element that belongs to both sets:

julia> intersect(colors, rainbow)
Set(String["yellow","blue","green","red"])

The difference between two sets is the set of elements that are in the first set, but not in the second. This time, the order in which you supply the sets matters. The setdiff() function finds the elements that are in the first set, colors, but not in the second set, rainbow:

julia> setdiff(colors, rainbow)
Set(String["black"])

其他函数

处理数组和集合的函数有时也适用于字典和其他集合。例如，某些集合的操作可应用于词典，而不仅仅是 Set 和数组：

julia> d1 = Dict(1=>"a", 2 => "b")
Dict{Int64,String} with 2 entries:
  2 => "b"
  1 => "a"
 
julia> d2 = Dict(2 => "b", 3 =>"c", 4 => "d")
Dict{Int64,String} with 3 entries:
  4 => "d"
  2 => "b"
  3 => "c"

julia> union(d1, d2)
4-element Array{Pair{Int64,String},1}:
 2=>"b"
 1=>"a"
 4=>"d"
 3=>"c"

julia> intersect(d1, d2)
1-element Array{Pair{Int64,String},1}:
 2=>"b"
 
julia> setdiff(d1, d2)
1-element Array{Pair{Int64,String},1}:
 1=>"a"

请注意，结果是以对数组的形式返回的，而不是以字典的形式返回的。

filter(), map() 和 collect() 等函数 (我们已经看到它们用于数组) 也适用于字典：

julia> filter((k, v) -> k == 1, d1)
Dict{Int64,String} with 1 entry:
  1 => "a"

有一个 merge() 函数，它可以合并两个字典：

julia> merge(d1, d2)
Dict{Int64,String} with 4 entries:
  4 => "d"
  2 => "b"
  3 => "c"
  1 => "a"

findmin()函数可以在字典中找到最小值，然后返回值及其键。

julia> d1 = Dict(:a => 1, :b => 2, :c => 0)
Dict{Symbol,Int64} with 3 entries:
 :a => 1
 :b => 2
 :c => 0

julia> findmin(d1)
(0, :c)

«	Introducing Julia Dictionaries and sets	»
Functions	Introducing Julia Dictionaries and sets	Strings and characters