Monday, December 17, 2012

Object serialization

From: http://www.javacoffeebreak.com/articles/serialization/index.html

Object Persistence Made Easy

With object serialization, your Java applets and applications can save and load the state of objects to disk or over a network. In this article, we'll examine the benefits of object serialization, and how to implement it in your own programs. By David Reilly.
One of the most critical tasks that applications have to perform is to save and restore data. Whether it be a word processing application that saves documents to disk, a utility that remembers its configuration for next time, or a game that sets aside world domination for the night, the ability to store data and later retrieve it is a vital one. Without it, software would be little more effective than a typewriter - users would have to re-type the data to make further modifications once the application exits.

Writing the code for saving data, however, can become boring, repetitive work. First, the programmer must create a specification document for the proposed file structure. Next, the programmer must implement save and restore functions that convert object data to and from primitive data types, and test them with sample data.
If the application later requires new data to be stored, the file specification must be modified, as well as the save and restore methods. Take it from someone who's been there - creating save & restore functions is not a fun task.
The solution to this is object serialization. Object serialization takes an object's state, and converts it to a stream of data for you. With object serialization, it's an easy task to take any object, and make it persistent, without writing custom code to save object member variables. The object can be restored at a later time, and even a later location. With persistence, we can move an object from one computer to another, and have it maintain its state. This very cool feature, in Java, also happens to be very easy to use.

Serializing objects

Java makes it easy to serialize objects. Any object whose class implements the java.io.Serializable interface can be made persistent with only a few lines of code. No extra methods need to be added to implement the interface, however - the purpose of the interface is to identify at run-time which classes can be safely serialized, and which cannot. You, as a programmer, need only add the implements keyword to your class declaration, to identify your classes as serializable.
public class UserData implements java.io.Serializable
{
    // Member variables to be saved go here
}
Now, once a class is serializable, we can write the object to any OutputStream, such as to disk or a socket connection. To achieve this, we must first create an instance of java.io.ObjectOutputStream, and pass the constructor an existing OutputStream instance.
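import java.io.FileOutputStream;
import java.io.ObjectOutputStream;

// Inside a method that handles or declares IOException:
// write to disk with FileOutputStream, wrapped in an ObjectOutputStream.
// try-with-resources flushes and closes both streams, even on error.
try (FileOutputStream f_out = new FileOutputStream("myobject.data");
     ObjectOutputStream obj_out = new ObjectOutputStream(f_out))
{
    // Write object out to disk
    obj_out.writeObject(myObject);
}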
Note that any Java object that implements the Serializable interface can be written to an output stream this way - including those that are part of the Java API. Furthermore, any objects that are referenced by a serialized object will also be stored, provided their classes are serializable too (otherwise writeObject throws a NotSerializableException). This means that arrays, vectors, lists, and collections of objects can be saved in the same fashion - without the need to manually save each one. This can lead to significant time and code savings.

Restoring objects from a serialized state

Reading objects back is almost as easy. The one catch is that at runtime, you can never be completely sure what type of data to expect. A data stream containing serialized objects may contain a mixture of different object classes, so you need to explicitly cast an object to a particular class. If you've never cast an object before, the procedure is relatively straightforward. First check the object's class, using the instanceof operator. Then cast to the correct class.
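import java.io.FileInputStream;
import java.io.ObjectInputStream;
import java.util.Vector;

// Inside a method that handles or declares IOException and
// ClassNotFoundException: read from disk using FileInputStream,
// wrapped in an ObjectInputStream. Both streams are closed on exit.
try (FileInputStream f_in = new FileInputStream("myobject.data");
     ObjectInputStream obj_in = new ObjectInputStream(f_in))
{
    // Read an object
    Object obj = obj_in.readObject();

    if (obj instanceof Vector)
    {
        // Cast object to a Vector
        Vector vec = (Vector) obj;

        // Do something with vector....
    }
}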

Further issues with serialization

As you can see, it's relatively easy to serialize an object. Whenever new fields are added to an object, they will be saved automatically, without requiring modification to your save and restore code. However, there are some cases where this behavior is not desirable. For example, a password member variable might not be safe to transmit to third parties over a network connection, and might need to be left blank. In this case, the transient keyword can be used. The transient keyword indicates that a particular member variable should not be saved; when the object is restored, that field holds its default value (null for an object reference). Though not used often, it's an important keyword to remember.
public class UserSession implements 
         java.io.Serializable
{
 String username;
 transient String password;
}

Summary

Java's support for object serialization makes the implementation of persistent objects extremely easy. In contrast, hand-writing the code to save and restore every field of an object is complex, repetitive work. While it is certainly possible to write your own serialization mechanism, the simplicity of that provided by Java would be hard to beat.
Serialization benefits programmers by
  • Reducing time taken to write code for save and restoration of object or application state
  • Eliminating complexity of save and restore operations, and avoiding the need for creating a new file format
  • Making it easier for objects to travel over a network connection.
With relatively little effort, you can apply serialization to a variety of tasks. Not only do applications benefit from serialization, but also applets. Rather than specifying a long list of parameters, or performing time consuming initialization and parsing, an applet can simply reload a configuration object whose member variables contain all the information needed to execute. With a little imagination, serialization may just have a place in your next project.

Sunday, December 16, 2012

Why We Created Julia

http://julialang.org/blog/2012/02/why-we-created-julia/

In short, because we are greedy.
We are power Matlab users. Some of us are Lisp hackers. Some are Pythonistas, others Rubyists, still others Perl hackers. There are those of us who used Mathematica before we could grow facial hair. There are those who still can’t grow facial hair. We’ve generated more R plots than any sane person should. C is our desert island programming language.
We love all of these languages; they are wonderful and powerful. For the work we do — scientific computing, machine learning, data mining, large-scale linear algebra, distributed and parallel computing — each one is perfect for some aspects of the work and terrible for others. Each one is a trade-off.
We are greedy: we want more.
We want a language that’s open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that’s homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled.
(Did we mention it should be as fast as C?)
While we’re being demanding, we want something that provides the distributed power of Hadoop — without the kilobytes of boilerplate Java and XML; without being forced to sift through gigabytes of log files on hundreds of machines to find our bugs. We want the power without the layers of impenetrable complexity. We want to write simple scalar loops that compile down to tight machine code using just the registers on a single CPU. We want to write A*B and launch a thousand computations on a thousand machines, calculating a vast matrix product together.
We never want to mention types when we don’t feel like it. But when we need polymorphic functions, we want to use generic programming to write an algorithm just once and apply it to an infinite lattice of types; we want to use multiple dispatch to efficiently pick the best method for all of a function’s arguments, from dozens of method definitions, providing common functionality across drastically different types. Despite all this power, we want the language to be simple and clean.
All this doesn’t seem like too much to ask for, does it?
Even though we recognize that we are inexcusably greedy, we still want to have it all. About two and a half years ago, we set out to create the language of our greed. It’s not complete, but it’s time for a 1.0 release: the language we’ve created is called Julia. It already delivers on 90% of our ungracious demands, and now it needs the ungracious demands of others to shape it further. So, if you are also a greedy, unreasonable, demanding programmer, we want you to give it a try.

Friday, December 7, 2012

git command


[Repost] Pass by Value and Pass by Reference in Shallow Copy and Deep Copy


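The title is a bit of a mouthful, but it is the key to understanding data structures. Its four terms are: shallow copy (note: not "shadow copy"), deep copy, pass by value, and pass by reference (or pass by address). Passing by address and passing by reference mean the same thing here.
Data structures are at the core of a programming language. Roughly speaking, they can be divided into immutable types and mutable types. Why divide them this way? It comes down to memory allocation. An immutable value needs only a fixed, bounded amount of memory, whereas a mutable value can in theory grow without limit. The division is therefore a way of using system resources sensibly. In a simplified mental model, stack memory holds immutable values and heap memory holds mutable ones, although real language runtimes do not split memory quite so neatly.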
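What is an immutable type? Its values cannot be changed in place once created, so each variable behaves as if it owned a private copy that no other variable shares. For example:
var stringValue = "I'm immutable data structure, mean you can't modify me!";
var anotherStringValue = stringValue;
stringValue = "I have changed";
Will the value stored in anotherStringValue now also become "I have changed"? No, because
var anotherStringValue = stringValue;
behaves as if it copied the string held in stringValue exactly (conceptually allocating a new block of memory for it) and assigned that copy to anotherStringValue. In other words, although the two variables hold the same value, their values do not live in the same block of memory, so modifying either variable has no effect on the other. That is,
stringValue = "I have changed";
affects only the value of stringValue. Strictly speaking, though, stringValue = "I have changed"; does not modify stringValue's string at all. It creates a new string (allocating a new block of memory for it) and makes stringValue refer to that string, which is more like replacing the variable's value. And the original string? Once no variable refers to it, it becomes garbage (and the memory it occupies will be reclaimed).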
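The same rebinding behaviour can be observed in Python, which is used here in place of the article's JavaScript because the article's later examples are in Python. This is a minimal sketch; the variable names simply mirror the ones above, and the `mutated` flag is a hypothetical name added for illustration.

```python
# Rebinding a name to a new string never changes other names.
string_value = "I'm an immutable string"
another_string_value = string_value      # both names refer to one string object

string_value = "I have changed"          # rebinds string_value only

print(another_string_value)              # still the original text
print(string_value is another_string_value)  # False: two different objects now

# Strings cannot be modified in place; attempting it raises TypeError.
try:
    another_string_value[0] = "X"
    mutated = True
except TypeError:
    mutated = False
```

Because a string can never be changed in place, it makes no observable difference whether two variables share one string object or hold separate copies.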
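So for immutable types, assignment passes the value in memory itself. What about mutable types? There, assignment passes a reference to (in other words, the address of) the value in memory, and no matter how many times it is passed around, memory only ever holds the one original value. After all, a mutable value can be arbitrarily large, and keeping a single original saves the most memory. For example:
var objectValue = {1:1,'s':'string','innerObject':{'innerArray' : [1,2,3]}};
var anotherObjectValue = objectValue;
objectValue[1] = 100;
anotherObjectValue[1]; //100
Here, through assignment, anotherObjectValue receives from objectValue only a reference to the original object ({1:1,'s':'string','innerObject':{'innerArray' : [1,2,3]}}), that is, the object's address in memory, or its "house number". Therefore, changing the property with key 1 of the original object through objectValue (objectValue[1] = 100;) is also reflected in anotherObjectValue[1], because the two variables share the same original value.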
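A Python dictionary behaves the same way under assignment. The sketch below reuses the article's data with Python naming; it is an illustration in a different language, not the article's own code.

```python
object_value = {1: 1, "s": "string", "innerObject": {"innerArray": [1, 2, 3]}}
another_object_value = object_value   # copies the reference, not the dict

object_value[1] = 100                 # mutate through the first name

print(another_object_value[1])               # 100
print(another_object_value is object_value)  # True: one object, two names
```

The `is` operator compares object identity, so it is a quick way to check whether two names share one object.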
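In JavaScript, arguments are passed to functions according to the same default convention: immutable types are passed by value and mutable types by reference. For example:
function example(str, obj){
// ...
}
example(stringValue,objectValue);
When example is called, the first argument passes the actual string value, and the second passes a reference to the object (its memory address). Strictly speaking, the reference itself is passed by value: the function can modify the object it points to, but reassigning obj inside the function does not change the caller's variable.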
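Python passes arguments in a comparable way, so the difference between mutating an argument and rebinding a parameter can be sketched as follows. The function body is hypothetical; the article elides the body of its own example function.

```python
def example(text, obj):
    text = text + " (changed inside)"   # rebinds the local name only
    obj["touched"] = True               # mutates the caller's dict in place
    obj = {"replaced": True}            # rebinds the local name only
    return text

string_value = "hello"
object_value = {"touched": False}
result = example(string_value, object_value)

print(string_value)   # hello
print(object_value)   # {'touched': True}
print(result)         # hello (changed inside)
```

Only the in-place mutation is visible to the caller; both rebindings stay local to the function.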
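In PHP, a function definition can specify whether each parameter is passed by value or by reference; by value is the usual case. This is easy to understand: if a function requires a mutable argument to be passed by value rather than by reference, an extra copy of that value appears in memory. Modifying the new copy inside the function then does not affect the original outside it, because the two copies do not share an address in memory.
This brings us to shallow copy and deep copy. The difference between them lies in whether the mutable values nested inside the thing being copied are passed by value or by reference. If they are passed by reference as usual, the copy is shallow. If they are passed by value, it is deep. What does that mean in practice? Take the following Python code: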
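>>> x = {'username': 'admin', 'machines': ['foo', 'bar', 'baz']}
>>> y = x.copy()
>>> y['username'] = 'mlh'
>>> y['machines'].remove('bar')
>>> y
{'username': 'mlh', 'machines': ['foo', 'baz']}
>>> x
{'username': 'admin', 'machines': ['foo', 'baz']}
Calling the copy method of dictionary x returns a new dictionary, assigned to y, that holds the same key-value pairs as the original. Note that copy produces the new dictionary by shallow copy, so it is both distinct from and linked to the original. The distinction: for immutable values in the original dictionary, such as numbers, strings, and tuples, the new dictionary behaves as if it had its own copy (strictly, it holds a reference to the same object, but an immutable object can never be changed in place). Modifying such a value, which really means replacing or reassigning it (y['username'] = 'mlh'), therefore does not affect the original dictionary. The link: for mutable values in the original dictionary, such as lists and dictionaries, no new copy is made. Only the reference is copied, so the key in the new dictionary and the key in the original both hold references to the same address in memory. That is what shallow copy means. So modifying a mutable value in place (y['machines'].remove('bar')) modifies the value shared by the new and old dictionaries (here the list ['foo', 'bar', 'baz']) and necessarily affects the original dictionary that references it.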
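The sharing described above can be checked directly with the identity operator `is`. This is a small sketch that extends the example; the identity checks are additions for illustration.

```python
x = {"username": "admin", "machines": ["foo", "bar", "baz"]}
y = x.copy()

print(y is x)                          # False: the outer dict is a new object
print(y["machines"] is x["machines"])  # True: the inner list is shared
print(y["username"] is x["username"])  # True: the string is shared as well

y["username"] = "mlh"          # rebinds the key in y only
y["machines"].remove("bar")    # mutates the one list both dicts reference

print(x)   # {'username': 'admin', 'machines': ['foo', 'baz']}
```

Sharing the string is harmless because a string cannot be changed in place, while sharing the list is what lets the removal show up in x.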
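Deep copy is different. A deep copy really does recreate every value in the original dictionary, immutable and mutable alike. Afterwards, memory holds two identical sets of data at different addresses, referenced by different variables (the original dictionary and the new one). Modifying one dictionary after a deep copy therefore does not affect the other. The deepcopy function in Python's copy module performs a deep copy:
>>> from copy import deepcopy
>>> d = {}
>>> d['names'] = ['Alfred', 'Bertrand']
>>> c = d.copy()
>>> dc = deepcopy(d)
>>> d['names'].append('Clive')
>>> c
{'names': ['Alfred', 'Bertrand', 'Clive']}
>>> dc
{'names': ['Alfred', 'Bertrand']}
Clearly, a deep copy and the original are independent of each other, whereas a change made through the "new" value from a shallow copy can still affect the original, and vice versa.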
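To make the recursion behind a deep copy concrete, here is a deliberately simplified sketch. `simple_deepcopy` is a hypothetical helper written for this illustration, not the implementation of `copy.deepcopy`: it handles only dicts and lists, assumes every other value is immutable, and does not cope with cyclic structures.

```python
from copy import deepcopy

def simple_deepcopy(value):
    """Hypothetical helper: recursively copy dicts and lists only."""
    if isinstance(value, dict):
        return {key: simple_deepcopy(item) for key, item in value.items()}
    if isinstance(value, list):
        return [simple_deepcopy(item) for item in value]
    # Assumption: anything else (str, int, ...) is immutable and safe to share.
    return value

d = {"names": ["Alfred", "Bertrand"]}
dc = deepcopy(d)             # the standard library's deep copy
mine = simple_deepcopy(d)    # the simplified sketch

d["names"].append("Clive")

print(dc)     # {'names': ['Alfred', 'Bertrand']}
print(mine)   # {'names': ['Alfred', 'Bertrand']}
```

In real code `copy.deepcopy` is the better choice, since it also handles custom classes and objects that refer to themselves.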

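What the author means is probably this: in JavaScript, assigning an ordinary variable makes a copy, whereas assigning an object makes a reference.
For example: var a=1;var b=a;
Here b is a copy of a, so b=1. Because it is a copy, b and a have nothing to do with each other afterwards: reassigning a has no effect on b, and reassigning b has no effect on a.
If a is an object, however, things are different.
For example: var test ={a:1,b:2};var test1 =test;
In this case test1 is not a copy of test; it takes test's address directly, so a change to test's value also affects test1. For instance, if I set test.a = 2, then test1.a automatically becomes 2 as well.
PHP 5 works this way too. In PHP 4, assigning an object made a copy, so to guarantee a single instance we usually wrote $a = &$b; From PHP 5 onwards, assigning an object without any special operation amounts to a reference to its address, with an effect similar to the JS code above.
At the end the author uses a Python example to explain deep copy, which simply means copying the object as well instead of referencing it. PHP has a counterpart in clone, although clone on its own makes a shallow copy: a deep copy also requires implementing __clone to copy the nested objects.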


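1. A shallow copy means that the source object and the copy share the same underlying entities; only the referencing variables (the names) differ. A change made to that shared data through either object is visible through the other.
A Python dictionary illustrates this:
>>> x = {'username': 'admin', 'machines': ['foo', 'bar', 'baz']}
>>> x
{'username': 'admin', 'machines': ['foo', 'bar', 'baz']}
>>> y = x.copy()
>>> y
{'username': 'admin', 'machines': ['foo', 'bar', 'baz']}
>>> y['username'] = 'mlh'
>>> y
{'username': 'mlh', 'machines': ['foo', 'bar', 'baz']}
>>> x
{'username': 'admin', 'machines': ['foo', 'bar', 'baz']}
>>> y['machines'].remove('bar')
>>> y
{'username': 'mlh', 'machines': ['foo', 'baz']}
>>> x
{'username': 'admin', 'machines': ['foo', 'baz']}
Why can the value of username be changed without affecting the original dictionary (this puzzled me at first), while modifying the list is reflected in the original? Because y['username'] = 'mlh' rebinds a key in y only, whereas y['machines'].remove('bar') changes the one list object that both dictionaries reference.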
2. A deep copy means that the source object and the copy are independent of each other; a change to either one has no effect on the other.
A Python dictionary illustrates this:
>>> from copy import deepcopy
>>> d = {}
>>> d['names'] = ['Alfred', 'Bertrand']
>>> d
{'names': ['Alfred', 'Bertrand']}
>>> c = d.copy()
>>> c
{'names': ['Alfred', 'Bertrand']}
>>> dc = deepcopy(d)
>>> dc
{'names': ['Alfred', 'Bertrand']}
>>> d['names'].append('Clive')
>>> d
{'names': ['Alfred', 'Bertrand', 'Clive']}
>>> c
{'names': ['Alfred', 'Bertrand', 'Clive']}
>>> dc
{'names': ['Alfred', 'Bertrand']}
Here dc is the deep copy: an independent replica of the original dictionary with the same keys and values, which is why it is unaffected by the later append.
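3. How shallow copying of reference objects works. Assignment between reference objects performs a shallow copy because of the nature of reference objects. A reference object generally consists of two parts:
  (1). A named handle, which is what we declare (such as a variable).
  (2). An internal (unnamed) object, i.e. the object behind the named handle. It is allocated on the Managed Heap, usually by creating the reference object with new. Once this internal object has been created, the named handle points to its address on the Managed Heap; otherwise the handle is null (as a rule of thumb, if a named handle can be assigned null, it is a reference object, though this is not absolute). When one reference object is assigned to another, only the address of the internal object is copied. The internal object remains one and the same, so a modification through either the source or the copy affects the other. That is a shallow copy.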
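The handle and internal-object model maps loosely onto Python names and objects, with None in the role of null. The sketch below contrasts plain assignment with `copy.copy` and `copy.deepcopy` from the standard library; the `Session` class and all variable names are hypothetical and exist only for this illustration.

```python
import copy

class Session:
    """Hypothetical class used only for this illustration."""
    def __init__(self, user, machines):
        self.user = user
        self.machines = machines

source = Session("admin", ["foo", "bar"])
alias = source                  # second handle to the same internal object
shallow = copy.copy(source)     # new Session that shares the machines list
deep = copy.deepcopy(source)    # new Session with its own machines list
empty = None                    # a handle that refers to no object at all

source.user = "mlh"             # rebinds an attribute on the source object
source.machines.append("baz")   # mutates the list in place

print(alias.user, alias.machines)       # mlh ['foo', 'bar', 'baz']
print(shallow.user, shallow.machines)   # admin ['foo', 'bar', 'baz']
print(deep.user, deep.machines)         # admin ['foo', 'bar']
```

Plain assignment copies only the handle, `copy.copy` copies one level, and `copy.deepcopy` copies everything reachable from the object.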

http://docs.python.org/2/library/copy.html